Stats lingo in econometrics and epidemiology

Last week I came across an article I wish I’d found a year or two ago: “Glossary for econometrics and epidemiology” (PDF from JSTOR, ungated version here) by Gunasekara, Carter, and Blakely.

Statistics is to some extent a common language for the social sciences, but there are also big variations in language that can cause problems when students and scholars try to read literature from outside their fields. I first learned epidemiology and biostatistics at a school of public health, and now this year I’m taking econometrics from an economist, as well as other classes that draw heavily on the economics literature.

Friends in my economics-centered program have asked me “what’s biostatistics?” Likewise, public health friends have asked “what’s econometrics?” (or just commented that it’s a silly name). In reality both fields use many of the same techniques with different language and emphases. The Gunasekara, Carter, and Blakely glossary linked above covers the following terms, amongst others:

  • confounding
  • endogeneity and endogenous variables
  • exogenous variables
  • simultaneity, social drift, social selection, and reverse causality
  • instrumental variables
  • intermediate or mediating variables
  • multicollinearity
  • omitted variable bias
  • unobserved heterogeneity

If you’ve only studied econometrics or biostatistics, chances are at least some of these terms will be new to you, even though most have roughly equivalent forms in the other field.

Beyond differing language, the fields also differ in how often they use particular techniques. For instance, instrumental variables seem (to me) to be under-used in public health / epidemiology applications. I took four terms of biostatistics at Johns Hopkins and don’t recall instrumental variables being mentioned even once! On the other hand, economists only recently discovered randomized trials. (They are now more widely used in economics.)

But even within a given statistical technique there are important differences. You might think that all social scientists doing, say, multiple linear regression to analyze observational data, or critiquing the results of randomized controlled trials, would use the same language. In my experience they not only use different vocabulary for the same things, they also emphasize different things. About a third to half of my epidemiology coursework involved establishing causal models (often with directed acyclic graphs) in order to understand which confounding variables to control for in a regression, whereas in econometrics we (very!) briefly discussed how to decide which covariates might cause omitted variable bias. These discussions were basically about the same thing, but they differed in language and in emphasis.
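To make the "same thing, different name" point concrete, here is a minimal simulated sketch. It is not taken from the glossary: the variable names, effect sizes, and sample size are all assumptions chosen for illustration. A single variable `u` drives both the exposure `x` and the outcome `y`. An epidemiologist would call `u` a confounder; an economist would say that leaving it out causes omitted variable bias. Either way, the unadjusted slope overstates the effect and the adjusted slope should land near the assumed true value.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(0)   # fixed seed so the run is reproducible
n = 20_000
true_effect = 1.0        # assumed causal effect of x on y (illustrative)

# u is the confounder (epi) / omitted variable (econ): it affects both x and y.
u = [rng.gauss(0, 1) for _ in range(n)]
x = [ui + rng.gauss(0, 1) for ui in u]
y = [true_effect * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Regression that ignores u: biased upward in this setup.
naive = slope(x, y)

# Regression that controls for u. Partialling u out of both x and y and then
# regressing residual on residual gives the same coefficient on x as a
# multiple regression of y on x and u (the Frisch-Waugh-Lovell result).
adjusted = slope(residuals(u, x), residuals(u, y))

print(f"unadjusted slope: {naive:.2f}")
print(f"adjusted slope:   {adjusted:.2f}")
```

With these made-up parameters the unadjusted slope converges to about 2 rather than the assumed true effect of 1, because cov(x, u) / var(x) = 0.5 and `u` has a coefficient of 2 on `y`. Whether you call that gap confounding or omitted variable bias depends only on which department taught you regression.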

I think an understanding of how and why researchers from different fields talk about things differently helps you to understand the sociology and motivations of each field. This is all related to what Marc Bellemare calls the ongoing “methodological convergence in the social sciences.” As research becomes more interdisciplinary, and as any applications of research are much more likely to require interdisciplinary knowledge, understanding how researchers trained in different academic schools think and talk will become increasingly important.

4 Comments

  1. 1

    Cool post. I never even thought about the fact that instrumental variables were never mentioned in my epi program while they were definitely emphasized in econometrics and health economics (within the economics dept). I just asked the 3 other epis in my office and none had heard about IVs.
    (This is why I was so excited to come across your site, since I also have the econ/epi background)

  2. 3

    I’m an epi with no experience in econ & don’t think I have ever heard the term ‘instrumental variable’ either. I am likely misunderstanding the meaning, but the only public health scenarios I can imagine where it would be anything but an effect modifier would involve behavioral sciences research (which I admit I have limited experience with).

    • 4

      Mary, I think there are actually a lot of scenarios where they would be useful. The challenge is finding good instruments, which are often debatable, but my impression is that the technique isn’t covered at all, which is a different problem. I think IVs could be useful in pretty much any situation where you want to estimate the causal effect of a variable that you’re not able to randomize, and that comes up a lot in epi! Epi papers (see the recent one on red meat consumption and mortality risk that got a lot of media attention) often present observational data and make *no attempt* to estimate causality with more sophisticated methods. Of course, on the other end of the spectrum one can argue that reliance on *bad* instruments can give you unrealistic levels of confidence in causal estimates where the data are just too difficult to parse… but I think at least attempting such adjustments is a good place to start. I’d recommend the Hernan paper I linked to above: it’s an intro to IV for epidemiologists who haven’t heard of it.



Your Comment to Brett Keller