An acquaintance (who doesn’t work in public health) recently got typhoid while traveling. She noted that she had had the typhoid vaccine less than a year ago but got sick anyway. Surprisingly to me, even though she knew “the vaccine was only about 50% effective” she now felt that it was a mistake to have gotten the vaccine. Why? “If you’re going to get the vaccine and still get typhoid, what’s the point?”
I disagreed, but I'm afraid my defense wasn't particularly eloquent in the moment: I tried to say that, well, if it's 50% effective and you and I both got the vaccine, then only one of us would get typhoid instead of both of us. That's better, right? You just drew the short straw. Or, if you would otherwise have gotten typhoid twice, now you'll only get it once!
These answers weren’t reassuring, in part because thinking counterfactually — what I was trying to do — isn’t always easy. Epidemiologists do it because they’re told ad nauseam to approach causal questions by first asking “how could I observe the counterfactual?” At one point after finishing my epidemiology coursework I started writing a post called “The Top 10 Things You’ll Learn in Public Health Grad School,” and three or four of the ten were going to be “think counterfactually!”
A particularly artificial and clean way of observing this difference — between what happened and what could have otherwise happened — is to randomly assign people to two groups (say, vaccine and placebo). If the groups are big enough to average out any differences between them, then the differences in sickness you observe are due to the vaccine. It’s more complicated in practice, but that’s where we get numbers like the efficacy of the typhoid vaccine — which is actually a bit higher than 50%.
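To make that concrete, here's a toy simulation of such a trial. The numbers are made up for illustration (a 10% baseline attack rate and a vaccine that truly halves risk), not real typhoid data; the point is just that with large enough groups, the observed difference in attack rates recovers the true efficacy:

```python
import random

random.seed(0)

# Assumed numbers for illustration only: 10% baseline risk,
# a vaccine that truly halves it (true efficacy = 50%).
n = 100_000
baseline_risk = 0.10
true_efficacy = 0.50

def run_trial():
    # Each arm gets n people; each person independently gets sick or not.
    vaccinated_sick = sum(
        random.random() < baseline_risk * (1 - true_efficacy)
        for _ in range(n)
    )
    placebo_sick = sum(random.random() < baseline_risk for _ in range(n))
    # Efficacy = 1 - (attack rate in vaccinated / attack rate in placebo)
    return 1 - (vaccinated_sick / n) / (placebo_sick / n)

print(round(run_trial(), 3))  # with groups this large, lands near 0.5
```

With small groups the estimate would bounce around a lot, which is one reason real trials need to be big: the randomization only "averages out" differences between the arms when there are enough people to average over.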
You can probably see where this is going: while the randomized trial gives you the average effect, any given individual in the trial might or might not get sick. And because each individual is assigned to either treatment or control, never both, it’s hard to pin their outcome (sick vs. not sick) on the vaccine alone. It’s often impossible to get an exhaustive enough picture of individual risk factors and exposures to predict in advance exactly who will get sick. All you get is an average, and while the average effect is really, really important, it’s not everything.
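One way to see the gap between the average and the individual is a potential-outcomes sketch. In this toy simulation (same made-up numbers as before: 10% baseline risk, 50% efficacy) we get to draw *both* counterfactual outcomes for each person — exactly the thing reality never lets us observe:

```python
import random

random.seed(1)

# Hypothetical potential-outcomes sketch; simulated, not real data.
# For each person, draw whether they would get sick WITHOUT the vaccine,
# then whether they would get sick WITH it.
n = 100_000
sick_without = [random.random() < 0.10 for _ in range(n)]
# A 50%-effective vaccine: half of those who would have gotten sick
# are protected; no one is made sick by it.
sick_with = [s and (random.random() < 0.5) for s in sick_without]

risk_without = sum(sick_without) / n  # about 10%
risk_with = sum(sick_with) / n        # about 5%
print(risk_without, risk_with)
```

In the simulation we can check `sick_without[i]` for a vaccinated person who got sick and see they were simply unlucky. In real life that entry is hidden: you observe one outcome per person, which is why the reverse question ("why did *I* get sick?") is so much harder than the forward one.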
This is related somewhat to Andrew Gelman’s recent distinction between forward and reverse causal questions, which he defines as follows:
1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth?
2. Reverse causal inference. What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse?
The randomized trial tries to give us an estimate of the forward causal question. But for someone who already got sick, the reverse causal question is primary, and the answer that “you were 50% less likely to have gotten sick” is hard to internalize. As Gelman says:
But reverse causal questions are important too. They’re a natural way to think (consider the importance of the word “Why”) and are arguably more important than forward questions. In many ways, it is the reverse causal questions that lead to the experiments and observational studies that we use to answer the forward questions.
The moral of the story — other than not sharing your disease history with a causal inference buff — is that reconciling the quantitative, average answers we get from the forward questions with the individual experience won’t always be intuitive.