Archive for the ‘statistics’ Category

Have recent global health gains gone to the poor?

Have recent global health gains gone to the poor in developing countries? Or to the relatively rich? An answer:

We find that with the exception of HIV prevalence, where progress has, on average, been markedly pro-rich, progress on the MDG health outcome (health status) indicators has, on average, been neither pro-rich nor pro-poor. Average rates of progress are similar among the poorest 40 percent and among the richest 60 percent.

That’s Adam Wagstaff, Caryn Bredenkamp, and Leander Buisman in a new article titled “Progress on Global Health Goals: are the Poor Being Left Behind?” (full text here). The answer seems to be “mostly no, sometimes yes”, but the exceptions to the trend are as important as the trend itself.

I originally flagged this article to read because Wagstaff is one of the authors, and I drew on a lot of his work for my master’s thesis (which looked at trends in global health inequities in Ethiopia). One example is this handy World Bank report (PDF) which is a how-to for creating concentration indexes and other measures of inequality, complete with Stata code. A concentration index is essentially a health inequality version of the Gini index: instead of showing the concentration of wealth by wealth, or income by income, you measure the concentration of some measure of health by a measure of wealth or income, often the DHS wealth index since it’s widely available.

If your chosen measure of health — let’s say, infant mortality — doesn’t vary by wealth, then you’d graph a straight line at a 45-degree angle — sometimes called the line of equality. But in most societies the poor get relatively more of a bad health outcome (like mortality) and rather less of good things like access to vaccination. In both cases the graphed line would be a curve that differs from the line of equality, which is called a concentration curve. The further the concentration curve is from the line of equality, the more unequal the distribution of the health outcome. And the concentration index is simply twice the area between the two lines (again, the Gini index is the equivalent number when comparing income vs. income). The relationship between the two is illustrated in this example graph from my thesis:

You can also just compare, say, mortality rates for the top and bottom quintiles of the wealth distribution, or the top 1% vs. the bottom 99%, or virtually any other division, but all of those measures essentially ignore a large amount of information in the middle of the distribution, or require arbitrary cutoffs. The beauty of concentration curves and indexes is that they use all available information. An even better approach is to use multiple measures of inequality and see if the changes you see are sensitive to your choice of measures; it’s a more convincing case if they’re not.
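(For the curious, here’s a minimal sketch of the calculation in Python. It uses the standard covariance shortcut, C = 2 * cov(h, r) / mean(h), where r is each person’s fractional rank in the wealth distribution; the data are made up, and this is not code from the World Bank guide.)

```python
import numpy as np

def concentration_index(health, wealth):
    """C = 2 * cov(h, r) / mean(h), where r is the fractional rank of each
    observation in the wealth distribution, ordered poorest to richest."""
    order = np.argsort(wealth)
    h = np.asarray(health, dtype=float)[order]
    n = len(h)
    r = (np.arange(1, n + 1) - 0.5) / n   # fractional wealth ranks
    return 2.0 * np.cov(h, r, bias=True)[0, 1] / h.mean()

# Made-up data: a bad outcome (infant death, 1/0) that is more common among
# poorer households, so the index should come out negative.
rng = np.random.default_rng(1)
wealth = rng.uniform(size=10_000)
death = rng.uniform(size=10_000) < (0.10 - 0.08 * wealth)

print(concentration_index(death, wealth))   # negative: concentrated among the poor
```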

The new Wagstaff, Bredenkamp, and Buisman paper uses such concentration indexes, and other measures of inequity, to “examine differential progress on health Millennium Development Goals (MDGs) between the poor and the better off within countries.” They use a whopping 235 DHS and MICS surveys from 1990 to 2011, and find the following:

On average, the concentration index (the measure of relative inequality that we use) neither rose nor fell. A rosier picture emerges for MDG intervention indicators: whether we compare rates of change for the poorest 40 percent and richest 60 percent or consider changes in the concentration index, we find that progress has, on average, been pro-poor.

However, behind these broad-brush findings lie variations around the mean. Not all countries have progressed in an equally pro-poor way. In almost half of countries, (relative) inequality in child malnutrition and child mortality fell, but it also increased in almost half of countries, often quite markedly. We find some geographic concentration of pro-rich progress; in almost all countries in Asia, progress on underweight has been pro-rich, and in much of Africa, inequalities in under-five mortality have been growing. Even on the MDG intervention indicators, we find that a sizable fraction of countries have progressed in a pro-rich fashion.

They also compared variation that was common across countries with variation that was common across indicators — in other words, they asked whether the differences arise because, say, some health interventions are just easier to reach the poorest with. They found that more of the variation came from differences between countries than from differences between indicators.

One discussion point they stress is that it has been easier to promote equality in interventions than equality in outcomes, and that part of the story is related to the quality of care that poorer citizens receive. From the discussion:

One hypothesis is that the quality of health care is worse for lower socioeconomic groups; though the poorest 40 percent may have experienced a larger percentage increase in, for example, antenatal visits, they have not observed the same improvement in the survival prospects of their babies. If true, this finding would point to the need for a monitoring framework that captures not only the quantity of care (as is currently the case) but also its quality.

04 08 2014

Born in the year of […]

I was looking for the Kenyan 2009 census data and came across that survey’s guide for enumerators (i.e., data collectors) in PDF form, here. There’s an appendix towards the end — starting on page 60 of the PDF — that’s absolutely fascinating.

Collecting information on the age of a population is important for demographic purposes. But what do you do when a large proportion of people don’t have birth certificates? The Kenyan census has a list of prominent events from different regions to help connect remembered events to the years in which they happened.

This may well be standard practice for censuses — I’ve never worked on one — but the specific events chosen are interesting nonetheless. Here’s the start of the list for Kirinyaga County in Kenya:

So if you know you were born in the year of the famine of (or in?) Wangara, then you were 100 years old in 2009. Likewise, 1917 was notable for being the year that “strong round men were forced to join WWI”.

On the same note, the US birth certificate didn’t have an option for mother’s occupation until 1960! (That and other fascinating history here. Academic take here.) Also, there are 21 extant birth certificates from Ancient Rome.

09 04 2014

Data: big, small, and meta

When I read this New York Times piece back in August, I was in the midst of preparation and training for data collection at rural health facilities in Zambia. The Times piece profiles a group called Global Pulse that is doing good work on the ‘big data’ side of global health:

The efforts by Global Pulse and a growing collection of scientists at universities, companies and nonprofit groups have been given the label “Big Data for development.” It is a field of great opportunity and challenge. The goal, the scientists involved agree, is to bring real-time monitoring and prediction to development and aid programs. Projects and policies, they say, can move faster, adapt to changing circumstances and be more effective, helping to lift more communities out of poverty and even save lives.

Since I was gearing up for ‘field work’ (more on that here; I’ll get to it soon), I was struck at the time by the very different challenges one faces at the other end of the spectrum. Call it small data? And I connected the Global Pulse profile with this, by Wayan Vota, from just a few days before:

The Sneakernet Reality of Big Data in Africa

When I hear people talking about “big data” in the developing world, I always picture the school administrator I met in Tanzania and the reality of sneakernet data transmissions processes.

The school level administrator has more data than he knows what to do with. Years and years of student grades recorded in notebooks – the hand-written on paper kind of notebooks. Each teacher records her student attendance and grades in one notebook, which the principal then records in his notebook. At the local district level, each principal’s notebook is recorded into a master dataset for that area, which is then aggregated at the regional, state, and national level in even more hand-written journals… Finally, it reaches the Minister of Education as a printed-out computer-generated report, compiled by ministerial staff from those journals that finally make it to the ministry, and are not destroyed by water, rot, insects, or just plain misplacement or loss. Note that nowhere along the way is this data digitized, and even at the ministerial level, the data isn’t necessarily deeply analyzed or shared widely….

And to be realistic, until countries invest in this basic, unsexy, and often ignored level of infrastructure, we’ll never have “big data” nor Open Data in Tanzania or anywhere else. (Read the rest here.)

Right on. And sure enough two weeks later I found myself elbow-deep in data that looked like this — “Sneakernet” in action:

In many countries quite a lot of data — of varying quality — exists, but it’s often formatted like the above. Optimistically, it may get used for local decisions, and eventually for high-level policy decisions when it’s months or years out of date. There’s a lot of hard, good work being done to improve these systems (more often by residents of low-income countries, sometimes by foreigners), but still far too little. This data is certainly primary, in the sense that it was collected on individuals, or by facilities, or about communities, but there are huge problems with quality, and with the sneakernet by which it gets back to policymakers, researchers, and (sometimes) citizens.

For the sake of quick reference, I keep a folder on my computer that has — for each of the countries I work in — most of the major recent ultimate sources of nationally-representative health data. All too often the only high-quality ultimate source is the most recent Demographic and Health Survey, surely one of the greatest public goods provided by the US government’s aid agency. (I think I’m paraphrasing Angus Deaton here, but can’t recall the source.) When I spent a summer doing epidemiology research with the New York City Department of Health and Mental Hygiene, I was struck by just how many rich data sources there were to draw on, at least compared to low-income countries. Very often there just isn’t much primary data on which to build.

On the other end of the spectrum is what you might call the metadata of global health. When I think about the work the folks I know in global health — classmates, professors, acquaintances, and occasionally though not often me — do day to day, much of it is generating metadata. This is research or analysis derived from the primary data, and thus reliant on its quality. It’s usually smart, almost always well-intentioned, and often well-packaged, but this towering edifice of effort is erected over a foundation of primary data; the metadata sometimes gives the appearance of being primary, but when you dig down, the sources often point back to those one or three ultimate data sources.

That’s not to say that generating this metadata is bad: for instance, modeling impacts of policy decisions given the best available data is still the best way to sift through competing health policy priorities if you want to have the greatest impact. Or a more cynical take: the technocratic nature of global health decision-making requires that we either have this data or, in its absence, impute it. But regardless of the value of certain targeted bits of the metadata, there’s the question of the overall balance of investment in primary vs. secondary-to-meta data, and my view — somewhat ironically derived entirely from anecdotes — is that we should be investing a lot more in the former.

One way to frame this trade-off is to ask, when considering a research project or academic institute or whatnot, whether the money spent on that project might deliver more value if it were spent instead on training data collectors and statistics offices, or on supporting primary data collection (e.g., funding household surveys) in low-income countries. I think in many cases the answer will be clear, perhaps to everyone except those directly generating the metadata.

That does not mean that none of this metadata is worthwhile. On the contrary, some of it is absolutely essential. But a lot isn’t, and there are opportunity costs to any investment: a choice between investing in data collection and statistics systems in low-income countries, versus research projects where most of the money will ultimately stay in high-income countries and the causal pathway to impact is much less direct.

Looping back to the original link, one way to think of the ‘big data’ efforts like Global Pulse is that they’re not metadata at all, but an attempt to find new sources of primary data. Because there are so few good sources of data that get funded, or that filter through the sneakernet, the hope is that mobile phone usage and search terms and whatnot can be mined to give us entirely new primary data, on which to build new pyramids of metadata, and with which to make policy decisions, skipping the sneakernet altogether. That would be pretty cool if it works out.

The Napoleon cohort

I’ve recently had to think through two problems related to tracking cohorts over time, and each time I’ve mentally referred back to what is considered by some to be the greatest data visualization of all time.

Charles Joseph Minard, an engineer, created the graphic below: “Carte figurative des pertes successives en hommes de l’Armée Française dans la campagne de Russie 1812-1813” (loosely translated as “don’t follow Napoleon or anyone else when launching a land war in Asia”).

This single picture shows the size of the army as it entered Russia, then the size as it left, their relative geographic location, groups leaving and re-entering the force, and the temperature the army faced as they returned.  And to me it meets one of the main tests for “is this graphic great?” — it sticks in my head and I find myself referring back to it again and again.

24 07 2013

Slow down there

Max Fisher has a piece in the Washington Post presenting “The amazing, surprising, Africa-driven demographic future of the Earth, in 9 charts”. While he notes that the numbers are “just projections and could change significantly under unforeseen circumstances”, the graphs don’t give any sense of the huge uncertainty involved in projecting trends 90 years into the future.

Here’s the first graph:


The population growth in Africa here is a result of much higher fertility rates, and a projected slower decline in those rates.

But those projected rates have huge margins of error. Here’s the total fertility rate, or “the average number of children that would be born to a woman over her lifetime”, for Nigeria, with confidence intervals that give you a sense of just how little we know about the future:

That’s a lot of uncertainty! (Image from here, which I found thanks to a commenter on the WaPo piece.)

It’s also worth noting that if you had made similar projections 87 years ago, in 1926, it would have been hard to anticipate World War II, hormonal birth control, and AIDS, amongst other things.

18 07 2013

Typhoid counterfactuals

An acquaintance (who doesn’t work in public health) recently got typhoid while traveling. She noted that she had had the typhoid vaccine less than a year ago but got sick anyway. Surprisingly to me, even though she knew “the vaccine was only about 50% effective”, she now felt that it was a mistake to have gotten the vaccine. Why? “If you’re going to get the vaccine and still get typhoid, what’s the point?”

I disagreed but am afraid my defense wasn’t particularly eloquent in the moment: I tried to say that, well, if it’s 50% effective and you and I both got the vaccine, then only one of us would get typhoid instead of both of us. That’s better, right? You just drew the short straw. Or, if you would have otherwise gotten typhoid twice, now you’ll only get it once!

These answers weren’t reassuring in part because thinking counterfactually — what I was trying to do — isn’t always easy. Epidemiologists do this because they’re typically told ad nauseam to approach causal questions by first thinking “how could I observe the counterfactual?” At one point after finishing my epidemiology coursework I started writing a post called “The Top 10 Things You’ll Learn in Public Health Grad School” and three or four of the ten were going to be “think counterfactually!”

A particularly artificial and clean way of observing this difference — between what happened and what could have otherwise happened — is to randomly assign people to two groups (say, vaccine and placebo). If the groups are big enough to average out any differences between them, then the differences in sickness you observe are due to the vaccine. It’s more complicated in practice, but that’s where we get numbers like the efficacy of the typhoid vaccine — which is actually a bit higher than 50%.

You can probably see where this is going: while the randomized trial gives you the average effect, for any given individual in the trial they might or might not get sick. Then, because any individual is assigned only to the treatment or control, it’s hard to pin their outcome (sick vs. not sick) on that alone. It’s often impossible to get an exhaustive picture of individual risk factors and exposures so as to explain exactly which individuals will get sick or not in advance. All you get is an average, and while the average effect is really, really important, it’s not everything.
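A toy simulation makes this concrete. The numbers below are invented (they are not the actual typhoid attack rates or the trial’s efficacy estimate); the point is just that a 50%-effective vaccine halves the number of cases on average, while any particular vaccinated person can still draw the short straw:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                       # hypothetical travelers per arm
base_risk = 0.02                  # assumed attack rate without the vaccine
efficacy = 0.50                   # vaccine cuts each person's risk in half

# Randomize to placebo or vaccine, then simulate who gets sick.
placebo_sick = rng.random(n) < base_risk
vaccine_sick = rng.random(n) < base_risk * (1 - efficacy)

print("placebo attack rate:", placebo_sick.mean())
print("vaccine attack rate:", vaccine_sick.mean())
# About half as many cases in the vaccine arm, but some vaccinated people
# still get sick; the trial pins down the average effect, not which
# particular individuals were the ones who were protected.
```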

This is related somewhat to Andrew Gelman’s recent distinction between forward and reverse causal questions, which he defines as follows:

1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth?

2. Reverse causal inference. What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse?

The randomized trial tries to give us an estimate of the forward causal question. But for someone who already got sick, the reverse causal question is primary, and the answer that “you were 50% less likely to have gotten sick” is hard to internalize. As Gelman says:

But reverse causal questions are important too. They’re a natural way to think (consider the importance of the word “Why”) and are arguably more important than forward questions. In many ways, it is the reverse causal questions that lead to the experiments and observational studies that we use to answer the forward questions.

The moral of the story — other than not sharing your disease history with a causal inference buff — is that reconciling the quantitative, average answers we get from the forward questions with the individual experience won’t always be intuitive.

17 07 2013

(Not) knowing it all along

David McKenzie is one of the guys behind the World Bank’s excellent and incredibly wonky Development Impact blog. He came to Princeton to present on a new paper with Gustavo Henrique de Andrade and Miriam Bruhn, “A Helping Hand or the Long Arm of the Law? Experimental evidence on what governments can do to formalize firms” (PDF). The subject matter — trying to get small, informal companies to register with the government — is outside my area of expertise. But I thought there were a couple methodologically interesting bits:

First, there’s an interesting ethical dimension, as one of their several interventions tested was increasing the likelihood that a firm would be visited by a government inspector (i.e., that the law would be enforced). From page 10:

In particular, if a firm owner were interviewed about their formality status, it may not be considered ethical to then use this information to potentially assign an inspector to visit them. Even if it were considered ethical (since the government has a right to ask firm owners about their formality status, and also a right to conduct inspections), we were still concerned that individuals who were interviewed in a baseline survey and then received an inspection may be unwilling to respond to a follow-up. Therefore a listing stage was done which did not involve talking to the firm owner.

In other words, all their baseline data was collected without actually talking to the firms they were studying — check out the paper for more on how they did that.

Second, they did something that could (and maybe should) be incorporated into many evaluations with relative ease. Because findings often seem obvious after we hear them, McKenzie et al. asked the government staff whose program they were evaluating to estimate what the impact would be before the results were in. Here’s that section (emphasis added):

A standard question with impact evaluations is whether they deliver new knowledge or merely formally confirm the beliefs that policymakers already have (Groh et al, 2012). In order to measure whether the results differ from what was anticipated, in January 2012 (before any results were known) we elicited the expectations of the Descomplicar [government policy] team as to what they thought the impacts of the different treatments would be. Their team expected that 4 percent of the control group would register for SIMPLES [the formalization program] between the baseline and follow-up surveys. We see from Table 7 that this is an overestimate…

They then expected the communication only group to double this rate, so that 8 percent would register, that the free cost treatment would lead to 15 percent registering, and that the inspector treatment would lead to 25 percent registering…. The zero or negative impacts of the communication and free cost treatments therefore are a surprise. The overall impact of the inspector treatment is much lower than expected, but is in line with the IV estimates, suggesting the Descomplicar team have a reasonable sense of what to expect when an inspection actually occurs, but may have overestimated the amount of new inspections that would take place. Their expectation of a lack of impact for the indirect inspector treatment was also accurate.

This establishes exactly what in the results was a surprise and what wasn’t. It might also make sense for researchers to ask both the policymakers they’re working with and some group of researchers who study the same subject to give such responses; it would certainly help make a case for the value of (some) studies.

Fun projects are fun

Jay Ulfelder, of the blog Dart-Throwing Chimp, recently wrote a short piece in praise of fun projects. He links to my Hunger Games survival analysis, and Alex Hanna’s recent application of survival analysis to a reality TV show, RuPaul’s Drag Race. (That single Hunger Games post has accounted for about one-third of the ~100k page views this blog got in the last year!) Jay’s post reminded me that I never shared links to Alex’s survival analysis, which is a shame, so here goes:

First, there’s “Lipsyncing for your life: a survival analysis of RuPaul’s Drag Race”:

I don’t know if this occurs with other reality shows (this is the first I’ve been taken with), but there is some element of prediction involved in knowing who will come out as the winner. A drag queen we spoke with at Plan B suggested that the length of time each queen appears in the season preview is an indicator, while Homoviper’s “index” is largely based on a more qualitative, hermeneutic analysis. I figured, hey, we could probably build a statistical model to know which factors are the most determinative in winning the competition.

And then come two follow-ups, where Alex digs into predictions for the next episode of the current season, and again for the one after that. That last post is a great little lesson on the importance of the proportional hazards assumption.
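If you want to try something like this yourself, here’s a minimal sketch using Python’s lifelines library (this is not Alex’s code, and the contestant data below are invented): fit a Cox proportional hazards model to episode-level survival data, then check the proportional hazards assumption that the last of those posts discusses.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Invented contestant-level data: episodes survived, whether the queen was
# eliminated (1) or censored at the finale (0), and a made-up covariate
# (seconds of screen time in the season preview).
df = pd.DataFrame({
    "episodes_survived": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12],
    "eliminated":        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "preview_seconds":   [5, 4, 7, 6, 9, 8, 11, 10, 14, 13, 18, 20],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="episodes_survived", event_col="eliminated")
cph.print_summary()            # hazard ratio for preview screen time

# Test the proportional hazards assumption against the fitted model.
cph.check_assumptions(df)
```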

I strongly agree with this bit from Jay’s post about the value of these projects:

Based on personal experience, I’m a big believer in learning by doing. Concepts don’t stick in my brain when I only read about them; I’ve got to see the concepts in action and attach them to familiar contexts and examples to really see what’s going on.

Right on. And in addition to being useful, these projects are, well, fun!

02 04 2013

This beautiful graphic is not really that useful

This beautiful infographic from the excellent blog Information is Beautiful has been making the rounds. You can see a bigger version here, and it’s worth poking around for a bit. The creators take all deaths from the 20th century (drawing from several sources) and represent their relative contribution with circles:

I appreciate their footnote that says the graphic has “some inevitable double-counting, broad estimation and ball-park figures.” That’s certainly true, but the inevitably approximate nature of these numbers isn’t my beef.

The problem is that I don’t think raw numbers of deaths tell us very much, and can actually be quite misleading. Someone who saw only this infographic might well end up less well-informed than if they didn’t see it. Looking at the red circles you get the impression that non-communicable and infectious diseases were roughly equivalent in importance in the 20th century, followed by “humanity” (war, murder, etc) and cancer.

The root problem is that mortality is inevitable for everyone, everywhere. This graphic lumps together pneumonia deaths at age 1 with car accidents at age 20, and cancer deaths at 50 with heart disease deaths at 80. We typically don’t (and I would argue shouldn’t) assign the same weight to a death in childhood or the prime of life as to one that comes at the end of a long, satisfying life. The end result is that this graphic greatly overemphasizes the importance of non-communicable diseases in the 20th century — that’s the impression most laypeople will walk away with.

A more useful graphic might use the same circles to show the years of life lost (or something like DALYs or QALYs) because those get a bit closer to what we care about. No single number is actually all that great, so we can get a better understanding if we look at several different outcomes (which is one problem with any single visualization). But I think raw mortality numbers are particularly misleading.
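As a rough illustration of why the choice of measure matters, here’s a back-of-the-envelope calculation in Python. The death counts, ages, and the flat reference life expectancy are all invented simplifications (real years-of-life-lost calculations use a standard life table), but they show how equal circles by death count become very unequal once you weight by years lost:

```python
# Invented numbers: the same death count for each cause, very different ages.
REFERENCE_LIFE_EXPECTANCY = 86   # crude stand-in for a standard life table

causes = {
    "childhood pneumonia": (1_000, 1),    # (deaths, typical age at death)
    "road injuries":       (1_000, 25),
    "heart disease":       (1_000, 80),
}

for cause, (deaths, age) in causes.items():
    yll = deaths * max(REFERENCE_LIFE_EXPECTANCY - age, 0)
    print(f"{cause}: {deaths:,} deaths, {yll:,} years of life lost")

# Equal by death count; roughly a 14-to-1 spread by years of life lost.
```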

To be fair, this graphic was commissioned by Wellcome as “artwork” for a London exhibition, so maybe it should be judged by a different standard…

26 03 2013

First responses to DEVTA roll in

In my last post I highlighted the findings from the DEVTA trial of deworming and Vitamin A in India, noting that the Vitamin A results would be more controversial. I said I expected commentaries over the coming months, but we didn’t have to wait that long after all.

First is a BBC Health Check program that features a discussion of DEVTA with Richard Peto, one of the study’s authors. It’s for a general audience so it doesn’t get very technical, and because of that it really grated when they described this as a “clinical trial,” as that has certain connotations of rigor that aren’t reflected in the design of the study. If DEVTA is a clinical trial, then so was

Peto also says there were two reasons for the massive delay in publishing the trial: 1) time to check things and “get it straight,” and 2) that they were “afraid of putting up a trial with a false negative.” [An aside for those interested in publication bias issues: can you imagine an author with strong positive findings ever saying the same thing about avoiding false positives?!]

Peto ends by sounding fairly neutral re: Vitamin A (portraying himself in a middle position between advocates in favor and skeptics opposed) but acknowledges that with their meta-analysis results Vitamin A is still “cost-effective by many criteria.”

Second is a commentary in The Lancet by Al Sommers, Keith West, and Reynaldo Martorell. A little history: Sommers ran the first big Vitamin A trials in Sumatra (published in 1986) and is the former dean of the Johns Hopkins School of Public Health. (Sommers’ long-term friendship with Michael Bloomberg, who went to Hopkins as an undergrad, is also one reason the latter is so big on public health.) For more background, here’s a recent JHU story on Sommers’ receiving a $1 million research prize in part for his work on Vitamin A.

Part of their commentary is excerpted below, with my highlights in bold:

But this was neither a rigorously conducted nor acceptably executed efficacy trial: children were not enumerated, consented, formally enrolled, or carefully followed up for vital events, which is the reason there is no CONSORT diagram. Coverage was ascertained from logbooks of overworked government community workers (anganwadi workers), and verified by a small number of supervisors who periodically visited randomly selected anganwadi workers to question and examine children who these workers gathered for them. Both anganwadi worker self-reports, and the validation procedures, are fraught with potential bias that would inflate the actual coverage.

To achieve 96% coverage in Uttar Pradesh in children found in the anganwadi workers’ registries would have been an astonishing feat; covering 72% of children not found in the anganwadi workers’ registries seems even more improbable. In 2005–06, shortly after DEVTA ended, only 6·1% of children aged 6–59 months in Uttar Pradesh were reported to have received a vitamin A supplement in the previous 6 months according to results from the National Family Health Survey, a national household survey representative at national and state level…. Thus, it is hard to understand how DEVTA ramped up coverage to extremely high levels (and if it did, why so little of this effort was sustained). DEVTA provided the anganwadi workers with less than half a day’s training and minimal if any incentive.

They also note that the study’s funding was minimal compared with that of more rigorous studies, which may be an indication of quality. And as an indication that there will almost certainly be alternative meta-analyses that weight the different studies differently:

We are also concerned that Awasthi and colleagues included the results from this study, which is really a programme evaluation, in a meta-analysis in which all of the positive studies were rigorously designed and conducted efficacy trials and thus represented a much higher level of evidence. Compounding the problem, Awasthi and colleagues used a fixed-effects analytical model, which dramatically overweights the results of their negative findings from a single population setting. The size of a study says nothing about the quality of its data or the generalisability of its findings.
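To see why the choice of meta-analytic model matters so much, here’s a toy inverse-variance calculation in Python. The effect sizes and standard errors are invented (they are not the actual Vitamin A trial results); the setup is just a handful of small positive trials plus one very large null trial, loosely the situation Sommers and colleagues describe. A fixed-effects model weights each study by 1/variance, so the big trial dominates; a random-effects model adds a between-study variance term that pulls the weights back toward equality:

```python
import numpy as np

# Invented log-risk-ratios and standard errors: four small "positive" trials
# and one very large null trial. These numbers are made up for illustration.
effects = np.array([-0.30, -0.25, -0.35, -0.28, 0.00])
se      = np.array([ 0.10,  0.12,  0.11,  0.10, 0.03])
v = se ** 2

# Fixed effects: weights are 1/variance, so the big precise trial dominates.
w_fe = 1 / v
fe = np.sum(w_fe * effects) / np.sum(w_fe)

# DerSimonian-Laird random effects: add between-study variance tau^2.
q = np.sum(w_fe * (effects - fe) ** 2)
tau2 = max(0.0, (q - (len(effects) - 1)) /
                (np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)))
w_re = 1 / (v + tau2)
re = np.sum(w_re * effects) / np.sum(w_re)

print("fixed-effects pooled estimate: ", round(fe, 3))
print("random-effects pooled estimate:", round(re, 3))
print("weight share of the big null trial:",
      round(w_fe[-1] / w_fe.sum(), 2), "(fixed) vs",
      round(w_re[-1] / w_re.sum(), 2), "(random)")
```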

I’m sure there will be more commentaries to follow. In my previous post I noted that I’m still trying to wrap my head around the findings, and I think that’s still right. If I had time I’d dig into this a bit more, especially the relationship with the Indian National Family Health Survey. But for now I think it’s safe to say that two parsimonious explanations for how to reconcile DEVTA with the prior research are emerging:

1. DEVTA wasn’t all that rigorous and thus never achieved the high population coverage levels necessary to have a strong mortality impact; the mortality impact was attenuated by poor coverage, resulting in the lack of a statistically significant effect in line with prior results. Thus it shouldn’t move our priors all that much. (Sommers et al. seem to be arguing for this.) Or,

2. There’s some underlying change in the populations between the older studies and these newer studies that causes the effect of Vitamin A to decline — this could be nutrition, vaccination status, shifting causes of mortality, etc. If you believe this, then you might discount studies because they’re older.

(h/t to @karengrepin for the Lancet commentary.)

25 03 2013