Archive for the ‘methodological quibbles’ Category

Mimicking success

If you don’t know what works, there can be an understandable temptation to try to create a picture that more closely resembles things that work. In some of his presentations on the dire state of student learning around the world, Lant Pritchett invokes the zoological concept of isomorphic mimicry: the adoption of the camouflage of organizational forms that are successful elsewhere to hide their actual dysfunction. (Think, for example, of a harmless snake that has the same size and coloring as a very venomous snake — potential predators might not be able to tell the difference, and so they assume both have the same deadly qualities.)

For our illustrative purposes here, this could mean in practice that some leaders believe that, since good schools in advanced countries have lots of computers, it will follow that, if computers are put into poor schools, they will look more like the good schools. The hope is that, in the process, the poor schools will somehow (magically?) become good, or at least better than they previously were. Such inclinations can nicely complement the “edifice complex” of certain political leaders who wish to leave a lasting, tangible, physical legacy of their benevolent rule. Where this once meant a gleaming monument soaring towards the heavens, in the 21st century this can mean rows of shiny new computers in shiny new computer classrooms.

That’s from this EduTech post by Michael Trucano. It’s about the recent evaluations showing no impact from the One Laptop per Child (OLPC) program, but I think the broader idea can be applied to health programs as well. For a moment let’s apply it to interventions designed to prevent maternal mortality. Maternal mortality is notoriously hard to measure because it is — in the statistical sense — quite rare. While many ‘rates’ (which are often not actual rates, but that’s another story) in public health are expressed with denominators of 1,000 (live births, for example), maternal mortality uses a denominator of 100,000 to make the numerators a similar order of magnitude.

That means that you can rarely measure maternal mortality directly — even with huge sample sizes you get massive confidence intervals that make it difficult to say whether things are getting worse, staying the same, or improving. Instead we typically measure indirect things, like the coverage of interventions that have been shown (in more rigorous studies) to reduce maternal morbidity or mortality. And sometimes we measure health systems things that have been shown to affect coverage of interventions… and so forth. The worry is that at some point you’re measuring the sort of things that can be improved — at least superficially — without having any real impact.
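To make those “massive confidence intervals” concrete, here is a back-of-the-envelope sketch. It is mine, not from any particular study, and the survey size and mortality level are invented purely for illustration:

```python
# Rough illustration: how wide the confidence interval for a maternal mortality
# ratio (MMR) stays even with a very large survey. Assumes a hypothetical survey
# observing `births` live births and `deaths` maternal deaths, and uses the
# exact (Garwood) Poisson interval for the death count.
from scipy.stats import chi2

def mmr_confidence_interval(deaths, births, alpha=0.05):
    """Exact Poisson CI for the death count, scaled to deaths per 100,000 live births."""
    lower = chi2.ppf(alpha / 2, 2 * deaths) / 2 if deaths > 0 else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (deaths + 1)) / 2
    scale = 100_000 / births
    return deaths * scale, lower * scale, upper * scale

# A hypothetical survey capturing 50,000 live births in a setting with a true MMR near 400:
point, lo, hi = mmr_confidence_interval(deaths=200, births=50_000)
print(f"MMR estimate: {point:.0f} (95% CI {lo:.0f}-{hi:.0f}) per 100,000 live births")
# Roughly 400 with a 95% CI of about 345-460: even 50,000 observed births leaves
# an interval so wide that detecting a sizeable decline between two such surveys
# would be very difficult.
```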

All that to say: 1) it’s important to measure the right thing, 2) determining what that ‘right thing’ is will always be difficult, and 3) it’s good to step back every now and then and think about whether the thing you’re funding or promoting or evaluating is really the thing you care about or if you’re just measuring “organizational forms” that camouflage the thing you care about.

(Recent blog coverage of the OLPC evaluations here and here.)

05 07 2012

Stats lingo in econometrics and epidemiology

Last week I came across an article I wish I’d found a year or two ago: “Glossary for econometrics and epidemiology” (PDF from JSTOR, ungated version here) by Gunasekara, Carter, and Blakely.

Statistics is to some extent a common language for the social sciences, but there are also big variations in language that can cause problems when students and scholars try to read literature from outside their fields. I first learned epidemiology and biostatistics at a school of public health, and now this year I’m taking econometrics from an economist, as well as other classes that draw heavily on the economics literature.

Friends in my economics-centered program have asked me “what’s biostatistics?” Likewise, public health friends have asked “what’s econometrics?” (or just commented that it’s a silly name). In reality both fields use many of the same techniques with different language and emphases. The Gunasekara, Carter, and Blakely glossary linked above covers the following terms, amongst others:

  • confounding
  • endogeneity and endogenous variables
  • exogenous variables
  • simultaneity, social drift, social selection, and reverse causality
  • instrumental variables
  • intermediate or mediating variables
  • multicollinearity
  • omitted variable bias
  • unobserved heterogeneity

If you’ve only studied econometrics or biostatistics, chances are at least some of these terms will be new to you, even though most have roughly equivalent forms in the other field.

Beyond the differing vocabulary, the two fields also differ in how often they use particular techniques. For instance, instrumental variables seem (to me) to be under-used in public health / epidemiology applications. I took four terms of biostatistics at Johns Hopkins and don’t recall instrumental variables being mentioned even once! On the other hand, economists only recently discovered randomized trials, though they’re now widely used.
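For readers coming from the biostatistics side, here is a toy sketch of the mechanics of an instrumental variables (two-stage least squares) estimate. Everything is simulated, and the variable names are illustrative rather than taken from any real study:

```python
# Toy 2SLS sketch: an unobserved confounder biases OLS, but an instrument that
# shifts the exposure (and affects the outcome only through it) recovers the
# true effect. All data are simulated; the true causal effect is 2.0.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument: affects x, not y directly
x = 0.8 * z + 0.9 * u + rng.normal(size=n)    # endogenous exposure
y = 2.0 * x - 1.5 * u + rng.normal(size=n)    # outcome; true effect of x is 2.0

def ols_slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Naive OLS is biased because u is omitted and correlated with x:
print("OLS estimate: ", round(ols_slope(x, y), 3))       # noticeably below 2.0

# 2SLS: first stage predicts x from z; second stage regresses y on the fitted x.
# (Intercepts are zero by construction here, so the fitted values are slope * z.)
x_hat = ols_slope(z, x) * z
print("2SLS estimate:", round(ols_slope(x_hat, y), 3))   # close to 2.0
```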

But even within a given statistical technique there are important differences. You might think that all social scientists doing, say, multiple linear regression on observational data, or critiquing the results of randomized controlled trials, would use the same language. In my experience they not only use different vocabulary for the same things, they also emphasize different things. About a third to half of my epidemiology coursework involved establishing causal models (often with directed acyclic graphs) in order to understand which confounding variables to control for in a regression, whereas in econometrics we (very!) briefly discussed how to decide which covariates might cause omitted variable bias. Those discussions were basically about the same thing; they just differed in language and in emphasis.
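If it helps, the sketch below simulates that shared idea with made-up data: leaving a common cause out of the regression is “confounding” in epi-speak and “omitted variable bias” in econ-speak.

```python
# Minimal simulation: the same arithmetic goes by two names. The true effect of
# `exposure` on `outcome` is 1.0; omitting the common cause biases the estimate.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
c = rng.normal(size=n)                                    # common cause (confounder)
exposure = 0.7 * c + rng.normal(size=n)
outcome = 1.0 * exposure + 2.0 * c + rng.normal(size=n)   # true effect = 1.0

def first_coef(y, *covariates):
    """OLS with an intercept; returns the coefficient on the first covariate."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print("Unadjusted (confounder omitted):", round(first_coef(outcome, exposure), 3))     # ~1.9, biased
print("Adjusted (confounder included): ", round(first_coef(outcome, exposure, c), 3))  # ~1.0
```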

I think knowing how and why researchers from different fields talk about things differently helps you understand the sociology and motivations of each field. This is all related to what Marc Bellemare calls the ongoing “methodological convergence in the social sciences.” As research becomes more interdisciplinary — and as applications of research increasingly require interdisciplinary knowledge — understanding how researchers trained in different academic schools think and talk will become increasingly important.

03 05 2012

Group vs. individual uses of data

Andrew Gelman notes that, on the subject of value-added assessments of teachers, “a skeptical consensus seems to have arisen…” How did we get here?

Value-added assessments grew out of the push to measure educational success through standardized tests — simply comparing raw test scores isn’t fair, because some teachers teach in better schools or teach better-prepared students. The solution was to look at how much teachers’ students improve in comparison to other teachers’ students. Wikipedia has a fairly good summary here.

Back in February New York City released (over the opposition of teachers’ unions) the value-added scores of some 18,000 teachers. Here’s coverage from the Times on the release and reactions.

Gary Rubinstein, an education blogger, has done some analysis of the data contained in the reports and published five posts so far: part 1, part 2, part 3, part 4, and part 5. He writes:

For sure the ‘reformers’ have won a battle and have unfairly humiliated thousands of teachers who got inaccurate poor ratings. But I am optimistic that this will be looked at as one of the turning points in this fight. Up until now, independent researchers like me were unable to support all our claims about how crude a tool value-added metrics still are, though they have been around for nearly 20 years. But with the release of the data, I have been able to test many of my suspicions about value-added.

I suggest reading his analysis in full, or at least the first two parts.

For me one early take-away from this — building off comments from Gelman and others — is that an assessment might be a useful tool for improving education quality overall, while simultaneously being a very poor metric for individual performance. When you’re looking at 18,000 teachers you might be able to learn what factors lead to test score improvement on average, and use that information to improve policies for teacher education, recruitment, training, and retention. But that doesn’t mean one can necessarily use the same data to make high-stakes decisions about individual teachers.
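A stylized simulation may help show how both things can be true at once. This is my own toy setup, not NYC’s actual value-added model; the effect sizes and noise levels are invented purely for illustration:

```python
# Stylized sketch: noisy value-added estimates can separate groups of teachers
# on average while being unreliable for ranking individual teachers.
import numpy as np

rng = np.random.default_rng(7)
n_teachers = 18_000
true_effect = rng.normal(0.0, 0.10, n_teachers)   # true impact, in test-score SD units
noise = rng.normal(0.0, 0.20, n_teachers)         # sampling error from small classes
estimate = true_effect + noise                    # what a value-added report would show

# Group level: the average estimate tracks the average true effect closely.
top_half = true_effect > np.median(true_effect)
print("Mean estimate, truly-better half:", estimate[top_half].mean().round(3))
print("Mean estimate, truly-worse half: ", estimate[~top_half].mean().round(3))

# Individual level: rankings are noisy. How many teachers whose *true* effect is
# in the top quintile also land in the top quintile of the *estimates*?
true_top = true_effect >= np.quantile(true_effect, 0.8)
est_top = estimate >= np.quantile(estimate, 0.8)
print("Share of truly top-quintile teachers ranked top-quintile:",
      round((true_top & est_top).sum() / true_top.sum(), 2))   # well under half
```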

On food deserts

Gina Kolata, writing for the New York Times, has sparked some debate with this article: “Studies Question the Pairing of Food Deserts and Obesity”. In general I often wish that science reporting focused more on how the new studies fit in with the old, rather than just the (exciting) new ones. On first reading I noticed that one study is described as having explored the association of “the type of food within a mile and a half of their homes” with what people eat.

This raised a little question mark in my mind, as I know that prior studies have often looked at distances much shorter than 1.5 miles, but it was mostly a vague hesitation. If you didn’t know that before reading the article, though, you’d miss a major difference between the old and new results (and one that could have been easily explained). Also, describing something as “an article of faith” when it’s arguably more like “the broad conclusion drawn from most prior research”… that certainly established an editorial tone from the beginning.

Intrigued, I sent the piece to a friend (and former public health classmate) who has worked on food deserts, to get a more informed reaction. I’m sharing her thoughts here (with permission) because this is an area of research that I don’t follow as closely, and her reactions helped me situate this story in the broader literature:

1. This quote from the article is so good!

“It is always easy to advocate for more grocery stores,” said Kelly D. Brownell, director of Yale University’s Rudd Center for Food Policy and Obesity, who was not involved in the studies. “But if you are looking for what you hope will change obesity, healthy food access is probably just wishful thinking.”

The “unhealthy food environment” has a much bigger impact on diet than the “healthy food environment”, but it’s politically more viable to work from an advocacy standpoint than a regulatory standpoint. (On that point, you still have to worry about what food is available – you can’t just take out small businesses in impoverished neighborhoods and not replace them with anything.)

2. The article is too eager to dismiss the health-food access relationship. There’s good research out there, but there’s constant difficulty with tightening methods/definitions and deciding what to control for. The thing that I think is really powerful about the “food desert” discourse is that it opens doors to talk about race, poverty, community, culture, and more. At the end of the day, grocery stores are good for low-income areas because they bring in money and raise property values. If the literature isn’t perfect on health effects, I’m still willing to advocate for them.

3. I want to know more about the geography of the study that found that low-income areas had more grocery stores than high-income areas. Were they a mix of urban, peri-urban, and rural areas? Because that’s a whole other bear. (Non-shocker shocker: rural areas have food deserts… rural poverty is still a problem!)

4. The article does a good job of pointing to how difficult it is to study this. Hopkins (and the Baltimore Food Czar) are doing some work with healthy food access scores for neighborhoods. This would take into account how many healthy food options there are (supermarkets, farmers’ markets, arabers, tiendas) and how many unhealthy food options there are (fast food, carry out, corner stores).

5. The studies they cite are with kids, but the relationship between food insecurity (which is different, but related to food access) and obesity is only well-established among women. (This, itself, is not talked about enough.) The thinking is that kids are often “shielded” from the effects of food insecurity by their mothers, who eat a yo-yo diet depending on the amount of food in the house.

My friend also suggested several articles for additional reading.

More on microfoundations

Last month I wrote a long-ish post describing the history of the “microfounded” approaches to macroeconomics. For a while I was updating that post with links to recent blog posts as the debate continued, but I stopped after the list grew too long.

Now Simon Wren-Lewis has written two more posts that I think are worth highlighting, because they come from someone who is generally supportive of the microfoundations approach (I’ve found his defense of it quite helpful) but who still has some specific critiques. The end of his latest post puts those critiques in context:

One way of reading these two posts is a way of exploring Krugman’s Mistaking Beauty for Truth essay. I know the reactions of colleagues, and bloggers, to this piece have been quite extreme: some endorsing it totally, while others taking strong exception to its perceived targets. My own reaction is very similar to Karl Smith here. I regard what has happened as a result of the scramble for austerity in 2010 to be in part a failure of academic macroeconomics. It would be easy to suggest that this was only the result of unfortunate technical errors, or political interference, and that otherwise the way we do macro is basically fine. I think Krugman was right to suggest otherwise. Given the conservative tendency in any group, an essay that said maybe there might just be an underlying problem here would have been ignored. The discipline needed a wake-up call from someone with authority who knew what they were talking about. Identifying exactly what those problems are, and what to do about them, seems to me an important endeavour that has only just begun.

Here are his two posts:

  1. The street light problem: “I do think microfoundations methodology is progressive. The concern is that, as a project, it may tend to progress in directions of least resistance rather than in the areas that really matter – until perhaps a crisis occurs.”
  2. Ideological bias: “In RBC [Real Business Cycle] models, all changes in unemployment are voluntary. If unemployment is rising, it is because more workers are choosing leisure rather than work. As a result, high unemployment in a recession is not a problem at all…. If anyone is reading this who is not familiar with macroeconomics, you might guess that this rather counterintuitive theory is some very marginal and long forgotten macroeconomic idea. You would be very wrong.”

22 04 2012

Up to speed: microfoundations

[Admin note: this is the first of a new series of “Up to speed” posts which will draw together information on a subject that’s either new to me or has been getting a lot of play lately in the press or in some corner of the blogosphere. The idea here is that folks who are experts on this particular subject might not find anything new; I’m synthesizing things for those who want to get up to speed.]

Microfoundations (Wikipedia) are quite important in modern macroeconomics. Modern macroeconomics really started with Keynes. His landmark General Theory of Employment, Interest and Money (published in 1936) set the stage for pretty much everything that has come since. Basically everything that came before Keynes couldn’t explain the Great Depression — or worse yet how the world might get out of it — and Keynes’ theories (rightly or wrongly) became popular because they addressed that central failing.

One major criticism was that modern macroeconomic models like Keynes’ were top-down, looking only at aggregate measures like output and investment. That may not seem too bad, but when you tried to break those aggregates down into the underlying individual behaviors that should add up to them, wacky stuff happened. By that point microeconomic models were much better fleshed out, and the micro models all started with individual rational actors maximizing their utility, assumptions that macroeconomists just couldn’t get by breaking down their aggregate models.
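For a flavor of what that micro building block looks like, here is the generic textbook version (my stylized notation, not any particular paper’s): a representative household chooses consumption to maximize expected discounted utility, and the resulting first-order condition, the consumption Euler equation, is the kind of equation that micro-founded macro models are later assembled from.

```latex
% A generic micro-founded building block (stylized notation, not any specific paper):
% a representative household chooses consumption c_t to solve
\max_{\{c_t\}} \; \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t u(c_t)
\quad \text{subject to} \quad a_{t+1} = (1 + r_t)\, a_t + y_t - c_t .
% The first-order condition is the consumption Euler equation,
u'(c_t) = \beta \, \mathbb{E}_t \left[ (1 + r_{t+1})\, u'(c_{t+1}) \right] ,
% and a DSGE model is, roughly, a stack of optimality conditions like this one,
% plus market-clearing conditions and assumptions about the shock processes.
```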

The most influential criticism came from Robert Lucas, in what became known as the Lucas Critique (here’s a PDF of his 1976 paper). Lucas basically argued that aggregate models weren’t that helpful because they were only looking at surface-level parameters without understanding the underlying mechanisms. If something — like the policy environment — changes drastically then the old relationships that were observed in the aggregate data may no longer apply. An example from Wikipedia:

One important application of the critique is its implication that the historical negative correlation between inflation and unemployment, known as the Phillips Curve, could break down if the monetary authorities attempted to exploit it. Permanently raising inflation in hopes that this would permanently lower unemployment would eventually cause firms’ inflation forecasts to rise, altering their employment decisions.

Economists responded by developing “micro-founded” macroeconomic models — ones built up from the sum of microeconomic models. The most commonly used of these is called, awkwardly, dynamic stochastic general equilibrium (DSGE). Much of my study time this semester involves learning the math behind this. What’s the next step forward from DSGE? Are these models better than the old Keynesian models? How do we even define “better”? These are all hot topics in macro at the moment. There’s been a recent spat in the economics blogosphere that illustrates this — what follows are a few highlights.

Back in 2009 Paul Krugman (NYT columnist, Nobel winner, and Woodrow Wilson School professor) wrote an article titled “How Did Economists Get It So Wrong?” that included this paragraph:

As I see it, the economics profession went astray because economists, as a group, mistook beauty, clad in impressive-looking mathematics, for truth. Until the Great Depression, most economists clung to a vision of capitalism as a perfect or nearly perfect system. That vision wasn’t sustainable in the face of mass unemployment, but as memories of the Depression faded, economists fell back in love with the old, idealized vision of an economy in which rational individuals interact in perfect markets, this time gussied up with fancy equations. The renewed romance with the idealized market was, to be sure, partly a response to shifting political winds, partly a response to financial incentives. But while sabbaticals at the Hoover Institution and job opportunities on Wall Street are nothing to sneeze at, the central cause of the profession’s failure was the desire for an all-encompassing, intellectually elegant approach that also gave economists a chance to show off their mathematical prowess.

Last month Stephen Williamson wrote this:

[Because of the financial crisis] There was now a convenient excuse to wage war, but in this case a war on mainstream macroeconomics. But how can this make any sense? The George W era produced a political epiphany for Krugman, but how did that ever translate into a war on macroeconomists? You’re right, it does not make any sense. The tools of modern macroeconomics are no more the tools of right-wingers than of left-wingers. These are not Republican tools, Libertarian tools, Democratic tools, or whatever.

A bit of a sidetrack, but this prompted Noah Smith to write a long post (that is generally more technical than I want to get in to here) defending the idea that modern macro models (like DSGE) are in fact ideologically biased, even if that’s not their intent. Near the end:

So what this illustrates is that it’s really hard to make a DSGE model with even a few sort-of semi-realistic features. As a result, it’s really hard to make a DSGE model in which government policy plays a useful role in stabilizing the business cycle. By contrast, it’s pretty easy to make a DSGE model in which government plays no useful role, and can only mess things up. So what ends up happening? You guessed it: a macro literature where most papers have only a very limited role for government.

In other words, a macro literature whose policy advice is heavily tilted toward the political preferences of conservatives.

Back on the main track, Simon Wren-Lewis, writing at Mainly Macro, comes to Krugman’s defense, sort of, by saying that it’s conceivable that an aggregate model might actually be more defensible than a micro-founded one in certain circumstances.

This view [Krugman’s view that aggregate models may still be useful] appears controversial. If the accepted way of doing macroeconomics in academic journals is to almost always use a ‘fancier optimisation’ model, how can something more ad hoc be more useful? Coupled with remarks like ‘the economics profession went astray because economists, as a group, mistook beauty, clad in impressive-looking mathematics, for truth’ (from the 2009 piece) this has got a lot of others, like Stephen Williamson, upset. [skipping several paragraphs]

But suppose there is in fact more than one valid microfoundation for a particular aggregate model. In other words, there is not just one, but perhaps a variety of particular worlds which would lead to this set of aggregate macro relationships….Furthermore, suppose that more than one of these particular worlds was a reasonable representation of reality… It would seem to me that in this case the aggregate model derived from these different worlds has some utility beyond just one of these microfounded models. It is robust to alternative microfoundations.

Krugman then followed up with an argument for why it’s OK to use both aggregate and microfounded models.

And here’s Noah Smith writing again, “Why bother with microfoundations?”

Using wrong descriptions of how people behave may or may not yield aggregate relationships that really do describe the economy. But the presence of the incorrect microfoundations will not give the aggregate results a leg up over models that simply started with the aggregates….

When I look at the macro models that have been constructed since Lucas first published his critique in the 1970s, I see a whole bunch of microfoundations that would be rejected by any sort of empirical or experimental evidence (on the RBC side as well as the Neo-Keynesian side). In other words, I see a bunch of crappy models of individual human behavior being tossed into macro models. This has basically convinced me that the “microfounded” DSGE models we now use are only occasionally superior to aggregate-only models. Macroeconomists seem to have basically nodded in the direction of the Lucas critique and in the direction of microeconomics as a whole, and then done one of two things: either A) gone right on using aggregate models, while writing down some “microfoundations” to please journal editors, or B) drawn policy recommendations directly from incorrect models of individual behavior.

The most recent is from Krugman, wherein he says (basically) that models that make both small and big predictions should be judged more on the big than the small.

This is just a sampling, and likely a biased one as there are many who dismiss the criticism of microfoundations out of hand and thus aren’t writing detailed responses. Either way, the microfoundations models are dominant in the macro literature now, and the macro-for-policy-folks class I’m taking at the moment focuses on micro-founded models (because they’re “how modern macro is done”).

So what to conclude? My general impression is that microeconomics is more heavily ‘evolved’ than macroeconomics. (You could say that in macro the generation times are much longer, and the DNA replication bits are dodgier, so evolving from something clearly wrong towards something clearly better is taking longer.)

Around the same time that micro was being problematized by Kahneman and others who questioned the rational, utility-maximizing nature of humans — thus launching the behavioral economics revolution, which tries to complicate micro theory with a bit of reality — the macroeconomists were just getting around to incorporating the original microeconomic emphasis on rationality. Just how much micro will change in the next decades in response to the behavioral revolution is unclear, so expecting troglodytesque macro to have already figured this out is unrealistic.

A number of things are unclear to me: just how deep the dissatisfaction with the current models is, how broadly these critiques (vs. others from different directions) are endorsed, and what actually drives change in fields of inquiry. Looking back in another 30-40 years we might see this moment in time as a pivotal shift in the history of the development of macroeconomics — or it may be a little hiccup that no one remembers at all. It’s too soon to tell.

Updates: since writing this I’ve noticed several more additions to the discussion.

Coincidence or consequence?

Imagine there’s a pandemic flu virus on the loose, and a vaccine has just been introduced. Then come reports of dozens of cases of Guillain-Barré syndrome (GBS), a rare type of paralysis. Did the new vaccine cause it? How would you even begin to know? One first step (though certainly not the only one) is to think about the background rate of disease:

Inappropriate assessment of vaccine safety data could severely undermine the effectiveness of mass campaigns against pandemic H1N1 2009 influenza. Guillain-Barré syndrome is a good example to consider. Since the 1976–77 swine influenza vaccination campaign was associated with an increased number of cases of Guillain-Barré syndrome, assessment of such cases after vaccination will be a high priority. Therefore, it is important to know the background rates of this syndrome and how this rate might vary with regard to population demographics. The background rate of the syndrome in the USA is about 1–2 cases per 1 million person-months of observation. During a pandemic H1N1 vaccine campaign in the USA, 100 million individuals could be vaccinated. For a 6-week follow-up period for each dose, this corresponds to 150 million person-months of observation time during which a predicted 200 or more new cases of Guillain-Barré syndrome would occur as background coincident cases. The reporting of even a fraction of such a large number of cases as adverse events after immunisation, with attendant media coverage, would probably give rise to intense public concern, even though the occurrence of such cases was completely predictable and would have happened in the absence of a mass campaign.

That’s from a paper by Steven Black et al. in 2009, “Importance of background rates of disease in assessment of vaccine safety during mass immunisation with pandemic H1N1 influenza vaccines”. They also calculate background rates for spontaneous abortion, preterm delivery, and spontaneous death among other things.
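The arithmetic in that excerpt is worth making explicit. Here is the same back-of-the-envelope calculation, using the midpoint of the quoted 1–2 cases per million person-months:

```python
# The paper's back-of-the-envelope logic, reproduced as a sketch so the
# "expected coincident cases" calculation is explicit. The inputs are the
# numbers quoted in the excerpt above.
vaccinated = 100_000_000      # people vaccinated in the hypothetical campaign
followup_months = 1.5         # 6-week follow-up window per dose
background_rate = 1.5e-6      # GBS cases per person-month (midpoint of 1-2 per million)

person_months = vaccinated * followup_months              # 150 million person-months
expected_background_cases = person_months * background_rate
print(f"Expected background GBS cases: about {expected_background_cases:.0f}")
# Roughly 225 cases would occur in that window even if the vaccine caused none
# of them, which is why a raw count of post-vaccination reports proves nothing
# by itself.
```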

17 01 2012

Platform evaluation

Cesar Victora, Bob Black, Ties Boerma, and Jennifer Bryce (three of the four are with the Hopkins Department of International Health, and I took a course with Prof Bryce) wrote this article in The Lancet in January 2011: “Measuring impact in the Millennium Development Goal era and beyond: a new approach to large-scale effectiveness evaluations.” The abstract:

Evaluation of large-scale programmes and initiatives aimed at improvement of health in countries of low and middle income needs a new approach. Traditional designs, which compare areas with and without a given programme, are no longer relevant at a time when many programmes are being scaled up in virtually every district in the world. We propose an evolution in evaluation design, a national platform approach that: uses the district as the unit of design and analysis; is based on continuous monitoring of different levels of indicators; gathers additional data before, during, and after the period to be assessed by multiple methods; uses several analytical techniques to deal with various data gaps and biases; and includes interim and summative evaluation analyses. This new approach will promote country ownership, transparency, and donor coordination while providing a rigorous comparison of the cost-effectiveness of different scale-up approaches.

Discarding efficacy?

Andrew Grove, former CEO of Intel, writes an editorial in Science:

We might conceptualize an “e-trial” system along similar lines. Drug safety would continue to be ensured by the U.S. Food and Drug Administration. While safety-focused Phase I trials would continue under their jurisdiction, establishing efficacy would no longer be under their purview. Once safety is proven, patients could access the medicine in question through qualified physicians. Patients’ responses to a drug would be stored in a database, along with their medical histories. Patient identity would be protected by biometric identifiers, and the database would be open to qualified medical researchers as a “commons.” The response of any patient or group of patients to a drug or treatment would be tracked and compared to those of others in the database who were treated in a different manner or not at all.

Alex Tabarrok of Marginal Revolution (who is a big advocate for FDA reform, running this site) really likes the idea. I hate it. The current system has real problems, but Grove’s proposal would be much, much worse. The biggest problem is that we would have no good data about whether a drug is truly efficacious, because every result in the database would be confounded by selection bias: a large sample size and lots of subgroups tell you nothing about why someone got the treatment in the first place.
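To see why sample size doesn’t rescue you, here is a toy simulation (mine, not Grove’s proposal) of confounding by indication. Sicker patients are more likely to get the drug, so a naive comparison in a huge observational database can point in exactly the wrong direction, while a randomized comparison on the same population does not:

```python
# Toy simulation of confounding by indication. "Severity" is the unmeasured
# reason patients got the drug; the drug's true effect on recovery is +0.20.
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
severity = rng.normal(size=n)                       # sicker patients score higher

# Observational "e-trial": sicker patients are more likely to get the drug.
treated_obs = rng.random(n) < 1 / (1 + np.exp(-2 * severity))
recovery_obs = 0.5 + 0.2 * treated_obs - 0.4 * severity + rng.normal(0, 0.5, n)
naive_effect = recovery_obs[treated_obs].mean() - recovery_obs[~treated_obs].mean()

# Randomized trial on the same population: treatment assigned by coin flip.
treated_rct = rng.random(n) < 0.5
recovery_rct = 0.5 + 0.2 * treated_rct - 0.4 * severity + rng.normal(0, 0.5, n)
rct_effect = recovery_rct[treated_rct].mean() - recovery_rct[~treated_rct].mean()

print("True effect of the drug:         +0.20")
print("Observational database estimate:", round(naive_effect, 2))   # negative: drug looks harmful
print("Randomized trial estimate:      ", round(rct_effect, 2))     # close to +0.20
```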

Would physicians pay attention to peer-reviewed articles and reviews identifying the best treatments for specific groups? Or would they just run their own analyses? I think there would be a lot of the latter, which is scary since many clinicians can’t even define selection bias or properly interpret statistical tests. The current system has limitations, but Grove’s idea would move us even further from any sort of evidence-based medicine.

Other commenters at Marginal Revolution rightly note that it’s difficult to separate safety from efficacy, because recommending a drug is always based on a balance of risks and benefits. Debilitating nausea or strong likelihood of heart attack would never be OK in a drug for mild headaches, but if it cures cancer the standards are (and should be) different.

Derek Lowe, a fellow Arkansan who writes the excellent chemistry blog In The Pipeline, has more extensive (and informed) thoughts here.

Update (1/5/2012): More criticism, summarized by Derek Lowe.

What does social science know?

Marc Bellemare wrote a post “For Fellow Teachers: Revised Primers on Linear Regression and Causality.” Good stuff for students too — not just teachers. The primers are PDFs on linear regression (6 pages) and causality (3 pages), and they’re either 1) a concise summary if you’re studying this stuff already, or 2) something you should really read if you don’t have any background in quantitative methods.

I also really enjoyed an essay by Jim Manzi that Marc links to, titled “What Social Science Does — and Doesn’t — Know.” Manzi reviews the history of experimentation in the natural sciences, and then in the social sciences. He discusses why it’s more difficult to extrapolate from randomized trials in the social sciences due to greater ‘causal density,’ amongst other reasons. He then summarizes a lot of research in criminology (a field I didn’t even know used many field trials) and ends with some conclusions that seem sharp (emphasis added):

…After reviewing experiments not just in criminology but also in welfare-program design, education, and other fields, I propose that three lessons emerge consistently from them.

First, few programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.

Second, within this universe of programs that are far more likely to fail than succeed, programs that try to change people are even more likely to fail than those that try to change incentives. A litany of program ideas designed to push welfare recipients into the workforce failed when tested in those randomized experiments of the welfare-reform era; only adding mandatory work requirements succeeded in moving people from welfare to work in a humane fashion. And mandatory work-requirement programs that emphasize just getting a job are far more effective than those that emphasize skills-building. Similarly, the list of failed attempts to change people to make them less likely to commit crimes is almost endless—prisoner counseling, transitional aid to prisoners, intensive probation, juvenile boot camps—but the only program concept that tentatively demonstrated reductions in crime rates in replicated RFTs was nuisance abatement, which changes the environment in which criminals operate….

I’d note here that many researchers and policymakers who are interested in health-related behavior change have been moving away from simply providing information or attempting to persuade people to change their behavior, and moving towards changing the unhealthy environments in which we live. NYC Health Commissioner Thomas Farley spoke explicitly about this shift in emphasis when he addressed us summer interns back in June. That approach is a direct response to frustration with the small returns from many behavioral intervention approaches, and an acknowledgment that we humans are stubborn creatures whose behavior is shaped (more than we’d like to admit) by our environments.

Manzi concludes:

And third, there is no magic. Those rare programs that do work usually lead to improvements that are quite modest, compared with the size of the problems they are meant to address or the dreams of advocates.

Right, no pie in the sky. If programs or policies had huge effects they’d be much easier to measure, for one. Read it all.

18 08 2011