## Hunger Games survival analysis

**Update:** subsequently republished at *Jezebel* as “An Incredibly Detailed Super Statistical Hunger Games Survival Analysis”.

April 11, 2012

Whenever a new fad takes over pop culture, social scientists take note. They host conferences, teach courses, and write popular pieces trying to tie the newest thing to their favorite area of research. Consider this post a humble attempt to take that trend to its epic, nerdtastic next level.

As a student of epidemiology and economics I feel duty-bound to apply my cursory knowledge of statistics to the novel natural cohort presented in the Hunger Games novel, as documented by author Suzanne Collins. I present a **Hunger Games survival analysis: in a Cox proportional hazards model, which covariates are associated with the odds (or hazard ratios) being ever in your favor?** A taste of what’s to come:

The agenda:

- an explanation of the Hunger Games and facts relevant to this analysis
- a snappy literature review of peripherally-related things other scholars have written on the subject
- construction and presentation of data set
- do the Gamemakers rig the draw? an analysis of expected lottery outcomes in scenarios differing by tessera and demographic trends
- the main event: a Cox proportional hazards model to explore predictors of survival time in the 74th annual Hunger Games

(**Spoiler alert:** there will be book spoilers throughout this post. I haven’t read the second and third books yet, so please don’t spoil them for me or others in the comments. I look forward to analyzing their data as well, and the integrity of this very important research will be enhanced if I can pre-specify the hypotheses I’ll test before I’ve observed the outcomes. Data-mining is not the fanboy way.)

**1. What are the Hunger Games?** For the woefully unaware, Suzanne Collins’ Hunger Games is aptly summarized by that scholarly redoubt, Wikipedia:

[The Hunger Games] is written in the voice of sixteen-year-old Katniss Everdeen, who lives in a post-apocalyptic world in the country of Panem where the countries of North America once existed. The Capitol, a highly advanced metropolis, holds hegemony over the rest of the nation. The Hunger Games are an annual event in which one boy and one girl aged 12 to 18 from each of the 12 districts surrounding the Capitol are selected by lottery [as “tributes”] to compete in a televised battle in which only one person can survive.

Other facts relevant to this analysis: All children in Panem register for the Hunger Games lottery once a year, and all registrations stay in until you age out (after age 18). So a 12 year old is in a minimum of one time, a 13 year old is in a minimum of twice, etc. The Capitol keeps most districts in near-starvation conditions, so the remedy for many families is to apply for tesserae, a yearly allocation of food that necessitates adding your name to the lottery an additional time. In terms of welfare policy, tesserae (the plural; the singular is tessera) are a kind of conditional in-kind transfer of goods.

If a child requests one tessera annually they will have two lottery entries at age 12, four at age 13, and so on. A child requesting five tesserae per year will be entered six times at age 12 and on up to a whopping 42 entries at age 18. This is the situation one of the main characters, Gale, faces at the beginning of the book. Matt Yglesias worried this scenario would lead to runaway tessera inflation, but that shouldn’t be a problem as eligibility is apparently capped by number of family members: Gale takes out five tesserae annually to support his five family members.
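For anyone who wants to check the entry arithmetic, here is a minimal Python sketch. It assumes the rules exactly as described in this post (one base entry per year from age 12, one extra entry per tessera requested that year, and nothing ever leaving the bowl until you age out). The function name `lottery_entries` is my own invention for illustration.

```python
def lottery_entries(age, tesserae_per_year=0):
    """Cumulative lottery entries for a child of a given age.

    Assumes one base entry per year starting at age 12, plus one extra
    entry per tessera requested that year, with all past entries staying
    in the bowl until the child ages out after 18.
    """
    if not 12 <= age <= 18:
        raise ValueError("only ages 12 through 18 are in the lottery")
    years_in_lottery = age - 11  # 1 at age 12, 7 at age 18
    return years_in_lottery * (1 + tesserae_per_year)


# Columns: age, no tesserae, one tessera a year, five tesserae a year.
for age in range(12, 19):
    print(age, lottery_entries(age), lottery_entries(age, 1), lottery_entries(age, 5))
```

The last row is Gale's situation: seven years of six entries each.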

Still, the odds that any given child will be selected are relatively low — and even lower in districts with large populations — but it is the fear that really controls the population. Psychology research helps explain why the Games are so terrible to the citizens of Panem. A little Prospect Theory here: people are loss averse, and they overweight the likelihood of rare events. The Capitol regime also knows that once children are selected from each district, Prospect Theory predicts that citizens will have a new “reference point” (they assume their tributes will be killed) and thus see victory in the Games as a gain to be celebrated, rather than merely the maintenance of the status quo (the tribute being alive) that existed before the annual lottery.

Also notable are the “Careers,” or tributes from wealthy districts who have been trained from early childhood to compete in the Games. Regardless of who is picked by lottery from their district, one of the Careers will volunteer. (There is obvious room for fruitful anthropological inquiry into this subculture.)

This should relieve the fear of the Games in their districts, though it is unclear whether the districts are able to maintain this system because they are wealthy, or whether the districts are wealthy because their citizens work free of the fear of the Games and any impact on demographic trends the fear of losing children might have. This is also an important area for study, but the endogeneity problems and problematic historical data are a challenge worthy of Acemoglu and Robinson.

**2. Obligatory literature review**

In addition to the aforementioned tesserae inflation post by Matt Yglesias, Matt also wrote “The Economics of the Hunger Games” where he asks and answers his own question in the subtitle: “could any real country have an economy like Panem’s? Actually, yes.” In that piece he uses the Hunger Games to illustrate Acemoglu and Robinson’s distinction between inclusive vs. extractive institutions.

Next up is Erik Kain’s Five Economic Lessons of the Hunger Games, in Forbes. His lessons are 1) Markets Are More Efficient Than Command Economies, 2) Globalism Only Works If You Ditch The Extraction Model, 3) Economic Inequality Is Bad For Business, 4) War Drains Economic Resources, and 5) Technology Can Be Used For Good Or Evil. I particularly liked this note:

Furthermore, arbitrarily picking Districts to supply only one type of good to the Capitol means that human capital is badly mismanaged. Nothing about being born in District 12 makes you a better miner, any more than being born in District 2 makes you a better soldier.

Ironically, a flourishing market economy likely would have meant a far more wealthy populace in the Capitol as well. Hundreds of years in the future, we should expect to be far wealthier and more technologically advanced than even the wealthiest citizens of Panem.

This reminds me that our current world economy is in some ways very much like Panem, with massive restrictions to cross-border flow of human capital (i.e., people). Michael Clemens of the Center for Global Development has done some valuable work exploring the economic benefits (to everyone, not just the immigrants) of allowing greater migration.

Finally, there’s “Probability and Game Theory in the Hunger Games,” by Michael A. Lewis, writing at *Wired*. Lewis explores the lottery process — and I think here he’s quite wrong, but more on that below in the nitty-gritty of my analysis. Then he moves on to game theory, using the decision within the Hunger Games of whether to sleep or not sleep as an illustration of Prisoner’s Dilemma. I think his analysis of the dilemma the children face and how it prompts them to form alliances is correct. But as I read Lewis’ article I thought of another, related dilemma:

The people of any given District could game the system by all agreeing to put in for the maximum quantity of tesserae without hurting anyone. A simplified example: if the entire child population of District 12 consisted of 10 fourteen-year-old kids and they all have the minimum three entries, there are 30 entries total and each child has a 3/30 (or 1 in 10) likelihood of being chosen.

But if each of the kids agrees to request two tesserae annually, they would each have nine entries at age fourteen, making their likelihood of selection 9 out of 90… which is still 1 in 10. In other words, so long as everyone agrees to increase their tesserae requests simultaneously, everyone benefits in terms of extra food and no one is more likely to be selected.
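The ten-kid example is easy to verify with exact fractions. This is just a sketch of the toy district above, with a single draw from a single bowl. The third scenario (one kid quietly taking no tesserae while the other nine take two) is my own hypothetical addition, there to show why secrecy would undermine the pact.

```python
from fractions import Fraction

def selection_probability(own_entries, all_entries):
    """Chance that one child is picked in a single draw from the bowl."""
    return Fraction(own_entries, sum(all_entries))

# Ten fourteen-year-olds, three years in the lottery each.
no_tesserae = [3 * (1 + 0)] * 10      # 3 entries apiece, 30 in the bowl
two_tesserae = [3 * (1 + 2)] * 10     # 9 entries apiece, 90 in the bowl

print(selection_probability(no_tesserae[0], no_tesserae))    # 1/10
print(selection_probability(two_tesserae[0], two_tesserae))  # 1/10

# Hypothetical defector: takes no tesserae while the other nine take two.
defect = [3] + [9] * 9
print(selection_probability(defect[0], defect))              # 1/28
```

The defector's risk falls from 1 in 10 to 1 in 28, which is exactly the temptation that makes the coordinated scheme fragile.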

So why doesn’t this happen? Two reasons: first, the Capitol may require the tesserae requests to be private and otherwise block coordination efforts amongst the population in a given district. Without perfect information about what others are actually getting, Gale could tell everyone he put in for five tesserae (to decrease their marginal risk from requesting more, which would make him less likely to be selected!) while actually only putting in for three.

Second, income inequality exacerbates the problem, as wealthy families would not opt into such a coordinated system because they do not need the extra food. If everyone were equally poor and able to coordinate, such a scheme might work.

**3. Construction of data set**

The iteration of the Hunger Games depicted in the novel is the 74th annual, meaning that there have been 1,776 participants to date and 75 winners (two in year 74). It sure would be nice to have all that data! Alas, Collins only documents the 74th Games, but I think there is still some interesting analysis to be done.

I began by perusing the novel for information about each character. Later I discovered that much of the information I needed was already available through the Hunger Games Wiki, helpfully compiled by people even nerdier than me (and with less binding time constraints). Here’s how the data looks in Stata (version 11.2), which I used for the survival analysis:

To allow others to check and improve upon my analysis, I’ve made the data available for download in a Google Doc spreadsheet, along with my Stata code in a Google Document (which contains instructions for downloading and importing the data into Stata).

On methodology: I assigned each tribute a unique ID (1 through 24) and names, where available. The other variables in the data set are:

- District (1 through 12)
- Sex (0 = male, 1=female)
- Age (12 through 18)
- Volunteer (0 = no, 1 = yes)
- Career (0 = no, 1 = yes)
- Gamemakers’ Rating (5-11 or 3-11, depending on methodology described below)
- Rank (order in which they exited the games, from 1st for Katniss and Peeta to 14th for those who died in the initial bloodbath)
- Winner (0 = no, 1 = yes for Peeta and Katniss)
- Alliance (0 = not formed, 1 = formed an alliance)

In cases where data from the book and movie conflicted (as was the case with rank for a few tributes) I went with the book. In cases where data is available in the movie that isn’t in the book (as with age), I did use the movie data. District, sex, age, rank, and winner were unambiguous.

For the alliance “dummy” (not necessarily an indicator of intelligence) variable, those who formed an alliance are coded 1 (this includes both the Careers alliance and the short-lived Rue-Katniss alliance) and those who survived the bloodbath but did not form an alliance are coded 0. I thought it would be unfair to assign an alliance value to those who died in the bloodbath, as they may well have formed one had they survived. Also, Katniss is notable for being the only volunteer in the data set who was not also a Career.

The Gamemakers’ ratings should be good predictors of survival time, as they represent expert judgments based on knowledge of the previous games and inside knowledge of fighting ability. However, precise ratings were only available for the nine named characters: Marvel, Glimmer, Cato, Clove, Foxface, Thresh, Rue, Peeta, and Katniss. The variable “rating” gives the other tributes missing values.

However, the book also notes that the Careers scored mostly between 8 and 10, whereas the other non-Career tributes scored a 5 “on average”. So I added another variable “rating_ave” that assigns a 9 to the Careers with missing values and a 5 to non-Careers with missing values. I also created “rating_rand,” which creates plausible ratings by using the generator at Random.org to give random integers between 8 and 10 for the Careers and between 3 and 7 for the non-Careers. This allows the rating data to be analyzed using any of these three methods.
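For readers who don't use Stata, the construction of the two filled-in rating variables can be sketched in Python. This is an illustration only: the function name and the two unnamed example rows are hypothetical, and a seeded pseudo-random generator stands in for Random.org so the run is repeatable.

```python
import random

def impute_ratings(rating, career, rng):
    """Return (rating_ave, rating_rand) for one tribute.

    rating is the score given in the book, or None when the book gives none.
    """
    if rating is not None:
        return rating, rating
    if career:
        return 9, rng.randint(8, 10)   # Careers: "mostly between 8 and 10"
    return 5, rng.randint(3, 7)        # non-Careers: about 5 "on average"

rng = random.Random(74)  # seeded stand-in for Random.org (an assumption)
tributes = [
    ("Katniss", 11, False),               # rating known from the book
    ("unnamed Career", None, True),       # hypothetical example row
    ("unnamed non-Career", None, False),  # hypothetical example row
]
for name, rating, career in tributes:
    print(name, impute_ratings(rating, career, rng))
```

Known ratings pass through untouched, so the three variables only differ for tributes whose scores the book never reports.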

**4. Do the Gamemakers rig the draw?**

Building off Michael A. Lewis’ thoughts on the odds of being selected in the lottery, I wanted to explore the age distribution of the tributes. How can we tell whether the Gamemakers are actually rigging the draw, to bring in younger or more compelling tributes to enhance the drama of the Games? Luckily we have enough information to explore this question a bit.

When you use Stata to tabulate the tributes by age, you find a few funny things. For instance, seven (29.2%) of the 24 tributes are age 15, and six (25.0%) are age 16, which seem high to me. But in a random process like a lottery (and with small numbers of children actually chosen) it can be difficult for human intuition to distinguish between random variation and something else. For that, we need statistical analysis. Graphically (using Excel), the distribution of the proportion of tributes by age looks like this:

How does this compare to the expected distribution of tributes based on the increasing risk of being selected in any given year? Astute readers will have already noticed that I’m including all 24 tributes here, whereas eight of them (Katniss plus 7 Careers) volunteered and were not randomly selected. So breaking the tributes down by volunteer vs. non-volunteer status yields this graph:

The red bars represent counts of non-volunteers, which should be due to random selection based on their number of entries in the lottery. Here’s where I differ with Lewis’ analysis at *Wired*. Lewis presents this graph of increasing probability of selection as children age:

As Lewis notes, this is from an example where all the children are of the same age, and the total pool of children gets smaller as children get selected or age out. This is a bit silly, and doesn’t really help us understand the lottery dynamics. A more realistic scenario would involve cohorts that are roughly the same size each year, so that the same number of children enter the pool and exit the pool in any given year. This rests on the assumption that demographic trends in Panem are fairly stable: after all, it has existed with the same system in place for 74 years, and the Capitol would likely avoid incentivizing rapid demographic changes that could destabilize the situation.

Discounting tesserae for now, if a cohort of children enters and has one entry at 12, two at 13, and so on, the probability that a child from a particular age will be selected in any given year increases arithmetically: 3.6% at age 12, 7.1% at age 13, 10.7% at age 14, 14.3% at age 15, 17.9% at age 16, 21.4% at age 17, and 25% at age 18. In other words, just by nature of the accumulating lottery entries as you age, one out of four lottery draws should be an 18 year old. Using these proportions and multiplying them by the 16 tributes actually selected by lottery, I get the blue line on the graph below, which represents the expected number of children from each age if the lottery is completely random:

By comparing the red bars (non-volunteer children) to the blue expected line, you can see the apparent difference in outcomes. However, we haven’t yet tested whether these differences are statistically significant or could simply happen by chance.
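The expected counts behind the blue line can be reproduced in a few lines of Python. This sketch assumes equal-sized age cohorts and no tesserae, and uses the 16 lottery-selected tributes (24 tributes minus the eight volunteers). The variable names are mine.

```python
from fractions import Fraction

AGES = range(12, 19)
entries = {age: age - 11 for age in AGES}   # 1 entry at 12 ... 7 at 18
total = sum(entries.values())               # 28 entries across one child of each age

# Share of all lottery draws expected to land on each age group.
share = {age: Fraction(n, total) for age, n in entries.items()}

lottery_picks = 24 - 8                      # tributes minus the eight volunteers
expected = {age: float(share[age] * lottery_picks) for age in AGES}

for age in AGES:
    print(f"age {age}: {float(share[age]):6.1%} of draws, "
          f"{expected[age]:.2f} expected tributes")
```

The shares are simply 1/28 through 7/28, which is where the 3.6% to 25% figures come from.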

[**A side note on tesserae:** Lewis rightly notes that it is difficult to analyze further without understanding why some families select tesserae and others do not. I don’t think the problem is quite as difficult as that. Choosing tesserae is likely dependent on income so it makes sense to assume that a certain percentage of the population in each District doesn’t request it at all, whereas others require one, two, or more tesserae every year to survive (we know that Gale took out five every year). But that shouldn’t matter when analyzing the probability of a given age being selected unless the likelihood of choosing tesserae differs systematically (in the whole population) by age. In other words, people who put in for one tessera at age 12 are likely to continue requesting one tessera per year, and if they increase or decrease that request the change will be offset by changes in requests from other citizens. If 10% of the population requests the maximum tesserae every year but that doesn’t change as the children age (because family size remains constant) then children remain at constant risk based on their age. I modeled a few scenarios in Excel to convince myself of this, but I won’t present those here. Thus, I’m assuming that *on average* the number of tesserae requested is determined by income but not by age, as need stays constant over time.]

Given those assumptions, tesserae can be discounted as a major factor impacting lottery selection. On an individual level it’s still bad to be poor and thus need tesserae, and it’s still bad to get older and thus have your name in the lottery more times, but on the population level we can model the probability of selection of a given age.
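Here is one way to convince yourself of the tesserae argument without Excel. This is a sketch under the same assumptions (equal-sized cohorts, and a mix of tesserae requests that is identical at every age). The particular 60/30/10 mix is made up purely for illustration.

```python
from fractions import Fraction

def age_shares(tesserae_mix):
    """Share of all lottery entries held by each age group.

    tesserae_mix maps tesserae-per-year -> fraction of children requesting
    that many, and is assumed to be the same at every age.
    """
    # Average entries added per child per year of eligibility.
    per_year = sum(frac * (1 + t) for t, frac in tesserae_mix.items())
    entries = {age: (age - 11) * per_year for age in range(12, 19)}
    total = sum(entries.values())
    return {age: n / total for age, n in entries.items()}

no_tesserae = age_shares({0: Fraction(1)})
# Hypothetical mix: 60% take none, 30% take one, 10% take five like Gale.
mixed = age_shares({0: Fraction(6, 10), 1: Fraction(3, 10), 5: Fraction(1, 10)})

print(no_tesserae == mixed)   # True: the per-year factor cancels out
print(mixed[18])              # 1/4
```

Because every age group is scaled by the same per-year factor, the factor cancels and the age shares stay at 1/28 through 7/28. The result would change if the mix itself varied with age.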

To do so I used a Chi-square test in Stata, testing whether the observed counts differ from the expected counts (the blue line in the graph above). Stata’s built-in functionality doesn’t quite cut it, so you’ll need to install the tab_chi package (just use the command “*findit tab_chi*”), which was created by Nicholas Cox. Then you just enter this command to perform a Chi-square test: “*chitesti 1 0 1 5 4 3 2 \ 0.571428576 1.142857136 1.714285712 2.285714288 2.857142864 3.428571424 4*”. This yields a **Pearson Chi-square statistic of 6.4958 (P=0.370).**
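If you don't have Stata handy, the same goodness-of-fit statistic can be computed by hand. The Python sketch below is not the tab_chi code. It is a from-scratch version that uses the closed-form upper-tail probability available when the degrees of freedom are even (here 7 categories minus 1 = 6). The function names are my own.

```python
import math

observed = [1, 0, 1, 5, 4, 3, 2]                 # lottery-selected tributes, ages 12-18
expected = [16 * k / 28 for k in range(1, 8)]    # 16 draws split 1:2:...:7 by age

def pearson_chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi_square_p_value_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df % 2:
        raise ValueError("closed form used here needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

stat = pearson_chi_square(observed, expected)
p = chi_square_p_value_even_df(stat, df=len(observed) - 1)
print(f"chi-square = {stat:.4f}, P = {p:.3f}")   # chi-square = 6.4958, P = 0.370
```

One caveat worth keeping in mind: every expected count here is below five, so the Chi-square approximation is rough at this sample size.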

This indicates that the age distribution in the 74th Hunger Games is not statistically different (at the traditional significance threshold of P=0.05) from the expected age distribution given an arithmetic increase in lottery names by age and the tesserae assumptions outlined above. Thus, **the available data does not support the hypothesis that the Gamemakers rigged the draw in the 74th games.** Of course, this test depends on the assumptions regarding tesserae and demographics explained above. And ideally we’d have data from *all* the games, which would allow us to detect more subtle patterns with a larger sample size. This analysis works for the 74th games, but external validity (whether we can extrapolate from these results to all instances of the Games) may be problematic.

**5. Survival analysis: which covariates are associated with longer survival in the Games?**

Survival analysis is quite important in epidemiology. Kaplan-Meier graphs are commonly used in the medical literature to visually represent the difference in survival over time between two different groups, often treatment vs. control. Survival analysis can be used with any outcome, not just death, so long as your data measure the time that passes until an event.

When these statistical tools are used with economic and other non-health data, they’re often described as “event history analysis” as the vocabulary of survival analysis makes less sense. However, it should be morbidly clear that the language of survival analysis works just fine with the Hunger Games scenario. In fact, it’s a perfect opportunity to explore survival analysis as we have data for how long each tribute survived in the arena, as well as a number of covariates measured at the beginning of the contest.

I won’t go into much coding detail on the survival analysis, but Stata users can access my .do file (linked above) to replicate the analysis. Rather, I’ll present the results graphically where possible. Graphs are pretty. First, you can see how the Nelson-Aalen cumulative hazard estimator increases with time:

Next I ran a few univariate analyses of what I thought might be important predictors. In the following Kaplan-Meier graphs the y axis shows the proportion of tributes surviving at a given point in time. Since our unit of time is days, you see a “step down” each time someone dies. Here’s a graph showing the survival by sex:

Not much difference. On the first day both male and female drop from 1.00 (everyone surviving) to close to 0.50. On the second day the male line doesn’t budge from 0.50, but the female line drops down to 0.50: this is the demise of the District female (killed by Peeta and Cato in the book but by Glimmer in the film, as noted at the Hunger Games Wiki). The last blue step down, on day 17, is Cato. In this graph each step is the same height (0.083 = 1/12) because the numbers of male and female tributes are the same, but in subsequent graphs this isn’t always true. (An aside on gender: Lewis’ article also points out that it’s unclear how the Capitol deals with transgender citizens.)
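To make the "step down" idea concrete, here is a minimal Kaplan-Meier estimator in Python. It is a sketch, not what Stata does internally, and the twelve durations below are made-up toy numbers rather than my actual data set. The last tribute is treated as censored (still alive when observation ends).

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: day each subject died or was last seen alive.
    events: 1 if the subject died on that day, 0 if censored (still alive).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Multiply by the fraction of the risk set that survives day t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Made-up toy data for 12 tributes, NOT the post's data set.
days =   [1, 1, 1, 1, 1, 1, 2, 5, 9, 12, 17, 20]
events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  1,  0]   # the last one wins (censored)

for day, s in kaplan_meier(days, events):
    print(f"day {day:2d}: {s:.3f} surviving")
```

With twelve subjects and no censoring before the end, each death lowers the curve by exactly 1/12, matching the equal step heights described above.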

Next, by Careers vs. non-Careers, the graph included as a teaser at the beginning of this post:

The Careers get a head-start at the bloodbath, but things even out by around day 10. At least in this particular iteration of the Games, they do not fare better overall than the non-Careers.

Graphing by age is messy because there are too many categories (and Kaplan-Meier graphs don’t work with continuous variables) so I grouped the tributes into three age categories:

I ran several more of these bivariate analyses, and then finally fit a Cox proportional hazards model, yielding this Stata output:

The only statistically significant effect (at the traditional and arbitrary cutoff of P<0.05) comes from the Gamemakers’ rating variable. The career dummy variable just misses the cutoff (P=0.065) and might be significant if we had a larger sample size and saw similar trends in the data, but the effect is in the wrong direction: holding other things constant (sex, age, and Gamemakers’ rating), Careers do less well than non-Careers! Of course, this only happens in this analysis because Peeta and Katniss (but mostly Katniss) are awesome.
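For the curious, this is roughly what a Cox model is maximizing. The Python sketch below writes out the partial log-likelihood for a single covariate (with the Breslow handling of tied death days) and finds the best coefficient by a crude grid search, which I've swapped in for the proper optimizer a statistics package would use. The eight rows are made-up toy data, not my data set, and the function names are hypothetical.

```python
import math

def cox_partial_log_likelihood(beta, times, events, x):
    """Breslow partial log-likelihood for a Cox model with one covariate x."""
    loglik = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored subjects only appear in other subjects' risk sets
        # Risk set: everyone still alive just before time t_i.
        risk = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        loglik += beta * x[i] - math.log(risk)
    return loglik

def fit_beta(times, events, x, lo=-5.0, hi=5.0, steps=2000):
    """Crude grid search for the beta that maximizes the partial likelihood."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda b: cox_partial_log_likelihood(b, times, events, x))

# Made-up toy data, NOT the post's data set: day of exit, death flag, rating.
times =  [1, 1, 2, 4, 7, 10, 15, 20]
events = [1, 1, 1, 1, 1, 1,  1,  0]   # the last subject is censored
rating = [3, 5, 4, 6, 8, 7,  10, 11]

beta_hat = fit_beta(times, events, rating)
print(f"beta = {beta_hat:.2f}, hazard ratio per rating point = {math.exp(beta_hat):.2f}")
```

A negative coefficient means a hazard ratio below 1, i.e. each extra rating point is associated with a lower risk of dying on any given day. In the toy data higher-rated subjects mostly last longer, so the fitted coefficient comes out negative.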

Because it would be disappointing to end on a non-visual note, I created a dichotomous low vs. high rating variable based on whether tributes had scores of 3 to 8 or 9 to 11 on the rating_rand variable. Here you can see just how different the survival outcomes for the low and high ability groups are:

My interpretation of this is that **the Gamemakers know what they’re doing when they assign the ratings.** They’ve been doing this for years, so they give scores that are so accurate that they’re actually better predictors of survival time than whether a tribute is a volunteer, a Career, male or female, or forms an alliance. Pretty impressive.

An alternate and more cynical interpretation is that the Gamemakers are concerned about their own reputations and thus engineer the games so as to confirm their ratings, occasionally killing off players who do better or worse than expected based on the ratings, all so that the Gamemakers can look like they knew what they were doing all along. Unfortunately, the political system of Panem ranks so low on Freedom House’s annual scores that we simply can’t tell what’s going on behind the scenes at all. To cut through their lies we simply need more data.

(P.S. Feel free to run your own analysis of the data, but please credit and link to me. After all, a social science community lacking incentives for researchers to share the data they gather is no community at all.)

**Updates:** I changed the links to the data and .do file to be to Google Docs rather than files hosted on this site. Thanks to my classmate Kristin for some coding suggestions (yay “peer review”!). I also added the results table from the regression model and some commentary related to it. And then I added some clarifying comments throughout the post but haven’t noted every minor change.

To head off the data analysis geeks: I realize there are some things that could be better, as with the last two data points (Katniss and Peeta) which should be censored rather than failing at day 20… but this already took a while to write up and I need to get back to my classes.

This is amazing.

I would strongly suggest you submit this for publication in the BMJ/CMAJ Christmas issue. They love this sort of thing.

Great job dude. Great job.

This is terrific. I don’t think I have seen a clearer explanation of survival analysis.

Thanks!

AMAZING! Combines my absolute love of The Hunger Games with my hatred of biostatistics. <3

I’m not a number-cruncher but I would assume that the use of tesserae would vary not only with socio-economic status, but with the number of children in the family, and with the age of the child. That is, the younger the child the more future years of tesserae they represent to the family in economic terms, so the less incentive to risk their lives by taking too many tesserae in any year. But the older they get, the more valuable to the family to max out on tesserae in any given year.

Charli – that’s a great thought which hadn’t occurred to me. Then again, the older kids are already at the highest risk, and families might choose to equalize the risk across their children by having the younger ones take out tesserae too. But I think you’re right that overall this would lead to more older kids taking tesserae (at least in families with multiple children) than younger kids.

Another potential problem would be if socioeconomic status affects the number of kids: wealthy families might have more because they can feed them anyway and want to hedge their bets against losing a kid to the reaping, or poorer families might have more kids to ensure a constant flow of food for the others if one is taken. Hard to know which of those countervailing forces would win without having more data.

In terms of the story, both Gale and Katniss tried to insist that their younger siblings not dip into the tesserae. I think the idea of equalizing the risk makes sense on paper, but in the case of two of the main families of the book, they weren’t willing to take that risk. For the Mellarks, we don’t see that there was a need, and Peeta’s reaping was based on his being entered four times (I think he’s sixteen).

Great article.

“… there is no evidence that the Gamemakers are rigging the draws.”

would be better phrased as

“… there is not strong evidence that the Gamemakers are rigging the draws.”

Any single year can be taken as evidence for or against them rigging the draws, depending on what pattern they might be trying to rig.

Ray – I updated that section a bit. Let me know if you think it’s better now?

I bet Marie Diener-West would totally approve of this use of Stata

Zach — hah! I actually emailed it to her and Tonascia (I had Diener-West for 621-3 and Tonascia for 624) and they’ve put it up as the link of the day on the Bio624 website.

Hope your Christmas BMJ paper is going to note the limitations of a) possibly informative censoring b) flaky asymptotics (i.e. too-small confidence intervals) of standard Cox regression output, in small samples and c) not exactly pre-specified hypothesis.

Oh definitely, as well as the limitations of filtering information through the mind of an author — who knows what dramatic license she took with what truly happens in Panem! On a) though, none of the data are censored prior to Katniss and Peeta when they win. I.e., everyone else “fails”.

You should give a talk at the Joint Statistical Meetings. I bet you can even win a biostat student paper competition with that. With minimal effort, this can be converted to a publication in TAS or Significance.

A small note: you would want to run the analysis with the randomly assigned ranks as a multiple imputation analysis: create five or ten sets of random ranks, run your Cox ph model, and summarize them, all using -mi-. As a social scientist-slash-biostatistician, you would want to figure this technique out, anyway.

Shouldn’t you add Prim to the calculation of whether the Gamemakers rig the lottery? We don’t have any information about the lottery-chosen tributes from the other districts with volunteers, but we do know that the lottery selected at least two 12-year olds this year, not just the one shown in your graph.

Amazing page.

Jessica – thanks for pointing that out. I actually thought of it at one point and made a note, but then forgot to do it. Including Prim brings the actual number of 12 year olds from 1 up to 2 (out of a new total of 17 chosen by lottery), and changes the expected number (but not the proportions) of tributes to 0.61, 1.21, 1.82, 2.43, 3.04, 3.64, and 4.25, in increasing age from 12 to 18. The new command for the test is “chitesti 2 0 1 5 4 3 2 \ 0.607142857 1.214285714 1.821428571 2.428571429 3.035714286 3.642857143 4.25”. This yields a Pearson Chi-square statistic of 9.11 (P=0.167), so while including Prim is correct, it doesn’t change the conclusion that this distribution could happen by chance.

Hmm. I’d like to see this re-run with Katniss excluded on the grounds that plot armor might negate the significance of her other attributes. Good review of the existing literature, though.

This is awesome! I’m a political philosophy student by training, and considering my past aversion to the words “bivariate”, “statistically significant” and “chi-square test” I enjoyed it more than I expected. Finally, my undergraduate course on empirical methodologies comes into use!

Question: this was sort of addressed in one of the other comments, but I’ve been curious about whether or not the math actually confirms Gale’s (implied) theory that it is strategically advantageous to have one child (out of multiple children) take on tesserae on behalf of the whole family. Assuming the goal of the family is to not have *any* of their children be chosen as tribute, is it better (read: smarter) to have each child take on their own tesserae, or to lump them all on one of the children? I assume the answer would change depending on how many children there were in the family unit, but I still would have thought that it would be safer to keep the odds low on each individual child’s selection, than to gamble on the one heavily tessera-ed child not being selected.

In regards to Gale’s theory, it would benefit the younger kids, but be the same for the family as a whole either way. For example, if you have 5 children with 3 tesserae apiece, you have 20/X chance of one child getting chosen (5 normal entries + 15 tesserae)…which ends up being the same for when you have 5 children with 1 of them taking out the 15 tesserae on his own (i.e. you still have 20/X chance of one of them getting chosen…it’s just more likely to be the oldest one).

I think Mary’s response here is right. Everyone has a baseline risk that increases over time, and then the tesserae add to that. The only complication is that you could have two children get picked (one male, one female) from the same family, so if you don’t want that to happen you’d want to shift more of the risk over to the children of one gender. However, that’s such low odds (assuming the Gamemakers don’t rig it because they think two kids from one family would be a good spectacle) that it doesn’t seem like it would come into play. So maybe Gale’s implied theory is an indication that people have a hard time thinking objectively about the situation – the power of fear!

Wow, great analysis! I didn’t think about the draws being rigged at all. From looking at the numbers, I mean, the Capitol would totally do that, I have no doubt. They looked ok and statistically random to me. Especially when we know of only one year. So the age distribution didn’t bother me but the victor distribution however… In Catching Fire we find out that during the 74 years of the Games, there has been at least one male and female victor from each district! What are the odds for that? Even if they all had the same chance of winning (which they clearly do not, given the superiority of the career districts) it still seems a bit odd that there has been one victor of every “kind”.

What IS the probability for that? In other words: what’s the probability to get at least one of every number when rolling a die for, say, 10 times? Anyone? It’s been way too long since I did this…
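
For the die version of the question, a short sketch using inclusion-exclusion over the faces that never show up. It assumes a fair die and independent rolls; the function name is just for illustration.

```python
from math import comb

def prob_all_faces(n_faces: int, n_rolls: int) -> float:
    """P(every face appears at least once in n_rolls fair rolls)."""
    # Inclusion-exclusion: alternately subtract and add the chance that
    # some chosen set of k faces is missing from every roll.
    return sum(
        (-1) ** k * comb(n_faces, k) * ((n_faces - k) / n_faces) ** n_rolls
        for k in range(n_faces + 1)
    )

print(round(prob_all_faces(6, 10), 4))  # 0.2718
```

So ten rolls of a fair six-sided die show every face a little over a quarter of the time.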

Frida — that’s a great tidbit, which I hadn’t come across since I haven’t read Catching Fire yet. I think it’s just a version of the Coupon Collector’s Problem (http://en.wikipedia.org/wiki/Coupon_collector's_problem). Wolfram Alpha actually has a calculator for this sort of problem at http://demonstrations.wolfram.com/CouponCollectorProblem/, which says that for 24 coupons (or types of winners by district*sex) the expected number of draws (instances of the Games) required to get all of them would be ~91. That’s more than 74, but not a ton more, so it’s conceivable it could happen by chance, or that they’re rigging the draws…
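
A rough check of the coupon-collector figure, assuming (unrealistically, as noted above) that each of the 24 district-by-sex "types" is equally likely to win in any year and that years are independent. It computes the expected number of Games needed to see every type, and the chance of having seen all of them by year 74.

```python
from fractions import Fraction
from math import comb

def expected_draws(n_types: int) -> Fraction:
    """Expected draws to collect every type: n * (1 + 1/2 + ... + 1/n)."""
    return n_types * sum(Fraction(1, k) for k in range(1, n_types + 1))

def prob_complete(n_types: int, n_draws: int) -> float:
    """P(all types seen within n_draws), by inclusion-exclusion."""
    return sum(
        (-1) ** k * comb(n_types, k) * (1 - k / n_types) ** n_draws
        for k in range(n_types + 1)
    )

print(round(float(expected_draws(24)), 1))  # 90.6
print(prob_complete(24, 74))
```

Under these equal-odds assumptions the second figure comes out at roughly one in three, so a complete set after 74 Games is not especially surprising by chance alone. Unequal odds across districts would push that probability down.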

Really great analysis. I wonder how it would’ve differed in District 12 in a few years given the mine explosion. Presumably there would be a smaller pool of 12-year-olds 13 or so years after the explosion due to the lack of fathers, as well as (similar to Gale) an increase in the number of tesserae for the children due to the lack of income/dual income. I also wonder about the distribution in ages across the Districts over time: it seems through reading the books that the economies of each District had been stagnant for several years, but over the full 74 years some changes had to have occurred. I’d love to see the distributions over time.

It wouldn’t surprise me in the least if the Capitol rigged the show – I imagine it would’ve occurred after one year seemed boring – like they all died of TB instead of killing each other.

Brett – thank you! Of course it is! That’s an interesting one, I need to look into it.

What seems to be missing in the analysis of “do I take more tesserae if it increases the chance my children will be picked” is the total number of lots. In the movie (I haven’t read the book yet), District 12 is depicted as a small mining town, and all the children can easily be gathered in a small plaza. In that case, Gale’s 42 lots are significant, especially if the other families haven’t opted for tesserae. In the movie, half the ballots in the bowl that’s holding the male ballots could have been Gale’s; it’s not that big. But a small mining town can never supply the mining demands of a future country the size of North America. Districts are more likely to have millions, if not tens of millions, of people, meaning there are tens or hundreds of thousands of kids to choose from.

This means that the larger the population of a district is, the smaller the investment (increased chance of losing a kid) a family has to make for the same reward (additional tesserae). That is, as a family I may decide not to go for additional tesserae if the chance of one of the kids being selected grows from 2% to 10%, but I may opt to do so if the chance increases from 0.0002% to 0.0010%. In both cases, the chance increases fivefold, but in the second case the absolute chance is still very low.
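
The percentages in this comment can be reproduced with a small sketch. The pool sizes (50 and 500,000 slips) are hypothetical, chosen only because they give the 2% and 0.0002% starting points, and the pool is held fixed when the family adds slips, which is a simplification.

```python
def selection_chance(own_slips: int, pool_slips: int) -> float:
    """Chance that one of your slips is the single slip drawn."""
    return own_slips / pool_slips

for pool in (50, 500_000):  # hypothetical small town vs. large district
    before = selection_chance(1, pool)
    after = selection_chance(5, pool)
    print(
        f"pool={pool:>7}: {before:.4%} -> {after:.4%}, "
        f"relative x{after / before:.0f}, absolute +{after - before:.4%}"
    )
```

The relative increase is the same in both cases, but the absolute increase shrinks in proportion to the size of the pool, which is the quantity a family would plausibly weigh against the extra grain.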

I was sitting in my graduate-level health economics class this morning when some oh-so-very libertarian classmates went off on a welfare economics rant. I won’t bore you with their arguments because they were as predictable as what you’re currently imagining. After a couple of snide counter-arguments on my part, the room went silent. A few seconds of blank noise later, I looked to our professor and very matter-of-factly asked if I could give a five-minute presentation on Hunger Games economics in our next class. When libertarian friend 2 asked, “You mean, like the book?” I replied, “Sure.” Because it’s not just about the book, is it? If we can get mainstream attention for the stats and econ behind a smart fiction series, I hope we can channel the public’s fervor to the international and domestic implications the book offers. Needless to say, I’ll cite you during Wednesday’s class. Happy Hunger Games, and “May the odds be ever in your favor.”

Hah. I appreciate the sentiment, but I’d also hope a graduate health econ class could drill down into the economics of health in the real world too!

I LOVE(!!!!) your post.

I now just wish someone would do it in R

Amazing work! This combines two of my favorite things as a biostatistician. My only suggestion would be that when testing association use Fisher’s Exact Test rather than a Chi^2 test. As I’m sure you know, Chi^2 tests have problems with low expected cell counts (less than 5). Fisher’s test bypasses that problem, and still tests the same thing. Anyways, I have to get back to my actual homework but I loved the post!

Jay — yes, true, but I couldn’t figure out how to get a command similar to chitesti to work for an exact test. I’m sure there’s a command out there, but I wasn’t able to find it very quickly and decided I was already spending too long on the post. If I had realized how many people were going to end up reading this I might have taken more time to find it…
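
For readers who want to try the exact test by hand, here is a minimal, standard-library sketch of a two-sided Fisher's exact test. It only handles a 2×2 table, so it would not directly replace a goodness-of-fit test across several categories, and the example counts are hypothetical rather than taken from the post's data.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more likely than the observed one.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )

# Hypothetical counts, e.g. rows = career / non-career, columns = outcome.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Because the p-value is built from exact hypergeometric probabilities, it does not rely on the large-sample approximation that makes the chi-squared test shaky when expected cell counts are small.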

You are ignoring the fact that tesserae are permanent. If you take one when you are 12 it gives you an extra chance to be selected that year *and every future year*. Thus a tessera taken at the age of 12 is six times more expensive than one taken at the age of 18. A 12-year-old can only have 5 extra chances from tesserae, whereas an 18-year-old can have 30 extra chances. That’s because an 18-year-old can have 5 chances from this year’s tesserae and 25 from previous tesserae.

So even if people chose to take tesserae at random, older children would be far more likely to be selected. Given the reality that tesserae for younger children are far more expensive it makes sense for older children to always take all of the tesserae required for the family. I know that if I were a parent I would insist upon it and disown older children who didn’t do so. Most parents would do the same and it would simply be culturally unacceptable for older children to allow their younger siblings to take them. Older siblings who didn’t take their maximum number would be the victims of constant harassment, thrown rocks, beatings, etc…
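
The accumulation described in this comment can be sketched as follows. The eligibility window is an assumption: counting ages 12 through 18 inclusive gives seven reapings, which reproduces the 42 slips attributed to Gale elsewhere in this thread but yields a ratio of seven rather than the six quoted above. The function names are hypothetical.

```python
FIRST_AGE, LAST_AGE = 12, 18  # assumed eligibility window, inclusive

def slips_at_age(age: int, tesserae_by_age: dict) -> int:
    """Slips in the bowl at `age`: one base slip per reaping so far,
    plus every tessera taken at this age or any earlier age."""
    reapings_so_far = age - FIRST_AGE + 1
    carried = sum(n for a, n in tesserae_by_age.items() if a <= age)
    return reapings_so_far + carried

def slip_years_per_tessera(age_taken: int) -> int:
    """Number of reapings in which a tessera taken at `age_taken` counts."""
    return LAST_AGE - age_taken + 1

# Five tesserae taken every year from 12 to 18 (the pattern assumed above).
five_a_year = {age: 5 for age in range(FIRST_AGE, LAST_AGE + 1)}
print(slips_at_age(12, five_a_year))   # 6  (1 base + 5 tesserae)
print(slips_at_age(18, five_a_year))   # 42 (7 base + 35 tesserae)
print(slip_years_per_tessera(12), slip_years_per_tessera(18))  # 7 1
```

Whatever the exact window, the shape of the argument is the same: a tessera taken early keeps adding a slip in every later reaping, so its total cost in slip-years is several times that of one taken in the final year.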

I don’t ignore that, and in fact I’m pretty explicit on the fact that I take it into account. I agree that older children are more likely to be selected (see the graph of the expected probability of being selected). The thing we disagree on is whether that means families will have the older kids take all the tesserae — I think it’s also quite possible that parents wouldn’t want to put undue risk on any single child and would thus spread the tesserae out, or that they wouldn’t want to risk an older child (who is more productive as a breadwinner) if they only have younger children.

In short, I went with the simplistic assumption that tesserae are taken out at equal amounts in each age because it made the calculation possible. If the reality of the situation differs greatly from those assumptions (in either direction) then the results may be invalid. But that’s the value of clearly stating assumptions (which are almost always necessary for any sort of analysis): readers can see why you’re reasoning the way you are and decide for themselves how valid they think the argument is.
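
One reading of that simplifying assumption, sketched below: every child takes the same number of tesserae each year, and every age cohort is the same size. Both are assumptions for illustration and may not match the calculation in the post exactly.

```python
from fractions import Fraction

AGES = range(12, 19)  # assumed eligible ages, 12 through 18 inclusive

def expected_age_shares(tesserae_per_year: int = 0) -> dict:
    """Expected share of tributes at each age, given equal cohort sizes."""
    # Slips accumulate: (years eligible so far) * (1 base + tesserae per year).
    slips = {age: (age - 11) * (1 + tesserae_per_year) for age in AGES}
    total = sum(slips.values())
    return {age: Fraction(n, total) for age, n in slips.items()}

shares = expected_age_shares(tesserae_per_year=3)
for age, share in shares.items():
    print(age, share, round(float(24 * share), 2))  # expected count among 24 tributes
```

Under this reading the uniform tesserae rate cancels out of the shares, leaving a straight line from 1/28 at age 12 to 7/28 at age 18. Any departure from that line would have to come from tesserae being taken unevenly across ages or from unequal cohort sizes.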

Very cool! It’s too bad you can’t take into account whether or not a tribute had Sponsors (or maybe # of Sponsors, or gifts received). This might mean you need an interaction term in your model!

epi — I thought about that, but decided that I didn’t have complete enough information for it to be worthwhile. But forming sponsors and alliances are both things that really only have an effect after the bloodbath, which makes me think that a better way of looking at this problem may be to first ask ‘what makes someone more likely to survive the bloodbath?’ and then have a separate question be ‘what is associated with survival post-bloodbath?’…

Very interesting analysis.

One attribute that would be worth considering in the analysis, especially since you are concerned about the effects of Gamemaker interference, would be to split the tributes into those who were killed by other tributes versus those who died by other means… often ad hoc rule changes made by the Gamemakers during the Games. The obvious one is the introduction of the mutant dogs, which clearly killed the fourth-to-last tribute. This is in contrast to the fireballs the Gamemakers created to kill Katniss (which didn’t).

This could be expanded to include Cato (who was killed by Katniss’ arrow, but out of mercy as he was being eaten to death), the tribute girl who died from the berries (which would seem odd since Katniss knew their lethality but both the tribute and Peeta did not), the tribute killed by tracker jackers (although it was instigated by Katniss and Rue), the career who was killed during the Gamemaker present drop, etc.

This would allow us to really see how much Gamemaker “interference” impacts survival.

It might also be interesting to analyze the effects of gifting on survivability (as this proved critical in the story to Katniss’ survival).

JF – thanks, nice idea. I think this would work if I had a broader data set (maybe 10 years’ worth of games) but it’s hard to imagine there’d be enough sample size to do a good comparison of players-killed-by-others vs. players-killed-by-Gamemakers. If only we had all 74 years of data!

> Unfortunately, the political system of Panem ranks so low on Freedom House’s annual scores that we simply can’t tell what’s going on behind the scenes at all.

I was really disappointed when the link didn’t work.

Oh crap, they must have taken it down! I swear they had a report up (ranking Panem just below North Korea) when I looked the first time…

I found certain aspects of the movie very annoying. So, the whole district, a sizable fraction of NA, is at this gathering… and there are only a thousand or so of them? What? And so forth.

The moviemakers were obviously not expecting a substantial portion of the movie-going public to be numerate.

Tom’s point that tesserae are cumulative may not change the fact that the expected age curve is linear, but it could very well change the slope of that curve. I’d guess that the actual distribution is still well within the realm of possibility.

Did you do a graph for survival vs volunteer? Looking at the career graph, and noting that there is one additional volunteer who also did very well, I’d think that graph would come close to having a predictive value, perhaps one that in most years would correlate well with the “career” probability as well.

Lastly, it’s interesting to try to extrapolate from this one iteration of the games given how profoundly bizarre they were (teaming up by district, two winners, Dist. 12 having a volunteer and winning, etc.)

Don’t bother to read the other two books. The first book is great, but the series goes downhill quickly. Katniss, who was just barely likable in the first book, becomes both less likable and less believable with each new book. The last book is just random ‘war stinks’ imagery.

Btw – sorry for mentioning the second book in my comment! I hope you didn’t have the spoiler alert where you asked not to write about the other two books there earlier. It’s not a horrible spoiler though, and I think I assumed that since you know so much about the first book, you definitely read the other two, and that the same would go for most other readers of this awesome post. Anyway, just wanted to make that clear, since I’m very sensitive to spoilers myself and always try to be careful!

Dear Brett,

Nice analysis. As a wonky prof, I do think that you might actually have a violation of the PH assumption for gender and career. Note that your K-M curves cross, which would violate the proportionality of the hazard functions needed for the Cox model. The same appears to hold for the career variable. You could fit PH models where you stratify the baseline hazard on these variables. More generally, I also worry a bit about the asymptotic normality needed for your regression results given that you have 24 observations. Again, overall this is quite awesome.

Debashis
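
To illustrate the crossing-curves point, here is a small standard-library sketch that builds Kaplan-Meier curves for two groups and checks whether they cross. The survival times are hypothetical, not the tributes' data, and crossing curves are only an informal warning sign; a formal check would usually rely on something like Schoenfeld residuals.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the subject died at that time, 0 if censored
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored subjects leave the risk set
    return curve

def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before time t."""
    s = 1.0
    for ti, si in curve:
        if ti <= t:
            s = si
    return s

def curves_cross(curve_a, curve_b):
    """True if A lies above B at some event time and below it at another."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    diffs = [survival_at(curve_a, t) - survival_at(curve_b, t) for t in grid]
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)

# Hypothetical groups: A loses half its members at once, then holds steady;
# B loses members gradually and ends up worse off.
km_a = kaplan_meier([1, 1, 1, 6, 7, 8], [1, 1, 1, 1, 1, 0])
km_b = kaplan_meier([2, 3, 4, 5, 5, 9], [1, 1, 1, 1, 1, 0])
print(curves_cross(km_a, km_b))  # True
```

If the hazards were proportional, one group's curve would stay on the same side of the other throughout, so a crossing like this suggests stratifying on that variable or letting its effect vary with time.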

Nice effort on the analysis, but the data is not suitable for KM and Cox. In KM, Cox and practically almost everything that requires statistical inference on a population, your variable of interest should be in no doubt independent from sample unit to sample unit.

Since your variable of interest is life span during the game, where increasing one’s chances of a longer life means cutting short another person’s lifespan (i.e. killing them), your variable of interest is obviously dependent from sample unit to sample unit.

Your test for determining whether the Gamemakers rig the selection of tributes is inappropriate, since tributes are selected by district. In the way you’re testing whether the selection was rigged, you are assuming that the tributes were drawn as a single lot regardless of how many are taken from each district. And the way you computed the expected frequency assumes that the number of 12-year-olds equals the number of 13-year-olds and so on, when that is not certain.

Thanks for the blog. It was entertaining.

You may be interested in another way at looking at the statistics – ask how many battles the winner needs to survive. I’ve reported on my modelling of this at http://unconsenting.wordpress.com/2013/01/31/who-wins-the-hunger-games/.

Thank you! My psych statistics students thank you for making the class session on chi-squares more interesting

Nice one! Reading this, a colleague of mine suggested a similar piece on the Game of Thrones. There’s a lot of dying there, too…

One comment: Running a Cox model you HAVE to check the appropriateness of the proportional hazards assumption. If hazards turn out to be non-proportional, this is oftentimes due to time-dependent effects. Some of your non-significant covariates could then become part of a significant interaction with survival time, as in: Being a career tribute reduces the risk of being killed in the beginning of the game, but this effect wears off (or even reverses) as the game continues.

Cheers,

a.
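
A crude way to look for the kind of time-dependent effect described above is to split follow-up into an early and a late window and compare death rates per unit of person-time in each. The sketch below does this with hypothetical survival times (in days) and a hypothetical cut point at day 2. It is not a Cox model with a time interaction, only a rough stand-in for one.

```python
def piecewise_rate(times, events, start, end):
    """Deaths per unit of person-time within the window (start, end]."""
    deaths = 0
    exposure = 0.0
    for t, e in zip(times, events):
        if t <= start:
            continue  # left the risk set before the window opened
        exposure += min(t, end) - start
        if e == 1 and t <= end:
            deaths += 1
    return deaths / exposure if exposure else float("nan")

# Hypothetical data: (days survived, 1 = died, 0 = censored/winner)
career_t, career_e = [1, 5, 6, 6, 7, 9], [1, 1, 1, 1, 1, 0]
other_t, other_e = [1, 1, 1, 1, 2, 2, 4, 8, 10], [1, 1, 1, 1, 1, 1, 1, 1, 0]

CUT, END = 2, 10  # assumed split between the opening days and the rest
early_hr = (piecewise_rate(career_t, career_e, 0, CUT)
            / piecewise_rate(other_t, other_e, 0, CUT))
late_hr = (piecewise_rate(career_t, career_e, CUT, END)
           / piecewise_rate(other_t, other_e, CUT, END))
print(early_hr, late_hr)
```

In this made-up example the career-to-other rate ratio is well below 1 in the early window and above 1 afterwards, which is the pattern a single proportional hazard ratio would average away.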

I have something regarding Game of Thrones in mind actually — not a survival analysis, but something taking advantage of the excess dying!

And you’re spot on regarding proportional hazards; several other commenters pointed that out as well. Will hopefully be updated in a future version of the analysis…

In the most recent film, the tributes are taken from previous winners. What are the chances there is actually 1 male and 1 female winner still alive from each of the 12 districts? I would imagine there would be a higher percentage of winners from the ‘career’ districts, as well as more male than female. The chance of there being a female winner still alive from each of districts 4-12 seems very low.