This is all very meta

One of the best things about XKCD is that the mouse-over text (simply rendered using the title attribute in the <img> HTML tag) will almost always give you a second laugh or an interesting thought. This comic isn’t his best, but it has this great mouse-over text:

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy.”

Naturally, I went to Wikipedia and clicked Random Article, which gave me the Bradford-Union Street Historic District in Plymouth, Massachusetts. The links followed in order were:

  1. Bradford-Union Street Historic District
  2. Plymouth, Massachusetts
  3. Plymouth County, Massachusetts
  4. County (United States)
  5. U.S. state
  6. Federated state
  7. Constitution
  8. State (polity)
  9. Institution
  10. Social structure
  11. Social sciences
  12. List of academic disciplines
  13. Academia
  14. Organism
  15. Community
  16. Interaction
  17. Causality
  18. Event
  19. Observable
  20. Physics
  21. Natural science
  22. Science
  23. Knowledge
  24. Fact
  25. Information
  26. Sequence
  27. Mathematics
  28. Quantity
  29. Property (philosophy)
  30. Modern philosophy
  31. Philosophy

According to the Wikipedia page on the phenomenon (of course there’s one, which also of course already referenced the XKCD mention) the longest known link chain is just 35 links back to philosophy, so my random find is way up there.

It’s interesting to observe how the selections work back from entries on specific, literal things to broader categories. The selections go from place to categories of place to knowledge to a meta-description of what knowledge is. I imagine that if you grouped Wikipedia entries by category you’d see similar chains leading back to philosophy. For instance, I think all place names should follow a similar trajectory to my example.

I also wonder what the distribution would look like if you took a list of all entries on Wikipedia and graphed them by this philosophy index number. I think all articles listed together would be messy, but a list of articles weighted by web traffic would yield a logarithmic distribution with the bulk of the entries being people, places, or things that are far from philosophy but eventually link there. Also, the distribution of any single category (places in the United States, for example) should be more similar to a normal distribution, and the narrower the category is the more true that would be. Now if someone will just build a computer program to test my hypothesis.
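A rough sketch of what such a program might look like, in Python. It assumes the hard part is already done: that someone has extracted the first qualifying link of every article into a lookup table. The `FIRST_LINK` table below is a hypothetical stand-in filled in from the tail of my chain above, and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: each article maps to the first link in its text.
# Entries come from the tail of the chain listed above; a real test would
# need first links extracted from actual Wikipedia pages (not shown here).
FIRST_LINK = {
    "Physics": "Natural science",
    "Natural science": "Science",
    "Science": "Knowledge",
    "Knowledge": "Fact",
    "Fact": "Information",
    "Information": "Sequence",
    "Sequence": "Mathematics",
    "Mathematics": "Quantity",
    "Quantity": "Property (philosophy)",
    "Property (philosophy)": "Modern philosophy",
    "Modern philosophy": "Philosophy",
}


def philosophy_index(article, first_link=FIRST_LINK, target="Philosophy"):
    """Count the clicks from `article` to `target`.

    Returns None if the chain loops back on itself or hits an article
    with no known first link.
    """
    seen = set()
    steps = 0
    while article != target:
        if article in seen or article not in first_link:
            return None
        seen.add(article)
        article = first_link[article]
        steps += 1
    return steps


def index_distribution(articles, first_link=FIRST_LINK):
    """Tally how many articles sit at each distance from the target."""
    return Counter(philosophy_index(a, first_link) for a in articles)


print(philosophy_index("Physics"))  # 11 clicks in this toy table
print(index_distribution(["Physics", "Mathematics", "Philosophy"]))
```

Weighting by web traffic or grouping by category would then be a matter of which article list gets passed to `index_distribution`.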


Everyday science

David Brooks highlights a discussion “on what scientific concepts everyone’s cognitive toolbox should hold.” Brooks’ first highlight is this:

Clay Shirkey nominates the Pareto Principle. We have the idea in our heads that most distributions fall along a bell curve (most people are in the middle). But this is not how the world is organized in sphere after sphere. The top 1 percent of the population control 35 percent of the wealth. The top two percent of Twitter users send 60 percent of the messages. The top 20 percent of workers in any company will produce a disproportionate share of the value. Shirkey points out that these distributions are regarded as anomalies. They are not.
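To make the contrast with the bell curve concrete, here is a small Python sketch that compares how much of the total the top 20 percent hold in a simulated heavy-tailed sample versus a bell-shaped one. The numbers are simulated, not real wealth or Twitter data, and the Pareto shape of 1.16 is an assumption (it is the value usually associated with the "80/20" rule); `top_share` is a made-up helper name.

```python
import random


def top_share(values, fraction):
    """Return the share of the total held by the top `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Heavy-tailed sample: Pareto distribution with an assumed shape of 1.16.
heavy = [rng.paretovariate(1.16) for _ in range(20_000)]

# Bell-shaped sample: normal distribution, clipped at zero so shares make sense.
bell = [max(0.0, rng.gauss(100, 15)) for _ in range(20_000)]

print("Top 20% share, Pareto sample:", round(top_share(heavy, 0.20), 2))
print("Top 20% share, bell curve:   ", round(top_share(bell, 0.20), 2))
```

In the bell-shaped sample the top fifth holds only a little more than a fifth of the total, while in the Pareto sample it holds most of it.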

The full symposium is here. I’m not sure these individual insights are science or even scientific concepts, as much as “insights on thinking that some scientists have found useful” — but still interesting. Here’s Richard Dawkins on the Double-Blind Control Experiment (emphasis added):

….Why do half of all Americans believe in ghosts, three quarters believe in angels, a third believe in astrology, three quarters believe in Hell? Why do a quarter of all Americans believe that the President of the United States was born outside the country and is therefore ineligible to be President? Why do more than 40 percent of Americans think the universe began after the domestication of the dog?

Let’s not give the defeatist answer and blame it all on stupidity. That’s probably part of the story, but let’s be optimistic and concentrate on something remediable: lack of training in how to think critically, and how to discount personal opinion, prejudice and anecdote, in favour of evidence. I believe that the double-blind control experiment does double duty. It is more than just an excellent research tool. It also has educational, didactic value in teaching people how to think critically. My thesis is that you needn’t actually do double-blind control experiments in order to experience an improvement in your cognitive toolkit. You only need to understand the principle, grasp why it is necessary, and revel in its elegance.

If all schools taught their pupils how to do a double-blind control experiment, our cognitive toolkits would be improved in the following ways:

1. We would learn not to generalise from anecdotes.

2. We would learn how to assess the likelihood that an apparently important effect might have happened by chance alone.

3. We would learn how extremely difficult it is to eliminate subjective bias, and that subjective bias does not imply dishonesty or venality of any kind. This lesson goes deeper. It has the salutary effect of undermining respect for authority, and respect for personal opinion….
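Dawkins's second point, judging whether an apparently important effect might have happened by chance alone, can be illustrated with a permutation test. This is not from his essay; it is a sketch with made-up numbers, and `permutation_p_value` is a hypothetical helper. The idea is to shuffle the group labels many times and see how often a difference at least as large as the observed one turns up by accident.

```python
import random


def permutation_p_value(treated, control, trials=10_000, seed=0):
    """Estimate how often random relabeling gives a mean difference
    at least as large as the one actually observed."""
    rng = random.Random(seed)
    n = len(treated)
    observed = abs(sum(treated) / n - sum(control) / len(control))

    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # pretend the group labels were assigned at random
        first, rest = pooled[:n], pooled[n:]
        diff = abs(sum(first) / n - sum(rest) / len(rest))
        if diff >= observed:
            hits += 1
    return hits / trials


# Made-up outcome scores for two small groups.
treated = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
print(permutation_p_value(treated, control))
```

A small result means random labeling rarely produces a gap this big; a result near 1 means the "effect" is indistinguishable from chance.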


Review: "The Panic Virus"

Review of The Panic Virus, by Seth Mnookin. Simon & Schuster Jan 2011 (Available at Amazon) [Disclosure: I got a free copy of The Panic Virus from a friend who has a friend who works at the publisher. I wasn’t given the copy specifically to write a review, but it’s still probably better to disclose I didn’t pay for the book.]

Seth Mnookin’s The Panic Virus starts and ends with two stories of parents whose seemingly normal children come down with a serious illness. He describes their children before the episodes, and then their dread as they go downhill, are hospitalized, and fight for their lives. These stories intentionally parallel the narrative of the vaccines-cause-autism movement — “our child was normal, then he got the vaccine, and then he got autism, so it must have been the vaccine.” However, Mnookin’s carefully chosen stories don’t support the anti-vaccine movement; they do just the opposite and make you feel heartsick for the children affected by vaccine-preventable diseases.

Mnookin knows how to tug on heart strings, and how to get his readers riled up, so it’s a good thing that he comes down strongly pro-vaccine. His case studies are selected for emotional value, and they illustrate how a thoughtfully written narrative can humanize statistics about disease outbreaks and the danger of the anti-vaccine movement. But I approve of Mnookin’s tactics ultimately because his stories are true — vaccines save lives, and much harm has been done by the spread of unfounded fear.

That said, Mnookin’s book isn’t at all a fearmongering tale of what will happen if you don’t vaccinate your child — the bookend stories are just that, and he could probably have included a few more narratives throughout without stretching it. For the most part his book is a sober narrative of a social movement that goes back to the earliest vaccines, but has only come to nationwide fruition with the rise of the Internet.

Mnookin chronicles the development of early vaccines, and, to his credit, spends a good deal of time on what was done badly by the scientists and advocates. The Cutter Incident is there, along with the 1976 swine flu vaccine. Mnookin doesn’t mince words in describing injuries that have been caused by vaccines, and at many times I found myself cringing and thinking “why weren’t better systems in place earlier?” and “they really should have done more.”

This willingness to confront unpleasant truths is a strong point for The Panic Virus, and it also gives Mnookin an opportunity to introduce the safety innovations that stemmed from each incident, all while setting the stage for the anti-vaccine movement. Another strength is that The Panic Virus offers compelling humanizations of many of the parents of autistic children who have been involved in the anti-vaccine movement. Their despair at seeing their children suffer, their ostracization in a society where autism is not accepted, their occasionally callous treatment by physicians who have no easy answers to offer: all of this makes it impossible not to sympathize with them.

For the most part, Mnookin doesn’t present parents as the villains of his story. That role is reserved for shoddy physicians, scientists and pseudoscientists, and most of all for journalists. Andrew Wakefield, Mark and David Geier, and journalist/author David Kirby all come in for harsh reckonings, along with many other “expert witnesses” for anti-vaccine lawsuits. This book left me quite depressed regarding the role of journalists and TV personalities in the whole fiasco. There has been so much bad reporting, and so little good.

While reading The Panic Virus, I kept thinking that its major shortcoming is a lingering uncertainty about its target audience. Is Mnookin writing for the uninitiated who want an introduction to where the anti-vaccine movement came from? Or is he writing a broadside for those already staunchly in the pro-vaccine community? There are sections where the rhetoric made me think it was the latter, while the majority of the book seems to be for those with little outside knowledge of vaccine science. Since Mnookin cautions so much against being led astray by charlatans who peddle fear with a thin veneer of scientific-sounding verbiage, I wish he had done a bit more to explain the science done in recent years on vaccine safety, thiomersal, MMR, and autism. I understand why an author writing a popular narrative would avoid trying to describe these subjects: they are incredibly complicated and divert the reader from the narrative. [Note that I haven’t read Paul Offit’s Autism’s False Prophets, which I understand might have a bit more of that.] And it’s not like good science writing is entirely missing from The Panic Virus. Some things are explained well, but overall there’s just a bit too much deference to the authority of science and scientists for my tastes, especially for a book intended for lay audiences. It’s a good book, but not a great book.

I also wish Mnookin had provided a better counter-narrative in the second half of the book. Broadly speaking, the first half follows the development of vaccines and early vaccine injury scares (founded and unfounded), and the second half explores the rise of the anti-vaccine social movement. The second half is missing strong pro-vaccine characters, such as one or two scientists or policymakers who have been working to combat the anti-vaccine crowd. A lot of good research has been done to disprove fallacious claims, and to look for policy solutions aimed at decreasing opt-out rates on a state level, but none of that is here.

To date the anti-vaccine crowd has really won the narrative war: their message is simpler, and scarier, and has the added perk of being anti-establishment in appealing ways. The Panic Virus didn’t give me much hope that that would change soon — although the book itself is mostly a step in the right direction, combining a pro-science view with a few emotional narratives about vaccine-preventable diseases.

Our best hope is that our scientific explanations of autism etiology will eventually solidify a bit more and that, once they are coupled with demonstrably more effective treatments, the snake-oil “cures” sold by the anti-vaccine movement will lose their charm. One theme of The Panic Virus is that the anti-vaccine movement arose because parents of autistic children weren’t getting the sympathy, explanations, and help they needed. Many factors, including a lack of understanding by doctors and communities, isolation, weak scientific explanations, and a lack of viable treatments, created a situation like a field of dry grass. When a powerful idea (“vaccines cause autism”) arose and was amplified by the echo chambers of Internet communities, it ripped through the dry field like a wildfire, sowing panic and fear. And the fire still hasn’t been put out.


I arrived back in my hometown of Searcy, Arkansas. I haven’t been back in a year — I was living in DC from January through June, then traveling in Guatemala for the summer, and most recently living in Baltimore — so it’s good to be back. Searcy is ~18,000 people in central Arkansas, where the flat plains of the Mississippi Delta meet the first foothills of the Ozarks. The town once put up a billboard that proclaimed “Searcy, where thousands live as millions wish they could.” It’s also the home of Harding University, a conservative Christian university affiliated with the Church of Christ. My dad’s been a professor at Harding for decades, so Searcy has always been home and likely always will be.

Because I lived in the same small town for the first 23 years of my life, moving to Washington, DC in May of 2008 was a huge change. I had culture shock, but mostly in positive ways. When people would ask me if I liked DC, I would answer “Yes! But… I don’t really have anything to compare it to, because it’s my first city.” I was never sure if I just liked DC, or what I actually liked was the urban environment in general. (My friends from NYC laugh because DC hardly feels urban to them.) Now that I’ve lived for several months in my second city, Baltimore, I can say that I do like it, but maybe not as much as DC.

One realization I’ve had over the last year is that I believe the divide between urban and rural America (to dichotomize it) is as significant as, or maybe more significant than, the divide between liberal and conservative, religious and secular. Most of my friends from high school and college are rural, Southern, politically conservative (though often apathetic), married (some with kids on the way), and quite religious, of the evangelical Christian persuasion. No Jews, Muslims, Hindus, or Buddhists here. All of those adjectives (except married) once described me, but now I’m a politically liberal, secular, single young professional living in a big city. Yes, these traits are correlated: there are relatively few very religious young professionals living in big cities, and there are relatively few hardcore secularists in rural Southern towns. But I think the urban/rural divide has a bigger impact on my daily experience, and on shaping my views and actions, than any of the other traits.

I think I’ve become a thoroughly urban creature, but the small town roots linger. I like so many things about cities: the density that leads to so many people, so many jobs and so much food, culture, entertainment and transportation all being close. But I also like the space and beauty of the small town. That’s kind of a universal American narrative in a way; we all like to think we were born — and remain rooted — in small towns, even though the majority of us live in cities. I appreciate having grown up in a small town, and it’s nice to be back for an occasional visit, but it’s hard for me to imagine coming back here to live.


A few random observations from my visit back to Arkansas so far:

1) Little Rock ain’t that big, though it felt huge when I was growing up.

2) There’s so much space around the roads and freeways, and within the towns. The space gives you a sense of openness, but it also means you have to drive everywhere.

3) I’m at my favorite coffeeshop in town (one of two, and the only other options for places to hang out are churches, a Hastings, and Wal-Mart) and the first person who walked in the door after me is wearing a t-shirt that says (only) “JESUS”.

4) People are different. Strong Southern accents for one. A lot more baseball caps, and pickup trucks. Women are wearing more makeup. More overweight and obese people than you typically see on the streets in a city. Lots of white people, few of anything else.

5) Finally, today’s lunch spot was the Flying Pig BBQ.


Several prominent scholars and the Google Books team have published a new paper that’s generating a lot of buzz (Google pun intended). The paper is in Science (available here) and (update) here’s the abstract:

We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of “culturomics”, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. “Culturomics” extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.

It’s readable, thought-provoking, and touches on many fields of study, so I imagine it will be widely read and cited. Others have noted many of the highlights, so here are some brief bulleted thoughts:

  • The authors don’t explore the possible selection bias in their study. They note that the corpus of books they studied includes 4% of all published books. They specifically chose works that scanned better and have better metadata (author, date of publication, etc.), so it seems quite likely that these works differ systematically from those that were scanned and not chosen, and differ even more from those not yet scanned. Will the conclusions hold up when new books are added? Since many of the results were based on random subsets of the books (or n-grams) they studied, will those results hold up when other scholars try to recreate them with separate randomly chosen subsets?
  • Speaking of metadata, I would love to see an analysis of social networks amongst authors and how that impacts word usage. If someone had a listing of, say, several hundred authors from one time period, and some measure of how well they knew each other, and combined that information with an analysis of their works, you might get some sense of how “original” various authors were, and whether originality is even important in becoming famous.
  • The authors are obviously going for a big splash and make several statements that are a bit provocative and likely to be quoted. It will be great to see these challenged and discussed in subsequent publications. One example that is quotable but may not be fully supported by the data they present: “We are forgetting our past faster with each passing year.” But is the frequency with which a year (their example is 1951) appears in books actually representative of collective forgetting?
  • I love the word plays. An example: “For instance, we found “found” (frequency: 5×10^-4) 200,000 times more often than we finded “finded.” In contrast, “dwelt” (frequency: 1×10^-5) dwelt in our data only 60 times as often as “dwelled” dwelled.”
  • The “n-grams” studied here (strings of characters separated from others by spaces, which could be words, numbers, or typos) are too short for a study of copying and plagiarism, but similar approaches could yield insight into the commonality of copying or borrowing throughout history.
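
For a sense of what counting n-grams involves, here is a minimal Python sketch. It is not the paper's pipeline: it simply splits text on whitespace, treats each space-separated string as a 1-gram, and counts runs of n of them. The function names and the sample sentence are made up for illustration.

```python
from collections import Counter


def ngram_counts(text, n=1):
    """Count every run of n space-separated tokens in the text."""
    tokens = text.split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def relative_frequency(counts, gram):
    """Occurrences of `gram` divided by the total number of n-grams counted."""
    total = sum(counts.values())
    return counts[gram] / total if total else 0.0


sample = "we found the book and we found the word"
unigrams = ngram_counts(sample, n=1)
bigrams = ngram_counts(sample, n=2)

print(unigrams["found"], relative_frequency(unigrams, "found"))
print(bigrams["we found"])
```

Longer n-grams are what a study of copying would need, since a shared run of two or three words says little about borrowing.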


