The mathematical modeling tools we employ at once extend and limit our ability to conceive the world. - David Hestenes[6]
There were no Hispanics living in the United States before 1970. At least, there weren’t according to the census. There couldn’t be, because the census form did not include “Hispanic” or “Latino” or anything like it.[iii]
Actually, there were about nine million Hispanics living in the country by 1970.[7] In many ways the lack of census data made them invisible. You couldn’t say with certainty where they were living. It would have been difficult to know how the health, education, and income of Hispanic families compared to those of other families, much less contemplate ways to close the gaps. You wouldn’t even know how many people might be affected if you did.
Quantification is the process that creates data. You can only measure what you can conceive. That’s the first challenge of quantification. The next challenge is actually measuring it, and knowing that you measured it accurately. Data is only useful because it represents the world, but that link can be fragile. At some point, some person or machine counted or measured or categorized, and recorded the result. The whole process has to work just right, and our understanding of exactly how it all works has to be correct, or the data won’t be meaningful.
Sometimes this is not a simple thing to do. It seems clear enough how to quantify the number of cars sold or the amount of grain exported, where counting has the feel of something objective and definite. But journalists are interested in many other things where the proper relationship between the words, the numbers, and the world is much less clear.
Are mass shootings more or less common today than 10 years ago? What fraction of the population is Hispanic? How many people suffer from depression? These seem like questions that counting can answer, but “mass shootings,” “Hispanics,” and “depression” are not easy things to count. Who, precisely, counts as depressed? And how would you determine the number of depressed people in the entire country?
Quantification is a problem without a home. Statisticians and computer scientists do not normally spend a lot of time asking how data came to be. Indeed, their methods are powerful precisely because they are abstract. Physicists and engineers were the first to think seriously about quantification, and they have carefully developed the processes of measurement over many centuries. Even in such “hard” disciplines there are many choices that must be made about what gets measured, but these fields usually deal only with quantities that can be expressed in the units of physics. Econometrics broadened the horizons, but it is psychologists and social scientists who have thought most deeply about the quantification of people and societies, the sorts of quantifications that are often the most interesting and the most vexing to a journalist.[iv]
I’m going to try to give the flavor of the problems of quantification with two examples: recording someone’s race in a database and estimating the monthly unemployment rate. The first is a parable about the difficulty of categories. The second is a tour through the beautiful ideas of random sampling and quantified uncertainty so central to modern statistical work. But before we can get there, we have to talk about what makes something “quantitative” at all.