The Art and Science of Data-Driven Journalism

I. Introduction

Today, the world is awash in unprecedented amounts of data and an expanding network of sources for news. As of 2012, an estimated 2.5 quintillion bytes of data, or 2.5 exabytes, were being created daily, with that amount doubling every 40 months. (For the sake of reference, that’s roughly 156 million 16-gigabyte iPhones.) It’s an extraordinary moment in so many ways. All of that data generation and connectivity has created new opportunities and challenges for media organizations that have already been fundamentally disrupted by the Internet. To paraphrase author William Gibson, in many ways the post-industrial future of journalism is already here; it’s just not evenly distributed yet.[1]

[…] socially connected friends, family, and colleagues, and delivered by applications and streaming video accessed from mobile devices, apps, and tablets. Newsrooms are now just a component, albeit a crucial one, of a dramatically different environment for news. They are also not always the original source for it. News often breaks first on social networks, and is published by people closest to the event. From there, it’s gathered, shared, and analyzed; then fact-checked and synthesized into contextualized journalism.

Media organizations today must be able to put data to work quickly.[2] […] during Hurricane Sandy, when public, open government data feeds became critical infrastructure.[3]

[…] decision in which Chief Justice John Roberts opined that disclosure through online databases would balance the effect of classifying political donations as protected by the First Amendment, it’s worth emphasizing that much of the “modern technology” that is a “particularly effective means of arming the voting public with information” has been built and maintained by journalists and nonprofit organizations.[4]

The question is not whether data, computers, and algorithms can be used by journalists in the public interest, but rather how, when, where, why, and by whom.[5]

Today, journalists can treat all of that data as a source, interrogating it for answers as they would a human. That work is data journalism: gathering, cleaning, organizing, analyzing, visualizing, and publishing data to support the creation of acts of journalism. A more succinct definition might be simply the application of data science to journalism, where data science is defined as the study of the extraction of knowledge from data.[6]

Data journalism combines:
1) the treatment of data as a source to be gathered and validated,
2) the application of statistics to interrogate it, and
3) visualizations to present it, as in a comparison of batting averages or stock prices.

Some proponents of open data journalism hold that there should be a fourth component, in which data journalists archive and publish the underlying raw data behind their investigations, along with the methodology and code used in the analyses that led to their published conclusions.[7]

Data journalism is telling stories with numbers, or finding stories in them. It’s treating data as a source to complement human witnesses, officials, and experts.
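
The three components described above (data gathered and validated as a source, a statistic used to interrogate it, and a presentation of the result) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the player names and numbers are invented, and the helper names `validate`, `rank`, and `render` are hypothetical rather than drawn from any newsroom's actual tooling.

```python
from dataclasses import dataclass

# Hypothetical records, invented for illustration; not real player statistics.
RAW_ROWS = [
    {"player": "Player A", "hits": "150", "at_bats": "500"},
    {"player": "Player B", "hits": "120", "at_bats": "480"},
    {"player": "Player C", "hits": "90", "at_bats": "0"},     # invalid: no at-bats
    {"player": "Player D", "hits": "210", "at_bats": "200"},  # invalid: more hits than at-bats
]


@dataclass
class BattingLine:
    player: str
    hits: int
    at_bats: int

    @property
    def average(self) -> float:
        # Batting average is hits divided by at-bats.
        return self.hits / self.at_bats


def validate(rows):
    """Step 1: treat the data as a source and check it before trusting it."""
    clean, rejected = [], []
    for row in rows:
        try:
            player = row["player"]
            hits, at_bats = int(row["hits"]), int(row["at_bats"])
        except (KeyError, ValueError):
            rejected.append(row)
            continue
        if at_bats <= 0 or hits < 0 or hits > at_bats:
            rejected.append(row)
        else:
            clean.append(BattingLine(player, hits, at_bats))
    return clean, rejected


def rank(lines):
    """Step 2: interrogate the data with a simple statistic."""
    return sorted(lines, key=lambda line: line.average, reverse=True)


def render(lines):
    """Step 3: present the comparison as a plain-text table."""
    out = [f"{'Player':<10}{'AVG':>6}"]
    for line in lines:
        out.append(f"{line.player:<10}{line.average:>6.3f}")
    return "\n".join(out)


clean, rejected = validate(RAW_ROWS)
table = render(rank(clean))
print(table)
print(f"Rejected {len(rejected)} row(s) for follow-up with the source.")
```

Rejected rows are kept rather than silently dropped, on the assumption that a reporter would want to go back to whoever produced the data and ask why those records look wrong.
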
Many different kinds of journalists use data to augment their reporting, even if they may not define themselves or their work in this way. “A data journalist could be a police reporter who’s managed to fit spreadsheet analysis into her daily routine, the computer-assisted reporting specialist for a metro newspaper, a producer with a TV station investigative unit, someone who builds analysis tools for journalists, or a news app developer,” said David Herzog, an associate professor at the Missouri School of Journalism.

Consider four examples: A financial journalist cites changes in price-to-earnings ratios in stocks over time during a radio appearance. A sports journalist adds a table that illustrates the on-base percentages of this year’s star rookie baseball players. A technology journalist creates a graph comparing how many units of competing smartphones have been sold in the last business quarter. A team of news developers builds an interactive website that helps parents find nearby playgrounds that are accessible to all children and adds data about them to a public data set.[8]

In each case, journalists working with data must be conscious of its source, the context for its creation, and its relationship to the stories they’re telling.

“Data journalism is the practice of finding stories in numbers and using numbers to tell stories,” said Meredith Broussard, an assistant professor of journalism at Temple University. To become a good data journalist, it helps to begin by becoming a good journalist. Hone your storytelling skills, experiment with different ways to tell a story, and understand that data is created by people. We tend to think that data is this immutable, empirically true thing that exists independent of people. It’s not, and it doesn’t. Data is socially constructed. In order to understand a data set, it is helpful to start with understanding the people who created the data set: think about what they were trying to do, or what they were trying to discover. Once you think about those people, and their goals, you’re already beginning to tell a story.

Data-driven reporting and analysis require more than providing context to readers and sorting fact from fiction and falsehood in vast amounts of data. Achieving that goal will require media organizations that can think differently about how they work and whose contributions they value or honor. In 2014, technically gifted investigators in the corner of the newsroom may well be of more strategic value to a media company than a well-paid pundit in the corner office. Publishers will need to continue to evolve toward a multidisciplinary approach to delivering the news, where reporters, developers, designers, editors, and community managers collaborate on storytelling, instead of being segregated by departments or buildings.

Many of the pioneers in this emerging practice of data-driven journalism won’t be found on broadcast television or in the lists of the top journalists over the past century. They’re drawn from the pool of people who are building collaborative newsrooms and extending the theory and practice of data journalism. These people see the reporting that provisions their journalism as data, a body of work that itself can be collected, analyzed, shared, and used to create insights about how society, industry, or government is changing.[9]

In the following report, I look at what the media is doing, offer insights from data journalists, list the tools they’re using, share notable projects, and look ahead at what’s to come, and what’s needed to get there. You’ll also find more to read and consider in the Data Journalism Handbook that O’Reilly Media published in 2012.[10]