The Art and Science of Data-Driven Journalism

An Internet Inflection Point

At the start of the 21st century, a revolution in mobile computing; increases in online connectivity, access, and speed; and an explosion in data creation fundamentally changed the landscape for computer-assisted reporting. “It may seem obvious, but of course the Internet changed it all, and for a while it got smushed in with trying to learn how to navigate the Internet for stories, and how to download data,” said Sarah Cohen, a New York Times investigative journalist and a former Knight professor of the practice of journalism and public policy at Duke University. She added:

Then there was a stage when everyone was building internal intranets to deliver public records inside newsrooms to help find people on deadline, etc. So for much of the time, it was focused on reporting, not publishing or presentation. Now the data journalism folks have emerged from the other direction: People who are using data obtained through APIs often skip the reporting side, and use the same techniques to deliver unfiltered information to their readers in an easier format than the government is giving us. But I think it’s starting to come back together: the so-called data journalists are getting more interested in reporting, and the more traditional CAR reporters are interested in getting their stories on the Web in more interesting ways.

Given the universality of computer use today among the media, the term computer-assisted reporting now feels dated, itself inherited from a time when computers were still a novelty in newsrooms. There’s probably not a single reporter or editor working in a newsroom in the United States or Europe today, after all, who isn’t using a computer in the course of his or her journalism.

Many members of the media, in fact, may use several during the day, from the powerful handheld computers we call smartphones, to the laptops and desktops on which they crunch away at analysis or transformations, to the servers and cloud storage they rely on for processing big data at Internet scale.

Much has changed since Philip Meyer’s pioneering days in the 1960s, offered Scott Klein:

One is that the amount of data available for us to work with has exploded. Part of this increase is because open government initiatives have caused a ton of great data to be released. Not just through portals like [...]; getting big data sets via FOIA has become easier, even since ProPublica launched in 2008.

Another big change is that we’ve got the opportunity to present the data itself to readers, that is, not just summarized in a story but as data itself. In the early days of CAR, we gathered and analyzed information to support and guide a narrative story. Data was something to be summarized for the reader in the print story, with of course graphics and tables (some quite extensive), but the end goal was typically something recognizable as a words-and-pictures story.

What the Internet added is that it gave us the ability to show to people the actual data and let them look through it for themselves. It’s now possible, through interaction design, to help people navigate their way through a data set just as, through good narrative writing, we’ve always been able to guide people through a complex story.

The past decade has seen the most dynamic development in data journalism, driven by rapid technological changes. Ten years ago, “data journalism was mostly seen as doing analyses for stories,” said Chase Davis, an assistant editor on the Interactive News Desk at the New York Times. He explained:

Great stories, for sure, but interactives and data visualizations were more rare. Now, data journalism is much more of a big tent speciality. Data journalists report and write, craft interactives and visualizations, develop storytelling platforms, run predictive models, build open source software, and much, much more.

The pace has really picked up, which is why self-teaching is so important.

These are all still relatively new and powerful tools, which both justify excitement about their application and prompt understandable skepticism about what difference they will make if practicing journalists or their editors don’t support developing digital skills. Going digital first also brings concerns about privacy, security, and sustainability when relying upon third parties.