The Art and Science of Data-Driven Journalism


As Liliana Bounegru highlighted in the introduction to the Data Journalism Handbook, the idea of treating data as a source for the news is far from novel: journalists have been using data to improve or augment traditional reporting for centuries.11 In the Guardian’s ebook on data journalism, Simon Rogers (now Twitter’s first data editor) said that the first example of data journalism at the Guardian newspaper dates back to 1821, in a report on student enrollment and associated costs.12 The term itself, however, is very much of the 21st century, although its origin is murky. Rogers said he first heard the term data journalism used by software developer Adrian Holovaty,13 although it may have originated earlier somewhere else in Europe,14 in conversations about database journalism of the kind Holovaty advocated.15
Whatever the term’s origins,16 Holovaty could be called the patron saint of data journalism.17 Holovaty, a talented software developer at the Washington Post and founder of EveryBlock, decried how data was organized and treated by media organizations in a 2006 post on how newspaper websites needed to change.18 That post inspired the creation19 of PolitiFact, which was built by a team that included Matt Waite. The fact-checking website subsequently won a Pulitzer Prize in 2009.20
Data journalism gained momentum around the world after Tim Berners-Lee called analyzing data the future of journalism in 2010, as part of a larger conversation about opening government data to the public by publishing it online.21 Using structured data extracted from the PDF that the United Kingdom’s Parliament published online,22 the Guardian’s Datablog visualized the expenses of Members of Parliament, launching a public row about their spending that has continued into the present day.23 That work was followed by data journalism based on the War Logs,24 tens of thousands of Afghanistan war records leaked through WikiLeaks.
Over the following years, the use of the term data journalism began to catch fire, at least within the media world.25 It was adopted by David Kaplan, a pillar of the investigative journalism community, and used as self-identification by many attendees of the annual conference of the National Institute for Computer-Assisted Reporting (NICAR), where nearly a thousand journalists from 20 countries gathered in Baltimore to teach, learn, and connect.26
It was in 2014, however, that data journalism entered mainstream discourse, driven by the highly publicized relaunch of Nate Silver’s FiveThirtyEight and Vox Media’s April release of its general news site, Vox.com, as well as new ventures from the New York Times and the Washington Post. It is worth noting a broader challenge that the mainstreaming of data journalism presents: in public discourse, the novelty of the term has divorced it from the long history of computer-assisted reporting that came before. Hopefully, this report will act as a corrective.
Today, the context and scope of data-driven journalism have expanded considerably from its evolutionary antecedent, following the explosion of data generated in and about nearly every aspect of society, from government, to industry, to research, to social media. Data journalists can now use free, powerful online tools and open source software to rapidly collect, clean, and publish data in interactive features, mobile apps, and maps. As data journalists grow in skill and craft, they move from using basic statistics in their reporting, to working in spreadsheets, to more complex data analysis and visualization, finally arriving at computational journalism, the command line, and programming.
The most advanced practitioners are able to capitalize on algorithms and vast computing power to deliver new forms of reporting and analysis, from document mining applied to find misconduct,27 [...] campaigns,28 [...] trading plans, and autocompletions.
Data journalists are in demand today throughout the news industry and beyond. They can get scoops, draw large audiences, and augment the work of other journalists in a media organization or other collaboration. By automating common reporting tasks, for instance, or creating custom alerts, one data journalist can increase the capacity of the people with whom she works, building out databases that may be used for future reporting.
“On every desk in the newsroom, reporters are starting to understand that if you don’t know how to understand and manipulate data, someone who can will be faster than you,” said Scott Klein, a managing editor at ProPublica. He continued:
Can you imagine a sports reporter who doesn’t know what an on-base percentage is? Or doesn’t know how to calculate it himself? You can now ask a version of that question for almost every beat. There are more and more reporters who want to have their own data and to analyze it themselves. Take, for example, my colleague, Charlie Ornstein. In addition to being a Pulitzer Prize-winner, he’s one of the most sophisticated data reporters anywhere. He pores over new and insanely complex data sets himself. He has hit the edge of Access’ abilities and is switching to SQL Server. His being able to work and find stories inside data independently is hugely important for the work he does.
There will always be a place for great interviewers, or the eagle-eyed reporter who finds an amazing story in a footnote on page 412 of a regulatory disclosure. But, here comes another kind of journalist who has data skills that will sustain whole new branches of reporting.29