The Art and Science of Data-Driven Journalism

On Empiricism, Skepticism, and Public Trust

While the tools and context may have evolved, the basic goals of data-driven journalism have remained the same over the decades, observed Brant Houston, former executive director of Investigative Reporters and Editors. “Sift through data and make sense of it, often with social science methods,” he said. Today, powerful open source frameworks for the collection, storage, analysis, and publication of immense amounts of data are integrated with rigorous thinking, sound design principles, powerful narratives, and creative storytelling techniques to produce acts of journalism. Practiced at the highest level, data-driven journalism can be applied to auditing algorithms or testing whether predictive policing is delivering justice or further institutionalizing inequities in society.[53]
[...] responsible for mistakenly targeting an innocent citizen or denying a loan to another, the skills required of watchdog journalism move well beyond the rapid production of infographics and maps. “Data is at the heart of what journalism is,” said New York Times developer advocate Chrys Wu, speaking at the White House in 2012, “and the more substantive it is, the more organized it is, the more easily accessible it is, the better we all can understand the events that affect our world, our nation, our communities, and ourselves.”[54]
[...] sources, however, not all data sets are synonymous with facts. They must be treated with skepticism, from origin to quality to hidden biases. “The Latin etymology of ‘data’ means ‘something given,’ and though we’ve largely forgotten that original definition, it’s helpful to think about data not as facts per se, but as ‘givens’ that can be used to construct a variety of different arguments and conclusions; they act as a rhetorical basis, a premise,” wrote Nick Diakopoulos, a Tow Fellow. “Data does not intrinsically imply truth. Yes, we can find truth in data, through a process of honest inference, but we can also find and argue multiple truths or even outright falsehoods from data.”[55]
[...] time, it could benefit readers and society as a whole. A managing editor might float an assertion or hypothesis about what lies behind the news, and then assign an investigative journalist to find out whether it’s true or not. That reporter (or data editor) then must collect data, evidence, and knowledge about it. To prove to the managing editor (and to skeptical readers) that the conclusions presented are sound, the journalist may need to show his or her work, from the sources of the data to the process used to transform and present them. That also means embracing skepticism, avoiding confirmation bias, and not jumping to conclusions about observed correlations.
“In a world awash with opinion there is an emerging premium on evidence-led journalism and the expertise required to properly gather, analyze, and present data that informs rather than simply offers a personal view,” wrote Cardiff University journalism professor Richard Sambrook. “The empirical approach of science offers a new grounding for journalism at a time when trust is at a premium.”[56]
[...] college of humanities and social sciences at Northeastern University, highlighted many of these issues in a long essay on the need for openness in data journalism.[57] The pressures of deadlines and tight budgets are real:
Realistically, practices only change if there are incentives to do so. Academic scientists aren’t awarded tenure on the basis of writing well-trafficked blogs or high-quality Wikipedia articles, they are promoted for publishing rigorous research in competitive, peer-reviewed outlets. Likewise, journalists aren’t promoted for providing meticulously documented supplemental material or replicating other analyses instead of contributing to coverage of a major news event.
Amidst contemporary anxieties about information overload as well as the weaponization of fear, uncertainty, and doubt tactics, data-driven journalism could serve a crucial role in empirically grounding our discussions of policies, economic trends, and social changes. But unless the new leaders set and enforce standards that emulate the scientific community’s norms, this data-driven journalism risks falling into traps that can undermine the public’s and scientific community’s trust.
Keegan suggested several sound principles for data journalists to adopt: open data, open deliberation, open collaboration, and data ombudsmen:
Data-driven journalists could share their code and data on open source repositories like GitHub for others to inspect, replicate, and extend. [This is already happening at ProPublica and other outlets.]
Journalists could collaborate with scientists and analysts to pose questions that they jointly analyze and then write up as articles or features as well as submitting for academic peer review. But peer review takes time and publishing results in advance of this review, even working with credentialed experts, doesn’t imply their reliability.
Organizations that practice data-driven journalism (to the extent this is different from other flavors of journalism) should invite and provide empirical critiques of their analyses and findings. Making well-documented data available or finding the right experts to collaborate with is extremely time-intensive, but if you’re going to publish original empirical research, you should accept and respond to legitimate critiques.
Data-driven news organizations might consider appointing independent advocates to represent public interests and promote scientific norms of communalism, skepticism, and empirical rigor. Such a position would serve as a check against authors making sloppy claims, using improper methods, analyzing proprietary data, or acting for their personal benefit.
It now feels clichéd to say it in 2014, but in this context transparency really may be the new objectivity. The latter concept is not one that has much traction in the sciences, where observer effects and experimenter bias are well-known phenomena. Studies and results that can’t be reproduced are regarded with skepticism for a reason.
Such thinking about the scientific method and journalism isn’t new, nor is its practice by journalists around the country who have pioneered the craft of data journalism with much less fanfare than [...]
Making sense of what sources mean, putting their perspective in context, and creating a narrative that enables people to understand a complex topic is what matters. The ultimate accomplishment for journalists may be to integrate data into stories in a way that not only conveys information, but imparts knowledge to the humans reading and sharing it. To do this kind of work well, journalists need “a firm understanding of public records laws, a grasp of programs such as Excel or Access, contacts with statisticians, and a comfort level in creating data sets where none exist,” said Charles Ornstein of ProPublica. “My colleagues and I put together a data set using Access when we were analyzing more than 2,000 [...] Registered Nursing. It was the only way of analyzing real data and not piecing together anecdotes. It was very time consuming but very worthwhile.”
Data-driven investigative techniques can substantially augment the ability of technically savvy journalists to master information and hold governments accountable. Applying data journalism enables investigative journalists to find trends, chase hunches, and explore hypotheses. It can enable beat reporters to look beyond anecdotes or a rotating cast of sources to find hidden trends or scoops. A body of empirical evidence, based upon rigorously vetted data, can also give editors and reporters the ability to move away from “he said, she said” journalism that leaves readers wondering where the truth lies.