The Art and Science of Data-Driven Journalism

Fuel for Robo-journalism

It’s certain that data will also play a role in other kinds of ventures, perhaps underpinning “robo-journalism” from services like Narrative Science.[104] A report on a March 2014 earthquake in Los Angeles was written by a robot[105] created by Ken Schwencke, a journalist and programmer for the Los Angeles Times. It’s not the first “roboporter” on staff; Schwencke and the Times’ data desk modeled the “Quakebot” on a similar algorithm that creates automatic reports about homicides in the area.[106] More bots covering traffic, weather, high school sports, and police blotters are inevitable, although a human editor may still play a role in publishing bot-reported stories.

“Having spent some years as a local news reporter, I can attest that slapping together brief, factual accounts of things like homicides, earthquakes, and fires is essentially a game of Mad Libs that might as well be done by a machine,” wrote Will Oremus at Slate. “…At the same time, Quakebot neatly illustrates the present limitations of automated journalism. It can’t assess the damage on the ground, can’t interview experts, and can’t discern the relative newsworthiness of various aspects of the story.”

In the near term, such newsbots may be most useful as early alert systems for beat reporters and editors, finding signal in the noise that journalists can then use as digital tips to assign, investigate, and confirm. This kind of data journalism, powered by alerts, scrapers, and algorithms, creates scoops, which should be catnip to city desk editors. Such automation has widespread applications, from government accountability to financial reporting.

“I would like to get more into monitoring and notification,” said Aron Pilhofer. “We ingest millions of records of campaign finance contributions and expenditures every year. If, for example, a member of Congress is at risk, you see a spike in ‘legal services,’ a standard variation above the mean. That should send a notification to congressional reporters. You’d be using tech to improve the reporter’s ability to do their job.”

What Google Now,[107] Narrative Science, and other algorithmic approaches to local information will all need is good data. Some data will come from municipalities, other data will come from the private sector, nonprofits, and academia, and some will be created by media organizations themselves using sensors and scrapers. (The Omaha World-Herald’s Curbwise is one such project, focused on real estate.)[108]

Although it may not be what Adrian Holovaty intended, it’s in this area that EveryBlock may end up making the biggest contribution, by making open government data more useful in an automated fashion. “The thesis of it was not take public records and make them usable,” he told me in an interview in 2014. “It was to show what you need to know at the level of a block or neighborhood, but because we were doing it and no one else was, people focused on public records. People focused on things that were unique versus the purpose of what the site was.” He added, “That’s like focusing on the Beatles because of their use of a sitar: ‘I love the Beatles because they’re such a great sitar band.’ It’s just something they used to the end of making great music.”

The part of his vision for media organizations that is coming to pass may be expressed in news applications and other interactives that are born digital, divorced from the constraints of print and the daily front pages, personalized for individual users, and automatically updated with data as it becomes available. For some time to come, however, humans will still be needed to fact-check the algorithms generating automated news from data, add context, shape visually compelling narratives, and conduct the investigative journalism that algorithms alone cannot.
Someday, that may change, as Kristian Hammond, the CTO and cofounder of Narrative Science, suggested to Steven Levy:

Hammond believes that as Narrative Science grows, its stories will go higher up the journalism food chain, from commodity news to explanatory journalism and, ultimately, detailed long-form articles. Maybe at some point, humans and algorithms will collaborate, with each partner playing to its strength. Computers, with their flawless memories and ability to access data, might act as legmen to human writers. Or vice versa, human reporters might interview subjects and pick up stray details, and then send them to a computer that writes it all up. As the computers get more accomplished and have access to more and more data, their limitations as storytellers will fall away. It might take a while, but eventually even a story like this one could be produced without, well, me. “Humans are unbelievably rich and complex, but they are machines,” Hammond says. “In 20 years, there will be no area in which Narrative Science doesn’t write stories.”
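
To make the monitoring-and-notification idea Aron Pilhofer describes earlier a little more concrete, here is a minimal sketch of one way such an alert might work. It assumes the "spike" he mentions means a value more than one standard deviation above a member's own historical average. The function name `find_spikes`, its parameters, and the sample figures are all hypothetical, invented for illustration; this is not how any newsroom's system is known to work.

```python
from statistics import mean, stdev


def find_spikes(spending_by_member, threshold=1.0):
    """Flag members whose latest spending sits more than `threshold`
    standard deviations above the mean of their earlier periods.

    `spending_by_member` maps a name to a list of per-period totals,
    oldest first. Both the shape of the data and the default threshold
    are assumptions made for this sketch.
    """
    alerts = []
    for member, amounts in spending_by_member.items():
        history, latest = amounts[:-1], amounts[-1]
        if len(history) < 2:
            continue  # a sample standard deviation needs two or more points
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            continue  # a perfectly flat history gives no scale; skipped here
        z_score = (latest - baseline) / spread
        if z_score > threshold:
            alerts.append((member, latest, round(z_score, 2)))
    return alerts


# Made-up quarterly "legal services" totals in dollars, purely illustrative.
sample = {
    "Member A": [1000, 1200, 900, 1100, 5000],
    "Member B": [2000, 2100, 1900, 2050, 2000],
}

for member, latest, z_score in find_spikes(sample):
    print(f"ALERT: {member} reported ${latest:,} in legal services "
          f"({z_score} standard deviations above average)")
```

With these invented numbers only "Member A" is flagged. An alert of this kind is a tip rather than a story: a reporter would still need to confirm what the spending was for, which matches the division of labor between bots and journalists described above.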