Automating traditional journalistic tasks, such as data collection and analysis as well as the actual writing and publication of news stories, offers two obvious economic benefits: it increases the speed and the scale of news coverage. Advocates further argue that automated journalism could improve the accuracy and objectivity of news coverage. Finally, automated journalism may in the future allow for producing news on demand and writing stories geared toward the needs of the individual reader.
Automation allows for producing news in nearly real time, or at the earliest point that the underlying data are available. For example, the AP’s quarterly earnings report on Apple (see Chapter 1) was published only minutes after the company released its figures. Another example is the Los Angeles Times’s Quakebot, which first broke the news about an earthquake in the Los Angeles area in 2014 (see Case Study 2).
Automation allows for expanding the quantity of news by producing stories that were previously not covered due to limited resources. For example, both the Los Angeles Times (for homicide reports; Case Study 1) and the Associated Press (for company earnings reports; Case Study 3) reported that automation increased the number of published stories more than tenfold. Similarly, while human journalists have traditionally covered only earthquakes that exceeded a certain magnitude or left significant damage, Quakebot provides comprehensive coverage of all earthquakes detected by seismographic sensors in Southern California (Case Study 2). Although any one of these articles targets a small audience and may attract only a few hits, total traffic increases, with positive effects on advertising revenues.
Algorithms do not get tired or distracted, and, assuming that they are programmed correctly and the underlying data are accurate, they do not make simple mistakes such as misspellings, calculation errors, or overlooked facts. Advocates thus argue that algorithms are less error-prone than human journalists. For example, Lou Ferrara, former vice president and managing editor for entertainment, sports, and interactive media at the Associated Press, reports that automation has decreased the rate of errors in AP’s company earnings reports from about seven percent to only about one percent, mostly by eliminating typos or transposed digits. “The automated reports almost never have grammatical or misspelling errors,” he told me, “and the errors that do remain are due to mistakes in the source data.”
Yet, Googling “generated by automated insights correction” lists thousands of examples where automatically generated articles had to be corrected after their publication.20 In the vast majority of cases, the errors are minor, such as wrong information about where the company is based or when its quarter ends. Sometimes the errors are crucial, however. A prominent example is a July 2015 report about Netflix’s second-quarter earnings.21 This article, which was later corrected, wrongly reported that the company missed expectations and that the share price had fallen by seventy-one percent since the beginning of the year when, in fact, it had more than doubled during that period. The reason for the error was that the algorithm failed to account for the fact that Netflix stock had undergone a seven-for-one split. This example thus demonstrates the importance of, first, foreseeing unusual events in the initial development of the algorithms and, second, being able to detect outliers and request editorial monitoring if necessary.22
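The arithmetic behind the Netflix error can be reproduced in a few lines. The Python sketch below is illustrative only and is not the code used by Automated Insights or the AP. The function names, the fifty-percent review threshold, and the two prices are assumptions; the prices were chosen solely so that the result matches the percentages reported above.

```python
def apparent_change(start_price, end_price):
    """Fractional price change computed naively from the raw quotes."""
    return end_price / start_price - 1.0


def split_adjusted_change(start_price, end_price, split_ratio):
    """Fractional change after dividing the pre-split price by the split ratio."""
    return end_price / (start_price / split_ratio) - 1.0


def needs_editorial_review(change, threshold=0.5):
    """Flag unusually large moves (assumed threshold) for a human to check."""
    return abs(change) > threshold


# Hypothetical prices: 49.00 before the split, 14.21 after a seven-for-one split.
raw = apparent_change(49.0, 14.21)
adjusted = split_adjusted_change(49.0, 14.21, 7.0)

print(f"naive change:          {raw:+.0%}")
print(f"split-adjusted change: {adjusted:+.0%}")
print("flag for review:", needs_editorial_review(raw))
```

With these assumed figures, the naive comparison shows a drop of about seventy-one percent, whereas the split-adjusted comparison shows a gain of about 103 percent, that is, a price that slightly more than doubled. A simple outlier rule of this kind would have routed the story to an editor before publication.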
Case Study 2: Earthquake Alerts
In automatically producing short news stories about earthquakes in California, the Los Angeles Times’s Quakebot demonstrates the use of sensor data for automated journalism. When the U.S. Geological Survey’s Earthquake Notification Service releases an earthquake alert, Quakebot creates a story that provides all the basic information a journalist would initially cover, including the time, location, and magnitude of the earthquake, and saves it as a draft in the Los Angeles Times content management system. After a staff member has reviewed the story for potential errors, it takes only a single click to publish it. Although the system has been in use since 2011, Quakebot first attracted national media attention in March 2014, when it made the Los Angeles Times the first news outlet to break the story that a 4.4 magnitude earthquake had hit Southern California. When Ken Schwencke, who developed Quakebot, felt the earth shaking at 6:27 a.m., he went to his computer to review the automatically generated story already waiting for him in the system and published it. Three minutes later, at 6:30 a.m., the story was online at the Los Angeles Times’s “L.A. Now” blog.23
Quakebot is all about speed. Its goal is to get the information out as quickly as possible. However, while speed is important, so is the accuracy of the news, and achieving both goals at once can be difficult. For automated news, a crucial aspect of accuracy is the quality of the underlying data. This became evident in May 2015 when seismic sensors in Northern California picked up signals from major earthquakes that happened in Japan and Alaska, which the U.S. Geological Survey (USGS) mistakenly reported as three separate earthquakes in California with magnitudes ranging from 4.8 to 5.5. Earthquakes of that magnitude would leave significant local damage. Luckily, the alarms were false. The earthquakes had never happened and nobody could feel them. Nonetheless, Quakebot published stories for each of the three false alarms. In other words, the human review process failed. The editor trusted the algorithm and published the story without making sure that the information was correct.24
A simple way to verify the correctness of earthquake alerts might be to look at the number of related tweets. As soon as the earth starts shaking, Twitter users who feel the earthquake publish the information on the network. When a 6.0 earthquake hit the Napa Valley in August 2014, the first tweets appeared almost immediately and beat the official USGS alerts by minutes. Thus, the number of tweets provides an independent source of data for verifying whether a reported earthquake has actually occurred. In fact, research at the USGS showed that Twitter data can be used to locate an earthquake within twenty seconds to two minutes after its origin time. This is considerably faster than the traditional method of using seismometers to measure ground motion, particularly in poorly instrumented regions of the world.25 Along with earthquake alerts, the USGS now publishes the number of tweets per minute that contain the word “earthquake” in several languages on its official Twitter account @USGSted. For the false alarms discussed above, @USGSted reported zero tweets per minute, which is not surprising since no earthquake had happened. In comparison, for the actual earthquake that did occur off Japan, @USGSted reported fifty-six tweets per minute at the time it published the earthquake alert. The Los Angeles Times editor could have looked at this information when deciding whether or not to publish the news. Or, even better, Quakebot could be updated so that its algorithm accounts for this information and automatically publishes a story if the number of tweets in a respective area is above a certain threshold.
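The threshold rule suggested above can be sketched in a few lines of Python. This is a hedged illustration, not Quakebot’s actual logic: the Alert class, its fields, the decide function, and the threshold of ten tweets per minute are all assumptions introduced for the example. The two tweet counts are the ones reported above.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    place: str
    tweets_per_minute: int  # hypothetical field: "earthquake" tweets for the area


def decide(alert, min_tweets_per_minute=10):
    """Publish automatically only if tweet volume corroborates the alert."""
    if alert.tweets_per_minute >= min_tweets_per_minute:
        return "publish"
    # Too little independent evidence: leave the draft for an editor.
    return "hold_for_review"


false_alarm = Alert(place="Northern California", tweets_per_minute=0)
real_quake = Alert(place="off Japan", tweets_per_minute=56)

print(decide(false_alarm))
print(decide(real_quake))
```

Under the assumed threshold, the false alarm is held for human review while the real earthquake is published at once. Choosing the threshold in practice would require care, since tweet volume depends on how populated the affected area is and on the time of day.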
Algorithms strictly follow predefined rules for analyzing data and converting the results into written stories. Advocates therefore argue that automated news provides an unbiased account of facts. This argument, of course, assumes that the underlying data are correct and that the algorithms are programmed without bias, a view that, as discussed in the next chapter, is false, or at best too optimistic.26 That said, experimental evidence available to date suggests that readers perceive automated news as more credible than human-written news (see Textbox I).
Automation allows for providing relevant information for very small audiences and in multiple languages. In the most extreme case, automation can even create news for an audience of one. For instance, Automated Insights generates personalized match day reports (a total of more than three hundred million in 2014) for each player of Yahoo Fantasy Football, a popular online game in which people can create teams of football players and compete against each other in virtual leagues. Similarly, one of Narrative Science’s core businesses is to automatically generate financial market reports for individual customers. It is easy to imagine similar applications in other areas. For example, algorithms could create recaps of a sports event that focus on the performance of the particular player who interests the reader most (e.g., grandparents interested in the performance of their grandchild). Furthermore, as shown by Automated Insights’ Fantasy Football match day reports, algorithms could even tell the same story in a different tone depending on the reader’s needs. For example, the recap of a sporting event could be written in an enthusiastic tone for supporters of the winning team and in a sympathetic tone for supporters of the losing one.
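A toy version of tone-dependent storytelling makes the idea concrete. The Python sketch below is an assumption-laden illustration and does not reflect how Automated Insights implements its reports: the templates, the recap function, and the team names are all hypothetical.

```python
# Hypothetical one-sentence templates, one per tone.
TEMPLATES = {
    "enthusiastic": "What a win! {winner} beat {loser} {high}-{low}.",
    "sympathetic": "Tough luck for {loser}, who lost {low}-{high} to {winner}.",
    "neutral": "{winner} and {loser} drew {high}-{low}.",
}


def recap(scores, supported_team):
    """Recap a two-team game in a tone chosen for the reader's team."""
    if supported_team not in scores:
        raise ValueError("supported_team must be one of the two teams")
    # Order the two teams by score, highest first.
    (winner, high), (loser, low) = sorted(
        scores.items(), key=lambda item: item[1], reverse=True
    )
    if high == low:
        tone = "neutral"
    elif supported_team == winner:
        tone = "enthusiastic"
    else:
        tone = "sympathetic"
    return TEMPLATES[tone].format(winner=winner, loser=loser, high=high, low=low)


game = {"Lions": 3, "Bears": 1}
print(recap(game, "Lions"))
print(recap(game, "Bears"))
```

The same data thus yield two different sentences, one for each group of supporters. Production systems rely on far richer templates and language generation, but the principle of selecting wording from reader attributes is the same.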
News on demand
The ability to personalize stories and analyze data from different angles also provides opportunities for generating news on demand. For example, algorithms could generate stories that answer specific questions by comparing the historical performance of different baseball players. Algorithms could also answer what-if scenarios, such as how well a portfolio would have performed if a trader had bought stock X as compared to stock Y. While algorithms for generating news on demand are not yet available, they will likely be the future of automated journalism. In October 2015, Automated Insights announced a new beta version of its Wordsmith platform, which enables users to upload their own data, pre-write article templates, and automatically create narratives from the data.27 The German company AX Semantics provides similar functionality with its ATML3 programming language.
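The what-if scenario mentioned above reduces to simple arithmetic combined with a sentence template. The following Python sketch is a minimal illustration under stated assumptions: the function names, the prices, and the wording of the template are hypothetical, and the code is unrelated to Wordsmith or ATML3.

```python
def final_value(invested, buy_price, current_price):
    """Value today of an amount invested at buy_price."""
    return invested * current_price / buy_price


def what_if_story(invested, name_x, prices_x, name_y, prices_y):
    """Compare two hypothetical purchases in one sentence.

    prices_x and prices_y are (buy_price, current_price) pairs.
    """
    value_x = final_value(invested, *prices_x)
    value_y = final_value(invested, *prices_y)
    better = name_x if value_x >= value_y else name_y
    gap = abs(value_x - value_y)
    return (
        f"An investment of ${invested:,.0f} in {name_x} would now be worth "
        f"${value_x:,.0f}, compared with ${value_y:,.0f} in {name_y}; "
        f"{better} comes out ${gap:,.0f} ahead."
    )


# Hypothetical prices: stock X rose from 50 to 75, stock Y from 20 to 22.
print(what_if_story(1000, "stock X", (50, 75), "stock Y", (20, 22)))
```

For the assumed prices, the sentence reports values of $1,500 and $1,100 and a gap of $400 in favor of stock X. An on-demand system would fill the same template from real price histories selected by the reader’s question.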