The State of Automated Journalism in Newsrooms

Automated news emerged almost half a century ago from the domain of weather forecasting. One early study describes software that works much like the process detailed above. The software takes the outputs of weather forecasting models (e.g., wind speed, precipitation, temperature), prioritizes them by importance (e.g., whether the value is above or below a certain threshold level), and uses about eighty pre-written phrases to generate “worded weather forecasts.” Interestingly, the author’s discussion of the software’s benefits resembles much of today’s conversation about how automated journalism could free up journalists and leave time for more important work (see Chapter 3): “The more routine tasks can be handled by a computer, thereby freeing the meteorologist for the more challenging roles of meteorological consultant and specialist on high-impact weather situations.”12
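
The threshold-and-phrase approach described above can be illustrated with a minimal sketch. It is not the original software: the variable names, threshold values, and phrases below are all invented for illustration, and the ordering rule (values above their threshold are mentioned first) is only one possible reading of "prioritizes them by importance."

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule linking one model output to two pre-written phrases."""
    variable: str
    threshold: float
    phrase_above: str
    phrase_below: str


# Illustrative rules only; the early system reportedly used about eighty phrases.
RULES = [
    Rule("wind_speed_kmh", 50.0, "Strong winds are expected.", "Winds will be light."),
    Rule("precipitation_mm", 1.0, "Rain is likely.", "It will stay dry."),
    Rule("temperature_c", 25.0, "It will be warm.", "It will be cool."),
]


def worded_forecast(model_output, rules=RULES):
    """Turn numeric model output into a short forecast built from canned phrases."""
    selected = []
    for position, rule in enumerate(rules):
        if rule.variable not in model_output:
            continue  # no value for this variable, so say nothing about it
        exceeded = model_output[rule.variable] > rule.threshold
        phrase = rule.phrase_above if exceeded else rule.phrase_below
        # Assumption: values above their threshold count as more important,
        # so they sort ahead of the others (False sorts before True).
        selected.append((not exceeded, position, phrase))
    selected.sort()
    return " ".join(phrase for _, _, phrase in selected)


if __name__ == "__main__":
    print(worded_forecast(
        {"wind_speed_kmh": 20, "precipitation_mm": 5, "temperature_c": 30}
    ))
```

Because the phrases are written in advance, the output can never say anything an editor has not already approved; the software only decides which phrases apply and in what order.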

Another domain in which organizations have long used automation is financial news, where the speed with which information can be delivered is the key value proposition. For example, companies such as Thomson Reuters and Bloomberg extract key figures from press releases and insert them into pre-written templates to automatically create news alerts for their clients. In this business, automation is not about freeing up time. It is a necessity. Reginald Chua, executive editor for editorial operations, data, and innovation at Thomson Reuters, told me: “You can’t compete if you don’t automate.”
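
As a rough illustration of the extract-and-insert pattern described here, the sketch below pulls two figures out of a press release and drops them into a fixed alert template. It does not reflect how Thomson Reuters or Bloomberg actually build their alerts; the regular expressions, the template wording, and the function name `build_alert` are assumptions made for this example, and real press releases vary far more than these patterns allow.

```python
import re

# Hypothetical patterns for one common phrasing of revenue and earnings per share.
REVENUE = re.compile(
    r"revenue of \$(?P<revenue>\d+(?:\.\d+)?) (?P<unit>million|billion)",
    re.IGNORECASE,
)
EPS = re.compile(
    r"earnings per share of \$(?P<eps>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

# Pre-written template; the wording is invented for this sketch.
ALERT_TEMPLATE = "{company} reports revenue of ${revenue} {unit}, EPS of ${eps}"


def build_alert(company, release_text):
    """Return a one-line alert, or None if a key figure cannot be found."""
    revenue = REVENUE.search(release_text)
    eps = EPS.search(release_text)
    if revenue is None or eps is None:
        return None  # nothing reliable to publish; leave it to a human editor
    return ALERT_TEMPLATE.format(
        company=company,
        revenue=revenue["revenue"],
        unit=revenue["unit"].lower(),
        eps=eps["eps"],
    )
```

Returning `None` when a figure is missing is a deliberate choice in this sketch: when speed is the selling point, a wrong alert is likely more costly than no alert.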

In more recent years, automated journalism has also found its way into newsrooms to address other types of problems, often in the form of custom-made, in-house solutions. A prominent example is the work at the Los Angeles Times on automating homicide and earthquake reporting described in case studies 1 and 2. When asked to describe the algorithms, Ken Schwencke, who developed them (and now works for The New York Times), noted that the underlying code is “embarrassingly simple,” as it merely extracts numbers from a database and composes basic news stories from pre-written text modules.13 Despite its simplicity, or perhaps because of it, Schwencke’s work marks an important step in the era of automated journalism, demonstrating how simple in-house solutions can help to increase both the speed and breadth of news coverage.

Many newsrooms, however, lack the necessary resources and skills to develop automated journalism solutions in-house. Media organizations have thus started to collaborate with companies that specialize in natural language generation technology to automatically generate stories from data for a variety of domains. In 2012, for example, Forbes.com announced its use of Narrative Science’s Quill platform to automatically create company earnings previews.14 A year later, ProPublica used the same technology to automatically generate descriptions for each of the more than 52,000 schools in its Opportunity Gap news application.15 In 2014, automated journalism came to wider public attention when the Associated Press, one of the world’s major news organizations, began automating its quarterly company earnings reports using Automated Insights’ Wordsmith platform. As described in Case Study 3, the project was a success and, as a result, the AP recently announced the expansion of its automated coverage to sports.16


Case Study 1: Crime Reporting

Mary Lynn Young and Alfred Hermida describe the evolution of the Los Angeles Times’s Homicide Report as an early example of automated journalism.17 Before the project’s launch in January 2007, the Times’s print edition covered only about ten percent of the nearly 1,000 annual homicides in L.A. County. That coverage typically focused on the most newsworthy cases, which were often the most sensational ones, and therefore did not provide a representative picture of what was really happening. The goal of the Homicide Report was to address this bias in the media coverage by providing comprehensive coverage of all annual homicides. The project originally started as a blog that posted basic information about each homicide, such as the victim’s race and gender or where the body was found. A few months later, an interactive map was added to visualize the information. Soon, however, it became clear that the project was too ambitious. Due to limited newsroom resources, as well as technical and data issues, it was impossible to report every homicide. The project was put on hold in November 2008.

When the Homicide Report was relaunched in January 2010, it relied on structured data from the L.A. County coroner’s office, which includes information such as the date, location, time, race or ethnicity, age, jurisdiction, and neighborhood of all homicides in the area. The revised Homicide Report used these data to automatically produce short news snippets and publish them on the blog. While these news reports were simple, providing only the most rudimentary information, they accomplished the project’s original goal of covering every single homicide and were able to do so in a quick and efficient manner.

As noted by Ken Schwencke, who wrote the code for automatically generating the homicide-related news, this technological innovation reduced “the load on reporters and producers and pretty much everybody in getting the information out there as fast as possible.”18 Journalists at the Los Angeles Times were open-minded toward the automation process. A study of the Homicide Report found that journalists “understood the algorithm as enhancing the role of crime reporters rather than replacing them.”19 That is, crime reporters used the automatically generated stories as initial leads for exploring a particular case in more detail, for example by adding information about the victim’s life and family.
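
The step from a structured coroner's record to a short snippet can be sketched in a few lines. This is not the Los Angeles Times's code: the field names simply mirror the categories listed above, and the template wording and the function name `render_snippet` are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class HomicideRecord:
    """Hypothetical record layout mirroring the fields named in the text."""
    date: str
    time: str
    location: str
    neighborhood: str
    jurisdiction: str
    race_or_ethnicity: str
    age: int


# Pre-written text module; the wording is an assumption, not the Times's template.
SNIPPET_TEMPLATE = (
    "A {age}-year-old {race_or_ethnicity} person was killed on {date} "
    "at about {time} near {location} in {neighborhood}. "
    "The case falls under the jurisdiction of {jurisdiction}."
)


def render_snippet(record):
    """Fill the fixed template with the values of one structured record."""
    return SNIPPET_TEMPLATE.format(**asdict(record))
```

A snippet like this carries only what the record contains, which fits the role described above: a starting lead that a reporter can later enrich with details about the victim's life and family.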

Mapping L.A., a related Los Angeles Times project that also uses algorithms to create automated news, provides maps and information that allow readers to compare two hundred seventy-two neighborhoods in Los Angeles County with regard to demographics, crime, and schools. The platform uses data provided by the L.A. Police and County Sheriff’s Departments to automatically generate warnings if crime reports surpass certain predefined thresholds. For example, the system triggers a crime alert for a certain neighborhood if at least three crimes are reported in a single week, and if the number of reported crimes in that week is significantly higher than the weekly average of the previous quarter.
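
The two-part alert rule can be expressed as a short check. The minimum of three crimes comes from the description above, but the text does not say how "significantly higher" is measured. The sketch below assumes, purely for illustration, that it means more than two standard deviations above the previous quarter's weekly average; the constant `Z_CUTOFF` and the function name `should_alert` are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence

MIN_CRIMES = 3   # from the text: at least three crimes in a single week
Z_CUTOFF = 2.0   # assumption: the source does not define "significantly higher"


def should_alert(this_week: int, previous_quarter: Sequence[int]) -> bool:
    """Decide whether one neighborhood's weekly crime count triggers an alert.

    previous_quarter holds the weekly counts for the preceding quarter
    (roughly thirteen values) and must not be empty.
    """
    if not previous_quarter:
        raise ValueError("need at least one week of history")
    if this_week < MIN_CRIMES:
        return False  # first condition: the absolute minimum is not met
    baseline = mean(previous_quarter)
    # stdev needs two or more values; with a single week, treat spread as zero.
    spread = stdev(previous_quarter) if len(previous_quarter) > 1 else 0.0
    # Second condition: this week clearly exceeds the quarter's weekly average.
    return this_week > baseline + Z_CUTOFF * spread
```

Requiring both conditions keeps the alert quiet in two opposite cases: a low-crime neighborhood where two incidents would already be a statistical outlier, and a high-crime neighborhood where three incidents in a week is routine.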