Algorithms for generating automated news follow a set of predefined rules and thus cannot innovate. Therefore, their application is limited to providing answers to clearly defined problems for which data are available. Furthermore, at least at the current stage, the quality of writing is limited.
Data availability and quality
Automated journalism requires high-quality data in structured, machine-readable formats. In other words, you need to be able to save your data in a spreadsheet. For this reason, automation works particularly well in domains such as finance, sports, or weather, where data providers make sure that the underlying data are accurate and reliable. Needless to say, automation cannot be applied to domains where no data are available, and it is challenging in situations where data quality is poor. For example, in March 2015, the Associated Press announced that it would begin automatically producing stories on lower-division college sports events using game statistics from the NCAA. The goal of this endeavor is to expand existing sports coverage by providing stories on events that were previously not covered. According to Lou Ferrara, this project was more complicated than expected due to issues with the underlying data. Since the data are often entered by coaches and do not undergo strict verification procedures, they can be messy and contain errors.
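To make the data-quality point concrete, the following is a minimal sketch of the kind of check that could run before any story is generated from hand-entered game statistics. The field names (`home_team`, `away_team`, `home_score`, `away_score`) and the rules are assumptions chosen for illustration; they do not describe the NCAA feed or the AP's actual pipeline.

```python
from typing import Dict, List, Optional

# Hypothetical column names for one row of a game-statistics spreadsheet.
REQUIRED_FIELDS = ("home_team", "away_team", "home_score", "away_score")
SCORE_FIELDS = ("home_score", "away_score")


def _text(record: Dict[str, Optional[str]], field: str) -> str:
    """Return the field as a stripped string, or '' if absent or None."""
    value = record.get(field)
    return "" if value is None else str(value).strip()


def find_problems(record: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems found in one game record."""
    problems = []
    # Rule 1: every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if not _text(record, field):
            problems.append(f"missing {field}")
    # Rule 2: scores must be non-negative whole numbers.
    for field in SCORE_FIELDS:
        value = _text(record, field)
        if value and not value.isdigit():
            problems.append(f"{field} is not a non-negative integer: {value!r}")
    # Rule 3: a team cannot play against itself.
    home, away = _text(record, "home_team"), _text(record, "away_team")
    if home and home == away:
        problems.append("home and away team are identical")
    return problems


if __name__ == "__main__":
    # Invented example rows; the second contains typical data-entry errors.
    rows = [
        {"home_team": "Lions", "away_team": "Bears",
         "home_score": "21", "away_score": "14"},
        {"home_team": "Lions", "away_team": "",
         "home_score": "2l", "away_score": "14"},
    ]
    for row in rows:
        print(find_problems(row) or "ok")
```

Records that fail such checks would be held back for human review rather than turned into stories. Checks of this kind catch only formal errors; a plausible but wrong score would still pass.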
Algorithms can add value by generating insights from data analysis. In applying statistical methods to identify outliers or correlations between multiple variables, algorithms could find interesting events and relationships, which in turn could lead to new stories. However, algorithms that analyze correlations cannot establish causality or add meaning. That is, while algorithms can provide accounts of what is happening, they cannot explain why things are happening.28 As a result, findings derived from statistical analysis—regardless of their statistical significance—can be completely meaningless (see www.tylervigen.com/ for examples of statistically significant but completely spurious correlations). Humans still need to validate the findings by applying logic and reasoning.29
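The gap between correlation and meaning can be illustrated with a short sketch. The two series below are invented numbers with no real-world source and no plausible causal link; they merely both rise over time. The function computes the standard Pearson correlation coefficient; the variable names are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented yearly figures for two unrelated quantities.
    ice_cream_sales = [10, 12, 15, 19, 24]
    library_visits = [200, 210, 228, 251, 280]
    r = pearson(ice_cream_sales, library_visits)
    # A coefficient close to 1 says the series move together,
    # not that one causes the other.
    print(f"correlation: {r:.3f}")
```

An algorithm would flag this pair as strongly related. Deciding whether the relationship means anything is the step that still falls to a human.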
Once the findings have been validated, algorithms can contribute knowledge. Yet, this contribution is limited to providing answers to prewritten questions by analyzing given data. Algorithms cannot use the knowledge to ask new questions, detect needs, recognize threats, solve problems, or provide opinions and interpretation on, for example, matters regarding social and policy change. In other words, algorithms lack ingenuity and cannot innovate. As a result, automated journalism is limited in its ability to observe society and fulfill journalistic tasks, such as orientation and public opinion formation.30
Another often-mentioned limitation of automated news is the quality of the writing. Current algorithms are limited in understanding and producing nuances of human language, like humor, sarcasm, and metaphors. Automated news can sound technical and boring, and experimental evidence shows that people prefer reading human-written to automated news (see Textbox I). That said, according to Gartner’s “Hype Cycle for Business Intelligence and Analytics, 2015,” natural language generation is only at the very beginning of its development.31 Therefore, the technology, and thus the quality of writing, is likely to further improve over time. It remains an open question, however, whether algorithms will ever be able to produce sophisticated narration comparable to human writing.32
Case Study 3: Company Earnings Reports
In July 2014, the Associated Press began to automate the process of generating corporate earnings stories using the Wordsmith platform for natural language generation, developed by Automated Insights with data provided by Zacks Investment Research (for an example, see the Apple quarterly earnings report shown in Chapter 1). The project turned into a massive success. In January 2015, AP announced that the automation allowed for the generation of more than 3,000 stories per quarter, compared to about 300 stories that AP reporters and editors previously created manually. By the end of 2015, the AP expects to generate 4,700 stories, and soon it will also generate earnings reports for companies in Canada and the European Union. According to AP assistant business editor Philana Patterson, the reaction from both AP members and readers has been “incredibly positive.”33 First, readers are happy because they have access to more stories, which also contain fewer errors than the manually written ones. Second, staff members are pleased because “everybody hated doing earnings reports” and, more importantly, “automation has freed up valuable reporting time for more interesting tasks,” said Lou Ferrara. Patterson also revealed that, in addition to increasing the number of corporate earnings reports more than tenfold, automation has freed up about twenty percent of the time previously spent producing earnings reports. According to AP, the freed resources have not led to any job losses but have been used to improve activities in other areas, like AP’s breaking news operations or investigative and explanatory journalism.34
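The general idea of turning structured earnings data into text through predefined rules can be sketched as follows. This is an illustrative toy, not a description of how Wordsmith works: the `EarningsData` fields, the wording rules, and the company name are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class EarningsData:
    """Hypothetical structured input for one earnings story."""
    company: str
    quarter: str
    eps: float           # reported earnings per share, in dollars
    eps_expected: float  # analysts' consensus estimate, in dollars


def describe_result(eps: float, eps_expected: float) -> str:
    """Predefined rule: choose the phrase that fits the numbers."""
    if eps > eps_expected:
        return "beat"
    if eps < eps_expected:
        return "fell short of"
    return "matched"


def write_story(data: EarningsData) -> str:
    """Fill a fixed sentence template with the data and the chosen phrase."""
    verb = describe_result(data.eps, data.eps_expected)
    return (
        f"{data.company} reported {data.quarter} earnings of "
        f"${data.eps:.2f} per share, which {verb} analysts' "
        f"expectations of ${data.eps_expected:.2f} per share."
    )


if __name__ == "__main__":
    # Invented figures for a fictional company.
    print(write_story(EarningsData("Example Corp", "second-quarter", 1.25, 1.10)))
```

Because the template and rules are fixed, the same code can produce a story for every company in the data feed, which is what makes the jump from hundreds to thousands of stories per quarter feasible. It also shows the limitation noted earlier: the output can only say what the rules anticipate.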
The AP was not the first major news organization to use natural language generation for writing company earnings stories. Since 2012, Forbes (http://www.forbes.com/) has been cooperating with Narrative Science to automatically create company earnings previews. The goal of this project was to provide cost-effective, broad, and deep market coverage for its readers. Similar to the experience at the AP, automation at Forbes has allowed for generating more stories while freeing up resources. As a result of the additional coverage, Forbes’s audience has broadened, and site traffic and advertising revenues have increased.35