For News Organizations

The coverage of routine topics such as sports and finance is only a starting point. Because automation offers clear economic benefits, cutting costs while increasing the breadth of news content, more media organizations are likely to adopt it. Most likely, automation will soon be applied to more challenging subjects, such as public interest journalism covering political and social issues. In fact, the precursors of this development can already be observed in the form of algorithms that automatically create content on Twitter.⁵⁶

When automating content for critical problems, issues of accuracy, quality of the content, and transparency of the underlying data and procedures become more important. In a first attempt to address these questions, Tom Kent proposed “an ethical checklist for robot journalism,” which he derived from AP’s experience automating corporate earnings reports, a project that took close to one year. The checklist poses ten questions that news organizations and editors need to think about when automating content.⁵⁷ These questions consider quality issues relating to the source data, the data processing, and the final output.

Source data

News organizations need to ensure that, first, they have the legal right to modify and publish the source data and, second, the data are accurate. Data provided by governments and companies are probably more reliable and less error-prone than user-generated data, such as scores from local youth sporting events entered into a database by coaches or the players’ parents. That said, as demonstrated in the case of earthquake reporting (see Case Study 2), even government data may contain errors or false information. Data problems may also arise if the structure of the source data changes, a common problem for data scraped from websites. Thus, news organizations need to implement data management and verification procedures, which could be performed either automatically or by a human editor.
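An automatic verification step of this kind could look roughly like the following minimal sketch. It assumes a hypothetical feed of youth sports scores; the field names and the upper bound on plausible scores are invented for illustration and do not describe any real data source.

```python
# Hypothetical schema for a youth-sports score feed (illustrative field names only).
REQUIRED_FIELDS = {
    "home_team": str,
    "away_team": str,
    "home_score": int,
    "away_score": int,
}
MAX_PLAUSIBLE_SCORE = 200  # assumed sanity bound, not taken from any real feed


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one source record (empty if none)."""
    problems = []
    # Missing or mistyped fields often signal that the source structure changed.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    unexpected = set(record) - set(REQUIRED_FIELDS)
    if unexpected:
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    # Range checks catch data-entry errors such as an extra digit.
    for field in ("home_score", "away_score"):
        value = record.get(field)
        if isinstance(value, int) and not 0 <= value <= MAX_PLAUSIBLE_SCORE:
            problems.append(f"implausible value for {field}: {value}")
    return problems
```

Records that return a non-empty list could be held back and routed to a human editor rather than passed on to story generation.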

Data processing

If the underlying data or the algorithms that process them contain errors, automation may quickly generate large numbers of erroneous stories, which could have disastrous consequences for a publisher’s reputation. News organizations therefore need to engage in thorough testing before initial publication of automated news. When publication starts, Kent recommends having human editors check each story before it goes live, although, as demonstrated by the Quakebot (Case Study 2), this so-called “hand brake” solution is not error-free either. Once the error rate is down to an acceptable level, the publication process can be fully automated, with occasional spot checks. The latter is the approach the AP currently uses for its company earnings reports.
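The staged rollout described above, with a human check on every story at first and spot checks later, can be sketched as a simple review gate. This is an illustration only: the class name, the thresholds, and the spot-check rate are assumptions, and the text does not say what error rate the AP or Kent considers acceptable.

```python
import random


class ReviewGate:
    """Decide whether a generated story needs a human check before publication."""

    def __init__(self, max_error_rate=0.01, min_reviewed=100, spot_check_rate=0.05, rng=None):
        # All default values are assumed for illustration.
        self.max_error_rate = max_error_rate
        self.min_reviewed = min_reviewed
        self.spot_check_rate = spot_check_rate
        self.reviewed = 0
        self.errors = 0
        self.rng = rng or random.Random()

    def record_review(self, had_error: bool) -> None:
        """Log the outcome of one human review."""
        self.reviewed += 1
        if had_error:
            self.errors += 1

    @property
    def error_rate(self) -> float:
        # With no reviews yet, assume the worst so every story is checked.
        return self.errors / self.reviewed if self.reviewed else 1.0

    def needs_human_review(self) -> bool:
        # Phase 1: check every story until enough have been reviewed
        # and the observed error rate is acceptable.
        if self.reviewed < self.min_reviewed or self.error_rate > self.max_error_rate:
            return True
        # Phase 2: publish automatically, with occasional random spot checks.
        return self.rng.random() < self.spot_check_rate
```

Because the gate falls back to full review whenever the observed error rate rises above the threshold, a sudden run of errors found in spot checks would put the human check back in place.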

Output

Regarding the final output, Kent recommends that the writing match the official style guide of the publishing organization and be capable of using varied phrasing for different stories. Furthermore, news organizations should be aware of legal and ethical issues that may arise when the text is automatically enhanced with videos or images without proper checking. For such content, publishing rights may not be available or the content may violate standards of taste. News organizations must also provide a minimum level of transparency by disclosing that the story was generated automatically, for example, by adding information about the source of the data and how the content was generated. The AP adds the following information at the end of its fully automated company earnings reports:

This story was generated by Automated Insights (http://automatedinsights.com/ap) using data from Zacks Investment Research. Access a Zacks stock report on ACN at http://www.zacks.com/ap/ACN.

Of course, news consumers may be unfamiliar with these companies and their technologies, and therefore unaware that the content is provided by an algorithm. It remains unclear whether readers actually understand the meaning of such bylines. Further research on how they are perceived would be useful. Also, since more and more stories are the result of collaboration between algorithms and humans, the question arises of how to properly disclose when certain parts of a story were automated. The AP currently deals with such cases by modifying the first sentence in the above statement to “Elements of this story were generated by Automated Insights.”⁵⁸ That said, Kent noted that the discussion about how to properly byline automated news may be a temporary one. Once automated news becomes standard practice, some publishers may choose not to reveal which parts of a story were automatically generated.
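As a rough illustration of the two disclosure variants discussed above, a publishing system could pick the wording based on whether a story was fully or partly automated. The function name and parameters are hypothetical, and the wording of the partial-automation sentence beyond its quoted opening is an assumption, not the AP's confirmed text.

```python
def disclosure_line(provider: str, data_source: str, fully_automated: bool = True) -> str:
    """Build a one-sentence disclosure for the end of a story."""
    if fully_automated:
        opening = "This story was generated by"
    else:
        # Variant for stories produced by humans and algorithms together.
        opening = "Elements of this story were generated by"
    return f"{opening} {provider} using data from {data_source}."


print(disclosure_line("Automated Insights", "Zacks Investment Research"))
print(disclosure_line("Automated Insights", "Zacks Investment Research", fully_automated=False))
```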

Accountability

Automation advocates argue that algorithms allow for an unbiased account of facts. This view, however, assumes that the underlying data are complete and correct and, more importantly, the algorithms are programmed correctly and without bias. Like any other model, algorithms for generating automated news rely on data and assumptions, both of which are subject to biases and errors.⁵⁹ First, the underlying data may be wrong, biased, or incomplete. Second, the assumptions built into the algorithms may be wrong or reflect the (conscious or unconscious) biases of those who developed or commissioned them. As a result, algorithms could produce outcomes that were unexpected and unintended, and the resulting stories could contain information that is inaccurate or simply false.⁶⁰

In such situations, it is not enough to state that an article was generated by software, in particular when covering critical or controversial topics for which readers’ requirements of transparency and accountability may be higher. When errors occur, news organizations may come under pressure to publish the source code behind the automation. At the very least, they should be able to explain how a story was generated, rather than simply stating that “the computer did it.”⁶¹ From a legal standpoint, algorithms cannot be held accountable for errors. The liability is with a natural person, which could be the publisher or the person who made a mistake when feeding the algorithm with data.⁶²

While providers of automated news could, and in some cases probably should, be transparent about many details of their algorithms, there was consensus among experts at the Tow workshop on algorithmic transparency that most organizations are unlikely to voluntarily provide full transparency, especially without a clear value proposition. However, if news organizations and software developers do not fully disclose their algorithms, it remains unclear how to evaluate the quality of the algorithms and the content they produce, in particular the content's sensitivity to changes in the underlying data. A promising yet complex approach might be reverse engineering, which aims at decoding an algorithm’s set of rules by varying certain input parameters and assessing the effects on the outcome.⁶³ Another important question for future research is whether, and if so to what extent, users of automated content ultimately care about transparency. If they do, providing such information could be a competitive advantage by increasing a publisher’s credibility and legitimacy.⁶⁴
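The reverse-engineering idea mentioned above, varying an input and watching where the output changes, can be illustrated with a toy example. The generator below is a made-up stand-in for an opaque system, and its thresholds are invented; real story generators take many inputs and are far harder to probe.

```python
def toy_generator(change_pct: float) -> str:
    """Stand-in for an opaque story generator; thresholds are invented."""
    if change_pct >= 5:
        return "soared"
    if change_pct > 0:
        return "rose"
    if change_pct == 0:
        return "was flat"
    if change_pct > -5:
        return "fell"
    return "plunged"


def probe(generator, values):
    """Run the generator over sorted inputs and report where its output changes.

    Each result is (last input before the change, first input after it,
    output before, output after), which brackets a hidden rule boundary.
    """
    boundaries = []
    previous_value, previous_output = None, None
    for value in sorted(values):
        output = generator(value)
        if previous_output is not None and output != previous_output:
            boundaries.append((previous_value, value, previous_output, output))
        previous_value, previous_output = value, output
    return boundaries


for boundary in probe(toy_generator, [-10, -6, -4, -1, 0, 1, 4, 6]):
    print(boundary)
```

Probing with a finer grid of inputs between two bracketing values would narrow down each threshold, which hints at why the approach becomes complex once several inputs interact.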