Discussion
Looking forward, we’re faced with a number of challenges to actualizing algorithmic accountability in practice. Here I briefly touch on some of those challenges, including issues of human resources, legality, ethics, and the role that transparency might still effectively play.
Developing the human resources to do algorithmic-accountability reporting will take dedicated efforts to teach the computational thinking, programming, and technical skills needed to make sense of algorithmic decisions. While there is growing awareness of more complex algorithms among data journalists, the number of computational journalists with the technical skills to do a deep investigation of algorithms is still rather limited. Teaming computationally literate reporters with tech-savvy computer scientists might be one method for doing more algorithmic accountability reporting. Another would be to train journalists themselves in more computational techniques. Either way, we probably need more experience with the method before we can effectively teach it. “There’s no conventional or obvious approach to it. It’s a lot of testing or trial and error, and it’s hard to teach in any uniform way,” noted Jeremy Singer-Vine. I also spoke to Chase Davis, an assistant editor at The New York Times and instructor at the Missouri School of Journalism, who concurred:
Teaching it explicitly at this point might be difficult...a beat would be a stretch because there’s no single unifying theme to it. It crosses a lot of boundaries in a way that standard data-driven journalism or CAR does.
Legally speaking, the reverse engineering of commercial software does have some pitfalls. Other than the Digital Millennium Copyright Act (DMCA), there are no laws that directly prohibit or restrict reverse engineering, and even the DMCA has exemptions.40 Software vendors do typically add anti-reverse engineering clauses to End User License Agreements (EULAs),41 forcing the decision: Is it okay to breach such a contract if it gets you closer to the truth about the algorithm? Helen Nissenbaum, a professor at New York University, has suggested that laws might be in order to stipulate limits on Terms of Service to allow more room for individuals to negotiate their relationship with online entities.42 Perhaps more problematic is a law like the Computer Fraud and Abuse Act (CFAA).43 Peter Ludlow recounts the story of Andrew Auernheimer, who wrote a script to collect private customer information that was inadvertently available on a public AT&T site.44 Auernheimer was prosecuted under the CFAA and sentenced to 41 months in prison. Tread carefully here and seek qualified legal advice before attempting to reverse engineer algorithms or collect data from corporate or government entities.
Besides the legality of reverse engineering corporate or government systems, there are other ethical questions that arise in the context of studying algorithms. In particular, we need to ask ourselves about the possible ramifications or negative consequences of publishing details of how certain algorithms work. Would publishing such information negatively affect any individuals? More important, perhaps, is the issue of gaming brought up in the earlier section on transparency. Goodhart’s law states, again, that once people know about a measure it’s no longer a good one, since they’ll start trying to manipulate it. By publishing details of how an algorithm functions, specifically information about what inputs it pays attention to, how it uses various criteria in a ranking, or what criteria it uses to censor, how might we allow the algorithm to be manipulated or circumvented? And who stands to benefit from that manipulation? If publishing reverse-engineering information on how Google search ranking works helps SEO black-hats get more spam into our search results, then what have we really accomplished? The ethical principle of beneficence offers guidance here: Try to maximize anticipated benefits while minimizing possible risks of harm to the public.
It may still be too early to develop standards for how entities creating and running algorithms might be more transparent about their technical systems while respecting their right to trade secrets and their desire to mitigate gaming. Ultimately, we need to find a workable balance between trade secrets and transparency. Well-trodden transparency policies in other domains do offer some opportunity to reflect on how such policy might be adapted for algorithms.45 For instance, targeted transparency policies always indicate the boundaries of disclosure, a contentious point since that boundary dictates both the limits of trade secrecy and the time and money the algorithm’s creator must invest in publishing the required transparency information. A policy would also need to indicate what factors or metrics of the algorithm would be disclosed, the frequency of their disclosure (e.g., daily, monthly, or in real time), and the vehicle for communicating that information (e.g., a separate document, or integrated into the algorithmic output in some way).
The challenge of standardizing what should be disclosed about algorithms may come down to building consensus about which factors or metrics are both significant and acceptable. Frequency of disclosure and communication vehicle are important for adoption, but before we get there we need to know the informational content that might be disclosed. The questions at the beginning of the “Algorithmic Accountability” section form the basis for aspects of algorithms that we might consider here. This includes things like:
- the criteria used to prioritize, rank, emphasize, or editorialize things in the algorithm, including their definitions, operationalizations, and possibly even alternatives;
- what data act as inputs to the algorithm (what it “pays attention” to) and what other parameters are used to initiate the algorithm;
- the false positive and false negative rates of errors made in classification, including the rationale for how the balance point is set between those errors;
- training data and its potential bias, including the evolution and dynamics of the algorithm as it learns from data;
- the definitions, operationalizations, or thresholds used by similarity or classification algorithms.
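To make the error-rate item concrete: given a sample of an algorithm’s binary decisions alongside ground-truth labels, an auditor could compute the two rates directly. The sketch below uses invented data purely for illustration; real audit data would come from the system under study.

```python
# Sketch: computing false positive and false negative rates for a
# hypothetical binary classifier, given its decisions and the ground truth.
# The decision/label data below are invented for illustration only.

def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate)."""
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    return fp / negatives, fn / positives

# e.g., an algorithm that flags content for removal (True = flagged)
predicted = [True, True, False, False, True, False, False, True]
actual    = [True, False, False, False, True, True, False, True]

fpr, fnr = error_rates(predicted, actual)
print(f"False positive rate: {fpr:.2f}")  # benign items wrongly flagged
print(f"False negative rate: {fnr:.2f}")  # harmful items missed
```

Disclosing where a system sits on this trade-off, and why, is exactly the kind of rationale the list above asks for: lowering one rate typically raises the other.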
To achieve a comprehensive public audit of an algorithm, we need to reach a consensus about which of these factors might be appropriate to make public, or semi-public (e.g., to an escrow third-party auditor). The hope is that as we develop more experience doing algorithmic accountability reporting, the factors that are most significant to embed in a standardized algorithmic transparency policy will come into clearer focus.
In the case of algorithms, where complexity reigns, and the computational literacy of the public may be limited, the role of professional and more sophisticated interpreters of transparency information will be essential. In the same way that business journalists contextualize and help the public understand the information produced through financial transparency of companies, journalists will also be needed to frame, contextualize, and explain the transparency information about algorithms.
It’s worth noting that as news organizations also come to employ algorithms in the shaping of the news they report, whether that be in finding new stories in massive datasets or presenting stories interactively, some of the same issues with transparency arise—news organizations are, after all, corporations that may have trade secrets to keep or systems to buttress from manipulation. But with the recent shift toward transparency as a core ideal of the journalistic enterprise,46 tensions emerge between the ideal of transparency and the reality of algorithms. Chase Davis noted that one of the main challenges to building newsroom algorithms is providing a window for the reporter into how a particular algorithmic decision was made. It remains to be seen how news organizations will incorporate the evidence that algorithms or simulations provide with an epistemology and ethic that demands full transparency. Perhaps the public editor of the future will also play the role of algorithmic ombudsman.