Autocompletions on Google and Bing
The Google autocomplete FAQ reads, “We exclude a narrow class of search queries related to pornography, violence, hate speech, and copyright infringement.” Bing, on the other hand, makes sure to “filter spam” as well as to “detect adult or offensive content.” Such editorial choices set the stage for broadly specifying the types of things that get censored. But what exactly are the boundaries of that censorship, and how do they differ among search engines? More importantly, what kinds of mistakes do these algorithms make in applying their editorial criteria?
To answer these questions, I automatically gathered autosuggest results from hundreds of queries related to sex and violence in an effort to find those that were surprising or deviant.26 Using a list of 110 sex-related key- words drawn from academic and slang sources as inputs to the algorithm, I looked to see which inputs resulted in zero output—suggesting a blocked word. While many of the most obvious words were outright blocked—like “ass” and “tits”—a number of the search terms were not. The lack of block- age becomes more significant when adding the prefix “child” to the query, since some of the suggestions lead to child pornography, which is illegal and ought to be blocked.
This case illustrates an ideal situation for the use of algorithmic-accountability reporting. Some transparency by the services through their FAQ’s and blogs suggest a hypothesis and tip as to what types of input the algorithm might be sensitive to (i.e., pornography and violence-related words). Moreover, the algorithms themselves, both their inputs and outputs, are observable and accessible through APIs, which make it relatively easy to quickly collect a wide range of observations about the input-output relationship.