Targeting Political Emails
During the 2012 presidential campaign, Dan Sinker, head of the Knight-Mozilla OpenNews project, noticed that the Obama campaign was sending slight variations of the same email to different people. ProPublica picked up the tip and started gathering hundreds, and then thousands, of these targeted emails, soliciting them from people who were willing to forward them on to the news organization. Reporters had heard the Obama team was running a sophisticated data operation, but no one inside the campaign was talking.
The Message Machine,28 as it came to be called, tried to reverse engineer how the campaign was using targeting information to adapt and personalize email messages for different recipients. In addition to collecting the emails, ProPublica asked recipients to fill out a survey covering basic demographic information, where they lived, and whether they had donated to or volunteered for the campaign before. These survey answers then served as a proxy for the inputs to the algorithm they were trying to dissect. In this case, the output was observable (crowdsourced from thousands of people), but the types of inputs used by the targeting algorithm were hidden behind the campaign wall and thus not controllable by journalists. Instead, ProPublica had to determine, based on the collected outputs and the survey-based proxy for the inputs, what types of inputs the campaign’s targeting algorithm was actually paying attention to.
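To make the approach concrete, here is a minimal sketch of that kind of analysis, with invented data and column names (this is not ProPublica’s actual code): treat each survey answer as a candidate input, treat the email variant each person received as the observed output, and fit a simple model to see which inputs best predict the output.

```python
# Hypothetical sketch of inferring an algorithm's inputs from its outputs.
# Each row is one survey respondent; "variant" is the email version they got.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

emails = pd.DataFrame({
    "age":            [62, 24, 45, 70, 31, 55, 68, 22],
    "donated_before": [1,  0,  1,  1,  0,  1,  1,  0],
    "volunteered":    [0,  1,  0,  1,  1,  0,  0,  1],
    "variant":        ["A", "B", "A", "A", "B", "A", "A", "B"],
})

X = emails[["age", "donated_before", "volunteered"]]
y = emails["variant"]

# Standardize the features so coefficient magnitudes are roughly comparable.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# A large coefficient suggests, but does not prove, that the targeting
# algorithm used that input -- or something correlated with it.
for feature, coef in zip(X.columns, model.named_steps["logisticregression"].coef_[0]):
    print(f"{feature}: {coef:+.2f}")
```

The caveat in the final comment is the crux: a model like this can only surface associations between the proxy inputs and the observed outputs, not the rule the campaign actually ran.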
In one instance the analysis was wrong, as Scott Klein, an editor who worked on Message Machine, explained to me: “We slipped and we said that ‘in such and such an example they are targeting by age.’” After the campaign was over, however, Klein and his colleagues found that the campaign had not been targeting by age but by a correlated variable: donation history. The lesson for reverse engineering is that we need to be careful when using correlations to make claims about which inputs an algorithm is actually using. When we don’t have access to the algorithm’s inputs, we can only make statistically informed guesses. Correlation does not imply causation, nor does it imply intent on the part of the designer. As much as algorithmic accountability can help us diagnose the existence of a problem, we have to go deeper and do reporting (when possible) to understand the motivations or intentions behind an algorithm. Ultimately, we still need to answer the question of “why?”
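To see why the mistake was easy to make, consider a small simulation (invented numbers, not campaign data) in which the targeting rule looks only at donation history. Because past donors skew older in the simulated population, age alone still appears to predict which email variant a person receives.

```python
# Hypothetical illustration of the age/donation-history confound. The
# (simulated) targeting rule never looks at age, yet age "predicts" the output.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

age = rng.integers(18, 80, size=n)
# The confound: older recipients are more likely to have donated before.
donated = rng.random(n) < (age - 18) / 120

# The simulated targeting rule: past donors get variant A, everyone else B.
variant_a = donated

# Age looks predictive even though the rule never uses it...
print("P(variant A | age >= 50):", variant_a[age >= 50].mean())
print("P(variant A | age <  50):", variant_a[age < 50].mean())
# ...but conditioning on the true input removes the apparent age effect.
print("P(variant A | donor):    ", variant_a[donated].mean())    # 1.0
print("P(variant A | non-donor):", variant_a[~donated].mean())   # 0.0
```

In the simulation, splitting by the true input (donation history) perfectly explains the output, while the age split only reflects the correlation between the two variables. An outside observer who never measured donation history would have no statistical way to tell the difference, which is exactly why reporting, and not analysis alone, is needed to answer the question of “why?”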