Appendix: Data Gathering and Cleaning
Data on the number of PGP key registrations in each newsroom was gathered by scraping the MIT key server in March of 2016.15 Registrations on the MIT server have considerable, if not total, overlap with other key servers because they synchronize data with one another.
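The report does not say how the scrape was carried out. As a rough sketch only, the following assumes the results were available in the colon-delimited, machine-readable index format described in the HKP keyserver draft (`pub:` lines for keys, `uid:` lines for percent-encoded user IDs). The function name `parse_hkp_index` and the sample record are hypothetical and are not drawn from the report's data.

```python
from datetime import datetime, timezone
from urllib.parse import unquote


def parse_hkp_index(text):
    """Parse machine-readable HKP index text into a list of key records.

    Assumed line shapes (per the HKP draft, not confirmed by the report):
      pub:<keyid>:<algo>:<keylen>:<creationdate>:<expirationdate>:<flags>
      uid:<percent-encoded uid>:<creationdate>:<expirationdate>:<flags>
    """
    keys = []
    current = None
    for line in text.splitlines():
        fields = line.split(":")
        if fields[0] == "pub" and len(fields) >= 5:
            created = fields[4]
            current = {
                "key_id": fields[1],
                # Creation dates are seconds since the Unix epoch; may be empty.
                "created": (
                    datetime.fromtimestamp(int(created), tz=timezone.utc).date()
                    if created.isdigit()
                    else None
                ),
                "uids": [],
            }
            keys.append(current)
        elif fields[0] == "uid" and current is not None and len(fields) >= 2:
            # User IDs are percent-encoded so that colons do not break the format.
            current["uids"].append(unquote(fields[1]))
    return keys


# Hypothetical sample response with one key and one user ID.
SAMPLE = (
    "info:1:1\n"
    "pub:0123456789ABCDEF:1:2048:1420070400::\n"
    "uid:Jane%20Doe%20%3Cjdoe%40example-news.com%3E:1420070400::\n"
)

keys = parse_hkp_index(SAMPLE)
```

Each record keeps the key's creation date alongside its user IDs, which are the two pieces of information the later cleaning steps rely on.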
Each employee was identified by a supplied email address belonging to their news organization. Three concerns should be noted here: Anyone can register a PGP key to a particular email address, even if they do not own that account; new email addresses can be added to an existing registration at a later date; and not every journalist who uses encrypted email will do so on their work account. These are the main limitations in the coverage of the data presented here.
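Matching a registration to a newsroom by its email address can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the report: the `NEWSROOM_DOMAINS` table, its entries, and the function name `newsroom_for_uid` are all hypothetical, and only exact domain matches are considered.

```python
from email.utils import parseaddr

# Hypothetical mapping from email domain to news organization.
NEWSROOM_DOMAINS = {
    "example-news.com": "Example News",
    "example-times.org": "Example Times",
}


def newsroom_for_uid(uid):
    """Return the newsroom for a key user ID such as 'Name <user@domain>'.

    Returns None when the address is missing or its domain is not listed.
    """
    _, address = parseaddr(uid)
    if "@" not in address:
        return None
    domain = address.rsplit("@", 1)[1].lower()
    return NEWSROOM_DOMAINS.get(domain)


work = newsroom_for_uid("Jane Doe <jdoe@Example-News.com>")
personal = newsroom_for_uid("Jane Doe <jane.doe@example-mail.net>")
```

The second lookup returns `None`, which mirrors the third limitation above: a journalist who registers a key only to a personal account does not appear in a domain-based count.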
Other problems, such as duplicate entries, were fixed by manually cleaning the data. Over time, many people have registered several different keys. Some register more than one in a single day, perhaps in the midst of a tutorial. Many others choose to revoke an existing key and register a new one at some point. Whatever the case, duplicates were eliminated from this report’s data wherever possible. Where two or more keys were registered under the same name, or under clear variations of one person’s name, the later entries were manually deleted. Only the earliest registration was retained, so that timelines would not include later registrations from the same individual. Thus, each entry should reflect a person’s earliest enrollment of an encryption key.
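The keep-the-earliest rule can be sketched in a few lines. The cleaning for this report was done by hand, so the simple name normalization below is only a crude stand-in for the manual judgment about "clear variations" of a name. The record fields and function names are hypothetical.

```python
from datetime import date


def normalize_name(name):
    """Lowercase a name, drop periods, and collapse whitespace.

    A rough approximation only; manual review would catch nicknames,
    initials, and other variations that this does not.
    """
    return " ".join(name.lower().replace(".", " ").split())


def earliest_per_person(records):
    """Keep one record per normalized name: the one with the earliest date."""
    earliest = {}
    for rec in records:
        key = normalize_name(rec["name"])
        if key not in earliest or rec["created"] < earliest[key]["created"]:
            earliest[key] = rec
    return sorted(earliest.values(), key=lambda r: r["created"])


# Hypothetical records: the first two belong to the same person.
records = [
    {"name": "Jane Doe", "email": "jdoe@example-news.com", "created": date(2015, 1, 1)},
    {"name": "jane  doe", "email": "jdoe@example-news.com", "created": date(2013, 6, 2)},
    {"name": "Sam Roe", "email": "sroe@example-news.com", "created": date(2014, 3, 9)},
]

unique = earliest_per_person(records)
```

Here the 2015 registration is dropped in favor of the 2013 one, so a timeline built from `unique` counts this person once, at the date of first enrollment.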
Keeping only the earliest registration has one notable but largely unavoidable effect on the data: Sometimes a security-savvy reporter will move to a new organization, but their entry is still counted for their previous employer, where they first enrolled their encryption keys. It’s worth emphasizing that the numbers associated with each organization should not be read as figures for total staff using encryption, but rather as the number of staff who enrolled their first encryption key at that organization. This number becomes less accurate and less useful at organizations with a longer history of registrations, such as those that stretch back to the 1990s.
Key registrations associated with a news organization also do not necessarily belong to journalists. Especially among the earlier key registrations in this data set, it is clear that many belonged to people working in the information technology department. Thus, some of these registrations do not indicate that an employee has set up encryption for the sake of communicating with sources.
Finally, entries for general addresses like contact@ or tips@ at each site were removed. While these may be useful avenues for secure communication, they do not signal that a particular journalist has begun using encryption.
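Filtering out shared mailboxes might look like the sketch below. Only `contact` and `tips` come from the text above; the set name, the function name, and the sample addresses are hypothetical, and any further local parts added to the set would be an assumption about which addresses count as general.

```python
# Local parts named in the text; extending this set is an assumption.
GENERIC_LOCAL_PARTS = {"contact", "tips"}


def is_generic_address(email):
    """Return True when the part before '@' is a shared mailbox name."""
    local_part = email.split("@", 1)[0].lower()
    return local_part in GENERIC_LOCAL_PARTS


# Hypothetical addresses for illustration.
emails = [
    "tips@example-news.com",
    "jdoe@example-news.com",
    "Contact@example-times.org",
]

individual = [e for e in emails if not is_generic_address(e)]
```

Only `jdoe@example-news.com` survives the filter, leaving entries that can be attributed to a particular person.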