The Art and Science of Data-Driven Journalism

Crowdsourcing Data Creation and Analysis

Today, journalists have many more options for sourcing. Newsroom reporters and developers can download, scrape, and digitize data from a wealth of sources, from websites to document dumps. In the future, more journalists will create data themselves using sensors, and engage their distributed audiences of readers, listeners, and watchers to help gather it. In some forward-thinking media organizations, this is already happening.

In 2011, ProPublica began an effort to “Free the Files,” making the physical “public inspection documents” that detail political advertising spending at local stations open to the public.[124] The Federal Communications Commission ordered TV stations in the top 50 markets in the United States to begin posting these documents online. The trouble is that the FCC didn’t require that publishing be done in an open, standardized format. As a result, the stations submitted a mass of unsearchable PDFs.[125] ProPublica built an app to enable volunteers to translate the files into structured data and sort the files by market, amount, candidate, and political group. ProPublica later open sourced the app as Transcribable.[126]

“[...] was to take thousands of hard-to-parse documents and make them useful, helping to reveal hidden spending in the election,” senior engagement editor Amanda Zamora explained.[127] “[...] 1,000 [...] spending data to create a public database that otherwise wouldn’t exist. We logged as much as $1 billion in political ad buys, and a month after the election, people are still reviewing documents.”

In 2013, New York City’s public radio station asked its listeners to help track the emergence of cicadas with inexpensive sensors. WNYC’s “Cicada Tracker” project turned up some 8,000 people making trackers.[128] [...] data collection[129] [...] journalism[130] [...] reality, not just a theoretical project, that resulted in 1,5000 collected temperature readings. The lessons from the Cicada Tracker project should inform future public media efforts in public engagement, citizen science, and data collection.
For a deeper dive into the topic, see the proceedings of the sensor journalism workshop held at the Tow Center last year.[131]

[...] team released a project around accessible playgrounds. NPR made a request of its community of listeners and readers: Help public media collect the data that drives it and make the resource better for everyone. The NPR playgrounds app enables parents and children to search for accessible playgrounds, taking commonly used consumer-recommendation engines and combining them with a strong public service element.[132]

“[...] playgrounds for kids with special needs,” said Brian Boyer, the head of NPR’s Visuals team, in an interview. “It is the first of its kind: a nationwide database of playgrounds that are well suited to kids in wheelchairs, kids with autism, or kids with other special needs.”

NPR activated its audience to become participants in data collection, much in the same way that Audubon’s “Christmas Bird Count” and eBird crowdsource data collection about bird species.[133] In the first 48 hours after the app was launched, data for 336 more playgrounds was added to the database, for a total of 1,293. In May of 2014, the playgrounds app had 1,907 [...] counting. The app is a notable case study in the power of public engagement and crowdsourced data creation.

There are decades of precedent for listening or viewing audiences collaborating with a media organization to collect images, videos, or stories. What remains relatively new is the capacity for a networked populace to contribute data, whether it comes from sensors in droughts[134] or Geiger counters.[135] If turning data into stories is now a core element of investigative journalism, WNYC and NPR’s Visuals team have shown how to do it best and serve the public in the process.[136]