The rapid expansion in the amount of unstructured
data, 268 need for this kind of expertise in-house. When the Guardian’s data team
was faced with making sense of the Wikileaks cables, it took months to
work through them. 269 hammering governments to give us data in columns and rows,” said Cohen.
“I think we’re increasingly seeing that stories just as likely (if not
more likely) come from the unstructured information that comes from
documents, audio and video, tweets, other social media, from government
and non-government sources.”

Making sense of all of that data is both a
huge opportunity and an immense challenge for newsrooms. Once upon a
time, it was difficult for investigators to find information relevant to
answering a question. Today, in many (if not all) scenarios, the
opposite is true, particularly in a world where readers have access to
search engines. That has shifted the value that journalists can add, from
finding information to making sense of what’s actually happening,
processing, analyzing and vetting data, and finding signal in the
digital noise.

That new landscape is precisely why the Knight News
Challenge gave $1.5 examine data. 270 Project, 271 newsroom with a set of open source, 272 oriented at making it easier for journalists to use and analyze data,
and Overview, 273 cleaning, visualizing, and interactively exploring large documents and
data sets, acting as a kind of “editorial search
engine.” 274 Stray, Overview’s project manager and a research fellow at the Tow
Center, describes it as an organizational structure for
data. 275 bread-and-butter issues for newsrooms struggling to manage data. As of
March 2014, PANDA has been installed in 25 newsrooms around the United
States.

“It’s a pain to search across data sets, but we also have this
general newsroom content management issue,” said Brian Boyer, the
product manager for PANDA and head of NPR’s News Applications team. “The
data stuck on your hard drive is sad data. Knowledge management isn’t a
sexy problem to solve, but it’s a real business problem. People could be
doing better reporting if they knew what was available. Data should be
visible internally.”

Boyer thinks the trends toward big data in media are
clear, and that he and other hacker journalists can help their
colleagues to not only understand it, but to thrive. “There’s a lot more
of it, with government releasing its stuff more rapidly,” he said in
- “This city of Chicago is releasing a lot of it. We’re going for
increased efficiency, to help people work faster and write better
stories. Every major news org in the country is hiring a news app
developer right now. Or two. For smaller news organizations, it really
works for them. Their data apps account for the majority of their
traffic.” Once such databases are up and running, journalists can apply
analytical tools to produce evidence-driven reporting. The difficulty
ProPublica had with building the “Dollars for Docs” project puts the
scale of that work into perspective, from converting PDFs to dirty data,
to fact-checking correlations within the massive
databases. 276 read Dan Nguyen’s guide to scraping data, 277 Klein’s style guide for news apps, 278 exploration of “how data sausage is made.” 279 journalists start working more with data, they have more choices for
tools than ever before. There is also powerful new data-journalism
software coming online, from analysis to visualization tools. As Eric
Newton highlighted at the Knight Foundation, many of these new tools
help journalists gather, clean, analyze, and publish data and do not
require sophisticated programming knowledge to
use. 280 the head of the Knight-Mozilla News Technology Partnership for Mozilla,
wrote last year, journo-coders are now taking social coding “to a whole
new level.” 281 Just as civic software 282 baked into government, open source is playing a pivotal role in the
practice of data journalism. 283 While many news developers are agnostic
with respect to which tools they use to get a job done, the people who
are building and sharing tools for data journalism are often doing it
with open source code.

While some of that open source development has
been driven by the requirements of the Knight News Challenge, which
funded the PANDA and Overview projects, there’s a broader collaborative
spirit evidenced in the interstitial communication on Twitter, GitHub,
and mailing lists that connect the data-driven journalism community
around the world.

Members of newsrooms that compete on beats are working
together on code. For instance, New York Times and Washington Post
developers are teaming up 284 database. 285 Data journalists from WNYC, the Chicago Tribune and the
Spokesman-Review are collaborating on building a better interface for
Census data. 286 helped build the Internet are building out civic
infrastructure. 287 newsroom stack, 288 be fiercely committed to “showing your work.” For data journalists, that
means sharing your source data, methodology, and code, not just a
notebook. To put it another way, “code, don’t tell.” 289