The Art and Science of Data-Driven Journalism

Mentorship, Numeracy, Competition, Recruiting

While data journalism has gone mainstream in recent years, significant challenges lie ahead for traditional media and news organizations that want to take full advantage of advances in technology.[211] Just as McKinsey identified a gap between available analytic talent and the demand created by big data, there is a data science skills gap in journalism.[212] […] useless without the skills to analyze them, whatever the context.[213] […] exclude some of the best candidates for these jobs, but there will need to be training to bring them onboard.[214]

[…] universities have noticed the need to build capacity in these areas. In May of 2012, the Knight Foundation gave Columbia University $2 million for research[215] […] gap.[216] […] said Emily Bell, director of the Tow Center for Digital Journalism at Columbia University, in 2012. “Lots of people teach at the very low level, very few at the elevated level. Nobody teaches the algorithmic, advanced courses that you’d see in computational journalism. There aren’t many people who can do the latter, either professionally or [on the] teaching side.”

In the United States, I’d estimate that the headcount of working data journalists numbers well under a thousand across all newsrooms and media organizations. Their ranks are growing, especially given clear demand in both traditional and startup media companies. Globally, there may be thousands of data specialists in the media, but not many more, unless we expand what practicing data journalism means. If generating charts or tables from financial or sports statistics qualifies as data journalism, there are many more people who could be fairly said to be practitioners. The number of people applying data science to journalism or practicing high-level computational journalism, however, is clearly far smaller.

The New York Times, for instance, has fewer than 10 staff working at that level in the entire organization, of whom three are in editorial, according to Aron Pilhofer. “We have data scientists on the business side,” he said.
“R&D has a couple, like Mike Dewar, who used to be at Bitly. These are people who are applying data science techniques to actual journalism, stories, infographics, and data visualizations.”

In the United States, much of the top talent in the field is split among the New York Times, ProPublica, NPR, the Washington Post, the Chicago Tribune, the Wall Street Journal, and the Los Angeles Times, although there are many smaller shops doing great work. In many ways, the New York Times’ growing data and interactive teams look a lot like the New York Yankees of data journalism: while they do grow their own players, they also find and acquire the best talent available. Given the growth of news applications and online interactives in media, investment in this area is looking like a strategic imperative, and developing core competencies in creating them should be a preoccupation of university professors and journalism school deans around the world.

When I interviewed academics and some of the leading practitioners of data journalism over the past two years, several obstacles to closing this gap emerged. The first, mentorship, is common to any profession, but less of an issue in this field. Most data journalists had a mentor or two who guided them early in their development and helped them to get started. A capacity for self-motivation and self-guided learning is important: while mentors played an important role as data journalists developed, in many cases people have picked up the skills on the job or in their free time, learning online and in workshops, not in their undergraduate or graduate educations.
In each of the profiles of data journalists that I’ve published over the years, mentors were an important part of development. Said John Keefe, the data editor at WNYC:

I could not have done so much so fast without kindness, encouragement, and inspiration from Aron Pilhofer, Scott Klein, Al Shaw, Jennifer LaFleur, Jeff Larson, Chris Groskopf, Joe Germuska, Brian Boyer, and Jenny 8. Lee. Each has unstuck me at various key moments and all have demonstrated in their own work what amazing things were possible. And they have put a premium on sharing what they know, something I try to carry forward.

The moment I may remember most was at an afternoon geek talk aimed mainly at programmers.[217] […] of a phone app called Twilio, I turned to Al Shaw, sitting next to me, and lamented that I had no idea how to play with such things.

“You absolutely can do this,” he said. He encouraged me to pick up Sinatra,[218] […] programming language. And I was off.

Sisi Wei, a news application developer at ProPublica,[219] […] different people for specific skills and ways of thinking:

Tom Giratikanon showed me that journalists could use programming to tell stories and exposed me to ActionScript and how programming works. Kat Downs taught me not to let the story be overshadowed by design or fancy interaction, and Wilson Andrews showed me how a pro handles making live interactive graphics for election night. Todd Lindeman taught me how to better visualize data and how to really take advantage of Adobe Illustrator.
Lakshmi Ketineni and Michelle Chen honed my JavaScript and really taught me SQL and PHP.

Now at ProPublica, my teammates are my mentors.[220] […] news app development really works and how to handle large databases with first ActiveRecord and now ElasticSearch.

New York Times news developer Derek Willis started working with databases in graduate school at the University of Florida:

I had an assistantship at an environmental occupations training center and part of my responsibilities was to maintain the mailing list database. And I just took to it. I really enjoyed working with data, and once I found Investigative Reporters and Editors, things just took off for me. A researcher [at the Palm Beach Post], Michelle Quigley, taught me how to find information online and how sometimes you might need to take an indirect route to locating the stuff you want. Kinsey Wilson, now at NPR, hired me at Congressional Quarterly and constantly challenged me to think bigger about data and the news.

Willis’ experience was not unique: the trend that leapt out of the research was the degree to which peer-to-peer learning and peer networks are crucial in the practice and growth of data journalism. (He and other IRE members continue to pay it forward.) The NICAR listserv is a busy, daily reminder of the generosity of the connected community of more than 1,700 subscribers. Given that many working journalists never attended journalism school, mentorship and networked learning will continue to be important factors in the development of more data journalists.

Second, the level of fundamental numeracy in the media needs to improve, according to Pilhofer:

Journalism programs need to step up and understand that we live in a data-rich society, and math skills and basic data analysis skills are highly relevant to journalism.
The 400+ journalists at NICAR still represent something of an outlier in the industry, and that has to change if journalism is going to remain relevant in an information-based culture. Journalism is one of the few professions that not only tolerates general innumeracy, but celebrates it.

I still hear journalists who are proud of it, even celebrating that they can’t do math, even though programming is about logic. It’s hard to get a journalist to open up a spreadsheet, much less open up a command line. It is just not something that they, in general, think is held to be an important skill.

It’s baffling to me. Look at the Sun-Sentinel, which just won another Pulitzer for a story on speeding cops that you could only do with data analysis. You would think you wouldn’t have to make the case that this is core to what journalists should know. It’s a cultural problem. There is still far too much tolerance for anecdotal evidence as the foundation for news stories.

Like many data journalists I interviewed, Pilhofer originally learned to program because he needed to do something, in this case while he was at the Center for Public Integrity:

I can thank an IRS story on 527 committees, which were then the campaign finance loophole du jour. They were previously unregulated and Congress, in its wisdom, put the IRS in charge of regulating them. It was idiotic. The IRS is not a disclosure agency. They put together the world’s worst disclosure website. There was basic data there, but you couldn’t aggregate it or access it in a meaningful way. It would have taken thousands of mouse clicks to get all of it.

I talked to a public information officer, after they denied my FOIA request for the database underlying the site. He said it was all on the website. So, I created the world’s worst Web scraper in PHP. It ran from the browser. I didn’t know the command line well.

Cultural changes will need to start before journalists leave school.
“I wish that no j-school ever reinforces or finds acceptable, actively or passively, the stereotype that journalists are bad at math,” said Wei. “All it takes is one professor who shrugs off a math error to add to this stereotype, to have the idea pass on to one of his or her students. Let’s be clear: Journalists do not come with a math disability.” Chase Davis agreed, saying, “Most journalism students can’t code or do math, while most computer science students don’t know storytelling. Hybrids on either side are rare, and we’re scooping them up as fast as we can.”

Third, students with the most aptitude for data journalism have data science skills that are in high demand in the private sector. In 2013, […] found that job postings for data scientists had jumped 15,000 […] McKinsey & Company predicted in 2011 that there would be a 50 to 60 percent shortfall in data scientists by 2018.[221] […] data science skills that are useful in media, from programming to statistics to data cleaning to analytical thinking, are directly transferable to finance, business, or technology jobs, with a much bigger paycheck at the end of the week. In some cases, they’re transferable into media as well. Many data journalists didn’t go to journalism school, said Chris W. Anderson, assistant professor of media culture at the City University of New York. For example, people like NPR’s Brian Boyer or the AP’s Jonathan Stray developed programming skills elsewhere and then entered “journalism” because of their interest in public interest work. Organizations are now competing for talent with more than prestigious outlets or broadcast news.

“The best people who could help media organizations are getting hired away by Silicon Valley (or ‘Silicon Alley’) before they finish j-school,” said Anderson.
“The top-of-the-line programs at NYU and Columbia are beautiful recruiting grounds for Google or Facebook.”

Finally, media companies and journalism schools need to value and fund training in digital skills, from using spreadsheets to thinking algorithmically. While not every journalist needs to code, everyone who works in the media does need to be digitally literate and numerate, and to understand how technology relates to sourcing, storytelling, and audience development and relationships. The good news is that many journalists have been learning how to use these tools for decades, aided by the experiences and support of others.

“I discovered that I really enjoyed the coding part in addition to reporting,” said Aron Pilhofer, associate managing editor for digital strategy at the New York Times. “The art of it. That’s how I ended up shifting into my current job.” Before that, he reported on politics:

I was a political reporter, but always used data in my reporting. I just started doing it in college. I just started messing around. I had a history professor who was not well known then. Now, he’s borderline famous from doing quantitative methods in history. He’d do statistical sampling of historical census data that had just been paper records before that. Suddenly, you could do queries on the 1930 Census. You were not just basing a historic analysis on papers or on interviews with people, or what you could glean from anecdotes. You were looking at data. It was incredible. That’s not that different from what a data journalist does, on the CAR side. Instead of a person, you’re using data as a source.

Jeremy Bowers, a news developer at the Wall Street Journal, started on the tech side:

I started in data journalism at the St. Petersburg Times. I’d been working as the blog administrator for our online team and was informally recruited by Matt Waite to help out with a project that would turn into “MugShots.”

I have no special degrees or certificates.
I was a political science major and I had planned to go to law school before a mediocre LSAT performance made me rethink my priorities. I did have a background in server administration and was really familiar with Linux because of a few semesters spent hacking with a good friend in college, so that’s been pretty helpful.

News app developer Dan Hill learned both journalism and computer science at Medill:

I’ve always wanted to be a reporter, but the work of Phillip Reese at The Sacramento Bee and the Chicago Tribune’s news apps team[222] […] I was a student fellow for the Northwestern University Knight Lab[223] […] an internship with the Washington Post taught me how to apply what I was learning in a newsroom.

AP data journalist Serdar Tumgoren, co-creator of the Knight News Challenge-funded OpenElections project,[224] […] and picked up new skills as he went:

The document chase quickly broadened to include data, and led me down a traditional “CAR path” of spreadsheets, to databases, to programming languages and Web development. When I first started programming around 2005, I took a Perl class at a community college. …

You don’t need a computer science degree to master the various skills of data journalism. I learned how to apply technology to journalism through lots of late-night hacking, tons of programming books, and the limitless generosity of NICARians, who shared technical advice, provided moral support, and taught classes at NICAR conferences.

Mother Jones interactive editor Tasneem Raja also picked up data skills in journalism school:

I was a staff writer at the Chicago Reader in the mid-2000s, which was, of course, a scary time to be in news. When a bunch of my senior mentors there, all writers, got canned in 2007, I decided to reevaluate my career and went to j-school at Berkeley to learn new skills.
I was lucky enough to be there while Josh Williams was teaching Web development (he left for the NYT, where he worked on “Snowfall” and tons of other big interactive pieces), and essentially attached myself at the hip. It turned into a year-long independent study, and got me a job on the launch team at The Bay Citizen, where I created a news apps team that made some really cool data projects for the Bay Area. (RIP, TBC.)

Culture really matters here, said Scott Klein:

People with the right mindset, who feel valued for their editorial judgment and creativity, and who are given real responsibility over their work, will learn whatever they need to learn in order to get a project done. The people on my team focus on telling great journalistic stories and don’t let not knowing how to do something stop them from doing so. They learn whatever skills, techniques, and expertise they need to learn.

In terms of journalists learning how to program, I think there are some myths about what programming means. It doesn’t have to mean a computer science degree and it doesn’t have to mean what Google does. I know journalists who make incredibly complex scrapers for their reporting work who will tell you they don’t know how to program. Really, making tools to automate tasks is what a programmer does. There’s no magic threshold you have to pass between programmer and not-programmer.

Of course, there is a difference between knowing how to code and being a computer scientist. If you’ve learned about algorithmic efficiency and can express it mathematically, and if you’ve studied how compilers work, all under the guidance of a person who knows the subject very well in an academic environment, you’ve got skills that will help you write better, faster, more efficient code.
That’s different than learning how to use a high-level programming language to get a task done.

Much of what we do in newsrooms is on deadline and meant to be put behind a caching system that makes efficient code much less important, so computer science is not a prerequisite for being a great newsroom coder. In newsrooms, most of us rely on frameworks like Rails or Django that already make great low-level programming decisions anyway.

“I suspect it is possible that a journalism degree will become a bolt-on for most of this kind of work,” said David Johnson, a journalism professor at American University. “People will probably get their main degrees in hardcore fields, either doing minors in journalism or getting a degree like the Columbia 2-year or the Medill program.”

Historically, only a few journalism schools have done a good job teaching data-driven journalism, said Anderson. (That’s changing, as I explain later.) Much of what’s cutting-edge today in data “journalism,” extending into data science, he suggested, goes well beyond traditional CAR and is being shared through peer-to-peer learning online and in person, at meetups, hackathons, and workshops.

One clear exception, however, lies along the Missouri River, in the center of North America. For decades, the National Institute for Computer-Assisted Reporting (NICAR) has been one of the most important institutions training journalists to use information technology. Founded in 1989, NICAR is a program of the Missouri School of Journalism and Investigative Reporters and Editors (IRE). Since NICAR was created, the use of data analysis and statistics has evolved into a core component of investigative reporting, augmenting and extending what journalists can do.
If you want to find the people doing the best work, look to NICAR’s extended community around the globe, subscribe to the email newsgroup, or attend its annual conference, which has become the preeminent gathering of practicing data journalists in the world.[225]

[…] demand for more tutorials and workshops on data-driven journalism tools and best practices beyond those offered in classrooms or at NICAR’s annual conference. One of the most common questions I have heard from members of the media over the past three years has been, “Where can I go to learn more?” As time has gone on, I’ve been able to point to more.

Interest across the industry is evident: In the spring of 2013, the University of California at Berkeley’s free online data journalism training[226] […] create data journalism educational materials was fully funded.[227] […] campaign were taught by leading practitioners in the field. “For Journalism” endures as a free online resource for anyone who wants to learn more, including webinars, ebooks, code repositories, and forums.[228]