Taking the plunge into data journalism

Data journalism can be time-consuming, but it will bolster the authority of your reporting. (Lindsay Holmwood/Flickr Creative Commons)

This article originally appeared on Navigator, GroundTruth’s newsletter for early-career journalists.

Even if you’re on a daily assignment beat, you can still be more enterprising and use data to back up your reporting, or to verify a public official’s statement. But compiling and vetting your data can be a time-consuming process — it isn’t always as simple as finding a single data point.

But it is worth it.

Cody Winchester, a training director at Investigative Reporters and Editors, says that using data, even in daily reporting, lets you move beyond a “he said, she said” approach to journalism and add authority to your reporting, rather than “leaving it up to your readers to sort out the truth.”

We talked to Cody about the basics of getting started with datasets.

Beyond the basics, working with data can get technical and may require some programming knowledge, which can be learned! We hope these tips about working with data, and thinking about it, pique your interest.

1. Have a specific inquiry

It is more effective to come to data with an idea in mind or a question you’re trying to answer, rather than to pull a story out of a data set.

“You’ll get a better story if you’re coming into it and you already have a hypothesis to test…and you start looking for data as evidence to support or refute whatever your hypothesis is,” Cody says.

And it’s easy to get lost in the data, so stay focused. Cody says he makes a list of “questions I have for my data” and keeps them “targeted to the thesis of my story.”

Also, you have to find out what kind of data are kept and where. “You need to do the homework to understand at a very specific level how information moves through the organization that you’re covering,” Cody says. “So that when it comes time to request the data and then to analyze it, you’ll not only have an understanding of how it’s kept and by whom, but also understand the assumptions of people who collect the data. A lot of that comes down to shoe leather reporting, source work.”

Data is not an end in itself, Cody says; it’s just another way to get to the story.

2. Data literacy

Data literacy means being comfortable with basic math and statistical concepts, such as sums, averages (mean and median), margins of error and outliers, so that you can judge whether values are newsworthy and avoid drawing faulty conclusions.
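To make those terms concrete, here is a minimal Python sketch using made-up salary figures. The numbers, the variable names and the “1.5 × IQR” outlier rule are illustrative assumptions, not anything Cody prescribes. It shows how one very large value drags the mean far above the median, flags that value as a possible outlier, and computes an approximate 95% margin of error for a poll result, assuming a simple random sample.

```python
import math
import statistics

# Hypothetical annual salaries for six employees in one department (made-up numbers).
salaries = [41_000, 43_500, 45_000, 46_200, 48_000, 250_000]

# The mean is pulled upward by the single large value; the median is not.
mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR beyond the quartiles (one common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


outliers = iqr_outliers(salaries)
# Hypothetical poll: 52% support among 1,000 respondents.
moe = margin_of_error(0.52, 1000)

print(f"Mean salary:   ${mean_salary:,.0f}")
print(f"Median salary: ${median_salary:,.0f}")
print(f"Possible outliers to check: {outliers}")
print(f"Poll margin of error: +/- {moe * 100:.1f} percentage points")
```

An outlier flagged this way is a lead to check, not an error to delete: it may be a data-entry mistake, or it may be the story.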

You can find a basic math guide for journalists here.

3. Cleaning up your data

“One mistake would be to dive into your analysis without understanding the flaws in your data,” Cody says. “Every data set is dirty in its own way, and the vast majority of the work in your analysis is going to be cleaning and prepping the data.”

Spreadsheet programs such as Excel or Google Sheets are the essential baseline tool for working with data. But a spreadsheet you receive from an agency will usually be a mess: expect blank rows, misspellings, data-entry mistakes and duplicate records, all of which you’ll need to standardize and clean up. Cody advises that in addition to the data itself, you should request the record layout or code book, the agency’s internal guide to how the data is entered and what each field means.
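As a rough illustration of that cleanup, the Python sketch below standardizes a tiny, invented agency export: it skips blank rows, trims stray spaces, normalizes capitalization and dollar amounts, and lists records that appear more than once. The column names (name, city, amount) and the sample rows are hypothetical. Repeated records are flagged rather than deleted, because two identical rows may both be legitimate; the code book or the agency should settle that.

```python
import csv
import io
from collections import Counter

# A tiny, invented export with typical problems: stray spaces, inconsistent
# capitalization, formatted dollar amounts, a blank line and repeated rows.
RAW = """name,city,amount
  Jane Smith ,Lincoln,"$1,000"
jane smith,LINCOLN,1000

Bob Jones,Omaha,250.00
Bob Jones,Omaha,250.00
"""


def clean_rows(raw_text):
    """Standardize each row and report records that appear more than once."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # DictReader skips fully empty lines; this also skips rows like ",,".
        if not any((value or "").strip() for value in row.values()):
            continue
        # Collapse extra spaces and use consistent capitalization.
        # (.title() is crude: it turns "McDonald" into "Mcdonald".)
        name = " ".join(row["name"].split()).title()
        city = row["city"].strip().title()
        # Remove "$" and thousands separators so amounts can be added up.
        amount = float(row["amount"].replace("$", "").replace(",", "").strip())
        cleaned.append((name, city, amount))
    counts = Counter(cleaned)
    # Flag, don't delete: identical rows may be separate, legitimate records.
    duplicates = [record for record, count in counts.items() if count > 1]
    return cleaned, duplicates


cleaned, duplicates = clean_rows(RAW)
print(f"{len(cleaned)} rows after cleaning")
for record in duplicates:
    print("Check possible duplicate:", record)
```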

Cody gives the example of campaign finance records he worked with as a reporter in Nebraska. He and his team found the same names appearing multiple times, along with misspelled names, and had to figure out whether those entries represented one person or several, to avoid double-counting. “We grouped our data by last name, and manually vetted all of those names,” Cody says.
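Here is one way to sketch the first step of that process in Python: group names by last name and keep only the groups with more than one spelling, so a person can vet them by hand. The donor names are made up, and treating the last word of a name as the last name is a simplifying assumption that breaks on suffixes like “Jr.” and on compound surnames.

```python
from collections import defaultdict

# Hypothetical donor names as they might appear in campaign finance filings.
donors = [
    "John A. Doe",
    "Jon Doe",
    "John Doe",
    "Mary Roe",
    "Maria Roe",
    "Sam Poe",
]


def group_by_last_name(names):
    """Group name variants under their last word, treated here as the last name."""
    groups = defaultdict(set)
    for full_name in names:
        last = full_name.split()[-1].lower()
        groups[last].add(full_name)
    return groups


def review_list(names):
    """Return last-name groups with more than one spelling, for manual vetting."""
    return {
        last: sorted(variants)
        for last, variants in sorted(group_by_last_name(names).items())
        if len(variants) > 1
    }


for last, variants in review_list(donors).items():
    print(f"{last}: {variants}")
```

The code only narrows the list; deciding whether “Jon Doe” and “John A. Doe” are the same donor still takes reporting, such as checking addresses or employers in the filings.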

OpenRefine is a free, open-source tool that runs in your web browser and helps clean up messy data (though its more advanced features do call for some programming-style expressions).
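Tools like OpenRefine can also suggest which messy values probably refer to the same thing. The Python sketch below is a simplified, hypothetical take on the “fingerprint” key-collision idea described in OpenRefine’s clustering documentation; it is not OpenRefine’s own code. It lowercases each value, strips accents and punctuation, and sorts the words, so that variants like “Smith, Jane” and “jane smith” produce the same key. The sample names are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so trivially different spellings collide."""
    value = value.strip().lower()
    # Decompose accented characters and drop the accent marks (é -> e).
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation, keeping letters, digits and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Sort unique words so word order ("Smith, Jane" vs "Jane Smith") doesn't matter.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


names = ["Smith, Jane", "jane smith", "Jane  Smith.", "José Díaz", "Jose Diaz", "Bob Jones"]
for group in cluster(names):
    print(group)
```

Matches found this way are suggestions; as with the Nebraska campaign finance data, a person still has to confirm that two entries really are the same.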

4. Ask for help

This stuff can be daunting. But experienced reporters are really willing to help! Cody recommends finding mentors who can teach you, or buddying up with other people who are learning.

Cody recommends signing up for the NICAR listserv, where you can connect with the data journalism community and ask questions. He joined the listserv as a beginner and found it to be a welcoming community.

And you can always get formal training from organizations like IRE.

Additional Resources

The Quartz guide to bad data

ProPublica’s Guide to Bulletproofing Your Data

IRE’s Events and Trainings