Increase data reusability and enhance your curation investments with these three tips


In many cases, collecting and processing original research data is incredibly costly and difficult. It can involve travel, field work, painstaking examinations, and observations. Sometimes it requires unique, expensive equipment or one-time access to materials or events that can’t be recreated. But it’s worth it if the data yields new scientific insights and advances.

And if that data can be reused in other studies, it makes the return on investment (ROI) much more attractive for universities and funding bodies. Professionals in libraries, archives, and museums have a unique view into the needs of researchers. We can develop and promote new services and procedures that encourage data sharing and data reuse.

For 10 years, I’ve been studying data reuse in scholarly communities. That work has evolved into studying scholars’ data management and sharing practices and the library’s role in supporting these kinds of activities. My goal is to inform the design and delivery of research data management programs in ways that increase data’s value.

Aligning needs for effective data flows

The needs of scholars who create, manage, and share data differ from the needs of those who discover, access, and reuse it. If we want data to flow through the life cycle more effectively, we must consider how different phases positively and negatively influence each other and what adjustments can be made to better align needs. My colleagues Elizabeth Yakel and Zachary J. Maiorana and I were able to do just that for The Anatolia Project.


The project involved 12 zooarchaeologists and two data curators working collaboratively to share, curate, and reuse the data from 14 archaeological sites in Anatolia (now primarily Turkey). It brought together animal bone data collected from many sites, over a long time period and a broad geographic range, to answer questions about animal domestication and the transition from a hunter-gatherer to an agrarian society in a more comprehensive way. The study was published in PLOS One and the data were published in Open Context. It was a perfect opportunity for us to study the impact data practices have on key life cycle phases, in order to inform the work of intermediaries that support them.

Steering data through life cycle phases

In our study, the negative impact that inefficient data production practices had on later phases of the life cycle was stark. If it weren’t for the curatorial interventions that steered data through the life cycle phases, the data wouldn’t have been reusable. But interventions after the fact are time- and effort-intensive. Like many, we believe intervening earlier in the data life cycle is critical.

I’d like to focus on three ways archaeological data were steered through the life cycle that other information specialists can adapt:

  1. Create transparency throughout the life cycle
  2. Discuss data selection
  3. Apply consistent data standards

By working together to improve these outputs, we greatly increase the likelihood that data will be reused in ways that increase the ROI on original research.

1. Create transparency throughout the life cycle

A few researchers had the foresight to discuss data quality with curators prior to sharing as a means to create transparency around what data they collected and how the data were recorded. But for most, it was exposure to others’ data during reuse and/or questions resulting from the data they shared that will have the most profound influence on their data collection and recording practices going forward. As one of the zooarchaeologists said during our study:

“When I worked with other people’s data … That also made me more conscious about the way I look at my own data … it actually has changed, tremendously changed the way I look at my own data and data collection.”

The transparency created through increased access to data via sharing and reuse changed scholars’ perceptions around acceptable data management and documentation practices within their discipline.

Steering data production: Give researchers access to well-curated data within their discipline to prime discussions about data quality prior to data production and to shape perceptions about data publishing norms.

2. Discuss data selection

When we talk about data selection, it’s usually from the perspective of repository staff building data collections. Even though data producers were encouraged to share all of their data for The Anatolia Project, some made data selection decisions without telling the data curators. The decisions were made for various personal, practical, and legal reasons, and they balanced the perceived time and effort required to prepare data against what producers thought re-users would need. One zooarchaeologist explained how she “cleaned” the data, which included deleting data she didn’t think necessary to share.

“I also cleaned everything as much as possible, so for example, I only sent sort of standard things like species. So, I didn’t send for example butchery marks, which I have a very specific code for … it wasn’t necessary for this particular data sharing so that was cut …”

Steering data selection: Require data producers to specify their data selection decisions and rationale, so both can be discussed, negotiated, and documented.
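For teams that want to make this concrete, one possible way to record selection decisions is a small manifest that travels with the dataset. The sketch below is illustrative only: the column names, the SelectionDecision structure, and the helper functions are hypothetical and are not drawn from The Anatolia Project or Open Context.

```python
from dataclasses import dataclass


@dataclass
class SelectionDecision:
    """One recorded decision about whether a column is shared, and why."""
    column: str
    shared: bool
    rationale: str


def undocumented_columns(all_columns, decisions):
    """Return columns that have no recorded decision, in their original order."""
    documented = {d.column for d in decisions}
    return [c for c in all_columns if c not in documented]


def omitted_columns(decisions):
    """Return (column, rationale) pairs for columns the producer chose not to share."""
    return [(d.column, d.rationale) for d in decisions if not d.shared]


# Hypothetical columns in a producer's original dataset.
columns = ["species", "element", "butchery_marks", "tooth_wear"]

# Hypothetical decisions the producer has documented so far.
decisions = [
    SelectionDecision("species", True, "standard field needed by re-users"),
    SelectionDecision("butchery_marks", False, "recorded with a project-specific code"),
]

print(undocumented_columns(columns, decisions))  # ['element', 'tooth_wear']
print(omitted_columns(decisions))
```

A curator could use the list of undocumented columns as the agenda for a conversation with the data producer, and keep the omitted columns and their rationale alongside the published data.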

3. Apply consistent data standards

Developing and using data standards during data production is great, but only if the standard is applied consistently. We found researchers introduced personal and regional variations to Payne’s 1973 system for categorizing tooth eruption and wear, a well-established standard in archaeology. This not only caused confusion when trying to understand the data, but also limited data integration. As one of the curators explained:

“We do have tooth wear data, but it just wasn’t in a format that could be clearly integrated … We could provide all of that to the analysts, but it will be a lot of columns of pretty disparate data … ”

After much back and forth, the curators were able to sort out the different recording schemes and apply Payne’s standard as specified.

Steering data standards: Work with disciplinary communities and repository staff to create data content and data value guidelines that ensure researchers apply and record data standards in the same way.
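As a rough illustration of what checking for consistent application might look like, the sketch below compares recorded values against an agreed list of codes and reports where the variations occur. The codes shown are placeholders chosen for the example, not Payne’s actual scheme, and the function name and normalization rule (trimming whitespace and ignoring case) are assumptions.

```python
# Placeholder vocabulary for illustration only; NOT the actual Payne (1973) codes.
ALLOWED_CODES = {"A", "B", "C", "D"}


def find_nonstandard_values(values, allowed):
    """Map each value outside the agreed vocabulary to the row indexes where it appears.

    Values are compared after trimming whitespace and upper-casing, so "b" is
    accepted as "B", while annotated variants such as "C?" are reported.
    """
    report = {}
    for row, value in enumerate(values):
        if value.strip().upper() not in allowed:
            report.setdefault(value, []).append(row)
    return report


# Hypothetical tooth wear column containing personal variations.
recorded = ["A", "b", "C?", "early D", "B"]

print(find_nonstandard_values(recorded, ALLOWED_CODES))
# {'C?': [2], 'early D': [3]}
```

Running a check like this before data are submitted would surface variations while the producer can still explain them, rather than leaving curators to sort them out afterward.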

Amplifying the value of research

As professionals in libraries, archives, and museums, we are responsible for meeting researchers’ data production and reuse needs. We are uniquely positioned to use our insights to help align those needs throughout the life cycle. By applying these lessons to data at our institutions, we will multiply the value of research data and increase the visibility, relevance, and return on investment of the work we do every day.

That’s a great achievement for any library, museum, or archive to crow about.