In January 2020, OCLC announced that the Andrew W. Mellon Foundation had awarded us a grant to build a shared entity management infrastructure that supports libraries as they move toward new ways to create and share information about their collections. These new methods, commonly referred to as “linked data,” represent changes to both the underlying library data and the types of activities that library workers perform.
Even more importantly, they also signal a shift in how the library community can work together to build on each other’s work. I believe that no matter what type of library you are associated with, you and your users will benefit from this project.
A six-month checkpoint
In a recent post, we talked about Project Passage, which offered insights into some new interfaces and systems for librarians to create linked data. But those interfaces aren’t the whole story. We also asked participants what large data providers, such as OCLC, would need to do to make these systems work in practice.
As OCLC’s past, present, and future are deeply intertwined with the notion of shared cataloging, I thought a six-month checkpoint for this new grant would provide a good time to talk about what our work here means for you and your library. When operationalized, linked data will provide participating libraries with:
- A massive collection of descriptive information and identifiers for creative works, persons, and other things libraries need to refer to
- The capability to enhance these descriptions, or add them for things missing from the collection
- An ecosystem (including a lightweight UI and APIs) that will allow library workers to create linked data natively, instead of through conversion from MARC
- Tools to reconcile local library metadata with that of the ecosystem, and connect library metadata with nonlibrary sources
It will also seed the web with identifiers that are meaningful to both library users and workers. We’ll be creating and publishing data on many millions of creative works, and persons associated with them—providing critical links for both describing and discovering our collections. By referring to these creative works and persons using consistent identifiers, applications will be able to make connections across disparate or diffuse collections.
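To make the idea of connecting collections through consistent identifiers a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the project's data model or API: the record layout, the `creator_uri` field, the `link_by_identifier` function, and the `example.org` URIs are all assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical records from two unrelated collections. The URIs are
# made-up placeholders, not identifiers issued by any real service.
library_catalog = [
    {"title": "Beloved", "creator_uri": "https://example.org/entity/person/1"},
    {"title": "Dubliners", "creator_uri": "https://example.org/entity/person/2"},
]
digital_archive = [
    {"title": "Interview recording, 1987",
     "creator_uri": "https://example.org/entity/person/1"},
]


def link_by_identifier(*collections):
    """Group item titles from any number of collections by creator URI."""
    linked = defaultdict(list)
    for collection in collections:
        for record in collection:
            linked[record["creator_uri"]].append(record["title"])
    return dict(linked)


connections = link_by_identifier(library_catalog, digital_archive)

# Items that share a URI end up together, whichever collection they came from.
for uri, titles in connections.items():
    print(uri, titles)
```

Because both collections refer to the same person with the same URI, the catalog record and the archive recording are grouped together without any matching on name strings, which is the kind of connection that consistent identifiers are meant to enable.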
A quick start
After hearing in late December about the impending grant award, we assembled a technical team and got to work. By the time the announcement was published on 9 January, this core team had already set up the technical tools and environments to begin development. OCLC is contributing a substantial amount of staff time and dollars to the project, essentially matching the Mellon award and allowing us to integrate the work into many other existing processes.
Our team is addressing each of the items above during the grant process, with a particular focus on providing linked metadata “at the point of need,” during the creation process. By making lookups faster and reducing the cycle time of traditional authority work, the project will make it easier and faster to include linked data Uniform Resource Identifiers (URIs) in all types of metadata.
That means no matter what type of metadata work you are doing—traditional MARC-based cataloging, a mix of Dublin Core and local terms in your digital repository, or full-on BIBFRAME—you will be able to make use of these identifiers. And as other libraries start to use them, providing your users context and additional content will be easier than ever.
Given the length of the project (a two-year grant wrapping up in December 2021), we knew it was important to keep lines of communication with libraries open. The field of linked data is rapidly evolving, and we want to make sure that at the end of the project we are meeting expectations and advancing the field.
In June, we welcomed more libraries to our Entity Management Advisory Group, which is the primary setting for conversations about the project. Twenty-five libraries from seven countries currently participate through online discussions and in monthly meetings. So far the topics have included APIs, UX Research, and data modeling.
We have continued and intensified our partnership with the Linked Data for Production (LD4P) project, also funded by Mellon, and with the Program for Cooperative Cataloging (PCC), who have made linked data and concepts of “identity management” a core part of their strategy.
To better meet library workers’ needs, the User Experience Research staff at OCLC have also held focused interviews with dozens of librarians to shed light on the challenges and concerns around working with these new concepts in real-life workflows.
A first milestone … with more to come
There are other types of input that can be gathered only from users looking at and interacting with data and data tools. To that end, we recently offered a first look at the work we have been doing to a select set of libraries from the Advisory Group. This testing is the first of three preliminary checkpoints before the final, official release in December 2021. Libraries will be able to review data assembled for more than 1 million creative works and persons associated with their creation and subject matter. They will also be able to make use of both a simple user interface and a set of APIs.
While these tests are for a very early, limited set of functionality that we are already working to surpass, they are an important step for this critical project.
Linked data represents a way to dramatically increase the utility of library metadata for those of us who work in libraries and for our users. It’s also a way to increase our ability to partner with outside organizations that have data we can use productively, and that value the kinds of work we do. This significant step into operationalizing linked data and creating a shared infrastructure for the community is truly exciting for us here at OCLC, the libraries working with us, and partner organizations like the Mellon Foundation. I look forward to sharing more information about the project later this year.
For a look at recent developments, please see the recording of our 29 July 2020 webinar, “OCLC and Linked Data: Moving from research to reality.” The event featured speakers from OCLC and Temple University who discussed the CONTENTdm linked data project and the shared entity management infrastructure project.