Getting started with linked data

2016-02-04-Roy-linked-data

“Linked data” is a popular topic at library conferences these days, drawing overflow crowds wondering what it might mean for their institutions and their own professional development. Why? Because linked data is structured so that computers can interpret it. That creates opportunities for improved library workflows, enhanced user experiences, and discovery of library collections through a variety of popular sites and Web services, including Google, Wikipedia and social networks.

Before briefly explaining the nuts and bolts of linked data, it is important to point out that for most libraries and librarians, the sea change from records full of text strings to fully linked data elements will occur largely under the radar.

While there will likely be changes to workflows and certain operational procedures, most of the change will be absorbed by the automated systems we use and by our underlying bibliographic processes. That is why OCLC has spent years working to understand the benefits and opportunities of linked data. Even so, a basic understanding of what linked data is will help you make the most of the opportunities it offers.


A triple by any other name

The essential concept of linked data is actually quite simple: you are stating, in a machine-understandable way, that something has a particular kind of relationship with something else. For example, in human terms you can say that the person William Shakespeare has a relationship of “author” with the work “Hamlet”:

William Shakespeare -> is the author of -> Hamlet

That three-part statement, made up of a subject, a predicate (the relationship) and an object, is known in the world of linked data as a “triple.”

For a machine to process that information, it must be encoded in an appropriate computer language. And to be absolutely clear about which things you are linking, each string of characters (“William Shakespeare,” “is the author of” and “Hamlet” in the example above) should be replaced with a URL that points to a machine-readable description of that element:

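http://worldcat.org/entity/work/id/1154449927 (Hamlet) -> http://schema.org/author (has as its author) -> http://viaf.org/viaf/96994048 (Shakespeare)

Because schema.org defines “author” as a property of the work that points to the person, the machine-readable version reads from Hamlet to Shakespeare.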
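
To make the structure concrete, here is a minimal sketch in Python (standard library only; it is not OCLC code) that holds the example triple as three URIs and prints it as one line of N-Triples, a W3C plain-text format for RDF data. The work is the subject and the person is the object, matching how schema.org defines its `author` property. The helper name `to_ntriples` is hypothetical.

```python
# A minimal sketch: one triple as a (subject, predicate, object) tuple of URIs,
# written out as a single line of N-Triples.

HAMLET = "http://worldcat.org/entity/work/id/1154449927"  # WorldCat Work URI
AUTHOR = "http://schema.org/author"                       # schema.org "author" property
SHAKESPEARE = "http://viaf.org/viaf/96994048"             # VIAF URI for Shakespeare

# schema.org attaches "author" to the creative work and points it at the person,
# so the work is the subject and the person is the object.
triple = (HAMLET, AUTHOR, SHAKESPEARE)


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Format a triple whose three parts are all URIs as one N-Triples line.

    Assumes the URIs contain no characters that N-Triples would require escaping.
    """
    return f"<{subject}> <{predicate}> <{obj}> ."


print(to_ntriples(*triple))
```

Every part of the statement is a URL rather than a text string, which is what lets a program tell this Hamlet apart from any other thing that happens to share the name.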

This method also makes it possible to discover more about each part of the triple, because each URL can be looked up to retrieve a description of that thing, and those descriptions can include relationships to data held in other data stores.
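
In practice, finding out more about a URL means fetching it over the Web and reading the description that comes back. The sketch below fakes that step with a small in-memory list of triples so that it runs offline; the `example.org` address stands in for a description held in some other data store and is hypothetical, as are the helper names `describe` and `discover`. Starting from the Hamlet URL, it follows links breadth-first and reports every thing it reaches.

```python
from collections import deque

# Illustrative only: an in-memory stand-in for descriptions that would normally be
# retrieved by looking up each URL. The example.org URI is a hypothetical placeholder.
SCHEMA = "http://schema.org/"
HAMLET = "http://worldcat.org/entity/work/id/1154449927"
SHAKESPEARE = "http://viaf.org/viaf/96994048"
ELSEWHERE = "http://example.org/other-dataset/shakespeare"  # hypothetical

GRAPH = [
    (HAMLET, SCHEMA + "name", "Hamlet"),
    (HAMLET, SCHEMA + "author", SHAKESPEARE),
    (SHAKESPEARE, SCHEMA + "name", "William Shakespeare"),
    (SHAKESPEARE, SCHEMA + "sameAs", ELSEWHERE),
]


def describe(uri):
    """Return every triple whose subject is `uri` (what looking up that URL might yield)."""
    return [t for t in GRAPH if t[0] == uri]


def discover(start):
    """Follow links breadth-first from `start`, returning every URI reached, in order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for _, _, obj in describe(current):
            # Objects that are URLs are links to follow; anything else is a plain value.
            if obj.startswith("http") and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


print(discover(HAMLET))
```

A single starting link leads to the author and, through the author, to data kept somewhere else, which is the kind of discovery that plain text strings cannot support.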

How this relates to library data

OCLC has worked for many years to leverage the value of WorldCat, our massive aggregation of library bibliographic data, to establish these kinds of machine-understandable relationships. We do this in a variety of ways, from creating new kinds of services such as WorldCat Identities to publishing linked data for Web search engines and other uses.

OCLC Research has developed techniques for data normalization, deduplication, and enhancement that continuously improve the aggregated data our member libraries rely upon. We also produce regular statistical reports on the use of MARC bibliographic elements, which show how the profession has used its foundational standard over time. To see them, go to http://experimental.worldcat.org/marcusage/ .
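
The reports at that address are OCLC's own, and their method is not described here. As a rough illustration only, the sketch below assumes the simplest form such a statistic could take: the share of records that use each MARC field at least once, computed over three hypothetical records reduced to their field tags. The function name `tag_usage` is also hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each record is reduced to the list of MARC field tags it uses.
records = [
    ["001", "100", "245", "260", "650"],
    ["001", "100", "245", "264", "650", "650"],
    ["001", "245", "260"],
]


def tag_usage(records):
    """Return, for each tag, the share of records that contain it at least once."""
    counts = Counter()
    for tags in records:
        counts.update(set(tags))  # a repeated field counts only once per record
    return {tag: n / len(records) for tag, n in sorted(counts.items())}


for tag, share in tag_usage(records).items():
    print(f"{tag}: {share:.0%}")
```

Counting each field once per record measures how widely a field is used across the catalog rather than how often it repeats inside individual records.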

Our work in linked data leads the industry in the creation of authoritative linked data entities for individuals and works. These linked data assertions are constructed not by a simple one-to-one translation from a bibliographic record to linked data, but by mining meaning from the entirety of WorldCat.

For example, every occurrence of William Shakespeare across WorldCat records is aggregated and reviewed to produce one canonical linked data description of that author. This linked data work is a key part of the value we offer our member libraries, and it sets that offering apart.
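
OCLC's actual process draws on the whole of WorldCat and includes review; it is not spelled out here. As a much simpler illustration of the underlying idea (collapsing many name strings into a single entity), the sketch below builds a crude match key for each heading with regular expressions, groups headings that share a key, and labels each group with its most frequent form. The headings, the matching rule and the function names `name_key` and `cluster` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical name strings as they might appear across many catalog records.
headings = [
    "Shakespeare, William, 1564-1616.",
    "Shakespeare, William, 1564-1616.",
    "Shakespeare, William, 1564-1616",
    "Shakespeare, William",
    "William Shakespeare",
    "Marlowe, Christopher, 1564-1593.",
]


def name_key(heading: str) -> str:
    """Crude match key: drop life dates and punctuation, use 'first last' order, lowercase."""
    text = re.sub(r"\d{4}-\d{4}", "", heading)  # remove dates such as 1564-1616
    parts = [p.strip() for p in text.split(",") if p.strip(" .")]
    if len(parts) >= 2:  # inverted "Last, First" form
        parts = [parts[1], parts[0]]
    return re.sub(r"[^a-z ]", "", " ".join(parts).lower()).strip()


def cluster(headings):
    """Group headings by match key; return key -> (most frequent form, number of headings)."""
    groups = defaultdict(list)
    for heading in headings:
        groups[name_key(heading)].append(heading)
    return {
        key: (Counter(forms).most_common(1)[0][0], len(forms))
        for key, forms in groups.items()
    }


for key, (label, count) in cluster(headings).items():
    print(f"{key}: {label!r} ({count} headings)")
```

A key this crude would wrongly merge two different people who share a name, which is one reason aggregated results need review rather than string matching alone.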

Here is an example of a Work record that aggregates information from all of the manifestations of that work: http://worldcat.org/entity/work/id/2406166 .

The potential

In the future, we expect to provide data services to our member libraries that could include record conversion from MARC to BIBFRAME, linked data entity resolution and enhancement, and other new kinds of data services that libraries will increasingly require.

Already underway is a Person Entity Pilot project, now in its second phase, which we are using to better understand library use cases for linked data and the types of services libraries will require to support their local use of linked data resources.

When you contribute your records to WorldCat, your data becomes part of this rich ecosystem of modern data services, which will enhance your existing services and open opportunities for new kinds of library services that better serve your community.