Collective collections: Prospective and retrospective collection coordination

Collective collections are collections addressed at a level above the individual institution. I introduced our recent report on operationalizing collective collections in my last post, where I noted that there is a trend towards more managed collective collections, particularly within consortial settings. Collection coordination is central to building collective collections.

In the report we make an important distinction between two approaches to coordinating collective collections: retrospective collection coordination and prospective collection coordination. This is really about the level of coordination of collection development strategy and practice across a consortium or group. (I comment on our terminology choice at the end of this brief post.)

In the current model, collections are optimized locally and only retrospectively considered consortially. Shared approaches are layered over relatively autonomously developed collections. We can identify three important strands of retrospective collections activity, in each of which the BTAA has been active:

  1. Resource sharing: providing access to the resources of the full network to all members. Of course, resource sharing is evolving in interesting ways, becoming a channel for interacting with a variety of fulfilment networks, spot acquisitions, open access resources, and so on. One outcome of this is that we will see greater convergence between discovery, resource sharing, collection development and acquisitions. We can expect to see the links between these service areas become more automated and data driven. So, in a simple example, acquisition of articles or books may be triggered at some level of demand as indicated by discovery or resource sharing patterns of use.
  2. Shared print: coordinating stewardship of the print and scholarly record at the network level. This has been a major consortial interest in recent years. It is being coordinated by existing resource sharing networks, but new affiliations have also emerged to address shared print specifically, and we are seeing nascent collaboration across these.
  3. Digitization: selective digitization of collections. Digitization was given a boost by the Google initiative, and HathiTrust has emerged as an important collaboration, consolidating access, management and preservation.
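
The simple example of demand-triggered acquisition in the first strand above can be sketched in a few lines. This is only an illustration under stated assumptions: the request log, the `titles_to_acquire` function and the threshold of three requests are all hypothetical, and a real service would draw on discovery and resource sharing data in much richer ways.

```python
from collections import Counter

# Hypothetical threshold: how many requests for a title suggest a purchase.
PURCHASE_THRESHOLD = 3


def titles_to_acquire(requests, threshold=PURCHASE_THRESHOLD):
    """Return, in sorted order, the titles requested at least `threshold` times."""
    counts = Counter(requests)
    return sorted(title for title, n in counts.items() if n >= threshold)


# Illustrative resource sharing requests, one entry per request.
requests = ["Title A", "Title B", "Title A", "Title C", "Title A", "Title B"]
print(titles_to_acquire(requests))  # ['Title A']
```

Lowering the threshold makes acquisition more responsive to demand, at the cost of buying more material that may only be wanted briefly.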

These areas have tended to be managed independently. There is great opportunity to coordinate them more strongly at the consortial level. Shared print and resource sharing are alike aspects of collections logistics, for example, thinking about optimal distribution of collections for rapid delivery across a consortium. ReCAP is particularly interesting in this regard, representing a hub which consolidates print collections, a logistical infrastructure, and stewardship responsibilities. And digitization could be driven by demand in context. This begins to move us towards a more prospective model.

A prospective collection coordination model offers stronger coordination of collecting activity across the network. This could be based on some division of collecting responsibilities and better awareness of systemwide distribution of collections, or even on a move of some decision-making and budget to the center (shared acquisition of e-books for example, or of open access collections). In this model, collecting activity could be optimized across the network rather than at the individual library level. An approach might be based on type of material, or publisher, or subject. Over time, addressing particular needs at a shared level might allow institutions to specialize in areas of local expertise or distinctive research interests. In this model shared print is not a retrospective rationalization of the collective collection, but a formative element of its development.

Of course, consortia are used to working with a version of this model for licensed materials, but it is not common for purchased or print materials.

While attractive in theory, prospective collection coordination has proved difficult in practice and is not widely adopted outside of some niche areas. There is some concern that coordination costs are high, potentially reducing local responsiveness (to particular faculty or learning needs), and there is reluctance to surrender local decision making or local control over budgets.

While there is some acceptance that the retrospective model of coordinating relatively autonomous collections is inefficient, it has been easier to achieve in practice than a prospective model. This is because it does not require strong coordination of institutional collection strategies. It also seems that the risks of working with legacy collections are perceived to be less than those associated with prospective collection development, which will require a new level of mutual interdependence and ceding of control or budget.

In our discussions with BTAA and other librarians we sensed a real desire to make progress with a prospective coordination model, while acknowledging the difficulties.

Of course, the two are not mutually exclusive and if prospective collection coordination were to become more common it would exist in concert with the retrospective approaches we identify. More optimal distribution of collections would make the whole system more efficient.


A note on words

We found the use of ‘coordination’ and ‘prospective’/‘retrospective’ useful in various ways.

Our use of coordination has two aspects. First, and more broadly, we discuss library consortial activity as involving various degrees of coordination along a spectrum from consolidation to relative autonomy. "Coordination" admits of some possible variability in approach. In particular we wanted to emphasise that the degree of coordination is a negotiated outcome – not a defined given. This is related to the second aspect: the term is free of the legacy associations or preconceptions that "collection development" or "collaborative collection development" often inspire. In particular, "collaborative collection development" comes with a host of strong opinions and memories of less than successful outcomes. In summary, "collection coordination" acknowledges a discretionary approach and is neutral in respect of past practice or expectations.

Our use of “retrospective” and “prospective” is influenced by Karla Strieb’s usage: “Shared Collections: Collaborative Stewardship, Chapter 1: Collaboration: The Master Key to Unlocking Twenty-First-Century Library Collections.”

Acknowledgements: This blog entry is based on the text of Operationalizing the Big Collective Collection by Lorcan Dempsey, Constance Malpas, and Mark Sandler.

Our thinking was influenced by a broad range of discussions with colleagues in BTAA libraries, in OCLC, and in other organizations with an interest in these topics. A full list of acknowledgements is included in the report.

A reservoir not an ocean – visualizing and operationalizing collective collections

When I think of the Google Books initiative now, three things stick with me. The first is simply what an audacious idea it was – to digitize all the books. The second is that without it, the book literature is less accessible than the web literature, which seems a pity. Google Books has allowed fine-grained discovery over the topics, people, places and so on which otherwise would largely be hidden between the covers.

The third is more subtle, but marks an interesting shift in how we think about library collections and books in general.  Before the initiative, we thought of the books in library collections as a vast expanse. We could not see the edges. They were like an ocean. Afterwards, the aggregate library collection appeared more bounded, more finite. More like a reservoir which could be measured and managed. And as Google spoke with various libraries about filling out parts of their digital corpus this became clearer. Indeed, it made it more realistic to actually talk about an ‘aggregate library collection.’

OCLC and WorldCat have played an important role in this shift also. WorldCat represents the holdings of thousands of libraries around the world. It is the best available proxy for the aggregate library collection and by extension for the scholarly and cultural record of which libraries are the steward. In recent years, we have looked at providing an empirical base for discussions about that aggregate collection. This has meant that we can talk about the aggregate library collection or the collective library collection while having a real sense of its contours, in whole or at different levels (the collections in a particular region, for example, or the intellectual output of a particular country). While much of this work has been in North America, we have also done work with library collections elsewhere.

In fact, an important early analysis of this kind was of the original ‘Google 5’ libraries who participated in the ‘Google Print Library Project.’ The findings here have been confirmed over subsequent investigations, largely carried out by Constance Malpas and Brian Lavoie. Especially notable is the finding that, in my colleague Brian Lavoie’s words, rareness is common – many libraries do in fact have materials that are not widely held. Which in turn leads to the need to have wide library participation to ensure broad coverage of the published record (in resource sharing, digitization, or other initiatives).
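
To make the idea that rareness is common a little more concrete, here is a minimal sketch of the kind of calculation involved: counting how many libraries hold each title and asking what share of titles are held by only one. The three library collections and the function names are invented for illustration; they are not drawn from WorldCat or from the analyses mentioned above.

```python
from collections import Counter


def holdings_counts(collections):
    """Map each title to the number of libraries that hold it."""
    counts = Counter()
    for titles in collections.values():
        counts.update(set(titles))  # count each title once per library
    return counts


def rare_share(collections, max_holders=1):
    """Share of distinct titles held by at most `max_holders` libraries."""
    counts = holdings_counts(collections)
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n <= max_holders)
    return rare / len(counts)


# Hypothetical holdings: library name -> set of title identifiers.
collections = {
    "Library 1": {"a", "b", "c"},
    "Library 2": {"a", "d"},
    "Library 3": {"a", "b", "e"},
}
print(rare_share(collections))  # 0.6
```

In this toy example each library holds one title that no other library has, which is why broad participation matters for coverage: dropping any one library loses part of the collective collection.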

Brian has just written about some of that work  – mining WorldCat for insights about the aggregate library collection and the characteristics of the scholarly and cultural record it represents. He presents an updated version of a map of library collections in the US and Canada, originally created several years ago by our colleague J.D. Shipengrover for our Print management at mega-scale report. This shows the concentration of library collections laid out over Richard Florida’s mega-regions framework.


I think that this picture – and the variety of analyses that accompany it – has been influential in the shift I mentioned above – the shift to thinking about library collections as a reservoir. It puts a real shape on the distribution of the aggregate North American library collection, quantifies it, and hints at the type of regional discussion about shared management that we are now seeing.

An important aspect of this more bounded view is the emergence of what I call collective collections – where collections are addressed at various aggregate levels above the individual institution.

A collective collection might be realized at different levels of integration: through data (a visualization, for example), or through discovery (what is available through a particular discovery interface), or through federating applications as in consortial borrowing (what is available for lending), or through actual physical consolidation (at CRL, CAVAL, ReCAP, or some other shared storage), or through some other agreement. A collective collection might also be at different geographic scales: state, regional, national and so on.

Here are some brief observations about collective collections, in terms of how they are managed and in terms of implications for their use.

  1. Visualising and analysing the collective collection: Where we have aggregate metadata or full-text from collective collections, we can visualize them in interesting ways, identify patterns in their composition, and draw out useful inferences touching on a range of issues. This might be about the management and disposition of collections (as in the mega-scale report mentioned above) or about the body of knowledge represented by those collections (as for example, in our study of the Irish published record, summarized here). Of particular interest in the latter case is the recent announcement that the HathiTrust Research Center now “provides access to the text of the complete 16.7-million-item HathiTrust corpus for non-consumptive research, such as data mining and computational analysis, including items protected by copyright.”
  2. Operationalizing the collective collection: In recent years, the management of print library collections at the collective level has become an active area of discussion, planning and implementation. Existing organizations have taken this on as an activity (e.g. BTAA) and some new organizations have emerged (e.g. WEST, EAST). Managing the collective collections of these groups has had a regional flavor, which makes sense from a logistics point of view.
    Decisions about consolidating print collections are also influenced by the availability of digital surrogates. The Internet Archive and Google Books are important in this context. The role of HathiTrust is of special interest, given its mission to preserve the scholarly record and the fact that it is collaboratively managed within the library community. The relationship between the digital resource curated by HathiTrust, the print resource curated by CRL, and the arrangements put in place by shared print collaborations will be more closely managed within an evolving ecosystem of services.
    These activities are about retrospective collection development, configuring materials that have already been added to library collections. At the same time, the changing role of the book collection, the emergence of shared approaches to management, and the availability of new analytics services will make prospective, collaborative collection development more feasible than it has been in the past, as libraries do actually think about building and maintaining their collections in a collective context.
  3. Calibrating the collective collection: WorldCat (and union catalogues in other parts of the world) have emerged as important sources of comparative intelligence as libraries begin to look at print collections in aggregate and think about managing particular collective collections. And also as libraries want to calibrate their collections against a regional, national or broader context. Or as they want to calibrate their collection against existing digitized collections. Or as they want to think about prospective collection development in collaboration with peers. Having intelligence about collections at various aggregate levels is now increasingly important, as libraries want to make responsible decisions within a regional group or in the context of general availability, and as groups want to make decisions about coverage and overlap.
    Increasingly, libraries want to balance individual managing down of collections with a collective responsibility to the print scholarly and cultural record. And of course, different libraries and groups will recognise different responsibilities here. The ability to record retention commitments in WorldCat is an important infrastructural element, as libraries and groups can disclose important intelligence about their collections to help decision making.

The management of collective collections also influences how libraries organize themselves and their services. Here are two areas of note in this context.

  1. Collaboration at scale. Operationalising collective collections will happen inside consortial arrangements, existing, or as noted above, newly created.  Much of this activity will have a regional scale, given the logistics of print materials. Although there is also a desire to aggregate or coordinate at higher levels and certainly there is advantage in having consistent policy frameworks.  At some stage we might expect to see a level of coordination which assures the integrity of the print scholarly and cultural record.

    At the same time, some activity may be better carried out at network scale. It is useful to have data at this level for example and OCLC supports a network of over 16000 libraries who collaborate at scale to describe, discover and share their collections. This has resulted in important infrastructure in WorldCat and associated services which is widely relied on. HathiTrust is in an interesting situation – will it evolve the organizational and sustainability models for it to become a persistent part of  infrastructure, widely relied on in the library community and beyond? Will it scale to support library collaboration at network level?

    In this way, the collective collection throws up questions about models and issues of collaboration at several levels.  I have discussed many of these questions elsewhere.

  2. Library logistics.  Logistics is about moving materials quickly and efficiently through networks. Library operations increasingly have a logistics flavor – thinking about how best to manage stocks and flows. We will see increasing interaction between collection development, resource sharing, shared print and digitization as libraries look at an ecosystem of services around effective management and delivery of collections. This will be multiscalar (Ohio State participates in OhioLINK, BTAA and HathiTrust, for example). And it will involve interesting decisions about consolidation vs federation (of collections, data and applications), which in turn involves various tradeoffs, notably between efficiency and control at different levels (what sort of authority, for example, will individual libraries cede to consortia of which they are a part).

Libraries have entered a phase where they are now looking at managing print collections as a finite resource, and where there is systemwide attention to collective collections. Collections are reservoirs to be managed.  We are in a period of organizational design and innovation as new frameworks for thinking about and managing collective collections are put in place.

Thanks to my colleague Brian Lavoie for comments on an earlier version of this post.

The facilitated collection

Collections have been central to library identity – we have discussed how library collections are changing in a network environment elsewhere (Collection Directions: The Evolution of Library Collections and Collecting – PDF). Support for the discovery, curation and creation of resources in research and learning practices continues to evolve. In this blog entry I discuss one element of these changes, the emergence of what I call the facilitated collection, a coordinated mix of local, external and collaborative services assembled around user needs.

Collections and a print logic

Library collections were strongly shaped by a print logic. This required the distribution of print copies to multiple local destinations. In this way, materials could be closer to the user, to allow immediate access. This had two consequences. First, collections were assembled on a ‘just in case’ basis. And, second, the size of the collection was strongly associated with the goodness of the collection. The larger the local collection, the more potential local requirements could be met. The library collection was an owned collection.  Of course, the library collection is no longer entirely owned (think of the importance of licensed collections, for example) but this was an important shaping influence on our collections and still influences our thinking about them. And, certainly, the size of a locally owned collection is still important in popular perceptions of ‘goodness’. Just look at library job adverts or university promotional materials for potential students where it is common to mention collection size.

The facilitated collection is organized according to a network logic

The network environment is very different in several important ways. These are well-known, but it is useful to list some.

  1. The need for local physical distribution is much diminished as materials are available on the network. The print model was one of scarcity, requiring distribution to local nodes. The network model is one of abundance, encouraging aggregation at central hubs. The network is rich with informational opportunities – think of reference (Wikipedia), education (Khan Academy), discovery (Google Scholar, Amazon, Goodreads), research networking and social discovery (ResearchGate), software management (Github), data storage and manipulation (OpenRefine, FigShare), and so on.
  2. Discovery has been peeled away from the local collection, and a variety of network-level discovery venues exist (Google, Google Scholar, ResearchGate, and so on). Discovery often happens elsewhere, often going far beyond the collection.
  3. Creation activity happens in a digital environment, with a growth of interest in the process as well as the products of scholarship and learning. Support for digital scholarship and research data management is emerging as services around creation join those around discovery or consumption. We see a blurring between content, workflow and identity (think about researcher profiles and research networking sites). There is a growing interest in sharing research and learning outputs from the institution with external users – research data, for example. Together, these are becoming an important focus for academic libraries.
Collections spectrum

The emergence of the facilitated collection is one element of this changing picture. The facilitated collection is organized according to a network logic, where a coordinated mix of local, external and collaborative services is assembled around user needs. This aims to meet research and learning needs in the best ways available, and not just by assembling material locally. This is actually a significant shift in how the library thinks about what it does.

Here are some central strands of the facilitated collection.

1. The external collection

Libraries now provide access to many network resources they do not own or license. These include guided access to Google Scholar (there is an incentive to provide proxied access to this so that links to licensed resources work in a well-seamed way), inclusion of ‘free’ ebook resources in the catalog (e.g. HathiTrust collections), or pointing to various resources with the very popular LibGuides or other resource guides. Indeed, resource guides are an interesting signal of the facilitated collection as they are organized around user interests rather than around local collections. They may point to local collections, but typically also to externally available resources. The library discovery layer provides another example, where libraries may provide discovery access to resources which they do not hold.

2. The move to licensed and just-in-time

A large part of academic library collections is now licensed. First abstracting and indexing services, then e-journals, then ebooks. This has moved the library away from a local ownership to a licensing model, with known questions about scholarly communication policy. For my purposes here, though, this means that the collection is more elastic as titles are added or dropped, as needs, budgets or priorities change. More recently, we have seen the emergence of demand-driven acquisition (DDA), implemented in various ways. I include it here, because again, it represents a move away from the just in case, owned collection, based on librarian judgement, towards a model which is built around patron behaviors. The library is facilitating access to required materials, rather than always anticipating what those requirements are.

3. Shared – or collective – collections

However large, a purely local collection seems increasingly partial when placed in the context of the universe of potentially interesting resources. There is a growing trend to place local collections in a broader network context. While discussing the future of libraries, John Wilkin [pdf] has distinguished between what is best done locally (the management of space is the obvious example here) and what is best done at the network level. Interestingly, he asserts that the “best example of an activity that can be done most appropriately in a networked context is curation.” And we can indeed see how several manifestations of such collective collections have emerged successively in recent years. Here are some examples.

  1. The ‘borrowed’ collection. Libraries have long organized in resource sharing networks, through OCLC, or through various regional or national infrastructures. Often, these are associated with union catalogs which describe the ‘collective collection’ available for borrowing. A library may belong to several networks. For example, our neighbor in Columbus, Ohio State University, will share resources in OhioLINK, CIC, and OCLC. WorldCat has emerged as an important registry of the borrowable collective collection, spanning thousands of libraries. In this way, the library can facilitate access to a broader collection than is available locally.
  2. The ‘shared print’ collection. The shared print collection is a natural evolution of resource sharing networks, as local collections are managed down and collaborative approaches to collection management emerge. And indeed, we see OhioLINK and the CIC also turn their attention to such shared management. A large part – perhaps the majority? – of library collections will be under shared management within the next decade. In this way, curation of the collective print record is beginning to be advanced in a network context.
  3. The ‘shared digital’ collection. As libraries digitize their collections, it has become clear that very few individual institutions are strong gravitational hubs in themselves. Materials digitized from local collections release greater value when aggregated within larger collections, which can aggregate both supply and demand. In different ways, for example, we have seen HathiTrust, DPLA and Trove emerge to create these aggregations, aiming to more efficiently unite collections and their potential users. WorldCat also aggregates digital materials through the digital collections gateway.
  4. The evolving scholarly record. At one time, the scholarly record comprised the final outputs of research – the journal articles and books. Now, increasingly, there is an interest in a variety of other outputs: methods, working papers, research data, preprints, and so on. In some regimes there is also growing government or funder interest in ensuring broad access to these materials through mandates. Institutions have developed mechanisms for managing and disclosing these, and they are collected into many services for management and/or discovery. These include disciplinary repositories (e.g. arXiv), third party services (e.g. FigShare), national infrastructure services (e.g. Research Data Australia or Narcis in the Netherlands), collaborative approaches like the nascent Share in the US, and so on. So, while research outputs will feature in a variety of venues, including of course publisher services, we are also seeing collaborative educational initiatives in this space.

In each of these cases we can see a shift of focus from locally owned or managed resources to a shared or collective arrangement at the network level. Developments are uneven, but a trend is apparent.

As always, it is instructive to watch job adverts at innovative institutions for signals of change. Consider this interesting advert for an AUL for ‘content and collections’ at the University of Minnesota. It talks about support for local creation (copyright, open scholarship and publishing) which hasn’t been my focus here. But it also makes involvement in shared collections a large part of the role. It makes the increasingly multi-institutional nature of collections clear (‘works collaboratively with other institutions to develop services at scale’ and ‘shaping Libraries programs to contribute digital content to the broader scholarly community’). The advert also makes the multi-scalar nature of this clear: “The AUL contributes to the strategic development of local, regional, and national strategies in preservation, digitization, and access.” Conscious community coordination around shared collections at the network level has become an important activity for many libraries.

The facilitated collection

Some issues around the facilitated collection – towards collections as a service

Our sense of what the library collection is continues to evolve. This is a quick sketch, illustrating a direction. However, it is helpful to note some of the questions it raises. One can cluster these into core issues of organization, stewardship and discovery.

    1. Management of the owned collection (and subsequently the borrowed and licensed collection) shaped library organization (technical services, automation, resource sharing, etc) until recently. New organizational arrangements are emerging, but have not yet crystallized into a general pattern. Consider how libraries are providing support for digital humanities, scholarly communication or digital scholarship in different ways, for example, often aligned with a collections function. I have mentioned job adverts as a useful signal about directions – the increased use of ‘strategist’ in collections job titles is symptomatic of a shift, as people in those roles are asked to make more decisions about allocation of resources and attention. The facilitated collection as I have described it does not map onto a single library service or organizational category – it is emergent and spans organizational categories.
    2. In the ‘owned’ library, libraries had physical custody of the item, which supported clear stewardship lines. This has now changed completely. In fact, the facilitated collection involves different levels of custodial relationship with its components, which can complicate stewardship arrangements. Notably, of course, the licensed collection poses well-known questions around long term stewardship of the electronic journal literature. Stewardship of shared collections requires conscious coordination of institutional actions, interests and policies. Think of the policy and service frameworks that are emerging around shared print initiatives such as the Western Regional Storage Trust (WEST) or the Michigan Shared Print Initiative (Mi-SPI). Or think about discussion of metadata rights in digital aggregations. Going further, what responsibility, if any, does the library take for external resources it points to? This may involve reliance on information partnerships, or in many cases, on no formal relationship at all (as for example where a library loads records for Project Gutenberg into its catalog). This variety in stewardship arrangements, and the emergence of shared collections, complicates the notion of the local collection. It also makes counting difficult or less relevant. For example, local collection counts can change significantly as libraries experiment with plugging in various content sources (think again of Project Gutenberg records in the catalog).
    3. As discovery options increase in the broader network environment, so the relationship between collections and discovery shifts. Demand or patron driven acquisition provides an interesting example. It represents an inversion of the historic discovery/collection relationship: before, the collection drove discovery (the catalog), here discovery drives the collection.

There is some discussion about a shift from collections to services. Another way of thinking about what I have called the facilitated collection here is to move towards thinking about collections as a service. Libraries will continue to build collections, although the level of activity will differ across libraries. At the same time, it seems likely that facilitated collections of various types will grow in importance.

[Thanks to Constance Malpas and Brian Lavoie for helpful comments on this post.]