At Michigan we’re engaged in an activity that I hope will one day seem an ordinary, routine part of library work. Resources from several departments are devoted to determining the copyright status of works typically presumed to be in copyright. For now, we’re focusing on US monographic imprints (books, that is) published between 1923 and 1963, but we plan to turn our attention to non-US publications in the future. I wouldn’t want to give anyone the impression that this is easy work or work without its share of legal perils, but it does feel distinctly like “library work” and, as might be obvious, has significant benefits for library users in an increasingly digital environment. In this post, I won’t be describing our procedures or detailing the pitfalls; instead, I’d like to muse on a few things that have seemed remarkable or interesting. We have also made a proposal to IMLS to ramp up this activity and make it more reliable through collaboration with several other institutions, and before I finish I’ll say a bit more about that.

Impact
One of the most interesting things about all of this is the great impact that’s possible with relatively modest resources. Experienced technical services staff, with considerable experience in bibliographic work such as copy and original cataloging, review publications and use a variety of tools to determine a book’s copyright status. In most cases, those staff can reliably confirm that the work is in the public domain or in copyright. Because their work is driven by content that has already been digitized and is online, once a work has been determined to be in the public domain, we update the records that control access to it and open access.

I say “modest” and I know that this will seem odd to some colleagues at smaller libraries or other types of libraries, but in terms of research library staffing, the 1.5 FTE of professional staffing we devote to this work has a profound effect. Yes, of course more infrastructure is necessary than just these technical services staff members, but the processes in our library IT organization handle input from their work in overwhelmingly automated ways, creating lists of works to be reviewed and updating records about the rights status of the works. Compare this number to any other area of materials processing in a research library and it seems modest. Consider, moreover, the fact that these staff process more than 2,000 titles each month, and that the majority of these works are found to be in the public domain. At our current rate, we’ll open access to over 15,000 titles in about one year of work. That would have been a phenomenal number in the pre-Google days of our digitization (I believe we digitized about 9,000 volumes in our peak year of preservation-related digitization work), and this work focuses on more current publications than we typically see in digitization. For a relatively small sum, we’re benefiting our own constituency as well as readership throughout the Internet. No matter how you cut it, this feels like a good investment of library funds.

What is “library work”?
Once you get past the question of whether digitization is “library work,” it seems frivolous to ask whether this kind of copyright-determination work is also the work of libraries, but I think it’s very clearly a question for some of my colleagues. The copyright determination work involves several steps where our staff are particularly skilled, and the outcomes seem right up our alley. What sort of outcomes? Clearly access is a big one, but one doesn’t need a lot of imagination to understand that our preservation work (particularly of these digital files) is benefited by the increased access. The skills piece is particularly germane, though. Doing this work depends on knowledge of bibliographic description, of the sorts of variability in practice that one sees between a book as it leaves the printer and the ways that it’s described in everything from library catalogs (including OCLC WorldCat) to the Library of Congress’s copyright registration records. It’s also work that depends on recognizing the traps that come along with copyright, like the possibility that a US work was previously published abroad and thus may still be eligible for copyright protection. These things come naturally to a person well versed in bibliographic description: people who have been employed in processes that create the same sorts of records they are now being asked to review.

I’m a big fan of mainstreaming. In a conversation with a library director at another institution, a director who was once director of technical services, I found myself arguing that this work is the work of technical services, much to her dismay. I’m certain some institutions would be tempted to build out a separate unit devoted to this new activity, populated by bright young staff who have never worked their way through descriptive work using everything from the pre-1956 National Union Catalog to ancient card records to the once great variety of “bibliographic utilities” (RLIN, OCLC, WLN and the rest), but what we would lose in that sort of staffing model is a sort of skill that comes along with recognizing variations in descriptive practice and the great variety of publishing practices that we see in the materials themselves. By relying on existing technical services staff members, we have those skills and a sensitivity to the need to create sustainable, routinized activities.


Not the common wisdom
Some readers will have noticed that my numbers don’t add up to what we have generally considered to be the distribution between in-copyright and public domain for US 1923-1963 publications. When we talk about US renewals, lots of numbers are bandied about: some are based on very early analyses, and some are based on reasonable sample-based analyses. The most common estimate is that only 15% of US books published between 1923 and 1963 had their copyright renewed. Our fairly random selection of titles has generated very different numbers: we’re finding renewals for about 30% of the works in our queue, with another 10% having problems complex enough that we can’t yet make a determination (e.g., possible previous foreign publication or the inclusion of works such as short stories or poems by multiple authors).
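For readers who want to check the arithmetic, here is a minimal sketch of how the shares above relate to each other and to our throughput. It assumes that the unresolved titles are not opened, that the shares apply uniformly across the queue, and that the common 15% estimate leaves no unresolved share; the function names are illustrative only, not part of any tool we use.

```python
# Shares taken from the post, as percentages of reviewed titles.
RENEWED = 30                  # renewals we are finding
UNRESOLVED = 10               # too complex to determine so far
COMMON_RENEWAL_ESTIMATE = 15  # the commonly cited renewal rate


def public_domain_share(renewed: int, unresolved: int) -> int:
    """Percent of titles left over once renewed and unresolved ones are set aside."""
    return 100 - renewed - unresolved


def titles_opened_per_year(reviewed_per_month: int, pd_share_percent: int) -> int:
    """Titles opened in twelve months, assuming the share holds steady (an assumption)."""
    return reviewed_per_month * 12 * pd_share_percent // 100


observed = public_domain_share(RENEWED, UNRESOLVED)
expected = public_domain_share(COMMON_RENEWAL_ESTIMATE, 0)
print(observed, expected)                      # 60 85
print(titles_opened_per_year(2000, observed))  # 14400
```

At exactly 2,000 titles a month the sketch gives 14,400 titles a year, so a figure of over 15,000 corresponds to a monthly rate somewhat above 2,000.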

I should say a little bit about our processes and the way that our queue may be influencing these numbers. As either Michigan or Google digitizes volumes and they flow back into our repository, we cull candidate titles for review. We rely on fixed fields in the MARC records to find candidate titles (i.e., monographs published in the US between 1923 and 1963). We tend to prefer those materials that have been digitized or processed more recently, as the quality of processing improves over time and we’d like to optimize our impact. To date, most of the candidate volumes have been drawn from our storage facility (Buhr), and so tend to be lower circulation titles in poorer condition. These facts, by themselves, shouldn’t skew our numbers or, if they do, should skew the selection toward volumes where the copyright wasn’t renewed because the title was less popular. In any case, it’s hard to imagine that our collection, which is fairly comprehensive for the period, would tend to be anything but representative, and yet the numbers are running very high for renewals. Of course I’d be happier to find that 85% of the works are in the public domain, but I continue to be encouraged to find that 60% of the works are in the public domain.
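The fixed-field selection can be sketched roughly as follows. This is not our production code: it is a hypothetical filter that assumes the standard MARC 21 layout (Leader/06 for type of record, Leader/07 for bibliographic level, 008/07-10 for Date 1, 008/15-17 for place of publication) and assumes that a three-character place code ending in "u" denotes a US state or the US generally. The names is_candidate and make_008 are made up for illustration.

```python
def make_008(date1: str, place: str) -> str:
    """Build a minimal 40-character 008 field for testing (hypothetical helper)."""
    # 00-05 date entered, 06 date type, 07-10 Date 1, 11-14 Date 2, 15-17 place
    return "850101" + "s" + date1 + "    " + place + " " * 22


def is_candidate(leader: str, field_008: str) -> bool:
    """Return True for a US-published monograph with Date 1 from 1923 to 1963."""
    if len(leader) < 8 or len(field_008) < 18:
        return False  # too short to carry the fixed fields we need

    # Leader/06 'a' = language material, Leader/07 'm' = monograph
    is_monograph = leader[6] == "a" and leader[7] == "m"

    # Date 1 may hold unknown digits such as "19uu"; treat those as non-matches
    date1 = field_008[7:11]
    in_range = date1.isdigit() and 1923 <= int(date1) <= 1963

    # Assumption: codes such as "nyu", "miu" or "xxu" mark US publication
    place = field_008[15:18]
    is_us = place[:2].isalpha() and place[2] == "u"

    return is_monograph and in_range and is_us


LEADER = "00000nam a2200000 a 4500"
print(is_candidate(LEADER, make_008("1940", "nyu")))  # True
print(is_candidate(LEADER, make_008("1964", "nyu")))  # False
```

A filter like this only builds the queue. It says nothing about renewal, and it cannot catch a US imprint that was first published abroad, which is why each candidate still goes to a staff member for review.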

It’s far too soon to say if these numbers will hold or if there are other factors at work, but after having reviewed more than 25,000 volumes, the fairly constant pattern emerging should interest many who watch this space.


More to say, more to do
I didn’t intend to address the wealth of related issues in this space, but wanted to get these few thoughts down to start the conversation. Someone from Michigan should discuss how the work gets done, what the liabilities are for making mistakes, the various ways that we could make mistakes (did I hear the word “restoration”?), whether we should be doing the work in isolation, and many other topics. I will close by beginning to address that last question, however. We have long advocated sharing this work among many institutions and saw the Google digitization effort as one tremendous stimulus to creating some thoughtful, reliable group sourcing. In discussions with a very sympathetic General Counsel’s office, we concluded that the work of making these determinations would be strengthened by collaboration: double-blind or triple-blind tests of status, if you will. This winter, Anne Karle-Zenith on our staff wrote a proposal to IMLS for the creation of a multi-institutional queuing and vetting mechanism, and our friends at Indiana, Minnesota and Wisconsin wrote letters offering their enthusiastic support. I hope we will one day be doing this work in a well-documented and open group space, with contributions by many institutions. After all, while this really is library work, when it comes to US publications, there’s a bounded body of candidates, and by sharing this work our community can add several thousand titles to the known public domain.

Posted by jpwilkin on May 19, 2008
Tags: copyright, digitization

Comments

Buzzy on paragraph 6:

Could those different percentages arise from the self-selecting pool that is a library’s collection? After all, with limited budgets, librarians presumably chose books deemed influential or useful enough to be added to UM’s collection.

May 20, 2008 5:37 am
Buzzy on paragraph 9:

Have you gotten a feeling for how amenable IMLS seems to such a proposal? While certainly a funder of great things, IMLS as an organization seems rather conservative in the programs they’ll fund. The state libraries to which IMLS distributes funds seem a lot more innovative - and more willing to take risks.

May 20, 2008 5:39 am
jpwilkin on paragraph 6:

Hard to say, and though that’s the most likely possibility, I have to say that I’m skeptical. While few libraries will be comprehensive, a large research library like ours will be more comprehensive for US pubs. Even if selectivity were the source of the bias, I’d be surprised if selection could skew things this much. Perhaps time will tell.

May 20, 2008 5:43 am
jpwilkin on paragraph 9:

I had some very encouraging conversations with IMLS. A key element for them is the opportunity to leverage private sector investment in digitization (Google, in this case), thus amplifying their impact.

May 20, 2008 5:46 am

[…] Wilkin wrote a long and interesting post on the University of Michigan’s efforts to identify works out of copyright from the pool of […]

May 20, 2008 12:17 pm

[…] Discovering the Undiscovered Public Domain. […]

June 8, 2008 7:06 am