I was recently quoted in an AP article (published here in Salon) as saying that Brewster Kahle's position with regard to the openness of Google-digitized public domain content is "theoretical." Well, I sure thought I said "polemical," but them's the breaks. Brewster argues that Google's work in digitizing the public domain essentially locks it up--puts it behind a wall and makes it their own--and that this is a loss in a world that loves openness. The contrast here is meant to be with the work of the Open Content Alliance, where the same public domain work might be shared freely, transferred to anyone, anywhere, and used for any purpose. I don't want to get into the quibble here about the constraints on that apparently open-ended set of permissions (i.e., that an OCA contributor may end up putting constraints on materials that look worse than Google's constraints). What's key here for me, though, is the real practical part of openness--what most people want and what's possible through what Michigan puts online.
I think all of this debate begs us to ask the question "what is open?" For the longest time (since the mid-1990s), Michigan digitized public domain content and made it freely viewable, searchable and printable. Anyone, anywhere could come to a collection like Making of America and read, search and print to his heart's delight. If the same user wanted to download the OCR, that too was made possible and, in fact, the Distributed Proofreaders project has made good use of this and other MOA functionality. We didn't make it possible for anyone to get a collection of our source files because we were actively involved in setting up Print-on-Demand (POD). POD typically has up-front, per-title costs, and making the source files available would have cost us some sales that might otherwise pay for that initial investment. As we moved into the agreement with Google, we made clear our intention to do the same "open" thing with the Google-digitized content, and to throw in our lot with a (then) yet-to-be-defined multi-institutional "Shared Digital Repository." In fact, now we have hundreds of thousands of public domain works online, all of which are readable, searchable and printable by anyone in the world in much the same way.
So, what's the beef? The OCA FAQ states that for them this openness means that "textual material will be free to read, and in most cases, available for saving or printing using formats such as PDF." By all means! I hope it's clear from what I wrote above that this is an utterly accurate description of what happens when Google digitizes a volume from Michigan's collection and Michigan puts it online. It's also, incidentally, what Google makes possible, but even if Google didn't, Michigan could and would be rushing in to fill that breach. The challenges to Google's openness always seem to ignore what's actually possible through our copies at Michigan. This sort of polarizing rhetoric seems to be about making an inaccurate point in the service of an attack on Google's primacy in this space: we don't want them to dominate the landscape, so let's characterize their Bad version as being the opposite of our Good version. The notion that what Google does is closed is not an accurate description of Google's version of these books, and even less so a description of Michigan's.
Could the Google books be more open? Absolutely. Along with Carl Malamud, for example, I would love to see all of the government documents that have been digitized by Google available for transfer to other entities so that the content could be improved and integrated into a wide variety of systems, thus opening up our government as well as our libraries. I believe that will happen, in fact, and that Google will one day (after they've had a chance to gain some competitive advantage) open up far more. In the meantime, however, when we talk about "open," let's mean it the way that the OCA FAQ means it. Let's mean it in the same way that the bulk of our audience means it. Let's talk about the ability to read, cite and search the contents of these books, and let's call the Google Books project and particularly Michigan's copies Open. Let's stop being theoretical, er, I mean polemical.
Posted by jpwilkin on April 25, 2008
Tags: digitization

