<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>John Wilkin’s blog</title>
	<link>http://scholarlypublishing.org/jpwilkin</link>
	<description>John's blog on libraries, library technology, and pizza</description>
	<pubDate>Tue, 20 May 2008 00:23:57 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>
	<language>en</language>
			<item>
		<title>Discovering the Undiscovered Public Domain</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/13</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/13#comments</comments>
		<pubDate>Tue, 20 May 2008 00:23:57 +0000</pubDate>
		<dc:creator>jpwilkin</dc:creator>
		
		<category><![CDATA[copyright]]></category>

		<category><![CDATA[digitization]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/archives/13</guid>
		<description><![CDATA[At Michigan we’re engaged in an activity that I hope will one day seem ordinary and a routine part of library work. Resources from several departments are devoted to determining the copyright status of works typically presumed to be in copyright. For now, we’re focusing on US monographic imprints (books, that is) published between 1923 [...]]]></description>
			<content:encoded><![CDATA[<p>At Michigan we’re engaged in an activity that I hope will one day seem ordinary and a routine part of library work. Resources from several departments are devoted to determining the copyright status of works typically presumed to be in copyright. For now, we’re focusing on US monographic imprints (books, that is) published between 1923 and 1963, but plan to turn our attention to non-US publications in the future. I wouldn’t want to give anyone the impression that this is easy work or work without its share of legal perils, but it does feel distinctly like “library work” and, as might be obvious, has a number of very significant positive benefits for library users in an increasingly digital environment. In this post, I won’t be describing our procedures for doing our work here or detailing the pitfalls, but I’d like to use this installment to muse on a few things that have seemed remarkable or interesting. We have, on the other hand, made a proposal to IMLS to ramp up this activity and make it more reliable through collaboration with several other institutions, and before I finish I’ll say a bit more about that.</p>
<p><strong>Impact</strong><br />
One of the most interesting things about all of this is the great impact that’s possible with relatively modest resources. Experienced library staff in technical services—staff with considerable experience in various sorts of bibliographic work like copy- and original cataloging—review publications and use a variety of tools to make determinations about a book’s copyright status. In most cases, those staff can reliably confirm that the work is in the public domain or in copyright. Because their work is driven by content that has been digitized and is online, if the work has been determined to be in the public domain, we update records that control access to the materials and permit access.</p>
<p>I say “modest” and I know that this will seem odd to some colleagues at smaller libraries or other types of libraries, but in terms of research library staffing, the 1.5 FTE of professional staffing we devote to this work has a profound effect. Yes, of course more infrastructure is necessary than just these technical services staff members, but the processes in our library IT organization handle input from their work in overwhelmingly automated ways, creating lists of works to be reviewed and updating records about the rights status of the works. Compare this number to any other area of materials processing in a research library and the numbers seem modest. Consider, moreover, the fact that these staff process more than 2,000 titles each month, and that the majority of these works are found to be in the public domain. At our current rate of work, we’ll open access to over 15,000 titles in about one year. That would have been a phenomenal number in the pre-Google days of our digitization (I believe we digitized about 9,000 volumes in our peak year of preservation-related digitization work), and this work focuses on more current publications than we typically see in digitization. For a relatively small sum, we’re benefiting our own constituency as well as readership throughout the Internet. No matter how you cut it, this feels like a good investment of library funds.</p>
<p><strong>What is “library work”?</strong><br />
Once you get past the question of whether digitization is “library work,” it seems frivolous to ask whether this kind of copyright-determination work is also the work of libraries, but I think it’s very clearly a question for some of my colleagues. The copyright determination work involves several steps where our staff are particularly skilled, and the outcomes seem right up our alley. What sort of outcomes? Clearly <em>access</em> is a big one, but one doesn’t need a lot of imagination to understand that our preservation work (particularly of these digital files) is benefited by the increased access. The skills piece is particularly germane, though. Doing this work depends on knowledge of bibliographic description, of the sorts of variability in practice that one sees between a book as it leaves the printer and the ways that it’s described in everything from library catalogs (including OCLC WorldCat) to the Library of Congress’s copyright registration records. It’s also work that depends on recognizing the traps that come along with copyright, like the possibility that a US work was previously published abroad and thus may still be eligible for copyright protection.  These things come naturally to a person well versed in bibliographic description: people who have been employed in processes that create the same sorts of records they are now being asked to review.</p>
<p>I’m a big fan of mainstreaming. In a conversation with a library director at another institution, a director who was once director of technical services, I found myself arguing that this work is the work of technical services, much to her dismay. I’m certain some institutions would be tempted to build out a separate unit devoted to this new activity, populated by bright young staff who have never worked their way through descriptive work using everything from the pre-1956 National Union Catalog to ancient card records to the once great variety of “bibliographic utilities” (RLIN, OCLC, WLN and the rest), but what we would lose in that sort of staffing model is a sort of skill that comes along with recognizing variations in descriptive practice and the great variety of publishing practices that we see in the materials themselves. By relying on existing technical services staff members, we have those skills and a sensitivity to the need to create sustainable, routinized activities.</p>
<p><strong>Not the common wisdom</strong><br />
Some readers will have noticed that my numbers don’t add up to what we have generally considered to be the distribution between in-copyright and public domain for US 1923-1963 publications. When we talk about US renewals, lots of numbers are bandied about: some are based on very early analyses, and some on reasonable sample-based analyses. The most common estimate is that only 15% of US books published between 1923 and 1963 had their copyright renewed. Our fairly random selection of titles has generated very different numbers: we’re finding renewals for about 30% of the works in our queue, with another 10% having problems that are complex enough to prevent a straightforward determination (e.g., possible previous foreign publication or the inclusion of works such as short stories or poems by multiple authors).</p>
<p>I should say a little bit about our processes and the way that our queue may be influencing these numbers. As either Michigan or Google digitizes volumes and they flow back into our repository, we cull candidate titles for review. We rely on fixed fields in the MARC records to find candidate titles (i.e., monographs published in the US between 1923 and 1963). We tend to prefer those materials that have been digitized or processed more recently, as the quality of processing improves over time and we’d like to optimize our impact. To date, most of the candidate volumes have been drawn from our storage facility (Buhr), and so tend to be lower circulation titles in poorer condition. These facts, by themselves, shouldn’t skew our numbers or, if they do, should skew the selection toward volumes where the copyright wasn’t renewed because the title was less popular. In any case, it’s hard to imagine that our collection, which is fairly comprehensive for the period, would tend to be anything but representative, and yet the numbers are running very high for renewals. Of course I’d be happier to find that 85% of the works are in the public domain, but I continue to be encouraged to find that 60% of the works are in the public domain.</p>
<p>It’s far too soon to say if these numbers will hold or if there are other factors at work, but after having reviewed more than 25,000 volumes, the fairly constant pattern emerging should interest many who watch this space.</p>
<p><strong>More to say, more to do</strong><br />
I didn’t intend to address the wealth of related issues in this space, but wanted to get these few thoughts down to start the conversation. Someone from Michigan should discuss how the work gets done, what the liabilities are for making mistakes, the various ways that we could make mistakes (did I hear the word “restoration”?), whether we should be doing the work in isolation, and many other topics. I will close, however, by beginning to address the question of working in isolation. We have long advocated sharing this work among many institutions and saw the Google digitization effort as one tremendous stimulus to creating some thoughtful, reliable <strong>group sourcing</strong>. In discussions with a very sympathetic General Counsel’s office, we concluded that the work of making these determinations would be strengthened by collaboration: double-blind or triple-blind tests of status, if you will. This winter, Anne Karle-Zenith on our staff wrote a proposal to IMLS for the creation of a multi-institutional queuing and vetting mechanism, and our friends at Indiana, Minnesota and Wisconsin wrote letters offering their enthusiastic support. I hope we will one day be doing this work in a well-documented and open group space, with contributions by many institutions. After all, while this really <em>is</em> library work, when it comes to US publications, there’s a bounded body of candidates, and by sharing this work our community can add several thousand titles to the <em>known</em> public domain.</p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/13/feed</wfw:commentRss>
		</item>
		<item>
		<title>Did I say &#8220;theoretical&#8221;?  Openness and Google Books digitization</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/12</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/12#comments</comments>
		<pubDate>Fri, 25 Apr 2008 18:01:17 +0000</pubDate>
		<dc:creator>jpwilkin</dc:creator>
		
		<category><![CDATA[digitization]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/archives/12</guid>
		<description><![CDATA[I was recently quoted in an AP article (published here in Salon) as saying that Brewster Kahle's position with regard to the openness of Google-digitized public domain content is "theoretical." Well, I sure thought I said "polemical," but them's the breaks.  Brewster argues that Google's work in digitizing the public domain essentially locks it [...]]]></description>
			<content:encoded><![CDATA[<p>I was recently quoted in an AP article (published <a href="http://www.salon.com/wires/ap/scitech/2008/04/24/D908LLMO0_google_book_search/index.html">here</a> in Salon) as saying that Brewster Kahle's position with regard to the openness of Google-digitized public domain content is "theoretical." Well, I sure thought I said "polemical," but them's the breaks.  Brewster argues that Google's work in digitizing the public domain essentially locks it up--puts it behind a wall and makes it their own--and that this is a loss in a world that loves openness.  The contrast here is meant to be with the work of the Open Content Alliance, where the same public domain work might be shared freely, transferred to anyone, anywhere, and used for any purpose.   I don't want to get into the quibble here about the constraints on that apparently open-ended set of permissions (i.e., that an OCA contributor may end up putting constraints on materials that look worse than Google's constraints).  What's key here for me, though, is the real practical part of openness--what most people want and what's possible through <a href="http://www.lib.umich.edu/mdp/">what Michigan puts online</a>.</p>
<p>I think all of this debate begs us to ask the question "what is open"?  For the longest time (since the mid-1990s), Michigan digitized public domain content and made it freely viewable, searchable and printable.  <em>Anyone</em>, anywhere could come to a collection like <a href="http://moa.umdl.umich.edu/">Making of America</a> and read, search and print to his heart's delight.  If the same user wanted to download the OCR, that too was made possible and, in fact, the <a href="http://www.pgdp.net/c/">Distributed Proofreaders</a> project has made good use of this and other MOA functionality.  We didn't make it possible for anyone to get a collection of our source files because we were actively involved in setting up Print-on-Demand (POD): POD typically has up-front, per-title costs, and making the source files available would have cost us some sales that might otherwise pay for that initial investment.   As we moved into the <a href="http://www.lib.umich.edu/mdp/umgooglecooperativeagreement.html">agreement with Google</a>, we made clear our intention to do the same "open" thing with the Google-digitized content, and to throw in our lot with a (then) yet-to-be-defined multi-institutional "Shared Digital Repository."  In fact, now we have hundreds of thousands of public domain works online, all of which are readable, searchable and printable by anyone in the world in much the same way.</p>
<p>So, what's the beef?  The <a href="http://www.opencontentalliance.org/faq.html">OCA FAQ</a> states that for them this openness means  that "textual material will be free to read, and in most cases, available for saving or printing using formats such as PDF."  By all means!   I hope it's clear by what I wrote above that this is an utterly accurate description of what happens when Google digitizes a volume from Michigan's collection and Michigan puts it online.  It's also, incidentally, what Google makes possible, but even if Google didn't, Michigan could and would be rushing in to fill that breach.  The challenges to Google's openness always seem to ignore what's actually possible through our copies at Michigan.   This sort of polarizing rhetoric seems to be about making a point that's not accurate in the service of an attack on Google's primacy in this space:  we don't want them to dominate the landscape, so let's characterize their Bad version as being the opposite of our Good version.   This notion that what Google does is closed is not an accurate description of Google's version of these books, and even less so a description of Michigan's.</p>
<p>Could the Google books be <em>more</em> open?  Absolutely.   Along with <a href="http://freegovinfo.info/node/1541">Carl Malamud</a>, for example, I would love to see all of the government documents that have been digitized by Google available for transfer to other entities so that the content could be improved and integrated into a wide variety of systems, thus opening up our government as well as our libraries.   I believe that will happen, in fact, and that Google will one day (after they've had a chance to gain some competitive advantage) open up far more.  In the meantime, however, when we talk about "open," let's mean it the way that the OCA FAQ means it.  Let's mean it in the same way that the bulk of our audience means it.  Let's talk about the ability to read, cite and search the contents of these books, and let's call the Google Books project and particularly Michigan's copies Open.  Let's stop being theoretical, er, I mean polemical.</p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/12/feed</wfw:commentRss>
		</item>
		<item>
		<title>The future of LIS programs</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/10</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/10#comments</comments>
		<pubDate>Fri, 30 Nov 2007 16:01:16 +0000</pubDate>
		<dc:creator>jpwilkin</dc:creator>
		
		<category><![CDATA[library education]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/archives/10</guid>
		<description><![CDATA[In late October, 2007, I was invited to a summit on the future of Library and Information Science (LIS) programs in our I-schools. The LIS specialization, particularly at Michigan, has been in some disarray. Surrounded by compelling and successful programs in areas such as archives and records management and human computer interaction, the LIS specialization [...]]]></description>
			<content:encoded><![CDATA[<p>In late October, 2007, I was invited to a summit on the future of Library and Information Science (LIS) programs in our I-schools. The LIS specialization, particularly at Michigan, has been in some disarray. Surrounded by compelling and successful programs in areas such as archives and records management and human computer interaction, the LIS specialization has been seen by some as the rearguard program, supporting the last remnants of a profession that, if not dying, is assumed to be significantly threatened. This stands in stark contrast to librarianship, where in nearly every sphere (e.g., public and academic libraries) we see vital issues being addressed and new futures being forged. For the summit each invitee was asked to write a short position paper organized around the notions represented in the headings, below. Mine follows.</p>
<p><strong>Introduction</strong><br />
I am an academic librarian who works in research libraries, so I see the questions being posed here (and the issue of LIS education generally) through that lens. My perspective is tied significantly to the interplay of information resources and the research uses to which they are put. There are, I think, many reasonable ways to approach these questions, but mine is about this interplay and the need for professionals in my sphere to support an array of activities around research and teaching, including authentication and curation of the products of research.</p>
<p><strong>Technical and social phenomena we see coming in the next 10 years</strong><br />
The technical and social phenomena that seem most significant surround a tension in the perception that <em>disintermediation</em> plays an increasingly evident role in the information space of research institutions.</p>
<p>On the one hand, we see intensifying disintermediation, and along with that an increasingly rich array of tools and technology that facilitate academic users interacting directly with their sources, and directly with the means for dissemination. At the same time, in tension with this disintermediation, we see a drive by competing <em>mediating</em> open systems to facilitate that disintermediation: Google's preeminence makes it an obvious example of this sort of mediation; smaller players (Flickr, Facebook, others) may only fill niche roles, but have come to play the same sort of mediating role.</p>
<p>The irony in this dynamic is that many (or even most) of the most compelling resources have <em>not</em> been peer-to-peer resources, but networked resources like Google or even WorldCat. Consequently, in this world of growing disintermediation, we do not see, primarily, peer-to-peer services predominating, but rather very compelling social networking services that act as a powerful set of intermediaries. Openness at the network layer has become much more important than even "open source" because the services (rather than the software) are the destinations. At the outset, then, in this small space, what I would like to highlight is a growing sense of <em>agency</em> by users in the academic research world, and agency facilitated not by specialized software on their desktops, but by mediating <em>services</em> that those users can leverage to accomplish remarkable things.</p>
<p>In this context of what we've come to think of as "in the flow" (i.e., in the flow of engagement between the user and the mediating network resource), academic research libraries are challenged to perform core functions (functions, such as archiving and instruction, that have not diminished in importance) at the same time that they are challenged to perform their work with users "in the flow." Significantly, the research library must continue to serve a critical curatorial role for cultural heritage information despite the sense that the information being used is everywhere and perhaps thus cared for by the network. While they engage with this challenge of what sometimes feels like trying to catch the wind in a net, academic research libraries must craft a new role more clearly focused on engagement with scholarly communication. They must simultaneously reach out to and become a natural part of the working environment and methods of their users, and engage in the strategic curation of the human record.<a href="http://scholarlypublishing.org/jpwilkin/wp-admin/post.php#foot1">[1]</a> Around this apparent or real disintermediation with increasingly powerful <em>intermediaries</em>, we need to ensure perpetual access and the right sorts of services to our communities.</p>
<p><strong>Key unanswered questions that should drive research</strong><br />
The problem, as I see it, is that the set of questions evolves as quickly as the environment. So, for example, some current questions include:</p>
<ul>
<li><strong>What are the tools, services and systems that optimize the information seeking, use and creation activities of our users?</strong> Even in the age of Google, Amazon and Flickr, academic research library <em>systems</em> play a role in discovery of information. For example, although Google Scholar has been shown to be more effective in discovery than metasearch applications, vast numbers of key resources are not indexed by GS and are only found through the cumbersome and arcane specialized interfaces provided by publishers and vendors.<a href="http://scholarlypublishing.org/jpwilkin/wp-admin/post.php#foot2">[2]</a> Finding effective ways to intercede and assist users (without also putting cumbersome "help" in their way) is one of the challenges for our community. Similarly, a better understanding of the way our users interact with resources is beginning to make it possible for us to layer onto the network an array of tools (e.g., <a href="http://www.zotero.org/" target="_blank">Zotero</a> or the <a href="http://www.libx.org/" target="_blank">LibX toolbar</a>) that make it possible for users to integrate networked resources into their scholarship. And, finally, libraries have become the equivalent of publishers in the new, networked environment, and ensuring that we perform that role <em>along with curation</em> in seamless and effective ways is one of our current challenges.<a href="http://scholarlypublishing.org/jpwilkin/wp-admin/post.php#foot3">[3]</a> All of this raises a number of embedded questions, some related to understanding the behavior of users, others to deploying the most effective technologies, and yet others to judging what the next great technological innovation will be and where we can situate ourselves.</li>
<li><strong>How can we most effectively curate the human record in a world that is simultaneously more interconnected and, in some ways, more fragmented?</strong>
<ul>
<li>It’s worth noting that even though the network holds out promise for unifying formally-defined "library collections" in a way never before imagined, the fact that many resources are rare or valuable or have significant artifactual value means that the "scatter" of unique parts of collections that we already know well will only become more pronounced (if only by contrast). For example, our making digital surrogates available will remove most, but not all, need for scholars to travel to Michigan to use the papyrus collection.</li>
<li>This problem of the artifact obviously represents a marginal case. More significantly, as we are increasingly able to provide electronic access to our print collections, we are faced with the need to develop effective strategies for storing print and balancing access with minimizing waste. It obviously doesn’t make sense to store a copy of ordinary works at each of more than 100 research libraries in the United States, but how can an amalgamation of collections be performed in ways that respect current user preferences for print and take into account bibliographic ambiguity (e.g., is my copy the same as your copy, and when there are differences, how much variation should be preserved)? We need to document this in a way that ensures a comprehensive sense of curatorial responsibility so that, for example, one institution does not withdraw a "last copy" of a volume by assuming (incorrectly) that it is acting in isolation.</li>
<li>Finally, and perhaps most compellingly, there is the question of what constitutes effective digital curation and how (and to what extent) we should balance that curation with access. There is much that we know about appropriate digital formats, migration, and the design of effective archiving services, but this has not been put to the test with the grand challenge that is looming. Moreover, as we provide access, we are challenged by questions of usability, and even more by the question of how we best situate our access services relative to network services. We should not duplicate Google's work in Google Book Search, but there are services Google may not or will not offer, and that we should offer in agile and relevant ways.</li>
</ul>
</li>
</ul>
<p><strong>The curriculum we should provide to train professionals in this changing environment</strong><br />
Working from this perspective, it strikes me that the LIS curriculum should focus on developing a method of engagement rather than primarily training to answer specific questions. Of course that focus on methodology <em>must</em> be grounded in an exploration of specific contemporary questions, but it should be made clear that the circumstances of those questions are likely to change (i.e., the journey will be more important than the destination). Perhaps this is obvious or has always been the case, but the incredible fluidity of the environment now calls for precisely this type of response. Some recent experience may help to illustrate this:</p>
<ul>
<li>In our efforts to better understand how mass digitization work succeeds and fails, we have needed to understand the distribution of certain types of materials in our collection. Being able to articulate the question and then pursue strategies for mitigating problems (and increasing opportunities) has called for analytical skills and an understanding of research methods, including statistical skills. In a recent specific case, we needed to understand the interaction between particular methods of digitization and different methods of printing (e.g., reproduction of typescript versus offset printing). The methods of digitization are squarely within the field of current librarianship, as is an understanding of the types of materials we collect and own; and it is equally true that both the digitization methods and types of materials will change with time. What I would emphasize is that it is the <em>skills</em> involved in the inquiry that are paramount. Though they are in no way divorced from the specific problems that one tackles, they are the most important part of the educational process.</li>
<li>In filling the niche left by Google because of legal constraints and a genuine lack of interest in academic uses of materials, we have embarked on a process of system design and software development. This effort has required of staff not only the ability to write effective code (or manage writing that code), but also the ability to chart courses informed by usability, by an understanding of the law (particularly copyright law), and by a deep understanding of the digital archiving effort (both in formats and in strategies for storage). There is no doubt in my mind that librarians will continue to play a role in the effective design of information systems, and that navigating these parameters (i.e., usability, legal issues, sustainability of the systems and, more importantly, the content) will continue to play a role in the systems we design. Just as with the previous example, those skills cannot be developed or exercised in some way that is abstracted from the materials, the users, and the uses. Again, just as with the previous example, current contexts will change, and the <em>skills and instincts</em> will continue to be the enduring element in our future librarians.</li>
</ul>
<p>Because of space constraints, these are only two examples, but examples that show the range of skills and approaches necessary in the current environment. The current environment is extremely fluid in the ways that information is made available and in the ways that users, specifically those in our academic community, interact with it. Too often, academic libraries are defined by that which is held in them (witness the importance of the ARL volume count for defining research libraries). Libraries are, above all else, the people, processes, and resources that connect users and information and, unlike organizations like Google or Amazon, libraries are predicated on a commitment to enduring, reliable access to that information. Libraries curate the growing body of human knowledge and through that curation ensure its longevity and reliability; libraries need to make sure that the right kinds of services and interactions are taking place "in the flow," where (disintermediation or not) users have much more agency and much more direct interaction with networked resources. LIS education should focus its efforts on ensuring that the next generation of academic librarians has an awareness of the issues and an aptitude for designing solutions in that world.</p>
<p><strong>Notes</strong><br />
<a name="#foot1" title="#foot1"></a>[1] It is probably also the case that libraries, in order to have the opportunity to play these service roles in the future, must prove the importance of the curatorial function and their ability to perform it.<br />
<a name="#foot2" title="#foot2"></a>[2] For example, see Haya, Glenn et al. "<a href="http://www.emeraldinsight.com/10.1108/14684520710764122" target="_blank">Metalib and Google Scholar: a user study</a>," in <em>Online Information Review</em>, Vol. 31 No. 3, 2007, pp. 365-375.<br />
<a name="#foot3" title="#foot3"></a>[3] See, for example, the work of the UM Library's Scholarly Publishing Office <a href="http://spo.lib.umich.edu/" target="_blank">(http://spo.lib.umich.edu/)</a> in creating new scholarly publications with sustainable methods, or Deep Blue, the Library's institutional repository <a href="http://deepblue.lib.umich.edu/" target="_blank">(http://deepblue.lib.umich.edu/)</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/10/feed</wfw:commentRss>
		</item>
		<item>
		<title>Next Generation Library Systems</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/7</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/7#comments</comments>
		<pubDate>Fri, 16 Nov 2007 15:56:11 +0000</pubDate>
		<dc:creator>jpwilkin</dc:creator>
		
		<category><![CDATA[library technology]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/archives/7</guid>
		<description><![CDATA[The problem
With the backdrop of the widely touted lessons of Amazoogle—an expression I can barely stand to write—three of the more interesting emerging developments of late have been OCLC’s WorldCat Local, Google Book Search, and Google Scholar. As Lorcan Dempsey argued, the "massive computational and data platforms [of Google, Amazon and EBay] exercise [a] strong [...]]]></description>
			<content:encoded><![CDATA[<p><strong>The problem</strong><br />
With the backdrop of the widely touted lessons of Amazoogle—an expression I can barely stand to write—three of the more interesting emerging developments of late have been OCLC’s WorldCat Local, Google Book Search, and Google Scholar. As Lorcan Dempsey <a href="http://orweblog.oclc.org/archives/000562.html" target="_blank">argued</a>, the "massive computational and data platforms [of Google, Amazon and EBay] exercise [a] strong gravitational web attraction," a sort of undeniable central force in the solar system of our users’ web experience. What has happened with WorldCat Local, Google Book Search and Google Scholar has extended that same sort of pull to key scholarly discovery resources. No one needed the OCLC environmental scans to be reminded that our users look to Google before they turn to the multi-million dollar scholarly resources that we purchase for them, and everyone was aware that Amazon satisfied a broad range of discovery needs more effectively than the local catalog. Now, however, mainstream “network services” like Amazon and Google web search, deficient in their ability to satisfy scholarly discovery, are complemented by similarly “massive computational and data platforms” that specialize in just that—finding resources in the scholarly sphere. These forces, and perhaps more like them in the future, should influence the way that we design and build our library systems. If we ignore these types of developments, choosing instead to build systems with ostensibly superior characteristics, <em>systems that sit on the margins</em>, we effectively ensure our irrelevance, building systems for an idealized user who is practically non-existent.</p>
<p>Our resources, skills and investments have helped to create an opportunity for us to shape a next generation of library systems, simultaneously cognizant of the strong network layer <strong>and</strong> our needs and responsibilities as a preeminent research library. At Michigan, we have designed and built our past systems, each in partial isolation from the other system, reflecting the state of library technology and our response to user needs. We were not <em>wrong</em> in the way that we developed our systems, but rather we were right for those times. In building things in this way, we have developed an LMS support team with extraordinary talent and responsiveness, a digital library systems development effort that blazed trails and continues to be valued for the solidity of its product, and base-funded IT infrastructure that is utterly rock-solid--all great, but generally as independently conceived efforts.<a href="http://scholarlypublishing.org/jpwilkin/archives/7#foot1">[1]</a> What libraries like ours must do now is reconceive our efforts in light of the changed environment. The reconceptualization should, as mentioned, not only be built with an awareness of the new destinations our users choose, but also with a recognition that we have a special responsibility for the long-term curation of library assets. Even at its most successful, Google Scholar does not include all of the roughly $8m in electronic resources that we purchase for the campus, and Google Book Search is not designed to support the array of activities that we associate with scholarship.</p>
<p>Knowing that we must change where we invest our resources is one thing; knowing where we must invest is another. I don’t believe I should (or could) paint an accurate picture of the sorts of shifts we should make. On the other hand, I can lay out here a number of key principles that should guide our work.</p>
<p><strong>Principles<br />
1.    Balanced against network services</strong>: I believe this is probably the most important principle in the design of what we must build. We must not try to do what the network can do for us. We must find ways to facilitate integration with network services and ensure that our investment is where our role is most important (e.g., not trying to compete with the network services <em>unless</em> we think we can and should displace them in a key area). For example, we have recognized that Google will be a point of discovery, and so rather than trying to duplicate what they do well for the broad masses of people, we should (1) put all things online in a way that Google can discover; and (2) because we recognize that Google won’t build services in ways that serve all scholarly needs, work to strategically complement what they do. In the first instance (i.e., making sure that Google can discover resources), we will sometimes need to block them, for legal or other reasons, from discovering content.<a href="http://scholarlypublishing.org/jpwilkin/archives/7#foot2">[2]</a> These types of exceptions should add nuance to what we do in exposing content. In the second instance, when it comes to building complementary services, we’ll need to be both smart (and well-informed) and strategic.</p>
<p><strong>2.    Openness</strong>:  What we develop should easily support our building services <em>and</em>, even more importantly, should allow others to build them. It should take advantage of existing protocols, tools and services. Throughout this document, I want to be very clear that these principles or criteria don’t necessarily point to a specific tool or a specific way of doing things. Here, I would like to note that the importance of openness, though great, does not necessarily point to the need to do things as <em>open source</em>.  As O’Reilly <a href="http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html" target="_blank">has written</a> in his analysis of the emergence of Web 2.0, this is what we see in Amazon’s and Google’s architectures, where the mechanisms for building services are clearly articulated, but no one sees the code for their basic services: the investment shifts from shareable software to services. Similarly, our being open to having external services built on top of our own should not imply that our best or only route is open source software. What is particularly important is the need to have data around which others would like to build tools and services: openness in resources that few wish to include is really only beautifying a backwater destination.</p>
<p><strong>3.    Open source</strong>: Despite what I noted above about openness, we should try, wherever possible, to do our work with open source licensing models and we should try to leverage existing open source activities. In part, this is simply because, in doing so, we’ll be able to leverage the development efforts of others. We should also aim for this because of the increasing cost of poorly functioning commercial products in the library marketplace. Note, though, that when we choose to use open source software, it’s important to pick the right open source development effort—one that is indeed open and around which others are developing. Much open source software is isolated, with few contributions. We should aim for openness in our services over slavish devotion to open source. We should also choose this route when we can simply because it's the best economic model for software in our sphere.</p>
<p><strong>4.    Integration</strong>: Tight integration is not the most important characteristic of the systems we should build, nor should this sort of integration be an end in itself; however, we have an opportunity to optimize integration across all or most of our systems, making an investment in one area count for others. In Michigan’s MBooks repository, we have already begun to demonstrate some of the value in this type of integration by relying on the Aleph X-Server for access to bibliographic information, and we should continue to make exceptions to tighter integration only after careful deliberation. A key example is <a href="http://scholarlypublishing.org/jpwilkin/archives/6" target="_blank">the use of metasearch for discovery of remote and local resources</a>: we should need to address only a single physical or virtual repository for locally-hosted content. We should give due consideration to the value of “loose” integration (e.g., automatically copying information out of sources and into target systems), but the example of the Aleph X-Server has been instructive and shows the way this sort of integration can provide both increased efficiency and greater reliability in results.</p>
<p><strong>5.    Rapid development</strong>: If we take a long time to develop our next generation architecture, it will be irrelevant before we deploy it. I know this pressure is a classic tension point between Management and Developers: one perspective holds that we’re spending our time on fine-looking code rather than getting a product to the user, and the other argues that work done rapidly will be done poorly. This dichotomy is false. The last few years of Google’s “perpetual beta” and a rapidly changing landscape have underlined the need to build services quickly, while the importance of reliability and unforgiving user expectations have helped to emphasize the value of a quality product. We can’t do one without the other, and I think the issue will be scaling our efforts to the available resources, picking the right battles, and not being overambitious.</p>
<p><strong>Directions</strong><br />
These sorts of defining principles are familiar and perhaps obvious, but what is less obvious is where all of this points. Although there are some clear indications that these sorts of principles are at play in, for example, the adoption of WorldCat Local or the integration of Fedora in VTLS’s library management system, there are also contradictory examples (e.g., the rush to enhance the local catalog, and many more silo-like systems like DSpace), and I’ve heard no articulations of an overarching integrated environment. If we undertake a massive restructuring of our IT infrastructure rather than strategic changes in some specific areas, or tweaking in many areas, it may appear to be an idiosyncratic and expensive development effort that robs one's larger library organization of limited cycles for enhancements to existing systems. On the other hand, if we don’t position ourselves to take advantage of the types of changes I mentioned at the outset, we will polish the chrome on our existing investments for a few years until someone else gets this right or libraries are entirely irrelevant. Moreover, if we make the right sorts of choices in the current environment, we should also be able to capitalize on the efforts of others, thus compounding the return on each library’s investment. And of course, situating this discussion in a multi-institutional, cooperative effort minimizes the possibility that building the new architecture robs our institutions of scarce cycles.</p>
<p>It’s important, also, to keep in mind that this kind of perspective (i.e., the one I’m positing here) doesn’t presume to replace our existing technologies with something different. Many libraries have made many good choices on technologies that are serving their institutions well, and to the extent that they are the best or most effective tool for aligning with the principles I’ve laid out, we should use them. The X-Servers of Aleph and MetaLib are excellent examples of tools that allow the sort of integration we imagine. At UM, our own <a href="http://www.dlxs.org/" target="_blank">DLXS</a> and the new repository software we developed are powerful and flexible tools without the overhead of some existing DL tools. But in each case, it may make more sense to migrate to a new technology because we are elaborating a model of broader integration (both locally and with the ‘net) that others may also use. Where there is a shared development community (e.g., <a href="http://www.fedora-commons.org/" target="_blank">Fedora</a>, <a href="http://www.open-ils.org/" target="_blank">Evergreen</a> or <a href="http://libraryfind.org/" target="_blank">LibraryFind</a>), we can benefit from a community of developers. In all of this, we’ll need a strategy, and a strategy that remains flexible as the landscape changes.</p>
<p>It’s time to see our environment as being comprised of a set of inventory management responsibilities (both print and digital, both local and remote) that leverages a growing and maturing array of network services so that our users can effectively discover and use the resources available to them. I think that requires a change in the way we think about our technologies and a much more strategic arrangement of those technologies in relation to each other. We may be stuck with a bunch of local print “repositories” because of the nature of print and the history of library development. That’s not the case for our digital repository, however. On top of this, we need to conceptualize the sorts of services we need (e.g., ingest, exposure, other types of dissemination, archiving, etc.) and the tools that can best accomplish these things.</p>
<p><strong>Notes</strong><br />
<a title="foot1" name="foot1"></a>[1] Incidentally, I also believe that <a href="http://www.lib.umich.edu/lit/" target="_blank">Michigan’s organizational model</a>, comprised as it is of five distinct IT departments, is ideally suited to building the next generation of access and management technologies. Core Services should continue to provide a foundation of technology relevant to all of our activities, and should continue to develop and maintain system integration services used by all of the Library’s IT units. Library Systems will need to continue to support operational activities such as circulation and cataloging at the same time that it manages our most important database of descriptive metadata. DLPS should continue to focus on technologies that manage and provide access to the digital objects themselves—the data described by those metadata. Web Systems is ideally suited to provide a top layer of discovery and “use” tools that tap into both local data resources and those things we license remotely. I believe that our current organizational model shares out responsibility effectively and allows for a sort of specialization that is complementary; however, I wouldn’t rule out different organizational models if they made sense in the course of this process. For those readers outside the UM Library, the fifth department is Desktop Support Services, responsible not only for the desktop platform but also for the infrastructure supporting it.</p>
<p><a title="foot2" name="foot2"></a>[2] For example, with regard to <a href="http://deepblue.lib.umich.edu/" target="_blank">Deep Blue</a>, our institutional repository, in Michigan’s agreement with Wiley, approximately 33% of the Wiley-published/UM-authored content is restricted to UM users; and in our agreement with Elsevier, we may make it possible for Google to discover metadata but not fulltext. Similar things are bound to occur in the materials we put online in services other than Deep Blue.</p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/7/feed</wfw:commentRss>
		</item>
		<item>
		<title>Mastering the crust</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/9</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/9#comments</comments>
		<pubDate>Fri, 16 Nov 2007 03:07:56 +0000</pubDate>
		<dc:creator>jpwilkin</dc:creator>
		
		<category><![CDATA[pizza]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/archives/9</guid>
		<description><![CDATA[It's probably just that I'm a slow learner, but getting a great crust took me a few years.  A good crust is fairly easily in reach and a good crust alone is worth the effort, but stepping it up a notch requires finding the right balance of temperature, tools and ingredients.
Temperature: While the dough [...]]]></description>
			<content:encoded><![CDATA[<p>It's probably just that I'm a slow learner, but getting a great crust took me a few years.  A good crust is fairly easily in reach and a <em>good</em> crust alone is worth the effort, but stepping it up a notch requires finding the right balance of temperature, tools and ingredients.</p>
<p><strong>Temperature:</strong> While the dough is rising, pre-heat your oven with the pizza stone inside it. Here's one of the big challenges. Of course you'd prefer a wood-fired pizza oven, but that's not gonna happen for most of us. You'll want an oven that holds a very high temperature and keeps fairly even heat. I tend to run our electric convection oven at about 530°.  This allows the crust to brown nicely in a very short period of time and avoids drying out the crust.  Although putting the pizza stone at the top of the oven will make sure it's in the hottest part of the oven, if you're able to get the temp up that high, it won't really matter, and having a few extra inches of working space in sliding the pizza off the peel can be helpful; put the stone on a middle rack with lots of room above.</p>
<p><strong>Tools:</strong> In addition to the oven, you'll want a few things like a nice pizza stone (a good, heavy one will hold the heat better) and a decent peel. It also helps to have a brush (to brush oil on the dough).</p>
<p><strong>Dough:</strong> Getting a good dough is about balance. If your water is too hot, it'll kill the yeast; too cold, and the yeast won't become active enough. In my opinion, ditto on the flours: too much white flour, you'll lose out on texture and taste; and, for my approach, too much whole wheat and semolina, you'll miss out on the delicate flavors that balance against everything else. All that said, I've found that the preparation of the sponge is one of the most forgiving parts of making a good dough.<br />
<strong>Yeast "sponge"</strong><br />
&nbsp;&nbsp;&nbsp;approx. 2t active dry yeast<br />
&nbsp;&nbsp;&nbsp;a little less than 2/3c of warm water (about 105°)<br />
&nbsp;&nbsp;&nbsp;1T whole wheat flour<br />
&nbsp;&nbsp;&nbsp;1T honey<br />
&nbsp;&nbsp;&nbsp;about 2T white wine<br />
&nbsp;&nbsp;&nbsp;about 1t olive oil<br />
Combine these ingredients, minus the white wine, and let sit for about 5 minutes. The yeast should begin to foam. (If the yeast doesn't foam, it may be because the yeast was too old or because the water temperature wasn't right. If you suspect the culprit was the yeast, the only solution is to toss the sponge and the yeast and start all over.) After the yeast begins to foam, add the wine and mix well.<br />
<strong>Flour</strong><br />
While the yeast is activating, combine the following dry ingredients in a bowl:<br />
&nbsp;&nbsp;&nbsp;1/2c semolina<br />
&nbsp;&nbsp;&nbsp;1/3c fresh organic whole wheat flour (it'll give your dough a nice, almost nutty flavor)<br />
&nbsp;&nbsp;&nbsp;about 1/2c unbleached white flour, preferably organic<br />
&nbsp;&nbsp;&nbsp;1-2t sea salt<br />
Mix the sponge into the flour mixture and turn out onto a floured surface. Knead 5-10 minutes, until the dough has a springy, resilient feel. In addition to the unbleached white flour you mixed in at the outset, as you're kneading, add as much additional flour as you need to have the dough be just a tad less than sticky. When you've kneaded enough, you'll be able to push the dough down with your hand and it'll rebound in a few seconds. Drizzle a small amount (1/2t) of oil in a bowl, roll the ball of dough around the inside of the bowl, and let rise for an hour in a slightly warm, draft-free place. I place a slightly damp towel over the bowl and put the bowl in the unused side oven in our two-oven range. After the dough has risen to about 1.5-2 times its original size, put it out on the counter with a bit of flour and knead it down so that the air is out of the dough--about two minutes.</p>
<p><strong>Rolling out the dough</strong><br />
You'll want to avoid using a rolling pin to roll out the dough, as a rolling pin is likely to take too much of the air out of the dough and give you a harder, less flavorful dough.  Start by pressing the ball of dough out with the heel of your hand until it begins to form a flattish circle, and then continue to press the dough from the inside out, again, with the heel of your hand.  Rotate the ball around as you press outward.  Occasionally sprinkle the ball with a small amount of flour and flip it over, using the flour on the bottom to keep the dough from sticking to your surface.  Once it reaches roughly half the size of your pizza, tossing the dough (spinning it as it goes up) in the air actually helps to stretch the dough without taking more air out of the dough.  Continue to rotate the crust on your surface, pushing outward with the heel of your hand, until it's reached the size you'd like for your pizza, about 14" in diameter.</p>
<p><strong>Finishing up</strong><br />
Put a liberal amount of rough cornmeal on a pizza peel and then toss the dough onto the peel.<br />
Brush a thin coat of olive oil on the dough, particularly the outside edges.<br />
When you top it, avoid being overgenerous with the toppings, particularly the cheese.  A thinner layer is better for the flavor of the dough and the toppings.<br />
Especially if you've been able to get 530° for your oven, cook for about 10-12 minutes.  I try to turn the pizza from back to front about halfway through, even though the convection oven evenly distributes the heat, as the back of the oven still cooks more quickly.</p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/9/feed</wfw:commentRss>
		</item>
		<item>
		<title>A year of pizzas</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/8</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/8#comments</comments>
		<pubDate>Tue, 13 Nov 2007 03:46:04 +0000</pubDate>
		<dc:creator>jpwilkin</dc:creator>
		
		<category><![CDATA[pizza]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/archives/8</guid>
		<description><![CDATA[Pizza is a regular occurrence at our place.  Every week, either on Sunday or Friday, Maria and I collaborate to create a pie.  I've got to admit that I don't think of them as pies, that term that seems distinctly east coast to this southern boy.  This all began about seven years [...]]]></description>
			<content:encoded><![CDATA[<p>Pizza is a regular occurrence at our place.  Every week, either on Sunday or Friday, Maria and I collaborate to create a pie.  I've got to admit that I don't think of them as <em>pies</em>, a term that seems distinctly east coast to this southern boy.  This all began about seven years ago with my earnest pursuit of trying to create a great crust.  Over time, I learned a few things, and the results of my effort shifted from being doughy monstrosities, to barely manageable soft forms that sometimes collapsed in the process of getting them into the oven (more than a couple became calzones),  and finally to what we have today.  On <a href="http://525second.smugmug.com/gallery/1551812" target="_blank">smugmug, you can find 52 pictures of the pizzas we made</a>, nearly all from the last 18 months, though with one that significantly predates the rest.  This beauty, a fresh fig, leek, fontina and pancetta pizza, was one of the softer varieties, but one that held together, prepared as a course for my uncle and aunt (Rodger [sic] and Betty), visiting from Kansas:  <a href="http://525second.smugmug.com/gallery/1551812#74856793-A-LB" target="_blank"><img src="http://525second.smugmug.com/photos/74856793-S.jpg" height="300" width="400" /></a></p>
<p>In the process, the oversized and very soft dough resulted in this very rustic look. Among the 52 pictures, you'll find <a href="http://525second.smugmug.com/gallery/1551812#196878976-A-LB" target="_blank">a more recent version of the same thing</a>, done with more competence but less of the chaotic beauty of this first one.   Incidentally, we decided to stop taking regular pictures of the pizzas at #52, so you'll only find an occasional update on the smugmug site.</p>
<p>At this point, before going further, I need to acknowledge, more than by name, the contributions of my partner in all of this.  Maria is the master (mistress?) of the toppings, assembling amazing combinations of herbs and tomatoes, as well as, frequently, other layers like pesto.  The pizzas wouldn't be what they are without her contributions.</p>
<p>After my dogged pursuit of the great crust, I've concluded that having it work right depends on a host of things, including the right ingredients, good "tools," a great oven, and skill with the dough.    I'll post my version soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/8/feed</wfw:commentRss>
		</item>
		<item>
		<title>Metasearch vs. Google Scholar</title>
		<link>http://scholarlypublishing.org/jpwilkin/archives/6</link>
		<comments>http://scholarlypublishing.org/jpwilkin/archives/6#comments</comments>
		<pubDate>Tue, 06 Nov 2007 01:59:04 +0000</pubDate>
		<dc:creator>jpw</dc:creator>
		
		<category><![CDATA[metasearch]]></category>

		<category><![CDATA[google scholar]]></category>

		<guid isPermaLink="false">http://scholarlypublishing.org/jpwilkin/2007/11/05/metasearch-vs-google-scholar/</guid>
		<description><![CDATA[What the world needs now is not another metasearch engine.  Mind you, having more and better and even free metasearch engines is a good thing, but there are already many metasearch engines, each with different strengths and weaknesses, and even some that are free and open source (e.g., see Oregon State’s LibraryFind).  Metasearch [...]]]></description>
			<content:encoded><![CDATA[<p>What the world needs now is not another metasearch engine.  Mind you, having more and better and even free metasearch engines is a good thing, but there are already many metasearch engines, each with different strengths and weaknesses, and even some that are free and open source (e.g., see <a href="http://libraryfind.org/" target="_blank">Oregon State’s LibraryFind</a>).  Metasearch isn’t an effective solution for the problem at hand.</p>
<p>Let’s start with the problem:  each of our libraries invests millions of dollars each year in a wide array of electronic resources for the campus, and we’d like to make it possible for our users to get the best possible information from these electronic resources in the easiest possible way.  When presented with this problem over the years, libraries have tacitly posed two possible solutions:  (1) bring all of the information together into a single database, or (2) find some way to search across all of these resources with a single search.  I suspect no one in our community has the audacity to suggest the first option as a solution because it’s <em>crazy talk</em>.  On the other hand, though, for more than a decade we’ve held out the hope of being able to search across many databases as a solution.  Wikipedia perhaps says it best in defining the term <a href="http://en.wikipedia.org/wiki/Metasearch" target="_blank"><em>metasearch</em></a>:  "Metasearch engines create what is known as a virtual database. They do not compile a physical database or catalogue of [all of their sources]. Instead, they take a user's request, pass it to several other heterogeneous databases and then compile the results in a homogeneous manner based on a specific algorithm."  Elsewhere, in the more polished <a href="http://en.wikipedia.org/wiki/Federated_search" target="_blank">entry for <em>federated search</em></a> (a more old-fashioned reference to the same concept), the author notes that federated searching solves the problem of scatter and lack of centralization, making a wide variety of documents “searchable without having to visit each database individually.”</p>
<p>Metasearch is a librarian’s idealistic solution to an intractable problem.<a href="http://scholarlypublishing.org/jpwilkin/archives/6#foot1">[1]</a>   Metasearching works, and there are standards that help ensure that it does.  So why doesn’t metasearch work to solve the larger problem I laid out at the beginning?  There are many reasons:  small variability in network performance, vast variations in the ways that different vendors’ database systems work, even greater variation in the information found in those different databases, and an overwhelming number of sources.  We complain at Michigan that our vendor product, MetaLib, is only able to search eight databases at once, but if there were no limits would we ask it to search the roughly 800 resources we currently list for our users?  Surely these problems are tractable.  Networks get more robust, standards are designed to iron out differences in systems, and 800 hardly seems like a large number.  Nevertheless, networks are in fact very robust right now and those standards only persist in trying to hamstring vendors who are trying to distinguish themselves from their competitors, and 800 <em>is</em> a very large number.  Despite all we do, even in the simplest metasearch applications today, when we repeat the <em>same</em> query against the <em>same</em> set of databases, we retrieve <em>different</em> results (IMHO, one of the greatest sins imaginable in a library finding tool).  We toss out important pieces of functionality in some of the resources in order to find the right lowest common denominator.  (Think about the plight of our hapless user when one database consists of fulltext and another is only bibliographic information:  a search of the first resource needs to be crafted carefully to avoid too-great recall, and the search of the second needs the broadest set of possible terms to avoid too high a level of precision.)   
This is not to say that it doesn’t make perfect sense to use metasearch to attack, say, a small group of similarly constructed and perhaps overlapping engineering databases rather than submitting the same search against each in some serial fashion.</p>
<p>Although metasearch doesn’t work to conduct discovery over the great big world of licensed content, creating a comprehensive database does work to conduct discovery over a vast array of resources.  Recent years have seen several presumptive dominant comprehensive databases.  Elsevier’s Scopus (focusing on STM and social science content) <a href="http://www.info.scopus.com/news/press/pr_050621.asp" target="_blank">claims</a> that its “[d]irect links to full-text articles, library resources and other applications like reference management software, make Scopus quicker, easier and more comprehensive to use than any other literature research tool.”  Scopus is just one of the most recent entrants in an arena where California’s <a href="http://oedb.org/" target="_blank">Online Education Database</a>, with its slogan of “Research Beyond Google,” can <a href="http://oedb.org/library/college-basics/research-beyond-google" target="_blank">claim</a> to present “119 Authoritative, Invisible, and Comprehensive Resources.”  Ironically, in describing the problem of getting at an “invisible web” estimated to be 500 times the size of the visible web, the OEDB poses itself as <a href="http://oedb.org/library/college-basics/research-beyond-google" target="_blank">going <em>beyond</em> Google</a>, when the obvious place to turn in all of this is <em>Google Scholar</em>.</p>
<p>Google Scholar (GS) is absolutely <em>not</em> a replacement for the vast array of resources we license for our users.  Criticisms of Google Scholar abound.  Perhaps most troubling to an academic audience, GS is secretive about its coverage:  no information is available, either from GS itself or from any watchdog group, about the extent of its coverage in any area or for any publisher.  Moreover, it will probably always be the case that some enterprises in our sphere fund the work of finding and indexing the literature of a discipline, online and offline, by charging for subscriptions, thus putting them in direct opposition to GS and keeping their indexes out of GS.  (Consider, for example, the Association for Asian Studies with its <a href="http://quod.lib.umich.edu/b/bas/" target="_blank">Bibliography of Asian Studies</a> or the Modern Language Association and the <a href="http://www.mla.org/bibliography" target="_blank">MLA Bibliography</a>, each funding its bibliographic sleuthing by selling access to the resulting indexes.  To give their information to GS is to destroy the very funding that makes it possible for them to collect the information.)  And yet, as we learned in the recent article “Metalib and Google Scholar: a User Study,” undergraduates are more effective in finding needed information through Google Scholar than through our metasearch tools.<a href="http://scholarlypublishing.org/jpwilkin/archives/6#foot2">[2]</a></p>
<p>If metasearch is an ineffective tool for comprehensive “discovery” and Google Scholar has its own shortcomings, the need and the opportunity in this space lie <em>not</em> in creating a more effective metasearch tool; rather, the challenge is to bring these two strategies together in a way that best serves the interests of an insatiable academic audience, whether undergraduate, graduate or faculty.</p>
<p>Recently, Ken Varnum (our head of Web Systems) and I brainstormed about a few approaches and followed this with a conversation with Anurag Acharya, who developed Google Scholar.  I toss out the strategies that follow to seed this conversation space with a few ideas, not to pretend to be exhaustive or to point to the best possible solution.  These ideas need further development and testing before anyone pursues them in earnest.  In each of these, the scenario begins with an authenticated user querying Google Scholar.  While the GS results are coming back and being presented to the user, we present information about other sources that might prove useful, in either a separate frame (Anurag’s recommendation, based on usability work at Google) or a separate pop-up window.</p>
<p><strong>1. Capitalize on user information to augment GS searches</strong>:  When a user authenticates, we have at our disposal a number of attributes of the user such as status, currently enrolled courses, and degree programs.  With these, we initiate a metasearch of databases we deem to be relevant and return, in that frame or window, either ranked results or links to hit counts and databases.  One advantage of this approach is that it’s fairly straightforward with few significant challenges.  We would probably want to capitalize on work done by Groningen in their <a href="http://livetrix.ub.rug.nl/" target="_blank">Livetrix</a> implementation, where they eschew the standard MetaLib interface for a connection to the MetaLib X-Server so that they can better tailor interaction with the remote databases and present results.  The obvious disadvantage to this approach is that we make an assumption about a user based on his or her subject focus:  when a faculty member in English searches Google Scholar for information on mortality statistics in sixteenth-century England, we’re likely to have missed the mark by searching <em>MLA Bibliography</em>.</p>
<p><strong>2. Capitalize on query information to augment GS searches</strong>:  In this scenario, we find some way to intercept query terms to try to map concepts to possible databases.  We would use the same basic technical approach described above (i.e., GS in the main frame or window; other results in a separate frame or window) to ensure that the user immediately gets on-target results, but through sophisticated linguistic analysis we find and introduce the user to other databases that might bear fruit.  This approach avoids the deficiency of the first by making no assumptions about a user’s interest based on his or her degree/departmental affiliation.  It does, however, pose great challenges for us in building quick and accurate mappings between brief (one- or two-word) queries and databases.  Although a library might be able to undertake the first strategy with only modest resources, this second approach requires partnership with researchers in areas such as computational linguistics.</p>
<p><strong>3. Introduce the user to the possibility of other resources</strong>:  This more modest approach only requires the library interface to figuratively tap the user on the shoulder and point out that, in addition to GS, other resources may be helpful.  So, for example, we might submit the user’s query to GS <em>while</em> we submit the same query to Scopus and Web of Science, two other fairly comprehensive resources, produce hit counts, and suggest to the user that s/he look at results from these two databases or some of our other 800 resources.</p>
<p><strong>4. Use GS results to augment GS</strong>: Use the results from GS, rather than queries to GS, to derive the content of the “you could also look at…” pane.  By clustering things that come back, we could provide some subject areas that might be useful.  Clustering is tricky, of course, for the same reason that metasearch is tricky (we’re working with little text, and with texts of dissimilar lengths), but if we could pull back the full text of documents via the OpenURL links GS provides, and then cluster that, we might have some useful information.  Again, a library might benefit from collaboration with some area of information science research, particularly on the semantic aspects.  The biggest challenge here would be in doing something that doesn’t introduce significant delay (and thus annoyance); however, we might accomplish this by offering it as an <em>option</em> to users (i.e., as in “good stuff here, but think you might want more and better?”).</p>
<p>Our challenge is to help our users through the maze of e-resources without interrupting their journey, getting them to results as quickly as possible; by combining results from Google Scholar with licensed resources we can help them get fast results <em>and</em> become more aware of the wealth of resources available to them.  All of these ideas are off-the-cuff and purposely sketchy.  Ken and I have spent little time exploring the opportunities or pitfalls.  Some approaches will lend themselves to collaboration more than others (e.g., collaboration with HCI and linguistics researchers), but all benefit from further study (How much more effective is this approach than traditional metasearch?  Than Google Scholar alone?  How satisfied is the user with the experience compared to those other approaches?).</p>
<p><strong>Notes</strong><br />
<a title="foot1" name="foot1"></a>[1] Note the interestingly self-serving article by Tamar Sadeh, from Ex Libris, where she concludes, “Metasearch systems have several advantages over Google Scholar. We anticipate that in the foreseeable future, libraries will continue to provide access to their electronic collections via their branded, controlled metasearch system” (<em>HEP Libraries Webzine</em>, Issue 12 / March 2006, <a href="http://library.cern.ch/heplw/12/papers/1/" target="_blank">http://library.cern.ch/heplw/12/papers/1/</a>).</p>
<p><a title="foot2" name="foot2"></a>[2] Haya, Glenn, Else  Nygren, and Wilhelm  Widmark. “<a href="http://www.emeraldinsight.com/10.1108/14684520710764122" target="_blank">Metalib and Google Scholar: a User Study</a>” <em>Online Information Review</em>  31(3)(2007): 365-375.  I found one review of the article by an enlightened librarian where he concludes that the moral of the study is that we need to do a better job training our users to use metasearch <a href="http://scholarlypublishing.org/jpwilkin/archives/6#more-6" class="more-link">(more...)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://scholarlypublishing.org/jpwilkin/archives/6/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
