Episode 38 – E-Book Redux

In a very special time-traveling episode, the Digital Campus crew journey back to 2007 to hear from their old selves – specifically, what they said about e-books when Amazon’s Kindle was released – and whether their present selves agree with their ghosts from the past in light of the release of the Kindle 2 and the mobile version of Google Books. Also covered on the podcast are the demise of rumor site Juicy Campus and music site Ruckus, the impact of Creative Commons and downloads on YouTube, and the addition of history to Google Earth. Picks for the episode include a programming interface for New York Times articles, a blog on the futures of learning, a search engine for open journals, and a site for medieval manuscripts.

Links mentioned on the podcast:
Kindle 2
Google Book Search Mobile
New York Times Article Search API
Catalogue of Digitized Medieval Manuscripts
Futures of Learning blog
Google Earth 5′s Historical Imagery
JURN search engine

Running time: 49:18
Download the .mp3

Citing Software

Geoffrey Rockwell and I have been giving considerable thought recently to how we might facilitate the integration of text analysis tools and results into (mostly scholarly) writing. Scholars feel compelled to cite ideas and texts that come from other authors, but they are much less likely to recognize the tools that have contributed to their work (and we would probably not want every scholar to cite search engines such as Google that have been used during research). We feel strongly that text analysis tools can be a significant contributor to digital research, whether they were used to help confirm hunches or to lead the researcher into completely unanticipated realms. Whether or not scholars make it more of a habit to cite tools is beyond our control, but we want to design our upcoming tools to make it easier for them to do so. At the very least this includes:

  • providing a preferred general citation for the tool suite
  • providing preferred citations for specific results including references to the tool and the source text(s)
  • making it easier for users to extract static or dynamic results and include them elsewhere (a web-based blog editor, an HTML editor, a word processor document, etc.), with a reference

An important component of academic knowledge is reproducibility, and providing scholars with more information on the processes followed during research – including the text analysis tools and digital texts used – is sure to be important.
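By way of illustration, a preferred citation for a specific result might look something like this (the tool name, version, URL, and date here are all invented for the example; they are not a published standard):

```text
Word frequency list for "The Secret Garden" (Project Gutenberg etext),
generated with [Tool Name] 1.2 <http://example.org/tools/>,
results retrieved 27 February 2009.
```

The point is that the citation names both the tool (with version, since results can change between releases) and the source text, which is what reproducibility requires.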

I was prompted to write this post by a recent notice in a Globe and Mail article that provided several statistics:

These figures have been compiled by Patrick Brethour, the Globe and Mail’s British Columbia editor, drawing from the 2006 census with the help of special software from Tetrad Computer Applications Inc.

The figures referred to are mostly present in the text of the article as well, but I wonder if the editor would have been as likely to include this notice if there hadn’t been the inset with the concentrated statistics. The distinction is important because it’s about recognizing what contributed to the research regardless of how the results are presented (though ironically, journalism tends to have very different standards of citation than academic writing, and yet it’s in a newspaper article that we find a software tool cited). Will standards for citing digital tools in the humanities shift in the coming years?

Cooliris-enabled scrapbook

There’s more 3D goodness for you to enjoy now that the Mapping our Anzacs scrapbook is Cooliris-enabled. If you have Cooliris installed, you’ll notice that the Cooliris icon on your browser toolbar lights up when you visit the site. Just click on the icon to browse all the photos posted to the scrapbook on a glorious 3D wall.

Scrapbook posts in 3D

(If you don’t have Cooliris then go and get it. It can be used both in Internet Explorer and Firefox, though you’ll probably need to have admin rights to install for IE.)

Having given the 3D treatment to digitised files from the National Archives of Australia and portrait images from the Australian Dictionary of Biography, it wasn’t too hard to do. The scrapbook is a Tumblr site, and the Tumblr API makes it easy to extract all the photos. So I created a PHP script to gather all the details and then write them to a Media RSS file. Then it was just a matter of inserting a link to it in the scrapbook.
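For that last step, Cooliris picks up a Media RSS feed through a standard feed auto-discovery link in the page’s head – something along these lines (the title text is just illustrative; the href points at the script below):

```html
<link rel="alternate" type="application/rss+xml"
      title="Mapping our Anzacs scrapbook photos"
      href="moa-media-rss.php" />
```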

Code follows:

<?php
if (isset($_GET['start'])) {
    $start = $_GET['start'];
} else {
    $start = 0;
}
$url = "http://our-anzacs.tumblr.com/api/read?start=$start&num=50&type=photo&filter=text";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
if (!$result) {
    echo "cURL error number:" . curl_errno($ch);
    echo "cURL error:" . curl_error($ch);
    exit;
}
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($result);
$xpath = new DOMXPath($dom);
// Total number of posts, used for paging
$attrs = $xpath->evaluate("//posts/@total[1]");
foreach ($attrs as $attr) {
    $total = $attr->nodeValue;
}
$num_pages = ceil($total / 50);
$start_next = $start + 50;
$start_previous = $start - 50;

echo "<?xml version='1.0' encoding='utf-8' standalone='yes'?>\n";
echo "<rss version='2.0' xmlns:media='http://search.yahoo.com/mrss/' xmlns:atom='http://www.w3.org/2005/Atom'>\n";
echo "<channel>\n";
echo "<title>Mapping our Anzacs scrapbook</title>\n"; // title text reconstructed -- the blog software ate the original tag
echo "<description>Photos posted to the Mapping our Anzacs scrapbook</description>\n";
echo "<link>http://our-anzacs.tumblr.com</link>\n";
if ($start_previous >= 0) {
    echo "<atom:link rel='previous' href='moa-media-rss.php?start=$start_previous' />";
}
if ($start_next <= $total) {
    echo "<atom:link rel='next' href='moa-media-rss.php?start=$start_next' />";
}
$posts = $xpath->evaluate("//post/@id");
foreach ($posts as $post) {
    $id = $post->nodeValue;
    $url = "http://our-anzacs.tumblr.com/post/$id";
    $photos = $xpath->evaluate("//post[@id='$id']/photo-url[@max-width='500']/text()");
    foreach ($photos as $photo) {
        $photo_500 = $photo->nodeValue;
    }
    $photos = $xpath->evaluate("//post[@id='$id']/photo-url[@max-width='250']/text()");
    foreach ($photos as $photo) {
        $photo_250 = $photo->nodeValue;
    }
    // Pull the person's name out of the photo caption
    $nodes = $xpath->evaluate("//post[@id='$id']/photo-caption/text()");
    foreach ($nodes as $node) {
        $caption = $node->nodeValue;
        preg_match("/View details for\s+([\w\s,-]*)/", $caption, $matches);
        $names = explode(", ", $matches[1]);
        $name = "$names[1] $names[0]";
    }
    echo "<item>\n";
    echo "<guid isPermaLink='false'>$id</guid>\n";
    echo "<title>$name</title>\n"; // title line reconstructed
    echo "<link>$url</link>\n";
    echo "<media:thumbnail url='$photo_250' />\n";
    echo "<media:content url='$photo_500' type='image/jpeg' />\n";
    echo "</item>\n";
}
echo "</channel>\n";
echo "</rss>\n";
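For what it’s worth, each item the script writes ends up looking roughly like this (the post id and image URLs are invented for illustration):

```xml
<item>
<guid isPermaLink='false'>81350317</guid>
<link>http://our-anzacs.tumblr.com/post/81350317</link>
<media:thumbnail url='http://media.tumblr.com/example_250.jpg' />
<media:content url='http://media.tumblr.com/example_500.jpg' type='image/jpeg' />
</item>
```

The media:thumbnail and media:content elements are what Cooliris reads to build its 3D wall.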

The Stimulus Package (and now for something completely different)

Suppose that there were a major fire, and that in order to put out the fire you would need, say, a trillion gallons of water.  Can you imagine a city council that would say, “oh no, we can only afford 734 billion gallons of water, so let’s leave out about a quarter of the neighborhoods.  It’s the right thing to do because we won’t go into debt, and future residents will be better off for having had a quarter of the city burn down”?

Or, for a better analogy, suppose that your ship is sinking through a hole that is 10 feet in diameter.  How about saving on repair costs by inserting a plug that covers only 75 percent of the leak?  Sound like a good plan?  Not so much.

The reason that we need fiscal stimulus is that monetary policy is impotent to provide sufficient stimulus (not generally true, but true now, and essentially no one disagrees with this view).  With an unemployment rate of 7.6 percent, the economy is well below its potential level of output — we are about five percent below potential GDP, and the situation is getting worse by the hour.  The current cost of putting unemployed resources to work in this setting is very low, because the alternative is that those resources will not be used at all. Deficit-financed spending, public and private, can create current income and will reduce unemployment and the risk of future unemployment.  Some of the income generated by the stimulus, and some of the stimulus itself, will go into investment, and hence lead to increases in future income.  The income gains are valuable in themselves, and will offset a good deal of the taxes required to service the debt.  This analysis would be completely different if the economy were somewhere near full employment.  In that case the new spending, both private and public, would substitute for other activity, and the increase in the deficit would reduce investment and growth.  (To go back to the sinking ship analogy, patching a leak where there is no leak is simply a waste of resources.)

Everything that I have said above is oversimplified, of course, but the public discussion of the size and shape of the stimulus package seems to be missing the point.  The point isn’t to have the cheapest stimulus package possible; the point is to align the size and timing of the package with the size of the problem.  The most immediate and effective form of stimulus is to support state governments, because their revenues are falling and they will be forced, by their own constitutions, to reduce spending and lay off workers.  So the immediate stimulus effect of a dollar of support for state spending is a dollar, growing to about two dollars once the effects percolate through the economy.  (Note that what is really going on is the avoidance of a dollar’s spending reduction, growing to two dollars, at the worst imaginable time.)  In this context, Congress gets all sanctimonious about waste in government.  Hallelujah!
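The arithmetic behind “a dollar … growing to about two dollars” is the standard spending multiplier: each round of spending generates further spending at the marginal propensity to consume, and with that propensity around one half (the value implied by the doubling claim), the rounds sum to two:

```latex
\Delta Y = \Delta G\,(1 + c + c^{2} + \cdots) = \frac{\Delta G}{1-c}
         = \frac{\$1}{1-0.5} = \$2
```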

Paul Krugman’s recent columns and blogs on this subject have been excellent, by the way.  I commend them to the world of libraries.

And one more thing.  If we happen to make a mistake and overstimulate the economy, monetary policy will be perfectly effective in reining things in.  One way of characterizing the goal of fiscal policy in the current crisis is to restore the economy to a place where monetary policy can work.  The task is urgent.

Digital Humanities in 2008, Part I

When I wrote a series of blog posts last year summarizing developments in digital humanities, a friend joked that I had just signed on to do the same thing every year.  So here’s my synthesis of digital humanities in 2008, delivered a little later than I intended. (Darn life, getting in the way of blogging!) This post, the first in a series, will focus on the emergence of digital humanities (DH), defining DH and its significance, and community-building efforts.   Subsequent posts will look at developments in research, open education, scholarly communication, mass digitization, and tools.   Caveat lector:  this series reflects the perspective of an English Ph.D. with a background in text encoding and interest in digital scholarship working at a U.S. library who wishes she knew and understood all but surely doesn’t.  Please  add comments and questions.

1.    The Emergence of the Digital Humanities

This year several leaders in digital humanities declared its “emergence.”  At one of the first Bamboo workshops, John Unsworth pointed to the high number of participants and developments in digital humanities since work on the ACLS Cyberinfrastructure report (Our Cultural Commonwealth) began 5 years earlier and noted “we have in fact reached emergence… we are now at a moment when real change seems possible.”  Likewise, Stan Katz commented in a blog post called “The Emergence of the Digital Humanities,” “Much remains to be done, and campus-based inattention to the humanities complicates the task. But the digital humanities are here to stay, and they bear close watching.”

Emergence: Termite Cathedral (Wikipedia)

Last year I blogged about the emergence of digital humanities and I suspect I will the next few years as well, but digital humanities did seem to gain momentum and visibility in 2008.  For me, a key sign of the DH’s emergence came when the NEH transformed the Digital Humanities Initiative into the Office of Digital Humanities (ODH), signaling the significance of the “digital” to humanities scholarship.  After the office was established, Inside Higher Ed noted in “Rise of the Digital NEH” that what had been a “grassroots movement” was attracting funding and developing “organizational structure.”  Establishing the ODH gave credibility to an emerging field (discipline? methodology?).  When you’re trying to make the case that your work in digital humanities should count for tenure and promotion, it certainly doesn’t hurt to point out that it’s funded by the NEH.  The ODH acts not only as a funder (of 89 projects to date), but also as a facilitator, convening conversations, listening actively, and encouraging digital humanities folks to “Keep innovating.” Recognizing that digital humanities work occurs across disciplinary and national boundaries, the ODH collaborates with funding agencies in other countries such as the UK’s JISC, Canada’s Social Sciences and Humanities Research Council (SSHRC), and Germany’s DFG; US agencies such as NSF, IMLS and DOE; and non-profits such as CLIR.  Although the ODH has a small staff (three people) and limited funds, I’ve been impressed by how much this knowledgeable, entrepreneurial team has been able to accomplish, such as launching initiatives focused on data mining and high performance computing, advocating for the digital humanities, providing seed funding for innovative projects, and sponsoring institutes on advanced topics in the digital humanities.

It also seemed like there were more digital humanities jobs in 2008, or at least more job postings that listed digital humanities as a desired specialization.  Of course, the economic downturn may limit not only the number of DH jobs, but also the funding available to pursue complex projects–or, here’s hoping, it may lead to funding for scanner-ready research infrastructure projects.

2.    Defining “digital humanities”

Perhaps another sign of emergence is the effort to figure out just what the beast is.  Several essays and dialogues published in 2008 explore and make the case for the digital humanities; a few use the term “promise,” suggesting that the digital humanities is full of potential but not yet fully realized.

  • “The Promise of Digital History,” a conversation among Dan Cohen, Michael Frisch, Patrick Gallagher, Steven Mintz, Kirsten Sword, Amy Murrell Taylor, Will Thomas III, and Bill Turkel, published in the Journal of American History.  This fascinating, wide-ranging discussion explores defining digital history; developing new methodological approaches; teaching both skills and an understanding of the significance of new media for history; coping with impermanence and fluidity; sustaining collaborations; expanding the audience for history; confronting institutional and cultural resistance to digital history; and much more. Whew! One of the most fascinating discussion threads: Is digital history a method, field, or medium?  If digital history is a method, then all historians need to acquire basic knowledge of it; if it is a medium, then it offers a new form for historical thinking, one that supports networked collaboration.  Participants argued that digital history is not just about algorithmic analysis, but also about collaboration, networking, and using new media to explore historical ideas.
  • In “Humanities 2.0: Promise, Perils, Predictions”  (subscription required, but see Participatory Learning and the New Humanities: An Interview with Cathy Davidson for related ideas), Cathy Davidson argues that the humanities, which offers strengths in “historical perspective, interpretative skill, critical analysis, and narrative form,” should be integral to the information age.  She calls for humanists to acknowledge and engage with the transformational potential of technology for teaching, research and writing.
    Extra Credit, by ptufts

    Extra Credit, by ptufts

    Describing how access to research materials online has changed research, she cites a colleague’s joke that work done before the emergence of digital archives should be emblazoned with an “Extra Credit” sticker.  Now we are moving into “Humanities 2.0,” characterized by networked participation, collaboration, and interaction.  For instance, scholars might open up an essay for criticism and commentary using a tool such as CommentPress, or they might collaborate on multinational, multilingual teaching and research projects, such as the Law in Slavery and Freedom Project.   Yet Davidson acknowledges the “perils” posed by information technology, particularly monopolistic, corporate control of information.   Davidson contributes to the conversation about digital humanities by emphasizing the importance of a critical understanding of information technology and advocating for a scholarship of engagement and participation.

  • In “Something Called ‘Digital Humanities’”, Wendell Piez challenges William Deresiewicz’s dismissal of “something called digital humanities” (as well as of “Contemporary lit, global lit, ethnic American lit; creative writing, film, ecocriticism”).  Piez argues that just as Renaissance “scholar-technologists” such as Aldus Manutius helped to create print culture, so digital humanists focus on both understanding and creating digital media. As we ponder the role of the humanities in society, perhaps digital humanities, which both enables new modes of communicating with the larger community and critically reflects on emerging media, provides one model for engagement.

3.    Community and collaboration

According to Our Cultural Commonwealth, “facilitat[ing] collaboration” is one of the five key goals for the humanities cyberinfrastructure.   Although this goal faces cultural, organizational, financial, and technical obstacles, several recent efforts are trying to articulate and address these challenges.

To facilitate collaboration, Our Cultural Commonwealth calls for developing a network of research centers that provide both technical and subject expertise.  In A Survey of Digital Humanities Centers in the United States, Diane Zorich inventories the governance, organizational structures, funding models, missions, projects, and research at existing DH centers.  She describes such centers as being at a turning point, reaching a point of maturity but facing challenges in sustaining themselves and preserving digital content.  Zorich acknowledges the innovative work many digital humanities centers have been doing, but calls for greater coordination among centers so that they can break out of siloes, tackle common issues such as digital preservation, and build shared services.   Such coordination is already underway through groups such as CenterNet and HASTAC, collaborative research projects funded by the NEH and other agencies, cyberinfrastructure planning projects such as Bamboo, and informal partnerships among centers.

How to achieve greater coordination among “Humanities Research Centers” was also the topic of the Sixth Scholarly Communications Institute (SCI), which used the Zorich report as a starting point for discussion.   The SCI report looks at challenges facing both traditional humanities centers, as they engage with new media and try to become “agents of change,” and digital humanities centers, as they struggle to “move from experimentation to normalization” and attain stability (6).   According to the report, humanities centers should facilitate “more engagement with methods,” discuss what counts as scholarship, and coordinate activities with each other.  Through my Twitter feeds, I understand that the SCI meeting seems to be yielding results: CenterNet and the Consortium of Humanities Centers and Institutes (CHCI) are now discussing possible collaborations, such as postdocs in digital humanities.

Likewise, Bamboo is bringing together humanities researchers, computer scientists, information technologists, and librarians to discuss developing shared technology services in support of arts and humanities researchers.  Since April 2008, Bamboo has convened three workshops to define scholarly practices, examine challenges, and plan for the humanities cyberinfrastructure.  I haven’t been involved with Bamboo (beyond partnering with them to add information to the Digital Research Tools wiki), so I am not the most authoritative commentator, but I think that involving a wide community in defining scholarly needs and developing technology services just makes sense–it prevents replication, leverages common resources, and ultimately, one hopes, makes it easier to perform and sustain research using digital tools and resources.  The challenge, of course, is how to move from talk to action, especially given current economic constraints and the mission creep that is probably inevitable with planning activities that involve over 300 people.  To tackle implementation issues, Bamboo has set up eight working groups that are addressing topics like education, scholarly networking, tools and content, and shared services. I’m eager to see what Bamboo comes up with.

Planning for the cyberinfrastructure and coordinating activities among humanities centers are important activities, but playing with tools and ideas among fellow digital humanists is fun!  (Well, I guess planning and coordination can be fun, too, but a different kind of fun.)  This June, the Center for History and New Media hosted its first THATCamp (The Humanities and Technology Camp), a “user-generated,” organically organized “unconference” (very Web 2.0/open source).

Dork Shorts at THAT Camp

Rather than developing an agenda prior to the conference, the organizers asked each participant to blog about his or her interests, then devoted the first session to setting up sessions based on what participants wanted to discuss.  Instead of passively listening to three speakers read papers, each person who attended a session was asked to participate actively.  Topics included Teaching Digital Humanities, Making Things (Bill Turkel’s Arduino workshop), Visualization, Infrastructure and Sustainability, and the charmingly titled Dork Shorts, where THAT Campers briefly demonstrated their projects. THAT Camp drew a diversity of folks – faculty, graduate students, librarians, programmers, information technologists, funders, etc.  The conference used technology effectively to stir up and sustain energy and ideas – the blog posts before the conference helped the attendees set some common topics for discussion, and Twitter provided a backchannel during the conference.  Sure, a couple of sessions meandered a bit, but I’ve never been to a conference where people were so excited to be there, so engaged and open.  I bet many collaborations and bright ideas were hatched at THAT Camp.  This year, THAT Camp will be expanded and will take place right after Digital Humanities 2009.

THAT Camp got me hooked on Twitter.  Initially a Twitter skeptic (gawd, do I need another way to procrastinate?), I’ve found that it’s a great way to find out what’s going on in digital humanities and connect with others who have similar interests.  I love Barbara Ganley’s line (via Dan Cohen): “blog to reflect, Tweet to connect.”  If you’re interested in Twittering but aren’t sure how to get started, I’d suggest following digital humanities folks and some of the people they follow.  You can also search for particular topics at search.twitter.com.  Amanda French has written a couple of great posts about Twitter as a vehicle for scholarly conversation, and a recent Digital Campus podcast features a discussion among Tweeters Dan Cohen and Tom Scheinfeldt and skeptic Mills Kelly.

HASTAC offers another model for collaboration by establishing a virtual network of people and organizations interested in digital humanities and sponsoring online forums (hosted by graduate and undergraduate students) and other community-building activities.  Currently HASTAC is running a lively, rich forum on the future of the digital humanities featuring Brett Bobley, director of the NEH’s ODH.  Check it out!

Google, Robert Darnton, and the Digital Republic of Letters

Robert Darnton recently published an essay in the New York Review of Books on the Google settlement. There has been much commentary in blogs, listservs, and print media. Below I reproduce a letter that I sent to the New York Review of Books, which they found far too long to publish. It is my understanding that they expect to publish a much-shortened revision. In any case, here’s what I had to say.


To the editors:

My colleague and friend Robert Darnton is a marvelous historian and an elegant writer. His utopian vision of a digital infrastructure for a new Republic of Letters (Google and the Future of Books, NYRB Feb. 12) makes the spirit soar. But his idea that there was any possibility that Congress and the Library of Congress might have implemented that vision in the 1990s is a utopian fantasy. At the same time, his view of the world that will likely emerge as a result of Google’s scanning of copyrighted works is a dystopian fantasy.

The Congress that Darnton imagines providing both money and changes in law that would have made out-of-print but in-copyright works (the great majority of print works published in the 20th century) digitally available on reasonable terms showed no interest in doing anything of the kind. Rather, it passed the Digital Millennium Copyright Act and the Sonny Bono Copyright Term Extension Act. (More recently, Congress passed the Higher Education Opportunity Act, which compels academic institutions to police the electronic environment for copyright infringement). This record is unsurprising; the committees that write copyright law are dominated by representatives who are beholden to Hollywood and other rights holders. Their idea of the Republic of Letters is one in which everyone who ever reads, listens, or views pretty much anything should pay to do so, every time.

The Supreme Court, which was given the opportunity to limit the extension of the term of copyright – a term that was already far too long (like Darnton, I think that 14 years, renewable once, is more than enough to achieve the purposes of copyright) – refused to do so (with only two dissenters) in Eldred v. Ashcroft, decided in 2003. Instead, it upheld legislation that, contrary to the fundamental principles of copyright, provided rewards to authors who are long dead, preventing our cultural heritage from rising into the public domain.

In short, over the last decade and more, public policy has been consistently worse than useless in helping to make most of the works of the 20th century searchable and usable in digital form. This is the alternative against which we should evaluate Google Book Search and Google’s settlement with publishers and authors.

First, we should remember that until Google announced in 2004 that it was going to digitize the collections of a number of the world’s largest academic libraries, absolutely no one had a plan for mass digitization at the requisite scale. Well-endowed libraries, including Harvard and the University of Michigan, were embarked on digitization efforts at rates of less than ten thousand volumes per year. Google completely shifted the discussion to tens of thousands of volumes per week, with the result that overnight the impossible goal of digitizing (almost) everything became possible. We tend to think now that mass digitization is easy. Less than five years ago we thought it was impossibly expensive.

The heart of Darnton’s dystopian fantasy about the Google settlement follows directly from his view that “Google will enjoy what can only be called a monopoly … of access to information.” But Google doesn’t have anything like a monopoly over access to information in general, nor to the information in the books that are subject to the terms of the settlement. For a start (and of stunning public benefit in itself), up to 20 percent of the content of the books will be openly readable by anybody with an Internet connection, and all of the content will be indexed and searchable. Moreover, Google is required to provide the familiar “find it in a library” link for all books offered in the commercial product. That is, if after reading 20 percent of a book a user wants more and finds the price of on-line access to be too high, the reader will be shown a list of libraries that have the book, and can go to one of those libraries or employ inter-library loan. This greatly weakens the market power of Google’s product. Indeed, it is much better than the current state of affairs, in which users of Google Book Search can read only snippets, not 20 percent of a book, when deciding whether what they’ve found is what they seek.

Darnton is also concerned that Google will employ the rapacious pricing strategies used by many publishers of current scientific literature, to the great cost of academic libraries, their universities, and, at least as important, potential users who are simply without access. But the market characteristics of current articles in science and technology are fundamentally different from those of the vast corpus of out-of-print literature that is held in university libraries and that will constitute the bulk of the works that Google will sell for the rights holders under the settlement agreement. The production of current scholarship in the sciences requires reliable and immediate access to the current literature. One cannot publish, nor get grants, without such access. The publishers know it, and they price accordingly. In particular the prices of individual articles are very high, supporting the outrageously expensive site licenses that are paid by universities. In contrast, because there are many ways of getting access to most of the books that Google will sell under the settlement, the consumer price will almost surely be fairly low, which will in turn lead to low prices for the site licenses. Again, “find it in a library,” coupled with extensive free preview, could not be more different from the business practices employed by many publishers of scientific, technical and medical journals.

There is another reason to believe that prices will not be “unfair”, which is that Google is far more interested in getting people to “google” pretty much everything than it is in making money through direct sales. The way to get people to come to the literature through Google is to make it easy and rewarding to do so. For works in the public domain, Google already provides free access and will continue to do so. For works in the settlement, a well-designed interface, 20 percent preview, and reasonable prices are all likely to be part of the package. Additionally, libraries that don’t subscribe to the product will have a free public terminal accessible to their users. This increases the public good deriving from the settlement both directly and by providing yet another distribution channel that does not require payment to Google or the rightsholders.

The settlement is far from perfect. The American practice of making public policy by private lawsuit is very far from perfect. But in the absence of the settlement – even if Google had prevailed against the suits by the publishers and authors – we would not have the digitized infrastructure to support the 21st century Republic of Letters. We would have indexes and snippets and no way to read any substantial amount of any of the millions of works at stake on line. The settlement gives us free preview of an enormous amount of content, and the promise of easy access to the rest, thereby greatly advancing the public good.

Of course I would prefer the universal library, but I am pretty happy about the universal bookstore. After all, bookstores are fine places to read books, and then to decide whether to buy them or go to the library to read some more.

Paul N. Courant

Note: This letter represents my personal views and not those of the University of Michigan, nor any of its libraries or departments.


So I was thinking, wouldn’t it be nice if the Australian Dictionary of Biography’s ‘born on this day’ feature could be made available as an RSS feed? Every morning you’d get a new list of biographies delivered direct to your feed reader. And so…

[sounds of xpath wrangling and PHP coding]

here it is.

It’s pretty simple – it harvests all the links of people born on the current day, then loops through the links to gather the first paragraph of each biography. Then it’s just a matter of writing everything to an RSS file.

In case you missed it, I also created a Media RSS feed for portrait images used in the ADB. This enables them to be viewed in Cooliris.

Code follows…

function getPage($url, $ch) {
	curl_setopt($ch, CURLOPT_URL, $url);
	$html = curl_exec($ch);
	if (!$html) {
		echo "cURL error number:" . curl_errno($ch);
		echo "cURL error:" . curl_error($ch);
	}
	return $html;
}

$url = "http://www.adb.online.anu.edu.au/scripts/adbp-births-deaths.php";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = getPage($url, $ch);

$dom = new DOMDocument();
@$dom->loadHTML($html); // parse the births/deaths page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("//ul[@class='pb-results'][1]/li/a");

echo "<?xml version='1.0'?>\n";
echo "<rss version='2.0'>\n";
echo "<channel>\n";
echo "<title>ADB: born on this day</title>\n"; // title text reconstructed -- the blog software ate the original tag
echo "<link>http://www.adb.online.anu.edu.au</link>\n"; // link line reconstructed
echo "<description>A list of all those people in the Australian Dictionary of Biography who were born on this day.</description>\n";
for ($i = 0; $i < $hrefs->length; $i++) {
	$href = $hrefs->item($i);
	$title = $href->nodeValue;
	$bio = "";
	$url = "http://www.adb.online.anu.edu.au" . substr($href->getAttribute('href'), 2);
	$html = getPage($url, $ch);
	$dom = new DOMDocument();
	@$dom->loadHTML($html); // parse the individual biography
	$xpath = new DOMXPath($dom);
	$paras = $xpath->evaluate("//div[@id='content']/p[1]/text()");
	foreach ($paras as $para) {
		$bio .= $para->nodeValue;
	}
	$bio .= "...";
	$bio = htmlspecialchars($bio, ENT_QUOTES);
	$bio = str_replace("\n", "", $bio);
	echo "<item>\n";
	echo "<title>$title</title>\n"; // title line reconstructed
	echo "<link>$url</link>\n";
	echo "<description>$bio</description>\n";
	echo "</item>\n";
}
echo "</channel>\n";
echo "</rss>\n";
curl_close($ch);