Digital Humanities in 2008, Part I

When I wrote a series of blog posts last year summarizing developments in digital humanities, a friend joked that I had just signed on to do the same thing every year.  So here’s my synthesis of digital humanities in 2008, delivered a little later than I intended. (Darn life, getting in the way of blogging!) This post, the first in a series, will focus on the emergence of digital humanities (DH), defining DH and its significance, and community-building efforts.   Subsequent posts will look at developments in research, open education, scholarly communication, mass digitization, and tools.   Caveat lector:  this series reflects the perspective of an English Ph.D. with a background in text encoding and interest in digital scholarship working at a U.S. library who wishes she knew and understood all but surely doesn’t.  Please  add comments and questions.

1.    The Emergence of the Digital Humanities

This year several leaders in digital humanities declared its “emergence.”  At one of the first Bamboo workshops, John Unsworth pointed to the high number of participants and developments in digital humanities since work on the ACLS Cyberinfrastructure report (Our Cultural Commonwealth) began 5 years earlier and noted “we have in fact reached emergence… we are now at a moment when real change seems possible.”  Likewise, Stan Katz commented in a blog post called “The Emergence of the Digital Humanities,” “Much remains to be done, and campus-based inattention to the humanities complicates the task. But the digital humanities are here to stay, and they bear close watching.”

Emergence: Termite Cathedral (Wikipedia)

Last year I blogged about the emergence of digital humanities and I suspect I will the next few years as well, but digital humanities did seem to gain momentum and visibility in 2008.  For me, a key sign of DH’s emergence came when the NEH transformed the Digital Humanities Initiative into the Office of Digital Humanities (ODH), signaling the significance of the “digital” to humanities scholarship.  After the office was established, Inside Higher Ed noted in “Rise of the Digital NEH” that what had been a “grassroots movement” was attracting funding and developing “organizational structure.”  Establishing the ODH gave credibility to an emerging field (discipline? methodology?).  When you’re trying to make the case that your work in digital humanities should count for tenure and promotion, it certainly doesn’t hurt to point out that it’s funded by the NEH.  The ODH acts not only as a funder (of 89 projects to date), but also as a facilitator, convening conversations, listening actively, and encouraging digital humanities folks to “Keep innovating.” Recognizing that digital humanities work occurs across disciplinary and national boundaries, the ODH collaborates with funding agencies in other countries such as the UK’s JISC, Canada’s Social Sciences and Humanities Research Council (SSHRC), and Germany’s DFG; US agencies such as NSF, IMLS and DOE; and non-profits such as CLIR.  Although the ODH has a small staff (three people) and limited funds, I’ve been impressed by how much this knowledgeable, entrepreneurial team has been able to accomplish, such as launching initiatives focused on data mining and high performance computing, advocating for the digital humanities, providing seed funding for innovative projects, and sponsoring institutes on advanced topics in the digital humanities.

It also seemed like there were more digital humanities jobs in 2008, or at least more job postings that listed digital humanities as a desired specialization.  Of course, the economic downturn may limit not only the number of DH jobs, but also the funding available to pursue complex projects–or, here’s hoping, it may lead to funding for scanner-ready research infrastructure projects.

2.    Defining “digital humanities”

Perhaps another sign of emergence is the effort to figure out just what the beast is.  Several essays and dialogues published in 2008 explore and make the case for the digital humanities; a few use the term “promise,” suggesting that the digital humanities is full of potential but not yet fully realized.

  • “The Promise of Digital History,” a conversation among Dan Cohen, Michael Frisch, Patrick Gallagher, Steven Mintz, Kirsten Sword, Amy Murrell Taylor, Will Thomas III, and Bill Turkel published in the Journal of American History.  This fascinating, wide-ranging discussion explores defining digital history; developing new methodological approaches; teaching both skills and an understanding of the significance of new media for history; coping with impermanence and fluidity; sustaining collaborations; expanding the audience for history; confronting institutional and cultural resistance to digital history; and much more. Whew! One of the most fascinating discussion threads: Is digital history a method, field, or medium?  If digital history is a method, then all historians need to acquire basic knowledge of it; if it is a medium, then it offers a new form for historical thinking, one that supports networked collaboration.  Participants argued that digital history is not just about algorithmic analysis, but also about collaboration, networking, and using new media to explore historical ideas.
  • In “Humanities 2.0: Promise, Perils, Predictions”  (subscription required, but see Participatory Learning and the New Humanities: An Interview with Cathy Davidson for related ideas), Cathy Davidson argues that the humanities, which offers strengths in “historical perspective, interpretative skill, critical analysis, and narrative form,” should be integral to the information age.  She calls for humanists to acknowledge and engage with the transformational potential of technology for teaching, research and writing.
    Extra Credit, by ptufts

    Describing how access to research materials online has changed research, she cites a colleague’s joke that work done before the emergence of digital archives should be emblazoned with an “Extra Credit” sticker.  Now we are moving into “Humanities 2.0,” characterized by networked participation, collaboration, and interaction.  For instance, scholars might open up an essay for criticism and commentary using a tool such as CommentPress, or they might collaborate on multinational, multilingual teaching and research projects, such as the Law in Slavery and Freedom Project.   Yet Davidson acknowledges the “perils” posed by information technology, particularly monopolistic, corporate control of information.   Davidson contributes to the conversation about digital humanities by emphasizing the importance of a critical understanding of information technology and advocating for a scholarship of engagement and participation.

  • In “Something Called ‘Digital Humanities’”, Wendell Piez challenges William Deresiewicz’s dismissal of “something called digital humanities” (as well as of “Contemporary lit, global lit, ethnic American lit; creative writing, film, ecocriticism”).  Piez argues that just as Renaissance “scholar-technologists” such as Aldus Manutius helped to create print culture, so digital humanists focus on both understanding and creating digital media. As we ponder the role of the humanities in society, perhaps digital humanities, which both enables new modes of communicating with the larger community and critically reflects on emerging media, provides one model for engagement.

3.    Community and collaboration

According to Our Cultural Commonwealth, “facilitat[ing] collaboration” is one of the five key goals for the humanities cyberinfrastructure.   Although this goal faces cultural, organizational, financial, and technical obstacles, several recent efforts are trying to articulate and address these challenges.

To facilitate collaboration, Our Cultural Commonwealth calls for developing a network of research centers that provide both technical and subject expertise.  In A Survey of Digital Humanities Centers in the United States, Diane Zorich inventories the governance, organizational structures, funding models, missions, projects, and research at existing DH centers.  She describes such centers as being at a turning point, reaching a point of maturity but facing challenges in sustaining themselves and preserving digital content.  Zorich acknowledges the innovative work many digital humanities centers have been doing, but calls for greater coordination among centers so that they can break out of siloes, tackle common issues such as digital preservation, and build shared services.   Such coordination is already underway through groups such as CenterNet and HASTAC, collaborative research projects funded by the NEH and other agencies, cyberinfrastructure planning projects such as Bamboo, and informal partnerships among centers.

How to achieve greater coordination among “Humanities Research Centers” was also the topic of the Sixth Scholarly Communications Institute (SCI), which used the Zorich report as a starting point for discussion.   The SCI report looks at challenges facing both traditional humanities centers, as they engage with new media and try to become “agents of change,” and digital humanities centers, as they struggle to “move from experimentation to normalization” and attain stability (6).   According to the report, humanities centers should facilitate “more engagement with methods,” discuss what counts as scholarship, and coordinate activities with each other.  Judging from my Twitter feeds, the SCI meeting seems to be yielding results: CenterNet and the Consortium of Humanities Centers and Institutes (CHCI) are now discussing possible collaborations, such as postdocs in digital humanities.

Likewise, Bamboo is bringing together humanities researchers, computer scientists, information technologists, and librarians to discuss developing shared technology services in support of arts and humanities researchers.  Since April 2008, Bamboo has convened three workshops to define scholarly practices, examine challenges, and plan for the humanities cyberinfrastructure.  I haven’t been involved with Bamboo (beyond partnering with them to add information to the Digital Research Tools wiki), so I am not the most authoritative commentator, but I think that involving a wide community in defining scholarly needs and developing technology services just makes sense–it prevents replication, leverages common resources, and ultimately, one hopes, makes it easier to perform and sustain research using digital tools and resources.  The challenge, of course, is how to move from talk to action, especially given current economic constraints and the mission creep that is probably inevitable with planning activities that involve over 300 people.  To tackle implementation issues, Bamboo has set up eight working groups that are addressing topics like education, scholarly networking, tools and content, and shared services. I’m eager to see what Bamboo comes up with.

Planning for the cyberinfrastructure and coordinating activities among humanities centers are important activities, but playing with tools and ideas among fellow digital humanists is fun!  (Well, I guess planning and coordination can be fun, too, but a different kind of fun.)  This June, the Center for History and New Media hosted its first THATCamp (The Humanities and Technology Camp), a “user-generated,” organically organized “unconference” (very Web 2.0/open source).

Dork Shorts at THATCamp

Rather than developing an agenda prior to the conference, the organizers asked each participant to blog about his or her interests, then devoted the first session to setting up sessions based on what participants wanted to discuss.  Instead of passively listening to three speakers read papers, each person who attended a session was asked to participate actively.  Topics included Teaching Digital Humanities, Making Things (Bill Turkel’s Arduino workshop), Visualization, Infrastructure and Sustainability, and the charmingly titled Dork Shorts, where THATCampers briefly demonstrated their projects. THATCamp drew a diverse crowd–faculty, graduate students, librarians, programmers, information technologists, funders, etc.  The conference used technology effectively to stir up and sustain energy and ideas—the blog posts before the conference helped attendees set some common topics for discussion, and Twitter provided a backchannel during the conference.   Sure, a couple of sessions meandered a bit, but I’ve never been to a conference where people were so excited to be there, so engaged and open.  I bet many collaborations and bright ideas were hatched at THATCamp.  This year, THATCamp will be expanded and will take place right after Digital Humanities 2009.

THATCamp got me hooked on Twitter.  Initially a Twitter skeptic (gawd, do I need another way to procrastinate?), I’ve found that it’s a great way to find out what’s going on in digital humanities and connect with others who have similar interests.  I love Barbara Ganley’s line (via Dan Cohen): “blog to reflect, Tweet to connect.”  If you’re interested in Twittering but aren’t sure how to get started, I’d suggest following digital humanities folks and some of the people they follow.  You can also search Twitter for particular topics.  Amanda French has written a couple of great posts about Twitter as a vehicle for scholarly conversation, and a recent Digital Campus podcast features a discussion among Tweeters Dan Cohen and Tom Scheinfeldt and skeptic Mills Kelly.

HASTAC offers another model for collaboration by establishing a virtual network of people and organizations interested in digital humanities and sponsoring online forums (hosted by graduate and undergraduate students) and other community-building activities.  Currently HASTAC is running a lively, rich forum on the future of the digital humanities featuring Brett Bobley, director of the NEH’s ODH.  Check it out!

Google, Robert Darnton, and the Digital Republic of Letters

Robert Darnton recently published an essay in the New York Review of Books on the Google settlement, and there has been much commentary in blogs, listservs, and print media. Below I reproduce a letter that I sent to the New York Review of Books, which they found far too long to publish. It is my understanding that they expect to publish a much-shortened revision. In any case, here’s what I had to say.


To the editors:

My colleague and friend Robert Darnton is a marvelous historian and an elegant writer. His utopian vision of a digital infrastructure for a new Republic of Letters (Google and the Future of Books, NYRB Feb. 12) makes the spirit soar. But his idea that there was any possibility that Congress and the Library of Congress might have implemented that vision in the 1990s is a utopian fantasy. At the same time, his view of the world that will likely emerge as a result of Google’s scanning of copyrighted works is a dystopian fantasy.

The Congress that Darnton imagines providing both money and changes in law that would have made out-of-print but in-copyright works (the great majority of print works published in the 20th century) digitally available on reasonable terms showed no interest in doing anything of the kind. Rather, it passed the Digital Millennium Copyright Act and the Sonny Bono Copyright Term Extension Act. (More recently, Congress passed the Higher Education Opportunity Act, which compels academic institutions to police the electronic environment for copyright infringement). This record is unsurprising; the committees that write copyright law are dominated by representatives who are beholden to Hollywood and other rights holders. Their idea of the Republic of Letters is one in which everyone who ever reads, listens, or views pretty much anything should pay to do so, every time.

The Supreme Court, given the opportunity to limit the extension of an already far too long copyright term (like Darnton, I think that 14 years, renewable once, is more than enough to achieve the purposes of copyright), refused to do so (with only two dissenters) in Eldred v. Ashcroft, decided in 2003. Instead, it upheld legislation that, contrary to the fundamental principles of copyright, provided rewards to authors who are long dead, preventing our cultural heritage from rising into the public domain.

In short, over the last decade and more, public policy has been consistently worse than useless in helping to make most of the works of the 20th century searchable and usable in digital form. This is the alternative against which we should evaluate Google Book Search and Google’s settlement with publishers and authors.

First, we should remember that until Google announced in 2004 that it was going to digitize the collections of a number of the world’s largest academic libraries, absolutely no one had a plan for mass digitization at the requisite scale. Well-endowed libraries, including Harvard and the University of Michigan, were embarked on digitization efforts at rates of less than ten thousand volumes per year. Google completely shifted the discussion to tens of thousands of volumes per week, with the result that overnight the impossible goal of digitizing (almost) everything became possible. We tend to think now that mass digitization is easy. Less than five years ago we thought it was impossibly expensive.

The heart of Darnton’s dystopian fantasy about the Google settlement follows directly from his view that “Google will enjoy what can only be called a monopoly … of access to information.” But Google doesn’t have anything like a monopoly over access to information in general, nor to the information in the books that are subject to the terms of the settlement. For a start (and of stunning public benefit in itself), up to 20% of the content of the books will be openly readable by anybody with an Internet connection, and all of the content will be indexed and searchable. Moreover, Google is required to provide the familiar “find it in a library” link for all books offered in the commercial product. That is, if after reading 20 percent of a book a user wants more and finds the price of on-line access to be too high, the reader will be shown a list of libraries that have the book, and can go to one of those libraries or employ inter-library loan. This greatly weakens the market power of Google’s product. Indeed, it is much better than the current state of affairs, in which users of Google Book Search can read only snippets, not 20% of a book, when deciding whether what they’ve found is what they seek.

Darnton is also concerned that Google will employ the rapacious pricing strategies used by many publishers of current scientific literature, to the great cost of academic libraries, their universities, and, at least as important, potential users who are simply without access. But the market characteristics of current articles in science and technology are fundamentally different from those of the vast corpus of out-of-print literature that is held in university libraries and that will constitute the bulk of the works that Google will sell for the rights holders under the settlement agreement. The production of current scholarship in the sciences requires reliable and immediate access to the current literature. One cannot publish, nor get grants, without such access. The publishers know it, and they price accordingly. In particular the prices of individual articles are very high, supporting the outrageously expensive site licenses that are paid by universities. In contrast, because there are many ways of getting access to most of the books that Google will sell under the settlement, the consumer price will almost surely be fairly low, which will in turn lead to low prices for the site licenses. Again, “find it in a library,” coupled with extensive free preview, could not be more different than the business practices employed by many publishers of scientific, technical and medical journals.

There is another reason to believe that prices will not be “unfair”, which is that Google is far more interested in getting people to “google” pretty much everything than it is in making money through direct sales. The way to get people to come to the literature through Google is to make it easy and rewarding to do so. For works in the public domain, Google already provides free access and will continue to do so. For works in the settlement, a well-designed interface, 20 percent preview, and reasonable prices are all likely to be part of the package. Additionally, libraries that don’t subscribe to the product will have a free public terminal accessible to their users. This increases the public good deriving from the settlement both directly and by providing yet another distribution channel that does not require payment to Google or the rightsholders.

The settlement is far from perfect. The American practice of making public policy by private lawsuit is very far from perfect. But in the absence of the settlement – even if Google had prevailed against the suits by the publishers and authors – we would not have the digitized infrastructure to support the 21st century Republic of Letters. We would have indexes and snippets and no way to read any substantial amount of any of the millions of works at stake on line. The settlement gives us free preview of an enormous amount of content, and the promise of easy access to the rest, thereby greatly advancing the public good.

Of course I would prefer the universal library, but I am pretty happy about the universal bookstore. After all, bookstores are fine places to read books, and then to decide whether to buy them or go to the library to read some more.

Paul N. Courant

Note: This letter represents my personal views and not those of the University of Michigan, nor any of its libraries or departments.


So I was thinking, wouldn’t it be nice if the Australian Dictionary of Biography’s ‘born on this day’ feature could be made available as an RSS feed? Every morning you’d get a new list of biographies delivered direct to your feed reader. And so…

[sounds of xpath wrangling and PHP coding]

here it is.

It’s pretty simple – it harvests all the links of people born on the current day, then loops through the links to gather the first paragraph of each biography. Then it’s just a matter of writing everything to an RSS file.

In case you missed it, I also created a Media RSS feed for portrait images used in the ADB. This enables them to be viewed in CoolIris.
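For context, Media RSS extends plain RSS 2.0 by declaring the `media` namespace and attaching image elements to each item, which is what lets viewers like CoolIris build a browsable wall of pictures. A minimal sketch of such an item (the titles and image URLs below are placeholders, not values from the actual ADB feed):

```xml
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>ADB portrait images</title>
    <item>
      <title>Example portrait</title>
      <!-- media:content points image viewers at the full-size picture -->
      <media:content url="http://example.org/portrait.jpg" type="image/jpeg" />
      <!-- an optional thumbnail for the filmstrip view -->
      <media:thumbnail url="http://example.org/portrait-thumb.jpg" />
    </item>
  </channel>
</rss>
```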

Code follows…

// Fetch a page with cURL, reporting any errors.
function getPage($url, $ch) {
	curl_setopt($ch, CURLOPT_URL, $url);
	$html = curl_exec($ch);
	if (!$html) {
		echo "cURL error number: " . curl_errno($ch);
		echo "cURL error: " . curl_error($ch);
	}
	return $html;
}
$url = ""; // the ADB 'born on this day' page URL was not preserved in the post
$userAgent = 'Googlebot/2.1 (';

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = getPage($url, $ch);

$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from imperfect HTML

$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("//ul[@class='pb-results'][1]/li/a");
$titles = $xpath->evaluate("//ul[@class='pb-results'][1]/li/a/text()");

echo "<?xml version='1.0'?>\n";
echo "<rss version='2.0'>\n";
echo "<channel>\n";
// channel metadata (the title text is a stand-in; the link URL was not preserved)
echo "<title>ADB Born on this Day</title>\n";
echo "<link></link>\n";
echo "<description>A list of all those people in the Australian Dictionary of Biography who were born on this day.</description>\n";
for ($i = 0; $i < $hrefs->length; $i++) {
	$href = $hrefs->item($i);
	$title = $href->nodeValue;
	$bio = "";
	$url = "" . substr($href->getAttribute('href'),2);
	$html = getPage($url, $ch);
	$dom = new DOMDocument();
	@$dom->loadHTML($html);
	$xpath = new DOMXPath($dom);
	$paras = $xpath->evaluate("//div[@id='content']/p[1]/text()");
	foreach ($paras as $para) {
		$bio .= $para->nodeValue;
	}
	$bio .= "...";
	$bio = htmlspecialchars($bio, ENT_QUOTES);
	$bio = str_replace("\n", '', $bio);
	echo "<item>\n";
	echo "<title>" . htmlspecialchars($title, ENT_QUOTES) . "</title>\n";
	echo "<link>$url</link>\n";
	echo "<description>$bio</description>\n";
	echo "</item>\n";
}
echo "</channel>\n";
echo "</rss>\n";
curl_close($ch);

Episode 37 – Material Culture

Aside from the technical challenges of moving museums online, there’s the cultural challenge of squaring the curator’s focus on the actual, authentic object with the free-for-all, non-hierarchical nature of the web. That’s the tension addressed in the feature story on this episode, a follow-up to concerns expressed at the Smithsonian 2.0 conference. We’re lucky to be joined in the discussion by Sharon Leon, Director of Public Projects at the Center for History and New Media. In the news roundup, we assemble our own stimulus package, talk about Creative Commons on the White House website, look at the impact of Gmail going offline, and debate a possible change to Wikipedia’s moderation policy. Picks include a new grant, Omeka training, museum awards, and (despite protests by Mills) a Twitter client.

Links mentioned on the podcast:
Broadband, Computers Part of Stimulus Package
Wikipedia Co-Founder Calls for Major New Moderation Policy
New White House Copyright Policy
Smithsonian 2.0
National Postal Museum’s Arago website
Best of the Web at the Museums and the Web 2009 meeting
Digging into Data Challenge
Omeka Workshops
Gmail Goes Offline

Running time: 45:14
Download the mp3

Virtual Strangers: e-Research and the Humanities

The Arts and Humanities have traditionally been worlds apart from Science and Technology in their ways of pursuing and generating knowledge and understanding. So much so that the famous term, ‘The Two Cultures’, coined in the mid twentieth century by C. P. Snow to describe the vast gap between these discipline areas, is still current and relevant.[i] It continues to dominate the organisation of disciplines in universities and drive the distribution of most national research funding. However, quite suddenly, at the end of the twentieth century, the digital environment began to trigger major changes in the knowledge economy, with the result that the humanities were thrown unexpectedly and involuntarily into a close relationship with technology. As one might expect in any forced marriage, it was not a case of love at first sight. In fact, the humanities have exhibited the full range of reactions—from totally ignoring the other, through unashamedly raiding their wealth, to wholeheartedly embracing the exciting future they seem to offer. Whatever the reaction, it is clear that the humanities are now inescapably entangled with technology, for better or worse, and the two cultures are connecting more than ever before, notably in the new research activities and spaces signalled by the term ‘e-research’.

Episode 36 – Tweeting into 2009

Tom and Dan kick off the new year by annoying Mills with tales of Twitter and tweets. In our newly extended news roundup, the panel looks at the use of Twitter at academic conferences; assesses the Palm Pre and the future of mobile apps for education, museums, and libraries; wonders about touch screens and the blind; thinks once again about the use of e-book readers on campus; discusses the end of Google Notebook and what it says about putting your research in services that might fail; debates the wisdom of putting academic articles on Wikipedia; and gives an update on Europeana, the EU digital library.

Other links for the episode:
Amanda French on the digital MLA experience
HearPlanet iPhone application
The American Association of History and Computing
ReframeIt and Web Annotation

Running time: 49:32
Download the .mp3

TREX 2008 Winners Announced

TADA (the Text Analysis Developers’ Alliance, of which I’m the unofficial future former director) has announced winners of the 2008 T-REX Competition (for text analysis tools development and usage). The panel of judges reviewed the many submissions received and recognized winners in five categories:

  • Best New Tool
    • Degrees of Connection by Susan Brown, Jeffery Antoniuk, Sharon Balazs, Patricia Clements, Isobel Grundy, Stan Ruecker
    • Ripper Browser by Alejandro Giacometti, Stan Ruecker, Ian Craig, Gerry Derksen
  • Best Idea for a New Tool
    • Magic Circle by Carlos Fiorentino, Stan Ruecker, Milena Radzikowska, Piotr Michura
  • Best Idea for Improving a Current Tool
    • Collocate Cloud by Dave Beavan
    • Throwing Bones by Kirsten C. Uszkalo
  • Best Idea for Improving the Interface of the TAPoR Portal
    • Bookmarklet for Immediate Text Analysis by Peter Organisciak
  • Best Experiment of Text Analysis Using High Performance Computing
    • Back-of-the-Book Index Generation by Patrick Juola

Congratulations to all winners and thanks to all participants! Watch this space for upcoming TADA events, including the next TREX Competition.