Virtual Strangers: e-Research and the Humanities

The Arts and Humanities have traditionally been worlds apart from Science and Technology in their ways of pursuing and generating knowledge and understanding. So much so that the famous term, ‘The Two Cultures’, coined in the mid twentieth century by C. P. Snow to describe the vast gap between these discipline areas, is still current and relevant.[i] It continues to dominate the organisation of disciplines in universities and drive the distribution of most national research funding. However, quite suddenly, at the end of the twentieth century, the digital environment began to trigger major changes in the knowledge economy, with the result that the humanities were thrown unexpectedly and involuntarily into a close relationship with technology. As one might expect in any forced marriage, it was not a case of love at first sight. In fact, the humanities have exhibited the full range of reactions—from totally ignoring the other, through unashamedly raiding their wealth, to wholeheartedly embracing the exciting future they seem to offer. Whatever the reaction, it is clear that the humanities are now inescapably entangled with technology, for better or worse, and the two cultures are connecting more than ever before, notably in the new research activities and spaces signalled by the term ‘e-research’.

Episode 36 – Tweeting into 2009

Tom and Dan kick off the new year by annoying Mills with tales of Twitter and tweets. In our newly extended news roundup, the panel looks at the use of Twitter at academic conferences; assesses the Palm Pre and the future of mobile apps for education, museums, and libraries; wonders about touch screens and the blind; thinks once again about the use of e-book readers on campus; discusses the end of Google Notebook and what it says about putting your research in services that might fail; debates the wisdom of putting academic articles on Wikipedia; and gives an update on Europeana, the EU digital library.

Other links for the episode:
Amanda French on the digital MLA experience
HearPlanet iPhone application
The American Association of History and Computing
ReframeIt and Web Annotation

Running time: 49:32
Download the .mp3

TREX 2008 Winners Announced

TADA (the Text Analysis Developers’ Alliance, of which I’m the unofficial future former director) has announced winners of the 2008 T-REX Competition (for text analysis tools development and usage). The panel of judges reviewed the many submissions received and has recognized winners in five categories:

  • Best New Tool
    • Degrees of Connection by Susan Brown, Jeffery Antoniuk, Sharon Balazs, Patricia Clements, Isobel Grundy, Stan Ruecker
    • Ripper Browser by Alejandro Giacometti, Stan Ruecker, Ian Craig, Gerry Derksen
  • Best Idea for a New Tool
    • Magic Circle by Carlos Fiorentino, Stan Ruecker, Milena Radzikowska, Piotr Michura
  • Best Idea for Improving a Current Tool
    • Collocate Cloud by Dave Beavan
    • Throwing Bones by Kirsten C. Uszkalo
  • Best Idea for Improving the Interface of the TAPoR Portal
    • Bookmarklet for Immediate Text Analysis by Peter Organisciak
  • Best Experiment of Text Analysis Using High Performance Computing
    • Back-of-the-Book Index Generation by Patrick Juola

Congratulations to all winners and thanks to all participants! Watch this space for upcoming TADA events, including the next TREX Competition.

Digital Humanities Sessions at MLA 2008

A couple of days after returning from the MLA (Modern Language Association) conference, I ran into a biologist friend who had read about the “conference sex” panel at MLA.  She said,  “Wow, sometimes I doubt the relevance of my research, but that conference sounds ridiculous.” I’ve certainly had my moments of skepticism toward the larger purposes of literary research while sitting through dull conference sessions, but my MLA experience actually made me feel jazzed and hopeful about the humanities.  That’s because the sessions that I attended–mostly panels on the digital humanities–explored topics that seemed both intellectually rich and relevant to the contemporary moment.  For instance, panelists discussed the significance of networked reading, dealing with information abundance, new methods for conducting research such as macroanalysis and visualization, participatory learning, copyright challenges, the shift (?) to digital publishing, digital preservation, and collaborative editing.  Here are my somewhat sketchy notes about the MLA sessions I was able to attend; see great blog posts by Cathy Davidson, Matt Gold, Laura Mandell, Alex Reid, and John Jones for more reflections on MLA 2008.

1)    Seeing patterns in literary texts
At the session “Defoe, James, and Beerbohm: Computer-Assisted Criticism of Three Authors,” David Hoover noted that James scholars typically distinguish between his early and late work. But what does that difference look like? What evidence can we find of such a distinction? Hoover used computational and statistical methods such as Principal Components Analysis and the t-test to examine word choice across James’ work and found some striking patterns illustrating that James’ diction during his early period was indeed quite different from that of his late period. Hoover introduced the metaphor of computational approaches to literature serving either as a telescope (macroanalysis, discerning patterns across a large body of texts) or a microscope (looking closely at individual works or authors).
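
To make the statistical idea concrete, here is a minimal sketch of the kind of comparison a t-test allows: measuring how often a word occurs in chunks of "early" and "late" text and asking whether the two sets of rates differ. This is not Hoover's actual procedure or data. The function names, the naive tokenizer, and the sample rates are all assumptions made up for illustration, and the statistic shown is Welch's variant of the t-test.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def relative_frequency(text: str, word: str) -> float:
    """Occurrences of `word` per 1,000 tokens, using a naive whitespace tokenizer."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return 1000 * tokens.count(word.lower()) / len(tokens)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical per-chunk rates of a single word (invented numbers, not real James data).
early_rates = [2.0, 3.0, 4.0]
late_rates = [8.0, 9.0, 10.0]

# A large absolute value suggests the word is used at different rates in the two periods.
t_statistic = welch_t(early_rates, late_rates)
print(round(t_statistic, 3))
```

In practice one would compute `relative_frequency` over many equal-sized chunks of each novel and repeat the test for many words; Principal Components Analysis would then summarize the resulting word-frequency table, which is beyond this sketch.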

2)    New approaches to electronic editing

The ACH Guide to Digital-Humanities Talks at the 2008 MLA Convention lists at least 9 or 10 sessions concerned with editing or digital archives, and the Chronicle of Higher Ed dubbed digital editing a “hot topic” for MLA 2008. At the session on Scholarly Editing in the Twenty-First Century: Digital Media and Editing, Peter Robinson (whose paper was delivered by Jerome McGann and included passages referencing Jerome McGann) presented the idea of “Editing without walls,” shifting from a centralized model where a scholar acts as the “guide and guardian” who oversees work on an edition to a distributed, collaborative model. With “community made editions,” a library would produce high quality images, researchers would transcribe those images, other researchers would collate the transcriptions, others would analyze the collations and add commentaries, etc. Work would be distributed and layered. This approach opens up a number of questions: what incentives will researchers have to work on the project? How will the work be coordinated? Who will maintain the distributed edition for the long term? But Robinson claimed that the approach would have significant advantages, including reduced cost and greater community investment in the editions. Several European initiatives are already working on building tools and platforms similar to what Peter Shillingsburg calls “electronic knowledge sites,” including the Discovery Project, which aims to “explore how Semantic Web technology can help to create a state-of-the-art research and publishing environment for philosophy” and the Virtual Manuscript Room, which “will bring together digital resources related to manuscript materials (digital images, descriptions and other metadata, transcripts) in an environment which will permit libraries to add images, scholars to add and edit metadata and transcripts online, and users to access material.”
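
The "collate the transcriptions" step in that layered workflow can be illustrated with a small sketch. It is not the tooling of any project named above: the `collate` function and the two sample witnesses are hypothetical, and the comparison relies only on Python's standard `difflib` module, aligning two transcriptions word by word and reporting where they differ.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def collate(witness_a: str, witness_b: str) -> List[Tuple[str, str]]:
    """Return word-level variants between two transcriptions as (reading_a, reading_b) pairs."""
    a_words, b_words = witness_a.split(), witness_b.split()
    matcher = SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert' marks a variant reading
            variants.append((" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return variants


# Two invented transcriptions of the same line (hypothetical witnesses).
transcription_a = "The quick brown fox jumps over the lazy dog"
transcription_b = "The quick browne fox jumps over the lazie dog"

for reading_a, reading_b in collate(transcription_a, transcription_b):
    print(f"A: {reading_a!r}  B: {reading_b!r}")
```

A real collation tool would also need to handle more than two witnesses, normalization of spelling, and transpositions, which is part of why the coordination questions raised above matter.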

Matt Kirschenbaum then posed a provocative question: if Shakespeare had a hard drive, what would scholars want to examine? When he began work on King Lear, how long he worked on it, what changes he made, what web sites he consulted while writing? Of course, Shakespeare didn’t have a hard drive, but almost every writer working now uses a computer, so it’s possible to analyze a wide range of information about the writing process. Invoking Tom Tanselle, Matt asked, “What are the dust jackets of the information age?” That is, what data do we want to preserve? Discussing his exciting work with Alan Liu and Doug Reside to make available William Gibson’s Agrippa in emulation and as recorded on video in the early 1990s, Matt demonstrated how emulation can be used to simulate the original experience of this electronic poem. He emphasized the importance of collaborating with non-academics (hackers, collectors, and even Agrippa’s original publisher) to learn about Agrippa’s history and make the poem available. Matt then addressed digital preservation. Even data designed to self-destruct is recoverable, but Matt expressed concern about cloud computing, where data exists on networked servers. How will scholars get access to a writer’s email, Facebook updates, Google Docs, and other information stored online? Matt pointed to several projects working on the problem of archiving electronic art and performances by interviewing artists about what’s essential and providing detailed descriptions of how they should be re-created: Forging the Future and Archiving the Avant Garde.

3)    Literary Studies in the Digital Age: A Methodological Primer

At the panel on methodologies for literary studies in the digital age, Ken Price discussed a forthcoming book that he is co-editing with Ray Siemens called Literary Studies in a Digital Age: A Methodological Primer. The book, which is under consideration by MLA Press, will feature essays by, among others, John Unsworth on electronic scholarly publishing, Tanya Clement on critical trends, David Hoover on textual analysis, Susan Schreibman on electronic editing, and Bill Kretzschmar on GIS. Three of the authors to be included in the volume (David Hoover, Alan Liu, and Susan Schreibman) spoke.

Hoover began with a provocative question: do we really want to get to 2.0, collaborative scholarship? He then described different models of textual analysis:
i.    the portal (e.g. MONK, TAPoR): typically a suite of simple tools; platform independent; not very customizable
ii.     desktop tools (e.g. TACT)
iii.    standardized software used for text analysis (e.g. Excel)

Next, Alan Liu  discussed his Transliteracies project, which examines the cultural practices of online reading and the ways in which reading changes in a digital environment (e.g. distant reading, sampling, and “networked public discourse,” with links, comments, trackback, etc).  The transformations in reading raise important questions, such as the relationship between expertise and networked public knowledge.  Liu pointed to a number of crucial research and development goals (both my notes and memory get pretty sketchy here):
1)    development of a standardized metadata scheme for annotating social networks
2)    data mining and annotating social computing
3)    reconciling metadata with writing systems
4)    information visualization for the contact zone between macro-analysis and close reading
5)    historical analysis of past paradigms for reading and writing
6)    authority-adjudicating systems to filter content
7)    institutional structures to encourage scholars to share and participate in new public knowledge

Finally, Susan Schreibman discussed electronic editions. Among the first humanities folks drawn to the digital environment were editors, who recognized that electronic editions would allow them to overcome editorial challenges and present a history of the text over time, pushing beyond the limitations of the textual apparatus and representing each edition. Initially the scholarly community focused on building single author editions such as the Blake and Whitman Archives. Now the community is trying to get beyond siloed projects by building grid technologies to edit, search and display texts. (See, for example, TextGrid.) Schreibman asked how we can use text encoding to “unleash the meanings of text that are not transparent” and encode themes or theories of text, then use tools such as TextArc or ManyEyes to engage in different spatial/temporal views.

A lively discussion of crowdsourcing and expert knowledge followed, hinging on the question of what the humanities have to offer in the digital age.  Some answers: historical perspective on past modes of reading, writing and research; methods for dealing with multiplicity, ambiguity and incomplete knowledge; providing expert knowledge about which text is the best to work with.  Panelists and participants envisioned new technologies and methods to support new literacies, such as the infrastructure that would enable scholars and readers to build their own editions; a “close-reading machine” based upon annotations that would enable someone to study, for example, dialogue in the novel; the ability to zoom out to see larger trends and zoom in to examine the details; the ability to examine “humanities in the age of total recall,” analyzing the text in a network of quotation and remixing; developing methods for dealing with what is unknowable.

4) Publishing and Cyberinfrastructure

At the panel on publishing and cyberinfrastructure moderated by Laura Mandell, Penny Kaiserlian from the University of Virginia Press, Linda Bree from Cambridge UP, and Michael Lonegro from Johns Hopkins Press discussed the challenges that university presses are facing as they attempt to shift into the digital. At Cambridge, print sales are currently subsidizing ebooks. Change is slower than was envisioned ten years ago, more evolutionary than revolutionary. All three publishers emphasized that publishers are unlikely to transform their publishing model unless academic institutions embrace electronic publication, accepting e-publishing for tenure and promotion and purchasing electronic works. Ultimately, they said, it is up to the scholarly community to define what is valued. Although the shift into electronic publishing of journals is significant, academic publishers’ experience lags in publishing monographs. One challenge is that journals are typically bundled, but there isn’t such a model for bundling books. Getting third party rights to illustrations and other copyrighted materials included in a book is another challenge. Ultimately scholars will need to rethink the monograph, determining what is valuable (e.g. the coherence of an extended argument) and how it exists electronically, along with the benefits offered by social networking and analysis. Although some in the audience challenged the publishers to take risks in initiating change themselves, the publishers emphasized that it is ultimately up to the scholarly community. The publishers also asked why the evaluation of scholarship depended on a university press constrained by economics rather than on scholars themselves; that is, why professional review has been outsourced to the university press.

5) Copyright

The panel on Promoting the Useful Arts: Copyright, Fair Use, and the Digital Scholar, which was moderated by Steve Ramsay, featured Aileen Berg explaining the publishing industry’s view of copyright, Robin G. Schulze describing the nightmare of trying to get rights to publish an electronic edition of Marianne Moore’s notebooks, and Kari Kraus detailing how copyright and contract law make digital preservation difficult. Schulze asked where the MLA was when copyright was extended through the Sonny Bono Act, limiting what scholars can do, and said she is working on pre-1923 works to avoid the copyright nightmare. Berg, who was a good sport to go before an audience not necessarily sympathetic to the publishing industry’s perspective, advised authors to exercise their own rights and negotiate their agreements rather than simply signing what is put before them; often they can retain some rights. Kraus discussed how licenses (such as click-through agreements) are further limiting how scholars can use intellectual works but noted some encouraging signs, such as the James Joyce estate’s settlement with a scholar allowing her to use copyrighted materials in her scholarship. Attendees discussed ways that literature professors could become more active in challenging unfair copyright limitations, particularly through advocacy work and supporting groups such as the Electronic Frontier Foundation.

6) Humanities 2.0: Participatory Learning in an Age of Technology

The Humanities 2.0 panel featured three very interesting presentations about the projects funded through the MacArthur Digital Learning competition, as well as Cathy Davidson’s overview of the competition and of HASTAC. (For a fuller discussion of the session, see Cathy Davidson’s summary.) Davidson drew a distinction between “digital humanities,” which uses digital technologies to enhance the mission of the humanities, and humanities 2.0, which “wants us to combine critical thinking about the use of technology in all aspects of social life and learning with creative design of future technologies” (Davidson). Next Howard Rheingold discussed the “social media classroom,” which is “a free and open-source (Drupal-based) web service that provides teachers and learners with an integrated set of social media that each course can use for its own purposes—integrated forum, blog, comment, wiki, chat, social bookmarking, RSS, microblogging, widgets, and video commenting are the first set of tools.” Todd Presner showcased the HyperCities project, a geotemporal interface for exploring and augmenting spaces. Leveraging the Google Maps API and KML, HyperCities enables people to navigate and narrate their own past through space and time, adding their own markers to the map and experiencing different layers of time and space. The project is working with citizens and students to add their own layers of information (images, narratives) to the maps, making available an otherwise hidden history. Currently there are maps for Rome, LA, New York, and Berlin. A key principle behind HyperCities is aggregating and integrating archives, moving away from silos of information. Finally, Greg Niemeyer and Antero Garcia presented their project, which is engaging students and citizens in tracking pollution using whimsically designed sensors. Students tracked levels of pollution at different sites, including in their own classroom, and began taking action, investigating the causes of pollution and advocating for solutions. What unified these projects was the belief that students and citizens have much to contribute in understanding and transforming their environments.
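
For readers unfamiliar with KML, the format mentioned in connection with HyperCities, the following sketch shows how a single marker with a place and a time range could be written as a KML document. It is only an illustration of the general format, not HyperCities' own data model or code: the `placemark_kml` function and the sample marker are invented, while the element names (`Placemark`, `TimeSpan`, `Point`, `coordinates`) come from the public KML 2.2 vocabulary.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name: str, description: str, lon: float, lat: float,
                  begin: str, end: str) -> str:
    """Build a minimal KML document holding one time-bounded marker."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace

    def tag(local_name: str) -> str:
        return f"{{{KML_NS}}}{local_name}"

    kml = ET.Element(tag("kml"))
    placemark = ET.SubElement(kml, tag("Placemark"))
    ET.SubElement(placemark, tag("name")).text = name
    ET.SubElement(placemark, tag("description")).text = description

    # TimeSpan gives the period during which the marker applies.
    span = ET.SubElement(placemark, tag("TimeSpan"))
    ET.SubElement(span, tag("begin")).text = begin
    ET.SubElement(span, tag("end")).text = end

    # KML writes coordinates as longitude,latitude (in that order).
    point = ET.SubElement(placemark, tag("Point"))
    ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"

    return ET.tostring(kml, encoding="unicode")


# A hypothetical marker; the name, description, dates, and coordinates are invented.
marker_xml = placemark_kml(
    name="Family bakery",
    description="A resident's story about this corner.",
    lon=13.4, lat=52.52, begin="1925", end="1938",
)
print(marker_xml)
```

Storing both a location and a time span on each marker is what lets a map interface filter markers by period as well as by place.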

7) The Library of Google: Researching Scanned Books

What does Google Books mean for literary research? Is Google Books more like a library or a research tool? What kind of research is made possible by Google Books (GB)? What are GB’s limitations? Such questions were discussed in a panel on Google Books that was moderated by Michael Hancher and included Amanda French, Eleanor Shevlin, and me. Amanda described how Google Books enabled her to find earlier sources on the history of the villanelle than she was able to locate pre-GB, Eleanor provided a book history perspective on GB, and I discussed the advantages and limitations of GB for digital scholarship (my slides are available here). A lively discussion among the 35 or so attendees followed; all but one person said that GB was, on balance, good for scholarship, although some people expressed concern that GB would replace interlibrary loan, indicated that they use GB mainly as a reference tool to find information in physical volumes, and emphasized the need to continue to consult physical books for bibliographic details such as illustrations and bindings.

8) Posters/Demonstrations: A Demonstration of Digital Poetry Archives and E-Criticism: New Critical Methods and Modalities

I was pleased to see the MLA feature two poster sessions, one on digital archives, one on digital research methods. Instead of just watching a presentation, attendees could engage in discussion with project developers and see how different archives and tools worked.  That kind of informal exchange allows people to form collaborations and have a more hands-on understanding of the digital humanities. (I didn’t take notes and the sessions occurred in the evening, when my brain begins to shut down, so my summary is pretty unsophisticated: wow, cool.)

Reflections on MLA

This was my first MLA and, despite having to leave home smack in the middle of the holidays, I enjoyed it. Although many sessions that I attended shifted away from the “read your paper aloud when people are perfectly capable of reading it themselves” model, I noted the MLA’s requirement that authors bring three copies of their paper to provide upon request, which raises two questions: what if you don’t have a paper (just PowerPoint slides or notes), and why can’t you share electronically? And why doesn’t the MLA provide fuller descriptions of the sessions besides just title and speakers? (Or am I just not looking in the right place?) Sure, in the paper era that would mean the conference issue of PMLA would be several volumes thick, but if the information were online there would be a much richer record of each session. (Or you could enlist bloggers or twitterers [tweeters?] to summarize each session…) After attending THATCamp, I’m a fan of the unconference model, which fosters the kind of engagement that conferences should be all about: conversation, brainstorming, and problem-solving rather than passive listening. But lively discussions often do take place during the Q & A period and in the hallways after the sessions (and who knows what takes place elsewhere…)

Episode 35 – Top Ten of 2008

Dan, Mills, and Tom round out 2008 with the top ten most significant stories, trends, and technologies of the year. The regulars discuss how netbooks, Google Books, e-books, and iPhones made 2008 a year to remember. What will make the list in 2009? The regulars offer some predictions as well.

Running time: 51:30
Download the .mp3

Episode 34 – Extra, Extra!

This Thanksgiving week in the U.S. we have a cornucopia of news, starting with the reaction of Harvard to the Google Book Search settlement and including the end of email service for students at Boston College and two efforts to create an “academic Google.” We also launch a new segment, “We Told You So,” to gloat over the predicted death of Google’s virtual world, Lively, and over continuing problems in Second Life. Picks for this episode include a new site on place-based computing, a couple of easy (or bizarre) ways to write a book, and an easy-to-learn programming language.

Links mentioned on the podcast:
Harvard on Google Book Search settlement
Lively No More
“Eric Reuters” on Second Life
Boston College Will Stop Offering New Students E-Mail Accounts
Reference Extract
Google SearchWiki
Processing 1.0
Place-based Computing

Running time: 44:27
Download the .mp3

Episode 33 – Classroom Action Settlement

The big news this week was the announcement that a settlement had been reached between Google and authors and publishers over Google’s controversial Book Search program, which has scanned over seven million volumes, including many books that are still copyrighted. The Digital Campus team takes a first pass at the agreement and tries to understand how it might affect higher ed. Other news from a busy week includes the release of the first phone based on Google’s Android operating system, and Microsoft’s conversion to “cloud” computing. Picks for this podcast include a new report on teenagers and videogames, a new version of Linux for the masses, and a program to help you focus on the Mac.

Links mentioned on the podcast:
Google Book Search Settlement Agreement
Open Library
Think for the Mac
Microsoft Azure
Pew report on teens and videogames

Running time: 49:29
Download the .mp3

The Google Settlement – From the Universal Library to the Universal Bookstore

If you think about it, a universal bookstore is a pretty cool idea. Bookstores are wonderful things. Anyone can walk into a bookstore, take a book off a shelf, read in it, decide whether to buy it or forget about it, or get it from the library. The settlement announced today by Google, the Association of American Publishers, and the Authors Guild will in time make it possible for millions of books, currently out of print and in-copyright, to be perused, searched and purchased (or not) in an electronic bookstore that will be operated by Google.

The books will come from a number of academic libraries, including the University of Michigan, the University of California, and Stanford University, which have been participants in Google Book Search from the beginning. These three worked with Google during the settlement negotiations in an effort to shape the settlement to serve the interests of research libraries and the public, as discussed in a joint press release.

The settlement is complicated, and as people work through it I expect a lively set of discussions and I invite comment on this blog and elsewhere. I’d like to start with what I see as a couple of key points.

First, and foremost, the settlement continues to allow the libraries to retain control of digital copies of works that Google has scanned in connection with the digitization projects. We continue to be responsible for our own collections. Moreover, we will be able to make research uses of our own collections. The huge investments that universities have made in their libraries over a century and more will continue to benefit those universities and the academy more broadly.

Second, the settlement provides a mechanism that will make these collections widely available. Many, including me, would have been delighted if the outcome of the lawsuit had been a ringing affirmation of the fair use rights that Google had asserted as a defense. (My inexpert opinion is that Google’s position would and should have prevailed.) But even a win for Google would have left the libraries unable to have full use of their digitized collections of in-copyright materials on behalf of their own campuses or the broader public. We would have been able, perhaps, to show snippets, as Google has been doing, but it would have been a plain violation of copyright law to allow our users full access to the digitized texts. Making the digitized collections broadly usable would have required negotiations with rightsholders, in some cases book by book, and publisher by publisher. I’m confident that we would have gotten there in time, serving the interests of all parties. But “in time” would surely have been many years, and the clock would have started only at the end of a lawsuit that had many years left to run. Moreover, each library would have had to negotiate use rights to its own collection, still leaving us a long way from a collection of digitized collections that we could all share.

The settlement cuts through this morass. As the product develops, academic libraries will be able to license not only their own digitized works but everyone else’s. Michigan’s faculty and students will be able to read Stanford and California’s digitized books, as well as Michigan’s own. I never doubted that we were going to have to pay rightsholders in order to have reading access to digitized copies of works that are in-copyright. Under the settlement, academic libraries will pay, but will do so without having to bear large and repeated transaction costs. (Of course, saving on transaction costs won’t be of much value if the basic price is too high, but I expect that the prices will be reasonable, both because there is helpful language in the settlement and because of my reading of the relevant markets.)

The settlement is not perfect, of course. It is reminiscent, however, of the original promise of the Google Book project: what once looked impossible or impossibly distant now looks possible in a relatively short period of time. Faculty, students, and other readers will be able to browse the collections of the world’s great libraries from their desks and from their breakfast tables. That’s pretty cool.

Digital Fabric, Narrative Threads

The traditional crafts of quilting, embroidering and weaving may appear to be a world away from the high tech fields of computer networking, digital interface design, and database development. However, the old and new are increasingly being linked through metaphors that reveal a great deal about changing attitudes to digital technologies as they become more established and widely accessible [...] Today’s communication networks are structured around “patchwork” designs, software glitches are fixed with “patches,” computer processors are being described as “multi-threaded,” and over the past decade other “material metaphors” have been embraced as a means of conceptualising and giving form to our new world of amorphous digital texts. In particular, the quilt motif has been used in a variety of ways, including as a means of visualising interaction and information flows and as a template for digital interface design.

“Less than perfect” is not always bad

In a recent paper prepared for the Boston Library Consortium, Richard Johnson decries the fact that some mass digitization arrangements between libraries and corporations have been “less than perfect.”

The choices that we face are indeed less than perfect. We can choose purity and perfection, and not permit any restrictions on the use of scans of public domain material, with the result that the rate of scanning and consequent display will be pitifully slow. Or we can permit corporate entities, including dreaded Google, to scan our works, enabling millions of public domain works to be made available to readers on line, at no cost to the readers, in a relatively short period of time. I am on record by word and deed as preferring the second choice.

In his paper, Johnson notes that the original works are retained by the libraries and could be scanned again. He fails to note that libraries whose PD works are scanned by Google get to keep a copy of the scans and are free to display them on line, independent of Google Book Search. Over 300,000 public domain works can be found in the University of Michigan catalog and read on line. The number grows by thousands per week. Of course I would prefer it if the digital files could be used without restriction. Would someone please tell me the name of the entity that stands ready to digitize our collections, for free, without restriction on the use of the digital files? In the meantime, it seems to me that making the books available to readers online makes for a better world, albeit, sadly, not a perfect one.

And, this just in, an article by Kalev Leetaru in First Monday that compares Google Book Search and the Open Content Alliance and finds much that is both good and less than perfect in both.