Ethan Gruber

Dec 16 2010

Our NEH-funded Neatline project has inspired the Scholars’ Lab to develop or enhance several new Omeka plugins recently. (See our full list.)

One of these is FedoraConnector, which is designed to enable administrators to attach Fedora datastreams (digital objects, whether images, XML such as TEI or EAD, or video) to Omeka items. This is fundamentally different from attaching files to an item: the datastream is not duplicated and stored within Omeka’s archive. Rather, a reference to the Fedora object (its PID) is stored in a new table in the Omeka database that associates the item with the URL of the datastream, which is accessed (and rendered) through Fedora’s REST API. The plugin also supports importing Dublin Core and MODS metadata into the DC Element Set in Omeka. The importers can be extended to map from any metadata standard into DC.
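To make the reference-only model concrete, here is a minimal sketch in Python (the plugin itself is an Omeka plugin, so this is an illustration rather than its actual code). The `DatastreamRef` record, its field names, and the example server URL are hypothetical; the URL pattern assumes the Fedora 3 REST API's datastream content endpoint.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class DatastreamRef:
    """Hypothetical row linking an Omeka item to a Fedora datastream."""
    item_id: int          # Omeka item the datastream is attached to
    server_url: str       # base URL of the Fedora server (assumed)
    pid: str              # Fedora object identifier, e.g. "demo:1"
    datastream_id: str    # datastream ID within that object, e.g. "TEI"

    def content_url(self) -> str:
        """Build the REST URL that returns the datastream's content."""
        base = self.server_url.rstrip("/")
        pid = quote(self.pid, safe=":")
        ds = quote(self.datastream_id, safe="")
        return f"{base}/objects/{pid}/datastreams/{ds}/content"


# Nothing is copied into Omeka: only the reference is stored, and the
# content is fetched from this URL each time the item is displayed.
ref = DatastreamRef(1, "http://localhost:8080/fedora/", "demo:1", "TEI")
print(ref.content_url())
```

Because only the reference is kept, any update to the object in Fedora is visible the next time the URL is requested.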

The benefit of this architecture is that it enables dynamic rendering of the most current version of the Fedora object, so there is no risk that duplicate files stored on Omeka’s disk become outdated when the original Fedora object is updated. Additionally, FedoraConnector can take advantage of institution-specific services that are developed for delivering content. For example, thumbnail and medium-sized page images are rendered in real time by querying the University of Virginia Library’s JPEG2000 server and requesting deliverables at a specific dimension. Disseminators, or handler functions for rendering Fedora content based on mime-type and/or datastream type, are extensible.
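The idea of extensible disseminators can be sketched as a registry of handler functions keyed by mime-type. This is a Python illustration under stated assumptions, not the plugin's code: the `disseminator` decorator, the handler names, and the HTML they return are all hypothetical.

```python
from html import escape
from typing import Callable, Dict

# Registry mapping a mime-type to the function that renders it.
Disseminator = Callable[[str], str]
_DISSEMINATORS: Dict[str, Disseminator] = {}


def disseminator(mime_type: str):
    """Register a handler for one mime-type (hypothetical extension point)."""
    def register(func: Disseminator) -> Disseminator:
        _DISSEMINATORS[mime_type] = func
        return func
    return register


@disseminator("image/jp2")
def render_image(url: str) -> str:
    # A real handler might instead ask an image server for a sized derivative.
    return f'<img src="{escape(url, quote=True)}" alt="">'


@disseminator("text/xml")
def render_xml_link(url: str) -> str:
    return f'<a href="{escape(url, quote=True)}">View XML</a>'


def render(mime_type: str, url: str) -> str:
    """Dispatch to the registered handler, or fall back to a plain link."""
    handler = _DISSEMINATORS.get(mime_type)
    if handler is None:
        return f'<a href="{escape(url, quote=True)}">Download</a>'
    return handler(url)


print(render("image/jp2", "http://example.org/objects/demo:1/datastreams/IMG/content"))
```

Supporting a new content type then means registering one more function rather than changing the dispatch logic.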

TEI document from Fedora

Earlier this year, we released a beta version of a plugin for rendering TEI files into HTML within Omeka. Called TeiDisplay, this plugin was enhanced by the insertion of several hooks that execute FedoraConnector functions (if FedoraConnector is installed) to render TEI XML datastreams on the fly directly from the repository. TeiDisplay supports, as the documentation for the plugin indicates, selection of customized XSLT stylesheets and two display types: entire document and segmental view (with table of contents and by-section rendering). Indeed, documents coming from Fedora can be rendered dynamically with the same set of options.

But what about indexing the document? This is why the Scholars’ Lab developed SolrSearch last summer to replace Omeka’s default MySQL search with the more advanced search options afforded by Solr, an open source search index. SolrSearch supports facets, sorting, hit highlighting, and a handful of other options. Originally designed to index the full text of Omeka files with a text/xml mime-type, SolrSearch was enhanced to index the full text of Fedora datastreams with a text/xml mime-type as well, enabling full text searching, faceted browsing, and hit highlighting of the aforementioned TEI files referenced from a repository.


Solr search of TEI file in Omeka

So in essence, the range of plugins the Scholars’ Lab has created for Omeka can enable creation of attractive and cutting-edge public user interfaces for collections of Fedora objects. Coupled with our Neatline plugins, which are all about geospatial and temporal interpretation of archival collections, this work bridges a well-recognized gap between the volume of digital content housed in sophisticated repositories and the curators, scholars, and end users who seek access to it and wish to interpret it in online exhibits.

Sep 10 2010

One of the most vital tools that computers bestow upon the humanities scholar is the ability to rapidly sort and group data that are relevant to the scholar’s own research needs. A digital collection of several thousand artifacts is useful, but it is even more useful if, for example, the user can filter the results for lithographs created or published by a certain person or corporate identity. Omeka’s built-in search mechanism is fairly simple, and it may suffice for most collections, but it may also fall short of providing the kind of advanced querying abilities that scholars are growing accustomed to with other digital collections, such as Northwestern’s Winterton Collection or modern library catalogs such as the one released publicly here at the University of Virginia Library in July. Apache Solr is an open-source Java-based search index that provides this functionality.

Folks in the Scholars’ Lab and other U.Va. Library departments have been using Solr for a number of years. I have used it for nearly a dozen different projects since 2007, when Bess Sadler (now with Stanford’s Digital Library Systems and Services group) introduced it to the department. About two months ago, I began work on a Solr plugin for Omeka which would post public collection items to a Solr index. The search results then would be rendered in the public theme. A table in the Omeka database contains all of the elements that the user may select as facets, displayable fields, or sortable fields, and the user may check boxes in a form in the administrative panel to customize the Solr results. Collections, item types, and tags may also be selected as facet, displayable, or sortable fields, and thumbnail images may be displayed in the search results. The simple admin interface to the variety of Solr options outlined above can transform your Omeka collection into a great resource that visitors can manipulate to meet their own research interests.
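As a rough illustration of how such a field table could drive a search, the Python sketch below turns a list of field settings into Solr query parameters. The field names and the `FIELDS` structure are hypothetical stand-ins for the Omeka table; `q`, `wt`, `fl`, `sort`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical rows from the configuration table: one per element,
# with the checkboxes the administrator ticked.
FIELDS = [
    {"name": "creator", "facet": True, "display": True, "sort": False},
    {"name": "title", "facet": False, "display": True, "sort": True},
    {"name": "tag", "facet": True, "display": False, "sort": False},
]


def solr_params(query, fields, sort_by=None):
    """Translate the field settings into Solr request parameters."""
    params = [("q", query), ("wt", "json")]

    facets = [f["name"] for f in fields if f["facet"]]
    if facets:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facets)

    shown = [f["name"] for f in fields if f["display"]]
    if shown:
        params.append(("fl", ",".join(shown)))

    # Only honor a sort request for fields marked as sortable.
    sortable = {f["name"] for f in fields if f["sort"]}
    if sort_by in sortable:
        params.append(("sort", f"{sort_by} asc"))

    return params


# The encoded string would be appended to the index's /select URL.
print(urlencode(solr_params("lithograph", FIELDS, sort_by="title")))
```

Keeping the settings as data means the public search page needs no code changes when an administrator ticks a different box.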

Yesterday, I released SolrSearch 0.9. In this most recent version of the plugin, text nodes from XML files attached to items are indexed for full text searching. SolrSearch, then, is an important plugin to install in conjunction with TeiDisplay, a plugin the Scholars’ Lab developed for rendering Text Encoding Initiative (TEI) XML files. A user can therefore not only read TEI transcriptions of textual works, but also search the collection for words or phrases in these works. SolrSearch will feature a hit highlighting option in a future version so that the user may see their search keywords in context.
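Indexing the text nodes of an XML file amounts to stripping the markup and keeping the character data. The following is a minimal Python sketch of that step, not the plugin's implementation; the `full_text` helper, the sample TEI fragment, and the `id` and `fulltext` field names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def full_text(xml_source: str) -> str:
    """Concatenate every text node and collapse runs of whitespace."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


SAMPLE_TEI = """<TEI><text><body>
  <p>Call me <name>Ishmael</name>.</p>
</body></text></TEI>"""

# A document in this shape could then be posted to the search index.
doc = {"id": "item-1", "fulltext": full_text(SAMPLE_TEI)}
print(doc)
```

Element names such as `name` are discarded here; only the words themselves reach the index, which is what makes phrase searching across transcriptions possible.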

I know of at least one institution that is using SolrSearch (at least, in an experimental state) for their collection, so hopefully as more people begin to use it, a larger developer group can form around adapting Solr features to Omeka. Solr is useful for controlled vocabulary services, and it would be great to maximize the application’s capabilities.

Jul 24 2010

Several months ago, I wrote a post about my XForms development in the Scholars’ Lab as part of a research project. I’m currently working on two research projects that utilize the standard: EADitor (Encoded Archival Description management and dissemination framework) and Numishare (geared towards online delivery of numismatic collections, though other artifacts can be represented). Despite its promise, XForms has not quite swept up the library world yet (though it is most definitely generating some buzz). The W3C standard is a definition for creating dynamic webforms that handle complex, hierarchical XML data–the type of stuff libraries deal with daily. However, only in recent years have XForms processors matured to the point they are ready for mass-market consumption. There are numerous private firms developing XForms applications, including Wachovia, Cisco, and Pfizer. It is also used to some degree in the academic community. As far as I am aware, not many institutions are running it in production, though some are rapidly moving in that direction. The XForms4Lib listserv created in the fall has 80 members from across North American and European academia.

Which brings me to my point.

Matt Zumwalt, active code4lib member and Ruby on Rails/institutional repository developer, boldly declared XForms to be dead. I offer this critique:

There are some inaccuracies in this post that I would like to address. First of all, HTML5 forms do not supplant XForms as an option for collecting user-entered data. HTML5 is much simpler, and thus has broader appeal. XForms enables the creation of much, much more complex models, with far more sophisticated controls and validation. Moreover, if XForms was a dead language in January 2008, with the release of the HTML5 specification, and IBM had dropped support, then why do you suppose the XForms 1.1 specification was released in October 2009, edited by a representative of IBM?

No, XForms is very much alive. It has a small but very active community, which is especially visible in the Orbeon development community. XForms is best used as a definition of dynamic forms that are processed server-side, not in the browser (which pushes a lot of processing demand onto the user, which isn’t good). There are some good, open source frameworks out there. Orbeon is the best, and has many users from both industry and academia, including Pfizer, Leap Frog, Wachovia, UCSB, Stanford, and the National Archives. In fact, Orbeon XForms applications form a large part of the enormous workflow of the NARA Electronic Records Archives project, which is a multi-year project contracted to Lockheed Martin and has financial backing of close to half a billion dollars (I have heard). XForms, dead?

A lot of the design flaws you describe are in actuality implementation flaws. Development of a Rails-based framework seems to me like an enormous waste of time and money. You can adapt the MODS editor developed by Brown to such a task. It has already been proven that you can interact with metadata delivered through REST from a Fedora repository. And MODS is fairly simple as a metadata standard. Care to take a stab at TEI or EAD?

When you began your research in 2007, Orbeon was a fairly young application. But the standard and its delivery and processing applications have evolved since then. Only in the last two or three years has XForms grown into a viable solution. Moreover, since it is a W3C standard, you can pick your forms up and migrate them to a new framework fairly easily. Is your Rails application sustainable in the long term? Are today’s jQuery functions going to work in 2015’s browsers? These are things you need to consider when contemplating a web form standard.

Fedora is a Tomcat application. So is Solr. So is adore-djatoka, which UVA/Hydra utilizes for JPEG2000 delivery. And so is Orbeon. ActiveFedora and any Rails-based MODS editor seem to me like the third wheel in the repository relationship. But in all seriousness, the sustainability of a boutique Rails application that is heavily dependent on the JavaScript functions of 2010 should be a serious concern to repository developers. jQuery is all the rage today, but it could blow away in the wind five years from now. This is the very thing that the XForms working group set out to prevent when they introduced a standard approach to dynamic webforms.