Aug 182010
 

The following was a response I made in an email exchange with Tom Elliot of the Pleiades Project and Bethany Nowviskie. Our conversation was prompted by Tom’s inquiry on planning, budgeting for, and conducting a code review as part of a grant-funded project. What follows is a slightly modified (and expanded) version of that email conversation.

Testing and code review is something that has been on my mind a lot lately as our shop has been shifting its focus from boutique, one-off projects, to building upon frameworks maintained by other organizations. As these code bases continue to grow, we need to ensure that subtle changes to the core functionality of the underlying systems do not propagate into bugs in our code. We also need a way to handle this situation quickly and efficiently when this does arise. This was especially reinforced by two recent projects our group undertook to migrate nearly decade-old software on to new servers.

If you ask anyone in the office, they will most likely roll their eyes when I start beating the testing drum. These are great tools for not only generating pretty green and red bar charts, but also documenting the intention of the programmer in writing the code, and zeroing in on bugs where they occur without weeks of hunting. However, this is only one of the tools in the chest for writing solid code, sans bugs. In fact, there are a lot of sophisticated, freely available, automated tools that help programmers of all skill levels not only write more consistent code, but also zero in on potential performance issues and just plain smelly code (that they obviously wrote just to get running and fully intended to go back and fix later).

Over the years, tools that measure code complexity (like PMD, PHPMD, and flog), code dependency analyzers (JDepend, PHPDepend, and rcov), copy/paste detection (in PMD, flay, and phpcpd), and enforcing coding standards (a la PHPCode Sniffer and rails_best_practices), along with not only unit and integration tests (in whatever style you choose), but a code coverage analysis reports that provides feedback on which lines were executed, go a long way in reducing the number of bugs in code. These tools are really pre-emptive step in writing stronger, more elegant, and ultimately more sustainable code, all before once gets to the point of performing a human code review.

While I don’t need to be building software per-se, I have started experimenting with the Hudson continuous integration server as a dashboard to get a quick snapshot of how these different metric tests all play together in the code that our team writes. It is no longer good enough to simply have code functioning, we need the code to pass certain thresholds of quality and sustainability before we can release. Where we find issues in the code, like test coverage, high cyclomatic complexity, lots of copy-n-pasted code, or high volatility in dependency scans, we can sit down and perform a rather focused mini code review (resembling the pair-programming idiom) on that section of code to refactor a better solution or approach To this end, we’re currently working on a set of baseline testing and reporting tools for our projects. Currently, we have Ant scripts for our PHP and Java projects, and a gem bundle for Rails and Sinatra projects.

While we take this approach in the Scholars’ Lab, we were wondering if there were others out there that had opinions or experiences to share about code review during development? If you do, leave a comment, write a post, or tweet at us (@scholarslab, @nowviskie, @wayne_graham) — and at @paregorios, who started the conversation in the first place. We’d love to hear about your best practices (and even horror stories) and philosophy on what constitutes good software and useful code reviewing, including whether you think current trends in open source development constitute a good-enough review for DH projects.

Further Resources

Java

PHP

Ruby

Javascript

Continuous Integration

Issue Tracking

Jul 242010
 

Several months ago, I wrote a post about my XForms development in the Scholars’ Lab as part of a research project. I’m currently working on two research projects that utilize the standard: EADitor (Encoded Archival Description management and dissemination framework) and Numishare (geared towards online delivery of numismatic collections, though other artifacts can be represented). Despite its promise, XForms has not quite swept up the library world yet (though it is most definitely generating some buzz). The W3C standard is a definition for creating dynamic webforms that handle complex, hierarchical XML data–the type of stuff libraries deal with daily. However, only in recent years have XForms processors matured to the point they are ready for mass-market consumption. There are numerous private firms developing XForms applications, including Wachovia, Cisco, and Pfizer. It is also used to some degree in the academic community. As far as I am aware, not many institutions are running it in production, though some are rapidly moving in that direction. The XForms4Lib listserv created in the fall has 80 members from across North American and European academia.

Which brings me to my point.

Matt Zumwalt, active code4lib member and Ruby on Rails/institutional repository developer, boldly declared XForms to be dead. I offer this critique:

There are some inaccuracies in this post that I would like to address. First of all, HTML 5 forms do not supplant XForms as an option for collecting user inputted data. HTML5 is much simpler, and thus has broader appeal. XForms enables the creation of much, much more complex models, with far more sophisticated controls and validation. Moreover, if XForms was a dead language in January 2008, with the release of the HTML5 specification, and that IBM had dropped support, then why do you suppose the XForms 1.1 specification was released in October 2009, edited by a representative of IBM?

No, XForms is very much alive. It has a small, but very active community, which is especially visible with the Orbeon development community. XForms is best used as a definition of dynamic forms that are processed server-side, not in the browser (which pushes a lot of processing demand onto the user, which isn’t good). There are some good, open source frameworks out there. Orbeon is the best, and has many users from both industry and academia, including Pfizer, Leap Frog, Wachovia, UCSB, Stanford, and the National Archives. In fact, Orbeon XForms applications form a large part of the enormous workflow of the NARA Electronic Records Archives project, which is a multi-year project contracted to Lockheed Martin and has a financial backing of close to a half a billion dollars (I have heard). XForms, dead?

A lot of the design flaws you describe are in actuality implementation flaws. Development of a Rails-based framework seems to me like an enormous waste of time and money. You can adapt the MODS editor developed by Brown to such a task. It has already been proven that you can interact with metadata delivered through REST from a Fedora repository. And MODS is fairly simple as a a metadata standard. Care to take a stab at TEI or EAD?

When you began your research in 2007, Orbeon was a fairly young application. But the standard and its delivery and processing applications have evolved since then. Only in the last two or three years has XForms grown into a viable solution. Moreover, since it is a W3C standard, you can pick your forms up and migrate them to a new framework fairly easily. Is your Rails application sustainable in the long term? Are today’s jQuery functions going to work in 2015′s browsers? These are things you need to consider when contemplating a web form standard.

Fedora is a Tomcat application. So is Solr. So is adore-djatoka, which UVA/Hydra utilizes for jpeg2000 delivery. And so is Orbeon. ActiveFedora and any Rails-based MODS editor seem to me like the third wheel in the repository relationship. But in all seriousness, the sustainability of a boutique Rails application that is heavily dependent on the javascript functions of 2010 should be a serious concern to repository developers. jQuery is all the rage today, but it could blow away in the wind five years from now. This is the very thing that the XForms working group set out to prevent when they introduced a standard approach to dynamic webforms.

Jun 302010
 

In our work on Neatline, we have made a deliberate choice to start by constraining ourselves to map-sources that are quickly and easily provided through WMS. This leaves out (for now) two popular sources of map imagery; Google Maps and Open Street Map. I’m going to explain why we made that choice, and why, when we do come to make these sources usable with Neatline, we will do so with great care and with an eye to scholarly method.

All two-dimensional maps (as opposed to globes) are projected. That is, the curved three-dimensional surface of the Earth is transformed onto a flat two-dimensional surface. This can be done in an infinite variety of ways, many of which have been mathematically characterized and named by cartographers, for whom they are necessary tools. We must note, however, that no such transform can obtain a perfect representation of a section of the Earth. The mapmaker must choose which qualities to preserve and in what measures. Is it more important to provide an accurate depiction of relative areas or of relative lengths? Is the area around Greenland to be kept in the focus of accuracy, or that around New Zealand?

Each map therefore carries with it from its creation certain choices like these, part of the arguments the map makes about the world by its very construction. We chose WMS on which to start building our tools because, amongst other reasons, it allows for the transmission of projection information as part of its operation. This fact allows us to produce imagery from historical maps (themselves in any number of projections) and maintain the original choices the mapmaker made. Google Maps and Open Street Map are not WMS sources. They can be described as tile caches, huge reservoirs of rendered imagery. As such, they offer their own choices about how the world is to be projected. (Google’s choice has become so closely associated with Google that it is known widely as “the Google projection”.)

Now we come to an important technical distinction; WMS services are able (depending on the capabilities of the specific software in use) to reproject their contents. That is, in response to a specific request for imagery, they can produce the imagery in a projection different from the one in which it was stored. GeoServer, the software we are using for Neatline, has a library of thousands of projections to which users can add more as desired. This allows us to take imagery from a WMS source and lay it under a historical map layer while maintaining the original projection for that of the map as a whole. Tile caches, by and large, do not allow for this. (Google Maps offers its one projection, and Open Street Map offers two.) This means that in order to lay historical map imagery over a layer from one of these sources, we would have to reproject the foreground (historical imagery) overriding the choices of the mapmaker and introducing additional choices of our own about what facets of the geographies at stake are to be preserved and which abandoned.

(Neogeographers will remark that georectifying a digital image introduces similar issues. This is true, but unavoidable for our purposes. We would like to avoid compounding the matter in a way that is subtle and hard to detect.)

We are working out means by which we can provide the undeniable utility of popular tilecaching services in a way that is respectful of the historical context and story of map artifacts. Until we do, we will continue to concentrate on the more flexible and sophisticated apparatus provided by WMS.