Susan Thomas

Nov 13 2012
 
I do worry from time to time that textual analogue records will come to suffer from their lack of searchability when compared with their born-digital peers. For those records that have been digitised, crowd-sourcing transcription could be an answer. A rather neat example of just that is the arcHIVE platform from the National Archives of Australia. arcHIVE is a pilot from NAA's labs which allows anyone to contribute to the transcription of records. To get started they have chosen a selection of records from their Brisbane office which are 'known to be popular'. Not too many of them just yet, but at this stage I guess they're just trying to prove the concept works. All the items have been OCR-ed, and users can choose to improve or overwrite the results from the OCR process. There are lots of nice features here, including the ability to choose documents by a difficulty rating (easy, medium or hard) or by type (a description of the series by the looks of it). The competitive may be inspired by the presence of a leaderboard, while the more collaborative may appreciate the ability to do as much as they can and leave the transcription for someone else to finish up later. You can register for access to some features, but you don't have to. Very nice.

Atlas of digital damages

Oct 20 2012
 
An Atlas of digital damage has been created on Flickr, which will provide a handy resource for illustrating where digital preservation has failed. Perhaps 'failed' is a little strong. In some cases the imperfection may be an acceptable trade-off. A nice, and useful, idea. Contribute here.
Oct 14 2012
 
Yesterday was Day of Digital Archives 2012! (And yes, I'm a little late posting...)

This 'Day' was initiated last year to encourage those working with digital archives to use social media to raise awareness of digital archives: "By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?". So in that spirit, here is a whizz through my week.

Coincidentally, not only does this week include the Day of Digital Archives, but it's also the week that the Digital Preservation Coalition (or DPC) celebrated its 10th birthday. On Monday afternoon I went to the reception at the House of Lords to celebrate that landmark anniversary. A lovely event, during which the shortlist for the three digital preservation awards was announced. It's great to see three award categories this time around, including one that takes a longer view: 'the most outstanding contribution to digital preservation in the last decade'. That's quite an accolade.

On the train journey home from the awards I found some quiet time to review a guidance document on the subject of acquiring born-digital materials. There is something about being on a train that puts my brain in the right mode for this kind of work. Nearing its final form, this guidance is the result of a collaboration between colleagues from a handful of archive repositories. The document will be out for further review before too long, and if we've been successful in our work it should prove helpful to creators, donors, dealers and repositories.

Part of Tuesday I spent reviewing oral history guidance drafted by a colleague to support the efforts of Oxford Medical Alumni in recording interviews with significant figures in the world of Oxford medicine. Oral histories come to us in both analogue and digital formats these days, and we try to digitise the former as and when we can. The development of the guidance is in the context of our Saving Oxford Medicine initiative to capture important sources for the recent history of medicine in Oxford. One of the core activities of this initiative is survey work, and it is notable that many archives surveyed include plenty of digital material. Web archiving is another element of the 'capturing' work that the Saving Oxford Medicine team has been doing, and you can see what has been archived to date via Archive-It, our web archiving service provider.

Much of Wednesday morning was given over to a meeting of our building committee, which had very little to do with digital archives! In the afternoon, however, we were pleased to welcome visitors from MIT - Nancy McGovern and Kari Smith. I find visits like these are one of the most important ways of sharing information, experiences and know-how, and as always I got a lot out of it. I hope Nancy and Kari did too! That same afternoon, colleagues returned from a trip to London to collect another tranche of a personal archive. I'm not sure if this instalment contains much in the way of digital material, but previous ones have included hundreds of floppies and optical media, some zip discs and two hard disks. Also arriving on Wednesday, some digital Library records courtesy of our newly retired Executive Secretary; these supplement materials uploaded to BEAM (our digital archives repository) last week.

On Thursday, I found some time to work with developer Carl Wilson on our SPRUCE-funded project. Becky Nielsen (our recent trainee, now studying at Glasgow) kicked off this short project with Carl, following on from her collaboration with Peter May at a SPRUCE mashup in Glasgow. I'm picking up some of the latter stages of testing and feedback work now Becky's started her studies. The development process has been an agile one with lots of chat and testing. I've found this very productive - it's motivating to see things evolving, and to be able to provide feedback early and often. For now you can see what's going on at GitHub here, but this link will likely change once we settle on a name that's more useful than 'spruce-beam' (doesn't tell you much, does it?! Something to do with trees...). One of the primary aims of this tool is to facilitate collection analysis, so we know better what our holdings are in terms of format and content. We expect that it will be useful to others, and there will be more info. on it available soon.

Friday was more SPRUCE work with Carl, among other things. Also a few meetings today - one around funding and service models for digital archiving, and a meeting of the Bodleian's eLegal Deposit Group (where my special interest is web archiving). The curious can read more about e-legal deposit at the DCMS website. One fun thing that came out of the day was that the Saving Oxford Medicine team decided to participate in a Women in Science Wikipedia editathon. This will be hosted by the Radcliffe Science Library on 26 October as part of a series of 'Engage' events on social media organised by the Bodleian and the University's Computing Services. It's fascinating to contemplate how the range and content of Wikipedia articles change over time, something a web archive would facilitate perhaps.

For more on working with digital archives, go take a look at the great posts at the Day of Digital Archives blog!

Day of Digital Archives, 2011

Oct 06 2011
 
Today is officially 'Day of Digital Archives' 2011! Well, it's been quite a busy week on the digital archives front here at the Bodleian...

The week began with the arrival of our new digital archives graduate trainee, Rebecca Nielsen. During her year here with us, the majority of Rebecca's work will be on digital archives of one kind or another; she'll be archiving all sorts, from materials arriving on old floppies to web sites on the live web.

Another of my colleagues, Matthew Neely, has been spending quite a bit of time this week working on the archive of Oxford don, John Barton. The archive includes over 150 floppies and a hard disk as well as hard-copy papers and photographs.


Barton's digital material was captured in our processing lab back in the Spring of 2010, and now Matthew is busy using Forensic Toolkit software to appraise, arrange and describe the digital content alongside the papers. There are a few older word-processing formats in the collection, but all things that we can handle.

We've also been having conversations with quite a few archive depositors this week, about scoping collections and transfer mechanisms, among other things. There has been some planning work too, while we consider the requirements for processing the archive of Sir Walter Bodmer, which includes around 300 disks (3.5" and 5.25"). For more on the Bodmer archive see the Library's Special Collections blog, The Conveyor.

Today, I've spent a little time looking at our 'Publication Pathway' and thinking about where we need a few tweaks. This is the process and toolset that we are building to publish our digital archives to users (Pete called it CollectionBuilder, and you can have a look at a slightly out-of-date version of it here: http://sourceforge.net/projects/beamcollectionb/). We have a bit more work to do on this and our user interface, but quite a bit of material in the pipeline waiting to get out to our users.

To close out the week, two of our web archiving pilot group are heading off to the DPC's The Future of the Past of The Web event tomorrow, to learn more about the state of the art in web archiving.

Lastly, I can't resist returning to the start of the week. On Monday, we had a power cut and temporarily lost access to Bodleian Electronic Archives and Manuscripts (BEAM) services. An unsubtle reminder that digital archives require lots of things to remain accessible, power being one of them!
Jul 27 2011
 
http://www.flickr.com/photos/fensterbme/1990023423/
Interesting to see Killian Escobedo's post on digital video preservation over at the Smithsonian Archives' visual archives blog. Our trainee, Emma, is working on questions of this sort at the moment as we start to develop strategies for preserving the vast amount of born-digital video being deposited in our archive collections. While there's quite a lot of material out there on digitising analogue video, we've found a real shortage of guidance on the management of born-digital video collections. With that in mind I'd be really interested in hearing how other folks are dealing with this kind of material. Can you give us any pointers? At the moment we're particularly interested in learning more about existing practices, good tools, realistic workflows, and preservation-grade standards (for metadata and content - which ones and why?).

So, what kind of digital video do we have? It's a good question, and one I can't answer fully for the moment. What I can say is that our collections include digital video deposited on CDs, DVDs, Blu-ray discs, miniDV and mediumDV cassettes, and hard disks. Much of this material has yet to be captured from its original media so we don't have that inventory of codecs, wrapper formats, frame rates, metadata, etc. that Killian talks about. This kind of detailed survey work is a next step for us, but one that will have to wait until we have developed a workflow for initial capture (bit-level preservation comes first). I wonder if we'll see the same diversity of technical characteristics present in the Smithsonian's materials. It seems likely.
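
For anyone wondering what a first pass at this might look like once files are off their original media, here is a minimal sketch in Python. It is an illustration only, not our workflow: the function names (sha256_of, survey) are hypothetical, and a file extension is at best a hint at the wrapper format, telling you nothing about codecs or frame rates. It records a SHA-256 checksum for each captured file (the bit-level part) and tallies files by extension (a very rough start on the inventory).

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey(root: Path) -> tuple[dict[str, str], Counter]:
    """Walk root and return (checksum manifest, count of files per extension).

    Manifest keys are paths relative to root, written with forward slashes.
    Extensions are lower-cased; files without one are counted as "(none)".
    """
    manifest: dict[str, str] = {}
    extensions: Counter = Counter()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
            extensions[path.suffix.lower() or "(none)"] += 1
    return manifest, extensions
```

Calling survey(Path("captured_video")) on a folder of captured files (the folder name is made up) returns the manifest and the tally. Re-running it later and comparing manifests is one simple way to check that nothing has changed at the bit level.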
Apr 21 2011
 
3 inch Disks (Mitsumi 'Quick Disk')

Type: Magnetic storage media
Introduced: ?1985
Active: Unlikely.
Cessation: Used in the 1980s.
Capacity: ?128KB - 256KB
Compatibility: Requires a 3” drive appropriate to the manufacturer's specifications.
Users: Likely to have been individual users and small organisations. Used for word-processing, music and gaming.
File Systems: Unknown. May vary according to use. The disks were manufactured by Mitsumi and offered as OEM to resellers and used in a range of contexts including Nintendo (Famicom), various MIDI keyboards/samplers (Roland) and the Smith Corona Personal Word Processor (PWP).
Common Manufacturers:
Disks: Mitsumi appear to have made the magnetic disk (the innards), while other manufacturers made the cases. This resulted in different case shapes and labelling. For example, Smith Corona labelled the disks as DataDisk 2.8".
Drives: Mitsumi?

Recognition
The Smith Corona Personal Word Processor (PWP) variant of the disk is double sided with one side being labelled ‘A’ and the other ‘B’. Each side also had a dedicated write-protect hole, known as a 'breakout lug'.

2.8" Smith Corona 'Quick Disk'
3.5" floppy side-by-side with a 2.8" Smith Corona 'Quick Disk'
Nintendo Famicom disk
Some rights reserved by bochalla



High Level Formatting
Unknown. Possibly varied according to use.

3 Inch Disk Drives
Varied according to disk. The Smith Corona word processing disks are most likely to turn up in an archival collection. These were used in a Smith Corona PWP, and possible model nos. include: 3, 5, 6, 6BL, 7, X15, X25, 40, 50LT, 55D, 60, 65D, 75D, 80, 85DLT, 100, 100C, 220, 230, 250, 270LT, 300, 350, 355, 960, 990, 2000, 2100, 3000, 3100, 5000, 5100, 7000LT, DeVille 3, DeVille 300, Mark X, Mark XXX, Mark XL LT.

Lego mockup of a Nintendo Famicom drive
Some rights reserved by kelvin255
   
Useful links
http://www.cromwell-intl.com/technical/quickdisk-recovery.html
http://en.wikipedia.org/wiki/History_of_the_floppy_disk