Academic crowdsourcing – feedback loops

This year’s summer reading included Jon Ronson’s So You’ve Been Publicly Shamed, a journalistic investigation into why the internet has become so fond of collaring those who transgress its unwritten rules and tearing them apart. His case studies include Jonah Lehrer, the writer found to have fabricated quotes by Bob Dylan; Lindsey Stone, who was inadvisably photographed flipping the bird in Arlington National Cemetery; and Justine Sacco, the PR executive who, while en route to South Africa, tweeted a “joke” about hoping she didn’t catch AIDS because she was white. All three became transient global hate-figures, with tens of thousands of tweets and comments raining shame down upon them.

Ronson’s book is a readable and engaging romp through what are, of course, deadly serious issues for contemporary digital culture. However, his conclusion interested me: he contends that the modern-day version of the village stocks he describes is down to “feedback loops”. Ronson urges us to disregard the theories of Gustave Le Bon (one of Goebbels’s favourite theoreticians) and Philip Zimbardo, conceiver of the notorious Stanford Prison Experiment, who argue that mass hatred and hysteria are spread from node to node within the crowd through some process of broadly defined “contagion”. Rather, says Ronson, internet users copy what they see happening – a version of the “information cascade” theory of James Surowiecki, which I have blogged about before. So when tens of thousands of Twitter users piled into the wretched Sacco, for example, it was because they had seen others doing so, resulting in a collective assurance that it was “right”. Ronson underscores this with the observation that the displays attached to speed limit signs, which automatically flash motorists’ current speed, are dramatically effective. This instant feedback, devoid of any actual consequence or punishment, dramatically cuts instances of speeding.

Successful academic crowdsourcing projects, as I and others have argued elsewhere, depend for their success on the relationships they create with their volunteers. There is some reason to believe that Ronson’s logic can be applied here too – i.e. that both non-professional volunteers and professional project instigators are exposed to controlled feedback loops. Lasecki et al., for example, argue that crowds can self-learn through the correct application of mechanical tasks which are tightly regulated and controlled on platforms such as Mechanical Turk. The feedback loop is that the task has been performed correctly or incorrectly. Other volunteers report learning from each other via discussion forums, picking up good practice. Others go on to create Wikipedia pages around the content they have worked on – although whether Wikipedia is crowdsourcing or something else, such as community participation, is another matter (a distinction succinctly made in this blog post).

Those of us who have researched crowdsourcing over the last few years often get hung up on semantics and labels; and I am guilty as charged – I have found myself having far longer conversations than the subject justifies (which is how much?) over whether crowdsourcing should have a hyphen or not. I think that considering the attributes of what makes crowdsourcing crowdsourcing, as opposed to something else, is more useful. An effort to characterise what makes “good” or “productive” feedback loops – as opposed to the wild and unconstrained ones which destroyed Lehrer, Stone and Sacco – might be a good place to start.

GW4 Archives: exploring UK Medical Heritage Library and Historical Texts as data

In recent years hack-days have been all the rage, and they have proved a good vehicle for interactions between people who might not normally work together. In academia there has been a trend towards running so-called ‘labs’. The word implies experimentation; ‘hack-day’ tends to imply coding (though it can be experimental too!), whereas ‘lab’ suggests experimental thinking, without necessarily needing to lead to the production of code. Code can still be an output, of course, but that is not the main point of running a lab. It’s much more about the ideas.

Under the banner of the UK Medical Heritage Library, we at Jisc have been undertaking some events we call ‘Live Labs’, during which we work with academics and students to explore how we deliver the thousands of 19C medical texts held on our Historical Texts website. The labs set out to challenge the idea that these texts are just a bunch of digitised old books. They form part of the web, and, even if they are not currently represented as linked data, they are linked through their metadata to texts elsewhere on the network, and through their access and use in academic discourse they become visible to new audiences.

The events also seek to identify new ways in which people want to interact with the content in its various manifestations: individual item, corpus, aggregation, links in a web of related data, the metadata itself, as images and so forth. Participants also want to investigate the effectiveness of the various web interfaces and discovery tools, the data itself, and the relationship between the book as physical object and its electronic manifestation.

The labs are not an end in themselves, and we hope to develop some case studies which draw on the initial insights they provide. The case studies will ultimately act as waypoints for the onward development of our Historical Texts service, but should also inform the wider dialogue about the usefulness of electronic archives.

Recently, I was pleased to be able to work with the GW4 Archives team to put together a great programme of activities focused on the content of our Historical Texts service, including the UK Medical Heritage Library content. The GW4 Archives team decided to call their event a hack-day, which meant that people probably expected to be getting their hands dirty with code. In the end no one wrote any, but they did take a deep dive into the fabric of the Historical Texts site and were also able to explore the possibilities of taking content and working with it using freely available tools and Wikimedia platforms such as Wikidata, Wikisource and Wikipedia. Colleagues from Cardiff University, the University of Bath and the University of Bristol convened in the John Percival Building at Cardiff University for a jam-packed day.

We had run a Live Lab at the launch of the UK Medical Heritage Library with Owen Stephens, and we were pleased that Owen was again able to lead a session in Cardiff. After some contextual comments from Anthony Mandel, a brief presentation from Keir Waddington (a member of the UKMHL Advisory Group), some remarks from me and an introduction from Leah Tether, Owen demonstrated the possibilities offered by the Historical Texts collections: EEBO, ECCO, British Library 19C Books and UKMHL. Owen unpacked a range of possibilities for investigating the content as pure data, as images and as a cross-searchable resource, and explored the various interfaces which provide access to these content types. Participants came up with ideas for improving the search and suggestions for making better use of the UKMHL visualisation tools. They also identified content they might be able to explore in their own research. We gained lots of feedback on how the content is delivered, and now understand more about the ways in which researchers think about these archives: as ways to identify content for their research, as a means of eliminating items from that research, but also as a means of locating the physical artefact and supporting decisions about visiting a particular library to see something in the flesh. Some felt they would like to use the Historical Texts API, but that it would need more documentation or training to allow them to do that. Some felt they would like to develop more skills in using these technologies and perhaps start doing their own coding.

Digital skills development was a major concern for many of those taking part. Overall, people were excited by the extent of the UKMHL corpus and suggested that it should be the starting point for anyone wanting to explore 19C advances in medicine. We also looked at tools which already exist in the environment – such as Voyant Tools, OpenRefine, the Programming Historian and Library Carpentry – to enable people to work with Historical Texts content. This strand of Owen’s session brought us nicely to another set of openly available resources: those provided by the Wikimedia Foundation.
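To give a flavour of the kind of exploration tools like Voyant make possible, here is a minimal sketch – the sample passage is invented, not drawn from the UKMHL corpus – that counts the most frequent content words in a chunk of text using only the Python standard library:

```python
import re
from collections import Counter

# A handful of common English stopwords to exclude
# (a fuller list would be used in practice).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "it", "that", "by", "when"}

def top_words(text, n=5):
    """Return the n most frequent non-stopword tokens in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Invented sample standing in for a page of 19C medical text.
sample = (
    "The physician observed the fever of the patient; the fever abated "
    "when the physician administered quinine, and the patient recovered."
)

print(top_words(sample, 3))
```

Even something this simple hints at why researchers were keen to treat the collections as data rather than just as page images.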

We were fortunate that Martin Poulter was available to take people through the use of the platforms provided by Wikimedia. Martin is currently working with the University of Oxford after a stint as the Bodleian’s Wikimedian in Residence. He was previously Jisc’s Wikimedian (see our guide: Crowdsourcing – the wiki way of working), and he focuses on the use of these tools to support the dissemination of academic knowledge. The idea is to encourage academics to get involved in contributing to amazing free resources such as Wikidata and Wikisource, the benefit being that the general public can become aware of new knowledge which has emerged through academic research.

After an introduction from Jenny Kidd, Martin asked participants to work on a text which had been imported from the UKMHL corpus into Wikisource as raw OCR (Optical Character Recognition) text, each person being allocated a page to edit. By the end of the session we had a corrected publication sitting on Wikisource. Owen had already looked in some detail at issues around dirty OCR in UKMHL, and the ability to create a clean version of a text on Wikisource revealed the power of this freely available technology.
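Some of the mechanical part of that correction work can be automated before human proofreading begins. As an illustrative sketch only (this is not how Wikisource itself processes text, and the sample page is invented), the snippet below rejoins words hyphenated across line breaks and collapses stray whitespace in raw OCR output:

```python
import re

def tidy_ocr(raw):
    """Apply simple mechanical fixes to raw OCR text before human proofreading."""
    # Rejoin words split across line breaks with a hyphen,
    # e.g. "medi-\ncal" -> "medical".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse remaining line breaks and runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

raw_page = "The  physi-\ncian  prescribed  a  course\nof  treat-\nment."
print(tidy_ocr(raw_page))
```

Fixes like these leave the genuinely hard cases – misread characters, archaic typography – for the human volunteers, which is exactly the division of labour the lab explored.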

Martin was also able to demonstrate how original research, though not directly present on Wikimedia platforms, can be shared through references to papers in article texts, through stand-alone wikis, and through links from Wikipedia entries to external resources where these are appropriate. He also showed how a historic document can reach a wider audience through being represented on crowdsourcing platforms.

Once a paper has been published and referenced in other academic works, there is no reason why it can’t be added to Wikipedia as an article, as long as that article follows Wikipedia conventions. The journal PLOS Computational Biology is a good example of how this can work: when articles are published in the journal, a matching article of record is produced which follows Wikipedia style, and these are uploaded onto Wikipedia to fill gaps in its computational biology coverage. They are then available for editing in the normal wiki way. Martin also highlighted initiatives which allow digitised material to be added by cultural organisations to Wikimedia Commons and Wikisource. We had a look at Wikidata and explored how it is creating a web of data by enabling links to open data sets. Wikimedia Commons, the repository of reusable media files, should also be a valuable source of photographs, maps and audio clips for academics and students. We created our own timelines in Histropedia and tried out the new referencing tool in Wikipedia, which will make referencing much simpler in future. So we came away with a clear idea of what we could do with UKMHL texts using these tools. Martin kindly did a Storify of the day.
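To show what querying that web of data looks like in practice, here is a small sketch that builds a request URL for Wikidata’s public SPARQL endpoint (query.wikidata.org). The query itself – listing a few physicians born in the nineteenth century – is an invented example, though the identifiers it uses (P106 occupation, Q39631 physician, P569 date of birth) are real Wikidata identifiers; the sketch stops short of the actual network call.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: a handful of physicians (occupation P106 = Q39631)
# born (P569) in the nineteenth century.
QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q39631 ;
          wdt:P569 ?born .
  FILTER(YEAR(?born) >= 1800 && YEAR(?born) <= 1899)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

def build_query_url(query):
    """Return a GET URL asking the endpoint for results as JSON."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

url = build_query_url(QUERY)
print(url[:60])
# Fetching this URL (e.g. with urllib.request.urlopen) would return
# the matching items as JSON.
```

The point is that every item returned links onward to other open data sets, which is what makes Wikidata a hub rather than a silo.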

At the end there was a feedback session during which participants discussed the usefulness of the structured data on Wikidata, the tools available on Historical Texts, the need to think about ethics in relation to contributors and readers in a shared knowledge environment, and how interacting with these kinds of collections – for example visually – is changing the nature of scholarship.

Such events draw our attention to the increasing need for skills in managing humanities research, and for the skills necessary to the effective delivery of teaching. If we are to level the playing field between STEM and humanities subjects, we need to recognise that effective delivery of the humanities can and should lead to students leaving institutions with a high level of digital skills. By implication, those teaching humanities subjects need to be supported by infrastructures that enable the development of their own skills and maximise their ability to teach well in an electronic environment populated by archives, libraries and tools.


Blog Diss: Crowdsourcing Reading Lists, Parts Three and Four – New Media/Digital Humanities and Oppositional Media Studies

Greetings once more for the exciting third instalment of my crowdsourced reading lists! Today’s list is a bit longer because it includes two of my fields; it simply felt too arbitrary to attempt to decide whether many works I’ve included (particularly those I haven’t read) belonged in the New Media/DH or the Oppositional Media Studies field, so I elected to post them together.

As always, comments, suggestions, and interventions are welcome!


New Media/Digital Humanities and Oppositional Media Studies

read more

Creating community governance for THATCamp

As the period of Mellon Foundation funding for THATCamp nears its March 31st, 2014 end date, it becomes time to set up a community-driven means of managing the overall THATCamp project. I won’t bother you too much yet with my thoughts about what it has meant to me to be the THATCamp Coordinator over the last four years, but I will just say here that it’s been a pleasure and a privilege.

The task of turning THATCamp over to the community is in one sense utterly simple: it’s already a radically decentralized project, and there are plenty of THATCamps I have literally nothing to do with. In another sense, though, it’s hard — maybe the hardest task I’ve yet faced as THATCamp Coordinator. This is something I want very much to do right. I’ve therefore spent quite a bit of time thinking about how to do it, helped by an initial consultation session last October at THATCamp Leadership. I also read Jono Bacon’s The Art of Community, which gives practical advice from the perspective of the Ubuntu development community, and even got a bit of help from @jonobacon himself.

The result of all that study is the document below, a three-page draft THATCamp Council Charter that describes a system of elections and governance. And now here comes the begging: please comment on the charter by March 10, 2014. You can use the regular blog comment box here titled “Leave a reply” to let us know whether the system described therein looks good to you. (Don’t forget to scroll.) I’m particularly interested in how to ensure a diverse Council: I had thought about instituting quotas of some kind dealing with race, gender, country, rank, and so on, but frankly the math got too complex too quickly because of all the variables that could attach to any of the seven members. I wouldn’t want a Council with six white male American tenured professors and one black female Belgian grad student, but we might want slightly more specific guidelines than those I’ve outlined here. My ears are open.