
Have you hosted a workshop this year? Or attended one? Did it work? Or totally tank?
We're interested in building a kit to help you organize HASTAC-related events at your own campus or institution. We'd love to know about:

The question ‘what is the digital humanities’ is hardly new; nor is discussion of the various epistemologies of which the digital humanities are made. However, discussion of the relationship which archaeology has with the digital humanities – whatever the epistemology of either – has been curiously lacking. Perhaps this is because archaeology has such strong and independent digital traditions, and such a set of well-understood quantitative methods, that the close analysis of those traditions – familiar to readers of Humanist, say – seems redundant. However, at the excellent CAA International conference in Southampton last week, there was a dedicated round-table session on the ‘Digital Humanities/Archaeology Venn Diagram’, in which I was a participant. This session highlighted that the situation is far more nuanced and complex than it first seems, as is so often the case with digital humanities.
A Venn Diagram, of course, assumes two or more discrete groups of objects, where some objects contain the attributes of only one group, and others share attributes of multiple groups. So – assuming that one can draw a Venn loop big enough to contain the digital humanities – what objects do they share with archaeology? As I am not the first to point out, digital humanities is mainly concerned with methods. This, indeed, was the basis of Short and McCarty’s famous diagram. The full title of CAA – Computer Applications and Quantitative Methods in Archaeology – suggests that a methodological focus is one such object shared by both groups. However, unlike the digital humanities, archaeology is concerned with a well-defined set of questions. Most, if not all, of these questions derive from ‘what happened in the past?’. Invariably the answers lie, in turn, in a certain class of material; and indeed we refer collectively to this class as ‘material culture’. Digital methods are a means that we use to the end of getting at the knowledge that comes from interpretation of material culture.
The digital humanities have a much broader shared heritage which, as well as being methodological, is also primarily textual. This fact is illustrated by the main print publication in the field being called Literary and Linguistic Computing. It is not, I think, insignificant as an indication of how things have moved on that a much more recently (2007) founded journal has the less content-specific title Digital Humanities Quarterly. This, I suspect, is related to the reason why digitisation so often falls between the cracks in the priorities of funding agencies: there is a perception that the world of printed text is so vast that trying to add to the corpus incrementally would be like painting the Forth Bridge with a toothbrush (although this doesn’t affect my general view that the biggest enemy of mass digitisation today is not DEC or public spending cuts, but the Mauer im Kopf formed by notions of data ownership and IPR). The digital humanities face a tension between the variable availability of digital material and the broad access to content that porting over to the ‘digital’ everything the word ‘humanities’ implies would require. As Stuart Jeffrey’s talk in the session made clear, the questions facing archaeology are more about what data archaeologists throw away: the emergence of Twitter, for example, gives an illusion of ephemerality, but every tweet adds to the increasing cloud of noise on the internet; and those charged with preserving the archaeological record in digital form must decide where the noise ends and the record begins.
There is also the question of what digital methods *do* to our data. Most scholars who call themselves ‘digital humanists’ would reject the notion that textual analysis, which begins with semantic and/or stylometric mark-up, is a purely quantitative exercise; they would argue that qualitative aspects of reading and analysis arise from, and challenge, the additional knowledge which is imparted to a text in the course of encoding by an expert. However, as a baseline, it is exactly this kind of quantitative reading of primary material which archaeology – going back to the early 1990s – characterized as reductionist and positivist. Outside the shared zone of the Venn diagram, then, must be considered the notions of positivism and reductionism: they present fundamentally different challenges to archaeological material than they do to other kinds of primary resource, certainly including text, but also, I suspect, to other kinds of ‘humanist’ material as well.
A final point which emerged from the session is the disciplinary nature(s) of archaeology and the digital humanities themselves. I would like to pose the question as to why the former is often expressed as a singular noun whereas the latter is a plural. Plurality in ‘the humanities’ is taken implicitly. It conjures up notions of a holistic liberal arts education in the human condition, taking in the fruits of all the arts and sciences in which humankind has excelled over the centuries. But some humanities are surely more digital than others. Some branches of learning, such as corpus linguistics, lend themselves to quantitative analysis of their material. Others tend towards the qualitative, and need to be prefixed by correspondingly different kinds of ‘digital’. Others are still more interpretive, with their practitioners actively resisting ‘number crunching’. Maybe therefore we would free ourselves of the restrictions of nomenclature if we thought in terms of ‘digital archaeologies’; of branches of archaeology which require (e.g.) archiving, communication, semantic web, UGC and so on; and some which don’t require any. I can’t doubt that the richness and variety of the conference last week is the strongest argument possible for this.
Herewith a slightly belated report of the recent talk in the CeRch seminar series given by Professor Mike Thelwall of the University of Wolverhampton. Mike’s talk, ‘Webometric Analyses of Social Web Texts: case studies Twitter and YouTube’, concerned getting useful information out of social media, primarily by social science means: information, specifically, about the sentiment of the communications on those platforms. His group produces software for text-based information analysis, making it easy to gather and process large-scale data, focusing on Twitter, YouTube (especially the textual comments), the web in general, the Technorati blog search engine and Bing. This shows how a website is positioned on the web, and gives insights into how its users are interacting with it.
In sentiment analysis, a computer programme reads text and predicts whether it is positive or negative in flavour, and how strongly that positivity or negativity is expressed. This is immensely useful in market research, and is widely employed by big corporations. It also goes to the heart of why social media work – they function well with human emotions – and tracks what role sentiments have in social media. The sentiment analysis engine is designed for text that is not written with good grammar. At its heart is a list of 2,489 terms which are either normally positive or negative. Each has a ‘normal’ value, and ratings of -2 to -5. Mike was asked if it could be adapted to slang words, which often develop, and sometimes recede, rapidly. Experience is that it copes well with changing language over time – new words don’t have a big impact in the immediate term. However, the engine does not appear to work with sarcastic statements, whose diction might be the opposite of their meaning, nor with (for example) ‘typical British understatement’. This means that it does not work very well for news fora, where comments are often sarcastic and/or ironic (e.g. ‘David Cameron must be very happy that I have lost my job’). There is a need for contextual knowledge – e.g. ‘This book has a brilliant cover’ means ‘this is a terrible book’, in the context of the phrase ‘don’t judge a book by its cover’. Automating the analysis of such contextual minutiae would be a gigantic task, and the project is not attempting to do so.
Mike also discussed the Cyberemotions project. This looked at peaks of individual words in Twitter, e.g. ‘Chile’, when the earthquake struck in February 2010. As might be expected, positivity decreased. But negativity increased only by 9%: it was suggested that this might have been to do with praise for the response of the emergency services, or good wishes to the Chilean people. Also, the very transience of social media means that people might not need to express sentiment one way or another. For example, simply mentioning the earthquake and its context would be enough to convey the message the writer needed to convey. Mike also talked about the sentiment engine’s analysis of YouTube. As a whole, most YouTube comments are positive; however, those individual videos which provoke many responses are frequently negatively viewed.
So I tried the sentiment engine (http://sentistrength.wlv.ac.uk) with an entirely hypothetical phrase, ‘I hate commuting to London but it’s all worth it to work at King’s’. This returns quite a negative reckoning because of the presence of the word ‘hate’, a strong negative, and there is no corresponding strong positive to counteract it – even though, semantically, the positive part of the sentence should cancel out the strong negative, despite being less strongly expressed. So while this technique is good as an investigative method for large corpora, it can fall down when applied in individual cases.
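To make the point about the word list concrete, here is a minimal sketch in Python of how a purely lexicon-based scorer behaves. It is not the engine's actual code: the six-word lexicon, its strength values, the function names and the convention of reporting the strongest positive and strongest negative word (with neutral baselines of +1 and -1) are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon. The words and strengths are illustrative
# assumptions, not entries from the engine's real 2,489-term list.
TOY_LEXICON = {
    "hate": -4,
    "terrible": -4,
    "lost": -2,
    "worth": 2,
    "happy": 3,
    "brilliant": 4,
}


def score(text):
    """Return (strongest_positive, strongest_negative) for a text.

    Assumed convention: a text with no positive hit scores +1 and a text
    with no negative hit scores -1, i.e. neutral baselines.
    """
    words = re.findall(r"[a-z']+", text.lower())
    strengths = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    positive = max((s for s in strengths if s > 0), default=1)
    negative = min((s for s in strengths if s < 0), default=-1)
    return positive, negative


def overall(text):
    """Collapse the two strengths into one signed number (an assumed rule)."""
    positive, negative = score(text)
    return positive + negative


if __name__ == "__main__":
    for phrase in (
        "I hate commuting to London but it's all worth it to work at King's",
        "David Cameron must be very happy that I have lost my job",
    ):
        print(phrase, score(phrase), overall(phrase))
```

With this toy lexicon the commuting phrase comes out net negative, because ‘hate’ outweighs ‘worth’ and nothing in the scorer knows that ‘but’ reverses the emphasis. The sarcastic Cameron comment comes out net positive for the same reason: only the words are counted, not the context.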
One wonders if it might be useful in XML/RDF projects such as SAWS, or indeed for book reviews on publications such as www.arts-humanities.net.
by Przemek Lenkiewicz
The CLARA Summer School on Infrastructure Tool Development took place at the Max Planck Institute for Psycholinguistics from 5th to 12th July.
Participants came from several institutions, including the University of Bielefeld, the Technical University of Aachen, Gießen University and the Technical School of Mittelhessen. Some Max Planck staff also took part in parts of the summer school, especially those requiring less technical expertise. Together they formed a very inspiring and productive group that managed to carry out the tasks planned for the event and also came up with some new ideas for useful tools, which were also implemented during the summer school.
On the first day Przemek Lenkiewicz opened the summer school and introduced participants to the agenda and all extra activities. Participants were also encouraged to present themselves and their work, giving an idea of how they use ELAN and what they were hoping to learn at this event.
Later Han Sloetjes, the main developer of ELAN, presented the annotation tool and introduced its mechanisms for creating and integrating extensions (recognizers). Some users said that although they had used ELAN for quite a long time, they had not even been aware that it is possible to extend its functionality, or that doing so is so simple. Han spent the whole day with participants to clear up any doubts they might have. He also came back on the following days and participated in the development sessions.
Days 2-4 of the event were about signal processing techniques. Stefano Masneri of Fraunhofer HHI Berlin and Dr. Rolf Bardeli of Fraunhofer IAIS Sankt Augustin introduced the participants to video and audio processing basics. In the afternoon hands-on sessions participants developed some simple video/audio processing algorithms, such as histogram calculations for both audio and video, color-to-greyscale conversion and image flipping. More advanced functionality was also developed, such as detecting a person’s hand in a video using an edge detector as the base, or detecting fricatives in a speech recording using thresholding.
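For readers who have not met these basic operations, here is a minimal sketch in plain Python of color-to-greyscale conversion, image flipping, a histogram and a simple threshold. It is not the code written at the summer school: the function names and the image representation (rows of (r, g, b) tuples) are assumptions, and the greyscale weights are the widely used ITU-R BT.601 luma coefficients rather than anything stated in this report.

```python
from collections import Counter


def to_greyscale(image):
    """Convert rows of (r, g, b) pixels into rows of 0-255 grey levels.

    Uses the common BT.601 luma weights (an assumption of this sketch).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def flip_horizontal(image):
    """Mirror each row, so that left and right are swapped."""
    return [list(reversed(row)) for row in image]


def histogram(grey):
    """Count how many pixels have each grey level."""
    return Counter(level for row in grey for level in row)


def threshold(grey, cutoff):
    """Mark pixels at or above the cutoff as 1 and all others as 0."""
    return [[1 if level >= cutoff else 0 for level in row] for row in grey]


if __name__ == "__main__":
    # A hypothetical 2x2 test image: white, black, red, blue.
    image = [
        [(255, 255, 255), (0, 0, 0)],
        [(255, 0, 0), (0, 0, 255)],
    ]
    grey = to_greyscale(image)
    print(grey)
    print(flip_horizontal(grey))
    print(histogram(grey))
    print(threshold(grey, 128))
```

The same thresholding idea carries over to audio: replace grey levels with, say, per-frame energy values and mark the frames that exceed a cutoff.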
The last two days of the summer school were led by Przemek Lenkiewicz and Eric Auer. In a brainstorming session with the participants we defined two recognizers that they were interested in developing: automated importing of eye-tracking data into ELAN, representing it as annotations and curves, and a recognizer to compare two tiers based on the similarity of their annotations. Both recognizers were successfully developed by the end of the summer school.
Since the summer school included the weekend, the group met and explored Nijmegen for a while. On Monday July 11th we also had dinner together in a nice Dutch restaurant.
Additional pictures from the event can be found on this web page:
After the event participants filled in a survey and rated the summer school very highly for its content, the way it was delivered and its overall organization. Considering the good feedback, another Summer School on Infrastructure Tool Development might take place at Max Planck in summer 2012. Anyone interested in participating should contact Przemek Lenkiewicz.