Chicks rule?

May 23, 2012
 

Who Rules The Social Web? - Information is Beautiful
Back in 2010 we calculated there were 74 million more women on social networks than men. We’ve had a fresh look at the data. In the age of Facebook, Pinterest and Instagram, do chicks still rule?

» See the new diagram
» See the old diagram


Posted on May 23, 2012
Apr 19, 2012
 

Being able to embed non-standard fonts in websites is a huge step forward for the web. I’ve been using Typekit for a while and wanted to share my experiences with the service.

I’ve subscribed to the Portfolio Plan for $4 per month, which gives you access to the whole Typekit Font Library and lets you embed Typekit fonts on an unlimited number of websites. You can have up to 500,000 pageviews per month on this plan.

Setting it up/Technical Stuff

After you have added the fonts you want to use on your site to the Kit, you have to insert a small script in your html code (it goes into the head tag):

<script type="text/javascript" src="http://use.typekit.com/xxx.js"></script>
<script type="text/javascript">try{Typekit.load();}catch(e){}</script>

Then you can either add the corresponding font class to your html tags (e.g. «.tk-news-gothic-std») or you can use it directly in your css with the font-family property:

body {
  font-family: "news-gothic-std", sans-serif;
}

The first time I tried it out I couldn’t get it to work. It took some time to figure out that the problem was the font shorthand property in CSS:

body {
  font: 0.8125em/1.692307em "news-gothic-std", sans-serif;
}

The font didn’t load at all with the code above. In the end the code below made it work (I don’t know why, though):

body {
  font-family: "news-gothic-std", sans-serif;
  font: 0.8125em/1.692307em "news-gothic-std", sans-serif;
}

Workflow with Typekit

Here’s the catch: You can’t use Typekit in Photoshop or Fireworks. You have to convert your layout into HTML & CSS, then you can try out the Typekit fonts in the browser. For me as a designer that’s a big issue. I don’t know if you have seen Bret Victor’s talk on «Inventing on Principle» (you definitely should, it’s great!), but designers and makers in general have to see the effect of their changes instantly to have an effective workflow. Imagine I have three different website layouts and want to test some Typekit fonts with them. Now I have to convert all three layouts into HTML & CSS before I can see which font fits best? Why put all the work into converting all three layouts into HTML when I’m going to end up using only one of them? That workflow seems very inefficient to me. More time should be spent on the design process instead.

I could buy and download the fonts in OpenType format and do the testing in Photoshop or Fireworks. If this should be «the way to go», that is a real disadvantage in my opinion, and an expensive one as well. From my perspective there should be a Typekit Photoshop extension or something similar. Extensis showed how it’s done and released a Web Font Photoshop Plugin to use with their WebINK Web Font Service. Typekit was acquired by Adobe, so I hope they’re planning something similar. I downloaded the Photoshop CS6 Beta but there is no sign of a Typekit extension.

Such a Photoshop plugin would of course have to download some kind of font file to the computer (at least temporarily) so that could be a problem with the Font Foundries. I don’t know about the conditions between Typekit and the Foundries but I strongly believe the Foundries want to be assured that the people can’t just save those font files and do whatever they want with them. Maybe someone has some insight about the possibilities and limitations of the licensing model in use at Typekit.

For now I will stick with Typekit but I really hope they will find a solution for this in the near future because I think apart from this workflow problem it’s a great service.

Update: Paul Heyer from fnctions just told me that there’s also a Photoshop Plugin from Fontshop. This Plugin doesn’t download font files, it downloads the already rendered type (I’m guessing as image-data) directly from the Fontshop Servers. So that’s definitely a technical possibility to consider.

Posted on April 19, 2012
Feb 24, 2012
 
Five years ago I wrote what, for some reason, was the first post on my blog that got any sort of attention. Basically it was a rundown of the “Top Ten” software tools I use as an educator. At the time I was consistently asked by colleagues what computer “stuff” I used, so I decided to narrow it down to one post and publish it. Indeed when I first started blogging I thought my entire website would be about tech tools and tips for academics. That role is now fulfilled in a far better way by so many other sites that I hardly use this site for that anymore, or at least not directly.

But I did think the more than five year anniversary was worth revisiting and taking a look at how my media environment has changed. I actually think these sorts of posts can be pretty useful, as the how of our computer use is often obscured, despite the fact that it is so varied. In my classes I often like to have students talk about what programs, apps, techniques they use as a way to show a diversity of approaches.

The most substantial change I have made is moving away from Apple. Once an avid promoter of their products, at this point I am so concerned about the computing environment they are building that it is worth my time to look for something else. I have moved from my main computer running Mac OS to one which runs Ubuntu. Indeed although I still have a Mac laptop I rarely use it for anything I can’t do on Ubuntu (http://www.ubuntu.com/), and am looking to purchase a new laptop soon, one I suspect will not be made by Apple (or if it is, Macbook Air?, I will use it to run Ubuntu). I no longer use an iPhone, which has been replaced by a Nexus S (which I tell people is so much more powerful than an iPhone). The only Apple product I consistently use, and find to be ridiculously useful (more on that later), is the iPad, but with the substantial number of impressive Android tablets coming out, I suspect it is only a matter of time until I migrate away from this product as well. There is a decided shift here to platform-independent services, and services which offer greater control, even sometimes at the cost of ease of use.

As such this list has changed substantially. So here, in some sort of rough order, is my list of essential pieces of my computing environment.

-SpiderOak. I replaced Dropbox with SpiderOak because of security concerns, and haven’t looked back since. Sure, SpiderOak is a tad more difficult to set up, but the fact that the files are encrypted on my end and thus SpiderOak has no access to my files makes it more than worth it. I still use Dropbox to share articles or readings with students and store some files, but SpiderOak is my main system. SpiderOak works across all my devices (phone, laptop, desktop, tablet).

-WordPress. Oddly this didn’t even make the list back in 2006, even though my blog was running on WordPress. But now I use it for managing not only my blog, but my main site, and sites for the classes I am teaching (screw BlackBoard), as well as a separate one for my current research project. The ability to quickly roll out a good-looking website that is easy to update and highly customizable is invaluable. I am always recommending to people that they build an online presence they control to display their scholarly work, and WordPress is the easiest, and one of the most powerful, ways to do this.

-iAnnotate PDF. First tablet app on the list. This one is ridiculously useful. I use it to read and mark up PDFs, both to comment on student work (especially grad students and drafts of papers) and to read journal articles. This lets me “carry with me” all the papers I need to read in digital format and still mark them up as if they were paper. Seriously, probably half the time I use the iPad it’s with this app.

-Instapaper. Throughout the day, I come across various articles that I want to read, but don’t have the time to read right then. Instapaper lets me save these pages for reading later. I actually have a habit of carving out an hour or two to read thru everything I have saved. This also has the bonus effect of not being distracted by articles which may seem like a good idea at the time, but a couple of days later seem irrelevant or only interesting as a distraction. I also use IFTTT so that by favoriting Tweets in any Twitter client I am using they automatically get saved to Instapaper. After iAnnotate, Instapaper is probably the app I use most. (Although you can access Instapaper on any computing device.)

-Astrid. I played around a lot (way too much perhaps) with various todo list organizers. But this is the one I settled on. Mainly it came down to interface and cross-platform use, coupled with the ability to connect to other services I use. I am still trying to figure out a way to integrate a todo list with voice commands effectively. I might hook Astrid to Producteev just to accomplish this.

-WiTopia. This is a paid VPN service. When connecting to the net via an untrusted connection a VPN service is critical. Our University I am sure has one they provide, but I prefer to control my own. A serious advantage of WiTopia is that I can pick from an elaborate range of locations, enabling me to connect to the net with an IP from any one of a number of countries and get outside of the “American” centric net (not to mention watching the BBC). Even on campus I will use the VPN if I want to hide my traffic for any one of a number of reasons. WiTopia is safe, easy to install, and works across all my devices.

-AutoHotKey or TextExpander. These programs are ridiculously useful. I specify a series of characters and they are then instantly replaced by another. For example I type “aadd” anytime I want to add my mailing address to something and “aadd” is replaced by my full snail mail address. I use this for titles of books I have to type a lot, or code shortcuts: I never actually type “<a href=""></a>”. You can also set up these programs to automatically drop in the most recently copied text, or insert today’s date etc.
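To make the idea concrete, here is a minimal Python sketch of abbreviation expansion. It only illustrates the behaviour described above and says nothing about how AutoHotKey or TextExpander work internally; the snippet table, the placeholder address, the “ddate” abbreviation and the `expand` function are all made up for the example.

```python
import re
from datetime import date

# Hypothetical abbreviation table; the address is a placeholder, not a real one.
SNIPPETS = {
    "aadd": "123 Example Street, Sample Town",
    "ahref": '<a href=""></a>',
}

def expand(text, snippets=SNIPPETS, today=None):
    """Replace whole-word abbreviations in `text` with their expansions."""
    if today is None:
        today = date.today()
    table = dict(snippets)
    table["ddate"] = today.isoformat()  # dynamic snippet: today's date
    # \b keeps abbreviations from firing inside longer words.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], text)
```

Calling `expand("Send it to aadd")` swaps in the placeholder address, while a longer word that merely contains “aadd” is left alone.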

-gedit. I used to use word processors or Scrivener to write. Now I just use a simple text editing program. Forget making the text look good, that’s for later. Now when I write I work in a very simple text-only environment. On the Mac I had switched over to TextMate, on my home computer I use gedit. Seriously, 80% of what I write starts off as basic text.

-A Good Hosting Service. The cloud and free services are one thing, but the ability to host your own site and control your own data etc., for me is crucial. Get your own hosting service for your website, set up your own email and stop counting on someone to do this for you. I prefer HostGator. But there are lots of good ones out there.
Posted on February 24, 2012
Dec 15, 2011
 

The Scholars’ Lab at the University of Virginia has posted audio recordings of sessions from “The Humanities in a Digital Age,” a symposium that took place in November at UVA’s new Institute of the Humanities and Global Cultures. My keynote at the symposium was entitled “Humanities Scholars and the Web: Past, Present, Future,” and focused on what I believe are three critical elements of the web that scholars tend to overlook, or that cause concern because they upset certain academic conventions:

1) The openness and standards of the web produce generative platforms. The magic of the web is that from relatively simple technical specifications and interoperability arise an incredibly varied and constantly innovative set of genres. For those wedded to traditional forms such as the book and article, this can be difficult to understand and accept.

2) Interfaces shape genres. Tracing the history of web applications used to make blogs, from early link aggregators to the blank page of WordPress 3’s full-screen writing environment, shows this in action. Humanities blogs shifted in helpful ways over the last 15 years, into modes that should be more acceptable to the academy, as these interfaces changed. Being in control of these interfaces is important as we continue to develop online scholarship.

3) Communities define practice. Conventions around web genres are created by those participating in them. This has serious implications for what the academy might be able to do with the web in the future.

You can hear about these three main points and much more in the talk, which is available as a podcast or audio stream near the bottom of this page. Part of the talk comes from chapter 1 of The Ivory Tower and the Open Web.

Oct 4, 2011
 

Entering the web of data

[view the presentation...]

Keynote delivered at the annual conference of the Australia and New Zealand Society of Indexers, 14 September 2011.


This is me.

Today, Wednesday, 14 September 2011, I’m honoured to be able to join you here in the luxurious surrounds of the Brighton Savoy Hotel for the ‘Indexing See Change’ conference. This is an event, a moment in history; we can pinpoint ourselves, this gathering, both in time and in space.

If we do that, if we move outside the moment and position ourselves on a timeline or a map, interesting things start to happen. Connections emerge.

Here we are at number 150, The Esplanade, in Brighton. A bit over a kilometre away is the stately villa, Kamesburgh. For many years Kamesburgh was also known as the Anzac Hostel — a refuge for permanently-incapacitated World War One veterans.

The Anzac Hostel opened on 5 July 1919. Here it is draped in its patriotic finery, from the collections of the Australian War Memorial. According to the caption, the Anzac Hostel was ‘a home, not an institute’.

Also amongst the War Memorial’s holdings is a wheeled bed that was used at the hostel. This particular bed was apparently occupied by one man, Albert Ward, for forty-three years.

Death notice for Alexander Kelley. Argus, 29 January 1944.

It was probably in a bed just like this that Alexander Dewar Kelley passed away on 27 January 1944. Alexander Kelley was cremated, and his remains interred amongst the roses at what is now called the Springvale Botanical Cemetery. Not far from my own grandparents.

Alexander Kelley spent close to half his life in the Anzac Hostel. Like many young men, he bravely answered his nation’s call to arms, but returned from war much changed. We can follow Alex’s war through his service record, easily-accessible through the website ‘Mapping Our Anzacs’.

Alex was a coach painter who enlisted in the AIF in January 1916. Within a year he was in France. In May 1917 he suffered a gunshot wound to the head, but was able to rejoin his unit in August. Less than a month later though, he was wounded again, this time more severely. For Alex the war was over, and he was shipped back to Australia in May 1918.

‘Mapping Our Anzacs’ includes a scrapbook feature through which visitors to the site can attach notes or photographs to a service record. Amongst the many thousands of postings is a fragment from a diary, found tucked inside the bible of Alexander Kelley’s mother. The diary entry reads simply: ‘Alex arrived from Front. Wet day. Saw him at “Caulfield”.’

Alex had survived and had returned to his family. This was a day to remember. But there was sadness too, for Alex was not the same young man who had left for the battlefields of Europe. In the diary fragment, ‘Caulfield’ is enclosed in inverted commas, indicating perhaps that the reunion took place, not in the suburb, but in the Caulfield rehabilitation hospital. Alexander Kelley was wounded in the face, hands and legs. He was left blind in both eyes and his right leg was amputated. He would live the remainder of his life a little over a kilometre away from here at the Anzac Hostel.

This is just one story. There are over 375,000 World War One service records held by the National Archives of Australia. How can we hope to understand a number like that? How can we hope to imagine the war’s impact on families, on communities?

‘Mapping Our Anzacs’ uses familiar Google maps to display the places of birth and enlistment recorded in many of those service records. But technical limitations make it impossible to display all the places at once. You can, however, take the same data and open it in Google Earth. If you then zoom in on Victoria, you see something like this.

Mapping Our Anzacs data viewed in Google Earth.

Each marker represents a place where a service person was born or enlisted. It’s impossible to read, of course, but that’s the point. There is so little blank space. As you zoom further, more markers appear, more place names resolve. It’s simple, but it’s powerful. They came from everywhere. From the smallest village to the biggest city; nowhere was untouched.

The ‘Mapping Our Anzacs’ scrapbook offers another perspective. It’s possible to extract the images posted to the scrapbook and present them on a 3D wall. Amidst an assortment of memorabilia, there are faces. Not places, or records — this is a wall of people.

Mapping Our Anzacs Scrapbook photos viewed through CoolIris

It’s worth noting too that like the markers on the maps, these faces link back to the actual service records. So they’re not just a new way of seeing the collection, they’re a new way of exploring it.

But the records don’t stand in isolation, they themselves have a context. A couple of years ago, Mitchell Whitelaw from the University of Canberra undertook a project called ‘The Visible Archive’ to investigate ways of visualising the holdings of the National Archives of Australia. Have you ever wondered what 360km worth of records looks like?

The collections of the NAA visualised by Mitchell's Series Browser.

This represents the holdings of the National Archives. Files within the archives are organised into series, and each square in this image represents a single series — there are about 60,000 of them. Naturally the size of the square gives an indication of the size of the series itself. It’s a fascinating and strangely beautiful picture.

It’s easy enough to pick out the World War One service records — Series B2455. In the interactive version of Mitchell’s series browser you can click on a box and display links between series, as well as other series created by the same government agency. Again, it’s not just a way of seeing the collection, but a means of exploring and interpreting it. As Mitchell says:

Visualisation enables us to literally show everything, to display large volumes of data in a way that reveals patterns and communicates context, but also provides access to the fine grain of individual elements.

But we can also employ such techniques to ask new kinds of questions. Can you imagine how Alexander Kelley and the other inhabitants of the Anzac Hostel must have felt in 1939? They had lost so much in the Great War, the ‘war to end all wars’, and yet within their own lifetime it was all happening again. More young men were answering the call, more lives were going to be destroyed.

There must have been a dreadful, disheartening moment when Australians realised that the Great War was not an end, but a beginning — the first in a series of devastating global conflicts. At some point the ‘Great War’ became the ‘First World War’, but when?

When did the 'Great War' become the 'First World War'?


This is one possible answer. This graph draws its data from the 50 million or so digitised newspaper articles in Trove, the National Library of Australia’s discovery service. It shows the proportion of newspaper articles that included the phrase ‘the great war’ compared to the proportion containing ‘the first world war’ (and variations thereof). The lines cross late in 1941. With German victories in Europe and Africa, the opening of the Eastern Front and the Japanese attack on Pearl Harbour, 1941 makes sense.
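For anyone wanting to try a similar comparison, the arithmetic is simple enough to sketch in Python. This is an illustration under stated assumptions rather than the code behind the graph: it assumes you already have, for each year, the number of articles matching each phrase and the total number of articles, and the function names are hypothetical.

```python
def yearly_proportion(matches, totals):
    """Share of each year's articles that contain a phrase.

    `matches` and `totals` map a year to an article count; years with
    no articles at all are skipped to avoid dividing by zero.
    """
    return {
        year: matches.get(year, 0) / total
        for year, total in totals.items()
        if total
    }

def first_crossover(series_a, series_b):
    """First year in which series_b moves ahead of series_a, or None."""
    years = sorted(set(series_a) & set(series_b))
    for previous, year in zip(years, years[1:]):
        was_behind = series_a[previous] >= series_b[previous]
        now_ahead = series_a[year] < series_b[year]
        if was_behind and now_ahead:
            return year
    return None
```

Working with proportions rather than raw counts matters because the number of digitised articles differs from year to year. With yearly totals the crossover can only be located to the nearest year; pinning it down to ‘late in 1941’ would need monthly counts fed through the same two functions.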

What is perhaps more intriguing is the dramatic peak in the occurrence of ‘the great war’ in 1939. It’s no surprise that the looming threat of a new conflict would provoke comment and comparisons, but it does make you wonder about the context of those discussions and how they might have changed as the reality of war edged closer.

To start exploring this I’ve harvested the content of the 6,600 articles from 1939 that included the phrase ‘the great war’. Using an online text analysis service called VoyeurTools I can quickly generate a picture of their contents.

This simple visualisation shows us the relative frequencies of words within the articles. It doesn’t reveal any great mysteries, but it does suggest some possibilities for further prodding. The prevalence of ‘time’ and ‘new’, for example — might these help us understand the shift in perspective from one war to the next? We can follow this up by browsing the different contexts in which the words were used.
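The counting behind a picture like this is not complicated. The sketch below is not how VoyeurTools does it; it is a minimal Python version of the same idea, with a deliberately tiny stop list and a hypothetical function name, assuming the article texts are already available as plain strings.

```python
import re
from collections import Counter

# A tiny illustrative stop list; real tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "was", "is", "for"}

def relative_frequencies(texts, top=10):
    """Return the `top` most common words with their share of all counted words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    total = sum(counts.values())
    return [(word, n / total) for word, n in counts.most_common(top)]
```

Feeding in the harvested articles as a list of strings gives back word and share pairs that can be drawn as a word cloud or a simple bar chart.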

But what is it that we’re actually searching? We know that Trove includes newspapers from 1803 to 1954, but if we’re really going to analyse shifting words and ideas it’s important to have a clear picture of the sources of those words.

Something like this perhaps. This graph shows the holdings of the Trove newspaper database on 4 August 2011, organised by state. You can see, for example, that if you’re searching on a topic between the 1920s and 1940s you’re likely to get more results from Queensland than anywhere else.

So starting from our location here, today, we can make connections across time and space. We can pull back and look at the big picture, or dive in and examine the fabric of a single life. Through the web we can build and explore a rich and complex contextual network.


It’s an exciting time to be a cultural data hacker. We now have a growing range of tools and technologies available for extracting interesting data from a wide variety of sources, both structured and unstructured.

The ‘Visible Archive’ project started with well-structured data, courtesy of Peter Scott, the developer of the Series System — the descriptive framework used by many Australian archives. But we’re rarely so lucky.

Even when the data starts off in nicely-organised fields in a database there’s no guarantee that that’s how it’s going to be delivered to our web browser. In order to extract the data from my Trove graphs, for example, I had to write a little program called a ‘screen scraper’ to identify and save the important metadata elements from the raw web page itself.
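A screen scraper can be very small. The following Python sketch is not the program I wrote for Trove; it uses only the standard library and assumes, purely for illustration, that the interesting values sit in elements marked with class names such as “title” and “date”. Real pages need their own rules.

```python
from html.parser import HTMLParser

class MetadataScraper(HTMLParser):
    """Collect the text of elements whose class attribute is in `wanted`.

    Assumes the wanted elements contain plain text only (no nested tags).
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # class name of the element being captured
        self.records = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                self.records.setdefault(name, [])

    def handle_data(self, data):
        if self.current and data.strip():
            self.records[self.current].append(data.strip())

    def handle_endtag(self, tag):
        self.current = None  # stop capturing at the first closing tag

def scrape(html, wanted):
    """Parse `html` and return {class name: [text, ...]} for the wanted classes."""
    scraper = MetadataScraper(wanted)
    scraper.feed(html)
    return scraper.records
```

The weakness of any scraper is that it depends on the page layout staying the same, so a redesign of the site usually means rewriting the rules.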

Where there are no subject keywords we can infer them using techniques such as topic modelling. Where there are no access points we can identify people, organisations, places and events using special tools developed for named entity extraction. Where there are no common identifiers across datasets we can employ record linkage technologies to find possible connections.
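Record linkage in particular can start from something as plain as fuzzy name matching. The Python sketch below is a rough illustration using the standard library’s `difflib`; it stands in for the more sophisticated techniques real linkage tools use, and the threshold of 0.85 is an arbitrary assumption that would need tuning against real data.

```python
from difflib import SequenceMatcher

def normalise(name):
    """Lower-case a name and collapse punctuation and spacing."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())

def possible_links(names_a, names_b, threshold=0.85):
    """Pair up names from two datasets whose similarity meets the threshold."""
    links = []
    for a in names_a:
        for b in names_b:
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score >= threshold:
                links.append((a, b, round(score, 2)))
    # Strongest candidates first; a person still has to confirm them.
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Comparing every name with every other name is fine for a few thousand records, but it only ever produces candidates. A match on a name alone proves nothing until dates, places or other details agree.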

We can count words, we can identify parts of speech, we can formulate a measure of the similarity of any two pieces of text. Once we have some useful data we can manipulate and enrich it. Place names can be geolocated — you simply send your place name off to a web service and get back its latitude and longitude.
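One common way of measuring the similarity of two pieces of text is cosine similarity over word counts. The Python sketch below shows that one measure under simple assumptions (words are lower-cased, every word is weighted equally); there are many other measures, and the function names here are hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count lower-cased words in a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if not norm_a or not norm_b:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)
```

Texts that share no words score 0.0 and identical texts score 1.0, give or take floating-point rounding. Everything in between depends heavily on how the words are counted and weighted.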

Increasingly these sorts of tools are becoming accessible to anyone. For historians they offer a means of wrestling with the rapidly-growing bulk of source material that is becoming available in digital form. How do you make use of 5 million digitised books, 50 million newspaper articles or the complete archive of every public message ever sent on Twitter?

The digital historian Dan Cohen has noted:

These computational methods which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade.

Dan is involved in a number of interesting projects investigating the possibilities of these techniques, often grouped together under the heading ‘text mining’. One of these projects, ‘With Criminal Intent’, is looking to see what patterns can be drawn out of the digitised proceedings of criminal trials held at the Old Bailey from 1674 to 1913. That’s 197,745 trials, in case you were wondering.

Here’s one of their visualisations showing how the length of trials varies over time. Much to the surprise of the research team, this graph suggests a dramatic shift in legal practice around 1825 — defendants started pleading guilty!

A visualisation by the With Criminal Intent project showing changing trial lengths.

Rather than falter under the growing weight of digital sources, these technologies can actually thrive. The more raw material available, the more chance there is to observe and track new patterns. As digitisation continues apace will we ever reach the point when history can simply be read from a graph?

There are some researchers at Harvard who seem to think that’s where we’re heading. Borrowing liberally from the store of scientific metaphors they have staked out the new field of ‘culturomics’. By mining massive digital resources, like Google’s scanned books, they hope to map the ‘cultural genome’ that would enable us to follow the evolution of language and culture.

But there’s something quite barren in this ambition. I prefer the vision of digital humanist Stephen Ramsay, who commented in regard to the ‘With Criminal Intent’ project:

The Old Bailey, like the Naked City, has eight million stories. Accessing those stories involves understanding trial length, numbers of instances of poisoning, and rates of bigamy. But being stories, they find their more salient expression in the weightier motifs of the human condition: justice, revenge, dishonor, loss, trial. This is what the humanities are about. This is the only reason for an historian to fire up Mathematica or for a student trained in French literature to get into Java.

Ultimately it’s the stories that nourish, anger, inspire and depress us. The closely-packed map of places recorded in World War I service records is so powerful because we know that under each marker are men, women, families, communities — each with their own story. These new technologies offer new perspectives, they raise new questions, and they challenge us with new contexts to explore and understand. But there is still space for stories and perhaps we can use them to give our stories new life and depth.


This is another World War One service record. It belongs to Charlie Allen. Charlie enlisted three times in the AIF and was discharged on medical grounds each time. It seems he had a problem with his ankle.

Charlie’s service record notes a tattoo, proclaiming his love for ‘Maud Gordon’. He married Maud in Sydney in 1917 and had two daughters soon after.

Charlie survived the war without further injury, but was not so lucky in peace. On 11 March 1938, Charlie was crushed to death between two railway cars. The accident happened at the Bunnerong Power Station, only a short distance from his home in Matraville. He was buried nearby in the Botany Cemetery.

We also know quite a bit about Charlie’s early life. Why? Because Charlie’s father was Chinese and he was therefore categorised as a ‘half-caste’, as someone who was not white, and therefore fell under the restrictions imposed by the White Australia Policy.

Charlie was born in Sydney in 1896. His mother was Frances Allen (sometime sweet shop owner and brothel keeper), his father Charlie Gum (a buyer for Wing On company). Charlie was raised by his mother, but in 1909, at the age of 13, he was taken to China by his father.

NAA: ST84/1, 1909/22/41-50

This certificate granted Charlie an exemption to the Dictation Test. Without it, he may not have been allowed back into the country.

Every time one of many thousands of non-Europeans resident in Australia sought to travel overseas and return home again they needed one of these certificates.

Charlie’s father returned to Sydney, leaving him in China. He lived with relatives in the town of Shekki (inland from Hong Kong). Charlie was naturally homesick, but had no means of getting back to Australia. He wrote to his mother in 1910:

Do try and bring me home every minute I think of you and long for a piece of bread and butter this tucker is not doing me well.

His mother wrote to the Prime Minister Billy Hughes in an attempt to enlist government help but to no avail. Charlie finally returned to Australia in 1915.

Despite this experience, Charlie visited China again in 1922 for 7 months, once again carrying papers to grant him re-entry to the country of his birth.

These fragments of Charlie’s life have been assembled by my partner, Kate Bagnall, a historian of Chinese-Australia. They are remarkable, and yet not so, because there are many thousands of stories like Charlie’s contained within the voluminous records generated by the administration of the White Australia Policy.

We’re all of course familiar with the general outlines of the White Australia Policy, and the way it underpinned conceptions of Australia as a nation in the first half of the 20th century.

But what we sometimes forget is that it was also a massive bureaucratic exercise.

Forms and certificates were printed, issued, used and filed. Regulations were modified, guidelines were distributed and administering officers were managed and advised. Individual cases were reviewed, policy was changed and new forms and certificates were printed, issued, used and filed…

Much of this system is now preserved in the National Archives.

You can get an idea of the range of material available from a case study Kate has prepared focusing on the efforts of Poon Gooey, a successful businessman in Horsham, to keep his wife and family in Australia.

If we look again at Charlie’s certificate from 1909 we can see that it contains a lot of interesting structured data:

  • name
  • place of birth
  • age
  • height
  • destination
  • date of departure
  • name of ship
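As a sketch of what that structured data might look like once extracted, the fields listed above map naturally onto a simple record type. This Python example is an assumption about one possible shape, not a description of any existing system; the class name and the extra `archive_reference` field are hypothetical, and details the talk does not give are left empty rather than guessed.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class CertificateRecord:
    """One exemption certificate, using the fields listed above."""
    name: str
    place_of_birth: Optional[str] = None
    age: Optional[int] = None
    height: Optional[str] = None
    destination: Optional[str] = None
    date_of_departure: Optional[str] = None
    name_of_ship: Optional[str] = None
    # Hypothetical extra field: a pointer back to the source document.
    archive_reference: Optional[str] = None

# Only details mentioned in the talk are filled in; the rest stay None.
charlie = CertificateRecord(
    name="Charlie Allen",
    place_of_birth="Sydney",
    age=13,
    destination="China",
    archive_reference="NAA: ST84/1, 1909/22/41-50",
)
```

Keeping a reference to the source document on every record means each extracted value can always be checked against the original certificate.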

We estimate that there are probably about 50,000 of these forms remaining in the Archives, and then there are case files and a variety of other government documents.

Wouldn’t it be great if we could extract this structured data? If we could piece together the slivers of identity that remain within the Archives and give people back their lives?

This is the dream of Invisible Australians, a project Kate and I are trying to turn into a reality. Our aim is to build systems that will enable this data to be extracted, aggregated, shared and connected — whether to a family tree, a cemetery record, or another document in another archive.

Imagine being able to navigate the network of lives, families and relationships. To follow their journeys, to share their tragedies, to celebrate their small victories against a repressive system.

Imagine being able to watch them age.


We tend to assume that new technologies require us to change, to adapt. But sometimes they can take advantage of our strengths. Mitchell Whitelaw is interested in finding out what happens when you take large cultural datasets and try to ‘show everything’. Such an approach, he suggests, takes advantage of the raw processing power of computers, while giving us space to do what we’re good at — finding patterns, making connections, crafting meanings.

The History Wall tries to create a similar sort of space. The History Wall brings together material from a range of different sources — newspaper articles from Trove, biographies from the Australian Dictionary of Biography, records from a database of NSW convicts, population statistics, collection items from the National Museum of Australia — you can pretty much plug anything in as long as it has a date attached to it.

Irish History Wall

For a particular year, the Wall retrieves a random sample from the available sources, jumbles everything up and then throws it onto the screen. As a result, no two views of the Wall are ever quite the same. This is not a traditional exhibition. There is no curator controlling the content or designing the structure. It’s ephemeral, it’s serendipitous — instead of relying on an authorial voice to smooth over the gaps and transitions, it leaves open the cracks and allows new contexts to seep in and around each item.
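The sample-and-jumble step is easy to sketch. The Python below is a minimal illustration of the idea rather than the Wall’s actual code: it assumes each source is just a list of items with a `year` and a `title`, and both the data shape and the function name are hypothetical.

```python
import random

def build_wall(sources, year, per_source=3, rng=None):
    """Sample items dated `year` from every source and shuffle them together.

    `sources` maps a source name to a list of {"year": ..., "title": ...} items.
    """
    if rng is None:
        rng = random.Random()
    wall = []
    for source_name, items in sources.items():
        dated = [item for item in items if item["year"] == year]
        picked = rng.sample(dated, min(per_source, len(dated)))
        wall.extend((source_name, item["title"]) for item in picked)
    rng.shuffle(wall)  # no curator: the order is left to chance
    return wall
```

Because both the sampling and the ordering are random, two calls with the same year will usually return different walls. Passing in a seeded `random.Random` makes a particular view repeatable.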

As the pioneering digital historian Edward Ayers noted:

even isolated and inert pieces of evidence — a list, a letter, a map, a picture — can assume new and unimagined meanings when placed in juxtaposition with other fragments.

This is not an absence of narrative, but an opportunity for narration. Edward Ayers suggests that we’re actually quite comfortable filling in blanks and untwisting timelines:

Humans, presented with pieces of information about people, put things into the form of a story. They need not be simple stories, for we know how to deal with unexplained lapses of time, flashbacks, and overlapping narratives. We know how to imagine, infer, things happening at the same time in different places. Film and television train all of us at early ages to weave strands of narrative out of intentional (if carefully constructed) confusion and to take pleasure in that weaving.

And so I can show you a death notice, or a certificate, and you will take those fragments, those isolated data points, and you will construct a story: you will see the person behind them, you will imagine their life. It’s what we do. We’re good at it.

Computers, on the other hand, will just see data.

In her ode in praise of humanities data, digital humanist Amanda French wonders whether we always need to crunch our data into abstract, pliable forms:

What I wonder is whether instead we can begin with the data, or with a datum, and simply watch for what it may tell us, even if what it tells us is simply a story.

Yes we can. And we should teach computers how to do it as well. Not because we want them to take over. Not because they can necessarily do it faster or better. But because they can help us share, preserve and connect those stories.

Let’s think again about the array of documents that Kate has assembled to piece together the story of Charles Allen. How can you share this sort of material? Typically you’d ‘write it up’. You’d capture the story behind the data and commit it to words. The documents would then become evidence — points of connection between your text and the historical record.

So in order to share the meanings of these documents we remove them from the context of the person’s life and marshal them as allies to proclaim the authenticity of our rendering. Wouldn’t it be better if we could tell the story, but maintain within our texts the direct connections between sources and subject?

What we need is a data framework that sits beneath the text, identifying people, dates and places, and defining relationships between them and our documentary sources. A framework that computers could understand and interpret, so that if they saw something they knew was a placename they could head off and look for other people associated with that place. Instead of just presenting our research we’d be creating a whole series of points of connection, discovery and aggregation.
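To make that a little more concrete, here is a rough sketch of what a layer of data beneath a text might look like. Everything in it is hypothetical: the sentence, the `example.org` identifiers, the `Annotation` structure and the `PLACE_INDEX` lookup are invented for illustration, and a real framework would lean on shared vocabularies rather than a hand-rolled class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int       # character offset where the marked span begins
    end: int         # character offset just past the end of the span
    kind: str        # "person", "place" or "date"
    identifier: str  # hypothetical identifier for the thing being named


# An invented sentence with the data layer that sits beneath it.
text = "Person A arrived in Place X in 1909."
annotations = [
    Annotation(0, 8, "person", "http://example.org/people/a"),
    Annotation(20, 27, "place", "http://example.org/places/x"),
    Annotation(31, 35, "date", "1909"),
]

# Hypothetical index, built elsewhere, of people associated with each place.
PLACE_INDEX = {
    "http://example.org/places/x": [
        "http://example.org/people/a",
        "http://example.org/people/b",
    ],
}


def related_people(annotations, place_index):
    """For each place named in the text, find other people linked to that place."""
    already_named = {a.identifier for a in annotations if a.kind == "person"}
    found = []
    for a in annotations:
        if a.kind != "place":
            continue
        for person in place_index.get(a.identifier, []):
            if person not in already_named and person not in found:
                found.append(person)
    return found


if __name__ == "__main__":
    print(related_people(annotations, PLACE_INDEX))
```

The text itself is untouched. The annotations simply tell a computer which spans are people, places and dates, and that is enough for it to head off and find someone new.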

Sounds a bit far-fetched? Well it’s not. We have it already — it’s called the Semantic Web.

The Semantic Web exposes the structures that are implicit in our web pages and our texts in ways that computers can understand. The Linked Data movement takes the basic ideas of the Semantic Web and turns them into a collaborative activity. You share vocabularies, so that other people (and computers) know when you’re talking about the same sorts of things. You share identifiers, so that other people (and computers) know that you’re talking about a specific person, place, object or whatever.
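A small sketch may help show why shared vocabularies and shared identifiers matter. It models statements as plain subject, predicate, object tuples rather than using an RDF library. The vocabulary terms come from RDF, FOAF and Dublin Core, but the `example.org` identifiers and the two datasets are assumptions made up for the example.

```python
# Shared vocabulary: terms published by RDF, FOAF and Dublin Core.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Shared identifiers: hypothetical web addresses, invented for this sketch.
PERSON = "http://example.org/people/charles-allen"
CERTIFICATE = "http://example.org/documents/certificate-1"

# One (imagined) dataset describes the person...
dataset_a = {
    (PERSON, RDF_TYPE, FOAF_PERSON),
    (PERSON, FOAF_NAME, "Charles Allen"),
}

# ...while another, published separately, describes a document about him.
dataset_b = {
    (CERTIFICATE, DCT_SUBJECT, PERSON),
}


def statements_about(graph, identifier):
    """Return every statement that mentions the identifier as subject or object."""
    return sorted(t for t in graph if identifier in (t[0], t[2]))


# Because both datasets use the same identifier, merging is just a set union.
merged = dataset_a | dataset_b

if __name__ == "__main__":
    for statement in statements_about(merged, PERSON):
        print(statement)
```

Neither dataset needs to know about the other in advance. The shared identifier is what lets a computer fit the bits together.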

Linked Data is Storytelling 101 for computers. It doesn’t have the full richness, complexity and nuance that we invest in our narratives, but it does at least help computers to fit all the bits together in meaningful ways. And if we talk nice to them, then they can apply their newly-acquired interpretative skills to the things that they’re already good at — like searching, aggregating, or generating the sorts of big pictures that enable us to explore the contexts of our stories.

This is why we’ve always imagined Invisible Australians to be something more than an online database. We want to provide points of connection that other people can build into their own stories. But to do that we have to pay attention to things like vocabulary management and authority control, and we have to construct web addresses that are not going to break every time we upgrade our software. We have to think about the sorts of things we’re talking about: not just people, but government agencies, legislation, certificates, and correspondence. How do we describe these entities and what sorts of relationships do they have?

And of course we need to expose all these structures so that we can say, these things are people, these are events, these are places and these are documents.

Or perhaps, to introduce Alexander Kelley.

Or remember Charles Allen.


You might be wondering why we don’t just leave it all to the computers themselves. Didn’t I just talk about all the exciting new tools and techniques that enable us to analyse the structures of texts? Perhaps we should just wait for the Culturomics guys to solve all the problems.

But who defines the problems?

Our postmodern sensibilities encourage a suspicion of neutrality. Labels like ‘the new museology’ or Archives 2.0 reflect an awareness that the way we describe and arrange our collections is itself culturally-determined. It’s not just a matter of what our descriptive systems show, but what they hide.

Tim Hitchcock, another member of the ‘With Criminal Intent’ team, has described how online technologies can change the way we access archives. Instead of being forced to navigate the hierarchical structures that archives impose on records, which in turn tend to reflect the workings of the institutions that created the records, we can directly find the people whose lives were regulated, influenced, shaped or controlled by the policies of those institutions.

Instead of merely hearing ‘the institutional voice… in all its stentorian splendour’, he says, we can listen in to ‘the quieter tones uttered by the individual’.

This reminds us that search boxes, along with other digital tools, themselves embody arguments. There are assumptions built into their code about what is relevant, what is significant, what is necessary.

We can build our own tools of course, and we can critique other people’s algorithms. But what if we just want to collect and share stories?

Linked Data gives us a way to present an alternative to Google’s version of the world. We can argue back against the search engines, defining our own criteria for relevance, and building our own discovery networks.

Changing the way we access resources changes the sorts of stories we can tell. Tim Hitchcock asks:

What happens when institutions and archives are ‘decentred’ in favour of the individual? What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?

Perhaps the invisible become visible.

 Posted by on October 4, 2011
Sep 06 2011
 

Evolution of the web

In celebration of Chrome's third birthday, Google teamed up with Hyperakt and Vizzuality to explore the evolution of the Web:

Over time web technologies have evolved to give web developers the ability to create new generations of useful and immersive web experiences. Today's web is a result of the ongoing efforts of an open web community that helps define these web technologies, like HTML5, CSS3 and WebGL and ensure that they're supported in all web browsers.

The black timelines show major browser releases. As you click each browser icon, you can see how the browser window has changed for each release, which I think is the most interesting part of the interactive.

Color bands represent browser technologies such as JavaScript, HTML, and Flash, and the bands grow as new browsers integrate the technologies. The intertwining of bands is supposed to show the interaction between different technologies, but it gets fuzzy here. Does the vertical position of bands mean anything? Does shape mean anything, or is it more for show? I think it's a little of both. More the latter. Fun to poke around memory lane either way.

[Thanks, Deroy]

Jul 27 2011
 

In the summer of 2007, Nate Silver decided to conduct a rigorous assessment of the inexpensive Mexican restaurants in his neighborhood, Chicago’s Wicker Park. Figuring that others might be interested in the results of his study, and that he might be able to use some feedback from an audience, he took his project online.

Silver had no prior experience in such an endeavor. By day he worked as a statistician and writer at Baseball Prospectus—an innovator, to be sure, having created a clever new standard for empirically measuring the value of players, an advanced form of the “sabermetrics” vividly described by Michael Lewis in Moneyball.1 But Silver had no experience as a food critic, nor as a web developer.

In time, his appetite took care of the former and the open web took care of the latter. Silver knit together a variety of free services as the tapestry for his culinary project. He set up a blog, The Burrito Bracket, using Google’s free Blogger web application. Weekly posts consisted of his visits to local restaurants, and the scores (in jalapeños) he awarded in twelve categories.

Home page of Nate Silver’s Burrito Bracket
Ranking system (upper left quadrant)

Being a sports geek, he organized the posts as a series of contests between two restaurants. Satisfying his urge to replicate March Madness, he modified another free application from Google, generally intended to create financial or data spreadsheets, to produce the “bracket” of the blog’s title.

Google Spreadsheets used to create the competition bracket

Like many of the savviest users of the web, Silver started small and improved the site as he went along. For instance, he had started to keep a photographic record of his restaurant visits and decided to share this documentary evidence. So he enlisted the photo-sharing site Flickr, creating an off-the-rack archive to accompany his textual descriptions and numerical scores. On August 15, 2007, he added a map to the site, geolocating each restaurant as he went along and color-coding the winners and losers.

Flickr photo archive for The Burrito Bracket (flickr.com)
Silver’s Google Map of Chicago’s Wicker Park (shaded in purple) with the location of each Mexican restaurant pinpointed

Even with the site’s do-it-yourself enthusiasm and the allure of carne asada, Silver had trouble attracting an audience. He took to Yelp, a popular site for reviewing restaurants, to plug The Burrito Bracket, and even thought about creating a Super Burrito Bracket to cover all of Chicago.2 But eventually he abandoned the site following the climactic “Burrito Bowl I.”

With his web skills improved and a presidential election year approaching, Silver decided to try his mathematical approach on that subject instead: “an opportunity for a sort of Moneyball approach to politics,” as he would later put it.3 Initially, and with a nod to his obsession with Mexican food, he posted his empirical analyses of politics under the chili-pepper pseudonym “Poblano” on the liberal website Daily Kos, which hosts blogs for its engaged readers.

Then, in March 2008, Silver registered his own web domain, with a title that was simultaneously and appropriately mathematical and political: fivethirtyeight.com, a reference to the total number of electors in the United States electoral college. He launched the site with a slight one-paragraph post on a recent poll from South Dakota and a summary of other recent polling from around the nation. As with The Burrito Bracket it was a modest start, but one that was modular and extensible. Silver soon added maps and charts to bolster his text.

FiveThirtyEight two months after launch, in May 2008

Nate Silver’s real name and FiveThirtyEight didn’t remain obscure for long. His mathematical modeling of the competition between Barack Obama and Hillary Clinton for the Democratic presidential nomination proved strikingly, almost creepily, accurate. Clear-eyed, well-written, statistically rigorous posts began to be passed from browsers to Blackberries, from bloggers to political junkies to Beltway insiders. From those wired early subscribers to his site, Silver found an increasingly large audience of those looking for data-driven, deeply researched analysis rather than the conventional reporting that presented political forecasting as more art than science.

FiveThirtyEight went from just 800 visitors a day in its first month to a daily audience of 600,000 by October 2008.4 On election day, FiveThirtyEight received a remarkable 3 million visitors, more than most daily newspapers.5

All of this attention for a site that most media coverage still called, with a hint of deprecation, a “blog,” or “aggregator” of polls, despite Silver’s rather obvious, if latent, journalistic skills. (Indeed, one of his roads not taken had been an offer, straight out of college, to become an assistant at The Washington Post.6) An article in the Colorado Daily on the emergent genre represented by FiveThirtyEight led with Ken Bickers, professor and chair of the political science department at the University of Colorado, saying that such sites were a new form of “quality blogs” (rather than, evidently, the uniformly second-rate blogs that had previously existed). The article then swerved into much more ominous territory, asking whether reading FiveThirtyEight and similar blogs was potentially dangerous, especially compared to the safe environs of the traditional newspaper. Surely these sites were superficial, and they very well might have a negative effect on their audience:

Mary Coussons-Read, a professor of psychology at CU Denver, says today’s quick turnaround of information helps to make it more compelling.

“Information travels so much more quickly,” she says. “(We expect) instant gratification. If people have a question, they want an answer.”

That real-time quality can bring with it the illusion that it’s possible to perceive a whole reality by accessing various bits of information.

“There’s this immediacy of the transfer of information that leads people to believe they’re seeing everything … and that they have an understanding of the meaning of it all,” she says.

And, Coussons-Read adds, there is pleasure in processing information.

“I sometimes feel like it’s almost a recreational activity and less of an information-gathering activity,” she says.

Is it addiction?

[Michele] Wolf says there is something addicting about all that data.

“I do feel some kind of high getting new information and being able to process it,” she says. “I’m also a rock climber. I think there are some characteristics that are shared. My addiction just happens to be information.”

While there’s no such mental-health diagnosis as political addiction, Jeanne White, chemical dependency counselor at Centennial Peaks Hospital in Louisville, says political information seeking could be considered an addictive process if it reaches an extreme.7

This stereotype of blogs as the locus of “information” rather than knowledge, of “recreation” rather than education, was (and is) a common one, despite the wide variety of blogs, including many with long-form, erudite writing. Perhaps in 2008 such a characterization of FiveThirtyEight was unsurprising given that Silver’s only other credits to date were the Player Empirical Comparison and Optimization Test Algorithm (PECOTA) and The Burrito Bracket. Clearly, however, here was an intelligent researcher who had set his mind on a new topic to write about, with a fresh, insightful approach to the material. All he needed was a way to disseminate his findings. Silver was an academic at heart, and his audience appreciated his extraordinarily clever methods for cutting through the mythologies and inadequacies of standard political commentary. All they needed was a web browser to find him.

A few journalists saw past the prevailing bias against non-traditional outlets like FiveThirtyEight. In the spring of 2010, Nate Silver bumped into Gerald Marzorati, the editor of the New York Times Magazine, on a train platform in Boston. They struck up a conversation, which eventually turned into a discussion about how FiveThirtyEight might fit into the universe of the Times, which ultimately recognized the excellence of his work and wanted FiveThirtyEight to enhance their political reporting and commentary. That summer, a little more than two years after he had started FiveThirtyEight, Silver’s “blog” merged into the Times under a licensing deal.8 In less time than it takes for most students to earn a journalism degree, Silver had willed himself into writing for one of the world’s premier news outlets, taking a seat in the top tier of political analysis. A radically democratic medium had enabled him to do all of this, without the permission of any gatekeeper.

FiveThirtyEight on the New York Times website, 2010

* * *

 

The story of Nate Silver and FiveThirtyEight has many important lessons for academia, all stemming from the affordances of the open web. His efforts show the do-it-yourself nature of much of the most innovative work on the web, and how one can iterate toward perfection rather than publishing works in fully polished states. His tale underlines the principle that good is good, and that the web is extraordinarily proficient at finding and disseminating the best work, often through continual, post-publication, recursive review. FiveThirtyEight also shows the power of openness to foster that dissemination and the dialogue between author and audience. Finally, the open web enables and rewards unexpected uses and genres.

Undoubtedly it is true that the path from The Burrito Bracket to The New York Times may only be navigated by an exceptionally capable and smart individual. But the tools for replicating Silver’s work are just as open to anyone, and just as powerful. It was with that belief, and the desire to encourage other academics to take advantage of the open web, that Roy Rosenzweig and I wrote Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web.9 We knew that the web, although fifteen years old at the time, was still somewhat alien to many professors, graduate students, and even undergraduates (who might be proficient at texting but know nothing about HTML), and we wanted to make the medium more familiar and approachable.

What we did not anticipate was another kind of resistance to the web, based not on an unfamiliarity with the digital realm or on Luddism but on the remarkable inertia of traditional academic methods and genres: the more subtle and widespread biases that hinder the academy’s adoption of new media. These prejudices are less comical, and more deep-seated, than newspapers’ penchant for tales of internet addiction. This resistance has less to do with the tools of the web and more to do with the web’s culture. It was not enough for Roy and me to conclude Digital History by saying how wonderful the openness of the web was; for many academics, this openness was part of the problem, a sign that it might be like “playing tennis with the net down,” as my graduate school mentor worriedly wrote to me.10

In some respects, this opposition to the maximal use of the web is understandable. Almost by definition, academics have gotten to where they are by playing a highly scripted game extremely well. That means understanding and following self-reinforcing rules for success. For instance, in history and the humanities at most universities in the United States, there is a vertically integrated industry of monographs, beginning with the dissertation in graduate school—a proto-monograph—followed by the revisions to that work and the publication of it as a book to get tenure, followed by a second book to reach full professor status. Although we are beginning to see a slight liberalization of rules surrounding dissertations—in some places dissertations could be a series of essays or have digital components—graduate students infer that they would best be served on the job market by a traditional, analog monograph.

We thus find ourselves in a situation, now more than two decades into the era of the web, where the use of the medium in academia is modest, at best. Most academic journals have moved online but simply mimic their print editions, providing PDF facsimiles for download and having none of the functionality common to websites, such as venues for discussion. They are also largely gated, resistant not only to access by the general public but also to the coin of the web realm: the link. Similarly, when the Association of American University Presses recently asked its members about their digital publishing strategies, the presses tellingly remained steadfast in their fixation on the monograph. All of the top responses were about print-on-demand and the electronic distribution and discovery of their list, with a mere footnote for a smattering of efforts to host “databases, wikis, or blogs.”11 In other words, the AAUP members see themselves almost exclusively as book publishers, not as publishers of academic work in whatever form that may take. Surveys of faculty show comfort with decades-old software like word processors but an aversion to recent digital tools and methods.12 The professoriate may be more liberal politically than the most latte-filled ZIP code in San Francisco, but we are an extraordinarily conservative bunch when it comes to the progression and presentation of our own work. We have done far less than we should have by this point in imagining and enacting what academic work and communication might look like if it were digital first.

If Digital History was about the mechanisms for moving academic work online, this book is about that digital-first culture of the web. It is, by necessity, slightly more polemical than Digital History, since it takes direct aim at the conservatism of the academy that twenty years of the web has laid bare. But the web and the academy are not doomed to an inevitable clash of cultures. Viewed properly, the open web is perfectly in line with the fundamental academic goals of research, sharing of knowledge, and meritocracy. This book—and it is a book rather than a blog or stream of tweets because pragmatically that is the best way to reach its intended audience of the hesitant rather than preaching to the online choir—looks at several core academic values and asks how we can best pursue them in a digital age.

First, it points to the critical academic ability to look at any genre without bias and asks whether we might be violating that principle with respect to the web. Upon reflection, many of the best things we discover in scholarship are found by disregarding popularity and packaging, by approaching creative works without prejudice. We wouldn’t think much of the meandering novel Moby-Dick if Carl Van Doren hadn’t looked past decades of mixed reviews to find the genius in Melville’s writing. Art historians have similarly unearthed talented artists who did their work outside of the royal academies and the prominent schools of practice.

The genre of the blog has been received with less open-mindedness from the academy. Chapter 1, “What is a Blog?”, looks at the history of the blog and blogging, the anatomy and culture of a genre that is in many ways most representative of the open web. Saddled with an early characterization as being the locus of inane, narcissistic writing, the blog has had trouble making real inroads in academia, even though it is an extraordinarily flexible form and the perfect venue for a great deal of academic work. The chapter highlights the best examples of academic blogging and how they shape and advance arguments in a field. We can be more creative in thinking about the role of the blog within the academy, as a venue for communicating our work to colleagues as well as to a lay audience beyond the ivory tower.

This academic prejudice against the blog extends to other genres that have proliferated on the open web. Chapter 2, “Genres and the Open Web,” examines the incredible variety of those new forms, and how, with a careful eye, we might be able to import some of them profitably into the academy. Some of these genres, like the wiki, are well-known (thanks to Wikipedia, which academics have come to accept begrudgingly in the last five years). Other genres are rarer but take maximal advantage of the latitude of the open web: its malleability and interactivity. Rather than imposing the genres we know on the web—as we do when we post PDFs of print-first journal articles—we would do well to understand and adopt the web’s native genres, where helpful to scholarly pursuits.

But what of our academic interest in validity and excellence, enshrined in our peer review system? Chapter 3, “Good is Good,” examines the fundamental requirements of any such system: the necessity of highlighting only a minority of the total scholarly output, based on community standards, and of disseminating that minority of work to communities of thought and practice. The chapter compares print-age forms of vetting with native web forms of assessment and review, and proposes ways that digital methods can supplement—or even replace—our traditional modes of peer review.

“The Value, and Values, of Openness,” Chapter 4, broadly examines the nature of the web’s openness. Oddly, this openness is both the easiest trait of the web to understand and its most complex, once one begins to dig deeper. The web’s radical openness not only has led to calls for open access to academic work, which has complicated the traditional models of scholarly publishers and societies; it has also challenged our academic predisposition toward perfectionism—the desire to only publish in a “final” format, purged (as much as possible) of error. Critically, openness has also engendered unexpected uses of online materials—for instance, when Nate Silver refactored poll numbers from the raw data polling agencies posted.

Ultimately, openness is at the core of any academic model that can operate effectively on the web: it provides a way to disseminate our work easily, to assess what has been published, and to point to what’s good and valuable. Openness can naturally lead—indeed, is leading—to a fully functional shadow academic system for scholarly research and communication that exists beyond the more restrictive and inflexible structures of the past.

  1. Nate Silver, “Introducing PECOTA,” in Gary Huckabay, Chris Kahrl, Dave Pease et al., eds., Baseball Prospectus 2003 (Dulles, VA: Brassey’s Publishers, 2003): 507-514. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton & Company, 2004).
  2. “Frequently Asked Questions,” The Burrito Bracket, http://burritobracket.blogspot.com/2007/07/faq.html
  3. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  4. Adam Sternbergh, “The Spreadsheet Psychic,” New York, October 12, 2008, http://nymag.com/news/features/51170/
  5. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  6. http://www.journalism.columbia.edu/system/documents/477/original/nate_silver.pdf
  7. Cindy Sutter, “Hooked on information: Can political news really be addicting?” The Colorado Daily, November 3, 2008, http://www.coloradodaily.com/ci_13105998
  8. Nate Silver, “FiveThirtyEight to Partner with New York Times,” http://www.fivethirtyeight.com/2010/06/fivethirtyeight-to-partner-with-new.html
  9. Daniel J. Cohen and Roy Rosenzweig, Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (University of Pennsylvania Press, 2006).
  10. http://www.dancohen.org/2010/11/11/frank-turner-on-the-future-of-peer-review/
  11. Association of American University Presses, “Digital Publishing in the AAUP Community; Survey Report: Winter 2009-2010,” http://aaupnet.org/resources/reports/0910digitalsurvey.pdf, p. 2
  12. See, for example, Robert B. Townsend, “How Is New Media Reshaping the Work of Historians?”, Perspectives on History, November 2010, http://www.historians.org/Perspectives/issues/2010/1011/1011pro2.cfm