sgsinclair

Profile in Fast Company

Sep 11 2012
 

I’ve been meaning to post a link to a profile that Adam Bluestein wrote for Fast Company about my digital humanities work. I’ve had a variety of experiences being interviewed, but this has been one of the most positive: I had a truly enjoyable extended chat with Adam, and he managed to faithfully synthesize a lot of content into a relatively compact article. My only bone of contention is the title, “Big Data On Campus Is Like A Keg Stand For Your Brain”, but presumably the editors are to blame for that, and not Adam :).

CiteLab Lyrics Project Work Session

Apr 04 2012
 

The McGill CiteLab is a newly formed group; we’re starting small, with Andrew Piper and Mark Algee-Hewitt. As a forthcoming post will explain in more detail, we’re inspired by groups like the Stanford Literary Lab, though our own interests go beyond literary texts to include a wide range of cultural artifacts (still mostly textual). Our first real work meeting was today, and I’ve posted some information and code from initial experiments with data from world music charts. I’ve made a concerted effort to make that post pedagogical and reproducible, for those who may be interested in following along at home.

Voyant Workshop at DH2012 in Hamburg

Feb 04 2012
 

Introduction to Distant Reading Techniques with Voyant Tools, Multilingual Edition

You have a collection of digital texts; now what? This workshop provides a gentle introduction to text analysis in the digital humanities using Voyant Tools, a collection of free web-based tools that can handle larger collections of texts, be they digitized novels, online news articles, Twitter feeds, or other textual content. This workshop will be a hands-on, practical guide with lots of time to ask questions, so participants are encouraged to bring their own texts. In the workshop we will cover the following:

  1. A brief introduction to text analysis in the humanities;
  2. Preliminary exploration techniques using Voyant;
  3. Basic issues in choosing, compiling, and preparing a text corpus;
  4. Text mining to identify themes in large corpora;
  5. Ludic tools and speculative representations of texts; and
  6. Integrating tool results into digital scholarship.

This year’s workshop will pay special attention to certain multilingual issues in text analysis, such as character encoding, word segmentation, and available linguistic functionality for different languages. The instructors will present in English, but can also present or answer questions in French and Italian.

AUDIENCE: This is intended as an introduction to text analysis and visualization. We hope for an audience with a range of interests and relevant competencies. In the past we have had 20 to 25 participants, which works well with two workshop leaders. Participants are expected to bring their own laptop and are encouraged to bring their own texts.

LENGTH: half-day

WORKSHOP LEADERS: Stéfan Sinclair (McGill) and Geoffrey Rockwell (Alberta)

REQUIREMENTS: Participants are expected to have their own laptop.

The exact date and time of the workshop are yet to be determined.

Introducing Voyant RezoViz

Nov 30 2011
 

Geoffrey Rockwell and I are delighted to welcome the newest member of the Voyant Tools family: RezoViz, a network visualization interface. RezoViz is actually an adaptation of the Halfviz example from the arbor.js library by Christian Swinehart. There is a dizzying number of graphing libraries out there, but we wanted to work with one that was reasonably efficient for larger datasets, HTML5-based (not Flash), easily extensible, and released under an open-source license; arbor.js meets all of these conditions, and I especially liked the built-in editor of Halfviz.

Here are some of the more significant modifications that we made to the Halfviz code for RezoViz:

  • hovering over labels changes their colour
  • labels that are linked also change colour, with little badges that indicate a value
  • labels are drawn “above” the network lines to make them easier to read
  • there’s an option to specify the maximum number of labels to show
  • there is a search bar that produces results in the graph as you type
  • edge (line) thickness and opacity are calculated dynamically based on relative values
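The last item in the list, scaling edge thickness and opacity by relative value, can be illustrated with a small sketch. This is not the RezoViz code (which is JavaScript built on arbor.js). It is a minimal Python illustration that assumes simple linear (min-max) scaling, and the function name, parameter names, and default ranges are all hypothetical.

```python
def scale_edges(weights, min_width=1.0, max_width=8.0, min_alpha=0.2, max_alpha=1.0):
    """Map raw edge weights to a line width and an opacity.

    Assumption: plain linear (min-max) scaling. The original post does not
    say which formula RezoViz uses, so this is only one plausible choice.
    """
    if not weights:
        return {}
    lowest = min(weights.values())
    highest = max(weights.values())
    span = highest - lowest
    styled = {}
    for edge, weight in weights.items():
        # Relative position of this weight between the smallest and largest
        # values; if every weight is identical, treat them all as maximal.
        t = (weight - lowest) / span if span else 1.0
        styled[edge] = {
            "width": min_width + t * (max_width - min_width),
            "alpha": min_alpha + t * (max_alpha - min_alpha),
        }
    return styled


if __name__ == "__main__":
    # Hypothetical co-occurrence counts between three labels.
    example = {("a", "b"): 1, ("a", "c"): 3, ("b", "c"): 5}
    for edge, style in scale_edges(example).items():
        print(edge, style)
```

Because the values are computed relative to the current minimum and maximum, the same code yields readable lines whether the underlying counts are in the single digits or the thousands.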

Want to see it in action? Try the Humanist archives demo (still a work in progress). I’ll post a link to the GitHub repository for the code soon. Stay tuned as well for news of full integration with Voyant Tools.

Nov 15 2011
 

Makers and Coders McGill (MC²) is one of the new Digital Humanities initiatives that we’ve started this year. It complements the Digital Humanities Reading Group, which is best thought of as a book club (or, more accurately, an article and blog club) for DH enthusiasts and the DH-curious (a term that has resonated a lot in the group). Whereas the DH Reading Group is about reading and discussion, MC² is much more about doing stuff, running the gamut from coding to fabrication. Attendance in both groups has been strong and the diversity of perspectives (traditional humanities, libraries, social sciences, music, etc.) has been very stimulating.

During the last MC² meeting we agreed that we would experiment with data aggregated by Montréal Ouvert, an initiative to promote open access to a range of municipal data from Montreal (similar open data initiatives exist in Canada and elsewhere). My usual research doesn’t much involve the use of geographical data, and I was keen to get my hands dirty and learn some new stuff; did I ever. I knew there was a variety of APIs and web services that would help us, though I hadn’t anticipated how quickly we would be able to create and play with a map, especially given data that wasn’t intended for mapping purposes. Much of the geo-coding magic was accomplished by BatchGeo, a service suggested by fellow MC² participant Renee Sieber.

The first step was to consult a list of data sources from the city of Montreal, compiled on the Actions page of Montréal Ouvert. The formats vary, but it seemed preferable to start with something in XML. I was tempted by the bike-sharing Bixi data, but those were already geo-coded (with latitude and longitude values), and I wanted more of a challenge. I opted for the “Health Inspection Infractions” from 2010, thinking it might be interesting to see which neighbourhoods had the most restaurants and other establishments that had been fined (for this quick and dirty experiment I didn’t correlate population, total number of restaurants, income levels, or any other data that would be relevant to a more serious analysis).

Once the source was chosen, I downloaded the XML file. I knew that BatchGeo required tabular data as presented in a spreadsheet, and since the XML file had a simple structure, I could do a quick search for a free, web-based XML-to-CSV converter like the one at Luxon Software. I uploaded the XML file to this service and presto! downloaded a CSV file. I could now import this file into a spreadsheet program like Google Spreadsheets.

On the left is the source XML document (as viewed in a browser) and on the right is the converted spreadsheet data imported from comma separated values (as viewed in Google Spreadsheets).
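For readers who would rather script this step than use a web converter, the XML-to-CSV conversion can be sketched in a few lines of Python with the standard library. This is only an illustration, not what the online converter does internally. The sample records and element names (`record`, `name`, `address`, `city`, `fine`) are invented, since the actual schema of the infractions file is not reproduced here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: these element names and values are invented for
# illustration and are NOT the real schema of the Montreal infractions file.
SAMPLE_XML = """<records>
  <record>
    <name>Café Exemple</name>
    <address>123 rue Fictive</address>
    <city>Montréal</city>
    <fine>500</fine>
  </record>
  <record>
    <name>Resto Démo</name>
    <address>456 avenue Inventée</address>
    <city>Montréal</city>
    <fine>1200</fine>
  </record>
</records>"""


def xml_to_csv(xml_text: str) -> str:
    """Flatten a simple XML document (root > record > fields) into CSV text."""
    root = ET.fromstring(xml_text)
    records = list(root)
    # Collect column names in first-seen order so that records with
    # missing fields still line up under the right header.
    columns = []
    for record in records:
        for field in record:
            if field.tag not in columns:
                columns.append(field.tag)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({field.tag: (field.text or "").strip() for field in record})
    return buffer.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML))
```

This only works because the structure is flat, with one element per row and one child element per column; more deeply nested XML would need a decision about how to flatten it.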

Once the data were in the spreadsheet, I could select the content, copy it to the clipboard, and paste it into the box in BatchGeo. Like magic, the table redraws itself with nicely formatted data. I still needed to set the relevant options, including which columns to use for the city, address, theme, and title. The final step was to hit the “Make Google Map” button and watch as the geo-coding was performed (BatchGeo was assigning longitude and latitude values based on the addresses provided). After about a minute, ding! there was a fully baked map:

The map generated by BatchGeo using the Health Infractions data in the spreadsheet. Click on the image to view the live map.

So, XML to CSV, CSV to BatchGeo to add geo-location data, and there we have a map. An amazing transformation from static XML data to an interactive map. Yes, it’s simplistic, but that’s the point: you can easily create and play with maps.

Given how unexpectedly quickly this went, I started looking for some alternatives. Again Renee introduced me to something new: the ability to import XML into a Google Spreadsheet, using a custom XPath query to define the values of each column. My first attempt at this actually failed, I suspect because the server wrongly declares the source XML document to be HTML, not XML. Quickly posting the XML file on my own server allowed me to continue. Such are the joys of working with data in the wild: one must often use duct tape to make things work properly. In fact, I think that recognizing little problems along the way and figuring out how to resolve them is the essence of digital humanities, since very little of interest works properly out of the box.

Anyway, now I could proceed with my importing of data into a new worksheet, using importXML.

The importXML function allows me to specify a URL source and an XPath query for each column.
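In the spreadsheet, each column is filled by a formula of the form `=IMPORTXML(url, xpath_query)`. For the curious, the same one-XPath-query-per-column idea can be sketched in Python. This is a rough analogue rather than a reimplementation: it parses an inline string instead of fetching a URL, it relies on the limited XPath subset supported by the standard library's ElementTree, and the element names and queries are hypothetical stand-ins for the real file's structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names are invented for illustration only.
SAMPLE_XML = """<records>
  <record>
    <name>Café Exemple</name>
    <address>123 rue Fictive</address>
  </record>
  <record>
    <name>Resto Démo</name>
    <address>456 avenue Inventée</address>
  </record>
</records>"""

# One XPath-style query per spreadsheet column (hypothetical paths).
COLUMN_QUERIES = {
    "name": ".//record/name",
    "address": ".//record/address",
}


def import_column(xml_text, xpath):
    """Return the text of every element matched by the query, in document order."""
    root = ET.fromstring(xml_text)
    return [(element.text or "").strip() for element in root.findall(xpath)]


def build_rows(xml_text, queries):
    """Run one query per column, then zip the columns into rows.

    Note: zip() stops at the shortest column, so a record that lacks a
    field would shift or drop values; real data would need a check for that.
    """
    columns = [import_column(xml_text, xpath) for xpath in queries.values()]
    return list(zip(*columns))


if __name__ == "__main__":
    for row in build_rows(SAMPLE_XML, COLUMN_QUERIES):
        print(row)
```

The spreadsheet version has the advantage that each column's query sits right there in its cell, which makes it easy to tweak and see the result immediately.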

I like my scripting languages as much as the next DHer, but Look Ma! No programming! Now that was a fun MC² meeting.