Data as a Kandinsky Painting

I just found this package for R, ‘Kandinsky’. You can read the logic of what it does here.

I’m totally into representing data as art, so I thought I would feed all 900+ annotations that my ‘Crafting Digital History’ class is making across the web through it:

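  • Grab all the annotations using Lincoln’s ‘Hypothesisr’ package.
  • Turn that into tidy data:
library(dplyr)     # group_by(), count(), %>%
library(tidytext)  # unnest_tokens()
# One row per (user, word) pair, most frequent first
word_counts <- documents %>%
  group_by(user) %>%
  unnest_tokens(word, text) %>%
  count(user, word, sort = TRUE)
  • Feed word_counts into kandinsky.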

Et voilà:


Now, let’s visualize the stopwords. I also add some custom stopwords to that list (things like ‘digital’, ‘historian’, etc., given the nature of the course). Ecco:

There is something extraordinarily satisfying about those two images. The first captures the entire universe of possible responses that my students are making. In the second, that purple circle seems to my mind to correspond with the normal stopwords, and the squiggles with my additions. Let us now subtract the second from the first:

Interesting, this visualization of what remains after the stopwords are applied…
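
For anyone who wants to see the subtraction as code rather than paint, here is a minimal sketch in base R. The toy word_counts table and both stopword lists are made up for illustration; they are not the course data. In tidytext the same step is usually written as an anti_join() against the stop_words table.

```r
# Toy stand-in for the word_counts table (hypothetical data).
word_counts <- data.frame(
  user = c("a", "a", "a", "b", "b"),
  word = c("the", "digital", "archive", "of", "newspaper"),
  n    = c(12L, 7L, 3L, 9L, 4L),
  stringsAsFactors = FALSE
)

# A tiny "standard" stopword list plus course-specific additions
# (both lists are illustrative assumptions).
standard_stopwords <- c("the", "of", "and", "a")
custom_stopwords   <- c("digital", "historian")
all_stopwords      <- union(standard_stopwords, custom_stopwords)

# "Subtracting" the stopwords: keep only rows whose word is not in the list.
remaining <- word_counts[!(word_counts$word %in% all_stopwords), ]
print(remaining)
```

What is left in remaining is the table that would be handed to kandinsky for the third image.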

I can also do some other fun things with my annotations, such as term frequency-inverse document frequency (tf-idf), to find out which words tend to characterize which students’ annotations. As a Kandinsky painting:
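
The arithmetic behind tf-idf is simple enough to sketch in base R. This is not the code I ran for the painting: the table below is invented, each user is treated as one "document", and the natural-log idf is my understanding of what tidytext’s bind_tf_idf() computes.

```r
# Toy counts: one row per (user, word) pair (hypothetical data).
word_counts <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  word = c("archive", "map", "archive", "census", "archive"),
  n    = c(2L, 2L, 1L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Term frequency: a word's count divided by that user's total word count.
totals <- tapply(word_counts$n, word_counts$user, sum)
word_counts$tf <- word_counts$n / as.numeric(totals[word_counts$user])

# Inverse document frequency: ln(number of users / users who used the word).
n_users <- length(unique(word_counts$user))
users_per_word <- tapply(word_counts$user, word_counts$word,
                         function(u) length(unique(u)))
word_counts$idf <- log(n_users / as.numeric(users_per_word[word_counts$word]))

# A word scores high when one user leans on it and the others do not.
word_counts$tf_idf <- word_counts$tf * word_counts$idf
print(word_counts[order(-word_counts$tf_idf), ])
```

A word that every user writes (here ‘archive’) gets an idf of zero, so it drops out no matter how often it appears; that is why tf-idf picks out what is distinctive about each student.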

Let’s paint our feelings – here’s the sentiment of the annotations (‘afinn’):

And here’s the same data again, but sorted from most positive to most negative:
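
A rough sketch of the scoring and sorting, again in base R with invented data. The lexicon values below are placeholders on an AFINN-style scale (integers from -5 to +5), not the real AFINN entries; with tidytext you would join against get_sentiments("afinn") instead.

```r
# Placeholder lexicon on an AFINN-style -5..+5 scale (made-up values).
lexicon <- c(good = 3, love = 3, bad = -3, confusing = -2)

# Toy counts: one row per (user, word) pair (hypothetical data).
word_counts <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  word = c("love", "bad", "confusing", "archive", "good"),
  n    = c(2L, 1L, 2L, 5L, 2L),
  stringsAsFactors = FALSE
)

# Keep only words that appear in the lexicon, then weight each score by its count.
scored <- word_counts[word_counts$word %in% names(lexicon), ]
scored$score <- unname(lexicon[scored$word]) * scored$n

# Total sentiment per user, sorted from most positive to most negative.
user_sentiment   <- aggregate(score ~ user, data = scored, FUN = sum)
sorted_sentiment <- user_sentiment[order(-user_sentiment$score), ]
print(sorted_sentiment)
```

Words missing from the lexicon (like ‘archive’ here) simply contribute nothing, which is worth remembering when reading the painting.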

Finally, let’s finish off with a topic model and then the top terms from the topic model:

Data is beautiful.

What does it mean? Well, that might take another post or two. Maybe the meaning would emerge if I also sonified, or 3D printed, this data. If we use the full sensorium…

Middle School Science Teachers

July 26 2017 to August 25 2017
Jobs and Fellowships
At Ascend, we seek middle school science teachers who are passionate about their subjects and their students – who want to spend their time discussing, preparing, and immersing themselves in the content they're about to teach. We value teachers who truly listen to what students are saying – who...
Brooklyn, NY 11212
United States

Founding U.S. History Teacher

July 26 2017 to August 25 2017
Jobs and Fellowships
At Ascend, we seek high school teachers who are passionate about their subjects and their students – who want to spend their time discussing, preparing, and immersing themselves in the content they're about to teach. We value teachers who truly listen to what students are saying – who create a...
Brooklyn, NY 11212
United States

What can digital humanities tell us about Character?

My part of the collaboration with James has been thinking through what this text has to tell us about “Character” as a literary category and to consider how digital tools can help modern users interact with eighteenth-century characters.

There’s been a learning curve for me as I find out more and more about what digital formats can and can’t do. I think my biggest challenge has been learning to think about digital material spatially: in order for something to exist in our final product, we have to think about where it goes and how to attach it. Our original plan was to preserve every page in three separate files: one with the image of the text, one with a transcription of the text, and a third that contained commentary for that page. The hope was that we could sync every file by line and thus create a no-frills edition that could be accessible and transparent for all users.

We’ve been forging full steam ahead with the transcriptions, and I’ve learned a great deal about how to preserve the physical features of a page in a digital translation. I began to realize that I think of character conceptually, not spatially, and thus finding a way to break down what this text can tell us about Character by page began to seem less and less feasible, let alone breaking it down by line! A line-by-line commentary is useful for explicating specific things in the text: allusions that would escape a twenty-first-century reader, say, or Latin phrases that need translating into English. Each of these things occurs at a specific place in the text and is thus well suited to line-by-line annotation. We’ve shied away from doing that kind of annotation, not because it’s not useful, but because it’s already been done, and done well, first by Robert Thyer for the 1759 edition and for modern audiences by Charles Daves in 1970.

Butler’s work is a collection of Theophrastan Characters, a genre of writing that enjoyed a revival when Butler was writing in the late seventeenth century but had fallen out of fashion by the time the collection was published posthumously in 1759. Theophrastan Characters are an odd genre. They break down characters into general “types” and give a description that ostensibly describes every person that falls under that category. For instance, when Butler writes about “An Amorist” that “His Passion is as easily set on Fire as a Fart, and as soon out again,” we are meant to assume that 1) this is true of all Amorists, and 2) if we ever meet somebody whose passion is, err, easily stirred and just as quickly extinguished, that person is an Amorist.

We’re used to breaking down literary characters into round and flat characters, or individuals and types. Theophrastan Characters dwell completely on the side of types, which, when you think about it, is kind of nuts. We tend to think of people specifically, not generally. If I were to ask you to imagine a lawyer, you would probably think of a lawyer you know, or a famous lawyer you’ve seen in the news or in pop culture: Elle Woods, say, or Johnnie Cochran. But Butler asks us to imagine a generic lawyer, someone whose “Opinion is one thing while it’s his own, and another when it is paid for,” a figure who represents all lawyers everywhere. This is familiar to us when we think about type (who doesn’t love a good lawyer joke?), but it’s strange when we consider this figure as a “Character.” In literature, even type characters require a modicum of specificity, which is dictated by their literary surroundings. When a lawyer appears in Bleak House, even though that lawyer is just a flat, type character, we still imagine a single figure in Chancery during 1852 litigating Jarndyce vs Jarndyce; it could not be Elle Woods, or Johnnie Cochran, or your college friend who went to law school. But Butler’s characters are devoid of context: his lawyer is at once every lawyer and no lawyer at all.

I’m hoping this project will be able to tell us two things. First, what tools do you use to create a general character? Even a surface read-through shows us that Butler seldom uses traits or characteristics to describe his characters; they’re too individualizing. Instead he writes largely with metaphors. An Amorist is “like an Officer in a corporation” and a Lawyer is “like a French duelist,” which of course raises the question: what are the officers of corporations and French duelists like? Are there other devices that Butler uses? Does he use the same devices for every character? My plan is to run the text through Stylo to see if we can learn anything about how Butler creates his types.

Second, what will it take to find examples of Butler’s characters? What does it take to fit a specific person into a general description? Could we argue that perhaps Butler is describing Johnnie Cochran, even if he is not describing Elle Woods? How would we show that Cochran fits into Butler’s category? By looking at what he’s done? How he acts? Who he is? Leaving aside lawyers, would we be able to find examples of Henpect Men or Fifth Monarchy Men in today’s world, or are types too dependent on their political and cultural context to translate?

Now that we have a good number of transcriptions we can begin to create a corpus, which I hope will be able to answer some of these questions.

Uh oh

I’ve taught ‘Crafting Digital History’ twice before: once as a face-to-face course complete with lectures and in-class exercises, and once as a fully online course. The workbook now approaches 200 pages when it is printed out. One takeaway from the 2016 edition was that I didn’t want to be writing tutorials and supporting students across multiple operating systems.

Especially Windows. Windows drives me up the freakin’ wall.

Because I also like to learn, and I’m trying to push for reproducibility as a goal in digital history (of methods at least, and re-visiting of conclusions) and in digital archaeology, I had it in mind for some time that some sort of virtual machine would be great. Everybody would be on the same platform. I would only have to write one set of materials. But experiments with virtual machines kept throwing up the same issues of getting the damned machine installed and configured correctly across multiple operating systems. I especially loathe those back-to-school specials with 2 GB that so many of my students seem to have (if you only do a bit of word processing and Facebook, good enough I suppose).

Enter DHBox.

I love DHBox. I love the concept. I love the philosophy of openness baked in. I decided ‘go big or stay home’ and so I rewrote the 2017 version of the course to use DHBox nearly exclusively. And up until about, oh, 11:30 last night, things were going great.

A troubling error message, but not the end of the world. We had already increased the amount of memory allocated to our DHBox twice (we have it installed on top of an stack). Earlier, in the run-up to the course, we tried to estimate how much memory the students would need. I wanted the students to work with real digitized materials that hitherto had not attracted any attention – the Shawville Equity’s print run from 1883-2010. I figured I could teach them how to use wget to download this stuff, and then in the next module I’d teach them various ways of looking at it, exploring it, extracting interesting stuff from it. Earlier, I’d also taught them how to use Twarc to download materials from Twitter, suggesting they use the ‘canada150’ hashtag (Non-Canadians: it’s 150 years from Confederation, whence sprungeth modern-ish Canada).


Being only a few weeks from the official day of celebrations (July 1) meant that there were, oh I don’t know, hundreds of thousands of tweets with that tag available via Twarc. Multiply by # of students.

Number of editions of the Equity available for download: 1595. Each one between 8 and 20 high-rez pages. Even though I asked the students to only download a few years’ worth, multiply by malformed wget commands and/or processes left running… (I had shown them and walked them through how to identify and kill running processes when necessary, but alas…)

And so I sent a call out to Andrew, who has been supporting this class above and beyond the call of duty. He’s on vacation. But he tried to help me out regardless, and set things in motion to increase our memory allocation. Unfortunately, we’d clogged the pipes so badly that this process has itself gone sideways in ways that I am unable to explain (server-side stuff ain’t my bag, as Austin Powers might say).

And so we are currently DHBox-free. While this has caused me a mild heart attack, it’s not really as bad as it might first seem. I still have all of my materials written from last year, when I was supporting individual operating systems, so I just dusted those off (thank you, O Github repository) and gave them to the students who needed them.

The only serious casualties at this point are my pride and some downloaded data. The final projects – where I imagined them all collaborating on different aspects of that particular dataset – will need to be rejigged a bit, but it’s all going to be ok.

It’ll be ok.




Featured Image by Simson Petrol on Unsplash