Sep 12 2012
 

Alberto Cairo's newly translated book on information graphics, The Functional Art, is a healthy mix of theory and how it applies in practice, and much of it comes from Cairo's own experiences designing graphics for major news publications. (I don't think Alberto remembers, but what seems like many years ago, I sat right behind him for two weeks at the New York Times when they brought him in to help illustrate Rafael Nadal's approach to tennis.)

His experience is hugely important in making the book work. There's a growing number of books on information graphics, and many are written and illustrated by people who don't have much experience displaying information, which leads to art books posing as something else. This isn't one of those books. Cairo knows what he's talking about.

As you flip through, you'll notice a lot of examples, with a focus on process and even a handful of pencil sketches. The last third of the book is interviews with those well-established in the field, which also walk you through how some graphics were made. There's a strong undertone of finding the balance between function (e.g. efficiency and accuracy) and engagement (e.g. use of circles).

Cairo comes from a journalism background, so the book is mostly in the context of presentation, but there's of course plenty that you can apply to more exploratory graphics. I would say though that Cairo's strength is in illustration and information, and so the book reflects that. This isn't a book that covers visual data analysis or statistical concepts, but it is one that explores and describes the making of high quality information graphics that lend clarity to concepts and ideas. If you're looking for the latter, The Functional Art is worth your time.

Check out the sample chapter on the publisher page, but then grab it on Amazon and save a few bucks.

May 27 2012
 


Last week I attended the 29th annual symposium at the Human-Computer Interaction Lab at the University of Maryland. The HCIL is famous for a little thing known as the treemap, created by the founder of the lab, Ben Shneiderman. It's famous for lots of other visualizations and people too, but it's best known for the treemap.

The annual symposium is put on by the lab to showcase its latest and greatest research. I sometimes forget that HCIL focuses on things other than visualization, so I had to sit, confused, through a few talks before I realized they weren't about visualization ("Where's the viz?" I was thinking). I won't fault them for not being all about dataviz; the Social Network Analysis Strategies for Surviving the Zombie Apocalypse by lab Director, Jen Golbeck, was thoroughly entertaining and insightful work regarding social networks.

HCIL is very kind and generous in that it puts all of its 25+ years of papers and talks online, and many of its projects are open source. You can also go to each individual's page (faculty/student) to find every talk and paper they've completed.

My favorite talks were:

The work coming out of HCIL is inspirational as well as practical. The lab clearly works from the premise that they can have a direct impact on everyday lives in a very meaningful way.

I also have to give a shout out to Justin Grimes, PhD candidate, for giving me a great tour, long walk, and fantastic discussion on the quantified self, quantified babies, and outdated medical devices.

Nov 16 2011
 

R, the favorite computing language of a growing number of statisticians, is friendly enough that you can get a lot done without being an expert programmer, because there are a lot of packages and built-in functions that can take care of a lot of the grunt work for you. Learn how to use a function, prepare your data, and you get some output. However, as you use R more, whether it's for analysis or just for graphics, there comes a point when there isn't a package or function that does exactly what you want.

Norman Matloff's Art of R Programming is for those who want to learn to write their own software in R. This is an R programming book that runs from the basics (running R, vectors, lists) to more advanced topics such as simulations, object-oriented programming, and debugging.

As the back cover says, no statistical knowledge is required to learn from this book, because it really is a book about programming written by a computer science professor (although Matloff is a former professor of statistics).

Hadley Wickham, the technical reviewer and an active member of the R community, no doubt provided additional depth to the text. There are lots of examples and explanations to go along with the code.

The graphics chapter is relatively basic, explaining how to produce a few plots, so you won't get a ton out of this part other than a foundation to look into other resources. If you really want to learn the nuts and bolts of graphics in R, Paul Murrell's R Graphics is a good choice (but kind of pricey). And of course Visualize This for more high-level material. I only mention this though because this is a visualization blog — not because I was expecting more about graphics in Art of R Programming.

Here's who Matloff says the book is for:

Many use R mainly in an ad hoc way—to plot a histogram here, perform a regression analysis there, and carry out other discrete tasks involving statistical operations. But this book is for those who wish to develop software in R. The programming skills of our intended readers may range anywhere from those of a professional software developer to "I took a programming course in college," but their key goal is to write R code for specific purposes.

That's about right. If you've never written a line of code, you might find some of the concepts challenging, but if you have at least a vague idea of what programming is, you should find Art of R useful. I'm keeping this one.

An open-source, "rough and partial" draft of the book can be found here [pdf]. The draft has a lot of the same ideas, so you can get a good idea of what Art of R Programming is about, although from a quick scan, the final is more polished and contains more material.

Apr 27 2011
 

Kimberly (“Kim”) Christen (Department of Critical Culture, Gender and Race Studies, Washington State University), an anthropologist by training, presented “Open and shut: Digital repatriation and the circulation of indigenous knowledge” on 4-14-11 at the University of Washington, Seattle campus. With the support of NEH, she is working on the development of the “Mukurtu software tool” (http://www.mukurtuarchive.org/documentation.html), a user-customizable tool for the creation of archives.


Apr 26 2011
 

R can be confusing when you're first starting out, especially when you don't have any experience in programming. There's a lot of documentation online, and package developers do a decent job at providing examples on how to use their work in your code, but that stuff is not always easy to find. It's easy if you know the name of the package or function you're looking for. However, most of the time you just know what you want to do—like sort a data frame or test a regression model—and not the name of a package.

The R Cookbook by developer Paul Teetor might be your answer.

Overview

From the book description:

This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.

There are 14 chapters and around 400 pages. One third covers the basics of R, such as setting variables; one half covers analysis tools such as hypothesis testing and linear regression; the rest covers miscellaneous topics. There's one chapter on creating graphics.

The Insides

Those who have used other O'Reilly cookbooks will recognize the format right away. Sections are organized by recipe with a problem, solution, and discussion. Most recipes are pretty short, around one or two pages.

In short, R Cookbook is basically what I expected. This is a good thing, as O'Reilly cookbooks are usually pretty useful. The recipes are straightforward to follow and the text is an easy read. There's some discussion at the beginning of each chapter about statistical methods, such as what p-values mean, but don't expect a full-on guide on statistical analysis (not that it claims to be one).

What this book will provide are steps that can help you with everything from the early stages of getting up and running to the more advanced functions for probability, general statistics, and time series analysis.

Bottom Line

For those who already use R, the R Cookbook can be a handy reference when Google lets you down. I imagine those who are familiar with statistical methods but use different software like SAS will also find this useful. If, however, you're looking for a book that's more visualization-based or you're new to statistics, you will probably want to look elsewhere.

[Amazon Link]

Mar 09 2011
 

I finally got a chance to take a closer look at O'Reilly's most recent addition to their "Beautiful" series, Beautiful Visualization: Looking at Data through the Eyes of Experts, and it's a good one. In case you're not familiar, each book in the series is a collection of essays from people who work in the field. Essays range in topic, but they usually focus on a single project and discuss the steps it took to make said project. To be clear, Beautiful Visualization isn't a how-to book, although you can learn a lot from the writings.

First Impressions

When I first received Beautiful Visualization in the mail and opened up the package, it was smaller than I expected. Height and width are the same as the previous Beautiful Data, but it's about a third smaller in depth. However, I realized that's just because they use a different kind of paper (without feeling cheap) and it actually has about twenty more pages than Beautiful Data, coming in at just under 400. All of the images are in full color and big enough so that you can make out the details.

The Authors

There are 20 chapters, or essays rather, by 24 authors. The lineup will tip you off on what you're in for, and many of the names will sound familiar, as they've been mentioned on FD on several occasions. There were a few names on the list that I didn't recognize, but it's clear that everyone enjoys what they do, and more importantly, enjoys playing with data.

Noah Iliinsky, who edited the book with Julie Steele, sets the stage with the first chapter on what is meant by beautiful visualization. Then the rest of the essays get into specific projects and datasets.

For example, Jer Thorp, who was the data artist in residence at the New York Times over the summer, goes into how he made use of the New York Times API.

Aaron Koblin, whose work we all know, I am sure, dives into Flight Patterns and how he got into exploring 24 hours' worth of flight data.

Robert Kosara explains the process of parallel sets.

Moritz Stefaner explored submissions to the Prix Ars Electronica over several decades.

Maximilian Schich uses data matrices to uncover patterns in heterogeneous data.

Martin Wattenberg and Fernanda Viegas also describe History Flow, their visualization to show Wikipedia edits, Jessica Hagy briefly covers her index cards, and several others go into detail about how they did things.

Learning About Process

While visualization can get very technical, the authors do a good job of keeping things abstract enough so that you know what they're talking about even if you're not particularly experienced in the field. They provide enough detail though that it's still interesting for others.

A lot of people who are interested in visualization think that it's a matter of learning a bunch of tools, but there's a lot more to it than that. You're also learning about data, and learning what questions to ask, and if you don't know what questions to ask, you just end up with visualization that doesn't really mean anything. Design also plays a role in conveying the message you want. So it's great that there's a resource that can help you get into the experts' heads.

If anything, it's just fun to read about the process of how a graphic or tool gets made. For example, Jonathan Feinberg, who designed the ever popular Wordle, explains what went into the work. Some people like to knock it, but he knows plenty well that the stylized word clouds aren't the best way to visualize data or extract information, or whatever.

Bottom Line

I'll tell you what this book isn't. It's not a how-to book. It's not a showcase book with screenshots of a bunch of out-of-context projects. Rather, Beautiful Visualization tells you how some well-known visualizations were made. I do wish there was at least one essay from a pure statistician like Di Cook, but other than that, the author group is a good one. All in all, it's a good read with interesting subject matter. Thumbs up.

[Amazon Link]

Dec 28 2010
 

The current issue of Computational Linguistics includes a review (by Eric J. M. Smith) of Vladimir Pericliev's book Machine-Aided Linguistic Discovery: An Introduction and Some Examples. The review gives a quick overview of the problems that Pericliev approaches and the techniques he applies.

Nov 29 2010
 

Snow's Broad Street Map (detail)

John Snow's map of the cholera dead after London's 1854 epidemic is often heralded as one of the earliest examples of graphical data analysis. Steven Johnson's The Ghost Map gives a lot of background about the London of the 1850s, Snow's work, and how central the map really was.

London and the Cholera

John Snow actually made his name by refining the fledgling field of anesthetics using chloroform and ether from a hit-and-miss operation to a reproducible science. He did that based on then relatively recent research on the behavior of gases and how it depends on temperature (which is important for dosage).

Snow had developed and published a theory of how the cholera spreads through water, but had not been able to convince the medical establishment at the time. He was already working on a major study of the water supply in South London and its impact on people's health (some of that water came from clean sources, some of it from the polluted Thames downstream from London's core) when people started dying rapidly on and around Broad Street, close to where he lived. That such a forceful outbreak occurred so close to him was coincidence, but he was prepared and saw the chance to apply and prove his theory.

It is easy today to look down on theories about disease like the miasma (essentially the smell of human waste), but Johnson paints such a vivid picture of the smells, the living conditions (London at the time had a greater density of people than Manhattan has today), and the general lack of knowledge about what caused disease, that it becomes possible to appreciate how difficult it was to break out of that kind of thinking.

In addition to Snow, Johnson gives a lot of credit to Rev. Henry Whitehead, who was the parish priest of the area. While initially skeptical, Whitehead was open to evidence and changed his mind when Snow presented him with a clearly argued case for cholera's waterborne nature. Whitehead then helped track down many of the survivors who had fled, to show that they had not drunk from the tainted well (an important part of the overall argument), and even looked for and found the index case that had started the entire outbreak.

The Map

Given the title of the book, I expected to read more about the famous map. But it turns out that not only was the map not a crucial analysis tool, others had actually drawn similar maps of the outbreak before Snow. For all we know, Snow only drew a map long after the end of the outbreak to illustrate his argument in a report and in the second edition of his monograph on the cholera. The famous map served as a communication tool, rather than as a way to discover the cause of the outbreak.

The book demonstrates in great detail the many steps it took for Snow to come up with his theory, collect evidence, perform experiments, etc., in order to get to the point where he knew what he was looking for when the 1854 outbreak occurred. Snow's great achievement is not merely drawing a map, but developing a whole new way of thinking about how disease spreads. The map was a tool, and it helped him make his point (especially the version he drew that included a kind of Voronoi diagram to show who lived closer to which water pump).
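To make the idea of a Voronoi-style division concrete, here is a minimal sketch in Python that assigns each house to its closest pump and tallies the result. Everything in it is assumed for illustration: the coordinates are made up on an arbitrary grid, "Pump B" and "Pump C" are hypothetical names, and closeness is measured in a straight line, which ignores the street layout. It is not Snow's data or his method, only the general notion of "who lived closer to which water pump."

```python
from math import hypot

# Hypothetical pump positions on an arbitrary grid (NOT Snow's real data).
PUMPS = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (4.0, 1.0),   # hypothetical name
    "Pump C": (-3.0, 3.0),  # hypothetical name
}


def nearest_pump(house, pumps=PUMPS):
    """Return the name of the pump closest to `house` by straight-line distance."""
    x, y = house
    return min(
        pumps,
        key=lambda name: hypot(x - pumps[name][0], y - pumps[name][1]),
    )


def count_by_nearest_pump(houses, pumps=PUMPS):
    """Tally how many houses fall inside each pump's region."""
    counts = {name: 0 for name in pumps}
    for house in houses:
        counts[nearest_pump(house, pumps)] += 1
    return counts


# Made-up house locations, again purely for illustration.
houses = [(0.5, 0.5), (1.0, -1.0), (3.5, 1.5), (-2.0, 2.5), (-0.5, 1.0)]
print(count_by_nearest_pump(houses))
```

With real records in place of the made-up points, the same tally would show whether deaths cluster inside one pump's region. A more faithful version would measure distance along the streets rather than in a straight line.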

The Book

Several times, Johnson argues that the ability to understand the spread of disease, and the ability to fight it, is key to building dense, large-scale cities that house millions of people. While I don't disagree on that point, he takes that way too far, away from John Snow and cholera to biological and nuclear terrorism. The conclusion already makes that point and gets a bit repetitive. But the epilogue is where it really starts hurting the book. Johnson seemingly tries to make the case for the importance of what happened in the 1850s in London for the present and the future. But it ends up feeling forced and unnecessary. I would recommend skipping the epilogue entirely.

Besides a bit of repetitiveness here and there, as well as the epilogue, it's an excellent book. Johnson is a fantastic writer, and clearly understands the history as well as the science of what he's writing about. In an appendix, he explains his use of historic materials (dialogue is only used where he has sources), and what freedoms he took in writing the story. If you care about context, not just a single map on a pedestal, this book will give you a lot of insight into the world it came from, and the revolution in thinking it embodies.


Steven Johnson, The Ghost Map. Riverhead Trade, 320 pages (paperback, also available as hardcover, audiobook, and on Kindle), 2007.

Johnson has a website with several videos of him talking about topics from the book. A fantastic source on the map is also Ralph R. Frerichs's website on John Snow, despite the rather horrific navigation (the little squares are the links, not the text).

Mar 19 2010
 

I move between paper and digital representations fairly seamlessly in my pedagogical world. However, this is not always true with students. This morning, my English 10 TechnoLiterature students presented an intriguing and, well, quite frankly, a proud moment for me. This is a General Education course intended for non-English major Frosh and Sophomores. It fulfills a literature area that is required of all SJSU students, and this was my first time teaching it. Really, it’s titled Great Works of Literature, and being the rabble rouser that I am, I thought we might do some “great” works that were “great” for different reasons: “Prometheus” (Byron), Frankenstein, “Rime of the Ancyent Marinere,” Dr. Jekyll and Mr. Hyde, A Clockwork Orange, The Passion of New Eve, all during the first part of the semester. After Spring Break, we’ll tackle cyberpunk (Mirrorshades), some general poetry (TBD), a little ELO Vol. I and round it all up with Patchwork Girl.

Today, we concluded talks about The Passion of New Eve. Admittedly, it’s a difficult book for non-English majors because of Carter’s fluid style and absurdly dystopic themes. We’re reading it because it represents a moment in biotechnology that is not realistic, even now. They hated it but in a good way.

That’s beside the point, though. A duo presented us with facts, information, themes and relevant allusions about the novel today. Surprisingly, when it came time to provide reception and readers’ responses to this novel, the dynamic duo turned to the source that was most familiar: AMAZON! They quoted extensively from readers’ reports and noted that some anonymous reviewers were required to read the book for a class and therefore provided a brief, acerbic review. Other reviewers chose the book for its ugly cover and found it to their liking. Yet others came to it because they enjoyed Carter.

I’m most impressed because the presenters came to this on their own. And they didn’t just give us the words of the reviewers; they critiqued the reviews.

This is an uncanny moment: Last week, the Stanford BeyondSearch research group (run by Matt Jockers & Franco Moretti) heard from doctoral student Ed Finn, who is working on the idea that Amazon reviews are altering reading practices (or the commodification of reading, anyway). It was an interesting discussion, and it is what made today’s presentation feel so uncanny.