<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>2cultures.net(.au) &#187; Matthew Wilkens</title>
	<atom:link href="http://www.2cultures.net/author/matthew-wilkens/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.2cultures.net</link>
	<description>Humanities + Computing</description>
	<lastBuildDate>Sat, 19 May 2012 07:01:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Fish’s Object</title>
		<link>http://mattwilkens.com/2012/01/23/fishs-object/</link>
		<comments>http://mattwilkens.com/2012/01/23/fishs-object/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 04:37:57 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[Meta]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=878</guid>
		<description><![CDATA[Stanley Fish has a piece in the New York Times today that makes some use of my contribution to Debates in the Digital Humanities. The DH Debates collection isn&#8217;t online yet, but similar work of mine can be found in Post45 and (with updates) in the proceedings of the Chicago Colloquium on Digital Humanities and [...]]]></description>
			<content:encoded><![CDATA[<p>Stanley Fish has a <a href="http://opinionator.blogs.nytimes.com/2012/01/23/mind-your-ps-and-bs-the-digital-humanities-and-interpretation/">piece</a> in the <em>New York Times</em> today that makes some use of my contribution to <a href="http://www.upress.umn.edu/book-division/books/debates-in-the-digital-humanities"><em>Debates in the Digital Humanities</em></a>. The <em>DH Debates</em> collection isn&#8217;t online yet, but similar work of mine can be found in <a href="http://post45.research.yale.edu/archives/574"><em>Post45</em></a> and (with updates) in the proceedings of the <a href="http://post45.research.yale.edu/archives/574">Chicago Colloquium on Digital Humanities and Computer Science</a> (PDF).</p>
<p>Jeremy Rosen anticipated most of what Fish says in his lengthy <a href="http://post45.research.yale.edu/archives/1805">response</a> to the <em>Post45</em> essay. My <a href="http://post45.research.yale.edu/archives/1944">reply to Rosen</a> probably works equally well as a response to Fish.</p>
<p>Here I&#8217;ll only add that while I appreciate the attention, I have my doubts about Fish&#8217;s sincerity when he proposes to defend the pursuit of authorial intent (in Milton, no less!).</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>, <a href='http://mattwilkens.com/category/meta/'>Meta</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2012/01/23/fishs-object/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>A Wee Debate in Post45 Contemporaries</title>
		<link>http://mattwilkens.com/2012/01/02/a-wee-debate-in-post45-contemporaries/</link>
		<comments>http://mattwilkens.com/2012/01/02/a-wee-debate-in-post45-contemporaries/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 20:52:14 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=872</guid>
		<description><![CDATA[Earlier this year, Andy Hoberek published a piece of mine called &#8220;Contemporary Fiction by the Numbers&#8221; in his Contemporaries section of Post45. There&#8217;s now a response up from Jeremy Rosen and a reply from me. The substance of the thing concerns the best uses of computational methods in literary and cultural studies. Mostly, though, it&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this year, <a href="https://www.facebook.com/andrew.hoberek">Andy Hoberek</a> published a piece of mine called &#8220;<a href="http://post45.research.yale.edu/archives/574">Contemporary Fiction by the Numbers</a>&#8221; in his <a href="http://post45.research.yale.edu/sections/contemporaries">Contemporaries</a> section of <a href="http://post45.research.yale.edu/">Post45</a>. There&#8217;s now a <a href="http://post45.research.yale.edu/archives/1805">response</a> up from Jeremy Rosen and a <a href="http://post45.research.yale.edu/archives/1944">reply</a> from me. The substance of the thing concerns the best uses of computational methods in literary and cultural studies.</p>
<p>Mostly, though, it&#8217;s good to have another excuse to mention Post45 in general and Contemporaries in particular. They&#8217;re on my own required reading list.</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2012/01/02/a-wee-debate-in-post45-contemporaries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Books I Read in 2011</title>
		<link>http://mattwilkens.com/2012/01/02/books-i-read-in-2011/</link>
		<comments>http://mattwilkens.com/2012/01/02/books-i-read-in-2011/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 20:37:47 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[Literature]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=869</guid>
		<description><![CDATA[As I did last year and the year before, here&#8217;s a list of books I read for the first time in 2011. Mostly confined to fiction, but including two popular-academic books that I (uncharacteristically) read from cover to cover. Adichie, Chimamanda Ngozi. Half of a Yellow Sun (2008). Aira, César. The Literary Conference (2010). Calvino, [...]]]></description>
			<content:encoded><![CDATA[<p>As I did <a href="https://workproduct.wordpress.com/2011/01/05/books-i-read-in-2010/">last year</a> and <a href="http://workproduct.wordpress.com/2009/12/24/books-i-read-in-2009/">the year before</a>, here&#8217;s a list of books I read for the first time in 2011. Mostly confined to fiction, but including two popular-academic books that I (uncharacteristically) read from cover to cover.</p>
<ul>
<li>Adichie, Chimamanda Ngozi. <em>Half of a Yellow Sun</em> (2008).</li>
<li>Aira, César. <em>The Literary Conference</em> (2010).</li>
<li>Calvino, Italo. <em>Invisible Cities</em> (1978).</li>
<li>Carson, Anne. <em>Autobiography of Red</em> (1998).</li>
<li>DeLillo, Don. <em>Libra</em> (1988). [Ducks head in shame.]</li>
<li>Egan, Jennifer. <em>A Visit from the Goon Squad</em> (2010).</li>
<li>Graeber, David. <em>Debt: The First 5,000 Years</em> (2011).</li>
<li>Johns, Adrian. <em>Piracy: The Intellectual Property Wars from Gutenberg to Gates</em> (2010).</li>
<li>McCarthy, Tom. <em>Remainder</em> (2007).</li>
<li>Miéville, China. <em>The City and the City</em> (2009).</li>
<li>Millet, Lydia. <em>Oh Pure and Radiant Heart</em> (2005).</li>
<li>O’Brien, Tim. <em>In the Lake of the Woods</em> (1994).</li>
<li>Sayles, John. <em>A Moment in the Sun</em> (2011).</li>
<li>Vollmann, William. <em>Europe Central</em> (2005).</li>
<li>Wallace, David Foster. <em>The Pale King</em> (2011).</li>
</ul>
<p>Not a record-breaking effort, I&#8217;d say, but a pretty fun year. I didn&#8217;t get to either Theroux or Esterházy as I&#8217;d hoped, but there&#8217;s always next year, right? Same goes for Dickens &#8212; I picked up and put down <em>Our Mutual Friend</em> a couple of times and keep meaning to go back to it. Oh, and I&#8217;m maybe twenty pages into Arthur Phillips&#8217; <em>The Tragedy of Arthur</em>, which seems nifty so far. I&#8217;ve gotten a couple of other recommendations, but am always happy to have more &#8230;</p>
<br />Filed under: <a href='http://mattwilkens.com/category/literature/'>Literature</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2012/01/02/books-i-read-in-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Two Interesting Job Openings</title>
		<link>http://mattwilkens.com/2011/12/26/two-interesting-job-openings/</link>
		<comments>http://mattwilkens.com/2011/12/26/two-interesting-job-openings/#comments</comments>
		<pubDate>Mon, 26 Dec 2011 18:28:31 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=866</guid>
		<description><![CDATA[I&#8217;ve recently received word of two intriguing DH jobs that might be of interest to some readers: The three-year Mark Steinberg Weil Early Career Fellowship in Digital Humanities at Washington University in St. Louis. I was at WashU last year and worked closely with many of the folks involved in this program. It&#8217;s a terrific [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently received word of two intriguing DH jobs that might be of interest to some readers:</p>
<ol>
<li>The three-year <a href="http://hdw.artsci.wustl.edu/weilfellowship">Mark Steinberg Weil Early Career Fellowship in Digital Humanities</a> at Washington University in St. Louis. I was at WashU last year and worked closely with many of the folks involved in this program. It&#8217;s a terrific place with great people &#8212; really, one of the best experiences of my academic life. I can&#8217;t endorse it highly enough. And this newly created fellowship is generous indeed.
</li>
<li>A <a href="http://hastac.org/opportunities/associate-director-cdh-south-carolina">Research Assistant Professorship</a> to serve as Associate Director of the <a href="http://www.cdh.sc.edu/">Center for Digital Humanities</a> at South Carolina. An interesting research/admin hybrid at an important DH center.
</li>
</ol>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/12/26/two-interesting-job-openings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Named Localities</title>
		<link>http://mattwilkens.com/2011/09/26/named-localities/</link>
		<comments>http://mattwilkens.com/2011/09/26/named-localities/#comments</comments>
		<pubDate>Tue, 27 Sep 2011 00:07:15 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=856</guid>
		<description><![CDATA[Following on my last post about choropleth maps and regional densities, here are a couple of quick figures showing specific named locations at the city level and below (&#8216;bare&#8217; mentions of nations and regions/states alone are excluded) in the same nineteenth-century literary corpus, scaled by number of occurrences: The biggies are New York, D.C., Boston, [...]]]></description>
			<content:encoded><![CDATA[<p>Following on my <a href="http://mattwilkens.com/2011/09/12/density-of-locations-in-u-s-fiction-around-the-civil-war/">last post about choropleth maps and regional densities</a>, here are a couple of quick figures showing specific named locations at the city level and below (&#8216;bare&#8217; mentions of nations and regions/states alone are excluded) in the same nineteenth-century literary corpus, scaled by number of occurrences:</p>
<p><img src="http://workproduct.files.wordpress.com/2011/09/localities-allyears.png?w=480" alt="Localities AllYears" title="Localities AllYears.png" border="0" width="100%" style="float:left;" /></p>
<p>The biggies are New York, D.C., Boston, London, Paris, etc. Compare this to the log version, which seemed more useful in the density case:</p>
<p><img src="http://workproduct.files.wordpress.com/2011/09/localities-allyears-log.png?w=480" alt="Localities AllYears Log" title="Localities AllYears Log.png" border="0" width="100%" style="float:left;" /></p>
<p>Looks to me like the log version is less clear for this type of figure.</p>
<p>A few notes:</p>
<p>1. These figures include all the texts from 1851-75; still working on year-by-year figures and an animation. Won&#8217;t be hard.</p>
<p>2. A couple of things to check out in the near future. (a.) How does the density of named localities compare to that of named regions and nations? Consider Africa in particular, where there&#8217;s decent national density in some cases, but perhaps less geographic specificity. (b.) I need to produce a state-level density map that adjusts the number of named-location mentions for some measure of population, to get a sense of which states received a disproportionate share of literary attention.</p>
<p>3. These maps were produced using the &#8216;maps&#8217; package in R. Really simple to use. Method cribbed from Nathan Yau&#8217;s <em><a href="http://www.amazon.com/Visualize-This-FlowingData-Visualization-Statistics/dp/0470944889/ref=sr_1_1?ie=UTF8&amp;qid=1317081394&amp;sr=8-1">Visualize This</a></em>.</p>
<p>4. The top few cities:</p>
<table border="0" cellpadding="0" cellspacing="0" style='border-collapse:collapse;table-layout:fixed;'>
<tr>
<td><strong>Place</strong></td>
<td><strong>Count</strong></td>
</tr>
<tr>
<td>New York, NY, USA</td>
<td align="right">9183</td>
</tr>
<tr>
<td>Washington D.C., DC, USA</td>
<td align="right">4179</td>
</tr>
<tr>
<td>Boston, MA, USA</td>
<td align="right">3951</td>
</tr>
<tr>
<td>Paris, France</td>
<td align="right">3312</td>
</tr>
<tr>
<td>London, UK</td>
<td align="right">3279</td>
</tr>
<tr>
<td>Rome, Italy</td>
<td align="right">2154</td>
</tr>
<tr>
<td>Philadelphia, PA, USA</td>
<td align="right">2058</td>
</tr>
<tr>
<td>New Orleans, LA, USA</td>
<td align="right">1580</td>
</tr>
<tr>
<td>Richmond, VA, USA</td>
<td align="right">1152</td>
</tr>
<tr>
<td>Jerusalem, Israel</td>
<td align="right">925</td>
</tr>
<tr>
<td>Charleston, SC, USA</td>
<td align="right">885</td>
</tr>
<tr>
<td>Baltimore, MD, USA</td>
<td align="right">709</td>
</tr>
<tr>
<td>San Francisco, CA, USA</td>
<td align="right">682</td>
</tr>
</table>
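<p>For anyone who wants to see what the log scaling does to these numbers, here is a minimal sketch in Python (not the R code behind the figures above). It takes the counts from the table and compares linear and log10 scaling relative to the largest count. The function name <code>scale</code> and the 0-to-1 marker scale are assumptions made for illustration only.</p>

```python
import math

# Counts copied from the table above (top named localities, 1851-75 corpus).
city_counts = {
    "New York, NY, USA": 9183,
    "Washington D.C., DC, USA": 4179,
    "Boston, MA, USA": 3951,
    "Paris, France": 3312,
    "London, UK": 3279,
    "Rome, Italy": 2154,
    "Philadelphia, PA, USA": 2058,
    "New Orleans, LA, USA": 1580,
    "Richmond, VA, USA": 1152,
    "Jerusalem, Israel": 925,
    "Charleston, SC, USA": 885,
    "Baltimore, MD, USA": 709,
    "San Francisco, CA, USA": 682,
}


def scale(counts, log=False):
    """Return each count as a fraction of the largest value.

    Hypothetical helper: with log=True the counts are passed through
    log10 first, so every count must be positive.
    """
    values = {
        place: (math.log10(n) if log else float(n))
        for place, n in counts.items()
    }
    top = max(values.values())
    return {place: v / top for place, v in values.items()}


linear = scale(city_counts)
logged = scale(city_counts, log=True)

# Print both scalings side by side for comparison.
for place in city_counts:
    print(f"{place:26s} linear={linear[place]:.2f} log={logged[place]:.2f}")
```

<p>On the log scale the smaller cities sit much closer to New York than they do on the linear scale, which is the flattening visible in the second figure.</p>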
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/09/26/named-localities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://workproduct.files.wordpress.com/2011/09/localities-allyears.png" length="" type="" />
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/09/localities-allyears-log.png" length="" type="" />
		</item>
		<item>
		<title>Density of Locations in U.S. Fiction around the Civil War</title>
		<link>http://mattwilkens.com/2011/09/12/density-of-locations-in-u-s-fiction-around-the-civil-war/</link>
		<comments>http://mattwilkens.com/2011/09/12/density-of-locations-in-u-s-fiction-around-the-civil-war/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 22:27:00 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=837</guid>
		<description><![CDATA[I&#8217;ve been working recently on different visualizations of the geolocation information I&#8217;ve discussed on a couple of previous occasions. (See posts on the corpus, on method and accuracy, and on an earlier style of mapping.) Here&#8217;s the latest: Below are Google Fusion Tables intensity maps of the distribution of named places in my corpus (1098 [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working recently on different visualizations of the geolocation information I&#8217;ve discussed on a couple of previous occasions. (See posts on <a href="https://mattwilkens.com/2011/06/13/literary-production-around-the-civil-war/">the corpus</a>, on <a href="https://mattwilkens.com/2011/07/08/toponym-resolution-accuracy/">method and accuracy</a>, and on an <a href="https://mattwilkens.com/2011/03/28/maps-of-american-fiction/">earlier style of mapping</a>.)</p>
<p>Here&#8217;s the latest: Below are <a href="http://www.google.com/fusiontables/public/tour/index.html">Google Fusion Tables</a> intensity maps of the distribution of named places in my corpus (1098 volumes of U.S. fiction dating from 1851-75; good but not final data, so don&#8217;t get carried away just yet), aggregated by nation and by U.S. state.</p>
<p><a href="http://www.nd.edu/~mwilkens/Fusion.html"><img src="http://workproduct.files.wordpress.com/2011/09/countries-linear.png?w=480" alt="Countries Linear" title="Countries Linear.png" border="0" width="100%" style="float:left;" /></a><br />
<br /><strong>Named locations aggregated by nation, linear density scale.</strong> <br />(WordPress.com doesn&#8217;t allow embedded iframes; click on this (or any) map to see the live version, which includes raw counts per territory on mouseover.)</p>
<p>This first figure mostly shows that the large majority of named places in books written around the Civil War are located in the United States. But (a.) there&#8217;s a fair amount of international distribution and (b.) there&#8217;s more variation in that international distribution than the shading here reveals. (FWIW, the distribution looks power-law-like, but I haven&#8217;t checked yet.)</p>
<p>For better comparative resolution, we can use log-scaled density shading. Note that this of course flattens the difference between high and low densities, which is why I&#8217;ve included both figures.</p>
<p><a href="http://www.nd.edu/~mwilkens/Fusion.html"><img src="http://workproduct.files.wordpress.com/2011/09/countries-log.png?w=480" alt="Countries Log" title="Countries Log.png" border="0" width="100%" style="float:left;" /></a><br />
<br /><strong>Named locations aggregated by nation, log density scale.</strong> <br />Click for live version.</p>
<p>The log scale brings out a bit better the comparatively high concentrations of named places in western Europe, the Middle East, Russia (who knew?), China, India, Canada, Mexico, Brazil, and Australia. (If I&#8217;m remembering right, Greenland is all Melville. But don&#8217;t quote me on that.)</p>
<p>What about the distribution within the United States? Ask and ye shall receive:</p>
<p><a href="http://www.nd.edu/~mwilkens/Fusion.html"><img src="http://workproduct.files.wordpress.com/2011/09/states-linear.png?w=480" alt="States Linear" title="States Linear.png" border="0" width="100%" style="float:left;" /></a><br />
<br /><strong>Named locations aggregated by state, linear density scale.</strong> <br />Click for live version.</p>
<p>New York, Virginia, and Massachusetts stand out; PA, CA, TX, and LA also have pretty decent numbers. A lot of flattening in this visualization, though, so &#8230;</p>
<p>The log version:</p>
<p><a href="http://www.nd.edu/~mwilkens/Fusion.html"><img src="http://workproduct.files.wordpress.com/2011/09/states-log.png?w=480" alt="States Log" title="States Log.png" border="0" width="100%" style="float:left;" /></a><br />
<br /><strong>Named locations aggregated by state, log density scale.</strong> <br />Click for live version.</p>
<p>Interesting how this shows more clearly the notable density in the South and Midwest.</p>
<p>More to come, especially time-resolved series (which should be really useful) and city/POI-level maps.</p>
<p>Two notes in passing:</p>
<p>1. Fusion Tables (the tool) and fusion tables (the output) are really cool. They&#8217;re dead simple; the charts here took about 15 minutes to create once I&#8217;d dumped the relevant data from MySQL. Great for testing and prototyping. But there are limits on what they can do and they&#8217;re not terribly flexible outside the things they&#8217;re built to do. I had to generate the log counts in Excel, for instance, because you can&#8217;t perform computations on aggregated data. (The aggregation itself was totally painless, though, as was the export-import.)</p>
<p>2. I&#8217;ll probably need a different package for the city-level mapping, because fusion tables intensity maps will only show 250 data points at a time. Even in my reduced and cleaned data, I have about 1700 unique locations. Also thinking about exactly how to represent both number of instances (marker size, I think) and time-evolution (maybe something like the <a href="http://projects.flowingdata.com/walmart/">Outbreak-style Walmart map</a> from FlowingData, though for my sanity I&#8217;d like to avoid Flash).</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/09/12/density-of-locations-in-u-s-fiction-around-the-civil-war/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/09/countries-linear.png" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/09/countries-log.png" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/09/states-linear.png" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/09/states-log.png" length="" type="" />
		</item>
		<item>
		<title>Toponym Resolution Accuracy</title>
		<link>http://mattwilkens.com/2011/07/08/toponym-resolution-accuracy/</link>
		<comments>http://mattwilkens.com/2011/07/08/toponym-resolution-accuracy/#comments</comments>
		<pubDate>Fri, 08 Jul 2011 21:22:51 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=825</guid>
		<description><![CDATA[I just finished a study on the accuracy of automated location identification in nineteenth-century literary texts using the Stanford NLP package (for named entity extraction) and Google&#8217;s geocoding API (for associating location names with lat/lon and other GIS data). The full results will go in the article I&#8217;m currently writing, but here&#8217;s a quick preview [...]]]></description>
			<content:encoded><![CDATA[<p>I just finished a study on the accuracy of automated location identification in nineteenth-century literary texts using the <a href="http://nlp.stanford.edu/">Stanford NLP</a> package (for named entity extraction) and <a href="http://code.google.com/apis/maps/documentation/geocoding/">Google&#8217;s geocoding API</a> (for associating location names with lat/lon and other GIS data). The full results will go in the article I&#8217;m currently writing, but here&#8217;s a quick preview of this piece.</p>
<p>Out of the box, the combination of Stanford NER + Google has precision of about 0.40 and recall of 0.73 on my data (U.S. novels published between 1851 and 1875). Precision is the fraction of identified places that are correct; recall is the fraction of actual places in the source text that are identified correctly. You could get great recall&#8212;and terrible precision&#8212;by identifying everything in the source text as a location; likewise you&#8217;d have terrific precision&#8212;but awful recall&#8212;by limiting the locations you identify to those that are easy and unambiguous, e.g., &#8220;Boston.&#8221; You can combine (well, take the harmonic mean of) precision and recall to get an overall sense of accuracy via an F measure; in this case F1 (which weighs P and R equally) is 0.52.</p>
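<p>For readers who want to check the arithmetic, here is a minimal Python sketch of the three measures. The function names and the toy sets of place names are made up for illustration; only the figures 0.40 and 0.73 come from the paragraph above.</p>

```python
def precision_recall(found, actual):
    """Precision and recall for a set of identified places against the true set."""
    correct = len(found & actual)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy sets, invented for illustration only.
found = {"Boston", "Charlotte", "Springfield", "Conrad", "New York"}
actual = {"Boston", "New York", "Springfield", "the South"}
print(precision_recall(found, actual))   # (0.6, 0.75)

# The out-of-the-box figures quoted above.
print(round(f_measure(0.40, 0.73), 2))   # 0.52
```

<p>Because F1 is a harmonic mean, it sits closer to the lower of the two inputs than a simple average would (the arithmetic mean of 0.40 and 0.73 is about 0.57).</p>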
<p>What those numbers mean is that the method finds most of the named places, but it also flags a lot of extraneous strings that it takes for places but that really aren&#8217;t. Fortunately, many of its errors aren&#8217;t of the kind you might expect. The expected kind is ambiguity between real places: the location of &#8220;Springfield&#8221; in a text, for instance, is hard to resolve without more information. There are some of these ambiguity problems, of course, but many more errors come from text strings that ought not to have been identified as locations at all. Some of these are more or less ambiguous (&#8220;Charlotte&#8221; or &#8220;Providence,&#8221; for instance, both of which show up pretty often in nineteenth-century texts, almost always as a personal name and divine care, respectively). But many such false locations are (even more) straightforward: &#8220;New Jerusalem,&#8221; &#8220;Conrad,&#8221; &#8220;Caroline,&#8221; etc. (I saw something similar in my <a href="http://wp.me/pl9RM-ch">previous work with GeoDict</a>.)</p>
<p>Because these sorts of errors are pretty easily identified out of context, it&#8217;s not terribly hard to clean up the results by hand (quickly!), striking recognized locations that likely aren&#8217;t used as real places. At the same time, there are a few commonly used pseudo-places that the NER package finds but Google doesn&#8217;t identify (&#8220;the South,&#8221; &#8220;Far East,&#8221; and so on). These are trivial to correct.</p>
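<p>The two corrections described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used for the study: the function name, the data structures, and the toy values are all hypothetical, and the &#8220;resolutions&#8221; are placeholder strings rather than real geocoder output.</p>

```python
def clean_locations(candidates, geocoded, stoplist, pseudo_places):
    """Apply a stoplist and hand-assigned pseudo-places to NER output.

    candidates    -- location strings returned by the NER step
    geocoded      -- dict of name -> resolution for names the geocoder resolved
    stoplist      -- names to strike (recognized, but rarely real places here)
    pseudo_places -- dict of name -> hand-assigned resolution for missed names
    """
    cleaned = {}
    for name in candidates:
        if name in stoplist:
            continue  # strike a likely false positive
        if name in geocoded:
            cleaned[name] = geocoded[name]
        elif name in pseudo_places:
            cleaned[name] = pseudo_places[name]  # restore a missed pseudo-place
    return cleaned


# Toy values, invented for illustration only.
candidates = ["Boston", "Charlotte", "the South", "Conrad"]
geocoded = {"Boston": "Boston, MA, USA", "Charlotte": "Charlotte, NC, USA"}
stoplist = {"Charlotte", "Providence", "New Jerusalem", "Conrad", "Caroline"}
pseudo_places = {"the South": "region: U.S. South"}
print(clean_locations(candidates, geocoded, stoplist, pseudo_places))
```

<p>Striking a stoplisted name removes a false positive and so helps precision; restoring a pseudo-place adds a true positive and so helps recall.</p>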
<p>Applying such hand cleanup raises precision to 0.59 and recall to 0.84 (the latter mostly due to &#8220;South,&#8221; &#8220;North,&#8221; etc.&#8212;we&#8217;re talking about the lit of the Civil War, after all). The revised F1 score is 0.69. That&#8217;s not bad, really (though one would always like these numbers to be higher). Compare, for instance, Jochen Leidner&#8217;s <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.7200">evaluation of toponym resolution methods</a>, which found lower numbers using more sophisticated techniques on locations mentioned in newspaper articles. Note in particular that even humans often don&#8217;t agree on what constitutes a named location (&#8220;Boston lawyer&#8221;: adjective or place?) nor on the identity of the referent (Leidner cites inter-annotator agreement of roughly 80-90% depending on the corpus).</p>
<p>So long story short: the combination of Stanford NER and Google geolocation performs (surprisingly?) well by contemporary standards. But keep in mind that even in the best case, around 40% of the identified results will be spurious.</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/07/08/toponym-resolution-accuracy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Bowker Publishing Stats for 2010</title>
		<link>http://mattwilkens.com/2011/06/13/bowker-publishing-stats-for-2010/</link>
		<comments>http://mattwilkens.com/2011/06/13/bowker-publishing-stats-for-2010/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 23:14:08 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[Literature]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=817</guid>
		<description><![CDATA[I overlooked last month&#8217;s announcement from Bowker concerning the number of books published in 2009 and 2010. Condensed version: fiction is flat at a little under 50,000 new titles, literature dropped off a lot (~30%, to 8k from 11k), though if memory serves, &#8220;literature&#8221; is a catch-all for anthologies and books about literature; all novels [...]]]></description>
			<content:encoded><![CDATA[<p>I overlooked last month&#8217;s announcement from Bowker concerning the <a href="http://www.bowker.com/index.php/press-releases/633-print-isnt-dead-says-bowkers-annual-book-production-report">number of books published in 2009 and 2010</a>. Condensed version: fiction is flat at a little under 50,000 new titles, literature dropped off a lot (~30%, to 8k from 11k), though if memory serves, &#8220;literature&#8221; is a catch-all for anthologies and books <em>about</em> literature; all novels fall under fiction, even when they&#8217;re categorized as &#8220;literary fiction.&#8221; Poetry and drama were off, too.</p>
<p>But (and this may explain much of the drop/flatness) &#8220;non-traditional&#8221; publication was way, way up. Like, into the millions up. Bowker reports about 316k new traditional titles across all categories for 2010, against almost 2.8 million non-traditional (mostly POD reprints of public domain works). Until c. 2006, the ratio ran the other way, at about 10:1 traditional to non-traditional. My guess would be that there&#8217;s also, buried in that landslide of reprints, a small but very non-trivial number of books that might in the past have been published traditionally, but now are sold direct via Amazon and author sites without the intervention of a regular publisher (note the presence of significant numbers from Lulu, AuthorHouse, XLibris, etc.).</p>
<p>Take-away point: There&#8217;s a lot of new fiction out there. I&#8217;ll assume most of it is awful, but then most of it has always been awful. It&#8217;s only that the sea of words is a lot bigger now.</p>
<br />Filed under: <a href='http://mattwilkens.com/category/literature/'>Literature</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/06/13/bowker-publishing-stats-for-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Literary Production around the Civil War</title>
		<link>http://mattwilkens.com/2011/06/13/literary-production-around-the-civil-war/</link>
		<comments>http://mattwilkens.com/2011/06/13/literary-production-around-the-civil-war/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 21:02:11 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=809</guid>
		<description><![CDATA[One more histogram, possibly of general interest. Below is a plot showing the number of literary titles by American authors published in the U.S. each year between 1850 and 1875 (via Lyle Wright&#8217;s 1957 bibliography as represented in Indiana&#8217;s holdings, black bars) along with the number of those titles held in fully edited form in [...]]]></description>
			<content:encoded><![CDATA[<p>One more histogram, possibly of general interest. Below is a plot showing the number of literary titles by American authors published in the U.S. each year between 1850 and 1875 (via <a href="http://www.worldcat.org/oclc/484994">Lyle Wright&#8217;s 1957 bibliography</a> as represented in Indiana&#8217;s holdings, black bars) along with the number of those titles held in fully edited form in the Wright American Fiction archive from Indiana and in the MONK project.</p>
<p>Note that this isn&#8217;t a stacked bar plot; you&#8217;re seeing three distinct histograms superimposed on one another. So if you&#8217;re looking just at the black bars, you&#8217;re seeing a comprehensive survey of American literary production around the Civil War.</p>
<p><a href="http://workproduct.files.wordpress.com/2011/06/wright-dates-combo.png"><img style="display:block;margin-left:auto;margin-right:auto;" src="http://workproduct.files.wordpress.com/2011/06/wright-dates-combo.png?w=480" alt="Wright Dates Combo" title="Wright Dates Combo.png" border="0" width="100%" /></a></p>
<p>Nothing shocking here. Publication of literary texts drops off in the run-up to the Civil War and in its early years, then bounces back pretty quickly, even before the war is over. There are about 100 new books each year on average through the period.</p>
<p>Two notes for my own purposes. (1.) IU&#8217;s coverage of fully edited texts is around 40% of the total period output. That&#8217;s pretty good. Just as importantly, it hits that level roughly evenly for each year. No need to worry about serious variations from year to year or about individual years with very low representation (though be careful with, e.g., 1860&#8211;61). (2.) I like what MONK did with its 300-text subset, clustering texts as far on either side of the war as possible. Even if you were only working with MONK, you&#8217;d still have a decent chance of picking out ante-/post-bellum features.</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/06/13/literary-production-around-the-civil-war/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/06/wright-dates-combo.png" length="" type="" />
		</item>
		<item>
		<title>More Wright American Fiction</title>
		<link>http://mattwilkens.com/2011/06/10/more-wright-american-fiction/</link>
		<comments>http://mattwilkens.com/2011/06/10/more-wright-american-fiction/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 04:41:57 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=796</guid>
		<description><![CDATA[With the kind assistance of several folks at Indiana, I&#8217;ve now gotten my hands on IU&#8217;s full holdings of the digitized Wright American Fiction collection. This is the literary corpus spanning 1850-1875 from which the MONK texts that I used for my initial mapping project were drawn. But MONK chose to limit the size of [...]]]></description>
			<content:encoded><![CDATA[<p>With the kind assistance of several folks at Indiana, I&#8217;ve now gotten my hands on IU&#8217;s full holdings of the digitized Wright American Fiction collection. This is the literary corpus spanning 1850-1875 from which the MONK texts that I used for my initial <a href="http://mattwilkens.com/2011/03/28/maps-of-american-fiction/">mapping project</a> were drawn. But MONK chose to limit the size of their Wright-based corpus to around 300 volumes for reasons of balance across their several datasets.</p>
<p>IU has an additional c. 900 Wright texts that have been fully edited and XML encoded (plus 1300 more that have been OCR&#8217;ed and XML encoded but not hand edited). This means my depth and temporal coverage in the period around the Civil War just got <em>way</em> better.</p>
<p>More info and results to come as I work my way through this stuff. In the meantime, here&#8217;s a plot of the temporal distribution by original publication date of the texts in the two corpora:</p>
<div id="attachment_800" class="wp-caption alignleft" style="width: 490px"><a href="http://workproduct.files.wordpress.com/2011/06/wright-dates-iu1.png"><img class="size-full wp-image-800" title="Wright-Dates-IU" src="http://workproduct.files.wordpress.com/2011/06/wright-dates-iu1.png?w=480&#038;h=392" alt="" width="480" height="392" /></a><p class="wp-caption-text">Distribution of 954 Wright titles with known publication dates in Indiana&#039;s holdings</p></div>
<div id="attachment_801" class="wp-caption alignleft" style="width: 490px"><a href="http://workproduct.files.wordpress.com/2011/06/wright-dates-monk1.png"><img class="size-full wp-image-801" title="Wright-Dates-MONK" src="http://workproduct.files.wordpress.com/2011/06/wright-dates-monk1.png?w=480&#038;h=392" alt="" width="480" height="392" /></a><p class="wp-caption-text">Distribution of 297 Wright titles with known publication dates in MONK</p></div>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/06/10/more-wright-american-fiction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/06/wright-dates-monk1.png" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/06/wright-dates-iu1.png" length="" type="" />
		</item>
		<item>
		<title>Post45</title>
		<link>http://mattwilkens.com/2011/06/01/post45/</link>
		<comments>http://mattwilkens.com/2011/06/01/post45/#comments</comments>
		<pubDate>Thu, 02 Jun 2011 02:16:45 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=792</guid>
		<description><![CDATA[I have a new piece, &#8220;Contemporary Fiction by the Numbers,&#8221; in the inaugural batch of essays at Post45 Contemporaries. My article is a primer on quantitative methods for literary studies, along with a brief for their significance. Not much that I haven&#8217;t said before, but it pulls together a few DH-and-lit ideas and a set [...]]]></description>
			<content:encoded><![CDATA[<p>I have a new piece, &#8220;<a href="http://post45.research.yale.edu/archives/574">Contemporary Fiction by the Numbers</a>,&#8221; in the inaugural batch of essays at <a href="http://post45.research.yale.edu/sections/contemporaries">Post45 Contemporaries</a>. My article is a primer on quantitative methods for literary studies, along with a brief for their significance. Not much that I haven&#8217;t said before, but it pulls together a few DH-and-lit ideas and a set of examples in one place.</p>
<p>More important, though, is the existence of <a href="http://post45.research.yale.edu/">Post45</a>. Post45 is a bunch of things: an Americanist working group, a book series (with Stanford UP), a conference, an online journal (which will soon begin publishing regular peer-reviewed articles), and&#8212;through its Contemporaries section, edited by Andy Hoberek&#8212;a cross between the <em>Partisan Review</em>, <em>NYRB</em>, and an especially smart blog devoted to &#8220;actively intervening in current tastes.&#8221;</p>
<p>I&#8217;m really happy to have an essay in the launch edition of the site, but I&#8217;m even happier that the whole project exists.</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/06/01/post45/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Memoir and Autobiography after WWII</title>
		<link>http://mattwilkens.com/2011/04/04/memoir-and-autobiography-after-wwii/</link>
		<comments>http://mattwilkens.com/2011/04/04/memoir-and-autobiography-after-wwii/#comments</comments>
		<pubDate>Tue, 05 Apr 2011 03:58:42 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[Literature]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=781</guid>
		<description><![CDATA[Apropos my upcoming talk at the Narrative Conference, an interesting n-gram chart of the terms &#8220;memoir&#8221; and &#8220;autobiography&#8221; after 1945. Serious bonus points for a convincing explanation of what you&#8217;ll find if you widen the date range. (Click the image for Google&#8217;s live-data version.) Update: Another, possibly relevant chart. Again, click for the live version: [...]]]></description>
			<content:encoded><![CDATA[<p>Apropos my upcoming talk at the <a href="http://narrative.wustl.edu/">Narrative Conference</a>, an interesting <em>n</em>-gram chart of the terms &#8220;memoir&#8221; and &#8220;autobiography&#8221; after 1945. Serious bonus points for a convincing explanation of what you&#8217;ll find if you widen the date range. (Click the image for Google&#8217;s live-data version.)</p>
<p><a href="http://ngrams.googlelabs.com/graph?content=memoir,autobiography&amp;year_start=1945&amp;year_end=2008&amp;corpus=0&amp;smoothing=3"><img src="http://workproduct.files.wordpress.com/2011/04/memoir-autobio.png?w=480" alt="Memoir and Autobiography after 1945" border="0" /></a></p>
<p><strong>Update</strong>: Another, possibly relevant chart. Again, click for the live version:</p>
<p><a href="http://ngrams.googlelabs.com/graph?content=I,he,she,you&amp;year_start=1800&amp;year_end=2000&amp;corpus=4&amp;smoothing=3"><img src="http://workproduct.files.wordpress.com/2011/04/i-you-he-she-ngrams.png?w=480" alt="I you he she ngrams" title="I-you-he-she-ngrams.png" border="0" /></a></p>
<br />Filed under: <a href='http://mattwilkens.com/category/literature/'>Literature</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/04/04/memoir-and-autobiography-after-wwii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://workproduct.files.wordpress.com/2011/04/memoir-autobio.png" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/04/i-you-he-she-ngrams.png" length="" type="" />
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Maps of American Fiction</title>
		<link>http://mattwilkens.com/2011/03/28/maps-of-american-fiction/</link>
		<comments>http://mattwilkens.com/2011/03/28/maps-of-american-fiction/#comments</comments>
		<pubDate>Tue, 29 Mar 2011 01:07:33 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[Literature]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=761</guid>
		<description><![CDATA[A quick post to show some recent research on named places in nineteenth-century American fiction. I&#8217;m interested in the range and distribution of places mentioned in these books as potential indicators of cultural investments in, for example, internationalism and regionalism. I&#8217;m also curious about the extent to which large-scale changes (both cultural and formal) are [...]]]></description>
			<content:encoded><![CDATA[<p>A quick post to show some recent research on named places in nineteenth-century American fiction. I&#8217;m interested in the range and distribution of places mentioned in these books as potential indicators of cultural investments in, for example, internationalism and regionalism. I&#8217;m also curious about the extent to which large-scale changes (both cultural and formal) are observable in the overall literary production of this (or any) period. The mapping work I&#8217;ve done so far doesn&#8217;t come close to answering those questions, but it&#8217;s part of the larger inquiry.</p>
<h2>The Maps</h2>
<p>The maps below were generated using a modest corpus of American novels (about 300 in total) drawn from the <a href="http://www.letrs.indiana.edu/web/w/wright2/">Wright American Fiction Project</a> at Indiana by way of the <a href="http://www.monkproject.org/">MONK project</a>. They show the named locations used in those books; points correspond to everything from small towns through regions, nations and continents. Methodological details and (significant) caveats follow.</p>
<p><a href="http://workproduct.files.wordpress.com/2011/03/1851.png"><img src="http://workproduct.files.wordpress.com/2011/03/1851.png?w=480" alt="1851" title="1851.png" border="0" width="480" /></a><br />
<strong>1851</strong>. 37 volumes (~2.5M words), with data cleanup.</p>
<p><a href="http://workproduct.files.wordpress.com/2011/03/1852.png"><img src="http://workproduct.files.wordpress.com/2011/03/1852.png?w=480" alt="1852" title="1852.png" border="0" width="480" /></a><br />
<strong>1852</strong>. 44 volumes (~3.0M words), minimal cleanup.</p>
<p><a href="http://workproduct.files.wordpress.com/2011/03/1874.png"><img src="http://workproduct.files.wordpress.com/2011/03/1874.png?w=480" alt="1874" title="1874.png" border="0" width="480" /></a><br />
<strong>1874</strong>. 38 volumes (~3.1M words), minimal cleanup.</p>
<h2>The Method</h2>
<p>Texts were taken from MONK in XML (TEI-A) format with hand-curated metadata. Location names were identified and extracted using Pete Warden&#8217;s simple gazetteering script <a href="https://github.com/petewarden/geodict">GeoDict</a>, backed by MaxMind&#8217;s free <a href="http://www.maxmind.com/app/worldcities">world cities database</a>. <strong>[Note that there&#8217;s currently a bug in the database population script for GeoDict. Pete tells me it&#8217;ll be fixed in the next release of his general-purpose <a href="http://www.datasciencetoolkit.org/">Data Science Toolkit</a>, into which GeoDict has now been folded. But for now, you probably don&#8217;t want to use GeoDict as-is for your own work.]</strong> I tweaked GeoDict to identify places more liberally than usual, which results (predictably) in fewer missed places but more false positives. The locations for 1851 were reviewed pretty carefully by hand; I haven&#8217;t done the same yet for the other years. Maps were generated in Flash using <a href="http://modestmaps.com/">Modest Maps</a> with <a href="http://flowingdata.com/2008/10/21/code-for-walmart-growth-visualization-now-available/">code</a> cribbed shamelessly from the awesome FlowingData <a href="http://projects.flowingdata.com/walmart/">Walmart project</a>. This means that it should be relatively easy to turn the static maps above into a time-animated series, but I haven&#8217;t done that yet.</p>
<h2>Discussion</h2>
<p>As I pointed out in my <a href="http://workproduct.wordpress.com/2011/01/29/some-thoughts-on-dh-and-canons/">talk on canons</a>, the international scope and regional clustering of places in 1851 strike me as interesting. See the talk for (slightly) more discussion. Moving forward to 1874&#8212;and bearing in mind that we&#8217;re looking at dirty data best compared with the similarly dirty 1852&#8212;the density of named places in the American west increases after the Civil War and it looks as though a distinct cluster of places in the south central U.S. is beginning to emerge.</p>
<p>The changes from 1852 to 1874 are (1) intriguing, but also (2) mostly as expected, and (3) more limited in scope than one might have imagined, given that they sit a decade on either side of <em>the</em> periodizing event of American history. I think an important question raised by a lot of work in corpus analysis (the present research included) concerns exactly what constitutes a &#8220;major&#8221; shift in form or content.</p>
<p>I&#8217;m going to avoid saying anything more here because I don&#8217;t want to build too much argument on top of a dataset that I know is still full of errors, but I wanted to put the maps up for anyone to puzzle through. If you have thoughts about what&#8217;s going on here, I&#8217;d love to hear them.</p>
<h2>Caveats</h2>
<p>A couple of notes and caveats on errors:</p>
<ul>
<li>Errors in the data are of several kinds. There are <strong>missed locations</strong>, i.e., named places that occur in the underlying text but are not flagged as such. Some places that existed in the nineteenth century don&#8217;t exist now. Some colloquial names aren&#8217;t in the database. And of course a book can be set in, say, New York City and yet fail to use the city&#8217;s name often or at all, possibly preferring street addresses or localisms like &#8220;the Village.&#8221; Also, GeoDict as configured identifies all country and continent names with no restrictions, but requires cities and regions (e.g., U.S. states) either to be paired with a larger geographic region (&#8220;Brooklyn, New York,&#8221; not &#8220;Brooklyn&#8221;) or preceded by &#8220;in&#8221; or &#8220;at&#8221; as indicators of place. You pretty much have to do this to keep the false positive rate manageable.</li>
<li>But there are still <strong>false positives</strong>. There&#8217;s a city somewhere in the world named for just about any common English name, adjective, military rank, etc. &#8220;George,&#8221; for instance, is a city in South Africa. &#8220;George, South Africa,&#8221; if it ever occurred in a text, would be identified correctly. But &#8220;<em>In</em> George she had found a true friend&#8221; produces a false positive. When I clean the data, I eliminate almost all proper names of this kind and investigate anything else that looks suspicious. Note that the cluster of places in southern Africa visible in the (uncleaned) 1852 and 1874 maps is almost certainly attributable to this kind of error. <a href="http://mith.umd.edu/mithstaff/#travisbrown">Travis Brown</a> tells me he&#8217;s seen the same thing in his own geocoding experiments.</li>
<li>Then there are <strong>ambiguous locations</strong>, usually clear in context but not obvious to GeoDict. &#8220;Cambridge&#8221; is the most frequent example. Some study suggests that most American novels in the corpus mean the city in Massachusetts, but that&#8217;s surely not true of every instance. Most other ambiguities are much more easily resolved, but they still require human attention.</li>
</ul>
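<p>The matching rules described in the caveats above can be sketched roughly as follows. This is an illustrative toy, not GeoDict&#8217;s actual code: the three tiny gazetteer sets, the helper <code>_alt</code>, and the function name <code>find_places</code> are all hypothetical stand-ins for the real database-backed lookup. It shows only the general shape of the heuristic: countries match anywhere, while cities and regions need either a larger region after a comma or a leading &#8220;in&#8221; or &#8220;at.&#8221;</p>

```python
import re

# Hypothetical miniature gazetteer; the real lookup is backed by a large database.
COUNTRIES = {"France", "China", "South Africa"}
REGIONS = {"New York", "Massachusetts"}
CITIES = {"Brooklyn", "George", "Cambridge"}


def _alt(names):
    """Build a regex alternation, longest names first so longer names win."""
    return "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))


# Cities must be paired with a larger unit: "Brooklyn, New York".
PAIRED_RE = re.compile(rf"\b(?:{_alt(CITIES)}), (?:{_alt(REGIONS | COUNTRIES)})\b")
# Cities and regions are also accepted after "in" or "at".
PREP_RE = re.compile(rf"\b(?:[Ii]n|[Aa]t) ({_alt(CITIES | REGIONS)})\b")
# Country names are accepted with no restrictions.
COUNTRY_RE = re.compile(rf"\b(?:{_alt(COUNTRIES)})\b")


def find_places(text):
    """Return the place strings flagged by the three simplified rules."""
    places = [m.group(0) for m in PAIRED_RE.finditer(text)]
    places += [m.group(1) for m in PREP_RE.finditer(text)]
    places += [m.group(0) for m in COUNTRY_RE.finditer(text)]
    return places


# The preposition rule produces the false positive discussed above.
print(find_places("In George she had found a true friend."))  # ['George']
# A bare city name with no pairing or preposition is ignored.
print(find_places("George was a true friend."))  # []
print(find_places("They moved to Brooklyn, New York."))  # ['Brooklyn, New York']
```

<p>Even in this toy form the trade-off is visible: the preposition rule rescues places that appear without a state or country, at the price of flagging personal names that happen to follow &#8220;in&#8221; or &#8220;at.&#8221;</p>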
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>, <a href='http://mattwilkens.com/category/literature/'>Literature</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/03/28/maps-of-american-fiction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://workproduct.files.wordpress.com/2011/03/1874.png" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/03/1852.png" length="" type="" />
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/03/1851.png" length="" type="" />
		</item>
		<item>
		<title>Job News II</title>
		<link>http://mattwilkens.com/2011/03/27/job-news-ii/</link>
		<comments>http://mattwilkens.com/2011/03/27/job-news-ii/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 03:29:31 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[Meta]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=750</guid>
		<description><![CDATA[I&#8217;m very happy to say that I&#8217;ll join the English faculty at Notre Dame in the fall. The position (in American fiction after 1900) is great, the people are terrific, the university is lovely. I couldn&#8217;t be happier and I&#8217;m tremendously excited to get started in my new home. In the meantime, I&#8217;m particularly grateful [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m very happy to say that I&#8217;ll join the <a href="http://english.nd.edu/">English</a> faculty at <a href="http://nd.edu/">Notre Dame</a> in the fall. The position (in American fiction after 1900) is great, the people are terrific, the university is lovely. I couldn&#8217;t be happier and I&#8217;m tremendously excited to get started in my new home.</p>
<p>In the meantime, I&#8217;m particularly grateful for my colleagues in <a href="http://amcs.wustl.edu/">American Culture Studies</a> at Wash U, whom I will be leaving sooner than planned. My time in St. Louis has been wonderful: stimulating, friendly, generous of attention and resources &#8212; everything a scholar and a person could want. I&#8217;m sorry to leave, but happy I&#8217;ll only be moving a few hours up the road. (OK, six and a half, but who&#8217;s counting?)</p>
<p>As I said the last time around, nothing much should change here on the blog. I&#8217;ll post new contact info once I have it, but that won&#8217;t happen until August. In the meantime, I&#8217;ll be in St. Louis through the end of the semester and into the summer.</p>
<p>Oh, and those maps of named places in American fiction are coming shortly &#8230;</p>
<br />Filed under: <a href='http://mattwilkens.com/category/meta/'>Meta</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/03/27/job-news-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
		</item>
		<item>
		<title>Some Thoughts on DH and Canons</title>
		<link>http://mattwilkens.com/2011/01/29/some-thoughts-on-dh-and-canons/</link>
		<comments>http://mattwilkens.com/2011/01/29/some-thoughts-on-dh-and-canons/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 18:57:32 +0000</pubDate>
		<dc:creator>Matthew Wilkens</dc:creator>
				<category><![CDATA[digital humanities]]></category>
		<category><![CDATA[Literature]]></category>

		<guid isPermaLink="false">https://workproduct.wordpress.com/?p=719</guid>
		<description><![CDATA[Below is a draft of the talk I&#8217;m giving next week at Austin for the first of three DH symposia this semester sponsored by the Texas Institute for Literary and Textual Studies. The theme of this first meeting is &#8220;Access, Authority, and Identity&#8221;; my paper is an attempt to think through some of the implications [...]]]></description>
			<content:encoded><![CDATA[<p>Below is a draft of the talk I&#8217;m giving next week at Austin for the first of three DH symposia this semester sponsored by the <a href="http://www.utexas.edu/cola/insts/tilts/">Texas Institute for Literary and Textual Studies</a>. The theme of this first meeting is &#8220;<a href="http://tilts.dwrl.utexas.edu/symposia/i">Access, Authority, and Identity</a>&#8221;; my paper is an attempt to think through some of the implications of working beyond the canon (however construed) for straight literary and cultural scholarship and for DH alike. It&#8217;s also a nice excuse to show a little preview of the geolocation work I&#8217;ve been doing recently.</p>
<p>A prettier <a href="http://artsci.wustl.edu/~wilkens/papers/TILTS%20Paper.pdf">PDF version</a> is also available.</p>
<h2 class="titleHead">Undermining Canons</h2>
<p>I have a point from which to start: Canons exist, and we should do something about them.</p>
<p>I wouldn&#8217;t have thought this was a dicey claim until I was scolded recently by a senior colleague who told me that I was thirty years out of date for making it. The idea being that we&#8217;d had this fight a generation ago, and the canon had lost. But I was right and he, I&#8217;m sorry to say, was wrong. Ask any grad student reading for her comps or English professor who might confess to having skipped <em>Hamlet</em>. As I say, canons exist. Not, perhaps, in the Arnoldian&#8211;Bloomian sense of <em>the</em> canon, a single list of great books, and in any case certainly not the <em>same</em> list of dead white male authors that once defined the field. But in the more pluralist sense? Of books one really needs to have read to take part in the discipline? And of books many of us teach in common to our own students? Certainly. These are canons. They exist.</p>
<p>   So why, a few decades after the question of canonicity as such was in any way current, do we still have these things? If we all agree that canons are bad, why haven&#8217;t we done away with them? Why do we merely tinker around the edges, adding a Morrison here and subtracting a Dryden there? Is this a problem? If so, what are we going to do about it? And more to the immediate point, what does any of this have to do with digital humanities?</p>
<p>The answer to the first question&#8212;&#8220;Why do we still have canons?&#8221;&#8212;is as simple to articulate as it is apparently difficult to solve. We don&#8217;t read any faster than we ever did, even as the quantity of text produced grows larger by the year. If we need to read books in order to extract information from them and if we need to have read things in common in order to talk about them, we&#8217;re going to spend most of our time dealing with a relatively small set of texts. The composition of that set will change over time, but it will never get any bigger. This is a canon. [Footnote: How many canons are there? The answer depends on how many people need to have read a given set of materials in order to constitute a field of study. This was once more or less everyone, but then the field was also very small when that was true. My best guess is that the number is at least a hundred at the very lowest end&#8212;and an order of magnitude or two more than that at the high end&#8212;which would give us a few dozen subfields in English, give or take. That strikes me as roughly accurate.]</p>
<p>  Another way of putting this would be to say that we need to decide what to ignore. And the answer with which we&#8217;ve contented ourselves for generations is: &#8220;Pretty much everything ever written.&#8221; We don&#8217;t read much. What little we do read is deeply nonrepresentative of the full field of literary and cultural production. Our canons are assembled haphazardly, with a deep set of ingrained cultural biases that are largely invisible to us, and in ignorance of their alternatives. We&#8217;re doing little better, frankly, than we were with the dead-white-male bunch fifty or a hundred years ago, and we&#8217;re just as smug in our false sense of intellectual scope.</p>
<p>  So canons, even in their current, mildly multiculturalist form, are an enormous problem, one that follows from our single working method, that is, from the need to perform always and only close reading as a means of cultural analysis. It&#8217;s probably clear where I&#8217;m going with this, at least to a group of DH folks. We need to do less close reading and more of anything and everything else that might help us extract information from and about texts as indicators of larger cultural issues. That includes bibliometrics and book historical work, data-mining and quantitative text analysis, economic study of the book trade and of other cultural industries, geospatial analysis, and so on. Moretti is an obvious model here, as is the work of people like Michael Witmore on early modern drama and Nicholas Dames on social structures in nineteenth-century fiction.</p>
<p>   To show you one quick example of what I have in mind, here&#8217;s a map of the locations mentioned in thirty-seven American literary texts published in 1851:</p>
<div class="center">
<a href="http://workproduct.files.wordpress.com/2011/01/18511.png"><img style="display:block;margin-left:auto;margin-right:auto;" src="http://workproduct.files.wordpress.com/2011/01/18511.png?w=480" alt="1851.png" border="0" width="100%" /></a></div>
<p> 
<div class="caption"><span class="id">Figure&#xA0;1: </span><span class="content">Places named in 37 U.S. novels published in 1851</span></div>
</p><p>  There are some squarely canonical works included in this collection, including <em>Moby-Dick </em>and <em>House of the Seven Gables</em>, but the large majority are obscure novels by the likes of T.&#xA0;S. Arthur and Sylvanus Cobb. I certainly haven&#8217;t read many of them, nor am I likely to spend months doing so. The corpus is drawn from the <a href="http://www.letrs.indiana.edu/web/w/wright2/">Wright American Fiction collection</a> and represents about a third of the total American literary works published that year. [Footnote: Why only a third? Those are all the texts available in machine-readable format at the moment.] Place names were extracted using a tool called <a href="https://github.com/petewarden/geodict">GeoDict</a>, which looks for strings of text that match a large database of named locations. I had to do a bit of cleanup on the extracted places, mostly because many personal names and common adjectives are also the names of cities somewhere in the world. I erred on the conservative side, excluding any of those I found and requiring a leading preposition for cities and regions, so if anything, I&#8217;ve likely missed some valid places. But the results are fascinating. Two points of interest, just quickly:
<ol class="enumerate1">
<li class="enumerate" id="x1-6x1">For one, there are a lot more international locations than one might have expected. True, many of them are in Britain and western Europe, but these are American novels, not British reprints, so even that fact might surprise us. And there are also multiple mentions of locations in South America, Africa, India, China, Russia, Australia, the Middle East, and so on. The imaginative landscape of American fiction in the mid-nineteenth century appears to be pretty diversely outward looking in a way that hasn&#8217;t received much attention.</li>
<li class="enumerate" id="x1-8x2">And then&#8212;point two&#8212;there&#8217;s the distinct cluster of named places in the American south. At some level this probably shouldn&#8217;t be surprising; we&#8217;re talking about books that appeared just a decade before the Civil War, and the South was certainly on people&#8217;s minds. But it doesn&#8217;t fit very well with the stories we currently tell about Romanticism and the American Renaissance, which are centered firmly in New England during the early 1850s and dominate our understanding of the period. Perhaps we need to at least consider the possibility that American regionalism took hold significantly earlier than we usually claim.</li>
</ol>
</p><p>   So as I say, I think this is a pretty interesting result, one that demonstrates a first step in the kind of analyses that remain literary and cultural but that don&#8217;t depend on close reading alone nor suffer the material limits such reading imposes. I think we should do more of this&#8212;not necessarily more geolocation extraction in mid-nineteenth-century American fiction (though what I just showed obviously doesn&#8217;t exhaust that little project), but certainly more algorithmic and quantitative analysis of piles of text much too large to tackle &#8220;directly.&#8221; (&#8220;Directly&#8221; gets scare quotes because it&#8217;s a deeply misleading synonym for close reading in this context.)</p>
<p>   If we do that&#8212;shift more of our critical capacity to such projects&#8212;there will be a couple of important consequences. For one thing, we&#8217;ll almost certainly become worse readers. Our time is finite; the less of it we devote to an activity, the less we&#8217;ll develop our skill in that area. Exactly how much our reading suffers&#8212;and how much we should care&#8212;are matters of reasonable debate; they depend on both the extent of the shift and the shape of the skill&#8211;experience curve for close reading. My sense is that we&#8217;ll come out alright and that it&#8217;s a trade well worth making. We gain a lot by having available to us the kinds of evidence text mining (for example) provides, enough that the outcome will almost certainly be a net positive for the field. But I&#8217;m willing to admit that the proof will be in the practice and that the practice is, while promising, as yet pretty limited. The important point, though, is that the decay of close reading as such is a negative in itself only if we mistakenly equate literary and cultural analysis with their current working method.</p>
<p>Second&#8212;and maybe more important for those of us already engaged in digital projects of one sort or another&#8212;we&#8217;ll need to see a related reallocation of resources within DH itself. Over the last couple of decades, many of our most visible projects have been organized around canonical texts, authors, and cultural artifacts. They have been motivated by a desire to understand those (quite limited) objects more robustly and completely, on a model plainly derived from conventional humanities scholarship. That wasn&#8217;t a mistake, nor are those projects without significant value. They&#8217;ve contributed to our understanding of, for example, Rossetti and Whitman, Stowe and Dickinson, Shakespeare and Spenser. And they&#8217;ve helped legitimate digital work in the eyes of suspicious colleagues by showing how far we can extend our traditional scholarship with new technologies. They&#8217;ve provided scholars around the world&#8212;including those outside the centers of university power&#8212;with better access to rare materials and improved pedagogy by the same means. But we shouldn&#8217;t ignore the fact that they&#8217;ve also often been large, expensive undertakings built on the assumption that we already know which authors and texts are the proper ones to which to devote our scarce resources. And to the extent that they&#8217;ve succeeded, they&#8217;ve also reinforced the canonicity of their subjects by increasing the amount of critical attention paid to them.</p>
<p>   What&#8217;s required for computational and quantitative work&#8212;the kind of work that undermines rather than reinforces canons&#8212;is more material, less elaborately developed. The Wright collection, on which the 1851 map that I showed a few minutes ago was based (Figure 1), is a partial example of the kind of resource that&#8217;s best suited to this next development in digital humanities research. It covers every known American literary text published in the U.S. between 1851 and 1875 and makes them available in machine-readable form with basic metadata. Google Books and the Hathi Trust aim for the same thing on a much larger scale. None of these projects is cheap. But on a per-volume basis, they&#8217;re not bad. And of course we got Google and Hathi for very little of our own money, considering the magnitude of the projects.</p>
<p>It will still cost a good deal to make use of what we might call &#8220;bare&#8221; repositories. The time, money, and attention they demand will have to come from somewhere. My point, though, is that if (as seems likely) we can&#8217;t pull those resources from entirely new pools outside the discipline&#8212;that is to say, if we can&#8217;t just expand the discipline so as to do everything we already do, plus a great many new things&#8212;then we should be willing to make sacrifices not only in traditional or analog humanities, but also in the types of first-wave digital projects that made the name and reputation of DH. This will hurt, but it will also result in categorically better, more broadly based, more inclusive, and finally more useful humanities scholarship. It will do so by giving us our first real chance to break the grip of small, arbitrarily assembled canons on our thinking about large-scale cultural production. It&#8217;s an opportunity not to be missed and a chance to put our money&#8212;real and figurative&#8212;where our mouths have been for two generations. We&#8217;ve complained about canons for a long time. Now that we might do without them, are we willing to try? And to accept the trade-offs involved? I think we should be.</p>
<br />Filed under: <a href='http://mattwilkens.com/category/digital-humanities/'>Digital Humanities</a>, <a href='http://mattwilkens.com/category/literature/'>Literature</a>]]></content:encoded>
			<wfw:commentRss>http://mattwilkens.com/2011/01/29/some-thoughts-on-dh-and-canons/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/12b5810f600c29cd1608ad22d2890148?s=96&amp;d=identicon" length="" type="" />
<enclosure url="http://workproduct.files.wordpress.com/2011/01/18511.png" length="" type="" />
		</item>
	</channel>
</rss>

