Free Webinars for Teaching Writing with Wikipedia

Jun 25, 2016
June 28 2016 to June 29 2016
You are cordially invited to participate in the webinar portion of "Integrating Wikipedia Into Writing-Intensive Courses." This symposium has been generously funded by the Associated Colleges of the South (ACS) and will be hosted on-site at Spelman College. By making key parts of our symposium...
Atlanta, GA 30314
United States

Language bias and black sheep

Jun 25, 2016
Tolga Bolukbasi and colleagues recently posted an article about bias in what is learned with word2vec, on the standard Google News crawl (h/t Jack Clark). Essentially what they found is that word embeddings reflect stereotypes regarding gender (for instance, "nurse" is closer to "she" than "he" and "hero" is the reverse) and race ("black male" is closest to "assaulted" and "white male" to "entitled"). This is not hugely surprising, and it's nice to see it confirmed. The authors additionally present a method for removing those stereotypes with no cost (as measured with analogy tasks) to accuracy of the embeddings. This also shows up on Twitter embeddings related to hate speech.
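To make the "closer to she than he" measurement concrete, here is a minimal sketch in pure Python. The 3-d vectors are made up for illustration (they are not real word2vec values), and `remove_direction` is just a plain projection onto the complement of the he-she direction, a simplified stand-in rather than the authors' exact procedure.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def remove_direction(v, direction):
    """Subtract from v its component along `direction`."""
    scale = dot(v, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(v, direction)]

# Hypothetical toy embeddings, invented for this example only.
toy = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "nurse": [-0.6, 0.7, 0.3],
    "hero": [0.5, 0.1, 0.8],
}

# A crude "gender direction": the difference between "he" and "she".
gender = [a - b for a, b in zip(toy["he"], toy["she"])]

def bias(vec):
    """Positive means closer to "he", negative means closer to "she"."""
    return cosine(vec, toy["he"]) - cosine(vec, toy["she"])

nurse_before = bias(toy["nurse"])
hero_before = bias(toy["hero"])
nurse_after = bias(remove_direction(toy["nurse"], gender))

print(f"nurse bias before: {nurse_before:+.3f}")
print(f"hero bias before:  {hero_before:+.3f}")
print(f"nurse bias after:  {nurse_after:+.3f}")
```

With these toy numbers "nurse" starts out closer to "she" and "hero" closer to "he"; after the projection, "nurse" is equidistant from both. Whether removing the direction costs accuracy on real embeddings is exactly what the paper measures with analogy tasks, and this sketch says nothing about that.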

There have been a handful of reactions to this work, some questioning the core motivation, essentially variants of "if there are biases in the data, they're there for a reason, and removing them is removing important information." The authors give a nice example in the paper (web search; two identical web pages about CS; one mentions "John" and the other "Mary"; query for "computer science" ranks the "John" one higher because of embeddings; appeal to a not-universally-held-belief that this is bad).

I'd like to take a step back and argue that the problem is much deeper than this. The problem is that even though we all know that strong Sapir-Whorf is false, we seem to want it to be true for computational stuff.

At a narrow level, the issue here is the question of what does a word "mean." I don't think anyone would argue that "nurse" means "female" or that "computer scientist" means "male." And yet, these word embeddings, which claim to be capturing meaning, are clearly capturing this non-meaning-effect. So then the argument becomes one of "well ok, nurse doesn't mean female, but it is correlated in the real world."

Which leads us to the "black sheep problem." We like to think that language is a reflection of underlying truth, and so if a word embedding (or whatever) is extracted from language, then it reflects some underlying truth about the world. The problem is that even in the simplest cases, this is super false.

The "black sheep problem" is that if you were to try to guess what color most sheep were by looking at language data, it would be very difficult for you to conclude that they weren't almost all black. In English, "black sheep" outnumbers "white sheep" about 25:1 (many "black sheep"s are movie references); in French it's 3:1; in German it's 12:1. Some languages get it right; in Korean it's 1:1.5 in favor of white sheep. This happens with other pairs, too; for example "white cloud" versus "red cloud." In English, red cloud wins 1.1:1 (there's a famous Sioux named "Red Cloud"); in Korean, white cloud wins 1.2:1, but four-leaf clover wins 2:1 over three-leaf clover. [Thanks to Karl Stratos and Kota Yamaguchi for helping with the multilingual examples.]
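Ratios like these come from nothing fancier than counting adjacent word pairs. A minimal sketch, assuming a made-up four-sentence corpus (the real numbers above came from much larger data, and the counting method used there is not stated):

```python
from collections import Counter
import re

def bigram_counts(text):
    """Count adjacent lowercase word pairs in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical mini corpus, invented for illustration only.
corpus = (
    "He was the black sheep of the family. "
    "Every band has a black sheep. "
    "The black sheep of the class skipped school. "
    "A white sheep grazed on the hill."
)

counts = bigram_counts(corpus)
black = counts[("black", "sheep")]
white = counts[("white", "sheep")]

print(f"black sheep : white sheep = {black}:{white}")
```

Three of the four sentences use "black sheep" as an idiom about people, so the counts say a lot about how English speakers talk and nothing about the color of actual sheep.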

This is all to say that co-occurrence frequencies of words definitely do not reflect co-occurrence frequencies of things in the real world. And the fact that the correlation can go both ways means that just trying to model a "default" as something that doesn't appear won't work. (Also, computer vision doesn't really help: there are many many pictures of black sheep out there because of photographer bias.)

We observed a related phenomenon when working on plot units. We were trying to extract "patient polarity verbs" (this idea has now been expanded and renamed "implicit sentiment": a much better name). The idea is that we want to know what polarity verbs inflict on their arguments. If I "feed" you, is this good or bad for you? For me? If I "punch" you, likewise. We focused on patients because action verbs are almost always good for the agent.

In order to accomplish this, we started with a seed list of "do-good-ers" and "wrong-do-ers." For instance, "the devil" was a wrong-do-er, and so we can extract things that the devil does, and assume that these are (on average) bad for their patients. The problem was that the "do-good-ers" don't do good, or at least they don't do good in the news. One of our do-good-ers was "firefighter". Firefighters are awesome. Even stereotyped, this is arguably a very positive social good, heroic profession. But in the news, what do firefighters do? Bad things. Is this because most firefighters do bad things in the world? Of course not. It's because news is especially poignant when stereotypically good people do bad things.
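The seed-list idea, and the way news data breaks it, can be sketched in a few lines. This is an illustration under stated assumptions, not the system described above: the (agent, verb, patient) triples are invented, and `verb_polarity` is a hypothetical helper that simply averages the seed polarity of every agent seen with a verb.

```python
from collections import defaultdict

# Seed agents: -1 for a wrong-do-er, +1 for a do-good-er.
SEED_POLARITY = {"devil": -1, "firefighter": +1}

# Hypothetical (agent, verb, patient) triples, skewed the way news is.
triples = [
    ("devil", "tempt", "man"),
    ("devil", "deceive", "woman"),
    ("firefighter", "rescue", "child"),
    ("firefighter", "assault", "neighbor"),
    ("firefighter", "assault", "driver"),
    ("stranger", "assault", "tourist"),  # non-seed agent, ignored
]

def verb_polarity(triples, seeds):
    """Score each verb by the mean polarity of the seed agents doing it."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for agent, verb, _patient in triples:
        if agent in seeds:
            totals[verb] += seeds[agent]
            counts[verb] += 1
    return {verb: totals[verb] / counts[verb] for verb in totals}

scores = verb_polarity(triples, SEED_POLARITY)
for verb, score in sorted(scores.items()):
    print(f"{verb:8s} {score:+.1f}")
```

On this toy data "assault" comes out at +1.0, as good for the patient as "rescue", purely because the only seed agent reported doing it is a firefighter. That is the failure mode: the seeds are fine, the reporting is skewed.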

This comes up in translation too, especially when looking at domain adaptation effects. For instance, our usual example for French to English translation is that in Hansards, "enceinte" translates as "room" but in EMEA (medical domain), it translates as "pregnant." What does this have to do with things like gender bias? In Canadian Hansards, "merde" translates mostly as "shit" and sometimes as "crap." In movie subtitles, it's very frequently "fuck." (I suspect translation direction is a confounder here.) This is essentially a form of intensification (or detensification, depending on direction). It is not hard to imagine similar intensifications happening between racial descriptions and racial slurs, or between gender descriptions and sexist slurs, depending on where the data came from.

 Posted on June 25, 2016

Genetic Data Tools and Pop Music

Jun 24, 2016

Great crossover article about using tools developed for genetic analysis to analyze pop music from the 1960s onward. It’s an interesting read and I encourage reading all of it, so I’ll just entice you with one conclusion.

There’s a popular conception in the music industry that in recent years pop music has become less diverse. Usually the arguments involve standard economic ideas such as a diminishing pool of leaders emerging from an initially diverse pool of players. And in the music industry that seems to make sense, with the usual arguments being limited air time to play a broader range of hits and some consolidation of radio stations.

However, the linked article and the chart I’m using to showcase the article point out this is not the case. The analysis technique quantifiably identified 13 musical genres. Since these genres were identified algorithmically, it isn’t initially clear what actual genres these 13 belong to. While it’d be great if we could click on a link for each of these genres and hear a sample, the authors did the next best thing and used the tags from a popular music website to verbally describe the specific genres. This is why there’s a fair amount of overlap in this “naming scheme” (e.g., “love song” is one of the tags for both genre 3 and genre 10).

One way to verify the conclusion that music is as diverse now as it was then is to look at the thinnest lines at the top and bottom of the graph. At the bottom, in 1960 there were only 2-3 very thin lines (one obviously belongs to rap, genre 2), but also at the top there are 2-3 very thin lines (here one belongs to “funk, blues, jazz, soul”, genre 4). Basically, there were 10-11 active genres in the 1960s and there are 10-11 active genres now.

A traveller’s diary, Nov 24th, 1874

Jun 24, 2016

In a traveller’s diary, November 24th 1874

Stopped at the Coliseum and

went to the very topmost platform of this tremendous ruin

and looked into the far depths of the present excavations. Walked all

around among the arches and after coming down to what used to be

considered the arena we went boldly down an inclined plain in

among the recently found arches etc into two or three long passages

heading ever so far away, saw the bronze sockets in the large

flag stones in the middle of the passage where the gates are

supposed to have turned to admit the animals. Some columns

and capitols and fragments of statuary and slabs with figures engraved on them.

some of them representing fighting or chase. After a late lunch took

a walk through via Babuino, Condoti and Corso home. On the old

pavement below the arena are long pieces of charred wood

with smaller crossbars of the same – very curious.

Sci-Fi short film scripted by machine learning algorithm

 Data Art, machine learning, movie
Jun 24, 2016

Filmmaker Oscar Sharp and technologist Ross Goodwin fed a machine learning algorithm with a bunch of Sci-Fi movie scripts to see what new script it would spit out. A script for Sunspring is the result, and this is the film, starring Thomas Middleditch. Riveting.

The thought of a machine tapping into emotion and creativity likely brings some sneers, but Goodwin argues that it’s about assistance and augmentation rather than a replacement for the humans.

The machine dictated that Middleditch’s character should pull the camera. However, the reveal that he’s holding nothing was a brilliant human interpretation, informed by the production team’s many years of combined experience and education in the art of filmmaking. That cycle of generation and interpretation is a fascinating dialogue that informs my current understanding of this machine’s capacity to augment our creativity.



From the Louvre’s Library to the Treasures Gallery

Jun 23, 2016
In 1364 the French king, Charles V (b. 1338, d. 1380) set about constructing a new library in the Louvre, which had hitherto been a fortress and was now a royal palace. He chose an old falconry tower, removed the birds and created a splendid library. What was a loss...

Release Notes for Safari Technology Preview Release 7

Jun 23, 2016

Safari Technology Preview Release 7 is now available for download. If you already have Safari Technology Preview installed, you can update from the Mac App Store’s Updates tab. Release 7 of Safari Technology Preview covers WebKit revisions 201541–202085.


  • Implemented options argument to addEventListener (r201735, r201757)
  • Updated JSON.stringify to correctly transform numeric array indices (r201674)
  • Improved the performance of Encode operations (r201756)
  • Addressed issues with Date setters for years outside of 1900-2100 (r201586)
  • Fixed an issue where reusing a function name as a parameter name threw a syntax error (r201892)
  • Added the error argument for window.onerror event handlers (r202023)
  • Improved performance for accessing dictionary properties (r201562)
  • Updated Proxy.ownKeys to match recent changes to the spec (r201672)
  • Prevented RegExp unicode parsing from reading an extra character before failing (r201714)
  • Updated SVGs to report their memory cost to the JavaScript garbage collector (r201561)
  • Improved the sampling profiler to protect itself against certain forms of sampling bias that arise due to the sampling interval being in sync with some other system process (r202021)
  • Fixed global lexical environment variables scope for functions created using the Function constructor (r201628)
  • Fixed parsing super when the default parameter is an arrow function (r202074)
  • Added support for trailing commas in function parameters and arguments (r201725)


  • Added the unprefixed version of the pseudo element ::placeholder (r202066)
  • Fixed a crash when computing the style of a grid with only absolute-positioned children (r201919)
  • Fixed computing a grid container’s height by accounting for the horizontal scrollbar (r201709)
  • Fixed placing positioned items on the implicit grid (r201545)
  • Fixed rendering for the text-decoration-style values: dashed and dotted (r201777)
  • Fixed support for using border-radius and backdrop-filter properties together (r201785)
  • Fixed clipping for border-radius with different width and height (r201868)
  • Fixed CSS reflections for elements with WebGL (r201639)
  • Fixed CSS reflections for elements with a backdrop-filter property (r201648)
  • Improved the Document’s font selection lifetime in preparation for the CSS Font Loading API (r201799)
  • Improved memory management for CSS value parsing (r201608)
  • Improved font face rule handling for style change calculations (r201971, r202085)
  • Fixed multiple selector rule behavior for keyframe animations (r201818)
  • Fixed applying CSS variables correctly for writing-mode properties (r201875)
  • Added experimental support for spring() based CSS animations (r201759)
  • Changed the initial value of background-color to transparent per specs (r201666)

Web APIs

Web Inspector

  • Added ⌘T keyboard shortcut to open the New Tab tab (r201692, r201762)
  • Added the ability to show and hide columns in data grid tables (r202009, r202081)
  • Fixed an error when trying to delete nodes with children (r201843)
  • Added a Top Functions view for Call Trees in the JavaScript & Events timeline (r202010, r202055)
  • Added gaps to the overview and category graphs in the Memory timeline where discontinuities exist in the recording (r201686)
  • Improved the performance of DOM tree views (r201840, r201833)
  • Fixed filtering to apply to new records added to the data grid (r202011)
  • Improved snapshot comparisons to always compare the later snapshot to the earlier snapshot no matter what order they were selected (r201949)
  • Improved performance when processing many DOM.attributeModified messages (r201778)
  • Fixed the 60fps guideline for the Rendering Frames timeline when switching timeline modes (r201937)
  • Included the exception stack when showing internal errors in Web Inspector (r202025)
  • Added ⌘P keyboard shortcut for quick open (r201891)
  • Removed Text → Content subsection from the Visual Styles Sidebar when not necessary (r202073)
  • Show <template> content that should not be hidden as Shadow Content (r201965)
  • Fixed elements in the Elements tab losing focus when selected by the up or down key (r201890)
  • Enabled combining diacritic marks in input fields in Web Inspector (r201592)


  • Prevented double-painting the outline of a replaced video element (r201752)
  • Properly prevented for video.src="file" with audio user gesture restrictions in place (r201841)
  • Prevented showing the caption menu if the video has no selectable text or audio tracks (r201883)
  • Improved performance of HTMLMediaElement.prototype.canPlayType that was accounting for 250–750ms first loading (r201831)
  • Fixed inline media controls to show PiP and fullscreen buttons (r202075)


  • Fixed a repaint issue with vertical text in an out-of-flow container (r201635)
  • Show text in a placeholder font while downloading the specified font (r201676)
  • Fixed rendering an SVG in the correct vertical position when no vertical padding is applied, and in the correct horizontal position when no horizontal padding is applied (r201604)
  • Fixed blending of inline SVG elements with transparency layers (r202022)
  • Fixed display of hairline borders on 3x displays (r201907)
  • Prevented flickering and rendering artifacts when resizing the web view (r202037)
  • Fixed logic to trigger new layout after changing canvas height immediately after page load (r201889)

Bug Fixes

  • Fixed an issue where Find on Page would show too many matches (r201701)
  • Exposed static text if form label text only contains static text (r202063)
  • Added Origin header for CORS requests on preloaded cross-origin resources (r201930)
  • Added support for the upgrade-insecure-requests (UIR) directive of Content Security Policy (r201679, r201753)
  • Added proper element focus and caret destination for keyboard users activating a fragment URL (r201832)
  • Increased disk cache capacity when there is lots of free space (r201857)
  • Prevented hangs during synchronous XHR requests if a network session doesn’t exist (r201593)
  • Fixed the response for a POST request on a blob resource to return a “network error” instead of HTTP 500 response (r201557)
  • Restricted HTTP/0.9 responses to default ports and cancelled HTTP/0.9 resource loads if the document was loaded with another HTTP protocol (r201895)
  • Fixed parsing URLs containing tabs or newlines (r201740)
  • Fixed cookie validation in private browsing (r201967)
  • Provided memory cache support for the Vary header (r201800, r201805)

Please join MITH in welcoming Purdom Lindblad to our team!

Jun 23, 2016

MITH is excited to announce that Purdom Lindblad will be joining us in the newly established position of Assistant Director for Innovation and Learning, beginning in July. In this position, she will play a leadership role in managing MITH’s growing portfolio of courses and instructional programs.

Purdom comes to us from Scholars’ Lab at the University of Virginia, where as Head of Graduate Programs she collaborated with graduate student fellows, developers, librarians, and designers to create a space for experimentation and play. She was a crucial team member of the Praxis Program, which introduces graduate students to research questions and methods for the digital humanities, and she also worked with UVA’s Director of Diversity to develop a Leadership Alliance Mellon Institute (LAMI), a digital humanities-inflected summer research program, for which she developed two courses in Research Methods and an Introduction to Digital Humanities. Dedicated to cultivating supportive communities for learning, Purdom thrives in collaborative environments where people are at the heart of her work. As she notes,

“I strive to foster spaces and programs where novices as well as experienced practitioners are encouraged to take creative and intellectual risks.”

Purdom’s research interests include feminist interface design, exploring how digital projects can be empathetic platforms for both the users and the people affected by the content. She and her Scholars’ Lab colleague Jeremy Boggs are in the process of incorporating these principles into the interface of Take Back the Archive, a digital public history project being created by UVA faculty, students, librarians, and archivists.

The post Please join MITH in welcoming Purdom Lindblad to our team! appeared first on Maryland Institute for Technology in the Humanities.

 Posted on June 23, 2016