There have been a handful of reactions to this work, some questioning the core motivation, essentially variants of "if there are biases in the data, they're there for a reason, and removing them is removing important information." The authors give a nice example in the paper (web search; two identical web pages about CS; one mentions "John" and the other "Mary"; a query for "computer science" ranks the "John" one higher because of embeddings; appeal to a not-universally-held belief that this is bad).
I'd like to take a step back and argue that the problem is much deeper than this. The problem is that even though we all know that strong Sapir-Whorf is false, we seem to want it to be true for computational stuff.
At a narrow level, the issue here is the question of what a word "means." I don't think anyone would argue that "nurse" means "female" or that "computer scientist" means "male." And yet these word embeddings, which claim to capture meaning, are clearly capturing this non-meaning effect. So the argument then becomes "well, OK, nurse doesn't mean female, but it is correlated with it in the real world."
Which leads us to the "black sheep problem." We like to think that language is a reflection of underlying truth, and so if a word embedding (or whatever) is extracted from language, then it reflects some underlying truth about the world. The problem is that even in the simplest cases, this is super false.
The "black sheep problem" is that if you were to try to guess what color most sheep are by looking at language data, it would be very difficult for you to conclude that they weren't almost all black. In English, "black sheep" outnumbers "white sheep" about 25:1 (many "black sheep"s are movie references); in French it's 3:1; in German it's 12:1. Some languages get it right; in Korean it's 1:1.5 in favor of white sheep. This happens with other pairs, too, for example "white cloud" versus "red cloud." In English, red cloud wins 1.1:1 (there's a famous Sioux named "Red Cloud"); in Korean, white cloud wins 1.2:1, but four-leaf clover beats three-leaf clover 2:1. [Thanks to Karl Stratos and Kota Yamaguchi for helping with the multilingual examples.]
This is all to say that co-occurrence frequencies of words definitely do not reflect co-occurrence frequencies of things in the real world. And the fact that the correlation can go both ways means that just trying to model a "default" as something that doesn't appear won't work. (Also, computer vision doesn't really help: there are many, many pictures of black sheep out there because of photographer bias.)
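Ratios like the ones above come from counting how often each phrase occurs in text. A minimal sketch of that kind of count, assuming a plain adjacent-word (bigram) count over a small corpus; the corpus and the helper names `bigram_counts` and `phrase_ratio` are hypothetical, and this is not the data or tooling behind the numbers quoted above. The 3:1 it produces says something about how people talk, and nothing about sheep.

```python
import re
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent lowercase word pairs in `text`."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def phrase_ratio(counts: Counter, a: tuple[str, str], b: tuple[str, str]) -> float:
    """Return count(a) / count(b), or infinity if `b` never occurs."""
    if counts[b] == 0:
        return float("inf")
    return counts[a] / counts[b]


# Hypothetical toy corpus, invented for illustration only.
corpus = """
He was always the black sheep of the family.
Every family has a black sheep.
A white sheep grazed in the field, and so did many others.
Don't be the black sheep of the team.
"""

counts = bigram_counts(corpus)
print(phrase_ratio(counts, ("black", "sheep"), ("white", "sheep")))  # 3.0
```

Real measurements would use far larger corpora (or precomputed n-gram counts), but the skew comes from the same place: people mention the marked case.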
We observed a related phenomenon when working on plot units. We were trying to extract "patient polarity verbs" (this idea has since been expanded and renamed "implicit sentiment": a much better name). The idea is that we want to know what polarity verbs inflict on their arguments. If I "feed" you, is this good or bad for you? For me? Likewise if I "punch" you. We focused on patients because action verbs are almost always good for the agent.
To accomplish this, we started with a seed list of "do-good-ers" and "wrong-do-ers." For instance, "the devil" was a wrong-do-er, so we could extract things that the devil does and assume that these are (on average) bad for their patients. The problem was that the "do-good-ers" don't do good, or at least they don't do good in the news. One of our do-good-ers was "firefighter." Firefighters are awesome. Even as a stereotype, this is a very positive, socially valuable, arguably heroic profession. But in the news, what do firefighters do? Bad things. Is this because most firefighters do bad things in the world? Of course not. It's because news is especially poignant when stereotypically good people do bad things.
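A minimal sketch of the seed-list idea, assuming (agent, verb, patient) triples have already been extracted, for example by a dependency parser: each verb gets the average polarity of the seed agents seen doing it (+1 for a do-good-er, -1 for a wrong-do-er). The function name, the ±1 voting scheme, and the triples are all hypothetical, chosen to show how a couple of newsworthy firefighter stories can flip a verb's estimated polarity; it is not the original plot-units system.

```python
from collections import defaultdict


def verb_patient_polarity(triples, good_agents, bad_agents):
    """Average seed-agent polarity per verb: >0 suggests good for the patient."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for agent, verb, _patient in triples:
        if agent in good_agents:
            vote = 1.0
        elif agent in bad_agents:
            vote = -1.0
        else:
            continue  # agent is not a seed, so it casts no vote
        totals[verb] += vote
        counts[verb] += 1
    return {verb: totals[verb] / counts[verb] for verb in counts}


# Hypothetical news-style triples (agent, verb, patient), invented for illustration.
triples = [
    ("devil", "tempt", "man"),
    ("devil", "punish", "sinner"),
    ("firefighter", "rescue", "child"),
    ("firefighter", "assault", "neighbor"),  # the stories that make the news
    ("firefighter", "assault", "coworker"),
    ("devil", "assault", "saint"),
    ("reporter", "interview", "mayor"),      # non-seed agent, ignored
]

scores = verb_patient_polarity(triples, {"firefighter"}, {"devil"})
print(round(scores["assault"], 2))  # 0.33: "assault" looks good for its patient
```

Because the estimate is just an average over whatever the corpus reports, the bias of the news toward surprising events goes straight into the lexicon.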
This comes up in translation too, especially when looking at domain adaptation effects. For instance, our usual example for French-to-English translation is that in the Hansards, "enceinte" translates as "room," but in EMEA (medical domain) it translates as "pregnant." What does this have to do with things like gender bias? In the Canadian Hansards, "merde" translates mostly as "shit" and sometimes as "crap." In movie subtitles, it's very frequently "fuck." (I suspect translation direction is a confounder here.) This is essentially a form of intensification (or de-intensification, depending on direction). It is not hard to imagine similar intensifications happening between racial descriptions and racial slurs, or between gender descriptions and sexist slurs, depending on where the data came from.
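To make the domain effect concrete: a model that estimates word translation probabilities by relative frequency simply inherits whatever each corpus contains. A sketch under stated assumptions: the (domain, source, target) triples stand in for word alignments produced separately on each corpus, the counts are made up to mimic the qualitative "enceinte" pattern (not measured from the Hansards or EMEA), and `translation_table` and `best_translation` are hypothetical helpers, not the setup used in the work described above.

```python
from collections import Counter, defaultdict


def translation_table(aligned_pairs):
    """Estimate P(target | source, domain) by relative frequency.

    aligned_pairs: iterable of (domain, source_word, target_word) triples.
    """
    counts = defaultdict(Counter)
    for domain, src, tgt in aligned_pairs:
        counts[(domain, src)][tgt] += 1
    table = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())
        table[key] = {tgt: n / total for tgt, n in ctr.items()}
    return table


def best_translation(table, domain, src):
    """Most probable target word for `src` in `domain`."""
    dist = table[(domain, src)]
    return max(dist, key=dist.get)


# Hypothetical aligned pairs with made-up counts.
pairs = (
    [("hansards", "enceinte", "room")] * 3
    + [("hansards", "enceinte", "pregnant")] * 1
    + [("emea", "enceinte", "pregnant")] * 2
)

table = translation_table(pairs)
print(best_translation(table, "hansards", "enceinte"))  # room
print(best_translation(table, "emea", "enceinte"))      # pregnant
```

Nothing in the estimator distinguishes a harmless sense shift like this one from a shift toward slurs; both are just different counts in different corpora.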
Video game developer Michael Davies provides a Blender script to procedurally generate pretty 3-D spaceships. Enter your parameters, such as number of hull segments, scaling, and rotation, and you’ve got a new vehicle for the stars. [via @albertocairo]
Great crossover article about using tools developed for genetic analysis to also analyze pop music from the 1960s onward. It’s an interesting read and I encourage reading all of it, so I’ll just entice you with one conclusion.
There’s a popular conception in the music industry that in recent years pop music has become less diverse. Usually the arguments involve standard economic ideas, such as a diminishing pool of leaders emerging from an initially diverse pool of players. And in the music industry that seems to make sense, the usual arguments being limited air time to play a broader range of hits and some consolidation of radio stations.
However, the linked article and the chart I’m using to showcase it point out that this is not the case. The analysis technique quantitatively identified 13 musical genres. Since these genres were identified algorithmically, it isn’t immediately clear which actual genres these 13 correspond to. While it’d be great if we could click on a link for each of these genres and hear a sample, the authors did the next best thing and used the tags from a popular music website to describe each genre in words. This is why there’s a fair amount of overlap in this “naming scheme” (e.g., “love song” is one of the tags for both genre 3 and genre 10).
One way to verify the conclusion that music is as diverse now as it was then is to look at the thinnest lines at the top and bottom of the graph. At the bottom, in 1960, there were only 2-3 very thin lines (one obviously belongs to rap, genre 2), and at the top there are likewise 2-3 very thin lines (here one belongs to “funk, blues, jazz, soul”, genre 4). Out of 13 genres, that leaves roughly 10-11 active genres in the 1960s and 10-11 active genres now.
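The “count the thin lines” check is simple arithmetic: of the 13 genres, set aside the 2-3 whose share is negligible in a given period and count the rest. A sketch with loudly stated assumptions: the shares are hypothetical, the 0.02 cutoff for “very thin” is arbitrary, and apart from genre 2 (thin in 1960) and genre 4 (thin now), which genres are thin is invented for the example. The helpers `active_genres` and `toy_shares` are illustrative names, not anything from the article.

```python
def active_genres(shares: dict[int, float], threshold: float = 0.02) -> list[int]:
    """Genres whose share of a period's songs is at least `threshold`."""
    return sorted(g for g, share in shares.items() if share >= threshold)


def toy_shares(thin: set[int], n_genres: int = 13, thin_share: float = 0.005) -> dict[int, float]:
    """Hypothetical shares: genres in `thin` get `thin_share`; the others
    split the remainder evenly, so all shares sum to 1."""
    thick_share = (1.0 - thin_share * len(thin)) / (n_genres - len(thin))
    return {g: (thin_share if g in thin else thick_share) for g in range(1, n_genres + 1)}


# Genres 2 and 4 follow the text; genres 7 and 11 are made up.
shares_1960 = toy_shares(thin={2, 7, 11})
shares_now = toy_shares(thin={4, 11})

print(len(active_genres(shares_1960)), len(active_genres(shares_now)))  # 10 11
```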
Stopped at the Coliseum and
went to the very topmost platform of this tremendous ruin
and looked into the far depths of the present excavations. Walked all
around among the arches and after coming down to what used to be
considered the arena we went boldly down an inclined plain in
among the recently found arches etc into two or three long passages
heading ever so far away, saw the bronze sockets in the large
flag stones in the middle of the passage where the gates are
supposed to have turned to admit the animals. Some columns
and capitols and fragments of statuary and slabs with figures engraved on them.
some of them representing fighting or chase. After a late lunch took
a walk through via Babuino, Condoti and Corso home. On the old
pavement below the arena are long pieces of charred wood
with smaller crossbars of the same – very curious.
Filmmaker Oscar Sharp and technologist Ross Goodwin fed a machine learning algorithm with a bunch of Sci-Fi movie scripts to see what new script it would spit out. A script for Sunspring is the result, and this is the film, starring Thomas Middleditch. Riveting.
The thought of a machine tapping into emotion and creativity likely brings some sneers, but Goodwin argues that it’s about assistance and augmentation rather than a replacement for humans.
The machine dictated that Middleditch’s character should pull the camera. However, the reveal that he’s holding nothing was a brilliant human interpretation, informed by the production team’s many years of combined experience and education in the art of filmmaking. That cycle of generation and interpretation is a fascinating dialogue that informs my current understanding of this machine’s capacity to augment our creativity.
Safari Technology Preview Release 7 is now available for download. If you already have Safari Technology Preview installed, you can update from the Mac App Store’s Updates tab. Release 7 of Safari Technology Preview covers WebKit revisions 201541–202085.
- Implemented options argument to JSON.stringify to correctly transform numeric array indices (r201674)
- Improved the performance of Encode operations (r201756)
- Addressed issues with Date setters for years outside of 1900-2100 (r201586)
- Fixed an issue where reusing a function name as a parameter name threw a syntax error (r201892)
- Added the window.onerror event handlers (r202023)
- Improved performance for accessing dictionary properties (r201562)
Proxy.ownKeys to match recent changes to the spec (r201672)
RegExp unicode parsing from reading an extra character before failing (r201714)
- Improved the sampling profiler to protect itself against certain forms of sampling bias that arise due to the sampling interval being in sync with some other system process (r202021)
- Fixed global lexical environment variables scope for functions created using the
- Fixed parsing super when the default parameter is an arrow function (r202074)
- Added support for trailing commas in function parameters and arguments (r201725)
- Added the unprefixed version of the pseudo element ::placeholder (r202066)
- Fixed a crash when computing the style of a grid with only absolute-positioned children (r201919)
- Fixed computing a grid container’s height by accounting for the horizontal scrollbar (r201709)
- Fixed placing positioned items on the implicit grid (r201545)
- Fixed rendering for the
- Fixed support for using backdrop-filter properties together (r201785)
- Fixed clipping for border-radius with different width and height (r201868)
- Fixed CSS reflections for elements with WebGL (r201639)
- Fixed CSS reflections for elements with a
- Improved the Document’s font selection lifetime in preparation for the CSS Font Loading API (r201799)
- Improved memory management for CSS value parsing (r201608)
- Improved font face rule handling for style change calculations (r201971, r202085)
- Fixed multiple selector rule behavior for keyframe animations (r201818)
- Fixed applying CSS variables correctly for
- Added experimental support for spring() based CSS animations (r201759)
- Changed the initial value of
transparent per specs (r201666)
CanvasRenderingContext2D.putImageData() to throw the correct exception type and align with the specification (r201664)
- Fixed a number of issues with Web Workers (r201876, r201970, r201918, r201926, r201791, r201898, r201925, r201808)
- Added ⌘T keyboard shortcut to open the New Tab tab (r201692, r201762)
- Added the ability to show and hide columns in data grid tables (r202009, r202081)
- Fixed an error when trying to delete nodes with children (r201843)
- Added gaps to the overview and category graphs in the Memory timeline where discontinuities exist in the recording (r201686)
- Improved the performance of DOM tree views (r201840, r201833)
- Fixed filtering to apply to new records added to the data grid (r202011)
- Improved snapshot comparisons to always compare the later snapshot to the earlier snapshot no matter what order they were selected (r201949)
- Improved performance when processing many
- Fixed the 60fps guideline for the Rendering Frames timeline when switching timeline modes (r201937)
- Included the exception stack when showing internal errors in Web Inspector (r202025)
- Added ⌘P keyboard shortcut for quick open (r201891)
- Removed Text → Content subsection from the Visual Styles Sidebar when not necessary (r202073)
<template> content that should not be hidden as Shadow Content (r201965)
- Fixed elements in the Elements tab losing focus when selected by the up or down key (r201890)
- Enabled combining diacritic marks in input fields in Web Inspector (r201592)
- Prevented double-painting the outline of a replaced video element (r201752)
- Properly prevented video.src="file" with audio user gesture restrictions in place (r201841)
- Prevented showing the caption menu if the video has no selectable text or audio tracks (r201883)
- Improved performance of HTMLMediaElement.prototype.canPlayType, which was accounting for 250–750 ms when first loading theverge.com (r201831)
- Fixed inline media controls to show PiP and fullscreen buttons (r202075)
- Fixed a repaint issue with vertical text in an out-of-flow container (r201635)
- Showed text in a placeholder font while downloading the specified font (r201676)
- Fixed rendering an SVG in the correct vertical position when no vertical padding is applied, and in the correct horizontal position when no horizontal padding is applied (r201604)
- Fixed blending of inline SVG elements with transparency layers (r202022)
- Fixed display of hairline borders on 3x displays (r201907)
- Prevented flickering and rendering artifacts when resizing the web view (r202037)
- Fixed logic to trigger new layout after changing canvas height immediately after page load (r201889)
- Fixed an issue where Find on Page would show too many matches (r201701)
- Exposed static text if form label text only contains static text (r202063)
- Added Origin header for CORS requests on preloaded cross-origin resources (r201930)
- Added support for the upgrade-insecure-requests (UIR) directive of Content Security Policy (r201679, r201753)
- Added proper element focus and caret destination for keyboard users activating a fragment URL (r201832)
- Increased disk cache capacity when there is lots of free space (r201857)
- Prevented hangs during synchronous XHR requests if a network session doesn’t exist (r201593)
- Fixed the response for a POST request on a blob resource to return a “network error” instead of HTTP 500 response (r201557)
- Restricted HTTP/0.9 responses to default ports and cancelled HTTP/0.9 resource loads if the document was loaded with another HTTP protocol (r201895)
- Fixed parsing URLs containing tabs or newlines (r201740)
- Fixed cookie validation in private browsing (r201967)
- Provided memory cache support for the Vary header (r201800, r201805)
MITH is excited to announce that Purdom Lindblad will be joining us in the newly established position of Assistant Director for Innovation and Learning, beginning in July. In this position, she will play a leadership role in managing MITH’s growing portfolio of courses and instructional programs.
Purdom comes to us from Scholars’ Lab at the University of Virginia, where as Head of Graduate Programs she collaborated with graduate student fellows, developers, librarians, and designers to create a space for experimentation and play. She was a crucial team member of the Praxis Program, which introduces graduate students to research questions and methods for the digital humanities, and she also worked with UVA’s Director of Diversity to develop a Leadership Alliance Mellon Institute (LAMI), a digital humanities-inflected summer research program, for which she developed two courses in Research Methods and an Introduction to Digital Humanities. Dedicated to cultivating supportive communities for learning, Purdom thrives in collaborative environments where people are at the heart of her work. As she notes,
“I strive to foster spaces and programs where novices as well as experienced practitioners are encouraged to take creative and intellectual risks.”
Purdom’s research interests include feminist interface design, exploring how digital projects can be empathetic platforms for both the users and the people affected by the content. She and her Scholars’ Lab colleague Jeremy Boggs are in the process of incorporating these principles into the interface of Take Back the Archive, a digital public history project being created by UVA faculty, students, librarians, and archivists.
The post Please join MITH in welcoming Purdom Lindblad to our team! appeared first on Maryland Institute for Technology in the Humanities.