Harry Potter meets the Middle Ages

Harry Potter: A History of Magic has been a rip-roaring success. Not only has every session of every day of our exhibition sold out (a first for the British Library), and not only did we sell more advance tickets than Tate's Hockney blockbuster, but the accompanying books have been bestsellers...

Transcription Is Complicated

In a recent PMLA issue on digital methods, Johanna Drucker concludes her article “Why Distant Reading Isn’t” by claiming that distant reading’s

literalness makes it the closest form of reading imaginable. What distant reading lacks is distance. That distance is critical; it is the space between the literal text and the virtual text, between the inscriptional, notational surface and the rhetorical, cognitive effect that produces a text. (633)

In other words, when an algorithm “reads” a corpus by scouring it for patterns of one kind or another, it doesn’t transform the text the way that a human reader does. It can get so “close” because it reads without the powerful and dynamic cognitive filters through which human readers conjure, out of the written word, literary worlds. For Drucker, closing the gap between “reader” and text in this way is one of the things that makes distant reading “the closest form of reading imaginable.”

But, crucially, human decisions shape how a program closes that gap in the first place. As Drucker argues elsewhere in the article, “modeling and parameterization”—decisions made by scholars and programmers as to what a program will look for and, therefore, be able to find—not only “shape the terms by which a text is analyzed to produce quantitative data,” but are also “rendered almost invisible by the forms in which results are expressed” (632). These before-the-fact decisions, then, are what allow an algorithm to read from such a close range—ignoring the “rhetorical, cognitive effect that produces a text,” they engage with “the inscriptional, notational surface” according to a set of pre-established instructions to produce results of one form or another. In this sense, some might argue that the “distance” distant reading “lacks” is the gap in which literature happens: the unpredictable, unwieldy interpretive space in which a reader transforms text on a page or screen into a living work of art.


As I assemble my corpus of poetry from the Black Arts Movement, I’ve grown more interested in this gap between “inscriptional, notational surface” and “rhetorical, cognitive effect.” In the past three weeks I have transcribed approximately twenty books of poetry. This is, in many ways, the kind of “reading” that we expect a machine to be good at: tedious and time-consuming, sure, but also mechanical, even mindless—something lacking that human “distance” Drucker describes above.

When it comes to transcription, however, the devil is in the details. And anyone familiar with using OCR software to transcribe text from images knows that machines still struggle to get all the details right. After scanning pages into images and processing them with a program like ABBYY FineReader, the resulting text files are often garbled with mistakes—errors that require a human reviewer to identify, compare with the original, and correct by hand. Though an extremely useful piece of software, a program can’t be all things to all people, and I found this especially true for experimental texts like the poetry in my corpus that employ unusual indentation, spacing, punctuation, capitalization, and non-traditional spellings.

But I already knew that ABBYY FineReader would have trouble transcribing text from images from my corpus. That’s one of the reasons I decided to transcribe them by hand in the first place. What I didn’t anticipate was how much trouble I—a presumably well-trained human reader—would have transcribing text from physical documents into a text editor. This was the case even when my documents were fully intact and the text completely legible.

Over the course of the past few weeks, I found that this hair’s-breadth, closest-form-of-reading-imaginable reading—the kind that seems to go no further than inscriptional surface—is also a complex task requiring creativity, imagination, and resourcefulness. Moreover, rather than being a mindless or merely mechanical task, the transcription of these texts frequently presented thorny decisions that demanded my judgement as a reader, scholar, and programmer. Arriving at these decisions often required not only a knowledge of digital methods, but also of bibliographical methods, questions of poetic form, and more practical project management skills.


Take, for example, lines from “a/coltrane/poem,” the final poem from Sonia Sanchez’s 1970 collection We A BaddDDD People (and a poem that got me listening to Coltrane’s music while transcribing):

         (soft       rise up blk/people.  rise up blk/people
         chant)   RISE.  &  BE.  what u can.
                         MUST BE.BE.BE.BE.BE.BE.BE.BE-E-E-E-E-
                                        yeh. john coltrane.
         my favorite things is u.

Like many of the poems from We A BaddDDD People, “a/coltrane/poem” makes dramatic use of indentation, punctuation, the spaces between words, and the spaces between lines. Even transcribing these lines to be published here on the Scholars’ Lab WordPress site, however, raises a number of technical and practical issues. For example, there is no easy way to produce this kind of whitespace in HTML. When web browsers parse the whitespace in poetry—indentation, tabs, etc.—they more or less get rid of it. While investigating the poetry of Mina Loy, Andrew Pilsch argues in his chapter in Reading Modernism with Machines that “the nature of HTML resists—even prevents—the easy introduction of … typographic experimentation” (245)—something he discusses earlier on his blog. Like Pilsch, I ended up having to make use of the “&nbsp;” non-breaking space—something Pilsch discusses in more depth—to shoehorn spaces into the poem so it would appear correctly, I hope, in web browsers. This means that, in HTML, the above section of poetry looks like this:
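As a rough illustration of the workaround (a sketch only, not the markup actually used for this post), the hypothetical helpers below turn leading spaces and runs of two or more spaces into non-breaking space entities and join the lines with <br> tags. The function names are invented for this example, the indentation of the sample lines is shortened, and leaving single spaces between words untouched is an assumption of the sketch.

```javascript
// Hypothetical helper: browsers collapse ordinary spaces, so swap each
// leading space and each space in a run of two or more for "&nbsp;".
// Single spaces between words are left alone (an assumption of this sketch).
function preserveSpaces(line) {
  return line.replace(/^ +| {2,}/g, (run) => "&nbsp;".repeat(run.length));
}

// Hypothetical helper: join the transcribed lines with <br> so that each
// line of verse starts on its own line in the browser.
function poemToHtml(lines) {
  return lines.map(preserveSpaces).join("<br>\n");
}

// Two lines from the excerpt above, with a shortened indent.
const excerpt = [
  "   yeh. john coltrane.",
  "my favorite things is u.",
];

console.log(poemToHtml(excerpt));
```

Even for two short lines, the output is already crowded with entities, which gives a sense of how quickly a heavily spaced poem becomes hard to read as markup.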

In other words, a complete mess. But before trying to print parts of this poem in HTML through WordPress, at an even more basic level I had to get it into a text editor, a process which also raised a number of questions requiring practical decisions. As I type out the above lines into Atom, I have to ask: how many spaces should separate the words that seem to be a stage direction on the left— “(soft / … / chant)”—from the words on the right?

In an ideal world, I would have access to all materials used by Dudley Randall’s Broadside Press to publish this 1970 edition, as well as publication materials from all subsequent editions. Comparing these various documents, I would be able to get a better sense of the typographical materials and units of measurement used to represent Sanchez’s poem on paper. This would provide me with a more holistic sense of how to represent Sanchez’s poem in my text editor. However, given constraints on my time and resources as a Ph.D. student, as well as the size of my corpus, deciding how deep I want to dig in the archive to answer such questions requires serious consideration. Moreover, as far as I can tell, while there were printings of this edition of We A BaddDDD People as late as 1973, there were no other new editions of the work—so the edition I have is the only one I have to work with.

So when faced with the question—how many spaces should separate these words in a text file?—I looked at how far a space gets me in relation to other characters, gauged this against the kinds of spaces in poems elsewhere in the book, and made an educated guess: three after “(soft”, and one after “chant)”. The same goes for the space between “&  BE.”, which is slightly larger than the gaps separating most other words. I’m not sure exactly how much larger this gap is, so I make another educated guess, giving it two spaces instead of one.

In a multiple-page poem defined by such visual experimentation, however, trying to measure and align every word, space, and line break so that the text in my text editor resembles the text on the page—even roughly—is a real challenge. In some cases, given the functionalities of the editor I’m working with, this challenge becomes an impasse. Even in the example above: the space separating the line “yeh. john coltrane.” from the preceding line—“BE-E-E-E-E-E-”—matches the size of other line breaks within stanzas in this volume. But the space separating this line from its succeeding line—“my favorite things is u.”—is both larger than line breaks within stanzas and smaller than breaks indicating new stanzas. While transcribing, I normally represent adjacent lines in a poem with adjacent lines in my text editor; I represent stanza breaks with an empty line. How do I represent in my text editor a line break that is effectively 1.5 times the size of a normal line break? Without reworking my entire spacing system across all of my poems, I can’t—so I decided to transcribe them as adjacent lines despite the clearly visible difference on the page.

Textual Scholarship

The nature of these challenges would come as no surprise to scholars—like Drucker—interested in textual study, bibliographical study, and scholarly editing. Having had the great fortune of taking a seminar here at UVA on textual criticism and scholarly editing with David Vander Meulen, a course at the Rare Book School on the book in the American industrial era with Michael Winship, as well as many thoughtful conversations with friend, colleague, and textual scholar James Ascher, I’ve had the opportunity to adopt many of these methodological lenses as my own. These lenses help us to ask questions like: what, exactly, is a literary work? Is Sanchez’s We A BaddDDD People the words printed in ink on the pages of the physical book I’m holding? If there are discrepancies between this book and later editions, how do we reconcile them? And, more relevant to my current project, how does the digital copy of this work in my text editor differ from the bound copy held at UVA’s library from which I’m making my transcription?

In considering these questions, I find helpful the vocabulary used by textual scholar G.T. Tanselle that distinguishes between document, text, and work. To offer a greatly reduced shorthand for Tanselle’s nuanced thinking on these distinctions: there are texts of works and there are texts of documents. Texts of documents refer to the words, markings, or inscriptions on a physical object that is completely unique though it may seem to be identical to other artifacts. Texts of works, on the other hand, are slightly more complicated—they consider the words as instructions for performing that intangible thing that is a verbal literary work in the minds of readers.

Though they may seem abstract, conceptual distinctions such as these have emerged from some of the most concrete, hands-on, rubber-meets-the-road scholarship in literary thought—for example, the kind of thinking that goes into examining multiple versions of a work (like King Lear) so as to produce a single scholarly edition. A distinction like Tanselle’s between texts of documents and texts of works offers a guiding light for scholars down in the often bewildering weeds of a given archive. As Tanselle argues in “Textual Criticism and Deconstruction,”

The distinction between the texts of documents (handwritten or printed, private or published) and the texts of works is basic to textual criticism. The effort to “reconstruct” or “establish” the texts of works presupposes that the texts we find in documents cannot automatically be equated with the texts of the works that those documents claim to be conveying. (1)

In other words, scholars must exercise a great deal of judgement as they try to reconcile meaningful—and sometimes extremely significant—discrepancies between versions of a given physical text as found in physical documents in their efforts to determine the text of the work itself. The role that “intentions” play in all this—as in the words that were meant to be put down—and how best to account for the mediating forces and actors at work in the publication of a book, is a point of debate in textual scholarship, often dependent on the kinds of research questions one hopes to investigate (for more reading here, see D. F. McKenzie’s Bibliography and the Sociology of Texts, Jerome McGann’s The Textual Condition, and Tanselle’s “Textual Criticism and Literary Sociology”). And as many scholars have argued, these conceptual distinctions central to textual criticism and thought extend to digital artifacts as well—see, for example, Matthew Kirschenbaum’s “.txtual condition.” Scholarship such as this helps me to think through how a hand-typed .txt file of We A BaddDDD People relates to a physical codex made of paper and ink.

Stanza Breaks

Again, part of the purpose of this post is to expand on just how complicated transcription can be when it comes to performing text analysis on a literary corpus. Moreover, I’m hoping to think through how these practices are bound up with traditional bibliographical lines of inquiry. In short, I’m hoping here to offer further examples of how reading a literary text at extremely close range—Drucker’s “inscriptional, notational surface”—involves all kinds of human thought and judgement. Even if this thought and judgement are hidden in things we might take for granted—like the distinction between thinking of the book I’m holding as being Sonia Sanchez’s We A BaddDDD People, as opposed to a unique physical document inscribed with a text that intends to convey We A BaddDDD People.

So I want to offer a couple more examples of typographical concerns that came up during my transcription process. Unlike extra spaces between words in a line, these issues also more directly impact the kinds of results my analysis aims to produce, as they impact what “counts” as a line or stanza in my model.

The first has to do with stanza breaks. In my day-to-day reading practice, identifying a stanza break usually feels straightforward: lines grouped together in a poem, probably separated by white space. Digging a little deeper, The Princeton Encyclopedia of Poetry & Poetics begins its entry by defining a stanza as “a unit of poetic lines organized by a specific principle or set of principles” (1358). Likewise, The Oxford Dictionary of Literary Terms defines a stanza first and foremost as

A group of verse lines forming a section of a poem and sharing the same structure as all or some of the other sections of the same poem, in terms of the lengths of its lines, its metre, and usually its rhyme scheme. In printed poems, stanzas are separated by spaces.

While this definition doesn’t help us much with something like Sanchez’s “a/coltrane/poem”—a poem that more or less flies in the face of traditional stanzaic form—it does seem like it would help us if we wanted to make a “stanza” a parameter in our analytical models, or even in figuring out how best to separate lines and stanzas in our text files.

But even in more traditionally stanzaic poems—of which there are many in my corpus—deciding what “counts” as a stanza can get messy. Something as simple as page breaks, for instance, can wreak havoc in making such decisions. This is particularly the case when only one edition of a work exists, and one doesn’t have access to original manuscripts.

Consider, for example, a poem titled “Malcolm Spoke/  who listened?” from Haki R. Madhubuti’s 1969 collection Don’t Cry, Scream, published with Broadside Press. The poem is stanzaic, and distinguishes stanzas with what seem to me like normal breaks. These groupings, however, have no regular rhyme scheme, no regular use of capitalization, no regular number of lines, no tight thematic or narrative structure (i.e. a point of view that alternates from stanza to stanza), and no regular pattern in punctuation (i.e. some stanzas conclude with no punctuation while some conclude with a period). And, crucially, the poem extends partway onto a second page. These are the two groups of lines on either side of the page break:

animals come in all colors.
dark meat will roast as fast as whi-te meat
especially in
the unitedstatesofamerica’s
self-cleaning ovens.

For a few reasons, I decided to transcribe these two sections as a single stanza. First, at a more visual, design level, the poem has no other stanzas as short as two lines. The book as a whole, in fact, has very few two-line stanzas, and while there are a few single unattached lines, they usually come right at the end of a poem. In comparison with the rest of the poem and the other poems in the collection, then, it seemed more likely to be a larger stanza than not.

More convincingly, however, my feeling that these two chunks are one unit comes from the poem itself—the group of lines above seems, to me, to develop a coherent line of poetic thought. The first two lines introduce the metaphor of meat of “all colors” roasting, and the following line (after the page break) intensifies this imagery by locating this metaphor in the United States and its “new /self-cleaning ovens.” The lines after the page break make most grammatical and metaphorical sense when taken as part and parcel of the lines prior to the page break.

This is not to say that other poems in this volume don’t break up grammatical expressions across stanzas—they definitely do. Other poems in this volume also develop specific metaphors or images over the course of several stanzas. But with this poem in particular, stanzas seem to be doing something else. Each has a kind of conceptual focus—they stand alongside one another as evenly-weighted, coherent units of expression. For example, the stanza preceding the one quoted above is as follows:

the double-breasted hipster
has been replaced with a
dashiki wearing rip-off
who went to city college
majoring in physical education.

This stanza develops, from line to line, a description of—and stance towards—this “dashiki wearing rip-off” who replaces the “double-breasted hipster.” Each line builds on the last, slowly unfolding different aspects of how one figure “has been replaced” with another: the speaker discloses a skeptical attitude towards these figures, identified by what they wear, where they went to school, and what they studied. Like the stanza with the page break, this group of lines seems to me to develop a coherent line of thought that doesn’t spill over into subsequent stanzas.

Understanding these stanzas in light of the poem as a whole, then, aligns with this reading: the rhythm of the poem as it moves from stanza to stanza seems to emerge from a feeling of moving from one idea to the next—and, for me as a reader, breaking this group of lines at the page break into two different stanzas feels like it disrupts that rhythm.

It could certainly be argued that the group of lines with the page break was meant to be two stanzas specifically so as to disrupt the rhythm of this stanzaic form—that such a disruption is vital to the poem’s meaning. But, as is the case with scholarly editing, I had to make a judgement call to proceed with my project. So I considered everything I knew, tried to find out more if possible, and made the best decision I could given what I had in front of me.


One last example. Lines of poetry can get very long. Sometimes, lines get too long for the physical documents on which they’re inscribed. During an enlightening conversation with Jahan Ramazani on this and many other issues addressed in this post, he gave me the example of editing The Norton Anthology of Modern and Contemporary Poetry and having to print and number the extremely long lines of Allen Ginsberg’s “Howl.” Central to this decision-making process was considering standard practice on what the Chicago Manual of Style calls “Long lines and runovers in poetry.”

The CMS defines runovers as “the remainder of lines too long to appear as a single line,” which are “usually indented one em from the line above.” In other words, when lines get too long—as in Ginsberg’s poetry, or Walt Whitman’s—a hanging indent about an em-dash in length tells the reader that the line was too long for the book. The entry concludes, however, by indicating that it might not always be so clear when an indentation is a runover and when it’s a new line:

Runover lines, although indented, should be distinct from new lines deliberately indented by the poet … Generally, a unique and uniform indent for runovers will be enough to accomplish this.

As we’ve seen already just in this post, much of the poetry in my corpus rebels against traditional poetic form, including standard indentation and spacing practices. Determining whether or not a group of words is one or two lines, however, is extremely important for my project. The “line” is the basic unit I’ve been asking sentiment analysis tools in TextBlob and NLTK to evaluate for sentiment. In short: what counts as a line really matters, and ambiguities surrounding runovers could very well add up to have a significant impact on the results of my analyses.
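To make the stakes concrete, here is a minimal sketch of how a runover convention changes what counts as a line. It assumes a hypothetical transcription rule in which a runover is marked by an indent of exactly four spaces, which is not a convention described anywhere above. The function name and the sample text are likewise invented for illustration.

```javascript
// Hypothetical convention: a line indented by exactly RUNOVER_INDENT spaces
// continues the previous line; any other indent is a deliberate new line.
const RUNOVER_INDENT = 4;

function mergeRunovers(text) {
  const merged = [];
  for (const raw of text.split("\n")) {
    if (raw.trim() === "") continue; // stanza breaks are skipped in this sketch
    const indent = raw.length - raw.trimStart().length;
    if (indent === RUNOVER_INDENT && merged.length > 0) {
      // Treat the row as a runover: append it to the previous line.
      merged[merged.length - 1] += " " + raw.trim();
    } else {
      merged.push(raw.trimEnd());
    }
  }
  return merged;
}

// Invented sample text, not taken from the corpus.
const sample = [
  "a long first line that ran past the",
  "    margin",
  "  a deliberately indented line",
].join("\n");

console.log(sample.split("\n").length);     // rows as typed
console.log(mergeRunovers(sample).length);  // lines after merging runovers
```

Under this assumed rule the same three typed rows yield either three or two units for analysis, depending on whether the indented word is read as a runover.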

An excellent example of this appears a few pages earlier in Madhubuti’s Don’t Cry, Scream, in a poem titled “Gwendolyn Brooks.” The poem is available online through the Poetry Foundation, and it appears in my physical copy as it does on this website, indentations and all. Halfway through the poem there is a distinct sequence, over a dozen lines long, that lists a series of portmanteaus describing different kinds of “black”—from “360degreesblack” to “blackisbeautifulblack” and “i justdiscoveredblack.” Over the course of this sequence, there are three indented lines, each one word long, that interrupt the otherwise steady stream of images.

At first blush, these lines struck me as runovers. The list-like nature of the lines seemed to lend itself to running a little long—as we see with a poet like Whitman, once a list starts, it can just keep going and going. Moreover, no thematic or poetic reason jumped out at me as to why someone might indent these words as opposed to any others. Of course, there is the possibility that such indentations were completely on purpose, and are part of a project to disrupt and transform any resonance with someone like Whitman and the canon he represents. Sitting in front of my computer, a little bleary-eyed from all the transcribing, I honestly wasn’t sure.

So I began looking for other appearances of the poem. The version published by the Poetry Foundation complicated my initial thought that these one-word indented lines were runovers. Jahan Ramazani also suggested that, given the importance of anthologies to the Black Arts Movement, even if a book has no later editions, individual poems therein might appear somewhere in a collection.

Such a realization, however, presents another fork in the road of my research. As a researcher committed to being as thoughtful and thorough as possible as I work with the poetry from a revolutionary art movement, I am delighted to know that I still might be able to pursue questions that I thought would remain unanswered (i.e., “is this a runover line or two separate lines?”). As a researcher with limited resources, however, I have to decide whether or not pursuing these questions will be the best use of my time and energy in this particular project. There are a lot of anthologies containing poetry from the Black Arts Movement out there, so I have to weigh the time it would take to locate and look through them all for instances of those poems from my ~20 book corpus that may have runover lines, against the potential impact it would have on the results of the analyses I hope to perform. As it currently stands, I’ve made a note of this particular ambiguity and plan to reassess what I should do with it and others like it after assembling the rest of the corpus.

Final Thoughts

As this post has hopefully shown, transcribing texts from book to screen can get very tricky. More than a simple act of mechanical reproduction, it can stump us with questions about literary works that seem to have no discernible answers. From one moment to the next, it can demand a working knowledge of bibliographical methods; digital methods; aesthetic form; and how to manage a project’s resources. And—as Drucker above argues regarding text analysis more generally—navigating these questions requires rigorous human judgement every step of the way. Even in situations where the practicalities of project management and the realities of our textual archive make this judgement feel all-too-fallible.

There are other, important aspects of this human judgement which I haven’t had time to think through as much as I would like to have in this post. For example, digging deeper into those questions explored by Andrew Pilsch mentioned above that investigate the challenging ways in which web browsers are designed to parse the whitespace in poetry in HTML. Or, how the default parameters of the basic tokenizing packages in NLTK throw away whitespace—the idea that the programmers behind these text analysis technologies view their standard use as most likely to focus on text, not the spaces between text.

Very long story short: transcription is complicated! And I hope this post has done something to foreground some of those invisible, behind-the-scenes decisions that—like modeling and parameterization—give shape to the results a text analysis project produces.

Introducing Storage Access API

In June last year we introduced Intelligent Tracking Prevention (ITP). ITP is a privacy feature that detects which domains have the ability to track the user cross-site and either partitions the domain’s cookies or purges its website data altogether.

The strongest developer feedback we got on ITP was that it needs to provide a way for embedded cross-site content to authenticate users who are already logged in to their first-party services. Today we are happy to provide a solution in the form of Storage Access API. It allows for authenticated embeds while continuing to protect customers’ privacy by default.

Partitioned Cookies and Embedded Content

Let’s say that socialexample.org is embedded on multiple websites to facilitate commenting or “liking” content with the user’s socialexample ID. ITP will detect that such multi-page embeds give socialexample.org the ability to track the user cross-site and therefore deny embedded content from socialexample.org access to its first-party cookies, providing only partitioned cookies. This breaks the user’s ability to comment and like content unless they have interacted with socialexample.org as a first-party site in the last 24 hours. (Please see the original ITP blog post for the exact rules around partitioned cookies.)

The same goes for embedded third-party payment providers and embedded third-party videos from subscription services. As soon as ITP detects their tracking abilities, it denies them first-party cookie access outside the 24 hour window, and the embedded content treats the user as logged out even though they are logged in.

We’ve made tradeoffs for user privacy. But it would be even better if we could provide the benefits of being logged in to third party iframes, provided that the user is actually interested in using them, while still protecting privacy.

The Solution: Storage Access API

The solution is to allow third-party embeds to request access to their first-party cookies when the user interacts with them. To do this, we created the Storage Access API.

The Storage Access API offers two new functions to cross-origin iframes — document.hasStorageAccess() and document.requestStorageAccess(). It also offers the embedding top frame a new iframe sandbox token — “allow-storage-access-by-user-activation”.

Storage access in this context means that the iframe has access to its first-party cookies, i.e. the same cookies it would have access to as a first-party site. Note that storage access does not relax the same-origin policy in any way. Specifically, this is not about third-party iframes getting access to the embedding website’s cookies and storage, or vice versa.

WebKit’s implementation of the API only covers cookies for now. It does not affect the partitioning of other storage forms such as IndexedDB or LocalStorage.

Check For Storage Access

A call to document.hasStorageAccess() returns a promise that resolves with a boolean indicating whether the document already has access to its first-party cookies or not. Should the iframe be same-origin as the top frame, the promise resolves with true.

var promise = document.hasStorageAccess();
promise.then(
  function (hasAccess) {
    // Boolean hasAccess says whether the document has access or not.
  },
  function (reason) {
    // Promise was rejected for some reason.
  }
);

Request Storage Access

A call to document.requestStorageAccess() upon a user gesture such as a tap or click returns a promise that is resolved if storage access was granted and is rejected if access was denied. If storage access was granted, a subsequent call to document.hasStorageAccess() will resolve with true. Iframes need to call this API explicitly so that developers control when the document’s cookies change.

<script>
function makeRequestWithUserGesture() {
  var promise = document.requestStorageAccess();
  promise.then(
    function () {
      // Storage access was granted.
    },
    function () {
      // Storage access was denied.
    }
  );
}
</script>
<button onclick="makeRequestWithUserGesture()">Play video</button>
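The two calls can also be used together. The sketch below shows one possible pattern rather than anything the API prescribes: a hypothetical helper, requestAndVerify, that requests access and then confirms the result with hasStorageAccess(). Taking the document as a parameter is an assumption made here so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper, not part of the Storage Access API.
// Resolves with true if access was granted and confirmed, false otherwise.
async function requestAndVerify(doc) {
  try {
    // The request is the first call made, so it runs synchronously
    // inside the user gesture handler that invoked this function.
    await doc.requestStorageAccess();
  } catch (reason) {
    return false; // The promise was rejected: access was denied.
  }
  // After a successful request this is expected to resolve with true.
  return doc.hasStorageAccess();
}

// In an iframe, call it from a tap or click handler, for example:
// button.addEventListener("click", function () {
//   requestAndVerify(document).then(function (ok) { /* load content */ });
// });
```

Returning a boolean instead of letting the rejection propagate is a design choice of this sketch; code that needs the rejection reason would handle the promise directly, as in the example above.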

The iframe needs to adhere to a set of rules to be able to get storage access granted. The basic rules are:

  • The iframe’s cookies need to be currently partitioned by ITP. If they’re not, the iframe either already has cookie access or cannot be granted access because its cookies have been purged.
  • The iframe needs to be a direct child of the top frame.
  • The iframe needs to be processing a user gesture at the time of the API call.

Below are the detailed rules for the promise returned by a call to document.requestStorageAccess(). When we say eTLD+1 we mean effective top-level domain + 1. Examples of eTLDs are .com and .co.uk, so an example of an eTLD+1 would be social.co.uk but not sub.social.co.uk (eTLD+2) or co.uk (just eTLD).

  1. If the sub frame is sandboxed but doesn’t have the tokens “allow-storage-access-by-user-activation” and “allow-same-origin”, reject.
  2. If the sub frame’s parent is not the top frame, reject.
  3. If the browser is not processing a user gesture, reject.
  4. If the sub frame’s eTLD+1 is equal to the top frame’s eTLD+1, resolve. As an example, login.socialexample.co.uk has the same eTLD+1 as www.socialexample.co.uk.
  5. If the sub frame’s origin’s cookies are currently blocked, reject. This means that ITP has either purged the origin’s website data or will do so in the near future. Thus there is no storage to get access to.
  6. If all the above has passed, resolve.
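Read as a decision procedure, the numbered rules above can be sketched as follows. This illustrates the order of the rules only and is not WebKit’s implementation: the frame object and its field names are hypothetical, and the eTLD+1 helper uses a tiny hard-coded suffix list where a real implementation would consult the full Public Suffix List.

```javascript
// Illustrative only: a real implementation would use the Public Suffix List.
const KNOWN_ETLDS = ["co.uk", "com", "org"];

// Returns the eTLD+1 of a host name, or null if the host is a bare eTLD
// or no known eTLD matches.
function etldPlusOne(host) {
  const labels = host.toLowerCase().split(".");
  for (let i = 0; i < labels.length; i++) {
    // Longer suffixes are tried first, so "co.uk" wins over a shorter match.
    const suffix = labels.slice(i).join(".");
    if (KNOWN_ETLDS.includes(suffix)) {
      return i === 0 ? null : labels.slice(i - 1).join(".");
    }
  }
  return null;
}

// frame is a hypothetical plain object describing the sub frame:
// { host, topHost, sandboxTokens (array, or null if not sandboxed),
//   parentIsTop, hasUserGesture, cookiesBlocked }
function decideStorageAccess(frame) {
  const required = [
    "allow-storage-access-by-user-activation",
    "allow-same-origin",
  ];
  if (frame.sandboxTokens !== null &&
      !required.every((token) => frame.sandboxTokens.includes(token))) {
    return "reject"; // Rule 1
  }
  if (!frame.parentIsTop) return "reject"; // Rule 2
  if (!frame.hasUserGesture) return "reject"; // Rule 3
  const own = etldPlusOne(frame.host);
  if (own !== null && own === etldPlusOne(frame.topHost)) {
    return "resolve"; // Rule 4
  }
  if (frame.cookiesBlocked) return "reject"; // Rule 5
  return "resolve"; // Rule 6
}

console.log(etldPlusOne("login.socialexample.co.uk"));
console.log(etldPlusOne("www.socialexample.co.uk"));
```

Because the same-site check in rule 4 comes before the blocked-cookies check in rule 5, a sub frame that shares the top frame’s eTLD+1 resolves regardless of its ITP status.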

Access Removal

Storage access is granted for the life of the document as long as the document’s frame is attached to the DOM. This means:

  • Access is removed when the sub frame navigates.
  • Access is removed when the sub frame is detached from the DOM.
  • Access is removed when the top frame navigates.
  • Access is removed when the webpage goes away, such as a tab close.

Sandboxed Iframes

If the embedding website has sandboxed the iframe, it cannot be granted storage access by default. The embedding website needs to add the sandbox token “allow-storage-access-by-user-activation” to allow successful storage access requests. The iframe sandbox also needs the tokens “allow-scripts” and “allow-same-origin” since otherwise it can’t call the API and doesn’t execute in an origin that can have cookies.

<iframe sandbox="allow-storage-access-by-user-activation allow-scripts allow-same-origin"></iframe>
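An embedding page could sanity-check its own markup with something like the following sketch. The helper name is hypothetical. It only inspects the sandbox attribute string for the three tokens named above and says nothing about whether the browser supports them.

```javascript
const REQUIRED_SANDBOX_TOKENS = [
  "allow-storage-access-by-user-activation",
  "allow-scripts",
  "allow-same-origin",
];

// Hypothetical helper: sandboxValue is the iframe's sandbox attribute
// string, or null when the iframe is not sandboxed at all.
// Returns the required tokens that are missing (an empty array if none).
function missingSandboxTokens(sandboxValue) {
  if (sandboxValue === null) return []; // Not sandboxed: nothing to add.
  const present = sandboxValue.split(/\s+/).filter(Boolean);
  return REQUIRED_SANDBOX_TOKENS.filter((token) => !present.includes(token));
}

// In a page, for example:
// missingSandboxTokens(iframeElement.getAttribute("sandbox"));
console.log(missingSandboxTokens("allow-scripts"));
```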

A Note On Potential Abuse

To make the user experience as smooth as possible, we have decided not to prompt the user when an iframe calls the Storage Access API. ITP’s rules are an effective gatekeeper for who can be granted access, and for the time being we rely on them.

However, we will monitor the adoption of the API and make changes if we find widespread abuse where the user is clearly not trying to take some authenticated action in the calling iframe. Such API behavior changes may include prompts, abuse detection resulting in a rejected promise, rate limiting of API calls per origin, and more.


Storage Access API is available in Safari 11.1 on iOS 11.3 beta and macOS High Sierra 10.13.4 beta, as well as in Safari Technology Preview 47+. If you’re interested in cross-browser compatibility, please follow the whatwg/html issue for Storage Access API.


Please report bugs through bugs.webkit.org, or send feedback on Twitter to the team @webkit, or our evangelist @jonathandavis. If you have technical questions about how the Storage Access API works, you can find me on Twitter @johnwilander.

Release Notes for Safari Technology Preview 50

Safari Technology Preview Release 50 is now available for download for macOS Sierra and macOS High Sierra. If you already have Safari Technology Preview installed, you can update from the Mac App Store’s Updates tab. This release covers WebKit revisions 227873-228454.

Service Workers

  • Added support for cache storage of blob responses (r228326)
  • Changed to queue a microtask when a waitUntil() promise is settled (r227959)
  • Delayed service worker process creation until actually needed (r227989)
  • Delayed saving service worker registrations to disk until after the activation succeeds (r228180)
  • Fixed issue with IndexedDB databases not persisting inside Service Workers (r228230)
  • Fixed issue where service worker jobs would sometimes stop being processed (r228101)
  • Fixed clearing a registration to properly null out its workers before setting their state to "redundant" (r228015)
  • Fixed clearing all service worker registrations to wait for importing the service worker registration to finish (r228025, r228034)
  • Started nulling out registration.installing before setting service worker state to “redundant” when install fails (r227997)

Web App Manifest

  • Changed the default Web App Manifest scope to the containing directory of the start URL when 'scope' is not specified (r228036)
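
As a rough illustration of what "the containing directory of the start URL" means, the following sketch resolves "." against a start URL using the standard URL constructor. The function name defaultManifestScope and the example.com URLs are hypothetical, and this is not WebKit's code; it only shows the kind of result the default produces.

```javascript
// Hypothetical helper: derive a default scope from a manifest's start URL.
function defaultManifestScope(startUrl) {
  // Resolving "." against a URL yields its containing directory and
  // drops the file name, query string, and fragment.
  return new URL(".", startUrl).href;
}

console.log(defaultManifestScope("https://example.com/app/index.html"));
// prints "https://example.com/app/"
```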

Payment Request

  • Changed show() to take an optional PaymentDetailsUpdate promise (r228195)
  • Fixed payment sheet not dismissing when calling complete() with result "unknown" or "fail" (r228342)


  • Implemented createImageBitmap(HTMLVideoElement) (r228092)


  • Corrected invalidating style for sibling combinators on class change (r227956)
  • Fixed rendering SVG images with same size as WebGL texture (r228213)
  • Fixed computing inline-block baseline for vertical-lr (r227947)

Web Inspector

  • Added listing of Canvases, Programs, and Recordings to the sidebar (r228301)
  • Fixed the Canvas tab tree selection abruptly changing when selecting a recording frame (r228362)
  • Fixed pasting multiple properties to create properties instead of causing a bad property in the Styles Sidebar (r228030)
  • Fixed the completion popover not hiding when switching panels in the Styles Sidebar (r228232)
  • Fixed typing a value and quickly moving focus away sometimes displaying an outdated value in the Styles Sidebar (r228296)
  • Updated the Elements tab to have “Jump to Layer” functionality (r228215)

Web Driver

  • Changed cookies returned by automation to have expiry time in seconds (r227891)
  • Changed to not return an error if resizing or moving a window has no effect (r228434)
  • Prepended a dot to the domain when missing in the addCookie command (r228087, r228371)


  • Fixed Accessibility getting notified when a web process cancels suspension (r228350)
  • Deferred attribute computation until needed (r228279)
  • Deferred focus notifications for UI elements (r228417)


  • Changed to throw an exception when using structured cloning on a Symbol (r227969)
  • Fixed an incorrect case of variable resolution to consult the global lexical environment first before the global object (r227898)

Doctoral studentship, Digital Grammar of Greek Documentary Papyri, Helsinki

Posting for Marja Vierros. Full details and applications forms at University of Helsinki site.

Applications are invited for a doctoral student for a fixed term of up to 4 years, starting in the fall of 2018, to work at the University of Helsinki. The selected doctoral candidate will also need to apply for acceptance in the Doctoral Programme for Language Studies at the Faculty of Arts during the fall application period. The candidate’s main duties will consist of PhD studies and writing of a dissertation.

The doctoral candidate will study a topic of his/her choice within the historical development and linguistic variation of Greek in Egypt (e.g. certain morphosyntactic variation as a sign of bilingualism), by way of producing a selected, morphosyntactically annotated corpus of documentary papyri, according to Dependency Grammar. The candidate’s duties include participation in regular team meetings and presenting his/her research at seminars and academic conferences. The candidate is expected to also take part in designing the online portal that presents the results of the project.

The appointee to the position of doctoral student must hold a Master’s degree in a relevant field and must subsequently be accepted as a doctoral candidate in the Doctoral Programme mentioned above. Experience in linguistic annotation, corpus linguistic methods or programming is an asset, but not a requirement. The appointee must have the ability to conduct independent scientific research. The candidate should have excellent analytical and methodological skills, and be able to work both independently and collaboratively as part of a multidisciplinary scientific community. The successful candidate is expected to have excellent skills in written and oral English. Skills in Finnish or Swedish are not required. Relocation costs can be negotiated and the director will offer help and information for the practicalities, if needed.

Who’s winning the medal race, depending on how you weight the medals

Every year, we look at the medal counts of each country. Who’s winning? It depends on how much value you place on each medal. Do you only count the golds and disregard silver and bronze? Do you just treat all medals the same? Josh Katz for The Upshot lets you test all the possibilities with this interactive.

Apply different values to each medal type by mousing over the x-y coordinate plane and see how the country rankings shift.
