Nov 07 2011
 

by Sebastian Drude

On Tuesday, 11 October 2011, the new unit of the Max Planck Institute for Psycholinguistics, “The Language Archive” (TLA), was officially launched at a public event with more than 150 guests and speeches by eminent representatives from Germany and the Netherlands.

Many more people came than expected: there were not even enough seats for all the guests at the TLA launch in the headquarters of the Berlin-Brandenburgische Akademie der Wissenschaften (BBAW) at the Gendarmenmarkt in the center of Berlin. The BBAW is one of the three supporting institutions of TLA, together with the Dutch Koninklijke Nederlandse Akademie van Wetenschappen (KNAW) and the German Max-Planck-Gesellschaft (MPG).

The guests were offered coffee and snacks, but first and foremost a wealth of content: five eminent representatives of the new unit's major stakeholders gave fascinating talks on different topics, all related to the ongoing and future activities of TLA. On the one hand, these were the representatives of the three supporting institutions: Wolfgang Klein for the MPG, Angelika Storrer for the BBAW, and Theo Mulder for the KNAW. On the other hand, Wilhelm Krull represented the VolkswagenStiftung, the funding agency that has supported the programme “Documentation of Endangered Languages” (DOBES) since 2000, and Nikolaus P. Himmelmann represented DOBES itself. The DOBES archive is in many respects the core of the archive hosted by TLA. After the talks, Paul Trilsbeek offered a look into the archive itself.

The full program and topics of the speeches

Welcome and objectives for the language archive
Prof. Dr. Wolfgang Klein
Director at the Max Planck Institute for Psycholinguistics

Language research and language documentation in the digital age
Prof. Dr. Angelika Storrer
Language Centre (Zentrum Sprache) of the BBAW

E-science: a major challenge for the humanities
Prof. Dr. Theo Mulder
Research Director of the KNAW

Documenting endangered languages: a task for science and society
Dr. Wilhelm Krull
Secretary General of the VolkswagenStiftung

How linguistics found (and is still finding) its way to empirical research
Prof. Dr. Nikolaus P. Himmelmann
University of Cologne

A look into the archive
(interactive presentation)

The TLA Opening in the media:

  • The TLA opening in the paper press
    Sep 15 2011
     

    by Lena Karvovskaya and Soraya Hosni

    The DoBeS project “Languages of Southwest Ambrym” is happy to invite you to an exhibit in the newly opened Humboldt-Box exhibition center in the heart of Berlin. The exhibit “Sprachdokumentation auf Südwest-Ambrym” (“Language documentation on Southwest Ambrym”; flyer with more information) will be open to the public from 1 July to 31 December 2011.

    The project team members wanted the installation to present the different ways in which culture, language and knowledge are transmitted in written societies (books and recordings) and oral societies (sand drawing and storytelling). The highlights of the installation are sandroings, a unique art form practiced in Vanuatu. An example of such a performance is shown in the short film “The Liliwi masks story”, projected onto the floor. The film shows an elderly man drawing complex geometric figures in the sand with a single continuous finger movement, gradually forming a specific picture. The drawing is followed by a story or a description; together they make up a sandroing performance. In the Liliwi masks story, the sand drawing illustrates the narrative.

    A typical sandroing

    The exhibit also shows a real sandroing left by Abel Taho from Ambrym when he was our guest in Berlin. Visitors can also try the performance themselves: all you need to do is follow the instructions of the young girl in the video, in which Joelyne teaches German children how to draw a breadfruit. Additionally, the installation includes a film on the process of linguistic fieldwork. It shows how the recordings are transcribed and translated and how a dictionary is compiled. There is also a beautiful illustration for the dictionary by local artist Joebang Maaseng.

    For those who want to see and hear more about the “Languages of Southwest Ambrym”, there is a YouTube channel where Soraya Hosni shares her work. At the moment it contains the film about language documentation, the video of the Liliwi sandroing performance and two films with instructions on how to make a sandroing yourself. The channel will be updated regularly with new films.

    Visitors at the Ambrym exhibition

    The project “Languages of Southwest Ambrym” is also presented to the broader public through “Science movies”, the video blog of the VolkswagenStiftung. “Wer spricht noch Daakaka?” (“Who still speaks Daakaka?”) is a series of 10 short films, shot by Susanne Fuchs and Soraya Hosni, in which we follow them on their journey from Berlin to Ambrym. We learn about daily life on the island, from preparing meals and basic hygiene to how houses are built and marriages are celebrated. We can admire the unique volcanic landscape and tropical vegetation, but we also learn how the “Languages of Southwest Ambrym” team conducts linguistic and ethnographic fieldwork and collaborates with local leaders, schools and children to make the most of the research for the survival of the language and culture for future generations. (More about “Science movies” can be found on the accompanying website.)

    The project “Languages of Southwest Ambrym” started in August 2009. It investigates three language varieties spoken on Ambrym, a volcanic island in the northern part of Vanuatu: Daakaka, Daakiye and Dalkalaen. The goal of the project is to document the linguistic and cultural heritage of the people of Ambrym. During extensive fieldwork sessions, the team members record custom stories and cultural practices. Among other things, the project has created a collection of sandroings, each documented together with its accompanying spoken performance.

    The team members are: Prof. Dr. Manfred Krifka, Soraya Hosni, Kilu von Prince, Dr. Susanne Fuchs and Lena Karvovskaya (student assistant). (To learn more about the project “Languages of Southwest Ambrym”, visit the official websites at the MPI or the ZAS.)

    Feb 12 2011
     

    by Paul Trilsbeek

    Field linguists often ask me whether they shouldn’t be recording audio in high definition, 24 bit 96 kHz format, because their recorder has this option and the higher the quality the better, right? Well, not really. I’ll try to explain why it doesn’t make much sense to do so and why we even convert all audio recordings that we receive at The Language Archive to 16 bit 44.1 or 48 kHz.

    When the digital audio CD standard was developed, it was argued that a digital representation of the audio signal using 16 bits and a sampling frequency of 44.1 kHz was sufficient to capture all the details a human being would be able to hear in a musical recording. For most types of music that is indeed the case; only some highly dynamic music, with both very loud and very quiet passages, might not fit within the 96 dB of dynamic range that 16 bits of audio resolution offer. Nonetheless, companies selling audio equipment, such as Philips and Sony, saw a need to introduce newer formats such as the Super Audio CD and DVD-Audio at the end of the nineties, quite possibly driven by the idea of having consumers replace their perfectly fine CD players with the latest state of the art. Both turned out to be commercial failures. Still, high-definition audio has gained some ground in the recording industry and, in recent years, also in “prosumer” audio recording equipment.
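    The 96 dB figure can be checked with a little arithmetic: each extra bit doubles the number of amplitude levels a sample can take, which adds 20·log10(2) ≈ 6.02 dB of theoretical dynamic range. The sketch below assumes ideal linear PCM and ignores dither as well as the extra ~1.76 dB often quoted for a full-scale sine wave; the function name `dynamic_range_db` is just an illustrative choice.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of ideal linear PCM with the given bit depth.

    Ratio between full scale and one quantization step: 20 * log10(2 ** bits).
    """
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits} bit: {dynamic_range_db(bits):.1f} dB")
# 16 bit: 96.3 dB
# 24 bit: 144.5 dB
```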

    Before I address the question of whether humans can actually hear a difference between HD and regular CD-quality audio, let me give some arguments why, from a technical point of view, it makes little to no sense for field linguists to record in 24 bit 96 kHz or higher.

    Many cheap portable audio recording devices these days offer the option to record in 24 bit at 96 kHz. Recording with a sampling frequency of 96 kHz means that in theory you can record frequencies up to 48 kHz, more than double the highest frequency that (young) human beings can hear and far beyond the highest frequency components present in a speech signal (about 7 kHz). The built-in microphones in these recorders, however, capture nothing above about 16 kHz, so in order to record higher frequencies one needs an external microphone. There are microphones on the market that record frequencies up to 40 or 50 kHz, but these are not the kind of microphones a linguist would typically take into the field, even if they were within budget (>3000 € a piece). The same is true for dynamic range. 16-bit recordings can have a theoretical dynamic range of 96 dB; 24-bit recordings can have a dynamic range of 144 dB. The background noise in a very quiet room has a sound pressure level of about 20-30 dB, and the human pain threshold lies around 130 dB. Human speech has a dynamic range of about 40 dB. Very good microphones have a dynamic range of about 120 dB, but the type of microphone a linguist is likely to use in the field does not exceed about 75 dB. From a technical point of view, recording high-definition audio therefore only makes sense with top-quality equipment, for example in a recording studio or a high-end digitization facility.
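    Two of the numbers above follow from simple rules: a sampling rate can represent frequencies up to half its value (the Nyquist frequency), and ideal linear PCM offers roughly 6.02 dB of dynamic range per bit. The sketch below, using hypothetical helper names and that idealized model, turns the second rule around and asks how many bits a given dynamic range actually needs. The 40, 75 and 120 dB inputs are the figures quoted in this post.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit for ideal linear PCM


def nyquist_khz(sample_rate_khz: float) -> float:
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_khz / 2


def bits_needed(dynamic_range_db: float) -> int:
    """Smallest bit depth whose theoretical range covers the given dynamic range."""
    return math.ceil(dynamic_range_db / DB_PER_BIT)


print(nyquist_khz(96))    # 48.0
print(nyquist_khz(44.1))  # 22.05
print(bits_needed(40))    # 7   (dynamic range of human speech)
print(bits_needed(75))    # 13  (typical field microphone)
print(bits_needed(120))   # 20  (very good microphone)
```

    Under these assumptions, even a very good 120 dB microphone needs about 20 bits, and a typical 75 dB field microphone fits comfortably within 16 bits.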

    Some argue that recording in 24 bit allows one to leave more “headroom” for unexpected peaks when setting the recording level. This is only true, though, for the analog line-level signal that goes into the recorder's analog-to-digital converter. Most portable audio recorders only let one adjust the input gain of the microphone preamplifier, which should be set properly anyway to achieve a good signal-to-noise ratio, regardless of whether one records in 16 or 24 bit.

    Some analogue carriers can actually reproduce sound beyond the limits of the digital audio CD specification. For example, 1/4-inch open-reel audio tape recorded and played back on a studio recorder with Dolby SR noise reduction can achieve a dynamic range of over 100 dB. Commercially produced vinyl records can in some cases contain frequencies of up to 50 kHz. For archives dealing with these kinds of materials, it makes sense to digitize them in high-definition formats in order to capture the originals faithfully.

    It is still debated whether humans can actually hear the difference between CD-quality and high-definition audio. Audiophiles claim that the presence of frequencies above the human hearing limit does influence the frequencies that we do hear. Blind listening tests, however, have shown that even expert listeners performed at chance level when asked to judge whether a recording was high definition or not (Meyer and Moran, 2007). To rule out differences between the recordings themselves, the same high-definition recordings were played both with and without a device in the playback chain that reduced them to regular CD quality. The rest of the playback setup (loudspeakers, amplifiers, cables, etc.) was left identical.

    The main disadvantages of recording with high sampling frequencies and bit depths are that the recordings take up more storage space and are less compatible with audio software and hardware. Recordings made in 24 bit/96 kHz take up about three times as much storage space as CD-quality recordings, and even though flash memory cards get cheaper every month, this is still a drastic reduction in recording capacity for no real-world benefit in quality. Recording in 24 bit at normal sampling frequencies (44.1 kHz/48 kHz) creates files that are 50% larger than 16-bit files, which isn’t too dramatic and could be justified when using very high-grade microphones and recording equipment. Because not all audio software and hardware can play back high-definition formats, working with such files on a computer may cause problems. As an archive, we would therefore need to create additional copies in standard CD quality so that everyone can use the files. Instead of creating duplicate files in different qualities, we have chosen to normalize high-definition audio and convert it to regular 16 bit at 44.1 or 48 kHz. The normalization step before the conversion ensures that we use the full 96 dB of dynamic range that 16 bits offer, which is more than enough to retain the full quality of the recordings we receive.
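    The storage figures and the normalize-then-convert step can be sketched as follows. The size calculation multiplies sampling rate, bytes per sample, number of channels and duration for uncompressed PCM (stereo is assumed). The normalization function is a hypothetical, simplified illustration, not the archive's actual conversion tool: it assumes floating-point input samples, applies peak normalization so that the loudest sample reaches 16-bit full scale, and rounds to integers. A production converter would also apply dither and, for 96 kHz material, a proper resampling filter; neither is shown here.

```python
def bytes_per_hour(sample_rate_hz: int, bits: int, channels: int = 2) -> int:
    """Size of one hour of uncompressed linear PCM audio, in bytes."""
    bytes_per_sample = bits // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


def normalize_to_int16(samples: list[float]) -> list[int]:
    """Hypothetical helper: peak-normalize float samples to 16-bit full scale.

    No dither and no sample-rate conversion; illustration only.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty list): nothing to scale.
        return [0] * len(samples)
    gain = 32767 / peak  # loudest sample maps to the largest positive int16 value
    return [round(s * gain) for s in samples]


cd = bytes_per_hour(44_100, 16)  # CD quality, stereo
hd = bytes_per_hour(96_000, 24)  # 24 bit/96 kHz, stereo
print(cd)                        # 635040000 (about 635 MB per hour)
print(f"{hd / cd:.2f}")          # 3.27
print(bytes_per_hour(48_000, 24) / bytes_per_hour(48_000, 16))  # 1.5
print(normalize_to_int16([0.1, -0.4, 0.25]))
```

    Compared with 16 bit/48 kHz, the 24 bit/96 kHz ratio is exactly 3; compared with 44.1 kHz CD audio it is about 3.3.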

    References:

    E. Brad Meyer and David R. Moran (2007). “Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback”, Journal of the Audio Engineering Society, 55(9), pp. 775-779.

    Jul 09 2010
     

    by Jacquelijn Ringersma and Paul Trilsbeek

    Language documentation is a field of linguistics that has undergone a “technology-driven” change over the last 10 to 15 years. Linguists have been going into the field for decades, making sound recordings of languages and linguistic events. However, the miniaturization of recording equipment has made it much easier to make large quantities of high-quality audio recordings. In addition, the arrival of affordable, high-quality video equipment has allowed documentation work to extend from audio to the visual dimension. This made it possible to document languages within their natural and cultural context, and it triggered the establishment of a branch of linguistics whose main goal is the creation of rich multimedia corpora for languages threatened with extinction. In addition to large amounts of primary audio and video recordings, numerous derived resources are produced: annotations and transcriptions, lexica, grammars, field notes, etc.

    The DoBeS (Dokumentation Bedrohter Sprachen/Documentation of Endangered Languages) programme, which started about 10 years ago, was among the first funding initiatives for endangered language documentation projects. An important aspect of this programme was the establishment of a central, specialized archive to take care of the long-term preservation of the valuable material collected by the documentation projects. The central archive, which is based at the Max Planck Institute for Psycholinguistics, was made an essential part of the programme because it had become clear that large amounts of recordings of languages and cultures were in danger of being lost forever. Old tapes and films that are not stored in specialized climate-controlled rooms degrade rapidly, but the situation is even worse for modern digital storage media such as DVDs and hard disks. Even if the media survive, technology changes so fast that it is very unlikely there will still be equipment around to read today’s storage media 20 years from now. A specialized digital archive continuously migrates the stored material to the latest storage technology and also migrates the stored file formats should they become obsolete.

    Some researchers have doubts about storing their resources in an online archive. The arguments presented to us take the form: (1) Once my material is in there, I will not be able to get it out; or (2) Other researchers will use my material without giving me credit and do all kinds of nice things with it. However, when you store material in the MPI archive, you maintain full control over access to the data through an online access management system (AMS). You are the owner of the data, and you will remain the owner of the data. You decide to whom you grant access. This also opens up opportunities to give access to members of the speech communities or to relatives of those recorded.

    The MPI archive also accepts deposits from linguists who are not affiliated with the MPI or DoBeS. Storing your data in the MPI archive has the advantage that the data is stored in an organized manner and that you can use online tools to search through it. You can also use online tools to visualize your data in an attractive manner. But most importantly, we will safeguard your data by making multiple backup copies in the Netherlands and Germany, by always using state-of-the-art storage technology and by migrating to newer file formats should the current ones become obsolete in the future.

    If you are interested in storing your language data in the MPI archive, please inquire about the conditions with one of the archive managers: Paul Trilsbeek or Jacquelijn Ringersma.