The World Atlas of Language Structures project (http://wals.info) is one of the landmarks of digital linguistics. It covers 192 features across 2678 languages. However, the resulting data matrix is very sparse: of the 514176 possible datapoints (192 × 2678), only about 68000, or 13%, are filled.
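The sparsity figure can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative, and the count of filled cells is the approximate value given in the text, not an exact tally.

```python
# Figures quoted in the text; FILLED_APPROX is an approximation ("about 68000").
N_FEATURES = 192
N_LANGUAGES = 2678
FILLED_APPROX = 68_000

# Size of the full feature-by-language matrix.
possible = N_FEATURES * N_LANGUAGES

# Share of cells that actually hold a value.
coverage = FILLED_APPROX / possible

print(f"possible datapoints: {possible}")
print(f"coverage: {coverage:.1%}")
```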
An interview with Sean Wallis, author of http://corplingstats.wordpress.com/:
What led you to set up the blog?
The blog comes from several sources. My research background is in cognitive science and AI, and in particular machine learning applied to scientific research, and statistics is a key component of that. I have been involved in regular debates about the role of statistical evidence in corpus linguistics over the years, so (for example) you will find some of the same experimental design themes about choice in our 2002 book, http://www.ucl.ac.uk/english-usage/projects/ice-gb/book.htm. I am not a linguist "by trade" but a methodologist, so I can only work by collaborating with and learning from others.
We are happy to announce Glottolog/Langdoc, a comprehensive knowledge base of 104k languoids and 175k references for the Semantic Web.
In linguistics as well as in the Semantic Web world, it is important to clearly identify the concepts one is talking about. Glottolog/Langdoc takes this insight as a starting point and provides 104k Uniform Resource Identifiers (URIs) for languoids and 175k for references to descriptive literature focusing on underdescribed languages.
The NSF Directorates for Social, Behavioral & Economic Sciences (SBE) and Education & Human Resources (EHR), together with the Office of Cyberinfrastructure (OCI), recently announced a solicitation for Building Community and Capacity for Data-Intensive Research (http://www.nsf.gov/pubs/2012/nsf12538/nsf12538.htm?WT.mc_id=USNSF_25&WT....) with a proposal deadline of 2012-05-22. Here are some snippets from the solicitation.
SSWL (Syntactic Structure of the World's Languages) is an open-ended database of syntactic, morphological and semantic properties. Each language is characterized by a set of property-value pairs (e.g., Object Verb: Yes) and by examples that illustrate these property-value pairs. A rich variety of search functions is available, as well as mapping and the creation of similarity trees. The database is open-ended in the sense that (a) new language experts may sign up to add new languages, and (b) new properties may be added.
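To make the idea of property-value pairs concrete, here is a minimal sketch of how two languages could be compared on the properties they both have values for. It is not SSWL's own data format or similarity method, which the announcement does not describe. The languages and the properties other than "Object Verb" are hypothetical placeholders, and the agreement share is just one simple measure that a similarity tree could be built from.

```python
from typing import Dict, Optional

# Hypothetical toy data: only "Object Verb" comes from the announcement;
# the languages, other properties, and all values are placeholders.
lang_a: Dict[str, str] = {"Object Verb": "Yes", "Property X": "No", "Property Y": "Yes"}
lang_b: Dict[str, str] = {"Object Verb": "Yes", "Property X": "Yes"}


def agreement(a: Dict[str, str], b: Dict[str, str]) -> Optional[float]:
    """Share of jointly specified properties on which two languages agree.

    Returns None when the languages have no property in common, since the
    database is sparse and open-ended.
    """
    shared = set(a) & set(b)
    if not shared:
        return None
    same = sum(1 for prop in shared if a[prop] == b[prop])
    return same / len(shared)


print(agreement(lang_a, lang_b))
```

Restricting the comparison to shared properties matters because new properties can be added at any time, so most pairs of languages will not be specified for exactly the same set.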
Call for papers
Papers, posters and software demonstrations are invited on all topics of lexicography, including, but not limited to, the following fields, which are the main focus of the congress:
• Lexicography and National Identity
• Indigenous Languages and Lexicography
• Corpus-driven Lexicography
• Lexicography in Language Technology
• Multilingual Lexicography
• Lexicography and Semantic Theory
• Terminology, LSP and Lexicography
• Reports on Lexicographical and Lexicological Projects
• Other topics
The most recent NSF Computer & Information Science & Engineering (CISE) Computing Research Infrastructure (CRI) solicitation http://www.nsf.gov/publications/pub_summ.jsp?WT.z_pims_id=12810&ods_key=... was posted on 1 April 2001. The next deadline for proposals is Tuesday, 25 October 2011. CRI supports two types of projects:
- Institutional Infrastructure, for either
  - the creation of new computing research infrastructure (II-NEW), or
  - the enhancement of existing such infrastructure (II-EN).
The US National Science Foundation (NSF) has just announced a new Documenting Endangered Languages (DEL) solicitation at http://www.nsf.gov/pubs/2011/nsf11554/nsf11554.pdf, with a deadline of 20 Sept 2011 for proposals (note: not 15 Sept as in past years). Projects must focus on one or more of the following areas:
Linked Data in Linguistics
Linguists from all disciplines produce more and more data and share the challenge of making these data accessible to other researchers in their field and beyond. This concerns not only the general availability of the data but also the representation of their structure. Linked Data is one paradigm that can be employed to tackle this task.
We are happy to announce the workshop "Linked Data in Linguistics" at the annual meeting of the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS) taking place March 7-9, 2012 in Frankfurt a.M., Germany.
This might be of interest to folks: The call for the 2011 CI computing fellowships is out: http://cifellows.org/. These fellowships provide very generous funding:
The CIFellow’s salary for one year, at $75,000.
Health and other fringe benefits for the CIFellow. We expect the host institution to offer the CIFellow the standard package that it normally offers.
Discretionary expenses for the CIFellow, including moving costs as well as other minor costs (such as travel to conferences, etc.).
Indirect costs for the host institution (up to 25%).