The World Atlas of Language Structures project (http://wals.info) is one of the landmarks of digital linguistics. It records 192 features across 2,678 languages. The resulting feature-by-language matrix is very sparse, however: of the 514,176 possible datapoints (192 × 2,678), only about 68,000, or roughly 13%, are actually filled.
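The coverage figure is easy to sanity-check. A quick back-of-the-envelope calculation in Python, using the feature and language counts quoted above and the approximate 68,000 filled datapoints reported:

```python
# Back-of-the-envelope check of the WALS coverage figures quoted above.
# 192 features and 2,678 languages are from the text; 68,000 is the
# approximate number of filled datapoints it reports.
features = 192
languages = 2678
possible = features * languages   # total cells in the feature-by-language matrix
filled = 68_000                   # approximate filled datapoints
coverage = filled / possible

print(possible)                   # 514176
print(f"{coverage:.1%}")          # 13.2%
```

The numbers check out: the matrix has 514,176 cells, of which roughly 13% carry a value.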
A few announcements of the good news that arXiv has found sufficient funding to continue:
LJ's infoDOCKET edits and presents the Cornell University Library press release.
The Chronicle's Wired Campus also writes up the story: "New Grant and Governance Structure Will Help Support arXiv."
It's good news for Open Access fans and for those supporting library publishing and archiving initiatives.
An interview with Sean Wallis, author of http://corplingstats.wordpress.com/:
What led you to set up the blog?
The blog comes from several sources. My research background is in cognitive science and AI, in particular machine learning applied to scientific research, and statistics is a key component of that. Over the years I have been involved in regular debates about the role of statistical evidence in corpus linguistics, so you will find, for example, some of the same themes of experimental design and choice in our 2002 book, http://www.ucl.ac.uk/english-usage/projects/ice-gb/book.htm. I am not a linguist "by trade" but a methodologist, so I can only work by collaborating with and learning from others.
For me, the best way to learn R, especially on the visualization side of things, is to dive right in. Grab some data and make some charts, or better yet, find a graph you like and try to replicate it. R core functionality and the many available packages let you do a lot without having to know what's going on underneath. I use this approach in Visualize This and the tutorials around here.
I like the satisfaction of immediate results. Then I learn the nitty-gritty later. That said, it doesn't hurt to familiarize yourself with the environment. Also, visualization is only a small part of what you can do with R, so it can help to know what else you can do analysis-wise.
Direct from R
For starters, you can turn to R itself. Open the console and enter help.start(). Your browser will open with links to a few manuals and a list of available packages. Of main interest is An Introduction to R. It's in HTML format, but you can also grab the PDF version. I don't recommend reading it from beginning to end, unless you like falling asleep at your desk.
Using R for Data Analysis and Graphics — By Maindonald, this covers similar topics as An Introduction, but it also offers exercises at the end of each chapter.
R for Beginners — This one by Emmanuel Paradis is a bit more manual-like than the previous, but it couldn't hurt to have it available.
R Graphics — Paul Murrell's book is too advanced for people new to programming. Plus, it's priced academically (read: expensive textbook). But if you're already a coder familiarizing yourself with R's graphics engine, this is a good one. Don't get this if you're looking for a package or plotting-function reference.
Online and Shorter
The manuals and books can be a lot to take in, though, when you want to get to work on your own data as soon as possible. Here are a couple of resources you can take in a nibble at a time.
R twotorials — This is a collection of 90 two-minute videos by Anthony Damico. Prepare yourself for some fast talking. I'm a slow person in general, so it's rushed for me, but I imagine this could be an interesting three-hour cram session. Just don't blame me if your head explodes.
R Bloggers — There are people scattered across the Web who blog about R. This is simply an aggregator of many of those feeds.
Additionally, despite what some say, the documentation for R is actually pretty good if you know how to use it. Simply enter a question mark followed by the function name for a description of that function and its usage. For example, enter ?plot for documentation on the plot() function.
Google is also helpful, which often takes you to Stack Overflow. A search for R used to turn up all sorts of unrelated junk, but it has gotten a lot better. Sometimes it's helpful to include "r-project" or "rstats" in your query to make the search for a letter less ambiguous.
Finally, #rstats on Twitter is the common hashtag for questions and comments.
Thanks to LJ's infoDOCKET:
"...collection of more than 2.2 million images going back to the mid-1800s, the photographs feature all manner of city oversight — from stately ports and bridges to grisly gangland killings."
"...a catalog of more than 80,000 Einstein-related documents, and a visual display of 2,000 documents up to the year 1921..."
"...Cage fans can celebrate Cage’s centennial with a curated series of browsable Cage curios. Want to see him play amplified cacti and plant matter with a feather, review his notes from 1939, or view a 1960 TV performance of “Water Walk”? You can do all that and more on the extensive digital project."
This could be exciting!
From INFOdocket:
A group of 17 European partner institutions has joined forces in the “European Newspapers” project and will, over the next three years, provide more than 10 million newspaper pages to the Europeana service.
Each library participating in the project will distribute digitized newspapers and full-texts free of any legal restrictions to Europeana. There will be a special focus on newspapers published during the First World War, thus providing a meaningful addition to the resources aggregated by the current Europeana 1914-1918 project.
Additional Details in the Complete Announcement
Dear members of the Cyberling community,
Upon the kind invitation of Emily Bender, this is to inform you that FRIAS -- the Freiburg Institute for Advanced Studies -- and the Max Planck Institute for Evolutionary Anthropology (Leipzig) proudly announce the availability of a fascinating new OPEN ACCESS online tool which can be exploited both in research and teaching on the grammars of varieties of English worldwide:
Where do the "25,000 books in the sciences, social sciences, and arts and humanities" come from? The list of scholarly publishers does not read as very diverse to me (no De Gruyter? PUF? And what scholarly books does ProQuest publish?). All languages? Who selects them? On what criteria? How is chapter-level data for a monograph (i.e., not collected essays or conference proceedings) analysed? I find the information from Thomson rather vague. And how do the "Links from book and book chapter records to full text" work?
The press release touts that the books included "[date] back to 2005." That's only five years of books -- a long time in the sciences, but not in the humanities and many social sciences. At this rate, the value for most researchers will not be fully demonstrated for many years to come (especially as only 18% of the books will be in the "arts and humanities," the fields that rely most heavily on the print monograph).
Other comments? Opinions?
From INFOdocket :
Maps dating back to 1906 are available as scanned digital files, or can be viewed online using a handy web map viewer. Made up of over 1,000 maps, the archive includes all five past editions of the Atlas of Canada (1906 to 1995), the Canadian sector of the International Map of the World (1956 to 1987), and the first Glacier Atlas of Canada (1969 to 1972). Map topics include population, culture, aboriginal peoples, economics, transportation, the environment, and historical themes.
Direct to Atlas of Canada Archives (Historical Maps)