(under construction – more coming soon)


This page serves as a repository for stuff I’ve created that I think others might find useful.  Please get in touch if you try to use something linked here and it doesn’t work.

Word length database

This .zip file contains some supplementary materials for my 2016 paper Learnability shapes typology: the case of the midpoint pathology.  In particular, it has .txt files containing parts of the New Testament for 102 languages, in addition to the entire King James Bible and a collection of interviews with various former members of the Beatles.  A summary file is also included, as is a collection of (not very user-friendly, not very efficient) Perl scripts used to determine the distribution of word lengths in each file.  There’s more information about what’s there and how to use it in the folder’s readme.
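For a quick sense of what counting word lengths involves, here is a minimal Python sketch.  It is not one of the Perl scripts in the archive and may not match how they measure length: it assumes words are whitespace-separated and that length is counted in characters after stripping ASCII punctuation from the edges.  The function name word_length_distribution is made up for this illustration.

```python
import string
from collections import Counter


def word_length_distribution(text):
    """Map each word length (in characters) to the number of tokens of that length.

    Assumptions (not taken from the original Perl scripts): words are
    separated by whitespace, and only ASCII punctuation at the edges of a
    token is stripped before measuring its length.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    # Skip tokens that were nothing but punctuation.
    return Counter(len(tok) for tok in tokens if tok)


if __name__ == "__main__":
    sample = "In the beginning, God created."
    for length, count in sorted(word_length_distribution(sample).items()):
        print(length, count)
```

To run it over one of the .txt files, read the file with open(path, encoding="utf-8") and pass the contents in; the encoding is also an assumption, so check the readme first.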

Convergent GLA

This Python script provides an implementation of Magri’s (2012) convergent GLA.  It prints out (i) a ranking trace, (ii) the trial at which the last error occurred, and (iii) the final weights of all constraints, in descending order.  It also writes a log of what happens at each trial to a file.  The input file format mostly follows the OTSoft format, with the following differences: (i) the second row must contain an initial weight for each constraint, and (ii) each input may have one and only one optimal candidate, and that optimal candidate must be assigned a frequency of >= 0.  If that’s unclear, maybe this toy input file will make it clearer!

O/E Calculator

This Python script calculates the observed and expected frequencies of identical consonant co-occurrence.  It takes in a wordlist with one word per line, where each sound is represented by one grapheme.  (Here is an example of a wordlist formatted in the appropriate way.)  It saves a few files that you can play around with (e.g. a file that lists the frequencies of each consonant by word – this makes it easy to pull out the words that have co-occurring [r]s, for example), but most importantly a file that contains the expected and observed co-occurrence frequencies for each consonant.
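To make the O/E idea concrete, here is a minimal Python sketch of one common way to compute it.  It is not the linked script and may differ from it in the details.  It assumes that co-occurrence means two consonants that are adjacent once vowels are skipped, that the vowels are the five graphemes in VOWELS, and that the expected count for a consonant is (times it occurs first in a pair) × (times it occurs second in a pair) / (total pairs).  The names VOWELS and identical_oe are made up for this illustration.

```python
from collections import Counter

# Assumption: which graphemes count as vowels; adjust for your transcription.
VOWELS = set("aeiou")


def identical_oe(words, vowels=VOWELS):
    """Return {consonant: (observed, expected)} for identical-consonant pairs.

    A pair is two consonants that are adjacent once vowels are skipped
    (an assumption of this sketch, not necessarily the original script's).
    """
    first = Counter()     # how often each consonant is the first member of a pair
    second = Counter()    # how often each consonant is the second member
    observed = Counter()  # how often each consonant is paired with itself
    total = 0
    for word in words:
        consonants = [ch for ch in word.strip() if ch not in vowels]
        for c1, c2 in zip(consonants, consonants[1:]):
            first[c1] += 1
            second[c2] += 1
            total += 1
            if c1 == c2:
                observed[c1] += 1
    results = {}
    for c in sorted(set(first) | set(second)):
        # Expected count if the two positions were filled independently.
        expected = first[c] * second[c] / total
        results[c] = (observed[c], expected)
    return results


if __name__ == "__main__":
    for consonant, (obs, exp) in identical_oe(["rare", "tart", "rat"]).items():
        print(consonant, obs, exp)
```

With the three-word list above there are four consonant pairs in total, so [r] comes out with 1 observed pair against 1.5 expected, and [t] with 0 observed against 0.5 expected.  Dividing observed by expected gives the O/E ratio; values below 1 indicate under-representation.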