NWAV 47 is pleased to offer a rich and varied set of workshops.
Thursday, October 18
The Graduate Center, CUNY, 365 5th Avenue
Best Practices in Sociophonetics
Part I: myVoice
Marianna Di Paolo, Adrian Bell, Lisa M. Johnson
(University of Utah)
Our workshop will present myVoice, a web application written by Adrian Bell in the statistical program R (R Core Team, 2016) using the Shiny R package. The app automates the process of recording, uploading, analyzing, and displaying cultural and sociophonetic data. The app interface displays geospatial context; asks the user to respond to prompts eliciting answers to demographic, cultural, and recent-language-use questions; and then asks the user to record a sentence or list of words via the microphone on their computer or mobile device. In creating myVoice, Bell has integrated the Montreal Forced Aligner (McAuliffe, 2017) with Praat (Boersma & Weenink, 2017) and other R packages to (1) record survey responses, geoposition, and audio; (2) align the audio files to text and analyze the sociophonetic and cultural features; and (3) present the analysis to the participant within a few seconds, returning geospatial social network displays of sociophonetic and cultural affinity with other users. myVoice will make it possible to crowdsource sociolinguistic data in any language from any mobile device. We are conducting tests to ensure that the myVoice app records at levels comparable to the recording equipment we have been using for our field recordings. (We believe that it will, given Kardous and Shaw's (2014, 2016) findings.)
We will be using myVoice in a study of the ethnic group formation of the Tongan diaspora community in Utah, based on linguistic and cultural data sampled over time from community members, including their use of Tongan and/or English. The app will prompt them to respond as they go about their normal lives and thus capture the community dynamics as they unfold. We know that immigrant languages are typically replaced by English within three generations in U.S. diaspora communities (Bayley, 2004). However, we do not yet have the fine-grained empirical data needed to elucidate the day-to-day mechanism leading to language shift or to the formation of new contact varieties such as “Tongan Utah English”.
Two additional enhancements will be added to the app and may be ready to present at the workshop. The first enhancement will allow the elicitation of participants’ interpretation of the results displayed by the sociolinguistic app, e.g., “Can you explain why you are in this particular spot on the network?”. The participant’s contribution can take the form of an open response or a selection from a menu of possible explanations. The menu of candidate explanations will be populated by the researchers and by previous open responses from participants. It is expected that over time the menu of explanations will converge on a salient set of factors most likely relevant to ethnic group formation and conformist assimilation. In the next phase, we will be working with video game designers to gamify the myVoice crowdsourcing app.
Part II: CLOx
Alicia Beckford Wassink, Robert Squizzero, Campion Fellin and David Nichols
(University of Washington)
Unscripted vernacular speech is often the desired object of sociolinguistic study. However, because it is labor-intensive and time-consuming, manual transcription of audio recordings remains a major obstacle to the analysis of conversational speech.
Client Libraries Oxford (CLOx) is a new, user-friendly application for sociolinguists developed by the Sociolinguistics Laboratory at the University of Washington. CLOx utilizes Microsoft Azure Cognitive Services Speech API recognition technology to automatically generate plain-text orthographic transcriptions.
CLOx saves time; we estimate that this tool enables transcription of a sociolinguistic interview to be completed in one-fifth or less of the time it would take to produce a fully manual transcription. Another significant advantage of CLOx is that timestamps indicating the start and end time of each audio sample are preserved. This facilitates a range of tasks further downstream in the process of linguistic analysis, including: forced alignment and extraction for phonetic analysis, conversation analysis, part-of-speech tagging, etc. Because transcriptions are in standard, plain-text format (.csv), output is readable by a variety of applications commonly used for analysis and processing of linguistic data (e.g., Microsoft Word, Excel, R, ELAN).
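As a rough illustration of how timestamped plain-text output can feed downstream steps, the Python sketch below reads a transcript in .csv form and computes the duration of each segment. The column names (`start`, `end`, `text`) and the sample rows are assumptions invented for this example; the actual layout of CLOx output may differ.

```python
import csv
import io

# Hypothetical transcript: the column names and rows are invented for
# illustration and are not taken from real CLOx output.
SAMPLE_CSV = """start,end,text
0.00,2.50,so where did you grow up
2.50,6.10,i grew up just outside of seattle
6.10,7.00,oh okay
"""


def load_segments(csv_file):
    """Read rows into dicts with float start/end times (seconds) and text."""
    segments = []
    for row in csv.DictReader(csv_file):
        segments.append({
            "start": float(row["start"]),
            "end": float(row["end"]),
            "text": row["text"].strip(),
        })
    return segments


def segment_durations(segments):
    """Return each segment's duration in seconds, rounded to milliseconds."""
    return [round(seg["end"] - seg["start"], 3) for seg in segments]


segments = load_segments(io.StringIO(SAMPLE_CSV))
durations = segment_durations(segments)
```

With start and end times kept alongside the text, the same rows could be passed on to a forced aligner or imported as an annotation tier, which is the kind of downstream use the timestamps are meant to support.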
This workshop will demonstrate how to use CLOx to generate transcriptions and how to work with the output to perform different kinds of linguistic analysis. For demonstration purposes, we will be working with English-language data, but CLOx allows users to select any of the languages supported by Microsoft Cognitive Services. These include Arabic, Chinese, English (US & GB), French, German, Italian, Japanese, Portuguese, Russian, and Spanish. Participants will have an opportunity to work with their own audio files or use example files provided by the presenters. Participants will first learn how to format audio files for the service, then how to access CLOx to generate a transcription and download the .csv output. We will then cover importing the transcript into ELAN and correcting it manually, using sample data from the Pacific Northwest English Study. We will show an example workflow for correction in ELAN, including the transcription conventions used in the Pacific Northwest English Study, which cover handling speaker overlap, dealing with disfluencies (following Du Bois, 1991), and marking speech to be redacted. Using the study corpus has allowed us to test CLOx’s behavior with a range of vocal qualities, speech rates, and group sizes. Participants will have opportunities to ask questions and offer suggestions to improve CLOx.
Those who wish to actively participate should bring a laptop with ELAN pre-installed. ELAN is available at https://tla.mpi.nl/tools/tla-tools/elan/. Participants who wish to use their own audio files may bring 1-3 .wav recordings, each with a maximum file size of 6MB for stereo audio sampled at a 16 kHz rate. Mono audio is also acceptable. We request that sample recordings be in English and contain the voices of no more than two interlocutors for this demonstration. No prior transcription experience is required.
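Participants who want to check their recordings against these requirements ahead of time could use something like the Python sketch below, which relies only on the standard-library `wave` module. The function name `check_wav`, the reading of "6MB" as 6 × 1024 × 1024 bytes, and the treatment of 16 kHz as the expected sample rate are assumptions made for illustration; the script is not part of CLOx.

```python
import os
import wave

MAX_BYTES = 6 * 1024 * 1024  # assumption: "6MB" read as 6 MiB
EXPECTED_RATE_HZ = 16_000    # assumption: 16 kHz treated as the expected rate


def check_wav(path):
    """Return a list of problems found with a .wav file (empty if none)."""
    problems = []

    # File size check against the assumed limit.
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")

    # Read channel count and sample rate from the WAV header.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()

    if channels not in (1, 2):
        problems.append(f"{channels} channels; expected mono or stereo")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz; expected {EXPECTED_RATE_HZ} Hz")
    return problems
```

Calling `check_wav("interview.wav")` (a hypothetical file name) returns an empty list when nothing is flagged, or a list of human-readable messages describing what to fix.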
Integrated Speech Corpus ANalysis – ISCAN:
A new tool for large-scale, cross-corpus, sociolinguistic analysis
Jane Stuart-Smith (University of Glasgow), Morgan Sonderegger (McGill University), Michael McAuliffe (McGill University)
Methodological and Pedagogical Issues for Undergraduate Researchers in Large Corpus Projects
Michol Hoffman (York), James Stanford (Dartmouth), Sali Tagliamonte (Toronto), Christina Tortora (CUNY), and James Walker (La Trobe)
Organizers: Christina Tortora, Bill Haddican, Michael Newman, and Cecelia Cutler (CUNY)
In recent years, many variationist labs have made efforts to increase the participation of undergraduates in research projects, particularly those involving the creation of large speech corpora. This trend seems to reflect two changes that have shaped sociolinguistic practice: (i) a desire for larger data sets; and (ii) an increased awareness of the value of research apprenticeship in undergraduate teaching and learning outcomes. The workshop organizers take this to be a potentially positive development in the field, but we think it raises several issues that colleagues should address explicitly, including:
1. How to ensure data quality.
2. How to handle sampling, in cases where undergraduates help recruit subjects.
3. How to make sure that the scientific goals of the researchers align well with the educational needs of our UG students, i.e. teaching/learning objectives of the undergraduate program.
4. Best practices in training of undergraduates for data collection.
The workshop will involve presentations from members of four different research groups that have worked with undergraduates, followed by a panel discussion.
Language Variation Suite Toolkit
Olga Scrivner (Indiana University), Rafael Orozco (Louisiana State University), Manuel Diaz-Campos (Indiana University)
Sociolinguistics currently needs tools that reflect changes in technology and new methods. Our workshop will contribute to the field by passing the torch from traditional sociolinguistic tools to new technology.
We have developed a cutting-edge tool for sociolinguistics that is based on state-of-the-art statistical methods: Language Variation Suite Toolkit (www.languagevariationsuite.com). We have applied a new technology, the interactive Shiny web application, to a sociolinguistics framework.
- LVS offers the flexibility of online web-based applications.
- It is accessible from any device, which is important for many sociolinguists who may need to process data during field trips.
- It is interactive, allowing researchers to view, pre-process and examine their data from various angles.
- In addition, with the current shift to R coding, this application provides a gateway into learning R.
- Finally, it is built entirely in R.
Our second web application is Text Mining Tool (http://www.interactivetextminingsuite.com), which might be of interest to sociolinguists who want to learn about interactive text mining, for example, clustering and topic modeling.
Computational Sociolinguistics SOLD OUT!
Jack Grieve (University of Birmingham), Dirk Hovy (Bocconi University), David Jurgens (University of Michigan), Tyler Kendall (University of Oregon), Dong Nguyen (Alan Turing Institute), James Stanford (Dartmouth College), Meghan Sumner (Stanford University), Rachael Tatman (Kaggle)
Over the past decade, a new approach to the study of language variation and change has emerged at the intersection of linguistics and computer science. Research in sociolinguistics, dialectology, and corpus linguistics has increasingly been using advanced quantitative methods to analyze larger and more complex datasets, often harvested from online sources, so as to understand patterns of language use across regions, social groups, and communicative situations. Concurrently, research in computational linguistics has increasingly been concerned with integrating social information into natural language processing systems. Recently these two lines of research have begun to converge, giving rise to the new field of computational sociolinguistics.
In this workshop, we will introduce the field and present a series of studies that exemplify a range of methods currently being applied, including the use of large datasets consisting of social media corpora and crowdsourced surveys, and the use of techniques from data mining and machine learning for data analysis. The workshop will conclude with a panel discussion and a question period.
Eye-tracking for LVC research
Vishal Arvindam, Ailís Cournane
(New York University)
This workshop provides an introduction to the use of eye tracking for linguistic research, with particular focus on how implicit behavioural measures can be used effectively for (a) sociolinguistic research (e.g., McGowan, 2010; Koops et al., 2008; Fricke et al., 2016; Mitterer & McQueen, 2007) and (b) syntactic and semantic variables. After an overview of the method and equipment, we will present a sample study on semantic gender processing (Arvindam, 2018), and then run live demonstrations of both that reading study and a visual-world study we’re currently developing at the Child Language Lab at NYU. We will enlist the help of a lucky audience member (or two) as participants for these demonstrations using our SR Research Eyelink Duo machine!
Workshop participants will leave with an understanding of how eye tracking works, the kinds of questions it can and cannot address, the kinds of data eye trackers can collect and how to interpret them, and what running a study participant involves.
- Arvindam, V. S. (2018). How stereotypical gender is linguistically represented: Evidence from eye movements. Honors thesis. University of Massachusetts, Amherst.
- Fricke, M., Kroll, J. F., & Dussias, P. E. (2016). Phonetic variation in bilingual speech: A lens for studying the production–comprehension link. Journal of Memory and Language, 89, 110-137.
- Koops, C., Gentry, E., & Pantos, A. (2008). The effect of perceived speaker age on the perception of PIN and PEN vowels in Houston, Texas. University of Pennsylvania Working Papers in Linguistics, 14(2), 12.
- McGowan, K. B. (2010). Examining listeners’ use of sociolinguistic information during early phonetic judgments: Evidence from eye-tracking. New Ways of Analyzing Variation 39. San Antonio, TX.
- Mitterer, H., & McQueen, J. M. (2007). Tracking perception of pronunciation variation by tracking looks to printed words: The case of word-final /t/. In J. Trouvain & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007), 1929-1932.
Friday, October 19
New York University
Careers, Variation and Change: Sociolinguists in the Workplace
Lunch provided for PhD students.
Reporting statistics for LVC
Rena Torres Cacoullos (Penn State University), Gregory Guy (New York University)
This is a brown-bag event.
Saturday, October 20
New York University
COSWL Pop-Up Mentoring
Facilitated by the LSA Committee on the Status of Women in Linguistics
Mentors and mentees will have lunch together. More information about this event is available here.
If you wish to be a mentor, please fill out this form
If you wish to be a mentee, please fill out this form
Going Viral: Shopping sociolinguistic research to the media and to the community
Shana Poplack, Nathalie Dion, Suzanne Robillard, Basile Roussel (University of Ottawa Sociolinguistics Lab)
This is a brown-bag event.