In my research, I use computational models to investigate language processing and first language acquisition. I collect data from lab-based behavioral experiments and from web experiments on Amazon Mechanical Turk, and I make extensive use of natural language corpora, including CHILDES, Speechome, the British National Corpus, Google N-grams, and many others.
Here are a few active projects:
Wordful is a smartphone app for tracking early vocabulary growth. Wordful builds on proven checklist-based methods to support more engaging, high-touch longitudinal studies of language acquisition, with model-based sampling logic that maximizes the utility of caregiver responses. Multiple caregivers can contribute data for the same child, and each caregiver can contribute data for multiple children. A powerful scheduling and templating system makes the app highly extensible, so it can be customized for many kinds of longitudinal studies of language development. Wordful is undergoing pilot testing at Stanford and the University of Wisconsin in Winter 2018/2019. Joint work with Mika Braginsky, Ryan Warner, Dr. George Kachergis, Benny DeMayo, and Dr. Michael C. Frank.
Telephone is a web-based experimental platform for running large-scale audio-based games of telephone with adult participants. This process of repetition, which researchers call serial transmission, yields datasets with distinctive statistical properties that can provide valuable insights into the mechanisms underlying human speech recognition. The resulting data are especially useful for evaluating probabilistic models of language structure, and can even be used as a data source for constructing better speech recognition models. Joint work with Sathvik Nair and Dr. Tom Griffiths.
When do children start to develop abstract representations of language structure? In this paper, we use a hierarchical Bayesian model to look for evidence of grammatical generalization in children’s earliest use of articles (“a,” “an,” and “the”). The model, when applied to a large set of longitudinal developmental corpora, yields evidence of minimal generalization before two years of age, but a rapid increase thereafter. Joint work with Dr. Michael C. Frank, Dr. Roger Levy, and Brandon Roy.
childes-db is a set of software tools for cognitive scientists, psychologists, and linguists who want to work with child language corpora in the Child Language Data Exchange System (CHILDES). childes-db provides a versioned set of reference parsings, direct MySQL access, an R API, and web-based visualizations for many common tasks. Joint work with Alessandro Sanchez, Mika Braginsky, Kyle MacDonald, Dr. Dan Yurovsky, and Dr. Michael C. Frank.