I am Associate Professor of Sociology at Princeton University, where I am also affiliated with the Politics Department, the Office of Population Research, the Princeton Institute for Computational Science and Engineering, the Center for Information Technology Policy, and the Center for the Digital Humanities. I develop new quantitative statistical methods for applications across computational social science. I completed my PhD in Government at Harvard in 2015, where I had the good fortune of working with the interdisciplinary group at IQSS. I also earned a master's degree in Statistics from Harvard in 2014.
I've worked extensively on methods for computer-assisted text analysis and, with Justin Grimmer, published an introduction to the field. Molly Roberts, Dustin Tingley and I have developed the Structural Topic Model, an unsupervised topic model geared towards inference in the social sciences. The accompanying software, stm, is available on CRAN and at structuraltopicmodel.com, and includes a full vignette demonstrating its use. A different approach, explored with Pedro Rodríguez and Arthur Spirling, uses contextually specific word embeddings to explore meaning. We have a paper, a non-technical explainer, and the conText R package, all available on GitHub. I've also worked on the connections between text analysis and causal inference. Recent papers cover adjusting for confounding with text, and text as treatment and outcome. I also contributed to a recent survey of these topics aimed at a computer science audience.
Justin Grimmer, Molly Roberts and I have a 2022 book at Princeton University Press, Text as Data: A New Framework for Machine Learning and the Social Sciences, which provides a guide to using computational text analysis to learn about the social world. A related perspective is given in our recent annual review piece on machine learning. I recently gave a short recorded talk on this area of work for the Princeton Talks series.
I've also written a paper, "What is Your Estimand?", with my graduate students Ian Lundberg (now an assistant professor at Cornell Information Science) and Rebecca Johnson (now an assistant professor at Georgetown Public Policy) about the disconnect between claims and evidence in the social sciences. The paper encourages more precision in specifying the goals of quantitative work, where the target of estimation is too often specified in terms of regression coefficients rather than model-free estimands.
One of the recent papers I am most excited about introduces Design-based Supervised Learning, a flexible strategy for addressing error in outcomes or covariates produced by machine learning. This is particularly useful when working with annotations from large language models, which are accurate but known to have biases. We are working on a package; if you are interested, sign up for the mailing list here: https://forms.gle/8p9CpEMsiQ8DCmrCA.
I teach undergraduate and graduate statistics as well as the occasional course on text analysis. Course materials for these classes, as well as from my summer methods camp and sociology statistics reading group, are available on my teaching page.