I am Associate Professor of Sociology at Princeton University, where I am also affiliated with the Politics Department, the Office of Population Research, the Princeton Institute for Computational Science and Engineering, the Center for Information Technology Policy, and the Center for the Digital Humanities. I currently serve as an Associate Editor at Sociological Methods & Research, handling text-as-data manuscripts. I develop new quantitative statistical methods for applications across computational social science. I completed my PhD in Government at Harvard in 2015, where I had the good fortune of working with the interdisciplinary group at IQSS. I also earned a master's degree in Statistics from Harvard in 2014.
I've worked in the fields of text as data, causal inference, and the combination of the two. My publications can be found on my publications page or on Google Scholar (which is probably more up to date). A few highlights:
- Justin Grimmer, Molly Roberts and I have a 2022 book at Princeton University Press, Text as Data: A New Framework for Machine Learning and the Social Sciences, which provides a guide to using computational text analysis to learn about the social world. I gave a short recorded talk on this area of work for the Princeton Talks series.
- My former graduate student Naoki Egami (Columbia Political Science), my postdoc Musashi Hinck (now Research Scientist at Intel AI Labs), Naoki's graduate student Hanying Wei and I developed Design-based Supervised Learning, a flexible strategy for addressing error in machine-learning produced outcomes or covariates. Software is available at http://dsl.software. We have a new (June 2024) working paper for social scientists.
- My graduate students Ian Lundberg (now an assistant professor at UCLA Sociology) and Rebecca Johnson (now assistant professor at Georgetown Public Policy) published "What is Your Estimand?" in American Sociological Review about the disconnect between claims and evidence in the social sciences.
- Molly Roberts, Dustin Tingley and I developed the Structural Topic Model, an unsupervised topic model geared towards inference in the social sciences. The accompanying software stm is available on CRAN and at structuraltopicmodel.com. It also includes a full vignette demonstrating its use.
- Pedro Rodríguez, Arthur Spirling, and I wrote a paper in American Political Science Review on using contextually specific word embeddings to explore meaning. We also have a non-technical explainer and the conText R package, all available on GitHub.
- With many coauthors, I worked on papers about causal inference with text, including adjusting for confounding with text and text as treatment and outcome. I also contributed to a recent survey of these topics aimed at a computer science audience.
I teach undergraduate and graduate statistics as well as the occasional course on text analysis. Course materials for these classes as well as from my summer methods camp and sociology statistics reading group are available on my teaching page.
You can find preprints of my publications on this site and on Google Scholar. Replication data for most of my work is available on my Dataverse.