Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have prevented political scientists from using texts in their research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already delivered on part of that promise. But there are pitfalls to using automated methods. Automated text methods are useful, but incorrect, models of language: they are no substitute for careful thought and close reading. Rather, automated text methods augment and amplify human reading abilities. Using the methods requires extensive validation in each application. With these guiding principles for using automated methods, we clarify misconceptions and errors in the literature and identify open questions in the application of automated text analysis in political science. For scholars to avoid the pitfalls of automated methods, methodologists need to develop new methods designed specifically for the ways social scientists use quantitative text methods.
Awarded the Political Analysis Editor’s Choice Award for an article providing an especially significant contribution to political methodology. Replication data are available.