Ongoing Projects

Improved Data Collection from Online Sources with Query Expansion and Active Learning

This is the first chapter of my dissertation. I propose the use of query expansion and active learning to improve data collection from large online databases such as Twitter.

Privacy Protection for Natural Language
with Alexander Ororbia and Joshua Snoke

We propose using character-level neural networks to generate synthetic text, allowing data release while protecting the privacy of the individuals who produced the text.

Online Public Opinion and Refugee Allocation

Using Twitter data, I study how Germans react to the allocation of refugees in close geographic proximity. I find that German Twitter users show more interest in the topic before the facility opens.

Published Projects

Active Learning Approaches for Labeling Text
with Blake Miller and Walter R. Mebane Jr.

In this paper, we study the benefits of active learning for social science applications, including novel problems that have not been addressed so far, such as intercoder (un)reliability. We find that active learning is most beneficial when data are imbalanced and that it works even with noisily labeled data.

Text as Policy: Measuring Policy Similarity Through Bill Text Reuse
Policy Studies Journal
with Bruce Desmarais, Matthew Burgess, and Eugenia Giraudy

We propose the use of text-sequencing algorithms, applied to legislative text, to identify bills that introduce similar policy proposals.

Exploratory Data Analysis with Random Forest
Journal of Open Source Software
with Zachary M. Jones

An R package for using Random Forest in exploratory data analysis. Most of the software was written by Zach Jones. There is also a working paper that introduces more of the theory for a social science audience.

Human Rights Text as Data
with Chris Fariss, Charles Crabtree, Zachary M. Jones, Megan Biek, Taranamoll Kaur, Ana Ross, and Michael Tsai

We introduce and make publicly available a large corpus of digitized primary source human rights documents.