Ongoing Projects


Improved Data Collection from Online Sources with Query Expansion and Active Learning

This is the first chapter of my dissertation. I propose the use of query expansion and active learning to improve data collection from large online databases such as Twitter.

Text as Policy: Measuring Policy Similarity Through Bill Text Reuse
with Bruce Desmarais, Matthew Burgess, Eugenia Giraudy

We propose the use of text-sequencing algorithms, applied to legislative text, to identify bills that introduce similar policy proposals.

Privacy Protection for Natural Language
with Alexander Ororbia and Joshua Snoke

We propose using character-level neural networks to generate synthetic text, allowing data release while protecting the privacy of the individuals who produced the text.

Published Projects


Human Rights Text as Data
PLoS ONE
with Chris Fariss, Charles Crabtree, Zachary M. Jones, Megan Biek, Taranamoll Kaur, Ana Ross, and Michael Tsai

We introduce and make publicly available a large corpus of digitized primary source human rights documents.

Exploratory Data Analysis with Random Forest
Journal of Open Source Software
with Zachary M. Jones

An R package that uses random forests for exploratory data analysis. Most of the software was written by Zach Jones. There is also a working paper that introduces more of the theory for a social science audience.