Text as Policy: Measuring Policy Similarity Through Bill Text Reuse

with Bruce Desmarais, Matthew Burgess, Eugenia Giraudy

We propose the use of text-sequencing algorithms, applied to legislative text, to identify bills that introduce similar policy proposals. We present three ground truth tests, applied to a corpus of 500,000 bills from US state legislatures. First, we show that bills introduced by ideologically similar sponsors are more likely to exhibit a high degree of text reuse. Second, we show that bills classified by the National Conference of State Legislatures as covering the same policies exhibit a high degree of text reuse. Third, we show that rates of text reuse across state borders correlate strongly with the diffusion networks recently introduced by Desmarais, Harden and Boehmke (2015).
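
To make the idea of text reuse concrete, the sketch below scores the best shared passage between two bills using Smith-Waterman local alignment over word tokens. The abstract does not say which text-sequencing algorithm is used, so the choice of Smith-Waterman, the function name local_alignment_score, the scoring weights, and the two example bill snippets are all assumptions made for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best Smith-Waterman local alignment score for two token lists.

    The scoring weights are illustrative assumptions, not values from the project.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + step,  # align token a[i-1] with token b[j-1]
                prev[j] + gap,       # skip a token in a
                curr[j - 1] + gap,   # skip a token in b
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical bill snippets, invented for this example.
bill_a = "a person shall not operate a motor vehicle while using a handheld device".split()
bill_b = "no driver shall operate a motor vehicle while using a wireless handheld device".split()

print(local_alignment_score(bill_a, bill_b))
```

A higher score means a longer or cleaner shared passage. Because the alignment is local, boilerplate that differs elsewhere in the two bills does not lower the score of the passage they have in common.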

Privacy Protection for Natural Language

with Alexander Ororbia and Joshua Snoke

Redaction has been the most common approach to protecting text data, but synthetic data presents a potentially more reliable alternative for disclosure control. By producing new sample values which closely follow the original sample distribution but do not contain real values, privacy protection can be improved while utility from the data for specific purposes is maintained. We extend the synthetic data approach to natural language by developing a neural generative model for such data. We find that the synthetic models outperform simple redaction on both comparative risk and utility.

Human Rights Text as Data

with Chris Fariss, Charles Crabtree, Zachary M. Jones, Megan Biek, Taranamoll Kaur, Ana Ross, and Michael Tsai

We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State.

Exploratory Data Analysis with Random Forest

with Zachary M. Jones

We introduce Random Forests with an emphasis on their practical application for exploratory analysis and substantive interpretation. We provide intuition and technical detail about how Random Forests work, in theory and in practice, along with empirical examples from the literature on American and comparative politics. We also provide software implementing the methods we discuss, in order to facilitate their use.

Bias in Ideological Self and Party Placement

In two experiments, I show that individuals bias their reported ideological position in a survey when they are asked to report their own position and their preferred party's position together.

Parallel Estimation of Bayesian Measurement Models

with Zita Oravecz

In this project, we investigate the feasibility of several parallelization methods for Bayesian estimation of IRT models and compare them to variational and subsampling methods.