We propose the use of text-sequencing algorithms, applied to legislative text, to identify bills that introduce similar policy proposals. We present three ground-truth tests, applied to a corpus of 500,000 bills from U.S. state legislatures. First, we show that bills introduced by ideologically similar sponsors are more likely to exhibit a high degree of text reuse. Second, we show that bills classified by the National Conference of State Legislatures as covering the same policies exhibit a high degree of text reuse. Third, we show that rates of text reuse across state borders correlate strongly with the diffusion networks recently introduced by Desmarais, Harden, and Boehmke (2015).
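To make "text reuse" concrete, one widely used text-sequencing technique is Smith-Waterman local alignment. It scores the best-matching contiguous stretch shared by two documents, tolerating small insertions and substitutions. The sketch below runs it over whitespace tokens. It is a minimal illustration under stated assumptions: the tokenizer, the helper names, and the scoring parameters (`match`, `mismatch`, `gap`) are hypothetical choices, not the settings used in this study.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical minimal preprocessor: lowercase and split on whitespace.
    return text.lower().split()


def local_alignment_score(a: List[str], b: List[str],
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score between two token sequences.

    H[i][j] is the best score of an alignment ending at a[i-1] and b[j-1];
    flooring at 0 lets an alignment start anywhere, which makes it "local".
    Only two rows of H are kept in memory at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap: skip a token of a
            left = curr[j - 1] + gap   # gap: skip a token of b
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Two invented bill fragments that share the phrase
# "a renewable energy portfolio standard" (5 tokens, so 5 * match = 10).
bill_a = "the state shall establish a renewable energy portfolio standard for utilities"
bill_b = "each utility must meet a renewable energy portfolio standard set by the commission"
print(local_alignment_score(tokenize(bill_a), tokenize(bill_b)))  # 10
```

Because the score grows with the length of the shared passage, a study comparing many bills would likely normalize it, for example by the maximum attainable score for the shorter bill. That normalization is left out here.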
Redaction has been the most common approach to protecting text data, but synthetic data offers a potentially more reliable alternative for disclosure control. Synthetic data consists of new sample values that closely follow the original sample distribution but contain no real values. It can therefore improve privacy protection while preserving the data's utility for specific purposes. We extend the synthetic data approach to natural language by developing a neural generative model for such data. We find that the synthetic models outperform simple redaction on both comparative risk and utility.
We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State.
We introduce random forests with an emphasis on their practical application for exploratory analysis and substantive interpretation. We provide intuition as well as technical detail about how random forests work, in theory and in practice, along with empirical examples from the literature on American and comparative politics. We also provide software implementing the methods we discuss, to facilitate their use.
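A random forest grows many decision trees. Each tree is fit to a bootstrap resample of the data and, at every split, considers only a random subset of features. The forest then aggregates the trees' predictions by majority vote. The from-scratch sketch below illustrates only that core idea. It is not the software described above, and the names `RandomForest`, `build_tree`, and `gini`, as well as the parameter defaults, are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence

Row = Sequence[float]


def gini(labels: List[int]) -> float:
    # Gini impurity: probability two random draws from the node disagree.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


class Node:
    def __init__(self, label: int, feature: Optional[int] = None, threshold: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.label, self.feature, self.threshold = label, feature, threshold
        self.left, self.right = left, right


def build_tree(X: List[Row], y: List[int], depth: int, n_features: int,
               rng: random.Random) -> Node:
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return Node(majority)  # leaf: depth exhausted or node is pure
    best = None  # (weighted impurity, feature, threshold)
    # Random feature subset drawn anew at each split (the "random" in random forest).
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return Node(majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(majority, f, t,
                build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, n_features, rng),
                build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, n_features, rng))


def predict_tree(node: Node, row: Row) -> int:
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


class RandomForest:
    def __init__(self, n_trees: int = 25, max_depth: int = 3,
                 n_features: int = 1, seed: int = 0):
        self.n_trees, self.max_depth, self.n_features = n_trees, max_depth, n_features
        self.rng = random.Random(seed)
        self.trees: List[Node] = []

    def fit(self, X: List[Row], y: List[int]) -> "RandomForest":
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap resample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, self.n_features, self.rng))
        return self

    def predict(self, X: List[Row]) -> List[int]:
        # Majority vote across trees.
        return [Counter(predict_tree(t, row) for t in self.trees).most_common(1)[0][0]
                for row in X]


X = [[0, 1], [1, 0], [1, 1], [0, 0], [2, 1], [9, 10], [10, 9], [10, 10], [9, 9], [8, 10]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(RandomForest(seed=0).fit(X, y).predict([[-1, -1], [20, 20]]))  # [0, 1]
```

For applied work, a maintained implementation such as scikit-learn's `RandomForestClassifier` is the safer choice. It also exposes feature-importance measures that support the kind of substantive interpretation emphasized here.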
In two experiments, I show that individuals bias their reported ideological position in a survey when they are asked to report their own position and their preferred party's position together.