In this paper we study the benefits of active learning for social science applications, including novel problems that have not been addressed so far (intercoder (un)reliability). We find that active learning is most beneficial when the data are imbalanced and that it works even with noisily labeled data.
This is the first chapter of my dissertation. I propose the use of query expansion and active learning to improve data collection from large online databases such as Twitter.
Using Twitter data, I study how Germans react when refugees are allocated to facilities in close geographic proximity to them. I find that German Twitter users show more interest in the topic before the facility opens.
We propose the use of text-sequencing algorithms, applied to legislative text, to identify bills that introduce similar policy proposals.
We introduce and make publicly available a large corpus of digitized primary-source human rights documents.
An R package for exploratory data analysis with random forests. Most of the software was written by Zach Jones. There is also a working paper that introduces more of the underlying theory for a social science audience.