This is the first chapter of my dissertation. I propose the use of query expansion and active learning to improve data collection from large online databases such as Twitter.
We propose using character-level neural networks to generate synthetic text, allowing data to be released while protecting the privacy of the individuals who produced the original text.
Using Twitter data, I study how Germans react when refugees are allocated to facilities in close geographic proximity to them. I find that German Twitter users show more interest in the topic before a nearby facility opens.
In this paper we study the benefits of active learning for social science applications, including problems that have not been addressed so far, such as intercoder (un)reliability. We find that active learning is most beneficial when the data are imbalanced and that it works even with noisy labels.
We propose the use of text-sequencing algorithms, applied to legislative text, to identify bills that introduce similar policy proposals.
An R package for using random forests in exploratory data analysis. Most of the software was written by Zach Jones. There is also a working paper that presents more of the underlying theory for a social science audience.