Summer Archives

Ricardo Sandoval

Class of: 2020

Understanding Feature Selection Practices for Social Work Researchers

Implementing effective poverty-alleviation programs in the US requires understanding the dynamics of poverty and its interactions with domains including health and nutrition. Computational advances have given data scientists tools to study poverty from multiple angles. For instance, Blumenstock, Cadamuro, and On (2015) use phone metadata to infer an individual’s socioeconomic status. These inferences can then be aggregated to examine the distribution of wealth within a particular nation. Other data scientists have used satellite imagery to construct indicators of regional poverty (Blumenstock 2016) and to search for extremely poor villages so that aid can be targeted to them (Abelson, Varshney, and Sun 2014). Despite significant computational advances, data scientists remain limited in their ability to accurately predict poverty outcomes. In fact, many empirical studies show that regression-based analyses using a handful of features selected by domain experts can outperform sophisticated models based on hundreds or thousands of features. In this project, we pursue two lines of inquiry inspired by this discrepancy. First, we ask whether feature selection techniques can give us insight into the impact of health and nutrition on poverty outcomes, despite the limited ability to predict those outcomes accurately. We use a large-scale administrative dataset, the Survey of Income and Program Participation (SIPP), to explore this computational question. Second, we examine the feature selection practices of social work researchers. We conduct a user study based on in-depth interviews about the data practices and feature selection practices of social work researchers, including when these practices may differ from those of data scientists.