Evaluating and Applying Machine-Learning Models in Predicting Gene Regulation and Expression in Neuropsychiatric Disorders (DSI-SRP)

This DSI-SRP fellowship funded Namju Kim to work in the laboratory of Eric Gamazon, Ph.D., in the Department of Medicine during the summer of 2023. Namju is a rising junior majoring in Molecular and Cellular Biology with a minor in Data Science.

Neuropsychiatric diseases are caused in part by genetic factors, including non-coding variants in the genome that confer disease liability through dysregulated gene expression. Studies such as the Genotype-Tissue Expression (GTEx) project, which generated transcriptomic data across human tissues (including the brain), are therefore relevant to the search for genotype-phenotype associations.

LASSO, ridge regression, and elastic net can be used to build predictive models of gene regulation and expression level. LASSO enforces sparsity, selecting a small number of potential predictors. Ridge regression shrinks the effect estimates of potentially highly correlated predictors without setting any of them to zero. Elastic net combines the characteristics of both. Random forest provides a non-linear approach. Using these machine-learning approaches, Namju and his mentor trained predictive models of gene expression on GTEx data, with genetic variants as features. The summer research program extended the computational genomics research study (BSCI3861) that he had undertaken in the laboratory of Dr. Eric Gamazon.
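
To illustrate how the three penalized regressions differ, the following is a minimal sketch in pure Python, not the project's actual pipeline. It fits an elastic net by coordinate descent on small synthetic genotype data, where LASSO and ridge are the special cases `l1_ratio=1.0` and `l1_ratio=0.0`. All names (`fit_elastic_net`, `soft_threshold`), the simulated dosages, effect sizes, and the penalty strength `alpha` are assumptions made for illustration; no GTEx data is used, and random forest is not covered here.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha, l1_ratio, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - X b||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    Assumes the columns of X and the vector y are already centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + alpha * (1.0 - l1_ratio)
            if denom == 0.0:
                continue  # constant feature with no ridge term: leave at zero
            # Covariance of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic example (hypothetical values): 100 individuals, 20 variants,
# of which only the first two affect expression.
rng = random.Random(0)
n_samples, n_snps, maf = 100, 20, 0.3
dosages = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
           for _ in range(n_samples)]
true_effects = [1.0, -0.8] + [0.0] * (n_snps - 2)
expression = [sum(d * b for d, b in zip(row, true_effects)) + rng.gauss(0, 0.5)
              for row in dosages]

# Center features and outcome so that no intercept term is needed.
col_means = [sum(row[j] for row in dosages) / n_samples for j in range(n_snps)]
X = [[row[j] - col_means[j] for j in range(n_snps)] for row in dosages]
y_mean = sum(expression) / n_samples
y = [value - y_mean for value in expression]

alpha = 0.15  # illustrative penalty strength, not a tuned value
lasso = fit_elastic_net(X, y, alpha, l1_ratio=1.0)
ridge = fit_elastic_net(X, y, alpha, l1_ratio=0.0)
enet = fit_elastic_net(X, y, alpha, l1_ratio=0.5)

for name, beta in [("LASSO", lasso), ("ridge", ridge), ("elastic net", enet)]:
    n_zero = sum(b == 0.0 for b in beta)
    print(f"{name}: {n_zero} of {n_snps} coefficients are exactly zero")
```

In this sketch the L1 term is what sets coefficients exactly to zero, while the L2 term only shrinks them, so intermediate `l1_ratio` values trade off variant selection against stable estimates for correlated variants. In practice the penalty strength would be chosen by cross-validation rather than fixed.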

In addition to the DSI-SRP fellowship funding, this project was supported and facilitated by the DSI Data Science Team through its regular summer workshops and demo sessions.