PhD candidate Siwei Zhang will present her dissertation work on Monday, May 12, at 12:30 p.m. Central Time. Her advisor is Yaomin Xu. All are invited and encouraged to attend.
The defense will take place in person at 2525 West End Avenue, in the large conference room on the 11th floor (room 11105).
Network Analysis and Visualization of Disease Multimorbidity Using Electronic Health Records and Genetic Biobank Data
Disease multimorbidity, the co-occurrence of multiple diseases within an individual, presents complex challenges for both public health and precision medicine. Advancing our understanding of multimorbidity can illuminate disease mechanisms, reveal patient heterogeneity, and enable biomarker discovery and treatment repurposing. Large-scale Electronic Health Records (EHR) and EHR-linked genetic biobanks offer unique opportunities to quantify phenome-wide multimorbidity, uncover shared genetic mechanisms among co-occurring conditions, and define multimorbidity-based disease clusters. However, major analytical and methodological challenges remain. To address these, we present three key contributions. First, we introduce a phenome-wide multimorbidity network that quantifies nonrandom disease-disease co-occurrences while accounting for potential confounding factors. Second, we develop a genetic discovery platform that integrates polygenic scores for predicted transcriptomic, proteomic, and metabolomic traits with phenome-wide association studies (PheWAS) to uncover shared biological mechanisms among multimorbid conditions. We also develop an interactive network visualization tool featuring dynamic cluster analysis of biological pathways linked to diseases with similar multimorbidity patterns, enabling intuitive exploration of complex disease relationships and their shared biological mechanisms. Third, we propose a model-based clustering framework using a bipartite stochastic block model (biSBM) with a stability-driven post-processing step to identify robust disease clusters and patient subgroups from individual-level EHR data. This framework demonstrates superior performance in simulations and replicates coherent, interpretable multimorbidity structures across independent datasets, including UK Biobank and Vanderbilt BioVU.
A case study of JAK2V617F somatic mutation carriers reveals genetic heterogeneity across patient subgroups with distinct multimorbidity patterns, illustrating the potential of our data-driven approach to uncover mechanistic insights into patient heterogeneity through EHR-derived multimorbidity networks.
