Deep-CCA: Developing a Deep Multi-view Learning Algorithm for Aggregating Multi-modal Patient Data (DSI-SRP)
This DSI-SRP fellowship funded Jiaxin (Nicole) He to work in the laboratory of Professor Yuankai Huo in the Department of Electrical Engineering and Computer Science during the summer of 2021. Nicole is a junior with majors in Computer Science and Mathematics.
The project funded by this fellowship aimed to apply canonical correlation analysis (CCA) to cancer survival prediction and to explore how different extensions of CCA compare in this application. Nicole and Professor Huo used a multi-modal cancer dataset processed in the paper “Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis,” published by IEEE in 2020. The dataset is composed of three views: qualitative information from histology, processed by a convolutional neural network (CNN) and by a graph convolutional network (GCN) respectively, and quantitative information from genomic data. Although this dataset is not very high-dimensional, survival prediction from high-dimensional genomic data is often preceded by a dimension reduction step such as principal component analysis (PCA). This is where CCA comes into play: for multi-modal data, CCA can generate more meaningful lower-dimensional features. CCA is a technique for identifying and measuring the associations between two sets of variables. It finds pairs of maximally correlated linear combinations, one from each set, of the form:
U_i = a_1(predictor_1) + a_2(predictor_2) + … + a_n(predictor_n) and
V_i = b_1(outcome_1) + b_2(outcome_2) + … + b_m(outcome_m)
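For reference, the weights in these two combinations are chosen so that U and V are as strongly correlated as possible. A standard way to write the objective for the first canonical pair is sketched below. The symbols x and y (the two sets of variables), a and b (the weight vectors), and the covariance matrices \Sigma are notation introduced here for illustration and do not appear in the original write-up.

```latex
\[
(a^{*}, b^{*})
  = \arg\max_{a,\,b}\; \operatorname{corr}\!\left(a^{\top}x,\; b^{\top}y\right)
  = \arg\max_{a,\,b}\;
    \frac{a^{\top}\Sigma_{xy}\,b}
         {\sqrt{a^{\top}\Sigma_{xx}\,a}\;\sqrt{b^{\top}\Sigma_{yy}\,b}}
\]
```

Each later pair solves the same problem under the added constraint that its variates are uncorrelated with all earlier ones, and the maximized value for each pair is its canonical correlation.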
Nicole and Professor Huo first ran CCA between each view and the outcome variables and found that the histology information from the GCN had a lower correlation score. They therefore ran CCA again between the histology information from the CNN and the genomic data for dimension reduction. They kept the six dimensions with the largest weights (largest a’s and b’s) in each set of variables. They then ran Cox regression on the new lower-dimensional dataset and calculated the concordance index. They repeated this for 15 different train/test splits and averaged the concordance index. The concordance indices were around 0.75 to 0.80. Except for sparse CCA, all the extensions of CCA improved performance on the survival prediction task.
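To make the evaluation metric concrete, the following is a minimal sketch of how a concordance index can be computed and averaged over several splits. It is not the project's code. The function names `concordance_index` and `mean_concordance` are hypothetical, the toy data are invented purely for illustration, and skipping pairs with tied survival times is a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index (illustrative sketch).

    times       -- observed survival or censoring times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        if times[i] > times[j]:
            i, j = j, i  # make i the subject with the shorter time
        if not events[i]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # higher risk matched the earlier event
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_concordance(splits):
    """Average the index over (times, events, risk_scores) test splits."""
    scores = [concordance_index(t, e, r) for t, e, r in splits]
    return sum(scores) / len(scores)


# Toy example (made-up numbers): four patients, the third one censored.
toy_c = concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1])
```

In this sketch the risk scores would come from a Cox model fitted on the training portion of each split. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of patients by risk.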
In addition to receiving support through a DSI-SRP fellowship, this project was supported and facilitated by the DSI Data Science Team through their regular summer workshops and demo sessions.