Gene Expression Model Selector (GEMS)


Gene Expression Model Selector (GEMS) is a system that constructs, in a supervised fashion, diagnostic and outcome prediction models from array gene expression data. Examples of such models are: (a) models that detect cancer, (b) models that determine the correct subtype of cancer, and (c) models that predict survival after treatment. Models that support such complex decision making are widely recognized as having the potential to revolutionize medicine in the years to come. In addition to the decision support models, GEMS can be used to select a small number of genes that are as good as or better than the full gene set for diagnosis and/or outcome prediction. These biomarkers (genes) are also useful for discovery purposes (e.g., they suggest plausible causes and treatments of various types of cancer). Finally, GEMS provides estimates of the models' performance (e.g., accuracy) in future applications (i.e., when applied to patients who were not used to build the models but who come from the same patient population as those who were), and allows users to run the models for individual patients.

Building such models (a) requires specialized training in statistics and/or bioinformatics and/or pattern recognition, (b) takes several weeks to months to accomplish in typical academic settings, and (c) may suffer from pitfalls introduced by human analysts, such as overfitting the data (i.e., building models that are very good for the training set but perform poorly on future independent patient cases). GEMS performs these tasks quickly, automatically, without overfitting, and without requiring the user to have expertise in data analysis.
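
To make the overfitting pitfall concrete, here is a minimal sketch in Python. It is not part of GEMS and does not use GEMS's algorithms; the data are synthetic noise, and every name in it (make_noise_data, predict_1nn, accuracy) is hypothetical. A 1-nearest-neighbour model memorizes its training samples, so it looks perfect on them even though the labels are unrelated to any gene, and it does no better than chance on new samples.

```python
import random

def make_noise_data(n_samples, n_genes, rng):
    """Synthetic 'expression' values with labels unrelated to any gene (assumption)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y

def predict_1nn(train_X, train_y, row):
    """Return the label of the training sample closest to `row`."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], row))
    nearest = min(range(len(train_X)), key=sq_dist)
    return train_y[nearest]

def accuracy(train_X, train_y, X, y):
    """Fraction of samples in (X, y) that the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, row) == label
               for row, label in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
train_X, train_y = make_noise_data(40, 50, rng)   # samples used to build the model
new_X, new_y = make_noise_data(200, 50, rng)      # future, independent samples

train_accuracy = accuracy(train_X, train_y, train_X, train_y)
new_accuracy = accuracy(train_X, train_y, new_X, new_y)

print(f"accuracy on training samples: {train_accuracy:.2f}")
print(f"accuracy on new samples:      {new_accuracy:.2f}")
```

The gap between the two printed numbers is the overfitting described above: performance measured on the training set says little about performance on future cases.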


GEMS was validated using the most stringent gold-standard technique: independent (i.e., cross-dataset) validation. In this method, the system builds a model from dataset 1, and the model is then applied to dataset 2. In our experiments, the two datasets came from different labs and hospitals and were obtained using different microarray technologies. GEMS was found to (a) match or exceed the performance of human analysts, (b) build models automatically in minutes, (c) estimate the models' performance correctly, and (d) select gene markers that generalize from one dataset to another.
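
The cross-dataset procedure can be sketched as follows. This is an illustrative Python example under stated assumptions, not GEMS's implementation: the two datasets are synthetic, the constant offset stands in for a different microarray platform, per-dataset standardization is one assumed way to handle that offset, and a nearest-centroid classifier is swapped in for GEMS's learning algorithms. All function names are hypothetical.

```python
import random
import statistics

def make_dataset(n_samples, n_genes, informative, shift, seed):
    """Synthetic stand-in for one lab's expression data (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        # `shift` mimics a platform-wide offset between labs/technologies.
        row = [rng.gauss(shift, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label  # informative genes are higher in class 1
        X.append(row)
        y.append(label)
    return X, y

def standardize(X):
    """Z-score each gene within a single dataset."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def fit_centroids(X, y):
    """Build the model: one mean expression profile per class."""
    centroids = {}
    for label in set(y):
        rows = [r for r, l in zip(X, y) if l == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return centroids

def predict(centroids, row):
    """Assign the class whose centroid is closest to `row`."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sq_dist)

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, r) == l for r, l in zip(X, y))
    return hits / len(y)

informative = [0, 1, 2]
X1, y1 = make_dataset(60, 20, informative, shift=0.0, seed=1)  # dataset 1
X2, y2 = make_dataset(60, 20, informative, shift=5.0, seed=2)  # dataset 2

model = fit_centroids(standardize(X1), y1)                      # build on dataset 1 only
cross_dataset_accuracy = accuracy(model, standardize(X2), y2)   # apply to dataset 2
print(f"cross-dataset accuracy: {cross_dataset_accuracy:.2f}")
```

The key property is that dataset 2 plays no part in building the model; its labels are used only to score the predictions.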

Competitive Advantages

1. GEMS's learning algorithms were chosen from ~20 algorithms after an extensive algorithmic evaluation using 11 publicly available datasets spanning 74 cancer types.

2. Thorough validation:

(a) GEMS was tested by re-analyzing the above datasets.

(b) GEMS was validated with 5 "fresh" datasets against human experts using cross-validation.

(c) GEMS was validated with two pairs of datasets using independent (cross-dataset) validation. The validation involved both model and gene marker generalizability. In total GEMS was validated with 16 datasets.

3. Fully automated, yet provides many optional features for the seasoned analyst.

4. Includes proprietary gene selection and causal discovery algorithms with well-defined properties, theoretical guarantees for correctness and excellent empirical performance.

5. Client-server architecture.

Alexander Statnikov, Constantin Aliferis, Ioannis Tsamardinos, Nafeh Fananapazir
Licensing manager: Hassan Naqvi
