Vanderbilt Institute of Chemical Biology



Discovery at the VICB







Identifying Clinically Relevant Gene Rearrangements in Cancer


By: Carol A. Rouzer, VICB Communications
Published:  July 21, 2016



A new algorithm uses next-generation sequencing data to rapidly and efficiently detect genetic abnormalities in multiple kinds of cancer.


The advent of next-generation sequencing (NGS) has enabled us to amass genomic and transcriptomic information on large numbers of cancers in databases such as The Cancer Genome Atlas (TCGA). This accumulating information has revealed that most cancers are genetically heterogeneous and that many genetic alterations that represent potential therapeutic targets are present in only a minority of tumors of any particular kind of cancer. Because of this diversity, identifying the gene aberrations that drive the malignant behavior of any specific tumor can be a challenge, one that is magnified by the difficulty of obtaining a comprehensive assessment of the array of abnormalities present in the tumor. To address this challenge, Vanderbilt Institute of Chemical Biology member Jennifer Pietenpol and her laboratory have developed a new algorithm to rapidly and efficiently mine NGS data for the identities of gene rearrangements of clinical importance [T. M. Shaver, B. D. Lehmann et al., Cancer Res., published online May 26, doi: 10.1158/0008-5472.CAN-16-0058].


Chromosomal rearrangements are a particularly common genetic aberration in many kinds of cancer. However, these abnormalities can be difficult to identify because genes that share homologous regions and alternative gene splicing both contribute to false-positive results. To solve this problem, the Pietenpol lab developed a new algorithm, Segmental Transcript Analysis (STA), that uses estimates of RNA expression to rank tumor samples based on the probability that the sample harbors a gene rearrangement. Specifically, STA analyzes exon-level RNA-seq data and assigns a score to each identified gene in every tumor sample in a population of samples. For each tumor in a population, the STA score quantifies the deviation in magnitude and directionality of expression of individual transcript segments as compared to that of the overall population. A high STA score suggests that a portion of the gene or the entire gene is abnormally expressed. If all exons are uniformly overexpressed, a derangement that affects the entire open reading frame of the gene is implied. Possibilities include an increase in gene copy number or a rearrangement involving the regulatory region of the gene leading to loss of expression control. Alternatively, if only a portion of the exons is overexpressed, the result suggests that a rearrangement has occurred involving a breakpoint within the open reading frame that leads to the selective expression of only a part of the gene. In this case, closer examination of the RNA-seq data in the region of the breakpoint can reveal sequences from other genes that are involved in the rearrangement. When whole genome sequencing (WGS) data are available, the investigator can confirm the identification through a search for DNA sequences corresponding to the proposed rearrangement (Figure 1).
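The idea of scoring transcript segments against the population can be illustrated with a small Python sketch. This is not the published STA formula, which the article does not give. It is a simplified stand-in that assumes log-scale exon expression values, standardizes each exon against the population median, and then looks for the split point at which the 3´ and 5´ segments of a gene differ most. The function name `segment_scores`, the toy data, and the use of the median absolute deviation are all assumptions made for illustration. The cutoff of 2 is borrowed from Figure 1, although its scale here is not comparable to the real STA score.

```python
import numpy as np

def segment_scores(expr):
    """Toy segment score for one gene (NOT the published STA formula).

    expr: samples x exons array of log-scale expression values.
    Returns, per sample, a signed score and the exon index of the best split.
    A positive score means the 3' segment is higher than the 5' segment.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each exon against the population (median / scaled MAD).
    med = np.median(expr, axis=0)
    mad = 1.4826 * np.median(np.abs(expr - med), axis=0)
    mad[mad == 0] = 1.0  # guard against division by zero
    z = (expr - med) / mad

    n_samples, n_exons = z.shape
    scores = np.zeros(n_samples)
    breaks = np.zeros(n_samples, dtype=int)
    # Try every split point; keep the one with the largest 3'-vs-5' difference.
    for k in range(1, n_exons):
        diff = z[:, k:].mean(axis=1) - z[:, :k].mean(axis=1)
        better = np.abs(diff) > np.abs(scores)
        scores[better] = diff[better]
        breaks[better] = k
    return scores, breaks

# Hypothetical data: six typical samples plus one with exons 4-6 elevated.
offsets = np.array([-0.3, -0.2, -0.1, 0.1, 0.2, 0.3])
background = 5.0 + np.tile(offsets[:, None], (1, 6))
outlier = np.array([5.0, 5.0, 5.0, 8.0, 8.0, 8.0])
expr = np.vstack([outlier, background])

scores, breaks = segment_scores(expr)
CUTOFF = 2.0  # assumed threshold, echoing the cutoff shown in Figure 1
flagged = np.flatnonzero(np.abs(scores) > CUTOFF)
ranking = np.argsort(-np.abs(scores))  # samples ordered from highest to lowest score
```

In this toy example only the first sample is flagged, with the split placed before the fourth exon. The sketch covers only the partial-gene case; a uniformly overexpressed gene would need a separate check, for example on the mean standardized expression across all exons.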



FIGURE 1. Workflow for Segmental Transcript Analysis (STA) of next-generation sequencing (NGS) data of large tumor sample populations. (A) Exon-level RNA-seq expression data for a large number of tumors are collected and plotted for each gene, in this case ROS1. Black lines indicate data for tumor samples exhibiting expression levels typical of the population. Red lines indicate data for tumor samples showing unusually high levels of expression for some exons of the gene. (B) STA values are calculated for the data in (A) and are plotted here in order from the highest STA value to the lowest. Data points in red correspond to samples that exceed the cutoff value of 2. These points also correspond to the data shown in red in (A). (C) RNA sequence data are analyzed for samples exceeding the STA cutoff. In this case, a sequence from another genomic region was discovered to form a "hybrid transcript" with ROS1 at exon 34. The hybrid transcript was further analyzed to identify the rearrangement partner, CD74. (D & E) If whole genome sequencing (WGS) data are available, those data are used to map sequences near the breakpoints identified using the RNA-seq data. Successful mapping of any rearranged sequences to both RNA-level partners validates the identification. Figure reproduced by permission from T. M. Shaver, B. D. Lehmann et al., Cancer Res., published online May 26, doi: 10.1158/0008-5472.CAN-16-0058. Copyright 2016 American Association for Cancer Research.
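Step (C) of the workflow, in which hybrid transcripts point to a rearrangement partner, can be pictured with a short Python sketch. It assumes that reads spanning an exon junction have already been aligned and summarized as simple records. The record layout, the function name `candidate_partners`, the `min_reads` threshold, and the read counts are hypothetical; only the gene names and the exon number come from the figure.

```python
from collections import Counter

# Hypothetical junction-spanning reads, each summarized as
# (gene on the 5' side, gene on the 3' side, exon number on the 3' side).
reads = [
    ("CD74", "ROS1", 34),
    ("CD74", "ROS1", 34),
    ("ROS1", "ROS1", 34),  # ordinary splice junction within ROS1
    ("CD74", "ROS1", 34),
    ("ROS1", "ROS1", 35),
]

def candidate_partners(reads, gene, min_reads=2):
    """List (partner, exon, read count) for reads joining another gene to `gene`.

    Only the case where `gene` is the 3' partner is handled in this sketch.
    """
    hybrids = Counter(
        (five, exon)
        for five, three, exon in reads
        if three == gene and five != gene
    )
    return [
        (partner, exon, n)
        for (partner, exon), n in hybrids.most_common()
        if n >= min_reads
    ]

partners = candidate_partners(reads, "ROS1")
```

With these toy reads the only candidate is CD74 joined to ROS1 at exon 34. A real analysis would work from aligned sequence data and would then seek confirmation in WGS data, as in panels (D) and (E).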



To test the ability of STA to identify DNA rearrangements, the investigators collected RNA-seq and WGS data for a number of different tumor types from the TCGA database. They then showed that STA identified a number of known gene rearrangements across the different cancer types, including lung cancer, glioblastoma, and prostate cancer. They also demonstrated STA's ability to detect loss of expression of known tumor suppressors in lung, bladder, endometrial, and renal cancer.


Having validated STA's ability to identify clinically relevant known gene rearrangements over a range of tumor types, the investigators next collected data for 173 triple-negative breast cancer (TNBC) samples. TNBC is a particularly aggressive form of breast cancer that does not express the estrogen receptor, progesterone receptor, or HER2, making it resistant to the most common breast cancer therapies. Application of STA to the TNBC data revealed a number of novel gene rearrangements. For example, a fusion of the 5´ end of TMEM87B, a gene encoding a transmembrane protein, with a portion of the gene for the MER tyrosine kinase (MERTK) gave rise to overexpression of a protein that includes the entire tyrosine kinase domain of MERTK (Figure 2). MERTK is a proto-oncogene, and the investigators showed that expression of this fusion product in a mouse lymphocyte cell line led to constitutive activation of signaling pathways that promote cell proliferation and survival. Consistent with this finding, expression of the fusion protein also conferred a survival advantage on the cells. Expression of the fusion protein in MCF10A cells, which are derived from breast basal epithelial cells, yielded similar results. The finding that TMEM87B-MERTK fusion proteins have been reported in lung, bladder, and cervical cancer supports the hypothesis that this gene rearrangement may be a cancer driver in some TNBCs.


FIGURE 2. (A) Domain structures of the TMEM87B and MERTK proteins and the rearrangement gene product. Dotted lines indicate the junction point of the rearrangement. (B) Membrane topology of the TMEM87B, MERTK, and rearrangement proteins. Figure reproduced by permission from T. M. Shaver, B. D. Lehmann et al., Cancer Res., published online May 26, doi: 10.1158/0008-5472.CAN-16-0058. Copyright 2016 American Association for Cancer Research.



Following this success, the investigators expanded their efforts to include data from 80 more TNBCs plus 28 TNBC-derived cell lines. An interesting result of this work was the discovery of a fusion of the kinase domain of FGFR3 (fibroblast growth factor receptor 3) to the coiled-coil domain of TACC3 (transforming acidic coiled-coil-containing protein 3) (Figure 3). This gene rearrangement was present in one tumor and one of the TNBC-derived cell lines. To ascertain the effects of the fusion, the investigators used siRNA to knock down expression of the abnormal protein in the cell line. The result was a reduction in cell viability, suggesting that this protein was a driver of growth and survival in the cells. Similar effects resulted when the investigators treated the cells with a small molecule inhibitor of the FGFR3 kinase. FGFR3-TACC3 rearrangements have been detected at high frequency in bladder cancer and glioblastoma, suggesting that therapeutically actionable rearrangements detected in one tumor type may appear sporadically in other diseases such as TNBC.





FIGURE 3. Domain structures of an FGFR3-TACC3 gene rearrangement product identified in a TNBC tumor (A) and the SUM185PE TNBC-derived cell line (B). Also shown are the domain structures of FGFR3 and TACC3, and the portions of those proteins that contributed to the SUM185PE rearrangement product. Figure reproduced by permission from T. M. Shaver, B. D. Lehmann et al., Cancer Res., published online May 26, doi: 10.1158/0008-5472.CAN-16-0058. Copyright 2016 American Association for Cancer Research.



Having successfully identified these and other gene rearrangements in TNBC, the investigators obtained 14 additional data sets from TCGA that included 5,461 samples across 14 different cancer types. From these, the researchers used STA to identify and validate 1,178 gene rearrangements at both the RNA and DNA levels. Approximately 40% of the rearrangements involved fusion between two protein-coding genes. The majority of the remaining rearrangements resulted from fusion of a portion of a coding gene to a region of noncoding DNA. In most of these cases, the coding gene served as the 5´ partner. However, in 3% of the cases, the noncoding DNA was the 5´ partner, and many of these resulted in deregulated expression.


Overall, the results revealed overexpression of a wide range of tumor promoters and loss of tumor suppressors via a highly diverse set of rearrangements. Oncogenes or potential oncogenes affected by these transcripts include ALK, HRAS, and PPR11, among many others. Rearrangements that result in increased transcript levels were also identified in immunomodulatory genes, such as IL6R and CD274. The findings confirm the power of STA to rapidly and efficiently identify gene rearrangements that result in aberrant expression levels of proteins or portions of proteins that may serve as cancer drivers. We look forward to the new insights these data will provide toward our understanding of the malignant behavior of many cancers.




View Cancer Research article: Diverse, Biologically Relevant, and Targetable Gene Rearrangements in Triple-Negative Breast Cancer and Other Malignancies









