Introduction

Gene fusions resulting from chromosomal rearrangements are well recognized as an important class of genomic aberrations and key drivers of oncogenesis [1]. While many cancer-associated mutations occur variably and heterogeneously across different tumor types, gene fusions are for the most part disease specific. Gene fusions have an important role in the initial steps of tumor development [1, 2]. This is exemplified by the archetypal fusion, BCR-ABL in chronic myeloid leukemia, generated by the translocation known as the Philadelphia chromosome [3–5]. More importantly, many of the recurrent gene fusions identified in cancer have proven to be robust therapeutic targets, and their discovery has been a driving force of modern precision therapeutics [6]. The significance of gene fusions in the development of epithelial cancers has been increasingly recognized since the discovery of the TMPRSS2-ERG fusion in about 50 % of prostate cancers [7] and the EML4-ALK fusion in 6.7 % of non-small cell lung cancers (NSCLC) [8]. The EML4-ALK gene fusion has been matched with an effective ALK-targeted therapy that has generated enormous patient impact [9]. Recent genome-wide studies provide evidence that gene fusions are not only present but also abundant in solid tumors including breast cancer [10, 11]. A recent review by Mertens et al. recapitulates the evolution of gene fusion research over the years, including the technological advancements, the breakthrough fusion discoveries, and the resultant therapeutic developments [12]. A more detailed account of these developments, given in the recent book by Rowley et al., is also a valuable resource in this context [13].

Breast cancer is a highly heterogeneous disease comprising distinct molecular subtypes, a variety of genetic aberrations including mutations, and varied clinical outcomes [14, 15]. While endocrine therapy and Her-2-targeted therapy have been very successful in treating estrogen receptor (ER)-positive or Her-2-positive breast cancers, about half of the tumors in advanced cases will exhibit de novo or acquired resistance to these therapies [16–20]. In addition, treatment options are limited for breast cancers that are negative for both receptors. The genetic aberrations driving these more aggressive breast cancers are poorly understood. Recently, driven by new deep sequencing studies of breast cancer, several recurrent or pathological gene fusions have been identified. Interestingly, most of these are preferentially found in the more aggressive breast cancers such as luminal B, triple-negative, or endocrine-resistant breast cancer. This review summarizes the current achievements in identification and characterization of recurrent or pathological gene fusions in breast cancer. We also highlight the biological relevance and clinical implications of these discoveries, and discuss the possible applications and future directions of gene fusion study in breast cancer.

Next-generation sequencing-aided fusion detection

Although important gene fusion discoveries were made before the advent of second-generation sequencing, or next-generation sequencing (NGS), through conventional methods like comparative genomic hybridization (CGH) and fluorescence in situ hybridization (FISH), the invention of NGS technologies revolutionized the field of genomics and made large-scale, high-throughput fusion discovery possible. NGS generates high-throughput data and can comprehensively catalog chimeric transcripts and genomic rearrangements in the cancer transcriptome/genome. Since the introduction of the first NGS method in 2005, several different and improved NGS platforms and applications have been developed, including whole genome sequencing (WGS), whole exome sequencing (WES), and whole transcriptome sequencing (RNA-seq). To date, more than 90 % of the gene fusions reported in all types of tumors have been discovered using various NGS approaches in the past 5 years [12].

RNA-seq

RNA-seq is the most commonly selected method for fusion transcript detection since it focuses only on the expressed regions of the genome, making the discovered fusions more likely to be functionally relevant. In addition to being relatively inexpensive with a quick turnaround time, it can also quantify expression levels and facilitate the detection of multiple fusion variants generated during the fusion event, making it well suited to gene fusion detection. However, RNA-seq routinely generates a daunting quantity of chimeric sequences, most of which are artifactual fusion sequences resulting from library artifacts and mapping errors [21, 22], transcription-induced chimeras (TICs) resulting from intergenic splicing of adjacent genes, or passenger gene fusions. Bioinformatics methods that use stringent parameters to filter out artifactual chimeras may become less sensitive in detecting authentic fusions or underestimate their incidence, while filtering out the chimeras from adjacent genes will also dismiss authentic close-range gene rearrangements.
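The filtering trade-off described in this section can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the algorithm of any published fusion caller: the `Candidate` record, its field names, the read-support thresholds, and the 200 kb read-through window are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One chimeric call from RNA-seq (hypothetical record layout)."""
    gene5: str
    gene3: str
    chrom5: str
    chrom3: str
    pos5: int            # breakpoint position in the 5' partner
    pos3: int            # breakpoint position in the 3' partner
    strand5: str
    strand3: str
    spanning_reads: int  # read pairs bridging the two genes
    junction_reads: int  # reads crossing the fusion junction itself


def classify(c, min_junction=2, min_spanning=3, readthrough_window=200_000):
    """Label a candidate; all thresholds are illustrative assumptions."""
    # Weak read support is treated as a likely library or mapping artifact.
    if c.junction_reads < min_junction or c.spanning_reads < min_spanning:
        return "likely_artifact"
    # Neighboring genes on the same chromosome and strand may be a
    # transcription-induced chimera (read-through) rather than a rearrangement.
    close = c.chrom5 == c.chrom3 and abs(c.pos5 - c.pos3) <= readthrough_window
    if close and c.strand5 == c.strand3:
        return "possible_readthrough"
    return "candidate"


calls = [
    Candidate("GENE_A", "GENE_B", "chr1", "chr1", 1_000_000, 1_050_000, "+", "+", 12, 8),
    Candidate("GENE_C", "GENE_D", "chr2", "chr9", 5_000_000, 7_000_000, "+", "-", 15, 10),
    Candidate("GENE_E", "GENE_F", "chr3", "chr7", 2_000_000, 3_000_000, "+", "+", 1, 0),
]
labels = [classify(c) for c in calls]
```

In this sketch adjacent same-strand pairs are flagged rather than discarded, because a tandem duplication between neighboring genes would look identical to a read-through event at the transcript level; a flagged call would need orthogonal genomic evidence before being dismissed.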

Whole genome sequencing

WGS-based gene fusion detection works by identifying structural variations, a subset of which generates expressed gene fusions. Despite being unbiased and providing a comprehensive and integrative characterization of novel genomic alterations, WGS falls short owing to short read lengths, technical artifacts, poor coverage, and a higher rate of false-positive calls due to sequencing errors. In addition, the gene fusions discovered through WGS are at the genomic level, covering both coding and noncoding regions, and so further evaluation of their ability to produce fusion transcripts is needed to determine their significance. More recently, a third-generation sequencing technology based on single-molecule sequencing was developed by Pacific Biosciences (PacBio) in 2010 [23]. Although the technology is still relatively immature and not widely available, it promises longer read lengths (>40,000 bp with average length around 10,000–15,000 bp), higher accuracy, a smaller amount of starting material, and lower cost [24, 25], and thus may overcome the limitations of current WGS technology.

Bioinformatics approaches and tools in gene fusion discovery

A comprehensive list of the gene fusion detection tools and other NGS data analysis tools [26] is available in the OMICtools portal (www.omictools.com). Currently, there are over 20 different fusion detection tools that utilize WGS, RNA-seq, or both and follow different discovery algorithms [27]. Each tool has its own intrinsic pros and cons and also behaves differently depending on the dataset used. Achieving an adequate balance between fusion detection sensitivity and accuracy, while critical, remains a work in progress. The study by Carrara et al. comparing eight of the most widely used RNA-seq-based fusion detection tools revealed that the tools vary greatly in fusion detection sensitivity and false fusion discovery rate [21], suggesting the need for improvement in their specificity and sensitivity. More recently, Kumar et al. compared the performance of 12 fusion detection tools on the basis of fusions detected, sensitivity, and positive predictive values in four different types of RNA-seq datasets [22]. Among these tools, they found that EricScript [28] had 100 % positive predictive value with a sensitivity of 78 %, while requiring the least amount of time and memory. Moreover, the performance of each tool differed based on the quality of the dataset, such as read length and number of reads, suggesting that the attributes of the RNA-seq data on hand should be one of the key deciding factors when choosing a fusion detection tool. Another recent study by Liu et al. [29] comprehensively evaluated 15 fusion transcript detection algorithms and found that no single method is dominantly the best, but SOAPfuse [30] has the best overall performance, followed by FusionCatcher [31] and JAFFA [32]. EricScript performed well on synthetic data but not on the three real data sets they evaluated. Since WGS and RNA-seq each have their own limitations when employed separately, integrative tools such as INTEGRATE, BreakTrans, and Comrad that utilize both WGS and RNA-seq were developed to increase the sensitivity and specificity of fusion detection [33–35].
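For readers less familiar with the benchmarking metrics quoted above, the following minimal sketch shows how sensitivity and positive predictive value are computed when a tool's calls are compared against a set of known fusions. The function name `benchmark` and the toy fusion labels are assumptions made for illustration; the numbers are chosen only to reproduce the 78 % and 100 % ratios and are not the data from the cited benchmarks.

```python
def benchmark(called, truth):
    """Compare a tool's fusion calls against a reference set of true fusions."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true fusions that were called
    fp = len(called - truth)   # calls that are not true fusions
    fn = len(truth - called)   # true fusions that were missed
    sensitivity = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if called else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "ppv": ppv}


# Toy data: 50 known fusions, of which a conservative tool reports 39 and nothing else.
truth = {f"FUSION_{i}" for i in range(50)}
conservative_calls = {f"FUSION_{i}" for i in range(39)}
result = benchmark(conservative_calls, truth)

# A more permissive tool finds more true fusions but also reports false ones.
permissive_calls = {f"FUSION_{i}" for i in range(45)} | {f"FALSE_{i}" for i in range(15)}
permissive = benchmark(permissive_calls, truth)
```

The two toy tools illustrate the balance discussed in the text: the conservative caller makes no false calls but misses 11 true fusions, while the permissive caller recovers more true fusions at the cost of a lower positive predictive value.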

While these approaches aim to detect authentic fusion sequences, identifying the recurrent gene fusions generated by genomic rearrangements and distinguishing the pathological fusions from passenger fusions is challenging. In this regard, deploying additional dimensions of genomic datasets such as copy number, exon expression, and molecular concepts will be critical to provide additional filters. As copy number alterations frequently, although not always, accompany genomic fusion events (copy-neutral balanced rearrangements being an exception), the presence of intragenic copy number breakpoints within one or both of the fusion partner genes is an indication of genomic rearrangement [36, 37]. In our recent discovery of the ESR1-CCDC170 fusion [38], we utilized a fusion detection pipeline to capture authentic fusion sequences from RNA-seq data, nominate tumor-specific recurrent fusion candidates based on their expression profile, and deploy copy number profiling datasets to reveal the fusions generated by unbalanced genomic rearrangements (Fig. 1a). These candidates were then ranked through Concept Signature (ConSig) analysis (http://consig.cagenome.org/), which prioritizes biologically important fusion genes by assessing their association with molecular concepts characteristic of cancer genes [39]. This discovery indicated the importance of data integration and the use of appropriate additional filters during the fusion detection process, which provide multiple levels of genomic evidence to identify authentic recurrent gene fusions.
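The idea of combining expression-level recurrence with copy number evidence can be sketched as follows. This is a simplified illustration under stated assumptions and is not the pipeline used in [38]: the function names, the input layouts, the gene coordinates, and the `min_samples` recurrence cutoff are all hypothetical, and the ConSig ranking step is not modeled.

```python
from collections import defaultdict


def has_intragenic_breakpoint(gene_span, breakpoints):
    """True if any copy number breakpoint falls strictly inside the gene."""
    chrom, start, end = gene_span
    return any(c == chrom and start < pos < end for c, pos in breakpoints)


def nominate(fusion_calls, gene_spans, cn_breakpoints, min_samples=2):
    """Rank recurrent fusions by how many samples add copy number support.

    fusion_calls:   {sample: [(gene5, gene3), ...]} from RNA-seq
    gene_spans:     {gene: (chrom, start, end)}
    cn_breakpoints: {sample: [(chrom, position), ...]} segment boundaries
    """
    expressed = defaultdict(set)   # fusion -> samples expressing it
    supported = defaultdict(set)   # fusion -> samples with an intragenic breakpoint
    for sample, calls in fusion_calls.items():
        bps = cn_breakpoints.get(sample, [])
        for pair in calls:
            expressed[pair].add(sample)
            if any(has_intragenic_breakpoint(gene_spans[g], bps)
                   for g in pair if g in gene_spans):
                supported[pair].add(sample)
    ranked = [(pair, len(samples), len(supported[pair]))
              for pair, samples in expressed.items()
              if len(samples) >= min_samples]
    # Most copy-number-supported first, then most recurrent, then by name.
    ranked.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return ranked


# Toy inputs with made-up genes and coordinates.
gene_spans = {"G1": ("chr6", 100, 200), "G2": ("chr6", 300, 400),
              "G3": ("chr1", 10, 50), "G4": ("chr2", 10, 50)}
fusion_calls = {"T1": [("G1", "G2"), ("G3", "G4")],
                "T2": [("G1", "G2")],
                "T3": [("G1", "G2"), ("G3", "G4")]}
cn_breakpoints = {"T1": [("chr6", 150)], "T2": [("chr6", 350)], "T3": []}
ranking = nominate(fusion_calls, gene_spans, cn_breakpoints)
```

Because balanced rearrangements are copy neutral, the breakpoint count is used here to rank candidates rather than to exclude them; a fusion with no copy number support stays in the list but is placed lower.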

Fig. 1 Schematic representation of the bioinformatics pipeline to discover pathological recurrent gene fusions in the cancer genome. a The FusionZoom discovery pipeline detects recurrent pathological fusion candidates from the RNA-seq data, catalogs the unbalanced breakpoints at the genomic loci of fusion partner genes from copy number data, and prioritizes pathological gene fusions through the ConSig analysis. b Recurrent ESR1-CCDC170 gene fusion identified in aggressive luminal breast cancers. The schematic shows the tandem duplication in chr6q25.1 as a possible major genetic mechanism generating this fusion

The landscape of gene fusions in breast cancer

A considerable diversity in the incidence of gene fusions has been observed across different tumor types, with breast cancer falling toward the higher end of this spectrum [40]. In a study by Robinson et al., RNA-seq on a panel of 89 breast cancer cell lines and tumors showed that gene fusions are not rare events in breast cancer, with an average of 5.5 fusions per breast cancer cell line and 4.2 fusions per primary tumor [10]. Sequencing analyses of the breast cancer genome have broadly revealed the presence of complex somatic rearrangements in both breast cancer cells and primary tumor samples [41, 42]. Most gene fusions found in breast cancer involve intrachromosomal rather than interchromosomal rearrangements [10, 42], and tandem duplications are especially common in some cancers [42]. Further, a study by Kalyana-Sundaram et al. reported that “amplicon-associated gene fusions”, a distinct group of genetic alterations involving genes in chromosomal amplification loci, are largely prevalent in breast cancers [43]. However, this class of genomic aberrations is predominantly a by-product of chromosomal amplifications, and constitutes a subset of “pseudo” or passenger aberrations that should be carefully considered during the prioritization of gene fusion candidates.

To date, only a handful of the discovered fusions have been found to be recurrent or pathological genomic events in breast cancer. In spite of the slow progress in breast cancer gene fusion discovery, we speculate that recurrent and pathological genomic fusions are still to be discovered, considering the limitations of current deep sequencing technologies and bioinformatics approaches in detecting recurrent genomic gene fusions (as discussed above and in perspectives), the complex and diverse protein-coding structures and oncogenic properties of authentic recurrent gene fusions, and the fact that the gene fusion pool inhabiting a breast tumor is comprised of a daunting quantity of miscellaneous passenger rearrangements and recurrently expressed TICs. In contrast, driver genetic aberrations including driver gene fusions are usually mutually exclusive, such that they rarely coexist with one another in the same tumor [44–46]. Thus, looking for driver genomic fusions within the large quantity of passenger rearrangements and chimeric transcripts in a tumor is like finding a needle in a haystack. All these factors complicate the discovery and identification of true recurrent gene fusions resulting from genomic rearrangements. Herein, we review the gene fusions currently reported in breast cancer, categorized by their functionally distinctive molecular classes, to summarize the current advances in gene fusion identification. Table 1 summarizes the currently known recurrent and pathologically important gene fusions in breast cancer, together with TICs.

Table 1 List of the currently known recurrent gene fusions and TICs in breast cancer

Estrogen receptor gene fusions

Estrogen receptor (ER) is a key regulator of cell growth and survival in a large fraction of breast cancers, especially the ER+ subtype. Recently, mutations in ER have been linked to acquired endocrine resistance in breast cancer [47–50]. Our recent discovery of recurrent genomic rearrangements between ESR1 and its neighbor gene CCDC170 in 6–8 % of luminal B tumors revealed the presence of recurrent gene fusions in this more aggressive and endocrine-resistant form of ER+ breast cancer [38]. In addition, a majority of ESR1-CCDC170-positive tumors harbor tandem duplications between the ESR1 and CCDC170 genes, which is consistent with the previous finding that tandem duplications are particularly common in certain breast tumors [42]. Additional clinicopathological studies will be required to further evaluate the incidence of this fusion in different breast cancer cohorts and elucidate its association with breast cancer endocrine resistance. In addition to our study, several other gene fusions involving the ESR1 gene have been identified recently, including ESR1-YAP1, ESR1-POLH, and ESR1-AKAP12 in ER+ patient-derived xenograft (PDX) samples [51]. Although these fusions appear to occur in individual cases among the samples tested, they share a common structure in that the ligand-binding domain of ESR1 is absent, but the hormone-independent transactivation domain and DNA-binding domain are retained, suggesting their pathological significance in ER+ breast cancer. The Cancer Genome Atlas (TCGA) gene fusion study using RNA-seq data from 16 tumor types (http://www.tumorfusions.org), including 1019 breast tumors, reported the identification of 16 ESR1-associated fusions in breast cancer [40]. However, no molecular or functional validation of these fusions was performed in that study.

Kinase gene fusions

Gene fusions involving kinases as functional fusion partners are crucial in cancer, since the chimeric proteins generated by this class of fusion often represent ideal and specific targets for drug development. The ETV6-NTRK3 fusion, generated by a balanced chromosomal translocation between the ETV6 gene on chromosome 12p13 and the NTRK3 gene on chromosome 15q25, comprises the helix-loop-helix (HLH) dimerization domain of the ETV6 transcription factor linked to the protein tyrosine kinase (PTK) domain of NTRK3 and encodes a chimeric protein functioning as a constitutively active PTK. In addition to being a pathologically significant and functionally well-characterized fusion, the systematic approaches that were employed during the initial discovery of ETV6-NTRK3 served as a model for further gene fusion discoveries. ETV6-NTRK3 was initially detected in pediatric mesenchymal tumors in 1998, using conventional cytogenetic approaches [52, 53]. Karyotype analysis of fibroblasts from congenital fibrosarcoma (CFS) cases revealed abnormal clones with rearrangements of chromosome 15q25-26 and abnormalities of 12p13. To map the breakpoints, FISH analysis of the 12p13 and 15q25-26 alterations was performed using a series of non-chimeric yeast artificial chromosomes (YACs). The YACs yielding split FISH signals were identified and used together in dual-color FISH, and the resultant fusion signal detected represented a der(15)t(12;15). Subsequent cloning and sequencing of the fragments revealed that the ETV6 gene is fused in-frame to the NTRK3 gene. Interestingly, this rearrangement gained importance in the context of breast cancer when it was later discovered as the only cytogenetic abnormality in human secretory breast carcinoma (SBC) in 2002 [54]. In that study, ETV6-NTRK3 was observed in 12 of the 13 (92 %) formalin-fixed paraffin-embedded (FFPE) SBC tissues analyzed but not in other ductal carcinomas, suggesting that SBC is characterized by ETV6-NTRK3. Later, Makretsov et al. developed a FISH assay for the detection of ETV6-NTRK3 in FFPE tissue microarrays (TMAs) from SBC tissues, which showed 80 % sensitivity and 100 % specificity in detecting the fusion. Using this assay, the frequency of ETV6-NTRK3 was assessed in 202 invasive breast carcinoma cases including 1 SBC case [55]. ETV6-NTRK3 was detected only in that SBC tissue and not in non-SBC tissues. Further, four additional SBC cases were examined and three of them expressed the ETV6-NTRK3 fusion. These results suggest that ETV6-NTRK3 is the characteristic genetic alteration of SBC. Although restricted to a rare subset of breast cancers, this fusion represents a dominantly acting oncogene in breast cancer.

Further, the EML4-ALK fusion initially detected in NSCLC [8, 56] was later also detected in breast cancer. However, the incidence of EML4-ALK in breast cancer remains uncertain because different studies have reported inconsistent results. While the study by Lin et al. in 2009 reported the expression of this fusion in 2.4 % of breast cancers [57], the study by Fukuyoshi et al. did not find any EML4-ALK rearrangement in their analysis of 90 breast cancer cases in 2008 [58], and the sequence analysis of 65 triple-negative breast cancers by Grob et al. failed to detect any ALK rearrangements [59]. The results of these studies using different breast cancer tissue cohorts, though inconsistent, could indicate that EML4-ALK rearrangements are rare in breast cancer as opposed to NSCLC. Another recent discovery, the MAGI3-AKT3 fusion in breast cancer, started off as a promising recurrent rearrangement with therapeutic implications in triple-negative breast cancer (TNBC) [41] but was subsequently called into question. Mosquera et al. examined 236 triple-negative breast cancer samples and failed to detect these rearrangements in any of the samples [60]; Pugh et al. analyzed the 3 positive TNBC cases from the original screen [41] as well as additional tumors by exome plus hybrid capture and Illumina sequencing and observed that MAGI3-AKT3 is expressed in only one of the index breast cancer cases, suggesting that this fusion may be a private event [61].

Another recently reported recurrent gene fusion, RPS6KB1-VMP1, is expressed in 30 % of breast cancers, involves adjacent genes, and is generated by tandem duplication in 17q23 [62]. However, this fusion was also detected at low levels in normal breast tissue, and the chimeric protein does not contain a functional protein domain. This fusion is proposed to serve as an indicator of genomic instability at the 17q23 locus, which leads to gene amplification and/or overexpression of crucial oncogenic elements such as MIR21 and RPS6KB1. The authors concluded that the RPS6KB1-VMP1 fusion is not a driver in tumor development. Several gene fusions involving an FGFR family member as the 5′ or 3′ fusion partner, such as ERLIN2-FGFR1, FGFR2-AFF3, FGFR2-CASP7, and FGFR2-CCDC6, have been identified in breast cancer [63]. Interestingly, all of these fusions retain the intact kinase domain of the FGFRs, suggesting their potential functionality, and the ERLIN2-FGFR1 and FGFR2-CCDC6 fusions produce an active FGFR kinase. Moreover, the FGFR fusion partners in these fusions are proposed to mediate oligomerization, thereby triggering the activation of the respective FGFR kinase. Similarly, another transcriptome sequencing study by Robinson et al. in a panel of 89 breast cancer cell lines and tumors identified several MAST kinase gene fusions (ARID1A-MAST2, GPBP1L1-MAST2, ZNF700-MAST1, NFIX-MAST1, and TADA2A-MAST1) involving MAST family members as the 5′ or 3′ fusion partner in 3–5 % of breast cancers [10]. All five fusions retain the PDZ domain and the 3′ kinase-like domain of MAST kinase. Additional sporadic kinase fusions identified in breast cancer include the ERC1-RET and PDGFRA-KIT fusions, each identified in one case, and the TBL1XR1-PIK3CA fusion, detected in 2 cases [46].

Gene fusions involving transcription factors

Transcription factors are the master regulators of the expression of multiple downstream target genes, and so their involvement in any gene fusion event could have substantial consequences not only for the transcription factors themselves but also for their downstream transcriptional targets. The transcriptome sequencing study by Robinson et al. in a panel of 89 breast cancer cell lines and tumors identified 8 rearrangements involving the transcription factors NOTCH1 or NOTCH2, including SEC16A-NOTCH1 and SEC22B-NOTCH2, among which the SEC16A-NOTCH1 fusion showed recurrence in 2 out of 89 samples [10]. NOTCH proteins function as receptors for the membrane-bound ligands Jagged1/2 and Delta1 to regulate cell differentiation, proliferation, and apoptotic programs. Upon ligand activation, the protein fragment containing the NOTCH intracellular domain (NICD) is released and forms a transcriptional activator complex. Interestingly, all the NOTCH translocations were observed in ER-negative cases, and the fusion open reading frames retain the NICD, which might lead to constitutive activation of the NOTCH receptor. Subsequently, Clay et al. analyzed 501 breast cancer tissues and observed 5 cases with NOTCH1 rearrangements [64]. However, the NOTCH1 fusions in this study were all observed in ER-positive cases, as opposed to the previous observations by Robinson et al. in ER-negative cases. This could be attributed in part to differences in the detection techniques and breast cancer tissue cohorts utilized in these studies. Further studies are necessary before drawing definite conclusions about the distribution of these fusions across breast cancer subtypes.

Transcription-induced chimeras

In addition to the well-known genomic events generating gene fusions, RNA-seq is unraveling yet another class of gene fusions generated by intergenic splicing, termed transcription-induced chimeras (TICs). Such chimeras are most frequently generated by cis-splicing between collinearly positioned neighboring genes (also known as read-through events), and also occasionally by trans-splicing between noncollinearly positioned genes or distant genes [65]. Interestingly, at least 4–5 % of the tandem gene pairs in the human genome are estimated to generate read-through events [66]. It has been suggested that most of the frequently expressed gene fusions are the result of TIC events [67]. Although many TICs are observed to be considerably expressed in normal human tissues as well, some are selectively overexpressed in human cancers, suggesting their functional significance in cancer development [68], as exemplified by the SLC45A3-ELK4 fusion identified in prostate cancer [69]. Some TICs, such as SCNN1A-TNFRSF1A and CTSD-IFITM10 [67], have been found to be more frequently expressed in breast cancers than in normal breast tissues. Both fusions involve membrane proteins, suggesting that they could be breast cancer-specific cell surface markers. Although TICs lack the tumor specificity or tissue specificity of genomic gene fusions, their functional significance in breast cancer cannot be excluded. Further studies are required to determine whether these events are pathological or merely passenger events.

The pathological role of gene fusions in breast cancer

Gene fusions are powerful drivers of cancer, and can generate novel chimeric proteins, change gene expression levels, alter protein activities, force oligomerization, or change the subcellular localization of a protein [11]. Functional characterization of fusion genes in breast cancer primarily focuses on elucidating their role in tumorigenesis and therapeutic resistance. To examine the biological effects of fusion genes, fusion transcripts are either ectopically overexpressed in normal breast epithelial cells and breast cancer cells, or silenced in endogenous fusion-positive breast cancer cell lines. Some studies also explore the in vivo phenotypes endowed by these fusion genes using breast cancer xenograft/orthotopic models [38, 70]. Various recurrent gene fusions have been shown to possess oncogenic activity through promoting cancer cell proliferation and/or migration [10, 38, 67, 70]. Fusion genes might drive cancer progression through a) generating constitutively active kinases or transcription factors, or amplifying growth factor signaling, or b) inactivating apoptotic factors and promoting uncontrolled cell growth.

One of the well-studied gene fusions in breast cancer is the ETV6-NTRK3 fusion discovered in SBC. This fusion encodes a chimeric protein comprising the oligomerization domain of ETV6 and PTK domain of NTRK3 [52]. ETV6-NTRK3 expression in mammary tissues results in the development of a fully penetrant, multifocal malignant breast cancer with short latency. Moreover, studies have established that the ETV6-NTRK3 fusion protein is sufficient to initiate mammary tumorigenesis. Interestingly, upregulation and activation of the c-Jun/Fosl1 AP1 complex and several of the AP1 target genes including cyclin D1 have been observed in ETV6-NTRK3-expressing tumors, which stimulate cell proliferation. Moreover, activation of c-Jun/Fosl1 complex and establishment of ETV6-NTRK3/AP1 invasiveness program might be early events in breast cell transformation [71]. These results suggest the critical role of AP1 complex in ETV6-NTRK3-mediated breast cancer development and progression.

Gene fusions involving the estrogen receptor gene (ESR1) are preferentially found in luminal B or endocrine-resistant tumors. In our previous study, we identified recurrent genomic rearrangements between ESR1 and its neighbor gene CCDC170 in the more aggressive and endocrine-resistant luminal B tumors (6–8 %) [38]. The observed fusion joins the 5′-untranslated region of ESR1 to the coding region of CCDC170, leading to the expression of truncated CCDC170 protein variants under the ESR1 promoter (Fig. 1b). Thus, the ESR1 gene does not contribute to the coding sequences. The expression of ESR1-CCDC170 was associated with significant upregulation of Gab1, a key docking protein that enhances the signaling of many receptor tyrosine kinases and a key scaffold protein involved in the formation of invadopodia [72], along with activation of its downstream signaling molecules, AKT and ERK [38]. Our data suggest that Gab1 plays a role in the increased invasiveness driven by this fusion. However, further studies are required to determine the role of this fusion in breast cancer endocrine resistance and how the fusion proteins could engage Gab1 and interfere with signaling pathways, as there is very little knowledge regarding the role of CCDC170 in either normal or cancer cells. Interestingly, genome-wide association studies (GWAS) have revealed that the CCDC170 locus is associated with breast cancer susceptibility in women [73, 74]. A more recent fine-scale mapping study revealed the existence of causal genetic variants regulating CCDC170 expression with a direct effect on breast cancer risk [75]. Together, these studies strengthen the significance of the CCDC170 locus in breast cancer development.

The discovery of the recurrent ESR1-CCDC170 fusion in the luminal B subtype is important as it sheds light on the significance of gene fusions as oncogenic events that may promote the aggressive form of ER+ breast cancer. As one of the lead studies in the identification of recurrent and pathological gene fusions in breast cancer, it also highlights the necessity and significance of further elucidating the pathological role of recurrent gene fusions in breast cancer. The other ESR1 fusions reported thus far, such as ESR1-YAP1, ESR1-POLH, and ESR1-AKAP12, have coding structures distinct from that of ESR1-CCDC170: these fusions result in truncated ER fragments that lack the hormone-dependent transactivation domain (AF2) and ligand-binding domain, but retain the hormone-independent transactivation domain (AF1) and DNA-binding domain. Among these, the function of the ESR1-YAP1 fusion, identified in an endocrine therapy-resistant PDX model, is best documented. The ESR1-YAP1 fusion has been shown to possess constitutive ER transcriptional activity and estrogen-independent signaling and thereby induce estradiol-independent cell growth and promote endocrine resistance [51, 76]. Therefore, the ESR1 gene fusions generating chimeric ER proteins may represent a recurrent mechanism for development of endocrine resistance via constitutively active ER.

For gene fusions involving partners whose functions have been well documented, such as the NOTCH gene family, studies of the underlying mechanism by which these fusions might lead to a malignant phenotype are extensively guided by existing knowledge about the fusion partners. The functional recurrent NOTCH fusion transcripts retain the exons encoding the NICD, which is responsible for NOTCH-induced transcriptional activities, and the fusion-positive cells exhibit substantially enhanced activation of the NOTCH pathway [10]. The endogenous NOTCH fusion-positive breast cancer cell lines as well as the TERT-HME1 cells ectopically expressing NOTCH fusions exhibited decreased cell–matrix adhesion and grew in suspension or as weakly attached clusters. In addition, the NOTCH fusion index cell lines displayed a dependence on NOTCH signaling for proliferation and survival. Deregulated NOTCH signaling has been reported to be oncogenic in several tumor types; for example, somatic activating mutations of NOTCH1 are present in more than 50 % of T-cell acute lymphoblastic leukemias [77]. Therefore, these previous studies laid an important foundation for studying the role of activated NOTCH1 in the molecular pathogenesis of breast tumors harboring NOTCH1-activating fusions and also provide a rationale for targeted therapies that interfere with the NOTCH signaling pathway.

In contrast, some gene fusions discovered in breast cancer involve fusion partners that are not well characterized functionally or biologically. Although structurally well characterized, the biological/functional role of MAST kinase family genes in cancer has not been well studied. Ectopic expression of the MAST kinase family fusions in benign breast epithelial and breast cancer cells induced cell proliferation and conferred a growth advantage [10]. Moreover, following MAST2 knockdown, the endogenous ARID1A-MAST2 fusion-positive breast cancer cell line failed to develop tumors in vivo. This suggests that ARID1A-MAST2 could function as a key oncogenic driver at least in this breast cancer cell line. A more detailed investigation using additional breast cancer cell lines is necessary to establish the oncogenic potential of these fusions in breast cancer as well as to provide mechanistic insights. On the other hand, in the case of FGFR family fusions, despite their involving biologically significant fusion partners, their implication in breast cancer development and progression is uncertain due to the rarity of their occurrence. Overexpression of these fusions was observed to induce the proliferation of 293T cells and TERT-HME cells, while knockdown of FGFR-BAIAP2L1 in the index bladder cancer cell line significantly suppressed cell proliferation [63]. Moreover, FGFR small-molecule inhibitor treatment inhibited the growth of xenografts developed from FGFR fusion-expressing bladder cancer cell lines. Although these results suggest the significance of FGFR fusions in cancer, the impact of these fusions specifically in breast cancer cells has not yet been studied.

Together, the recurrent gene fusions identified so far in breast cancer appear to have key roles in promoting tumorigenesis or endocrine resistance. Despite our increased understanding of the transforming power of fusion genes, further studies are needed to comprehensively understand the mechanisms by which these fusions promote breast cancer development and progression. These studies will not only enrich our knowledge about the disease, but also provide valuable concepts about how we might treat breast cancers harboring these fusions.

Clinical implications and future directions

Despite the high occurrence of gene fusion events, the biological significance and clinical implications of gene fusions in breast cancer have been largely elusive. From a therapeutic perspective, gene fusions that involve well-characterized and targetable fusion partners are of particular interest. This is represented by the ETV6-NTRK3 gene fusion. Clinical trials are currently recruiting breast cancer patients harboring NTRK rearrangements to test the efficacy of Entrectinib, an orally available inhibitor of the NTRK, ROS, and ALK tyrosine kinases [https://clinicaltrials.gov/show/NCT02097810, https://clinicaltrials.gov/show/NCT02568267]. In addition, Crizotinib, the ALK inhibitor currently in clinical trials, has been shown to inhibit the ETV6-NTRK3 fusion kinase, block proliferation of ETV6-NTRK3-dependent tumor cells, and induce the regression of tumor xenografts in mice [78]. Further, breast cancer cells harboring the ETV6-NTRK3 fusion show upregulation and activation of the AP1 complex. TAM67, a dominant-negative inhibitor of AP1, has been shown to inhibit the growth of malignant breast cells [79], suggesting that targeting the AP1 complex in ETV6-NTRK3-positive breast cancer cases might also be beneficial.

Further, the study of functional recurrent NOTCH gene family fusions indicated that NOTCH1 rearrangements were associated with high levels of activated NOTCH1 (N1-ICD) and with sensitivity to the gamma-secretase inhibitor (GSI) MRK-003 [70]. This study not only provided a feasible rationale for the treatment of breast cancers harboring NOTCH1 rearrangements, but also offered valuable insights into the functional recurrent fusion gene families, which were largely overlooked previously. For example, the FGFR2-AFF3, FGFR2-CASP7, FGFR2-CCDC6, and ERLIN2-FGFR1 fusions, which join FGFR family members to various 3′ or 5′ gene partners, have been identified in metastatic breast cancers [63]. Although none of these fusion genes showed recurrence in breast tumors, all the fusion partners exhibited oligomerization capability, and thereby might contribute to upregulated FGFR signaling through forced oligomerization of the growth factor receptor [63]. Inhibition of FGFR might therefore offer substantial therapeutic potential in treating metastatic breast cancers. Taken together, although activating gene fusions involving druggable gene families such as NOTCH, MAST, and FGFR are mostly private events, in aggregate these families of gene fusions may affect a significant subset of patients, who could potentially be managed by targeting the common targetable partners.

In addition to serving as viable therapeutic targets, gene fusions also bear potential prognostic and diagnostic significance in breast cancer. For example, luminal B breast tumors are known to develop de novo resistance to endocrine therapy, and clinically it is difficult even to define this more aggressive form of ER+ breast cancer clearly, owing to the lack of reliable and accurate genetic biomarkers. Currently available classification methods such as the PAM50 gene expression profile or the Ki67 index are limited by controversial performance, and patients, especially those on the borderline, may be misclassified and thus may not receive the appropriate treatment. Our discovery of the ESR1-CCDC170 fusions in luminal B breast cancers [38] provided a potential genetic biomarker for defining and subtyping luminal B breast cancers. Detection of the ESR1-CCDC170 gene fusion may be used as an independent or companion diagnostic to screen for patients who may harbor this more aggressive form of breast cancer and may require advanced treatment. In addition, several studies have suggested activating mutations in ESR1 as a key mechanism of acquired endocrine resistance in breast cancer therapy [47, 49, 51]. Given that some ESR1 rearrangements, such as ESR1-YAP1, have been shown to be associated with endocrine resistance, future discovery of such rearrangements will have critical clinical implications for predicting endocrine-resistant breast cancer. In addition, evaluating the druggability of these ESR1 gene fusions, including ESR1-CCDC170, will be of great interest for the development of new targeted therapies.

Conclusion and perspectives

In summary, integrative genomic research methodologies play an indispensable role in gene fusion discoveries in breast cancer. In the era of precision medicine, molecular subtyping of breast tumors is of utmost importance for genetic characterization of breast cancer subtypes and identification of effective treatment strategies. Current studies have identified diverse recurrent or pathological fusion genes in breast cancer that may drive the development and progression of more aggressive tumors. Despite the diversity of most fusion events in breast cancer, our finding of the recurrent ESR1-CCDC170 gene fusion suggests the presence of true recurrent genomic fusions in aggressive breast cancer subtypes, and more such pathological recurrent events are yet to be discovered.

Several deep sequencing studies of the breast cancer transcriptome or genome have concluded that the breast cancer genome exhibits a complex rearrangement pattern and that recurrent gene fusions are therefore rare. We nevertheless believe that more significant recurrent gene fusions remain to be uncovered in breast cancer, for the following reasons. First, the sensitivity of current NGS technology in detecting gene fusions may not be as good as expected. One reason is that, in gene fusion detection, only read pairs spanning or encompassing the fusion junction are identifiable as chimeric reads, while reads from the rest of the fusion sequence will be mapped to the wild-type genes. This results in substantially lower sensitivity for detecting fusions than for detecting wild-type genes. In addition, deep sequencing technologies have reduced sensitivity for high-GC or high-AT sequences [80], which may be present in fusion junctions. Second, RNA-seq routinely generates a daunting quantity of chimeric sequences, most of which are artifactual fusion sequences, TICs, or nonfunctional passenger gene fusions. Third, the prohibitive cost of WGS technology has prevented large-scale deep sequencing of breast cancer genomes, and the short read lengths of current WGS technologies impede the de novo assembly of fusion sequences. Fourth, although primary breast tumors have been intensively sequenced with current deep sequencing technologies, it remains to be explored whether pathological recurrent gene fusions exist in special breast tumor entities such as metastatic breast cancer, endocrine-resistant breast cancer, or Her2 therapy-resistant breast cancer. It is therefore reasonable to believe that, with further advances in sequencing technologies, extended sequencing of special breast tumor entities, and improvements in bioinformatics analyses, more recurrent and pathologically important gene fusions in breast cancer could be discovered in the near future.
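
The first point above, that only fragments covering the fusion junction can be recognized as chimeric, can be illustrated with a rough calculation. The sketch below is not taken from any of the cited studies. It assumes an idealized library in which every fragment has the same length and starts at a uniformly random position along the fusion transcript, and it counts a fragment as informative only if it retains a minimum number of bases on each side of the junction. All function names, parameter names, and numbers are hypothetical and chosen for illustration only.

```python
def junction_fragment_count(transcript_len, junction_pos, fragment_len, min_anchor):
    """Count fragment start positions whose fragment covers the fusion junction.

    Assumptions (illustrative only): fixed fragment length, and a junction
    located between base junction_pos - 1 and base junction_pos (0-indexed)
    of the fusion transcript. A fragment counts only if it keeps at least
    min_anchor bases on each side of the junction.
    """
    if fragment_len > transcript_len:
        raise ValueError("fragment cannot be longer than the transcript")
    # Earliest start that still leaves min_anchor bases 3' of the junction.
    lowest_start = max(0, junction_pos - fragment_len + min_anchor)
    # Latest start that still leaves min_anchor bases 5' of the junction.
    highest_start = min(transcript_len - fragment_len, junction_pos - min_anchor)
    return max(0, highest_start - lowest_start + 1)


def junction_fragment_fraction(transcript_len, junction_pos, fragment_len, min_anchor):
    """Fraction of all possible fragments that cover the junction,
    assuming uniformly distributed fragment start positions."""
    total_starts = transcript_len - fragment_len + 1
    covering = junction_fragment_count(
        transcript_len, junction_pos, fragment_len, min_anchor
    )
    return covering / total_starts


if __name__ == "__main__":
    # Hypothetical 3,000-base fusion transcript with the junction in the middle,
    # 300-base fragments, and a 20-base anchor required on each side.
    fraction = junction_fragment_fraction(3000, 1500, 300, 20)
    print(f"Fraction of fragments covering the junction: {fraction:.3f}")
```

Under these assumptions, only about one fragment in ten from the fusion transcript carries direct evidence of the junction; the remainder would align to one or the other wild-type gene. Real libraries have variable fragment lengths and coverage biases, so the figure is indicative rather than predictive.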