Common fragile sites

Common fragile sites (CFSs) are regions of profound genomic instability that are observed when cells are cultured in the presence of replication inhibitors such as the DNA polymerase α inhibitor aphidicolin [1, 2]. For a specific region to be defined as a CFS, however, it must contain enough breakpoints to be statistically significant. CFSs are characterized with a cytogenetic assay in which fluorescently labeled DNA probes are hybridized to metaphases from cells exposed to aphidicolin. This assay revealed that breakage within any given CFS occurs over a large region: a large insert probe (such as a BAC clone) hybridizes proximal to the region of breakage in some metaphases, distal to it in others, and across it in yet others. Approximately 90 CFSs have been described throughout the human genome. The frequency of breakage at each individual CFS in the presence of aphidicolin varies between individuals and between tissues, but the three most frequently expressed CFSs are FRA3B (3p14.2), FRA16D (16q23.2) and FRA6E (6q26) [1].
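The review does not say which statistical test is used to decide that a region has enough breakpoints. As a purely hypothetical illustration, one could assume that under the null hypothesis aphidicolin-induced breaks fall uniformly across the genome, so that the number of breaks in a band holding a fraction f of the genome is Poisson-distributed with mean equal to the total number of breaks scored times f. The sketch below computes the probability of seeing at least the observed number of breaks under that assumption; the function names and example counts are illustrative only.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def pmf(i: int) -> float:
        # P(X = i) computed in log space so that i! never overflows
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # The tail is large here, so 1 - CDF loses no meaningful precision.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # k > lam: terms shrink as i grows, so sum the tail directly, which keeps
    # very small p-values from being lost to floating-point cancellation.
    total, i, term = 0.0, k, pmf(k)
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


def band_excess_p(breaks_in_band: int, total_breaks: int, band_fraction: float) -> float:
    """Hypothetical null: breaks land uniformly, so a band covering
    band_fraction of the genome expects total_breaks * band_fraction of them."""
    return poisson_upper_tail(breaks_in_band, total_breaks * band_fraction)


# Hypothetical example: 200 breaks scored genome-wide, 10 of them in a band
# covering 0.2 % of the genome (0.4 breaks expected under the null).
print(f"p = {band_excess_p(10, 200, 0.002):.2e}")
```

A real analysis would also have to correct for the many bands tested at once, which this sketch does not attempt.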

The biological significance of CFSs is unknown, but they are hot-spots for deletions and other alterations, especially in genomically unstable cancer cells. They also appear to mediate chromosomal amplification events through breakage–fusion–bridge cycles. The c-MYC oncogene, the most frequently amplified oncogene in human cancers, is flanked by two CFS regions, FRA8C and FRA8D, which appear to mediate its frequent amplification [3, 4]. The FRA2C common fragile site maps to the borders of MYCN amplicons in neuroblastoma. Breakage at FRA2H and the closely located FRA2S sets the boundaries of amplified regions in two leukemia cell lines [5].

CFSs are also hot-spots for viral integration events. An analysis of HPV16 integrations in cervical cancer revealed that almost half of them occurred within CFS regions [6–8]. A similar analysis of HPV18 integrations in cervical cancer found that over 60 % of them occurred in CFSs. In addition, there was a hot-spot for HPV18 integrations in the CFS regions surrounding c-MYC [3]. Sequencing of the HeLa genome revealed that this cervical cancer had HPV18 integrated 500 kb from c-MYC, within one of the flanking CFS regions, and that this integration was associated with amplification of the HPV genome at the integration site [9].

Many CFS regions have been shown to have frequent deletions and other alterations in different cancers. Early studies demonstrated frequent loss of heterozygosity of polymorphic markers within different CFS regions, which led a number of groups to search those regions for genes that might be the targets of these deletions. To summarize these efforts, we first describe work characterizing the three most frequently expressed CFS regions and the identification of extremely large genes within them. We then discuss the other very large genes that have been identified within CFS regions, followed by work demonstrating that some cancers have decreased expression of multiple large CFS genes. Finally, we discuss the potential roles that these very large CFS genes could play in the development of cancer.

FRA3B and FHIT

The first identified and most unstable CFS region in lymphoblasts is FRA3B, located within chromosome band 3p14.2. In a study of 70 healthy male subjects, chromosome breaks or chromatid breaks and gaps at 3p14 were observed by fluorescence studies in every individual [10]. This chromosomal band was of intense interest because it was a hot-spot for deletions in both lung and renal cancer. In addition, in a family carrying a balanced reciprocal translocation t(3;8)(p14.2;q24.23) with a breakpoint within this region, individuals with the translocation had a very high probability of developing renal cell carcinoma [11]. Finally, HPV had integrated into this region in several cervical cancers [12]. It was therefore suggested that this region might contain a tumor suppressor gene that plays a role in cancer development.

A YAC clone (YC850A6) that crossed the FRA3B CFS was identified [13]. This 1.3 megabase (Mb) YAC clone contained the hereditary renal cell carcinoma translocation breakpoint and several HPV integration sites. However, even with this large insert clone, FISH-based studies revealed breakpoints proximal to the YAC in some metaphases and distal to it in others, so the FRA3B region extends beyond this YAC clone. Using BAC clones across 3p14.2 as FISH-based probes, we subsequently demonstrated that the full FRA3B region of instability spans 4.5 Mb [14]. Within it, a 300-kb core region contains the majority of FRA3B breakpoints and encompasses the hereditary renal cell carcinoma translocation breakpoint.

In searching for genes located within the FRA3B region, Ohta et al. developed a cosmid contig covering the homozygous deletions and detected a target gene that was deleted in tumors derived from different organs. Its amino acid sequence showed significant homology to a group of proteins that share a histidine triad (HIT) motif, so the gene was designated Fragile Histidine Triad (FHIT). FHIT encodes a 1.1-kb transcript with ten exons, but it is transcribed from an extremely large 1.5 Mb genomic region [15]. Since its discovery, deletions, loss of expression and other alterations of FHIT have frequently been observed in a variety of cancers, including breast cancer, lung cancer, cervical cancer and B cell lymphoma [16–19]. Subsequent work demonstrated not only that this gene is a frequent target of alterations that decrease its expression, but also that it is altered in a number of pre-malignant lesions in both the lung and the esophagus [20, 21]. Although many cancers had decreased FHIT expression, there was initially some controversy over whether FHIT was a true tumor suppressor, because some cancers had alterations within its extremely large introns that left the FHIT exons intact [22, 23].

To determine whether FHIT functions as a true tumor suppressor, Ishii et al. introduced FHIT by adenoviral transduction into several FHIT-deficient esophageal cancer cell lines. Adenovirus-mediated FHIT expression induced caspase-dependent apoptosis, and small apoptotic cell fractions and accumulation of cells in S to G2-M phase were observed in different cancer cell lines [24]. The same group also generated Fhit+/− and Fhit−/− mice and showed that these mice were prone to develop tumors, and that tumor incidence could be decreased by adenoviral transduction of wild-type FHIT [25]. In addition, introducing FHIT into pancreatic cancer cells carrying large FHIT deletions induced apoptosis, delayed tumor growth and prolonged survival in a murine model [26]. Together, these studies demonstrated that FHIT functions as a tumor suppressor and suggested that it could potentially be used for cancer treatment or for the prevention of carcinogen-induced tumor development.

Although FHIT resides within one of the most unstable genomic regions, both FHIT and the FRA3B region show remarkable sequence conservation. The mouse Fhit gene has the same overall organization as the human FHIT gene, and the genomic region surrounding it is also a highly expressed CFS in mice (Fra14A2) [27].

Loss of FHIT expression predicts poor outcome in many different cancers. In colorectal cancer, loss of heterozygosity (LOH) at the FHIT gene was associated with poorer patient survival [28]. Huang et al. [29] demonstrated that reduced FHIT expression was strongly correlated with cancer progression and that loss of FHIT expression was closely related to lymph node metastasis and parametrial invasion. In breast cancer, Yang et al. [30] showed that FHIT expression was inversely correlated with histological grade, negative estrogen receptor status, TP53 overexpression and tumor proliferative activity, and that reduced FHIT expression was associated with poor outcome. Toledo et al. [31] demonstrated that loss of FHIT protein expression was related to high proliferation, low apoptosis and worse prognosis in non-small cell lung cancer. Guerin et al. [32] likewise found decreased FHIT expression to be associated with worse prognosis in oral squamous cell carcinoma. Recently, Kapitanović et al. [33] reported that reduced FHIT expression was associated with tumor progression in sporadic colon adenocarcinoma.

FRA16D and WWOX

The second most frequently expressed CFS is FRA16D, located in chromosomal region 16q23.3–24.1. This region frequently shows allelic loss in breast, prostate and ovarian cancers, among others [34], and it is also involved in a t(14;16)(q32;q23) translocation observed in up to 25 % of multiple myelomas [35]. With respect to alterations in cancer, this region therefore resembles the region surrounding FRA3B. To identify candidate genes, Bednarek et al. [36] isolated and analyzed transcripts mapping to this region and identified another extremely large gene, WWOX, which spans 1.0 Mb across the center of the FRA16D CFS. As with FHIT, although WWOX spans such a large genomic region, its final processed transcript is relatively small (2.1 kb). WWOX contains two WW domains and a domain with high homology to the short-chain dehydrogenase/reductase (SDR) family of enzymes. The SDR domain is predicted to be involved in sex-steroid metabolism, and the WW domains are likely involved in protein–protein interactions.

As with FHIT, numerous studies have shown a very high incidence of allelic loss of markers mapping within WWOX in a variety of cancers. LOH of the region surrounding WWOX and/or aberrant WWOX transcripts lacking exons were observed in esophageal squamous cell carcinoma [37] and in non-small cell lung cancer [38]. WWOX has also been shown to function as a tumor suppressor. Re-introducing WWOX into cancer-derived cell lines that did not express it inhibited their growth, and ectopic expression of WWOX in breast cancer cell lines dramatically inhibited tumorigenicity [39]. WWOX knockout mice are tumor prone [40]. Hence, like FHIT, WWOX functions as an important tumor suppressor involved in the development of many types of cancer [41, 42]. Unlike FHIT, however, which has decreased or absent expression in many cancers, the story with WWOX is much more complex: although many published reports describe cancers with decreased WWOX expression, several others report increased WWOX expression in multiple cancers [43, 44].

Functional studies of the role of WWOX in normal and tumor cells showed that WWOX physically interacts with the TP53 homolog TP73 in the cytoplasm and that this interaction enhances TP73’s proapoptotic activity [45]. Studies in ovarian cancer cell lines showed that overexpression of WWOX reduces membranous integrin α3 protein levels, thereby weakening the interaction between tumor cells and the extracellular matrix and inhibiting tumor cell migration. In addition, restoring WWOX in ovarian cancer cells with a homozygous WWOX deletion abolished their tumorigenicity [46]. Similarly, in human hepatocellular carcinoma cells, forced expression of WWOX decreased FGF2-mediated proliferation and enhanced JNK inhibitor-induced apoptosis [47]. Recent studies in human lung adenocarcinoma cell lines revealed that ectopic expression of WWOX caused apoptosis through activation of procaspase-3 and caspase-9 and release of cytochrome c [48]. Emerging roles of WWOX in tumor suppression and genome stability are discussed in R. Aqeilan’s chapter in this issue.

As with FHIT and FRA3B, WWOX and FRA16D are highly conserved. The mouse homolog of WWOX, Wox1, is located at chromosome band 8E1 in the mouse. Using BAC clones spanning Wox1, we demonstrated that this large mouse gene lies within the mouse CFS Fra8E1 [49]. Comparison of the genomic regions spanning WWOX and Wox1 reveals striking similarity in the overall organization of these two homologous regions. Figure 1 compares the chromosomal regions surrounding the human WWOX and mouse Wox1 genes. Both genes lie in the middle of their respective CFS regions.

Fig. 1

Comparison ideograms and diagrams of the common fragile sites FRA16D and FRA8E from human and mouse. Top: the relative chromosomal location of the human common fragile site FRA16D and its associated gene WWOX. Bottom: the relative chromosomal location of the mouse common fragile site FRA8E and its associated gene Wwox. The ideograms of human chromosome 16 and mouse chromosome 8 were retrieved from the NCBI website; the diagrams of the human FRA16D and mouse FRA8E common fragile sites are taken from Krummel et al. [49]. The figure also shows the BAC clones used to characterize each CFS region and the number of times each BAC clone hybridized proximal to, across, or distal to the region of decondensation/breakage in specific metaphases.

As with FHIT, decreased or lost expression of WWOX is associated with poor clinical outcome. WWOX protein expression varies among ovarian carcinoma histotypes, but tumors with significant loss of WWOX expression had the worst overall clinical outcome [50]. Aqeilan et al. [51] demonstrated that WWOX associates with ErbB4 in breast cancer. They further found that WWOX expression was absent in 36 % of the breast cancers they analyzed and that this loss of expression was associated with an unfavorable outcome. Similar studies in clear cell renal cell carcinoma indicated that downregulation of WWOX protein expression also correlated with a less favorable prognosis [52].

FRA6E and Parkin

The third most frequently expressed CFS in lymphoblasts is FRA6E (6q26). Chromosomal band 6q26 shows a high frequency of LOH in squamous cell lung, ovarian, hepatocellular and breast cancers [53–56]. Using FISH with large insert BAC clones derived from 6q26, we defined FRA6E with seven clones across this region [57]. FRA6E spans approximately 3.6 Mb, and sequence analysis revealed eight genes within this CFS region. One of them is PARK2, first identified as a mutational target in patients with autosomal recessive juvenile parkinsonism (ARJP) [58]. PARK2 spans 1.36 Mb and comprises 11 small exons, with a final processed transcript of 2.3 kb [57]. This gene therefore has an overall organization similar to that of FHIT and WWOX: an extremely large gene that encodes a relatively small final processed transcript. The PARK2 protein contains a ubiquitin-like domain at its N-terminus and two RING finger motifs and an in-between-RING (IBR) domain at its C-terminus. PARK2 encodes an E3 ubiquitin-protein ligase that binds E2 ubiquitin-conjugating enzymes [59].

Gene expression studies in a variety of cancer tissues and tumor-derived cell lines found reduced or absent PARK2 transcripts in ovarian, breast, renal and lung cancer [60–63] and in sporadic colorectal cancer [64]. The frequently observed partial or complete loss of PARK2 suggested that the genomic deletions across this gene might contribute to tumor initiation and development. While germline PARK2 mutations were known to cause neural dysfunction, somatic PARK2 mutations were found to decrease PARK2’s E3 ligase activity, compromising its ability to ubiquitinate cyclin E and resulting in mitotic instability, which suggests a tumor suppressor function [65]. Recently, PARK2 was identified as a TP53 target gene that mediates TP53’s regulation of energy metabolism and antioxidant defense, and thus contributes to TP53-mediated tumor suppression [66].

As observed for both FHIT and WWOX, breakpoints within the PARK2/FRA6E region and reduced expression of ADADIN, a gene located just telomeric of PARK2, were associated with poor outcome in breast cancer [63]. In addition, PARK2 and its potentially co-regulated gene PACRG were commonly downregulated in clear cell renal cell carcinoma, and this was also associated with aggressive disease and poor clinical outcome [67].

In addition, a recent analysis of somatic copy-number alterations (SCNAs) in 3,131 cancer specimens from 26 histological types identified 158 regions of focal SCNA, comprising 82 deletions and 76 amplifications; all three large CFS genes, FHIT, PARK2 and WWOX, were identified as frequent deletion targets [68].

Association between CFS regions and extremely large genes

Because the three most frequently expressed CFS regions contain extremely large genes, and all three genes were shown to function as tumor suppressors, we wanted to know whether other CFS regions are also associated with large genes and, if so, whether these other large CFS genes also play important roles in the development of cancer. In 2005 we therefore obtained a list of the largest known human genes: 40 genes spanned more than 1 Mb, and 200 spanned more than 500 kb. Table 1 shows the 40 human genes that span more than 1 Mb of genomic sequence. The table also lists the full size of the genomic region encoding each gene, its number of exons, the size of its final processed transcript and its chromosomal localization.

Table 1 The 40 human genes that span more than 1 Mb of genomic sequence

The genes listed in the table are interesting for several reasons. First, they show that the size of the genomic region containing a gene does not predict the size of its final processed transcript. This was clearly seen with FHIT, WWOX and PARK2: all three span genomic regions of 1.0 Mb or more, yet each has a relatively small final processed transcript. Indeed, some genes that span considerably smaller genomic regions have larger final processed transcripts than these three very large genes. Other very large genes, however, have quite large final processed transcripts. For example, DMD, the second largest known human gene (2.09 Mb), has 79 exons and a final processed transcript of 13,957 bases, and LRP1B, the fourth largest known human gene (1.9 Mb), comprises 91 exons with a final processed transcript that is also quite long (16,556 bases). Second, mutations in a number of the largest human genes result in neurological alterations. Inactivation of Park2 in mice results in the quaking phenotype, and mutations in PARK2 in humans result in early-onset Parkinson disease, which is how this gene was first identified. Mutation of WWOX has been observed in autosomal recessive cerebellar ataxia with epilepsy, mental retardation and retinal degeneration [69, 70]. Alterations in GRID2 result in the mouse neurological mutant Lurcher, and inactivation of the large DAB1 gene results in the mouse neurological mutant Scrambler.
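The lack of a relationship between genomic span and transcript size can be made concrete with the five genes whose sizes are quoted in the paragraph above. The Python sketch below copies those values from the text (FHIT’s 1.1-kb transcript is taken as 1,100 bases; the helper names are illustrative), computes what fraction of each genomic span survives in the processed transcript, and shows that ranking the genes by span and by transcript length gives different orders.

```python
# Genomic span and final processed-transcript length (bp), as quoted in the text.
GENES = {
    "FHIT": (1_500_000, 1_100),
    "WWOX": (1_000_000, 2_100),
    "PARK2": (1_360_000, 2_300),
    "DMD": (2_090_000, 13_957),
    "LRP1B": (1_900_000, 16_556),
}


def transcript_fraction(genomic_bp: int, transcript_bp: int) -> float:
    """Fraction of the genomic span that is retained in the processed transcript."""
    return transcript_bp / genomic_bp


def rank_order(column: int) -> list[str]:
    """Gene names sorted largest-first by span (column 0) or transcript (column 1)."""
    return sorted(GENES, key=lambda g: GENES[g][column], reverse=True)


for name in rank_order(0):
    span, tx = GENES[name]
    print(f"{name:<6} span {span / 1e6:4.2f} Mb  transcript {tx / 1e3:6.2f} kb  "
          f"({100 * transcript_fraction(span, tx):.3f} % of span)")

# The two orderings disagree, so genomic span does not predict transcript size.
print("by span:      ", rank_order(0))
print("by transcript:", rank_order(1))
```

WWOX, for instance, has the smallest span of the five but a transcript almost twice as long as that of FHIT.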

Many of the chromosomal bands containing these extremely large genes were also known to contain CFSs. This did not, however, prove that these other large genes lie within the CFS regions. Where a particular CFS region begins and ends is determined by a cytogenetic assay that uses large insert clones as FISH-based probes. Individual BAC clones are hybridized to metaphase preparations from cells exposed to aphidicolin, and a sufficient number of metaphases with breakage/decondensation at the CFS under study are analyzed to determine where the BAC hybridizes relative to that breakage/decondensation. The center of the CFS region is defined as the position where a BAC clone hybridizes with approximately equal frequency proximal and distal to the breakage. To characterize the full region of genomic instability, FISH studies are continued with BAC clones progressively further proximal and distal until clones are identified that always hybridize proximal to the region of breakage at one end and always distal at the other. This assay is difficult and time-consuming, especially for CFS regions expressed much less frequently than FRA3B, FRA16D and FRA6E. When it was done for a number of CFS regions, the full size of an individual CFS region was found to vary from less than 1 Mb to over 10 Mb. Because the average chromosomal band is 5–15 Mb in size, a large gene and a large CFS region can lie within the same chromosomal band without intersecting or overlapping.
Hence, the only way to know definitively whether a particular very large gene lies within a specific CFS region is to use large insert clones containing a portion of that gene as FISH-based probes against metaphase preparations from cells cultured in the presence of aphidicolin (for some of the largest genes it is advisable to use one clone from the 5′ end and another from the 3′ end of the gene). A large gene is defined as lying within the closest mapping CFS region if a BAC clone spanning a portion of that gene hybridizes either across the region of breakage/decondensation in at least one metaphase, or proximal to it in one metaphase and distal in another. The gene’s relative position within the CFS region is then defined by how often that BAC hybridizes proximal, as compared with distal, to the region of breakage/decondensation. We next summarize the work by us and others demonstrating that a number of the human genes that span large genomic regions do lie within CFS regions.
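The decision rules of this FISH-based assay can be written down as a short procedure. The Python sketch below is hypothetical: the clone names and counts are invented for illustration, clone positions are expressed as distance from the centromere so that “proximal” means closer to the centromere, and the data are assumed to describe a single contiguous CFS. It classifies each BAC as inside or outside the CFS using the rule stated above, reports the inside clone whose proximal and distal counts are most nearly equal as the approximate center, and takes the last always-proximal and first always-distal clones as the flanking boundaries.

```python
from dataclasses import dataclass


@dataclass
class BacScore:
    """FISH tallies for one BAC clone across aphidicolin-treated metaphases."""
    name: str             # hypothetical clone label
    dist_from_cen: float  # clone position as distance from the centromere (Mb)
    proximal: int         # metaphases with signal proximal to the breakage
    crossing: int         # metaphases with signal spanning the breakage
    distal: int           # metaphases with signal distal to the breakage


def inside_cfs(b: BacScore) -> bool:
    # Crossing in one metaphase, or proximal in one and distal in another.
    return b.crossing > 0 or (b.proximal > 0 and b.distal > 0)


def proximal_fraction(b: BacScore) -> float:
    # Relative position: share of flanking (non-crossing) signals that are proximal.
    return b.proximal / (b.proximal + b.distal)


def map_cfs(bacs: list[BacScore]) -> dict:
    ordered = sorted(bacs, key=lambda b: b.dist_from_cen)
    inside = [b for b in ordered if inside_cfs(b)]
    # Outside clones with any proximal signal are always proximal, and vice versa.
    always_prox = [b for b in ordered if not inside_cfs(b) and b.proximal > 0]
    always_dist = [b for b in ordered if not inside_cfs(b) and b.distal > 0]
    # Approximate center: proximal and distal counts most nearly equal.
    flanked = [b for b in inside if b.proximal + b.distal > 0]
    center = min(flanked, key=lambda b: abs(proximal_fraction(b) - 0.5), default=None)
    return {
        "inside": [b.name for b in inside],
        "center": center.name if center else None,
        "proximal_bound": always_prox[-1].name if always_prox else None,
        "distal_bound": always_dist[0].name if always_dist else None,
    }


# Invented counts for five clones walking across a hypothetical CFS.
scores = [
    BacScore("BAC-1", 10.0, 20, 0, 0),
    BacScore("BAC-2", 11.0, 15, 2, 3),
    BacScore("BAC-3", 12.0, 9, 3, 8),
    BacScore("BAC-4", 13.0, 4, 1, 14),
    BacScore("BAC-5", 14.0, 0, 0, 18),
]
print(map_cfs(scores))
```

With these invented counts, BAC-2 to BAC-4 fall inside the CFS, BAC-3 (9 proximal versus 8 distal) marks its approximate center, and BAC-1 and BAC-5 mark the flanks.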

RORA and FRA15A

The very first large gene that we tested for localization within a CFS region was the retinoic acid receptor-related orphan receptor alpha (RORA) gene. Although RORA is not one of the 40 largest genes, it spans 730 kb within chromosomal band 15q22.2, which also contains the FRA15A CFS region. RORA is involved in the cellular response to hypoxia. We considered RORA a very interesting gene to test because FHIT, WWOX and PARK2 all seem to play important roles in cellular responses to stress, and PARK2 and FHIT also appear to be involved in oxidative stress [71, 72]. BAC clones spanning portions of this large gene were used as FISH-based probes to demonstrate that RORA lies within the middle of the FRA15A CFS [67]. RORA is expressed in normal breast, prostate and ovarian epithelium, but we found that it was frequently inactivated in cancers arising from these organs [73–76]. Subsequent work demonstrated that RORA attenuates Wnt/β-catenin signaling by PKCα-dependent phosphorylation in colon cancer [77]. In addition, RORA suppresses breast tumor invasion by inducing SEMA3F expression [78]. RORA is thus a very attractive tumor suppressor candidate, but more definitive functional studies are needed to prove this.

Disabled-1 (DAB1) and FRA1B

The human disabled-1 (DAB1) gene spans 1.25 Mb within chromosomal band 1p32.2, which also contains the FRA1B CFS. Large insert BAC clones spanning portions of this gene were used as FISH-based probes to determine whether DAB1 lies within FRA1B, and it was found to reside within this CFS [79]. We further demonstrated that DAB1 expression was decreased in many cancer samples, especially those from the brain and endometrium. Introducing a DAB1 expression plasmid into two different cell lines with little endogenous DAB1 expression decreased cell growth. Subsequent work by other groups demonstrated altered DAB1 splicing in retinoblastoma and neuroblastoma [80].

DMD and IL1RAPL1

DMD, the second largest known human gene, spans 2.09 Mb of genomic sequence within chromosomal band Xp21.2. Immediately adjacent to DMD is a second extremely large gene, IL1RAPL1 (1.36 Mb), which is involved in X-linked mental retardation. Interestingly, a number of very large genes lie immediately adjacent to a second large gene. Just centromeric of the 1.5 Mb FHIT gene is the 700 kb PTPRG gene. Similarly, just telomeric of the 1.3 Mb PARK2 gene is the 1 Mb parkin co-regulated gene (PACRG). Chromosomal band Xp21.1 also contains the FRAXC common fragile site. We therefore used BAC clones spanning portions of DMD and IL1RAPL1 to demonstrate that both genes lie within FRAXC. The full FRAXC CFS is greater than 5 Mb in size, as it spans both of these very large genes; however, precisely where this CFS region begins and ends has not yet been determined. Both DMD and IL1RAPL1 are abundantly expressed in normal brain, but they were dramatically underexpressed in every brain tumor cell line and xenograft tested [81]. Mice that are double mutants for dystrophin and dysferlin are predisposed to develop rhabdomyosarcoma [82]; hence DMD is also a very attractive tumor suppressor candidate.

GRID2 and FRA4G

Rozier et al. [83] characterized a conserved aphidicolin-sensitive CFS located at human 4q22 (now called FRA4G). The homologous region in the mouse, at chromosome band 6C1, is also a CFS region. This is an extremely long CFS region, spanning 15 Mb in the human genome. Within this unstable region lies yet another very large gene, the ionotropic glutamate receptor delta 2 (GRID2) gene, which spans 1.47 Mb. Spontaneous chromosomal rearrangements in this region occur frequently in mice, giving rise to mutant animals in inbred populations, and these deletions result in the Lurcher neurological mutant [83]. No publications have yet described altered expression of this gene in cancer, but deletions in this chromosomal region occur in colon cancer [84].

NBEA and FRA13A

The FRA13A CFS maps to chromosome band 13q13.2. Savelyeva et al. [85] characterized this region and found that the hot-spot for breakpoints within this CFS lies in a 650 kb region within the neurobeachin (NBEA) gene, which spans approximately 730 kb. NBEA encodes a neuron-specific multidomain protein implicated in membrane trafficking and is predominantly expressed in the brain and during development. This gene is a target of recurrent interstitial deletions at 13q13 in patients with monoclonal gammopathy of undetermined significance (MGUS) and multiple myeloma [86], and it is also a translocation partner of PVT1 in multiple myeloma [87].

Integration of HPV into LRP1B in cervical cancer

One interesting aspect of the FRA3B CFS region was that it contained a number of HPV integration events in cervical cancer. In our analyses of HPV16 and HPV18 integration events in cervical cancer, we not only found frequent integrations within other CFS regions, but also found large genes at a number of those integration sites. One very large gene with an HPV18 integration event was the 1.9 Mb LRP1B gene, located in chromosomal band 2q22.1, which is also known to contain the FRA2F CFS region. BAC clones spanning the HPV integration site within LRP1B were found to lie within FRA2F; hence LRP1B is yet another very large CFS gene [4]. LRP1B has a number of potentially important connections to cancer. The most striking is the finding that deletion of this gene in high-grade serous ovarian cancer is associated with acquired resistance to liposomal doxorubicin [88]. In addition, there are recurrent mutations in this gene in melanoma [89]. Finally, downregulation of this gene promotes cell migration via the Rho-Cdc42 pathway and actin cytoskeleton remodeling in renal cell cancer [90]. LRP1B is therefore an excellent candidate for yet another large CFS tumor suppressor gene.

Not all very large genes are contained within CFS regions

Examining the location of very large genes relative to mapped CFSs led us to test a number of very large genes as potential CFS genes. As described above, several of the largest genes lie within CFS regions, but not every large gene does. One gene that we did not even test was the seventh largest human gene, ataxin-2 binding protein 1 (A2BP1), because it lies in chromosome band 16p13.2 and there are no known CFSs on the short arm of chromosome 16. One of the large genes that we did test was the deleted in colorectal cancer (DCC) tumor suppressor gene at 18q21.1, but we found that it lies proximal to the closest mapping CFS (FRA18B).

Other large CFS genes

To date, 26 very large genes have been localized within CFS regions by us and other groups. Six of the ten largest human genes lie within a CFS region: CNTNAP2 in FRA7I, DMD in FRAXC, LRP1B in FRA2F, CTNNA3 in FRA10D, DAB1 in FRA1B and FHIT in FRA3B. Table 2 lists all 26 known large CFS genes, together with their sizes, the chromosomal bands where they are localized and the CFS regions that contain them.

Table 2 The 26 known large CFS genes, their sizes and their CFS regions

A number of other very large genes map to chromosomal bands known to contain a CFS. Unfortunately, testing every large gene for localization within a CFS region is far from trivial, especially for CFSs expressed at very low frequencies. It is therefore not yet known how many of the largest genes lie within CFS regions. What is known is that CFS regions range from less than 1 Mb to over 10 Mb in size, so together the CFSs could span up to 500 Mb of the genome. The total number of large CFS genes (depending on how a large gene is defined) could therefore exceed 100. While many of the largest genes may lie within CFS regions, there are also likely to be numerous very large genes outside these highly unstable chromosomal regions.
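As a rough consistency check on the genome-wide estimate, dividing the upper figure of 500 Mb by the approximately 90 described CFSs gives the implied average CFS size, and comparing the same figure with a haploid human genome of roughly 3.1 Gb (a value assumed here, not stated in this review) gives the fraction of the genome that CFSs could occupy:

```latex
\bar{L}_{\mathrm{CFS}} \approx \frac{500\ \mathrm{Mb}}{90} \approx 5.6\ \mathrm{Mb},
\qquad
\frac{500\ \mathrm{Mb}}{3.1 \times 10^{3}\ \mathrm{Mb}} \approx 0.16
```

An average of about 5.6 Mb falls within the stated range of less than 1 Mb to over 10 Mb, and the upper estimate corresponds to roughly one sixth of the genome.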

Do large genes cause CFSs?

The observation that so many large genes reside within CFSs provides an important link between genomic instability and the loss of expression of potentially important cancer-related genes. Helmrich et al. proposed an explanation for why so many very large genes lie within CFS regions. It rests on the fact that transcription of human genes larger than 800 kb takes longer than one complete cell cycle, and that the highly unstable CFS regions spanning several of these large genes replicate late. Regions of concomitant transcription and replication in late S phase could therefore be unstable because of collisions between replication and transcription complexes [91]. This is an attractive hypothesis, but if it is correct, one would expect that the longer a gene, the more unstable the CFS containing it. This expectation does not, however, account for the possibility that instability within a particular CFS in a specific tissue depends on the expression level of the gene in that tissue, and this needs to be examined further.
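The length prediction above could in principle be checked with a rank correlation between gene length and CFS breakage frequency. The Python sketch below uses only information stated in this review: the genomic spans of FHIT, WWOX and PARK2, and the ordering FRA3B > FRA16D > FRA6E for the three most frequently expressed CFSs, encoded as hypothetical scores 3, 2 and 1. Spearman’s rho is computed as the Pearson correlation of average ranks so that ties are handled correctly. With only three genes the result (rho = 0.5 here) merely illustrates the calculation and says nothing conclusive about the hypothesis; a real test would need measured, tissue-specific breakage frequencies and expression levels.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Spans (Mb) as quoted in this review.
gene_span_mb = {"FHIT": 1.5, "WWOX": 1.0, "PARK2": 1.36}
# Hypothetical scores: higher = more frequently expressed CFS (FRA3B > FRA16D > FRA6E).
expression_score = {"FHIT": 3, "WWOX": 2, "PARK2": 1}

genes = list(gene_span_mb)
rho = spearman_rho([gene_span_mb[g] for g in genes],
                   [expression_score[g] for g in genes])
print(f"Spearman rho = {rho:.2f}")
```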

Genomic instability within FRA3B (FHIT) and cancer

Studies analyzing FHIT within the FRA3B region also provide an important link between genomic instability and cancer (for details, see the chapter by K. Huebner in this issue). The two breast cancer predisposition genes BRCA1 and BRCA2 are known to be involved in maintaining genome stability, and mutations in these genes are often found in both breast and ovarian cancer. Ingvarsson et al. [92] demonstrated that loss of FHIT expression was observed much more often in BRCA2-mutated breast tumors than in sporadic tumors without BRCA2 mutations. This indicated that loss of BRCA2 could affect the stability of the FRA3B locus, thereby causing loss of FHIT expression in breast cancer. Similarly, loss of FHIT expression was significantly more frequent in cancers arising in BRCA1 carriers than in sporadic breast cancers. Alterations within FHIT have also been found to further promote genome instability. Recently, Miuma et al. [93] showed that somatic CNVs (DNA gains or losses >10 kb) occurred more frequently in Fhit −/− mouse embryonic fibroblasts (MEFs) than in Fhit +/+ cells. In addition, more genes carried small insertions, deletions and point mutations in Fhit −/− mouse kidney cells than in Fhit +/+ cells. Previously, Turner et al. [94] also showed that FHIT-deficient cell lines had elevated frequencies of chromosome gaps and breaks. Together, these studies suggest that loss of FHIT may induce global genome instability, and that the FHIT/FRA3B region is profoundly sensitive to increases in genomic instability. It will be interesting to determine whether this is also the case for other large CFS genes.

Large CFS genes and cancer

Most efforts to study large CFS genes and their relationship to cancer have focused on the two best-known large CFS genes, FHIT and WWOX. Although loss of expression of either FHIT or WWOX alone has been implicated in the development of a variety of cancers, concordant loss of expression of both genes has also been found in invasive breast cancer, hematopoietic malignancies, cervical cancer, thyroid cancer and pancreatobiliary cancer [95–99]. Moreover, the novel chimeric genes PVT1-NBEA and PVT1-WWOX have been shown to occur frequently in multiple myeloma, accompanied by abnormal expression of NBEA and WWOX [81]. This indicates that in certain cancers, rather than a single large CFS gene contributing to cancer development through its own specific pathway, multiple CFS regions and their large genes may contribute to cancer development.

However, systematic profiling of all known large CFS genes in cancer has been limited. Genome-wide DNA profiling of HIV-related B cell lymphomas revealed that the three known CFS tumor suppressor genes FHIT (FRA3B), WWOX (FRA16D) and PARK2 (FRA6E) were frequently concordantly inactivated in HIV-positive non-Hodgkin lymphomas [100]. We previously used real-time RT-PCR to analyze the expression of 14 large CFS genes in two distinct groups of head and neck cancers. The first group consisted of oral tongue squamous cell carcinomas (SCCs), which do not have a human papillomavirus (HPV)-associated etiology, and the second of base of tongue/tonsillar (oropharyngeal) SCCs (OPSCCs), which are frequently HPV-positive. The two groups showed loss of expression of distinct sets of large CFS genes, which suggested that there may be selection for loss of expression of specific CFS genes in different cancers [101].

We have been analyzing OPSCC specifically because of the dramatically increased incidence of this cancer despite the declining prevalence of smoking in the United States [102]. This increase is due to a dramatic rise in HPV-positive OPSCCs, presumably reflecting changes in sexual practices. Our studies have included both mate-pair sequencing of genomic DNA to characterize genome-wide alterations in these cancers [103] and RNAseq to analyze changes in gene expression [44]. The RNAseq study, carried out in 10 OPSCCs, enabled us to perform a systematic analysis of the expression of all known large CFS genes, because it compared RNA expression in OPSCC tumors with matched normal oropharyngeal tissue obtained from the same patients [104]. This analysis revealed that a select group of large CFS genes had decreased expression in the tumor samples compared with the matched normal tissue, while other large CFS genes showed either increased expression or no change in expression (Fig. 2). The two genes with the greatest and most consistent decreases in expression were FHIT and PARK2, the two known CFS tumor suppressor genes. Several other large CFS genes (DMD, DLG2, NBEA and CTNNA3) also frequently had decreased expression [44]. Validation experiments using quantitative reverse transcription real-time PCR in a much larger number of OPSCCs revealed that this select group of large CFS genes had decreased expression in more than half of the samples analyzed.

Fig. 2

RNA sequencing revealed that different common fragile site genes showed different expression patterns in the oropharyngeal squamous cell carcinoma samples examined. The RNA sequencing data were analyzed using the Geospiza GeneSifter analysis pipeline, which generated each gene's average expression in the tumor group and in the normal group. Values are shown on a log2 scale.
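The caption does not state exactly how the log2 values were derived from the group averages, and the internals of the GeneSifter pipeline are not described here. As an illustration only, the sketch below assumes that the plotted quantity is the log2 ratio of the tumor-group mean to the normal-group mean, so that negative values indicate decreased expression in tumors and positive values indicate increased expression. The function name `log2_fold_change` and the `pseudocount` parameter are hypothetical choices for this example, and the input values are made up.

```python
import math


def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """Return log2(mean tumor expression / mean normal expression).

    Assumption: this is one plausible reading of "represented in log2
    format", not the pipeline's documented method. A pseudocount is added
    to both group means so that a gene with zero expression in one group
    does not produce log2(0).
    """
    tumor_mean = sum(tumor_values) / len(tumor_values)
    normal_mean = sum(normal_values) / len(normal_values)
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


# Hypothetical normalized expression values for one gene (not study data).
tumor = [3.0, 3.0]
normal = [15.0, 15.0]
print(log2_fold_change(tumor, normal))  # -2.0, i.e. a 4-fold decrease in tumors
```

On this reading, a value of -2 means the tumor-group average is one quarter of the normal-group average, and a value near 0 means no change.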

As mentioned, HPV is known to be the leading factor behind the dramatic increase in the incidence of OPSCC over the past several decades. In our study, a subset of tumors in the HPV-positive group had more dramatic decreases in the expression of these large genes than the HPV-negative group. However, regardless of HPV status, the proportion of tumors with either increased or decreased expression of these large genes was similar. Most importantly, in most tumors these large genes showed a similar expression pattern within each individual tumor sample: all six genes were generally either all up-regulated or all down-regulated, and to a similar extent [44]. Decreased expression of multiple large CFS genes is not confined to OPSCC. We extended these studies to other cancers of the head and neck and observed the same phenomenon. In addition, examination of RNAseq data generated by The Cancer Genome Atlas (TCGA) for breast cancers reveals that multiple large CFS genes also have consistently decreased expression in that cancer. The precise subset of known CFS genes with decreased expression in breast cancer differs from the subset observed in OPSCC.

However, WWOX, a known large CFS tumor suppressor gene, did not show decreased expression in either the RNAseq or the real-time PCR analysis; instead, it had increased expression in most tumor samples examined [44]. This is not the first report of increased WWOX expression in cancer. Watanabe et al. [43] previously reported elevated WWOX protein levels in gastric and breast carcinomas. Along with WWOX, the RNAseq data indicated that several other large CFS genes, including DAB1, GRID and CTNN2, had increased expression in the tumor samples examined [44].

The decreased expression of these six large CFS genes observed in OPSCC cannot simply be interpreted as the result of random genomic instability within CFS regions. If it were due to random genomic instability in CFS regions during carcinogenesis, we would have expected to see decreased expression of all CFS genes, or greater decreases for the large CFS genes located in the most highly unstable CFS regions. Instead, some large CFS genes showed no significant change or actually had increased expression, as was frequently observed for WWOX. There may be selection for alterations in specific regions because of the important large genes that reside within them, or certain chromosomal regions may be more sensitive to the specific carcinogens involved in certain types of cancer. For example, FHIT has been shown to have more significant alterations or loss in lung and cervical cancers arising in smokers than in those arising in non-smokers [105–107]. Studies in rodents exposed to cigarette smoke demonstrated that loss of FHIT is an early event in smoking-related lung carcinogenesis [108]. In addition, Thavathiru et al. [109] demonstrated that expression of the common fragile site genes WWOX/FRA16D and FHIT/FRA3B is downregulated by exposure to the carcinogens UV and benzo(a)pyrene diol epoxide (BPDE), but not by exposure to ionizing radiation.

The proportion of cancers with decreased expression of these large genes is similar in the HPV-positive and HPV-negative groups, but a subset of HPV-positive OPSCCs showed more dramatic decreases in the expression of these six genes. Previous work has shown that expression of the HPV-16 E6 and E7 oncogenes can disrupt the RB-E2F pathway, leading to perturbed replication, DNA damage and structural chromosomal instability [110, 111]. Whether HPV induces genomic instability and thereby causes greater decreases in the expression of these large CFS genes than in HPV-negative OPSCCs will require further investigation.

As discussed above, decreased or lost expression of a single large CFS gene such as FHIT or WWOX is known to be associated with tumor progression and poor prognosis. Sbrana et al. [97] also showed that concordant loss of expression of both FHIT (FRA3B) and WWOX (FRA16D) is associated with failure of apoptosis in lymphocytes from patients with thyroid cancer. In addition, Le Tallec et al. recently performed common fragile site profiling in epithelial and erythroid cells and found that over 50 % of recurrent cancer deletions originate in CFSs associated with large genes [112]. It will therefore be interesting to determine whether the loss of expression of this group of large CFS genes observed in OPSCC has clinical significance in predicting patient outcome. If so, this could open a new avenue for stratifying patients for treatment.

Conclusions

There is clearly an association between many of the highly unstable CFS regions and extremely large genes. However, not every large gene is contained within a CFS region, nor does every CFS span a large gene. Twenty-six very large genes have been localized within CFS regions, and once the boundaries of many more CFS regions have been precisely defined, the number of very large CFS genes will likely approach 100 or more. The three genes contained within the three most frequently expressed CFSs have all been demonstrated to function as important tumor suppressors. Many of the other very large genes already known to lie within CFS regions are very attractive tumor suppressor candidates, but functional studies will need to be carried out. The finding that numerous cancers have decreased expression of multiple large CFS genes suggests an important link between genomic instability and cancer progression through the loss of multiple important genes. The concordant loss of expression of multiple large CFS genes, many of which may also function as tumor suppressors, could have a profound phenotypic effect on the affected tumors. This may explain why increased genomic instability has been associated with worse clinical outcomes. A great deal of work clearly remains to be done, both in defining the numerous CFS regions and in characterizing the role of this interesting group of large CFS genes in cancer development.