1 Introduction

Early studies of tumour karyotypes relied on direct observation of chromosomes, and many examples of abnormalities were recorded. Two technical advances moved the field forward: first, the development of culture media to grow cancer cells in vitro and, second, the discovery that colchicine arrests cells in metaphase, making their chromosomes visible by microscopy. When possible, cells growing in short-term cultures were passaged until they became established as immortalized cell lines, and these remain a valuable tool for molecular cell biology. Once staining techniques were developed that allowed each individual chromosome to be distinguished [1], tumour-specific chromosomal abnormalities could be enumerated. This enabled the construction of large catalogues such as the Catalog of Chromosome Aberrations in Cancer [2], originally a hardback book and now available online. Most of the catalogue was devoted to haematological malignancies because chromosome spreads are relatively easy to obtain from bone marrow or peripheral blood specimens. Solid tumours were often represented by samples from metastatic lesions or effusions and were, thus, typical only of late-stage disease.

The development of molecular genetics, made possible by DNA cloning, completely changed the approach to the study of copy number in tumours (see Chap. 2). It was no longer necessary to observe chromosomes directly; instead, chromosomal gain and loss could be inferred by measuring the relative proportions of DNA from different regions of the genome. This made it possible to analyze relatively large numbers of primary tumour samples from patients, although the extensive use of cell lines continued. The better resolution of the newer techniques also allowed the emphasis to shift from whole chromosomes or chromosome arms to focal regions, which offered the opportunity to identify the genes involved. In the 1980s and 1990s, the focus was on understanding the biological consequences of gene gain and loss, because it had become clear that copy number gain was a mechanism for activating oncogenes and copy number loss a mechanism for inactivating tumour suppressor genes. By the end of the twentieth century, conventional chemotherapy was recognized as failing to deliver the hoped-for improvements in survival in the major common malignancies. Understanding how these genes, collectively termed cancer genes, initiate and maintain the disease was seen as a new route to identifying drug targets.
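Inferring gain and loss from relative DNA proportions can be illustrated with a minimal sketch. It assumes that a sample is a mixture of tumour cells and normal diploid cells and that measurements are normalized to the sample's average DNA content. The function names and the symmetric 0.2 threshold are hypothetical choices made for illustration, not values from any particular study or platform.

```python
import math


def expected_log2_ratio(copy_number: int, purity: float, ploidy: float = 2.0) -> float:
    """Expected tumour/reference log2 ratio for one genomic region.

    Assumes the sample mixes tumour cells (fraction `purity`) with normal
    diploid cells, and that the measurement is normalized to the sample's
    average DNA content (tumour ploidy `ploidy`).
    """
    region_dna = purity * copy_number + (1.0 - purity) * 2.0
    average_dna = purity * ploidy + (1.0 - purity) * 2.0
    return math.log2(region_dna / average_dna)


def call_state(log2_ratio: float, threshold: float = 0.2) -> str:
    """Classify a log2 ratio using a hypothetical symmetric threshold."""
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "neutral"


for cn in (1, 2, 3, 8):
    for purity in (1.0, 0.5):
        r = expected_log2_ratio(cn, purity)
        print(f"copy number {cn}, purity {purity:.1f}: "
              f"log2 ratio {r:+.2f} -> {call_state(r)}")
```

In a pure tumour sample, a single-copy loss gives a log2 ratio of -1. When half the cells are normal, the same loss gives only about -0.42. This dilution is one reason why contaminating normal cells complicate copy number calls.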

2 Relationship Between Oncogenes and Chromosomal Amplifications

The chromosomal abnormalities known as homogeneously staining regions and double minutes have been found exclusively in mammalian tumour cells, particularly in cell lines. Correlative evidence obtained in mouse adrenal tumours using Southern blotting and fluorescence in situ hybridization (FISH) (see Chap. 2) suggested that these structures might be the location of amplified genes [3]. The same techniques were used to link a region of gene amplification to the location of a known oncogene. Oncogenes, first found in avian retroviruses, were known to have cellular counterparts [4]. On that basis, a cell line with structural evidence of gene amplification, COLO320, was screened for over-expression of 12 viral oncogenes: only the homolog of viral myc was over-expressed [5]. At that time, neither the mechanism by which this homolog, called c-MYC, was amplified nor the biological consequences of its over-expression were known. However, by using molecular cloning to produce probes for both filter and in situ hybridization, this study established a valuable approach for associating oncogenes with regions of gene amplification. This trio of complementary methodologies worked best on cell lines and was, therefore, applied most frequently to those tumours from which many cell lines had been derived. Lung tumours are a good example. In the 1980s, John Minna and Adi Gazdar had considerable success in developing the conditions required to establish lung tumours in culture. They showed that the MYC family members MYC, MYCL and MYCN were frequently amplified and expressed [6]. Since not all tumour samples can be converted to cell lines, there was concern that the successful cultures represented the most aggressive diseases. However, parallel analyses of cell lines and their cognate primary tumours largely dispelled this concern [7].

Not all amplified regions have the benefit of encompassing a homolog of a viral oncogene to guide identification of the pivotal gene, but amplicons that occur at high frequency have been the subject of intense scrutiny. A good example is the amplified region on 3q. This amplification is very common across squamous cell carcinomas. It sometimes occurs as an extra copy of the chromosome arm, as in cervical tumours [8], but can also involve minimal regions, such as the focal amplifications at 3q26 seen in squamous cell lung cancers [9]. Even then, the region encompasses a large number of genes, and several methods have been used to determine the key gene(s) within the amplicon. Often this starts with an educated guess based on knowledge of the characteristics of genes within the amplified region: in the 3q amplicon, TP63, PIK3CA and SOX2 have all been favoured [10]. In a few oesophageal tumours, copy number analysis has pinpointed SOX2 as the only amplified gene in the amplicon. Functional analysis confirmed its role in tumour proliferation: co-transfection of SOX2 with FOXE1 or FGFR2 transformed an immortalized (but non-tumourigenic) bronchial epithelial cell line [11]. In another study, SOX2 and another 3q26 gene, PRKCI, were shown to cooperate to activate hedgehog signalling in a cell model of squamous cell lung cancer [12]. A computational approach to identifying cooperating genes within the amplicon found a further three genes: SENP2, DCUN1D1 and DVL3 [13]. Confusingly, increased expression of genes in this amplicon has been associated with increased survival in some lung cancer patients [14] but with decreased survival in patients with cervical cancer [15].

The narrative of this research, aimed at identifying the pivotal genes in this very important amplicon, illustrates a current problem. Gene order on chromosomes, copy number and transcription levels are now well documented by high-throughput sequencing and expression microarrays. Functional assays to confirm the key gene(s) in an amplicon, however, have not kept pace with structural analysis.

3 Chromosomal Deletions and Tumour Suppressor Genes

Investigators using classical cytogenetic techniques were able to identify deletions. However, it was the application of genetic and molecular genetic approaches to a childhood tumour, retinoblastoma [16], that established their importance for the development of cancer. This work also identified a new class of genes, later called tumour suppressor genes, which require inactivation of both alleles to elicit a tumourigenic effect: the ‘two hit mechanism’ [17]. Just as viral oncogenes helped to pinpoint oncogenes involved in human tumours, inherited cancer syndromes provided a useful route to identifying the chromosomal location of tumour suppressor genes (TSGs) [18]. Genetic linkage studies were used first to define the chromosomal locus. Molecular genetic approaches such as loss of heterozygosity (LOH) analysis (see Chap. 2) then narrowed the region further and identified genes that could be examined for mutations by sequence analysis.
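The logic of LOH analysis, which compares polymorphic markers between a patient's normal and tumour DNA, can be sketched as follows. This minimal illustration assumes SNP markers summarized as B-allele fractions, that is, the fraction of signal coming from one of the two alleles. The `Marker` class, the heterozygosity window and the shift threshold are all hypothetical choices, not a published calling method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Marker:
    name: str
    normal_baf: float  # B-allele fraction in constitutional (normal) DNA
    tumour_baf: float  # B-allele fraction in tumour DNA


def is_informative(m: Marker, low: float = 0.35, high: float = 0.65) -> bool:
    """Only markers heterozygous in normal DNA can reveal loss of one allele."""
    return low <= m.normal_baf <= high


def loh_status(markers: List[Marker], min_shift: float = 0.2) -> Dict[str, str]:
    """Label each marker 'LOH', 'retained' or 'uninformative'.

    LOH is called when the tumour allele balance has moved away from the
    heterozygous value of 0.5 by at least `min_shift` (a hypothetical
    threshold; contaminating normal cells pull the tumour value back
    towards 0.5).
    """
    status = {}
    for m in markers:
        if not is_informative(m):
            status[m.name] = "uninformative"
        elif abs(m.tumour_baf - 0.5) >= min_shift:
            status[m.name] = "LOH"
        else:
            status[m.name] = "retained"
    return status


markers = [
    Marker("M1", normal_baf=0.48, tumour_baf=0.52),
    Marker("M2", normal_baf=0.99, tumour_baf=0.98),  # homozygous: uninformative
    Marker("M3", normal_baf=0.51, tumour_baf=0.15),
    Marker("M4", normal_baf=0.47, tumour_baf=0.88),
]
print(loh_status(markers))
# {'M1': 'retained', 'M2': 'uninformative', 'M3': 'LOH', 'M4': 'LOH'}
```

A run of adjacent LOH calls at informative markers outlines the lost region in one tumour. Comparing such regions across many tumours is how a candidate locus is narrowed down.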

A number of TSGs have been identified using this approach, such as APC [19], BRCA1 [20] and BRCA2 [21]. In some situations, deletions serve to delineate candidate genes. In others, finding a gene already associated with cancer within a deletion can validate its status. This is true for TP53, which was first isolated as a host protein binding to a tumour virus protein (SV40 large T antigen). It gained tumour suppressor gene status when it was shown to reside in a frequently deleted region on chromosome 17 in colon tumours [22], and in the same study, sequence analysis showed that the gene was mutated. TP53 has since been found to be involved in at least 50 % of human cancers [23] and is the subject of tens of thousands of research articles. Yet despite being frequently mutated, it has not yet found its way into routine clinical practice as either a disease marker or a drug target [24].

The success of using deletions to pinpoint TSGs was next applied to cancers with no obvious inherited predisposition, since cytogenetic analysis of chromosome spreads had shown evidence of frequent deletions in solid tumours [1]. This was followed by LOH analysis of samples from much larger patient series, in the hope of defining a minimally deleted region and so reducing the number of genes that required scrutiny for mutations. This was a daunting task [24], especially before the human genome was sequenced and the number and order of genes on chromosomes were known. In some tumours, however, both alleles are inactivated by a homozygous deletion within the region of interest. This limits the number of genes to be examined, as the deletion has to be compatible with cell viability. Such deletions were used successfully to identify a number of TSGs, p16/CDKN2A [25] and PTEN [26] being notable examples. However, not all homozygous deletions harbour bona fide TSGs [27], and studies on chromosome 3 exemplify this. Deletions of, or within, the short arm of chromosome 3 are very common in a range of malignant tumours, especially those of the squamous subtype. They occur very early in the development of these tumours and are occasionally detected even in apparently normal epithelial cells (see Chap. 5). Many studies have scrutinized the genes residing in homozygous deletions in 3p without identifying genes showing frequent mutation [28]. One possible explanation is that, although deletion is responsible for the loss of one allele, the remaining allele is inactivated by an epigenetic mechanism such as methylation [29]. Candidate TSGs on 3p, such as RASSF1 and FHIT, were identified and partly validated, but their inactivation in mouse models did not produce robust evidence of independent tumour suppressor function. Furthermore, it is becoming clear that, without the benefit of homozygous deletions, LOH is a clumsy tool for positional cloning strategies [24].

Nonetheless, a more recent evaluation by the originator of the “two hit hypothesis”, Alfred Knudson, concedes that TSGs may contribute to tumourigenesis through partial inactivation, and the concept of haploinsufficiency has now been validated for a number of TSGs [30]. A recent study in renal cell carcinoma gives an example of what this might mean. It showed that genes within the region of LOH adjacent to VHL, a TSG with a known role in this cancer, were down-regulated, resulting in a metabolic network signature unique to this cancer [31]. Thus, the “one gene at a time” approach that worked so well in the early phase of TSG discovery may be too simplistic, and cooperation between genes may be involved in somatically arising tumours [32].
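The search for a minimally deleted region across a patient series amounts to finding the stretch of a chromosome shared by the largest number of tumour deletions. A minimal sketch of that step follows, using a sweep over interval endpoints. It assumes that each tumour's deletion has already been summarized as one half-open interval in base pairs. The function name and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs


def most_frequently_deleted(deletions: List[Interval]) -> Tuple[int, List[Interval]]:
    """Return the largest number of overlapping deletions and the region(s)
    where that number is reached (the candidate minimal deleted region)."""
    events = []
    for start, end in deletions:
        events.append((start, 1))   # a deletion begins
        events.append((end, -1))    # a deletion ends (half-open)
    # At equal coordinates, ends (-1) sort before starts (+1), so touching
    # intervals [a, b) and [b, c) are not counted as overlapping.
    events.sort()
    depth, best, regions = 0, 0, []
    for i, (pos, delta) in enumerate(events):
        depth += delta
        if i + 1 == len(events):
            break
        next_pos = events[i + 1][0]
        if next_pos == pos:
            continue  # more events at this coordinate; no stretch yet
        # The stretch [pos, next_pos) is covered by `depth` deletions.
        if depth > best:
            best, regions = depth, [(pos, next_pos)]
        elif depth == best and depth > 0:
            if regions and regions[-1][1] == pos:
                regions[-1] = (regions[-1][0], next_pos)  # merge contiguous
            else:
                regions.append((pos, next_pos))
    return best, regions


tumour_deletions = [(10, 50), (20, 60), (30, 45), (40, 80)]
print(most_frequently_deleted(tumour_deletions))  # (4, [(40, 45)])
```

The region found this way only narrows the search. As the text notes, a homozygous deletion in even a single tumour can constrain the candidates much more sharply.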

4 Identification of Functionally Important Cancer Genes

The identification of consistent copy number changes, amplifications and deletions, can provide strong circumstantial evidence for the involvement of the delineated genes in tumourigenesis. If genes within the candidate regions are frequently mutated in a tumour-specific manner, confidence that the gene is directly involved in tumour development increases greatly. Even in this situation, and certainly when candidates have no recurring mutations, functional assays, as mentioned above, are needed to confirm the gene’s status and to understand how a mutant protein exerts its tumourigenic effect. Assays for oncogenes have depended on introducing the suspected gene as a cDNA into untransformed cells and scoring for a tumour-related phenotype, usually increased proliferation [33]. Conversely, tumour-suppressing potential is assessed by introducing the suspected gene into tumour cell lines and observing a decrease in tumour-related features, such as migration or colony formation [34]. Assays of tumour formation in nude mice have also been used [35]. More recently, genetically engineered mice have become the system of choice because they recapitulate gene expression in human tumours more closely, as described in Chap. 20 of this book. However, such in vivo approaches have traditionally been time-consuming and expensive. Using the increased data now available for both genomes and transcriptomes, computational methods that identify pathways or networks and expose driver genes have become prevalent [36, 37]. In addition, functional screens, such as RNA interference (RNAi) screens, are being developed to replace the single-gene approach [38]. A recent review of all these methods is provided by Eifert and Powers [39].

5 Copy Number Changes Associated with Disease Outcome

Naturally, with so many chromosomal regions and interesting genes associated with cancer, the question frequently asked is ‘do the genomic and genetic abnormalities have any clinical significance?’ For these translational studies, knowledge of gene function is not required. In fact, correlating a genetic abnormality with a clinically related phenotype can be another way of accumulating evidence to support the importance of a particular gene. Clinical utility ranges across diagnosis, prognosis and prediction of treatment response, including efficacy and toxicity.

An early success was the association of the MYCN gene with neuroblastoma. Following the discovery that the MYC gene is localized to an amplified region in lung tumours [40], other tumour types with known amplifications were tested with probes to MYC. In this way, a gene homologous to MYC was found to be amplified in neuroblastoma and named MYCN [41]. It was of particular interest because the degree of MYCN amplification was closely associated with disease stage, demonstrating its value as a prognostic marker [42].

The greatest success in translating laboratory discoveries into the clinic has been achieved in breast cancer. An early observation was that the ERBB2 gene, now more usually called HER2/neu or just HER2, was amplified and over-expressed in breast cancer, and that this indicated a poor prognosis [43]. An antibody to the HER2 protein, a cell-surface receptor, was developed and shown to be effective in the treatment of HER2 “positive” breast cancer [44, 45]. This made it essential to develop robust laboratory tests to identify patients who would benefit from HER2-targeted therapies [46]. These tests rely on FISH to detect gene amplification or on immunohistochemistry to detect increased levels of HER2 protein. Both are only semi-quantitative and subjective, and rely on experienced professionals for their interpretation. There is clearly a place for a test based on direct assessment of the patient’s tumour DNA, and high-throughput sequencing should provide one. Normal cell contamination and intra-tumour heterogeneity will, however, bring their own drawbacks.
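The FISH test mentioned above is commonly scored by counting HER2 signals and signals from a chromosome 17 centromere control probe (CEP17) in a series of tumour nuclei, then comparing the two. The sketch below shows only that arithmetic. The function name is hypothetical, and the ratio cut-off of 2.0 is an illustrative assumption: published scoring guidelines also use other criteria, such as the average HER2 copy number per cell, and have been revised over time.

```python
from statistics import mean
from typing import Sequence


def her2_fish_summary(her2: Sequence[int], cep17: Sequence[int],
                      ratio_cutoff: float = 2.0) -> dict:
    """Summarize per-nucleus FISH signal counts.

    her2[i] and cep17[i] are the signal counts for the same nucleus.
    The cut-off is an illustrative assumption, not a clinical rule.
    """
    if len(her2) != len(cep17) or len(her2) == 0:
        raise ValueError("expected paired, non-empty per-nucleus counts")
    mean_her2 = mean(her2)    # average HER2 signals per nucleus
    mean_cep17 = mean(cep17)  # average chromosome 17 centromere signals
    ratio = mean_her2 / mean_cep17
    return {
        "mean_her2": mean_her2,
        "mean_cep17": mean_cep17,
        "ratio": ratio,
        "amplified": ratio >= ratio_cutoff,
    }


her2_counts = [6, 8, 5, 7, 9, 6, 10, 7]
cep17_counts = [2, 2, 3, 2, 2, 2, 3, 2]
print(her2_fish_summary(her2_counts, cep17_counts))
```

The result depends on which nuclei are selected and on how clustered or split signals are counted. Two observers can therefore obtain different ratios from the same slide, which is one source of the subjectivity described above.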

Although many thousands of studies have linked genes and chromosomal regions to cancer phenotypes [47, 48], many of them catalogued in a database [49], only a very small number of these markers have been developed for use in the clinic, such as HER2 and EGFR. There are a number of reasons for this. With regard to prognosis, recurrence and survival, the information often does not directly affect clinical management. The number of treatment options is limited, and other confounding factors are involved in their selection. A further complication, currently receiving attention, is the effect of intra-tumour heterogeneity on the distribution of markers and targets [50]. This heterogeneity could cause biopsies to fail to reflect the molecular composition of the whole tumour, with obvious consequences for clinical management. It has also been recognized that the conceptual and statistical framework applied to clinical trials needs to be developed for biomarker studies [51]. This will be particularly important for biomarkers developed to predict treatment response, including toxicity, as such biomarkers would have immediate clinical application [52].

6 Genome-Wide Assessment of Copy Number Changes

When gene expression data are used to predict clinical outcomes, groups of genes, rather than single genes, are assessed in a single test [53]. This approach may also prove useful for copy number data. Comparative genomic hybridization (CGH) has been used extensively to provide a genome-wide copy number read-out. However, the analysis has usually focused on pinpointing regions of particular interest rather than on the whole genome (see Chap. 2). Yet DNA copy number data obtained using microarrays can also be used to define patterns of gain and loss across the genome that have distinct relationships with outcome. Hicks et al. [54] characterized the whole genomes of breast cancers by “the number and proximity of genomic alterations” and showed that, on this basis, the tumours could be segregated into groups with different overall survival. Whole-genome assessment may be less vulnerable to tumour heterogeneity, since many parameters are assessed simultaneously. Single-gene tests have the problem that they might implicate a candidate driver that is, in reality, amplified or deleted in only a fraction of the tumour mass. They have the additional problem that the driver for any particular tumour might not be the gene being tested. With whole-genome measurements, the true drivers are more likely to have arisen early in the disease, and therefore to be present in most tumour cells, so they are detected more frequently. If whole genomes are used to identify candidate genes, computational methods are needed to filter the potential drivers. These whole-genome signatures are usually less reliant on the copy number of any one gene. Instead, they measure the cumulative effects of multiple regions of the genome, or of the entire genome.

Microarray analysis has been the most common method for whole-genome copy number measurement in recent years, but the advent of next-generation sequencing has begun to erode this monopoly. Campbell et al. first described copy number measurement using next-generation sequencing in 2008 [55]. Since then, very low-coverage sequencing of diagnostic material has been shown to produce a read-out similar to that of microarrays at low cost [56], making it suitable for clinical use.
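The read-depth idea behind sequencing-based copy number measurement can be sketched briefly. Reads are counted in fixed-size genomic bins, the counts are normalized against a reference sample, and the resulting log2 ratios are summarized. This minimal illustration is not the method of [55] or [56]. Real pipelines also correct for GC content and mappability and segment the ratios statistically, and the function names, bin handling and threshold here are hypothetical. The last function gives a crude count of state changes along one chromosome, loosely in the spirit of describing a genome by the number of its alterations.

```python
import math
import statistics
from typing import List


def bin_counts(read_starts: List[int], chrom_length: int, bin_size: int) -> List[int]:
    """Count aligned read start positions in fixed-size, non-overlapping bins."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore reads outside the chromosome
            counts[pos // bin_size] += 1
    return counts


def log2_ratios(tumour: List[int], reference: List[int],
                pseudocount: float = 0.5) -> List[float]:
    """Tumour/reference log2 ratio per bin.

    Counts are scaled by library size (total reads), and the ratios are then
    median-centred on the assumption that most bins are copy-neutral.
    The pseudocount avoids taking the log of zero in empty bins.
    """
    t_total, r_total = sum(tumour), sum(reference)
    raw = [math.log2(((t + pseudocount) / t_total) / ((r + pseudocount) / r_total))
           for t, r in zip(tumour, reference)]
    centre = statistics.median(raw)
    return [x - centre for x in raw]


def count_breakpoints(ratios: List[float], threshold: float = 0.3) -> int:
    """Crude alteration count: changes of gain/neutral/loss state between
    adjacent bins on one chromosome (threshold is a hypothetical choice)."""
    states = [1 if x >= threshold else -1 if x <= -threshold else 0 for x in ratios]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


reference = [100] * 10
tumour = [100, 100, 100, 150, 150, 150, 100, 50, 50, 100]  # one gain, one loss
ratios = log2_ratios(tumour, reference)
print([round(x, 2) for x in ratios])
print(count_breakpoints(ratios))  # 4
```

Fixed-size bins keep the method simple enough to work at very low coverage. Each bin needs only enough reads for its count to be informative, and finer resolution simply requires more reads.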