Cancer has been the second leading cause of death in the USA for the past 75 years (http://www.cdc.gov), accounting for almost 600,000 deaths in 2010. The majority of these deaths were from solid tumors, and the primary cause of death was the sequelae of disseminated secondary tumors, or metastases. It has been estimated that 90 % of cancer-related deaths can be directly attributed to metastatic disease. A better understanding of how tumors shed cells that colonize distant organs is therefore important for developing clinical strategies to prevent or treat metastatic disease.

Research into the etiology of metastasis has grown considerably over the past few decades as new technologies have made the different steps of the metastatic cascade more accessible to study. A number of mechanisms have been proposed to explain the resulting experimental and clinical observations: the progression model [1], in which cells acquire metastasis-activating changes over time; the dynamic heterogeneity model [2, 3], which suggests a role for epigenetic plasticity in tumor progression; the cellular fusion model [4], which proposes that tumor cells acquire invasive and motile phenotypes by fusing with immune cells; the gene transfer model [5], which hypothesizes that cancer cells incorporate circulating nucleic acids to acquire metastatic phenotypes; the exosome model [6], which suggests that cells communicate and establish metastatic capacity by exchanging molecules encased in micro-vesicles; and the early oncogenesis model [7], which proposes that metastatic capacity is encoded by the same mutations that drove tumorigenesis.

Data exist to support all of these mechanisms. However, none of the models explains all of the observations. The likely explanation is that many, if not all, of the models are at least partially correct: the metastatic cascade may incorporate aspects of each, and the precise mechanism used may be specific to the tumor type and even the tumor subtype [8]. The model that has best stood the test of time is the progression model. Work performed over several decades indicates that metastases tend to be clonal [9–11] and that they develop from specific subclones of the primary tumor [10, 12–14], as this model predicts. Furthermore, deep sequencing studies of matched primary and metastatic lesions confirm the presence of metastasis-specific mutations, as would be expected from the progression model [10, 12, 13, 15–18]. However, the progression model does not adequately explain why tumors derived from cells transplanted from a metastatic lesion are often just as inefficient at metastasizing as the original primary tumor. If somatic mutation alone drove the metastatic process, cells that had completed the metastatic cascade would be expected to be permanently “locked” into the metastatic state by irreversible somatic alterations. Nor does the progression model adequately explain the more recent observation that prognostic gene signatures can be detected in bulk primary tumors. If only a small subset of the primary tumor acquires all of the functions necessary to metastasize, any specific signal would be expected to be lost within the general “noise” of the bulk tumor tissue [7].

These and other observations suggest that additional factors must contribute to tumor progression. Recent studies have expanded our knowledge of other cellular factors important for tumor progression, such as interactions with the immune system [19], bone marrow-derived cells [20], and the interplay between tumor-derived exosomes and secondary organ sites [6]. Continuing investigation into these and other cellular and molecular factors is deepening our knowledge of the biology of progressive neoplastic disease.

Another factor that influences metastatic disease, and one that is frequently under-appreciated, is genetic background. Single nucleotide polymorphisms (SNPs) are one of the major drivers of evolution and are responsible not only for diversity between species but also for individual variation within species. This variation includes not only morphological traits such as height and eye, hair, and skin color but also resistance or susceptibility to disease [21]. Importantly, even somatically acquired diseases like cancer have an important inherited component. Numerous molecular epidemiology studies have demonstrated that, in addition to highly penetrant cancer predisposition mutations such as those in BRCA1, the genome contains many low-penetrance variants that cumulatively increase or decrease an individual’s probability of developing neoplastic disease [21]. Identifying these variants and the genes they affect broadens our understanding of the molecular and cellular pathways involved in cancer etiology.

Genetic susceptibility exists not only for tumor initiation but also for the terminal stages of progression that together make up the metastatic process. This was demonstrated in the late 1990s using a transgenic model of metastatic mammary cancer, the MMTV-PyMT model [22], and a simple breeding scheme (Fig. 1). Male MMTV-PyMT mice were mated to female mice from inbred strains on many different branches of the mouse phylogenetic tree. Once tumors had developed, their metastatic capacity was determined by quantifying lung metastases. Significant suppression of metastatic burden was observed for approximately 40 % of the strains used (Fig. 2) [23]. Since each animal received the same transgene by breeding, and no differences were observed in the timing of transgene induction, the level of PyMT protein, or post-translational modification of the protein, the observed differences were most likely due to polymorphisms introduced from the genome, either nuclear or mitochondrial, of the non-MMTV-PyMT parent. In addition, because metastatic capacity varied as a continuum across the strains rather than falling into discrete classes, this result suggested that many genes were likely to be associated with metastatic progression.
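The reasoning that a continuum of phenotypes implies many contributing genes can be illustrated with a toy additive model. In the minimal Python sketch below, each strain either carries or lacks a modifying allele at each locus, and allele effects simply add. The locus counts and effect sizes are hypothetical, chosen for illustration, and are not estimates from the strain survey.

```python
from itertools import product

def phenotype_classes(effects):
    """Return the sorted distinct phenotype values reachable under a toy additive model.

    Each locus either carries a modifying allele (1) or not (0); the phenotype is the
    sum of the effects of the alleles present. Effects are in arbitrary units.
    """
    values = set()
    for genotype in product((0, 1), repeat=len(effects)):
        values.add(sum(g * e for g, e in zip(genotype, effects)))
    return sorted(values)

# Hypothetical effect sizes, for illustration only.
few_large = [60, 60]               # two loci of large, equal effect
many_small = list(range(5, 17))    # twelve loci with effects 5, 6, ..., 16

print(len(phenotype_classes(few_large)))   # 3 classes: 0, 60, 120
print(len(phenotype_classes(many_small)))  # 119 distinct values
```

Two loci of large effect produce only three discrete phenotype classes, whereas twelve loci of small, unequal effect produce well over a hundred closely spaced values, which would look like a continuum in a histogram such as Fig. 2.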

Fig. 1

Schematic diagram of the MMTV-PyMT inbred strain survey. Male MMTV-PyMT animals were bred to females from 28 different inbred mouse strains from different branches of the mouse phylogenetic tree (upper right). The resulting F1 progeny of these crosses were therefore on different genetic backgrounds due to the introduction of polymorphisms from the non-MMTV-PyMT parent, represented by the half-white, half-colored mice in the center of the figure. These animals were aged to determine whether their new genetic backgrounds predisposed them to low, intermediate, or high metastatic capacity as a result of their individual genetic composition, as depicted at the bottom of the figure. Mouse strains in red font showed significant suppression of metastatic capacity compared to the original FVB/NJ (underlined) genetic background. The phylogenetic tree is modified from Petkov et al. (2004) An efficient SNP system for mouse genome scanning and elucidating strain relationships, Genome Res. 14:1806–1811

Fig. 2

Histogram displaying the results of the inbred strain metastasis survey. The average number of metastases in the whole lung observed after serial sectioning is represented on the y-axis. The inbred strain bred to the MMTV-PyMT mouse is indicated on the x-axis. The genetic background of the MMTV-PyMT model (FVB/NJ) is indicated by the gold bar. Strains indicated by blue bars were not significantly different (N.S.) from the original FVB/NJ background. Red bars indicate strains with metastatic capacities significantly different from that of the FVB/NJ background

The first of these genes was identified after a series of genetic and genomic studies narrowed the candidate interval to a 110-kb region on mouse chromosome 19 [24, 25]. This region contained five genes, and sequence analysis revealed amino acid substitutions in two of them: Kcnk8, a potassium channel, and Sipa1, a RAP1 GTPase-activating protein (GAP). Expression analysis indicated that Kcnk8 expression was restricted to the brain, so further efforts focused on Sipa1. The amino acid substitution in SIPA1 was found to lie within an alpha-helix of a PDZ protein-protein interaction domain, and the allele from the low-metastatic strain was predicted to partially unwind the helix [26]. This substitution was subsequently found to reduce, but not eliminate, the ability of SIPA1 to interact with its binding partner AQP2. Furthermore, the allele from the low-metastatic strain was found to have reduced RAP1-GAP activity, suggesting that AQP2 binding, RAP1 activation, or both play a role in metastatic disease.

To test the role of Sipa1 in tumor progression experimentally, either the allele from the high-metastatic strain was ectopically expressed or the endogenous gene was knocked down by shRNA in metastatic mouse mammary tumor cells, which were then implanted into animals. Reducing endogenous SIPA1 approximately twofold significantly suppressed the metastatic capacity of the tumors. Conversely, ectopic expression of the high-metastatic FVB allele increased metastatic ability (Fig. 3a). These results indicated that variation in relative protein concentration, such as that produced by polymorphisms affecting transcriptional efficiency, rather than constitutive activation or inactivation by somatic mutation, can play a major role in tumor progression [26].

Fig. 3

a Results of an orthotopic transplant assay testing the role of Sipa1 in metastasis. A highly metastatic mouse cell line was orthotopically implanted into mice after introduction of a control vector (center), an shRNA reducing Sipa1 by 50 % (left), or a construct overexpressing (∼twofold) the FVB allele. Pulmonary metastases were counted 28 days after tumor implantation [26]. Figure reprinted from Hsieh, Lintell, and Hunter (2006–2007) Germline polymorphisms are potential metastasis risk and prognosis markers in breast cancer, Breast Dis. 26:157–62 with permission from IOS Press. b Kaplan-Meier analysis of a polymorphism in the human ortholog SIPA1, demonstrating a significant association of the minor allele (a) with distant metastasis-free survival in estrogen receptor-positive (ER+), lymph node-negative (LN−) breast cancer patients. The number of patients for each genotype at each time point is listed below the x-axis. Figure modified from Hsieh et al. (2009) Distinct inherited metastasis susceptibility exists for different breast cancer subtypes: a prognosis study, published in Breast Cancer Research 11(5):R75

Two lines of evidence suggest that inherited polymorphism plays a role in metastasis susceptibility in human breast cancer, as it does in rodents. The first is direct molecular epidemiology of the human ortholog, SIPA1. Genotyping of SNPs in the human gene demonstrated associations of SIPA1 allelic variants with lymph node metastasis [27] or with distant metastasis-free survival [8] in independent cohorts of breast cancer patients (e.g., Fig. 3b). Subsequent work demonstrated that one of the SNPs associated with protection against metastatic disease reduced promoter efficiency [28], consistent with the experimental results. Additional work has demonstrated an association of SIPA1 variants with metastatic disease not only in breast cancer [29, 30] but also in lung [31] and cervical cancer [32].
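Survival comparisons such as the one in Fig. 3b are commonly summarized with the Kaplan-Meier (product-limit) estimator. The minimal Python sketch below computes it from follow-up times and event indicators, treating a distant metastasis as the event and censoring patients without one at their last follow-up. The patient data are invented for illustration and are not from the cited cohorts; a formal comparison between genotype groups would usually add a log-rank test, which is not shown.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of metastasis-free survival.

    times  -- follow-up time for each patient (e.g. years)
    events -- 1 if a distant metastasis was observed at that time, 0 if censored
    Returns (time, survival probability) pairs at each time an event occurred.
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_leaving = Counter(times)          # patients whose follow-up ends at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d:
            # Patients censored at time t are still counted as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving[t]
    return curve

# Invented toy data for a single genotype group.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

For this toy group the estimated metastasis-free survival falls to about 0.833, 0.667, and 0.444 at times 1, 2, and 3; running the same function on each genotype group gives one curve per allele, as in Fig. 3b.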

The second line of evidence comes from population-based epidemiology. Several investigators analyzed large population-based cancer registries to look for clustering of metastatic disease within families. They found a significant enrichment of metastatic breast cancer in women whose mothers had metastatic cancer, compared with women whose mothers’ tumors remained localized [33–35]. Similar familial clustering of prognosis, likely due to metastatic disease, was observed for other cancer types, including prostate [36, 37], bladder and renal [38], and colorectal cancer [37]. However, significant clustering of prognosis was not observed when family members had cancers of different tissues, suggesting that distinct mechanisms of inherited prognosis may exist for different cancers or cancer subtypes [37]. Interestingly, this observation is consistent with the molecular epidemiology results, which demonstrated that SNPs in the metastasis susceptibility genes SIPA1 and RRP1B [39] discriminated outcome only in estrogen receptor-positive, lymph node-negative patients [8]. Thus, while these results support the existence of inherited susceptibility genes for metastatic progression in the human population, they also suggest that multiple mechanisms may exist, each of which will require mapping and in-depth characterization.
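One simple way to express familial enrichment of this kind is a relative risk computed from a 2 × 2 table, with an approximate confidence interval on the log scale. The counts below are hypothetical, and the cited registry studies may have used other measures (for example, hazard ratios from survival models); this sketch only illustrates the arithmetic of a risk ratio.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% confidence interval (log method).

    a, b -- daughters with metastatic / localized disease whose mothers had metastatic cancer
    c, d -- daughters with metastatic / localized disease whose mothers had localized cancer
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts, for illustration only.
rr, lo, hi = relative_risk(a=30, b=70, c=15, d=85)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these invented counts the risk ratio is 2.0, and the interval (roughly 1.15 to 3.48) excludes 1, which is what a statistically significant familial enrichment would look like under this measure.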

The existence of metastasis susceptibility genes in the human population raises the question: how important are they in the progression of human disease? Their importance relative to the somatically acquired mutations posited by the progression model is difficult to assess at present, since a comprehensive catalog of metastasis-associated mutations has yet to be generated. The presence of metastasis susceptibility genes may, however, help reconcile some of the discordant features of the progression model. As mentioned earlier, the small subset of primary tumor cells that acquire all of the attributes needed to metastasize would not be expected to produce a prognostic gene expression signature in bulk tumor tissue, because of a substantial signal-to-noise problem. If, however, the signature was driven not by somatically acquired mutations but instead reflected a constitutionally encoded susceptibility that pre-existed the disease, then the progression model might still be valid [40]. This hypothesis predicts that prognostic gene signatures should be detectable in the normal tissues of patients before disease onset. Although this has not yet been demonstrated in patients, prognostic signatures have been generated from the normal tissues of animal models [41]. Thus, these genes likely play an important role in metastatic biology, though the details and their relative importance are not yet clearly established.
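The signal-to-noise argument can be made concrete with a simple mixing calculation. Assuming that every cell contributes equally to bulk mRNA and that cells outside the subclone stay at baseline expression (fold change 1), the fold change measured in bulk tissue is a weighted average of the two populations, (1 − f) · 1 + f · FC, where f is the subclone fraction and FC its fold change. The fractions and fold changes below are hypothetical.

```python
def bulk_fold_change(subclone_fraction, subclone_fold_change):
    """Fold change seen in bulk tissue when only a fraction of cells change expression.

    Assumes all cells contribute equal mRNA and the remaining cells stay at baseline (1.0).
    """
    baseline = 1.0
    return (1 - subclone_fraction) * baseline + subclone_fraction * subclone_fold_change

# Hypothetical scenarios.
print(bulk_fold_change(0.01, 10))  # about 1.09: a tenfold change in a 1 % subclone
print(bulk_fold_change(1.0, 10))   # 10.0: the same change present in every cell
```

A tenfold change confined to 1 % of cells shifts the bulk measurement by only about 9 %, whereas a constitutionally encoded difference present in every cell is seen in full, which is why a germline-driven signature could remain detectable in bulk tissue.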

The next question is whether metastasis susceptibility genes have any clinical utility. Theoretically, it should be possible to identify patients with a high inherited risk of progressive disease by genotyping constitutional DNA, for example from blood. Single SNPs, however, are unlikely to have much clinical utility. As demonstrated by the genotyping assays for SIPA1 and RRP1B, each gene’s ability to discriminate patient outcome is significant but insufficiently robust to warrant routine use [8]. Furthermore, epidemiology studies have not yet identified any SNPs associated with prognosis in human populations. This suggests that metastasis susceptibility is encoded by many genes, each with a relatively small effect on overall risk. This interpretation is consistent with the continuous range of metastasis susceptibility observed across mouse inbred strains, rather than the discrete groups that would have suggested a few, more dominant genes. Moreover, the complexity of the metastatic cascade (tumor induction followed by local cellular migration, intravasation, translocation to distant sites, arrest and extravasation, dormancy, and subsequent proliferation) provides many points at which subtle modification of gene function or dosage might have significant effects on metastatic efficiency. Clinical use of inherited susceptibility is therefore likely to require multiple SNPs, each tagging different biological processes that could either enhance or suppress the phenotype. In addition, an inherited risk score may best be treated as a continuous, rather than categorical, variable that should be combined with other clinical data for accurate prognosis.
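A continuous inherited risk score of the kind described above is often built as a weighted sum of risk-allele dosages across SNPs. The sketch below is a minimal illustration under that assumption: the SNP identifiers and per-allele weights are hypothetical placeholders (in practice, weights might come from effect sizes estimated in a training cohort), positive weights enhance and negative weights suppress the phenotype, and the score is returned as a continuous value rather than a category.

```python
def inherited_risk_score(genotypes, weights):
    """Weighted sum of risk-allele dosages.

    genotypes -- {snp_id: number of risk alleles carried (0, 1 or 2)}
    weights   -- {snp_id: per-allele weight}; > 0 enhances, < 0 suppresses risk
    """
    score = 0.0
    for snp, weight in weights.items():
        dosage = genotypes.get(snp)
        if dosage not in (0, 1, 2):
            raise ValueError(f"missing or invalid genotype for {snp}: {dosage!r}")
        score += weight * dosage
    return score

# Hypothetical SNP identifiers and weights, for illustration only.
WEIGHTS = {"snp_A": 0.15, "snp_B": -0.10, "snp_C": 0.05}
patient = {"snp_A": 2, "snp_B": 1, "snp_C": 0}
print(round(inherited_risk_score(patient, WEIGHTS), 3))  # 0.2
```

Keeping the score continuous lets it enter a prognostic model as one covariate alongside clinical variables such as tumor size or receptor status, rather than forcing patients into arbitrary high- and low-risk bins.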

As the genetics predicted, our current understanding of metastasis susceptibility genes implicates many different systems. The evidence obtained to date indicates that inherited variation in the immune system plays both positive and negative roles in tumor progression, consistent with cell biology research from multiple laboratories. In addition, genes associated with adhesion (Cadm1, Pvrl1) [42] [Bai et al., in submission] and with the cytoskeleton and motility (Sipa1, Arap3) [26, 43] have been implicated. Furthermore, a number of factors associated with transcription (Brd4) [44], chromatin biology (Arid4b, Rrp1b) [39, 45], or RNA stability and processing (miR216a/b, miR217, miR290-3p, miR3470a/b, Cnot2) [46] have been identified as metastasis susceptibility genes. Interestingly, most of these genes are not frequently mutated (<3 %) in primary breast cancers, although RRP1B is frequently overexpressed and ARID4B is both amplified and overexpressed. This suggests that metastasis-associated genes are not major contributors to tumor etiology. Identification of critical metastasis drivers will therefore require strategies other than deep sequencing of primary tumors. An equivalent effort to sequence large numbers of metastatic lesions would reveal the somatic mutations associated with tumor progression. These discoveries, coupled with genetic susceptibility studies to identify genes that might be modulated epigenetically rather than by somatic mutation, would provide a much more detailed understanding of the molecular and cellular mechanisms contributing to tumor progression.

In summary, understanding metastasis is a critical component of our ongoing odyssey to understand and control cancer. Metastasis is an immensely complicated process that requires not only the somatic alterations that drive the primary tumor, and potentially the metastatic cascade, but also contributions from disparate tumor non-autonomous systems throughout the patient. To fully understand and intervene in the metastatic process, it is therefore necessary to understand the entire context in which it occurs (Fig. 4). Inherited genetics provides a window into portions of the metastatic cascade that are currently difficult to access experimentally. Furthermore, polymorphisms “fingerprint” genes and processes that are not necessarily mutated during tumor progression, either because they are epigenetically controlled or because they are expressed in non-tumor tissue. Integrating genetic strategies with current genomics efforts will provide a better understanding of metastatic progression as a systemic disease, hopefully revealing new and improved methods for preventing secondary tumors and/or eradicating existing metastatic lesions.

Fig. 4

The influence of polymorphism on the metastatic process, using the progression model as an example. Polymorphism has the potential to affect every step in the formation of the “seed” by altering the probability or efficiency of completing each step. In addition, polymorphism can change the condition of the “soil” to make it more or less conducive to growth at the secondary site. In this example, the tumor with the C allele at the top of the figure is less capable of completing the metastatic cascade and has a less hospitable secondary environment than the tumor with the T allele. Thus, a patient with the C allele would be less likely to develop life-threatening metastases than a patient with the T allele
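The multiplicative logic behind Fig. 4 can be sketched numerically. Assuming, as a simplification, that the steps of the cascade are independent, the probability that a cell completes all of them is the product of the per-step probabilities. The step list follows the cascade described in the text; the per-step probabilities and the 10 % per-step reduction attributed to the C allele are hypothetical.

```python
import math

# Steps of the cascade after tumor induction, as listed in the text.
STEPS = ["local migration", "intravasation", "translocation",
         "arrest and extravasation", "dormancy", "proliferation"]

def cascade_efficiency(step_probabilities):
    """Probability of completing every step, assuming the steps are independent."""
    return math.prod(step_probabilities)

# Hypothetical per-step success probabilities for a tumor with the T allele.
t_allele = [0.5] * len(STEPS)
# Hypothetical: the C allele lowers each step's probability by 10 % (relative).
c_allele = [p * 0.9 for p in t_allele]

ratio = cascade_efficiency(c_allele) / cascade_efficiency(t_allele)
print(round(ratio, 3))  # 0.531, i.e. 0.9 ** 6
```

Modest per-step differences compound: under these assumptions a 10 % reduction at each of six steps cuts overall efficiency by nearly half, one way in which subtle inherited effects on gene function or dosage could add up to a meaningful difference in metastatic risk.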