1. Genomic Instability

DNA instability is a broad term, frequently used in medical genetics, that encompasses many different types of instability. Most instabilities are associated with human disease. Telomere shortening, however, accompanies normal cellular aging, which is not generally regarded as a disease, and could also be considered a form of genomic instability. Forms of chromosomal instability, such as chromosomal rearrangements and deletions or duplications of large tracts of DNA, are observed in most malignancies. Several genetic disorders, such as Werner syndrome, Bloom syndrome and Fanconi anemia, are associated with increased cancer risk and with various types of genomic instability, particularly after exposure to DNA-damaging agents.[1]

Other types of DNA instability occur on a smaller scale. Microsatellite instability (MSI), also referred to as the MIN phenotype or the replication error (RER+) phenotype, describes small-scale changes that occur at regions of repetitive DNA. MSI is characteristic of many sporadic cancers and of tumors from patients with hereditary nonpolyposis colorectal cancer (HNPCC). The MSI phenotype is often associated with a loss of DNA mismatch repair (MMR). Without MMR, small slippage errors made by the DNA polymerase during replication go uncorrected, producing DNA instability. Site-specific instability is observed at particular trinucleotide repeats (TNRs) that, for unknown reasons, can expand from one generation to the next within a family. The expansion of these repeats underlies a family of neurodegenerative disorders known as the triplet repeat disorders or dynamic mutation disorders, so named for the plasticity of the TNR.[2]

The end of the millennium saw a dramatic increase in the number of publications on various DNA instabilities. Because of the breadth of the topic, not all of them can be reviewed in one article. We will review the current understanding of instabilities involving microsatellites, including trinucleotide repeats, and their role in human disease. This article focuses on the most recent findings that add to our knowledge of the mechanism(s) of microsatellite and TNR expansion and of how such instability results in disease. Understanding why and how such repeats expand, and how expansion results in toxicity and disease, will be critical for developing appropriate treatments for both neoplasia and the neurologic disorders associated with TNRs.

2. Microsatellite Instability (MSI) and Tumorigenesis

MSI is defined as the appearance of novel alleles at DNA microsatellites due to DNA polymerase slippage during replication of repetitive sequences. MSI indicates a ‘mutator’ phenotype that can be caused by loss of function of MMR genes and results in widespread genomic instability. MMR is a postreplicative repair system that corrects single-base mismatches, involving either normal or damaged bases, as well as small insertions/deletions.[3] It plays an essential role in maintaining genome stability. In the absence of MMR, polymerase errors go uncorrected, producing a strong ‘mutator’ phenotype that is most readily visible as MSI. Microsatellites are good indicators of repair proficiency because their repetitive nature makes them prone to polymerase slippage during replication.[4]
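The argument above rests on microsatellites being short tandem repeats that invite polymerase slippage. A small sketch of how such repeats can be located in a sequence may therefore be useful. The Python below is an illustration only. The function names `find_microsatellites` and `is_primitive`, and the cut-offs (1–6 bp units with at least five perfect copies), are hypothetical choices rather than a standard definition. Imperfect (interrupted) repeats are ignored.

```python
import re


def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'AA' or 'ACAC')."""
    k = len(unit)
    for d in range(1, k):
        if k % d == 0 and unit == unit[:d] * (k // d):
            return False
    return True


def find_microsatellites(seq, max_unit=6, min_copies=5):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    max_unit and min_copies are illustrative cut-offs (assumptions), not a published definition.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # ([ACGT]{k}) captures one unit; \1{n,} demands at least n further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip 'AA' etc., already reported as a mononucleotide run
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Invented sequence: a 10-bp poly(A) run followed by a (CA)5 dinucleotide repeat.
print(find_microsatellites("GC" + "A" * 10 + "GT" + "CA" * 5 + "TT"))
# [(2, 'A', 10), (14, 'CA', 5)]
```

The sequence is made up. Filtering out non-primitive units keeps the poly(A) run from also being reported as an `AA` repeat.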

Tumors from individuals with the inherited cancer syndrome HNPCC demonstrate MSI. These individuals carry a germline mutation in one of the MMR genes, but under normal conditions heterozygous cells retain MMR function.[5,6] During the individual’s lifetime, however, the remaining wild-type allele is mutated in certain cells (usually in the colon, endometrium, skin, etc.), rendering these cells hypermutable in the absence of MMR function. MSI is also an important factor in many sporadic cancers.[7] In most types of sporadic cancer, a variable but significant proportion of tumors demonstrates MSI, sometimes associated with causative mutations in the known MMR genes. However, many MSI+ sporadic cancers carry no mutations in the known MMR genes. This suggests the existence of other undiscovered ‘mutator’ genes, other mechanisms of MMR gene inactivation, or other factors, such as genotoxic carcinogens, that increase the mutation rate.[7–9]

Some tumors lacking MMR activity show no mutations in the coding regions of the known MMR genes. Instead, they carry biallelic hypermethylation of a specific promoter region of the critical MMR gene hMLH1.[10] This has now been shown for a significant proportion of MMR-deficient sporadic tumors arising in a variety of tissues.[11–14] This link between aberrant methylation and loss of tumor suppressor function adds a new dimension to Knudson’s original hypothesis of gene inactivation, as inactivation can occur by both genetic and epigenetic mechanisms.[15] Identifying the cause(s) of these changes in methylation will be an important step in understanding how a mutator phenotype arises.

3. Other Mechanisms Causing MSI

While tumors that have lost MMR function almost always demonstrate MSI, the reverse is not always true. MSI can also be caused by mutations in other genes involved in replication and/or maintenance of the genome. For example, a deletion variant of the polβ gene was found to be prevalent in microsatellite-unstable breast tumors and fibroadenomas.[16] Mutations in either polδ or RAD27, a gene involved in Okazaki fragment processing, also result in microsatellite instability in yeast.[17] Other factors that reduce the accuracy of DNA replication, such as nucleotide pool imbalances, could also promote slippage during replication of repetitive DNA. A complete understanding of the causes of MSI, and identification of all the genes involved in maintaining genome stability, will be critical to delineating the different repair and replication pathways.

4. MSI Detection

MSI is assessed by the appearance of novel alleles after polymerase chain reaction (PCR) amplification across repetitive DNA sequences. However, results vary depending on which and how many genomic markers are tested.[18–20] Furthermore, the sequence composition of microsatellite alleles can greatly influence their stability, suggesting that multiple markers are required to assign stability status to a particular patient sample.[21] Genotyping is further hampered by technical difficulties in amplifying novel, unstable alleles by PCR from a heterogeneous tumor sample. These complications make it difficult to compare different studies and to determine the various causes of instability in different tumor populations, so a standardized assay for MSI detection is needed. The mononucleotide repeat BAT-26 [an (A)26 repeat] has been identified as a microsatellite marker that predicts MSI with high specificity and sensitivity.[22,23] BAT-26 at first appeared to be quasi-monomorphic, showing variations in allele length (shortening by 4–15 bp) only in unstable tumors. This would eliminate the need to test matched nontumor tissue.[24] Use of BAT-26 on over 500 solid tumors demonstrated that MSI could be determined with 99.5% efficiency,[23] suggesting that BAT-26 alone could be accepted as a standard for determining MSI.[24,25] However, subsequent data demonstrated that BAT-26 is polymorphic in certain populations, such as African Americans.[26] More work is therefore required to assess population differences before this or a similar marker can be used as a standard diagnostic indicator of MSI.
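The comparison described above can be sketched as follows. Each marker is checked for tumor alleles that are absent from matched normal tissue, and the results are summarized across a panel. This is a simplified, hypothetical model, not a validated diagnostic procedure:

- The marker names and allele sizes are invented.
- The 30% unstable-marker cut-off (`min_unstable_fraction`) is an assumption.
- Real assays must also contend with PCR stutter and tumor heterogeneity.

`bat26_shortened` encodes only the 4–15 bp shortening quoted for BAT-26, measured against an expected allele size that the caller supplies.

```python
def marker_unstable(normal_sizes, tumor_sizes):
    """Call a marker unstable if the tumor shows any allele size absent from matched normal tissue."""
    return bool(set(tumor_sizes) - set(normal_sizes))


def msi_status(panel, min_unstable_fraction=0.3):
    """Classify a sample from {marker: (normal_sizes, tumor_sizes)}.

    min_unstable_fraction is an assumed cut-off; as the text notes, calls depend
    on which markers are tested and how many.
    """
    unstable = sum(marker_unstable(normal, tumor) for normal, tumor in panel.values())
    fraction = unstable / len(panel)
    status = "MSI+" if fraction >= min_unstable_fraction else "MSI-"
    return status, fraction


def bat26_shortened(tumor_size, expected_size):
    """Quasi-monomorphic test: True if the tumor allele is 4-15 bp shorter than expected."""
    shortening = expected_size - tumor_size
    return 4 <= shortening <= 15


# Invented PCR product sizes (bp) for a hypothetical five-marker panel.
panel = {
    "m1": ([150, 154], [150, 154]),
    "m2": ([120], [120, 112]),            # novel 112-bp allele in the tumor
    "m3": ([98, 100], [98, 100]),
    "m4": ([130, 136], [130, 136, 128]),  # novel 128-bp allele in the tumor
    "m5": ([110], [110]),
}
print(msi_status(panel))          # ('MSI+', 0.4)
print(bat26_shortened(100, 108))  # True: 8 bp shorter than expected
```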

5. Pathogenesis of MSI: Multistage Pathway to Tumorigenesis

How does a loss of MMR result in disease? ‘Mutator’ genes such as the MMR genes do not cause cancer directly by altering the growth conditions of the cell. Rather, they increase the likelihood of secondary mutations that result in ‘transformation’ to a malignant phenotype. Neoplasia is the result of cumulative genetic changes that confer proliferative and metastatic advantages upon a normal cell. It has been suggested that the mutation rate of 10⁻¹⁰ per nucleotide per generation is insufficient to account for the accumulation of genetic changes that occurs in neoplasia, and that a ‘mutator’ phenotype may be required to accelerate this process.[27] MSI reflects increased genomic mutability, including increased accumulation of mutations in oncogenes, tumor suppressor genes and/or additional mutator genes. This accelerates the ‘tempo’ at which critical genetic alterations accumulate during the development of neoplasia.[28] Polymerase slippage is most likely to occur at mono- and dinucleotide repeats, and the chance of slippage increases with the length of the repeat.[29] Long mono- and dinucleotide repeat sequences are therefore likely to acquire frameshift mutations in the absence of MMR. Several cancer-related genes are mutated within coding mononucleotide runs in HNPCC colon tumors, suggesting that the mutations are a direct consequence of the lack of MMR. The observation that other genes containing similar repeats are not mutated within the tumors adds to the evidence that these mutations have a pathogenic role. Mutations have been found within a poly(A) [(A)10] tract in the transforming growth factor-β type II receptor gene (TGFβRII) in RER+ colorectal cancers.[30,31] They have also been found within repetitive tracts in the pro-apoptotic gene BAX,[32–34] the insulin-like growth factor II receptor gene (IGFIIR),[30] the β-catenin gene[35,36] and others.
Mononucleotide repeats located within the MMR genes MSH3 and MSH6 are also often mutated as a secondary event in the presence of another MMR gene mutation.[32,34] Thus, although any nucleotide within a gene is at risk of a transition or transversion mutation, coding sequences containing mononucleotide tracts may be particular targets for inactivation or activation in the absence of MMR (fig. 1).
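A toy translation makes the vulnerability of a coding mononucleotide run concrete. In the sketch below, one A is lost from an (A)10 tract, the kind of slippage error that MMR would normally correct. The loss shifts the reading frame downstream and exposes a premature stop codon. The 27-bp sequence is invented for illustration and is not TGFβRII or any other real gene. The codon table is the standard genetic code.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate in frame 0 until the first stop codon ('*'), which is not included."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence containing an (A)10 tract (not a real gene).
wild_type = "ATGCCC" + "A" * 10 + "GCCTAAGCTAA"
mutant = "ATGCCC" + "A" * 9 + "GCCTAAGCTAA"  # one A lost by polymerase slippage

print(translate(wild_type))  # MPKKKSLS
print(translate(mutant))     # MPKKKA (frameshift, premature stop)
```

Because 10 is not a multiple of 3, the run itself spans a codon boundary. Deleting one base leaves the first five codons unchanged and alters every codon after the tract.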

Fig. 1

Schematic demonstrating several genes containing coding mononucleotide repeats that are mutated in colorectal cancer as a consequence of a lack of DNA mismatch repair. All mutations relevant to neoplastic development, and the importance of their timing, remain to be identified. It is also not yet known whether some of the same genes, as well as other novel, tissue-specific genes, are similarly mutated in cancers with microsatellite instability (MSI+) arising in other tissues. IGFIIR = insulin-like growth factor II receptor; MMR = mismatch repair; TGFβRII = transforming growth factor-β type II receptor.

6. Instability of Trinucleotide (TNR) Repeats

Expansion of TNRs is associated with hereditary neurologic diseases that include Huntington disease, the spinocerebellar ataxias, and spinal and bulbar muscular atrophy.[2] These disorders are characterized by instability at only one genomic site, often a CAG, CCG or GAA repeat. The disorders can be classified according to the repeat sequence and its location with respect to the gene associated with the specific disease (fig. 2). In each disorder, the disease-associated TNR is expanded beyond the normal polymorphic range. The length of the repeat is inversely correlated with the age of onset of the disease. Repeats expand during transmission to offspring, often with a parent-of-origin bias.[37–39]

Fig. 2

Schematic demonstrating human disorders associated with expansion of a trinucleotide repeat (TNR) and the location of the repeat relative to the exonic sequence (black boxes) or introns (thick line) of the gene associated with each disease. DM = myotonic dystrophy; DRPLA = dentatorubral-pallidoluysian atrophy; FRAXA,E = fragile X syndromes; FRDA = Friedreich ataxia; HD = Huntington disease; MJD = Machado-Joseph disease; SBMA = spinal and bulbar muscular atrophy; SCA1,2,3,6,7 = subtypes of spinocerebellar ataxia.

The mechanism controlling TNR expansion remains elusive. Several mechanisms have been proposed to account for repeat instability: polymerase slippage during DNA replication, gene conversion events, and unequal crossing over and recombination.[40] These mechanisms are not mutually exclusive. The favored model for expansion, however, is the formation of slipped-strand structures between repeated TNR sequences during replication (fig. 3).[41–43] In this model, an Okazaki fragment may dissociate from the template and reanneal at a different position within the repeat. This suggests that the activity of FEN1, a flap endonuclease important in Okazaki fragment processing at the replication fork,[44] may be critical to repeat stability.[45] FEN1 is the human ortholog of the Saccharomyces cerevisiae gene RAD27. Replication forks have been shown to stall at TNRs, probably as a result of unusual secondary structures formed at these sites. Stalling depends on the length of the TNR, the purity of the repeat, and the orientation of the repeat relative to the origin of replication.[46] Stalling can lead to DNA breaks and ultimately to DNA expansion via improper end joining.
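Stalling depends on both repeat length and repeat purity. Summarizing a repeat tract as a series of uninterrupted runs separates these two properties. The sketch below is hypothetical in several respects:

- It assumes the input string is the repeat tract itself.
- It counts only perfect copies of the unit.
- It treats any other triplet (here a CAA) as an interruption.

```python
import re


def pure_runs(tract, unit="CAG"):
    """Lengths, in repeat units, of each uninterrupted run of `unit`, left to right."""
    pattern = "(?:%s)+" % re.escape(unit)
    return [len(m.group(0)) // len(unit) for m in re.finditer(pattern, tract.upper())]


def tract_summary(tract, unit="CAG"):
    """Summarize a tract by total pure units, longest pure run and whether it is interrupted."""
    runs = pure_runs(tract, unit)
    return {
        "pure_units": sum(runs),
        "longest_pure_run": max(runs, default=0),
        "interrupted": len(runs) > 1,
    }


# Hypothetical tract: (CAG)10, one CAA interruption, then (CAG)5.
print(tract_summary("CAG" * 10 + "CAA" + "CAG" * 5))
# {'pure_units': 15, 'longest_pure_run': 10, 'interrupted': True}
```

The summary only separates length from purity. It makes no claim about how a particular interruption affects stalling or expansion.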

Other experiments have shown that FEN1 pauses at specific sites during excision, predominantly in G:C-rich regions. In S. cerevisiae, RAD27 mutants demonstrate increased spontaneous mutagenesis.[47,48] Simple repetitive DNA shows 280-fold higher instability in RAD27 mutants than in wild-type strains.[49] A separate study showed increased destabilization of both mini- and microsatellite repeats in these mutants, mainly in the form of additions.[17] Deletion of RAD27 in S. cerevisiae also destabilizes CAG tracts. In contrast to the changes seen in wild-type strains, the mutations that arise in the absence of RAD27 are predominantly CAG expansions.[50] Furthermore, in the absence of RAD27 activity, CAG expansion was shown to be length-dependent.[51] These findings support the hypothesis that the CAG tract expansions associated with human disease may be associated with excess DNA synthesis on the lagging strand during replication (fig. 3).[50,51]

Fig. 3

Simplified diagram of a replication fork, showing the generation of a DNA flap on the lagging strand that is normally processed by FEN1 into a continuous strand. In the absence of FEN1 the DNA flap is hypothesized to be incorporated into the lagging strand resulting in duplication of DNA and expansion of the region after subsequent replication and cell division.[41,43]

However, some recent observations in mice suggest that post-mitotic neurons are also capable of CAG expansion and that there may be different mechanisms of repeat instability in terminally differentiated tissue compared with mitotic tissue. Observations in a genetic mouse model of Huntington disease suggested that somatic expansions increase with age in a tissue-specific manner. Kennedy et al. demonstrated that striatal cells in older mice had tripled their CAG repeat size, despite their post-mitotic state.[52] This remains to be examined in patients with Huntington disease.

7. Sex-Specific Factors Influence Repeat Expansion

The parent-of-origin differences in repeat length that are observed in all of the dynamic mutation disorders are still unexplained. Expansions in noncoding regions are associated with maternal inheritance, whereas all disorders with a CAG repeat within a coding region demonstrate a paternal transmission bias for expansion. It is still not clear whether the gender influence on expansion acts in the germ cell, the embryo, or both. The presence of repeat expansions in sperm from patients with Huntington disease demonstrates that germline expansions do occur.[53] However, identical twins with fragile X syndrome (FRAXA) can have different repeat lengths at the FRAXA locus, which shows that substantial TNR expansion can occur post-zygotically.[54] Surprising recent evidence suggests that CAG expansion is also influenced by the gender of the embryo.[55] Among mouse progeny of the same father, CAG repeats were generally expanded in males and contracted in their female siblings. Early embryogenesis may therefore be an important period for the action of gender-specific factors on repeat expansion. The nature of these factors remains to be determined. X- or Y-encoded DNA repair/replication factors, or imprinting factors acting in early embryogenesis, may control gender-specific differences in expansion.

8. DNA Repair and TNR Expansion

Unexpectedly, the absence of DNA mismatch repair function results in more stable TNRs.[56] This may suggest that DNA damage and error-prone repair are associated with TNR expansion. However, the MMR proteins may have other roles, for example as important signaling molecules in apoptotic pathways.[57] A direct association between trinucleotide repeat expansion and DNA repair has therefore not yet been proven.

9. Pathogenesis of the Diseases Associated with Expansion of a CAG Repeat

Why specific neurons are selectively vulnerable in each of the CAG repeat disorders is unclear. One theory is that cells differ in how they process mutant polyglutamine proteins, producing toxic cleavage products in specific neurons.[58] Another possibility is that mutant huntingtin protein interacts with neuron-specific proteins, producing toxic effects. Alternatively, a toxic threshold of polyglutamine may be reached only in specific neurons, due to a tissue-specific increase in CAG number. However, new evidence from the spinocerebellar ataxias suggests that neither polyglutamine tracts nor even CAG repeat expansion underlies the pathogenesis of all triplet repeat disorders. Other molecular pathogenic mechanisms may therefore be involved.

The autosomal dominant cerebellar ataxias (ADCAs) are a clinically and molecularly heterogeneous group of neurodegenerative disorders characterized by progressive cerebellar ataxia, or failure of muscle coordination. Most ADCAs are caused by expansion of unstable repeats beyond a pathogenic threshold size. The genes responsible for several ADCA subtypes, spinocerebellar ataxia (SCA) 1–10 and 12, have been cloned or mapped and characterized. In some ADCA families, however, the pathogenic loci remain unknown.[59]

Ataxia subtypes SCA1, SCA2, SCA3/Machado-Joseph disease (SCA3/MJD), SCA6, and SCA7 have been found to be caused by expansion of CAG repeats translated as a polyglutamine tract.[60–63] These SCAs share common properties with the other polyglutamine diseases.[62] The CAG repeat sequences are unstable, with the exception of SCA6, and thus undergo larger expansions in successive generations. As a result, the age of disease onset becomes earlier in successive generations, a clinical phenomenon called anticipation.[62,64] In fact, the number of CAG repeats and the age at onset have a strong negative correlation.[62] Each type of polyglutamine disease has a threshold number of CAG repeats, above which clinical symptoms appear; this threshold ranges from 20 in SCA6 to 54 in SCA3/MJD.[62]
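Each polyglutamine disease has its own threshold, so the same CAG count can be pathogenic at one locus and not at another. The sketch below uses only the two thresholds quoted above (20 for SCA6 and 54 for SCA3/MJD) and is not a clinical tool. The function name is hypothetical. The text says only that symptoms appear above the threshold, so treating a count equal to the threshold as pathogenic is an assumption.

```python
# Threshold CAG counts quoted in the text; other disorders are deliberately omitted.
CAG_THRESHOLDS = {"SCA6": 20, "SCA3/MJD": 54}


def exceeds_threshold(disease, cag_repeats):
    """True if cag_repeats reaches the disease-specific threshold.

    Assumption: a count equal to the threshold is treated as pathogenic.
    """
    try:
        threshold = CAG_THRESHOLDS[disease]
    except KeyError:
        raise ValueError(f"no threshold recorded for {disease!r}") from None
    return cag_repeats >= threshold


# The same hypothetical allele of 40 CAG repeats evaluated at two loci.
print(exceeds_threshold("SCA6", 40))      # True
print(exceeds_threshold("SCA3/MJD", 40))  # False
```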

However, several other SCAs have recently been shown not to involve polyglutamine tracts (SCA8[61] and SCA12[63]) or even CAG repeat expansion (SCA10[59]). This suggests that other molecular pathogenic mechanisms may be involved. Koob et al.[61] recently cloned and characterized the gene responsible for SCA8. SCA8 is transcribed as a CTG repeat rather than in the CAG orientation.[61] The SCA8 CTG expansion lies 5′ of the poly(A) signal. Unlike the expansions in the other dominant SCAs, it is not translated, so SCA8 is unlikely to be a polyglutamine disease.[16] Further investigation into the molecular pathogenesis of SCA8 revealed that the SCA8 transcript is an endogenous antisense RNA to a brain-specific transcript encoding a novel actin-binding protein, KLHL1.[65] Its exact role, however, has yet to be determined.

SCA8 is an adult-onset disease with symptoms of dysarthria, tremor, and limb and gait instability. SCA8 mutations have molecular similarities to the mutations that cause myotonic dystrophy (DM), but the symptoms do not include the multisystemic features of DM.[61] Koob et al.[61] found 110–130 combined repeats at the SCA8 locus in affected patients, consisting of a CTG repeat tract and an adjacent polymorphic CTA repeat, compared with only 16–37 on control chromosomes. Other studies have confirmed these results, suggesting that the penetrance of SCA8 is related to CTG repeat length.[66,67] However, the pathogenicity of the CTG repeats, and even the validity of SCA8 as the disease locus, have since been questioned. Several other groups have found SCA8 expansions within the originally defined penetrance range in control populations.[60,68–71] These findings suggest that the SCA8 CTG expansion may instead be a nonpathogenic polymorphism in linkage disequilibrium with a true ataxia locus.[63]

Reports of a parental bias in SCA8 inheritance are also inconsistent. The original study found that SCA8 repeats tended to expand during maternal transmission and contract during paternal transmission, so that most disease-associated repeats were maternally inherited.[61] One study confirmed this finding,[70] but another found that expansion also occurred during paternal transmission, resulting in disease.[66] Still others have found paternally inherited disease together with contraction during paternal transmission, contradicting the earlier data.[67]

Several discrepancies regarding SCA8 penetrance remain: the pathogenic repeat threshold, reduced penetrance, gender effects, and the possibility of multiple loci. Until these are resolved, clinical diagnosis of SCA8 based on expansion of the SCA8 TNR is not recommended.[63] Mutable sequence interruptions in the CTG repeat tract, which are not observed in other trinucleotide repeat diseases, have been hypothesized to be involved in SCA8 pathogenesis. One study, however, suggests that they may not affect SCA8 penetrance.[72] Another hypothesis is that the ratio of CTA to CTG expansion may be important for SCA8 penetrance.[70] Defining the function of SCA8 will further clarify the role of trinucleotide repeat expansion in SCA8 pathogenesis.
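A short parser can tabulate the two quantities discussed above, the combined repeat length and the CTA:CTG ratio, from a sequenced allele. This sketch makes two assumptions. First, the allele has the form (CTA)n(CTG)m, with the CTA tract immediately preceding the CTG tract in the sequence as given. Second, an interruption ends the match, so interrupted CTG tracts would need a more tolerant pattern. The function name is hypothetical, and the example allele is invented and carries no clinical meaning.

```python
import re

# Assumed allele structure: a CTA tract immediately followed by a CTG tract.
SCA8_TRACT = re.compile(r"((?:CTA)+)((?:CTG)+)")


def sca8_repeat_counts(seq):
    """Return CTA and CTG unit counts, their sum and the CTA:CTG ratio, or None if absent."""
    m = SCA8_TRACT.search(seq.upper())
    if m is None:
        return None
    cta = len(m.group(1)) // 3
    ctg = len(m.group(2)) // 3
    return {"CTA": cta, "CTG": ctg, "combined": cta + ctg, "CTA_to_CTG": cta / ctg}


allele = "GG" + "CTA" * 10 + "CTG" * 80 + "TT"  # invented example allele
print(sca8_repeat_counts(allele))
# {'CTA': 10, 'CTG': 80, 'combined': 90, 'CTA_to_CTG': 0.125}
```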

Another peculiar form of instability is responsible for a recently mapped form of ADCA, SCA10, characterized by a unique combination of cerebellar ataxia and epilepsy.[64,73] The region of the mouse genome corresponding to the SCA10-linked human region 22q13 contains the calcium channel γ subunit gene CACNG2. Mutation of CACNG2 causes similar symptoms of ataxia, cerebellar dysfunction and epilepsy in the mouse mutant known as stargazer (stg).[74] Matsuura et al.[59] examined the microsatellites of this region for trinucleotide repeats but instead found an unstable pentanucleotide (ATTCT) repeat. This repeat currently represents the largest microsatellite expansion identified in the human genome. Expansions of up to 22.5 kb in the pentanucleotide repeat were found in all affected patients, whereas control individuals showed polymorphic alleles of 10 to 22 repeats with no evidence of expansion.[59] Expansion size is inversely correlated with the age of SCA10 onset,[59] confirming the initial observation of anticipation in paternal transmissions.[73] The ATTCT repeat lies near the 3′ end of the large (>66 kb) intron 9 of the SCA10 gene, so expansion is thought to affect SCA10 transcription or post-transcriptional processing.[59]
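The arithmetic below gives a rough sense of scale. It assumes that the 22.5 kb figure refers entirely to ATTCT repeat DNA, converts the largest expansion into 5 bp repeat units and compares it with the longest normal allele (22 repeats):

$$
\frac{22\,500\ \text{bp}}{5\ \text{bp per repeat}} = 4\,500\ \text{repeats},
\qquad
22\ \text{repeats} \times 5\ \text{bp per repeat} = 110\ \text{bp},
\qquad
\frac{22\,500\ \text{bp}}{110\ \text{bp}} \approx 205.
$$

On this assumption, the largest expanded tract is roughly 200 times the length of the largest normal allele.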

The identification of pentanucleotide repeat expansion as a new class of mutation will help advance our understanding of the role of microsatellite instability in human disease. In addition, understanding the molecular mechanism that underlies TNR expansion will help to explain the unique nature of the CAG repeat-associated diseases. It remains unclear whether other, as yet unknown, TNR expansions are associated with neurologic disease. Many patients with unusual neurologic symptoms do not show expansion at any of the known expanding TNRs. One patient with ataxia and mental impairment showed expansion of a novel CAG repeat located within the TATA-binding protein (TBP) gene.[75] The expansion in this case appears to have arisen from an unusual duplication event. Even so, the discovery of any new disease-associated TNR provides another opportunity to investigate the molecular nature of such diseases. Perhaps some of these patients will be shown to have other types of genomic instability.

10. TNR Instability and Carcinogenesis

Are the CAG repeats associated with dynamic mutations rendered unstable in malignant cells? An analysis of the polymorphic CAG repeat in the androgen receptor gene in a variety of colorectal cancers suggests not.[76] This repeat ranges from 8 to 35 units in normal individuals and expands to as many as 62 in SBMA. In 50 sporadic colorectal cancers, 10% showed somatic reductions, suggesting that the repeat is unstable. However, the changes were observed in samples defined as both MSI+ and MSI−. It is therefore likely either that different mechanisms of instability are involved or that the changes result from purely random slippage.

11. Conclusion

Widespread genomic instability at microsatellite sequences has been found to be associated with certain cancers, and instability of specific TNRs with a variety of neurologic disorders. These discoveries have led researchers to seek the basic mutational mechanism(s) and other factors underlying such instabilities. An understanding of how such instabilities occur and result in disease will be critical in developing appropriate prevention and treatment strategies, with the end goal of reducing human morbidity and mortality.