
1 Introduction

1.1 Benefits and Problems with a Repetitive Genome

Nearly half of the human genome consists of repetitive DNA sequences, composed mostly of interspersed and transposon-derived repeats but also of tandem repeats (TRs) (Gemayel et al. 2010). Microsatellites, a class of TRs also known as simple sequence repeats (SSRs), are often defined as repeating units of ≤10 base pairs (bp), while larger repeats are referred to as minisatellites (>10 bp) and macrosatellites or megasatellites (>135 bp). Microsatellites, which account for 3–5 % of mammalian genomes, are highly polymorphic because the DNA replication, repair, and recombination machineries have intrinsic difficulty handling these unusual sequences, which tend to form imperfect hairpins, quadruplex-like structures, and slipped-strand structures (Lopez Castel et al. 2010; Mirkin 2007). Many microsatellites are bidirectionally transcribed (Batra et al. 2010; Budworth and McMurray 2013), and repeat length polymorphism is common, with mutation rates 10- to 100,000-fold higher than those of other genomic regions (Jansen et al. 2012). In the general population, repeat lengths at a given locus vary moderately, and unaffected individuals may harbor alleles with different numbers of repeats within the normal range. However, once an allele expands beyond a critical length threshold, instability is greatly amplified and the mutation becomes pathogenic. Because expansions and contractions occur during cell division and error-prone DNA repair, affected patient tissues are composed of cells carrying different numbers of repeats in the disease allele, a phenomenon termed somatic mosaicism (Lopez Castel et al. 2010; Mirkin 2007). Furthermore, mutant repeats often continue to expand during aging, which might explain the progressive nature of many of these neurological diseases. Microsatellite expansions and contractions also occur in the germ line, altering the repeat length passed from one generation to the next.
Comparison of average repeat lengths among patients within pedigrees shows that successive generations often have progressively larger repeats. Moreover, this increase in repeat length often correlates with an increase in disease severity and earlier age-of-onset of disease symptoms, hence providing a genetic explanation for the observation of intergenerational anticipation (Friedman 2011).
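To make the size-based definitions above concrete, the following Python sketch labels a tandem repeat by its unit length using the cut-offs quoted in this section, and measures the longest uninterrupted run of a given unit in a DNA string. The function names are hypothetical, and the thresholds are the conventional ones cited here rather than a universal standard; real repeat-annotation tools also handle imperfect (interrupted or degenerate) repeats, which this sketch deliberately ignores.

```python
import re


def classify_repeat_unit(unit_length_bp: int) -> str:
    """Label a tandem repeat by the length of its repeat unit, using the
    cut-offs quoted in the text: <=10 bp, >10 bp, and >135 bp."""
    if unit_length_bp < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_length_bp <= 10:
        return "microsatellite"
    if unit_length_bp <= 135:
        return "minisatellite"
    return "macrosatellite/megasatellite"


def longest_pure_run(sequence: str, unit: str) -> int:
    """Return the largest number of uninterrupted, back-to-back copies of
    `unit` anywhere in `sequence`. A zero-width lookahead is used so that a
    run starting at every position (every phase) is considered, not only
    the non-overlapping matches a plain search would return."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    unit = unit.upper()
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(1)) // len(unit))
    return best


# Example: an idealized CTG tract with short, made-up flanking sequence.
allele = "GGC" + "CTG" * 40 + "AAT"
print(classify_repeat_unit(len("CTG")), longest_pure_run(allele, "CTG"))  # microsatellite 40
```

The lookahead matters for units whose ends resemble each other: in `"GCGCGGCGGCG"`, a greedy left-to-right search for `GCG` would stop after one copy, whereas the longest pure run (starting at offset 2) has three.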

Although the remarkable abundance of repeats in the human genome hints at functionality, early reports classified these sequences as evolutionary artifacts or non-functional “junk” DNA (Doolittle and Sapienza 1980; Ohno 1972; Orgel and Crick 1980). More recent studies indicate that repetitive DNA might serve valuable cellular functions. Microsatellites occur in the protein-coding regions of ~17 % of human genes, and the 10–20 % of eukaryotic genes that contain microsatellite repeats often function in cellular regulatory pathways (Gemayel et al. 2010; Jansen et al. 2012). For example, TRs in budding yeast are found primarily within genes encoding cell-surface proteins and key regulatory proteins, including chromatin modifiers and transcription factors. TRs are not restricted to eukaryotes, however: they facilitate antigenic variation in pathogenic prokaryotes as a mechanism to evade host defense systems (Gemayel et al. 2010; Mrazek et al. 2007). Variation in repeat number in promoter regions may alter gene expression, while TR variation in coding regions can cause frameshift mutations and the production of truncated proteins. Thus, simple sequence repeats can serve regulatory functions and catalyze adaptations that benefit pathogen survival. Despite these examples, functional roles for microsatellite variability in human genes have not been well documented, although TRs are enriched in vertebrate genes that control organ and/or body morphology (Gemayel et al. 2010). Given their relatively high mutation rate, variability in microsatellite repeat number within a gene might offer evolutionary and regulatory advantages.
For example, variation within the normal range in GC-rich repeats positioned in a promoter could have modest but advantageous effects on transcriptional activity, while similar repeats in the 5′ untranslated region (5′ UTR) could modulate translation and thus the amount of encoded protein available during a particular developmental window (Fig. 10.1a). However, most studies have emphasized the detrimental effects of larger microsatellite expansions. How do expanded microsatellites cause disease? Research has revealed several distinct mechanisms, some of which are novel and, at least for now, unique to repeat diseases. Relevant to this review, several of these mechanisms involve RNA-binding proteins and altered RNA processing.

Fig. 10.1

Microsatellites in normal and mutant genes associated with the hereditary diseases discussed in this chapter. (a) A normal five-exon gene containing tandem repeats (TRs), with 5′ and 3′ UTRs (open boxes), coding region (black boxes), introns (thick grey line), and microsatellites (green boxes) located in the promoter, noncoding (UTRs, intron), and coding regions. Copy number variation arising from errors during DNA replication or repair may cause modest variation in TR number (shown here as 3–4 TRs), which might influence specific regulatory pathways depending on the TR location within the allele. (b) TR expansions that cause the diseases discussed in this chapter. The position of each TR is shown (single green box) together with the normal (N) and mutant (red triangle) repeat length ranges.

1.2 Microsatellites in Disease

Over two decades ago, the field of unstable microsatellite disease was launched by seminal findings on the molecular etiology of fragile X syndrome (FXS), caused by a CGGexp in the fragile X mental retardation 1 (FMR1) gene, and of spinal and bulbar muscular atrophy (SBMA), caused by CAGexp mutations in the androgen receptor (AR) gene (La Spada et al. 1991; Verkerk et al. 1991) (Fig. 10.1b). Dozens of neurological diseases have since been traced to microsatellite expansion mutations in additional genes; most recently, a GGGGCCexp in the C9ORF72 gene was identified as the most common cause of familial amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) (C9-linked ALS/FTD) (DeJesus-Hernandez et al. 2011; Renton et al. 2011).

Immediately following the discovery of unstable microsatellites in FXS and SBMA, several perplexing questions emerged regarding how expanded microsatellites cause disease. Why are such unstable, disease-prone repetitive sequences so prevalent in the human genome and, in some cases, conserved during evolution (Buschiazzo and Gemmell 2010; Gemayel et al. 2010)? Why are so many microsatellites located in noncoding regions? The discovery of microsatellite expansion disorders has prompted global efforts to characterize the associated disease pathologies and uncover the molecular mechanisms involved, and several common mechanistic themes have emerged (Fig. 10.2). Many repeat expansion diseases involve the nervous and musculoskeletal systems, follow a dominant inheritance pattern, and show genetic anticipation. Despite widespread expression of certain disease genes, the neurological and neuromuscular systems are particularly vulnerable to mutant microsatellite genes. Histological analysis of patient tissue reveals that many of these disorders are characterized by cellular aggregates of homopolymeric proteins, notably polyglutamine (polyQ) inclusions (Orr 2012b; Zoghbi and Orr 2009). In some diseases, patient cells contain nuclear foci harboring RNA expressed from the expanded genes together with specific proteins. Intriguingly, misregulation of RNA-binding proteins and compromised RNA metabolism have become recurrent disease themes (Echeverria and Cooper 2012; Poulos et al. 2011).

Fig. 10.2

Proposed pathogenic mechanisms for microsatellite repeat diseases. Microsatellite expansions (red triangles) in noncoding regions (UTRs, small blue boxes; introns, black lines) may result in RNA (yellow boxes) gain-of-function (RNA GOF), protein (grey and colored circles) loss-of-function (protein LOF), or repeat-associated non-ATG (RAN) protein gain-of-function (RAN GOF). In conventional coding regions (open reading frame, ORF), expansions might cause a protein GOF effect, such as the CAG expansion in SCA3, which produces an expanded polyGln (polyQ, brown circles) tract that either accumulates in cellular inclusions or alters the interactions of mutant ATXN3 (mATXN3). Note that RAN translation of the C9ORF72 GGGGCCexp in ALS/FTD produces polyGlyAla (GA), polyGlyPro (GP), and polyGlyArg (GR) dipeptide repeat proteins owing to recognition of an RNA secondary structure (red hairpin) by the ribosome (orange ellipsoids).

In this chapter, we focus on the molecular mechanisms underlying microsatellite expansion diseases, which include: (1) RNA gain-of-function, in which the repeat-containing RNA transcript expressed from the mutant gene sequesters trans-acting factors; (2) protein loss-of-function, in which the expansion mutation renders a protein non-functional or represses its expression; (3) protein gain-of-function, in which a repeat expansion in a coding region confers a deleterious function on the mutant protein; and (4) repeat-associated non-ATG (RAN) translation, which produces homopolymeric or heteropolymeric peptides translated from the repeat region independently of a normal initiation codon. As an additional layer of complexity, bidirectional transcription through repeat DNA enables multiple pathogenic mechanisms from a single locus. Below, we discuss examples of microsatellite expansion diseases that illustrate each of these molecular mechanisms and their effects on RNA-binding protein function.
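Because RAN translation does not depend on an AUG start codon, a repeat tract can in principle be read in all three frames. The Python sketch below, which assumes only the standard genetic code, translates an idealized GGGGCC tract in each frame and recovers the three dipeptide repeats named in Fig. 10.2 (GA, GP, GR). It models codon content only, not initiation, efficiency, or the role of RNA structure; `translate_frame` and `ran_frames` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code; codons are enumerated in T, C, A, G order so that
# each codon lines up with one letter of AMINO_ACIDS ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq: str, frame: int) -> str:
    """Translate `seq` (DNA or RNA) from offset `frame` (0, 1 or 2) without
    looking for a start codon, as a crude stand-in for frame-independent
    RAN translation. A trailing partial codon is dropped."""
    dna = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def ran_frames(repeat_unit: str, copies: int = 10) -> list[str]:
    """Translate an idealized tract of `copies` repeat units in all three frames."""
    tract = repeat_unit * copies
    return [translate_frame(tract, frame) for frame in range(3)]


for frame, peptide in enumerate(ran_frames("GGGGCC")):
    print(frame, peptide[:6])  # 0 GAGAGA, then 1 GPGPGP, then 2 GRGRGR
```

A hexanucleotide unit spans two codons, so each frame yields an alternating dipeptide rather than a single-residue homopolymer, which is why the C9ORF72 products are described as dipeptide repeat proteins.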

2 RNA Toxicity and Protein Sequestration in Myotonic Dystrophy

2.1 Tipping the Balance Between Antagonistic RNA Processing Factors

The discovery that transcription of some microsatellite expansions produces toxic RNAs originated from studies designed to elucidate the molecular etiology of the most frequent form of muscular dystrophy in adults, myotonic dystrophy (dystrophia myotonica, DM) (Echeverria and Cooper 2012; Poulos et al. 2011; Udd and Krahe 2012). Although DM is classified as a muscular dystrophy, multiple tissues are affected, including skeletal muscle (hyperexcitability or myotonia, weakness/wasting), the heart (arrhythmias and conduction block), the visual system (dust-like cataracts), the reproductive system (testicular atrophy), and the brain (hypersomnia, executive dysfunction, and cerebral atrophy). Interestingly, some disease manifestations, such as brain/muscle atrophy and alopecia (premature balding), resemble the normal aging process (Martin 2005). A distinguishing feature of DM is that microsatellite expansions in two unrelated genes cause disease. DM type 1 (DM1) is associated with a CTGexp in the dystrophia myotonica protein kinase (DMPK) gene, which encodes a serine–threonine kinase, while DM type 2 (DM2) is caused by a CCTGexp in CNBP, the cellular nucleic acid binding protein gene, which encodes a factor implicated in both transcription and translation (Brook et al. 1992; Fu et al. 1992; Jansen et al. 1992; Liquori et al. 2001; Mahadevan et al. 1992) (Fig. 10.1b). Unaffected individuals have 5–37 CTG repeats in DMPK and 10–26 CCTG repeats in CNBP. In DM1, the CTGexp ranges from 50 to >3,000 repeats, while in DM2 the CCTGexp ranges from 75 to >11,000 repeats. Very large (>1,000 repeats) DMPK CTGexp mutations also cause congenital DM1 (CDM), characterized by neonatal hypotonia (floppy baby) and intellectual disability; DM2, in contrast, has no congenital form.
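The repeat-length ranges above can be summarized as a small lookup. The Python sketch below places a repeat count into the normal, DM1, DM2, or congenital-range categories quoted in this section; the dictionary layout and function name are hypothetical, and counts falling between the quoted normal and disease ranges are reported as such rather than assigned to either. It is an illustration of the numbers in the text, not a diagnostic rule.

```python
# Repeat-count ranges as quoted in this section (repeat units, not bp).
DM_LOCI = {
    # gene: (repeat unit, lowest normal, highest normal, lowest disease allele, disease)
    "DMPK": ("CTG", 5, 37, 50, "DM1"),
    "CNBP": ("CCTG", 10, 26, 75, "DM2"),
}
CONGENITAL_DM1_MIN = 1_000  # "very large (>1,000 repeats)" DMPK expansions


def classify_dm_allele(gene: str, repeats: int) -> str:
    """Place a repeat count into the ranges quoted in the text. Counts between
    the normal and disease ranges are reported as outside the quoted ranges."""
    _unit, normal_lo, normal_hi, disease_lo, disease = DM_LOCI[gene]
    if normal_lo <= repeats <= normal_hi:
        return "normal"
    if repeats >= disease_lo:
        if disease == "DM1" and repeats > CONGENITAL_DM1_MIN:
            return "DM1 (congenital-range)"
        return disease
    return "outside quoted ranges"


print(classify_dm_allele("DMPK", 1500))  # DM1 (congenital-range)
```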

Several experimental findings argue that DM is an RNA-mediated, or RNA gain-of-function (GOF), disease (Fig. 10.2). The DMPK and CNBP expansions are located in noncoding regions, and mutant DMPK and CNBP C(C)UGexp RNAs accumulate in nuclear RNA foci (Davis et al. 1997; Margolis et al. 2006; Poulos et al. 2011; Ranum and Cooper 2006; Taneja et al. 1995). Attempts to model DM in transgenic mice showed that CTGexp mutations cause DM-relevant phenotypes irrespective of gene context and that the degree of pathology correlates with transgene expression level (Mankodi et al. 2000; Sicot and Gomes-Pereira 2013). How does C(C)UGexp RNA expression lead to DM? C(C)UGexp RNAs could: (1) promote formation of nuclear RNA foci that impair normal nuclear functions such as RNA processing and nuclear export; (2) fold into a structure with an inherent dominant-negative function; or (3) sequester specific proteins, leading to loss-of-function of these factors. Interestingly, the current model for DM pathogenesis implicates all of these mechanisms. Transcription of expanded C(C)TG repeats produces C(C)UGexp RNAs that gain toxic functions, including recruitment and sequestration of the muscleblind-like (MBNL) proteins, and hyperphosphorylation and increased levels of CUGBP1 and ETR3-like factor (CELF) proteins (Kuyumcu-Martinez et al. 2007; Miller et al. 2000). The MBNL and CELF proteins are antagonistic alternative splicing factors that function during postnatal development by promoting either adult (MBNL) or fetal (CELF) splicing patterns (Charizanis et al. 2012; Du et al. 2010; Ho et al. 2004; Kanadia et al. 2003a, b; Lin et al. 2006; Philips et al. 1998; Timchenko et al. 1996). Thus, DM pathogenesis involves an RNA gain-of-function that leads to both protein loss-of-function (MBNL) and protein gain-of-function (CELF), resulting in the persistence of, or reversion to, fetal splicing patterns in adult tissues.

Besides MBNL and CELF, other RNA processing factors have been implicated in DM. HnRNP H binds in vitro to DMPK-derived CUGexp RNAs that also contain a splicing branch point distal to the repeat region, and siRNA knockdown of this protein rescues nuclear retention of CUGexp-containing RNAs (Kim et al. 2005). HnRNP H protein levels are also elevated in DM1 myoblasts, and overexpression of hnRNP H or CELF1 leads to the formation of a repressor complex that inhibits splicing of insulin receptor (IR) exon 11 (Paul et al. 2006). In normal myoblast extracts, hnRNP H exists in a complex with MBNL1 and nine other proteins (hnRNP H2, H3, F, A2/B1, K, L, DDX5, DDX17, and DHX9), but the stoichiometry of these complexes is altered in DM1 extracts (Paul et al. 2011). Staufen1 (STAU1), a double-stranded (ds)RNA-binding protein, is also misregulated in DM1 skeletal muscle (Ravel-Chapuis et al. 2012). STAU1 levels increase in skeletal muscle from DM1 patients and CUGexp mice, but STAU1 is not sequestered in nuclear CUGexp RNA foci. STAU1 promotes nuclear export and translation of CUGexp mRNAs, and its overexpression rescues abnormal splicing of several pre-mRNAs misspliced in DM, so increased STAU1 levels may be a compensatory response that ameliorates the DM phenotype.

An additional type of RNA-binding protein appears to modulate the interaction of MBNL proteins with pathogenic C(C)UGexp RNAs. MBNL proteins contain tandem zinc finger (ZnF) domains that bind preferentially to YGCY (Y = pyrimidine) RNA elements (Charizanis et al. 2012; Du et al. 2010; Goers et al. 2010; Wang et al. 2012a). Structural analysis of MBNL ZnFs indicates that they target GC steps and, owing to the topology of the inter-ZnF linker, impose an antiparallel orientation on the bound elements (Teplova and Patel 2008). MBNL proteins do not bind uninterrupted Watson–Crick RNA duplexes with high affinity but instead preferentially bind imperfect duplexes, particularly those with U–U and C–U/U–C mismatches, and steady-state fluorescence quenching analysis confirms that MBNL1 binding alters helical RNA structures (Fu et al. 2012). Because CUGexp RNAs fold into hairpin structures (Krzyzosiak et al. 2012), it is notable that the RNA helicase p68/DDX5 modulates MBNL1 binding activity (Laurent et al. 2012). The p68/DDX5 helicase colocalizes with nuclear RNA foci and stimulates MBNL1 binding both to CUGexp RNA and to binding sites in MBNL1 splicing target RNAs.
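To illustrate why C(C)UG repeats present many potential MBNL sites, the following Python sketch scans a sequence for the YGCY motif (Y = C or U) described above, counting overlapping occurrences. It is a plain string search with a hypothetical function name; it ignores RNA structure, mismatches, and binding affinity, all of which the studies above show matter for MBNL binding.

```python
import re

# Y = pyrimidine (C or U). The zero-width lookahead lets overlapping motifs be
# reported: in CUGCUGCUG the motifs at positions 1 and 4 share the U at 4.
YGCY = re.compile(r"(?=([CU]GC[CU]))")


def find_ygcy(seq: str) -> list[int]:
    """Return 0-based start positions of every YGCY motif in an RNA sequence
    (a DNA sequence is accepted, with T read as U)."""
    rna = seq.upper().replace("T", "U")
    return [match.start() for match in YGCY.finditer(rna)]


print(find_ygcy("CUG" * 4))   # [1, 4, 7]
print(find_ygcy("CCUG" * 3))  # [2, 6]
```

In a pure (CUG)n or (CCUG)n tract, a YGCY motif spans every junction between adjacent repeat units, giving n − 1 motifs; a non-overlapping search would miss about half of them in (CUG)n.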

2.2 Pathogenic RNAs Disrupt Additional Regulatory Pathways Including Pre-miR Processing, mRNA Trafficking, and Translation

Although the impact of C(C)UGexp RNA expression on the regulation of alternative splicing has received considerable attention, these toxic RNAs have also been reported to affect other regulatory pathways. In DM1 skeletal muscle biopsies, microRNA (miR) expression patterns vary between studies: one study reported miR-206 overexpression relative to controls (Gambardella et al. 2010), while another found that miR-1 and miR-335 were upregulated, miR-29b, miR-29c, and miR-33 were downregulated, and miR-1, miR-133b, and miR-206 were mislocalized (Perbellini et al. 2011). Mis-processing of miR-1 occurs in the DM1 heart and has been linked to MBNL loss-of-function. MBNL1 normally binds a UGC motif in the pre-miR-1 loop and facilitates processing by competing with LIN28, which blocks Dicer processing through ZCCHC11/TUT4-mediated uridylation of pre-miR-1 (Rau et al. 2011).

Multiple studies have provided evidence that muscle weakness and wasting in DM may arise from impaired muscle differentiation due to altered CELF1 activity and the resulting effects on translation of specific target mRNAs. As mentioned previously, CELF1 protein levels increase in DM, and CELF1 overexpression in transgenic mice inhibits myogenesis and causes MEF2A and p21 overexpression (Timchenko et al. 2004). Several studies have suggested that DM2 is caused by CNBP haploinsufficiency (Chen et al. 2007; Huichalaf et al. 2009; Raheem et al. 2010), although other groups report that CNBP protein levels are not altered in this disease (Botta et al. 2006; Margolis et al. 2006; Massa et al. 2010). Interestingly, CNBP binds to the 5′ UTRs of terminal oligopyrimidine (TOP) mRNAs encoding a variety of proteins important for translational regulation, including PABPC1, eIF1a, and eIF2, so CCUGexp expression has been proposed to affect the rate of global protein synthesis (Huichalaf et al. 2009; Schneider-Gold and Timchenko 2010).

RNA trafficking is another critical regulatory pathway that is altered by the expansion mutations in DM. Early work provided evidence that MBNL2 regulates the localized expression of integrin α3 to adhesion complexes (Adereth et al. 2005). More recently, transcriptome analysis has indicated that MBNL proteins regulate mRNA localization in vertebrate and invertebrate (Drosophila) cells resulting in effects on both translation and protein secretion (Wang et al. 2012a). For future studies, it will be important to link distinct disease manifestations of DM to alterations in the localization of specific RNAs.

2.3 Are RNA Foci Important Features of RNA Toxicity?

Nuclear RNA foci are a hallmark pathological feature of DM as well as of other neurological diseases that may be linked to RNA-mediated toxicity (e.g., C9-linked ALS/FTD). However, the exact role of RNA foci in the induction and/or maintenance of DM-associated pathology remains controversial (Junghans 2009; Mahadevan 2012; Wojciechowska and Krzyzosiak 2011). RNA foci might be pathological entities that interfere directly with normal nuclear functions or, alternatively, protective protein–RNA complexes that prevent toxic RNAs from reaching the cytoplasm. In DM1, RNA foci are composed of both MBNL proteins and CUGexp RNAs. Early studies in DM1 cells indicated that RNA foci are insoluble aggregates (Davis et al. 1997; Taneja et al. 1995), suggesting that formation of MBNL-C(C)UGexp complexes might effectively inhibit MBNL dissociation and enhance disease. Arguing against this possibility, single-particle tracking, fluorescence recovery after photobleaching (FRAP), and fluorescence loss in photobleaching (FLIP) experiments indicate that (CUG)145 RNA transcripts expressed in C2C12 myoblasts undergo stochastic aggregation/disaggregation cycles. An early FRAP study also provided evidence for rapid exchange between unbound GFP-Mbnl1 and GFP-Mbnl1 bound to aggregates (Ho et al. 2005). However, when GFP-MBNL1 was expressed at levels comparable to those of endogenous Mbnl1, considerably less freely diffusing GFP-Mbnl1 was observed (Querido et al. 2011). Thus, RNA foci may be dynamic structures, but MBNL proteins could still be effectively sequestered in non-focal MBNL-C(C)UGexp complexes. In addition, the dynamics of RNA foci formation and MBNL-C(C)UGexp complex stability may be profoundly influenced by the larger repeats present in DM1 and DM2 myofibers and neurons compared with other cell types. Irrespective of their potential roles in pathogenesis, RNA foci and alternative splicing alterations are important biomarkers of RNA-mediated disease (Cardani et al. 2006, 2009; Nakamori et al. 2013).

3 Dual Disease Mechanisms from a Single Gene: Fragile X and FMR1

3.1 FMR1 Epigenetic Silencing and Misregulated Translation

Fragile X syndrome (FXS) is the most common form of inherited intellectual disability, with an incidence of approximately 1 in 5,000 individuals. The symptoms of FXS are variable but include intellectual disability, behavioral abnormalities such as attention deficit hyperactivity disorder (ADHD) and autistic behavior, childhood seizures, connective tissue defects, and macroorchidism (abnormally large testes) (Hernandez et al. 2009; McLennan et al. 2011; Nelson et al. 2013). Unlike many other microsatellite diseases, which tend to be inherited in an autosomal dominant pattern, FXS is an X-linked disorder, and the mutation creates a fragile site, a locus prone to gaps or breaks. FXS arises when the FMR1 CGG repeat, which ranges from 6 to 52 repeats in unaffected individuals, expands beyond 230 repeats (Fig. 10.1b). These expanded repeats create an enlarged CpG island in the DNA that becomes aberrantly methylated, a change coupled with histone deacetylation (Coffee et al. 1999; Sutcliffe et al. 1992) (Fig. 10.2). Thus, the primary molecular basis for this neurological disease is epigenetic silencing and loss of function (LOF) of the encoded protein, FMRP.

FMRP is an RNA-binding protein that interacts with polyribosomes to regulate local protein synthesis (Bhakar et al. 2012; Wang et al. 2012b). FMRP interacts with mRNA through two K homology (KH) domains, KH1 and KH2, as well as an arginine- and glycine-rich (RGG) domain (Siomi et al. 1993). While the RGG domain has been reported to bind to G quartets, FMRP has also been reported to bind to the coding regions of mRNAs independently of these G-rich structures (Darnell et al. 2001, 2011). Tight regulation of synaptic protein synthesis in response to neuronal activity is thought to be essential for neuronal processes such as learning and memory formation. At synapses, FMRP binds mRNA targets to fine-tune translation in an activity-dependent manner (Akins et al. 2009; Bassell and Warren 2008). Combining FMRP binding-site analysis in mouse brain using high-throughput sequencing and cross-linking immunoprecipitation (HITS-CLIP) with a polyribosome-programmed translation system revealed that FMRP binds to transcripts encoding presynaptic and postsynaptic proteins and stalls ribosomes until the appropriate cellular signals are present (Darnell et al. 2011). The current molecular model of FXS proposes that dysregulation of local protein synthesis as a consequence of FMRP loss leads to neuronal dysfunction and the neurological manifestations of the disease. Consistent with this model, Fmr1 null mice show increased translation of Fmrp target mRNAs and display abnormal neuronal morphology and dendritic abnormalities similar to those seen in FXS brains (Berman and Willemsen 2009). Fmr1 null mice also display phenotypes that mirror FXS symptoms (Bhogal and Jongens 2010). Similarly, point mutations in the human FMR1 gene that disrupt the FMRP RNA-binding domain KH2 also cause FXS symptoms (Nelson et al. 2013).
Cumulatively, these observations strongly suggest that loss of FMRP underlies the neuronal dysfunction seen in patients and is the primary molecular cause of FXS.

3.2 Fragile X-Associated Tremor/Ataxia Syndrome and RNA Toxicity

In contrast to FXS, FMR1 transcriptional activity does not decrease in patients with the related disease fragile X-associated tremor/ataxia syndrome (FXTAS) (Peprah et al. 2010). Whereas FXS results from a full mutation, or (CGG)>230, more moderately sized premutation alleles of (CGG)55–200 cause FXTAS (Fig. 10.1b). Premutation carriers may develop FXTAS, a late-onset neurodegenerative disorder with a typical age of onset in the early 60s, or fragile X-associated primary ovarian insufficiency (FXPOI), with infertility before age 40 (Hagerman and Hagerman 2004; Leehey and Hagerman 2012). Clinical features of FXTAS differ from those of FXS and include gait ataxia, progressive action tremor, autonomic dysfunction, and neurodegeneration. The distinct clinical outcomes of FXS and FXTAS result from different pathogenic mechanisms, since CGGexp repeats in the premutation range do not repress FMR1 expression. On the contrary, FXTAS patients have up to eightfold higher FMR1 RNA levels with normal to slightly reduced levels of FMRP (Kenneson et al. 2001; Peprah et al. 2010; Tassone et al. 2000a, b, c). Current evidence points toward toxicity of the repeat-containing RNA (rCGGexp) in FXTAS. As in DM, FMR1 repeat-containing RNA accumulates in the nucleus (Tassone et al. 2004a, b). Furthermore, rCGGexp expression is sufficient to cause nuclear inclusions and neurodegeneration irrespective of its FMR1 gene context, as demonstrated by mouse models expressing a CGGexp reporter gene (Hashem et al. 2009). The presence of ribonuclear inclusions led researchers to ask whether FXTAS follows the RNA toxicity paradigm set by DM, in which cellular factors bind to the mutant RNA expansion and are sequestered away from their normal cellular functions (Fig. 10.2, RNA GOF). Indeed, several RNA-binding proteins have been proposed to be sequestered by rCGGexp repeats, including MBNL1, hnRNP A2/B1, hnRNP G, Sam68, and Purα (Li and Jin 2012; Tassone and Hagerman 2012).
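Because the same FMR1 repeat causes different diseases by different mechanisms depending on its length, the thresholds above map naturally onto a small classifier. The Python sketch below pairs each range quoted in this section with the predominant mechanism the text associates with it. The names are hypothetical, counts in the gaps between quoted ranges (53–54 and 201–230) are deliberately left unassigned, and this is not a clinical classification scheme.

```python
from typing import NamedTuple


class Fmr1Call(NamedTuple):
    category: str   # range label as used in the text
    mechanism: str  # predominant mechanism the text associates with that range


def classify_fmr1_allele(cgg_repeats: int) -> Fmr1Call:
    """Map an FMR1 CGG repeat count onto the ranges quoted in the text."""
    if 6 <= cgg_repeats <= 52:
        return Fmr1Call("normal", "none")
    if 55 <= cgg_repeats <= 200:
        # Premutation: FMR1 RNA is elevated, and rCGGexp sequesters RNA-binding proteins.
        return Fmr1Call("premutation (FXTAS/FXPOI)", "RNA gain-of-function")
    if cgg_repeats > 230:
        # Full mutation: promoter methylation silences FMR1, so FMRP is lost.
        return Fmr1Call("full mutation (FXS)", "epigenetic silencing, FMRP loss-of-function")
    return Fmr1Call("outside quoted ranges", "unspecified")


print(classify_fmr1_allele(100).mechanism)  # RNA gain-of-function
```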

A combination of animal studies and in vitro assays identified hnRNP A2/B1 and Purα as candidate proteins that bind to rCGGexp RNAs and are titrated away from their normal functions (Jin et al. 2007; Muslimov et al. 2011; Sofola et al. 2007). HnRNP A2/B1 is an abundant nuclear RNA-binding protein containing a glycine-rich domain and two RRM domains, and disruption of hnRNP A2/B1 function has been proposed to alter RNA processing in FXTAS. Furthermore, hnRNP A2/B1 tethers CELF1, an RNA-binding protein previously implicated in DM pathogenesis, to CGGexp RNA (Sofola et al. 2007). Overexpression of either hnRNP A2/B1 or CELF1 alleviates the neurodegenerative phenotype in a fly model of FXTAS. CGGexp RNA also induces mislocalization of hnRNP A2/B1 target RNAs from dendrites to the neuronal cell body, presumably by titrating hnRNP A2/B1 onto the repeats. Delivery of additional hnRNP A2/B1 to neurons restores dendritic localization of target RNAs, supporting the proposed role of hnRNP A2/B1 in FXTAS (Muslimov et al. 2011). Purα is an RNA- and DNA-binding protein with diverse roles in transcriptional activation, DNA replication, and mRNA localization. In the brain, Purα is involved in neuronal cell proliferation and neurodevelopment and helps regulate the dendritic and axonal localization of mRNA targets (Hokkanen et al. 2012; Johnson et al. 2006; Ohashi et al. 2000). As with hnRNP A2/B1, overexpression of Purα suppresses neurodegeneration in the fly model of FXTAS, and Purα has been identified in nuclear inclusions in FXTAS patient brains, suggesting that its sequestration may indeed contribute to disease pathogenesis (Jin et al. 2007).

Another RNA-binding protein found to be present in FXTAS nuclear inclusions is Sam68 (Src-associated substrate during mitosis of 68 kDa) (Sellier et al. 2010). Sam68 is an alternative splicing regulator that has diverse roles in cellular signaling, apoptosis, and neuronal functions. Mice lacking Sam68 display motor coordination defects reminiscent of ataxia (Lukong and Richard 2008; Ramakrishnan and Baltimore 2011; Sellier et al. 2010). Sam68 localizes to nuclear inclusions following transfection of rCGGexp repeats in cell models and, in turn, recruits other RNA-binding proteins, including MBNL1 and hnRNP G (Sellier et al. 2010). Missplicing of pre-mRNA targets of Sam68 has been reported to occur in FXTAS patient brains, and overexpression of a Sam68 mutant lacking CGG binding function in cell culture is sufficient to rescue the splicing abnormalities.

FXTAS-associated rCGGexp nuclear inclusions also contain other proteins, including ubiquitin, heat-shock proteins such as αB-crystallin, lamin A/C, myelin basic protein, and DROSHA/DGCR8 (Greco et al. 2002; Iwahashi et al. 2006; Sellier et al. 2013). DGCR8 is a double-stranded RNA-binding protein and the DROSHA/DGCR8 complex is responsible for the miRNA processing step that converts pri-miRNA into pre-miRNA. DGCR8 binds to rCGGexp RNA and recruits its partner DROSHA into the inclusions. Levels of mature miRNA decrease in FXTAS patient brains, and overexpression of DGCR8 rescues rCGGexp-induced neuronal cell death, suggesting that miRNA dysregulation contributes to neuronal dysfunction in FXTAS (Sellier et al. 2013).

Cumulatively, these studies suggest that rCGGexp may titrate multiple RNA-binding proteins in FXTAS, resulting in widespread dysregulation of RNA processing and neurodegeneration. By contrast, larger CGGexp mutations in FMR1 lead to FMRP loss-of-function and the consequent loss of translational regulation and synaptic dysfunction seen in FXS. The finding that microsatellite expansions in a noncoding region of a single gene cause several distinct disease outcomes and disrupt multiple molecular pathways has also been made for DM1/CDM and is likely to hold for other unstable microsatellite diseases.

4 Compound Threats: RNA and Protein Toxicity

Microsatellite expansion diseases have traditionally been classified as either protein- or RNA-mediated. However, these expansions can also pose compound threats, as highlighted by the recent discoveries of bidirectional transcription through repeat regions and RAN translation (Batra et al. 2010; Zu et al. 2011). Indeed, in vitro studies suggest that CAG•CTG expansion mutations have the potential to produce nine toxic molecules, including two pathogenic RNAs (CAGexp, CTGexp) and seven homopolymeric proteins produced by conventional and RAN translation (Pearson 2011; Zu et al. 2011). Below, we discuss expansion diseases, including HDL2 and several spinocerebellar ataxias, in which both RNA and protein toxicity have been implicated.
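To see how bidirectional transcription multiplies the coding potential of a single CAG•CTG tract, the Python sketch below takes the reverse complement of an idealized CAG repeat and lists the residue composition of all three frames on each strand, assuming only the standard genetic code. The helper names are hypothetical, and the sketch ignores initiation, stop codons in real flanking sequence, and translation efficiency, so it is not meant to reproduce the specific count of toxic species cited above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_residues(sense_tract: str) -> dict[tuple[str, int], str]:
    """For each (strand, frame), return the sorted set of residues encoded by
    the tract as a string; a single letter means a homopolymer."""
    strands = {"sense": sense_tract.upper(), "antisense": reverse_complement(sense_tract)}
    result = {}
    for strand, seq in strands.items():
        for frame in range(3):
            codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
            result[(strand, frame)] = "".join(sorted({CODON_TABLE[c] for c in codons}))
    return result


for key, residues in six_frame_residues("CAG" * 10).items():
    print(key, residues)
```

The sense frames encode Gln, Ser, and Ala, while the antisense (CTG) frames encode Leu, Cys, and Ala, so a single tract can code for polyGln, polySer, polyAla, polyLeu, and polyCys. The CTG frames also show how one repeat can yield polyleucine or polyalanine depending on reading frame, as in the JPH3 splice variants discussed below.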

4.1 Huntington Disease-Like 2

Huntington disease-like 2 (HDL2) is a microsatellite expansion disease that is thought to trigger the production of a combination of toxic RNA and protein species (Margolis et al. 2004). HDL2 is a phenocopy of Huntington’s disease (HD), both of which are dominantly inherited diseases characterized by motor coordination defects, dementia, cortical and striatal neurodegeneration, and eventual death within decades of diagnosis (Rudnicki et al. 2008). HD is caused by a CAGexp in the coding region of the Huntingtin (HTT) gene that results in the expression of a HTT protein with an expanded polyglutamine tract (polyGln or polyQ) (Ha and Fung 2012). HD is a member of the CAGexp neurological disorders characterized by neuronal inclusions of ubiquitinated polyGln. PolyGln is thought to be neurotoxic due to several mechanisms including blockage of the ubiquitin proteasome system (UPS) and disruption of mitochondrial function (Finkbeiner 2011; Orr and Zoghbi 2007).

Although the striking clinical similarities between HD and HDL2 suggest that these diseases share a pathogenic mechanism, the mutation responsible for HDL2 was originally described as a CTGexp in the junctophilin-3 (JPH3) gene, which is primarily expressed in the brain (Holmes et al. 2001). Whereas unaffected individuals have 6–27 CTG repeats, HDL2 patients have expansions of 41–58 CTG repeats (Fig. 10.1b). The JPH3 protein is involved in forming a structure that connects the plasma membrane with the endoplasmic reticulum and helps control the release of calcium ions during neuronal activity. Loss of JPH3 expression may contribute to HDL2 pathogenesis (Seixas et al. 2012). Although the CTGexp mutation is located in the alternatively spliced JPH3 exon 2a, variations in transcription and alternative splicing place the repeat in different transcript regions. JPH3 transcripts that include exon 2a encode a truncated JPH3 isoform, and the CTGexp may be translated into polyleucine and polyalanine tracts in the truncated protein, depending on which exon 2a 3′ splice site is used. A third exon 2a splice variant places the CTGexp in the 3′UTR, suggesting that toxic CTGexp RNAs could be expressed; consistent with this, CTGexp nuclear RNA foci are detectable in HDL2 neurons (Rudnicki et al. 2007). These foci contain not only CTGexp RNA but also MBNL1, the splicing factor implicated in DM, and several MBNL1-dependent missplicing events seen in DM are also observed in HDL2 neurons.

Despite these data, the RNA-mediated model of HDL2 pathogenesis cannot explain the remarkable commonalities between HDL2 and HD. This discrepancy has now been addressed by the discovery of bidirectional transcription through the repeat, which creates an antisense transcript containing a CAGexp that encodes polyGln, as in HD (Wilburn et al. 2011). Early histological analysis of HDL2 patient brains also revealed ubiquitin-positive polyGln nuclear inclusions, reminiscent of those observed in HD, independent of RNA foci (Rudnicki et al. 2008; Walker et al. 2002). Cumulatively, these results indicate that toxic RNA and protein (Fig. 10.2, RNA GOF and protein GOF) may interact synergistically to wreak havoc on neuronal pathways, although polyGln toxicity appears to play the predominant role in HDL2.

4.2 Spinocerebellar Ataxia Types 3, 8 and 10

A similar interplay of protein and RNA toxicity has been observed in spinocerebellar ataxia types 3, 8, and 10 (SCA3, SCA8, SCA10). The SCAs are a large group of inherited neurological diseases in which dysfunction in the cerebellum and brainstem causes motor coordination defects known as ataxias (Hersheson et al. 2012; Orr 2012a). SCAs can result from mutations in 37 genes (SCA1–37), and, intriguingly, several of these mutations are microsatellite repeat expansions, including coding CAGexp mutations in SCA1, 2, 3, 6, 7, and 17 and noncoding CTGexp in SCA8, CAGexp in SCA12, ATTCTexp in SCA10, TGGAAexp in SCA31, and GGCCTGexp in SCA36 (Matilla-Duenas et al. 2012; Serrano-Munuera et al. 2013).

4.2.1 SCA3: Toxic ATXN3 Protein and CAGexp RNA

SCA3 (Machado–Joseph disease, MJD) is characterized by late-onset ataxia and neurodegeneration and is the most prevalent SCA worldwide (Orr 2012a). SCA3 is associated with CAG repeats that expand from the normal (12–37) to a pathogenic (61–84) range, resulting in an extended polyGln tract in the C-terminus of the ataxin-3 (ATXN3) protein. ATXN3 is a deubiquitinating enzyme (DUB) that is involved in protein homeostasis and transcription and may regulate the expression of genes involved in stress response pathways (Orr 2012a). As in other polyGln diseases, SCA3 neurons contain nuclear inclusions, and the extended polyGln tract triggers proteolytic cleavage of the mutant ATXN3 protein. These cleavage products are detectable in patient brains, and fragment accumulation in transgenic mice is neurotoxic (Goti et al. 2004; Haacke et al. 2006; Paulson et al. 1997; Warrick et al. 1998; Wellington et al. 1998) (Fig. 10.2, protein GOF).

While many studies indicate that SCA3 disease is predominantly caused by the mutant ATXN3 protein (Costa Mdo and Paulson 2012; Orr 2012a), SCA3 rCAGexp RNA has also been implicated in SCA3 pathogenesis. As in DM, a toxic RNA-induced protein sequestration hypothesis has been tested in a fly SCA3 model as well as in CAGexp-expressing mouse and nematode models (Hsu et al. 2011; Li et al. 2008; Wang et al. 2011). To evaluate the toxicity of CAGexp independently of polyGln, the CAGexp was interrupted with CAA repeats, which still encode polyGln but do not cause RNA toxicity; this interruption mitigates the neurodegeneration observed in the fly SCA3 model (Li et al. 2008). Furthermore, expression of a CAGexp in the 3′UTR of a reporter gene in human cell lines is sufficient to elicit MBNL1-containing nuclear foci and the MBNL1-dependent splicing changes previously reported in DM (Mykowska et al. 2011). The RNA-binding protein Orb2, also known as cytoplasmic polyadenylation element (CPE)-binding protein 1 (CPEB1), has been implicated in SCA3 because it is a modifier of neurotoxicity in the SCA3 fly model (Shieh and Bonini 2011). Orb2/CPEB1 contains two RRMs and a zinc finger domain, which it uses to bind the CPE of target mRNAs and regulate translation (Richter 2007). The mRNA targets of Orb2/CPEB1 are enriched for functions related to synaptic plasticity and neuronal growth, suggesting that this protein plays a role in learning and memory. Orb2 overexpression in the SCA3 fly model partially suppresses neurotoxicity, hinting that rCAGexp RNAs might interfere with normal Orb2 functions and lead to the loss of translational regulation in SCA3 neurons (Shieh and Bonini 2011). Similarly, Orb2 colocalizes with CGGexp nuclear foci and is a genetic modifier in a fly FXTAS model (Cziko et al. 2009).
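The logic of the CAA-interruption experiment can be made concrete with a short sketch. CAG and CAA are synonymous glutamine codons, so interspersing CAA leaves the encoded polyGln tract unchanged while breaking up the uninterrupted CAG run in the RNA. The alternating CAG/CAA pattern and the helper names below are hypothetical illustrations, not the sequence of the constructs used by Li et al. (2008).

```python
# Only the two glutamine codons needed for this illustration (standard genetic code).
GLN_CODONS = {"CAG": "Q", "CAA": "Q"}


def translate_frame0(seq: str) -> str:
    """Translate seq codon by codon from position 0 (one-letter amino acid codes)."""
    return "".join(GLN_CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def longest_in_frame_run(seq: str, unit: str) -> int:
    """Longest run of consecutive, in-frame copies of `unit`, counted in units."""
    best = run = 0
    for i in range(0, len(seq) - len(unit) + 1, len(unit)):
        run = run + 1 if seq[i:i + len(unit)] == unit else 0
        best = max(best, run)
    return best


# Hypothetical constructs: a pure CAG tract and one in which every other codon is CAA.
pure = "CAG" * 60
interrupted = "".join("CAA" if i % 2 else "CAG" for i in range(60))

print(translate_frame0(pure) == translate_frame0(interrupted))  # True: same polyGln
print(longest_in_frame_run(pure, "CAG"), longest_in_frame_run(interrupted, "CAG"))  # 60 1
```

Both sequences encode the same 60-residue polyGln tract, but only the pure construct contains a long uninterrupted CAG run, which is the RNA feature the sequestration hypothesis implicates.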

4.2.2 SCA8: Bidirectional Transcription and the Discovery of RAN Translation

SCA8 is characterized by ataxia, slurred speech, and abnormal eye movements (nystagmus) and is associated with a CAG•CTG expansion that undergoes bidirectional transcription to produce a CAGexp RNA from ATXN8 and a CUGexp RNA from the ataxin-8 opposite strand (ATXN8OS) gene (Ikeda et al. 2008; Moseley et al. 2006). Unaffected individuals have 16–50 CAG•CTG repeats, whereas patients have expansions of 71–1,300 repeats (Fig. 10.1b). Originally, SCA8 pathogenesis was thought to arise primarily from RNA toxicity due to CUGexp expression from the noncoding ATXN8OS gene (Koob et al. 1999). As predicted, CUGexp RNA foci colocalize with MBNL1 in SCA8 neurons, and MBNL1 overexpression reverses SCA8-induced splicing errors (Chen et al. 2009; Daughters et al. 2009). However, ATXN8 also encodes a short ATXN8 protein that consists primarily of polyGln, and SCA8 neurons contain polyGln- and ubiquitin-positive nuclear inclusions similar to those observed in HD and other polyGln diseases (Moseley et al. 2006) (Fig. 10.3).

Fig. 10.3

Bidirectional transcription across microsatellite expansions may generate both pathogenic RNAs and proteins. Transcription of a CTGexp mutation in the ATXN8OS gene (composed of noncoding exons A–D) produces toxic RNA hairpins (red) that either sequester the MBNL proteins or produce polyLeu (L, pink circles), polyCys (C, turquoise) or polyAla (A, blue) by RAN translation. Transcription of the CAGexp on the opposite strand (ATXN8) results in RAN-generated polyGln (Q, orange), polyAla (A, blue) or polySer (S, green) homopolymer polypeptides or a toxic RNA that might sequester an unknown rCAGexp binding protein (CAGBP). While all six RAN proteins are observed in transfected cells, only polyGln (asterisk), which is initiated with a conventional methionine codon (green) in ATXN8, and polyAla (asterisk) have been detected in vivo

The discovery of a third pathogenic mechanism in SCA8 has profound implications not only for the microsatellite expansion and neurological disease fields but also for our understanding of translational regulatory mechanisms. While investigating the role of polyGln in SCA8, ATXN8 expression constructs were generated that lacked the ATG initiation codon upstream of the CAGexp encoding polyGln (Cleary and Ranum 2013; Zu et al. 2011). Surprisingly, removal of the start codon fails to abolish translation, and polyGln is still expressed. More surprisingly, the CAGexp repeat is translated in all three open reading frames (ORFs), leading to co-expression of polyGln, polySer, and polyAla in the absence of frameshifting. Immunological evidence as well as direct protein sequencing supports the existence of this repeat-associated non-ATG, or RAN, translation. All three RAN proteins translated from CAGexp RNAs (polyGln, polyAla, polySer) accumulate in transfected cells, and polyGln and polyAla proteins have been shown to accumulate in vivo in mouse and human SCA8 cerebellar Purkinje cells (Cleary and Ranum 2013; Zu et al. 2011) (Figs. 10.2 and 10.3).
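Because a pure trinucleotide repeat presents a single codon in each reading frame, the homopolymers expected from RAN translation can be read directly off the repeat, and the opposite strand produced by bidirectional transcription can be obtained by reverse complementation. The sketch below uses DNA letters and only the six codons that occur in a CAG•CTG repeat; it recovers the six homopolymers shown in Fig. 10.3 but models only the coding potential of each frame, not initiation efficiency or which products accumulate in vivo. The function names are illustrative.

```python
# Complement map for DNA bases; reversing the complement gives the opposite strand 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Subset of the standard genetic code: every codon that occurs in a CAG or CTG repeat.
CODONS = {
    "CAG": "Gln", "AGC": "Ser", "GCA": "Ala",  # the three frames of a CAG repeat
    "CTG": "Leu", "TGC": "Cys", "GCT": "Ala",  # the three frames of a CTG repeat
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def frame_homopolymers(repeat: str) -> list[str]:
    """Name the homopolymer encoded in each of the three frames of a pure repeat."""
    products = []
    for frame in range(3):
        codons = {repeat[i:i + 3] for i in range(frame, len(repeat) - 2, 3)}
        if len(codons) != 1:
            raise ValueError("expected a pure repeat (one codon per frame)")
        products.append("poly" + CODONS[codons.pop()])
    return products


cag_strand = "CAG" * 30                      # strand read as CAGexp (ATXN8)
ctg_strand = reverse_complement(cag_strand)  # opposite strand, CTGexp (ATXN8OS)

print(frame_homopolymers(cag_strand))  # ['polyGln', 'polySer', 'polyAla']
print(frame_homopolymers(ctg_strand))  # ['polyLeu', 'polyCys', 'polyAla']
```

Each frame is read independently here, which is the sense of "without frameshifting": no single ribosome has to switch frames to produce all three products from one strand.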

Currently, RAN translation has been implicated in four microsatellite expansion diseases: ALS/FTD, DM1, FXTAS, and SCA8 (Ash et al. 2013; Mori et al. 2013b; Todd et al. 2013; Zu et al. 2011). For FXTAS, considerable evidence supports the hypothesis that the disease is RNA-mediated, but patient neurons also contain large, ubiquitin-positive nuclear inclusions, a hallmark of protein-mediated disease. This paradox led to the recent discovery of RAN translation in FXTAS, with in vitro and in vivo evidence for the accumulation of polyglycine (polyGly)-containing aggregates (Todd et al. 2013). Additional studies show that polyAla can also be expressed from CGGexp constructs in transfected cells. Suppression of polyGly expression partially rescues CGGexp-induced toxicity and loss of cell viability in FXTAS models, and polyGly accumulates in ubiquitinated intranuclear inclusions in patient brains. Thus, some microsatellite expansion diseases previously labeled as either RNA-mediated or protein-mediated involve multiple pathogenic mechanisms and toxic agents (Ling et al. 2013). Clearly, reexamination of patient brain sections from other microsatellite diseases for RAN protein aggregates should now be a priority.

4.2.3 SCA10: AUUCUexp RNA, hnRNP K and Apoptosis

Another RNA-binding protein, hnRNP K (HNRNPK), is implicated in spinocerebellar ataxia type 10 (SCA10). SCA10 is an autosomal dominant disorder characterized by ataxia, seizures, and mild peripheral nerve and cognitive impairment. The disease results from an ATTCT repeat expansion in intron 9 of the ATXN10 gene (Teive et al. 2011). This pentanucleotide repeat is relatively short in the normal population (10–32 repeats), while SCA10 patients have very large expansions of 800–4,500 repeats (Fig. 10.1b). Although the ATTCTexp does not alter ATXN10 expression levels or splicing (Wakamiya et al. 2006), the AUUCUexp RNA accumulates in nuclear and cytoplasmic foci in SCA10 mouse models and patient fibroblasts (White et al. 2010, 2012). The presence of cytoplasmic foci is somewhat surprising, although these structures may be released from the nuclear compartment during mitosis. In vitro, hnRNP K binds to AUUCUexp RNA, and it colocalizes with AUUCUexp-containing foci (White et al. 2010). HnRNP K contains three KH domains that have a high affinity for C-rich clusters, and SCA10 fibroblasts show aberrant splicing of the hnRNP K target β-tropomyosin, suggesting that loss of hnRNP K function contributes to missplicing in SCA10. HnRNP K also acts as a docking protein for multiple factors that modulate chromatin remodeling, transcription, and translation, facilitating cross-talk between multiple gene expression tiers (Bomsztyk et al. 2004). White et al. (2010) further suggest that loss of hnRNP K function triggers apoptosis in AUUCUexp-expressing cells. Sequestration of hnRNP K by AUUCUexp RNA blocks its binding to PKCδ, which could allow PKCδ to promote caspase 3-mediated apoptosis. In accordance with this model, overexpression of hnRNP K suppresses the AUUCUexp-induced apoptosis pathway and partially rescues cell viability. Nevertheless, additional cell and animal models must be developed to substantiate the hnRNP K LOF model for SCA10.

5 An Intrinsic Curse: Microsatellite Expansions in RNA-Binding Proteins

5.1 SCA2 and ATXN2

While some expansion mutations affect RNA-binding protein function indirectly via sequestration by repeat expansion RNAs, other diseases stem from expansion mutations in genes encoding RNA-binding proteins. One of the most common types of ataxia, spinocerebellar ataxia type 2 (SCA2), is caused by a CAGexp in ATXN2, the gene encoding the RNA-binding protein ataxin-2 (Orr 2012a; Rub et al. 2013). SCA2 patients suffer from cerebellar ataxia in addition to decreased reflexes and polyneuropathy, which affects peripheral nerves throughout the body. Cell death has been observed in both SCA2 motor neurons and Purkinje cells. The genetic basis of SCA2 is an expansion from 15–24 to 32–200 CAG repeats in exon 1 of ATXN2, which results in a polyGln expansion in the ATXN2 N-terminus (Fig. 10.1b).

ATXN2 has been implicated in various cellular processes, including Golgi-mediated transport, calcium regulation, formation of stress granules and P-bodies, and regulation of RNA post-transcriptional modification and translation (Magana et al. 2013). Several lines of evidence suggest that ATXN2 participates directly and indirectly in RNA metabolism. The ATXN2 protein contains an Sm-like (Lsm) domain, also found in factors that function in pre-mRNA processing and mRNA decay (Albrecht et al. 2004), as well as a PAM2 domain that mediates ATXN2 binding to PABPC1, the major cytoplasmic poly(A)-binding protein involved in regulating poly(A) tail length, mRNA stability, and translation (Ralser et al. 2005). ATXN2 and PABPC1 colocalize in cells, assemble onto polyribosomes, and are recruited to stress granules, which suggests that they cooperate to sequester mRNAs in stress granules and downregulate their translation during cell stress (Magana et al. 2013; Satterfield and Pallanck 2006).

Another RNA-binding protein, originally identified as a novel protein that interacts with ATXN2, is ataxin-2-binding protein 1 (A2BP1, now known as RBFOX1) (Shibata et al. 2000). RBFOX1 is predominantly expressed in the brain and skeletal muscle and belongs to the RBFOX family of alternative splicing factors (Fogel et al. 2012; Gehman et al. 2011; Underwood et al. 2005). Murine Rbfox1 binds directly to pre-mRNA to activate or repress exon inclusion in a position-dependent manner (Sun et al. 2012). Rbfox1 null mice have widespread splicing abnormalities in the brain, as well as increased neuronal excitability and seizures. A combination of splicing microarrays and binding site analysis using individual-nucleotide resolution cross-linking immunoprecipitation (iCLIP) reveals that Rbfox1 regulates the alternative splicing of transcripts involved in membrane excitation and synaptic transmission (Gehman et al. 2011). Similarly, knockdown of RBFOX1 in human neuronal cultures results in splicing abnormalities in genes involved in neuronal development (Fogel et al. 2012). The binding of RBFOX1 to ATXN2 suggests that the two proteins may interact to regulate alternative splicing in the brain and that RBFOX1 may play a role in SCA2.

Several molecular mechanisms have been proposed for SCA2 pathogenesis, including formation of polyGln aggregates and deleterious gain-of-function of mutant ATXN2. Mutant ATXN2, but not the wild-type protein, has been observed to interact directly with the calcium channel InsP3R1 in neuronal cultures, which may lead to altered calcium signaling and excitotoxic cell death (Liu et al. 2009). Treatment of the cells with an inhibitor of another calcium channel, the ryanodine receptor, reduces cell death, highlighting the calcium regulatory pathway as a potential therapeutic target for SCA2. The expanded polyGln tract in mutant ATXN2 also triggers ATXN2 protein misfolding and polyGln aggregation in ubiquitinated intranuclear and cytoplasmic inclusions in SCA2 neurons (Huynh et al. 2000; Koyano et al. 1999). In addition, mutant ATXN2 undergoes proteolytic cleavage; the C-terminal fragment might have an altered function compared with the holoprotein, while the N-terminal fragment, which contains the polyGln expansion, accumulates in potentially toxic aggregates (Huynh et al. 2000). Interestingly, intermediate-sized ATXN2 expansions are associated with an increased risk of developing amyotrophic lateral sclerosis (ALS) and parkinsonism (Elden et al. 2010; Simon-Sanchez et al. 2005). Supporting a link between ATXN2 and ALS, ATXN2 accumulates in discrete foci, possibly stress granules, in degenerating motor neurons of ALS patients (Li et al. 2013). Furthermore, ATXN2 interacts in an RNA-dependent manner with TDP-43, which plays a key role in ALS pathology, and modifies TDP-43 toxicity in several ALS model systems (Elden et al. 2010). These results underscore the role of ATXN2 in RNA processing and turnover and provide genetic and mechanistic links between ATXN2 and ALS.

5.2 Oculopharyngeal Muscular Dystrophy and PABPN1

Another neuromuscular disorder that arises from a microsatellite expansion in the coding region of an RNA-binding protein is oculopharyngeal muscular dystrophy (OPMD) (Brais 2009; Messaed and Rouleau 2009). OPMD is a late-onset muscular dystrophy characterized by muscle weakness, ptosis (eyelid drooping), and dysphagia (difficulty in swallowing that may lead to aspiration pneumonia). OPMD is caused by a GCNexp in PABPN1, which encodes the major nuclear polyadenylate-binding protein that plays a vital role in pre-mRNA 3′ end processing and poly(A) tail formation (Banerjee et al. 2013).

Initially, OPMD was associated with GCG expansions (Brais et al. 1998), but additional mutations, including point mutations, have since been identified that result in a stretch of GCN codons encoding an expanded polyalanine (polyAla) tract in the N-terminus of PABPN1 (Nakamoto et al. 2002; Robinson et al. 2006). Whereas unaffected individuals typically have 10 GCN repeats, patients have 11–17 repeats, resulting in one to seven additional alanine residues (Fig. 10.1b). As with other neurological disorders, a central mystery is why a mutation in an essential and ubiquitously expressed protein causes a late-onset disease that primarily affects specific tissues, in this case facial, tongue, and extremity muscles. Histological analysis of patient skeletal muscle biopsies reveals hallmarks of muscular dystrophies, such as increased variability in muscle fiber size, centralized myonuclei, and an overall loss of myofibers with excessive fibrous and fatty tissue (Tome et al. 1997). Moreover, OPMD muscle fibers contain intranuclear inclusions of mutant PABPN1 that appear as zones of tubular filaments (Tome et al. 1997; Tome and Fardeau 1980).

Whether these inclusions are themselves toxic or instead play a protective role by confining the mutant protein remains controversial. Several lines of evidence support the latter hypothesis. In an OPMD cell model, the most toxic form of mutant PABPN1 is soluble rather than contained in insoluble inclusions, and disrupting the inclusions increases cell death (Messaed et al. 2007). Experiments with PABPN1 constructs of variable polyAla length show that mutant PABPN1 is most toxic when the expansions are longer, although these expanded proteins do not form inclusions (Klein et al. 2008). In a similar vein, an OPMD fly model recapitulates the muscle phenotype observed in the disease despite lacking inclusions (Chartier et al. 2006). PABPN1 forms oligomeric structures prior to inclusion formation, so soluble and oligomeric forms of mutant PABPN1 may contribute to cellular toxicity (Raz et al. 2011). In addition to these inclusion-independent models of PABPN1 toxicity, several inclusion-dependent models have been proposed. PABPN1 expansion mutations may have a dominant-negative effect, leading to inclusions that, in turn, sequester other cellular factors and disrupt several cellular pathways. While PABPN1 is an integral constituent of the inclusions, other co-localizing components include transcription factors, polyadenylated RNA, RNA-binding proteins (including CELF1), and components of the UPS (Brais 2009; Corbeil-Girard et al. 2005). As noted previously, UPS dysfunction has been implicated in other neurodegenerative diseases (Ciechanover and Brundin 2003; Dennissen et al. 2012), and addition of a proteasome inhibitor in an OPMD cell model exacerbates inclusion formation in a dose-dependent manner, while overexpression of heat shock proteins decreases inclusions (Abu-Baker et al. 2003). Furthermore, transcriptome analysis indicates that genes encoding UPS components are misregulated in OPMD (Anvar et al. 2011).
The specificity of UPS involvement in OPMD is highlighted by the observation that the UPS is consistently misregulated in OPMD while other protein degradation pathways are not. These results have led to a disease model in which an age-related decline in UPS function causes misfolded PABPN1 to accumulate and aggregate into inclusions that, in turn, recruit UPS components, further blocking the UPS in OPMD cells (Raz et al. 2013). In addition to blocking the UPS, recruitment of other cellular factors to aggregates may disrupt additional cellular pathways. Consistent with a toxic role for aggregation, oral treatment with anti-aggregation drugs attenuates pathology in an OPMD mouse model (Davies et al. 2005, 2006).

Inclusion-dependent models of OPMD propose that a reduction of soluble PABPN1 due to aggregation leads to PABPN1 loss-of-function. PABPN1 consists of an N-terminal region containing the polyAla stretch, a central domain with a conserved RNA recognition motif (RRM), and a C-terminal domain that may also be involved in fibril formation (Winter et al. 2012). PABPN1 was originally characterized as a nuclear factor that increases the processivity and efficiency of poly(A) addition by poly(A) polymerase. PABPN1 coats the poly(A) tail, possibly to maintain a poly(A) tail length that, in turn, affects downstream events including nuclear RNA export, mRNA translation, and mRNA stability (Banerjee et al. 2013). PABPN1 also appears to shuttle from the nucleus to the cytoplasm, potentially aiding in nuclear export and translation, where it then cooperates with, or is replaced by, its cytoplasmic partner PABPC1 (Lemay et al. 2010). Knockdown of PABPN1 in primary mouse myoblasts prepared from extraocular, pharyngeal, and limb muscles causes shortening of poly(A) tails and accumulation of nuclear poly(A) RNA, supporting a role for PABPN1 in nuclear export (Apponi et al. 2010). Furthermore, loss of PABPN1 in myoblasts results in decreased proliferation and differentiation, hinting that myogenesis might be compromised in OPMD. Recently, PABPN1 was found to regulate alternative cleavage and polyadenylation (APA) (de Klerk et al. 2012; Jenal et al. 2012). PABPN1 depletion induces selection of proximal cleavage sites and widespread 3′ UTR shortening. Analysis of OPMD patient cells, as well as an OPMD mouse model, also shows 3′ UTR shortening, suggesting that mutant PABPN1 and PABPN1 depletion have equivalent effects that result in APA dysregulation.
These results support a model in which PABPN1 loss-of-function in OPMD leads to widespread molecular abnormalities, including shortening of 3′UTRs, decreased poly(A) tail length, and blockage of poly(A)-mRNA nuclear export, which result in defects in myogenesis and muscle function. In addition to this mRNA-centric view, PABPN1 also promotes the turnover of a class of long noncoding (lnc)RNAs (Beaulieu et al. 2012). Further investigation into the normal functions of PABPN1 is required to clarify how disruption of these processes results in the distinct pathological phenotypes associated with this neuromuscular disease.

6 The Horizon in Microsatellite Expansion Disorders

6.1 C9ORF72 Expansions in ALS/FTD

The microsatellite expansion field has received increased attention with the discovery that a G4C2 exp mutation in the C9ORF72 gene is the most common known cause of familial amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) (DeJesus-Hernandez et al. 2011; Renton et al. 2011). Prior to this discovery, there were indications that these two diseases represent different manifestations of a spectrum disorder with common genetics, pathology, and clinical features. Patients diagnosed with ALS, also known as Lou Gehrig’s disease, suffer from muscle wasting and paralysis resulting from denervation caused by the death of upper and lower motor neurons (Van Damme and Robberecht 2013). In contrast, FTD is a condition characterized by loss of neurons in the frontal and temporal lobes of the brain, which results in behavioral changes, dementia, and speech abnormalities or progressive non-fluent aphasia. However, FTD patients may suffer from ALS-like symptoms, such as motor deficits caused by motor neuron dysfunction (Lomen-Hoerth et al. 2002). Conversely, up to 50 % of ALS patients also develop FTD-like symptoms, including dementia and cognitive changes, related to impairment of frontotemporal functions (Giordana et al. 2011).

A positive family history of disease or related symptoms occurs in approximately 10 % of ALS patients and 25–50 % of FTD patients (Graff-Radford and Woodruff 2007; Gros-Louis et al. 2006; Rohrer et al. 2009). These cases are referred to as familial ALS and FTD, while the remaining cases are sporadic. FTD has been linked to mutations in several genes, including microtubule-associated protein tau (MAPT), progranulin (GRN), valosin-containing protein (VCP), charged multivesicular body protein 2B (CHMP2B), and C9ORF72 (Rademakers et al. 2012). Similarly, ALS-linked mutations occur in genes including superoxide dismutase 1 (SOD1), ataxin 2 (ATXN2), TATA-binding protein associated factor 15 (TAF15), Ewing’s sarcoma breakpoint region 1 (EWSR1), angiogenin (ANG), senataxin (SETX), fused in sarcoma (FUS), TDP-43 (TARDBP), and C9ORF72 (Robberecht and Philips 2013). Many of these genes have been linked to both ALS and FTD but are more commonly associated with one of the two diseases (Al-Chalabi et al. 2012). While the overlapping genetics and clinical features support the view that ALS/FTD is a spectrum disorder, the commonalities between FTD and ALS molecular pathology are even more striking. Most FTD and ALS cases have characteristic neuronal cytoplasmic inclusions that contain certain RNA-binding proteins. For example, mutations in FUS and TDP-43 are less common in FTD than in ALS, yet both ALS and FTD patient neurons often contain cytoplasmic aggregates of FUS or TDP-43 even without a corresponding gene mutation (Kwiatkowski et al. 2009; Neumann et al. 2006, 2009; Vance et al. 2009). These findings suggest that common molecular pathways may be disrupted by a number of different ALS/FTD-linked genetic variants. In addition to potential effects on RNA processing, RNA editing may be disrupted in ALS due to loss of ADAR2, an enzyme responsible for A-to-I RNA editing (Aizawa et al. 2010; Kawahara et al. 2004).

The C9ORF72 expansion mutation is surprisingly common among the genetic variants linked to ALS/FTD and can cause both sporadic and familial forms of ALS and FTD, strongly supporting the disease spectrum hypothesis. Although the distributions of normal and disease-associated G4C2 repeat lengths are still being actively investigated, thus far it appears that unaffected individuals have <30 repeats, typically 2–5, whereas ALS/FTD patients have considerably longer expansions of 700–1,600 repeats (DeJesus-Hernandez et al. 2011; Dobson-Stone et al. 2012; Gijselinck et al. 2012; Majounie et al. 2012; Robberecht and Philips 2013) (Fig. 10.1b). The expansion is located in the first intron of the C9ORF72 gene, but this region is also the promoter of an alternative isoform (Fig. 10.4). The gene encodes three mRNA isoforms and two protein isoforms. One possibility is that the expansion alters promoter activity, resulting in decreased gene expression. In support of this hypothesis, some studies have found that ALS patients with the G4C2 exp mutation have reduced levels of C9ORF72 mRNA (DeJesus-Hernandez et al. 2011; Gijselinck et al. 2012). This observation suggests that the repeat could interfere with expression of the C9ORF72 protein and cause haploinsufficiency; however, no clear reduction in protein levels has been reported in these ALS/FTD patients.
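To make the quoted size ranges concrete, the sketch below counts the longest uninterrupted run of GGGGCC units in a sequence and compares it with the thresholds given above (<30 repeats in unaffected individuals, 700–1,600 in the patients studied). The function names and the three labels are hypothetical, and counts between or beyond these ranges are deliberately left unclassified because the text does not assign them. The sketch also assumes the whole repeat is available as a single sequence string; in practice, expansions of hundreds of repeats are usually sized with methods such as Southern blotting rather than read end to end.

```python
import re

UNIT = "GGGGCC"  # the C9ORF72 hexanucleotide (G4C2) repeat unit


def count_g4c2_units(seq: str) -> int:
    """Longest uninterrupted run of GGGGCC units found in seq (0 if none)."""
    runs = re.finditer(f"(?:{UNIT})+", seq.upper())
    return max((len(m.group(0)) // len(UNIT) for m in runs), default=0)


def classify_g4c2(n_units: int) -> str:
    """Compare a repeat count with the ranges quoted in the text (hypothetical labels)."""
    if n_units < 30:
        return "normal range"
    if 700 <= n_units <= 1600:
        return "reported ALS/FTD range"
    return "not covered by the quoted ranges"


normal_allele = "TTA" + UNIT * 3 + "TTA"
print(count_g4c2_units(normal_allele), classify_g4c2(count_g4c2_units(normal_allele)))
print(classify_g4c2(count_g4c2_units(UNIT * 800)))
print(classify_g4c2(100))
```

Reporting "not covered" for intermediate sizes, rather than forcing them into either group, mirrors the text's caution that the normal and disease-associated distributions are still being defined.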

Fig. 10.4

Toxic RNAs and peptides in C9ORF72 ALS/FTD. The GGGGCCexp mutation (red box) in C9ORF72 intron 1a blocks transcription initiation (arrow) from exon 1b but promotes initiation from exon 1a. RNA processing generates at least two mRNAs and a released intron 1a lariat (or intron 1a might be retained during alternative splicing) that may fold into a G-quadruplex and accumulate in nuclear foci (pink ellipsoid) together with multiple nuclear RNA-binding proteins (RBM45, Purα, SFRS1, hnRNP A2, hnRNP A3). Further repeat expansion may lead to failure of this retention mechanism due to titration of rGGGGCCexp-binding factors, leading to nuclear export of intron 1a-containing mRNA, RAN translation, and RAN protein aggregation.

While the function of the C9ORF72 protein remains obscure, homology searches reveal that it is structurally related to DENN (differentially expressed in normal and neoplasia) domain proteins (Levine et al. 2013). DENN domain proteins are GDP/GTP exchange factors (GEFs) for Rab-GTPases; based on this homology, C9ORF72 might be involved in membrane trafficking regulated by Rab-GTPase switches. C9ORF72 mRNA is expressed in a wide variety of cell types, and the C9ORF72 protein is primarily cytoplasmic in neurons (DeJesus-Hernandez et al. 2011). Several immunohistochemical studies have revealed that the G4C2 exp causes no apparent changes in the cellular distribution of the C9ORF72 protein and no abnormal C9ORF72 aggregates, suggesting that altered function of this protein may not be the primary molecular mechanism in ALS/FTD (Rademakers et al. 2012). Although patients have no C9ORF72 protein aggregates, many studies have revealed that patients with C9ORF72 mutations show the neuropathological hallmark of TDP-43-positive inclusions in the brain and spinal cord, as seen in other ALS/FTD patients. In addition to this characteristic TDP-43 pathology, these patients have a unique feature: neuronal inclusions that are negative for TDP-43 yet positive for UPS-associated proteins such as ubiquitin, ubiquilins, and sequestosome-1 (SQSTM1), also known as ubiquitin-binding protein p62 (p62) (Al-Sarraj et al. 2011). The presence of these UPS-associated proteins suggests aberrant accumulation of unidentified molecules marked for degradation, possibly unique to patients with C9ORF72 mutations.

Several recent reports have shown that RAN translation may be an important pathogenic feature of C9ORF72 ALS/FTD. The G4C2 exp was initially predicted to form hairpins but may also fold into G-quadruplex structures (Ash et al. 2013; Fratta et al. 2012; Reddy et al. 2013) (Fig. 10.4). While G-quadruplexes in 5′UTRs often suppress canonical cap-dependent translation, these structures have also been reported to aid noncanonical IRES-mediated translation initiation (Morris et al. 2010). To determine whether RAN translation initiates from rG4C2 exp RNA, antibodies against the predicted RAN translation products were used to test for these products in C9ORF72 patient tissues (Ash et al. 2013; Mori et al. 2013b). As hypothesized, RAN translation products are detectable from all three ORFs, yielding the dipeptide-repeat (DPR) proteins poly(Gly-Ala), poly(Gly-Pro), and poly(Gly-Arg). These DPR proteins form insoluble nuclear and cytoplasmic inclusions in C9ORF72 patient neurons and are not detectable in healthy controls, in ALS/FTD patients without the G4C2 exp mutation, or in patients with other neurodegenerative diseases. These inclusions are also distinct from TDP-43 inclusions but colocalize with p62-positive inclusions, suggesting UPS involvement in the turnover of RAN translation products (Ash et al. 2013; Mori et al. 2013b). Whether these protein products are neurotoxic and contribute significantly to ALS/FTD pathology remains a critical question.
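The three DPR species follow directly from the codons that a GGGGCC repeat presents in each reading frame. The sketch below translates a pure sense-strand repeat in all three frames using only the five codons that occur in it (standard genetic code) and names the repeating dipeptide in each frame. It says nothing about how RAN initiation occurs or how efficiently each frame is used in vivo, the antisense repeat is not modeled, and the function name is illustrative.

```python
# Subset of the standard genetic code: the only codons present in a GGGGCC repeat.
CODONS = {"GGG": "Gly", "GCC": "Ala", "CCG": "Pro", "GGC": "Gly", "CGG": "Arg"}


def dipeptide_repeats(seq: str) -> list[str]:
    """Name the dipeptide repeat encoded in each of the three frames of a pure repeat."""
    products = []
    for frame in range(3):
        residues = [CODONS[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)]
        first, second = residues[0], residues[1]
        # Check that the frame really alternates between the same two residues.
        if any(res != (first, second)[k % 2] for k, res in enumerate(residues)):
            raise ValueError("frame does not encode a simple dipeptide repeat")
        products.append(f"poly({first}-{second})")
    return products


print(dipeptide_repeats("GGGGCC" * 20))
# ['poly(Gly-Ala)', 'poly(Gly-Pro)', 'poly(Gly-Arg)']
```

Because the repeat unit is six nucleotides long, every frame alternates between two codons, which is why G4C2 RAN products are dipeptide repeats rather than the homopolymers produced from trinucleotide repeats.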

Another molecular mechanism potentially involved in C9ORF72 disease pathogenesis is rG4C2exp RNA toxicity, since these mutant RNAs accumulate in nuclear foci in ALS/FTD tissues and patient-derived iPS cells (Almeida et al. 2013; DeJesus-Hernandez et al. 2011) (Fig. 10.4). While several reports have indicated that total C9ORF72 mRNA levels are reduced approximately twofold in patient lymphoblasts (DeJesus-Hernandez et al. 2011), frontal lobe (Gijselinck et al. 2012), and cerebellum (Mori et al. 2013b), levels of both sense and antisense intron 1a-containing RNAs increase seven- to eightfold (Mori et al. 2013b). The latter result suggests that C9ORF72 splicing is impaired and/or that unspliced or partially spliced pre-mRNAs accumulate in RNA foci in C9ORF72 ALS/FTD, similar to the accumulation of CNBP intron 1 CCUGexp RNAs in DM2 cells (Margolis et al. 2006). Several candidate factors sequestered by rG4C2exp RNAs have been proposed, including hnRNP A2/B1 (DeJesus-Hernandez et al. 2011), RBM45 (Collins et al. 2012), hnRNP A3 (Mori et al. 2013a), SRSF1 (ASF/SF2) (Reddy et al. 2013), and Purα (Xu et al. 2013). The next critical step in validating these putative sequestered factors is to demonstrate that they bind rG4C2exp RNAs in patient cells and affected tissues. Moreover, appropriate loss-of-function mammalian models that recapitulate distinct ALS/FTD phenotypes must be developed.

6.2 Conclusion and Perspective

While the functions of microsatellites within the normal repeat range remain obscure, several common themes have emerged from studies on microsatellite expansions and disease. First, expansions occur frequently, but not exclusively, in GC-rich microsatellites because of their inherent tendency to form imperfect hairpins, quadruplex-like and slipped-stranded structures, setting the stage for error-prone DNA replication, recombination and repair. The mechanisms behind expansions that are not GC-rich, such as the SCA10 ATTCTexp and the GAAexp in Friedreich’s ataxia (FRDA), a recessively inherited neurological disorder caused by frataxin loss-of-function, are less well studied. However, abnormal DNA structures may also be involved in these diseases, with potential triplexes in FRDA and replication-associated template switching in SCA10 (Cherng et al. 2011; Lopez Castel et al. 2010; Mirkin 2007). Second, microsatellite expansions preferentially cause neurological and neuromuscular diseases, even when the affected gene is expressed ubiquitously, suggesting that different cell types differ in their sensitivity to these mutations. For example, why does a small increase in a polyalanine stretch in the N-terminal region of the PABPN1 protein cause a late-onset disease characterized by eyelid drooping, swallowing difficulty, and proximal limb weakness? Third, studies on expansion disorders have revealed novel disease mechanisms, including RNA toxicity and RAN translation. RAN translation, the production of unusual reiterated (homopolymeric and heteropolymeric) proteins from repetitive RNA templates, is reminiscent of the cell-free experiments that used synthetic RNA polynucleotides to decipher the genetic code (Nirenberg 2004). Notably, RAN translation of trinucleotide repeat expansions produces homopolymeric proteins in all three frames without frameshifting (Zu et al. 2011). Fourth, bidirectional transcription of repeat expansions generates multiple potentially toxic RNAs as well as conventional mutant proteins and RAN proteins. Delineating the relative toxicities of each of these pathogenic entities will be a difficult but essential step towards the development of effective new therapies for these diseases.
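The difference between homopolymeric products from trinucleotide repeats and dipeptide products from the C9ORF72 hexanucleotide repeat follows from simple arithmetic: a pure repeat with unit length L, read three nucleotides at a time, encodes a peptide that repeats every lcm(L, 3)/3 = L/gcd(L, 3) residues. The short Python sketch below illustrates this for a CAG repeat. It is sequence arithmetic only, not a model of RAN translation. It uses a three-entry excerpt of the standard genetic code covering only the codons that occur in a CAG repeat, and the helper names `peptide_period` and `translate` are hypothetical.

```python
from math import gcd

# Excerpt of the standard genetic code: only the three codons present in a CAG repeat.
CAG_REPEAT_CODONS = {"CAG": "Q", "AGC": "S", "GCA": "A"}


def peptide_period(unit_length: int) -> int:
    """Residues after which the peptide encoded by a pure repeat of this unit length repeats."""
    return unit_length // gcd(unit_length, 3)


def translate(rna: str, frame: int, table: dict) -> str:
    """Read rna in the given frame with a codon table, dropping a trailing partial codon."""
    return "".join(table[rna[p:p + 3]] for p in range(frame, len(rna) - 2, 3))


if __name__ == "__main__":
    cag = "CAG" * 10
    for frame in range(3):
        print(f"frame {frame}: {translate(cag, frame, CAG_REPEAT_CODONS)[:6]}...")
    for name, unit in [("CAG", 3), ("G4C2", 6)]:
        print(name, "peptide period (residues):", peptide_period(unit))
```

The three frames give `QQQQQQ...`, `SSSSSS...` and `AAAAAA...` (polyGln, polySer, polyAla), and the periods are 1 for CAG (homopolymer) and 2 for G4C2 (dipeptide). The same formula applies to any unit length, though it says nothing about whether a given repeat is actually RAN translated.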

Many questions and experimental challenges remain in the unstable microsatellite field. How many additional familial and “sporadic” diseases are caused by tandem repeat expansion mutations? Why are so many of these disorders characterized by a late-onset clinical profile, and why are the muscle and nervous systems particularly vulnerable? Why are alterations in RNA-binding protein functions and aggregation states so prominent in this group of diseases? Are RNA foci and protein inclusions pathogenic, protective, or innocent bystanders? Technological and computational advances in RNA and protein analyses, and the development of more informative cell and animal disease models, should provide additional experimental surprises and mechanistic insights into how microsatellite expansions perturb normal cellular pathways.