1 A Centromere Refresher

1.1 Why Centromeres?

Centromeres across eukaryotic lineages range from a relatively small “point” within a chromosome to sprawling and complex structures that vary in size from tens of kilobases (KB) to tens of megabases (MB) (Pluta et al. 1995; Choo 1997, and reviewed in Bayes and Malik 2008; Brown and O’Neill 2014). While the location of the centromere as a single constriction on a chromosome is found broadly across the major clades of eukaryotes, some eukaryotic species do not harbor a distinct centromere; rather, there are multiple nucleating sites across chromosome arms that act as centromeres (termed holocentricity) (reviewed in Malik and Henikoff 2009). For example, chromosome segregation in Caenorhabditis elegans (and other nematodes) and some insect and plant species is mediated by sites along the entire chromosome.

The diversity in the complexity, density, and distribution of centromere forms across species lies in contrast to the uniform requisite function for the centromere: to serve as the site of kinetochore assembly and spindle attachment during meiosis and mitosis. In essence, the proper functioning of centromeres is a requirement for faithful segregation of a chromosome complement. Any failure in this function has catastrophic consequences for the cell, such as chromosome breakage and/or loss and cellular breakdown (reviewed in Holland and Cleveland 2009), and consequently has devastating consequences for the organism, such as infertility, loss of cell cycle control, and aberrant proliferation.

1.2 Why NOT Centromeres?

Despite the deep phylogenetic conservation of centromere function (to mediate kinetochore formation and spindle attachment), the diversity of centromere forms across species has presented a unique challenge in understanding the components that delineate centromere functionality as well as defining the minimal required elements for centromere integrity. For example, the “point centromeres” of the budding yeast, Saccharomyces cerevisiae (Fishel et al. 1988), consist of a 125-bp nucleotide sequence that supports centromere function (Meluh et al. 1998) without the requirement for any other complex repeat structures. The centromeres of the filamentous fungus Neurospora are 175–300 KB and harbor AT-rich, degenerate transposons (reviewed in Smith et al. 2012) whose sequences have been ravaged by a genome defense mechanism known as RIP (‘repeat induced point mutation’) (Smith et al. 2011). Many plants, including maize and grasses, carry satellites and transposons throughout their regional centromeres (Neumann et al. 2011; Gent and Dawe 2012). As in fungi, there does not appear to be any pattern to the repeat structure that defines the functional centromere core in most plants.

The diversity of centromere forms is not restricted to species-specific genomic arrangements, as several species (e.g., orangutan, horse, chicken) carry a chromosome complement wherein some centromeres are characterized largely by repetitive DNA (satellites and transposable elements) while others are seemingly devoid of a highly repetitive structure (Piras et al. 2010; Shang et al. 2010; Locke et al. 2011). Thus, attempts to reconcile these diverse centromere forms with a generalizable model for centromere function across traditional eukaryotic model organisms (yeast, human, mouse, Arabidopsis, maize) have been largely unsuccessful, as no simple rule appears to apply to even the majority of centromere types. Large-scale genome sequencing projects for several model species initially showed promise in capturing the DNA landscape of regional centromeres in species with diverse karyotypes [e.g., human (Schueler et al. 2001), Arabidopsis (Copenhaver et al. 1999; Kumekawa et al. 2000; Hosouchi et al. 2002), rice (Yan and Jiang 2007), wallaby (Renfree et al. 2011), and gibbon (Carbone et al. 2014)]. However, the highly repetitive nature of such centromeres, composed of expansive arrays of simple satellites [ranging in size anywhere from only 0.2 KB to more than 28 MB (Melters et al. 2013)] and other highly repeated sequences, such as transposable elements, has remained a hurdle in defining genomic maps for complex, regional centromeres. As a consequence, complex eukaryotic centromeres have, to date, remained on the “black list” (Miga et al. 2015) of regions refractory to mapping and assembly techniques (Altemose et al. 2014).

Emerging sequencing techniques that afford long-range sequence information (e.g., long-read sequencers capable of sequencing >100 KB of contiguous DNA, such as PacBio and Oxford Nanopore, and synthetic long-read sequencers such as 10X Genomics) offer the potential to overcome the technical challenges of dealing with highly repeated regions of genomes. However, the overall scale of the total repeat regions that encompass the functional centromeres within model systems that are subject to genome sequencing efforts is orders of magnitude greater than the long-read sequencing capabilities and has left centromere regions in genome assemblies without the foundation of a linear genetic map in most cases, particularly human and mouse. Confounding this sequencing challenge is the sheer number of centromeres that must be tackled for assembly within any given genome (one per chromosome in a diploid cell), as each centromere contains a unique genomic sequence structure.

2 Centromeric DNA: A Descriptor or Determinant?

Studies aimed at identifying the primary sequence associated with functional centromeric chromatin have revealed a lack of conservation of centromeric sequences, even among closely related species. Thus, the genomic component of eukaryotic centromeres is relatively rapidly evolving despite its conserved role in chromosome segregation (Henikoff et al. 2001). A remarkable computational effort has led to the production of graphical models of human centromere sequences (Miga et al. 2014; Miga 2015; Rosenbloom et al. 2015), bypassing the need for strict linear assembly in the assessment of nascent genetic content. These “maps” do not delineate the order of sequences within any given centromere, yet reveal the diversity of satellites within and among centromeres, supporting earlier work demonstrating that while satellite higher order repeats (HORs) are homogenized through processes such as molecular drive and concerted evolution (Dover et al. 1982), some satellites are in fact distinct amongst different chromosomes (for an example, see Miga et al. 2014).

Defying another common misconception that each chromosome has only one location that can serve as a functional centromere, several human chromosomes have multiple HORs that act as functional centromeric epialleles (Maloney et al. 2012). Within any given chromosome, only one of these epialleles functions as the active centromere, raising the possibility of heterozygotes for different epialleles on the same chromosome pair. As the quality of sequencing and gap-filling for the human genome has increased, novel annotation workflows have also uncovered retroelements scattered throughout active centromere regions across all human chromosomes, within HORs and between epialleles (Miga et al. 2014; Rosenbloom et al. 2015). Indeed, co-option of repetitive elements, including tandem duplications, may be a general aspect of centromere ontogenesis across eukaryotes (Dawe 2003; Wong and Choo 2004; Chueh et al. 2009; O’Neill and Carone 2009; Brown and O’Neill 2010).

Most multicellular eukaryotic centromeres harbor a similar, characteristic repeat structure highly enriched in species-specific satellites (e.g., α satellites in human and minor satellites (miSAT) in mouse). The functional impact of these satellites with respect to kinetochore assembly remains less clear, however, based on multiple lines of evidence. Several studies highlight that centromeric satellites are not sufficient to form kinetochores. Introducing an array of satellites into a cell is not, in all cases, sufficient on its own to form a stable artificial chromosome (Nakano et al. 2003). In fact, dicentric chromosomes often retain a satellite array that no longer forms functional centromeric chromatin (Warburton et al. 1997). Thus, the presence of satellite DNA alone is not the primary determinant for recruiting centromeric histones. As both ectopic centromeres in abnormal chromosomes (e.g., mini- and marker chromosomes, B chromosomes, neocentromeres) and newly formed centromeres that have only recently become fixed within a species (e.g., evolutionary new centromeres, ENC) are often devoid of satellite DNA, the absence of satellite DNA suggests such repeated DNA is also not required for centromere formation (Lo et al. 2001a; Alonso et al. 2007; Hasson et al. 2013).

While the canonical structure of species-specific satellites, and higher order arrays of groups of satellites, is neither sufficient nor required to facilitate centromere assembly, it is a pervasive feature among eukaryotic centromeres (Brown and O’Neill 2014; Plohl et al. 2014). The fact that centromeres can form and act on genomic regions devoid of satellite DNA has lent support to the notion that centromere identity is likely under epigenetic control (Karpen and Allshire 1997; Henikoff et al. 2001). Nevertheless, the contributions such types of genomic sequence make to defining the functional capacity of centromeric chromatin assembly and the evolutionary stability of centromeres cannot be discounted. As exemplified in studies of human neocentromeres, DNA satellites alone are not required to attract centromere proteins to ectopic centromeres (e.g., Lo et al. 2001b). In such cases, another type of repeat found in most complex centromeres, the retrotransposon, is found to bind the defining centromeric histone, CENP-A, and define the functional centromere (Chueh et al. 2009). These selfish entities may be the progenitors of satellite arrays (e.g., Macas et al. 2009) that experience accretion and diminution as either monomers or large homogenous arrays following centromere stabilization and fixation in a population. Just as the acquisition of repeat expansions may be linked to the ontogeny of a fixed, stable centromere within a species, the primary establishment of a new centromere may be the result of a seeding event from retroelement(s) that progressively generate novel satellites (Dawe 2003; O’Neill et al. 2004; Brown and O’Neill 2010).

Despite the challenges in delineating a finite sequence demarcating centromere functionality across species, the protein cascade that leads to faithful centromere assembly each cell cycle is more clearly defined. The pivotal event is the loading of the centromere-specific H3 variant, CENP-A (Fig. 1a), which occurs in late telophase/early G1 in most organisms (Dunleavy et al. 2009) [n.b. in S. pombe, this occurs in S phase (Dunleavy et al. 2007)]. During replication in S phase, the levels of CENP-A are diluted by half as H3.3 is assembled into centromeric chromatin as a placeholder (Dunleavy et al. 2011). In human, HJURP (Holliday junction recognition protein) associates with CENP-A in pre-nucleosomal complexes (Mellone et al. 2009; Foltz et al. 2009; Dunleavy et al. 2009) and chaperones newly synthesized CENP-A to centromeric chromatin following mitosis (telophase/early G1) (Foltz et al. 2009; Dunleavy et al. 2009) when CENP-A loading occurs (Jansen et al. 2007). After mitosis, new CENP-A loading is also facilitated by a priming mechanism involving protein complexes such as hMis18 (Fujita et al. 2007) that prepares the centromeric nucleosome for CENP-A loading (Mellone et al. 2009; Dunleavy et al. 2009). These proteins serve as the pinnacle of the DNA-chromatin interface, yet many other proteins are involved in the coordinated assembly of the kinetochore (described in Parts I and IV of this book).
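The dilution-and-reloading cycle described above can be illustrated with a toy calculation. The sketch below is not a published model: the function name, the `refill` parameter, and the assumption that G1 loading restores a fixed fraction of the missing CENP-A are all hypothetical simplifications, used only to show why loading must recur every cell cycle.

```python
def cenpa_levels(cycles, refill=1.0):
    """Track relative centromeric CENP-A over successive cell cycles.

    Assumptions (illustrative only): S phase halves the CENP-A level,
    and telophase/early G1 loading restores a fraction `refill`
    (0.0 to 1.0) of the gap back to the full level of 1.0.
    """
    level = 1.0
    history = []
    for _ in range(cycles):
        level *= 0.5                       # S phase: diluted by half
        level += refill * (1.0 - level)    # G1: loading of new CENP-A
        history.append(level)
    return history


# With complete reloading the level is restored every cycle;
# without any reloading it halves with each division.
with_loading = cenpa_levels(3, refill=1.0)
without_loading = cenpa_levels(3, refill=0.0)
```

Under these assumptions, a complete block of loading leaves one eighth of the starting CENP-A after three divisions, whereas full loading maintains a constant level.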

Fig. 1

RNA binding domains of CENP-A and CENP-C. a Linear depiction of the complete CENP-A protein domain structure (Regnier et al. 2003) with amino acids that comprise each domain shown underneath. Amino acids in green were computationally predicted to have an RNA binding capability by Quenet and Dalal (2014). Most of the potential RNA interaction capability lies in the N-terminal tail region [a Alpha helix, L Loop region, CATD Centromere targeting domain (Black et al. 2004), CCBD CENP-C binding domain (Carroll et al. 2010)]. b Linear depiction of the CENP-C protein. The RNA binding domain experimentally validated by Wong et al. (2007) is located between amino acids 422 and 551, the sequence of which is shown below. Amino acids in green are most critical to RNA binding (Wong et al. 2007). The RNA binding domain of CENP-C overlaps with both the DNA binding domain (aqua) and the CATD (gray). Note that Wong et al. (2007) also found evidence for a second RNA binding domain between 552 and 943, but did not isolate the exact region. [DNA binding domain (Yang et al. 1996; Sugimoto et al. 1997; Cohen et al. 2008; Schueler et al. 2010), CATD (Yang et al. 1996), Dimerization domain (Sugimoto et al. 1997), CABD CENP-A binding domain (Trazzi et al. 2009)]

3 Active Transcription at Centromeres—Breaking Down Common Myths and Legends

Challenging another classical description of a eukaryotic centromere as a heterochromatin-rich and transcriptionally inactive region, centromeres are in fact characterized by a complex suite of different chromatin marks supporting active transcription and the production of centromeric noncoding RNAs required for proper centromere formation and function (Wong et al. 2007; Carone et al. 2009, 2013; Ting et al. 2011; Hall et al. 2012; Quenet and Dalal 2014). The chromatin encompassing the centromere core, referred to as “centrochromatin”, is distinct from that of pericentromeres and contains histone modifications associated with transcriptionally active chromatin (Sullivan and Karpen 2004; Eymery et al. 2009; Gopalakrishnan et al. 2009; Bergmann et al. 2011, 2012). CENP-A nucleosomes within centrochromatin are interspersed with histone H3 nucleosomes carrying mono- and dimethylation of lysine 4 and di- and trimethylation of lysine 36 (H3K4me1, H3K4me2, H3K36me2, and H3K36me3). These modified histones are not only permissive to transcription, but differentiate centrochromatin from its neighboring pericentromere, a region that, while also characterized by a high density of repeats, is defined by histone modifications typically associated with transcriptional silencing [(Gopalakrishnan et al. 2009; Roadmap Epigenomics et al. 2015): di- and trimethylation of lysine residues 9 and 27 of histone H3 (H3K9me2, H3K9me3, H3K27me2, and H3K27me3)].

Despite the remarkably different chromatin environments that define the pericentromere and centromere, active transcription has been detected from both regions in many organisms (Carone et al. 2009; Ugarkovic 2005; Eymery et al. 2009; Brown et al. 2012; Gent and Dawe 2012; Hall et al. 2012; Biscotti et al. 2015; Koo et al. 2016; Rosic and Erhardt 2016). Moreover, a clear balance in transcriptional output from each region is required to maintain chromosome stability (Hall et al. 2012). The types of sequences found to produce transcripts within centromeres include the same sequences represented in the genomic foundation of a centromere: satellites, retroelements, and in some cases active genes located within the boundaries of centrochromatin (e.g., Nagaki et al. 2004).

While prevalent in complex eukaryotic centromeres, the importance of these retroelement and satellite-derived transcripts to centromere function is only recently becoming apparent; chromosome missegregation has been associated with aberrant centromere transcription in animals and satellite RNA has been implicated in the assembly of centromere components CENP-A and -C, in Drosophila, plants, mouse and human (Mejía 2002; Bergmann et al. 2011; Ting et al. 2011; Carone et al. 2013; Quenet and Dalal 2014; Leung et al. 2015).

4 Genome Engineering to Tease Apart the Transcriptional Framework of the Centromere

Advances in techniques that allow manipulation of DNA and its nascent chromatin have been used synergistically to create and modify artificial centromere constructs within living cells. For example, alpha satellite arrays from human have been isolated and, when placed in HT1080 cells, form functional human artificial chromosomes (HACs) (Harrington et al. 1997; Ikeno et al. 1998; Grimes and Monaco 2005; Lam et al. 2006; Maloney et al. 2012). Focused studies of HACs have shown that active transcription at the centromere is essential to their stable propagation (Okamoto et al. 2007; Nakano et al. 2008; Bergmann et al. 2011, 2012; Molina et al. 2016). DNA constructs that form stable HACs incorporate selectable marker genes (i.e., neo and bsr) under strong, constitutive promoters juxtaposed to the alphoid arrays. The ability of the resulting HAC to assemble a functional kinetochore and survive cell division was found to be reliant not only on the overall number of satellites but also on the transcriptional activity of these marker genes (Okamoto et al. 2007).

Human artificial chromosomes modified to carry tetO transcriptional regulatory sequences within alphoid arrays were manipulated to increase or decrease transcriptional output in attempts to define the activity for proper centromere function (Nakano et al. 2008). Switching off transcription from the tetO dramatically diminished propagation of the HACs, but upregulating transcription with tet activators had a similar effect, indicating a balanced level of transcription is a requisite for proper centromere function. Further modifications of HACs by tethering a lysine-specific demethylase (LSD1) to alphoid arrays showed that depletion of H3K4me2 from HAC centromeric chromatin results in a loss of satellite transcription and concomitant reduction in local assembly of newly synthesized CENP-A (Bergmann et al. 2011). HACs targeted to increase H3K9 acetylation, a mark permissive to transcription, showed no effect on kinetochore formation despite such a dramatic change in chromatin state. However, when this chromatin change is coupled with a dramatic increase in transcription, rapid centromere inactivation through loss of CENP-A loading results (Bergmann et al. 2012).

Recently, a study using an inducible ectopic centromere system in Drosophila showed that CENP-A assembly by the Drosophila CENP-A chaperone, CAL1, requires RNA pol II mediated transcription of nascent DNA (Chen et al. 2015). In this ectopic centromere system, transcription is mediated by CAL1’s binding partner, the chromatin remodeling complex FACT (facilitates chromatin transcription) and targets an artificial array of lacO sequences, indicating that the passage of RNA polymerase is required for CENP-A chromatin establishment rather than sequence-specific transcripts (Chen et al. 2015). A study of the primary centromere core sequence in S. pombe necessary and sufficient for CENP-A assembly was conducted wherein the core sequence was shuffled to create a de novo sequence with the same AT content and nucleosome positioning (Catania et al. 2015). This new construct was not able to effectively establish CENP-A chromatin, indicating some sequence features are required for centromere integrity. Notably, the core sequence is actively transcribed via multiple putative transcription start sites, implicating its ability to facilitate transcription (albeit stalled transcription, see below) as a defining feature of this centromere-competent sequence (Catania et al. 2015). As demonstrated by these studies, centromere integrity requires tight control of centromere transcription, suggesting that centromeric DNA sequence identity may not be an absolute requirement, but the ability to facilitate transcription and act as a fundamentally stable and immutable regulatory element(s) is needed.
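The logic of a sequence-shuffling control can be sketched in a few lines. The example below is only a simplified illustration, not the procedure used by Catania et al. (2015): it permutes the bases of a sequence, which preserves base composition (and therefore AT content) but makes no attempt to preserve nucleosome positioning. The function names and the short example sequence are hypothetical.

```python
import random


def at_content(seq):
    """Return the fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def shuffle_preserving_composition(seq, seed=0):
    """Permute the bases of `seq`; composition is unchanged by construction.

    A seeded generator is used so the shuffle is reproducible.
    """
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


# Hypothetical AT-rich placeholder, not the S. pombe centromere core.
core = "ATTTAAGCATATTAACGTTATAATTTAGCA"
shuffled = shuffle_preserving_composition(core)

# The shuffled sequence has the same length and AT content as the
# original, but its linear order (and any motif it contained) is lost.
same_at = at_content(core) == at_content(shuffled)
```

Because only the order of bases changes, any loss of centromere competence in such a control points to sequence features beyond overall base composition, which is the inference drawn in the text above.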

5 The “How?” of Centromeric RNA: Centromere Transcripts and Protein Interactions

Centromeric RNAs have been hypothesized to perform diverse functions, including establishing and maintaining pericentromeric heterochromatin and recruiting kinetochore proteins to the centrochromatin core. Recent studies have focused on the transcription of the most prevalent centromeric sequence, satellites, with respect to centromere function; however, the identity and functional roles of satellite transcripts in diverse organisms have not been fully elucidated. Several recent studies highlighted below further support the growing evidence that transcription is an integral part of the centromere chromatin assembly cascade; how, when, and what types of transcripts impact centromere assembly are emerging areas of focus in the centromere biology field.

As the centromere is a tightly regulated network of protein and nucleic acid interactions, noncoding transcripts may only directly interact with a subset of this multi-protein network and yet, indirectly impact the function of many centromere and kinetochore proteins when these transcripts are mis-regulated. CENP-C, CENP-A, HJURP, and certain members of the chromosomal passenger complex (CPC) have been implicated as the centromere proteins that directly associate with, or bind to, noncoding RNAs (Wong et al. 2007; Ferri et al. 2009; Du et al. 2010; Carone et al. 2013; Quenet and Dalal 2014; Rosic et al. 2014; Blower 2016).

CENP-A: The first indication that CENP-A can interact with noncoding RNA was discovered in a human neocentromere; LINE-1 elements within the CENP-A binding region of a neocentromere on 10q25 are actively transcribed into a noncoding RNA that is incorporated into CENP-A chromatin (Chueh et al. 2009). While less evidence exists for a direct association of centromeric RNA and CENP-A or HJURP, aberrant expression of these transcripts distinctly perturbs CENP-A localization and loading (Fig. 2). For example, overexpression of noncoding RNA from the centromeric retrotransposon KERV in tammar wallaby disrupts proper CENP-A loading into centromeres in late telophase (Carone et al. 2013). A recent study in human showed a more direct contact between CENP-A and RNA when an alpha satellite-related RNA sequence was pulled down with the soluble CENP-A/HJURP complex using RNA immunoprecipitation (RIP) (Quenet and Dalal 2014). While this specific noncoding RNA sequence does not match the alpha satellite consensus, any other alpha satellite higher order repeat sequence, or any known repeated element in the assembled or unassembled contigs of the human genome, DNA FISH showed it may reside in only a subset of chromosomes in the human karyotype (Quenet and Dalal 2014). Quenet and Dalal (2014) complemented their study with an in silico prediction of potential RNA binding sites in CENP-A and HJURP, finding that 79 out of 140 residues in CENP-A and 286 out of 748 residues in HJURP had a capacity for RNA binding. Intriguingly, the entirety of the CENP-A N-terminal tail was predicted to carry RNA-binding capacity (Quenet and Dalal 2014) (Fig. 1a). CENP-A’s N-terminal tail is the most rapidly evolving portion of CENP-A (Henikoff et al. 2001; Malik and Henikoff 2001), and while it is known to be required for CENP-A stabilization at the centromere (Logsdon et al. 2015), its exact function remains elusive. Given a putative role in RNA interaction, it is possible that the vast differences in amino acid sequence and overall length of the N-terminal region among species (Henikoff et al. 2001) enable permissive interaction with a variety of transcripts that emanate from the rapidly evolving, underlying DNA.
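To put the predicted residue counts in proportion, the short calculation below converts them to fractions of each protein. The counts are those reported above from Quenet and Dalal (2014); the dictionary layout and function name are merely illustrative.

```python
# (predicted RNA-binding residues, total residues) as cited in the text
predicted = {
    "CENP-A": (79, 140),
    "HJURP": (286, 748),
}


def rna_binding_fraction(binding, total):
    """Fraction of residues predicted to have RNA-binding capacity."""
    return binding / total


fractions = {
    name: rna_binding_fraction(binding, total)
    for name, (binding, total) in predicted.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of residues predicted to bind RNA")
```

By this measure roughly 56% of CENP-A residues and 38% of HJURP residues fall within predicted RNA-binding sites.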

Fig. 2

Cell cycle variation of centromere transcription. Overview of when in the cell cycle centromeric transcripts have been identified in different model species in relation to critical assembly events defining centromere integrity. The cell cycle is indicated (color). Top The type of transcript for each species [from top, human (Wong et al. 2007; Chueh et al. 2009; Quenet and Dalal 2014), wallaby (Carone et al. 2013), plants (Topp et al. 2004; Koo et al. 2016), and reviewed in Gent and Dawe (2012), Drosophila (Rosic et al. 2014), frog (Blower 2016), mouse (Lu and Gilbert 2007; Ferri et al. 2009), yeast (Chen et al. 2008; Choi et al. 2011; Catania et al. 2015)] is indicated. Below the dashed line are the transcripts where the timing of transcription is known. A thinner bar represents a lower level of transcription while a thicker bar represents a higher level. Gray lines (above the dashed line) indicate that while transcripts have been identified, it is not known when in the cell cycle transcription is initiated. Above each line are the protein associations known for transcripts. Bottom The timing of protein cascade components relative to the cell cycle. Black bar indicates constitutive association with the centromere (FACT, RNA Pol II). Relevant timing of loading of the H3.3 placeholder, CPC recruitment and CENP-A loading components are indicated. Specific CENP-A assembly times are indicated for each group of species

CENP-C: CENP-C contains an experimentally validated, distinct RNA binding domain (Wong et al. 2007; Du et al. 2010) (Fig. 1b). Interestingly, the RNA binding domain of CENP-C shares homology to the RNA binding hinge domain region of the pericentromeric heterochromatin proteins HP1 alpha, beta, and gamma (Muchardt et al. 2002; Du et al. 2010). In human, CENP-C associates with single-stranded (ss) alpha satellite transcripts both in vitro and in vivo, and is lost from centromeres upon alpha satellite depletion along with the CPC proteins, INCENP and Survivin (Wong et al. 2007). DNA binding of maize CENP-C is stabilized by a ssRNA in vitro, although this stabilization appears to be independent of the ssRNA sequence (Du et al. 2010) (Fig. 2). This permissive binding in maize is in contrast to human CENP-C that showed a preferential association with alpha satellite ssRNA in competition assays with tRNA, rRNA, and mouse pericentric satellite (Wong et al. 2007). In Drosophila, the X chromosome specific satellite, SAT III, is actively transcribed into long noncoding transcripts that localize to centromeres and associate with CENP-C (Rosic et al. 2014) (Fig. 2). Upon CENP-C depletion, SAT III RNA signal is greatly reduced at centromeres, implying a similar interaction between CENP-C and RNA in Drosophila, as in human and maize. When depleting SAT III RNA levels, both newly synthesized CENP-C and CENP-A showed a reduction in centromeric signal that was also observed to cascade up through the kinetochore proteins (Rosic et al. 2014). SAT III-depleted cells also suffered errors in mitosis, including lagging chromosomes and micronuclei formation; notably, all chromosomes were susceptible to mitotic defects, indicating that SAT III RNAs, despite originating from the X chromosome, can act in trans to target the autosomes (Rosic et al. 2014). Lagging chromosomes with reduced CENP-C signal were also prevalent in human cells after RNA pol II inhibition (Chan et al. 2012).

The CPC: Ostensibly, a single centromeric noncoding transcript does not have to bind to just a single protein. When considering the fact that CENP-A assembly is facilitated by a chaperone, HJURP, and that CENP-A and CENP-C are both required for centromere integrity, it is probable that these transcripts contact multiple proteins that are associated with one another. Such RNA interactions may even serve to tether protein complexes together, or to scaffold these complexes to other components of the surrounding chromatin environment. The multi-protein interaction between Aurora-B, Dasra-A/Borealin, Survivin, and INCENP composes the CPC. The CPC aids in mitosis as a phosphorylating agent at chromosome arms, the inner centromere, and mitotic spindles (reviewed in Carmena et al. 2012). While at the inner centromere, the CPC plays a key role in bipolar spindle attachment by acting as a “sensor” of connection between the centromere and the spindle (Lampson and Cheeseman 2011). Both cen-RNAs and spindle-enriched RNAs are known to congregate with the CPC (Ferri et al. 2009; Ideue et al. 2014; Jambhekar et al. 2014) (Fig. 2). In fact, there is direct binding of RNA to the CPC, and this interaction is responsible for inner centromere localization (Jambhekar et al. 2014; Blower 2016) and is required for CPC activation (Blower 2016). Despite the fact that multiple proteins form the CPC, Ferri et al. (2009) showed that miSAT transcripts in mouse are a key partner with CPC proteins Aurora-B, INCENP, and Survivin at the onset of mitosis. In Xenopus extracts, however, RNA binding was identified that is required for CPC localization, but this binding is specific only to the proteins Aurora-B and Dasra-A, and not INCENP, Survivin, and XMAP215 (Blower 2016).

In Xenopus, among the Aurora-B binding RNAs is a ~170 nt centromeric transcript (fcr1, frog centromeric repeat 1) that, similar to the SAT III RNA in Drosophila (Rosic et al. 2014), is only found on a subset of CENP-A defined centromeres within the karyotype (Edwards and Murray 2005). The active transcription of fcr1 is required for Aurora-B localization to the inner centromere of mitotic chromosomes and may act initially on the fcr1-native chromosomes before diffusing to other centromeres (Blower 2016) (Fig. 2).

6 The “When?” of Centromere Transcription: It is an Around the Clock Job

The emergence of studies on RNA, transcription, and centromere function since the pivotal studies in yeast (Volpe et al. 2003), plants (Topp et al. 2004) and human neocentromeres (Wong et al. 2007) has led to accumulating evidence that transcription is a requirement for centromere function and cell stability. However, the timing of this transcription and a delineation of whether specific transcript sequences, or simply the act of transcription itself, are required for centromere integrity are not known. Studies in several model systems have begun to highlight the intricacies of transcriptional events at the centromere throughout the cell cycle, with a particular emphasis on mitotic transcription (Fig. 2).

Cell Cycle Phase G1: Late telophase/early G1 is the pivotal time in mammalian cells when CENP-A is actively loaded into centromeric chromatin. Thus, the impact of active transcription at this point in the cell cycle may have direct bearing on the ability of CENP-A to assemble functional centromeric chromatin. The 1.3 KB human centromeric transcript described above (Quenet and Dalal 2014) is transcribed by RNA pol II from late telophase into early G1, coincident with the timing of CENP-A deposition by its chaperone HJURP (Fig. 2). This RNA transcript was found to interact with these proteins, suggesting its capacity to aid in CENP-A nucleosome assembly. The transcription of one of two groups of transcripts that emanate from mouse pericentromeric gamma satellites was detected in late G1 and proceeded through mid-S phase (Lu and Gilbert 2007) (Fig. 2). This species of RNA did not show a discrete size range and transcription of this species decreased at a time coincident with the replication of pericentric heterochromatin. This RNA class is a large, heterogeneous group of gamma satellite transcripts whose transcriptional timing may simply be the result of cryptic transcription (Lu and Gilbert 2007). However, given that the appearance of these satellite transcripts is CDK (cyclin dependent kinase)-dependent, and thus is only in cells committed to proliferation, the transcripts may be required for heterochromatin reassembly at the replication fork (Lu and Gilbert 2007).

Cell Cycle Phase S and G2: Double-stranded RNAs are actively transcribed from the pericentric repeats dh and dg from within the centromeres of the yeast, Schizosaccharomyces pombe, and are subsequently processed into small interfering RNAs (siRNAs) (Volpe et al. 2002, 2003). These siRNAs are bound to a complex of proteins (the RNA-induced initiation of transcriptional gene silencing, RITS) and result in targeted H3 lysine-9 methylation through RNA interference (Volpe et al. 2002, 2003). Moreover, the disruption of RNAi components compromises heterochromatin assembly (Volpe et al. 2002) and CENP-A deposition (Folco et al. 2008), linking a small RNA component to centromere function.

The forward strand of centromeric repeats is transcribed in S phase in S. pombe, and this transcription is thought to be the major initiating point for siRNA production (Chen et al. 2008). siRNA levels are stably detected throughout the cell cycle but increase in S/G2, coincident with RITS complex accumulation and transcript processing (Fig. 2). Similar to that proposed for gamma satellites in mouse, these transcripts may be the result of cryptic or spurious transcription yet are required for the appropriate establishment of heterochromatin (Chen et al. 2008).

While active CENP-A loading occurs in late telophase/early G1 in mammals, H3.3 is loaded into centrochromatin during S phase, likely as a placeholder for CENP-A replenishment after mitosis (Dunleavy et al. 2011). Thus, transcription could be involved in the eviction of the placeholder (Catania et al. 2015; Chen et al. 2015; Chen and Mellone 2016). A recent study in S. pombe showed that RNA pol II stalls at centromeric DNA and that the level of stalling is directly related to the level of subsequent CENP-A nucleosome assembly. In this yeast species, CENP-A assembly occurs in S phase and G2 (Takahashi et al. 2005; Dunleavy et al. 2007; Takayama et al. 2008); an increase in RNA pol II stalling, and concomitantly permissive but “low-quality” transcription, may lead to increased CENP-A chromatin through either increased eviction rates for placeholder H3 or through demarcation of a specific environment conducive to efficient CENP-A assembly (Catania et al. 2015).

The timing of detectable increases in RNA pol II stalling in S phase is coincident with DNA replication. Centromere transcription at this phase of the cell cycle [forward strands in S. pombe (Chen et al. 2008) and pericentromeric satellites in mouse (Lu and Gilbert 2007)] may position replication forks and RNA pol II to collide more often, increasing the rate of RNA pol II stalling (reviewed in Brown et al. 2012). RNA pol II stalling and collisions subsequently increase the formation of large, stable R-loops (Reddy et al. 2011). While the RNA–DNA hybrids present in R-loops are typically small and transient, it is notable that the transcriptional framework of the centromere may present an increase in stable R-loops in S phase since increases in R-loops are linked to phosphorylation of H3S10, a marker of subsequent entry into mitosis (M phase) (Castellano-Pozo et al. 2013; Oestergaard and Lisby 2016).

Chen et al. (2015) showed that FACT is required for CENP-A assembly, and FACT has previously been shown both to travel in a complex with the CENP-A chaperone (Foltz et al. 2006) and to localize to centromeres throughout the cell cycle (Okada et al. 2009). The presence of FACT at centromeres and in complex with key centromere assembly components supports hypotheses that FACT is an essential part of the chromatin remodeling involved in facilitating CENP-A nucleosome assembly. For example, FACT may be required to destabilize nucleosomes (Hondele and Ladurner 2013) and subsequently facilitate the transcription of centromere sequences preceding assembly of new CENP-A nucleosomes (Chen et al. 2015; Chen and Mellone 2016). Recently, FACT was found to bind the inner kinetochore proteins of the CENP-T/W complex and thus may promote CENP-T/W deposition at centromeres (Prendergast et al. 2016). In fungi it appears that FACT is necessary to prevent spurious, ectopic incorporation of CENP-A rather than performing a function in primary CENP-A assembly at the centromere (Deyter and Biggins 2014). FACT is known to resolve R-loop- and replication-mediated conflicts in both human and yeast (Herrera-Moyano et al. 2014). Furthermore, low levels of central core transcripts are detected in yeast cells due to an increase in RNA pol II stalling (Catania et al. 2015). We thus propose that FACT may also be present at centromeres to resolve the resulting R-loops prior to progression into mitosis. The multiple possible roles for FACT in CENP-A assembly are not mutually exclusive; rather, they are an example of the dynamic state of the centromere during different phases of the cell cycle.

Cell Cycle Phase Mitosis: Whether or not centromeric transcription occurs during mitosis has been hotly debated since the majority of transcription factors and RNA polymerases are not associated with chromosomes during mitosis (Gottesfeld and Forbes 1997). However, several pieces of evidence suggest centromeric transcription occurs during M phase of the cell cycle (Liu 2016), indicating persistent transcription during mitosis may serve to further distinguish the centromere from chromosome arms. First, RNA pol II is present at kinetochores in M phase (Chan et al. 2012). Second, transcription run-on assays have shown that RNA pol II is capable of transcribing centromeres of mitotic chromosomes (Liu 2016). Third, inhibition of RNA pol II by alpha-amanitin reduces centromeric cohesion and CENP-C localization (Chan et al. 2012; Liu et al. 2015). The defective cohesion following RNA polymerase inhibition was caused by a mislocalization of Sgo1, a protein found at the inner centromere that protects cohesin during mitosis (Liu et al. 2015). Collectively, these data suggest that transcription and/or the transcripts themselves may play a functional role in a time-specific manner (i.e., restricted to specific phases of the cell cycle).

A long noncoding RNA was recently identified as actively transcribed from the centromeres of Xenopus egg extracts during mitosis; moreover, these transcripts serve a functional role via binding Aurora-B, a component of the CPC, and are required for normal kinetochore-centromere attachment (Blower 2016). Xenopus egg extracts have also led to the discovery that RNA processing assists in kinetochore and spindle assembly (Grenfell et al. 2016); inhibition of the spliceosome in egg extracts leads to an accumulation of long centromeric transcripts and a failure to efficiently recruit CENP-A, CENP-C and NDC80. This study shows not only that transcription is active during mitosis, further supporting the growing body of evidence indicating this occurs, but also that transcripts undergo processing during this phase, contradicting the theory that RNA processing is repressed during mitosis (Shin and Manley 2002).

Centromere transcript processing is a recurring theme observed for a broad set of centromere transcripts, although the relationship of the processing machinery and/or processed RNA products to centromere integrity is less clear (with the notable exception of S. pombe, see above). Early work in mouse cells showed that a loss of DICER activity, the RNase III enzyme that facilitates small RNA processing, results in an accumulation of larger satellite transcripts (Kanellopoulou et al. 2005). This finding implies that when DICER is available, these larger satellite transcripts are not detected because they are processed into smaller RNAs. The implication that DICER is involved in this RNA processing would also indicate these small RNAs are <40 nt based on the catalytic activity of the enzyme (MacRae et al. 2007). Other sizes of centromeric RNAs have been uncovered that are likely independent of DICER. For example, small RNAs have been detected for the maize centromere satellite CentC (Du et al. 2010) and from the wallaby centromeric retroelement KERV (Carone et al. 2009). Both of these small RNAs were also found to participate in the centromere assembly cascade: CentC associated with CENP-C directly (Du et al. 2010); a reduction in KERV small RNAs resulted in a loss of CENP-A assembly in late telophase (Carone et al. 2013) (Fig. 2). However, any connection between these types of processed, small RNA transcripts and the necessary RNA processing machinery observed in Xenopus is unexplored; likewise the timing of transcription for these processed RNAs is currently unknown.

7 Conclusion

Mounting evidence suggests that RNA species and the act of transcription itself are required for the recruitment and/or establishment of centromere and kinetochore proteins. Thus, it is clear that transcription at the centromere, and in neighboring pericentromeric heterochromatin, is functionally distinct yet critical throughout the entirety of the cell cycle. Studies are now beginning to reveal that centromeric transcripts and accompanying chromatin changes are required for different components of the centromere assembly cascade at different points in the cell cycle. Over the last decade, RNA species derived from the centromeric regions of many model species have been uncovered, as have some of their interacting partners. Closer examination of these transcripts, and indeed of the subregions of the centromere previously considered devoid of transcriptional activity, has made it clear that both the act of transcription itself and the resulting transcripts are critical to ensuring proper CENP-A assembly and faithful chromosome segregation. In the same manner that the comparative approach revealed that centromeres evolve rapidly and are established through an epigenetic framework, the use of diverse eukaryotic systems will afford the development of a model to address key remaining questions, such as: how do specific transcripts mediate centromere function in cis and/or in trans? is splicing or RNA processing a requisite in forming functional transcripts across different cell cycles and among different species? and, how does this transcriptional landscape impact centromere evolution in both a phylogenetic and disease context? In fact, one of the reasons the myth of the centromere as “silent chromatin” prevailed for so long is that centromere transcripts have been difficult to capture and characterize. As highlighted herein, it is the very reason these transcripts are difficult to capture (e.g., RNA pol II stalling, RNA processing, protein-RNA binding) that holds the key to how centromere transcription, and the resulting transcripts, likely function in maintaining centromere integrity.