Introduction

During mitosis and meiosis, proper kinetochore assembly on the centromere platform is a critical step for attaching chromosomes to microtubules and ensuring equal segregation of sister chromatids and homologous chromosomes, respectively. Failure of this process can cause aneuploidy and even cell death. Our understanding of the molecular mechanisms of centromere propagation in different model organisms has recently advanced. Centromeric DNA sequences are not conserved among eukaryotes. Yet, in most eukaryotes, with a few exceptions (Drinnenberg et al. 2014), centromere identity is epigenetically maintained across cell cycles through inheritance of a centromere-specific histone H3 variant, CENP-A, in centromeric nucleosomes. Thus, CENP-A is present at most functional centromeres. When DNA replicates, each replicated DNA molecule receives only half of the parental CENP-A nucleosomes (Jansen et al. 2007; Mellone et al. 2011; Shelby et al. 2000). The replenishment of CENP-A nucleosomes and the assembly of kinetochore components occur in a cell cycle–dependent manner. The steps of centromere propagation, including centromere licensing, CENP-A deposition, and the stabilization and maintenance of CENP-A, are tightly linked to one another and coupled to the cell cycle. Many excellent reviews have recently discussed the epigenetic regulation of centromere propagation (Black and Cleveland 2011; Black et al. 2010; Gambogi and Black 2019; McKinley and Cheeseman 2016).
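The dilution implied by this inheritance pattern can be written out as simple arithmetic. This is an idealized sketch that assumes an exactly even split of parental CENP-A between the two replicated DNA molecules and no loss or turnover; the symbols N_0, N_n, and N_load are introduced here for illustration only and do not come from the cited studies.

```latex
\begin{aligned}
N_{\text{sister}} &= \tfrac{1}{2}\,N_0
  && \text{parental CENP-A per replicated DNA molecule after one S phase}\\
N_n &= \frac{N_0}{2^{n}}
  && \text{after } n \text{ replications with no new loading}\\
N_{\text{load}} &= N_0 - \tfrac{1}{2}\,N_0 = \tfrac{1}{2}\,N_0
  && \text{new CENP-A needed per cycle to restore } N_0
\end{aligned}
```

For example, three rounds of replication without replenishment would leave each centromere with one-eighth of its original CENP-A, illustrating why replenishment has to be coupled to every cell cycle.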

However, how a centromere is first established, whether during evolution, after chromosomal rearrangements, after inactivation of the original centromere, or on newly introduced DNA, remains less well understood, partly because centromere establishment occurs less frequently than centromere propagation and is therefore harder to observe and study. Two types of new centromere can be established: neocentromeres and de novo centromeres. Neocentromeres are new centromeres that form on ectopic, non-centromeric DNA sequences of endogenous chromosomes or chromosomal fragments (Amor and Choo 2002; Kalitsis and Choo 2012). The occurrence of neocentromeres shows that CENP-A can be deposited ectopically into non-centromeric chromatin, indicating that the DNA sequence itself is not an absolute prerequisite for centromere formation. De novo centromeres, on the other hand, usually refer to new centromeres that form on newly introduced, originally naked DNA after it has become chromatinized in cells (Harrington et al. 1997). An outstanding recent review by Barrey and Heun (2017) discussed the requirements for de novo centromere formation on naked centromeric DNA and the conditions that can artificially induce neocentromere formation on endogenous chromosomes. We continue that discussion by describing naturally occurring neocentromeres, comparing de novo centromeres with neocentromeres, and discussing the implications and importance of centromere establishment.

In this review, we will first discuss (1) when and where new centromeres are established and (2) the importance of establishing new centromeres in evolution and disease. We will then introduce (3) the genetic and (4) the epigenetic factors identified at new centromeres in normal, pathological, and experimental settings, and discuss how they contribute to the formation of new functional centromeres. Lastly, we will speculate on (5) future directions and potential applications of research on centromere establishment.

When and where does centromere establishment occur?

A comparison of the findings for neocentromeres and de novo centromeres, and of their implications, is summarized in Table 1.

Table 1 Similarities and differences between the findings for neocentromere versus de novo centromere and their implications

Neocentromere formation accompanied by deletion or inactivation of the original centromere

Centromere repositioning in natural speciation

Neocentromere formation can be spontaneous, and most of the time the outcome is deleterious. If the original centromere remains active, the resulting dicentric chromosome can undergo chromosomal breakage or rearrangement (Fig. 1a) (McClintock 1939, 1941). If instead the centromere position shifts (simultaneous neocentromere formation at an ectopic location and inactivation of the original centromere), the chromosome remains stable. This event, called centromere repositioning (Montefalcone et al. 1999; Schubert 2018), is proposed to be an important mechanism for promoting karyotype evolution and speciation (Henikoff et al. 2001).

Fig. 1

When does centromere establishment occur? Centromere establishment can occur either naturally (a–e) or experimentally (f–j). In natural situations, a centromere can be established spontaneously (for unknown reasons; a–c) or under pathological conditions (d, e). a When a neocentromere forms spontaneously without inactivation of the original centromere, a dicentric chromosome is formed, which causes chromosomal breakage and rearrangement (McClintock 1941; McClintock 1939). b If, instead, the original endogenous centromere is inactivated, the centromere is said to be repositioned; centromere repositioning is thought to contribute to karyotype evolution and speciation (Amor et al. 2004; Baldini et al. 1993; Carbone et al. 2006; Chmatal et al. 2014; Ferreri et al. 2005; Han et al. 2009; Kobayashi et al. 2008; Montefalcone et al. 1999; Nergadze et al. 2018; Piras et al. 2009; Rocchi and Archidiacono 2006; Rocchi et al. 2012; Ventura et al. 2007; Villasante et al. 2007; Wade et al. 2009; Wong and Choo 2001; Yang et al. 2014). c In maize, a special type of neocentromere (knob) can form at the ends of chromosomes (Kattermann 1939). Knobs lack the conserved centromeric proteins CENP-A and CENP-C (Dawe et al. 1999; Dawe and Hiatt 2004) and are structurally different from classical neocentromeres. In the Rhoades model of meiotic drive, a crossover event produces homologous chromosomes that are heterozygous at the knob locus. In meiosis I, chromosomes with a knob move more rapidly to the pole. This orientation is maintained in meiosis II, so chromosomes containing the knob often segregate to the outermost megaspores. The basal outermost megaspore always becomes the functional megaspore, which gives rise to the egg. Thus, the chromosome with the knob is transmitted to the progeny at a higher frequency than the expected Mendelian ratio, which is the underlying mechanism for meiotic drive (Manzanero and Puertas 2003; Rhoades 1942; Sandler and Novitski 1957; Yu et al. 1997) (the schematic is adapted from Kanizay et al. (2013)). d Misregulation and overexpression of CENP-A or other kinetochore proteins are observed in cancer cells, leading to ectopic accumulation of CENP-A (Athwal et al. 2015; Tomonaga et al. 2003). If CENP-C or CENP-T is simultaneously deposited at the mistargeted CENP-A sites, an ectopic kinetochore could form (Gascoigne et al. 2011), resulting in a dicentric chromosome. Alternatively, mistargeted CENP-A could weaken the original centromere by recruiting away a subset of centromeric and kinetochore proteins, causing chromosomal instability (Shrestha et al. 2017; Van Hooser et al. 2001). e Chromatid breaks and chromosomal rearrangements in ALP-WDLPS tumor cells could force a neocentromere to form on chromatid fragments lacking endogenous centromeric DNA, i.e., ring and giant rod-shaped marker chromosomes (Macchia et al. 2015; Sirvent et al. 2000). f In Drosophila, γ-irradiation generates acentric chromosomal fragments, and neocentromeres are established on these fragments only if they were formerly juxtaposed to the native centromeric DNA (Maggert and Karpen 2001; Williams et al. 1998). g Deletion of the original centromere by homologous recombination results in neocentromere formation at different genomic loci in different models: in fission yeast, at subtelomeric regions (Ishii et al. 2008); in C. albicans, (i) in large intergenic regions and repeated DNA at centromere-distal sites, (ii) in centromere-proximal regions lacking specific sequence motifs or repeats, or (iii) adjacent to telomeres (Ketel et al. 2009; Thakur and Sanyal 2013); and in chicken DT40 cells, adjacent to the deleted centromere region (Shang et al. 2013) (see Fig. 2). h Introducing exogenous naked DNA (usually including centromeric DNA) results in the formation of artificial chromosomes with de novo centromeres, i.e., ScYAC in budding yeast (Murray and Szostak 1983), SpYAC in fission yeast (Hahnenberger et al. 1989), HAC in human cells (Harrington et al. 1997; Ohzeki et al. 2012), and WAC in C. elegans (Mello et al. 1991; Stinchcomb et al. 1985). i Experimental overexpression of CENP-A or a CENP-A chaperone induces ectopic loading of CENP-A on chromosome arms in HeLa cells (Lacoste et al. 2014; Yu et al. 2015) and near heterochromatin-euchromatin boundaries in fission yeast (Gonzalez et al. 2014) and in Drosophila S2 cells (Olszak et al. 2011). In fission yeast and Drosophila, accumulation of ectopic CENP-A promotes the formation of ectopic kinetochores, resulting in chromosomal breakage and missegregation. j CENP-A is enriched at DNA damage sites when a DNA double-strand break is induced by laser or I-SceI cleavage in HeLa cells (Zeitlin et al. 2009). Irradiation-induced DNA damage in human pluripotent stem cells also causes uncoupling of CENP-A and CENP-C and their relocalization to separate small foci in the nucleus (Ambartsumyan et al. 2010). It is not known whether this ectopic CENP-A can seed neocentromere formation. We postulate that if an acentric chromatin fragment is generated by a DNA double-strand break, neocentromere formation may rescue the fragment by restoring centromeric function. Whether CENP-A has a role in DNA damage repair is still debated, as DNA breaks induced by multiphoton laser micro-irradiation in human osteosarcoma U2OS cells did not result in CENP-A accumulation at the DNA damage sites (Helfricht et al. 2013).

During primate evolution, centromere repositioning can occur without chromosomal rearrangement, with synteny maintained along the chromosome (Fig. 1b). It is hypothesized that, during such repositioning, the ectopic neocentromeric locus rapidly acquires repetitive DNA sequences and, within a short evolutionary time, adopts a large and complex organization similar to that of the original centromere (Montefalcone et al. 1999; Rocchi et al. 2012; Ventura et al. 2007), whereas the original centromere is inactivated and partially deleted, leaving remnants of ancestral centromeric DNA (Baldini et al. 1993; Rocchi and Archidiacono 2006; Villasante et al. 2007; Wong and Choo 2001). It is still not known whether centromere repositioning in primates is initiated by a partial deletion of the original centromeric DNA that renders the original centromere inactive, or is driven purely by epigenetic mechanisms (Amor et al. 2004). Phylogenetic, karyotypic, and fluorescence in situ hybridization (FISH) analyses have also uncovered centromere repositioning during the evolution of many other mammalian genera, such as Macropus (Ferreri et al. 2005), Tokudaia (Kobayashi et al. 2008), Equus (Carbone et al. 2006; Nergadze et al. 2018; Piras et al. 2009; Wade et al. 2009), and Bos (De Lorenzi et al. 2017).

Besides animals, centromere repositioning has also been observed during the evolution of Cucumis. Cucumber (Cucumis sativus L.) and melon (Cucumis melo L.) diverged from a common ancestor (Schaefer et al. 2009). By comparing their chromosomes using FISH mapping and next-generation sequencing, researchers identified a centromere repositioning event on cucumber chromosome C7 (Han et al. 2009; Yang et al. 2014). Interestingly, inactivation of the original centromere was associated with a loss of pericentromeric heterochromatin near the original centromeric DNA (Han et al. 2009).

In disease conditions

Marshall et al. (2008) summarized more than 90 reported cases of neocentromeres in human patients with congenital abnormalities or developmental disorders. In some cases, the original centromere is inactivated for unknown reasons without any change in centromeric sequence. These functionally “acentric” chromosomes would be lost unless rescued by neocentromere formation (Amor et al. 2004; Capozzi et al. 2008). In atypical lipoma and well-differentiated liposarcoma (ALP-WDLPS) cells, chromosomal breakages and rearrangements produce acentric chromosomal fragments lacking the endogenous centromere, known as supernumerary ring (circular) or giant rod-shaped marker (RGM) chromosomes, which are patchworks of material from two to six different chromosomes. A neocentromere can form on a non-centromeric region lacking alpha-satellite DNA in these supernumerary ring or RGM chromosomes, enabling them to propagate in cancer cells, where they may confer a selective advantage (Fig. 1e) (Sirvent et al. 2000). The neocentromere may encompass multiple breakpoints from different chromosomes (Macchia et al. 2015). However, because neocentromeres are often detected long after they have formed, it is not straightforward to determine whether neocentromere formation is a cause or a consequence of chromosomal rearrangements. Clarifying the causal relationship between chromosomal rearrangement and neocentromere formation, and how each contributes to tumorigenesis, is therefore important. Conversely, neocentromere formation without inactivation of the original centromere results in dicentric chromosomes, which cause chromosomal instability and are commonly found in cancer cells (Fig. 1d) (Beh et al. 2016; Blom et al. 2010).

Chromosomal rearrangement or experimental deletion of the original centromere

To mimic the loss of the original centromere and investigate where neocentromeres preferentially form on acentric chromosomes, scientists have induced chromosomal rearrangements or deliberately deleted the original centromere. In Drosophila melanogaster, γ-irradiation induced chromosomal breakage and generated acentric chromosomal fragments, some of which acquired centromere activity (Williams et al. 1998). Intriguingly, neocentromeres were established on DNA fragments that had been juxtaposed to the native centromeric DNA, but not on fragments that had been far from the centromere.

Maggert and Karpen (2001) suggested that these acentric fragments may acquire new centromeric activity due to the spreading of centromeric epigenetic marks from the neighboring native centromere (Fig. 1f). Similarly, in a human fibroblast cell line, the pericentromere was removed from chromosome 17 by chromosomal rearrangement, resulting in neocentromere formation with spreading of CENP-A from α-satellites into the chromosome arm (Sullivan et al. 2016).

Besides γ-irradiation–induced chromosomal breakage, neocentromeres can also be induced by deleting the original centromere through genetic manipulation (Fig. 1g). In several cell models, such as Schizosaccharomyces pombe, Candida albicans, and chicken DT40 cells, neocentromeres can be established on non-centromeric regions by deleting the original centromeric sequence through homologous recombination and using a marker gene to select for cells that retain that chromosome. The neocentromere prevents loss of the chromosome and the resulting cell lethality. Interestingly, these neocentromeres form preferentially in certain chromatin regions, indicating the presence of neocentromere hotspots (Fig. 2). In S. pombe, centromere deletion resulted in neocentromere formation at subtelomeric regions (Ishii et al. 2008). Unlike fission yeast, C. albicans lacks pericentromeric heterochromatin and has smaller centromeres that occupy about 3–4 kb of DNA. In C. albicans, deletion of the native centromere resulted in neocentromere formation at intergenic, non-transcribed regions flanked by repeated DNA, either at centromere-distal sites (200–450 kb from the original centromere), including sites adjacent to telomeres, or in centromere-proximal regions lacking specific sequence motifs or repeats (Ketel et al. 2009; Thakur and Sanyal 2013). Notably, neocentromere formation in centromere-proximal regions suppresses expression of the newly integrated marker gene that replaces the original centromere (Thakur and Sanyal 2013). Similarly, in chicken DT40 cells, 76% of neocentromeres form adjacent to the deleted centromere region and suppress the expression of genes within the neocentromere region (Shang et al. 2013).

Fig. 2

Experimentally induced neocentromere establishment. Neocentromeres can be induced experimentally on chromosome arms, juxtaposed to the native centromere, at the pericentric-euchromatin boundary, at the telomere-euchromatin boundary, or at subtelomeres. Details of DNA

Neocentromeres in plants induce multicentromeres in meiosis

A special type of neocentromere, the terminal neocentromere or “knob,” is observed in plant species such as rye and maize at the terminal regions of all the chromosomes (Kattermann 1939). Knobs consist mainly of heterochromatin formed on tandemly repeated sequences with some homology to the centromeric repeats (Fig. 1c) (Hiatt et al. 2002; Peacock et al. 1981). Knobs are structurally different from classical neocentromeres: they lack the conserved centromeric component CENP-A and the inner kinetochore protein CENP-C (Dawe et al. 1999; Dawe and Hiatt 2004), and they interact with spindle microtubules laterally rather than in the canonical end-on fashion (Yu et al. 1997). These terminal neocentromeres form in strains carrying the maize abnormal chromosome 10 (Ab10). Ab10 contains an extremely large knob and two neocentromere-activating cassettes that control the activity of the terminal neocentromeres. How these cassettes evolved on Ab10 is still unknown (Hiatt et al. 2002). Unlike the neocentromeres of humans and Drosophila, plant terminal neocentromeres are active only in meiosis, together with the native centromere (Manzanero and Puertas 2003). Yet these dicentric chromosomes do not undergo breakage in meiosis. Instead, microtubules attach laterally to the neocentromeres and end-on to the native kinetochores. Together, they move the chromosomes to opposite poles, and these knob-bearing chromosomes therefore segregate preferentially to gametes (see the section “Meiotic drive and speciation” for details). The resulting meiotic drive has not led to fixation of knobs, potentially because knobs also have deleterious consequences (Manzanero and Puertas 2003; Yu et al. 1997).

Neocentromere formation caused by misregulation of centromeric proteins

In pathological conditions or cancer cells

The precise timing of CENP-A assembly at centromeres is regulated by phosphorylation by cyclin-dependent kinase (CDK) and Polo-like kinase (PLK) and by dephosphorylation by protein phosphatase 1 alpha (PP1a) (Yu et al. 2015). HJURP in humans and Scm3 in yeast (Camahort et al. 2007; Dunleavy et al. 2009; Foltz et al. 2009), CAL1 in flies (Chen et al. 2014), and LIN-53 in worms (Lee et al. 2016) act as CENP-A chaperones for centromere propagation, binding directly to CENP-A at specific cell cycle stages. Under normal conditions, the native centromere in human cells maintains its existing centromeric proteins across cell cycles to form a functional kinetochore. CENP-A protein levels are regulated by E3 ubiquitin ligases and SUMO ligases, which can promote degradation of ectopically incorporated CENP-A (Canzonetta et al. 2015; Collins et al. 2004; Deyter and Biggins 2014; Gonzalez et al. 2014; Hewawasam et al. 2010; Moreno-Moreno et al. 2006; Ohkuni et al. 2016; Ranjitkar et al. 2010). Misregulation of CENP-A levels and/or of this degradation system in pathological conditions could cause neocentromere formation. For instance, in colorectal cancer cell lines, CENP-A and other kinetochore proteins are overexpressed and deposited at ectopic regions (Athwal et al. 2015; Tomonaga et al. 2003), potentially promoting neocentromere formation and ectopic kinetochores and driving aneuploidy (Fig. 1d). However, ectopic CENP-A does not always induce kinetochore formation. In another study, CENP-A mistargeted to non-centromeric chromatin by artificial overexpression was not sufficient to induce assembly of a complete kinetochore complex in human cells (Van Hooser et al. 2001). Ectopic CENP-A may need to exceed a threshold level to drive neocentromere formation, possibly under conditions in which the CENP-A degradation system is disrupted. In addition, simultaneous ectopic accumulation of CENP-T or CENP-C at the CENP-A–mistargeted sites may also be required for ectopic kinetochore formation (Gascoigne et al. 2011). Besides forming ectopic kinetochores, ectopic CENP-A could sequester a subset of centromeric and kinetochore proteins away from the native centromere and weaken the native kinetochore, leading to chromosomal instability (Shrestha et al. 2017; Van Hooser et al. 2001). Accordingly, the expression levels of CENP-A and other kinetochore proteins have been proposed as prognostic markers for predicting the chromosomal instability (CIN) level of cancers and their response to treatment (Zhang et al. 2016). Intriguingly, CENP-A overexpression can sometimes be beneficial: HeLa cells artificially overexpressing CENP-A showed increased survival when challenged with the DNA-damaging agent camptothecin and with ionizing radiation (Lacoste et al. 2014). This may reflect a potential role of CENP-A in DNA damage repair, which is discussed in the section “‘Ectopic’ CENP-A in response to DNA damage.”

In cancer cells with CENP-A overexpression, the ratio of CENP-A/H4 to H3/H4 dimers/tetramers increases. More promiscuous histone chaperones, such as DAXX, could then bind heterotetrameric H3.3-CENP-A-H4 and deposit this complex into ectopic chromatin regions (Lacoste et al. 2014). Interestingly, a study in fission yeast showed that chromosome segregation defects caused by induced CENP-A overexpression can be suppressed by overexpressing histone H3 or histone H4, possibly because this restores the stoichiometry of CENP-A/H4 to H3/H4 (Gonzalez et al. 2014). In addition, when the CENP-A chaperone HJURP is overexpressed in CENP-A-overexpressing cancer cells, ectopic CENP-A localization is reduced, indicating that balanced levels of CENP-A and HJURP could help to minimize CENP-A mislocalization (Nye et al. 2018).

Experimentally

Neocentromere formation can also be induced experimentally in many organisms by overexpressing CENP-A (as in some of the studies mentioned in the section “In pathological conditions or cancer cells”) or a CENP-A chaperone (Fig. 1i). Studies in HeLa cells found that CENP-A overexpression stimulates ectopic CENP-A localization on chromosome arms (Lacoste et al. 2014; Van Hooser et al. 2001; Yu et al. 2015). Similarly, overexpression of CENP-A^Cnp1 in fission yeast and of CENP-A^CID in Drosophila S2 cells also results in ectopic CENP-A localization (Gonzalez et al. 2014; Heun et al. 2006). In fission yeast, overexpressed CENP-A is deposited at non-centromeric regions, preferentially at or near heterochromatin regions located at the nuclear periphery (Fig. 2) (Gonzalez et al. 2014). In Drosophila S2 cells, CENP-A overexpression leads to ectopic CENP-A deposition with neocentromere formation at heterochromatin-euchromatin boundaries near telomeres and pericentromeric heterochromatin (Fig. 2, Table 4) (Olszak et al. 2011). In budding yeast, when CENP-A^Cse4 is overexpressed, CENP-A^Cse4 nucleosomes are found at ectopic sequences known as centromere-like regions (CLRs) (Lefrancois et al. 2013).

Neocentromeres can also be formed by artificially tethering a kinetochore protein. Tethering LacI fused to CENP-A, a CENP-A chaperone, or a kinetochore protein to LacO arrays integrated in euchromatin (or, analogously, TetR fusions to TetO arrays) is commonly used to induce ectopic centromere formation in many species, e.g., human cells (Logsdon et al. 2015), Drosophila (Mendiburo et al. 2011; Roure et al. 2019), chicken DT40 cells (Gascoigne et al. 2011; Hori et al. 2013), and budding yeast (Ho et al. 2014). For instance, in budding yeast, a synthetic kinetochore can be formed by tethering the outer kinetochore protein Ask1 to a LacO array on the chromosome. Interestingly, CENP-A^Cse4 also localizes to this synthetic kinetochore, which lacks a centromeric sequence (Ho et al. 2014). This technique allows researchers to dissect the contributions of individual CENP-A domains or kinetochore components to centromere establishment and kinetochore formation.

De novo centromere formation by introducing exogenous naked DNA

Yeast artificial chromosomes

The budding yeast Saccharomyces cerevisiae artificial minichromosome (ScYAC) was the first artificial chromosome constructed (Murray and Szostak 1983). It consists of a centromeric DNA sequence, a replication origin, and telomeric sequences. S. cerevisiae has a point centromere. The minimal functional centromeric DNA sequence (~ 125 bp) contains no DNA repeats and comprises three conserved DNA elements, namely CDEI, CDEII, and CDEIII. CDEI (~ 8 bp) and CDEIII (~ 25 bp) contain palindromic motifs. CDEII (~ 90 bp) is the central element, with approximately 90% AT content, where CENP-A localizes (Meluh and Koshland 1997). In fungi closely related to S. cerevisiae, point centromeric DNAs are between 125 and 225 bp long and arranged in the conserved CDEI-CDEII-CDEIII structure (Meraldi et al. 2006). Transformation with naked or purified ScYAC DNA results in de novo centromere establishment on the centromeric DNA, and the ~ 125-bp centromeric DNA sequence is sufficient for centromere formation and function (Fig. 1h, Table 2) (Cottarel et al. 1989).

Table 2 Comparison of ACs in different organisms

Fission yeast S. pombe centromeric DNA ranges from 35 to 110 kb and consists of a 4–7-kb central core (cc) domain flanked by two repetitive domains (imr and otr) (Chikashige et al. 1989; Nakaseko et al. 1986). CENP-A localizes to the central domain (cc and imr repeats), whereas the otr repeats constitute the pericentric heterochromatin (Partridge et al. 2000). Transformation with purified DNA containing the centromeric sequence (the core and surrounding repeats) results in the formation of a fission yeast artificial minichromosome (SpYAC) with de novo centromere establishment (Fig. 1h, Table 2) (Hahnenberger et al. 1989). Importantly, both the central and the flanking repetitive domains are required for de novo centromere formation (Folco et al. 2008); the otr domain is needed to establish heterochromatin for de novo CENP-A^Cnp1 recruitment and kinetochore assembly, but not to maintain CENP-A^Cnp1 (Kagansky et al. 2009).

Human artificial chromosomes

Human regional centromeres, spanning 3–5 Mb, consist of 171-bp α-satellite DNA (alphoid DNA) monomers arranged in tandem arrays (Waye and Willard 1987; Willard and Waye 1987; Wu and Manuelidis 1980). In the bottom-up approach, introducing long arrays of α-satellite DNA (minimum 30 kb, usually 60–70 kb) (Okamoto et al. 2007), together with telomeric DNA, a selectable marker, and genomic DNA, into human HT1080 cells, followed by concatemerization, can produce mitotically stable human artificial minichromosomes, with a de novo centromere formation efficiency of about 30% (Fig. 1h, Table 2) (Harrington et al. 1997; Ohzeki et al. 2012). However, the efficiency of human artificial chromosome (HAC) formation is rather low in other cell lines, and formation requires several weeks of selection (Macnab and Whitehouse 2009). Alternatively, in the top-down approach, partial deletion of an existing chromosome can also lead to HAC formation (Table 2) (Katoh et al. 2004; Yang et al. 2000).
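To put these sizes on a common scale, the lengths can be converted into approximate numbers of 171-bp monomers. This back-of-the-envelope sketch assumes, for illustration only, that each span consists entirely of uninterrupted 171-bp α-satellite monomers; it ignores higher-order repeat structure and any non-satellite DNA, so the native-centromere figures are upper bounds.

```latex
\begin{aligned}
\frac{30\ \text{kb}}{171\ \text{bp}} &\approx 175 \text{ monomers} && \text{(minimum HAC input array)}\\
\frac{60\text{--}70\ \text{kb}}{171\ \text{bp}} &\approx 351\text{--}409 \text{ monomers} && \text{(typical HAC input array)}\\
\frac{3\text{--}5\ \text{Mb}}{171\ \text{bp}} &\approx 17{,}500\text{--}29{,}200 \text{ monomers} && \text{(native centromere span)}
\end{aligned}
```

On this rough scale, the α-satellite arrays used to seed HACs are roughly 40- to 170-fold smaller than a native human centromere.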

De novo centromere formation on worm artificial chromosomes in Caenorhabditis elegans

C. elegans is becoming a useful model organism for studying centromere establishment. Instead of forming a single discrete (monocentric) centromeric domain on each chromosome, the C. elegans centromere is diffused along the length of the chromosome (holocentric) (Gassmann et al. 2012; Steiner and Henikoff 2014). CENP-A is depleted from genes that are active in embryos and from germline-expressed genes (Gassmann et al. 2012). DNA injected into the C. elegans germline forms worm artificial chromosomes (WACs), also known as extrachromosomal arrays (Ex) (Fig. 1h, Table 2) (Mello et al. 1991; Stinchcomb et al. 1985). Importantly, no worm DNA is required for WAC formation (Mello et al. 1991; Stinchcomb et al. 1985), possibly because holocentromeres are less dependent on DNA sequence and more permissive of diverse sequences. These artificial chromosomes form de novo centromeres and segregate accurately at high frequency within a few cell divisions after artificial chromosome (AC) formation (Yuen et al. 2011). De novo centromere formation on these artificial chromosomes is fast, efficient, and robust (Yuen et al. 2011), facilitating mechanistic investigation of de novo centromere formation and of the interplay between centromere function and transcription.

Plant artificial chromosomes

In plants, de novo centromere formation on naked DNA has not been reported (Gaeta et al. 2012; Houben et al. 2008; Mette and Houben 2015). It has been suggested that the epigenetic marks that direct centromere formation cannot be re-established on naked DNA introduced into plants (Birchler 2015; Gaeta et al. 2012). An alternative approach to creating plant artificial chromosomes (PACs) with a functional centromere is to truncate an endogenous chromosome by inserting telomeric sequences into the chromosome arms (Farr et al. 1991; Yu et al. 2006, 2007). Telomere-mediated chromosomal truncation generates a minichromosome that can be further engineered to carry desired sequences. PACs generated by this method have been reported in maize (Yu et al. 2006, 2007), Arabidopsis (Nelson et al. 2011; Teo et al. 2011), barley (Kapusi et al. 2012), rice (Xu et al. 2012), wheat (Yuan et al. 2017), and Brassica (Yan et al. 2017).

“Ectopic” CENP-A in response to DNA damage

CENP-A enrichment has been reported at sites of laser-induced DNA damage or I-SceI cleavage in HeLa cells, together with other centromeric proteins, such as CENP-N and CENP-U, and DNA repair factors (Fig. 1j) (Zeitlin et al. 2009). Irradiation-induced DNA damage also causes relocalization of CENP-A and CENP-C to separate small foci in the nucleus, resulting in uncoupling of kinetochore proteins and kinetochore weakening (Ambartsumyan et al. 2010) (Fig. 1j). Moreover, increased CENP-A expression improves cell survival after DNA damage (Lacoste et al. 2014; Lawrence et al. 2015; Zeitlin et al. 2009). Reciprocally, when CENP-A is knocked down in human pluripotent stem cells (hPSCs), DNA damage induces significant apoptosis, possibly because the knockdown reduces the availability of new CENP-A protein to rebuild the weakened kinetochores (Ambartsumyan et al. 2010). The relocalization of CENP-A after DNA damage suggests that CENP-A may participate in DNA double-strand break (DSB) repair (Zeitlin et al. 2009). Alternatively, CENP-A enrichment at DNA damage sites could serve as a mechanism to restore centromeric function on potentially acentric chromatin fragments. However, DNA breaks induced by multiphoton laser micro-irradiation in human osteosarcoma U2OS cells showed no CENP-A accumulation but did show CENP-S and CENP-X assembly at the damage sites (Helfricht et al. 2013). The discrepancy may be due to differences in cell lines or experimental conditions. Interestingly, another study showed that upon DNA damage by etoposide in murine NIH/3T3 cells, CENP-A is delocalized from the centromere to the periphery and interior of the nucleolus in an ATM kinase- and p53-dependent manner (Hedouin et al. 2017). However, this is a rather late event (~ 24 h, versus within minutes in the study of Zeitlin et al. (2009)) and may not reflect a role in DNA repair per se; rather, it may be a consequence of prolonged stress and the DNA damage response, promoting maintenance of cell cycle arrest as in senescent cells (Hedouin et al. 2017). Whether CENP-A participates in the DNA damage response and repair remains debated.

Why is centromere establishment important in evolution and diseases?

Meiotic drive and speciation

Maize terminal neocentromeres play an important role in meiotic drive (Dawe et al. 2018; Lyttle 1991). A maize chromosome carrying a terminal neocentromere (knob) moves more rapidly to the spindle pole during meiosis I. This orientation is maintained in meiosis II; thus, chromosomes containing the knob are often segregated to the outermost megaspores. Because the basal outermost megaspore always becomes the functional megaspore (equivalent to the egg in mammals), whereas the others become non-functional megaspores (equivalent to the polar bodies in mammals) (Rhoades 1942), the chromosome with a terminal neocentromere is transmitted to progeny at a higher frequency than the expected Mendelian ratio. This preferential transmission has been proposed to be a driving force of evolution that favors certain karyotypes (Sandler and Novitski 1957). However, the chromosome with a terminal neocentromere is still not fixed in the population, suggesting that it may also reduce fitness, so that negative selection balances out the meiotic drive effect (Buckler et al. 1999). Centromere repositioning may result in stronger centromeres (with increased kinetochore protein levels and/or enhanced kinetochore-microtubule interaction), which could also promote meiotic drive, fixation, and speciation (Chmatal et al. 2014). Meiotic drive has also been demonstrated in mice. Chromosomes with more centromeric DNA repeats form stronger kinetochores, which increases the chance that these chromosomes segregate to the egg during female meiosis (Iwata-Otsubo et al. 2017; Lampson and Black 2017).

Rescuing acentric chromosomes from loss and conferring a growth advantage

Neocentromeres are also associated with certain cancer types. As mentioned, supernumerary ring (circular) or giant rod–shaped marker chromosomes are a feature of ALP-WDLPS. These chromosomes lack α-satellite sequences, but the formation of neocentromeres on non-centromeric sequences (Sirvent et al. 2000) is proposed to confer a selective advantage by allowing these chromosomes to segregate stably in cancer cells. In addition, human patients ascertained with neocentromeres often carry chromosomal rearrangements, which result in developmental delay or infertility (Klein et al. 2012). In these individuals, the neocentromeres may have rescued the rearranged acentric chromosome from loss and enabled survival (Marshall et al. 2008).

Inducing chromosome instability in dicentric chromosomes

In colorectal and other solid cancer cells, CENP-A and other kinetochore proteins are commonly overexpressed and deposited at ectopic sites (Athwal et al. 2015; Tomonaga et al. 2003), potentially promoting neocentromere formation and ectopic kinetochore assembly, which could drive chromosomal instability and aneuploidy. Therefore, it is important to understand the causes and consequences of centromere protein overexpression and the mechanism of neocentromere formation in cancer cells, as this may reveal how tumors progress and identify novel molecular markers for cancer diagnosis or prognosis (de Wolf and Kops 2017; Zhang et al. 2016).

DNA sequence preference for centromere establishment

AT content

Studies on human neocentromeres formed on ectopic, non-centromeric DNA regions revealed that they preferentially localize to AT-rich regions, similar to native centromeric sequences (Lo et al. 2001a, b). However, many AT-rich regions in the human genome have not been associated with neocentromere formation, possibly because of the low number of cases ascertained (Alonso et al. 2007). Human neocentromeres are usually identified on marker chromosomes in patients with developmental delay or in human cancers (Amor and Choo 2002), so whether these examples represent the general features of human neocentromeres remains an open question. Therefore, the exact role of AT content in centromere establishment is still unclear. One possibility is that there is a threshold requirement for AT content (Marshall et al. 2008). Such a hypothesis is testable in principle, because de novo centromeres can be established on exogenous, purified centromeric DNA introduced into cells, as demonstrated for yeast artificial chromosomes (YACs), HACs, and C. elegans WACs. However, native centromeric sequences are usually required for efficient de novo centromere formation in yeast and humans (Tables 2 and 3). Surprisingly, in holocentric C. elegans, any DNA seems able to form artificial chromosomes that establish centromeres at high frequency. Using this model, the preference for AT content in centromere establishment on C. elegans artificial chromosomes has been further investigated by injecting DNA with different AT contents into the germline. High AT content appears to favor new centromere formation, as co-injection of AT-rich sequence increases WAC segregation frequency (Lin and Yuen, unpublished).
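AT content is simply the fraction of A and T bases in a sequence, and AT-rich candidate regions can be flagged by scanning a sequence in windows. The Python sketch below illustrates only this arithmetic; the function names are hypothetical, and the window size, step, and 0.6 cutoff are arbitrary assumptions for illustration, not the threshold hypothesized by Marshall et al. (2008). Ambiguous bases such as N are ignored.

```python
def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / acgt


def at_rich_windows(seq: str, window: int = 10, step: int = 5, threshold: float = 0.6):
    """Yield (start, end, at_fraction) for windows whose AT content meets the threshold.

    The default window, step, and threshold are illustrative assumptions only.
    """
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        fraction = at_content(seq[start:start + window])
        if fraction >= threshold:
            yield start, start + window, fraction


if __name__ == "__main__":
    demo = "ATATATTTAA" + "GCGCGGCCGC" + "AATTATGCAT"  # toy 30-bp sequence
    for start, end, fraction in at_rich_windows(demo):
        print(start, end, round(fraction, 2))
    # prints:
    # 0 10 1.0
    # 20 30 0.8
```

In a real analysis, windows would span kilobases and be compared against a genome-wide background; the toy sequence only shows the mechanics.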

Table 3 Summary of the effects of DNA sequence on centromere establishment

Sequence repetitiveness

Centromere repeats may be derived from retroviral elements

Native centromeres are commonly embedded in tandem repeat sequences, such as satellite DNA in mammals and retroelements in some plants, which span kilobases to megabases of a chromosome. In humans, retroelements are spread across the centromeres of 15 chromosomes (Zahn et al. 2015). Concerted evolution of centromeric tandem repeats promotes the homogenization of centromeric DNA across chromosomes within a species (Malik and Henikoff 2009; Melters et al. 2013). The prevalence of repetitive sequences in centromeric DNA, despite its high sequence diversity among species, raises the question of what function repetitive DNA serves in centromere establishment.

While the origin of centromeric tandem repeat sequences is unknown, one hypothesis proposes that they may be derived from subtelomeric elements, which may, in turn, be derived from retroelements (Villasante et al. 2007). Centromeric retrotransposons (CRs) are found in almost all grass species (Jiang et al. 2003), where they are predicted to have integrated into the centromeres millions of years ago. Sequence alignment shows that the centromeric retrotransposons of rice and maize share common long terminal repeat (LTR) features, indicating that they may originate from a common ancestor (Nagaki et al. 2005; Wolfgruber et al. 2009). Integration of retrotransposon repeats into established centromeres could be a driving force that homogenizes centromeric sequences among chromosomes (Sharma et al. 2013). Despite the sequence differences between the centromeric repeats of Arabidopsis lyrata and Arabidopsis thaliana, a CR element enriched in A. lyrata centromeres (Tal1), when transformed into A. thaliana, can target and integrate into the A. thaliana centromeric repeats (Birchler and Presting 2012; Tsukahara et al. 2012), suggesting that the accumulation of newly integrated CRs can eventually change centromere composition. In maize inbreds, strong selection for favorable centromere-linked genes increases the transcription level at the centromere, which, in turn, forces the centromere to relocate from the maize centromere–specific satellite repeat, CentC, to its flanking regions, followed by integration and enrichment of the centromeric retrotransposon of maize (CRM). Accumulated CR invasion events would restore the repetitiveness of the neocentromere, stabilize it, and drive the replacement of the centromeric repeat context in domesticated maize (Schneider et al. 2016). However, expanded centromeric retrotransposons can also form multiple active centromeres on a chromosome, which may cause chromosomal instability and drive karyotype change. For example, in a wheat-rye hybrid, rye centromeric transposons expanded into the euchromatin of a rye chromosome and caused chromosomal breakage and rejoining (Guo et al. 2016).

However, CR expansion or integration into the centromere is not the only force driving centromere evolution. In an oat-maize hybrid, maize centromeres in the oat background expand into their flanking regions, from 1.4–1.8 Mb to 3.3–3.8 Mb, without CR invasion. This expansion is restricted by actively transcribed regions, suggesting that the transcriptional landscape of the chromosome also shapes centromere composition. One explanation for this kind of expansion is that centromere sizes tend to be relatively uniform within an organism (here, a plant hybrid), regardless of chromosome size or the origin of the centromeres (Wang et al. 2014). The expansion of maize centromeres in an oat-maize hybrid to a size comparable to that of oat centromeres may improve the stability of maize chromosomes in the oat background.

Retroviral sequences have been found not only in plant centromeres but also in human and tammar wallaby centromeres (Zahn et al. 2015). In humans, HIV infection and the presence of the HIV Tat protein can trigger the expression of the endogenous, centromere-localized retrovirus K111 and its relocation to other genomic locations (Contreras-Galindo et al. 2013). Karyotyping of Macropus species shows that chromosomal rearrangements at centromeres could have been driven by breakage at a centromere-specific retroviral element (Bulazel et al. 2007; Ferreri et al. 2004).

The sequence features of neocentromeres

Unlike endogenous centromeres, neocentromeres identified in chicken DT40 cells (Shang et al. 2013), Drosophila (Olszak et al. 2011; Williams et al. 1998), yeast (Copenhaver et al. 2009), and wheat (Guo et al. 2016) are commonly located at non-repetitive sequences (Figs. 1g and 2, Table 3). Human neocentromeres have also mostly been found in non-repetitive regions (Garsed et al. 2014; Marshall et al. 2008; Sullivan et al. 2016), with one exception located at non-centromeric tandem repeats (Hasson et al. 2011). Interestingly, the sequences underlying neocentromeres are also enriched in dyad symmetries and non-B-form DNA, which may facilitate de novo centromere formation (Kasinathan and Henikoff 2018). Most surprisingly, a recent study has shown that a neocentromere sequence from the pseudodicentric-neocentric human chromosome 4 (PD-NC4) (Amor et al. 2004), which lacks α-satellite repeats, can bypass the requirement for CENP-B in de novo centromere formation and can form HACs (Logsdon et al. 2019).

Features of α-satellite in de novo centromere formation in mammalian cells

A sequence-specific centromeric protein, CENP-B, was found to be required for de novo formation of a HAC or mouse artificial chromosome (MAC) with a functional centromere (Okada et al. 2007). The presence of the CENP-B box, a DNA element of α-satellite DNA, facilitates de novo centromere formation in HACs, and α-satellite DNA with a higher density of CENP-B boxes further enhances the efficiency of de novo centromere formation (Tables 2 and 3) (Basu et al. 2005a, b; Grimes et al. 2002; Harrington et al. 1997; Ohzeki et al. 2002). In contrast, loss of CENP-B does not affect endogenous chromosome stability in mice (Kapoor et al. 1998). Yet, a recent study found that mouse fibroblasts lacking CENP-B have a chronically elevated rate of chromosome missegregation (Fachinetti et al. 2015). New characterization approaches allow human α-satellite DNA to be further subdivided, and third-generation long-read DNA sequencing enables more precise mapping of centromeric regions (Jain et al. 2018). By combining chromatin immunoprecipitation (ChIP)-seq with de novo HAC formation assays, sequence requirements for centromere formation have been further tested. Interestingly, α-satellites organized as higher-order repeats (HORs) are competent for centromere formation even when they are not associated with CENP-A at endogenous human centromeres, whereas monomeric repeat units are not. Notably, both these HORs and the monomeric sequences contain CENP-B boxes, indicating that the repeat structure, rather than the CENP-B box alone, affects centromere formation competency (Hayden et al. 2013). Indeed, sequence abundance and composition within the repeats could affect de novo centromere formation: sequence variants within HORs reduce CENP-A levels and centromere stability (Table 3) (Aldrup-MacDonald et al. 2016). Surprisingly, human alphoid DNA is sufficient to form MACs in mouse cells, even though human and mouse centromeric satellite DNAs share no conserved sequence except for the CENP-B box (Okada et al. 2007; Zeng et al. 2004).
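CENP-B box density, as discussed above, can be expressed as the number of CENP-B boxes per kilobase of α-satellite DNA. The Python sketch below assumes the commonly cited 17-bp CENP-B box consensus 5′-NTTCGNNNNANNCGGGN-3′ (N = any base), requires exact matches, and scans both strands; real analyses may tolerate mismatches, and the function names and the per-kilobase metric are hypothetical choices for illustration, not the procedure of the cited HAC studies.

```python
import re

# Assumed consensus: the commonly cited 17-bp CENP-B box 5'-NTTCGNNNNANNCGGGN-3'
# (N = any base). The zero-width lookahead lets overlapping matches be counted.
CENPB_BOX = re.compile(r"(?=[ACGT]TTCG[ACGT]{4}A[ACGT]{2}CGGG[ACGT])")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT symbols are left unchanged)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_cenpb_boxes(seq: str) -> int:
    """Count exact CENP-B box matches on the given strand and on its reverse complement."""
    seq = seq.upper()
    forward = len(CENPB_BOX.findall(seq))
    reverse = len(CENPB_BOX.findall(reverse_complement(seq)))
    return forward + reverse


def cenpb_box_density(seq: str) -> float:
    """Hypothetical summary metric: CENP-B boxes per kilobase of sequence."""
    if not seq:
        return 0.0
    return count_cenpb_boxes(seq) * 1000 / len(seq)


if __name__ == "__main__":
    box = "CTTCGGAAAAGTCGGGA"  # toy 17-bp sequence that matches the assumed consensus
    print(count_cenpb_boxes(box))                      # prints 1
    print(count_cenpb_boxes(reverse_complement(box)))  # prints 1 (found on the other strand)
    print(cenpb_box_density(box + "A" * 983))          # prints 1.0 (one box in 1 kb)
```

Exact matching keeps the sketch simple; comparing HOR arrays with and without functional boxes would require a mismatch-tolerant search.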

Tracts of repeat sequences tend to spontaneously adopt non-B-form DNA secondary structures, such as hairpins, R-loops, and cruciforms (Sharma 2011). Non-B-form DNA has been commonly observed in centromeric DNA from a variety of organisms, including humans, and is recognized by CENP-A chaperones (Kasinathan and Henikoff 2018). A recent study proposed that centromeres form at non-B-form DNA arising from dyad symmetries, in which inverted repeat sequences base pair with each other to form cruciform structures. However, species whose centromeric DNA lacks dyad symmetries, such as humans, mice, and budding yeast, recruit sequence-specific DNA-binding proteins, such as CENP-B, to promote the formation of non-B-form DNA secondary structures (Kasinathan and Henikoff 2018).
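A dyad symmetry in this sense is an inverted repeat: a stretch of sequence followed, after a short spacer, by its reverse complement, so that the two arms can base pair within one strand to form a hairpin (and, on both strands, a cruciform). The Python sketch below finds perfect inverted repeats in a sequence; the function names, stem length, and loop limits are illustrative assumptions, and it is a simplified illustration rather than the analysis pipeline of the cited study.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT symbols are left unchanged)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 20):
    """Return (left_start, right_start, stem) for perfect inverted repeats.

    An arm seq[i:i+stem] is reported when its reverse complement occurs downstream
    after a spacer (loop) of min_loop..max_loop bases; such arms could base pair
    within one strand to form a hairpin. Default parameters are illustrative
    assumptions, and mismatches are not tolerated.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        arm = seq[i:i + stem]
        if set(arm) - set("ACGT"):
            continue  # skip arms that contain ambiguous bases such as N
        target = reverse_complement(arm)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # the right arm would run past the end of the sequence
            if seq[j:j + stem] == target:
                hits.append((i, j, stem))
    return hits


if __name__ == "__main__":
    # Toy hairpin: arm ACGGTA, loop TTTT, then its reverse complement TACCGT.
    print(find_inverted_repeats("ACGGTA" + "TTTT" + "TACCGT"))  # prints [(0, 10, 6)]
```

The brute-force scan costs on the order of sequence length times loop range comparisons, which suits illustration but not genome-scale analysis.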

Sequence specificity

For neocentromeres formed on non-centromeric endogenous DNA, there is no sequence similarity to native centromeric sequences, suggesting that DNA sequence is neither necessary nor sufficient for centromere establishment. Instead, the localization of neocentromeres can clearly be influenced by the epigenetic environment at the locus. In the section “Epigenetic effect on centromere,” we will discuss the epigenetic environment of centromere establishment by dissecting the neocentromeric chromatin into two regions, as in native centromeres: (1) the core centromeric chromatin, where ectopic CENP-A is deposited (section “Chromatin environment for CENP-A centromeric chromatin establishment”), and (2) the flanking chromatin (section “Flanking chromatin environment for centromeric chromatin establishment”).

Epigenetic effect on centromere

Chromatin environment for CENP-A centromeric chromatin establishment

Histone modifications

A number of studies have shown that euchromatic histone marks favor centromere establishment (Bergmann et al. 2011; Nakano et al. 2008; Ohzeki et al. 2012, 2015). By tethering different histone modifiers onto a tetO array, which replaces alternating CENP-B binding sites on the alphoid DNA in HACs, H3K9ac (a euchromatin mark) was found to promote de novo CENP-A incorporation and stable HAC formation, whereas H3K9me3 (a heterochromatin mark) inhibits the process (Table 4) (Ohzeki et al. 2012). Yet, heterochromatin marks enriched in the regions flanking the centromere could help to define centromere boundaries (Sharma et al. 2019). Similarly, histone H3 and H4 acetylation on C. elegans ACs facilitates de novo centromere formation (Zhu et al. 2018), whereas heterochromatin inhibits the formation of de novo centromeres on ACs in C. elegans (Table 4) (Yuen et al. 2011). These findings suggest that a generally open chromatin environment is favorable for centromere establishment.

Table 4 Summary of the epigenetic effects on centromere establishment or CENP-A incorporation

In a human chromosome in which ectopic alphoid DNA from a YAC is integrated at the terminal region, centromere activity on the integrated alphoid DNA is suppressed. However, inducing histone hyperacetylation by treating cells with the histone deacetylase inhibitor trichostatin A (TSA) promotes the assembly of CENP-A on the integrated alphoid DNA and the release of minichromosomes from the YAC integration site (Nakano et al. 2003). Beyond centromere establishment, euchromatin is also involved in maintaining the stability of de novo centromeres. On HACs that have already propagated through mitoses, depletion of the euchromatin mark H3K4me2 by tethering lysine-specific demethylase 1 or 2 (LSD1/2) to HAC centromeres causes defective incorporation of CENP-A on the alphoid DNA, resulting in a gradual loss of kinetochore function (Table 4) (Bergmann et al. 2011; Molina et al. 2016). It has been proposed that H3K4me2 may help to maintain an open chromatin environment for the incorporation of CENP-A nucleosomes, facilitating centromere propagation (Stimpson and Sullivan 2011).

Surprisingly, in a study using chicken DT40 cells, in which neocentromere formation is induced by deleting the native centromere of chromosome Z through loxP-mediated recombination, chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) indicated that the neocentromeres are associated with neither the euchromatin mark H3K4me2 nor the heterochromatin mark H3K9me3 (Shang et al. 2013). The reason for these different epigenetic features is unclear but may be related to differences between organisms (Table 4) or between de novo centromeres and neocentromeres (Table 1). Moreover, the epigenetic state may change dynamically during the establishment of a new centromere or after it has been stabilized; thus, experimental results may vary depending on when the chromatin status is analyzed.

DNA modifications

On the endogenous maize B chromosome, hypomethylated centromeric DNA functions normally, whereas hypermethylation of centromeric DNA causes centromere inactivation (Koo et al. 2011). At the neocentromere on the human chromosome mardel(10) (a complex rearrangement of chromosome 10 with a complete loss of alphoid DNA, found in a child with developmental delay) (Voullaire et al. 1993), DNA methylation is increased compared with the 10q25 locus on the normal chromosome 10 (Wong et al. 2006). The presence of DNA hypermethylation at the neocentromere, which may recruit chromatin modifiers that form heterochromatin (Lorincz et al. 2004; Stirzaker et al. 2004), seems to contradict the hypothesis above that heterochromatin antagonizes centromere establishment in HACs (Ohzeki et al. 2012). However, DNA hypermethylation at centromeric satellite repeats can prevent illicit mitotic recombination at centromeres in mice, suggesting that DNA hypermethylation may be essential only for centromere maintenance (Jaco et al. 2008). Moreover, the mardel(10) study identified hypomethylated pockets within the hypermethylated neocentromeric DNA, suggesting that the neocentromere retains some euchromatic properties, which may support its formation or function (Table 4) (Wong et al. 2006).

Euchromatin (open chromatin) and non-coding RNA

In line with the notion that de novo centromeric chromatin on HACs is euchromatic (Bergmann et al. 2011; Ohzeki et al. 2012), RNA polymerase II–driven transcription on C. elegans ACs also facilitates de novo centromere formation (Zhu et al. 2018). Neocentromeres are also often found in transcriptionally active DNA or genic regions. In the mardel(10) neocentromere, the CENP-A domain overlaps the actively expressed ATRNL1 gene, and neocentromere formation did not significantly affect the expression of this gene (Fig. 2, Table 4) (Saffery et al. 2003). A similar observation has been made for a native centromere in rice, where at least four active genes are interspersed within Cen8 (Nagaki et al. 2004). In HACs, the CENP-A domain is not restricted to the alphoid DNA but also spreads over neighboring, active selection marker genes (Lam et al. 2006). Colorectal cancer cell lines overexpressing CENP-A show ectopic incorporation of CENP-A at DNase I–hypersensitive sites and transcription factor binding sites, suggesting that CENP-A prefers to be deposited in euchromatic, transcriptionally active regions (Athwal et al. 2015). Indeed, on endogenous human chromosomes, a low level of ectopic CENP-A was assembled at transcriptionally active sites in euchromatin, but it was removed after DNA replication (Nechemia-Arbely et al. 2019).

In contrast, C. albicans neocentromeres can be found in large intergenic regions, and neocentromere activity can silence nearby active genes (Ketel et al. 2009). Similarly, in fission yeast, neocentromeres form at subtelomeric heterochromatin-euchromatin boundaries after deletion of endogenous centromeric DNA (Fig. 2, Table 4) (Ishii et al. 2008). A number of nitrogen starvation-induced genes map to these regions, and their expression remained low even after nitrogen was removed from the medium (Ishii et al. 2008), indicating that neocentromere activity may not be compatible with high gene expression. However, neocentromeres generated experimentally are often identified under selective pressure. Therefore, whether a gene within or near a neocentromere is silenced may depend on the combined growth advantage conferred by maintaining the neocentromere and, concurrently, by silencing or expressing the nearby genes.

If neocentromeres are transcriptionally incompatible in C. albicans and fission yeast, why do they preferentially form in euchromatin in the first place? Species with relatively compact genomes, in particular, have fewer intergenic regions, which might limit the locations available for neocentromere establishment. It is also possible that euchromatin provides a more open environment for initial CENP-A deposition, promoting neocentromere formation. After the new CENP-A domain is established, however, the underlying transcriptional activity at the neocentromere may be reduced or even silenced, potentially to maintain its stability. In fact, studies in budding yeast have suggested that a finely balanced level of centromeric transcription is essential for normal centromere function (Hill and Bloom 1987; Ling and Yuen 2019a, b; Ohkuni and Kitagawa 2012). Similarly, tethering either a transcriptional activator or a silencer to a propagating HAC centromere can disrupt kinetochore structure, resulting in missegregation of HACs (Cardinale et al. 2009; Nakano et al. 2008).

If euchromatin provides a more open chromatin environment for de novo CENP-A deposition, such an environment would also allow RNA to be expressed from the neocentromeric region. In fact, transcripts originating from the centromeric core region may have a direct role in centromere function. In S. pombe, the accumulation of RNAPII at centromeric DNA in G2 phase destabilizes H3 nucleosomes, which, in turn, facilitates new CENP-A deposition (Shukla et al. 2018). Although the central-core centromere sequences of two other fission yeasts, Schizosaccharomyces octosporus and Schizosaccharomyces cryophilus, are not homologous to those of S. pombe, they support de novo CENP-A deposition in S. pombe (Tong et al. 2019). This result suggests that these centromeric DNAs share some conserved properties, such as a transcriptional landscape that promotes centromere establishment. In native centromeres, centromeric transcripts promote new CENP-A loading in human cells (Bobkov et al. 2018; McNulty et al. 2017); enhance the interaction of the inner kinetochore protein CENP-C with centromeric DNA in human cells (McNulty et al. 2017) and in maize (Du et al. 2010) (Table 4); and regulate the activity and localization of Aurora B, a component of the chromosomal passenger complex (CPC) (Blower 2016; Ferri et al. 2009; Ideue et al. 2014). Importantly, both too much and too little centromeric transcription are detrimental. In human cells, SUV39 histone methyltransferases interact with α-satellite RNA transcripts to establish heterochromatin and ensure genomic stability (Johnson et al. 2017). Knockdown of centromeric transcripts resulted in mitotic defects in human cells (Ideue et al. 2014) and minichromosome loss in budding yeast (Ling and Yuen 2019b). On the other hand, overexpression of centromeric transcripts caused chromosomal missegregation in mouse and human cells (Bouzinba-Segard et al. 2006; Chan et al. 2017) and decreased the centromeric localization of CENP-A, CENP-C, and Aurora B in budding yeast (Ling and Yuen 2019b). Nonetheless, whether centromeric transcripts participate directly in centromere establishment is unknown, as these studies focused on the function of existing centromeres.

Flanking chromatin environment for centromeric chromatin establishment

Pericentric heterochromatin

The requirement of flanking heterochromatin for centromere establishment is best illustrated in S. pombe. Both the central core and at least part of the flanking centromeric domains (otr) are required for de novo centromere formation on the SpYAC (Tables 2 and 3) (Folco et al. 2008). Non-coding RNA transcription at the pericentric heterochromatin and processing of the transcripts by the RNA interference pathway are important for the establishment of pericentric heterochromatin (Allshire and Ekwall 2015a; Folco et al. 2008; Kagansky et al. 2009). In turn, heterochromatin facilitates the initial recruitment of CENP-ACnp1 to newly introduced naïve DNA, but once the de novo centromere is established, CENP-ACnp1 can be propagated in the absence of heterochromatin (Folco et al. 2008). Other studies suggested that the cohesin complex, associated with the flanking pericentric heterochromatin, helps to maintain sister chromatid cohesion and may also play a role in centromere function (Bernard et al. 2001; Nonaka et al. 2002). In Drosophila S2 cells, in which neocentromeres can be generated by overexpression of CENP-A, ectopic CENP-A is always found at heterochromatin-euchromatin boundaries (either near telomeres or near pericentromeric heterochromatin), suggesting that such junctions are favorable for ectopic CENP-A loading (Fig. 2) (Olszak et al. 2011). In addition, in the human cell line BBB, which contains a neocentromere in band 13q33.1 (Warburton et al. 2000), ChIP analysis of the heterochromatin histone mark H3K9me3 revealed a small block (~15 kb) of heterochromatin close to the CENP-A domain (Alonso et al. 2010).

Given the plausible function of flanking heterochromatin in centromere establishment, one might predict that certain heterochromatin histone marks favor centromere establishment. However, heterochromatin is not necessary for the deposition of centromeric proteins on the alphoid sequence in HACs (Nakashima et al. 2005). Moreover, inhibition of heterochromatin protein 1 (HP1) accelerated de novo CENP-A incorporation in HACs (Ohzeki et al. 2012) and WACs (Yuen et al. 2011). Yet, heterochromatin formed outside the centrochromatin is essential for establishing a stable HAC (Nakashima et al. 2005). Apart from the neocentromere in the BBB cell line, heterochromatin domains are uncommon at human neocentromeres, indicating that heterochromatin, or its associated cohesion activity, is not a prerequisite for centromere establishment per se (Alonso et al. 2010). As a consequence, however, neocentric sister chromatids always separate prematurely (Alonso et al. 2010).

Gene context, transcription, and non-coding RNA

Although the importance of flanking heterochromatin for centromere establishment is not well understood, a study in S. pombe identified a role for pericentromeric transcripts in regulating heterochromatin formation (Verdel et al. 2004), which, in turn, governs de novo centromere formation (Folco et al. 2008). Double-stranded non-coding RNAs (ncRNAs) are expressed from the pericentromeric region (outer repeats, otr), and these ncRNAs are processed into small interfering RNAs (siRNAs) by Dicer (Dcr1) and the RNA interference (RNAi) effector complex. The RNA-induced initiation of transcriptional gene silencing (RITS) complex is recruited to the pericentromeric region by the siRNAs. The RITS complex then recruits the methyltransferase Clr4 to promote H3K9 methylation and heterochromatin formation through HP1Swi6 (Verdel et al. 2004; Zhang et al. 2008). If the formation of pericentromeric heterochromatin is disrupted, either by deletion of HP1Swi6 or of RNAi components (Chp1 or Dcr1), CENP-A deposition on introduced naked centromeric DNA is impaired (Folco et al. 2008), indicating that the integrity of pericentromeric heterochromatin is important for centromere establishment and that non-coding pericentromeric transcripts are the key initial players in this process. By tethering the H3K9 methyltransferase Clr4 to the region flanking the centromeric DNA, pericentromeric heterochromatin can be formed without RNAi. This synthetic heterochromatin bypasses the need for RNAi in promoting de novo centromere formation (Kagansky et al. 2009).

Although pericentromeric transcripts are also found in many other organisms, such as humans and mice, they seem to be long ncRNAs (lncRNAs; >1 kb, up to 5 kb) rather than siRNAs as in S. pombe (Jolly et al. 2004; Lu and Gilbert 2007; Rudert et al. 1995; Saksouk et al. 2015; Valgardsdottir et al. 2008). Dicer deficiency causes defects in cohesin localization in chicken DT40 cells (Fukagawa et al. 2004) and in cell proliferation, viability, and differentiation in mice (Kanellopoulou et al. 2005; Maison et al. 2002; Murchison et al. 2005). However, it is unclear whether these pericentromeric lncRNA transcripts are processed by the RNAi machinery as in S. pombe, and whether RNAi-mediated pericentromeric heterochromatin formation is conserved in all regional monocentromeres. Nonetheless, long pericentromeric transcripts are also able to interact with HP1 in mouse cells (Maison et al. 2011), suggesting that pericentromeric transcripts may still regulate heterochromatin formation in an RNAi-independent manner. It will be exciting to see more studies investigating the function of these long pericentromeric transcripts in centromere establishment.

Future perspectives

Interplay between centromeric and flanking chromatin in centromere establishment

Taken together, one common characteristic of neocentromeres across multiple models is that they are established on DNA with no homology to native centromeric DNA sequences, preferentially close to the original centromere location or to repeats, indicating that the location of ectopic centromeres is controlled not merely by sequence but also by position and chromatin environment. Results from Drosophila, fission yeast, and humans suggest that neocentromeres may prefer to form at the junction of euchromatin and heterochromatin (Fig. 2) (Ishii et al. 2008; Olszak et al. 2011; Wong et al. 2006). A euchromatic environment permissive for transcription is observed in new CENP-A domains in human cells (Bergmann et al. 2011; Ohzeki et al. 2012; Saffery et al. 2003). On the other hand, experimental data in Candida and chicken cells suggest an inverse relationship between robust gene expression and neocentromere formation (Shang et al. 2013; Thakur and Sanyal 2013). Moreover, while flanking heterochromatin is important for de novo centromere formation in fission yeast (Allshire and Ekwall 2015a; Folco et al. 2008; Kagansky et al. 2009), it is not always a prerequisite for de novo centromere formation (Ohzeki et al. 2012; Yuen et al. 2011). For a cis-acting chromosomal locus that is critical for organism survival, such structural diversity is surprising. How centromeric and flanking chromatin environments interact in centromere establishment requires further investigation.

Cancer patient–derived cell models for neocentromere identification

Chromosomal instability has been suggested to promote tumorigenesis, but exactly how a normal cell progresses into a cancer cell remains unclear. Cancer patient–derived cell lines show a high frequency of genome rearrangements, neochromosomes, and neocentromeres (Goodspeed et al. 2016). A recent study used fluorescent dyes to mark neochromosomes, flow-sorted chromosomes to isolate and enrich the neochromosomes, and then applied high-throughput sequencing to map chromosomal rearrangement positions at single-base-pair resolution (Garsed et al. 2014). Such methods help elucidate the composition of neochromosomes and trace the chromosomal breakage and rearrangement events that produced them (Papenfuss and Thomas 2015). Moreover, the long reads generated by third-generation sequencing technologies not only allow researchers to resolve highly repetitive centromeric DNA (Jain et al. 2018; Mahajan et al. 2018) but also enable more reliable identification of indels and chromosomal rearrangements in tumor genomes and a detailed description of their architecture.

A combination of centromere protein immunofluorescence, centromere fluorescence in situ hybridization (FISH), and multicolor FISH would also enhance the identification of neocentromeres (Beh et al. 2016). In addition, neocentromeres on neochromosomes or RGM chromosomes can be precisely mapped by high-resolution ChIP-seq using antibodies against centromeric proteins, e.g., CENP-A. Because ChIP-seq resolves neocentromeres at much higher resolution than FISH, it complements the traditional cytological approach. Such a combination of cytological and high-throughput sequencing techniques should accelerate efforts to unravel the relationship between chromosomal rearrangement and neocentromere formation.

New model organisms for studying centromere establishment on artificial chromosomes

It is often technically difficult to distinguish centromere establishment from centromere maintenance. Many of the neocentromeres analyzed were obtained under selective pressure or had already been propagated in cells for many rounds of mitotic division or for many generations. It would therefore be informative to analyze the chromatin requirements for neocentromere formation in living cells and in real time, ideally without artificial selection, so as to follow the process of centromere establishment and the fate of the neocentromere, and to distinguish establishment from the maintenance of an existing centromere through mitosis or meiosis. In C. elegans, artificial chromosomes can be traced specifically by injecting LacO DNA and expressing a LacI::GFP fusion protein (Yuen et al. 2011; Zhu et al. 2018), similar to the TetO/TetR system used in HACs (Bergmann et al. 2011, 2012; Ohzeki et al. 2012). Centromere establishment is more efficient on artificial chromosomes in C. elegans than on HACs. Live-cell imaging without selection can be used to observe the artificial chromosomes before and after de novo centromere formation. As in HACs, the epigenetic environment of WACs can be manipulated by tethering histone modifiers (Bergmann et al. 2011, 2012; Ohzeki et al. 2012). With such complementary systems, we expect that the genetic and epigenetic requirements for centromere establishment can be systematically dissected in the future.

Improving HACs for gene therapy

HACs can be used as vectors to deliver a functional copy of a gene for gene therapy. Compared with some virus-based vectors, HACs can carry large gene inserts and do not integrate into the host genome, thereby avoiding insertional mutagenesis. For successful and sustainable gene therapy, the HAC must maintain a functional centromere and segregate faithfully. To avoid potential side effects from the introduced DNA, it is advantageous to minimize the sequence requirements of the HAC. In the past, most HACs required alphoid DNA sequences for efficient chromosome assembly. In the bottom-up approach, alphoid DNA, usually 60–70 kb in length (Okamoto et al. 2007), is transfected into cells, where it multimerizes to form HACs of various sizes. Because the location and copy number of a gene assembled into such a HAC cannot be predetermined, the use of these HACs in gene therapy may be limited (Kim et al. 2011). In the top-down approach, the alphoid array, telomeric sequences, and gene insertion site are cloned (Kazuki and Oshimura 2011). Elucidating the mechanism of de novo centromere formation may allow the generation of smaller de novo centromere-based HACs with a defined centromere sequence. The most recent HACs are built on non-repetitive sequence, bypassing the need for centromeric DNA and CENP-B (Logsdon et al. 2019), which has greatly streamlined the construction and characterization of HACs. A better understanding of the epigenetic environment required for centromere formation may even make it possible to manipulate the histone environment to facilitate centromere establishment on HACs or to improve the stability of HAC centromeres. With improvements in microcell-mediated chromosome transfer (MMCT) for introducing HACs into target cells (Brown et al. 2017; Hiratsuka et al. 2015; Liskovykh et al. 2016; Suzuki et al. 2016), we anticipate that these efforts will contribute to the development of effective gene delivery vectors for therapeutic and research applications.