Background

Centromeres play a key role in chromosomal segregation and transmission. Centromeric chromatin is marked by the presence of CENH3 (CENP-A in humans), a centromere-specific H3 variant. In most eukaryotes, centromeres contain highly repetitive DNA sequences, mostly satellite repeats and transposable elements (Henikoff et al. 2001; Jiang et al. 2003). The repetitive DNA sequences, however, are not essential for centromere function because centromeres can be activated de novo in genomic regions devoid of centromeric repeats; such centromeres are often called “neocentromeres.” Neocentromeres were first discovered in humans (Voullaire et al. 1993) and have since been reported in a number of plant and animal species (Fu et al. 2013; Liu et al. 2015; Marshall et al. 2008; Nasuda et al. 2005; Rocchi et al. 2012; Sullivan and Schwartz 1995; Tolomeo et al. 2017; Topp et al. 2009). Neocentromeres are functional centromeres and are marked by the presence of CENH3/CENP-A as well as all other essential centromeric proteins (Saffery et al. 2000; Scott and Sullivan 2014). It is hypothesized that repeat-based centromeres evolved from neocentromeres via invasion of repetitive DNA sequences (Gong et al. 2012; Zhang et al. 2014).

More than 100 neocentromeres have been reported in humans (Marshall et al. 2008; Scott and Sullivan 2014). Most human neocentromeres were associated with significantly rearranged “marker chromosomes” discovered in human clinical cases. However, a few human neocentromeres were found on normal chromosomes that also contained an inactivated native centromere (Marshall et al. 2008; Rocchi et al. 2012); such cases are also described as “centromere repositioning” events. Some genomic regions are particularly prone to human neocentromere activation (Marshall et al. 2008), suggesting that neocentromere seeding is not a random process, yet it is not clear what genomic or epigenomic feature(s), if any, of the cognate chromosomal regions determine the seeding of neocentromeres. Interestingly, some human neocentromeres are located in regions that are predicted to be ancestral centromeres. These ancestral centromeres were inactivated during mammalian chromosomal evolution (Rocchi et al. 2012). Thus, regions associated with inactivated centromeres may maintain intrinsic genomic/epigenomic features that are favorable for neocentromere formation.

Neocentromeres can be recovered when the native centromere of a eukaryotic chromosome is conditionally deleted. Artificial neocentromere activation experiments were conducted in several model organisms, including Schizosaccharomyces pombe (Ishii et al. 2008), Candida albicans (Ketel et al. 2009; Thakur and Sanyal 2013), and chicken (Gallus gallus) (Shang et al. 2013). Over 100 neocentromeres were induced at various locations on chicken chromosomes Z and 5, but no specific DNA sequences or motifs were associated with these neocentromeres (Shang et al. 2013). Induced neocentromeres on chromosome I of S. pombe were exclusively located near the ends of the chromosome, which represent the most distinct heterochromatic domains on the chromosome (Ishii et al. 2008). By contrast, in C. albicans and chicken, induced neocentromeres were mostly formed in close proximity to the native centromeres (Shang et al. 2013; Thakur and Sanyal 2013). Thus, results from induced neocentromeres in three different model organisms suggest that certain chromosomal regions have intrinsic genomic/epigenomic properties for neocentromere seeding.

Neocentromeres have been documented in several plant species, including barley (Nasuda et al. 2005), rice (Gong et al. 2009), oat (Topp et al. 2009), maize (Fu et al. 2013; Liu et al. 2015; Zhang et al. 2013), and wheat (Guo et al. 2016). Centromere repositioning events were also reported in plants (Han et al. 2009; Wang et al. 2014). Several centromeres of maize may have spontaneously shifted positions in different genetic backgrounds (Schneider et al. 2016). However, the sequence or epigenomic features that promote or sustain centromere movement are not known. Maize chromosome 3, including its centromere (Cen3), has been well sequenced (Jiao et al. 2017; Schnable et al. 2009). Evolutionarily, maize chromosome 3 was formed by fusion of two ancestral chromosomes and contains an inactivated ancestral centromere (Wang and Bennetzen 2012; Wei et al. 2007). Two independent neocentromeres associated with chromosome 3 were reported (Schneider et al. 2016; Wang et al. 2014). Here, we report another neocentromere associated with chromosome 3. Interestingly, all three neocentromeres were located within a specific chromosomal region that lies very close to the progenitor Cen3. We demonstrate that a lack of genes and transcription and a relatively high level of DNA methylation in this region may provide a favorable chromatin environment for neocentromere activation.

Results

The origin of the centromere of maize chromosome 3

Maize (2n = 2x = 20) and sorghum (2n = 2x = 20) split from a common ancestor approximately 12 million years ago (Swigonova et al. 2004). The maize genome initially contained 40 chromosomes after a genome-wide duplication event and underwent dramatic intra- and inter-chromosomal rearrangements, which resulted in the current karyotype with 20 chromosomes (Paterson et al. 2004; Wei et al. 2007). Reconstruction of maize progenitor chromosomes based on physical mapping and genome sequence analysis showed that the sorghum genome has maintained its original chromosome number and structure (Schnable et al. 2011; Wei et al. 2007). Thus, sorghum chromosomes are considered to maintain their synteny with the “ancient chromosomes” that predate the divergence of maize and sorghum. Comparative sequence analysis between maize and sorghum genomes indicated that maize chromosome 3 was derived from two ancient chromosomes, which are homologous to sorghum chromosomes 3 and 8, respectively (Wang and Bennetzen 2012) (Fig. 1a). Ancient chromosome 8 (achro. 8) inserted in the pericentromeric region of ancient chromosome 3 (achro. 3) and the fused chromosome underwent several intrachromosomal rearrangements, which resulted in the current maize chromosome 3 (Wang and Bennetzen 2012) (Fig. 1b).

Fig. 1

A model for the origin of maize Cen3 based on comparative sequence analysis. a A syntenic dotplot between maize chromosome 3 and sorghum chromosomes 3 and 8. Red dots represent syntenic gene pairs between maize and sorghum. Maize chromosome 3 contains seven syntenic blocks (1–7) with sorghum chromosome 8 and four syntenic blocks (a–d) with sorghum chromosome 3. Shaded regions represent the centromeric regions of the three chromosomes. b A model for the origin of maize Cen3 based on the synteny analysis. Maize chromosome 3 was derived from the insertion of ancient chromosome 8 into the pericentromeric region of ancient chromosome 3, followed by several inversions. The centromere of ancient chromosome 8 was inactivated after the fusion of the two chromosomes. Red circle: retained ancient centromere 3; pink ellipse: inactive ancient centromere 8. The positions of blocks 1–7 and a–d, identified from the syntenic dotplot, are marked under the chromosomes

The DNA composition of maize chromosome 3 (version 4 of B73 maize reference genome (Jiao et al. 2017)) can be revealed by syntenic analysis with sorghum chromosomes 3 and 8 (Fig. 1a). Based on this analysis, the short arm of achro. 3 (block “a” in Fig. 1) was retained nearly intact and represented the short arm of maize chromosome 3 (Fig. 1b). Achro. 8 became scrambled after the fusion. However, seven syntenic blocks (“1–7” in Fig. 1) derived from achro. 8 can be identified. The centromere of achro. 8 is predicted to be located between blocks 3 and 4 based on the centromere position on sorghum chromosome 8 (Fig. 1b). These two blocks have maintained the synteny between maize chromosome 3 and sorghum chromosome 8. These results suggest that the centromere of maize chromosome 3 (Cen3) was derived from achro. 3 and the centromere of achro. 8 was inactivated after the fusion of the two chromosomes (Fig. 1b).

Confirmation of the chromosomal position of Cen3

The CENH3-binding domain of Cen3 was previously mapped to 99.8–100.8 Mb on chromosome 3 (version 3 of the B73 reference genome) (Wang et al. 2014). However, this domain has recently been moved to 85.8–86.9 Mb on the version 4 pseudomolecule (http://ensembl.gramene.org/Zea_mays/Info/Index) (AGPv4 maize assembly). We sought to determine whether the version 3 or the version 4 pseudomolecule has the correct position of the CentC repeat, the landmark for maize centromeres (Zhong et al. 2002), on maize chromosome 3. We designed two single copy DNA probes, “L” and “R” (Table S1). Both probes were predicted to be located on the short arm, 17 and 5 Mb, respectively, from the CentC array within Cen3 in the version 3 genome (Fig. 2a). In version 4, however, probe R was predicted to be located on the long arm, and the two probes were 2 and 17 Mb, respectively, from the CentC array (Fig. 2a). Fluorescence in situ hybridization (FISH) using L, R, and CentC probes revealed that the L probe resided on the short arm, close to the CentC repeats. In contrast, probe R resided on the long arm and was far away from the CentC repeats (Fig. 2b). Thus, the cytological positions of these three probes matched well with their relative positions on the version 4 pseudomolecule.

Fig. 2

FISH-based confirmation of the chromosomal position of Cen3. a The predicted relative chromosomal positions of the CentC satellite repeat and two single copy DNA sequence probes (L and R) on version 3 and version 4 of the B73 reference genome. b FISH mapping of the three DNA probes on pachytene chromosome 3 of B73 maize. The L and R probes were detected in green and red colors, respectively. The CentC repeat was detected by white color. The FISH results matched the predicted positions based on the version 4 reference genome. Bar = 10 μm

DNA compositions and genes flanking Cen3 and an inactive centromere

We performed a detailed analysis of the gene content within and surrounding Cen3 and the inactivated centromere derived from achro. 8. Based on the conservation of homologous maize and sorghum gene pairs, the 63.7–88.4 Mb region of maize chromosome 3 is syntenic to 17.1–45.3 Mb of sorghum chromosome 3 (Fig. 3b). Similarly, 88.8–103.1 Mb region of maize chromosome 3 is syntenic to 12–39.7 Mb of sorghum chromosome 8 (Fig. 3b). The fusion junction between achro. 3 and achro. 8 is predicted to be located between 88.4 and 88.8 Mb. For convenience in description, these two regions were named as P1 and P2, respectively (Fig. 3a).

Fig. 3

Evolution of ancient Cen3 and Cen8. a Predicted position of maize centromere 3 and an inactivated centromere derived from ancient chromosome 8. Blue rectangles, chromosome fragments derived from ancient chromosome 8. Green rectangles, chromosome fragments derived from ancient chromosome 3. Solid red circle, the B73-like maize Cen3 that was derived from ancient Cen3. Pink ellipse, inactive ancient Cen8. P1 and P2, included by two red boxes, are two regions that span Cen3 and the inactivated Cen8, respectively. b Maize-sorghum gene pairs that flank Cen3 and the inactivated Cen8. The diagram shows the P1 and P2 regions of maize chromosome 3 and their corresponding genomic regions of sorghum, which span the sorghum centromeres 3 and 8, respectively. Short black bars represent annotated genes in the maize and sorghum reference genomes. Short red bars represent gene pairs identified in both species. Each homologous maize-sorghum gene pair is connected by a black line. The red square and pink ellipse mark the positions of maize Cen3 and inactivated Cen8, respectively. Solid purple circles mark the positions of sorghum Cen3 and Cen8, respectively

The sorghum genomic region corresponding to P1 spans the centromere of sorghum chromosome 3 (Fig. 3b). Well aligned maize-sorghum gene pairs were identified along both sides of the two Cen3s, although the order of the gene pairs became more disrupted toward the centromeres (Fig. 3b, Fig. S1). These results support the conclusion that maize Cen3 is orthologous to sorghum Cen3.

The centromere of achro. 8 is located between synteny blocks “3” and “4” (Fig. 1b). Thus, P2 is predicted to contain an inactivated centromere derived from achro. 8 (Fig. 3a). P2 and its corresponding genomic region of sorghum chromosome 8 contained a number of well conserved gene pairs (Fig. 3b). Interestingly, the order of these genes was reversed in the two genomes, suggesting that an inversion spanning the ancient Cen8 occurred during evolution. P2 included 14.3 Mb of DNA in maize. Strikingly, its corresponding sorghum genomic region included 27.7 Mb of DNA (Fig. 3b). These results suggested that the P2 region may have lost most of the repetitive DNA originally associated with the ancient Cen8. By contrast, P1 and its corresponding sorghum genomic region, which spans sorghum Cen3, included 24.7 and 28.2 Mb, respectively (Fig. 3b). Collectively, these results support the conclusion that maize Cen3 represents ancient Cen3, whereas ancient Cen8 was inactivated and lost its cognate centromeric repeats during the evolution of maize chromosome 3 over the last 12 million years.
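
The size comparison above follows directly from the region coordinates given earlier in this section. The short sketch below recomputes the spans and the maize-to-sorghum size ratio; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Region coordinates in Mb, as quoted in the text for P1 and P2.
REGIONS = {
    "P1": {"maize": (63.7, 88.4), "sorghum": (17.1, 45.3)},
    "P2": {"maize": (88.8, 103.1), "sorghum": (12.0, 39.7)},
}


def span_mb(interval):
    """Return the length of a (start, end) interval in Mb, rounded to 0.1 Mb."""
    start, end = interval
    return round(end - start, 1)


def size_ratio(region):
    """Return the maize span divided by the sorghum span for one region."""
    return span_mb(region["maize"]) / span_mb(region["sorghum"])


for name, region in REGIONS.items():
    print(name, span_mb(region["maize"]), span_mb(region["sorghum"]),
          round(size_ratio(region), 2))
```

With these coordinates, P2 in maize is roughly half the size of its sorghum counterpart, whereas P1 is close to the size of the corresponding sorghum region.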

Recurrent activation of a neocentromere in the pericentromeric region of chromosome 3

The CENH3-binding domain of maize Cen3 was mapped to 85.8–86.9 Mb (Fig. 4a). Maize chromosome 3 was transferred into the genetic background of oat through oat-maize cross and backcrosses (Kynast et al. 2001). Surprisingly, maize Cen3 in the oat-maize chromosome addition line OMA3.01 moved to a new position in the pericentromeric region toward the short arm (Topp et al. 2009; Wang et al. 2014) (Fig. 4b). The original maize Cen3 was inactivated although it still contained the CentC satellite repeats. The CENH3-binding domain of chromosome 3 moved from 85.8–86.9 to 79.6–84.7 Mb (Fig. 4b). The cause of this centromere repositioning event was not known. Interestingly, the CENH3-binding domains of all maize centromeres were expanded from ~ 1.8 to ~ 3.6 Mb in the oat background (Wang et al. 2014). It is likely that the expansion of Cen3 in oat was hindered by the presence of large transcription domains flanking the original CENH3-binding domain. Therefore, repositioning of Cen3 in the oat background became an alternative path to reach the expansion (Wang et al. 2014).

Fig. 4

The CENH3-binding domains on chromosome 3 in five different maize lines. a The CENH3-binding domain associated with Cen3 in maize inbred B73. b The CENH3-binding domain associated with a repositioned Cen3 in oat-maize chromosome 3 (from maize line Seneca 60) addition line OMA3.01 (data from Wang et al. 2014). c The CENH3-binding domain associated with the Cen3 in maize inbred P39 (data from Schneider et al. 2016). d The CENH3-binding domain associated with the Cen3 in maize line Dp3a (data from Fu et al. 2013). e The CENH3-binding domain associated with the Cen3 in maize line ax-3. The CENH3-binding domain associated with the presumptive original Cen3 is shaded with pink. The region associated with the predicted inactivated ancient centromere 8 is shaded with green. The latent region is shaded with light blue. The two dark-blue shaded regions represent two different activation centers

Recent mapping of centromere positions in 26 maize inbreds revealed a number of centromere repositioning events (Schneider et al. 2016). The position of the CENH3-binding domain of the Cen3 in most inbreds overlaps with that of B73. Interestingly, the CENH3-binding domain in one inbred, P39, was mapped to 80.6–82.8 Mb, which is approximately 3 Mb away from the Cen3 position in B73 (Fig. 4c). Strikingly, the profile of the CENH3-binding domain in P39 nearly perfectly overlaps with the center of the CENH3-binding domain in OMA3.01 (Fig. S2). The cause of Cen3 repositioning in P39 is not known. However, the authors suggested that many of the repositioning events were associated with the loss of CentC repeats (Schneider et al. 2016).

Two Cen3 repositioning events, which were likely induced by different mechanisms, resulted in formation of a neocentromere at the same location in P39 and OMA3.01. Intriguingly, P39 is an inbred sweet corn; and Seneca 60, which was the donor of maize chromosome 3 in the OMA3.01 line, was a hybrid sweet corn. We cannot exclude the possibility that P39 and Seneca 60 are related. Results from P39 and OMA3.01 suggest that maize chromosome 3 region around 80.6–82.8 Mb may provide an ideal chromatin environment for neocentromere activation.

Another independent neocentromere activation event

Searching of all previously published genome-wide CENH3 nucleosome mapping datasets in maize led to the discovery of another putative Cen3 repositioning event. Maize line Dp3a, originally derived from UV-treated materials (Stadler and Roman 1948), contains a small minichromosome derived from the long arm of chromosome 3. This small chromosome contains a de novo centromere over unique sequences that drives its transmission (Fu et al. 2013). Careful examination of data from chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) from this line revealed that the CENH3-binding domain of Cen3 in the normal chromosome 3 was present at 83.3–85.3 Mb, immediately flanking the original Cen3 (Fig. 4d). This domain partially overlapped with the CENH3-binding domain of Cen3 in OMA3.01 (Fig. S2).

The minichromosome in the Dp3a line contains genes A1 and Sh2, which cause purple color of the aleurone layer and plump starchy kernels. Since the minichromosome has a low transmission rate, Dp3a with its minichromosome was maintained by crossing with recessive colorless, shrunken maintainer lines (a1, sh2), including ax-3, which contain a small deletion in this region. Thus, the minichromosome can be monitored based on morphological markers present on the kernels. We hypothesized that a repositioned Cen3 was present in the maintainer lines. To test this hypothesis, we conducted a CENH3 ChIP-seq experiment using the most recent maintainer stock line, ax-3. We generated 22 million paired-end ChIP-seq reads and mapped 6.5 million unique reads to the B73 genome. Interestingly, we detected two major CENH3-binding domains on chromosome 3 of ax-3. The first CENH3 domain overlapped with the Cen3 in B73. The second CENH3 domain was mapped to the same region as that of the Dp3a line (Fig. 4e).

The CENH3-binding domain of Cen3 in the Dp3a line was mapped exclusively to 83.3–85.3 Mb (Fig. 4d). This result indicated that both copies of Cen3 in this line are present at this site. We performed single nucleotide polymorphism (SNP) analysis using the ChIP-seq sequences from the Dp3a line. The sequences from the 83.3–85.3 Mb region were heterozygous. Phasing of the haplotypes revealed that one of the haplotypes is the same as that of the maintainer line ax-3, confirming that one of the repositioned Cen3s in the Dp3a line was received from ax-3. We then analyzed the SNPs using the ChIP-seq sequences, which were mapped to 83.3–85.3 Mb, from ax-3, OMA3.01, and P39. The DNA sequences associated with the second haplotype of Cen3s in the Dp3a line were identical to those from OMA3.01 and P39 (Fig. S3). Thus, the data indicate two repositioned Cen3s in the Dp3a material analyzed. The second repositioned Cen3 was potentially received from another parental line used in the pedigree of the Dp3a line.

Genomic and epigenomic features associated with Cen3, the de novo centromeres, and the inactivated centromere

The recurrent neocentromere activations in 79.6–85.3 Mb prompted us to investigate if this region contains unique genomic or epigenomic features that would be favorable for CENH3 deposition. For convenience, we refer to 79.6–85.3 Mb as the latent region and the two CENH3-binding domains of P39 (80.6–82.8 Mb) (Fig. 4c) and the Dp3a line (83.3–85.3 Mb) (Fig. 4d) as two subcenters within the latent region. We first analyzed the chromosomal distribution of the CentC repeat and the CRM1/CRM2 elements, which are associated with most maize centromeres (Ananiev et al. 1998; Jin et al. 2004; Zhong et al. 2002). Unambiguous CentC signals were only detected in the Cen3 in B73 (Fig. 2b). CRM2 represents a young CRM subfamily and the majority of the CRM2 elements were mapped to the B73-like Cen3 (Fig. 5a), which agreed with the analyses based on other maize centromeres (Wolfgruber et al. 2009). By contrast, CRM1 represents an old CRM subfamily and the CRM1 elements were more broadly distributed in the pericentromeric region (Fig. 5a, b). No CRM2 and only a few CRM1 elements were detected in the neocentromeres highlighted in dark blue (Fig. 5). Similarly, only a single CRM2 element and no CRM1 element were detected in the inactivated centromere highlighted in green (Fig. 5).

Fig. 5

Distribution of CRM elements and transcripts in the centromeric and pericentromeric region of chromosome 3. a Distribution and age of full length CRM1 and CRM2 elements. The full length elements were identified and dated by estimating the number of nucleotide substitutions per site (κ) between the two LTRs of each element. b Distribution of all CRM1 and CRM2 elements. The 75–100 Mb region was divided into 100 kb windows and the number of CRM elements was calculated in each window. c Gene density and transcription in the centromeric and pericentromeric regions. Chromosome 3 was divided into 5 Mb windows with a step of 1 Mb. Number of genes and RNA-seq reads of each window was counted and plotted. Black line represents the distribution of gene number; red line represents the distribution of RNA-seq read number. The CENH3-binding domain associated with the original Cen3 is shaded with pink. The region associated with the predicted inactivated ancient centromere 8 is shaded with green. The latent region is shaded with light blue. The two dark-blue shaded regions represent two different activation centers
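
The sliding-window counting described in panel c (5 Mb windows advanced in 1 Mb steps) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the half-open window convention, and the truncation of windows at the chromosome end are all assumptions, and the feature positions in the example are made up.

```python
from bisect import bisect_left


def sliding_window_counts(positions, chrom_length,
                          window=5_000_000, step=1_000_000):
    """Count features in overlapping windows along one chromosome.

    positions    : feature start coordinates in bp (e.g. gene starts)
    chrom_length : chromosome length in bp
    Returns a list of (window_start, window_end, count) tuples, where each
    window is the half-open interval [window_start, window_end).
    """
    ordered = sorted(positions)
    counts = []
    for start in range(0, chrom_length, step):
        # Assumption: windows running past the chromosome end are truncated.
        end = min(start + window, chrom_length)
        n = bisect_left(ordered, end) - bisect_left(ordered, start)
        counts.append((start, end, n))
    return counts


# Hypothetical feature positions on a 7 Mb toy chromosome.
toy_positions = [500_000, 1_500_000, 4_999_999, 5_000_000]
for start, end, n in sliding_window_counts(toy_positions, 7_000_000):
    print(start, end, n)
```

The same function could be applied to RNA-seq read start positions to obtain a read-count profile on the same windows.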

The B73-like Cen3 and the two subcenters within the latent region represent the most gene-deficient regions on chromosome 3 (Fig. 5c, Fig. 6a). These three regions contained only 1, 10, and 9 genes, respectively. To compare these values with the gene densities in other regions on the chromosome, we masked these three regions and divided chromosome 3 into 2-Mb windows. We found that only 5% of the 2-Mb windows contain fewer than 10 genes. By contrast, the inactivated centromere, spanning 96.4–99.3 Mb, contained 25 genes, a gene density approximately two-fold higher than that of the two subcenters. Mapping of RNA-seq datasets on chromosome 3 matched well with gene annotation results (Fig. 5c). The overall transcriptional activity within the latent region was lower than that of the inactive centromere. Analysis of the RNA-seq data indicated 8 expressed genes in the inactive centromere but only 3 and 2 in the two subcenters of the latent region, respectively.
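
The gene-density comparison can be checked with the coordinates and gene counts quoted in the text. The sketch below makes two assumptions that the text does not state explicitly: that the B73 Cen3 count refers to its CENH3-binding domain (85.8–86.9 Mb), and that the counts of 10 and 9 genes are listed in the order P39 subcenter, then Dp3a subcenter.

```python
# (start_mb, end_mb, gene_count) for each region, using values from the text.
# Assumptions: Cen3 is taken as its CENH3-binding domain, and the gene counts
# 10 and 9 are assigned to the P39 and Dp3a subcenters in that order.
REGIONS = {
    "Cen3 (B73)": (85.8, 86.9, 1),
    "subcenter 1 (P39)": (80.6, 82.8, 10),
    "subcenter 2 (Dp3a)": (83.3, 85.3, 9),
    "inactivated centromere": (96.4, 99.3, 25),
}


def genes_per_mb(start_mb, end_mb, n_genes):
    """Return gene density in genes per Mb for one region."""
    return n_genes / (end_mb - start_mb)


densities = {name: genes_per_mb(*region) for name, region in REGIONS.items()}
for name, density in densities.items():
    print(f"{name}: {density:.1f} genes/Mb")
```

Under these assumptions the inactivated centromere comes out at roughly 8.6 genes/Mb against about 4.5 genes/Mb for each subcenter, consistent with the approximately two-fold difference described above.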

Fig. 6

DNA methylation and gene expression associated with maize chromosome 3. a Profiles of DNA methylation and gene expression along the entire length of chromosome 3. The chromosome was divided into 5 Mb windows with a step of 1 Mb. DNA methylation level and number of RNA-seq reads of each window were calculated and plotted. The CENH3-binding domain associated with the original Cen3 is shaded with pink. The region associated with the predicted inactivated ancient centromere 8 is shaded with green. The latent region is shaded with light blue. The two dark-blue shaded regions represent two different activation centers. b The 75–100 Mb region of chromosome 3 is enlarged. Note: Cen3 and the latent region showed a higher level of DNA methylation and a lower level of transcription than their flanking regions and the inactivated centromere. c Boxplot of DNA methylation levels in the four color-shaded regions in b. Genomic regions of the original centromere (pink), the two subcenters within the latent region (dark blue), and the inactivated centromere (green) were divided into 50 kb windows. DNA methylation level in each window was calculated. The distribution of the median DNA methylation level in each genomic region was determined by resampling 10,000 times with replacement. The same calculation was done for the entire maize genome as a comparison. Y-axis is the median DNA methylation level in each genomic region
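
The resampling step described for panel c can be sketched as a standard bootstrap of the median. This is a minimal illustration under stated assumptions: each resample is assumed to have the same size as the original set of windows, the function name and seed are arbitrary, and the example values are invented rather than taken from the data.

```python
import random
from statistics import median


def bootstrap_medians(window_levels, n_resamples=10_000, seed=0):
    """Return bootstrap medians of per-window methylation levels.

    window_levels : methylation level of each 50 kb window in one region
    n_resamples   : number of resamples drawn with replacement
    seed          : seed for reproducibility (an arbitrary choice here)
    """
    rng = random.Random(seed)
    n = len(window_levels)
    # Assumption: each resample has the same size as the original sample.
    return [median(rng.choices(window_levels, k=n)) for _ in range(n_resamples)]


# Hypothetical methylation levels for five windows of one region.
toy_levels = [0.70, 0.80, 0.85, 0.90, 0.95]
medians = bootstrap_medians(toy_levels, n_resamples=1_000)
print(min(medians), max(medians))
```

The resulting list of medians is what a boxplot per region would summarize; comparing regions then amounts to comparing these distributions.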

We then analyzed the DNA methylation associated with the three chromosomal regions. The Cen3 and the two subcenters within the latent region showed the highest levels of DNA methylation on the chromosome, which were significantly higher than the DNA methylation level in the inactivated centromere (Fig. 6b). Interestingly, region 89.5–91.4 Mb, which immediately flanks Cen3 on the long arm, represented the most gene-deficient region (4 genes/Mb) on the chromosome (Fig. 6a). However, the methylation level in this region was lower than that of the two subcenters as well as that of the inactivated centromere. Taken together, the lack of genes and transcription and a relatively higher level of DNA methylation in the latent region may provide a favorable chromatin environment for neocentromere formation.

Discussion

Neocentromeres associated with inactivated ancient centromeres

Several human neocentromere “hotspots” were associated with ancient centromeres that were inactivated during mammalian chromosome evolution (Capozzi et al. 2009; Kalitsis and Choo 2012; Ventura et al. 2004). For example, repeated neocentromere activation was found to be associated with region 15q24–26 of human chromosome 15 (Marshall et al. 2008). Comparative cytogenetic and genomics analyses revealed that this region contains an ancestral centromere that became inactivated about 25 million years ago (Ventura et al. 2003). This region contains segmentally duplicated DNA clusters that typically reside around pericentromeric regions of native human centromeres. Thus, the association between neocentromere and this inactivated ancestral centromere was proposed to be due to the persistence of recombinogenic duplications accrued within the ancient pericentromere, rather than the retention of “centromere-competent” sequences per se (Ventura et al. 2003). Chromosome 6p22.1 represents another region of neocentromere formation that is associated with an inactivated ancient centromere. However, only one neocentromere was reported in this region (Capozzi et al. 2009). The CENP-A binding domain of this neocentromere was precisely mapped. No peculiar sequence features, except for a massive clustering of tRNA, were found in this neocentromere (Capozzi et al. 2009).

The maize B73-like Cen3 likely represents the original (ancestral) centromere of this chromosome, which is supported by the fact that this region contains a CentC array and is highly enriched with CRM2 (Fig. 5), which are specifically associated with maize centromeres (Ananiev et al. 1998; Jin et al. 2004; Zhong et al. 2002). Maize chromosome 3 contains an inactivated ancestral centromere (Fig. 4). Comparative sequence analysis indicates that this inactivated centromere, possibly including its surrounding regions, has lost the typical genomic and epigenomic features associated with most native centromeres, which is reflected by the fact that this region has lost most of the repetitive DNA sequences associated with all maize and sorghum centromeres (Fig. 3). In a similar case, human chromosome 2 arose from fusion of two chromosomes after the divergence of the hominid and chimp lineages (Yunis and Prakash 1982). Remnant alpha satellite repeats, which are specific to human centromeres, can be detected in the inactivated centromere located on the long arm of chromosome 2 (Baldini et al. 1993). However, close examination of this chromosomal region indicated that this ancient centromere has lost the typical DNA composition and organization, such as higher-order repeat organization and recombinogenic duplicons, that are found in native human centromeres. No human neocentromeres have been reported to be associated with this inactivated centromere (Kalitsis and Choo 2012).

Results from both humans and maize suggest that the potential of neocentromere activation from an inactivated ancient centromere will depend on whether the inactivated centromere has maintained the intrinsic genomic and/or epigenomic features associated with the active centromeres in the same species. Such features may become decayed or lost completely during evolution.

Interpretation for neocentromere activation in regions at close proximity to native centromeres

Most induced neocentromeres on different C. albicans chromosomes were formed near the native centromeres (Thakur and Sanyal 2013). In chicken, 76% of the induced neocentromeres on the Z chromosome were close to the native centromere. Similarly, 97% of the neocentromeres induced on chromosome 5 were formed within 3 Mb of the native centromere (Shang et al. 2013). Shang et al. (2013) proposed that the preferential positions of chicken neocentromeres near the native centromeres are due to the fact that potential epigenetic marks, which would be favorable for centromere formation, are enriched around the original centromeres. In striking contrast, induced neocentromeres on chromosome I of S. pombe were located exclusively near the telomeric ends, far away from the native centromere. However, this preference for the telomeric ends may also be related to the distinct epigenetic marks associated with telomeric chromatin (Ishii et al. 2008).

Both the latent region and the inactivated centromere on maize chromosome 3 are located in close proximity to the native centromere (Fig. 6). Cen3 and the latent region share more similarities in genomic (genes and transcription) and epigenomic (DNA methylation) characteristics than Cen3 and the inactivated centromere do. Thus, the latent region provides a chromatin environment for CENH3 deposition similar to that of the native centromere. In addition, the latent region is closer to the native centromere than the inactivated centromere is (Fig. 6). Thus, both factors, distance to the native centromere and a favorable chromatin environment, may play a role in the recurrent establishment of de novo centromeres in the latent region.

It has been well documented that the pericentromeric regions of all maize chromosomes are nearly completely suppressed in meiotic recombination (Anderson et al. 2003). Neocentromeres may form at any site on the chromosomes with a favorable environment (Fu et al. 2013; Liu et al. 2015), but distally formed ones have the potential to form destructive anaphase bridges if recombination occurs between displaced centromeres in a heterozygote. By contrast, a chromosome with a neocentromere located close to the native centromere will be favored to survive: a heterozygote carrying one normal chromosome and one neocentric chromosome will not self-destruct because crossovers between the two centromeres are unlikely to occur (Lamb et al. 2007).

Materials and methods

FISH

Immature tassels were harvested from B73 plants grown in the greenhouse and fixed using 3:1 ethanol:glacial acetic acid. FISH on meiotic pachytene chromosomes was performed following published procedures (Koo et al. 2011). DNA probes were labeled with DNP-11-dUTP (PerkinElmer, Waltham, MA), digoxigenin-11-dUTP, and biotin-16-dUTP (Roche Diagnostics USA, Indianapolis, IN). The hybridization signals were detected with Alexa Fluor 488 streptavidin (Invitrogen, Carlsbad, CA) for biotin-labeled probes, and rhodamine-conjugated anti-digoxigenin (Roche Diagnostics USA, Indianapolis, IN) for the digoxigenin-labeled probe. The DNP-labeled probe was detected with rabbit anti-DNP, followed by amplification with a chicken anti-rabbit Alexa Fluor 647 antibody. Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) in VECTASHIELD antifade solution (Vector Laboratories, Burlingame, CA). The images were captured with a Zeiss Axioplan 2 microscope using a cooled CCD camera CoolSNAP HQ2 (Photometrics, Tucson, AZ) and AxioVision 4.8 software (Carl Zeiss Microscopy LLC, Thornwood, NY).

ChIP and ChIP-seq

Maize maintainer line “ax-3” was grown in the greenhouse under a 16/8 h light/dark photoperiod. Leaf tissues from 2-week-old seedlings were collected and ground into fine powder in liquid nitrogen. The resulting powder was suspended in the nucleus extraction buffer [10 mM potassium phosphate, 100 mM NaCl, 0.1% β-mercaptoethanol, 1/10 (v/v) Hexylene glycol (Sigma Cat # 112100)] for nuclei isolation. ChIP was performed following a published protocol (Nagaki et al. 2003). An antibody against centromeric histone H3 (Nagaki et al. 2004) was used in the ChIP experiment. ChIP-seq libraries for Illumina sequencing were constructed according to the protocol of “Preparing samples for ChIP sequencing of DNA” provided by Illumina. Briefly, both ends of the ChIPed DNA fragments were repaired using an End-It DNA end repair kit (Epicentre, ER0720). The “dA” base was then added to the 3′ ends of the end-repaired DNA fragments using Klenow fragment (New England BioLabs, M0212S), followed by Illumina adapter ligation for paired-end sequencing, using a quick ligase (New England BioLabs, M2200). Adapter-ligated DNA fragments were purified by running a 2% agarose gel in TAE buffer and were size-selected from 200 to 300 bp. The resulting DNA fragments were enriched by PCR with 13 cycles. The purified ChIP-seq libraries were ready for Illumina sequencing after passing quality validation. The library was sequenced using the Illumina HiSeq 2000 platform.

Mapping of CENH3-binding domains and synteny blocks between maize and sorghum

Adapters and nucleotides with a quality score below 30 were removed from raw sequencing reads with Cutadapt (Martin 2011). Trimmed reads were mapped to the B73 reference genome (AGPv4) with BWA-MEM (Li 2013) using default parameters. Alignments with a mapping quality of at least 50 were retained and converted to BAM format using SAMtools (Li et al. 2009) for further analysis. CENH3 ChIP data of maize inbred line P39 (SRR3018404), maize line Dp3a (SRR639499), and oat-maize chromosome 3 addition line OMA3.01 (SRR867050) were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/). CENH3-binding domains in each line were identified using SICER (Zang et al. 2009) with the following parameters: window size of 1000 bp, gap size of 3000 bp, effective genome fraction of 0.75, redundancy threshold of 1, fragment size of 150 bp, and FDR of 0.01. The synteny map between the maize version 3 reference genome (AGPv3) and sorghum (V2.0) was constructed using DAGchainer (Haas et al. 2004) at SynMap (Lyons et al. 2008) with the parameters MegaBlast, -D 20, and -A 5. Maize AGPv3 gene coordinates were converted into AGPv4 coordinates by mapping the genes to the AGPv4 genome assembly.
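
The mapping-quality filter described above can be illustrated on plain SAM text, where MAPQ is the fifth tab-separated column. The sketch below is a self-contained stand-in for illustration only; the function name is hypothetical, and the exact command the analysis used is not stated in the text. In practice the same threshold is commonly applied with `samtools view -q 50`.

```python
def filter_sam_by_mapq(lines, min_mapq=50):
    """Yield SAM header lines and alignments with MAPQ >= min_mapq.

    lines    : iterable of SAM-format text lines
    min_mapq : minimum mapping quality to keep (50, as in the text)
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # SAM alignments have 11 mandatory columns; skip malformed lines.
            continue
        if int(fields[4]) >= min_mapq:
            yield line


# Toy records with made-up read names and sequences.
header = "@SQ\tSN:3\tLN:1000\n"
good = "read1\t0\t3\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
poor = "read2\t0\t3\t200\t10\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
kept = list(filter_sam_by_mapq([header, good, poor]))
print(len(kept))
```

Only the header and the first alignment pass the filter in this toy input, because the second alignment has a mapping quality of 10.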

Analyses of transcriptome and DNA methylation

Transcriptome (SRR445382) and B73 DNA methylation data (SRR850328) (Li et al. 2015) were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/). Transcriptional analyses followed published procedures (Zhao et al. 2016). Briefly, the transcriptome reads were mapped with TopHat (Trapnell et al. 2009) and analyzed with Cufflinks (Trapnell et al. 2010), both with default parameters. DNA methylation reads were mapped by Bismark (Krueger and Andrews 2011) with parameters: -q -N 1. Methylation information of each cytosine was extracted by bismark_methylation_extractor in the Bismark tool. Each chromosome was divided into 50 kb windows. The percentage CpG methylation level in each window was calculated by dividing the number of methylated CpGs by the total number of CpGs.
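
The per-window methylation calculation can be sketched as follows. This is a simplified illustration, not the authors' script: it assumes the input has already been reduced to one (position, is_methylated) record per CpG call on a single chromosome, which is a hypothetical format and not the native output of bismark_methylation_extractor.

```python
from collections import defaultdict


def window_methylation(calls, window=50_000):
    """Return percent CpG methylation per fixed-size window.

    calls  : iterable of (position_bp, is_methylated) CpG calls on one
             chromosome (hypothetical, pre-parsed input format)
    window : window size in bp (50 kb, as in the text)
    Returns a dict mapping window start (bp) to percent methylation;
    windows without any CpG call are omitted.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for position, is_methylated in calls:
        start = (position // window) * window
        total[start] += 1
        if is_methylated:
            methylated[start] += 1
    return {s: 100.0 * methylated[s] / total[s] for s in sorted(total)}


# Made-up CpG calls spanning three 50 kb windows.
toy_calls = [(10, True), (20, False), (49_999, True),
             (50_000, False), (120_000, True)]
for start, level in window_methylation(toy_calls).items():
    print(start, round(level, 1))
```

Whether the original calculation counted individual read-level calls or collapsed calls per cytosine is not specified in the text, so the record-per-call input here is only one possible reading.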

Data availability

The CENH3 ChIP-seq sequencing data of ax-3 is available from NCBI Sequence Read Archive (SRA) under SRR3709784.