Introduction

During eukaryotic cell division, accurate chromosome segregation requires the interaction of chromosomes with the microtubules from the spindle apparatus. This interaction is mediated by the kinetochore, a multiprotein structure that is hierarchically assembled onto centromeres. Upstream in the assembly of the kinetochore are CenH3 and Cenp-C, two interdependent proteins in their roles of establishing centromere identity and function. CenH3 is the histone H3 variant found in centromeric nucleosomes and, therefore, considered the centromere epigenetic marker (Dalal et al. 2007). During kinetochore assembly, Cenp-C binds to CenH3 and recruits other kinetochore proteins (Przewloka et al. 2011; Liu et al. 2016). CenH3 and Cenp-C are fundamentally interdependent because the centromeric localization of one depends on the centromeric localization of the other (Erhardt et al. 2008; Orr and Sunkel 2011). This interdependence is also illustrated by the fact that both CenH3 and Cenp-C have similar phylogenetic profiles (i.e., they have similar patterns of presence and absence across the eukaryotic evolutionary tree) and likely co-evolve as a functional unit (van Hooff et al. 2017). One interesting case is that seen in insects, where CenH3 was lost independently five times, and in all these cases Cenp-C was also lost (Drinnenberg et al. 2014).

Despite the essentiality of centromeres, both centromeric DNA (cenDNA) and proteins are remarkably diverse (Henikoff et al. 2000; Talbert et al. 2004; Plohl et al. 2008). This rapid evolution despite the expectation of constraint is referred to as the “centromere paradox” (Henikoff et al. 2001). This paradox may be explained by the centromere drive hypothesis, which proposes that genetic conflicts during female meiosis drive centromere evolution (Henikoff et al. 2001; Dawe and Henikoff 2006; Kursel and Malik 2018).

In the female meiosis of animals and plants, the meiotic spindle fibers are asymmetric in a way that one pole will originate a polar body and the other will give rise to the oocyte. As a result, there is potential for non-Mendelian (biased) inheritance if a pair of homologous chromosomes has kinetochores that interact unequally with the spindle fibers (Ross and Malik 2014). The heterogeneity in kinetochore function between homologs is a result of differences in the abundance of cenDNA sequences. One homolog may have a ‘strong’ centromere, which has an expanded cenDNA that recruits more kinetochore proteins and delivers its chromosome into the oocyte at > 50% frequency, or a ‘weak’ centromere, which has a contracted cenDNA that in turn recruits less kinetochore proteins and delivers its chromosome into the oocyte at < 50% frequency (Iwata-Otsubo et al. 2017). However, the spread of expanding centromeres throughout a population might be accompanied by deleterious effects, such as increased male sterility or a skewed sex ratio (Fishman and Saunders 2008; Rutkowska and Badyaev 2008; Malik and Henikoff 2009). The centromere drive hypothesis proposes that changes in CenH3 and Cenp-C related to more ‘flexible’ DNA-binding preferences are expected to counteract the transmission advantage gained by expanded centromeres and diminish the associated deleterious effects, thus restoring meiotic parity for both homologs (Henikoff et al. 2001; Dawe and Henikoff 2006).

The kinetochore is highly diverse across species, with proteins being lost, duplicated, created, or diversified during evolution (van Hooff et al. 2017). Given that data directly supporting a correlation between the evolution of cenDNA, CenH3, and Cenp-C are still absent, it is not known if and how such structural divergence is related to centromere drive suppression. However, the likely subfunctionalization of CenH3 paralogs in some lineages of Drosophila has been hypothesized to be linked to centromere drive suppression. Kursel and Malik (2017) have recently reported that the Drosophila CenH3 homolog Cid underwent four independent duplication events during evolution, and some Cid paralogs are primarily expressed in the male germline and evolve under positive selection (Kursel and Malik 2017). These duplications could have allowed the rapid evolution of centromeric proteins without compromising their essential function by separating functions with divergent fitness optima. The existence of germline-biased CenH3 duplicates (which do not interfere with essential mitotic functions) in genetically tractable organisms provides an opportunity to study the functional consequences of the genetic variation for kinetochore-related processes.

Given the interdependence between CenH3 and Cenp-C, we decided to further investigate the molecular evolution of the Cid and Cenp-C genes in Drosophila species. Here, we report a novel Cid duplication within the Drosophila subgenus and show that not only Cid, but also Cenp-C is duplicated in the entire Drosophila subgenus. The Cenp-C paralogs appear to have partially diverged in function, as motifs involved in centromere identity and function are conserved in both paralogs while other motifs are alternatively conserved between the paralogs. Interestingly, both the Cid and Cenp-C duplications generated copies that are male-biased and evolve under positive selection. Our findings point towards a specific kinetochore composition in a specific context (i.e., the male germline), which could prove valuable for the understanding of how the extensive kinetochore diversity is related to essential cellular functions.

Results and Discussion

Cid1 Was Replaced by a New Paralog in a Clade Within the Drosophila Subgenus

Duplicate Cid genes exist in D. eugracilis (Cid1, Cid2) and in the D. montium subgroup (Cid1, Cid3, Cid4), both within the Sophophora subgenus, and in the entire Drosophila subgenus (Cid1, Cid5). In all analyzed species from the Drosophila subgenus, Cid1 is flanked by the cbc and bbc genes, and Cid5 is flanked by the Kr and CG6907 genes (Kursel and Malik 2017). As expected, we found two Cid genes while looking for the orthologs of Cid1 and Cid5 in the assembled genomes of two cactophilic species from the Drosophila subgenus, D. buzzatii and D. seriema (repleta group, buzzatii cluster). Surprisingly, while one of the genes is present in the expected locus of Cid5, the other one is located in an entirely different locus, flanked by the CG14341 and IntS14 genes. We named this new paralog as Cid6.

By investigating the Cid1 locus of D. buzzatii, we found a myriad of transposable elements (TEs) surrounding a 116-bp fragment of the original Cid1 gene (Fig. 1, upper panel). Due to fragmentary genome assembly, the Cid1 locus of D. seriema could not be identified. Both Cid5 and Cid6 of D. buzzatii and D. seriema share ~ 40% amino acid identity but, in contrast, Cid6 of each species and Cid1 of the closely related D. mojavensis are much more similar, sharing ~ 80% identity. This result suggests that Cid6 is paralogous of Cid1.

Fig. 1
figure 1

Cid1 degenerated after the inter-chromosomal duplication event giving rise to Cid6. (Upper panel) Comparison between the Cid1 and Cid6 loci of D. buzzatii and the corresponding regions of D. mojavensis. The black asterisk indicates a fragment of Cid1, ‘N’ indicates unidentified nucleotides, red boxes indicate transposable elements, and arrows indicate primers used for the fluorescent in situ hybridization (FISH) experiments. (Lower panel) FISH on polytene chromosomes of D. buzzatii (Dbuz) and D. seriema (Dser) using Cid6 probes, and of the closely related D. mojavensis (Dmoj) and the outgroup D. virilis (Dvir) using Cid1 probes. The chromosome arm in which the Cid probe hybridized (red signal) is indicated by a letter representing the corresponding Muller element. The chromocenter, a region in which all centromeres bundle together, is indicated by a white asterisk (Note: the chromocenter of D. buzzatii and D. mojavensis ruptured during the fixation step). (Color figure online)

Fluorescent in situ hybridizations on polytene chromosomes showed that Cid6 is distal (in relation to the chromocenter) in the Muller element B of D. buzzatii and D. seriema, and that Cid1 is proximal in the Muller element C of D. mojavensis and the outgroup D. virilis (Fig. 1, lower panel). Therefore, we inferred that Cid1 was degenerated by several TE insertions after the origin of Cid6 by an inter-chromosomal duplication of Cid1 in the lineage that gave rise to D. buzzatii and D. seriema. The time of divergence between D. buzzatii and D. seriema has been estimated at ~ 4.6 mya, and the divergence between them and the closely related D. mojavensis has been estimated at ~ 11.3 mya (Oliveira et al. 2012). Therefore, the Cid1 duplication that gave rise to Cid6 happened between ~ 4.6 and 11.3 mya.

Why Cid6 remained while Cid1 degenerated? The Cid1 locus of D. buzzatii is located in the most proximal region of the Muller element C (scaffold 115; Guillén et al. 2014), which is very close to the pericentromeric heterochromatin where TEs are highly abundant (Pimpinelli et al. 1995; Casals et al. 2005; Rius et al. 2016). Natural selection is known to be less effective in pericentromeric and adjacent regions due to low rates of crossing-over (Zhang and Kishino 2004; Clément et al. 2006; Comeron et al. 2012; Nambiar and Smith 2016). Thus, it is reasonable to suggest that the presence of an extra copy of Cid1 (i.e., Cid6) in Muller element B alleviated the selective pressure on Cid1 in Muller element C, whose proximity to the pericentromeric heterochromatin fostered its degradation by several posterior TE insertions.

Cenp-C Is Duplicated in the Drosophila Subgenus

It has been recently shown that the Drosophila CenH3 homolog Cid underwent duplication events during evolution (Kursel and Malik 2017). Given that CenH3 and Cenp-C are interdependent and co-evolve as a functional unit, we investigated if Cenp-C was also duplicated in Drosophila species where Cid was duplicated.

In D. eugracilis, in species from the montium subgroup, and in all the other species of the Sophophora subgenus we found only one copy of Cenp-C, which is always flanked by the 5-HT2B gene. On the other hand, in species of the Drosophila subgenus we found two copies of Cenp-C with ~ 52% nucleotide identity, which we named as Cenp-C1 and Cenp-C2: the former is flanked by the 5-HT2B and CG1427 genes, and the latter is flanked by the CLS and RpL27 genes. A maximum likelihood tree showed that Cenp-C was likely duplicated after the split between the Sophophora and Drosophila subgenera but before the split between D. busckii and the other species of the Drosophila subgenus (Fig. 2). Thus, we concluded that Cenp-C2 originated from a duplication of Cenp-C1 in the lineage that gave rise to species of the Drosophila subgenus, at least 50 mya (Russo et al. 2013).

Fig. 2
figure 2

Cenp-C1 duplicated in the lineage that gave rise to species of the Drosophila subgenus. Maximum likelihood tree of the Cenp-C1 and Cenp-C2 paralogs. Red and green branches, respectively, correspond to Cenp-C1 and Cenp-C2 sequences from species of the Drosophila subgenus. Bootstrap values are shown in each node. Scale bar represents number of substitutions per site. (Color figure online)

Why is Cenp-C duplicated only in the Drosophila subgenus if Cid is also duplicated in D. eugracilis and in the montium subgroup? The fact that both Cid and Cenp-C duplicated in the Drosophila subgenus does not mean that there is a cause-and-effect relationship between the duplications. However, it probably means that the new paralogs influenced each other’s evolution.

As a histone H3 variant, CenH3 has the C-terminal histone fold domain, which is reasonably conserved among species, and the N-terminal tail (NTT), which is highly variable among species (Henikoff et al. 2000). The NTT evolves in a modular manner, with four core motifs always conserved when there is only one Cid protein encoded in the genome (Kursel and Malik 2017). In D. eugracilis, the Cid2 paralog functionally replaced the pseudogenized ancestral Cid1 paralog. In species of the montium subgroup, these four motifs are alternated between the paralogs, which share ~ 25% amino acid identity. In contrast, in species of the Drosophila subgenus, all four motifs are conserved in Cid1 but only 1–2 are conserved in Cid5, with the paralogs sharing only ~ 15% amino acid identity at their NTT. Therefore, we propose that if the NTT of Cid interacts with Cenp-C, a new Cenp-C copy would allow a higher divergence of the Cid paralogs by alleviating the selective pressure over the Cid/Cenp-C interaction, thus explaining the higher divergence of the Cid1 and Cid5 paralogs. Future studies focusing on the specific interactions between Cid and Cenp-C shall shed light on the exact basis behind the flexibility of these two proteins during evolution.

The Cenp-C Paralogs Are Differentially Expressed

Given that Cenp-C is incorporated onto centromeres concomitantly with Cid (Schuh et al. 2007) and that the excess of both proteins can cause centromere expansion and kinetochore failure (Schittenhelm et al. 2010), the expression of both proteins needs to be tightly regulated. Kursel and Malik (2017) showed that Cid5 expression is male germline-biased and proposed that Cid1 and Cid5 subfunctionalized and now perform non-redundant centromeric roles. In order to investigate if Cenp-C1 and Cenp-C2 are differentially expressed and correlated in some way with the expression of the Cid paralogs, we analyzed the available transcriptomes from embryos, larvae, pupae, adult females and males of D. buzzatii (Guillén et al. 2014), and from testes of D. virilis and D. americana (BioProject Accession PRJNA376405).

While Cid6 is transcribed in all stages of development in D. buzzatii, confirming that Cid6 functionally replaced Cid1, Cid5 transcription is limited to pupae and adult males, with a higher transcription than Cid6 in the latter (Fig. 3a). Additionally, Cid5 transcription is elevated in testes of D. virilis and D. americana, whereas Cid1 is virtually silent (Fig. 3c). Our results therefore further support the finding of Kursel and Malik (2017) that Cid5 displays a male germline-biased expression. In this context, our finding that Cid5 is also transcribed in pupae of D. buzzatii may be related to the ongoing development of the male gonads.

Fig. 3
figure 3

Cid5 and Cenp-C2 are male germline-biased. Cid and Cenp-C expression patterns in D. buzzatii (a, b) and D. virilis and D. americana (c, d). Data are presented as mean ± 95% confidence interval and analyzed by one-way ANOVA (A and B) and Student’s t test (c, d): n.s. not significant; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001. TPM transcripts per million

In contrast to the Cid paralogs, we found that both Cenp-C1 and Cenp-C2 are transcribed in almost all stages of D. buzzatii development, with the exception of larvae (Fig. 3b). Cenp-C1 transcription is higher than that of Cenp-C2 in D. buzzatii embryos and adult females. On the other hand, transcription of Cenp-C2 is higher than that of Cenp-C1 in D. buzzatii pupae and adult males. Cenp-C2 transcription is also higher than that of Cenp-C1 in D. virilis testes, but there is no significant difference between their expression in D. americana testes (Fig. 3d). The male germline-biased expression of both Cid5 and Cenp-C2 points towards their interaction in spermatogenesis, but biochemical assays need to be performed to confirm this possible interdependence. The difference in the expression profiles of the Cenp-C paralogs, specially the biased expression of Cenp-C2 in testis, indicates that the paralogs may be functionally distinctive in some way.

The New Cenp-C Paralog Could Have Specialized for a Particular Function

Cenp-C was previously thought to be absent in Drosophila (Talbert et al. 2004), but it turned out that a protein that interacts with the regulatory subunits of separase is a highly divergent Cenp-C homolog (Heeger et al. 2005). The D. melanogaster Cenp-C1, as characterized by Heeger et al. (2005), has seven independent functional motifs, from N- to C-terminal: arginine-rich (R-rich), drosophilid Cenp-C homology (DH), AT hook 1 (AT1), nuclear localization signal (NLS), CenH3 binding (also known as the Cenp-C motif), AT hook 2 (AT2), and C-terminal dimerization (Cupin). The R-rich and DH motifs, as well as both AT1 and AT2 motifs (which may mediate binding to the minor grove of DNA), are functionally poorly characterized. However, all except AT1 appear to hold essential functions, as Cenp-C1 variants lacking these regions are unable to prevent phenotypic abnormalities in Cenp-C1 mutant embryos (Heeger et al. 2005). In fact, it is known that the DH motif must be involved in the recruitment of kinetochore proteins (Przewloka et al. 2011; Liu et al. 2016). Furthermore, arginine 1101 (R1101), present in the CenH3 binding motif, is crucial for centromere localization (Heeger et al. 2005). Given the functional relevance of these motifs, we searched for them in both Cenp-C1 and Cenp-C2.

With the exception of D. kikkawai (from the montium subgroup), in which the AT2 motif is absent, all seven motifs are conserved in Cenp-C1 from all other species of the Sophophora subgenus. In contrast, the motifs appear to be alternatively conserved between Cenp-C1 and Cenp-C2 in species from the Drosophila subgenus (Fig. 4). Both Cenp-C1 and Cenp-C2 of all species have the DH, NLS, and CenH3 binding motifs (with the corresponding R1101 of D. melanogaster) but lack the AT1 motif. Furthermore, only Cenp-C2 has the R-rich and AT2 motifs conserved. Both Cenp-C1 and Cenp-C2 of most species have the Cupin motif, the exceptions being Cenp-C1 of D. busckii, which lacks the final half of it, and Cenp-C2 of D. grimshawi, which entirely lacks it. Interestingly, the DH and NLS motifs of Cenp-C2 are more similar to those of Sophophora Cenp-C1 than to those of Drosophila Cenp-C1 (Table 1). The logo representation of the motifs can be seen in Supplementary Figure S1.

Fig. 4
figure 4

Some Cenp-C motifs are alternatively conserved between Cenp-C1 and Cenp-C2. Both Cid and Cenp-C genes were duplicated in the lineage that gave rise to species of the Drosophila subgenus, as indicated in the species tree. Moreover, Cid1 was also duplicated in D. eugracilis, the montium subgroup (which includes D. kikkawai), and the buzzatii species cluster, the new paralogs of which are indicated at their respective branches. After the Cenp-C duplication, some functional motifs were alternatively conserved between the paralogs, as indicated at the right half of the image. High amino acids identity is indicated by the same color shade. Motifs are as follow: R-rich, arginine-rich; DH, drosophilid Cenp-C homology; AT1, AT hook 1; NLS, nuclear localization signal; CenH3 binding, also known as Cenp-C motif; AT2, AT hook 2; Cupin, a dimerization domain near the C-terminal region. The asterisk in the CenH3 binding motif indicates the corresponding R1101 of D. melanogaster, which is crucial for the centromere localization of Cenp-C1. In order to test for positive selection acting on the Cid and Cenp-C paralogs among lineages, foreground branches defined for the branch, branch sites, and clade models of CodeML are indicated on the species tree (#1–6)

Table 1 Genetic distances between the Cenp-C paralogs

The observed pattern of motif conservation, as well as the pattern of gene expression, shows that both paralogs are functional. However, their retention and subsequent divergence may reflect either subfunctionalization or neofunctionalization. In neofunctionalization, one copy is maintained by purifying selection and retains the original function, whereas the redundant copy is free to evolve and potentially acquire new functions (Walsh 1995). In subfunctionalization, both copies accumulate mutations through genetic drift, with their evolutionary rates consequently increasing symmetrically relative to the original copy, and the ancestral function may be split between duplicates (Lynch and Force 2000). Subsequently, purifying selection is expected to maintain the two functionally distinct copies.

The conservation of the DH motif (involved in the recruitment of kinetochore proteins) and the NLS and CenH3 binding motifs (involved in centromere localization) in both Cenp-C1 and Cenp-C2 (Fig. 4) may indicate that it is unlikely that any of the paralogs underwent neofunctionalization, as these motifs are involved in the basal centromeric functions of Cenp-C. The (partial) loss of the Cupin motif in D. busckii and D. grimshawi points towards subfunctionalization. It is currently difficult to evaluate the loss of the AT1 motif in both Cenp-C1 and Cenp-C2, given that its function is unknown. However, the exclusive conservation of the R-rich and AT2 motifs in Cenp-C2, and the higher similarity of the DH and NLS motifs of Cenp-C2 to those of Sophophora Cenp-C1 suggest that Cenp-C2 hold a higher resemblance to the ancestral protein, and possibly indicate that Cenp-C1 may have neofunctionalized.

In order to better comprehend the evolution of the paralogs in the context of neofunctionalization or subfunctionalization, we performed the CodeML maximum likelihood-based branch-site test of PAML version 4.8 (Yang 2007), which tests for positive selection both among sites in the protein and across branches on the tree phylogeny, aiming to detect positive selection affecting particular sites along the major Cenp-C lineages: Drosophila subgenus, Drosophila Cenp-C1, and Drosophila Cenp-C2. Interestingly, we found that the Drosophila Cenp-C2 lineage shows signs of positive selection (Table 2). Sites identified as having evolved under positive selection are numerous and situate throughout the protein (data not shown).

Table 2 Parameter estimates and likelihood-ratio tests for branch-site models performed on the major Cenp-C lineages

Differences in tissue gene expression patterns that parallel a contrasting evolutionary rate reinforce the role of neofunctionalization in the evolution of a particular gene duplication (Pegueroles et al. 2013). Accordingly, both the male-biased expression and the contrasting evolutionary rate of Cenp-C2 relative to Cenp-C1 indicate that Cenp-C2 may have neofunctionalized. However, the fact that CodeML did not find evidence of positive selection on Cenp-C1 does not mean that this paralog has not experienced positive selection, as CodeML cannot account for the role of insertions or deletions on the evolution of a given set of sequences. In fact, it is possible that the loss of the R-rich and AT2 motifs by Cenp-C1 was favored after the gene duplication.

Altogether, the evidence gathered is conflicting about the role of either sub- or neofunctionalization in the evolution of the Cenp-C duplication. An alternative model that could make sense of the data is that of “escape from adaptive conflict,” in which a protein with multiple functions can optimize each function independently after gene duplication, following release from antagonistic pleiotropy (Hughes 1994; for an example, see; Des Marais and Rausher 2008). Indeed, the same concept was briefly considered in the interpretation of the evolution of the CenH3 paralogs found in Mimulus monkeyflowers (Finseth et al. 2015); and later, Kursel and Malik (2017) used the model in order to interpret the evolution of the Cid paralogs in the context of centromere drive.

The centromere drive hypothesis states that CenH3 and Cenp-C constantly evolve in an effort to suppress and diminish the associated deleterious effects of cenDNA selfish spread throughout the population, which is fostered by female meiotic drive (Henikoff et al. 2001; Dawe and Henikoff 2006; Kursel and Malik 2018). It has been proposed that CenH3 could be under adaptive conflict imposed by distinct selective pressures in meiosis and mitosis (Finseth et al. 2015; Kursel and Malik 2017). Once the gene undergoes duplication, one copy may become specialized to act as a suppressor for centromere drive in meiosis in order to escape from the adaptive conflict. In the context of the Cenp-C duplication, it is possible that both the male-biased expression and evolution under positive selection of Cenp-C2 reflect specialization for the suppressor function. Moreover, the resemblance of the Cenp-C2 motifs to those of the ancestral state could mean that this is only so because the ancestral state, as found in species of the Sophophora subgenus, hold functions that are in adaptive conflict; thus, the fact that Cenp-C1 lost the R-rich and AT2 motifs could mean that these are more relevant only in the context of both male meiosis and the suppressor function. Similarly, the fact that the DH and NLS motifs are more divergent in Cenp-C1 could also mean that these are specialized for contexts in which the suppressor function is irrelevant.

The possibility that duplication followed by specialization allowed the Cid and Cenp-C paralogs to achieve fitness optima for divergent functions predicts that selection may act differently in each of the paralogs. To test this hypothesis, we looked in our full-length alignments of the Cid and Cenp-C paralogs for signatures of positive selection using the CodeML NSsites models of PAML version 4.8 (Yang 2007). Given that CenH3 and Cenp-C are highly divergent, we focused our analyses on five closely related cactophilic Drosophila species from the repleta group: D. mojavensis, D. arizonae, D. navojoa, D. buzzatii , and D. seriema (Fig. 4).

We first used random-site and branch-site models to test for positive selection on particular sites during the evolution of the paralogs. The random-site models, which allow ω to vary among sites but not across lineages, revealed that both Cid5 and Cenp-C2 show extensive signs of positive selection (Table 3). Particularly, Bayes Empirical Bayes analyses identified with a posterior probability > 95% four amino acids in the NTT of Cid5 and six amino acids across Cenp-C2 as having evolved under positive selection. Of the six Cenp-C2 amino acids, one is in the DH motif, one is in the Cupin motif, and the remaining four are in inter-motif sequences.

Table 3 Parameter estimates and likelihood-ratio tests for random-site models performed on each Cid and Cenp-C paralog

The branch-site models allow ω to vary both among sites and across branches on the tree and aim to detect positive selection affecting a few sites along particular lineages. The tests revealed that the paralogs show signs of positive selection in the branches of D. navojoa Cid1 and Cenp-C2, D. buzzatii Cenp-C1 and Cenp-C2, and D. seriema Cenp-C1 and Cenp-C2 (Table 4). Particularly, Bayes Empirical Bayes analyses identified with posterior probability > 60% four amino acids in the NTT of D. navojoa Cid1, seven in inter-motif sequences of D. navojoa Cenp-C2, four in D. buzzatii Cenp-C1 (one in the DH motif and three in inter-motif sequences), six in inter-motif sequences of D. buzzatii Cenp-C2, four in D. seriema Cenp-C1 (two in the Cupin motif and two in inter-motif sequences), and six in inter-motif sequences of D. seriema Cenp-C2 as having evolved under positive selection.

Table 4 Summary of branch-site models for positive selection performed on each Cid and Cenp-C paralog

Finally, we used clade model C to test for positive selection among a priori designated lineages and found evidence of positive selection acting on Cid1, Cenp-C1, and Cenp-C2 across almost all the foreground branches, the exception being D. buzzatii (Table 5). The clade model C test showed that the majority of sites are under negative selection across all lineages, while a small proportion do show signatures of positive selection (data not show); however, there is no obvious pattern of positive selection across the phylogeny. Unlike the sites-models, clade models freely estimate ω’s for each a priori designated clade and permit sites under positive selection in null models, which could explain the discrepancy among the sites-models and the clade model.

Table 5 Summary of the clade model for divergent selection performed on each Cid and Cenp-C paralog

Overall, our data revealed that, on average, Cid5 and Cenp-C2 show extensive signs of positive selection, which may indicate that these male germline-biased genes are specialized for the drive-suppression function. Kursel and Malik (2017) found signs of positive selection in the Cid3 paralog of the montium subgroup and proposed that Cid3 and Cid5 could be attenuating deleterious effects of centromere drive due to their male germline-biased expression. Our results of extensive positive selection on both Cid5 and Cenp-C2 in species from the repleta group do support this hypothesis. However, male germline-biased genes are widely known to evolve adaptively as the result of male–male or male–female competition (Ellegren and Parsch 2007; Meisel 2011). On the other hand, branch-site models revealed that different sites of both Cenp-C1 and Cenp-C2 show signs of positive selection in D. buzzatii and D. seriema, which may indicate that either a drive-suppression function is not restricted to male-biased genes or that they are adapting to contexts other than male meiosis and drive suppression. Either way, molecular genetic data alone cannot reveal the underlying cause of adaptive evolution. What our findings do suggest is that species of the Drosophila subgenus likely have a specific inner kinetochore composition that mainly functions in spermatogenesis.

Concluding Remarks

The extensive diversity of kinetochore compositions in eukaryotes poses numerous questions regarding the flexibility of essential cellular functions (van Hooff et al. 2017). Is the kinetochore less conserved than other core eukaryotic cellular systems? And if so, why so many core kinetochore proteins are so diverse? Are the variants adaptive to the species? To answer such questions, it is necessary to investigate how a specific kinetochore composition affects specific cellular features and lifestyles. Herein, we showed that Cid5 and Cenp-C2 offer such a possibility, as both are inner kinetochore protein variants likely specialized to function mainly in spermatogenesis. Thus, finding out if and how Cid5 and Cenp-C2 play a role either in centromere drive suppression or reproductive competition can shed a new light into our understanding of centromere evolution.

Materials and Methods

Identification of Cid and Cenp-C Orthologs and Paralogs in Sequenced Genomes

For most Drosophila species, Cid and Cenp-C coding sequences were obtained from EST data. For Cenp-C1 of D. navojoa, D. mojavensis, D. buzzatii, D. seriema and D. americana, Cenp-C2 of D. buzzatii, D. seriema, D. americana and D. grimshawi, Cid5 of D. virilis, and both Cid5 and Cid6 of D. buzzatii and D. seriema, coding sequences were identified by tBLASTx in sequenced genomes. Since Cid is encoded by a single exon in Drosophila, we selected the entire open reading frame for each Cid gene hit, and since Cenp-C has multiple introns, we used the Augustus gene prediction algorithm (Stanke and Morgenstern 2005) to identify the coding DNA sequences. For annotated genomes, we recorded the 5′ and 3′ flanking genes for the Cid and Cenp-C genes of each species. For genomes that are not annotated, we used the 5′ and 3′ nucleotide sequences flanking the Cid and Cenp-C genes as queries to the D. melanogaster genome using BLASTn and verified the synteny in accordance to the hits. For the D. seriema genome assembly, see Supplementary File S1. All Cid and Cenp-C coding sequences and their database IDs can be found in Supplementary Files S2 and S3, respectively.

Fluorescent In Situ Hybridizations (FISH) on Polytene Chromosomes

Probes for Cid1/Cid6 were obtained by PCR (see Fig. 1 upper pannel, for primer positions) from genomic DNA of D. buzzatii (strain st-1), D. seriema (strain D73C3B), D. mojavensis (strain 14021-0248.25), and D. virilis (strain 15010-1551.51). We cloned the PCR products into the pGEM-T vector (Promega) and sequenced them to confirm identity. Recombinant plasmids were labeled with digoxigenin 11-dUTP by nick translation (Roche Applied Science). FISH on polytene chromosomes was performed as described in Dias et al. (2015). The slides were analyzed under an Axio Imager A2 epifluorescence microscope equipped with the AxioCam MRm camera (Zeiss). Images were captured with the AxioVision software (Zeiss) and edited in Adobe Photoshop. Chromosome arms were identified by their morphology (Kuhn et al. 1996; González et al. 2005; Schaeffer et al. 2008).

Phylogenetic Analyses

Cid and Cenp-C sequences were aligned at the codon level using MUSCLE (Edgar 2004) and refined manually. Subsequently, we generated maximum likelihood phylogenetic trees in MEGA6 (Tamura et al. 2013) with the GTR substitution model and 1000 bootstrap replicates for statistical support.

Expression Analyses

RNA-seq data from D. buzzatii (Guillén et al. 2014), and from D. virilis and D. americana (BioProject Accession PRJNA376405) were analyzed for the Cid and Cenp-C expression patterns with Bowtie2 (Langmead and Salzberg 2012), as implemented to the Galaxy server (Afgan et al. 2016). Mapped reads were normalized by the transcripts per million (TPM) method (Wagner et al. 2012), and all normalized values < 1 were set to 1 so that log2 TPM ≥ 0.

Positive Selection Analyses

Cid and Cenp-C alignments and gene trees were used as input into the CodeML NSsites models of PAML version 4.8 (Yang 2007). Random-site and branch-site models were used to test for positive selection on particular sites during the evolution of the Cid and Cenp-C paralogs. Random-site models allow ω to vary among sites but not across lineages; for this analysis, we compared three models that do not allow ω to exceed 1 (M1a, M7 and M8a) to two models that allow ω > 1 (M2a and M8). Branch-site Model A was compared with Model Anull to examine whether particular sites evolved under positive selection along a priori specified branches (called foreground branches). Positively selected sites were classified as those with a Bayes Empirical Bayes posterior probability > 90%. Clade model C (CmC) tests for divergent selection on particular sites among a priori designated lineages. The modified null model of CmC (M2a_rel) assumes that sites fall into three classes: purifying selection (0 < ω < 1); neutral evolution (ω = 1); or positive selection (ω > 1). In CmC, the third site class allows the estimated ω for a site to diverge across foreground branches.