The legacy of B chromosomes

B chromosomes (Bs) are enigmatic accessory elements to the regular chromosome set (A) and, since their discovery at the beginning of the twentieth century (Wilson 1907), have ranked among the main topics of chromosome biology. Their importance is illustrated by the series of conferences on B chromosomes organized during the last three decades (1st, 2nd and 3rd B chromosome Conferences in 1993, 2004, and 2014, respectively). B chromosome science has advanced from classical cytogenetics (conducted during most of the twentieth century) and molecular cytogenetics (conducted from 1990 to 2010) to genomics and bioinformatics approaches (conducted during the last few years). Recent advances in next-generation sequencing (NGS) technologies and high-throughput molecular biology protocols have made the B chromosome the subject of massive data analysis, enabling the investigation of structural and functional issues at a level previously not considered possible. Here, B chromosome history is reviewed with an emphasis on recent advances resulting from large-scale DNA and RNA analyses and their contribution to the construction of new concepts regarding B chromosomes.

Bs are regularly found in some but not all individuals within a population and are considered dispensable. By 2005, approximately 2000 species, including animals, plants, and fungi, were known to harbor B chromosomes (for a review, see Camacho 2005). Bs can occur in up to 14 % of karyotyped orthopteran insect species and in as many as 10–15 % of cytogenetically known flowering plants (Jones 1995). Such frequencies might have increased in the last decade as Bs have been described for new taxonomic groups. Most B chromosomes are rich in heterochromatin, gene poor, and composed of repeated DNA sequences. These characteristics are consistent with the modest or absent effects of Bs on their hosts; accordingly, the presence of Bs is not associated with any phenotype in most species.

Although Bs are classically considered non-functional, both beneficial and harmful effects have been described. Among the most harmful effects of Bs on fitness are the energy cost of B maintenance during the cell cycle and their potential interference with the proper assortment of As during meiosis (Jones and Reed 1982). B chromosome effects are more frequently associated with vigor and fertility (Camacho et al. 2000). One of the classical influences of Bs is their role in meiosis, during which they increase chiasma frequency and recombination (as reviewed in Jones 1995; Camacho 2005; Burt and Trivers 2006). Other effects of Bs have been observed in cichlid fishes (Yoshida et al. 2011), in which Bs are likely to play a role in sex determination, and in the fungus Nectria haematococca, in which Bs confer antibiotic resistance and pathogenicity (Coleman et al. 2009). The high occurrence of Bs without clear benefits for most host species appears to be related to their high transmission rate due to drive mechanisms that cause non-Mendelian segregation at meiosis or post-meiotic events (Jones 1991); these mechanisms increase the probability of Bs being retained in the host genome, according to a parasitic model (Camacho 2005). Although the parasitic model for B chromosomes appears plausible and has been observed worldwide, the molecular mechanisms that regulate B chromosome maintenance and segregation during the cell cycle remain enigmatic.

The application of cytogenetics and genomics to B chromosome analysis

Cytogenetic science was born with the first analyses of chromosomal behavior during cell division, which were carried out at the end of the nineteenth century and at the beginning of the twentieth century (for a review, see Martins et al. 2011). During the first half of the twentieth century, animal chromosomes were viewed in tissue sections, yielding imprecise and incorrect information. In the second half of the twentieth century, high-quality chromosomes were obtained from cell suspensions and cell cultures and were subjected to new methods of chromosome staining and banding. Such new methods enabled rapid progress in the cytogenetic field. After 1980, cytogenetics was combined with molecular biology techniques, allowing significant advances in understanding genomes through chromosome studies. After the first hybridizations of DNA/RNA molecules to the nucleus and chromosomes (Pardue and Gall 1969; Gall and Pardue 1969), several studies used radioactively labeled repetitive DNAs (rRNA genes and satellite DNAs), and later experiments used fluorescent in situ hybridization (FISH) techniques (Pinkel et al. 1986) that allowed the simultaneous hybridization of multiple gene/DNA sequences on chromosomes. Modified plasmids (bacterial artificial chromosomes [BACs], cosmids, and fosmids, among others) were used to construct libraries containing large (30–300 kb) genomic segments and probes representing the DNA pool of partial or whole chromosomes; such techniques enabled researchers to refine chromosome analysis methods, leading to the development of chromosome painting and multicolor hybridization during the 1990s.

The availability of several completely sequenced eukaryotic genomes in the last decade opened new avenues for cytogenetics, including further prospects for the physical chromosomal mapping of genes and comparative analyses. The advent of new genome sequencing technologies in the 2000s suggested that cytogenetics would become obsolete and that molecular/genomic tools would solve most biological questions, including those related to chromosomes. However, although most recent technological advances in genomics have proved to be of great importance for chromosome biology, significant genome data can still be acquired from fundamental cytogenetic studies involving the identification of chromosome number and morphology as well as from the advanced molecular and bioinformatics approaches that are applied in cytogenetics. Furthermore, the integration of genomic sequencing with physical chromosome mapping is very useful for classical genomic projects, in which the mapping of contigs/scaffolds onto chromosomes is considered a “golden path” to better understanding genomes. In this way, both genomics and cytogenetics have advanced as genomic data have been integrated with chromosome analysis, allowing the emergence of cytogenomics as a new field of biological research.

Advances in various forms of the large-scale analysis of DNA, RNA, and proteins, and in the study of chromatin modification have reduced the time, cost, and difficulties involved in the study of complex genomes, thus allowing the development of audacious structural and functional projects, such as “ENCODE” (https://www.encodeproject.org/), “Genome 10K” (genome10k.soe.ucsc.edu/), “i5K” (arthropodgenomes.org/wiki/i5K), “1000 Genomes” (http://www.1000genomes.org), and “The Cancer Genome Atlas” (http://cancergenome.nih.gov), among others. The exploration of genomic data from the perspective of chromosome biology has allowed the use of higher resolution approaches for understanding the basis of chromosomal changes during evolution. Currently, the field of cytogenetics is synergistic with genomics, and the integration of structural and functional genomics with cytogenetics has the potential to clarify the molecular mechanisms governing the origin and fate of chromosomal rearrangements during evolution.

Classical and molecular cytogenetic approaches have been extensively applied to the study of B chromosomes in an attempt to answer questions regarding their origin and evolution. Although important findings have been made, such techniques present limited resolution at the molecular level. The recent development of large-scale genomic analysis has also advanced the field of chromosome biology, including the study of B chromosomes. Millions of sequences can be generated for several genome samples of interest, which can then be used to identify chromosomal variations. The ordinary mapping of sample reads to a reference genome allows the identification of duplicated genomic/chromosomal regions based on a quantitative study of the sequences generated (Fig. 1). This is particularly useful in studies of polymorphism in large chromosomes, such as B or sex chromosomes. The genomic regions of specific chromosomes can be identified, annotated, and subjected to deep analysis using quantitative PCR and FISH mapping (Fig. 1). Furthermore, the generation of large RNA datasets and their analysis using powerful bioinformatics tools also contribute to studies of B chromosomes, thus providing a global view of cell physiology (Fig. 2).
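The coverage-based reasoning described above can be sketched in a few lines of code. This is only a minimal illustration, not the pipeline used to produce Fig. 1: the window size, the ratio threshold, the toy per-base depth values, and the function names are all assumptions introduced here, and real depth values would come from read alignments normalized for sequencing effort.

```python
from statistics import mean

def window_means(depth, window):
    """Mean read depth in consecutive, non-overlapping windows."""
    return [mean(depth[i:i + window]) for i in range(0, len(depth), window)]

def candidate_b_windows(depth_b_minus, depth_b_plus, window=5, min_ratio=1.5):
    """Return indices of windows whose B+/B- coverage ratio reaches min_ratio.

    Both depth lists are hypothetical inputs (one value per reference
    position) and are assumed to be normalized for sequencing effort.
    """
    minus = window_means(depth_b_minus, window)
    plus = window_means(depth_b_plus, window)
    hits = []
    for idx, (m, p) in enumerate(zip(minus, plus)):
        # Skip windows with no B- coverage to avoid dividing by zero.
        if m > 0 and p / m >= min_ratio:
            hits.append(idx)
    return hits

# Toy depths for a 15-bp region: the middle 5 bp are over-represented in B+.
b_minus = [30] * 15
b_plus = [30] * 5 + [90] * 5 + [30] * 5
print(candidate_b_windows(b_minus, b_plus))
```

Windows flagged in this way are only candidates for sequences duplicated on the B chromosome; as noted in the text, they would still need to be confirmed by quantitative PCR and FISH mapping.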

Fig. 1 Sequencing coverage of three representative scaffold regions (a–c) for the 0B, 1B, and 2B genomes of the cichlid fish Astatotilapia latifasciata aligned against the reference genome of the cichlid Metriaclima zebra. The higher read coverage for samples 1B and 2B provides evidence for rounds of segmental duplications that increased the copy number of B+ sequences. The read coverage is indicated at the left of the figures. The higher sequencing coverage detected by genomic analysis is corroborated by FISH mapping (d, e); d and e represent the FISH mapping of genomic regions a and b, respectively. Arrowheads indicate the B chromosomes in d and e

Fig. 2 Global metrics of the co-expression networks, a representative miRNA/coding gene regulatory network, and gene ontology (GO) analysis for B− and B+ samples of A. latifasciata. a Degree distribution for the B+ and B− co-expression networks. The power law is clearly observed in both cases and indicates reliable biological networks; b box plot of all node degrees calculated for the B+ and B− networks. The B+ network is larger than the B− network and contains more diverse connections; c Venn diagram representing the number of transcripts in both networks. The number of GO-annotated transcripts is also shown; d partial network showing the regulation between up-regulated miRNAs (red nodes) and targeted genes (black nodes); e hierarchical network of the molecular functions of B+ genes. Red circles indicate molecular function terms with high scores, and gray circles represent statistically unsupported nodes

Next-generation sequencing enables the use of large-scale analysis to explore B chromosome biology in plants, animals, and fungi. Flow-sorted B chromosomes of rye (Secale cereale) have been analyzed using 454 Roche NGS sequencing (Martis et al. 2012); the findings showed that rye B chromosomes are mostly derived from the A chromosomes 3R and 7R, with the subsequent accumulation of sequences from the other A chromosomes and organelle genomes. Large-scale genomic analysis of the cichlid fish Astatotilapia latifasciata (Valente et al. 2014), based on Illumina sequencing of the microdissected B chromosome and of whole DNA from B+ (presence of one or two B chromosomes) and B− (absence of B chromosomes) animal samples, showed that the B chromosome contains thousands of sequences that have been duplicated from almost all chromosomes of this species. Although most genes on B chromosomes are fragmented, some are largely intact. Whole-genome shotgun (WGS) sequencing of the fungal wheat pathogen Mycosphaerella graminicola (Goodwin et al. 2011) showed that the dispensome (the set of accessory chromosomes) contains some genes and repetitive sequences from most or all of the core chromosomes, with additional unique genes of unknown origin. Although the dispensome of M. graminicola is not maintained by meiotic drive, as classical B chromosomes are, it can be lost with no visible effect on the fungus (Wittenberg et al. 2009). NGS technology enables the large-scale analysis of genomic content, which was not possible in classical and molecular cytogenetics, and the identification of a larger number of genes and DNA elements that are related to specific karyotype variations. This information changes the classical view of B chromosomes as being gene-poor and suggests that Bs can have real effects on cell biology.

Genes and functional sequences on B chromosomes

Repetitive DNA classes, including mobile elements and satellite DNAs, are enriched in B chromosomes, as extensively described in many studies (for a review, see Camacho 2005; Burt and Trivers 2006). The identification of protein-coding genes and transcriptionally active sequences has advanced over the last decade, considerably altering our view of the molecular structure of Bs. The first discovery of protein-coding genes in Bs of the fungus N. haematococca (Miao et al. 1991) was followed by a growing number of reports describing the presence of copies of protein-coding genes, rRNA genes, pseudogenes, and transcriptionally active sequences on Bs. Multigene families for rRNA (López-León et al. 1994; Donald et al. 1995; Stitou et al. 2000; van Vugt et al. 2005; Bidau et al. 2004; Silva and Yassuda 2004; Baroni et al. 2009; Poletto et al. 2010), H1, H3, and H4 histones (Teruel et al. 2010; Oliveira et al. 2011; Silva et al. 2014, 2016; Utsunomia et al. 2016) and U2 snRNA (Bueno et al. 2013) have also been observed in the B chromosomes of many species. Indirect evidence of B activity was first described for the grasshopper Dichroplus pratensis (Bidau 1986) and the black fly Simulium juxtacrenobium (Brockhouse et al. 1989) through the identification of nucleolar phenotypes related to rRNA gene expression. The first molecular evidence of gene activity on B chromosomes was the finding of rRNA gene transcription in the plant Crepis capillaris (Leach et al. 2005) and in the parasitoid wasp Trichogramma kaykai (van Vugt et al. 2005). Transcription of B chromosome rRNA gene copies was later identified in the grasshopper Eyprepocnemis plorans (Ruiz-Estévez et al. 2013, 2014). The most recent findings on this subject provide evidence that various repetitive DNAs are transcriptionally modulated by B chromosomes (Lamb et al. 2007; Carchilan et al. 2007, 2009; Klemme et al. 2013; Banaei-Moghaddam et al. 2015; Ramos et al. 2016).

The proto-oncogene c-KIT maps to the B chromosomes of the canids red fox and Chinese and Japanese raccoon dogs (Graphodatsky et al. 2005; Yudkin et al. 2007) and of the ruminant Siberian roe deer (Trifonov et al. 2013). This gene is transcriptionally active in the Siberian roe deer. Several genes involved in microtubule organization (TUBB1, TUBB5), kinetochore structure (SKA1, KIF11, CENP-E), recombination (XRCC2, SYCP2, RTEL1), and progression through the cell cycle (Separase, AURK) were identified in the B chromosomes of the cichlid fish A. latifasciata (Valente et al. 2014). Transcriptome analysis revealed that B-chromosomal copies of the Separase, TUBB1, and KIF11 genes are transcribed in another cichlid fish, Pundamilia nyererei (Valente et al. 2014). RNA-seq transcriptome analyses of maize with varying numbers of B chromosomes (B73 + 0B, B73 + 1B, and B73 + 6Bs) provide evidence that the expression of genes in the A genome is influenced by the presence of B chromosomes and that higher numbers of Bs generate more obvious phenotypic effects (Huang et al. 2016). Other authors have also demonstrated the effects of B chromosomes on the transcription of the A genome (Carchilan et al. 2009; Banaei-Moghaddam et al. 2013, 2015).

The presence of genes and functional sequences in Bs and their action on the whole genome modify the conventional interpretation of B chromosomes as non-functional elements and suggest that B chromosomes can cause extensive changes in the whole transcriptional profile of cells, with important implications for cell biology.

B chromosomes in the context of integrative omics and biological networks

Systems biology is another emergent area in biological science and analyzes biological phenomena from a holistic rather than a reductionist perspective. The integrative principle offered by systems biology is an important concept for biology because the analysis of complex systems in other areas has shown that the features of a system arise from the interaction of its components; thus, systems have emergent properties that cannot be observed when the components are investigated separately.

Although this emergent science remains new to most biologists, it has proved highly effective for understanding protein-protein interactions, regulatory and metabolic networks, and other biological interactions. In biological networks, the degree of the nodes (the number of edges of a node; the nodes can be genes, proteins, or enzymes, etc.) obeys a power-law distribution, which fits the small-world effect: only a few edges are needed to convey information among a large number of nodes, as can be explained by the Barabási-Albert model. This model predicts (i) the high flexibility and integrity of biological networks (they are resistant to random perturbations); (ii) the growth in networks resulting from biological evolution; and (iii) the preferential attachment of new nodes (new nodes prefer to bind to older, highly connected nodes; the rich-get-richer principle) (Barzel et al. 2012). Systems biology is currently well connected to omics sciences because the latter provide large-scale data (e.g., RNA-Seq can provide thousands of transcript expression measurements from which co-expression networks can be inferred).
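The growth and preferential-attachment principles of the Barabási-Albert model can be illustrated with a short simulation. The sketch below is a generic textbook version of the model, not an analysis from this review; the network size, the number of edges added per node, the random seed, and the function names are arbitrary assumptions.

```python
import random
from collections import Counter

def barabasi_albert(n_nodes, m_edges, seed=0):
    """Grow a graph by preferential attachment and return its edge list.

    Starts from a complete graph on m_edges + 1 nodes; every new node
    attaches to m_edges distinct existing nodes chosen with probability
    proportional to their current degree.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m_edges + 1) for j in range(i)]
    # Each node appears in this list once per edge end, so sampling
    # uniformly from it is sampling proportionally to degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m_edges + 1, n_nodes):
        targets = set()
        while len(targets) < m_edges:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def degrees(edges):
    """Count the number of edges touching each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = barabasi_albert(n_nodes=2000, m_edges=2)
deg = degrees(edges)
hist = Counter(deg.values())
for k in sorted(hist)[:5]:
    print(k, hist[k])                      # many low-degree nodes
print("max degree:", max(deg.values()))    # a few highly connected hubs
```

In such a simulation, most nodes keep the minimum degree while a few early nodes become hubs, which is the heavy-tailed pattern that a power-law degree distribution describes.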

Differential expression analyses of B+ and B− total tissue messenger RNAs (mRNAs) of A. latifasciata have revealed genes that are involved in important cell structures/processes, such as the structural maintenance of cells and chromosomes, gene regulation, and the cell cycle (Marques et al. in preparation). The inference of co-expression networks (Spearman’s rank correlation) (Ballouz et al. 2015) using RNA-Seq data from the B+ and B− genotypes of A. latifasciata, followed by computational analysis, reveals that both networks fit the power-law distribution for degrees (Fig. 2a), indicating that the networks are non-random and possess the small-world property. The B+ genotype network is larger than the B− network (>4 M connections) and is also more diverse (Fig. 2b); furthermore, the degrees for all transcripts are significantly different (p value ≪ 0.001 according to the Kruskal-Wallis test) between the networks. However, the B+ and B− networks do not differ greatly in the number of transcripts (177,534 and 175,819 for the B− and B+ networks, respectively) and have many nodes in common (170,845) (Fig. 2c). Network comparative analysis indicates that B parasitism is not only related to the expression of its own genes but also modifies A genome expression; otherwise, the co-expression networks would not differ between the genotypes. This finding suggests that B exerts a high level of edge rewiring of the network.
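How a co-expression network is inferred from Spearman’s rank correlation, and how the same transcripts can end up with different degrees in two genotypes, can be shown with a toy example. The sketch below is not the analysis pipeline used for A. latifasciata: the correlation cutoff, the transcript names, and the expression values are hypothetical, and only the use of Spearman’s rank correlation is taken from the text.

```python
from itertools import combinations

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression_degrees(expr, cutoff=0.9):
    """Link transcripts with |rho| >= cutoff and return each node's degree."""
    degree = {name: 0 for name in expr}
    for a, b in combinations(expr, 2):
        if abs(spearman(expr[a], expr[b])) >= cutoff:
            degree[a] += 1
            degree[b] += 1
    return degree

# Hypothetical expression values across five samples per genotype.
b_minus = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [3, 1, 4, 2, 5]}
b_plus = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [5, 9, 11, 20, 30]}
print(coexpression_degrees(b_minus))
print(coexpression_degrees(b_plus))
```

In this toy case, both networks contain the same three nodes, but the hypothetical transcript g3 gains two edges in the B+ data. This is a small-scale analogue of edge rewiring: the node sets largely overlap while the connections differ.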

All nodes of both networks were subjected to gene ontology annotation (94,900 transcripts were annotated); 88,870 annotated transcripts are shared between the networks, and 3356 and 2674 transcripts are exclusive to B− and B+, respectively (Fig. 2c). A detailed analysis of the genes that are present in the most highly scored nodes for the molecular function domain (DNA binding and nucleotide binding terms) retrieved six genes that were exclusive to the B+ network annotation (Fig. 2e, Table 1). Most of these genes (LRC23 and RRP7A are the exceptions) encode proteins that regulate gene transcription. For instance, in humans, MYCBP binds to MYC after its expression is increased, generating positive feedback (Taira et al. 1998; Furusawa et al. 2001). MYC in eukaryotes is a transcription factor that regulates the cell cycle and cell differentiation (McMahon et al. 1998) and also plays a role in preventing chromosomal rearrangements through the induction of telomerase activity (Wu et al. 1999). Transcription factors were also found in our analysis (NR2F6 and MSGN1), and ZNF596 was the only transcription factor that was found with high integrity levels in the B gene list reported by Valente et al. (2014) (~71 % for XM_004572867.1 and XM_004574919.1 and ~63.4 % for XM_004576325.1).

Table 1 Gene ontology analysis of the genes present in the B+ network annotated as high-scored nodes for the molecular function domain

Similar data were obtained during the micro-RNA (miRNA) analysis of whole microRNomes of B+ tissues compared to B− tissues of A. latifasciata (Fantinatti et al. in preparation) (Fig. 2d); these data highlight the fact that miRNAs are involved in processes that are related to cell development and cell cycle control, cellular division, and other functions. Based on such an integrative analysis, it was also clearly observed that the effects of B chromosomes on the host genome are extraordinarily variable depending on the tissue analyzed; this suggests that B chromosome activity is somewhat dependent on the environmental condition of cells. For example, among the target genes of differentially expressed B+ miRNAs, it can be observed through target prediction that the Neurog1 gene is targeted by three female-specific, up-regulated miRNAs (Fig. 2d). It is known that in zebrafish, Neurog1 is involved in the regulation of cell fate during neuronal tissue development (Onoguchi et al. 2012) and is also involved in ganglion development (Andermann et al. 2002).

Although it is not clear how Bs exert their influence on cell biology, we hypothesize that regulators expressed by Bs, such as transcription factors, miRNAs, and other ncRNAs (including lncRNAs), might trigger extensive A-genome expression changes. Noncoding RNAs represent one of the most expressed RNA classes in the cell and have been intensively explored in recent years because of their importance in controlling several biological processes (Morris and Mattick 2014; Rinn and Guttman 2014). Based on the accumulated evidence obtained from several organisms, we propose that B chromosomes might modulate cell biology through (i) the production of regulatory RNAs that are transcribed from B sequences and/or (ii) the differential expression of B paralog genes. Indeed, we know that the B chromosome of A. latifasciata regulates the expression of noncoding RNAs, thus influencing A-genome expression (Ramos et al. 2016; Fantinatti et al. in preparation; Marques et al. in preparation), as is observed in other species (Carchilan et al. 2009; Akbari et al. 2013; Banaei-Moghaddam et al. 2013, 2015; Huang et al. 2016). Some individuals of the wasp Nasonia vitripennis carry a supernumerary (B) chromosome known as paternal sex ratio (PSR) that expresses both potentially coding and noncoding transcripts that could act on B transmission to males (Akbari et al. 2013). Furthermore, the in vitro activity of a protein encoded by a B chromosome gene (Argonaute-like, involved in the regulation of gene expression) was observed for the first time in rye, providing evidence that a B-derived protein could act in silencing chromatin/DNA elements (Ma et al. 2016). We propose that the expression of regulators from Bs could initiate a cascade of events, apparently in a selfish strategy, thus altering the expression of A-genes in a non-Boolean way for pre- and post-transcriptional regulation as well as translational regulation. Our co-expression network analysis for A. latifasciata strongly supports the notion that the influence of B on the A complement is not slight but that it widely changes cell physiology. Currently, we can state that Bs are probably maintained in the cell and transmitted between generations by this mechanism.

Final remarks

The accumulated data suggest that B can act by manipulating the entire cell system to benefit its own survival. However, the mechanism by which this is accomplished remains completely unknown. We can hypothesize that B acts in a manner similar to the parasitic dodder plant, which seems to continuously track the physiological status of the host and uses its own mRNA to manipulate the host in a manner that benefits the parasite (Kim et al. 2014). We suggest that the next step in B chromosome science is to determine whether the A-genome control that results from the presence of B is “motivated” by a parasitic behavior similar to that exhibited by dodder. A complementary focus would use strategies involving extensive network analysis, such as that illustrated here, to better understand the parasitism strategy used by Bs from a systemic perspective. Omics, computational science, and systems biology are the most suitable strategies to answer questions such as the following: (i) Does the B chromosome “monitor” A-genome expression to influence systemic changes for its own benefit? (ii) How does the signaling cascade of co-expression and protein-protein interactions alter A-genome expression? (iii) Could B chromosomes influence other important networks in the cell that are unrelated to its parasitism?