Introduction

The order Brassicales, although named after the crucifers (Brassicaceae), which predominate in the Northern Hemisphere, is an eco-geographically and morphologically diverse clade found on all continents except Antarctica. Brassicales comprise 17 families, some 400 genera and c. 4700 species (Christenhusz and Byng 2016; Cardinal-McTeague et al. 2016; Edger et al. 2018). Owing to the pivotal role of Arabidopsis thaliana and other Brassicaceae species as model organisms in various research disciplines, Brassicales have made their way into contemporary textbooks. The diversity and importance of Brassicaceae, the largest Brassicales family comprising 3997 species in 341 genera (BrassiBase, accessed on January 18, 2018), overshadow the remaining sixteen families. The exceptions are the papaya tree (Carica papaya, Caricaceae), commonly grown for its fleshy berries in tropical and subtropical regions of the world; the caper bush (Capparis spinosa, Capparaceae), cultivated for edible flower buds (capers) and fruits (caper berries); and the lesser-known but multi-purpose drumstick tree or moringa (Moringa oleifera, Moringaceae).

The recently published phylogenetic analyses of Brassicales (Cardinal-McTeague et al. 2016; Rockinger et al. 2016; Edger et al. 2018) reflect a renewed interest in the evolution of this group. The robust phylogenetic frameworks permit the reinterpretation of existing data sets, the identification of knowledge gaps and the formulation of new hypotheses. Here I reassess what is currently known about chromosome number and genome size variation across Brassicales and how these data can be used to reconstruct patterns of chromosomal and genome evolution in this order. Furthermore, three family-specific whole-genome duplication (WGD) events are inferred, and the chromosome numbers of an ancestral pre-Brassicales genome and of the paleotetraploid genome shared by the core Brassicales are discussed.

Chromosome number and genome size data: knowledge gaps

Chromosome numbers for Brassicaceae species are continuously updated in BrassiBase (Kiefer et al. 2014; Koch et al. 2018), and several new counts for Caricaceae were published by Rockinger et al. (2016). The Chromosome Counts Database (CCDB, Rice et al. 2015) is the most comprehensive source of chromosome number data for Brassicales species, and it was used to draw the conclusions detailed below (see also Online Resource 1 and Fig. 1).

Fig. 1

Phylogeny of Brassicales with plotted chromosome numbers and genome sizes for each family. Known haploid chromosome numbers, number of genera/number of species and genome sizes (Mb, in square brackets) are given for each family (the most frequent chromosome numbers are underlined in three families). Inferred ancestral haploid chromosome numbers are marked in black. A star represents a whole-genome duplication; two stars for the At-β duplication reflect the uncertainty of its phylogenetic placement (Edger et al. 2018). Three red stars symbolize WGDs purported at the base of Capparaceae, Resedaceae and Tropaeolaceae. Note that Cleomaceae can be treated as containing only Cleome species or 18 different genera (Patchell et al. 2014). The plastid phylogeny was adopted from Edger et al. (2018), the number of taxa follows Christenhusz and Byng (2016) and karyological records are based on Online Resource 1

Chromosome number data are lacking for three monospecific Brassicales families, namely Emblingiaceae, Pentadiplandraceae and Setchellanthaceae. Chromosome numbers are known for only one of the two species in four bispecific families—Akaniaceae, Bataceae, Koeberliniaceae and Tovariaceae. Chromosome number variation is relatively well characterized in smaller Brassicales families, such as Caricaceae (all 6 genera, c. 21 out of 35 species), Gyrostemonaceae (3 out of 4 genera, 3 out of 20 spp.), Limnanthaceae (all 8 spp.), Moringaceae (5 out of 13 spp.) and Salvadoraceae (2 out of 3 genera, 4 out of 11 spp.).

Among the species-rich families, Brassicaceae is the best studied. Warwick and Al-Shehbaz (2006) reported chromosome counts for 232 out of 338 genera (69%) and for 1558 out of 3709 species (42%). Since 2006, the number of known chromosome counts for Brassicaceae species has increased further (cf. BrassiBase and CCDB). At the other end of the spectrum are Tropaeolaceae, with counts for only eight species (8% of species); Capparaceae, with records for 38 species (12% of species) in 13 out of 30 genera; Cleomaceae, with 49 species counted (14% of species); and Resedaceae, with chromosome counts for c. 39 species in five genera (36% of species).
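The Brassicaceae coverage percentages quoted above follow directly from the counts of Warwick and Al-Shehbaz (2006). A minimal Python sketch of that arithmetic is given below; the function name `coverage_percent` is only an illustrative helper, not part of any database interface.

```python
def coverage_percent(counted, total):
    """Share of taxa with at least one chromosome count, in percent."""
    return 100.0 * counted / total

# Counts cited in the text from Warwick and Al-Shehbaz (2006).
genera_counted, genera_total = 232, 338
species_counted, species_total = 1558, 3709

print(f"genera:  {coverage_percent(genera_counted, genera_total):.0f}%")
print(f"species: {coverage_percent(species_counted, species_total):.0f}%")
```

The same helper could be applied to the other families once their total species numbers are taken from Online Resource 1.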

At least one genome size record is available for ten out of 17 Brassicales families (Online Resource 1 and Fig. 1). Not surprisingly, Brassicaceae is the family in which genome size variation is best documented. Genome sizes in other species-rich families remain largely unexplored. Kew’s Plant DNA C-values database (Bennett and Leitch 2012, accessed on January 20, 2018) contains only a single C-value each for Cleomaceae and Tropaeolaceae, and only two entries for Resedaceae. Not a single value is available for Capparaceae, a family harboring more than 300 species.

This summary shows that chromosome numbers are entirely unknown for three monospecific families (i.e., Emblingiaceae, Pentadiplandraceae and Setchellanthaceae) and that four species-rich Brassicales families (i.e., Capparaceae, Cleomaceae, Resedaceae and Tropaeolaceae) remain largely understudied. The knowledge gap in genome size variation across Brassicales is even more evident: in all families except Brassicaceae and Caricaceae, genome sizes are either unknown or reported for only c. 1% of species. To expedite and guide future comparative phylogenomic studies in Brassicales, a collaborative effort should be undertaken to compile basic karyological information for most Brassicales taxa.

Extreme chromosome numbers and genome sizes in Brassicales

The lowest haploid chromosome number known for angiosperms is n = 2, found so far in only about five species (Cremonini 2005). Across Brassicales, the lowest known chromosome number is n = 4, recorded in only a few Brassicaceae species (e.g., Physaria bellii, Stenopetalum nutans). The second lowest chromosome count, n = 5, has been detected in only a few Brassicaceae species (e.g., Arabidopsis thaliana, Physaria spp. and Stenopetalum lineare), and in all eight Limnanthaceae species. All n = 4 and n = 5 genomes resulted from descending dysploidies, i.e., reductions in chromosome number, following one or more WGD events postdating the ancient whole-genome triplication (WGT) At-γ (gamma) shared by all eurosids and first detected in grape (Jaillon et al. 2007). At the opposite extreme, the highest chromosome counts, found in some North American Cardamine species (Brassicaceae), resulted from more recent polyploidization cycles (2n = ±240 in C. concatenata, 2n = c. 256 in C. diphylla).

Genome size across Brassicaceae, and thus across Brassicales, varies 32-fold. The lowest genome size was reliably reported for Arabidopsis thaliana (157 Mb), whereas the highest DNA content was estimated in the polyploid Crambe cordifolia (4630 Mb) with c. 60 chromosome pairs. Genome sizes of diploid-like species in non-crucifer Brassicales families tend to be small to medium, with the exception of Limnanthes douglasii (Limnanthaceae), whose 1360-Mb genome is divided among only five chromosomes (Fig. 2).

Fig. 2

Chromosomes of two Brassicales species with the same chromosome number but an almost ninefold difference in genome size. The two species also differ strikingly in chromosome size and structure. Whereas in Arabidopsis thaliana (Brassicaceae) the few repetitive sequences are located mainly in heterochromatic pericentromeres, repeats in Limnanthes douglasii (Limnanthaceae) are presumably distributed uniformly along much larger chromosomes. In both species, more and less condensed mitotic chromosomes, obtained from young root tips, were stained by DAPI and the photographs were inverted in Adobe Photoshop

Ancient WGDs in Brassicales

The first ancient WGD in Brassicales was detected during the Arabidopsis genome sequencing and assembly, when several genome regions were found to be duplicated on the same or different chromosomes (Arabidopsis Genome Initiative 2000). This paleotetraploid event was referred to as 3R or α (alpha), or alternatively as At-α duplication (e.g., Bowers et al. 2003; Maere et al. 2005; Barker et al. 2009). The At-α is shared by all Brassicaceae tribes including Aethionemeae (Schranz et al. 2012), the tribe sister to all other Brassicaceae tribes. Similarly, Cleomaceae, the second largest Brassicales family (Christenhusz and Byng 2016), experienced a WGT referred to as Cs-α or Th-α duplication (Schranz and Mitchell-Olds 2006; Cheng et al. 2013). Both Brassicaceae and Cleomaceae, but not Caricaceae, share an older WGD, called At-β (beta) (Ming et al. 2008; Barker et al. 2009). This pattern suggests that the At-β must have postdated the divergence of core Brassicales families from Caricaceae and the other three early diverging families (i.e., Akaniaceae, Moringaceae and Tropaeolaceae), which experienced only the At-γ WGT. However, precise phylogenetic placement of the At-β duplication remains unclear. Most recently, Edger et al. (2018) placed the WGD either before or after the split between Setchellanthaceae and core Brassicales (Fig. 1).

Based on the association between family-specific WGDs, post-polyploid diploidization and increased species and genus diversity observed in Brassicaceae and Cleomaceae, as well as in other angiosperm families (e.g., Schranz et al. 2012; Huang et al. 2016; Mandáková et al. 2017; Mandáková and Lysak 2018), I propose that three large Brassicales families, namely Capparaceae, Resedaceae and Tropaeolaceae, have each experienced their own WGD predating their diversification (Fig. 1). Future sequencing efforts will help to test this hypothesis.

Evolution of ancestral genomes and chromosome numbers in Brassicales

From a phylogenetic perspective, a hypothetical ancestral Brassicales genome can be placed between two modern species whose genomes have been sequenced: Theobroma cacao (cocoa, Malvales) and Carica papaya (papaya). As a gap of millions of years and many as yet unsequenced genomes lie between those two genomes, one can only speculate on the number of linkage groups and the structure of the ancestral Brassicales genome.

Both Malvales and Brassicales are descendants of the paleohexaploid At-γ genome common to all eudicots and supposedly having 21 chromosome pairs (n = 21; resulting from a triplication of an ancestral n = 7 genome; Salse 2016). The extant chromosome number of cocoa (n = 10) most likely originated by descending dysploidy from the n = 21 genome (Argout et al. 2010). As papaya has a chromosome number (n = 9) similarly low to that of cocoa, and n = 9 was also inferred as the most probable ancestral chromosome number for the entire Caricaceae (Rockinger et al. 2016), an ancestral genome with nine linkage groups can be proposed for the early diverging Brassicales clades. This is further supported by n = 9 reported for Akania and Bretschneidera, the two genera of Akaniaceae, which, together with Tropaeolaceae, are sister to all the remaining Brassicales families (Fig. 1).
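The chromosome number arithmetic behind this argument can be laid out explicitly. The Python sketch below uses only the numbers cited above; the variable names are illustrative, and the computed difference is a net loss of linkage groups, not an estimate of how many individual fusion or fission events took place.

```python
# Pre-triplication ancestral genome and the At-γ triplication (Salse 2016).
PRE_GAMMA_N = 7
gamma_n = 3 * PRE_GAMMA_N  # paleohexaploid genome, n = 21

# Extant haploid chromosome numbers cited in the text.
extant_n = {
    "Theobroma cacao": 10,
    "Carica papaya": 9,
}

for species, n in extant_n.items():
    net_loss = gamma_n - n
    print(f"{species}: n = {gamma_n} -> n = {n}, net loss of {net_loss} linkage groups")
```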

However, the idea of an ancestral genome with n = 9 is weakened by the extant karyological variation in families sister to Akaniaceae and Caricaceae. Whereas the species richness and chromosome number variation across Tropaeolaceae (n = 12–14, up to n = 21, plus some higher counts) is presumably a consequence of post-polyploid diversification following a family-specific WGD, the origin of chromosome numbers in Moringaceae is more puzzling. All chromosome numbers reported for Moringaceae species [n = 14 (13)] are higher than n = 9. Recently, Tian et al. (2015) analyzed a whole-genome sequence of Moringa oleifera, compared it with the papaya genome and showed that Moringaceae did not experience a family-specific genome duplication. This is consistent with the low species diversity of this monogeneric family. The absence of a WGD could suggest that the extant chromosome numbers in Moringaceae (n = 14) were derived directly from the ancestral Brassicales genome, either through stasis (n = 14) or by chromosome fissions (n = 9 → n = 14).

To sum up, current phylogenomic data do not allow us to decide on the structure of the ancestral Brassicales genome(s). Two alternative scenarios should be tested: (1) the proto-Brassicales genome had n = 9, still preserved in genomes of Akaniaceae and Caricaceae, and n = 14 in Moringaceae originated by ascending dysploidy via chromosome fissions, or (2) the ancestral genome had more than nine chromosomes, most likely n = 14, still conserved in Moringaceae, and ancestral n = 9 genomes of Akaniaceae and Caricaceae originated through independent descending dysploidies—once in Africa (Caricaceae; Rockinger et al. 2016) and once in America (Akaniaceae; Gandolfo et al. 1988; Stevens 2001 onwards, accessed on January 20, 2018).

At-β paleotetraploid genome

The uncertainty about the ancestral Brassicales chromosome number also hampers the inference of the chromosome number of the At-β paleotetraploid genome. A more parsimonious scenario favors a genome with nine chromosomes being duplicated to n = 18, rather than the origin of a tetraploid genome with n = 28 from an n = 14 precursor. However, neither n = 18 nor n = 28 is retained in modern Brassicales genomes, which all diversified after the At-β duplication (Fig. 1). With the exception of Koeberliniaceae (n = 22), the chromosome numbers lower than n = 18 found in all post-At-β families without a younger WGD would be in accord with the idea of a tetraploid At-β genome with n = 18, later diploidized by independent dysploidies. In that case, the most extensive descending dysploidy would be found in Limnanthaceae, presumably representing a 3.6-fold reduction in chromosome number from n = 18 to n = 5.
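The two candidate At-β genomes and the resulting reduction in Limnanthaceae can be compared with a short calculation. The Python sketch below is only an illustration under the assumptions stated in the text (a simple doubling of either an n = 9 or an n = 14 precursor); the helper names `duplicate` and `fold_reduction` are hypothetical.

```python
def duplicate(n, copies=2):
    """Haploid chromosome number immediately after a WGD, before any dysploidy."""
    return n * copies

def fold_reduction(n_before, n_after):
    """Fold reduction in chromosome number caused by descending dysploidy."""
    return n_before / n_after

# Candidate pre-duplication chromosome numbers discussed in the text.
precursors = {"n = 9 precursor": 9, "n = 14 precursor": 14}

LIMNANTHACEAE_N = 5     # all eight Limnanthaceae species
KOEBERLINIACEAE_N = 22  # the exception noted in the text

for label, n in precursors.items():
    n_beta = duplicate(n)
    reduction = fold_reduction(n_beta, LIMNANTHACEAE_N)
    relation = "above" if KOEBERLINIACEAE_N > n_beta else "below"
    print(f"{label}: At-β genome n = {n_beta}; "
          f"{reduction:.1f}-fold reduction to n = {LIMNANTHACEAE_N}; "
          f"Koeberliniaceae (n = {KOEBERLINIACEAE_N}) lies {relation} this number")
```

Under the n = 18 scenario, only Koeberliniaceae exceeds the inferred At-β number, whereas under n = 28 every post-At-β family without a younger WGD would have undergone descending dysploidy.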