17.1 Introduction to Crop and Evolution

Brassica crops provide a greater diversity of products than any other single genus. The family Brassicaceae includes 372 genera and almost 4060 accepted species names (The Plant List: Brassicaceae), and 3660 species have been classified within 321 genera (Kiefer et al. 2014). Brassica species play an essential role in agriculture and horticulture (Rakow 2004; El-Esawi 2016). Brassica oilseed crops are cultivated annually on ~34 million hectares of the world’s agricultural land (FAO 2013). India ranks third in rapeseed-mustard production, with a total of 12–15% of the cultivated oilseed area (Venkattakumar and Padmaiah 2010). Members of Brassica are mostly adapted to lower temperatures and are therefore well suited to cultivation at high elevations and as winter crops in subtropical areas. In temperate zones, oilseed rape (Brassica napus) and turnip rape (Brassica rapa) are predominantly cultivated, while Indian mustard or rai (Brassica juncea) is the major oil source in the subtropics of Asia. The three allotetraploids (B. juncea, B. carinata, and B. napus) account for 12% of the world’s edible oil production (http://www.fao.org/faostat). Brassicas also serve as leaf, flower, and root vegetables that are eaten fresh, cooked, or processed, and they are used as fodder and forage. The wide variation among Brassicas in morphological and adaptive traits has been useful in breeding improved cultivars (Jambhulkar 2015; Rai et al. 2021).

The natural evolution of wild diploid Brassica species and their related amphidiploid hybrids has been confirmed by extensive experimental crosses between diploid and/or tetraploid species, followed by karyotyping and microscopic observation of the synapsis stage of meiosis in these crosses (Cheng et al. 2014). Based on such studies, the Korean botanist Nagaharu U (1935) described the genetic relationships of these species, proposing that three basic diploid Brassica species were probably the parents of the subsequent amphidiploid crops (the scheme now known as the Triangle of U). Brassica nigra (black mustard), the ancestor of culinary mustards, grows as an annual herb along the rocky Mediterranean coasts. Natural populations of B. oleracea and related types, which are capable of conserving water and nutrients, have been identified as potential progenitors of many European cole vegetables. The putative ancestor of B. rapa may have originated in the high plateau region of present-day Iran, Iraq, and Turkey, and was able to grow rapidly in hot, dry conditions and produce copious seed (Dixon 2007; El-Esawi 2016). The amphidiploid Brassica carinata (n = 17) probably originated from hybridization of B. oleracea (n = 9) with wild or semi-domesticated forms of B. nigra (n = 8). Another amphidiploid, Brassica juncea (n = 18), is a hybridization product of B. rapa (n = 10) and B. nigra (n = 8) (Frandsen 1943). The third amphidiploid, B. napus (n = 19), developed from a cross between B. rapa (n = 10) and B. oleracea (n = 9). Beyond these species, an additional gene pool comprises genera and species related to Brassica crops in 36 cytodemes, including the genera Diplotaxis, Enarthrocarpus, Eruca, Erucastrum, Hirschfeldia, Rhynchosinapis, Sinapis, Sinapodendron, and Trachystoma (Harberd 1976; Branca and Cartea 2011). Nuclear DNA content among Brassicaceae species varies within a very narrow range (0.16 pg < 1C < 1.95 pg), much lower than in Poaceae and Fabaceae, suggesting relatively limited divergence in genome size during the evolution of Brassica members.
The genetic relationships of the Brassica species and their genome sizes are presented in Fig. 17.1. Despite this conservative DNA content, a great deal of structural genome evolution has taken place (Lagercrantz and Lydiate 1996; Lan et al. 2000). According to Song et al. (1995), genome instability was the basis for all the genomic changes observed in allopolyploids.
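The chromosome arithmetic of the Triangle of U, and the relation between the 1C DNA values (pg) and genome sizes (Mbp) shown in Fig. 17.1, can be checked with a short Python sketch. It assumes the widely used approximation of about 978 Mbp of DNA per picogram; the dictionary and function names are illustrative only.

```python
# Haploid chromosome numbers (n) of the three diploid genomes in the Triangle of U.
DIPLOIDS = {
    "A": ("B. rapa", 10),
    "B": ("B. nigra", 8),
    "C": ("B. oleracea", 9),
}

# Each natural allotetraploid combines two of the diploid genomes.
ALLOTETRAPLOIDS = {
    "B. carinata": ("B", "C"),
    "B. juncea": ("A", "B"),
    "B. napus": ("A", "C"),
}

# Assumed conversion factor: roughly 978 Mbp of DNA per picogram.
MBP_PER_PG = 978.0


def amphidiploid_n(species: str) -> int:
    """Haploid chromosome number of an allotetraploid = sum of its parental n."""
    first, second = ALLOTETRAPLOIDS[species]
    return DIPLOIDS[first][1] + DIPLOIDS[second][1]


def pg_to_mbp(pg: float) -> float:
    """Convert a 1C nuclear DNA content in picograms to an approximate size in Mbp."""
    return pg * MBP_PER_PG


if __name__ == "__main__":
    for species, (g1, g2) in ALLOTETRAPLOIDS.items():
        print(f"{species} ({g1}{g1}{g2}{g2}): n = {amphidiploid_n(species)}")
    low, high = 0.16, 1.95  # 1C range quoted for Brassicaceae
    print(f"1C {low}-{high} pg is roughly {pg_to_mbp(low):.0f}-{pg_to_mbp(high):.0f} Mbp")
```

Running it gives n = 17, 18, and 19 for B. carinata, B. juncea, and B. napus, matching the numbers above, and converts the 0.16–1.95 pg range to roughly 156–1907 Mbp.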

Fig. 17.1

Genetic relationship of the Brassica species [1C, 1C nuclear DNA content (pg); GS, genome size (Mbp)] (Johnston et al. 2005; Chang et al. 2008)

Phylogenetic studies have explained the evolution of Brassica and allied genera from a common ancestor with n = 6 through an increase in chromosome number, and suggest partial homology among the A, B, and C genomes (Branca and Cartea 2011). Whole-genome sequencing and comparative genomic analysis based on the genome sequences of B. rapa and A. thaliana further implicated whole-genome triplication (WGT) in the speciation and morphotype diversification of Brassica spp. After WGT, extensive genome fractionation, block reshuffling, and chromosome reduction produced the stable diploid species (Cheng et al. 2017a, b). Further rearrangement of these genomes and hybridization among the species led to Brassica speciation (Cheng et al. 2014). Genome sequencing of B. juncea and B. napus revealed that the A subgenomes of these two species had independent origins. Homoeolog expression dominance has been observed between the subgenomes of allopolyploid B. juncea, and differentially expressed genes for glucosinolate and lipid metabolism showed stronger evidence of selection than neutral genes (Yang et al. 2016). In B. napus, transcriptomic shock was found to predominate; the bias in expression level dominance varied from tissue to tissue, and transgressive upregulation was more common than downregulation (Li et al. 2020).
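Homoeolog expression bias of the kind described above is often screened by comparing the expression of the two copies in each homoeolog pair. The Python sketch below classifies A/C homoeolog pairs of an allotetraploid such as B. napus as A-biased, C-biased, or balanced. The 2-fold threshold, the pseudocount, and the example values are assumptions for illustration; published analyses rely on replicated data and statistical tests rather than a fixed cut-off.

```python
import math
from typing import Dict, Iterable, Tuple


def homoeolog_bias(expr_a: float, expr_c: float,
                   fold: float = 2.0, pseudo: float = 1.0) -> str:
    """Classify one A/C homoeolog pair from normalized expression values.

    fold   -- assumed fold-change threshold separating biased from balanced pairs
    pseudo -- pseudocount so that a silent copy does not produce log2(0)
    """
    log_ratio = math.log2((expr_a + pseudo) / (expr_c + pseudo))
    limit = math.log2(fold)
    if log_ratio >= limit:
        return "A-biased"
    if log_ratio <= -limit:
        return "C-biased"
    return "balanced"


def summarize(pairs: Iterable[Tuple[float, float]]) -> Dict[str, int]:
    """Count how many homoeolog pairs fall into each bias class."""
    counts = {"A-biased": 0, "C-biased": 0, "balanced": 0}
    for expr_a, expr_c in pairs:
        counts[homoeolog_bias(expr_a, expr_c)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical (A copy, C copy) expression values, e.g. in TPM.
    example = [(30.0, 5.0), (4.0, 20.0), (10.0, 12.0), (0.0, 0.0)]
    print(summarize(example))
```

With these hypothetical values, one pair is A-biased, one is C-biased, and two are balanced.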

17.2 Transcriptome Studies

In the early era of genomics, gene expression studies were restricted to a few specific genes, using techniques such as expressed sequence tags (ESTs) (Marra et al. 1998), Northern hybridization (Alwine et al. 1977), and PCR analysis of specific genes (Becker-André and Hahlbrock 1989). These were followed by genome-scale approaches to transcript characterization, namely serial analysis of gene expression (SAGE) (Velculescu et al. 1995) and DNA microarrays (Lockhart et al. 1996), which allowed direct transcript quantification and the discovery of new genes. With the advancement of sequencing techniques, i.e., next-generation sequencing (Margulies et al. 2005), whole-transcriptome sequencing (RNA-seq) has become a significant tool for transcriptome analysis of non-model organisms (Ellegren et al. 2012; Lamichhaney et al. 2012).

17.2.1 Transcriptome Sequencing

RNA-seq combines high-throughput sequencing with computational methods to capture and quantify the transcripts (Ozsolak and Milos 2011) present in a tissue, organ, or organism (Martin et al. 2013; Conesa et al. 2016). The technique enables comparative quantification of total gene expression across tissues, developmental stages, or environmental conditions, and has been used to identify genes responsible for specific biological or regulatory functions. Moreover, it provides a comprehensive “snapshot” of the total transcripts present in a sample, which can be used to determine the presence or absence of specific transcripts and to quantify transcript abundance. RNA-seq can also provide valuable information on unusual transcriptional events, such as alternative splicing, gene fusions, and novel transcripts (Mutz et al. 2013). There are three basic strategies for RNA-seq analysis: genome mapping, transcript mapping, and reference-free assembly (Conesa et al. 2016). In genome mapping, all RNA-seq reads are mapped against the organism’s reference genome to identify transcripts, which can subsequently be quantified. Transcripts that map to the genome but do not correspond to existing annotations are identified as novel, and the surrounding genome information is used to predict their function, enabling further genome annotation (Conesa et al. 2016; Yang et al. 2016). In transcript mapping, reads are aligned directly to a reference set of known transcript sequences. Finally, reference-free assembly uses the RNA-seq reads themselves to assemble a complete transcriptome de novo in the absence of a reference genome; this approach is also known as de novo transcriptome assembly (Grabherr et al. 2011). Several next-generation sequencing (NGS) technologies have been developed for transcriptome analysis, including Illumina (formerly Solexa), SOLiD, and Roche 454 (Conesa et al. 2016). Of these, Illumina has become the predominant NGS platform for transcriptome research owing to its cost-effectiveness and high throughput.
In “short-read sequencing,” the total transcript pool is sequenced as short (< 500 bp) fragments, which are then assembled bioinformatically, with or without a reference genome, to obtain full-length transcripts and isoforms. These transcripts may then be annotated using reference databases for functional characterization and comparative analyses (Garg and Jain 2013).
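The genome-mapping strategy can be illustrated with a minimal Python sketch that counts reads overlapping annotated genes and sets aside reads that fall outside every annotation as candidate novel transcript regions. The gene coordinates and read positions are hypothetical. Production pipelines use dedicated aligners and counting tools such as featureCounts or HTSeq, which handle strand, splicing, and multi-mapping reads that this sketch ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation: gene ID -> (chromosome, start, end), 1-based, inclusive.
Annotation = Dict[str, Tuple[str, int, int]]
# Hypothetical read alignment: (chromosome, start, end), 1-based, inclusive.
Alignment = Tuple[str, int, int]


def count_reads(genes: Annotation,
                reads: List[Alignment]) -> Tuple[Dict[str, int], List[Alignment]]:
    """Assign each mapped read to the single gene it overlaps.

    Reads overlapping no annotated gene are returned separately as candidate
    novel transcript regions; reads overlapping several genes are ambiguous
    and are skipped in this simplified version.
    """
    counts = {gene_id: 0 for gene_id in genes}
    unassigned: List[Alignment] = []
    for chrom, r_start, r_end in reads:
        hits = [gene_id for gene_id, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and r_start <= g_end and r_end >= g_start]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            unassigned.append((chrom, r_start, r_end))
    return counts, unassigned


if __name__ == "__main__":
    genes = {"gene1": ("A01", 100, 500), "gene2": ("A01", 800, 1200)}
    reads = [("A01", 150, 250), ("A01", 450, 550), ("A01", 900, 1000),
             ("A01", 2000, 2100), ("C01", 100, 200)]
    counts, novel = count_reads(genes, reads)
    print(counts)  # reads per annotated gene
    print(novel)   # reads outside all annotations
```

The linear scan over genes keeps the logic readable; real counters index the annotation so that each read is checked only against nearby genes.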

17.2.2 Long-Read-Based Transcriptome Sequencing

Recent improvements in long-read sequencing (LS) technologies, such as Oxford Nanopore Technologies (ONT) and PacBio (PB), have enabled direct RNA and cDNA sequencing of full-length transcripts (Cui et al. 2020). Because these platforms can sequence polynucleotide molecules hundreds of thousands of nucleotides long, long-read transcriptome sequencing has greatly improved the ability to obtain full-length transcript information (Wang et al. 2016a, b). LS-based transcriptomics also supports alternative splicing analysis and complete isoform characterization, which has helped to refine existing genome annotations and gene models. Recently, an LS-based maize transcriptome analysis produced the most comprehensive mRNA profile to date, which included 57% novel transcripts and isoforms. In B. napus, single-molecule long-read sequencing provided a highly accurate and comprehensive transcriptome in which approximately 15,000 genes (18%) were identified as multi-exonic and showed complex alternative splicing (Yao et al. 2020). These data provide a critical new understanding of B. napus transcriptomics for functional genomics research. Such work has not only revealed previously unexplored intricacies of the B. napus transcriptome but also exemplifies the importance of LS in exploring transcriptome complexity (Wang et al. 2016a, b).

The PacBio single-molecule real-time (SMRT) sequencing approach has been employed for transcriptome sequencing of many plant species, including maize, rice, coffee, Amborella trichopoda, Rhododendron lapponicum, and B. napus (Cheng et al. 2017a, b; Yao et al. 2020). Using SMRT sequencing, Sun et al. (2019) reported a 584.60-Mb genome assembly of cauliflower containing 47,772 genes, with repetitive sequences making up 56.65% of the genome. The study also found that the cauliflower genome is larger than the A genome of B. rapa, the B genome of B. nigra, the A subgenomes of B. napus and B. juncea, and the B subgenome of B. juncea. Interestingly, cauliflower had about the same number of genes as the C-genome Brassica species but a higher abundance of repetitive and other noncoding sequences. In another study, SMRT sequencing was used to generate transcriptomes of Xinjiang green and purple turnips (Brassica rapa var. rapa) at five developmental stages, yielding a new resource of alternative splicing events, simple sequence repeats, and long noncoding RNAs for future genomics research on turnip (Zhuang et al. 2020). In contrast, transcriptomic studies using ONT have been severely limited, owing primarily to the low throughput and high read-error rates associated with the platform. However, continued improvements in long-read RNA-seq technologies are likely to make these studies attractive and affordable in the near future (Cui et al. 2020). Because both ONT- and PB-based transcriptome analyses have been minimally explored in Brassica genomes, these platforms are expected to play an important role in developing a comprehensive transcriptome atlas of Brassica species.

17.2.3 Single-Cell Transcriptomics

Single-cell transcriptomics, or single-cell RNA sequencing (scRNA-seq), has been used to study cell-to-cell variation in gene expression within a cell population, which in turn helps to identify the developmental trajectories of individual cell types (Tang et al. 2011; Shulse et al. 2019). Drop-seq is a recently developed high-throughput scRNA-seq method that encapsulates and separates cells in emulsified droplets, enabling the transcriptional profiling of hundreds of thousands of cells in a single experiment (Macosko et al. 2015). Recently, Drop-seq profiling of > 12,000 Arabidopsis root cells revealed distinct cell types involved in different root developmental stages and activities (Shulse et al. 2019). In this study, the authors demonstrated rapid identification of rare and novel cell types from plant tissue and simultaneous characterization of multiple cell types. The analysis also demonstrated the ability to determine cell-specific transcriptional responses to environmental stimuli such as exogenous sucrose treatment. Such approaches will greatly enhance our understanding of the functional roles of tissues, cells, and genes in plant development and environmental responses. The full potential of this emerging technology in plant research is only now being realized, and scRNA-seq is expected to be used extensively in many plant species, including Brassica (Shaw et al. 2021).

17.2.4 Considerations Regarding RNA Seq

RNA-seq is an efficient technique, offering high resolution and cost advantages for profiling gene expression and differential expression (DE) between samples. However, there are several sources of sequencing bias and systematic noise, arising from incorrect base calls, sequence quality biases (Dohm et al. 2008; Hansen et al. 2010), variability in sequencing depth (Sendler et al. 2011), and differences in the composition and coverage of raw sequence data generated from technical and biological replicate samples (Lü et al. 2009).

ENCODE has therefore defined guidelines and standards that set out best practices for obtaining high-quality transcriptome measurements. RNA-seq experiments should be performed with two or more biological replicates. The typical R2 (Pearson) correlation of gene expression (RPKM) between two biological replicates should be between 0.92 and 0.98, and experiments with biological correlations below 0.9 should either be explained or repeated. Experiments aimed at a global view of gene expression typically require 30–60 million reads per sample, whereas 100–200 million reads are required for an in-depth view of the transcriptome or for assembling new transcripts. Sequencing platforms that produce reads ≥ 75 bp long are considered optimal for RNA-seq while keeping sequencing costs down. The further recommendations made by ENCODE (ENCODE 2011, 2016) should also be followed when designing transcriptome experiments to obtain meaningful findings.
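The RPKM normalization and the replicate-correlation check described above can be sketched in Python as follows. RPKM scales each gene’s read count by gene length (in kilobases) and total mapped reads (in millions); the sketch then computes the Pearson R2 over genes detected in both replicates and compares it with the 0.9 cut-off. The counts, lengths, and function names are illustrative assumptions, not an ENCODE tool.

```python
from typing import List, Tuple


def rpkm(counts: List[int], lengths_bp: List[int]) -> List[float]:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = count * 1e9 / (gene length in bp * total mapped reads)
    """
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def pearson_r2(x: List[float], y: List[float]) -> float:
    """Square of the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def replicate_check(rep1: List[int], rep2: List[int], lengths_bp: List[int],
                    threshold: float = 0.9) -> Tuple[float, bool]:
    """Compare two biological replicates on genes detected in both."""
    e1, e2 = rpkm(rep1, lengths_bp), rpkm(rep2, lengths_bp)
    shared = [(a, b) for a, b in zip(e1, e2) if a > 0 and b > 0]
    r_squared = pearson_r2([a for a, _ in shared], [b for _, b in shared])
    return r_squared, r_squared >= threshold


if __name__ == "__main__":
    # Hypothetical read counts for five genes in two replicates.
    lengths = [1000, 2000, 500, 800, 1500]
    rep1 = [100, 200, 50, 0, 400]
    rep2 = [110, 190, 60, 5, 380]
    r_squared, ok = replicate_check(rep1, rep2, lengths)
    print(f"R2 = {r_squared:.3f}; meets 0.9 cut-off: {ok}")
```

Genes with zero counts in either replicate are excluded before the correlation, so that undetected genes do not inflate or deflate the agreement between replicates.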

17.3 Transcriptome Research in the Brassica Genome

The use of RNA-seq in Brassica research has expanded rapidly in areas such as de novo transcriptome assembly and analysis, differential expression triggered by various biotic and abiotic stresses, noncoding RNA analysis, investigations of genome structure, diversity, and origin, evolutionary analysis, and marker development (Bancroft et al. 2011; Izzah et al. 2014; Kim et al. 2014; Parkin et al. 2014; Wang et al. 2015). So far, genome sequences have been reported for five important Brassica species, comprising the diploids [B. rapa, Wang et al. 2011a, b, c; B. oleracea, Liu et al. 2014; B. nigra, Perumal et al. 2020] and the allotetraploids [B. napus, Chalhoub et al. 2014; B. juncea, Yang et al. 2016]. In all cases, de novo transcriptome assembly played an important role in defining the transcripts of the final whole-genome assemblies. Critically, the Brassica transcriptome landscape has facilitated the identification of agronomically important genes, such as those relevant to biotic and abiotic stress tolerance (Mohd Saad et al. 2021). For example, transcriptome analysis in B. napus was used to elucidate the genes involved in lipid and glucosinolate biosynthesis (Chalhoub et al. 2014), knowledge that could greatly accelerate Brassica breeding programs. Several other agronomic traits are also targeted for Brassica improvement, and genomic approaches are now sought to aid these breeding efforts (Fig. 17.2).

Fig. 17.2

General breeding considerations for the improvement of Brassica family members

RNA-seq-based genome analysis also provided a valuable foundation for understanding biased gene fractionation and genome dominance in the mesohexaploid B. oleracea genome, in which one subgenome exhibits transcriptional dominance over the other two (Parkin et al. 2014). In addition, a transcriptomic approach used to dissect the complex origin and diversification of the B. napus genome found over 8000 differentially expressed genes associated with diversification in this species (An et al. 2019). Furthermore, RNA-seq has been used in Brassica species to identify the roles of noncoding RNAs (ncRNAs), particularly microRNAs and long ncRNAs, in important biological processes such as abiotic stress responses (Ahmed et al. 2020). Harper et al. (2020) provided confirmatory results for the Associative Transcriptomics platform in Brassica juncea. Using a diverse panel of B. juncea accessions, they mapped transcriptome data to a pan-transcriptome reference, identified several single nucleotide polymorphism variants, and quantified the expression of thousands of transcripts. The study identified BjA.TTL as a potential candidate gene for seed weight, along with other markers for seed color and vitamin E content.

17.3.1 Transcriptomic Studies Related to Biotic Stresses

Plant diseases and pests cause significant yield losses in Brassica spp. Sixteen major diseases and 37 insect pests have been reported in mustard and oilseed rape growing regions (Zheng et al. 2020a, b). Developing host resistance is one of the most desirable and cost-effective methods of disease control. Plant–pathogen interaction is a broad process that begins with the detection of microbial elicitors, the pathogen-associated molecular patterns (PAMPs), by membrane-localized pattern recognition receptors (PRRs) of plants (Dodds and Rathjen 2010; Zipfel 2014). Plant immunity is often described in terms of effector-triggered immunity (ETI), which involves the hypersensitive response (HR); however, effective resistance against pathogens is mostly conferred through PAMP-triggered immunity (PTI) (Neik et al. 2017). Plants also develop broad-spectrum immunity through various hormonal signaling pathways (Kazan and Lyons 2014).

Transcriptome profiling in several Brassica species has made the roles of differentially expressed genes, QTLs, and the corresponding pathways in host–pathogen interactions and other biotic stresses more apparent. RNA-seq analysis has strengthened the basic understanding of defense mechanisms and of the factors conferring tolerance to diseases such as clubroot, caused by Plasmodiophora brassicae, in B. rapa (Chu et al. 2014; Fu et al. 2019), B. napus (Hejna et al. 2019), and B. juncea (Luo et al. 2018). Similarly, RNA-seq studies have unraveled defense mechanisms against diseases such as Fusarium wilt (Fusarium oxysporum), Sclerotinia stem rot (Sclerotinia sclerotiorum), blackleg (Leptosphaeria maculans), and downy mildew (Hyaloperonospora brassicae) (Table 17.1A). Most studies reported upregulation of genes in the salicylic acid (SA), jasmonic acid (JA)/ethylene (ET), and brassinosteroid (BR) signaling pathways after pathogen infection. Other components and pathways that shield the host against invading pathogens include secondary metabolites, phenolics, signal transduction, and phytohormones. Studies have also highlighted the enrichment of genes in metabolic processes, plant–pathogen interactions, plant hormone signal transduction, glucosinolate biosynthesis, cell wall thickening, chitin metabolism, and pathogenesis-related (PR) genes and pathways (Jia et al. 2017). Transcriptomic studies have also provided insights into host defense mechanisms against insect pests (Table 17.1B), involving pathways of cell wall synthesis, secondary metabolite production, redox homeostasis, phytohormone signaling, and glucosinolate biosynthesis and degradation (Gruber et al. 2018).

Table 17.1 Brassica transcriptomics related to biotic stresses

17.3.2 Transcriptomic Studies Related to Abiotic Stress

Abiotic stresses have become one of the major threats restricting crop production and productivity. They influence plant growth at all phenological stages and cause yield losses depending on stress intensity and duration. Comprehensive studies of abiotic stress impacts and the indices used to assess them have been compiled by Rai et al. (2021). Abiotic stress tolerance is a quantitative trait and involves crosstalk among various signaling, metabolic, and defense pathways (Fig. 17.3).

Fig. 17.3

Abiotic stress responsive pathways in plants, from signal perception to downstream stress responses (Source: Ali Raza et al. 2021)

Transcriptomic studies have been performed to understand plant responses to different abiotic stresses and the underlying tolerance mechanisms. Genome-wide gene expression analyses under drought, salinity, heat, cold, cadmium stress, and combined stresses have been performed using RNA-seq. These studies have generated enormous datasets that are now being used to understand abiotic stress responses. For example, the major upregulated transcripts identified belong to classes such as transcription factors, kinases, heat shock factors (HSFs), calcium signaling pathways, and ROS detoxification. Yue et al. (2021) identified candidate heat stress tolerance genes through a comparative transcriptomic study of contrasting B. rapa accessions subjected to long-term heat stress. There were notable alterations in the expression of functional genes, especially those related to ER protein processing, hormones, and signal transduction pathways. Transcriptomic studies related to abiotic stresses in various Brassica species are summarized in Table 17.2.
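Identifying transcripts as up- or downregulated under stress rests on comparing normalized expression between treated and control samples. The Python sketch below applies a simple log2 fold-change cut-off to mean expression values. The gene names, values, threshold (|log2FC| ≥ 1, i.e. a 2-fold change), and pseudocount are assumptions; real studies use replicate-aware statistical methods such as DESeq2 or edgeR rather than fold change alone.

```python
import math
from typing import Dict, List


def classify_de(control: Dict[str, List[float]], stress: Dict[str, List[float]],
                min_log2fc: float = 1.0, pseudo: float = 1.0) -> Dict[str, List[str]]:
    """Label genes as up- or downregulated under stress by mean log2 fold change.

    control, stress -- gene ID -> normalized expression values of the replicates
    min_log2fc      -- assumed cut-off (1.0 corresponds to a 2-fold change)
    """
    result: Dict[str, List[str]] = {"up": [], "down": []}
    for gene, ctrl_values in control.items():
        mean_ctrl = sum(ctrl_values) / len(ctrl_values)
        mean_stress = sum(stress[gene]) / len(stress[gene])
        log2fc = math.log2((mean_stress + pseudo) / (mean_ctrl + pseudo))
        if log2fc >= min_log2fc:
            result["up"].append(gene)
        elif log2fc <= -min_log2fc:
            result["down"].append(gene)
    return result


if __name__ == "__main__":
    # Hypothetical expression values for control and heat-stressed plants.
    control = {"gene1": [10, 12], "gene2": [50, 48], "gene3": [20, 22]}
    heat = {"gene1": [60, 70], "gene2": [10, 12], "gene3": [21, 23]}
    print(classify_de(control, heat))
```

Genes whose fold change stays within the cut-off, like gene3 here, are left unclassified.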

Table 17.2 Brassica transcriptomics related to abiotic stress

17.3.3 Transcriptomic Studies Related to Other Traits

Hybrid lethality is an important consideration, especially because of the problems it causes in gene exchange and in stabilizing breeding populations. Xiao et al. (2021) observed that hybrid lethality in cabbage results from programmed cell death, and their transcriptome study showed activation of defense pathways and of hormonal and MAPK signaling pathways related to Ca2+ and hydrogen peroxide. Transcriptomic studies of heterosis in B. oleracea suggested the involvement of regulatory processes, including light- and hydrogen peroxide-mediated signaling pathways (Li et al. 2018). In B. napus, genes related to biomass, yield, and harvest index traits were identified using RNA-seq (Lu et al. 2017; Lu et al. 2017). Flowering time is an important agronomic trait. Natural variation in the expression level of the floral repressor FLOWERING LOCUS C (FLC) leads to differences in vernalization requirement. In Brassica napus, nine copies of FLC that control the timing of vernalization have been found, and a transcriptome study revealed dynamic shifts in the expression of multiple BnaFLC paralogs (Calderwood et al. 2021). RNA-seq-based studies have also helped to decipher the mechanisms involved in bolting, flowering, leaf color, petal color and size, seed color, embryo development, and oil accumulation (Table 17.3). Dynamic changes in the expression of the acyl-CoA-binding proteins BnACBP2 and BnACBP6 were found to regulate the distribution of lipids in embryos and seed coats of B. napus, suggesting their importance in fatty acid and triacylglycerol biosynthesis and oil accumulation (Pan et al. 2019). In B. rapa, which is important as both a vegetable and an oil crop, seed traits such as size, color, and oil content are highly relevant. Niu et al. (2020) studied the transcriptomes of seed samples and developed transcriptional networks to identify key regulatory genes governing these traits.
This study also highlighted regulatory networks involving genes such as TT8, WRI1, FUS3, and CYCB1, and identified genes underlying seed trait variation that can be used in biotechnological efforts to breed Brassica crops with high yield and improved oil content.

Table 17.3 Brassica transcriptomics for yield and other attributes

17.4 RNA-seq-Based Marker Development for Genotype Analysis

RNA-seq analyses have become an important resource for developing polymorphic genetic markers, such as expressed sequence tag (EST)-derived simple sequence repeat (SSR) markers and single nucleotide polymorphisms (SNPs). Such markers enable high-throughput and cost-effective genotyping analysis (Paritosh et al. 2013; Izzah et al. 2014), and have various applications in plant breeding, including genetic diversity and population structure analysis, linkage mapping, mapping quantitative trait loci (QTLs) and association analysis, marker-assisted selection, and evolutionary analysis (Izzah et al. 2014; Ding et al. 2015; Chen et al. 2017). RNA-seq-based EST-SSR or SNP markers are developed using expressed transcripts or unigenes, and are therefore expected to have a higher correlation with functional traits than traditional genome-wide SSR and SNP markers (Chen et al. 2017). Furthermore, RNA-seq-based EST-SSR marker development requires minimal labor compared to the conventional approach of EST library-based SSR marker development (Tóth et al. 2000).

RNA-seq-based EST-SSR and SNP markers have been developed for many plant species, including Brassica spp. SNP markers developed from a complete transcriptome assembly of 40 B. napus lines helped to elucidate the impact of polyploidy on the breeding and evolution of the B. napus genome (Bancroft et al. 2011). In this study, over 23,000 SNP markers were used to construct multiple linkage maps without a reference genome and to elucidate genome rearrangements and genomic inheritance in the allotetraploid B. napus genome (Bancroft et al. 2011).
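A transcriptome SNP between two lines can be identified by comparing the base calls of reads aligned to the same reference position. The Python sketch below calls a SNP only when each line has enough reads and is nearly uniform for one base; positions where a line shows a mix of bases are flagged rather than called, because in an allopolyploid such as B. napus such mixtures often reflect differences between homoeologous gene copies rather than true varietal differences. The depth and frequency thresholds are assumptions for illustration.

```python
from typing import Dict, Optional


def consensus(base_counts: Dict[str, int], min_depth: int = 10,
              min_freq: float = 0.9) -> Optional[str]:
    """Return the dominant base, 'mixed', or None when coverage is too low."""
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None
    base, n = max(base_counts.items(), key=lambda item: item[1])
    return base if n / depth >= min_freq else "mixed"


def call_snp(line1: Dict[str, int], line2: Dict[str, int]) -> str:
    """Classify one reference position by comparing the read bases of two lines."""
    b1, b2 = consensus(line1), consensus(line2)
    if b1 is None or b2 is None:
        return "low_coverage"
    if "mixed" in (b1, b2):
        return "mixed"  # possibly homoeologous variation, not a varietal SNP
    return "SNP" if b1 != b2 else "monomorphic"


if __name__ == "__main__":
    # Hypothetical base counts at one position in two B. napus lines.
    print(call_snp({"A": 30}, {"G": 25, "A": 1}))
    print(call_snp({"A": 12, "G": 11}, {"A": 30}))
```

The first position is called as a SNP; the second is flagged as mixed and would need separate handling.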

Alternative splicing is a central mechanism contributing to gene expression and transcriptome diversity, and it underlies plant development, evolution, complexity, and adaptation (Mastrangelo et al. 2012; Ganie and Reddy 2021). InDels and SNPs, typical codominant markers, are highly polymorphic and are used in marker-assisted selection, genetic mapping, and the identification and characterization of Brassica germplasm. Three available cabbage transcriptome datasets were used to study alternative splicing events and to develop InDel, SNP, and SSR markers. Novel mRNA transcripts in these three cabbage transcriptomes were identified by aligning short reads to the cabbage genome (Xu et al. 2019). The InDel markers were used to study genetic diversity in 36 cabbage genotypes, and the transcriptomic analysis showed 20.8% alternative splicing events in the cabbage genome.

17.5 Genomic and Computational Databases for Brassica spp.

Genomic tools and resources are revolutionizing Brassica improvement. With advances in sequencing technology, large-scale sequencing of the genomes of various crops has become possible. Custom computational tools and databases play an important role in making proper use of the huge volume of genomic data being produced. Some of the genomic databases for the important oilseed Brassicas are outlined in this section.

17.5.1 Brassica Database (BRAD)

BRAD, the Brassica database, is a decade-old database built after the whole-genome sequencing of Brassica rapa (Chiifu-401-42) (Cheng et al. 2011). It is a web-based genomic database that can be accessed at http://brassicadb.org and at an alternative domain (http://brassicadb.cn/). Its major sections are Browse, Search, Tools, Download, and Links.

Browse: This section contains information on genetic markers, gene families, various gene classes (glucosinolate genes, anthocyanin genes, resistance genes, flowering genes, and auxin genes), and some basic phenotype and species information. The Markers and map subsection provides a reference genetic linkage map covering all ten chromosomes. The genetic map was constructed using a population (RCZ16_DH) of 119 doubled haploid (DH) lines derived from an F1 cross between a DH line (Z16) and a rapid-cycling inbred line (L144) (Wang et al. 2011a, b, c). A total of 182 gene families in B. rapa corresponding to those in A. thaliana are listed under the Gene families subsection. Another subsection under Browse, Glucosinolate genes, describes 102 putative genes and their corresponding A. thaliana orthologs (Wang et al. 2011a, b, c). Similarly, the Anthocyanin genes subsection lists 73 B. rapa genes orthologous to 41 anthocyanin biosynthetic genes (Guo et al. 2014). The other subsections cover 244 resistance genes, 136 flowering genes, 342 auxin genes, and 3561 transcription factor genes of B. rapa.

Search: This section provides keyword searches for annotations, syntenic genes, non-syntenic orthologs, gene sequences, and flanking regions. Searching a gene ID under Annotations returns results from five databases (Gene Ontology, InterPro domains, KEGG, Swiss-Prot, and TrEMBL), together with orthologous genes and the best BLASTX hit in A. thaliana.

Syntenic genes and non-syntenic orthologs between Brassicaceae species and the well-studied model plant A. thaliana can be accessed with a simple keyword search in BRAD. For syntenic genes, three abbreviations, LF, MF1, and MF2, denote the least fractionated, moderately fractionated, and most fractionated subgenomes, respectively. Non-syntenic orthologs in BRAD are determined using two rules: the BLASTP alignment identity should be more than 70%, and the genes should not be syntenic orthologs (Cheng et al. 2012). The flanking region search in BRAD lets users find genomic elements such as genes, miRNAs, tRNAs, rRNAs, snRNAs, transposons, and genetic markers that flank a region of interest.
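The two rules BRAD applies to non-syntenic orthologs can be expressed as a filter on BLASTP results. The Python sketch below reads standard BLAST tabular output (-outfmt 6, where column 3 is percent identity and column 12 is bit score), keeps the best hit per query gene, and retains it only if identity exceeds 70% and the gene is not already a syntenic ortholog. The gene IDs are hypothetical, and treating “not syntenic” as “query gene absent from the syntenic set” is an assumption about how the rule is applied, not BRAD’s actual code.

```python
from typing import Dict, Iterable, Set, Tuple


def non_syntenic_orthologs(blast_lines: Iterable[str], syntenic_genes: Set[str],
                           min_identity: float = 70.0) -> Dict[str, Tuple[str, float]]:
    """Best BLASTP hit per query gene, kept if identity > min_identity and the
    query gene is not already part of a syntenic ortholog pair."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return {query: (subject, pident)
            for query, (subject, pident, _) in best.items()
            if pident > min_identity and query not in syntenic_genes}


if __name__ == "__main__":
    # Hypothetical BLASTP hits of B. rapa genes against A. thaliana proteins.
    hits = [
        "BrGene1\tAtGene1\t85.0\t300\t40\t2\t1\t300\t1\t300\t1e-100\t500",
        "BrGene1\tAtGene2\t75.0\t300\t70\t3\t1\t300\t1\t300\t1e-80\t400",
        "BrGene2\tAtGene3\t65.0\t250\t80\t4\t1\t250\t1\t250\t1e-50\t300",
        "BrGene3\tAtGene4\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-90\t450",
    ]
    print(non_syntenic_orthologs(hits, syntenic_genes={"BrGene3"}))
```

Here only BrGene1 is reported: BrGene2 fails the identity rule, and BrGene3 is excluded because it already has a syntenic ortholog.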

Tools: BRAD provides two embedded tools, BLAST and the Genome Browser (GBrowse). BLAST can be used for sequence analysis, while GBrowse can be used to visualize the B. rapa genome. At the alternative domain of BRAD (http://brassicadb.cn/), JBrowse is integrated to visualize the genomes of 35 species.

17.5.2 Brassica Genome

This database, available at http://www.Brassicagenome.net, contains repeat information related to Brassica (Wang et al. 2011a, b, c; Golicz et al. 2016; Hurgobin et al. 2018). The Brassica genome database is maintained through grants from the University of Western Australia and the Australian Research Council. The pangenomes of B. oleracea, B. rapa, and B. napus can be downloaded from it. It contains an integrated analysis tool, BLAST-GBrowse, with which a query sequence can be searched by BLAST against the available Brassica genomes and the resulting hits viewed in the Genome Browser. The pangenomes of B. oleracea, B. rapa, and B. napus can also be viewed and searched using the embedded JBrowse genome browser.

17.5.3 brassica.info

“brassica.info” was established under the Multinational Brassica Genome Project (MBGP) in 2002 and has since collated and shared openly available information on Brassica genetics and genomics. Information on the Brassicales Map Alignment Project (BMAP) can also be retrieved through this platform. The major sections of “brassica.info” are genome, phenome, tools, infome, crop use, and outreach. The genome section contains download links to annotated reference Brassica genomes, the pan-genomes of B. oleracea and B. napus, 52 re-sequenced B. napus genomes, 4.3 million SNPs, and other Brassica genome resources. The phenome section contains links to important research articles on the Brassica ionome, metabolome, proteome, and transcriptome. The tools section provides information on clone libraries, genetic markers, and research populations (mapping populations, TILLING populations, mutant populations, and Brassica rapa Fast Plants). Another important section of “brassica.info” is infome, which gives links to a range of databases and web portals related to Brassica genetics and genomics.

17.5.4 BnPIR: Brassica napus Pan-Genome Information Resource

Advances in sequencing technology have enabled more whole genomes to be sequenced, and pan-genomes have been proposed to better understand genome complexity and to analyze genetic differences. Accordingly, the BnPIR database (http://cbi.hzau.edu.cn/bnapus) was constructed from the genome sequences of eight representative rapeseed cultivars and 1688 rapeseed re-sequencing datasets (Song et al. 2020). It is a comprehensive functional genomic database whose main sections include the pan browser, search (gene, species, gene expression, transposable elements, population variation, and NLR genes), GBrowse, tools (BLAST, KEGG/GO enrichment, homologous region, orthologous, phylogenetic tree, and seq_fetch), and KEGG pathways for all eight representative rapeseed cultivars, viz. Gangan, Zheyou7, Shengli, Tapidor, Quinta, Westar, No2127, and ZS11. The pan-genome is displayed using JBrowse, details of a query gene can be visualized using GBrowse, and the GBrowse synteny viewer can be used to identify structural differences between genes. Overall, BnPIR provides gene classification and annotation, presence/absence variation (PAV) and phylogenetic information, sequence and expression data, and common tools for multi-omics analysis.

17.5.5 BrassicaDB

The database BrassicaDB (http://brassica.nbi.ac.uk/BrassicaDB/index.html) contains information on genetic maps, markers, sequence accessions, the “BBSRC set” of Brassica SSR markers, and bibliographic information related to B. napus and B. oleracea. A Brassica BLAST server is embedded in the database. BrassicaDB was funded by BBSRC UK CropNet until 2003; however, newly deposited data are still added to it automatically at periodic intervals. In addition, Chao et al. (2020) developed the Brassica Expression Database (BrassicaEDB, https://biodb.swu.edu.cn/brassica/), which allows the Brassica research community to retrieve expression data for target genes in different tissues and under different treatments, helping to elucidate gene functions and explore rapeseed biology at the transcriptome level.

17.5.6 Bolbase

The database Bolbase (http://ocri-genomics.org/bolbase) contains genome data for B. oleracea and provides comparative genomics information, including syntenic regions (Yu et al. 2013). Bolbase contains two important kinds of content: (1) genomic data and genomic component data and (2) analyses of syntenic regions. The genomic data include the genome, scaffold, and pseudochromosome sequences, while the genomic component data mainly include gene structure, location, functional annotation, orthologs, syntenic regions, repeat elements, and predicted noncoding RNAs. The major sections of the database are browse, synteny, search, and document. Bolbase also offers important tools, including keyword and similarity search and an embedded Generic Genome Browser (GBrowse) for visualization (Table 17.4).

Table 17.4 Genomic databases of Brassica

17.6 Functional Genomics and Its Role in Brassica Improvement

17.6.1 Functional Genomics

Functional genomics research in Brassica has advanced the understanding of the function and regulation of several genes associated with productivity-related traits. Loss-of-function (knockout or knock-down) mutants can be created using techniques such as mutagenesis, RNA interference, and CRISPR/Cas9. Mehmood et al. (2021) analyzed cold-stress responses in tolerant and sensitive rapeseed lines using RNA-Seq and found that pathways of photosynthesis, antioxidant defense, and energy metabolism were involved. The authors further validated the function of three genes (nir, cml, and cat) by analyzing Arabidopsis T-DNA insertion mutant lines, which showed varied freezing responses. The function of a gene can be assigned through mutant analysis, which can also provide important information on its regulation and metabolic activity. In mutagenesis, a mutation is produced in a gene to disrupt its function, and the phenotype of the mutant is then observed to assign a function to that gene. One of the most important objectives of mutagenesis is to produce maximum genetic variation (Sikora et al. 2011). Ethyl methanesulfonate (EMS) is the most commonly used chemical mutagen, although other chemical mutagens such as sodium azide and methylnitrosourea are also in use (Sikora et al. 2011).

17.6.2 TILLING for Identification of Genes Related to Erucic Acid and Abiotic Stress Tolerance

Targeting induced local lesions in genomes (TILLING) is an efficient technique for detecting induced mutations (McCallum et al. 2000). As a reverse genetics tool, TILLING offers numerous advantages in functional genomics. It can be applied to any species irrespective of its genome size and ploidy level, and it combines the high mutation frequency of classical mutagenesis with high-throughput screening for nucleotide polymorphisms (Kurowska et al. 2011). TILLING has been applied to important species including B. oleracea (Himelblau et al. 2009), B. rapa (Stephenson et al. 2010), and A. thaliana (Greene et al. 2003). Briefly, TILLING comprises three major steps: (1) generation of a mutant population, (2) detection of mutations, and (3) analysis of mutant phenotypes. The target gene can be sequenced to confirm a mutation, and M3 individuals are phenotyped for the analysis (Kurowska et al. 2011). Seeds and DNA samples from the M2 population are archived and together form the TILLING platform. RevGenUK (http://revgenuk.jic.ac.uk/about.html) and CAN-TILL (http://www.botany.ubc.ca/can-till/) are TILLING platforms for Brassica (Himelblau et al. 2009; Stephenson et al. 2010).
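The mutation-detection step can be illustrated with a toy comparison of a target-gene amplicon from an M2 plant against the wild-type reference. This is a simplified sketch, not the procedure used by RevGenUK or CAN-TILL: it assumes the two sequences are already aligned, have equal length, and differ only by substitutions, and it flags G>A and C>T changes because EMS mainly induces G:C to A:T transitions. The function name and both sequences are hypothetical.

```python
# EMS mainly causes G:C -> A:T transitions, which appear as G>A or C>T
# on whichever DNA strand is being read.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def find_point_mutations(reference: str, amplicon: str) -> list[tuple[int, str, str, bool]]:
    """Compare an M2 amplicon with the wild-type reference and return
    (0-based position, reference base, mutant base, is_ems_type) for each difference.
    Sketch assumption: sequences are pre-aligned, equal in length, substitutions only."""
    if len(reference) != len(amplicon):
        raise ValueError("this sketch expects equal-length, pre-aligned sequences")
    mutations = []
    for pos, (ref_base, mut_base) in enumerate(zip(reference, amplicon)):
        if ref_base != mut_base:
            is_ems_type = (ref_base, mut_base) in EMS_TRANSITIONS
            mutations.append((pos, ref_base, mut_base, is_ems_type))
    return mutations


# Hypothetical wild-type and M2 amplicons of a target gene
wild_type = "ATGACGTCCGGATTCAAG"
m2_plant = "ATGACATCCGGATTTACG"
for pos, ref_base, mut_base, ems_like in find_point_mutations(wild_type, m2_plant):
    print(f"position {pos}: {ref_base}>{mut_base} (EMS-type transition: {ems_like})")
```

A change that does not fit the EMS spectrum (here the A>C at position 16) would usually be treated with more caution, for example as a possible sequencing error, before being taken forward to phenotyping.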

An EMS-based TILLING platform for functional genomics was constructed in B. napus, comprising two mutagenized populations derived from cv. Ningyou7; these populations were also used in forward genetic screens for gene discovery. The platform was tested for mutations in the fatty acid elongase 1 (FAE1) gene, a key gene in erucic acid biosynthesis. Reverse genetic screening identified 19 FAE1 mutations among 1344 M2 plants, three of which were associated with reduced erucic acid content (Wang et al. 2008). Another EMS-induced TILLING platform was created in the diploid B. rapa and is publicly available through the RevGenUK platform (Stephenson et al. 2010).
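The B. napus screening figures can be restated as simple frequencies. This is a back-of-the-envelope calculation from the numbers quoted above, not a value reported in this form by Wang et al. (2008); a per-kilobase mutation density, as TILLING studies usually report, would also require the length of the screened amplicon, which is not given here.

```latex
\[
\frac{19\ \text{mutations}}{1344\ \text{M}_2\ \text{plants}} \approx 0.0141\ \text{mutations per plant screened}
\quad (\approx 1\ \text{mutation per}\ 71\ \text{plants}),
\]
\[
\frac{3}{19} \approx 0.158\ \ (\approx 15.8\%\ \text{of the FAE1 mutations were associated with reduced erucic acid}).
\]
```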

The phytoremediation potential of various Brassica species is well documented in the literature (Rizwan et al. 2018; Thakur et al. 2019; Raj et al. 2020). The function of a vacuolar transporter, calcium exchanger 1 (CAX1), was examined in B. rapa using CAX1 mutants identified through TILLING. BraA.cax1a mutations were found to enhance cadmium uptake capacity, and BraA.cax1a-12 mutants in particular appeared suitable for phytoremediation, accumulating threefold more cadmium than the parental line and showing greater cadmium tolerance (Navarro-León et al. 2019). Another TILLING mutant of B. rapa, BraA.hma4a-3, which carries a mutation in the HMA4 transporter gene, accumulated more zinc than the parental line (R-o-18) and showed better tolerance of zinc toxicity (Blasco et al. 2019). BraA.hma4a-3 mutants were also found to accumulate more cadmium in their leaves and to tolerate cadmium toxicity better than the parental line (Navarro-León et al. 2019).

17.6.3 RNA Interference

RNA interference (RNAi) is an important tool of functional genomics. RNAi has been used successfully to determine the function and biological role of genes in crops including wheat, cotton, and B. napus (Travella et al. 2006; Abdurakhmonov et al. 2016). It is a universal eukaryotic process of sequence-specific gene silencing (Hannon 2002). Dicer enzymes recognize double-stranded RNA (dsRNA) and cleave it into small interfering RNAs (siRNAs), double-stranded fragments 21–25 bp long, which are then separated into single-stranded “passenger” and “guide” strands. The passenger strand is degraded, while the guide strand, as part of the RNA-induced silencing complex (RISC), recognizes the complementary target RNA, which is then cleaved (Hannon 2002).
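The base-pairing logic of the guide strand can be sketched in a few lines. In this toy model, which is an illustrative assumption rather than a description of any RNAi design tool, a hypothetical hairpin trigger is built from part of a target mRNA, its antisense strand is cut into consecutive 21-nt pieces standing in for Dicer products, and each piece is checked for a perfectly complementary site on the mRNA. Real Dicer cleavage, strand selection, and RISC targeting are less regular than this and can tolerate some mismatches; all function names and sequences below are hypothetical.

```python
# Complement table for RNA bases (A<->U, G<->C)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence; input and output are written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(antisense: str, size: int = 21) -> list[str]:
    """Toy stand-in for Dicer: cut a strand into consecutive, non-overlapping
    fragments of `size` nt (any shorter remainder at the 3' end is dropped)."""
    return [antisense[i:i + size] for i in range(0, len(antisense) - size + 1, size)]


def target_sites(mrna: str, guide: str) -> list[int]:
    """0-based start positions on the mRNA that pair perfectly with the guide strand."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


# Hypothetical 63-nt stretch of a target mRNA, used here as the hairpin trigger
mrna = ("AUGGCUUCCAAGGAUCUGACC"
        "GUUAGCAAUCGGUACGAUUCC"
        "AGGCUAAGCGUUCACGAUUGA")

# Guide strands are antisense to the mRNA, so they come from the reverse complement
guides = dice(reverse_complement(mrna))
for guide in guides:
    print(guide, "pairs with mRNA position(s)", target_sites(mrna, guide))
```

Because the trigger is antisense to the whole fragment, every guide finds its complementary site; a guide taken from an unrelated sequence would return an empty list, which is the sequence-specificity the text describes.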

To use RNAi as a functional genomics tool, knock-down lines are generated and their phenotype is examined to characterize the function of the silenced gene. RNAi has many advantages as a functional genomics tool, such as the ability to silence multiple target genes (McGinnis et al. 2007). Using RNAi, a loss-of-function analysis of BnaNPR1 was performed, and BnaNPR1 repression was found to be associated with reduced resistance to S. sclerotiorum in B. napus (Diepenbrock 2000). Another study used RNAi to demonstrate the function of BnGPAT19 and BnGPAT21 in B. napus. Suppression of BnGPAT19 and BnGPAT21 resulted in a thinner cuticle and in necrotic lesions after fungal inoculation, indicating a possible role for these genes in cuticular wax biosynthesis (Wang et al. 2020a, b).

Glucoraphanin is a glucosinolate found in the Brassicales, and its breakdown product sulphoraphane is known to have anticancer properties (Fahey et al. 1997; Variyar et al. 2014). The GSL-ALK enzyme catalyzes the conversion of glucoraphanin to undesirable products; to block this step, a total of 29 transgenic B. juncea lines with knock-down of the GSL-ALK gene were created using RNAi. Silencing of GSL-ALK reduced the levels of undesirable glucosinolates, while growth and seed quality were not affected compared with the untransformed control (Augustine and Bisht 2015). Similarly, in another study, transgenic B. juncea lines in which the BjMYB28 gene was suppressed by RNAi showed reduced glucosinolate content without any effect on growth and development (Augustine et al. 2013).

17.7 Genome Editing Tools

Advances in genome editing techniques, especially the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system, have made genome editing a powerful tool for plant functional genomics research (Feng et al. 2013; Shan et al. 2013; Liu et al. 2016). In CRISPR/Cas9 editing, the target DNA is cut and the break is then repaired by non-homologous end joining, giving rise to indel mutations. Knockout mutants created with CRISPR/Cas9 can be used for loss-of-function analysis (Puchta 2017; Liu et al. 2019a, b), and because the technology is programmable and highly precise, it also enables high-throughput functional screening (Liu et al. 2019a, b). The technology has been used successfully in many plant species; in Brassica, however, there are still few successful examples of genome editing.

In B. napus, the fatty acid biosynthesis pathway was modified by CRISPR/Cas9-based editing of the fatty acid desaturase 2 gene (FAD2), which catalyzes the desaturation of oleic acid. Seeds of one mutant carrying a fad2_Aa allele with a 4-bp deletion had significantly higher oleic acid content than wild-type seeds (Okuzaki et al. 2018). Pod shattering is a constraint on achieving higher yields in rapeseed. Zaman et al. (2019) reported successful multiplex editing of five JAG homeologs, BnJAG.A02, BnJAG.C02, BnJAG.C06, BnJAG.A07, and BnJAG.A08. The knockout mutants showed altered pod shape and size, and one mutant, in BnJAG.A08-NUB (a NUB-like paralog of the JAG gene), showed a significant change in pod dehiscence, with roughly twofold greater resistance to pod shattering. In cabbage, Ma et al. (2019a, b) synthesized a tandemly arrayed tRNA-sgRNA sequence that uses the plant's endogenous tRNA processing system to generate several sgRNAs simultaneously. The target genes included the phytoene desaturase gene (BoPDS), the self-incompatibility determinant gene (BoSRK3), and the male sterility-associated gene (BoMS1).
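The targeting and repair steps described above can be sketched as follows, assuming the widely used Streptococcus pyogenes Cas9, which recognizes an NGG protospacer-adjacent motif (PAM) and cuts 3 bp upstream of it. The sketch scans only the forward strand, uses a hypothetical coding-sequence fragment, and models NHEJ repair as a simple deletion at the cut site. It then checks whether the deletion shifts the reading frame: any 4-bp deletion inside a coding region does, because 4 is not a multiple of 3. The function names are hypothetical and do not belong to any guide-design tool.

```python
import re


def find_spcas9_sites(dna: str, spacer_len: int = 20) -> list[tuple[int, str, int]]:
    """Scan the forward strand only for NGG PAMs preceded by a full-length protospacer.
    Returns (protospacer_start, protospacer, cut_index); the cut is modeled between
    cut_index - 1 and cut_index, i.e. 3 nt upstream of the PAM."""
    sites = []
    # Zero-width lookahead so that overlapping PAMs (e.g. in 'AGGG') are all reported
    for match in re.finditer(r"(?=[ACGT]GG)", dna):
        pam_start = match.start()
        start = pam_start - spacer_len
        if start >= 0:
            sites.append((start, dna[start:pam_start], pam_start - 3))
    return sites


def delete_at_cut(dna: str, cut_index: int, n: int) -> str:
    """Simplified NHEJ outcome: remove n nt immediately 3' of the cut."""
    return dna[:cut_index] + dna[cut_index + n:]


def is_frameshift(reference: str, mutant: str) -> bool:
    """In a coding sequence, an indel shifts the frame unless its length is a multiple of 3."""
    return (len(reference) - len(mutant)) % 3 != 0


# Hypothetical coding-sequence fragment containing a single NGG PAM ("AGG")
reference = "ATGCATCCATTGACTACAGCATCA" + "GG" + "TACTTCATAA"
for start, protospacer, cut in find_spcas9_sites(reference):
    mutant = delete_at_cut(reference, cut, 4)  # a 4-bp deletion at the cut site
    print(start, protospacer, cut, "frameshift:", is_frameshift(reference, mutant))
```

Real target-site searches also scan the reverse strand and score off-target matches elsewhere in the genome; both are omitted here to keep the cut-and-repair logic visible.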
The application of the CRISPR/Cas9 system in B. campestris was studied by targeting the pectin methylesterase genes Bra003491, Bra007665, and Bra014410. Mutations were introduced at rates ranging from 20 to 56%, and the study highlighted the potential of the CRISPR/Cas9 system for stable and heritable single and multiplex genome editing (Xiong et al. 2019). Jeong et al. (2019) successfully used the CRISPR/Cas9 system to modify flowering time in B. rapa, designing seven guide RNAs to target FLOWERING LOCUS C (FLC). Double knockouts of BraFLC2 and BraFLC3, with indel efficiencies of 97.7 and 100%, respectively, showed an early-flowering phenotype that did not depend on vernalization. Yellow seed color is a desirable seed quality trait. Using CRISPR/Cas9 editing of the target gene BnTT8, yellow-seeded rapeseed mutants were generated; these mutants were genetically stable and had higher seed oil and protein contents and a modified fatty acid (FA) profile, with no yield penalty (Zhai et al. 2020).
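Editing efficiencies such as the mutation rates and indel efficiencies quoted above are proportions of edited sequences among all sequences analyzed. The sketch below assumes, for illustration only, that efficiency is estimated from amplicon reads spanning the target site and that a read counts as edited when its length differs from the reference amplicon; the cited studies may have used different assays, and the reference and reads are hypothetical. Substitutions and compound indels that leave the read length unchanged are deliberately ignored in this simplification.

```python
def indel_efficiency(reference: str, reads: list[str]) -> float:
    """Percentage of reads whose length differs from the reference amplicon.
    Simplification: only length-changing edits (indels) are counted."""
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(1 for read in reads if len(read) != len(reference))
    return 100.0 * edited / len(reads)


reference = "ACCTTAGCTAAGCTACCTGGATC"  # hypothetical target-site amplicon
reads = [
    reference,                               # unedited read
    reference[:10] + reference[14:],         # 4-bp deletion
    reference[:12] + "A" + reference[12:],   # 1-bp insertion
    reference[:11] + reference[12:],         # 1-bp deletion
]
print(f"indel efficiency: {indel_efficiency(reference, reads):.1f}%")
```

With three of the four hypothetical reads carrying an indel, the estimate is 75%; dedicated amplicon-analysis pipelines additionally align each read to the reference and restrict the comparison to a window around the cut site.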

17.8 Conclusions and Future Perspective

Brassica crops show a wide spectrum of phenotypic and genomic plasticity. Breeding for improved biotic and abiotic stress tolerance and nutritional quality, in addition to yield-associated traits, remains a priority. Advances in genomics tools have opened new avenues for detecting the genetic basis of trait variation and for developing molecular markers that accelerate the introgression of useful traits (Hu et al. 2021). Transcriptomic approaches, including RNA-Seq, are now increasingly used to profile the expression of thousands of genes in a spatial and temporal manner. The availability of assembled genomes has enabled molecular marker development, marker-aided selection, and functional genomics of important agronomic traits for designing better crops. In this context, functional characterization of genes through approaches such as loss-of-function mutants has become a valuable source of information on regulatory, developmental, biochemical, and metabolic networks. Other tools have also been useful in Brassica breeding and improvement, including TILLING for fatty acid biosynthesis, and insertional mutagenesis and RNA interference for disease resistance and glucosinolate synthesis.

The study of transcriptomes in Brassica crops has provided significant resources on genome structure, diversity and origin, evolutionary analysis, differential gene expression, and marker development. This has been made possible by advances in the genome sequencing of important Brassica species (B. rapa, B. oleracea, B. nigra, B. napus, and B. juncea), which allow whole-genome transcripts to be investigated and agronomically important genes for stress tolerance and for lipid and glucosinolate biosynthesis to be identified. Brassica genome databases are a gateway to information for unraveling biological pathways regulated by noncoding RNAs (ncRNAs), particularly microRNAs and long ncRNAs. Beyond these developments, CRISPR/Cas9-based single and multiplex genome editing has opened up ways of designing Brassicas with targeted and precise modifications of useful traits. Genomics of Brassicaceous crops, together with other omics technologies, offers immense scope for designing highly productive new crop varieties in the Brassica family.