Introduction

Agriculture provides food and other products that sustain the global human population. With a growing population, limited land availability, and water shortages, there is a need to increase crop yields and improve crop traits. This can be accomplished through a better understanding of the physiological and metabolic pathways underlying crop production. Advances in sequencing have made it possible to gain a thorough knowledge of the genes influencing crop improvement. Sequencing technologies have progressed from traditional methods to next-generation sequencing. Traditional sequencing methods include the Sanger dideoxy chain-termination method and the Maxam-Gilbert chemical cleavage method. The Sanger dideoxy sequencing-by-synthesis (SBS) method became the method of choice after several modifications were incorporated, such as fluorescent (terminator) dyes, thermal-cycle sequencing, and capillary electrophoresis with software to interpret and analyze the sequences (Slatko et al. 2018). The Human Genome Project, the International Rice Genome Sequencing Project, and the Arabidopsis Genome Initiative relied on Sanger sequencing, and many other plant species were subsequently sequenced in the same way (Sharma et al. 2018). Sanger sequencing is the first-generation sequencing method and is still used today for sequencing a few genes when high-throughput sequencing is not required. The advances known as next-generation sequencing (NGS) include second-generation sequencing, such as the Illumina and Ion Torrent platforms, and third-generation sequencing, such as the Pacific Biosciences and Oxford Nanopore Technologies platforms (Taishan et al. 2021). Next-generation sequencers operate in a massively parallel mode to sequence whole genomes or transcriptomes and produce large volumes of sequence data. Next-generation sequencing workflows, such as pyrosequencing, include four main steps: DNA fragmentation, adapter ligation, amplification, and sequencing.
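The chain-termination principle behind Sanger sequencing can be illustrated with a minimal Python sketch. It is a toy model under simplifying assumptions stated in the comments (the template is written 3'→5', every termination product forms, and electrophoresis separates fragments perfectly by length); the template sequence is hypothetical.

```python
# Toy model of Sanger chain-termination sequencing (illustrative only).
# Assumptions: the template is written 3'->5', every termination product
# is formed, and electrophoresis separates fragments perfectly by length.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_products(template: str) -> dict:
    """Map each ddNTP to the lengths of the fragments it terminates.

    The new strand is complementary to the template, so the ddNTP of base X
    stops synthesis wherever X is the next base to be added.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    products = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        products[base].append(length)
    return products


def read_electropherogram(products: dict) -> str:
    """Sort all fragments by length (shortest migrates fastest) and report
    the labeled terminal ddNTP of each, which spells the new strand 5'->3'."""
    peaks = sorted(
        (length, base) for base, lengths in products.items() for length in lengths
    )
    return "".join(base for _, base in peaks)


read = read_electropherogram(termination_products("TACGGATC"))
print(read)  # ATGCCTAG
```

Reading the peaks from shortest to longest recovers the complement of the template, which is why a single run with four labeled terminators is enough.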

Genomics is necessary for improving crops to meet rising agricultural demands. Advances in sequencing technologies have led to the expansion of genomics in plants, animals, and microorganisms and have modernized the landscape of the biological sciences. Large, complex, and repetitive crop genomes can now be sequenced more readily using NGS techniques. The genomes of crops with polyploid or complex genomes, such as Triticum aestivum, Gossypium hirsutum, Brassica oleracea, and Brassica napus, were sequenced using 454 and Illumina sequencing (Choulet et al. 2010; Li et al. 2015; Chalhoub et al. 2014). Further, complex plant genomes such as those of Chenopodium quinoa and Solanum pennellii were sequenced using Single Molecule Real-Time (SMRT) sequencing and Oxford Nanopore Technologies, respectively (Jarvis et al. 2017; Schmidt et al. 2017). NGS refers to newer sequencing methods that offer reduced cost, higher speed, and better coverage than traditional sequencing methods. Various methods employing NGS platforms have been developed and used in agriculture to enhance crop productivity. For example, Genotyping by Sequencing (GBS), a high-throughput sequencing-based approach, has been used to identify markers in crop genetics (Poland and Rife 2012). This review summarizes next-generation sequencing technologies and their applications in agriculture, focusing on how the techniques built on them have revolutionized agricultural research.

NGS platforms

As mentioned above, next-generation sequencing technologies are classified into two categories: second-generation and third-generation sequencing. This section describes the NGS platforms used in agriculture and how the genomic information obtained from them is advancing agricultural research.

Second-generation sequencing platforms

Roche 454 genome sequencer FLX

454 sequencing uses a massively parallel pyrosequencing system. When DNA polymerase incorporates a deoxynucleotide triphosphate, a pyrophosphate molecule is released; ATP sulfurylase converts it to ATP, which drives the luciferase-catalyzed conversion of luciferin to oxyluciferin and ultimately generates light. In this technology, DNA is fragmented into pieces 800 to 1000 bp long, adapters are ligated to prepare the library, and the fragments are captured on small DNA-capture beads. Emulsion PCR is used for the clonal amplification of the bead-bound DNA molecules, followed by enrichment of the beads carrying clonally amplified DNA. These DNA-bound beads are transferred to a PicoTiterPlate together with DNA polymerase, ATP sulfurylase, and luciferase, and sequencing is then carried out. The 454 Genome Sequencer (GS) FLX Titanium platform delivers about 100,000 sequences in a single 10 h run, with an average read length of 330–500 bp, whereas the latest GS FLX platform can generate reads 700–1000 bp long (Gupta and Gupta 2014). This platform has been used to sequence comparatively long stretches of repetitive DNA in Hordeum vulgare, one of the most important cereal crops. In that study, the 454 Roche GS FLX platform produced reads with average lengths between 200 and 250 bp, and the assemblies achieved High Throughput Genome Sequences (HTGS) phase I quality with N50 (Steuernagel et al. 2009). In another study, the transcriptome of Jatropha curcas L., an important non-edible oilseed crop, was sequenced on the 454 GS FLX Titanium platform, generating 197.7 Mb of data from 383,918 reads with an average read length of 515 bases (Natarajan and Parani 2011). The whole genome of Solanum tuberosum, the third most important food crop, was sequenced by the Potato Genome Sequencing Consortium (PGSC) using both 454 GS FLX and Illumina GA2 instruments (Visser et al. 2009).
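Because the light signal in pyrosequencing scales with the number of identical bases incorporated in one nucleotide flow, homopolymer runs are its characteristic difficulty. The minimal sketch below models this under stated assumptions: a hypothetical cyclic flow order "TACG", a noise-free signal for the flowgram, and base calling by simple rounding. It is an illustration, not the 454 base-calling algorithm.

```python
# Toy pyrosequencing model. Assumptions: a hypothetical cyclic flow order
# "TACG", noise-free light output, and light proportional to the number of
# bases incorporated in each flow (one homopolymer run per flow).

FLOW_ORDER = "TACG"


def flowgram(strand: str, flow_order: str = FLOW_ORDER) -> list:
    """Return (dispensed base, bases incorporated) for every flow."""
    if set(strand) - set(flow_order):
        raise ValueError("strand contains bases that are never dispensed")
    flows, i = [], 0
    while i < len(strand):
        for base in flow_order:  # one full dispensation cycle
            run = 0
            while i < len(strand) and strand[i] == base:
                run += 1
                i += 1
            flows.append((base, run))
    return flows


def call_bases(flows: list) -> str:
    """Round each light signal to the nearest homopolymer length."""
    return "".join(base * round(signal) for base, signal in flows)


print(flowgram("TTACCCG"))  # [('T', 2), ('A', 1), ('C', 3), ('G', 1)]
# With noisy signals, long homopolymers are the usual source of error:
print(call_bases([("T", 2.1), ("A", 0.9), ("C", 2.4), ("G", 1.2)]))  # TTACCG
```

In the noisy example the true run of three Cs yields a weak signal and is called as two, the kind of insertion/deletion error typical of this chemistry.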

Illumina genome sequencing analyzer

The Illumina Genome Analyzer was introduced in 2006 by Solexa, which was acquired by Illumina in 2007. It uses reversible terminator-based sequencing, in which reversible terminators are used instead of dideoxy terminators in a sequencing-by-synthesis approach. Illumina sequencers use a glass flow cell coated with millions of oligonucleotides complementary to the sequencing adapters. The flow cell is divided into eight separate lanes, and library fragments are bound to the oligonucleotides covalently attached to its internal surface. Hybridization of each library fragment to these primers is followed by amplification to generate millions to billions of clonal clusters. Fluorescently labeled nucleotides are then used to synthesize a complementary strand for each fragment. The flow cell is imaged after the addition of each labeled nucleotide, and the emission from each cluster is recorded. The template sequences are identified from the intensity and wavelength of the fluorescent emission. Different Illumina platforms are available depending on the number and length of reads required, including benchtop sequencers (iSeq, MiniSeq, MiSeq, and NextSeq) and production-scale sequencers (NextSeq, HiSeq, and NovaSeq) (Kulski 2016). Owing to its low cost and the availability of improved assembly programs, Illumina is the most preferred platform for sequencing crop genomes. Many crop genomes, such as those of Hordeum vulgare, Triticum aestivum, Oryza sativa, Zea mays, and Glycine max, have been sequenced with Illumina technology.
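The base-calling step described above, where each cluster's emission in every cycle is assigned to one of four nucleotides, can be sketched as follows. This is a simplified illustration, not Illumina's algorithm: it assumes four intensity values per cycle in A, C, G, T order and uses a hypothetical quality model (error probability equals the share of signal outside the brightest channel), expressed on the Phred scale Q = -10 log10(p) and encoded as FASTQ Phred+33 characters.

```python
import math

CHANNELS = "ACGT"


def call_cycle(intensities):
    """Call one base from four channel intensities given in A, C, G, T order.

    Hypothetical quality model: the error probability is the fraction of
    total signal NOT in the brightest channel, capped so that Q <= 40.
    """
    total = sum(intensities)
    if total <= 0:
        return "N", 0
    best = max(range(4), key=lambda k: intensities[k])
    p_error = max(1.0 - intensities[best] / total, 1e-4)
    return CHANNELS[best], round(-10 * math.log10(p_error))  # Phred scale


def call_read(cycles):
    """Call every cycle of one cluster; return (bases, qualities)."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(b for b, _ in calls), [q for _, q in calls]


def phred_string(qualities):
    """Encode qualities as FASTQ characters (Phred+33)."""
    return "".join(chr(q + 33) for q in qualities)


bases, quals = call_read([[900, 30, 40, 30], [50, 50, 800, 100]])
print(bases, quals, phred_string(quals))  # AG [10, 7] +(
```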

Applied biosystems SOLiD sequencer

The Sequencing by Oligo Ligation and Detection (SOLiD) platform is based on sequencing by ligation, in which DNA ligase is used to detect and incorporate bases in a particular pattern. Oligo adapters are ligated to the DNA fragments, which are then captured on magnetic beads carrying complementary oligos, and each bead-DNA complex is amplified by emulsion PCR. The adapter sequences on the amplified fragments are hybridized with specific primers that provide a free 5′ phosphate group for ligation to fluorescently labeled probes. After ligation, the fluorescence emitted identifies the probe that was ligated. This cycle of ligation, detection, and cleavage is repeated multiple times, and the number of cycles determines the read length (Gupta and Gupta 2014).
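The "particular pattern" in SOLiD is its two-base encoding: each fluorescent color reports a pair of adjacent bases rather than a single base. A minimal sketch, assuming the standard scheme in which the color equals the XOR of 2-bit codes A=0, C=1, G=2, T=3, shows how a read in color space is decoded from one known starting base. The sequences are hypothetical.

```python
# Sketch of SOLiD two-base ("color space") encoding, assuming the standard
# scheme in which each color is the XOR of 2-bit codes A=0, C=1, G=2, T=3 of
# two adjacent bases. Decoding needs one known starting base (in practice
# the last base of the adapter/primer).

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> list:
    """Translate a base sequence into colors, one per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list) -> str:
    """Rebuild bases from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode("ATGGCA")
print(colors)  # [3, 1, 0, 3, 1]
print(decode("A", colors))  # ATGGCA
# A single miscalled color corrupts every downstream base when decoded naively:
print(decode("A", [3, 2, 0, 3, 1]))  # ATCCGT
```

This is why SOLiD data are usually aligned in color space: a true SNP changes two adjacent colors, whereas a measurement error changes only one.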

Ion personal genome machine (PGM) sequencer

This platform is based on the detection of hydrogen ions released when a polymerase incorporates a nucleotide into a newly synthesized strand. The released ions change the pH of the solution, which is detected by a proprietary ion sensor. Massively parallel detection of the sequencing reactions is possible with an Ion Torrent chip containing millions of ion-sensitive field-effect transistor (ISFET) sensors. It was the first commercial sequencing method that does not use fluorescence or camera scanning (Pereira et al. 2020).

Third-generation sequencing platforms

Third-generation sequencing is also known as single-molecule sequencing because, unlike second-generation sequencing, it does not include an amplification step during library preparation. The read lengths obtained are also much longer than those from second-generation sequencing techniques, hence the name long-read sequencing (Christoph 2016).

Pacific biosciences

Pacific Biosciences sequencing is based on Single-Molecule Real-Time (SMRT) sequencing, in which a single-stranded DNA molecule is attached to a polymerase enzyme. Polymerase activity is monitored as fluorescently labeled nucleotides are incorporated into the newly synthesized strand. This process is enabled by Zero Mode Waveguides (ZMWs), extremely small wells fabricated in a metal film deposited on a glass surface. Sequencing is comparatively fast, taking about 4 h per SMRT cell. Average read lengths are about 10 kb, and reads of up to 54 kb can be generated (Christoph 2016).

Oxford nanopore sequencing

This sequencing technology is based on biological nanopores, tiny holes formed by certain transmembrane proteins. Each nanopore is connected to its own electrode, associated with a channel on a sensor chip that measures the electric current flowing through the nanopore. When a nucleotide passes through a nanopore, the current is interrupted, and this minute change in current is decoded to determine the identity of the passing nucleotide. Because each nucleotide disrupts the current to a different degree, the resulting change in electric current allows each nucleotide to be detected (Mehdi et al. 2017).
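The decoding idea described above, matching characteristic current levels to bases, can be illustrated with a deliberately simplified sketch. The current levels are made up, each base is assumed to have its own level (real pores respond to several bases at once and are decoded with neural networks), and a jump larger than a hypothetical threshold `min_jump` is taken to mark a new base.

```python
# Toy nanopore base caller. Assumptions (hypothetical, for illustration):
# each base produces a distinct mean current level (real pores respond to
# several bases at once and are decoded with neural networks), and a jump
# larger than min_jump between samples marks a new base.

LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 65.0, "T": 110.0}  # made-up values


def segment(signal, min_jump=5.0):
    """Split the raw current trace into events; return each event's mean."""
    if not signal:
        return []
    events = [[signal[0]]]
    for prev, cur in zip(signal, signal[1:]):
        if abs(cur - prev) > min_jump:
            events.append([])
        events[-1].append(cur)
    return [sum(e) / len(e) for e in events]


def basecall(signal, levels=LEVELS_PA):
    """Assign each event to the base whose reference level is closest."""
    return "".join(
        min(levels, key=lambda b: abs(levels[b] - mean)) for mean in segment(signal)
    )


trace = [81, 79, 80, 96, 94, 109, 111, 110, 66, 64]
print(segment(trace))   # [80.0, 95.0, 110.0, 65.0]
print(basecall(trace))  # ACTG
```

In this toy model identical consecutive bases produce no jump and merge into one event, a simplified picture of why homopolymers are hard for nanopore sequencing.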

Helicos single molecule sequencing

This technology is also based on single-molecule sequencing, hence the name Helicos True Single Molecule Sequencing (tSMS). First, single template DNA molecules are attached to a proprietary surface, and polymerase is added together with one of the fluorescently labeled nucleotides (C, G, A, or T), which is incorporated into the growing complementary strands on all templates in a sequence-specific manner. After a washing step, the incorporated nucleotides are imaged and their positions recorded. The fluorescent group is then removed by a highly efficient cleavage process, leaving the incorporated nucleotide behind. The process continues through each of the other three bases over multiple cycles, yielding a 25–55 bp read (average 35 bp) from each individual template. From 600 million to 1 billion DNA strands, a total of 21 to 35 Gb of sequence data is generated per run with 99.995% accuracy (Gupta and Gupta 2014).
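The quoted per-run output follows directly from the number of templates multiplied by the average read length. A short Python check of that arithmetic, using only the figures given above:

```python
def run_yield_gb(n_templates: float, mean_read_length_bp: float) -> float:
    """Sequence output in gigabases: templates x mean read length."""
    return n_templates * mean_read_length_bp / 1e9


low = run_yield_gb(600e6, 35)   # 600 million templates
high = run_yield_gb(1e9, 35)    # 1 billion templates
print(low, high)  # 21.0 35.0
```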

NGS technologies used in agricultural research

Next-generation sequencing has become an important genomics tool for enhancing genetic gain in plant species. It has enabled the sequencing of economically important plant genomes, yielding a wealth of genomic information that can be extracted and used for trait improvement. NGS-based approaches include whole-genome sequencing, transcriptome sequencing, and metagenomics, which have led to the discovery of functional genes and markers for desired traits that support marker-assisted breeding and help improve crop production and conservation (Fig. 1).

Fig. 1 NGS technologies employed in agricultural research

Whole-genome sequencing

For agriculture, genomic data provide new ways to improve food security, reduce poverty, and reform species conservation programs. NGS has made it possible to sequence complex genomes quickly and cost-effectively. Crop genomes are particularly complex because of repetitive sequences and polyploidy, which complicate gene identification and the functional characterization of genes. Repetitive sequences, whole-genome duplication, and polyploidy also account for the large genome sizes of many crops. Nevertheless, sequencing has made it easier to obtain information on all the genes present in the genome of a crop of interest. The first crop to have its genome sequenced was rice (Delphine et al. 2012). Two subspecies of rice, Oryza sativa L. ssp. japonica and indica, were sequenced using a combination of hierarchical clone-by-clone and whole-genome shotgun sequencing (Goff et al. 2002). Larger genomes, such as those of Glycine max (Schmutz et al. 2010), Populus trichocarpa (Tuskan et al. 2006), and Vitis (Jaillon et al. 2007; Velasco et al. 2007), were subsequently sequenced by the whole-genome shotgun method. Next-generation sequencing technologies have since been applied to the genomes of crops such as Malus domestica, Cucumis melo, Theobroma cacao, Cajanus cajan, Cicer arietinum, Musa, Solanum tuberosum, Brassica oleracea, Citrullus lanatus, Citrus sinensis, and Picea (Michael et al. 2015). Short-read platforms such as Illumina and 454 pyrosequencing were used to sequence these genomes, and the assembled short reads were mapped to a reference genome to study species diversity.
Improvements in second-generation sequencing technologies have also facilitated the generation of more complete and contiguous de novo assemblies, but the complexity of transposable elements and repetitive sequences still resulted in partial assemblies in these regions. Third-generation (long-read) sequencing has proven valuable for such genomes, yielding complete and contiguous assemblies. In recent years, Illumina HiSeq and NextSeq, Pacific Biosciences, and Oxford Nanopore sequencing have been used to sequence most crop genomes (Table 1).
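How much sequencing a whole-genome project needs is usually estimated with the Lander-Waterman approximation: mean coverage is C = N × L / G for N reads of length L on a genome of size G, and the expected unsequenced fraction is about e^(-C). The sketch below applies it to a hypothetical 1 Gb genome. The model assumes reads land uniformly at random and ignores repeats, which is exactly where long reads help beyond what this calculation shows.

```python
import math

# Lander-Waterman approximation: reads land uniformly at random, so mean
# coverage is C = N * L / G and the expected fraction of the genome left
# unsequenced is about exp(-C). Repeats are ignored by this model.


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    return math.exp(-coverage)


GENOME = 1_000_000_000  # hypothetical 1 Gb crop genome
print(reads_needed(30, 150, GENOME))     # 200000000 short reads
print(reads_needed(30, 15_000, GENOME))  # 2000000 long reads
print(round(fraction_uncovered(1), 3))   # 0.368
```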

Whole-genome sequencing is categorized into two sequencing methods:

  • Whole-genome resequencing.

  • De novo sequencing.

Whole-genome resequencing

Whole-genome resequencing (WGR) sequences the entire genome of an individual when a reference genome of the species is available for mapping the reads. WGR of crop genomes enables the identification of variants such as copy number variations (CNVs), insertions/deletions (indels), single nucleotide polymorphisms (SNPs), and presence/absence variations (PAVs), providing a deeper understanding of genetic variation in crop species (Xu and Bai 2015). Advances in NGS have expanded the list of available reference genomes, making it possible to use WGR to study genomic variation and discover specific signatures across different accessions of a particular crop. WGR has been applied to the genomes of Oryza sativa (Huang et al. 2009), Zea mays (Lai et al. 2010), Glycine max (Lam et al. 2010), and many other crops, revealing various CNVs and PAVs of agronomic importance. In a study by Wei et al. (2021), whole-genome sequencing of 445 Lactuca accessions from 47 countries provided a comprehensive variation map; the SNPs, indels, and structural variants identified revealed the phylogenetic relationships and domestication history of cultivated lettuce. In a recent study, WGR of Cucurbita pepo identified genes controlling early flowering, an important trait when selecting varieties for cultivation, and indels in gene promoter regions were used to develop markers that distinguish between cultivars (Abbas et al. 2022). WGR of Sorghum discovered various SNPs and indels, leading to the development of an SNP marker that could be used in molecular breeding to improve aphid resistance (Wei et al. 2021). Resequencing of 588 diverse Brassica napus accessions from 21 countries uncovered SNPs and indels associated with improved phenotypic traits (Lu et al. 2019).
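At its core, SNP discovery in resequencing data stacks the aligned reads at each reference position (a pileup) and looks for positions where most reads disagree with the reference. The minimal sketch below assumes reads have already been aligned without gaps, that the lines are inbred so only homozygous SNPs are called, and uses hypothetical thresholds `min_depth` and `min_alt_fraction`. Production callers also model base quality, mapping quality, and heterozygosity, and detect indels, CNVs, and PAVs, none of which this sketch does.

```python
from collections import Counter

# Minimal pileup-based SNP caller for resequencing. Assumptions: reads are
# already aligned without gaps as (start, sequence) with 0-based starts, the
# lines are inbred (so only homozygous SNPs are called), and indels, CNVs
# and PAVs are out of scope.


def pileup(reference, alignments):
    """Count the bases observed at each reference position."""
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def call_snps(reference, alignments, min_depth=3, min_alt_fraction=0.8):
    """Return (position, ref base, alt base, depth) for confident SNPs."""
    snps = []
    for pos, column in enumerate(pileup(reference, alignments)):
        depth = sum(column.values())
        if depth < min_depth:
            continue
        alt, count = max(
            ((b, n) for b, n in column.items() if b != reference[pos]),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt is not None and count / depth >= min_alt_fraction:
            snps.append((pos, reference[pos], alt, depth))
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC")]
print(call_snps(reference, reads))  # [(4, 'A', 'T', 3)]
```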

De novo sequencing

Many crop species are distantly related to their closest sequenced relatives. In such cases, where no reference genome is available for resequencing, de novo sequencing must be performed to explore their genomes. However, large genome size and high repetitive content are hurdles to this strategy. In de novo sequencing, reads are assembled into contigs, whose size and contiguity determine the quality of the assembly. Short-read de novo assemblers are inefficient at achieving high contiguity and at assembling genomes with high repetitive content (Liao et al. 2019). The growth of long-read sequencing has spurred the development of new assembly algorithms that generate complete, gapless assemblies of crop genomes with high heterozygosity, polyploidy, and repetitive content (Gao et al. 2019). Oropetium thomaeum was the first plant genome sequenced by Pacific Biosciences SMRT sequencing; it was the fourth most contiguous genome and included 30% complete centromeres (VanBuren et al. 2015). Further, de novo genome sequencing of Solanum lycopersicum, Musa, Sorghum, Brassica, and many other crops has been performed with Oxford Nanopore Technologies (ONT). Oxford Nanopore sequencing and Hi-C technology were employed for the de novo sequencing of olive (Olea europaea), providing a reference genome for the study of gene function and molecular breeding in olive (Rao et al. 2021).
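Contig size and contiguity are most often summarized by the N50: the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembly size. A short Python sketch of the standard definition, with hypothetical contig lengths:

```python
def n50(contig_lengths):
    """Length of the contig at which the running sum of contigs, sorted from
    longest to shortest, first reaches half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(n50([100, 80, 60, 40, 20]))  # 80
print(n50([50] * 6))               # 50: same total size, more fragmented
```

Both hypothetical assemblies total 300 bp, but the more fragmented one has the lower N50, which is why long-read assemblies of repetitive crop genomes report much higher values than short-read ones.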

Transcriptome sequencing

Transcriptome sequencing, also known as RNA-Seq, is the sequencing of the transcribed portion of a genome under a specific condition. It provides information for gene expression analysis, functional genomics, and gene characterization, even in the absence of a sequenced genome (Afzal et al. 2020). The transcriptome can be assembled de novo or mapped to a reference genome to study genetic variation. Assembling and functionally annotating a transcriptome de novo is comparatively easier than doing so for a genome (Bryant et al. 2017). The transcripts recovered depend on the specific condition sampled, such as the organ, stress, developmental stage, or external stimulus, and because only expressed sequences are captured, transcriptome sequencing does not provide information about structural variation (Schreiber et al. 2018). The transcriptomes of various crops have been sequenced and mined to study regulatory mechanisms, analyze gene expression, and identify biosynthetic pathways and key genes associated with particular traits of interest. Transcriptomes of Vitis amurensis (Xu et al. 2014), Aloe vera (Pragati et al. 2018), Paeonia suffruticosa (Guo et al. 2018), Trillium govanianum (Singh et al. 2017), Polygonum minus (Loke et al. 2017), Cornus officinalis (Hou et al. 2018), Dracocephalum tanguticum (Li et al. 2017), Salvadora oleoides (Bhandari et al. 2020), and many more have been sequenced to gain a deeper understanding of their molecular mechanisms.
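Gene expression analysis from RNA-Seq starts by turning raw read counts into comparable values, because longer transcripts attract more reads. One common normalization, not specific to the studies cited above, is transcripts per million (TPM): divide each count by transcript length in kilobases, then rescale so each sample sums to one million. The counts and lengths below are hypothetical.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million: normalize reads by transcript length (in kb),
    then rescale so the values in each sample sum to one million."""
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(per_kb.values())
    return {gene: rate / total * 1e6 for gene, rate in per_kb.items()}


counts = {"gene1": 100, "gene2": 200, "gene3": 300}       # hypothetical reads
lengths = {"gene1": 1000, "gene2": 2000, "gene3": 500}    # hypothetical bp
print(tpm(counts, lengths))
# {'gene1': 125000.0, 'gene2': 125000.0, 'gene3': 750000.0}
```

Although gene2 received twice as many reads as gene1, it is twice as long, so the two are expressed at the same level.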

Targeted resequencing

Targeted resequencing provides information on genes or regions of interest, such as the exome or regions identified in association mapping studies. The strategy is efficient and economical because defined regions of genetic variation are sequenced across a large number of samples to identify distinctive variants (SNPs, CNVs) that inform breeding decisions or characterize disease susceptibility. Compared with whole-genome sequencing, targeted resequencing is a cost-effective way to study particular regions of interest. It provides high coverage of the targeted regions and allows the identification of rare genetic variants that are difficult to detect by whole-genome sequencing (Bewicke-Copley et al. 2019). These variants can reveal beneficial mutations that inform breeding choices, as well as causative mutations for plant or animal diseases or susceptibility to parasites. In a study by Pankin et al. (2018), targeted resequencing of 433 wild and domesticated barley accessions was performed to identify candidate domestication genes, and phylogenetic and ancestry analyses revealed the origin of domesticated barley haplotypes.
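Whether a targeted experiment delivers the high coverage described above is typically checked with two metrics: the fraction of reads that fall on the targets and the mean depth across the targeted bases. A minimal sketch, assuming half-open 0-based coordinates, non-overlapping targets, and hypothetical target and read positions:

```python
def target_metrics(targets, read_spans):
    """Return (fraction of reads on target, mean depth over targets).

    targets and read_spans are (start, end) pairs in half-open, 0-based
    coordinates; targets are assumed not to overlap one another.
    """
    on_target_reads = 0
    target_bases_covered = 0
    for r_start, r_end in read_spans:
        overlaps = [
            max(0, min(r_end, t_end) - max(r_start, t_start)) for t_start, t_end in targets
        ]
        if any(overlaps):
            on_target_reads += 1
        target_bases_covered += sum(overlaps)
    target_size = sum(end - start for start, end in targets)
    return on_target_reads / len(read_spans), target_bases_covered / target_size


targets = [(100, 200), (500, 600)]  # hypothetical capture regions
reads = [(90, 190), (150, 250), (520, 620), (1000, 1100)]
print(target_metrics(targets, reads))  # (0.75, 1.1)
```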

Epigenetics

Environmental changes (drought stress, food availability, etc.) trigger adaptive responses that cause physiological changes, affecting the viability and reproductive fitness of plants and animals. Epigenetic regulation is accomplished by mechanisms such as post-translational histone modifications, DNA methylation, and noncoding RNAs that alter chromatin states, leading to gene activation or gene silencing. Next-generation sequencing has enabled the study of epigenetic regulation at the genome level, referred to as epigenomics. Investigating epigenetic variation at the genome level provides new insights into phenotypic diversity in species with low genetic variation. Comparative analysis of epigenomes has led to the development of biomarkers specific to a particular genotype or condition, and their use in breeding is termed plant epibreeding (Kapazoglou et al. 2018). Forestan et al. (2018) analyzed chromatin data from genomic and transcriptomic sequencing to study how Zea mays responds to and recovers from drought stress. Several studies have analyzed epigenetic modifications in Triticum aestivum under abiotic stress; in one such study, microRNAs, heterochromatic small interfering RNAs, and other small regulatory RNAs were found to be involved in the wheat drought stress response (Budak et al. 2015). Sequencing of Hordeum vulgare revealed that more differentially methylated genes were induced in leaves than in roots on exposure to drought and salt stress (Chwialkowska et al. 2016). Epigenetic studies have been performed on many important crops, such as Oryza sativa, Solanum lycopersicum, Brassica napus, and many legume crops (Varotto et al. 2020). Two technologies have emerged for epigenetic analysis of the genome using next-generation sequencing:

  • Whole-genome bisulfite sequencing (WGBS).

  • Chromatin immunoprecipitation sequencing (ChIP-seq).

Whole-genome bisulfite sequencing (WGBS)

This method allows DNA methylation to be evaluated across the whole genome. Genomic DNA is treated with sodium bisulfite, which converts unmethylated cytosines into uracil while leaving methylated cytosines unchanged. PCR amplification, library preparation, and sequencing are then performed, and the bisulfite-treated sequences are compared with the untreated reference sequence to identify methylated positions. Genome-wide DNA methylation levels in Zea mays and Hordeum vulgare have been assessed using this technology (Li et al. 2018).
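The comparison of bisulfite-treated reads with the reference reduces to a count at every reference cytosine: a C in the read means the site was protected by methylation, and a T means it was converted. The sketch below computes the methylation level as C / (C + T) per position. It assumes gap-free reads aligned to the top strand only; real pipelines also handle the bottom strand and the reduced-complexity alignment problem that bisulfite conversion creates. All sequences are hypothetical.

```python
def methylation_levels(reference, aligned_reads):
    """Per-cytosine methylation level from bisulfite reads.

    Assumptions: reads are gap-free, aligned to the top strand with 0-based
    starts, and bisulfite-converted uracils are read as T after PCR. A C in
    the read at a reference C means methylated (protected); a T means the
    cytosine was unmethylated (converted).
    """
    calls = {}
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < len(reference) and reference[pos] == "C" and base in "CT":
                methylated, total = calls.get(pos, (0, 0))
                calls[pos] = (methylated + (base == "C"), total + 1)
    return {pos: m / t for pos, (m, t) in sorted(calls.items())}


reference = "ACGTCCA"
reads = [(0, "ACGTTCA"), (0, "ATGTTCA"), (1, "CGTTTA")]
levels = methylation_levels(reference, reads)
print({pos: round(level, 2) for pos, level in levels.items()})
# {1: 0.67, 4: 0.0, 5: 0.67}
```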

Chromatin immunoprecipitation sequencing (ChIP-seq)

ChIP combined with sequencing allows transcription factor binding sites and histone modifications to be identified on a genomic scale. Chromatin immunoprecipitation profiles the chromatin-associated components of the epigenome, and a ChIP procedure followed by sequencing is called ChIP-seq. Zhao et al. (2020) performed genome-wide profiling of five histone modifications and RNA polymerase II using the eChIP approach and constructed epigenome landscapes for 20 representative rice varieties.
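Binding sites and histone marks appear in ChIP-seq data as genomic regions where immunoprecipitated reads pile up well above background. The sketch below illustrates this with equal-width bins and a Poisson background, loosely following the idea used by peak callers such as MACS. It is not MACS itself: the bin counts are hypothetical, the ChIP and input-control libraries are assumed to be scaled to the same depth, and the threshold `alpha` is an arbitrary choice.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(chip_counts, control_counts, alpha=1e-5):
    """Return indices of bins whose ChIP read count is improbably high.

    Assumptions: equal-width bins, ChIP and input-control libraries already
    scaled to the same depth, and a background rate taken as the larger of
    the local control count and the genome-wide control mean.
    """
    genome_lambda = sum(control_counts) / len(control_counts)
    peaks = []
    for i, (k, ctrl) in enumerate(zip(chip_counts, control_counts)):
        lam = max(ctrl, genome_lambda)
        if poisson_sf(k, lam) < alpha:
            peaks.append(i)
    return peaks


chip = [3, 4, 2, 40, 35, 3]
control = [2, 3, 2, 3, 2, 3]
print(call_peaks(chip, control))  # [3, 4]
```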

Genotyping by sequencing (GBS)

Next-generation genotyping (NGG), or genotyping by sequencing (GBS), is a low-cost genome screening technique for exploring SNPs in plants and animals for genotyping research (Elshire et al. 2011). The method uses restriction enzyme-mediated complexity reduction followed by sequencing to discover random markers across an entire genome (Jiang et al. 2016), generating large numbers of SNPs for genetic analysis and genotyping. It is a low-cost tool with which researchers can accelerate the screening of breeding lines and improve breeding practices. With this simple, quick, highly reproducible, and highly specific approach, it is possible to reach important genomic regions that are inaccessible to other sequence capture-based methods (Poland et al. 2012). The major benefits of the system include reduced sample handling, fewer PCR and purification steps, inexpensive barcoding, and no size fractionation. GBS was initially developed for high-resolution association studies in Zea mays and has since been used for other species with complex genomes. It is a cost-effective strategy that has been used to discover and genotype SNPs in various crop species, perform population studies, and study plant genetics and breeding (Truong et al. 2012). Various studies have used GBS for genetic analysis and marker development in crops such as Lactuca sativa (Poland et al. 2012), Brassica napus (Yang et al. 2012), Panicum virgatum (Bus et al. 2012), Lupinus (Lu et al. 2013), and Glycine max (Sonah et al. 2013). The GBS approach has also been applied to cereal crops such as Triticum aestivum, Oryza sativa, Zea mays, Sorghum, Hordeum vulgare, and Avena sativa, to root and tuber crops such as Manihot esculenta and Solanum tuberosum, and to Gossypium, an industrial crop (Deschamps et al. 2012; He et al. 2014c).
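The complexity-reduction step of GBS can be previewed with an in silico digest, which predicts the fragments a restriction enzyme will produce. The sketch below uses ApeKI, the enzyme used in the original GBS protocol of Elshire et al. (2011), which recognizes GCWGC (W = A or T) and cuts after the first G. The input sequence is hypothetical, and the function ignores sticky-end overhangs and adapter ligation.

```python
import re

# ApeKI recognizes GCWGC (W = A or T) and cuts after the first G (G^CWGC).
# The site is its own reverse complement, so scanning one strand finds all
# sites. The lookahead lets overlapping sites be reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def digest(sequence: str, site=APEKI_SITE, cut_offset: int = 1) -> list:
    """Return the fragments produced by cutting at every recognition site."""
    cuts = [m.start() + cut_offset for m in site.finditer(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest("AAGCAGCTTTTGCTGCAA"))  # ['AAG', 'CAGCTTTTG', 'CTGCAA']
```

Applied to a whole reference genome, the same scan estimates how many fragments, and therefore how many potential marker loci, a given enzyme will sample.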

Advantages of GBS include:

  • Simple and quick procedure compared to conventional approaches.

  • Easy computational data analysis.

  • High accuracy of SNP arrays.

  • Low cost, which makes large numbers of markers attractive and accessible to researchers.

  • A small amount of DNA input is sufficient.

Environmental DNA sequencing

Environmental DNA (eDNA) sequencing is a rapidly emerging technique for characterizing organisms, including microorganisms, in soil, aquatic systems, and other samples. It is used to monitor ecosystem changes, study biodiversity, and test the suitability of water and soil. Organisms shed DNA into their environment, and analysis of this eDNA reveals which species are present without disrupting the ecosystem. NGS allows thousands of species to be profiled simultaneously from a single sample and provides the sensitivity needed to detect eDNA present at low levels in the environment (Van Poecke et al. 2013). In eDNA sequencing, the eDNA is first amplified by PCR, either with specific primers in a single-species approach or with generic primers in a multiple-species (multiple-taxon) approach, and the amplified DNA is then sequenced. Evolving NGS technologies have made wide-ranging biodiversity surveys easier and less costly. eDNA metabarcoding is the technique in which DNA from many species in a complex sample is mass-sequenced; it is more powerful and cost-efficient than single-species identification. However, eDNA metabarcoding has some drawbacks. When generic primers are used, primer affinity bias causes certain sequences to be amplified less efficiently than others, and data analysis and interpretation are more complex with high-throughput sequencing (Thomsen and Willerslev 2015).
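The analysis step of metabarcoding assigns each sequenced read to a taxon by comparing it with a reference barcode database and then counts reads per taxon. The minimal sketch below uses shared k-mers as the similarity measure. The reference barcodes, reads, k-mer size, and threshold `min_shared` are all hypothetical; real pipelines use curated databases, quality filtering, and alignment- or classifier-based assignment.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read: str, references: dict, k: int = 5, min_shared: float = 0.5):
    """Assign a read to the reference barcode sharing the largest fraction of
    its k-mers; return None if no reference shares at least min_shared."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best, best_score = None, 0.0
    for taxon, barcode in references.items():
        score = len(read_kmers & kmers(barcode, k)) / len(read_kmers)
        if score > best_score:
            best, best_score = taxon, score
    return best if best_score >= min_shared else None


def community_profile(reads: list, references: dict, k: int = 5) -> Counter:
    """Count reads per taxon, the basic output of a metabarcoding survey."""
    return Counter(classify(r, references, k) or "unassigned" for r in reads)


references = {  # hypothetical barcode sequences
    "taxon_A": "ACGTTGCATGCCTAGGATCC",
    "taxon_B": "TTGACCGGTAAGCTTCGAAT",
}
reads = ["GTTGCATGCCTAG", "GACCGGTAAGCT", "CCCCCCCCCCCC"]
print(community_profile(reads, references))
```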

Genome editing

Genome editing is the use of molecular biology techniques such as zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) to introduce specific modifications at particular genomic loci. With the development of clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems, genome engineering has progressed exceptionally because of the ease and precision of DNA manipulation they offer (Zhang et al. 2018). Genome editing has been applied to a variety of crops and has transformed crop improvement: insertion, expression, and silencing of significant genes are used to alter or enhance particular crop traits. Genomic information is a prerequisite for applying genome editing techniques. The CRISPR/Cas9 and TALEN genome-editing systems use sequence-specific nucleases (SSNs) to create double-stranded breaks (DSBs) at the target locus for targeted insertion, disruption, or replacement of genes in plants. The advantages of the CRISPR/Cas9 system have led to its use in genome editing of various crop species (Jiang et al. 2013; Sun et al. 2015; Svitashev et al. 2016; Zhang et al. 2016; Shimatani et al. 2017; Soyk et al. 2017). NGS combined with CRISPR genome editing has been applied to address crop diseases caused by viruses at the genomic level (Mushtaq et al. 2021).
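One concrete reason genomic information is a prerequisite for CRISPR/Cas9 editing is guide design: the commonly used Streptococcus pyogenes Cas9 needs a 20-nucleotide protospacer immediately followed by an NGG PAM. The sketch below scans both strands of a hypothetical sequence for such sites. It only enumerates candidates; practical guide design also scores on-target efficiency and genome-wide off-target matches, which this sketch does not do.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_targets(seq: str, protospacer_len: int = 20) -> list:
    """List (strand, start, protospacer, PAM) for every protospacer followed
    by an NGG PAM. Start is the leftmost forward-strand coordinate (0-based)
    of the whole protospacer + PAM site."""
    seq = seq.upper()
    site_len = protospacer_len + 3
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            start = m.start() if strand == "+" else len(seq) - m.start() - site_len
            hits.append((strand, start, m.group(1), m.group(2)))
    return hits


example = "AAAAA" + "GACGTTACCGGATCGATCAA" + "TGG" + "AAAAA"  # hypothetical locus
print(find_cas9_targets(example))
# [('+', 5, 'GACGTTACCGGATCGATCAA', 'TGG')]
```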

Skim sequencing

Skim sequencing, also known as low-coverage genome sequencing, is a simpler approach to high-resolution genotyping of multiple genomes. It omits the genome complexity-reduction step used in the genotyping-by-sequencing approach. Recombinant inbred lines (RILs) and doubled haploid populations are the populations most often used for skim sequencing. It provides better genome coverage than GBS and is used to identify SNPs and recombination events, for quantitative trait loci (QTL) mapping, and for genome-wide association analysis. A reference genome is a prerequisite for skim sequencing. First, reads from the population are aligned to the reference genome, followed by variant calling. Variants with more than 80% missing alleles are discarded, and the variants at single positions are called as SNPs; this process is known as SNP genotyping. SNP filtering is then performed to remove false-positive SNPs resulting from sequencing errors. Finally, imputation is performed to fill in missing genotypes. Skim sequencing has been used for a variety of crops, such as Oryza sativa, Gossypium, Nicotiana tabacum, Cicer arietinum, and many others (Kumar et al. 2021).
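The missing-data filter and imputation steps described above can be sketched on a small genotype matrix. Assumptions: markers are rows in genomic order and samples are columns, genotypes in a RIL population are coded 0 or 2 (homozygous for either parent) with None for missing, and imputation follows a simple hypothetical rule, filling a gap only when the nearest called markers on both sides agree. Real skim-sequencing pipelines typically use hidden Markov model-based imputation instead, and the false-positive SNP filtering step is omitted here.

```python
def filter_markers(genotypes, max_missing=0.8):
    """Drop markers (rows) with more than max_missing of samples uncalled."""
    return [m for m in genotypes if sum(g is None for g in m) / len(m) <= max_missing]


def impute_flanking(genotypes):
    """Fill a missing call when the nearest called markers on both sides
    (same sample, genomic order) agree; otherwise leave it missing."""
    imputed = [list(marker) for marker in genotypes]
    n_markers = len(genotypes)
    n_samples = len(genotypes[0]) if genotypes else 0
    for s in range(n_samples):
        column = [genotypes[m][s] for m in range(n_markers)]
        for m, call in enumerate(column):
            if call is not None:
                continue
            left = next(
                (column[i] for i in range(m - 1, -1, -1) if column[i] is not None), None
            )
            right = next(
                (column[i] for i in range(m + 1, n_markers) if column[i] is not None), None
            )
            if left is not None and left == right:
                imputed[m][s] = left
    return imputed


raw = [  # 5 hypothetical markers x 3 RILs
    [0, 2, None],
    [None, 2, 0],
    [None, None, None],  # 100% missing: removed by the filter
    [0, None, 2],
    [0, 2, None],
]
print(impute_flanking(filter_markers(raw)))
# [[0, 2, None], [0, 2, 0], [0, 2, 2], [0, 2, None]]
```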

Table 1 List of crop genomes sequenced recently using next-generation sequencing technologies

Applications of NGS in agriculture

Trait screening/genomic selection

Genomic selection (GS) is a marker-based selection technique in which genomic loci, molecular markers, and haplotypes are used to develop novel breeding strategies. It evaluates genomic signatures comprising thousands of genetic markers to predict complex traits (Dekkers 2007). Advances in sequencing technologies have led to the development of cost-effective and flexible genome-wide high-throughput markers. NGS-based genotyping has therefore become a promising agrigenomics tool for genomic selection in both model and non-model crop species, as well as in crops with large and complex genomes, and in this case genomic selection can even be performed without a reference genome. Genetic markers linked to desired traits are used to screen large progeny populations for specific features. Trait screening is best suited to polygenic traits that are hard to manage with standard breeding or propagation methods. Recurrent population screening enables clear separation of progeny with the required characteristics for use in breeding. Genomic selection has been performed successfully in various crop species such as Triticum aestivum, Oryza sativa, Zea mays, Glycine max, and Panicum virgatum (Bhat et al. 2016). Recently, GS was applied in common bean, where a panel of elite Andean breeding lines was evaluated for various agronomic traits at two locations under drought, irrigated, and low-phosphorus conditions (Keller et al. 2020). The availability of crop pan-genomes has also contributed to genomic selection by linking genomic data with phenotypes of interest. The Solanum lycopersicum, Hordeum vulgare, and Cajanus cajan pan-genomes have been used in genomic selection to study possible associations with desired phenotypes.
Establishing a pan-genome before employing genomic selection is a prerequisite for incorporating a broader range of markers in the model training process (Bayer et al. 2021).
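Genomic selection models are trained on a population that has both genotypes and phenotypes and are then used to predict the breeding values of lines that have only been genotyped. The following is a minimal sketch, assuming markers coded as 0/1/2 allele dosages and a ridge-regression (rrBLUP-style) model with a hand-picked penalty `lam`. The simulated data and the function names `train_ridge_gs` and `predict_gebv` are hypothetical, and ridge regression is one common GS model rather than the method used in the studies cited above.

```python
import numpy as np

def train_ridge_gs(markers, phenotypes, lam=1.0):
    """Estimate marker effects with ridge regression (rrBLUP-style shrinkage).

    markers: (n_lines, n_markers) allele dosages coded 0/1/2
    phenotypes: (n_lines,) observed trait values
    lam: ridge penalty (hypothetical default; normally tuned or derived
         from variance components)
    """
    X = np.asarray(markers, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    col_means = X.mean(axis=0)
    Xc = X - col_means                      # center each marker
    mu = y.mean()
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mu) for the marker effects
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ (y - mu))
    return mu, col_means, beta

def predict_gebv(model, markers):
    """Genomic estimated breeding values for genotyped (possibly unphenotyped) lines."""
    mu, col_means, beta = model
    return mu + (np.asarray(markers, dtype=float) - col_means) @ beta

# Simulated example: 250 lines, 50 markers, 5 of which affect the trait
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(250, 50))
true_effects = np.zeros(50)
true_effects[[3, 11, 20, 34, 47]] = 1.0
genetic_value = geno @ true_effects
pheno = genetic_value + rng.normal(0.0, 1.0, size=250)

# Train on the first 200 lines, then predict the remaining 50 from genotypes alone
model = train_ridge_gs(geno[:200], pheno[:200], lam=1.0)
gebv = predict_gebv(model, geno[200:])
accuracy = np.corrcoef(gebv, genetic_value[200:])[0, 1]
print(f"prediction accuracy (r with true genetic value): {accuracy:.2f}")
```

With real data the true genetic values are unknown, so prediction accuracy is usually estimated by cross-validation against observed phenotypes instead.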

Evolution and diversity

Compared with other eukaryotic organisms, plant genomes evolve rapidly, which has led to greater genome diversity (Murat et al. 2012). Whole-genome duplication is the main mechanism behind the evolution and diversification of plants. As a result, plant genomes contain many gene duplicates: some copies are lost or silenced (nonfunctionalization), while others are retained as functional genes through subfunctionalization or neofunctionalization. Whole-genome sequencing facilitates the comparison of genomes within the same species (species pan-genome) or across genomes within a family or genus (clade genomes). Whole-genome sequencing has wide applications, particularly in population evolution, where analyses of linkage disequilibrium, phylogeny, speciation, and genetic structure are used to investigate the mechanisms of biological evolution. Mutations such as insertions, deletions, substitutions, and structural variations alter the nucleotide sequences of genomes and give rise to genetic diversity within species. In addition to mutations, different breeding methods also shape genetic diversity (Yaman and Uzun 2020; Yaman and Uzun 2021). Cost-efficient sequencing technologies have made it possible to sequence these genomes and to analyze the genetic diversity between or within crop species (Unamba Chibuikem et al. 2015).
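Within-species genetic diversity estimated from sequence data is often summarized as nucleotide diversity (π), the average number of differences per site between pairs of sequences sampled from a population. The following is a minimal sketch, assuming gap-free sequences that are already aligned; the function name and the haplotypes are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(aligned_seqs):
    """Nucleotide diversity (pi): mean pairwise differences per aligned site.

    aligned_seqs: equal-length, gap-free aligned sequences of one locus,
    one per sampled haplotype (assumed input format).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching sites for every pair of sequences
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs]
    return sum(diffs) / len(pairs) / length

# Hypothetical haplotypes of one locus from two groups of accessions
group_a = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTTC", "ACGTACGAAC"]
group_b = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTTC", "ACGTACGTAC"]
print(nucleotide_diversity(group_a), nucleotide_diversity(group_b))
```

Comparing π between groups of accessions, or along the genome in windows, is one simple way to contrast diversity within and between crop populations.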

Genome-wide association studies (GWAS)

A genome-wide association study (GWAS), also known as a genome-wide association (GWA) study, is an association mapping approach in which markers across the whole genome are tested in a large population to find genetic variants linked with a specific trait. These genetic associations provide information for the improvement of crop species. In plants and animals, association mapping supports whole-genome selection applications such as fingerprinting, marker-assisted breeding, and net merit evaluation, in which millions of genetic variants in a DNA sample are read using SNP arrays, to improve commercially important species. GWAS has been used in various crops to better understand the genetic architecture of important traits, namely days to flowering, resistance, panicle architecture, fertility restoration, and other agronomic traits (Berhe et al. 2021). Agriculturally important traits of various crop species, including Hordeum vulgare, Triticum aestivum, Oryza sativa, Zea mays, Glycine max, Sorghum, and Gossypium hirsutum, have been explored using GWAS (Liu and Yan 2019). Next-generation sequencing technologies have made fast and cost-efficient genome-wide SNP discovery possible, leading to the development of high-throughput genome-wide SNP genotyping, which in turn has enabled GWAS in many important crops. In a recent study, Yang et al. (2022) performed GWAS on cucumber stem traits and identified genes related to stem diameter; these data can support the development of new cucumber varieties with strong growth potential and high yield.
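At its core, a GWAS tests each SNP for association with the trait and then corrects for the large number of tests. The following is a minimal sketch, assuming SNP dosages coded 0/1/2 and a simple per-SNP linear regression with a normal approximation for p-values (reasonable for large samples). The function name and simulated panel are hypothetical, and the sketch deliberately omits the population-structure and kinship corrections that real GWAS models include.

```python
import math
import numpy as np

def single_marker_scan(genotypes, phenotype):
    """Regress the trait on each SNP separately.

    genotypes: (n_individuals, n_snps) dosages coded 0/1/2
    phenotype: (n_individuals,) trait values
    Returns per-SNP effect estimates and two-sided p-values
    (normal approximation to the t distribution; no structure/kinship correction).
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = len(y)
    yc = y - y.mean()
    betas, pvals = [], []
    for j in range(G.shape[1]):
        x = G[:, j] - G[:, j].mean()
        sxx = x @ x
        if sxx == 0:                       # monomorphic SNP cannot be tested
            betas.append(0.0)
            pvals.append(1.0)
            continue
        beta = (x @ yc) / sxx
        resid = yc - beta * x
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        z = beta / se
        betas.append(beta)
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided p-value
    return np.array(betas), np.array(pvals)

# Hypothetical panel: 500 lines x 20 SNPs, with SNP 7 affecting the trait
rng = np.random.default_rng(42)
n, m = 500, 20
geno = rng.binomial(2, 0.3, size=(n, m))
trait = 0.5 * geno[:, 7] + rng.normal(0.0, 1.0, size=n)
betas, pvals = single_marker_scan(geno, trait)
threshold = 0.05 / m                        # Bonferroni-corrected significance level
hits = np.flatnonzero(pvals < threshold)
print(hits, round(float(betas[7]), 2))
```

The Bonferroni threshold shown here is the simplest multiple-testing correction; with millions of SNPs it becomes very stringent, which is why more refined thresholds are often used in practice.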

QTL analysis

Conventional QTL mapping requires genotyping and phenotyping many progenies from a bi-parental mapping population, which is time-consuming and labor-intensive. QTL detection by next-generation sequencing provides an efficient alternative, as it significantly reduces the scale and cost of the analysis while retaining power comparable to QTL detection using a full mapping population. Various methods are used for QTL mapping; among the most efficient are whole-genome sequencing-based bulked segregant analysis (BSA) and genotyping-by-sequencing (GBS). In NGS-based BSA, individuals from a segregating population showing two contrasting phenotypes are pooled into bulks and sequenced using NGS platforms. The allele frequencies of the two bulks are then compared to detect differences associated with the phenotype, which correspond to QTLs linked with the trait. Subsequently, a confidence interval for the location of each QTL is determined using suitable statistical tests. Tools specific to BSA-based QTL mapping are available, such as the standalone QTL-seq package and the R package QTLseqr. This approach has been applied to map QTLs for traits in various crops (Deokar et al. 2019). QTL-seq has been successfully applied in Oryza sativa, where it identified major genes underlying QTLs linked with blast resistance and seedling vigor (Takagi et al. 2013), and in Cucumis sativus (Lu et al. 2014). A high-throughput QTL-seq approach was employed in an intraspecific F4 mapping population of Cicer arietinum to identify a major genomic region harboring a robust QTL associated with 100-seed weight in chickpea (Das et al. 2015). Combined with differential expression profiling and diversity analysis, QTL-seq enables the rapid identification of potential candidate genes; this approach has been applied to crops such as Oryza sativa and Setaria italica (Wang et al. 2010; Gedil et al. 2016).
Multiple QTL-seq (MQTL-seq) is a further approach in which QTL-seq is applied to several mapping populations derived from crosses that share at least one common parent. MQTL-seq has been successfully applied to two F5 mapping populations of chickpea to identify the major genomic regions common to both (Nguyen et al. 2019).
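In QTL-seq, the allele-frequency comparison between bulks is commonly expressed as a SNP-index (the proportion of reads carrying the alternative allele) for each bulk and the Δ(SNP-index) between bulks, averaged over sliding windows along the genome. The following is a minimal sketch on simulated read counts. The window size, step size, depth filter, and function names are hypothetical choices, and unlike the QTL-seq and QTLseqr tools, this sketch does not compute statistical confidence intervals.

```python
import numpy as np

def snp_index(alt_reads, depth):
    """SNP-index: fraction of reads at a SNP that carry the alternative allele."""
    return np.asarray(alt_reads, dtype=float) / np.asarray(depth, dtype=float)

def delta_snp_index_windows(pos, hi_alt, hi_depth, lo_alt, lo_depth,
                            window=1_000_000, step=250_000, min_depth=10):
    """Sliding-window mean of delta(SNP-index) = SNP-index(high bulk) - SNP-index(low bulk).

    SNPs with fewer than min_depth reads in either bulk are dropped.
    Returns (window_center, mean_delta, n_snps) for windows that contain SNPs.
    """
    hi_depth = np.asarray(hi_depth)
    lo_depth = np.asarray(lo_depth)
    keep = (hi_depth >= min_depth) & (lo_depth >= min_depth)
    pos = np.asarray(pos)[keep]
    delta = (snp_index(np.asarray(hi_alt)[keep], hi_depth[keep])
             - snp_index(np.asarray(lo_alt)[keep], lo_depth[keep]))
    results = []
    if pos.size == 0:
        return results
    start = 0
    while start <= pos.max():
        in_win = (pos >= start) & (pos < start + window)
        if in_win.any():
            results.append((start + window // 2, float(delta[in_win].mean()), int(in_win.sum())))
        start += step
    return results

# Simulated bulks: 500 SNPs on a 5 Mb chromosome at 30x depth per bulk,
# with a hypothetical QTL between 2 and 3 Mb
rng = np.random.default_rng(1)
pos = np.arange(0, 5_000_000, 10_000)
linked = (pos >= 2_000_000) & (pos < 3_000_000)
depth = np.full(pos.size, 30)
hi_alt = rng.binomial(30, np.where(linked, 0.9, 0.5))
lo_alt = rng.binomial(30, np.where(linked, 0.1, 0.5))
windows = delta_snp_index_windows(pos, hi_alt, depth, lo_alt, depth)
peak = max(windows, key=lambda w: w[1])
print(peak)
```

Unlinked regions give Δ(SNP-index) values near zero, because both bulks inherit parental alleles at random, whereas windows near the QTL deviate strongly from zero.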

As discussed above, GBS combines reduced-representation library preparation with high-throughput sequencing. Restriction site-associated DNA sequencing (RAD-seq) is a cost-effective GBS method in which the number and coverage of genotyped loci and SNPs can be adjusted by changing the restriction enzymes used and the sequencing depth. GBS has been used for linkage mapping and QTL analysis in a range of crops (Scheben et al. 2020).
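The effect of enzyme choice on the number of loci can be illustrated with an in silico digest. The following is a minimal sketch, assuming a random sequence in place of a real genome and a hypothetical 100 to 400 bp size-selection window. The recognition sites and cut positions are the standard ones for EcoRI (G^AATTC) and MspI (C^CGG); the function names are hypothetical.

```python
import random
import re

def digest_fragments(sequence, site, cut_offset):
    """Fragment lengths after cutting `sequence` at every occurrence of `site`.

    cut_offset: position within the recognition site where the top strand is cut
    (EcoRI G^AATTC -> 1, MspI C^CGG -> 1).
    """
    # Lookahead so that overlapping recognition sites are all found
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]

def count_gbs_loci(sequence, site, cut_offset, min_len=100, max_len=400):
    """Count fragments inside a hypothetical size-selection window (approximate captured loci)."""
    return sum(min_len <= n <= max_len for n in digest_fragments(sequence, site, cut_offset))

# A random 500 kb sequence stands in for a genome region (hypothetical data)
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(500_000))
for enzyme, site, offset in [("EcoRI", "GAATTC", 1), ("MspI", "CCGG", 1)]:
    print(enzyme, count_gbs_loci(genome, site, offset))
```

In a random sequence, a 6-bp recognition site occurs about once every 4,096 bp and a 4-bp site about once every 256 bp, so the frequent cutter yields far more fragments in the selected size range. Real genomes deviate from this because of base composition and methylation.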

Agricultural and soil metagenomics

Metagenomics enables the study of microbial communities directly in their natural environments, including the complex and diverse microbial populations associated with plant and animal development. Soil plays an important part in plant growth; hence, understanding its associated microorganisms provides information on soil health and on ways to enhance crop yields. NGS technologies, along with PCR and DNA fingerprinting techniques, play a significant role in microbial research (Kaushik et al. 2020). The speed and accessibility of NGS have advanced the field of metagenomics: next-generation sequencing provides an effective strategy for screening environmental samples and is fast becoming an essential tool for studying microbial diversity. Complementary approaches such as metabolomics, together with the study of biotic factors, are also important for understanding the microbiome. In a study by Sabale et al. (2019), diazotrophs were identified from the rhizosphere soil of red kidney bean using a metagenomic approach targeting the nitrogen-fixation gene nifH. Metagenomic next-generation sequencing (mNGS) has been used to identify and characterize novel viruses from various livestock samples, including poultry, cattle, pigs, and small ruminants (Kwok et al. 2020). Plant fitness and other agricultural features, such as quality traits, soil biogeochemical properties, and crop yield, are influenced by the microbiomes of soil, plants, and livestock (Iquebal et al. 2022). A recent metagenomic study revealed the genome-resolved diversity of phosphate-solubilizing bacteria in agricultural soils; such knowledge can guide targeted engineering and management practices for sustainable agriculture (Wu et al. 2022).
Hence, metagenomics has broad applications in agriculture: exploring unexploited soil microorganisms, identifying their functions and essential genes, and improving crop productivity, nutrient cycling, and phytopathogen resistance (Chidinma and Oluranti 2022).
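Microbial diversity in metagenomic or amplicon data is often summarized from the number of reads assigned to each taxon. The following is a minimal sketch, assuming reads have already been taxonomically classified. The genus labels and counts are hypothetical, and the Shannon and Gini-Simpson indices shown are two common choices among many.

```python
import math
from collections import Counter

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Hypothetical taxonomic assignments of reads from two soil samples
sample_a = ["Bradyrhizobium"] * 40 + ["Azospirillum"] * 30 + ["Pseudomonas"] * 30
sample_b = ["Bradyrhizobium"] * 90 + ["Azospirillum"] * 5 + ["Pseudomonas"] * 5
for name, reads in [("A", sample_a), ("B", sample_b)]:
    counts = list(Counter(reads).values())
    print(name, round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

Both samples contain the same three taxa, but sample B is dominated by one genus, so both indices are lower for B. Because these values depend on sequencing depth, samples are usually normalized (for example, by rarefaction) before comparison.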

Genotyping and marker-assisted breeding

Genome resequencing, in which multiple accessions of the same species are sequenced, has enabled the detection of genetic variants. Different genotyping platforms have been used to obtain large-scale marker segregation data from mapping populations for the construction of genetic maps, and various molecular markers for breeding have also been identified from sequenced genomes. Nevertheless, developing sensitive markers to select desirable ecotypes remains critical in plant breeding (Wenqin et al. 2017). The objective of marker-assisted breeding is to develop high-yielding varieties that tolerate abiotic and biotic stresses. The desired traits are transferred from a donor parent to the offspring, and markers associated with those traits allow the detection of trait genes and transformants, leading to the release of commercially important varieties or breeding stock (Varshney et al. 2005). Markers that transfer efficiently to related species, known as anchor markers, are used for evolutionary studies and comparative genomic analysis.
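A typical use of markers when transferring a trait from a donor parent is marker-assisted backcrossing. In this scheme, progeny are first selected for the donor allele at a trait-linked marker (foreground selection) and then ranked by how much of the recurrent parent's genome they have recovered at other markers (background selection). The following is a minimal sketch of that common scheme, which is assumed here rather than taken from the cited studies. It uses a simplified 'D'/'R' genotype coding; the plant IDs and marker names are hypothetical.

```python
def recurrent_parent_recovery(genotype, background_markers):
    """Fraction of background markers homozygous for the recurrent-parent allele ('R')."""
    return sum(genotype[m] == "R" for m in background_markers) / len(background_markers)

def select_backcross_progeny(progeny, target_marker, background_markers, top_n=2):
    """Foreground selection on the trait-linked marker, then rank by background recovery.

    progeny: plant ID -> {marker: 'D' (carries donor allele) or 'R' (recurrent-parent homozygous)}
    """
    carriers = {pid: g for pid, g in progeny.items() if g[target_marker] == "D"}
    ranked = sorted(carriers,
                    key=lambda pid: recurrent_parent_recovery(carriers[pid], background_markers),
                    reverse=True)
    return [(pid, recurrent_parent_recovery(carriers[pid], background_markers))
            for pid in ranked[:top_n]]

def expected_recovery(n_backcrosses):
    """Expected recurrent-parent genome proportion after n backcrosses without selection."""
    return 1 - 0.5 ** (n_backcrosses + 1)

# Hypothetical BC2 progeny genotyped at a trait-linked marker (M0) and four background markers
progeny = {
    "P1": {"M0": "D", "M1": "R", "M2": "R", "M3": "D", "M4": "R"},
    "P2": {"M0": "R", "M1": "R", "M2": "R", "M3": "R", "M4": "R"},
    "P3": {"M0": "D", "M1": "R", "M2": "R", "M3": "R", "M4": "R"},
    "P4": {"M0": "D", "M1": "D", "M2": "D", "M3": "R", "M4": "R"},
}
selected = select_backcross_progeny(progeny, "M0", ["M1", "M2", "M3", "M4"])
print(selected, expected_recovery(2))
```

Here, P2 is discarded because it lost the donor allele at the trait-linked marker, and P3 ranks first because it combines the target allele with full background recovery. This exceeds the 87.5% expected after two backcrosses without marker selection.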

Parentage

Molecular markers are used to determine parentage from marker genotypes and to construct pedigrees in outcrossing species. A single marker is not sufficient; multiple markers are needed to identify the true parent with confidence. In line breeding, where multiple generations of males or females are present in the group, marker information is pooled with the breeder's records to determine parentage. Beyond breeding, paternity analysis is also used to maintain paternal balance in polycrosses, track pollination events across distances, and evaluate progeny relatedness. Next-generation sequencing provides low-cost single nucleotide polymorphism (SNP) markers for paternity analysis in large populations, with higher accuracy than traditional approaches. As a result, paternity analysis has become common in breeding programs for selection, pedigree maintenance, and retaining paternal balance in polycrosses. Paternity analysis has been successfully employed in several crops, including Trifolium pratense, Trifolium repens, Phleum pratense, and Eucalyptus urophylla (Crain et al. 2020).
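The simplest SNP-based paternity method is exclusion. Given a known mother, a candidate father is excluded at every locus where the offspring genotype cannot be formed from one maternal allele and one allele of the candidate. The following is a minimal sketch, assuming biallelic SNPs coded as alternative-allele dosages (0/1/2). The function names, genotypes, and mismatch tolerance are hypothetical, and likelihood-based methods are generally preferred when genotyping error rates are non-trivial.

```python
from itertools import product

# Alternative-allele counts that a parent with a given SNP dosage (0/1/2) can transmit
TRANSMITTED = {0: {0}, 1: {0, 1}, 2: {1}}

def is_compatible(offspring, mother, father):
    """True if the offspring dosage can arise from one allele of each parent."""
    return any(a + b == offspring for a, b in product(TRANSMITTED[mother], TRANSMITTED[father]))

def assign_father(offspring, mother, candidates, max_mismatches=0):
    """Exclusion-based paternity: count loci at which each candidate is incompatible.

    offspring, mother: lists of SNP dosages; candidates: name -> list of dosages.
    max_mismatches tolerates a few genotyping errors (hypothetical setting).
    Returns (assigned father, or None if all excluded or ambiguous; mismatch counts).
    """
    mismatches = {
        name: sum(not is_compatible(o, m, f) for o, m, f in zip(offspring, mother, geno))
        for name, geno in candidates.items()
    }
    best_count = min(mismatches.values())
    best = [name for name, k in mismatches.items() if k == best_count]
    if best_count > max_mismatches or len(best) > 1:
        return None, mismatches          # excluded or ambiguous: more markers needed
    return best[0], mismatches

# Hypothetical genotypes at five SNPs
mother = [0, 1, 2, 1, 0]
offspring = [1, 1, 1, 2, 0]
candidates = {"A": [2, 0, 0, 2, 0], "B": [0, 1, 2, 0, 1], "C": [1, 1, 2, 1, 2]}
father, counts = assign_father(offspring, mother, candidates)
print(father, counts)
```

Returning `None` when two candidates tie reflects the point above that a few markers rarely suffice: with many SNPs from NGS, false fathers accumulate exclusions and the true parent stands out.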

GMO characterization

Molecular characterization of genetically modified organisms (GMOs) involves determining the location, copy number, and integrity of the exogenous gene in the plant genome. It is crucial for obtaining the desired traits, for safety assessment, and for detecting transgenic events. Various methods are used for the molecular detection of GMOs, including digital PCR, Southern blotting, quantitative real-time PCR, and PCR-based genome walking. However, each of these methods has limitations in terms of time, effort, or efficiency. Next-generation sequencing overcomes these limitations and allows efficient identification of the copy number, insertion site, and flanking host DNA sequences of foreign DNA fragments in transgenic crops. Compared with these other methods, NGS offers a high degree of automation, accuracy, and standardization, with good repeatability, for the molecular characterization of GMOs (Wang et al. 2020). Different NGS-based approaches to GMO characterization have been developed, including whole-genome sequencing, targeted sequencing, and the combination of NGS with Site Finding PCR or DNA walking (Debode et al. 2019).
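Insertion sites are identified from junction reads, which are reads that contain the end of the inserted construct followed by host DNA. The following is a minimal string-matching sketch of that idea. It assumes that the terminal bases of the construct are known and checks both strands of each read. The border sequence, reads, and function names are hypothetical; in practice, reads are typically aligned to both the host reference and the construct, and junctions are called from chimeric alignments.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_junctions(reads, border, flank_len=10):
    """Count host sequences found immediately downstream of a construct border.

    reads: sequencing reads (strings); each read is checked on both strands.
    border: terminal bases of the inserted construct (assumed known from the vector).
    flank_len: number of host bases kept to group reads from the same junction.
    """
    junctions = Counter()
    for read in reads:
        for seq in (read, reverse_complement(read)):
            idx = seq.find(border)
            if idx < 0:
                continue                     # border not on this strand; try the other one
            flank = seq[idx + len(border): idx + len(border) + flank_len]
            if len(flank) == flank_len:
                junctions[flank] += 1
            break                            # do not count the same read twice
    return junctions

# Hypothetical construct border and reads (one forward, one reverse-complemented, one unrelated)
border = "ACGTTGCAGGCT"
reads = [
    "GGACGTTGCAGGCTTTAGCCATGAACGT",
    reverse_complement("CCACGTTGCAGGCTTTAGCCATGAACGA"),
    "TTGACCATGGTACCAGTTAGCA",
]
print(find_junctions(reads, border))
```

Each distinct host flank supported by several reads points to one insertion site, so the number of distinct junctions also gives a first indication of insert copy number.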

Targeting induced local lesions in genomes (TILLING)

TILLING is a high-throughput technique that combines chemical mutagenesis with a sensitive detection method to identify single-nucleotide mutations in a specific region of a gene of interest. It was first established in Arabidopsis thaliana and has since been employed in many diploid crops, including Hordeum vulgare, Oryza sativa, Zea mays, Glycine max, Sorghum, Avena sativa, Solanum lycopersicum, and Brassica rapa. Because of their complex genomes, polyploid crops such as Triticum aestivum, Arachis hypogaea, and Brassica napus have been studied with TILLING to a lesser extent (Gao et al. 2020). Next-generation sequencing has also been integrated into the TILLING pipeline for mutation discovery, improving the efficiency of mutation detection in polyploid species such as Glycine max (palaeopolyploid), Triticum aestivum (allohexaploid), Arachis hypogaea (allotetraploid), and Crambe (hexaploid) (Lakhssassi et al. 2021). TILLING by sequencing is therefore a more efficient method for mutation detection in a population: amplicons from mutagenized plants are pooled and then subjected to high-throughput sequencing (Fanelli et al. 2021). The TILLING method was initially developed as a discovery platform for functional genomics, but it has since become an alternative to transgenic approaches in crop breeding (Kurowska et al. 2011). Further advances in next-generation sequencing may enable in silico resources that catalog all the mutations in a mutagenized population. Such in silico TILLING allows researchers to obtain results for a target gene immediately from the available mutations (Till et al. 2018).
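Pooling is what makes TILLING by sequencing economical. A rare heterozygous mutation in a pool of k diploid plants is expected at an allele frequency of about 1/(2k), and arranging plants in a grid with row and column pools lets the mutant be located by intersecting the positive pools. The following is a minimal sketch of such two-dimensional pooling, an assumed design rather than the exact scheme of the cited studies. The grid size, read counts, the cutoff at half the expected mutant frequency, and the function names are hypothetical.

```python
def expected_alt_fraction(plants_per_pool, ploidy=2):
    """Alt-allele fraction produced by one heterozygous mutant among plants_per_pool plants."""
    return 1 / (ploidy * plants_per_pool)

def positive_pools(alt_reads, depths, plants_per_pool, ploidy=2):
    """Pools whose alt-allele fraction reaches half the expected mutant fraction (hypothetical cutoff)."""
    cutoff = 0.5 * expected_alt_fraction(plants_per_pool, ploidy)
    return {p for p in alt_reads if depths[p] > 0 and alt_reads[p] / depths[p] >= cutoff}

def locate_mutants(row_alt, row_depth, col_alt, col_depth, n_rows, n_cols):
    """Intersect positive row and column pools of a grid of plants (2-D pooling)."""
    rows = positive_pools(row_alt, row_depth, plants_per_pool=n_cols)  # a row pool holds n_cols plants
    cols = positive_pools(col_alt, col_depth, plants_per_pool=n_rows)  # a column pool holds n_rows plants
    return sorted((r, c) for r in rows for c in cols)

# Hypothetical 8 x 8 grid of mutagenized plants with one heterozygous mutant at row 3, column 5.
# Background alt reads (2 of 800) stand in for sequencing errors.
n_rows = n_cols = 8
row_depth = {r: 800 for r in range(n_rows)}
col_depth = {c: 800 for c in range(n_cols)}
row_alt = {r: (50 if r == 3 else 2) for r in range(n_rows)}
col_alt = {c: (48 if c == 5 else 2) for c in range(n_cols)}
print(locate_mutants(row_alt, row_depth, col_alt, col_depth, n_rows, n_cols))
```

When several mutants share rows or columns, the intersection becomes ambiguous. Additional pooling dimensions or individual re-sequencing of the candidate plants resolve such cases. The cutoff must also stay well above the sequencing error rate, which limits how many plants can go into one pool.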

Conclusion

Genome sequencing is an integral part of biology and has been used in agriculture to improve productivity and address the challenges of food security. The development of NGS technologies has recently expanded the range of strategies available for scientific innovation. Second-generation NGS platforms perform massively parallel sequencing of clonally amplified templates on a solid surface at reduced cost. The advent of long-read sequencing platforms has simplified the sequencing of complex crop genomes with large sizes and highly repetitive content. Different platforms are available and can be chosen according to the goals of a study. NGS has impacted agricultural genomics by introducing transcriptome sequencing, genome resequencing, targeted sequencing, and genotyping-by-sequencing methodologies, and it is an essential genomics tool for many plant breeding and improvement programs. Whole-genome sequencing and genome resequencing provide information on the structural variations present within or between crop species, which is useful for uncovering population diversity. Transcriptome sequencing and epigenetic analysis are valuable tools for studying gene expression under specific environmental conditions that affect crop yield or underlie a particular trait. Variants such as SNPs and CNVs, as well as markers associated with a particular phenotypic effect, can be identified by sequencing. In addition, genotyping by sequencing enables the genotyping of multiple genomes in the absence of reference genomes. NGS also plays a role at various stages of CRISPR-based genome editing: whole-genome sequencing is used to analyze CRISPR off-target effects, and CRISPR knockouts and other edits are verified by targeted sequencing.
Hence, NGS technologies play a crucial role in agricultural innovation and provide the information and understanding needed to improve crop performance and meet global demand.