2.1 Introduction

Crops play a significant and diverse role in the economy and the environment and in feeding a growing world population. Rising demand for biofuel crops, the population explosion and global climate change all challenge current plant biotechnology, and sustainable agricultural production has become an urgent issue in response (Brown and Funk 2008; Ozturk 2010; Hakeem et al. 2012). Climate change will severely influence the world’s food supply and, unless steps are taken to increase crop resilience, it is predicted to have immense negative effects on both the yield and the quality of crop plants (Kumar 2016). Plant genomics is a potentially powerful defence against this looming threat. To address these issues and increase crop yields, the breeding of novel crops and the adaptation of current crops to new environments, based on a better molecular understanding of gene function and of the regulatory mechanisms involved in crop production (Takeda and Matsuoka 2008), have become a primary necessity. Crop yields have increased during the past century and are expected to continue rising through enhanced breeding and new biotechnological strategies. The sequences and functions of some important genes have been determined, many of which relate to crop yield, crop quality, tolerance to biotic and abiotic stresses and the development of molecular markers (De Filippis 2012). One vital area supported by bioinformatics is “genomics”, which is commonly used to identify genotypic and phenotypic changes in plants; this information helps improve the overall performance of crop plants (Ahmad et al. 2011).

Modern bioinformatics technologies have advanced the study of plant biology and have assisted in unravelling genetic and molecular networks (Schuster 2007). Following the rapid surge in genome sequencing through innovative high-throughput methods, scientists can now study the structure of plant genetic material at the molecular level, a field known as “plant genomics” (Govindaraj et al. 2015). Some of the latest applications of bioinformatics in plant science research (Fig. 2.1) are as follows:

  • Integrated “omics” strategies clarify the molecular systems of the plant, and this knowledge is used to improve plant productivity. Innovations in omics-based research therefore advance plant research as a whole.

  • Genomics strategy, especially comparative genomics, helps in understanding the genes and their functions and the biological properties of each species.

  • Bioinformatics databases are also used in designing new techniques and experiments for increased plant production (Mochida and Shinozaki 2010).

  • Advances in bioinformatics tools have provided information about the genes present in the genomes of microorganisms that play a role in agriculture. These tools have also made it possible to predict the functions of different genes and the factors affecting them, and scientists use this information to produce improved plants with drought, herbicide and pesticide resistance (Mochida and Shinozaki 2010).

  • Genomics now provides breeders with a new set of tools and techniques that allow the study of the whole genome. This represents a paradigm shift, because it facilitates the direct study of the genotype and its relationship with the phenotype (Tester and Langridge 2010). Genomics is thus leading a new revolution in plant breeding at the beginning of the twenty-first century.

Fig. 2.1 Application of bioinformatics in crop improvement

At the most fundamental level, advances in genomics will greatly accelerate the acquisition of knowledge, which in turn will directly affect many aspects of the processes associated with plant improvement. Bioinformatics information and databases have become ready-to-use tools for crop scientists and breeders in mining gene data and linking this knowledge to its biological significance (Mochida and Shinozaki 2010). Knowledge of the function of all plant genes, in conjunction with the further development of tools for modifying and interrogating genomes, will lead to a genuine genetic engineering paradigm in which rational changes can be designed and modelled from first principles. Bioinformatics, when combined with genomics, has the potential to help maintain food security in the face of climate change through the accelerated production of climate-ready crops (Batley and Edwards 2016). Against this background, this chapter focuses on the challenges and opportunities that knowledge and skills in bioinformatics bring to plant scientists in the present plant genomics era, and on the critical future need for effective tools that translate knowledge from new sequencing data into improved plant productivity. The chapter covers a number of applications of bioinformatics in agriculture, including crop improvement, breeding programmes and fruit breeding; it gives an overview of the main bioinformatics strategies, challenges and perspectives in this field, and of the bioinformatics tools and databases important for breeders and plant biotechnologists.

2.2 Role of Bioinformatics in Crop Improvement

Understanding and unravelling the genetic and molecular basis of all biological processes in plants has become a key objective for plant and crop biologists. This understanding helps in the practical utilization of plants as biological resources for developing new cultivars with improved quality and at reduced economic cost (Schlueter et al. 2003). Since climate change and the population explosion have increased pressure on our ability to produce sufficient food, the breeding of novel crops and the adaptation of current crops to new environments are required to ensure continued food production. At the most fundamental level, advances in genomics have accelerated the acquisition of knowledge, which in turn has helped provide rational annotation of genes, proteins and phenotypes; these omics data can now be regarded as a highly important tool for plant improvement (Fig. 2.1). Several new gene-finding tools are tailored to plant genomic sequences and have contributed to enhancing the nutritional quality and composition of food crops and to increasing agricultural production for food, feed and energy (Schlueter et al. 2003; Van Emon 2016). This combination of bioinformatics with genomic tools has the potential to maintain food security in the face of climate change through the accelerated production of new cultivars with improved quality and reduced economic and environmental cost (Batley and Edwards 2016).

Research in sequence analysis and genome annotation has played a significant role in crop improvement. Whole-genome sequencing of several species makes it possible to define their genome organization and provides the starting point for understanding their functionality (Ellegren 2014), thereby supporting agricultural practice. An extremely large amount of genomics data is available for plants owing to the tremendous improvements in the field of omics (Fig. 2.1), and the functions of different plant genes and the factors affecting them can now be predicted (Morrell et al. 2012). This information has helped scientists to generate plants resistant to abiotic and biotic stresses, herbicides and pesticides. In recent years, a number of new sequencing technologies, which are adaptations of existing pyrosequencing methods (Ansorge 2009), have opened opportunities to address questions at the whole-genome level in comparative genomics, metagenomics and evolutionary genomics (Varshney et al. 2009). Indeed, the contribution of genomics to agriculture spans the identification and manipulation of genes linked to specific phenotypic traits (Zhang et al. 2014) as well as genomics-based breeding by marker-assisted selection of variants (Organization 2005). Efforts to acquire the associated molecular information, such as that arising from transcriptome, metabolome and proteome analyses (Fig. 2.2), are also essential to better depict the gene content of a genome and its main functionalities.

Fig. 2.2 General description of a standard workflow in omics data analyses

Current advances in genomics and bioinformatics provide opportunities for accelerating crop improvement (Fig. 2.2) in the following areas:

  • “Gene finding” refers to the prediction of introns and exons in a segment of DNA sequence. Bioinformatics has aided genome sequencing and has proved successful in locating genes, in phylogenetic comparison and in detecting the transcription factor binding sites of genes. Identifying key genes and understanding their function in this way is expected to result in a “quantum leap” in quantitative and qualitative trait improvement in commercially important crops (Morrell et al. 2012).

  • “Comparative genetics” (of model and non-model plants) with computational tools can reveal how agronomically important genes are organized with respect to each other, which can then be used to transfer information from model crop systems to other food crops. Species-specific nucleotide sequences now provide information related to phenotypic characters, even when based on comparative genome analyses of the few model plants available (Cogburn et al. 2007; Paterson 2008).

  • “Cheminformatics” for the design of agrochemicals is based on an analysis of the components of signal perception and transduction pathways to select targets and to identify potential compounds that can be used as herbicides, pesticides or insecticides, thereby improving plant quality and quantity (Bennetzen et al. 1998).

  • “Agricultural genomics” leads to the global understanding of plant and pathogen biology, and its application would be beneficial for agriculture and in providing massive information to improve the crop phenotype. Further, whole-genome sequencing in plants allows chromosome-scale genetic comparisons, thereby identifying conserved genetic areas, which can facilitate identification and documentation of similar genomic sequences in related plant species (Haas et al. 2004; De Bodt et al. 2005).

  • “Microarray technology” has been widely adopted in gene expression analysis in crop plants to clarify the function of key genes and uncover the regulation mechanism through dissecting regulatory elements and the interaction of responsible genes. These gene expression studies allow us to understand how plants respond to and interact with the physical environment and management practices. These data may become a crucial tool of future breeding decision management systems (Langridge and Fleury 2011).

  • “Full-length cDNA libraries” serve as primary sequence resources for designing microarray probes and as clone resources for genetic engineering to improve crop efficiency (Futamura et al. 2008). These libraries have been used to identify biological features through comparisons of target sequences with those of model organisms.

  • “Multiple alignments” provide a method for estimating the number of genes in gene families and for identifying previously undescribed genes. Multiple alignment information also helps in studying gene expression patterns in plants.

  • “Mutant analysis” is an effective approach for the investigation of gene function (Stanford et al. 2001). Comprehensive collections of mutant lines are also essential bioresources for radically accelerating forward and reverse genetics.

  • “Gene pyramiding”, or gene stacking, means that multiple desirable genes from different parent lines are assembled into a single genotype to enhance traits and develop elite lines and varieties. It is mainly used to improve existing elite cultivars for a few unsatisfactory traits for which genes with large positive effects have been identified (Malav et al. 2016).

  • “Molecular DNA marker” identification and location have contributed significantly to marker-assisted studies and selection (MAS) in plant breeding, and in a wider range of research, including species identification and evolution (Feltus et al. 2004; Varshney et al. 2005).

  • “Genetic markers” constructed to cover the complete genome may allow identification of individual genes associated with complex traits by QTL (quantitative trait loci) analysis and the identification of genetic diversity and induced variations (Feltus et al. 2004; Varshney et al. 2005).

  • In silico genomics technology has made it easier for researchers working on plant-pathogen interactions to identify defence and disease-resistance genes and enzymes, together with their promoter regions and transcription factors, which help to enhance immunity and defence mechanisms (Pandey and Somssich 2009).

  • Bioinformatics has also enabled scientists to improve the nutritional quality of plants by modifying their genomes. Researchers have successfully inserted genes into the rice genome to raise provitamin A (β-carotene) levels in the grain. Vitamin A is essential for healthy eyes, and this genetically modified rice was developed to help reduce blindness caused by vitamin A deficiency (Ye et al. 2000).

  • Bioinformatics tools are also indispensable to agriculture and horticulture from the climate change perspective. Some varieties of cereal have been modified to be drought/submergence resistant and enhanced to grow in infertile soils.

  • The study of “host-pathogen interactions” helps in understanding disease genetics and the pathogenicity factors of a pathogen, which ultimately helps in designing the best management options. Metagenomics and transcriptomics approaches are used to understand the genetic architecture of microorganisms and pathogens and how these microbes affect the host plant, so that pathogen- and insect-resistant crops can be generated and microbes beneficial to the host can be identified (Schenk et al. 2012).

  • “Insect genomics” helps in identifying resistance mechanisms and finding novel target sites (Cory and Hoover 2006). By mapping the genome of Bacillus thuringiensis (a soil bacterium that produces proteins toxic to insect pests), scientists were able to incorporate its insecticidal genes into plants (e.g. cotton, maize and potato), making them insect resistant. This has reduced insecticide usage and enhanced the productivity and nutritional value of crops.

  • Improving crops through breeding is a sustainable approach to increasing yield and yield stability without intensifying the use of fertilizers and pesticides. Third-generation sequencing technologies are helping to overcome challenges in plant genome assembly caused by polyploidy and frequent repetitive elements. As a result, high-quality crop reference genomes are increasingly available, benefitting downstream analyses such as variant calling and association mapping that identify breeding targets in the genome (Hu et al. 2018).

  • Machine learning also helps identify genomic regions of agronomic value by facilitating functional annotation of genomes and enabling real-time high-throughput phenotyping of agronomic traits in the glasshouse and in the field.

  • Crop databases integrate the growing volume of genotype and phenotype data, providing a valuable resource for breeders and an opportunity for data mining approaches to uncover novel trait-associated candidate genes (Hu et al. 2018).
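The “gene finding” item in the list above can be illustrated with a deliberately simplified sketch. Real plant gene finders model introns, exons and splice sites statistically; the example below only scans the forward strand for open reading frames (an ATG followed by an in-frame stop codon). The function name `find_orfs`, the `min_codons` parameter and the toy sequence are assumptions made for illustration and do not come from any tool cited in this chapter.

```python
# Simplified stand-in for gene finding: forward-strand ORF scan only.
# No intron/exon or splice-site modelling is attempted here.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the next in-frame stop codon.
    `end` is exclusive and includes the stop codon; `min_codons`
    counts the codons from the ATG up to (not including) the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Hypothetical toy sequence, not real plant DNA.
toy = "CCATGAAACCCGGGTAGTTATGCATTGA"
for begin, end in find_orfs(toy, min_codons=2):
    print(begin, end, toy[begin:end])
```

Lowering `min_codons` reports shorter candidate ORFs; in practice a length threshold of this kind is only one of many signals that a dedicated gene-prediction program would combine.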

2.3 Crop Breeding: Bioinformatics and Preparing for Climate Change

Plant breeding has been practised for thousands of years, since near the beginning of human civilization (Kingsbury 2009). It has been defined as “the art and science of changing the genetic structure of plants in order to produce desired characteristics” (Sleper and Poehlman 2006). Bioinformatics has been applied in many areas of science, including plant breeding, and a large portion of its tools and techniques fall into the omics category (Barh et al. 2013). Over the past few years, plant breeding has been extended through the development and deployment of a large number of methods and tools tailored to specific objectives (Al-Khayri et al. 2015). The use of omics has improved the consistency and predictability of plant breeding programmes, reducing the time and expense of developing stress-tolerant varieties (Van Emon 2016). Genomics and its application to plant breeding are developing very quickly; this boom began with the genome sequencing of Arabidopsis and rice (Kaul et al. 2000; Matsumoto et al. 2005), followed by genome sequencing projects for many other plant species (Skuse and Du 2008). The combination of conventional breeding techniques with genomic tools and approaches is leading to a new genomics-based plant breeding (Fig. 2.3). A fully assembled and well-annotated genome allows breeders to discover genes related to agronomic traits, determine their location and function, and develop genome-wide molecular markers (Hu et al. 2018).

Fig. 2.3 Reaping benefits of omics in crop breeding. Discovery of the genes and the genetic architecture underlying critical traits by different omics provides insights for crop improvement. Identification of genes and quantitative trait loci (QTL) and genome-wide association study (GWAS) enhance rice yield, quality and stress tolerance in a wide range of environments. Genetic maps help to locate genes and provide molecular/genetic markers for selection. Gene discovery provides knowledge of genetic mechanisms and interactions. Databases provide (barcoded) sample tracking and breeding history (pedigrees). Phenotypes (trait measurements) are stored with experimental design and environmental data and can be connected to individuals and genotypes (markers). All these constitute a toolbox for plant breeders

One of the most substantial transformations that bioinformatics has brought to plant breeding is the replacement of conventional molecular marker technology with high-throughput DNA sequencing technologies, together with the development of a number of databases (Mochida and Shinozaki 2010) (Table 2.1). These and other technical revolutions provide genome-wide molecular tools for breeders (large collections of markers, high-throughput genotyping strategies, high-density genetic maps, etc.) that can be incorporated into existing breeding methods (Tester and Langridge 2010; Lorenz et al. 2011). With the progress of genome sequencing and large-scale EST analysis in various species, these datasets have become efficient sequence resources for designing molecular markers that cover entire genomes (Feltus et al. 2004). Recent advances in genomics are producing new plant breeding methodologies that improve and accelerate the breeding process in many ways (e.g. association mapping, marker-assisted selection, “breeding by design”, gene pyramiding, genomic selection, etc.) (Lorenz et al. 2011). Some of the molecular and genetic marker resources that have played a significant role in improving plant breeding are as follows:

  1. Crop breeders have known about the complexity of multiple alleles for decades. With the advent of molecular markers, however, it has become possible to assess genetic diversity and other forms of genetic structure in breeding populations. A number of high-throughput genotyping platforms have been developed and applied to genetic map construction, marker-assisted selection and QTL cloning using multiple segregating populations (Hori et al. 2007). Such genotyping systems have also been used in post-genome sequencing projects, such as genotyping genetic resources and accessions to evaluate population structure, and association studies to identify genetic loci involved in phenotypic changes of species. The most important web-based resources for DNA markers are listed in Table 2.1.

  2. Molecular DNA markers have contributed significantly to marker-assisted studies and selection (MAS) in plant breeding and in a wider range of research, including species identification and evolution (Feltus et al. 2004).

  3. Genetic markers designed to cover a genome extensively allow not only identification of individual genes associated with complex traits by QTL analysis but also the exploration of genetic diversity with regard to natural variations (Feltus et al. 2004; Varshney et al. 2005).

  4. A number of attempts have been made to design polymorphic markers from accumulated sequence datasets for various species; e.g. genome-wide rice (Oryza sativa) DNA polymorphism datasets have been constructed based on alignment between japonica and indica rice genomes (Han and Xue 2003; Shen et al. 2004).

  5. Expressed sequence tag (EST) databases, among the most important resources, consist of ESTs drawn from multiple cDNA libraries. Large-scale EST datasets are also important resources for the discovery of sequence polymorphisms, especially for allocating expressed genes onto a genetic map (Heesacker et al. 2008).

  6. The Illumina GoldenGate Assay allows the simultaneous analysis of up to 1536 SNPs (single nucleotide polymorphisms) in 96 samples and has been used to analyse genotypes of segregating populations in order to construct genetic maps allocating SNP markers in crops (Hyten et al. 2008).

  7. Diversity Arrays Technology (DArT) is a high-throughput genotyping system that was developed based on a microarray platform (Wenzl et al. 2007). DArT markers have been used together with conventional molecular markers to construct denser genetic maps and/or to perform association studies in various crop species.

  8. Affymetrix GeneChip arrays have been used to discover nucleotide polymorphisms as single-feature polymorphisms based on the differential hybridization of GeneChip probes in barley and wheat (Rostoks et al. 2005; Bernardo et al. 2009).

  9. Transcriptomics, a subcategory of omics, attracts a large number of biologists, especially in the plant breeding area (Hakeem et al. 2016).

  10. The most powerful application of third-generation sequencing for breeding is the assembly of improved contiguous crop genomes (Hu et al. 2018).
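The marker design described in the list above, where polymorphisms are derived from an alignment between two genomes, can be sketched in a few lines. This is a minimal illustration that assumes the two sequences are already aligned with `-` as the gap character; the function name `call_snps` and the short sequences are hypothetical and are not taken from the rice datasets cited above.

```python
def call_snps(ref_aln, alt_aln):
    """List (ref_position, ref_base, alt_base) for mismatched columns.

    Both arguments are rows of one pairwise alignment ('-' marks a gap).
    Positions are 1-based coordinates on the ungapped reference row.
    Columns containing a gap are indels, not SNPs, and are skipped.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, a in zip(ref_aln.upper(), alt_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only when the reference has a base
        if r == "-" or a == "-":
            continue  # indel column
        if r != a:
            snps.append((ref_pos, r, a))
    return snps


# Hypothetical aligned fragments standing in for two cultivar genomes.
reference_row = "ACGTTGCA-GGTAC"
alternate_row = "ACATTGCATGG-AT"
print(call_snps(reference_row, alternate_row))
```

Each reported position is a candidate SNP marker; a real pipeline would additionally filter on alignment quality and flanking-sequence uniqueness before designing assays. For a sense of scale, a genotyping run at the capacity quoted above (1536 SNPs in 96 samples) produces 1536 × 96 = 147,456 genotype calls.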

Table 2.1 GWAS acceleration tools and molecular marker database

Further, as the resolution of genetic maps in the major crops increases, and as the molecular basis of specific traits or physiological responses becomes better elucidated, it will be increasingly possible to associate candidate genes discovered in model species with corresponding loci in crop plants (Fig. 2.3). Appropriate relational databases will make it possible to freely associate across genomes with respect to gene sequence, putative function and genetic map position. Once such tools have been implemented, the distinction between breeding and molecular genetics will fade away. Breeders will routinely use computer models as a toolbox to formulate predictive hypotheses, create phenotypes of interest from complex allele combinations (Fig. 2.3) and then construct those combinations by scoring large populations for very large numbers of genetic markers (Walsh 2001).

2.3.1 Informative Bioinformatics Databases/Tools for Crop Breeders

Crop breeding has long relied on cycles of phenotypic selection and crossing, which generate superior genotypes through genetic recombination. When genome sequences are available, all genes and genetic variants contributing to agronomic traits can be identified, and changes made during breeding processes can be assessed at the genotype level (Hu et al. 2018). Availability of ready-to-go genomic data for breeders today plays an increasingly important role in all aspects of crop breeding, such as quantitative trait loci (QTL) mapping and genome-wide association studies (GWAS), where genomic sequencing of crop populations can allow gene-level resolution of agronomic variation. The progress made in genomics-based breeding has even assisted in identification of genetic variation in crop species, which can be applied to produce climate-resilient varieties (Mousavi-Derazmahalleh et al. 2018; Dwivedi et al. 2017).

The genome-wide association study (GWAS), used alongside comparative genomic analysis, phylogenomics and evolutionary analysis, is presently a favoured tool for exploring allelic variation across extensive phenotypic diversity and for mapping QTL at higher resolution. GWAS offers an alternative that overcomes the disadvantages of classical crop breeding methods, e.g. biparental cross-mapping for genetic dissection of agronomically important traits (Myles et al. 2009). It has a powerful application in plant breeding for identifying trait-associated loci underlying phenotypic diversity, as well as allelic variation in candidate genes for quantitative and complex traits (Kumar et al. 2013). GWAS has been successfully applied to Arabidopsis thaliana, in which more than 1300 distinct accessions have been genotyped for 250,000 SNPs (Kozlov et al. 2015). In rice, a few genes with large effects on yield, morphology, stress tolerance and nutritional quality have also been identified (Famoso et al. 2011). GWAS has been widely used to dissect complex traits in other major crops as well, e.g. maize and soya bean (Li et al. 2013; Hwang et al. 2014). Several bioinformatics approaches have been introduced as GWAS acceleration tools (Table 2.1).
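The core idea of a GWAS scan can be sketched as a single-marker test repeated across the genome: the phenotype is regressed on the allele dosage (0, 1 or 2 copies) at each SNP, and markers are ranked by the strength of association. The sketch below is an assumption-laden simplification, not the method of any study cited here; the names `marker_association` and `scan` and all data values are hypothetical, and corrections for population structure and kinship that real analyses require are omitted.

```python
from math import sqrt


def marker_association(dosages, phenotypes):
    """Regress phenotype on allele dosage; return (slope, t_statistic).

    Ordinary least squares with one predictor. Population structure and
    kinship covariates, standard in real GWAS, are deliberately left out.
    """
    n = len(dosages)
    if n < 3 or n != len(phenotypes):
        raise ValueError("need >= 3 individuals with matching phenotypes")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(dosages, phenotypes))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / se if se > 0 else float("inf")
    return slope, t_stat


def scan(genotypes, phenotypes):
    """Rank markers by |t|; `genotypes` maps marker name -> dosage list."""
    results = {m: marker_association(d, phenotypes)
               for m, d in genotypes.items()}
    return sorted(results.items(), key=lambda kv: abs(kv[1][1]), reverse=True)


# Hypothetical data: six accessions, two SNPs, one quantitative trait.
trait = [10.1, 11.9, 14.2, 10.0, 12.1, 13.8]
snps = {"snp_a": [0, 1, 2, 0, 1, 2], "snp_b": [0, 2, 1, 1, 0, 2]}
for name, (slope, t_stat) in scan(snps, trait):
    print(name, round(slope, 3), round(t_stat, 2))
```

With hundreds of thousands of markers, the same test is simply repeated per SNP, which is why multiple-testing correction and efficient implementations become central in practice.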

Advances in genomics offer the potential to accelerate the genomics-based breeding of crop plants (Fig. 2.3). However, relating genomic data to climate-related agronomic traits for use in breeding remains a huge challenge and one which will require coordination of diverse skills and expertise. Bioinformatics, when combined with genomics, has the potential to help maintain food security in the face of climate change through the accelerated production of climate-ready crops (Batley and Edwards 2016). The vast breeding knowledge gathered over the last several decades will become directly linked to basic plant biology and enhance the ability to elucidate gene function in model organisms (Hospital et al. 2002). The expected dramatic improvements in phenotypes of commercial interest include both the improvement of factors that traditionally limit agronomic performance (input traits) and the alteration of the amount and kinds of materials that crops produce (output traits). Examples include:

  • Abiotic stress tolerance

  • Biotic stress tolerance

  • Improving nutrient use efficiency

  • Manipulation of plant architecture and development (size, organ shape, number and position, the timing of development and senescence)

  • Metabolite partitioning (redirecting of carbon flow among existing pathways or shunting into new pathways)

Appropriate relational databases will make it possible to freely associate across genomes with respect to the gene sequence, putative function or genetic map position. Once such tools have been implemented, the distinction between breeding and molecular genetics will fade away. Breeders will routinely use computer models to formulate predictive hypotheses to create phenotypes of interest from complex allele combinations and then construct those combinations by scoring large populations for very large numbers of genetic markers (Walsh 2001; Deckers and Hospital 2002).

2.4 Application of Bioinformatics in Fruit Breeding

During the last three decades, the world has witnessed a rapid increase in knowledge of plant genome sequences and of the physiological and molecular roles of various plant genes, which has revolutionized molecular genetics and its efficiency in plant breeding programmes. Since bioinformatics has applications in every field of science, genome programmes can now be envisioned as a highly important tool for fruit breeding. Identifying key genes and understanding their function will result in a “quantum leap” in improving fruit quality and quantity (Meyer and Mewes 2002). The revolution in life sciences brought on by genomics dramatically increases the scale and scope of our experimental enquiry and applications in fruit plant breeding. The scale and high-resolution power of genomics make possible a broad and detailed genetic understanding of plant performance at multiple levels of aggregation (Meyer and Mewes 2002).

The primary goal of fruit plant genomics is to understand the genetic and molecular basis of all biological processes in plants that are relevant to the species. This understanding is fundamental to the efficient exploitation of fruit plants as biological resources in the development of new cultivars of improved quality and reduced economic and environmental costs (Fig. 2.1). This knowledge is also vital for the development of new diagnostic tools and for improving traits of primary interest such as pathogen resistance, abiotic stress tolerance, fruit quality and yield. Moreover, gene expression analysis will allow us to understand how fruit plants respond to and interact with the physical environment and management practices. This information, in conjunction with appropriate technology, may provide predictive measures of plant health and fruit quality and become part of future breeding decision management systems. Current genome programmes generate a large amount of data that will require processing, storage and distribution to the international research community. The data include not only sequence information but also information on mutations, markers, maps and functional discoveries.

The key objectives for fruit plant bioinformatics include:

  • Integrating phenotypic, genomic and bioinformatics tools and resources into public and private breeding pipelines to help deliver breeding targets.

  • Providing rational annotation of genes, proteins and phenotypes.

  • Elaborating relationships both within the data on individual fruits and between fruits and other organisms.

2.5 Future Prospects

The bioinformatics era and high-throughput sequencing (HTS) are revolutionizing experimental design in molecular biology, contributing strikingly to scientific knowledge while affecting relevant applications in many different aspects of agriculture. Bioinformatics plays a significant role in the development of the agricultural sector, agro-based industries, agricultural by-product utilization and better management of the environment. As sequencing projects multiply, bioinformatics continues to drive considerable progress in biology by giving scientists access to genomic information, and it plays a major role in analysing the data properly. The recent wealth of plant genomic resources, along with advances in bioinformatics, has enabled plant researchers to achieve a fundamental and systematic understanding of economically important plants and plant processes, which is critical for advancing crop improvement. The scale and high-resolution power of genomics make it possible to achieve both a broad and a detailed genetic understanding of plant performance at multiple levels of aggregation. Advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the “super domestication” of crops and the genetic dissection of, and breeding for, complex traits. The ability to represent high-resolution physical and genetic maps of crops has been one of the most important contributions of bioinformatics. Plant scientists have an opportunity to use these resources to the full, to ensure that bench work, both now and in the future, can be combined with bioinformatics to fully reap the rewards of the genomics revolution. By applying novel technologies and methods in concert, future plant breeding can achieve the rate of crop improvement required to ensure food security.

Despite these exciting achievements, there remains a critical need for effective tools and methodologies to advance plant biotechnology, to tackle questions that are difficult to solve using current approaches and to facilitate the translation of newly discovered knowledge into improved plant productivity. Overall, given the great impact of plant breeding on world food security, through improving current staple food crops and overcoming the harsh environmental conditions resulting from climate change, it is necessary to assess the role and achievements of bioinformatics in the breeding science of crop plants.