Bioinformatics in Plant Pathology

Khan, Aamir; Singh, Sakshi; Singh, Vinay Kumar

doi:10.1007/978-981-15-6275-4_32

Abstract

The unprecedented success of next-generation sequencing and the availability of enormous amounts of host and pathogen sequence data in the public domain give us the opportunity to understand disease systems at the biological level. Host and pathogen genome data held in popular repositories make it possible to retrieve, annotate and analyze sequences and to identify functional elements for characterization at the gene and genome levels and for application development. The primary goal of bioinformatics is to enhance the understanding of biological processes using sequence pattern recognition, biological data mining, machine learning algorithms for biological datasets and visualization of biological data and molecules. Significant research efforts in the field include the development of databases, software and tools, genome analysis, anthropology, forensic genetics, sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, gene expression analysis, microarray data analysis, protein–protein interactions and genome-wide association studies. Paulien Hogeweg and Ben Hesper coined the term bioinformatics in 1970 to refer to the study of information processes in biotic systems. Margaret Oakley Dayhoff, often called the mother and father of bioinformatics, compiled one of the first protein sequence databases. Elvin A. Kabat pioneered biological sequence analysis in 1970. Bioinformatics tools, techniques and databases can be used to identify candidate genes and target proteins involved in host–pathogen interactions, to support drug design and discovery, and to harvest biological information from plant genomes and their genes. Bioinformatics applications can therefore be very beneficial for crop improvement and for the development of designer crops.

32.1 Introduction

Designer crops of high quality and yield, and disease-free crops, are in demand today. Crop improvement and protection are therefore the first priority. Computational biology approaches applied to sequenced plant genomes play a very important role here, helping to maximize yield, to improve the quality of fruit and grain production and to develop disease-resistant crop varieties (Chen and Chen 2008; King 2004; Mochida and Shinozaki 2010; Batley and Edwards 2016; Moody 2004). The development of sequence markers based on the identification of single nucleotide polymorphisms and simple sequence repeats has now become a feasible method for crop improvement. Many techniques, databases, tools and software packages have been developed to understand and analyze biological systems. This chapter describes standard bioinformatics techniques together with specific tools and software.

32.2 Bioinformatics Techniques

32.2.1 Comparative Analysis

Comparative analysis is a field of biological sequence analysis in which the genomic features of different organisms are compared. These features may include the DNA sequence, regulatory sequences, genes and gene order. The major principle of comparative analysis is that features shared by homologous sequences are often encoded within DNA that is evolutionarily conserved between them, whereas regions that differ are involved in diversity (Hardison 2003; Ong et al. 2016; Gebhardt et al. 2005; Sayers et al. 2019) (Fig. 32.1).

32.2.2 Sequence Analysis

Sequence analysis is the process of subjecting a DNA, RNA or protein sequence, typically that of a homologous (orthologous or paralogous) gene, to analytical methods in order to understand its evolution, function, structure or features. It relies on sequence alignment and on searches against biological sequence databases such as reference gene and protein collections, UniProtKB/Swiss-Prot and the Protein Data Bank. Sequence analysis includes: comparison of the common regions of homologous sequences to find similarities and dissimilarities; identification of intrinsic features of a sequence, such as active sites, post-translational modification sites, gene structures, reading frames, the distribution of introns and exons, and regulatory elements; identification of sequence differences and variations, such as point mutations, single nucleotide variants (SNVs) and single nucleotide polymorphisms (SNPs), for use as genetic markers; investigation of the evolution and genetic diversity of sequences and organisms; and identification of molecular structure from sequence alone. The Basic Local Alignment Search Tool (BLAST) is among the most widely used tools for revealing the evolution and genetic diversity of sequences and organisms and for relating a sequence to known molecular structures (Aljanabi 2001; Bolger et al. 2018; Martinez 2013; Demuth and Hahn 2009; Lyons and Freeling 2008; Altschul et al. 1990; McClure et al. 1994; Pirovano and Heringa 2008; Bawono et al. 2017) (Fig. 32.2).
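As an illustration of how sequence differences such as SNVs can be located, the following minimal Python sketch compares two pre-aligned sequences column by column. The function name find_variant_sites and the example sequences are hypothetical and do not come from any tool cited in this chapter; real variant calling also has to account for sequencing error and read depth.

```python
def find_variant_sites(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Both inputs are assumed to be pre-aligned and of equal length. Columns
    with a gap character '-' in either sequence are skipped, not reported.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if "-" in (ref_base, alt_base):
            continue  # gap column: an indel, not a single nucleotide variant
        if ref_base != alt_base:
            sites.append((position, ref_base, alt_base))
    return sites


# Hypothetical aligned sequences differing at columns 6 and 9.
reference = "ATGGCTAGCTAA"
sample = "ATGGCAAGTTAA"
variants = find_variant_sites(reference, sample)
print(variants)
```

Each reported site is a candidate marker only; whether it is a true polymorphism depends on evidence from many individuals.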

32.2.3 Gene Identification

Gene hunting, gene finding or gene prediction refers to the process of identifying the regions of genomic DNA that encode genes. Gene identification is one of the first and most important steps in understanding the genes and genome of an organism once its sequence has been determined and released into the public domain. Gene finding is a key step in genome annotation: it follows genome sequence assembly and involves distinguishing coding (exonic) regions from non-coding (intronic) regions (Alioto 2012; Wang et al. 2004; Mochida and Shinozaki 2010) (Fig. 32.3).
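The simplest form of gene finding, scanning for open reading frames (ORFs), can be sketched as follows. This is only an illustrative sketch under strong assumptions: it searches the forward strand only, ignores introns, and treats every ATG as a possible start. The function find_orfs and its min_codons parameter are hypothetical names; the gene predictors used in practice rely on statistical models of exon and intron structure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is 0-based and end is exclusive, so dna[start:end] is the ORF
    including its stop codon. min_codons counts the start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs


# Hypothetical toy sequence containing one short ORF in frame 2.
sequence = "CCATGGCTTGATAG"
orfs = find_orfs(sequence)
print([(s, e, f, sequence[s:e]) for s, e, f in orfs])
```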

32.2.4 Phylogenetic Analysis

Phylogenetic analysis is the study of the evolutionary relationships among groups of organisms (e.g. species or populations) or among their homologous genes. These relationships are inferred from sequence or morphological data using phylogenetic inference methods: distance-matrix methods (Neighbor-Joining (NJ), UPGMA (Unweighted Pair Group Method with Arithmetic mean), WPGMA (Weighted Pair Group Method with Arithmetic mean), the Fitch–Margoliash method, the use of outgroups, etc.); maximum parsimony (branch and bound, the Sankoff-Morel-Cedergren algorithm, MALIGN and POY); maximum likelihood; and Bayesian inference. A phylogenetic tree is a branching diagram that represents the evolutionary relationships among selected organisms or species. The phylogeny is inferred from similarities and differences in their genetic or physical characteristics. Phylogenetic analyses have become central to understanding genomes, diversity, evolution and ecology (Thompson et al. 1994, 2002).
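To make the distance-matrix approach concrete, the sketch below computes p-distances (the proportion of differing sites) between aligned sequences and clusters them with UPGMA. It is a minimal illustration, not the implementation used by any package cited here: it returns only the tree topology as nested tuples, omits branch lengths, and assumes that taxon names do not begin with "#", which is reserved for internal nodes. The function names and the four toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def upgma(aligned):
    """Cluster aligned sequences by UPGMA and return a nested-tuple topology."""
    # Each cluster label maps to (subtree, number of leaves in the subtree).
    clusters = {name: (name, 1) for name in aligned}
    distances = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    merged_count = 0
    while len(clusters) > 1:
        closest = min(distances, key=distances.get)
        first, second = sorted(closest)
        del distances[closest]
        tree_a, size_a = clusters.pop(first)
        tree_b, size_b = clusters.pop(second)
        new_label = f"#{merged_count}"
        merged_count += 1
        for other in clusters:
            d_a = distances.pop(frozenset((first, other)))
            d_b = distances.pop(frozenset((second, other)))
            # Size-weighted mean, i.e. the average over all leaf pairs.
            average = (d_a * size_a + d_b * size_b) / (size_a + size_b)
            distances[frozenset((new_label, other))] = average
        clusters[new_label] = ((tree_a, tree_b), size_a + size_b)
    (tree, _size), = clusters.values()
    return tree


# Hypothetical aligned sequences for four taxa.
toy_alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTT",
}
tree = upgma(toy_alignment)
print(tree)
```

UPGMA assumes a constant rate of evolution across lineages; Neighbor-Joining relaxes that assumption and is usually preferred when rates differ.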

32.2.5 Protein–Protein Interaction

Protein–protein interactions (PPIs) are highly specific physical contacts between two or more protein molecules that result from biochemical events driven by the hydrophobic effect and electrostatic forces. The STRING database distinguishes known interactions (taken from curated databases or experimentally determined), predicted interactions (based on gene neighbourhood, gene fusions or gene co-occurrence) and other interactions (based on text mining, co-expression or protein homology) (De Las Rivas and Fontanillo 2010; Kozakov et al. 2017; Szklarczyk et al. 2019) (Figs. 32.4 and 32.5).

32.2.6 Microarray Data Analysis

NCBI developed the Gene Expression Omnibus (GEO) database in 2000 for high-throughput gene expression data. Microarray data analysis is used to infer information from the data generated by DNA, RNA and protein microarray experiments; this information allows researchers to investigate the expression levels of a very large number of genes, up to the entire genome of an organism, in a single experiment. GEO is a public database that accepts MIAME (Minimum Information About a Microarray Experiment) compliant data submissions, and both sequence-based and array-based data are accepted by the repository. Tools are available to help researchers query and download experimental datasets and gene expression profiles. GEO is a repository of freely available microarray data, next-generation sequencing data and other high-throughput functional genomics data submitted by the scientific community (Clough and Barrett 2016). Because the data generated by these experiments are complex, they are analyzed by bioinformaticians and bioscientists using specialized software. GEO has developed many tools for data query, analysis and visualization that can be used directly on the GEO server (Fig. 32.6).
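A basic quantity in expression analysis is the log2 fold change of a gene between two conditions. The sketch below is a simplified illustration under stated assumptions: the expression values are already normalized, a pseudocount of 1 is added to avoid division by zero, and no statistical test is applied. The gene names and values are invented for the example and are not taken from GEO.

```python
import math
from statistics import mean


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 of the ratio of mean expression (treated over control)."""
    numerator = mean(treated) + pseudocount
    denominator = mean(control) + pseudocount
    return math.log2(numerator / denominator)


# Hypothetical normalized expression values: gene -> (control, infected).
expression = {
    "geneA": ([6, 7, 8], [30, 31, 32]),
    "geneB": ([14, 15, 16], [2, 3, 4]),
}
fold_changes = {
    gene: log2_fold_change(control, infected)
    for gene, (control, infected) in expression.items()
}
for gene, value in fold_changes.items():
    direction = "up" if value > 0 else "down"
    print(f"{gene}: log2FC = {value:+.2f} ({direction}-regulated)")
```

In a real study the fold change would be accompanied by a significance test with correction for multiple testing.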

32.2.7 Structure Prediction and Refinement

Protein structure prediction is the construction of the three-dimensional (3D) structure of a protein from its amino acid sequence. In other words, the fold and the secondary and tertiary structure of the protein are predicted from its primary sequence. Structure prediction is highly important in drug design and in the design of novel enzymes (Krieger et al. 2003; Xiang 2006; França 2015; Cavasotto and Phatak 2009; Xu et al. 2000).

32.2.8 Molecular Docking Calculation

Molecular docking models the interaction of two or more molecules to give a stable complex. Based on the binding properties of the ligand and the target, it generates a three-dimensional structure of the complex. Docking is an approach for predicting the preferred orientation of one molecule relative to a second when the two are bound to each other in a stable complex. Knowledge of this preferred orientation may in turn be used to predict the binding strength, or binding affinity, between receptor and ligand using scoring functions. Molecular docking is a prominent method in structure-based drug design because it can predict the binding conformation of small-molecule ligands in the binding site of the target receptor. Characterization of binding behaviour plays an important role in the rational design of novel pesticides, herbicides, insecticides and fungicides (Ferreira et al. 2015; Guedes et al. 2014; Morris and Lim-Wilby 2008; Meng et al. 2011; de Ruyck et al. 2016; Pagadala et al. 2017; Zhao and Caflisch 2015; Kroemer 2007; Sousa et al. 2006; Jones and Willett 1995; Lybrand 1995; Goodsell et al. 1996; Gschwend et al. 1996; Trosset and Cavé 2019).

32.3 Bioinformatics Databases

Biological Data Model

A biological data model is a library of life sciences information and biological databases; it comprises a collection of computational analysis tools, literature and high-throughput experimental data. Biological databases contain information from research areas including genomics, phylogenetics, proteomics, metabolomics, microarray gene expression and phenomics. The information they hold includes gene structure and function, macromolecular structure, cellular and chromosomal localization, and SNPs and mutations in sequences and structures (Wheeler et al. 2005; Galperin and Fernández-Suárez 2012). NCBI is one such data model, and it includes the popular search engine Entrez. Entrez is NCBI’s primary text search and retrieval system; it integrates the PubMed and PMC databases of biomedical literature with many molecular databases covering genomes, genes, DNA, genetic variation, gene expression, and protein sequence and structure.
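Entrez can also be queried programmatically through the NCBI E-utilities. The sketch below only constructs an ESearch request URL and does not contact the server. The helper build_esearch_url and the example query are illustrative assumptions; NCBI's usage guidelines on request rates and API keys apply before such a URL is used in practice.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, max_records=20):
    """Build an ESearch URL for a text query against one Entrez database."""
    query = urlencode({
        "db": database,         # Entrez database name, e.g. "nucleotide"
        "term": term,           # Entrez query string
        "retmax": max_records,  # maximum number of identifiers to return
        "retmode": "json",      # ask for a JSON response
    })
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Hypothetical query: nucleotide records for the rice blast fungus.
url = build_esearch_url("nucleotide", "Magnaporthe oryzae[Organism]", 5)
print(url)
```

The identifiers returned by such a search can then be passed to other E-utilities, such as EFetch, to retrieve the records themselves.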

32.3.1 NCBI

NCBI stands for the National Center for Biotechnology Information. It is part of the National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH), and is located in Bethesda, Maryland. The NCBI was established in 1988 through legislation sponsored by Senator Claude Pepper. NCBI resources cover chemicals and bioassays, data and software, DNA and RNA sequences, domains and structures, genes and expression, genetics and medicine, genomes and maps, homology, literature, protein sequence and structure, sequence analysis, taxonomy, training and tutorials, and variation (NCBI Resource Coordinators 2016; Wheeler et al. 2005) (Figs. 32.7, 32.8, and 32.9).

32.3.2 DDBJ

DDBJ (DNA Data Bank of Japan), founded in 1986, is a biological databank that mainly contains DNA sequence information. DDBJ is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture, Japan. It is a member of the INSDC (International Nucleotide Sequence Database Collaboration), a joint effort with GenBank (USA) and the European Nucleotide Archive (UK) to collect and share DNA and RNA sequence data. The DDBJ Sequence Read Archive (DRA), the NCBI Sequence Read Archive (SRA) and the EBI Sequence Read Archive (ERA) share new and updated nucleotide sequence data, and the three databases (DDBJ, NCBI and EMBL) are synchronized on a daily basis through continuous interaction between the staff of the collaborating organizations (Kodama et al. 2012) (Fig. 32.10).

32.3.3 EMBL

The European Molecular Biology Laboratory (EMBL) is a molecular biology research institution supported by 25 member states. EMBL was founded in 1974 and is funded by public money from its member states; its research is conducted by approximately 85 independent groups. Web-based sequence submission systems include WebIn at EMBL-EBI, Sakura (“cherry blossoms”) at DDBJ and BankIt at the NCBI (Madeira et al. 2019) (Fig. 32.11).

32.3.4 Ensembl Plants

Ensembl Plants is an integrative database containing genome-scale information on plants. It includes genome sequences, gene models, polymorphic loci and functional annotation, together with various tools for the analysis of sequence data. It also contains additional information such as variation data, individual genotype data, linkage, population structure and phenotype data (Bolser et al. 2016, 2017) (Fig. 32.12).

32.3.5 PlantGDB

PlantGDB is a resource for comparative genomics and a database of molecular sequence data for plant genomes. PlantGDB contains PlantGDB-assembled unique transcripts (PUTs), genome survey sequence (GSS) assemblies, genome browsers and workflow management tools (Dong et al. 2004; Duvick et al. 2008) (Fig. 32.13).

32.3.6 Phytozome

Phytozome is a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of genome organization, gene families, gene structure and the evolutionary history of genes at the sequence level. It also provides access to the sequences and functional annotations of plant genomes and genes (Goodstein et al. 2012) (Fig. 32.14).

32.3.7 UniProt

UniProt is a freely accessible database of protein sequence and functional annotation information, with many entries derived from genome sequencing projects. It contains a large amount of information on the biological function of proteins derived from the research literature. The main aim of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information (UniProt Consortium 2018) (Fig. 32.15).

32.3.8 PDB

PDB (Protein Data Bank) is a databank of three-dimensional (3D) structural data for a large number of biological molecules, such as proteins and nucleic acids. The structural data are typically obtained by X-ray crystallography, NMR spectroscopy or cryo-electron microscopy. They are submitted by structural biologists from all around the world and are freely accessible on the Internet through the websites of the member organizations: PDBe, PDBj, RCSB and BMRB. The PDB is overseen by an international organization called the Worldwide Protein Data Bank, wwPDB (Berman et al. 2000; Berman 2008; Laskowski et al. 1997) (Fig. 32.16).

32.3.9 MMDB

The Molecular Modeling Database (MMDB) is a database of experimentally determined three-dimensional biomolecular structures hosted by the National Center for Biotechnology Information (Chen et al. 2003) (Fig. 32.17).

32.3.10 GEO

GEO (Gene Expression Omnibus) is a gene expression database that archives and freely distributes microarray datasets, next-generation sequencing datasets and other high-throughput functional genomics datasets deposited by the research community. The main goals of GEO are to provide a versatile and robust database in which researchers can efficiently store high-throughput functional genomic data; to offer simple submission procedures and formats that support complete and well-annotated data deposits; and to provide user-friendly mechanisms that allow users to review, query, locate and download studies and gene expression profiles of interest (Clough and Barrett 2016) (Fig. 32.18).

32.4 Bioinformatics Tools and Software

32.4.1 BiGGEsTS

BiclusterinG Gene Expression Time Series (BiGGEsTS) is a free graphical application, based on biclustering algorithms, developed mainly for the analysis of gene expression time series data (Gonçalves et al. 2009) (Fig. 32.19).

32.4.2 HCE

HCE (Hierarchical Clustering Explorer) uses a hierarchical clustering algorithm to help researchers determine the grouping of their data, with an informative dendrogram, colour mosaic visual feedback and dynamic query controls (Seo et al. 2006) (Fig. 32.20).

32.4.3 ClustVis

ClustVis is a web tool that allows researchers to upload their data and create heat maps and PCA (Principal Component Analysis) plots. Data can be uploaded as a file or pasted into a text box (Metsalu and Vilo 2015) (Fig. 32.21).

32.4.4 BLAST

BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The BLAST programme compares nucleotide or protein sequences with sequence databases and calculates the statistical significance of the matches (Altschul et al. 1990; Mount 2007) (Fig. 32.22).
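The idea of local similarity can be illustrated with the Smith-Waterman dynamic programming algorithm, sketched below. This is not how BLAST itself works: BLAST uses a faster seed-and-extend heuristic and attaches statistical significance to its hits. The scoring values (match +2, mismatch -1, gap -2) and the example sequences are arbitrary assumptions chosen for the illustration.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (linear gap cost)."""
    previous = [0] * (len(seq_b) + 1)  # scores for the previous row
    best = 0
    for char_a in seq_a:
        current = [0]
        for j, char_b in enumerate(seq_b, start=1):
            substitution = match if char_a == char_b else mismatch
            score = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align the two residues
                previous[j] + gap,               # gap in seq_b
                current[j - 1] + gap,            # gap in seq_a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# Hypothetical sequences sharing the local region ACGT.
shared_score = smith_waterman_score("TTTACGTGGG", "CCACGTCC")
unrelated_score = smith_waterman_score("AAAA", "TTTT")
print(shared_score, unrelated_score)
```

Only two rows of the scoring matrix are kept in memory because the sketch reports the score alone; recovering the alignment itself would require the full matrix and a traceback.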

32.4.5 Clustal

Clustal Omega, ClustalW and ClustalX (the Clustal series) are widely used programmes for multiple sequence alignment (Higgins et al. 1996; Chenna et al. 2003; Sievers and Higgins 2014) (Fig. 32.23).

32.4.6 BioEdit

BioEdit is a free sequence alignment editor for editing and manipulation of sequence alignment data (Tippmann 2004) (Fig. 32.24).

32.4.7 MEGA

MEGA is a tool for manual and automatic sequence alignment, inferring phylogenetic trees, estimating rates of molecular evolution, mining web-based databases and testing evolutionary hypotheses (Kumar et al. 2018) (Fig. 32.25).

32.4.8 FigTree

FigTree is a graphical viewer for phylogenetic trees and a programme for producing publication-ready figures of them (Rambaut 2012) (Fig. 32.26).

32.4.9 Circos

Circos is a tool for visualizing, in a circular layout, the similarities and differences that arise from gene and genome comparisons (Krzywinski et al. 2009) (Fig. 32.27).

32.4.10 PROSITE

PROSITE is a protein database of protein families, functional domains and functional signature sites, described by amino acid patterns and profiles (Sigrist et al. 2002) (Fig. 32.28).
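PROSITE patterns use a compact syntax that translates readily into regular expressions. The sketch below converts the common elements of that syntax (x for any residue, square brackets for allowed residues, curly brackets for excluded residues, and parentheses for repeat counts) and scans a sequence with the N-glycosylation pattern N-{P}-[ST]-{P}. The helper prosite_to_regex and the test sequence are hypothetical, and rarer features of the syntax are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # pattern anchored at N-terminus
        at_end = element.endswith(">")      # pattern anchored at C-terminus
        element = element.strip("<>")
        repeat = ""
        if element.endswith(")"):           # e.g. x(3) or x(2,4)
            element, _, counts = element[:-1].partition("(")
            repeat = "{" + counts + "}"
        if element == "x":
            core = "."                      # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element                  # one residue or an [ABC] set
        parts.append(("^" if at_start else "") + core + repeat
                     + ("$" if at_end else ""))
    return "".join(parts)


glycosylation_regex = prosite_to_regex("N-{P}-[ST]-{P}")
protein = "MKNGSAANPTQ"  # hypothetical sequence
hits = [(m.start(), m.group())
        for m in re.finditer(glycosylation_regex, protein)]
print(glycosylation_regex, hits)
```

Note that re.finditer reports non-overlapping matches only, so overlapping sites would need a lookahead-based search.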

32.4.11 CDD

The Conserved Domain Database (CDD) is a protein database of well-annotated multiple sequence alignments, represented as position-specific score matrices (PSSMs), for the identification of conserved domains via RPS-BLAST. CDD includes NCBI-curated functional domains, which use 3D-structure information to define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models from the Pfam, SMART, COG, PRK and TIGRFAMs databases (Marchler-Bauer et al. 2017) (Fig. 32.29).

32.4.12 InterProScan

InterProScan is a server to annotate protein families and domains automatically. InterPro provides functional signature analysis of proteins by classifying them into families, domains and important sites (Mitchell et al. 2019) (Fig. 32.30).

32.4.13 EasyModeller

EasyModeller is a graphical user interface programme for homology modeling, used to predict models of protein tertiary structure (Kuntal et al. 2010) (Fig. 32.31).

32.4.14 RAMPAGE/PROCHECK

The PROCHECK server checks the stereochemical quality of a protein structure model; it produces a Ramachandran plot for analyzing the overall and residue-by-residue geometry (Laskowski et al. 2017; Lovell et al. 2003) (Figs. 32.32 and 32.33).
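A Ramachandran plot displays, for each residue, the backbone dihedral angles phi (defined by the atoms C of the previous residue, N, CA and C) and psi (defined by N, CA, C and N of the next residue). The sketch below shows how one such dihedral angle can be computed from four atomic coordinates. It is a geometric illustration only and not PROCHECK's code; the function name, the sign convention and the test coordinates are assumptions of the example.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(c / length for c in b1)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: central bond along z, outer atoms one unit away.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```

Applying the function to the backbone atoms of every residue in a model gives the (phi, psi) pairs that are plotted against the allowed regions.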

32.4.15 VERIFY3D

The VERIFY3D server determines the compatibility of an atomic model (3D) with its own amino acid sequence (1D) by assigning a structural class to each residue based on its location and environment (alpha, beta, loop, polar, non-polar, etc.) and comparing the results with those for good structures (Eisenberg et al. 1997) (Fig. 32.34).

32.4.16 YASARA

YASARA (Yet Another Scientific Artificial Reality Application) is a computer programme for molecular visualization, modeling and docking (Krieger and Vriend 2014) (Fig. 32.35).

32.4.17 BIOVIA Discovery Studio 2019

BIOVIA Discovery Studio incorporates BIOVIA Pipeline Pilot and is used for simulations, macromolecule design and analysis, antibody modeling, structure-based design, pharmacophore and ligand-based design, QSAR, ADMET and predictive toxicology, X-ray analysis and visualization (Fig. 32.36).

32.4.18 PatchDock

The PatchDock server performs protein–protein docking and generates protein-small molecule complexes (Schneidman-Duhovny et al. 2005) (Fig. 32.37).

32.4.19 Hex

Hex is an interactive graphics programme, also available as a server, for calculating and visualizing the docking modes of pairs of protein and DNA molecules. Hex can also calculate protein-ligand docking and can superpose molecules (Macindoe et al. 2010) (Fig. 32.38).

32.5 Plant and Pathogen Genomics

The five main types of pathogenic organisms that cause plant diseases are viruses, bacteria, fungi, protozoa and worms/nematodes; the diseases they cause range from damage to the death of the plant. The availability of plant and pathogen genomes gives us the opportunity to understand these biological systems and disease mechanisms (Tables 32.1, 32.2, and 32.3).

Table 32.1 List of important plant diseases and their causal organisms; most of the pathogen genomes are available in the NCBI database

Table 32.2 List of important plant pathogen genome details

Table 32.3 Plant genome sequence details

32.6 Conclusion

The application of bioinformatics to plant pathology has played a pivotal role in understanding host and pathogen evolution and the molecular interactions between host and pathogen. The availability of next-generation sequencing data, generated by high-throughput technology for candidate model organisms from all kingdoms, makes it convenient to study biological systems and to understand the sequence–structure–function correlation using in-silico biology tools, technologies and databases. Genome assembly and annotation, BioProject and BioSample submission, sequence data submission, data retrieval, data analysis, variation analysis, conserved domain analysis, gene identification, regulatory element analysis, gene expression analysis, structure prediction, visualization, analysis and classification, molecular modeling, 3D epitope identification and mapping, drug design, active site analysis and molecular docking all play an important role in determining biological function and understanding the sequence–structure–function relationship. All of these in-silico biology techniques will be further helpful in genomics-assisted crop improvement and in the development of high-yielding, high-quality designer crops.

References

Alioto T (2012) Gene prediction. Methods Mol Biol 855:175–201
Aljanabi S (2001) Genomics and plant breeding. Biotechnol Annu Rev 7:195–238
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Batley J, Edwards D (2016) The application of genomics and bioinformatics to accelerate crop improvement in a changing climate. Curr Opin Plant Biol 30:78–81
Bawono P, Dijkstra M, Pirovano W, Feenstra A, Abeln S, Heringa J (2017) Multiple sequence alignment. Methods Mol Biol 1525:167–189
Berman HM (2008) The protein data bank: a historical perspective. Acta Crystallogr A 64(Pt 1):88–95
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28(1):235–242
Bolger ME, Arsova B, Usadel B (2018) Plant genome and transcriptome annotations: from misconceptions to simple solutions. Brief Bioinform 19:437–449
Bolser D, Staines DM, Pritchard E, Kersey P (2016) Ensembl plants: integrating tools for visualizing, mining, and analyzing plant genomics data. Methods Mol Biol 1374:115–140
Bolser DM, Staines DM, Perry E, Kersey PJ (2017) Ensembl plants: integrating tools for visualizing, mining, and analyzing plant genomic data. Methods Mol Biol 1533:1–31
Cavasotto CN, Phatak SS (2009) Homology modeling in drug discovery: current trends and applications. Drug Discov Today 14:676–683
Chen YP, Chen F (2008) Using bioinformatics techniques for gene identification in drug discovery and development. Curr Drug Metab 9:567–573
Chen J, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler-Bauer A, Marchler GH, Mazumder R, Nikolskaya AN, Rao BS, Panchenko AR, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH (2003) MMDB: Entrez’s 3D-structure database. Nucleic Acids Res 31:474–477
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31:3497–3500
Clough E, Barrett T (2016) The gene expression omnibus database. Methods Mol Biol 1418:93–110
De Las Rivas J, Fontanillo C (2010) Protein-protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput Biol 6(6):e1000807
de Ruyck J, Brysbaert G, Blossey R, Lensink MF (2016) Molecular docking as a popular tool in drug design, an in silico travel. Adv Appl Bioinforma Chem 9:1–11
Demuth JP, Hahn MW (2009) The life and death of gene families. BioEssays 31:29–39
Dong Q, Schlueter SD, Brendel V (2004) PlantGDB, plant genome database and analysis tools. Nucleic Acids Res 32(Database issue):D354–D359
Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V (2008) PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res 36(Database issue):D959–D965
Eisenberg D, Lüthy R, Bowie JU (1997) VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 277:396–404
Ferreira LG, Dos Santos RN, Oliva G, Andricopulo AD (2015) Molecular docking and structure-based drug design strategies. Molecules 20:13384–13421
França TC (2015) Homology modeling: an important tool for the drug discovery. J Biomol Struct Dyn 33(8):1780–1793
Galperin MY, Fernández-Suárez XM (2012) The 2012 nucleic acids research database issue and the online molecular biology database collection. Nucleic Acids Res 40(Database issue):D1–D8
Gebhardt C, Schmidt R, Schneider K (2005) Plant genome analysis: the state of the art. Int Rev Cytol 247:223–284
Gonçalves JP, Madeira SC, Oliveira AL (2009) BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data. BMC Res Notes 2:124
Goodsell DS, Morris GM, Olson AJ (1996) Automated docking of flexible ligands: applications of AutoDock. J Mol Recognit 9:1–5
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40(Database issue):D1178–D1186
Gschwend DA, Good AC, Kuntz ID (1996) Molecular docking towards drug discovery. J Mol Recognit 9:175–186
Guedes IA, de Magalhães CS, Dardenne LE (2014) Receptor-ligand molecular docking. Biophys Rev 6:75–87
Hardison RC (2003) Comparative genomics. PLoS Biol 1:E58
Higgins DG, Thompson JD, Gibson TJ (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol 266:383–402
Jones G, Willett P (1995) Docking small-molecule ligands into active sites. Curr Opin Biotechnol 6:652–656
King GJ (2004) Bioinformatics: harvesting information for plant and crop science. Semin Cell Dev Biol 15:721–731
Kodama Y, Mashima J, Kaminuma E, Gojobori T, Ogasawara O, Takagi T, Okubo K, Nakamura Y (2012) The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments. Nucleic Acids Res 40(Database issue):D38–D42
Kozakov D, Hall DR, Xia B, Porter KA, Padhorny D, Yueh C, Beglov D, Vajda S (2017) The ClusPro web server for protein-protein docking. Nat Protoc 12:255–278
Krieger E, Vriend G (2014) YASARA view – molecular graphics for all devices – from smartphones to workstations. Bioinformatics 30:2981–2982
Krieger E, Nabuurs SB, Vriend G (2003) Homology modeling. Methods Biochem Anal 44:509–523
Kroemer RT (2007) Structure-based drug design: docking and scoring. Curr Protein Pept Sci 8:312–328
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645
Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35(6):1547–1549
Kuntal BK, Aparoy P, Reddanna P (2010) Easy modeller: a graphical interface to MODELLER. BMC Res Notes 3:226
Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML, Thornton JM (1997) PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci 22(12):488–490
Laskowski RA, Jabłońska J, Pravda L, Vařeková RS, Thornton JM (2017) PDBsum: structural summaries of PDB entries. Protein Sci 27:129–134
Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC (2003) Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins 50:437–450
Lybrand TP (1995) Ligand-protein docking and rational drug design. Curr Opin Struct Biol 5:224–228
Lyons E, Freeling M (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J 53:661–673
Macindoe G, Mavridis L, Venkatraman V, Devignes MD, Ritchie DW (2010) HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res 38(Web Server issue):W445–W449
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 47(W1):W636–W641
Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Geer LY, Bryant SH (2017) CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res 45:D200–D203
Martinez M (2013) From plant genomes to protein families: computational tools. Comput Struct Biotechnol J 8:e201307001
McClure MA, Vasi TK, Fitch WM (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol Biol Evol 11:571–592
Meng XY, Zhang HX, Mezei M, Cui M (2011) Molecular docking: a powerful approach for structure-based drug discovery. Curr Comput Aided Drug Des 7:146–157
Metsalu T, Vilo J (2015) ClustVis: a web tool for visualizing clustering of multivariate data using principal component analysis and heatmap. Nucleic Acids Res 43:W566–W570
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang HY, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJA, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SCE, Yong SY, Finn RD (2019) InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 47:D351–D360
Mochida K, Shinozaki K (2010) Genomics and bioinformatics resources for crop improvement. Plant Cell Physiol 51:497–523
Moody G (2004) Digital code of life: how bioinformatics is revolutionizing science, medicine, and business. Wiley, Hoboken. ISBN 978-0-471-32788-2
Morris GM, Lim-Wilby M (2008) Molecular docking. Methods Mol Biol 443:365–382
Mount DW (2007) Using the basic local alignment search tool (BLAST). CSH Protoc 2007:pdb.top17
NCBI Resource Coordinators (2016) Database resources of the national center for biotechnology information. Nucleic Acids Res 44:D7–D19
Ong Q, Nguyen P, Thao NP, Le L (2016) Bioinformatics approach in plant genomic research. Curr Genomics 17:368–378
Pagadala NS, Syed K, Tuszynski J (2017) Software for molecular docking: a review. Biophys Rev 9:91–102
Pirovano W, Heringa J (2008) Multiple sequence alignment. Methods Mol Biol 452:143–161. https://doi.org/10.1007/978-1-60327-159-2_7
Rambaut A (2012) FigTree v1.4.0. University of Oxford, Oxford. http://tree.bio.ed.ac.uk/software/figtree/
Sayers EW, Agarwala R, Bolton EE, Brister JR, Canese K, Clark K, Connor R, Fiorini N, Funk K, Hefferon T, Holmes JB, Kim S, Kimchi A, Kitts PA, Lathrop S, Lu Z, Madden TL, Marchler-Bauer A, Phan L, Schneider VA, Schoch CL, Pruitt KD, Ostell J (2019) Database resources of the national center for biotechnology information. Nucleic Acids Res 47:D23–D28
Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ (2005) PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33(Web Server issue):W363–W367
Seo J, Gordish-Dressman H, Hoffman EP (2006) An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics 22(7):808–814
Sievers F, Higgins DG (2014) Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol 1079:105–116
Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 3(3):265–274
Sousa SF, Fernandes PA, Ramos MJ (2006) Protein-ligand docking: current status and future challenges. Proteins 65:15–26
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, Mering CV (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47:D607–D613
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Thompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. Chapter 2:Unit 2.3
Tippmann HF (2004) Analysis for free: comparing programs for sequence analysis. Brief Bioinform 5(1):82–87
Trosset JY, Cavé C (2019) In silico drug-target profiling. Methods Mol Biol 1953:89–103
UniProt Consortium T (2018) UniProt: the universal protein knowledgebase. Nucleic Acids Res 46:2699
Wang Z, Chen Y, Li Y (2004) A brief review of computational gene prediction methods. Genomics Proteomics Bioinformatics 2:216–221
Wheeler DL, Smith-White B, Chetvernin V, Resenchuk S, Dombrowski SM, Pechous SW, Tatusova T, Ostell J (2005) Plant genome resources at the national center for biotechnology information. Plant Physiol 138:1280–1288
Xiang Z (2006) Advances in homology protein structure modeling. Curr Protein Pept Sci 7:217–227
Xu D, Xu Y, Uberbacher EC (2000) Computational tools for protein modeling. Curr Protein Pept Sci 1:1–21
Zhao H, Caflisch A (2015) Molecular dynamics in drug design. Eur J Med Chem 91:4–14

Author information

Authors and Affiliations

Division of Crop Improvement and Biotechnology, Indian Institute of Vegetable Research, Varanasi, Uttar Pradesh, India
Aamir Khan
Department of Molecular and Human Genetics, Institute of Science, Banaras Hindu University, Varanasi, Uttar Pradesh, India
Sakshi Singh
Centre for Bioinformatics, School of Biotechnology, Institute of Science, Banaras Hindu University, Varanasi, Uttar Pradesh, India
Vinay Kumar Singh

Corresponding author

Correspondence to Vinay Kumar Singh.

Editor information

Editors and Affiliations

Plant Pathology, G.B. Pant University of Agriculture & Technology, Pantnagar, India
Krishna P. Singh
Department of Plant Pathology, University of Agricultural Sciences Dharwad, Dharwad, Karnataka, India
Shamarao Jahagirdar
Department of Mycology and Plant Pathology, Banaras Hindu University, Varanasi, Uttar Pradesh, India
Birinchi Kumar Sarma

About this chapter

Cite this chapter

Khan, A., Singh, S., Singh, V.K. (2021). Bioinformatics in Plant Pathology. In: Singh, K.P., Jahagirdar, S., Sarma, B.K. (eds) Emerging Trends in Plant Pathology. Springer, Singapore. https://doi.org/10.1007/978-981-15-6275-4_32

DOI: https://doi.org/10.1007/978-981-15-6275-4_32
Published: 10 December 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6274-7
Online ISBN: 978-981-15-6275-4
eBook Packages: Biomedical and Life Sciences (R0)
