Abstract
The unprecedented growth and public availability of next-generation sequencing data for hosts and pathogens create opportunities to understand disease systems at the biological level. Host and pathogen genome data held in popular repositories can be retrieved, annotated and analyzed to identify functional elements and characterize them at the gene and genome levels for application development. The primary goal of bioinformatics is to enhance the understanding of biological processes through sequence pattern recognition, biological data mining, machine learning algorithms for biological datasets and visualization of biological data and molecules. Significant research efforts in the field include the development of databases, software and tools, genome analysis, anthropology, forensic genetics, sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, gene expression analysis, microarray data analysis, protein–protein interactions and genome-wide association studies. Paulien Hogeweg and Ben Hesper coined the term bioinformatics in 1970 to refer to the study of information processes in biotic systems. Margaret Oakley Dayhoff, often called the mother and father of bioinformatics, compiled one of the first protein sequence databases. Elvin A. Kabat pioneered biological sequence analysis in 1970. Bioinformatics tools, techniques and databases can be used to identify candidate genes and target proteins for host–pathogen interaction studies, to support drug design and discovery and to harvest biological information from plant genomes and their genes. Bioinformatics applications can therefore be very beneficial for crop improvement and for the development of designer crops.
32.1 Introduction
Designer crops with improved quality and yield, and disease-free crops, are in demand today. Crop improvement and protection is therefore the first priority, and computational biology approaches applied to sequenced plant genomes play a very important role by helping to maximize yield, improve fruit and grain quality and develop disease-resistant crop varieties (Chen and Chen 2008; King 2004; Mochida and Shinozaki 2010; Batley and Edwards 2016; Moody 2004). The development of sequence markers based on single nucleotide polymorphisms and simple sequence repeats has now become a feasible method for crop improvement. Many techniques, databases, tools and software packages have been developed to understand and analyze biological systems. This chapter describes standard bioinformatics techniques together with specific tools and software.
32.2 Bioinformatics Techniques
32.2.1 Comparative Analysis
Comparative analysis is a field of biological sequence analysis in which the genomic features of different organisms are compared. These features may include the DNA sequence, regulatory regions, genes and gene order. The major principle of comparative analysis is that features common to homologous sequences are often encoded in DNA that is evolutionarily conserved between them, whereas the regions that differ are involved in diversity (Hardison 2003; Ong et al. 2016; Gebhardt et al. 2005; Sayers et al. 2019) (Fig. 32.1).
32.2.2 Sequence Analysis
Sequence analysis is the process of examining a DNA, RNA or protein sequence, often a set of homologous (orthologous and paralogous) genes, to understand its evolution, function, structure or features. It relies on sequence alignment and on searches against biological sequence databases such as reference gene and protein collections, UniProtKB/Swiss-Prot and the Protein Data Bank. Sequence analysis includes the comparison of homologous sequences to find similarities and differences; the identification of intrinsic sequence features such as active sites, post-translational modification sites, gene structures, reading frames, intron and exon distributions and regulatory elements; the identification of sequence differences and variations such as point mutations, single nucleotide variants (SNVs) and single nucleotide polymorphisms (SNPs) for use as genetic markers; the study of the evolution and genetic diversity of sequences and organisms; and the prediction of molecular structure from sequence alone. The Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for revealing the evolutionary and genetic diversity of sequences and organisms (Aljanabi 2001; Bolger et al. 2018; Martinez 2013; Demuth and Hahn 2009; Lyons and Freeling 2008; Altschul et al. 1990; McClure et al. 1994; Pirovano and Heringa 2008; Bawono et al. 2017) (Fig. 32.2).
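As a minimal illustration of comparing homologous sequences and listing single-nucleotide differences, the following Python sketch takes two sequences that are assumed to be already aligned (equal length, with "-" marking gaps). The function name compare_aligned and the toy sequences are hypothetical and are not taken from any of the tools cited in this chapter.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Compare two pre-aligned sequences of equal length.

    Returns (percent_identity, variants). variants lists
    (1-based alignment column, residue in seq_a, residue in seq_b)
    for every mismatched column. Columns with a gap in either
    sequence are skipped and do not count towards identity.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    variants = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        if a == gap or b == gap:
            continue  # gapped column: ignore
        compared += 1
        if a == b:
            matches += 1
        else:
            variants.append((pos, a, b))
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, variants


# Toy example (hypothetical sequences)
identity, variants = compare_aligned("ATGCCGTA-A", "ATGACGTAGA")
print(round(identity, 1), variants)
```

In practice the alignment itself would come from a dedicated aligner, and candidate SNPs would need supporting evidence from many individuals; the sketch only shows the column-by-column comparison.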
32.2.3 Gene Identification
Gene hunting, gene finding or gene prediction refers to the process of identifying the regions of genomic DNA that encode genes. Gene identification is one of the first and most important steps in understanding the genes and genome of an organism once it has been sequenced and made available in the public domain. Gene finding is one of the key steps in genome annotation; it follows genome sequence assembly and involves distinguishing coding (exonic) regions from non-coding (intronic) regions (Alioto 2012; Wang et al. 2004; Mochida and Shinozaki 2010) (Fig. 32.3).
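The simplest form of gene finding is a scan for open reading frames (ORFs). The Python sketch below is an assumption-laden illustration, not the method of any cited gene finder: it searches only the three forward reading frames, treats ATG as the only start codon and ignores introns, so it is at best a first pass for intron-free sequences. The name find_orfs and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return ORFs on the forward strand as (start, end, frame).

    start is a 0-based index of the ATG, end is exclusive and
    includes the stop codon. min_codons is the minimum number of
    codons from the ATG up to (not including) the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning
    return orfs


# Toy example (hypothetical sequence)
seq = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])
```

Eukaryotic gene prediction additionally has to model splice sites and exon-intron structure, which is why dedicated gene-prediction software is used on real plant genomes.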
32.2.4 Phylogenetic Analysis
Phylogenetic analysis is the study of the evolutionary relationships among groups of homologous genes from organisms (e.g. species or populations). These relationships are discovered using phylogenetic inference methods applied to sequence or morphological data: distance-matrix methods (Neighbor-Joining (NJ), UPGMA (Unweighted Pair Group Method with Arithmetic mean), WPGMA (Weighted Pair Group Method with Arithmetic mean), the Fitch–Margoliash method, use of outgroups, etc.); maximum parsimony (branch and bound, the Sankoff-Morel-Cedergren algorithm, MALIGN and POY); maximum likelihood; and Bayesian inference. A phylogenetic tree is a branching diagram that represents the evolutionary relationships among selected organisms or species. The phylogeny is inferred from similarities and differences in their genetic or physical characteristics. Phylogenetic analyses have become central to understanding genomes, diversity, evolution and ecology (Thompson et al. 1994, 2002).
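To make the distance-matrix idea concrete, the following Python sketch implements UPGMA, one of the methods listed above, on a small hypothetical distance table. The function name upgma, the input format (a dictionary keyed by label pairs) and the toy distances are assumptions made for illustration; the output is a Newick-style topology without branch lengths plus the height at which each merge occurs.

```python
def upgma(dist):
    """Cluster taxa with UPGMA.

    dist maps (label_a, label_b) -> distance, with each unordered
    pair given once. Returns (newick, merges), where merges lists
    (cluster_a, cluster_b, height) in the order of joining.
    """
    d = {frozenset(pair): float(v) for pair, v in dist.items()}
    size = {}  # cluster label -> number of taxa it contains
    for pair in d:
        for name in pair:
            size[name] = 1
    merges = []
    while len(size) > 1:
        closest = min(d, key=d.get)          # pair with smallest distance
        a, b = sorted(closest)
        new = f"({a},{b})"
        merges.append((a, b, d[closest] / 2))  # node height is half the distance
        for other in size:
            if other in (a, b):
                continue
            da = d.pop(frozenset((a, other)))
            db = d.pop(frozenset((b, other)))
            # size-weighted average = mean over all original taxon pairs
            d[frozenset((new, other))] = (
                size[a] * da + size[b] * db
            ) / (size[a] + size[b])
        del d[closest]
        size[new] = size.pop(a) + size.pop(b)
    (root,) = size
    return root + ";", merges


# Toy example (hypothetical distances between four taxa)
toy = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
tree, merges = upgma(toy)
print(tree)
```

UPGMA assumes a constant rate of evolution across lineages; Neighbor-Joining relaxes that assumption, which is one reason it is often preferred for real data.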
32.2.5 Protein–Protein Interaction
Protein–protein interactions (PPIs) are highly specific physical contacts between two or more protein molecules, arising from biochemical events driven by the hydrophobic effect and electrostatic forces. The STRING database distinguishes known interactions (from curated databases or experimentally determined), predicted interactions (based on gene neighbourhood, gene fusions or gene co-occurrence) and other associations (based on text mining, co-expression or protein homology) (De Las Rivas and Fontanillo 2010; Kozakov et al. 2017; Szklarczyk et al. 2019) (Figs. 32.4 and 32.5).
32.2.6 Microarray Data Analysis
Microarray data analysis is used to infer information from the data generated by DNA, RNA and protein microarray experiments; this information allows researchers to investigate the expression levels of a very large number of genes across an organism's entire genome in a single experiment. NCBI developed the Gene Expression Omnibus (GEO) database in 2000 for high-throughput gene expression data. GEO is a public repository that accepts MIAME (Minimum Information About a Microarray Experiment) compliant submissions of both sequence-based and array-based data. It holds freely available microarray data, next-generation sequencing data and other high-throughput functional genomics data submitted by the scientific community (Clough and Barrett 2016). Because the data generated by these experiments are complex, they are analyzed by bioinformaticians and bioscientists using specialized software. GEO provides tools that help researchers query and download experimental datasets and gene expression profiles, and analyze and visualize data directly on the GEO server (Fig. 32.6).
32.2.7 Structure Prediction and Refinement
Protein structure prediction is the construction of the three-dimensional (3D) structure of a protein from its amino acid sequence. The prediction covers the fold and the secondary and tertiary structure derived from the primary sequence. It is highly important in drug design and in the design of novel enzymes (Krieger et al. 2003; Xiang 2006; França 2015; Cavasotto and Phatak 2009; Xu et al. 2000).
32.2.8 Molecular Docking Calculation
Molecular docking models the interaction of two or more molecules to give a stable complex structure. Based on the binding properties of the ligand and target, it generates a three-dimensional structure of the complex. Docking predicts the orientation of one molecule relative to a second molecule in the bound structure that forms a stable complex. Knowledge of this preferred orientation may in turn be used to predict the binding strength or binding affinity between receptor and ligand molecules using scoring functions. Molecular docking is a prominent method for structure-based drug design because it predicts the binding conformation of small-molecule ligands in the binding site of the target receptor. Characterization of the binding behaviour plays an important role in the rational design of novel pesticides, herbicides, insecticides and fungicides (Ferreira et al. 2015; Guedes et al. 2014; Morris and Lim-Wilby 2008; Meng et al. 2011; de Ruyck et al. 2016; Pagadala et al. 2017; Zhao and Caflisch 2015; Kroemer 2007; Sousa et al. 2006; Jones and Willett 1995; Lybrand 1995; Goodsell et al. 1996; Gschwend et al. 1996; Trosset and Cavé 2019).
32.3 Bioinformatics Databases
Biological Data Model
A biological data model is a library of life sciences information and biological databases; it comprises a collection of computational analysis tools, literature and high-throughput experimental data. Biological databases contain information from research areas including genomics, phylogenetics, proteomics, metabolomics, microarray gene expression and phenomics. The information they hold includes gene structure and function, macromolecular structure, cellular and chromosomal localization, and SNPs and mutations in sequences and structures (Wheeler et al. 2005; Galperin and Fernández-Suárez 2012). NCBI is one such data model and hosts the popular search engine Entrez. Entrez is NCBI's primary text search and retrieval system; it integrates the PubMed and PMC databases of biomedical literature with many molecular databases covering genomes, genes, DNA, genetic variation, gene expression, and protein sequence and structure.
32.3.1 NCBI
NCBI stands for the National Center for Biotechnology Information. It is part of the National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH), in Bethesda, Maryland. The NCBI was established in 1988 through legislation sponsored by Senator Claude Pepper. NCBI resources cover chemicals and bioassays, data and software, DNA and RNA sequences, domains and structures, genes and expression, genetics and medicine, genomes and maps, homology, literature, protein sequence and structure, sequence analysis, taxonomy, training and tutorials, and variation (NCBI Resource Coordinators 2016; Wheeler et al. 2005) (Figs. 32.7, 32.8, and 32.9).
32.3.2 DDBJ
DDBJ (DNA Data Bank of Japan), founded in 1986, is a biological databank that mainly contains DNA sequence information. DDBJ is located at the National Institute of Genetics (NIG), Shizuoka prefecture, Japan. It is a member of the INSDC (International Nucleotide Sequence Database Collaboration), a joint effort with GenBank (USA) and the European Nucleotide Archive (UK) to collect and share DNA and RNA sequence data. The DDBJ Sequence Read Archive (DRA), NCBI Sequence Read Archive (SRA) and EBI Sequence Read Archive (ERA) share new and updated nucleotide sequence data, and the three databases (DDBJ, NCBI and EMBL) are synchronized on a daily basis through continuous interaction between the staff of the collaborating organizations (Kodama et al. 2012) (Fig. 32.10).
32.3.3 EMBL
The European Molecular Biology Laboratory (EMBL) is a molecular biology research institution supported by 25 member states. EMBL was founded in 1974 and is funded by public money from its member states; its research is conducted by approximately 85 independent groups. The web-based sequence submission systems include WebIn at EMBL-EBI, Sakura (“cherry blossoms”) at DDBJ and BankIt at the NCBI (Madeira et al. 2019) (Fig. 32.11).
32.3.4 Ensembl Plants
Ensembl Plants is an integrative database containing genome-scale information of plants. Ensembl Plants database includes genome sequence, gene models, polymorphic loci and functional annotation and various tools for analysis of sequence data. It contains various additional information, such as variation data, individual genotype data, linkage, population structure and phenotype data (Bolser et al. 2016, 2017) (Fig. 32.12).
32.3.5 PlantGDB
PlantGDB is a resource for comparative genomics and a database of molecular sequence data for plant genomes. PlantGDB contains PlantGDB-assembled unique transcripts (PUTs), genome survey sequence (GSS) assemblies, genome browsers and workflow management tools (Dong et al. 2004; Duvick et al. 2008) (Fig. 32.13).
32.3.6 Phytozome
Phytozome is a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of genome organization, gene families, gene structure and the evolutionary history of genes at the sequence level. It also provides access to the sequences and functional annotations of plant genomes and genes (Goodstein et al. 2012) (Fig. 32.14).
32.3.7 UNIPROT
The UniProt database is a freely accessible database of protein sequences and functional annotation, with many entries derived from genome sequencing projects. UniProt contains a large amount of information on the biological function of proteins derived from literature mining. The main aim of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional annotation information (UniProt Consortium 2018) (Fig. 32.15).
32.3.8 PDB
PDB (Protein Data Bank) is a databank for the three-dimensional (3D) structural data of a large number of biological molecules, such as nucleic acids and proteins. The structural data are typically obtained by X-ray crystallography, NMR spectroscopy and cryo-electron microscopy. They are submitted by structural biologists from all around the world and are freely accessible online. The main PDB member organizations are PDBe, PDBj, RCSB and BMRB. The PDB is overseen by an international organization called the Worldwide Protein Data Bank, wwPDB (Berman et al. 2000; Berman 2008; Laskowski et al. 1997) (Fig. 32.16).
32.3.9 MMDB
The Molecular Modeling Database (MMDB) is a database of experimentally determined three-dimensional biomolecular structures, hosted by the National Center for Biotechnology Information (Chen et al. 2003) (Fig. 32.17).
32.3.10 GEO
GEO (Gene Expression Omnibus) is a gene expression database that archives and freely distributes microarray datasets, next-generation sequencing datasets and other high-throughput functional genomics datasets deposited by the research community. The main goals of GEO are to provide a versatile and robust database in which researchers can efficiently store high-throughput functional genomic data; to offer simple submission procedures and formats that support complete and well-annotated data deposits; and to provide user-friendly mechanisms that allow users to review, query, locate and download studies and gene expression profiles of interest (Clough and Barrett 2016) (Fig. 32.18).
32.4 Bioinformatics Tools and Software
32.4.1 BiGGEsTS
BiclusterinG Gene Expression Time Series (BiGGEsTS) is a free tool and graphical application based on bi-clustering algorithms mainly developed for analysis of gene expression time series data (Gonçalves et al. 2009) (Fig. 32.19).
32.4.2 HCE
HCE (Hierarchical Clustering Explorer) applies a hierarchical clustering algorithm and lets researchers determine the grouping of data using an informative dendrogram, colour mosaic visual feedback and dynamic query controls (Seo et al. 2006) (Fig. 32.20).
32.4.3 ClustVis
ClustVis is a web tool that allows researchers to upload their data and create heat maps and PCA (Principal Component Analysis) plots. Data can be uploaded as a file or pasted into a text box (Metsalu and Vilo 2015) (Fig. 32.21).
32.4.4 BLAST
BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The BLAST programme compares nucleotide or protein sequences to sequence databases and calculates the identity and statistical significance of the matches (Altschul et al. 1990; Mount 2007) (Fig. 32.22).
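BLAST itself is a fast heuristic and is not reproduced here. The underlying notion of a local alignment can, however, be sketched with the exact Smith-Waterman dynamic programming algorithm, which BLAST approximates. In the Python sketch below the scoring values (match +2, mismatch -1, linear gap -2), the function name smith_waterman and the toy sequences are assumptions chosen for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example (hypothetical sequences)
print(smith_waterman("TTTACGTGG", "CCACGTAA"))
```

The full dynamic programming table costs time proportional to the product of the two sequence lengths, which is impractical against whole databases; this is the reason BLAST uses word seeding and extension instead, and attaches statistical significance to each hit.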
32.4.5 Clustal
Clustal Omega, ClustalW and ClustalX (the Clustal series) are widely used programmes for multiple sequence alignment (Higgins et al. 1996; Chenna et al. 2003; Sievers and Higgins 2014) (Fig. 32.23).
32.4.6 Bioedit
BioEdit is a free sequence alignment editor for editing and manipulation of sequence alignment data (Tippmann 2004) (Fig. 32.24).
32.4.7 MEGA
MEGA is a tool for manual and automatic sequence alignment, phylogenetic tree preparation, estimating rates of molecular evolution, web-based database mining and testing evolutionary hypotheses (Kumar et al. 2018) (Fig. 32.25).
32.4.8 Figtree
FigTree is a graphical viewer for phylogenetic trees and a programme for producing publication-ready tree figures (Rambaut 2012) (Fig. 32.26).
32.4.9 Circos
Circos is a visualization tool that uses a circular layout to display the similarities and differences arising from gene and genome comparisons (Krzywinski et al. 2009) (Fig. 32.27).
32.4.10 Prosite
PROSITE is a protein database of protein families, functional domains and functional sites, described by amino acid patterns and profiles (Sigrist et al. 2002) (Fig. 32.28).
32.4.11 CDD
Conserved Domain Database (CDD) is a protein database that consists of well-annotated multiple sequence alignments as position-specific score matrices (PSSMs) for identification of conserved domains via RPS-BLAST. CDD includes NCBI-curated functional domains based on 3D-structure information to define domain boundaries and provide functional insights into sequence/structure/function relationships, using Pfam, SMART, COG, PRK and TIGRFAMs databases (Marchler-Bauer et al. 2017) (Fig. 32.29).
32.4.12 Interproscan
InterProScan is a server to annotate protein families and domains automatically. InterPro provides functional signature analysis of proteins by classifying them into families, domains and important sites (Mitchell et al. 2019) (Fig. 32.30).
32.4.13 EasyModeller
EasyModeller is a graphical user interface programme used for homology modeling for predicting models of protein tertiary structures (Kuntal et al. 2010) (Fig. 32.31).
32.4.14 RAMPAGE/PROCHECK
The PROCHECK server checks the stereochemical quality of a protein structure model; it produces a Ramachandran plot to analyze the overall and residue-by-residue geometry (Laskowski et al. 2017; Lovell et al. 2003) (Figs. 32.32 and 32.33).
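A Ramachandran plot displays, for each residue, the backbone dihedral angles phi (defined by the atoms C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The Python sketch below shows how a single dihedral angle can be computed from four atomic coordinates; it is a generic geometric calculation and not the code used by PROCHECK or RAMPAGE. The function name dihedral and the test coordinates are hypothetical.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points.

    Follows the usual convention: 0 for cis, +/-180 for trans.
    """
    def sub(u, v):
        return tuple(a - b for a, b in zip(u, v))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b1 = sub(p1, p0)
    b2 = sub(p2, p1)   # central bond, the rotation axis
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For a residue i, with coordinates taken from a structure file:
#   phi = dihedral(C_prev, N, CA, C)
#   psi = dihedral(N, CA, C, N_next)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting psi against phi for every residue and comparing the points with the favoured regions is what allows a validation server to flag residues with unusual backbone geometry.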
32.4.15 VERIFY3D
The VERIFY3D server determines the compatibility of an atomic model (3D) with its own amino acid sequence (1D) by assigning a structural class to each residue based on its location and environment (alpha, beta, loop, polar, non-polar, etc.) and comparing the results to good structures (Eisenberg et al. 1997) (Fig. 32.34).
32.4.16 YASARA
YASARA (Yet Another Scientific Artificial Reality Application) is a computer programme for molecular visualization, modeling and docking (Krieger and Vriend 2014) (Fig. 32.35).
32.4.17 BIOVIA Discovery Studio 2019
BIOVIA Discovery Studio contains BIOVIA Pipeline Pilot used for simulations, macromolecule design and analysis, antibody modeling, structure-based design, pharmacophore and ligand-based design, QSAR, ADMET and predictive toxicology, X-ray and visualization (Fig. 32.36).
32.4.18 Patchdock
The PatchDock server performs protein–protein docking and generates protein-small molecule complexes (Schneidman-Duhovny et al. 2005) (Fig. 32.37).
32.4.19 Hex
Hex is an interactive graphics programme and server for calculating and visualizing docking modes of pairs of protein and DNA molecules. Hex can also be used for protein-ligand docking and for superposing molecules (Macindoe et al. 2010) (Fig. 32.38).
32.5 Plant and Pathogen Genomics
The five main types of pathogenic organisms that cause plant diseases are viruses, bacteria, fungi, protozoa and worms/nematodes, and their effects range from damage to the death of the plant. The availability of plant and pathogen genomes gives us opportunities to understand these biological systems and their disease mechanisms (Tables 32.1, 32.2, and 32.3).
32.6 Conclusion
The applications of bioinformatics to plant pathology have played a pivotal role in understanding host and pathogen evolution and the molecular interactions between host and pathogen. The availability of next-generation sequencing data for candidate model organisms of all kingdoms, generated by high-throughput technology, makes it convenient to study biological systems and to understand the biological sequence–structure–function correlation using in-silico tools, technologies and databases. Genome annotation and assembly, bioproject and biosample submission, sequence data submission, data retrieval, data analysis, variation analysis, conserved domain analysis, gene identification, regulatory element analysis, gene expression analysis, structure prediction, structure visualization, structure analysis, structure classification, molecular modeling, 3D epitope identification and mapping, drug design, active site analysis and molecular docking all play an important role in determining biological function and understanding the sequence–structure–function relationship. All of these in-silico techniques will further help in genomics-assisted crop improvement and in the development of designer crops with high yield and superior quality.
References
Alioto T (2012) Gene prediction. Methods Mol Biol 855:175–201
Aljanabi S (2001) Genomics and plant breeding. Biotechnol Annu Rev 7:195–238
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Batley J, Edwards D (2016) The application of genomics and bioinformatics to accelerate crop improvement in a changing climate. Curr Opin Plant Biol 30:78–81
Bawono P, Dijkstra M, Pirovano W, Feenstra A, Abeln S, Heringa J (2017) Multiple sequence alignment. Methods Mol Biol 1525:167–189
Berman HM (2008) The protein data bank: a historical perspective. Acta Crystallogr A 64(Pt 1):88–95
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28(1):235–242
Bolger ME, Arsova B, Usadel B (2018) Plant genome and transcriptome annotations: from misconceptions to simple solutions. Brief Bioinform 19:437–449
Bolser D, Staines DM, Pritchard E, Kersey P (2016) Ensembl plants: integrating tools for visualizing, mining, and analyzing plant genomics data. Methods Mol Biol 1374:115–140
Bolser DM, Staines DM, Perry E, Kersey PJ (2017) Ensembl plants: integrating tools for visualizing, mining, and analyzing plant genomic data. Methods Mol Biol 1533:1–31
Cavasotto CN, Phatak SS (2009) Homology modeling in drug discovery: current trends and applications. Drug Discov Today 14:676–683
Chen YP, Chen F (2008) Using bioinformatics techniques for gene identification in drug discovery and development. Curr Drug Metab 9:567–573
Chen J, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler-Bauer A, Marchler GH, Mazumder R, Nikolskaya AN, Rao BS, Panchenko AR, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH (2003) MMDB: Entrez’s 3D-structure database. Nucleic Acids Res 31:474–477
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31:3497–3500
Clough E, Barrett T (2016) The gene expression omnibus database. Methods Mol Biol 1418:93–110
De Las Rivas J, Fontanillo C (2010) Protein-protein interactions essentials: key concepts to building and analyzing interactome networks. PLoS Comput Biol 6(6):e1000807
de Ruyck J, Brysbaert G, Blossey R, Lensink MF (2016) Molecular docking as a popular tool in drug design, an in silico travel. Adv Appl Bioinforma Chem 9:1–11
Demuth JP, Hahn MW (2009) The life and death of gene families. BioEssays 31:29–39
Dong Q, Schlueter SD, Brendel V (2004) PlantGDB, plant genome database and analysis tools. Nucleic Acids Res 32(Database issue):D354–D359
Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V (2008) PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res 36(Database issue):D959–D965
Eisenberg D, Lüthy R, Bowie JU (1997) VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 277:396–404
Ferreira LG, Dos Santos RN, Oliva G, Andricopulo AD (2015) Molecular docking and structure-based drug design strategies. Molecules 20:13384–13421
França TC (2015) Homology modeling: an important tool for the drug discovery. J Biomol Struct Dyn 33(8):1780–1793
Galperin MY, Fernández-Suárez XM (2012) The 2012 nucleic acids research database issue and the online molecular biology database collection. Nucleic Acids Res 40(Database issue):D1–D8
Gebhardt C, Schmidt R, Schneider K (2005) Plant genome analysis: the state of the art. Int Rev Cytol 247:223–284
Gonçalves JP, Madeira SC, Oliveira AL (2009) BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data. BMC Res Notes 2:124
Goodsell DS, Morris GM, Olson AJ (1996) Automated docking of flexible ligands: applications of AutoDock. J Mol Recognit 9:1–5
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40(Database issue):D1178–D1186
Gschwend DA, Good AC, Kuntz ID (1996) Molecular docking towards drug discovery. J Mol Recognit 9:175–186
Guedes IA, de Magalhães CS, Dardenne LE (2014) Receptor-ligand molecular docking. Biophys Rev 6:75–87
Hardison RC (2003) Comparative genomics. PLoS Biol 1:E58
Higgins DG, Thompson JD, Gibson TJ (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol 266:383–402
Jones G, Willett P (1995) Docking small-molecule ligands into active sites. Curr Opin Biotechnol 6:652–656
King GJ (2004) Bioinformatics: harvesting information for plant and crop science. Semin Cell Dev Biol 15:721–731
Kodama Y, Mashima J, Kaminuma E, Gojobori T, Ogasawara O, Takagi T, Okubo K, Nakamura Y (2012) The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments. Nucleic Acids Res 40(Database issue):D38–D42
Kozakov D, Hall DR, Xia B, Porter KA, Padhorny D, Yueh C, Beglov D, Vajda S (2017) The ClusPro web server for protein-protein docking. Nat Protoc 12:255–278
Krieger E, Vriend G (2014) YASARA view – molecular graphics for all devices – from smartphones to workstations. Bioinformatics 30:2981–2982
Krieger E, Nabuurs SB, Vriend G (2003) Homology modeling. Methods Biochem Anal 44:509–523
Kroemer RT (2007) Structure-based drug design: docking and scoring. Curr Protein Pept Sci 8:312–328
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645
Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35(6):1547–1549
Kuntal BK, Aparoy P, Reddanna P (2010) Easy modeller: a graphical interface to MODELLER. BMC Res Notes 3:226
Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML, Thornton JM (1997) PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci 22(12):488–490
Laskowski RA, Jabłońska J, Pravda L, Vařeková RS, Thornton JM (2017) PDBsum: structural summaries of PDB entries. Protein Sci 27:129–134
Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC (2003) Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins 50:437–450
Lybrand TP (1995) Ligand-protein docking and rational drug design. Curr Opin Struct Biol 5:224–228
Lyons E, Freeling M (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J 53:661–673
Macindoe G, Mavridis L, Venkatraman V, Devignes MD, Ritchie DW (2010) HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res 38(Web Server issue):W445–W449
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 47(W1):W636–W641
Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Geer LY, Bryant SH (2017) CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res 45:D200–D203
Martinez M (2013) From plant genomes to protein families: computational tools. Comput Struct Biotechnol J 8:e201307001
McClure MA, Vasi TK, Fitch WM (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol Biol Evol 11:571–592
Meng XY, Zhang HX, Mezei M, Cui M (2011) Molecular docking: a powerful approach for structure-based drug discovery. Curr Comput Aided Drug Des 7:146–157
Metsalu T, Vilo J (2015) ClustVis: a web tool for visualizing clustering of multivariate data using principal component analysis and heatmap. Nucleic Acids Res 43:W566–W570
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang HY, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJA, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SCE, Yong SY, Finn RD (2019) InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 47:D351–D360
Mochida K, Shinozaki K (2010) Genomics and bioinformatics resources for crop improvement. Plant Cell Physiol 51:497–523
Moody G (2004) Digital code of life: how bioinformatics is revolutionizing science, medicine, and business. Wiley, Hoboken. ISBN 978-0-471-32788-2
Morris GM, Lim-Wilby M (2008) Molecular docking. Methods Mol Biol 443:365–382
Mount DW (2007) Using the basic local alignment search tool (BLAST). CSH Protoc 2007:pdb.top17
NCBI Resource Coordinators (2016) Database resources of the national center for biotechnology information. Nucleic Acids Res 44:D7–D19
Ong Q, Nguyen P, Thao NP, Le L (2016) Bioinformatics approach in plant genomic research. Curr Genomics 17:368–378
Pagadala NS, Syed K, Tuszynski J (2017) Software for molecular docking: a review. Biophys Rev 9:91–102
Pirovano W, Heringa J (2008) Multiple sequence alignment. Methods Mol Biol 452:143–161. https://doi.org/10.1007/978-1-60327-159-2_7
Rambaut A (2012) FigTree v1.4.0. University of Oxford, Oxford. http://tree.bio.ed.ac.uk/software/figtree/
Sayers EW, Agarwala R, Bolton EE, Brister JR, Canese K, Clark K, Connor R, Fiorini N, Funk K, Hefferon T, Holmes JB, Kim S, Kimchi A, Kitts PA, Lathrop S, Lu Z, Madden TL, Marchler-Bauer A, Phan L, Schneider VA, Schoch CL, Pruitt KD, Ostell J (2019) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 47:D23–D28
Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ (2005) PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33(Web Server issue):W363–W367
Seo J, Gordish-Dressman H, Hoffman EP (2006) An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics 22(7):808–814
Sievers F, Higgins DG (2014) Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol 1079:105–116
Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 3(3):265–274
Sousa SF, Fernandes PA, Ramos MJ (2006) Protein-ligand docking: current status and future challenges. Proteins 65:15–26
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, von Mering C (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47:D607–D613
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Thompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. Chapter 2:Unit 2.3
Tippmann HF (2004) Analysis for free: comparing programs for sequence analysis. Brief Bioinform 5(1):82–87
Trosset JY, Cavé C (2019) In silico drug-target profiling. Methods Mol Biol 1953:89–103
UniProt Consortium (2018) UniProt: the universal protein knowledgebase. Nucleic Acids Res 46:2699
Wang Z, Chen Y, Li Y (2004) A brief review of computational gene prediction methods. Genomics Proteomics Bioinformatics 2:216–221
Wheeler DL, Smith-White B, Chetvernin V, Resenchuk S, Dombrowski SM, Pechous SW, Tatusova T, Ostell J (2005) Plant genome resources at the National Center for Biotechnology Information. Plant Physiol 138:1280–1288
Xiang Z (2006) Advances in homology protein structure modeling. Curr Protein Pept Sci 7:217–227
Xu D, Xu Y, Uberbacher EC (2000) Computational tools for protein modeling. Curr Protein Pept Sci 1:1–21
Zhao H, Caflisch A (2015) Molecular dynamics in drug design. Eur J Med Chem 91:4–14
© 2021 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Khan, A., Singh, S., Singh, V.K. (2021). Bioinformatics in Plant Pathology. In: Singh, K.P., Jahagirdar, S., Sarma, B.K. (eds) Emerging Trends in Plant Pathology. Springer, Singapore. https://doi.org/10.1007/978-981-15-6275-4_32
DOI: https://doi.org/10.1007/978-981-15-6275-4_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6274-7
Online ISBN: 978-981-15-6275-4
eBook Packages: Biomedical and Life Sciences (R0)