11.1 Introduction

Plants are the major source of food, fibre and fuel in the agriculture sector and hence play a dominant role in the world economy. Plant pathogens pose a major threat to crops and are responsible for huge crop losses by causing disease. They also spread very quickly from diseased plants to healthy ones. The primary challenge for a plant pathologist is therefore to minimise crop loss by eradicating the plant pathogen (Mack et al. 2000; Mitra 2021). Plants are persistently under threat from several kinds of pathogens, including bacteria, viruses, fungi and nematodes. However, the molecular complexity of plant-pathogen interactions makes them difficult to interpret. Pathogens that cause disease in plants directly threaten food security, contribute to food scarcity and ultimately even threaten human health. Plants, however, possess an immune system that provides resistance to pathogens. They have evolved highly sophisticated mechanisms to resist pathogens, using different barriers and inducing specific signalling pathways. The induction of several metabolic pathways in the plant also requires recognition of the pathogen through pathogen-derived factors and specific proteins (effector molecules) encoded by the pathogen. However, when the host defence is suppressed, these factors enable the pathogen to infect the plant and cause disease. Owing to recent developments in fundamental biological research, many of the molecular mechanisms of pathogen infection, effector molecules and modulation of the immune system are now known (Kachroo et al. 2017; Zhang et al. 2013; de Wit 2007). In addition, advances in genomic technologies have produced a flood of genomic information that is available for analysis by in silico methods.
However, challenges remain in validating biological data and in predicting from and interpreting them correctly. Here, bioinformatics-based analysis plays a major role in data management. Bioinformatics is generally defined as the application of computational techniques to the storage, processing and management of biological data, which are usually generated from molecular biological experiments. Its ultimate objective is functional and statistically reliable prediction from the given biological data. To facilitate this, several categories of databases, web servers and executable software are being developed, and many are currently available with a suitable interface for analysing and interpreting these data. Bioinformatics-based analysis opens the door to understanding complex biological processes by applying genomic and protein sequence analysis, advanced data mining and machine learning algorithms to biological data and molecules (Fig. 11.1). The resulting knowledge can be used in several areas of biotechnological research (Untergasser et al. 2007; Mishra et al. 2016; Singh et al. 2011; Satpathy 2014; Satpathy et al. 2015).

Fig. 11.1 Basic tasks of bioinformatics

The challenge in controlling plant diseases lies in identifying, at the molecular level, the key pathogenic factors responsible for the spread of a specific plant pathogen. Much of this molecular information is available and can be analysed by in silico methods. This chapter reports on the application of computational tools and methods to plant pathogen analysis, such as the study of host-pathogen interaction, molecular modelling studies and whole-genome analysis (WGA), as used in plant pathology research (Alemu 2015).

11.2 Applications of Bioinformatics in the Plant Pathological Study

Biological databases are repositories of molecular biological data stored in a consistent manner. For example, a database might consist of a single file containing several records, each of which holds the same set of information. Similarly, several tools are available to understand the mechanism and function of metabolites and compounds, and the details of the pathways involved in the phytopathological mechanism. Plants that can resist infection by a pathogen are known as resistant plants, and in this case the host-pathogen interaction is considered incompatible. Despite the economic impact of plant pathology, the fundamental molecular mechanisms underlying pathogenicity are still poorly understood, which opens the door for the application of bioinformatics methods (Andersen et al. 2018; Scholthof 2001; Narayanasamy 2008).
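The idea of a flat-file database in which every record carries the same set of fields can be illustrated with a FASTA-style sequence file. The following is only a minimal sketch: the function name parse_fasta, the record fields (id, description, sequence) and the two example entries are assumptions made for illustration and are not taken from any resource discussed in this chapter.

```python
def _make_record(header, chunks):
    """Turn one header line and its sequence lines into a uniform record."""
    parts = header.split(maxsplit=1)
    return {
        "id": parts[0] if parts else "",
        "description": parts[1] if len(parts) > 1 else "",
        "sequence": "".join(chunks),
    }


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of records with identical fields."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


# Hypothetical two-record file; identifiers and sequences are invented.
example = """>seq1 hypothetical effector protein
MKTLLVA
GGHW
>seq2
MSSPLK
"""

for record in parse_fasta(example):
    print(record["id"], len(record["sequence"]), record["description"])
```

Because every record exposes the same keys, downstream steps such as filtering by length or searching descriptions can treat all entries uniformly.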

From the biological point of view, the in-depth study of plant pathogenesis includes four different approaches: (a) gene expression analysis, (b) structural and comparative genomics, (c) molecular modelling and (d) GWAS analysis. Current databases contain a large amount of molecular data on host plants, as well as information on the pathological aspects of specific pathogens, and so provide a strong platform for analysis (Fig. 11.2). Many databases and tools have been developed specifically for thorough analysis in the plant pathology area (Tables 11.1 and 11.2).

Fig. 11.2 Bioinformatics-based methods to study plant pathology

Table 11.1 Database and resources of plant pathology study
Table 11.2 Some of the major bioinformatics resources, availability and their application in the plant pathological study

Some specific applications of bioinformatics to the study of plant pathology are described in the following sections:

11.2.1 Plant-Pathogen Interaction Study

Plant-pathogen interactions involve several important molecular responses that allow pathogens to colonise the host and spread disease. For example, some fungi produce secondary metabolites that control a wide range of molecular functions, such as the production of virulence factors, siderophores and phytotoxins, that lead to the establishment of the disease. Shi-Kunne et al. carried out an in silico analysis that identified 25 potential secondary metabolite-producing gene clusters in Verticillium dahliae (Shi-Kunne et al. 2019). Graham-Taylor et al. described a number of gene clusters with a potential role in virulence in Sclerotinia sclerotiorum (Graham-Taylor et al. 2020). Kamal et al. identified the interacting regions of the Begomovirus-encoded βC1 protein and the cotton (Gossypium hirsutum) SnRK1 protein using computational approaches, including sequence recognition and binding site and interface prediction methods, followed by experimental analysis (Kamal et al. 2019). Pavlopoulou et al. described the interacting molecules involved in plant defence by building a protein-protein interaction (PPI) network, and provided evidence for prominent crosstalk between the defence mechanisms against several stresses, including pathogen infection (Pavlopoulou et al. 2019). Kaur et al. (2017) used computational analysis to examine the expression pattern and role of pathogenesis-related (PR) proteins with antifungal activity (PR-1, PR-2, PR-5, PR-9, PR-10 and PR-12) in Arabidopsis thaliana and Oryza sativa. Chang et al. (2016) carried out an in silico study of plant cell wall-degrading enzymes (PCWDEs). Plant pathogens secrete PCWDEs to degrade plant cell walls; to counter this, plants release PCWDE inhibitor proteins (PIPs) to reduce infection. However, some species of the pathogen Fusarium can escape this PIP inhibition, so an in silico study was performed to understand this resistance mechanism by analysing the genomic structure of the pathogen.
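A protein-protein interaction network of the kind mentioned above can be represented as an undirected graph, in which highly connected proteins (hubs) are natural candidates for crosstalk points. The sketch below is purely illustrative and is not the method of any cited study: the function names build_network and rank_hubs are assumptions, and the protein labels and interaction pairs are invented placeholders.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this simple sketch
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def rank_hubs(network, top=3):
    """Return the `top` proteins with the most partners (ties broken by name)."""
    return sorted(network, key=lambda p: (-len(network[p]), p))[:top]


# Invented interaction pairs; labels are placeholders, not real proteins.
pairs = [
    ("prot_A", "prot_B"),
    ("prot_A", "prot_C"),
    ("prot_A", "prot_D"),
    ("prot_B", "prot_C"),
    ("prot_D", "prot_E"),
]

network = build_network(pairs)
for protein in rank_hubs(network):
    print(protein, len(network[protein]))
```

Degree is only the simplest measure of importance in such a network; it is used here because it needs no external library.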

11.2.2 Gene Expression, Structural and Comparative Genomics Study

The study of the expression patterns of pathogenesis-related genes is important for building computational models of how plant diseases become established. Gene expression analysis also helps to identify pathogenic genes and their expression profiles in different host systems, which provides insight into the possible mechanisms of attack and resistance involved in pathogenesis. In addition, comparative genomics among different pathogens, and between hosts and pathogens, is essential to identify the gene regions responsible for pathogenesis and resistance. It also makes it possible to explore the distribution of homologous genes and their loci across several pathogen genomes. Together, gene expression studies and structural and comparative genomics provide deeper knowledge of the relationship between the host plant and its pathogens. Pinzón et al. studied the gene expression of Phytophthora infestans in host cells and identified the favourable and non-favourable patterns of gene interaction. Further sequence-level analysis of resistance genes was proposed to identify pathogen virulence genes and gene families (Pinzón et al. 2009). Comparative genomics analysis by Klosterman et al. (2011) established a set of proteins shared among three selected fungal pathogens that cause wilt disease. A homologue of a bacterial glucosyltransferase gene, which synthesises virulence-related osmoregulated periplasmic glucans that help the pathogen adapt to osmotic stress, was identified. Valero-Jiménez et al. (2019) used comparative genomics methods to determine the function of 7668 protein families from nine selected Botrytis species. These protein families were observed in two distinct phylogenetic clades that contain unique genes for secondary metabolite synthesis. Benevenuto et al. used a comparative genomics approach to analyse the genetic basis of host invasion by smut fungi that infect different host systems. Different types of genes and features, such as positively selected genes, gain or loss of effector genes, orphan genes and genomic signatures, were studied in terms of host specialisation (Benevenuto et al. 2018). Adhikari et al. (2013) reported the sequencing, assembly and annotation of six Pythium genomes and compared them with other plant pathogenic oomycetes such as Phytophthora species. The comparative genomic analysis established the close relationship between the sequenced Pythium species and Phytophthora species based on the involvement of different protein families with diverse functions. Proteins such as proteolytic enzymes, effector molecules and cell wall-degrading enzymes were found to be associated with the trophic behaviour of the pathogen. Trantas et al. conducted extensive comparative genomics of the pathogens Pseudomonas corrugata and Pseudomonas mediterranea to identify the gene clusters for the biosynthesis of siderophores and other metabolites (Trantas et al. 2015). Chen et al. (2019) studied the genome assembly of Puccinia hordei (Ph), a damaging pathogen of barley, and identified three candidate genes that can be investigated further for their biological properties to uncover the mechanism of pathogen virulence. Genomic analyses by Méndez et al. showed the phylogenetic relationships among three Chilean strains of Clavibacter and identified the unique virulence factors responsible for virulence in tomato plants (Méndez et al. 2020).
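Several of the comparative genomics studies above report protein families that are shared among pathogens or unique to one of them. Once each genome has been reduced to a set of family identifiers, this comparison is a matter of set operations. The sketch below assumes that family assignment has already been done by some upstream clustering step; the function name core_and_unique, the genome labels and the family identifiers are hypothetical.

```python
def core_and_unique(families_by_genome):
    """Split protein families into a shared core and genome-specific sets.

    families_by_genome maps a genome label to an iterable of family IDs.
    Returns (core, unique), where core is the set of families present in
    every genome and unique maps each genome to families found only there.
    """
    if not families_by_genome:
        return set(), {}
    as_sets = {g: set(f) for g, f in families_by_genome.items()}
    core = set.intersection(*as_sets.values())
    unique = {}
    for genome, families in as_sets.items():
        # Union of the families seen in all other genomes.
        others = set().union(*(s for g, s in as_sets.items() if g != genome))
        unique[genome] = families - others
    return core, unique


# Invented example: three pathogens and their protein family IDs.
toy = {
    "pathogen_A": {"F1", "F2", "F3"},
    "pathogen_B": {"F1", "F2", "F4"},
    "pathogen_C": {"F1", "F5"},
}

core, unique = core_and_unique(toy)
print("core:", sorted(core))
for genome in sorted(unique):
    print(genome, "unique:", sorted(unique[genome]))
```

In this toy input only F1 is present in all three genomes, while F2 is shared by two of them and therefore belongs to neither the core nor any unique set.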

11.2.3 Molecular Modelling Study

Over the last 20 years, computational molecular modelling of plant-pathogen systems has revealed some of the key mechanisms of this complex process. With genomic and protein sequence information and three-dimensional (3D) structures available, several molecular modelling approaches can be used to deduce the basic molecular phenomena involved. Molecular modelling depicts processes such as the interaction of pathogen-secreted molecules with host target molecules and the responses that follow. It is also essential to study the molecules and metabolic pathways in plants that play an important role in establishing disease. In addition, the activity, affinity and specificity of agrochemicals towards a pathogen target can be estimated by applying computer-aided drug design (CADD) methods in plant pathology. After a target molecule is chosen from a database such as the Protein Data Bank (PDB), chemical molecules can be docked to it to identify the binding site, binding energy and binding position by molecular docking.

A review by Shanmugam and Jeon (2017) described two major categories of computer-based drug discovery strategies: structure-based drug design (SBDD) and ligand-based drug design (LBDD) (Fig. 11.3). Several methods, such as structure prediction, molecular docking, de novo ligand design, pharmacophore modelling and quantitative structure-activity relationship (QSAR) modelling, are used to facilitate the drug design process. Shanmugam et al. (2019) studied MoRPD3, an essential histone deacetylase (HDAC) that is involved in histone acetylation and deacetylation and contributes to the growth and development of the rice blast fungus, Magnaporthe oryzae. Taking this protein as the drug target, several compounds were virtually screened by molecular docking, followed by in vitro study and 3D QSAR analysis, which suggested that [2-[[4-(2-methoxyethyl) phenoxy] methyl] phenyl] boronic acid is a good hit as an HDAC inhibitor. Kumar et al. (2020) applied protein-protein molecular docking to the polygalacturonase inhibitor protein of banana and the polygalacturonase (PG) of the pathogen Erwinia carotovora. Further in silico site-directed mutagenesis, docking and molecular dynamics simulation revealed that the residues at the active sites and the structural changes are responsible for the inhibition of enzyme activity. Islam et al. (2020) used a systems biology computational model to identify three potential antifungal compounds from Bacillus subtilis that can be used to suppress mycelial growth of Rhizoctonia solani. The in silico analysis used homology modelling and molecular docking, followed by molecular dynamics simulation and ADMET analysis. Imran and Ravi (2020) predicted 3D structures of potential drug target proteins of the plant pathogen Colletotrichum falcatum, which causes ‘red rot’ disease of sugar cane. This study used online resources to construct homology models of drug target proteins against which suitable drug molecules can be designed. Mishra et al. (2019) used virtual screening and molecular docking to find lead compounds against fungal diseases such as Fusarium wilt, rice blast, late blight of potato, necrotrophic, early blight of Solanaceae members and flax rust. In that study, seven antifungal ligand molecules were docked into selected target proteins of six fungal pathogens, which showed that several hydrophobic and polar contacts are responsible for ligand binding. Pathak et al. (2016) considered molecular targets of Alternaria brassicicola, such as ABC transporter, Amr1, beta-tubulin, cutinase, fusicoccadiene synthase and glutathione transferase, to study their binding affinity with phytoalexin. Molecular modelling and docking confirmed that the compound spirobrassinin can be used to protect Brassica plants against infection by Alternaria sp. Prajapat et al. (2011) used homology modelling to deduce the 3D structure of the coat protein of mimosa yellow vein virus. The modelled coat protein was then docked with α-lactalbumin and the binding pattern was analysed. A recent molecular modelling and protein-protein docking study of the pepper yellow leaf curl virus (PepYLCV) pathogenicity protein BC1 and pepper SnRK1 protein revealed the involvement of domain-level interaction in pathogenicity (Nova and Jamsari 2020). Sanseverino and Ercolano (2012) used an in silico approach to study the domain arrangement of several R-proteins from 33 plant organisms. Detailed analysis of conserved profiles revealed specific domain features and several atypical domain associations in a diverse set of R-proteins.

Fig. 11.3 Basic strategies of drug design approaches against plant pathogens

11.2.4 GWAS Study in Plant Pathology

Genome-wide association studies (GWAS) are an effective tool widely used to map multiple traits in wild-type genomes. Advances in genomic sequencing technologies, together with the reduced cost of genotyping, greater computational efficiency and improved algorithms, have made GWAS better suited to locating several essential traits. The basic objective of GWAS is to identify single-nucleotide polymorphisms (SNPs) in a given population that are associated with a measured trait. Such associations are expected to point to variants in specific genes that play a crucial role in the phenotype of interest. At present, this method is suitable for identifying important genes in natural populations and is widely used in plants for traits such as crop yield, crop quality, disease resistance and abiotic stress tolerance (Skøt et al. 2005; Quesada et al. 2010; Rosenberg et al. 2010). The basic steps of a GWAS analysis have been described by Marees et al. (2018) and are outlined below:

DNA sample (from cases and controls) → Hybridise DNA to the array → Identify the genotypes → Find additional SNPs → Find the hotspot for the disease resistance gene → Compute the association of SNP markers with disease resistance genes → Perform statistical analysis → Interpret findings
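The association and statistical analysis steps in this workflow can be sketched with a simple allelic test. The example below is a minimal illustration under stated assumptions, not the procedure of Marees et al. (2018) or of any cited study: it applies a Pearson chi-square test (1 degree of freedom) to a 2x2 table of allele counts for each SNP and uses a Bonferroni-corrected threshold. The function names allelic_chi2 and scan, the SNP labels and all counts are hypothetical.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Returns (chi2, p_value). A table with an empty row or column carries
    no information and is reported as chi2 = 0.0, p = 1.0.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_sums)
    if total == 0 or 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 d.f., the upper tail of the chi-square distribution is
    # erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def scan(snp_counts, alpha=0.05):
    """Test every SNP and keep those passing a Bonferroni threshold."""
    if not snp_counts:
        return []
    threshold = alpha / len(snp_counts)
    hits = []
    for snp_id, counts in snp_counts.items():
        chi2, p = allelic_chi2(*counts)
        if p < threshold:
            hits.append((snp_id, chi2, p))
    return sorted(hits, key=lambda hit: hit[2])


# Invented counts: (case_alt, case_ref, control_alt, control_ref) per SNP.
toy_counts = {
    "snp_A": (30, 10, 10, 30),
    "snp_B": (20, 20, 20, 20),
    "snp_C": (22, 18, 19, 21),
}

for snp_id, chi2, p in scan(toy_counts):
    print(f"{snp_id}: chi2={chi2:.2f}, p={p:.2e}")
```

Real analyses typically add quality control and a correction for population structure before association testing; those steps are deliberately left out of this sketch.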

Alqudah et al. (2020) conducted a GWAS to map the stem rust resistance loci of the barley genome by identifying single-nucleotide polymorphism (SNP) markers. Bartoli and Roux (2017) described the importance of GWA mapping for detecting genomic regions associated with disease resistance and for predicting pathogenicity in plant pathogens. Shrestha et al. (2019) reviewed the application of GWAS to five major disease resistance varieties of maize, together with the novel SNPs and novel disease resistance genes identified. Sánchez-Vallet et al. (2018) used both GWAS and classic linkage mapping to establish the function of the avirulence effector of Zymoseptoria tritici that is recognised by the resistance genes of wheat. A GWAS by Volante et al. (2017) identified two regions (qBK1_628091 on chromosome 1 and qBK4_31750955 on chromosome 4) in the genome of japonica rice (Oryza sativa) that are associated with SNP markers and are proposed to be involved in the mechanism of resistance to bakanae disease.

11.3 Future Aspects

Despite many advances in plant pathology research, the molecular basis of various functions is still poorly understood. It is therefore essential to study these complex mechanisms using bioinformatics-based tools and methods. Some opportunities for applying bioinformatics in plant pathology are as follows:

  • Exploring the phylogenetic and structural basis of biomolecules associated with the plant immune system and their distribution across taxonomically diverse species.

  • Analysing the plant pathological system and understanding the mechanism of resistance against virulence factors acquired by diverse host plants.

  • Understanding the structural features of specific plant proteins to predict the pathological phenomena like how pathogens cause disease in plants and how plants defend themselves against pathogens.

  • Developing a dedicated database of plant pathogen targets, which is essential to discover the role of new agrochemicals as effective drug molecules.

  • Applying systems biology to the available multiomics data, which will make it possible to develop new, sophisticated models that relate plant, pathogen, disease establishment and environmental factors or parameters.

  • Using the next-generation sequencing data held in databases to analyse plant and pathogen genomes and elucidate the key genomic features associated with pathogenesis.

11.4 Conclusion

Plant diseases cause significant destruction of crop plants, ultimately leading to huge economic losses worldwide, especially in the food production sector. It is therefore crucial to study the disease-causing mechanisms in relation to plant physiological systems. Research on plant-pathogen interactions and their molecular basis is interesting but also complex, both in conducting experiments and in interpreting the results. Recently, however, bioinformatics-based applications have made the prediction task easier. The availability of sophisticated software tools and databases of biological information on plant pathology enables researchers to focus on in silico studies of individual components, in which genes and proteins can be investigated. This chapter has provided a recent view of the bioinformatics-based methods being used by researchers and has listed major bioinformatics resources that can be used to retrieve and analyse plant pathological data. In the coming years, one of the major challenges for the plant pathology community is to make greater use of genomics data and tools in model plants so that the findings can be extrapolated to disease management. Ultimately, this will lead to enhanced productivity. Bioinformatics-based findings should therefore provide deeper insight into plant pathogen-host protein interactions and ultimately lead to an understanding of the complex plant pathological system.