Introduction

The term ‘host–pathogen interaction’ refers to the ways in which a pathogen (virus, bacterium, prion, fungus, or viroid) interacts with its host. Pathogens are infectious agents that cause disease in a host when the host immune system fails against them. They adapt to changes in the host and find alternative ways to survive and infect it. Questions such as how pathogens function, how they cross the host’s biological barriers, and how they survive inside a host that is often under treatment or immunized against the same pathogen can be answered by exploring host–pathogen interactions. Host–pathogen interactions can be described at the population level (virus infections in a human population), at the organismal level (a pathogen infecting a host), or at the molecular level (a pathogen protein binding to a receptor on a human cell). Before stepping into the methodological details of host–pathogen interaction processes, a brief glimpse into the history of this research field is included here to sum up the how(s) and why(s) of its recent advancements.

Some of the earliest research works in the domain of host–pathogen interactions are (i) a study of host–pathogen interaction in mouse typhoid caused by Salmonella typhimurium [141], (ii) a genetic study of the physiology of parasitism of the corn rust pathogen Puccinia sorghi [31], (iii) a correlation study of α-galactosidase production and host–pathogen interaction between Phaseolus vulgaris and Colletotrichum lindemuthianum [42], (iv) a study of ultrastructural aspects of the host–pathogen relationship of a deuteromycete fungus, Pyrenochaeta terrestris, with two Allium cepa (onion) varieties with the help of electron microscopy [56], (v) a fine structure study of the principal infection procedure during infection of barley by Erysiphe graminis [40], (vi) a study on proteins that obstruct the action of the polygalacturonases (polygalacturonide hydrolases, EC 3.2.1.15) released by the fungal plant pathogens Fusarium oxysporum, Colletotrichum lindemuthianum, and Sclerotium rolfsii; these proteins are extracted from the cell walls of Red Kidney bean hypocotyls, tomato stems, and suspension-cultured sycamore cells [1], (vii) a study on proteins secreted by plant pathogens that impede host enzymes able to attack the pathogen, conducted on an interaction system of a fungal pathogen (Colletotrichum lindemuthianum) and its host, the French bean (Phaseolus vulgaris) [2], (viii) a study on a single plant protein that efficiently hinders endopolygalacturonases secreted by Aspergillus niger and Colletotrichum lindemuthianum [46], (ix) a molecular basis study to showcase mutation of Xanthomonas campestris to overcome resistance in pepper (Capsicum annuum) [59], and (x) a study on stress and immunological response in host–pathogen interactions [90].

Some recent research works have focused on (i) the basic notion of virulence and pathogenicity, which defines and suggests a classification system for microbial pathogens based on their capacity to cause damage as a consequence of the host’s immune response [17], (ii) model organisms for host–pathogen interactions, i.e., C. elegans [70], D. melanogaster [91, 131] and zebrafish [53, 127] among others, (iii) molecular cross-talk of host–pathogen interactions where Type III secretion system is mentioned [108], (iv) novel studies involving epigenetics [49], metallobiology [11], quantitative temporal viromics [134], heterogeneity in same host tissue [14], and computational systems biology [36] of host–pathogen interactions.

All these investigations indirectly show the trend of development of the host–pathogen interactions research field. The field started with sporadic research works on a pathogen and its interaction with a host. The earliest research was done on host–pathogen interactions with respect to environmental factors, like light, temperature, season, and pathogen/host population among others. Later, some organisms, like C. elegans and D. melanogaster, were adopted as model organisms to study pathogen behavior in more complex hosts (human beings) due to their simple body plan, known genome structure and short life cycle. Gradually, certain proteins and then protein clusters were identified as taking part in host–pathogen interactions. Moreover, a definite classification of the mechanisms of host–pathogen interactions has emerged with recent developments in imaging and molecular biology techniques.

Moreover, some research works have defined and have given direction to the host–pathogen interactions research field. Discovery of distinct secretion systems [30, 47, 68, 100, 101, 135] has provided the basic background of host–pathogen interaction research. The concerned studies have spanned from genome locus [68] to biochemical and genetic evidence [88]. With discovery of PPI prediction methods [10], the chance of finding host–pathogen protein pairs and their interactions has become more prominent and such studies have given a different direction to the research field. Then methods have been developed for the machine learning based in silico prediction of secretion system associated proteins [4]. There are also a couple of newly proposed methods [54, 84], which provide new glimmer of hope to the research field in controlling pathogenesis in a host as described below.

  • Secretion systems Type I [135], Type II [30], Type III [47] and Type V [100] were discovered in the 1980s, which defined the base for host–pathogen interaction research.

  • Kuldau et al. [68] have predicted 11 ORFs from virB locus in 1990. Based on a hydropathy plot, they have analyzed that nine of them encode proteins which may interact with membranes and may form a membrane pore or channel to mediate exit of the T-DNA copy. This is the first indirect indication of a distinct secretion system, later known as Type IV Secretion system (T4SS).

  • Pukatzki et al. have functionally defined T6SS in 2006 [101].

  • Mougous et al. in 2006 have provided biochemical and genetic evidence that a virulence-associated genetic locus of P. aeruginosa, termed as HSI-I, encodes a protein secretion apparatus (T6SS) [88].

  • Machine learning-based prediction of PPIs has been done by Bock et al. in 2001 [10]. They have used a Support Vector Machine (SVM) to train and predict interactions based on primary structure and related physicochemical properties. This work has provided a shift in research direction from genes to their protein counterparts and their nature of interaction.

  • The first ever machine learning-based prediction of Type III secretion system associated proteins has been done by Arnold et al. in 2009 by analyzing the amino acid composition and secondary structure composition of the N-terminal region of a few experimentally verified effector proteins [4].

  • A few new studies and methods have proposed new avenues of future host–pathogen interaction research, i.e., a new way of studying host–pathogen interaction by dendritic cell subtypes [84] and chemoproteomic profiling of host and pathogen enzymes for finding candidates (proteases) to disrupt pathogenic mechanisms which have often boosted the host’s defense mechanisms directly or indirectly [54].
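The sequence-based prediction mentioned in the list above starts from numerical features derived from the primary structure. The following is a minimal sketch of one such feature, the amino acid composition of the N-terminal region. It is not the pipeline of Bock et al. [10] or Arnold et al. [4]; the function name, the default window of 30 residues, and the toy sequence are assumptions made only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_terminal_composition(sequence, window=30):
    """Return the relative frequency of each standard amino acid
    in the first `window` residues of a protein sequence.

    The window length is an illustrative assumption, not a value
    taken from the cited studies.
    """
    n_term = sequence.upper()[:window]
    # Ignore non-standard symbols such as X or gaps.
    counts = Counter(res for res in n_term if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real effector protein.
toy_sequence = "MSSSTNPLSSAAKK"
features = n_terminal_composition(toy_sequence, window=10)
```

Each protein is thus mapped to a fixed-length vector of 20 frequencies, which could then be passed to a classifier such as an SVM or a Naive Bayes model together with labels for known effectors and non-effectors.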

The present review tries to encompass the in silico prediction of host–pathogen interactions by machine learning and the related aspects. It has been organized into dedicated sections of classification of host–pathogen interactions, availability of host–pathogen interaction data, prediction of host–pathogen interaction domains, image processing-based research techniques, and conclusive remarks. There are several substrates and pathways whereby pathogens can invade a host. The human body has its own natural defense mechanism against some of the common pathogens in the form of an immune system that acts against these pathogens. Pathogens have the capability to adhere to host tissues, to evade host defenses, and to invade host cells. However, deeper understanding has revealed that each pathogen has its own variation of these themes [107]. Host–pathogen interactions take place between a host and a pathogen through the protein(s) and gene(s), and by disrupting normal functioning of pathway(s), forming biofilm(s), inhibiting macrophage activity and by other methods. In this review, we have briefly discussed the various probable factors that directly or indirectly contribute to host–pathogen interactions. Pathogens can either attack a host at the gene level by emitting RNA, or they can release proteins that could lead to pathogenicity, or they can inhibit the mechanism of macrophage. Some pathogens utilize the components of a host system to survive in the host. These components are called host factors. In a few cases, some factors of a pathogen can initiate the autophagy mechanism, which acts in favor of the host. The classification of the host–pathogen interactions is based on traditional pathogen invasion into host.

Fig. 1 Classification of some common pathogens and a list of diseases caused by them

The review starts with categorization (Fig. 1) of pathogens, and makes a comprehensive list of diseases caused by them. The following section discusses the classification of host–pathogen interactions based on different biology-based reasoning. Then, the widely used in silico prediction methods in the domain of host–pathogen interactions are described. Moreover, an extensive list of the online repositories is given. The review concludes with a brief discussion that includes the merits and demerits of this research field in general, a few scopes for future research and concluding remarks.

Classification of host–pathogen interactions

The components of a host–pathogen interaction can be broadly classified into four stages, i.e., invasion of host through primary barriers, evasion of host defenses by pathogens, pathogen replication in host, and a host’s immunological capability to control/eliminate the pathogen. A pathogen can invade a host only after breaching the primary host defenses. Pathogens contain virulence factors that promote and cause disease. The greater the virulence, the more likely the disease will occur. We have classified the host–pathogen interactions according to these stages. A summary of the methods discussed in this review has been diagrammatically represented in Fig. 2. However, in silico prediction methods used for detection of such interactions have been described in the Section “1”. The stages mentioned below are overlapping in nature. They do not have a clear boundary between them. The in silico prediction methods described later cannot be uniquely associated to only one of the stages. Their applicability spans over many or all the stages of host–pathogen interactions.

Fig. 2 Classification of host–pathogen interactions

Invasion of host through breach of primary barriers

One of the main ways in which pathogens invade the host is via protein secretion. Pathogens, particularly Gram-negative bacteria, which cause pathogenesis in host, consist of secretion systems. These secretion systems release proteins, called effectors, into the body of the host when they come in contact with the host. There are at least six specialized secretion systems in Gram-negative bacteria. Type I, Type II, Type III, Type IV, Type V, and Type VI are the prominent ones based on their mechanisms of host infection. Details of these mechanisms can be obtained from Costa et al. [27]. Numerous secreted proteins are crucial in bacterial pathogenesis. We have described a few of them here, i.e., toxins, urease, and multivalent adhesion molecules.

Toxins are substances created by plants and animals that are poisonous to humans. Most toxins that cause problems in humans come from germs such as bacteria. Toxins can be small molecules, peptides, or proteins that are capable of causing disease on contact with or absorption by body tissues, where they interact with biological macromolecules such as enzymes or cellular receptors. These toxins, once in the body of the host, interfere with the normal functioning of the host’s metabolism. Reduced toxin expression in a pathogen stimulates the host’s TCR signaling pathway less at the time of attack than higher toxin expression does. It has been observed that viruses interact with different proteins of individual pathways temporally [117]. The molecules that are secreted by Gram-negative pathogens lead to damage of the host cells. The vesicle released from the envelope of the growing bacteria serves as a container for the proteins and lipids of the Gram-negative bacteria. This suggests the importance of vesicle-mediated toxin delivery for the onset of infection in the host.

Effector proteins are secreted by pathogenic bacteria for their entry into the host. Effector proteins help a pathogen invade host tissue and suppress the host’s immune system, and often help the pathogen survive. Effector proteins are crucial for virulence. For example, in Yersinia pestis (the causative agent of plague), loss of the T3SS has rendered the bacteria completely avirulent [80]. Naive Bayes classifiers and support vector machines have already been applied to detect effector proteins of T3SS [4, 132]. More details regarding the methodology are given in the Section “1”.

Urease (an enzyme) plays an important role in Mtb–host interaction [23]. Urease is present in many species of mycobacterium, and its presence/absence is frequently used in the speciation of mycobacteria. Urease has been considered to be a virulence factor for several pathogenic microorganisms. Generation of ammonia by urease of urinary pathogens, such as P. mirabilis, has contributed to its pathogenesis due to its toxicity to renal epithelium, participation in complement inactivation, and promotion of urinary stone formation [13]. Urease of H. pylori alkalinizes the bacterial micro-environment in the stomach and is toxic to stomach epithelium [119]. In the case of Mtb, urea is readily available to the bacteria in both its intracellular and extracellular locations within the host.

The multivalent adhesion molecule (MAM) is responsible for establishing high-affinity binding to host cells during early stages of infection [63]. MAM7 connects to a host via protein–lipid (phosphatidic acid) and protein–protein (fibronectin) interactions. MAM7 has been found on the outer membrane of the Gram-negative pathogens, which contributes to its virulence.

Evasion of host defenses by pathogens

In order to survive inside the host, the pathogens need to avoid the host defense mechanism. Mycobacterium tuberculosis (Mtb) showcases that it actively transcribes a number of genes involved in fortification and evasion from a host system [103]. Assessment of the genome of 58 strains of Staphylococcus aureus reveals that all the immune evasive proteins are present in all the strains but not all the surface proteins [81]. Remarkably, four strains have surface and immune evasion genes similar to human strain. On the other hand, the putative targets of these proteins vary in different hosts, which proposes that these proteins are not crucial for virulence. Signaling for anti-inflammation by glycolipids and host–system interaction may be considered a method of Mycobacteria to evade the host or may be playing a vital role in preventing extreme inflammatory response [128].

Pathogens often affect the essential pathways of their hosts with the aim of evading the host defenses. The NF-κB family of transcription factors helps in the development of APCs (antigen-presenting cells) and lymphocytes [124]. Once the host is compromised, the NF-κB pathway gets activated. HIV-1 mostly depends on its host for survival, as it has only a few genes of its own. An integrated study of HIV-1 and human signal transduction pathways has been carried out to infer that most of these pathways may get affected by HIV during its life cycle [7]. The study has assessed and analyzed all possible paths (perturbed and unperturbed) starting from one protein (start point) and terminating at another (end point).
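The path analysis described above can be illustrated with a small sketch that enumerates every simple path between a start protein and an end protein in a directed signaling graph. This is not the procedure of the cited study [7]: the graph, the protein labels, and the rule that a path counts as perturbed when it passes through at least one virus-targeted protein are all assumptions made for illustration.

```python
def all_simple_paths(graph, start, end, path=None):
    """Enumerate all cycle-free paths from `start` to `end`
    in a directed graph given as {node: [successors]}."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for successor in graph.get(start, []):
        if successor not in path:  # avoid revisiting a node
            paths.extend(all_simple_paths(graph, successor, end, path))
    return paths


# Hypothetical toy pathway: receptor R signals to target T via A and B.
pathway = {"R": ["A", "B"], "A": ["T"], "B": ["A", "T"], "T": []}
viral_targets = {"B"}  # hypothetical host protein bound by a viral protein

paths = all_simple_paths(pathway, "R", "T")
# Assumed definition: a path is perturbed if it contains a viral target.
perturbed = [p for p in paths if viral_targets.intersection(p)]
unperturbed = [p for p in paths if not viral_targets.intersection(p)]
```

In this toy pathway, two of the three routes from R to T pass through B and would be labeled perturbed. Exhaustive enumeration grows quickly with network size, so a real analysis would likely need a path-length limit.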

Human proteins potentially targeted by EBV (Epstein–Barr virus) tend to be hubs in the human interactome. This is consistent with the hypothesis that hub protein targeting is an effective mechanism for viruses to convert pathways for their use [16]. Bacterial and viral pathogens are more inclined to interact with hub proteins and with proteins that are central to multiple pathways in the network [38]. Certain cellular mechanisms, like cell cycle regulation and nuclear transport, participate in these interactions with a different set of pathogens. A study has identified 3073 human-B. anthracis, 1383 human-F. tularensis, and 4059 human-Y. pestis PPIs (protein–protein interactions) [39]. As suggested by Dyer et al. [38], these PPIs have occurred among those hub and bottleneck proteins. The extracellular hydrolytic enzymes, especially the aspartyl proteinases (Saps) secreted by C. albicans, are major factors of its pathogenicity [92]. Protein Chaperon 60 and 60.1 have a higher impact on activation of the cytokines than the protein Chaperon 60.2 [75]. In Staphylococcus aureus, proteins EsxA and EsxB act as virulence factors to enforce pathogenesis [15]. Mutants that do not secrete these proteins have been observed to fail to enforce strong pathogenesis. Among two closely related families of proteins, PE and PE_PGRS, PE_PGRS of Mtb activates a considerable humoral immune response but PE does not [29]. Further study suggests that unlike PE, certain PE_PGRS genes are expressed during infection and antibody response. In the case of Enterovirus, 71 genes out of 699 are significantly differentially expressed during infection [77]. Lack of the flagella gene in Salmonella typhimurium contributes to its virulence. Addition of the flagella gene increases the cytotoxicity. However, it does not increase the production of IL-6 (interleukin-6) [96].
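The notion of a hub protein used above can be made concrete with a short sketch that counts interaction partners in a host interactome and checks which pathogen-targeted proteins are hubs. The degree threshold, the protein identifiers, and the interaction list are hypothetical, and bottleneck proteins (usually defined through betweenness centrality) are not computed here.

```python
from collections import defaultdict


def degree_table(interactions):
    """Count distinct interaction partners for every protein
    in an undirected PPI list of (protein_a, protein_b) pairs."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def hubs(interactions, min_degree):
    """Return proteins with at least `min_degree` partners.
    The cut-off is an assumption; studies choose it differently."""
    degrees = degree_table(interactions)
    return {protein for protein, d in degrees.items() if d >= min_degree}


# Hypothetical toy host interactome and pathogen target list.
host_ppis = [("H1", "H2"), ("H1", "H3"), ("H1", "H4"),
             ("H2", "H3"), ("H4", "H5")]
host_hubs = hubs(host_ppis, min_degree=3)
pathogen_targets = {"H1", "H5"}
targeted_hubs = pathogen_targets & host_hubs
```

Comparing the fraction of targeted proteins that are hubs with the fraction of hubs in the whole interactome would give a simple indication of whether a pathogen preferentially targets hubs.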

One of the crucial host defenses is the macrophage. Hence, macrophage inhibition is a factor using which the pathogen evades the host immune mechanism. Macrophage activation happens due to multiple components, i.e., gene(s) encoding receptor(s), signal transduction molecule(s), transcription factor(s), and bacterial component(s) that activate Toll-like receptor(s) (lipopolysaccharide, muramyl dipeptide, lipoteichoic acid, and heat shock proteins) [94] among others. Pathogens attempt to survive in the host by preventing the macrophages from acting on them. It has been found that pathogens disrupt the enzymatic activity in activated macrophages by disrupting the actin filament network [50].

It has been identified that falstatin is an endogenous protease inhibitor of Plasmodium falciparum. Analysis of inhibition of the normal functionality of macrophages to engulf pathogens and ingest killed parasites due to the functioning of ornithine decarboxylase has been done by Nairz et al. [60]. Due to pathogen-specific responses, interleukin-12 production is inhibited for Mtb, hence allowing the host to fight against the pathogen. It has been found that 26 to 37 proteins of HIV-1 are associated with MDM (monocyte-derived macrophages) derived from HIV [22]. Inhibition by Mtb can be avoided with the help of IFN-γ and transfection of LRG-47 [52]. It has been found that Mtb residing in macrophages switches to anaerobic growth [114] to evade host defense for a longer period of time.

The crosstalk of host–pathogen interactions is often governed by miRNAs [48, 111, 112]. The small RNAs, like siRNAs and shRNAs, also play a vital role in host–pathogen interactions. Konig et al. [62] have studied the association of siRNAs with host–pathogen interactions. They have explored it by combining genome-wide siRNA analysis along with the knowledge from human interactome databases. Pathogens have short linear motifs (SLiM) that have high similarity with host SLiMs. Motif mimicry is used by pathogens to rewire host signaling pathways by co-opting SLiM-mediated protein interactions to affect the host systems [130].
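Motif mimicry can be screened for, at a first approximation, by scanning host and pathogen sequences for the same short pattern. The sketch below uses a regular expression for this. The pattern `P..P` is only a simplified stand-in for a proline-rich motif, and both sequences are invented; real SLiM definitions are more specific and come from curated motif resources.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every match of `pattern`,
    including overlapping ones, using a zero-width lookahead."""
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


# Simplified, illustrative pattern: proline, any two residues, proline.
motif = "P..P"

# Hypothetical toy sequences, not real proteins.
host_seq = "MKAPLSPGGT"
pathogen_seq = "MPPAPTTPQRPK"

host_hits = find_motif(host_seq, motif)
pathogen_hits = find_motif(pathogen_seq, motif)

# A pathogen protein carrying a motif that also occurs in host proteins
# is only a candidate for mimicry, not evidence of it.
is_candidate = bool(host_hits) and bool(pathogen_hits)
```

Because SLiMs are short, matches of this kind occur frequently by chance, so such hits would normally be filtered further, for example by structural disorder or conservation.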

Pneumolysin (an enzyme) is a key virulence factor [78]. It activates multiple genes and signal transduction pathways in eukaryotes. Cytolytic effect of Pneumolysin contributes to lung injury and neural damage. It sometimes induces apoptosis in neurons and other cells. It can also trigger host-mediated apoptosis in macrophages, thus magnifying extermination of pathogens.

Pathogen replication in host

To survive inside a host, pathogens have multiple ways to facilitate their growth by speedy replication. First of all, they need only a few genes and proteins to survive effectively in the host, while many more genes and proteins are required for their survival outside the host. A study on the metabolic network of the pathogen Salmonella typhimurium has revealed 1083 genes catalyzing 1087 metabolic and transport reactions. This suggests that a minimal set of potent metabolic pathways is required for favorable replication of Salmonella typhimurium within the host [104]. Erythrocytic malaria parasites need proteases for a number of their cellular processes [98] in order to survive in the host.

Pathogens have evolved strategies to promote their survival by performing hijacking of the host cells they infect. Viruses implant their DNA sequence into the normal sequence of these hosts in the hope of their better survival [105] inside the hosts. A genome of the strain of Mtb, H37Rv, made up of 4000 genes comprising 4,411,529 base pairs, has a high guanine and cytosine content [24]. In this genome, 194 genes are required for the growth of Mtb [110]. A large number of these genes is unique to mycobacteria and its closely related species. This leads to the fact that the mechanism of infection of Mtb is different from other pathogenic species.

Some pathogens even respond to more than one micro-environment for their replication and survival. The genes responsible for Snm (secretion in mycobacterium) protein secretion in a mutation of Mtb, which is Mycobacterium smegmatis, are homologs of their Mtb counterpart [26]. This suggests that some strains may have similar secretion mechanisms. Four essential gene products (Sm3866, Sm3869, Sm3882c, and Sm3883c) are needed for Snm secretion. Mtb exists in various metabolic states. This fact indicates that it may be responsive to more than one micro-environment [45].

The genome of Mycobacterium tuberculosis possesses a large family of Ser/Thr protein kinases (STPKs). STPKs have been found to play an important role in cell division and cell envelope biosynthesis [87]. The outer membrane of the bacteria facilitates the interaction between a host and a pathogen [67]. C. albicans has the capability to colonize and infect the majority of the tissues of the human host, which indicates that it can have functionally distinct proteinases (enzymes performing proteolysis) so as to have enough flexibility to multiply and survive in the host.

Sometimes a host itself unknowingly facilitates or inhibits the survival of its pathogens. The host components involved are referred to as host factors. These factors help in pathogen replication, transcription, integration, growth, propagation, pathogen entry, and host–pathogen interactions among others. A set of 295 cellular cofactors (of host) is essential for replication of influenza virus in the early stage [61]. Among these cofactors, 181 are highly significant in host–pathogen interactions, 219 help in efficient influenza virus growth, 23 have a role in viral entry, and ten are required for post-entry steps of virus replication. Small molecule inhibitors of multiple factors, including vATPase and CAMK2B, go against influenza virus replication. A set of 116 Dengue Virus Host Factors (DVHF) is needed for the propagation of DENV-2 (dengue virus type 2) [115]. Among 82 human homologs of dipteran DVHF, 42 have been identified to be human DVHF. A set of 311 host factors has been found to be responsible for the growth of HIV-1 [143]. Considering HIV dependency factors (HDFs) obtained previously in [12, 143], it is observed that the cardinality of the set of intersection is 311 host factors. Six newly identified host factors are AKT1, PRKAA1, CD97, NEIL3, BMP2k, and SERPINB6 [143]. A set of 250 such factors in HIV has been identified [12]. Rab6 and Vps53 play a role in viral entry, TNPO3 is important for viral integration, and Med28 for viral transcription. HDF genes show a stronger presence in immune cells, thus allowing the viruses to evolve in the host cells that perform the life-cycle functions needed for them to survive. A set of 213 host factors and 11 HIV-encoded proteins has been found to be responsible for HIV-1 replication [12]. Among them, a few proteins help in regulation of ubiquitin conjugation, DNA damage response, proteolysis, and RNA splicing. Forty new factors play a vital role in the process of initiation and/or kinetics of DNA synthesis. Fifteen proteins with different functions have been found to play a significant role in nuclear import or viral DNA integration.

Pathogens like M. leprae cannot survive independently. Hence, M. leprae converts the glial cells of a host into progenitor cells and uses these progenitor cells to survive and spread infection inside the host [55]. It alters the genetic structure of the adult Schwann cells to form the progenitor cells. However, it is still unknown how long M. leprae can survive in the de-differentiated Schwann cells, as they will eventually differentiate back into adult Schwann cells.

Often apoptosis of host factors has been found to be involved in bacterial growth and sustenance inside host [144]. Apoptosis contributes to the processes of the host-cell deletion method, triggering the inflammation and defense mechanism. Apoptosis by the pathogen Bordetella pertussis allows Bordetella to survive in the introductory stages of infection. After the pathogen has successfully colonized the tissue of the host, it stops producing the toxin adenylate cyclase hemolysin.

Biofilm formation plays a major role in host–pathogen interactions. This is a mechanism of pathogens by which they form a biofilm for their survival in the host, often utilizing degraded host proteins. Leucobacter chromiireducens subsp. solipictus strain TAN 31504 forms biofilm. Exposure to TAN 31504 leads to change in a few innate immunity-related genes in C. elegans [89]. Esp (a serine protease secreted by S. epidermidis) degrades 75 proteins of Staphylococcus aureus by proteolytic activity, which include 11 proteins essential for the formation of biofilm [121]. Esp also degrades several human receptor proteins involved in colonization and infection by the pathogen for the benefit of the host.

A host’s immunological capability to control/eliminate the pathogen

In order to prevent occurrence of infection/disease, the host body launches immune response with respect to the pathogenic invasion, i.e., high expression of certain genes [122], autophagy [118, 129], role of dendritic cells [84, 106], glycoconjugates [86, 87], and iron [32, 93] in activation/alteration of host immune system.

Host genes play an important role in the host’s immune response. Mutation of the β-catenin homolog bar-1 or the homeobox gene egl-5 of C. elegans has resulted in a defective response and hypersensitivity to Staphylococcus aureus [57]. The bar-1 and egl-5 genes function parallel to the immune response pathway taken up by C. elegans. Over-expression of egl-5 resulted in the modification of NF-κB-dependent TLR2 (Toll-like receptor 2) signaling in epithelial cells, suggesting the role played by these two genes in the immune defense of a host. Pro-16 in E-cadherin is responsible for host specificity towards the human pathogen Listeria monocytogenes [73]. E-cadherin of mouse, which is 85 % similar to E-cadherin of human, denies entry to the bacterial pathogen Listeria monocytogenes by not interacting with the bacterial surface protein internalin. If proline (Pro) at amino acid position 16 in human E-cadherin is replaced by glutamic acid (Glu), the interaction with internalin is disabled. However, in mouse, if Glu is substituted by Pro, the interaction with internalin is enabled. On Mtb interaction with mice, a group of 67 genes in an immuno-competent host has shown a higher level of expression than in the immuno-deficient host, often in 21 days. This shows that these 67 genes are responsible for the immunity of mice (host) [122].

Autophagy is another mechanism of the host’s defense against pathogens. Autophagy can be used in the elimination of Mtb [129]. LRG-47 initiates autophagy according to the study carried out by Singh et al. [118]. IRGM (immunity-related GTPase family M protein) also plays a role in autophagy and degradation of intracellular bacillary load.

Dendritic cells (DCs) play a vital role in the activation of the immune system on encountering a pathogen [106]. DCs are summoned to the lamina propria of the small intestine after bacterial infection. The number of DCs summoned depends on the pathogenicity of microorganisms confronted. Infection stimulates the release of a variety of soluble factors, including chemokines, which facilitate the summoning of DCs, and cytokines that are strong arbitrators of DC activation. Pathogens, viruses, and their components can activate DCs directly. One of the important characteristics of DCs is their ability to migrate. During some infections, this property may have a harmful as well as a favorable side. Relocation of pathogen-laden DCs from the periphery into lymph nodes leads to the activation of T cells. On the other hand, this contributes to the spread of infection within the host.

Glycoconjugates can alter the immune system of the human body. Immunomodulatory components of Mtb are phosphatidyl-myo-inositol (PMI), lipomannan (LM), and lipoarabinomannan (LAM). Apart from LM and LAM, mannose also contributes to the synthesis of multiple glycosylated proteins and also polymethylated polysaccharides in Mycobacteria [86]. These molecules are synthesized by both pathogenic and non-pathogenic species. Many of the genes involved in biosynthesis of these glycoconjugates are important for survival of Mycobacteria [109, 110]. Only serine-threonine kinases have been predicted to take part in the regulation process of Mycobacterial glycosyltransferases [3, 87]. The interaction of Mycobacteria with the pattern recognition receptors may be an influencing factor for the functioning of the inflammatory signals, hence determining the way in which the immune system reacts [3, 87].

Iron plays a crucial role in the secretion of cytokines and in the activity of the transcription factors, affecting the immune response [32, 93]. Iron homeostasis is controlled by immune cell-derived mediators and acute-phase proteins. An effective method of host defense is to restrict the supply of iron to the pathogens. Pathogens have evolved to utilize iron, as it is abundant in the host. The control of iron homeostasis is one of the main issues, as it can be controlled by the host or the pathogen for their benefit.

With such diverse mechanisms involved at each step of pathogen infection, predicting host–pathogen interactions is crucial. However, testing interactions among the huge number of host and pathogen proteins poses a practical experimental problem. Hence, many in silico prediction methods have been devised to abate such issues. They provide a primary screening of the possible interactions and a list of highly probable interactions, which can then be experimentally verified. In the following section, we have listed and described a few of these.

Methods for prediction of host–pathogen interactions

Predictions in the domain of host–pathogen interactions play a vital role in designing rational therapeutic measures, including drugs. Experimental procedures can be cumbersome, time-consuming, and expensive, and testing every possibility experimentally is rarely feasible. Prediction methods, aided by machine learning, can overcome such problems. They can first be used to predict a putative set of interactions that satisfies certain conditions. The predicted set can then be verified experimentally, which requires far less time and resources. The following subsections describe some of the widely used techniques for in silico prediction of host–pathogen interactions. One or more of these methods can be used for prediction of genes, proteins, factors, and pathways, among others, of both the host and the pathogen. Experimental- and data-related aspects of these techniques have been covered in Section “1”.

Biological reasoning based prediction of host–pathogen interactions

The most extensively explored way in which a pathogen interacts with its host is through PPIs. Pathogen proteins interact with host proteins to invade the host. Proteins of a pathogen can affect a host and its environment in multiple ways. They can bind directly to host protein(s) and affect downstream cascades of reactions, preventing normal function(s) of the host. They can compromise a host’s immunological defenses by misguiding and weakening them. They can even exploit the harsh, deteriorating anaerobic environment of an immunocompromised host. Hence, predicting the putative PPIs between a pathogen and its host(s) is of paramount importance. To predict whether a host protein can interact with a pathogen protein, or vice versa, the following categories of methods can be used.

Homology-based prediction

An interaction between a pair of proteins in one species is expected to be conserved in related species [79]. A prediction of host–pathogen PPIs between Homo sapiens (as host) and Plasmodium falciparum (as pathogen) [64] uses interaction templates together with human and P. falciparum genomic sequences to derive a probable set of PPIs. A homology detection algorithm, as shown in Fig. 3, is then applied to these PPIs to filter out non-homologous ones. The resulting set is passed through a filter of stage-specific expression data for P. falciparum and tissue-specific expression data for Homo sapiens, and is further filtered using predicted localization data. A study by Lee et al. [74] has considered orthologous pairs of genes from 18 different species to predict PPIs. Further analysis has shown that 81 genes are conserved in all 18 species, whereas 243 genes are missing in P. falciparum but found in the other 17 species. Hence, these 81 genes and their related PPIs are probably conserved.

Fig. 3 Homology-based predictions of host–pathogen interactions

Homology-based approaches to host–pathogen PPI prediction are widely used for their sheer simplicity and biological background support. Since the data needed for implementing the prediction are only the template PPIs and protein sequences, these approaches are adaptable and can be applied to multiple different host–pathogen systems.
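The core idea of homology-based transfer can be illustrated with a minimal sketch. It assumes that homology has already been decided elsewhere (for example by a sequence search) and is supplied as lookup tables; the function name `predict_interologs` and all protein identifiers are hypothetical, and the sketch is not the pipeline of any study cited here.

```python
from itertools import product


def predict_interologs(template_ppis, host_homologs, pathogen_homologs):
    """Transfer template interactions to host-pathogen pairs via homology.

    template_ppis: iterable of (a, b) protein pairs known to interact.
    host_homologs: dict mapping a template protein to the set of host
        proteins judged homologous to it (assumed to be precomputed).
    pathogen_homologs: same, for pathogen proteins.
    """
    predicted = set()
    for a, b in template_ppis:
        # Template pairs are unordered, so try both orientations.
        for x, y in ((a, b), (b, a)):
            hosts = host_homologs.get(x, set())
            pathogens = pathogen_homologs.get(y, set())
            predicted.update(product(hosts, pathogens))
    return predicted


# Toy, made-up identifiers (not real proteins).
templates = [("T1", "T2"), ("T3", "T4")]
host_h = {"T1": {"H_a"}, "T4": {"H_b"}}
path_h = {"T2": {"P_x", "P_y"}, "T3": {"P_z"}}
pairs = predict_interologs(templates, host_h, path_h)
print(sorted(pairs))
```

Because only template PPIs and homology assignments are needed, the same function applies unchanged to any host–pathogen system, which is the adaptability noted above.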

A similar homology-based argument applies to the molecular interaction between GBP (galactose-binding protein) and LPS (Gram-negative bacterial lipopolysaccharide). GBP from Carcinoscorpius rotundicauda acts as an anti-microbial defense [76]. Most importantly, GBP shares architectural and functional homology with human proteins. Therefore, some human proteins are likely to interact with LPS as well. Moreover, GBP has six Tectonin domains containing LPS-binding sites. GBP acts as a bridge between LPS and CRP (C-reactive protein) through GBP–LPS and GBP–CRP interactions, forming a stable pathogen recognition molecule. These interactions indicate that Tectonin domains can differentiate between host and pathogen proteins.

Homology-based approaches have their own set of weaknesses. During an actual infection, the two proteins in a predicted PPI may have a very low probability of being present together. Therefore, host–pathogen PPIs predicted purely on the basis of homology, without taking into consideration other biological properties of the proteins involved, may not be very dependable. Further information is needed to increase the accuracy of the prediction. An investigation by Wuchty [138] has assessed homology-predicted PPIs with a Random Forest classifier and then filtered the result according to expression and molecular characteristics. This has led to a high-confidence subset of proteins that are likely to interact.

Structure-based prediction

When a pair of proteins has structures similar to those of a known interacting pair, it is reasonable to expect that the former interact in a way similar to the latter. Accordingly, several investigations have used structural information to recognize the similarity between query proteins (i.e., proteins in the host and pathogen) and template PPIs (i.e., known interacting protein pairs), and have inferred that host–pathogen protein pairs matching some template PPI are likely to interact. The method is depicted in Fig. 4.

Fig. 4 Structure-based predictions of host–pathogen interactions

A computational method for prediction of PPIs representing host–pathogen interactions has been devised by Davis et al. [28]. Their method first scans the host and pathogen genomes for proteins with structural similarity to known protein complexes, and then assesses the probable interactions using the physical structures of the proteins. Finally, the result is filtered by tissue-specific expression data of host proteins and stage-specific expression data of pathogen proteins, leading to a set of protein pairs that have a high probability of interacting.
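The final expression-based filtering step can be sketched as follows. This is an illustration under a stated assumption, not the filter used in [28]: a candidate pair is kept only if the host protein is expressed in a tissue that the pathogen occupies during a life-cycle stage in which the pathogen protein is expressed. The function name, the `stage_tissues` lookup table, and all identifiers are hypothetical.

```python
def filter_by_expression(candidates, host_tissues, pathogen_stages, stage_tissues):
    """Keep candidate pairs whose proteins can plausibly meet.

    candidates: iterable of (host_protein, pathogen_protein) pairs.
    host_tissues: dict host protein -> set of tissues where it is expressed.
    pathogen_stages: dict pathogen protein -> set of life-cycle stages
        in which it is expressed.
    stage_tissues: dict stage -> set of host tissues the pathogen occupies
        during that stage (an assumed lookup table).
    """
    kept = []
    for host, pathogen in candidates:
        # Collect every tissue reachable by the pathogen protein.
        reachable = set()
        for stage in pathogen_stages.get(pathogen, ()):
            reachable |= stage_tissues.get(stage, set())
        # Keep the pair only if the host protein is expressed there.
        if host_tissues.get(host, set()) & reachable:
            kept.append((host, pathogen))
    return kept


# Toy, made-up data for illustration only.
stage_tissues = {"liver_stage": {"liver"}, "blood_stage": {"blood"}}
host_tissues = {"H1": {"liver"}, "H2": {"brain"}, "H3": {"blood", "liver"}}
pathogen_stages = {"P1": {"liver_stage"}, "P2": {"blood_stage"}}
candidates = [("H1", "P1"), ("H2", "P1"), ("H3", "P2"), ("H1", "P2")]
kept = filter_by_expression(candidates, host_tissues, pathogen_stages, stage_tissues)
print(kept)
```

Proteins without any expression annotation are dropped by this sketch; a less stringent variant could keep them and flag them as unverified instead.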

Mapping of PPIs between the dengue virus and its human and insect hosts has been carried out by Doolittle et al. [34]. They have predicted the interactions based on structural similarity between host and pathogen proteins, and have focused on predictions relevant to stress, the unfolded protein response, and interferon pathways. Another work by Doolittle et al. [33] has predicted PPIs between HIV-1 and Homo sapiens based on structural similarity, modeling a network of interactions between HIV-1 and human proteins. Structurally similar proteins from the host and HIV-1 have been retrieved, and the known interactions have been mapped onto this set. The resulting subset has then been screened using factors such as cellular co-localization and RNAi screens to obtain a more refined set with a higher probability of interaction between human and HIV-1 proteins.

Domain/motif interaction-based prediction

Here, the methodology for prediction of host–pathogen PPIs integrates known intra-species PPIs with protein domain profiles to predict PPIs between a host and a pathogen [37]. For a set of intra-species PPIs, the functional domains are identified for each interacting protein. For each pair of functional domains, Bayesian statistics is used to compute the probability that two proteins containing that pair of domains will interact. The method is shown in Fig. 5. It has been applied to the Homo sapiens-Plasmodium falciparum host–pathogen system, and has predicted 516 PPIs. Human proteins predicted to interact with the same Plasmodium protein are close to each other in the human PPI network, and Plasmodium proteins predicted to interact with the same human protein are co-expressed in DNA microarray datasets measured during various stages of the Plasmodium life cycle.
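A minimal sketch of the domain-pair idea is given below. It is not the exact formulation of [37]: it assumes a smoothed estimate (a Beta prior with pseudocounts `alpha` and `beta`) of the probability that two proteins carrying a given domain pair interact, and it assumes a noisy-OR rule for combining the domain pairs of a host–pathogen protein pair. All function names, domain labels, and protein identifiers are hypothetical.

```python
from itertools import product


def domain_pair_probs(proteins, interactions, alpha=1.0, beta=1.0):
    """Estimate P(interaction | domain pair) from intra-species data.

    proteins: dict protein -> set of domain names.
    interactions: set of frozenset({p, q}) for known interacting pairs.
    alpha, beta: pseudocounts of a Beta prior (an assumed smoothing choice).
    """
    names = sorted(proteins)
    seen, hits = {}, {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            interacting = frozenset((p, q)) in interactions
            # Unordered domain pairs present in this protein pair.
            dpairs = {frozenset((d, e)) for d, e in product(proteins[p], proteins[q])}
            for dp in dpairs:
                seen[dp] = seen.get(dp, 0) + 1
                if interacting:
                    hits[dp] = hits.get(dp, 0) + 1
    return {dp: (hits.get(dp, 0) + alpha) / (n + alpha + beta) for dp, n in seen.items()}


def interaction_score(host_domains, pathogen_domains, probs):
    """Noisy-OR combination over all domain pairs (an assumed rule)."""
    no_interaction = 1.0
    for d, e in product(host_domains, pathogen_domains):
        no_interaction *= 1.0 - probs.get(frozenset((d, e)), 0.0)
    return 1.0 - no_interaction


# Toy intra-species data with made-up names.
proteins = {"A": {"SH3"}, "B": {"PRM"}, "C": {"SH3"}, "D": {"PRM"}, "E": {"KIN"}}
interactions = {frozenset(("A", "B")), frozenset(("C", "D"))}
probs = domain_pair_probs(proteins, interactions)
score = interaction_score({"SH3", "KIN"}, {"PRM"}, probs)
print(round(score, 3))
```

Domain pairs never observed in the training data contribute a probability of zero here; in practice a prior value for unseen pairs would be a design choice to revisit.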

Fig. 5 Domain/motif-based prediction of host–pathogen interactions

Prediction of PPIs based on motifs conserved in HIV-1 has been performed by Evans et al. [43] and Bertoletti et al. [8]. The similarity between the binding motifs shared by virus and host proteins plays an important part in the crosstalk between virus and host. The study by Bertoletti et al. [8] has also highlighted the role of chemokines as a factor in liver inflammation.

Table 1 Summary of the machine learning-based tools used in the domain of host–pathogen interactions

Machine learning-based predictions of host–pathogen interactions

Machine learning-based prediction methods are extensively used for detecting host–pathogen interactions, as shown in Table 1. This table lists a few machine learning methods used for the prediction of various aspects of host–pathogen interactions in different species, along with the particular domain knowledge used. This sub-area of research is sometimes referred to as “pathogen informatics”. Supervised learning has been used for the prediction of PPIs in the host–pathogen domain by Tastan et al. [123]. The work has considered 35 features, including tissue distribution, gene expression profile, gene ontology, graph properties of the human interactome, sequence similarity, post-translational modification similarity to neighbor, and HIV-1 protein-type features, among others. The authors have then selected the top three and top six features that are most important for classifying the given dataset into interacting and non-interacting classes. A Random Forest classifier has been trained with this feature set, resulting in a MAP (mean average precision) of 23 %. From this computation, it has been concluded that graph and neighbor similarity features contribute to a better classification.
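Assuming that MAP here denotes mean average precision, the usual ranking metric for this kind of task, the following sketch shows how such a score is computed from ranked predictions. The function names and the toy rankings are illustrative and are not taken from [123].

```python
def average_precision(ranked_labels):
    """Average precision of one ranked list.

    ranked_labels: labels (1 = true interaction, 0 = not) ordered from the
    most to the least confident prediction.
    """
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over several ranked lists."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Two made-up rankings, e.g. one per pathogen protein or per fold.
runs = [[1, 0, 1, 0], [0, 1]]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

Because true interactions are rare among all candidate pairs, a ranking metric of this kind is more informative than plain accuracy, and modest absolute values are common.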

Prediction of proteins secreted by the Type III (T3) secretion system has been carried out by Arnold et al. [4]. The authors have examined the amino acid composition and the secondary structure of the N-terminus of 100 experimentally verified effector proteins, and used them for identification of the T3 secretion signal. They have used the Naive Bayes algorithm for classification. The training samples have been grouped according to their similarity, measured by the Smith–Waterman local alignment algorithm. The input feature set has included amino acid frequencies, amino acid properties, and short combinations of them. Finally, feature-selection strategies have been applied to identify the most important features and reduce computational complexity. In another prediction attempt, the authors have used features derived from secondary structure elements. They have used the PSIPRED software [82] to predict the structure, and have formulated the features of the input vector from the predicted structures.
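The simplest of these inputs, amino acid frequencies in the N-terminal region, can be computed as in the sketch below. The window length of 30 residues is an assumption made for illustration, as are the function name and the example sequence; the feature set actually used in [4] is richer.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that feature
# vectors from different proteins are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(sequence, n_residues=30):
    """Relative amino acid frequencies in the first n_residues residues.

    n_residues is an assumed window length, not a value from the source.
    """
    window = sequence[:n_residues].upper()
    counts = Counter(window)
    length = len(window) or 1  # avoid division by zero for empty input
    return [counts.get(aa, 0) / length for aa in AMINO_ACIDS]


# Made-up sequence for illustration only.
features = n_terminal_composition("MSSSLT", n_residues=4)
print(dict(zip(AMINO_ACIDS, features))["S"])
```

Each protein is thus mapped to a fixed-length vector of 20 values, which can be passed to a Naive Bayes or any other standard classifier.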

In another attempt to predict bacterial type III secreted (T3S) effectors, a distinct N-terminal position-specific amino acid composition feature has been found in more than 50 % of T3S proteins [132]. The Bi-profile Bayes method has been used in this work for feature extraction. The entire dataset, along with the new feature, has then been analyzed with a new SVM-based classifier, which has classified T3S and non-T3S proteins successfully.

In order to establish a relation between a host and multiple pathogens, Kshirsagar et al. [66] have developed a method that exploits the similarity in infections initiated by four different pathogens in the human host. The authors have used machine learning in the form of a multi-task classification framework. Host–bacteria PPIs have been used as the input to the multi-task classifier, which has then classified the PPIs into interacting and non-interacting classes. Based on the biological hypothesis that similar pathogens target the same critical biological processes in a host, the classifier has minimized the empirical error on the training set while favoring models that are consistent with this hypothesis. This preference has been incorporated into the classifier in the form of a regularizer.

A semi-supervised multi-task method has been used on the Homo sapiens-HIV-1 dataset [102] to predict host–pathogen PPIs. The method has involved both supervised and semi-supervised learning. The supervised classifier has worked on labeled PPI data. The semi-supervised classifier has shared network layers with the supervised classifier and has been trained with partially labeled PPIs. The two classifiers have been trained jointly as a multi-task framework so that weak positive labels could improve the supervised classification and the recognition of interacting pairs.

For prediction of PPIs between Homo sapiens and Plasmodium falciparum, a Random Forest classifier has assessed a set of PPIs and then filtered the result according to expression and molecular characteristics, leading to a subset of proteins that are likely to interact [138]. It has been observed here that the separate sets and the combined set of predicted and experimentally verified interactions share similar characteristics. In another investigation, Kshirsagar et al. [65] have tried to improve the supervised learning-based prediction of PPIs for Salmonella-human and Yersinia-human. This has been done by replacing the missing values of the dataset with values generated from cross-species information, along with a group lasso regularization technique (obtaining 77.6 % precision). To impute values, a localized nearest-neighbor approach has been used, with sequence similarity as the basis for computing locality.

Data mining also forms an integral part of machine learning. Retrieved data about host–pathogen interactions have in a few cases been represented in two different ways, i.e., feature-based (SVM) [126] and language-based [19]. The investigation by Chaussabel et al. [19] has used a hierarchical clustering algorithm, taking the available literature as input, to identify functionally and transcriptionally similar pairs of genes. Noise has been removed from the PPI databases by discarding PPIs that have a low probability of taking place. Each such PPI has been given a score, and the PPIs have then been hierarchically clustered to obtain their likelihood of occurrence. In this way, it has been found that out of 12,122 binary PPIs obtained from BioGRID, 7504 PPIs are less likely to take place.

Online repositories for host–pathogen interactions

Host–pathogen interactions data can be obtained from several databases and repositories. We have summarized some of these repositories in Table 2. Some of these databases are referred to purely for their data content, i.e., genome, proteome, and metabolic pathway data [133], virus–virus, host–virus, and host–host interaction networks [95], PPIs of hosts and pathogens [69], literature-based viral–human protein interactions [18], experimentally verified pathogenic, virulence and effector genes of fungal pathogens [136], human signaling and regulatory pathways [113], information on specific biodefense and public health pathogens [120], 3D viral proteins [116], information on invertebrate vectors of human pathogens [71], and a collection of genus-specific databases [6] among others. Some of these databases even have integrated in-house tools, i.e., BLAST interface [35] and browser [142] for host–pathogen interactions data analysis. Moreover, we have described some tools [44] used in analysis and visualization of these kinds of data.

Table 2 List of online repositories storing data related to host–pathogen interactions

The PAThosystems Resource Integration Center (PATRIC) [133] includes a relational database, analytical pipelines, and a website that supports querying, browsing, data visualization, and download of raw and curated data in standard formats. Currently, the database houses complete sequences for viral and bacterial genomes, hence providing an all-inclusive bioinformatics resource for pathogens.

The Pathogen Interaction Gateway (PIG) provides a text-based search and a BLAST interface for searching host–pathogen PPIs. Each entry in PIG incorporates information on the functional annotations and the domains present in the interacting proteins [35].

VirHostNet (Virus-Host Network) [51, 95] is a public knowledge base specialized in the management and analysis of integrated virus–virus, host–host, and virus–host interaction networks coupled with their functional annotations. VirHostNet contains data on virus–host and virus–virus interactions covering more than 180 distinct viral species. The VirHostNet Web interface provides tools for effective querying and visualization of infected cellular networks.

HPIDB (Host–Pathogen Interaction Database) [69] contains experimentally confirmed and predicted PPIs of hosts and pathogens.

GPS-Prot [44] is a software tool that permits users to easily create all-inclusive and integrated HIV–host networks. Its web-based format, which requires no software installation or data downloads, gives it an edge over other visualization tools. GPS-Prot enables users to quickly generate networks that combine both genetic and protein–protein interactions between HIV and its human host into a single representation.

VirusMINT [18] contains protein interactions between viral (papilloma viruses, HIV-1, Epstein–Barr, hepatitis B, hepatitis C, herpes, and Simian virus 40) and human proteins reported in the literature. VirusMINT presently stores interactions covering more than 490 unique viral proteins from more than 110 different viral strains.

PHIDIAS (Pathogen–Host Interaction Data Integration and Analysis System) [139] is a database and analysis system to curate, analyze, and address different scientific issues in the area of pathogen–host interactions (PHI, also called host–pathogen interactions or HPI).

MvirDB [142] integrates DNA and protein sequence information from multiple databases. Entries in MvirDB are hyperlinked back to their original sources. A BLAST tool enables the user to search against all DNA or protein sequences in MvirDB, and a browser tool enables the user to explore the database to retrieve virulence factor descriptions, sequences, and classifications, and to download sequences of interest.

PHI-base [136], a web-accessible database, currently catalogs experimentally verified virulence and effector genes from fungal and oomycete pathogens. These pathogens interact with animals, plants, and fungi as hosts.

PID [113] is a freely available collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes. PID offers a range of search features to facilitate pathway exploration.

BioHealthBase [120] is a public bioinformatics database and analysis resource for study of specific biodefense and public health pathogens like Francisella tularensis, Mycobacterium tuberculosis, Influenza virus, Microsporidia species and ricin toxin. It serves as a substantial integrated repository of data imported from public databases and data derived from various computational algorithms and information curated from the scientific literature. Its 3D visualization capacity allows researchers to view proteins with their key structural and functional features highlighted.

VPDB (Viral Protein Structural Database) [116] is an interactive database for three-dimensional viral proteins. It provides an all-inclusive resource, with an emphasis on the description of derived data from structural biology. At present, VPDB includes viral protein structures from more than 277 viruses with more than 465 virus strains.

VectorBase [71, 72, 85] is a web-accessible data repository storing information about invertebrate vectors of human pathogens. It annotates and maintains vector genomes, providing an integrated resource for the research community. It hosts data related to nine genomes, i.e., mosquitoes (three Anopheles gambiae genomes, Aedes aegypti, and Culex quinquefasciatus), body louse (Pediculus humanus), tick (Ixodes scapularis), tsetse fly (Glossina morsitans), and kissing bug (Rhodnius prolixus). The data span genomic features, expression data, population genetics, and ontologies.

EuPathDB [5, 6] is an integrated database covering the eukaryotic pathogens of the genera Giardia, Cryptosporidium, Neospora, Leishmania, Toxoplasma, Plasmodium, Trypanosoma, and Trichomonas. Each of these groups is supported by a taxon-specific database built upon the same infrastructure. The EuPathDB portal provides an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

Similarly, a number of other databases, such as PHISTO [125], ViPR [99], HoPaCI-DB [9], VFDB [20, 21, 140], EDWIP [97], and AquaPathogen X [41], are available to support research in the host–pathogen interactions domain.

Discussions and future scopes

In this section, we discuss multiple facets of host–pathogen interactions research, the shortcomings of the previously described methodologies as discussed in Sections “1” and “1”, and the future scopes associated with the aforesaid methodologies, taking both the host and pathogen points of view into account. We discuss the ways in which a pathogen can attack its host, the proteins secreted by a pathogen that perturb the normal functionality of the host, the genes responsible for such proteins, the gene silencing and hijacking mechanisms of pathogens, the inhibition of macrophage functions, and the genes and proteins that pathogens need for survival inside a host. From the host’s point of view, we also discuss the pathogen factors that activate the immune response. Salient features of the discussion are given in Table 3.

Table 3 Summary of host protection and pathogen-attacking mechanisms

The genes of multiple strains of an organism have been studied in several investigations [58, 81, 96] to understand the infection mechanisms of these strains in the host and to locate the differences between them. In order to survive in a host, a pathogen can either hijack host mechanisms [105] or use the existing environment [12]. There is still uncertainty about how general or strain-specific the interactions of different strains of a pathogen are. One study has suggested that different strains of the same pathogen have different methods of invasion [81]. In contrast, a counterexample has been provided in [26], which indicates that two strains of Mycobacterium have homologous genes required for Snm.

Influenza, DENV-2, and HIV have been in the limelight for identification of host factors. Other pathogens need to be taken into account as well. Inhibition of macrophages is a promising area of research in bioinformatics. The inhibition mechanism needs to be studied in more pathogens, beyond the most studied ones, to find similarities between the inhibition mechanisms of these organisms.

Machine learning-based prediction methods have been applied mainly to PPIs. However, protein–ligand interactions, and hence prediction of pathways (excluding signal transduction pathways) via machine learning methods, have not been attempted much. Different pathogens become drug resistant and form new pathways, and these newly formed pathways can perturb the existing host pathways in unknown ways. Machine learning algorithms for pathway prediction, which would mainly consider protein–ligand binding, are therefore needed. Reaction dynamics need to be known as well, since pathways are chains of reactions. Prediction of Type III secreted bacterial proteins by machine learning techniques is also a challenging task. A major drawback in the prediction of host–pathogen PPIs is the unavailability of datasets for different pathogens. Moreover, there is always the issue of biological validation of the predicted PPIs.

Some of the organism pairs studied for the exploration of host–pathogen PPIs are Homo sapiens-Plasmodium falciparum [37, 64, 74, 138], Homo sapiens-Dengue virus [34], and Homo sapiens-HIV-1 [8, 33, 43]. However, many more host–pathogen pairs still await such studies. In addition, homology-based approaches have their own inherent weaknesses. In a real scenario, two proteins in a predicted PPI may have little opportunity to be close enough to interact with each other. Therefore, host–pathogen PPIs predicted entirely on the basis of homology, without considering other biological characteristics of the proteins involved, may not be reliable. Additional information must be used to increase the accuracy of the prediction and make the predictions biologically sound. Keeping this in mind, the study by Wuchty [138] has filtered the homology-based predicted PPIs using gene expression and molecular characteristics. This has led to a concrete set of PPIs closer to the biological scenario. The prediction of PPIs by comparative modeling [28] uses very stringent filters, leading to a smaller and more robust set of PPIs.

Supervised, unsupervised, and semi-supervised learning have mostly been used for prediction of host–pathogen PPIs. The organisms for which these predictions have been made are mainly Homo sapiens-HIV-1 [102, 123], Homo sapiens-Plasmodium falciparum [138], and Homo sapiens-Saccharomyces cerevisiae [25]. Both Tastan et al. [123] and Qi et al. [102] have applied their respective algorithms to the same dataset, which restricts the contribution of the articles. The performance of the Random Forest-based classifier is only marginally better than that of the Multi-Layer Perceptron classifier [102]. Tastan et al. [123] have selected the top six and top three features among 35 features to predict whether a pair of proteins interacts. This is not an ideal way of prediction, since the interaction between proteins depends on all of their features, even if only by a negligible amount for some, and these should not be ignored.

A flaw is often noticed in the choice of a dataset. In a semi-supervised learning approach to identify PPIs [102], the negative dataset is far larger than the positive one. The negative (non-interacting) dataset has approximately 16,000 pairs of proteins, while the experimentally verified positive (interacting) dataset has only 158 pairs of proteins. Training with such a dataset might lead to a biased classifier that is inclined to predict most test pairs as non-interacting. Moreover, the non-interacting dataset is selected as a random list of pairs of proteins that do not fall into the positive set. This is always a risk, since there is no experimental evidence that the selected negative pairs do not interact at all. There may be several interacting pairs among the negative set. Another study has predicted proteins secreted by a Type III secretion system based only on structural and compositional aspects of the proteins [4]. Such studies should include other factors like expression and molecular characteristics.
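The effect of this imbalance can be made concrete with a short calculation. The sketch below uses the counts quoted above (158 positive and approximately 16,000 negative pairs) to compute the accuracy of a trivial classifier that always predicts the majority class, together with inverse-frequency class weights, which are one common remedy. The weighting scheme and the function names are assumptions for illustration and are not part of [102].

```python
def majority_baseline_accuracy(n_pos, n_neg):
    """Accuracy obtained by always predicting the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


def balanced_class_weights(n_pos, n_neg):
    """Inverse-frequency weights so both classes carry equal total weight."""
    total = n_pos + n_neg
    return {"positive": total / (2 * n_pos), "negative": total / (2 * n_neg)}


n_pos, n_neg = 158, 16000  # counts quoted in the text (negatives approximate)
baseline = majority_baseline_accuracy(n_pos, n_neg)
weights = balanced_class_weights(n_pos, n_neg)
print(round(baseline, 4))
print({k: round(v, 3) for k, v in weights.items()})
```

A classifier that labels every pair as non-interacting would thus already be about 99 % accurate while finding no interaction at all, which is why accuracy alone is a misleading measure on such data.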

Notably, few attempts have been made on metabolic pathways. For host–pathogen interactions, most of the work has been done with signal transduction pathways. If enzymes from a pathogen are introduced into a host, they may become involved in more than one host pathway. There is no tool available that would take a list of protein (enzyme) names and return the one pathway built predominantly (at least 90 %) from those enzymes. Moreover, a pathogen can be associated with more than one disease. Such diseases, for which a pathogen is responsible, need to be looked into. The scenario becomes more complex when a host suffers from two or more diseases simultaneously, which implies the presence of multiple pathogens responsible for multiple diseases in a host in real time. Such real-time simulation studies are hardly done.

An important aspect to consider is that some pathogenic proteins prevent macrophages from functioning. This is a serious problem in the host–pathogen domain. Drugs are needed that would restore the normal functioning of macrophages. Drugs are also needed to prevent the formation of the intracytoplasmic vesicles that HIV-1 uses [22] to prevent identification by macrophages. Formation of biofilms [89, 121] is another area that needs to be looked into. Breaking the biofilms formed by pathogens is recommended to avoid the spread of infection. More attention is needed in this domain, given the rate at which new infectious pathogens are emerging and the varying severity of the infections they cause.

Hardly any research has been done on automated image processing-based techniques for predicting host–pathogen interactions. A study by Mech et al. [83] has proposed a technique for more robust analysis of microscopy images of macrophages co-incubated with different A. fumigatus strains. Usually, such images are analyzed manually, which is both time-consuming and error-prone. The authors have used a feature set that includes size, shape, number of cells, and cell–cell contacts. By analyzing the images, it has been found that different mutants of A. fumigatus have an impact on the ability of the macrophages to adhere to and phagocytose the conidia. It has been observed that the rate of phagocytosis is higher for pksP mutants of A. fumigatus than for the other strains.

Conclusions

In this review, we have covered various aspects of host–pathogen interactions. Interaction of a pathogen with its host(s) is always a unique mechanism. Each pathogenic species has specific mechanism(s) for interacting with its host. The different mechanisms of a number of species have been included in this review, along with the similarities and shared factors in the attacking mechanism(s) of pathogens. The review has presented a brief history and introduction of the host–pathogen interactions research field, followed by a classification of host–pathogen interactions based on gene(s), protein(s), host factor(s), involved pathway(s), and inhibition mechanism of macrophage(s). It has listed prediction methods used in the host–pathogen interactions domain based on biological reasoning (homology, structure, and motif interaction), machine learning (unsupervised, semi-supervised, and supervised), and sometimes a combination of both. Various data sources used for research in this domain have also been listed. The review ends with a general discussion of the topic and future scopes, followed by this conclusion. The field of host–pathogen interactions is emerging as a crucial area of infectious disease research in the post-genomic era. It is a growing research field in which new discoveries are announced almost every day around the globe. Understanding the dynamics of host–pathogen interactions will facilitate the development of new drugs and new therapies for different diseases.