1 Introduction

Mycobacterium tuberculosis (Mtb) is an extremely successful pathogen owing to its ability to persist and to latently infect more than one-third of the world’s population [1, 2]. Annually, there are approximately eight million new cases of tuberculosis (TB) and two million deaths worldwide. The increase in multidrug-resistant (MDR), extensively drug-resistant (XDR), and super XDR Mtb strains, together with the synergy between TB and HIV infection, is a frightening development [2, 3] that poses significant problems for the treatment and control of TB.

Genome-scale molecular networks such as protein interaction and gene regulatory pathways are taking center stage in the emerging disciplines of systems biology and biocomplexity. As a result, an important challenge for TB investigators in the postgenomic era is to integrate functional strategies such as allelic replacement techniques [3–5], signature-tagged mutagenesis [6, 7], in vivo expression technology [8, 9], proteomics [10, 11], DNA microarrays [12–16], deep-genome sequencing strategies [17, 18], and protein–protein interaction (PPI) approaches [19, 20] to study the molecular mechanisms of Mtb virulence.

To fulfill their biological functions, most proteins act in association with protein partners or as part of large molecular assemblies. Not surprisingly, virulence pathways are also mediated by molecular connections that require PPIs. The rationale for studying PPIs in bacterial pathogens such as Mtb is threefold. First, in dissecting these pathways, it has been established that physical association between a protein of unknown function and a known protein often indicates that the former has a function related to that of the latter. This “guilt by association” principle has led to the functional annotation of numerous proteins of unknown function. Because more than 1,000 microbial genomes have been sequenced over the past decade, the focus on genes of unknown function is expected to increase further, as it is these genes that make a particular microbe unique. Second, most proteins in PPI networks associate with multiple interacting partners, suggesting that they fulfill multiple functions. Third, elucidation of PPIs can rapidly provide detailed mechanistic information about a specific biological question. The approaches mentioned above usually aim to identify new drug targets or to achieve a better understanding of the mechanistic basis of Mtb virulence.
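
The “guilt by association” principle can be expressed as a simple neighbor-voting rule: an uncharacterized protein is provisionally assigned the functional category that occurs most often among its annotated interaction partners. The Python sketch below assumes this majority-vote rule with an alphabetical tie-break; the protein names and categories are hypothetical placeholders rather than Mtb data, and published annotation methods are often more elaborate (for example, weighting partners by interaction confidence).

```python
from collections import Counter


def predict_function(protein, interactions, annotations):
    """Guilt-by-association sketch: majority vote over annotated partners.

    interactions: iterable of (protein_a, protein_b) pairs, treated as undirected.
    annotations: dict mapping an annotated protein to its functional category.
    Returns the most frequent category among the partners (ties broken
    alphabetically), or None if no partner is annotated.
    """
    partners = set()
    for a, b in interactions:
        if a == protein and b != protein:
            partners.add(b)
        elif b == protein and a != protein:
            partners.add(a)

    votes = Counter(annotations[p] for p in partners if p in annotations)
    if not votes:
        return None
    top = max(votes.values())
    return min(category for category, n in votes.items() if n == top)


# Hypothetical toy data: "X" is a protein of unknown function, "Z" has no partners.
INTERACTIONS = [("X", "A"), ("B", "X"), ("X", "C"), ("A", "D")]
ANNOTATIONS = {"A": "secretion", "B": "secretion", "C": "lipid metabolism"}

print(predict_function("X", INTERACTIONS, ANNOTATIONS))  # secretion
print(predict_function("Z", INTERACTIONS, ANNOTATIONS))  # None
```

Such a prediction is only a hypothesis; as the text stresses, it would still need to be tested experimentally.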

While substantial efforts have focused on predicting protein–protein association by in silico analysis using phylogenetic profiles [21], domain fusion [22], and gene clustering methods [23, 24], these analyses must be supported by biological experimentation. Not surprisingly, the large number of PPI studies over the past 20 years has generated many protein interaction databases, such as HPRD, DRP, MIPS, STRING, and BIND [25].
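
Of the in silico approaches mentioned above, phylogenetic profiling is the simplest to illustrate: each protein is represented by a presence (1)/absence (0) vector across a set of sequenced genomes, and proteins with identical or nearly identical profiles are predicted to be functionally linked. The Python sketch below is a minimal version of that idea, assuming a plain Hamming-distance cutoff; the gene names and profiles are hypothetical, and any predicted pair would still require experimental support.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(p, q))


def linked_pairs(profiles, max_distance=0):
    """Return gene pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(profiles.items()), 2):
        if hamming(pa, pb) <= max_distance:
            pairs.append((a, b))
    return pairs


# Hypothetical presence (1)/absence (0) profiles across six genomes.
PROFILES = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (1, 0, 1, 1, 1, 0),
    "geneD": (1, 1, 0, 1, 1, 1),
}

print(linked_pairs(PROFILES))                  # [('geneA', 'geneB')]
print(linked_pairs(PROFILES, max_distance=1))  # adds ('geneA', 'geneD') and ('geneB', 'geneD')
```

Relaxing the cutoff recovers more candidate links at the cost of more false predictions, which is one reason such analyses must be backed by experiments.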

Mtb is a genetically intractable microbe and there is an urgent need to develop effective genome-wide tools to study protein–protein association in mycobacterial cells. By characterizing PPIs on a genomic scale it will be possible to assemble physiologically relevant protein pathways in mycobacteria, the outcome of which will be invaluable for determining the function of previously uncharacterized proteins and virulence mechanisms.

Thus far, despite the development of bacterial systems (BacterioMatch [BM] and the bacterial adenylate cyclase two-hybrid system [BACTH]) and the mammalian two-hybrid system (M2H), Saccharomyces cerevisiae remains the most exploited surrogate host and represents the current standard. The first large-scale yeast two-hybrid (Y2H) interaction network was generated for the Escherichia coli bacteriophage T7 [26] and was rapidly followed by whole-genome analyses of S. cerevisiae [27–30], Drosophila [31, 32], Arabidopsis [33], and C. elegans [34]. These studies predicted the functions of a multitude of proteins and revealed numerous novel interactions, thereby allowing investigators to link biological functions into larger cellular processes.

In this review, we provide an overview of the different PPI techniques that have been successfully exploited to study Mtb. We discuss different mycobacterial PPI technologies, how they could be exploited for the discovery of new antimycobacterial drugs, potential pitfalls of PPI technologies, and in silico methods for predicting PPIs.

2 Microbial PPI Systems

2.1 The Y2H System

In the original Y2H assay, a bait protein is fused to the GAL4 DNA-binding domain (DNA-BD), and a library of prey proteins is expressed as fusions to the GAL4 activation domain (AD) [35] (Fig. 5.1a). When the “bait” protein interacts with a “prey” protein from the library, the DNA-BD and AD are brought into proximity and activate transcription of several reporter genes (e.g., ADE2, HIS3, MEL1, and AUR1). The Y2H system is an effective tool to identify novel protein interactions, confirm putative interactions, and define interacting domains and residues. Following the development of the original Y2H system, the reverse Y2H [36] and yeast three-hybrid (Y3H) [37] systems were developed.

Fig. 5.1

Conceptual basis of PPI methods used to study protein function in mycobacteria and other pathogens. In the (a) Y2H, (b) BM, (c) BACTH, and (d) M-PFC systems, the two interacting proteins (bait [B] and prey [P]) are independently fused either to a transcriptional activation domain (AD; e.g., Gal4-AD, α-subunit of RNAP) and a DNA-BD (e.g., Gal4-BD, λcI) (a, b), or to two enzymatic subunits (c, d) that reconstitute enzymatic activity (e.g., AC or hDHFR). In the case of rapamycin (RAP)-inducible M-PFC (e), however, rapamycin functions as a bridge that forces FKBP12 and FRB to “interact,” thereby functionally reconstituting the reporter system consisting of F-[1,2] and F-[3] to generate trimethoprim-resistant (TrimR) mycobacterial clones. Note that in (e), F-[1,2] and F-[3] can be replaced by any two proteins that the investigator wishes to force to interact. Although the above systems examine bimolecular protein interactions, (a), (b), and (d) have been modified to examine trimolecular protein interactions (see text for details). UAS upstream activating sequence, cAMP-CAP prom cAMP-CAP promoter, λcI oper λcI operator, T18 and T25 adenylate cyclase enzymatic domains

2.2 The E. coli BacterioMatch System

Similar to the Y2H system, the BM two-hybrid system is designed to examine PPIs between a pair of proteins cloned into separate “bait” and “prey” vectors. The bait protein is fused to the full-length bacteriophage λ repressor protein (λcI), which contains the amino-terminal DNA-binding domain and the carboxyl-terminal dimerization domain (Fig. 5.1b). When produced inside the cell, the bait fusion is tethered to the operator sequence upstream of the reporter promoter through the DNA-BD of λcI [38]. The target or prey protein is fused to the N-terminal domain of the α-subunit of RNA polymerase (RNAP). When the bait and prey proteins associate, they recruit and stabilize binding of RNAP at the promoter and activate transcription of the HIS3 and aadA (which confers streptomycin resistance) reporter genes [39].

Very recently, the BM system was modified to study ternary mycobacterial protein complexes in E. coli [40]. Using this three-hybrid system, it was demonstrated that the interaction between CFP-10 and Rv3871 was strengthened in the presence of Esat-6. Lastly, the BM system was also used to examine PPIs among Mtb proteins, and approximately 8,000 novel interactions were discovered [41]. Notably, validation of PPIs using overexpression and surface plasmon resonance analyses demonstrated a success rate of approximately 60 %. Important findings include a link between the Mtb ESX-1 and ESX-5 protein secretion systems, and the observation that the Fe–S cluster proteins WhiB3 and WhiB7 are highly connected [41].

2.3 The Bacterial Adenylate Cyclase Two-Hybrid System

In the BACTH system, proteins of interest are fused to two fragments of the catalytic domain of the Bordetella pertussis adenylate cyclase (AC) and co-expressed in an E. coli ∆cya strain (Fig. 5.1c). Interaction of the two proteins results in functional complementation between the two AC fragments, leading to cAMP synthesis and subsequent activation of catabolic operons [42] or expression of the lacZ gene. Using BACTH, it was demonstrated that Mycobacterium smegmatis (Msm) MsPpm2 interacts with MsPpm1 to stabilize the synthase MsPpm1 in the bacterial membrane [43]. The BACTH system was also successfully used to examine the interaction between Mtb ClpX and FtsZ [44].

2.4 Protein Fragment Complementation

Recently, a different experimental system, termed protein fragment complementation (PFC), was shown to be highly effective in studying PPIs in a variety of organisms [45–48]. In PFC, a reporter enzyme is rationally dissected into two fragments, each fused to one of two interacting proteins. Interaction between the two proteins results in refolding of the two fragments and reconstitution of enzyme activity. Since nuclear translocation of the interacting proteins is not required, membrane proteins can also be analyzed.

For example, using human dihydrofolate reductase (hDHFR), any two proteins (X and Y) thought to interact are fused to two rationally dissected DHFR fragments called F-[1,2] and F-[3]. Interaction-driven reassembly of X-F-[1,2] and Y-F-[3] into active hDHFR can be monitored in vivo by cell survival under methotrexate selection, by fluorescence detection of fluorescein-conjugated methotrexate binding to reconstituted hDHFR, or by trimethoprim (Trim) resistance (Fig. 5.1d). hDHFR is a small 21-kDa monomeric protein composed of three structural fragments (F-[1], F-[2], and F-[3]) that form two domains: an adenine-binding domain (F-[2]) and a discontinuous domain (F-[1] and F-[3]). It has previously been shown that disruption of the disordered loop at the junction between F-[2] and F-[3] has no significant effect on activity [49]. This property was exploited to develop a eukaryotic DHFR PFC system to analyze reassembly of murine dihydrofolate reductase (mDHFR) fragments [47, 50, 51]. Using eukaryotic DHFR PFC, 148 combinations of 35 different PPIs in the RTK/FRAP signal transduction pathway were studied, with no false-positive interactions observed among the pairs tested [47]. Importantly, the DHFR PFC system (albeit eukaryotic) is the only system that could validate interactions through pharmacological perturbation, even when the site of action of the perturbant is distant from the interaction studied [47].

The concept of PFC using hDHFR fragments was successfully exploited to develop a mycobacterial PFC system, termed M-PFC [52] (Fig. 5.1d). Using this system, interactions between the two-component proteins DevS and DevR, and between KdpD and KdpE, were demonstrated. In addition, several previously unidentified proteins were shown to interact with Mtb Cfp-10. Notably, protein complexes were identified that form only in mycobacteria and not in the Y2H system [52]. It is likely that many interacting Mtb proteins require the mycobacterial cytoplasmic environment to associate, which is an important consideration in a PPI screen. In an independent study, M-PFC identified a strong interaction between Pup and the proteasome substrate FabD (malonyl coenzyme A:acyl carrier protein transacylase), whereas this interaction was not detected using E. coli as a surrogate host [53]. M-PFC was also successful in demonstrating interaction between an essential DNA-binding protein (IdeR) and the enzymatic complex LeuC/LeuD [54]. Lastly, M-PFC was used to demonstrate interaction between Mtb ClpX and FtsZ [44].

The split-Trp system is another PFC assay; it monitors enzymatic reconstitution of tryptophan biosynthesis in a tryptophan auxotrophic microbe. This system was originally developed in S. cerevisiae [55] and was shown to be effective for examining the Mtb Esat-6/CFP-10 complex, RegX3 homodimerization, and self-association of Rv3782 (galactosyl transferase), as well as the interaction of the coiled-coil peptides C1 and C2 [56].

3 Shared Properties of Microbial Interaction Systems

The bacterial and Y2H systems (or variations thereof) have several properties in common that can profoundly affect postscreen analyses. For example, all systems have relatively strong promoters (Y2H: ADH1 or GAL10; M2H: CMV; BM: lac-UV5; BACTH: lac-UV5; M-PFC: hsp60), and all PPI systems are based on fusion technologies (Y2H: GAL4 AD and BD, or LexA; M2H: JAK and GP130; BM: α-subunit of RNAP and λcI; BACTH: AC; M-PFC: DHFR). Lastly, all systems rely on unique peptide detection tags (HA, c-Myc, FLAG, His, GP120, etc., or the reporter domains themselves) to enable specific detection.

Two approaches are most widely used to validate protein interactions. In the first, a detection tag (e.g., GST) is fused to the “bait” protein, the “prey” protein is transcribed/translated in vitro, and the mixtures are incubated, followed by assessment of binding/elution of the labeled prey protein. In the second, in vivo co-affinity purification, one protein is tagged and overexpressed in E. coli (or the native host), followed by a pull-down of the prey from the extract.

Important differences between the BM and PFC systems are that: (1) in PFC, protein interaction does not need to take place near the transcription machinery, (2) PFC is better suited for studying interactions among membrane proteins, (3) PFC requires no other host-specific processes or enzymes, (4) the structure of DHFR is known, allowing control over the way interactions can occur, and (5) PFC can be employed in the native host rather than in surrogate hosts such as yeast or E. coli, so that protein interactions are determined where they normally function, in the context of other native proteins.

4 Is Yeast the Optimal Host for Studying Mycobacterial PPIs?

As described in the sections below, the Y2H system has been used successfully to study Mtb biology and pathogenesis. While in some cases it might be beneficial to use yeast as a surrogate host, the Y2H system does have certain limitations. For example, (1) protein interactions occur in the nucleus, (2) membrane proteins are not fully compatible with the conventional Y2H system, (3) bacterial proteins do not undergo appropriate post-translational modification, (4) self-activation of bait proteins can occur, and (5) high G+C DNA is sometimes not well tolerated in the Y2H system. A well-known class of Y2H false positives consists of “anti-sense” clones: library clones carrying DNA fragments in the anti-sense orientation that, when translated, produce a nonphysiological peptide that associates with the bait protein. False positives are inherently present in all large-scale Y2H screens and are extensively documented in the literature.

5 Specificity and False Positives of PPI Technologies

In large-scale PPI studies, technical and biological false positives are typically distinguished. Technical false positives, which result from experimental error, can be avoided. However, to eliminate biological false positives (i.e., interactions that are genuinely observed in more than one assay but do not occur in vivo) and to increase the verification rate, the following factors are taken into consideration during large-scale protein interaction screens: (1) overlapping (interacting) clones increase the confidence score [28, 31, 57], (2) literature-curated interactions increase the confidence score [31, 57], (3) membrane proteins are underrepresented and negatively affect the confidence score [31], (4) post-translational modifications (e.g., phosphorylation) may be required for many interactions, (5) verification with other independent techniques increases the confidence score [27], (6) “masking” of bait or prey proteins and “self-activation” affect the screens [28, 31], (7) logistic regression models can be used to estimate the probability that an interaction is genuine [31], and (8) most studies validate interactions using detection tags or reporter fusions in “pull-down assays” [57].
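
Several of the factors listed above can be combined into a single confidence score, in the spirit of factor (7). The Python sketch below only shows the general shape of such a logistic model: the feature set and the weights are hypothetical, chosen for illustration, whereas in a real screen the weights would be fitted against reference sets of known true and false interactions. Factors such as post-translational modification requirements or membrane localization could be added as further features.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A bait-prey pair from a screen, described by hypothetical evidence features."""
    bait: str
    prey: str
    overlapping_clones: int        # independent prey clones supporting the pair, cf. factor (1)
    literature_curated: bool       # already reported in curated sources, cf. factor (2)
    independently_verified: bool   # confirmed by an orthogonal assay, cf. factor (5)
    self_activating_bait: bool     # bait activates reporters on its own, cf. factor (6)


# Hypothetical weights for illustration only (not fitted to any real dataset).
WEIGHTS = {
    "intercept": -2.0,
    "overlapping_clones": 0.8,
    "literature_curated": 1.5,
    "independently_verified": 2.0,
    "self_activating_bait": -2.5,
}


def confidence(c: Candidate) -> float:
    """Logistic score in (0, 1): 1 / (1 + exp(-z)) for a linear predictor z."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["overlapping_clones"] * c.overlapping_clones
         + WEIGHTS["literature_curated"] * float(c.literature_curated)
         + WEIGHTS["independently_verified"] * float(c.independently_verified)
         + WEIGHTS["self_activating_bait"] * float(c.self_activating_bait))
    return 1.0 / (1.0 + math.exp(-z))


strong = Candidate("bait1", "prey1", overlapping_clones=3, literature_curated=False,
                   independently_verified=True, self_activating_bait=False)
weak = Candidate("bait2", "prey2", overlapping_clones=1, literature_curated=False,
                 independently_verified=False, self_activating_bait=True)

print(f"{confidence(strong):.3f}")  # 0.917
print(f"{confidence(weak):.3f}")    # 0.024
```

A logistic form is convenient because each factor shifts the log-odds additively, so positive evidence (overlapping clones, independent verification) and negative evidence (self-activation) can be weighed against each other on one scale.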

6 How Can PPI Technologies Help Us Understand Mtb Virulence?

PPI technologies are flexible approaches that allow investigators to address previously unanswered questions. This is important to the mycobacteriology field, as Mtb is a genetically intractable microbe for which few novel tools to determine virulence mechanisms are available. A widely cited rationale for exploiting PPI technologies in microbes is to ascribe function to genes of unknown function (e.g., genes that are unique to the organism). It can be speculated that these genes distinguish the particular species from all other species and play a unique role in the biology of the microbe. Other areas in which PPI technologies can play an important role include the identification and dissection of virulence pathways, linking virulence pathways with each other, and examining the components of signaling cascades and drug resistance pathways. Particularly relevant to the study of Mtb is the effect on protein–protein association of in vivo environmental conditions implicated in Mtb persistence (e.g., temperature, pH, NO, superoxide). Other important areas include the effect of post-translational modifications on Mtb PPIs, screening for drugs that disrupt PPIs, and construction of a complete protein linkage map of the Mtb proteome.

7 Impact of the Y2H System on Mtb Research

Over the past decade, PPI technologies have filled an important void in the mycobacterial field and opened up new avenues of TB research. The original discovery of Mtb WhiB3 in 2002 using the Y2H system [19] is a particularly good example of how PPI technologies can advance a particular research area.

7.1 Mtb WhiB3

It was previously established that a single point mutation in region 4.2 of the principal σ factor gene rpoV causes loss of virulence in Mycobacterium bovis (Mbov), a member of the Mtb complex [58]. This mutation, which results in an Arg515-His change, was originally suggested to alter recognition of the –35 promoter region and thereby abolish or alter expression of a gene or subset of genes essential for virulence. Alternatively, it was hypothesized that this mutation might alter the interaction of RpoV with a transcription factor responsible for regulating the expression of one or more genes involved in virulence. An abundance of data have shown that mutations in, or close to, the helix-turn-helix motif in region 4.2 of bacterial σ70-type sigma factors have either positive or negative effects on activation by transcription factors. It was therefore hypothesized that the 4.2 domain of Mtb SigA interacts with a regulatory protein that controls a subset of genes involved in virulence. To screen for proteins that interact specifically with the 4.2 region of SigA, in which the Arg515-His mutation is located, a sigA DNA fragment spanning region 4.2 was screened against an Mtb library using the Y2H system. Several clones contained in-frame fusions with the full-length open reading frame of Mtb whiB3 (Rv3416) [19]. Consistent with the initial hypothesis that the Arg515-His mutation abolishes or reduces the interaction of an unknown transcription factor with region 4.2 of SigA, SigA(R515H) did not interact with WhiB3, suggesting that this single substitution abolishes the interaction of WhiB3 with SigA. Knock-out studies showed that the Mtb whiB3 mutant behaved identically to the wild-type strain with respect to its ability to replicate in mice but was attenuated in terms of host survival. In addition, infection with the whiB3 mutant strain caused much less lung pathology than infection with the wild type [19].
Intriguingly, a whiB3 mutant of virulent Mbov was completely impaired for growth in guinea pigs. These mutants define a new class (“path”; pathology) of virulence genes in Mtb and Mbov. Notably, this virulence gene would not have been detected using conventional screens such as signature-tagged mutagenesis, which primarily identify mutants defective in growth rather than in virulence. These findings led to the identification of WhiB3 as a 4Fe–4S cluster protein that reacts with NO and O2 [59] and is implicated in the metabolic switch from glucose to fatty acids as carbon source. Mtb WhiB3 was also shown to regulate virulence lipid production, to function as an intracellular redox sensor [60], and to prevent the bacillus from experiencing reductive stress during infection of macrophages [61] (for a recent review on Mtb WhiB3, see [62]). The above findings illustrate the power of PPI technologies to study virulence mechanisms in a genetically intractable pathogen.

7.2 Secretion

Mtb Esat-6 and Cfp-10 are important secreted antigens that are part of the ESX-1 secretion system, which delivers virulence proteins during infection of host cells [63]. These small proteins interact strongly with each other as well as with several other Mtb proteins. In recent years, the Y2H system has been particularly effective in mapping Mtb ESX PPIs [64] and in identifying and characterizing the individual components of the ESX-1 secretion system [64–68], leading to new testable hypotheses. A substantial advance in our understanding of Mtb protein secretion was the discovery, using the Y2H system, of a C-terminal signal sequence in Cfp-10. This signal sequence was shown to be necessary for targeting Cfp-10 and Esat-6 for secretion. Moreover, the seven-amino-acid C-terminal signal sequence was sufficient to target unrelated proteins, such as ubiquitin, for secretion [65].

7.3 Mtb Two-Component Signaling Proteins and Sigma Factors

Two-component signal transduction pathways typically comprise a membrane-bound histidine kinase and its cognate cytoplasmic response regulator. In response to a signal, the histidine kinase autophosphorylates at a conserved histidine residue, and the phosphate group is subsequently transferred to a conserved aspartate residue of the response regulator. Even though these interactions are likely transient, the Y2H system was effective in examining them. Interactions among different domains of Mtb HK1 (Rv0600c), HK2 (Rv0601c), and TcrA (Rv0602c) were examined using the Y2H system [69]. HK2, but not HK1 or TcrA, self-interacted, and HK2 interacted with both HK1 and TcrA. Lastly, the conserved receiver domain of TcrA was shown to interact with HK2 but not with HK1 [69]. In another study, the Y2H system was used to identify proteins that interact with the sensing domain of the Mtb histidine kinase KdpD. Two membrane lipoproteins, LprJ and LprF, were identified that specifically associated with KdpD [20].

Mtb contains 13 sigma factors, which can associate with one or more components of RNAP (RpoB, RpoC, α-subunit) under distinct environmental conditions. In addition, anti-anti-sigma factors can interact with anti-sigma factors (e.g., RsbW) or with sigma factors. In extensive Y2H studies, it was shown that most anti-sigma factor antagonists interact with RsbW, SigF, or both [70]. In a separate study, it was shown that SigK positively regulates expression of the antigenic proteins MPB70 and MPB83 [71]. High-level expression of sigK was associated with a mutated Rv0444c, and Y2H analysis demonstrated that the N-terminal region of Rv0444c interacted with SigK. The authors concluded that Rv0444c functions as a regulator of SigK (RskA) that modulates MPT70/MPT83 expression [71]. As described earlier, the principal sigma factor SigA was used in a Y2H screen to identify the virulence factor WhiB3 [19].

7.4 DNA Repair

The Y2H system has been effectively exploited to study DNA damage and repair in Mtb [72–75]. For example, in a genome-wide screen, UvrD1 was identified as a novel interacting partner of Ku, suggesting potential cross-talk between components of the nonhomologous end-joining and nucleotide excision repair pathways [74]. In another study that examined the role of Mtb DinB homologs in DNA damage, Y2H analyses showed that DinB1, but not DinB2, interacts with the mycobacterial β clamp, which is consistent with its C-terminal β clamp-binding motif [73]. In a related study, Y2H analysis showed that ImuB interacts with ImuA′ and DnaE2 as well as with the β clamp [75].

7.5 Other

The Y2H system has also been used to identify and characterize interacting partners of Mtb WhiB1, an iron–sulfur cluster protein [76], the Mtb SUF machinery [77, 78], components of FASII (KasA, KasB, mtFabH, InhA, and MabA) [79, 80], the ABC transporter Rv1747 [81], resuscitation-promoting factors (Rpfs) [82], a GTP-binding protein (Obg) [83], and VapBC toxin–antitoxin modules [84].

8 Protein–Protein Interaction in Other Pathogens

PPI networks of bacteria have not yet reached the same comprehensive level as their yeast counterparts. An exception is the protein network of the human gastric pathogen Helicobacter pylori [85]. A high-throughput Y2H system was used to screen 261 H. pylori proteins against a highly complex library of genome-encoded polypeptides, yielding over 1,200 interactions and connecting 46.6 % of the proteome [85]. The success of this approach in detecting new protein interactions and assigning previously unannotated proteins to new pathways led to many similar Y2H studies to develop PPI maps for Plasmodium falciparum [86], Rickettsia sibirica [87], Bacillus anthracis, Francisella tularensis, Yersinia pestis [88], Campylobacter jejuni [89], Treponema pallidum [90], and viruses including HIV and HCV [91]. Unfortunately, these high-throughput screens are plagued by many drawbacks, including false positives and false negatives and the inability to capture temporal or spatial requirements for expression and post-translational modification. Consequently, high-throughput PPI approaches have been augmented by techniques such as protein arrays and mass spectrometry [92–94].

While evaluating intrabacterial PPIs provides a unique resource to identify essential cell processes and protein targets for drug screens against pathogenic bacteria, assessing interactions between host and bacterial proteins is imperative for understanding the mechanisms of disease pathogenesis. Using a high-throughput Y2H screen, extensive host–pathogen PPIs have been identified for the pathogens B. anthracis, F. tularensis, and Y. pestis. Although the three pathogens cause different diseases (anthrax, lethal acute pneumonic disease, and bubonic plague, respectively), the PPIs pointed to similar mechanisms of immune modulation. For example, both B. anthracis and Y. pestis proteins interact with host major histocompatibility complex proteins, whereas TGF-β1 was shown to interact with Y. pestis and F. tularensis proteins. In total, networks of 3,073 human–B. anthracis, 1,383 human–F. tularensis, and 4,059 human–Y. pestis PPIs were identified. These networks included 304 uncharacterized proteins from B. anthracis, 52 from F. tularensis, and 330 from Y. pestis [88].

Using three datasets that include physical interaction assays, genome-wide RNA interference (RNAi) screens, and microarray assays, the first draft of the PPI network of Aedes aegypti, the mosquito vector of dengue virus (DENV), was developed. This PPI network included 4,214 A. aegypti proteins with 10,209 interactions [95]. The study identified 714 putative DENV-associated mosquito proteins, and RNAi-mediated silencing of some of the highly interconnected proteins reduced the dengue viral titer in mosquito midguts. This observation further underscores the importance of identifying critical host–pathogen PPIs, which can provide an immense resource for identifying prospective antimicrobial drug targets.

In an attempt to characterize essential cellular processes in Bacillus subtilis, a PPI network was generated comprising 793 interactions that connected 287 proteins. Further evaluation of the highly connected hub proteins in this network provided insights into distinct subgroups of PPIs corresponding to protein networks or regulatory pathways that are differentially expressed under diverse conditions [96]. These PPI network data are a valuable resource for the functional annotation of genes of unknown function and for the integration of cellular pathways.
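
To put these numbers in perspective, the average number of partners per protein in an undirected network with E interactions among N proteins is 2E/N, because each interaction contributes one partner to each of its two proteins. Assuming that each of the 793 interactions is counted once and links two distinct proteins (self-interactions or duplicate entries would shift the figure slightly), the B. subtilis network has on average:

```latex
\langle k \rangle = \frac{2E}{N} = \frac{2 \times 793}{287} \approx 5.5 \ \text{partners per protein}
```

Hub proteins are those whose number of partners lies well above this average.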

In addition to the Y2H system, high-throughput pull-down strategies combined with quantitative proteomics have been used to decipher interaction circuits in methicillin-resistant Staphylococcus aureus [97]. Several highly connected hub proteins were identified. Notably, examination of the PPI network of S. aureus drug targets indicated that most clinical or experimental drug targets lie at the periphery of the interaction network, with few interacting partners. In contrast, the hub proteins, which could logically serve as better targets, have been overlooked as drug targets [97].
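
The hub-versus-periphery distinction drawn in the S. aureus study rests on node degree, that is, the number of distinct interaction partners of each protein. The Python sketch below computes degrees from an interaction list, flags hubs with a degree cutoff, and flags peripheral proteins as those with a single partner; the cutoff of three partners and the toy network are hypothetical choices for illustration, not the criteria or data used in [97].

```python
from collections import defaultdict


def degree_table(interactions):
    """Count distinct partners per protein (undirected; self-interactions ignored)."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(s) for protein, s in partners.items()}


def classify(interactions, hub_min_degree=3):
    """Return (hubs, periphery): proteins with >= hub_min_degree partners, and with exactly one."""
    degrees = degree_table(interactions)
    hubs = sorted(p for p, k in degrees.items() if k >= hub_min_degree)
    periphery = sorted(p for p, k in degrees.items() if k == 1)
    return hubs, periphery


# Hypothetical toy network; not data from the S. aureus study.
TOY_NETWORK = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("D", "E")]

print(classify(TOY_NETWORK))  # (['H'], ['C', 'E'])
```

Using sets of partners rather than raw edge counts means that a pair reported twice (for example, once in each bait/prey orientation) is counted only once.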

9 Considerations for Mycobacterial PPIs

9.1 Some Mycobacterial Proteins Interact Exclusively in Their Native Environment

Although many bacterial PPIs have been identified in the Y2H system, it is logical to expect that some bacterial protein interactions may require the native cytoplasmic or membrane environment. For example, using M-PFC, some Mtb proteins were shown to interact only in the mycobacterial cytoplasmic environment and not in yeast [52]. In an independent study, attempts to detect interactions between Pup and other Mtb proteasome components in E. coli were unsuccessful. However, using M-PFC, and therefore M. smegmatis as the host, a strong interaction was observed between Pup and the proteasome substrate FabD [53].

9.2 Some Mycobacterial Proteins Require More Than One Protein for Interaction

Since all in vivo PPI methods (with the exception of the Y3H system) are binary systems that can detect interaction between only two proteins, interactions that require the presence of two or more partner proteins might be missed. In a recent study that made use of a modified M-PFC system, the Mtb ESX secretory system was examined using a single fusion protein comprising EsxB and EsxA as bait [98]. Three novel prey proteins, Rv3869, Rv3884, and Rv3885, were identified, whereas EsxB alone was unable to interact with any of these three proteins [98]. Using fusions of proteins that naturally associate in mycobacteria as bait has broad implications for the characterization of Mtb protein complexes and may open new avenues of research.

The Y3H system was also exploited to delineate the molecular interactions between two membrane proteins and the Mtb two-component sensor kinase KdpD [20]. In this system, a third protein acts as a bridge between two proteins and can stabilize, enhance, or prevent their interaction. The third protein is under the control of an inducible methionine-regulated promoter that is active in media lacking methionine. In this study, LprJ and LprF were shown to modulate the interaction between N-KdpD and C-KdpD, and it was speculated that this ternary protein complex modulates the KdpE-specific phosphatase activity of KdpD to regulate expression of the KdpFABC system [20].

9.3 Post-translational Modification Can Affect PPI in the Y2H System

In 2003, a Y2H assay was developed to examine nitric oxide (NO)-dependent PPIs [99]. Deleting the yeast hemoglobin gene, whose product consumes NO very efficiently, was essential to the success of this approach. In this study, the authors screened a library for proteins that interact with procaspase-3 only in the presence of NO and identified four clones: iNOS, ASM, IRG, and PGM [99]. These findings suggest that S-nitrosylation regulates PPIs and may profoundly influence cellular signaling.

In another study, in vitro proteomic analysis identified numerous thioredoxin (TRX) targets. However, in vivo approaches failed to identify the expected number of TRX targets [100]. This problem was solved by constructing a specific yeast strain that contains deletions of the genes encoding cytosolic TRX1 and TRX2. Subsequently, numerous TRX interacting partners were identified, whereas the same interactions could not be detected in the classic Y2H strain [100]. The above findings are highly relevant for studying mycobacterial PPIs and illustrate the fundamental concepts that (1) proteins interact only when functionally required, (2) essential genes can be studied because genetic knockouts are not required, and (3) genes that are transcriptionally switched off can be studied because the PPI systems use constitutive promoters.

10 Molecules That Dissociate or Force Protein–Protein Interaction

Despite recent successes [101–103], no new effective anti-tuberculosis drugs have been developed in the past 40 years. As a result, a high priority of the Global Alliance for TB Drug Development is the generation of new drugs with activity against dormant bacilli, as well as the discovery of agents that could shorten or simplify the treatment of active TB. TB can be cured with existing drugs; however, the 6–9 months of treatment lead to patient noncompliance, which promotes drug resistance. Approximately 50 million people are already infected with MDR-TB [2]. While drug-sensitive TB can be cured with a 6-month regimen of isoniazid, rifampin, ethambutol, and pyrazinamide, treatment of MDR-TB can exceed 2 years, dramatically increasing costs.

How can PPI in pathogenic microbes contribute to the discovery of new drugs? Tightly regulated PPIs are required for cellular functions in all living systems. Because enzymatic and receptor–ligand complexes and cell signaling pathways depend on the correct placement of proteins, disrupting critical PPIs is an appealing therapeutic intervention. However, PPIs that participate in virulence or persistence pathways may be induced only in vivo; such interactions would therefore escape in vitro drug screens and must be identified by other means. Interacting proteins share complementary interfaces and “hot spots” [104, 105], and the primary forces that drive two proteins to interact are van der Waals forces, electrostatic interactions, hydrogen bonds, and hydrophobic interactions [106–110]. In this context, successful “disruption” of these interactions need not mean physical separation of the partners: an inhibitor is any compound that modulates a protein complex to achieve a desired therapeutic outcome and/or downstream effect.

Among several well-known inhibitors that modulate PPI, common mechanisms of action have emerged: prevention of PPI via protein binding, allosteric inhibition, and forced dissociation and association. Importantly, their mode of action differs from that of drugs that prevent substrates from binding to enzyme active sites, which are often marked by clear, well-defined pockets [111]. PPI inhibitors include peptides, drugs, and small-molecule compounds. They act on a range of target protein complexes in different cell types and have been reviewed in recent years [112–115].

Many inhibitors that prevent protein interaction have been shown to bind amino acid residues that make up “hot spots” at the protein–protein interface. Such inhibitors bind one protein at the interaction site and structurally alter it or block its natural association with the cognate partner protein. For example, structural biology studies revealed that nutlins, a series of cis-imidazoline analogs identified via high-throughput screening, bind the p53-binding pocket of MDM2, mimicking three dominant p53 residues, and display in vitro and in vivo antitumor activity [116, 117]. Virstatin is an example of a compound that targets ToxT, a homodimeric regulator of cholera toxin and toxin-coregulated pilus production in Vibrio cholerae. Bacterial two-hybrid assays with ToxT truncation mutants demonstrated that virstatin specifically targets the N-terminal dimerization domain of ToxT [118, 119].

PPI can also be modulated by compounds that bind distal to the protein interaction interface and cause structural changes that prevent PPI without competing for protein binding sites. Such allosteric modifications have been documented for compounds that inhibit iNOS dimerization [120, 121]. PPA250, BBS-2, and clotrimazole bind to the heme cofactor in the iNOS active site and subsequently distort the α-helices [120] or the 8b and 9b β-strands [121], preventing iNOS dimerization. Allosteric inhibitors have also been described for CBFβ–RUNX1, LFA-1–ICAM-1, and β-lactamase [122–124].

Inhibitors that dissociate preexisting protein complexes are functionally different from those that prevent protein dimerization. The most notable example of this mechanism involves TNF-α, whose active complex is maintained as a homotrimer when bound to its receptor. He et al. [125] demonstrated that, at low TNF-α concentrations, the compound SPD00000034 bound to the preassociated TNF-α trimer and promoted dissociation of the active complex into dimer and monomer subunits. Similarly, previous studies have shown that the GroEL multimeric complex can exist in an “open” state, which allows 4,4′-dithiodipyridine to bind to an otherwise inaccessible Cys458, leading to GroEL subunit disassembly [126]. More recently, a proof-of-concept quantitative high-throughput screen (qHTS) for small-molecule inhibitors of Mtb PPI was developed [54], demonstrating the versatility of M-PFC.

Finally, several compounds modulate PPI by inducing the formation of previously unassociated complexes or by stabilizing protein complexes. Chemical inducers of association exist for the p66 and p51 subunits of HIV-1 reverse transcriptase [127]. However, the best-known example of forced protein association comes from studies of the physical relationship between immunophilins, their ligands, and their targets [128]. FK506, rapamycin, and cyclosporine are hydrophobic, immunosuppressive ligands with two protein-binding surfaces that mediate interactions between an immunophilin (FKBP12 for FK506 and rapamycin, cyclophilin for cyclosporine) [128, 129] and the corresponding target protein. In mammalian cells, the FKBP12–FK506 complex binds to calcineurin and inhibits its phosphatase activity [130]. The FKBP12–rapamycin complex binds to the FKBP12–rapamycin-binding (FRB) domain of FRAP [131]. The resulting complexes affect different immune responses and can lead to programmable physiological responses. Furthermore, these binding partners have led many researchers to exploit ligand-induced association of effector molecules to develop inducible protein-fragment complementation assays (PCA) [132–135]. In mycobacteria, a rapamycin-inducible mycobacterial PFC (RAP-inducible M-PFC) assay was developed as a proof of concept for forced interaction in bacterial cells, in which FKBP12 and FRB were independently fused to the DHFR reporter fragments F-[1,2] and F-[3], respectively. Association of FKBP12 and FRB could be detected only in the presence of the selective drug trimethoprim and nanomolar concentrations of rapamycin [54]. Taken together, the M-PFC and RAP-inducible M-PFC systems are powerful methods for identifying interacting proteins in protein networks; in future studies, tools such as FKBP12–ligand binding could be designed and used to manipulate PPIs in mycobacteria (Fig. 5.1e).
In short, the ability of these effector molecules to bridge or induce dimeric and multimeric complexes paves the way for potential applications in controlling protein pathways for therapeutic and experimental studies [129, 136, 137].

11 In Silico Methods for Predicting PPI

Over the past few decades, knowledge of PPI has been generated primarily from biochemical and genetic experimental approaches such as Y2H systems, pull-down assays, mass spectrometry, correlated mRNA expression, and protein arrays. However, despite the best attempts to collect experimental data on different organisms, the rate of discovery remains slow (e.g., fewer than 10 % of interactions in humans have been experimentally characterized). With the advent of the genomic era, several computational and bioinformatics-based approaches have been developed to infer PPI. These in silico approaches exploit annotated information from established observations and use the structural, genomic, and biological context of proteins and genes from completely sequenced genomes to predict protein interaction networks [138, 139]. They may rely on information gleaned from protein structure, gene sequence, the presence or absence of genes across numerous genomes, conserved gene neighborhoods across different species, co-expression of genes in transcriptome studies, involvement of proteins in a common metabolic pathway, curation of published literature, or a combination of these datasets [140–145]. High-resolution three-dimensional (3D) structures of interacting proteins provide the best source of information, giving an atomic description of the binding interface in terms of hydrophobicity, charge, and the thermodynamic constraints of the interaction [140, 146]. Several approaches have been developed, including computational modeling of homologous proteins based on previously known structures, or domain or sequence signature analysis if the complete structure of a homologous protein is not available [147, 148]. For example, InterPreTS (EMBL Heidelberg) is a popular resource that, for any pair of query sequences, first searches for homologues in a database of interacting domains of known 3D complex structures [149].
Pairs of sequences homologous to a known interacting pair are then scored for how well they preserve the atomic contacts at the interaction interface, and these scores are used to rank possible interacting partners.
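To make the contact-preservation idea concrete, the following Python sketch scores a candidate pair of query sequences against the interface contacts of a template complex and compares the result with scores obtained from shuffled sequences. It is a minimal illustration under stated assumptions, not the InterPreTS algorithm: the residue classes and the +1/−1 pair potential are hypothetical stand-ins for the empirical potentials used by real tools, and the template contacts and template-to-query alignment maps are assumed to be supplied by the user.

```python
import random
import statistics

# Hypothetical residue classes for a toy contact potential (an assumption;
# real tools derive their potentials empirically from known complexes).
HYDROPHOBIC = set("AVLIMFWC")
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def pair_score(a: str, b: str) -> float:
    """Toy score for one interfacial residue pair."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 1.0
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return 1.0
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return -1.0
    return 0.0


def interface_score(seq_a, seq_b, contacts, map_a, map_b):
    """Sum the toy potential over template contacts that map onto both queries.

    contacts: (i, j) positions in the template complex whose residues touch.
    map_a / map_b: template position -> query position, taken from an alignment.
    """
    total = 0.0
    for i, j in contacts:
        if i in map_a and j in map_b:  # skip contacts lost in the alignment
            total += pair_score(seq_a[map_a[i]], seq_b[map_b[j]])
    return total


def interface_z_score(seq_a, seq_b, contacts, map_a, map_b, n_shuffles=1000, seed=0):
    """Z-score of the real score against composition-preserving shuffles."""
    rng = random.Random(seed)
    observed = interface_score(seq_a, seq_b, contacts, map_a, map_b)
    background = []
    for _ in range(n_shuffles):
        shuffled_a = "".join(rng.sample(seq_a, len(seq_a)))
        shuffled_b = "".join(rng.sample(seq_b, len(seq_b)))
        background.append(interface_score(shuffled_a, shuffled_b, contacts, map_a, map_b))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background) or 1.0  # guard against zero variance
    return (observed - mean) / sd
```

A positive z-score means the query pair preserves the template interface better than random sequences of the same composition; ranking candidate pairs by this value mirrors, in simplified form, the priority ranking described above.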

A number of computational methods that exploit advances in genomics have also been developed for the prediction of PPIs. One popular approach, known as phylogenetic profiling [150], is based on the pattern of presence or absence of a given gene across a set of genomes. This method can, for example, establish the distribution of a specific gene in different species [151]. Similar phylogenetic profiles may then be interpreted as indicating a functional need for the corresponding proteins to be present simultaneously to perform a given function together. This approach stems from the idea that functionally linked proteins co-occur in genomes, and from the observation that phylogenetic trees of known interacting protein families tend to be more similar than trees of noninteracting proteins. In several cases, similarity in the topology of phylogenetic trees has been taken as positive evidence that two proteins interact, especially for protein partners that may have co-evolved (mirror tree approach) [150]. Likewise, co-localization-based approaches are based on the notion that physically interacting (or functionally associated) proteins must co-evolve to preserve their ability to interact with one another [152]. This is especially relevant in prokaryotes, which have operonic transcription units.
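Both ideas in the paragraph above reduce to simple comparisons. The Python sketch below assumes that profiles are already encoded as 0/1 vectors over a fixed, ordered list of genomes and that inter-species distances for each protein family have already been computed (e.g., from an alignment). It ranks gene pairs by the Hamming distance between their profiles and computes a mirror-tree style score as the Pearson correlation between two families' distance matrices. The function names and input formats are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def profile_distance(p, q):
    """Hamming distance between two presence/absence profiles (1 = gene present)."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same ordered list of genomes")
    return sum(a != b for a, b in zip(p, q))


def rank_by_profile(profiles):
    """Rank all gene pairs from most to least similar phylogenetic profile."""
    scored = [
        (profile_distance(profiles[g1], profiles[g2]), g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    return sorted(scored)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mirror_tree_score(dist_a, dist_b, species):
    """Mirror-tree style score: correlate two families' inter-species distances.

    dist_a / dist_b map frozenset({s1, s2}) to the distance between the family
    members found in species s1 and s2 (hypothetical input format).
    """
    pairs = [frozenset(pair) for pair in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])
```

For example, `rank_by_profile({"geneA": [1, 1, 0, 1], "geneB": [1, 1, 0, 1], "geneC": [0, 1, 1, 0]})` places the identical profiles of geneA and geneB first. A small distance or a high mirror-tree correlation only flags a candidate functional link, and the result depends heavily on the number and distribution of genomes analyzed.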

Genomic context-based approaches also exploit gene fusion events, which can be considered the ultimate form of co-localization: the fusion of two independent genes to encode a single polypeptide (called a Rosetta stone protein) not only retains the physical proximity of the two peptides but also makes them a single entity [142]. Publicly available databases that support gene context and co-localization analyses include FUSION DB, STRING, and PHYDBAC.
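The Rosetta stone logic can be sketched as a set comparison. In this hypothetical Python example, each protein is reduced to a set of family or domain labels (assumed to come from a prior homology search); two separate proteins in a query genome are reported as functionally linked when a single protein in a reference proteome contains the families of both.

```python
from itertools import combinations


def find_rosetta_stones(query_proteins, reference_proteins):
    """Return (protein_a, protein_b, fusion) triples suggesting a functional link.

    query_proteins / reference_proteins: dict of protein name -> set of family
    labels (hypothetical annotation, e.g. from a homology or domain search).
    Two separate query proteins are linked when one reference protein contains
    the families of both, i.e. a putative Rosetta stone protein.
    """
    hits = []
    for a, b in combinations(sorted(query_proteins), 2):
        fam_a, fam_b = set(query_proteins[a]), set(query_proteins[b])
        if not fam_a or not fam_b or fam_a & fam_b:
            continue  # skip unannotated proteins and pairs sharing a family
        for fusion, fam_fused in reference_proteins.items():
            if fam_a <= set(fam_fused) and fam_b <= set(fam_fused):
                hits.append((a, b, fusion))
    return hits
```

Because widely shared (promiscuous) domains would link many unrelated proteins, a practical pipeline would need to filter such families; this sketch does not.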

Another robust resource for genome context-based analysis is the Integrated Microbial Genomes (IMG) database. IMG is one of the largest integrated genome resources, containing ∼7,000 complete and draft genomes across all three domains of life [153]. In addition, an in silico two-hybrid method based on the study of correlated mutations in multiple sequence alignments has been proposed. In this method, pairs of multiple sequence alignments are analyzed for a distinctive co-variation signal, based on the hypothesis that co-adaptation of interacting proteins can be detected through the compensatory mutations it leaves in the corresponding proteins of different species [154].
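The correlated-mutation idea behind the in silico two-hybrid method can be sketched as follows. This is a simplified, hypothetical Python illustration rather than the published implementation: it assumes two alignments whose rows are paired by species, represents each column by the identity (1) or difference (0) of residues for every pair of species, and scores every inter-protein column pair by the Pearson correlation of these vectors. Correlated-mutation methods typically use a substitution matrix rather than plain identity and compare the scores against a background distribution, both of which this sketch omits.

```python
import math
from itertools import combinations


def column_similarity_vector(msa, col):
    """Residue identity (1.0) or difference (0.0) in one column for every species pair."""
    return [1.0 if s1[col] == s2[col] else 0.0 for s1, s2 in combinations(msa, 2)]


def pearson(x, y):
    """Pearson correlation, or None when a column shows no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None  # a fully conserved column carries no co-variation signal
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def inter_protein_correlations(msa_a, msa_b):
    """Correlated-mutation scores between every column of A and every column of B.

    msa_a[k] and msa_b[k] must come from the same species k (paired alignments).
    """
    if len(msa_a) != len(msa_b):
        raise ValueError("alignments must be paired by species")
    vecs_a = [column_similarity_vector(msa_a, i) for i in range(len(msa_a[0]))]
    vecs_b = [column_similarity_vector(msa_b, j) for j in range(len(msa_b[0]))]
    scores = {}
    for i, va in enumerate(vecs_a):
        for j, vb in enumerate(vecs_b):
            r = pearson(va, vb)
            if r is not None:
                scores[(i, j)] = r
    return scores
```

Column pairs with correlations near 1 are candidates for compensatory changes across the interface; in this simplified form, the score says nothing about whether the two columns actually touch.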

As with wet lab-based approaches, most computational approaches have intrinsic limitations [155]. For example, most sequence- and genomic context-based approaches require extensive analysis of completely sequenced genomes, whereas the success of phylogenetic tree-based methods depends on the number and distribution of genomes used to build the tree [156]. Similarly, gene fusion-based methods may be confounded by lateral gene transfer events in prokaryotes and by the longer multi-gene architecture of eukaryotes [157]. Likewise, despite providing the highest-quality information on PPI, protein structure-based approaches are limited in scope by the small number of high-quality protein structures in databases and the high cost of determining protein structures.

There is a clear need to unify genome sequencing and functional genomics data using computational tools to minimize the discrepancies associated with the use of any single approach. Several worthy attempts have been made in this direction. In addition, there is an encouraging community-driven initiative in the form of guidelines such as MIMIx (the minimum information required for reporting a molecular interaction experiment) and MIAPE (the minimum information about a proteomics experiment) [158, 159]. These guidelines provide a checklist of information that every scientist must furnish when describing experimental molecular interaction data in an article, displaying data on a website, or depositing data directly into a public database.

12 Integrative Physiology: The Emergence of Systems Biology?

Proteins are the catalytic effectors that carry out the functions of the microbial cell, but protein levels do not necessarily correlate with gene expression. For example, a lack of correlation between mRNA levels and the corresponding protein levels was found in Haemophilus influenzae exposed to antibiotics [160], in E. coli cultures at increased cell density [161], in Bacillus subtilis exposed to peroxide stress [162], in exponentially growing S. cerevisiae cells [163], and in S. cerevisiae exposed to lithium [164]. These examples demonstrate the challenge of correlating mRNA expression levels with protein levels and highlight the role of post-transcriptional regulatory control. Furthermore, some studies have observed a disparity between gene expression profiles and metabolic flux. This was elegantly demonstrated by analysis of the transcriptome, metabolome, and fluxome of Corynebacterium glutamicum [165]. Integrating PPI data with complementary high-throughput techniques such as transcriptomics, proteomics, metabolomics, and fluxomics represents a unique opportunity to study and predict Mtb protein function through systems biology (Fig. 5.2).
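One simple way to quantify the mRNA–protein relationship discussed above is a rank correlation over genes measured in both datasets. The Python sketch below computes a Spearman coefficient from paired mRNA and protein values; the choice of Spearman (which is unaffected by monotonic transformations such as log scaling) is our assumption for illustration and is not necessarily the statistic used in the cited studies.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values in sorted order.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(mrna, protein):
    """Spearman rank correlation between paired mRNA and protein measurements."""
    if len(mrna) != len(protein):
        raise ValueError("each gene needs both an mRNA and a protein value")
    return pearson(average_ranks(mrna), average_ranks(protein))
```

A coefficient close to 1 would indicate that protein abundance tracks transcript abundance across the genes measured; values near 0 correspond to the lack of correlation reported in the studies above.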

Fig. 5.2

Integrated analysis methodology depicting the role of PPI in TB systems biology. Towards this end, regulatory networks (gene expression arrays), proteomics, fluxomics, and PPI networks have already begun to be established, but are commonly represented as static sets of nodes representing the components of the network (mRNA, proteins, metabolites, etc.). The ultimate goal will be to develop, test, and validate mathematical models that represent cellular components and their interactions to eventually predict cellular function. TAP tandem affinity purification, IP immunoprecipitation, SPR surface plasmon resonance

13 Conclusions

Tuberculosis research is primarily driven by the quest for a better understanding of how Mtb causes disease. Over the past decade, the Y2H system and mycobacterial PPI technologies have given mechanistic insight into distinct aspects of Mtb virulence and pathogenesis and have stimulated antimycobacterial drug discovery efforts. PPI studies are particularly powerful for inferring the function of genes of unknown function through “guilt by association.” It is therefore anticipated that integrating functional data from PPI networks with the emerging discipline of systems biology could prove particularly useful for a better understanding of Mtb persistence. Although mycobacterial PPI networks have begun to be established, the current focus is still on developing high-throughput PPI tools, which are lacking for mycobacteria. In addition, although an Mtb PPI map has been generated using E. coli as a surrogate host, the more important stages of data interpretation, validation, and integration with mycobacterial physiology are still lacking. A future challenge will be to interconnect the growing amount of mycobacterial PPI data with the PPI networks of other bacterial pathogens and to integrate it with other genome-wide databases, which should lead to new testable hypotheses. The generation of high-throughput global datasets will be an expensive venture that requires detailed knowledge of mycobacterial physiology, metabolism, pathogenesis, and computer modeling, and will contribute to a global understanding of Mtb pathogenesis.