Introduction

The PhoPQ regulatory system governs the adaptation to Mg2+-limiting conditions in Salmonella typhimurium and Escherichia coli. In this two-component system, the PhoQ protein acts as a sensor for extracytoplasmic Mg2+ and Ca2+ (Vescovi et al. 1997) and controls the activity of the response regulator PhoP. This two-component system has also been shown to determine the virulence characteristics of S. typhimurium and other Gram-negative species (Groisman 2001). In S. typhimurium, the sensing of extracellular Mg2+ permits the pathogen to determine its subcellular location (i.e., inside macrophages) and to activate the virulence factors essential for survival. Mutations in the phoPQ operon of S. typhimurium result in an attenuated virulence phenotype, i.e., an increased sensitivity to cationic antimicrobial peptides (Fields et al. 1989; Groisman et al. 1992; Guo et al. 1997; Gunn et al. 1998; Guina et al. 2000; Bader et al. 2003) and acid pH (Foster and Hall 1990; Bearson et al. 1998), a decreased resistance to bile salts (van Velkinburgh and Gunn 1999; Prouty and Gunn 2000), a deficiency in epithelial cell invasion, and the inability to survive within macrophages (Fields et al. 1986; Miller et al. 1989; Darwin and Miller 1999). However, although the PhoPQ system is a major determinant of the virulence characteristics of S. typhimurium, it does not confer a similar phenotype in a nonpathogenic strain of E. coli. In E. coli K12, the PhoPQ two-component system is, besides its physiological role in the response to Mg2+ starvation, important for resistance to antimicrobial peptides (Groisman et al. 1992; Groisman 2001). The amino acid sequences of PhoP and PhoQ of E. coli are 93 and 86% identical, respectively, to those of S. typhimurium (Kasahara et al. 1992). This suggests that the PhoPQ systems in E. coli and S. typhimurium are most probably functional counterparts of each other.
A short consensus site (T/G)GTTTA, occurring as an interrupted dyad, has been suggested to be the binding site for the PhoP regulatory protein in Salmonella spp. (Soncini et al. 1995) and Escherichia coli (Kato et al. 1999). The biological relevance of this motif was experimentally verified by Yamamoto et al. (2002) and Lejona et al. (2003) in E. coli and S. typhimurium, respectively.

In this study, we estimated the size of the direct PhoPQ-dependent regulons in both species by combining the evidence gained from microarray data and motif information. Comparing the gene composition of the PhoPQ regulon revealed a very small overlap in the genes that were PhoPQ regulated in both species, indicating that the difference in virulence phenotype between the pathogenic and the nonpathogenic bacteria might be attributed to a group of target genes that are specifically PhoP regulated in the pathogenic S. typhimurium strain but not in E. coli K12.

Materials and Methods

First, PhoP-dependent operons were identified in S. typhimurium based on the microarray dataset of this study and in E. coli based on a previously published dataset (Minagawa et al. 2003). Subsequently, promoter regions of these PhoPQ-dependent operons were screened for the presence of the PhoP regulatory motif. This motif information was used to distinguish, in each organism, the direct from the indirect regulon. Finally, the overlap between the regulons of the two organisms was identified.

Identification of PhoPQ-Dependent Genes Based on Expression Data

To identify genes dependent on PhoPQ in E. coli, we used the results described by Minagawa et al. (2003). They used cDNA microarray experiments to identify target genes of the Mg2+ stimulon that responded to the availability of external Mg2+ in a PhoPQ-dependent manner. In their study, wild-type and phoP- and phoQ-defective E. coli strains were grown in the presence and in the absence of Mg2+. For the identification of the PhoPQ regulon in S. typhimurium, we set up a microarray experiment (data available in the supplementary information) in which RNA was isolated from strains of Salmonella enterica serovar Typhimurium ATCC 14028 harboring either a null mutation (phoP::Tn10dCm) or a constitutive mutation (phoQ24) in the genes encoding the PhoPQ two-component system (Miller and Mekalanos 1990). Strains were grown to mid-log phase in M9 minimal medium prior to harvesting RNA, and the array experiments were performed as described by Bader et al. (2003).

Data were statistically analyzed by combining six replicates for each experiment using the maximum-likelihood analysis of Ideker et al. (2000, 2001). This method calculates for each gene a generalized likelihood ratio test statistic (λ value) and the ratio of the mean intensities for the two conditions (μ-ratio). The λ values were converted into p-values using a χ2 cumulative probability distribution with one degree of freedom (Ideker et al. 2000). These p-values were used to estimate the number of actually differentially expressed genes (Fig. 1A) and subsequently to plot the false discovery rate (FDR) and the number of true positives for each critical value of λ (Figs. 1A and 1B). A detailed description of how to interpret these plots can be found in Storey and Tibshirani (2003) and De Smet et al. (2004). To minimize the type I error (number of false positives), we chose a high critical threshold for λ (λ cutoff of 40), resulting in 324 significantly differentially expressed genes. Under the assumption that genes directly regulated by PhoPQ will be most severely affected by the PhoP mutation, we selected from these 324 genes those for which the μ-ratio was less than −0.30 (genes upregulated by PhoP). This additional selection step reduced the number of selected genes to a subset of the 214 most influenced genes, which approximates the number of E. coli PhoPQ-upregulated genes selected by Minagawa et al. (2003).
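The conversion of λ values into p-values and the two-step selection can be illustrated with a minimal sketch. This is not the analysis code used in this study: the function names and the toy records are hypothetical, the comparison operators at the cutoffs are assumptions, and the upper tail of the χ2 distribution with one degree of freedom is written in its closed form, erfc(sqrt(λ/2)).

```python
import math

def lambda_to_p(lam):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(lam / 2.0))

def select_upregulated(genes, lambda_cutoff=40.0, mu_cutoff=-0.30):
    """Keep genes with lambda above the cutoff and mu-ratio below mu_cutoff.

    The strict inequalities are an assumption; the text only gives the cutoffs.
    """
    significant = [g for g in genes if g["lam"] > lambda_cutoff]
    return [g["name"] for g in significant if g["mu_ratio"] < mu_cutoff]

# Hypothetical toy records (not data from this study).
toy_genes = [
    {"name": "geneA", "lam": 55.0, "mu_ratio": -0.80},  # significant, upregulated
    {"name": "geneB", "lam": 45.0, "mu_ratio": 0.10},   # significant, wrong direction
    {"name": "geneC", "lam": 12.0, "mu_ratio": -0.90},  # below the lambda cutoff
]
selected = select_upregulated(toy_genes)
```

In this toy example only geneA passes both filters. A λ of 40 corresponds to a p-value on the order of 1e−10, which shows how conservative the chosen threshold is.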

Figure 1

(A) Plot of \( V_i \) versus the gene number (genes sorted according to p-value), with \( V_i = \frac{i - p_i \cdot n}{1 - p_i} \), where n is the total number of genes in the dataset, \( p_i \) is the p-value of gene i, and i is the rank order of a gene after sorting all genes according to their p-value. \( V_i \) reaches a constant level at 2855 genes, which is an estimate for the number of actually differentially expressed genes. (B) Number of true positives (TP, dashed line)/FDR (solid line) versus the gene number (genes are sorted according to their p-value).
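One way to read the formula in the caption: if V genes are truly differentially expressed and all of them rank among the first i genes, the remaining n − V null genes have uniformly distributed p-values, so that i ≈ V + p_i (n − V); solving for V gives the expression for V_i. The following minimal sketch computes the curve for a list of p-values. The function name and the toy values are hypothetical and serve only as an illustration.

```python
def estimate_v(p_values):
    """Return V_i = (i - p_i * n) / (1 - p_i) for genes ranked by ascending p-value."""
    ranked = sorted(p_values)
    n = len(ranked)
    curve = []
    for i, p in enumerate(ranked, start=1):
        if p < 1.0:  # V_i is undefined for p = 1
            curve.append((i - p * n) / (1.0 - p))
    return curve

# Hypothetical toy p-values (not data from this study).
curve = estimate_v([0.9, 0.001, 0.5, 0.002])
```

On real data the curve is read off at the plateau; the last points, where p_i approaches 1, are numerically unstable and are not informative.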

Sequence-Based Analysis

For sequence analysis, genome sequences were obtained from GenBank (NC_003197 for S. typhimurium LT2, NC_003277 for S. typhimurium LT2 plasmid, and NC_000913 for E. coli K12) (Benson et al. 2003). Intergenic regions used in this study were extracted from the genome sequences using the modules implemented in INCLUSive (Thijs et al. 2001) to automatically parse GenBank files. Here, we define intergenic regions as the noncoding sequence between two coding sequences. No overlap was allowed with the coding sequences.

Identification of Orthologs

Clusters of orthologs were identified using TribeMCL (Enright et al. 2002). The obtained clusters contain proteins that are close homologs (either orthologs or paralogs) of each other. The pairwise BLAST scores obtained by mutually aligning the whole-genome sequences using blastp (Altschul et al. 1997) were used as input of TribeMCL. Stringent criteria were applied to retain only closely related orthologs and paralogs (cutoff of the BLAST hit was an e-value of 1e−80). When the BLAST hit of a protein against itself resulted in an expectation value higher than 1e−80, we used less stringent criteria (a cutoff of 1e−20). Because the analysis of the PhoPQ regulon is performed at an operon level, in the context of this study we defined an operon to be unique if, for all genes of an operon in one organism (e.g., E. coli), no orthologs are found in the other organism (e.g., S. typhimurium).
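The two-level e-value criterion can be written as a small filter. This is a minimal sketch under stated assumptions: the function names are hypothetical, the self-hit is taken to be that of the query protein, and a hit is retained when its e-value is at or below the applicable cutoff. It only illustrates the filtering of BLAST hits and does not reproduce the TribeMCL clustering itself.

```python
STRICT_CUTOFF = 1e-80   # default cutoff on the BLAST e-value
RELAXED_CUTOFF = 1e-20  # used when the self-hit cannot reach the strict cutoff

def evalue_cutoff(self_hit_evalue):
    """Choose the cutoff from the e-value of the protein aligned against itself."""
    return RELAXED_CUTOFF if self_hit_evalue > STRICT_CUTOFF else STRICT_CUTOFF

def keep_hit(hit_evalue, self_hit_evalue):
    """Retain a BLAST hit if it is at least as significant as the applicable cutoff."""
    return hit_evalue <= evalue_cutoff(self_hit_evalue)
```

The relaxed cutoff mainly matters for short proteins, whose alignment against themselves cannot produce an e-value as low as 1e−80.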

Operon Prediction

Note that the aim of our analysis was not to detect only highly reliable operons (high selectivity, low sensitivity), but to capture anything that could possibly be an operon (high sensitivity, low selectivity). Therefore, compared to studies focusing on operon prediction, we used deliberately nonstringent criteria for our operon prediction: the intergenic distance was used as the main criterion for defining an operon (Salgado et al. 2000; Moreno-Hagelsieb and Collado-Vides 2002). When two genes were located on the same strand and the length of the intergenic sequence between these two genes was smaller than a predefined cutoff value, these two genes were considered to reside in the same operon. For both E. coli and S. typhimurium we used a cutoff value of 40 nucleotides to predict operons.
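The distance criterion amounts to a single pass over the genes in genomic order. The sketch below is illustrative only: the function name, the tuple layout, and the toy coordinates are hypothetical, the genome is treated as linear, and overlapping genes (negative distance) are merged as well.

```python
def predict_operons(genes, max_gap=40):
    """Group adjacent same-strand genes whose intergenic distance is below max_gap.

    genes: list of (name, start, end, strand) tuples sorted by start coordinate,
    with 1-based inclusive coordinates.
    """
    operons = []
    for name, start, end, strand in genes:
        if operons:
            _, _, prev_end, prev_strand = operons[-1][-1]
            gap = start - prev_end - 1  # number of noncoding bases between the genes
            if strand == prev_strand and gap < max_gap:
                operons[-1].append((name, start, end, strand))
                continue
        operons.append([(name, start, end, strand)])
    return [[gene[0] for gene in operon] for operon in operons]

# Hypothetical toy annotation (not taken from either genome).
toy = [
    ("g1", 100, 400, "+"),
    ("g2", 420, 900, "+"),    # 19 nt after g1, same strand: same operon
    ("g3", 1000, 1500, "+"),  # 99 nt after g2: new operon
    ("g4", 1510, 1800, "-"),  # close to g3 but on the other strand: new operon
]
operons = predict_operons(toy)
```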

These cutoff values were determined as follows: we first predicted operons in E. coli using different cutoff values. Each of these predicted sets was compared to the operon prediction of RegulonDB that was used as a benchmark set (Salgado et al. 2004). An operon in RegulonDB was considered to be identical to our own predicted operon if both operons contained exactly the same genes. A maximal match of the predicted operons in E. coli with the RegulonDB database was obtained using the cutoff value of 40 nucleotides, mentioned above. Since for S. typhimurium no such benchmark dataset existed, we used the cutoff value on the S. typhimurium intergenic distance that maximized the match between the predicted set in E. coli and that in S. typhimurium. A match in this context means that an operon in E. coli and S. typhimurium contains orthologous genes only. For E. coli the average number of genes in an operon was predicted to be 2.88, while for S. typhimurium an average of 2.90 genes per operon was found.
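The choice of the cutoff can be sketched as a scan over candidate values, counting for each the predicted operons that contain exactly the same genes as a benchmark operon. The function names and the toy operons are hypothetical; in case of a tie, this sketch simply returns the first cutoff encountered.

```python
def count_exact_matches(predicted, benchmark):
    """Count predicted operons whose gene content is identical to a benchmark operon."""
    benchmark_sets = {frozenset(operon) for operon in benchmark}
    return sum(1 for operon in predicted if frozenset(operon) in benchmark_sets)

def best_cutoff(predictions_by_cutoff, benchmark):
    """Return the cutoff whose prediction matches the benchmark most often."""
    return max(
        predictions_by_cutoff,
        key=lambda cutoff: count_exact_matches(predictions_by_cutoff[cutoff], benchmark),
    )

# Hypothetical toy benchmark and predictions for three candidate cutoffs.
benchmark = [["a", "b"], ["c"], ["d", "e"]]
predictions = {
    20: [["a"], ["b"], ["c"], ["d", "e"]],
    40: [["a", "b"], ["c"], ["d", "e"]],
    80: [["a", "b", "c"], ["d", "e"]],
}
chosen = best_cutoff(predictions, benchmark)
```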

Motif Detection

We used MotifSampler (Thijs et al. 2002) to construct the PhoP motif model (i.e., a probabilistic representation of the DNA pattern). MotifSampler is a motif detection algorithm based on Gibbs sampling that retrieves statistically overrepresented patterns in the promoter regions of coregulated genes (Thijs et al. 2001; Marchal et al. 2003). Based on a training set containing promoter regions of known PhoPQ-regulated genes (phoP, mgtA, pmrD, pdgL, and slyB in S. typhimurium [Yamamoto et al. 2002; Lejona et al. 2003] and yrbL, ybcU/vboR, and rstA in E. coli [Minagawa et al. 2003]), the motif detected with the highest information content had a length of six nucleotides and a consensus site (T/G)GTTTA. This corresponds to one half site of the previously suggested dyad PhoP motif (Soncini et al. 1995; Yamamoto et al. 2002). The motif model corresponding to this motif was used for the genomewide motif screening of all intergenic sequences of E. coli and S. typhimurium, respectively, using MotifLocator (Thijs 2003; Marchal et al. 2004). This algorithm uses the motif model to calculate a score for each window (with length similar to the length of the motif model) in the intergenic sequences. The threshold score was set at 0.75 (nonstringent screening criterion). When the intergenic regions of the whole genome are screened, this threshold corresponds to the selection of approximately the 2% best-scoring motif hits among all possible motif positions (S. typhimurium, 1.91%; E. coli, 1.96%).
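The idea of scoring every window of an intergenic sequence against a motif model can be illustrated as follows. This sketch is not the MotifLocator scoring scheme, which is not detailed here: the position probability matrix is a hypothetical toy model of the half site (T/G)GTTTA, the score is simply the log-probability of the window rescaled to the interval [0, 1], no background model is used, and only the given strand is scanned.

```python
import math

# Hypothetical position probability matrix for the half site (T/G)GTTTA;
# the values are illustrative and are not the model trained in this study.
PPM = [
    {"A": 0.05, "C": 0.05, "G": 0.40, "T": 0.50},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

def window_score(window, ppm=PPM):
    """Log-probability of the window, rescaled between the worst and best possible window."""
    log_p = sum(math.log(column[base]) for column, base in zip(ppm, window))
    best = sum(math.log(max(column.values())) for column in ppm)
    worst = sum(math.log(min(column.values())) for column in ppm)
    return (log_p - worst) / (best - worst)

def scan(sequence, threshold=0.75, ppm=PPM):
    """Return (position, window, score) for every window scoring at or above the threshold."""
    width = len(ppm)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if set(window) <= set("ACGT"):  # skip windows with ambiguous bases
            score = window_score(window, ppm)
            if score >= threshold:
                hits.append((pos, window, score))
    return hits

# Hypothetical toy sequence containing one half site.
hits = scan("CCATGGTTTACC")
```

In the toy sequence, only the window GGTTTA at 0-based position 4 exceeds the threshold of 0.75.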

For the stringent screening criterion, we included extra restrictions on the PhoP motif model that were derived from previous publications (Yamamoto et al. 2002; Minagawa et al. 2003; Lejona et al. 2003), in which the PhoP motif was described as a direct repeat of (T/G)GTTTA. A conserved thymine in the first half site (at position 3) and two conserved thymines and one conserved adenine in the second half site (at positions 3, 4, and 6, respectively) were shown to be essential for the binding of the PhoP regulatory protein. Therefore, as an additional criterion we explicitly required that these four positions were conserved inside and around the motif instances retrieved from the first screening.
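The additional requirement can be sketched as a check on four fixed positions around a half-site hit. The sketch assumes a spacer of five nucleotides between the two half sites, as postulated for the PhoP box, and accepts a hit if it fits as either the first or the second half site of the direct repeat. The function names and the toy sequences are hypothetical.

```python
HALF = 6    # length of one half site
SPACER = 5  # assumed spacer between the half sites of the direct repeat

def passes_stringent(sequence, first_start):
    """Check the four required bases of a repeat whose first half site starts at first_start."""
    second_start = first_start + HALF + SPACER
    if first_start < 0 or second_start + HALF > len(sequence):
        return False
    return (
        sequence[first_start + 2] == "T"        # position 3, first half site
        and sequence[second_start + 2] == "T"   # position 3, second half site
        and sequence[second_start + 3] == "T"   # position 4, second half site
        and sequence[second_start + 5] == "A"   # position 6, second half site
    )

def hit_passes(sequence, hit_start):
    """Accept a half-site hit that fits as the first or as the second half site."""
    return (
        passes_stringent(sequence, hit_start)
        or passes_stringent(sequence, hit_start - HALF - SPACER)
    )

# Hypothetical toy sequences: an intact direct repeat and a mutated second half site.
intact = "TGTTTA" + "CCCCC" + "GGTTTA"
mutated = "TGTTTA" + "CCCCC" + "GGTCTA"
```

With these toy sequences, a hit at either half site of the intact repeat is retained, whereas the substitution at position 4 of the second half site causes the mutated sequence to be rejected.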

Calculating the Overlap in Regulon Composition

To identify the direct regulon, we checked whether the intergenic regions of the first genes of operons that were selected as differentially expressed in either one (or both) of the organisms contained a PhoP motif. To prove unequivocally the presence of a restricted overlap in regulon composition of the PhoPQ system between the two organisms compared (see Discussion), we used extremely nonstringent criteria to calculate this overlap: for each gene differentially expressed by PhoP in one organism, we first identified the operon to which it belonged (based on the operon prediction outlined above). If at least one of the orthologs of the genes belonging to that operon turned out to be differentially expressed in the other organism as well, the operon was considered differentially expressed in both organisms (i.e., identifying an operon as PhoPQ dependent in both organisms does not necessarily imply that all genes of the operon were identified as differentially expressed in both organisms).
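The nonstringent overlap criterion can be expressed as follows. This is a minimal sketch: the function name, the data structures, and the toy gene names are hypothetical, and the operon prediction and ortholog mapping are assumed to be available as simple Python collections.

```python
def shared_operons(operons_a, diff_a, diff_b, orthologs_a_to_b):
    """Return operons of organism A that count as differentially expressed in both organisms.

    operons_a: list of operons (lists of gene names) in organism A.
    diff_a, diff_b: sets of differentially expressed genes in organisms A and B.
    orthologs_a_to_b: dict mapping a gene of A to the set of its orthologs in B.
    """
    shared = []
    for operon in operons_a:
        if not any(gene in diff_a for gene in operon):
            continue  # operon is not PhoPQ dependent in organism A
        orthologs = set()
        for gene in operon:
            orthologs |= orthologs_a_to_b.get(gene, set())
        if orthologs & diff_b:  # one differentially expressed ortholog is sufficient
            shared.append(operon)
    return shared

# Hypothetical toy data (gene names are placeholders).
operons_a = [["a1", "a2"], ["a3"], ["a4"]]
diff_a = {"a1", "a3"}
diff_b = {"b2", "b9"}
orthologs = {"a1": {"b1"}, "a2": {"b2"}, "a3": {"b3"}}
overlap = shared_operons(operons_a, diff_a, diff_b, orthologs)
```

In the toy data the operon a1-a2 is counted as shared although only a1 is differentially expressed in organism A and only the ortholog of a2 is differentially expressed in organism B, which illustrates how permissive the criterion is.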

Enrichment of Functional Classes

To identify which functional classes were enriched in the directly PhoPQ-dependent E. coli and S. typhimurium genes, the functional classifications of, respectively, EcoCyc (Karp et al. 2004) and the S. typhi Sanger database (Parkhill et al. 2001) were used. To use the annotation of the Sanger database of S. typhi strain C18 for S. typhimurium, a mapping between the S. typhi and the S. typhimurium gene names based on ortholog sequence information was made. Functional enrichment of PhoPQ-dependent targets was calculated using the hypergeometric distribution (Tavazoie et al. 1999), which assigns a p-value to each functional class. This p-value is the probability that a random set of genes of the same size contains at least as many genes of that functional class as were observed. In detail, the probability of observing at least k predicted PhoPQ-regulated genes from a functional category within the total number of predicted PhoPQ target genes (n) is given by

$$ P = 1 - \sum_{i=0}^{k-1} \frac{\binom{f}{i}\binom{g-f}{n-i}}{\binom{g}{n}} $$
(1)

where f is the total number of genes in the functional category in the EcoCyc database or the S. typhi Sanger database and g is the total number of genes within the genome of E. coli and S. typhimurium, respectively.
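Equation 1 can be evaluated directly with exact binomial coefficients. The following sketch uses only the Python standard library; the function name and the toy numbers are hypothetical. For very small p-values, subtracting the cumulative sum from 1 loses floating-point precision, so a dedicated statistics library may be preferable in practice.

```python
from math import comb

def enrichment_p(k, n, f, g):
    """Probability of at least k category genes among n selected genes (Eq. 1).

    f: number of genes in the functional category; g: number of genes in the genome.
    """
    below = sum(comb(f, i) * comb(g - f, n - i) for i in range(k))
    return 1.0 - below / comb(g, n)

# Toy example: genome of 10 genes, 4 in the category, 3 selected, at least 2 observed.
p_toy = enrichment_p(k=2, n=3, f=4, g=10)
```

For the toy example the sum contains the terms for i = 0 and i = 1, giving P = 1 − (20 + 60)/120 = 1/3.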

Supplementary Information

Supplementary material is available at ftp://ftp.esat.kuleuven.ac.be/pub/sista/marchal/Supplementary_informationPhoPQ_2004/Supp Website.html.

Results

Identification of the PhoP-Dependent Regulon Based on Microarray Analysis

To identify genes dependent on PhoPQ in E. coli, we used the results described by Minagawa et al. (2003). They identified 219 genes as being upregulated by PhoP.

To identify the PhoPQ-dependent regulon in S. typhimurium, we compared expression in a knockout mutant and a constitutive mutant of the genes encoding the PhoPQ system. Based on the analysis summarized in Fig. 1A, the expression of approximately 2855 genes was found to be affected to some extent by the constitutive mutation. Based on the selection criteria outlined in Materials and Methods, we selected, from the 324 most significantly differentially expressed genes, the 214 most upregulated genes. This number approximated the number of E. coli PhoPQ-upregulated genes selected by Minagawa et al. (2003). Thus, for both datasets we obtained a subset of genes most severely upregulated by PhoPQ. The quality of the microarray results and the biological relevance of the cutoff values used were confirmed by the presence of experimentally verified PhoPQ-regulated genes in the selected subset of genes.

Based on our analysis, the 219 genes differentially regulated by PhoP in E. coli were organized in 193 operons (out of a total of 2848 predicted operons, of which 2027 are singletons). Similarly, the 214 most affected S. typhimurium genes were located in 189 operons (out of a total of 3026 predicted operons, of which 2195 are singletons).

Motif Screening

To identify target operons directly regulated by the PhoPQ regulatory system, intergenic regions of the first genes of the operons that were identified as differentially expressed in either one or both organisms were screened with the PhoP motif model. Initially the PhoP box was postulated to be a direct repeat of the hexanucleotide (T/G)GTTTA separated by a spacer of five nucleotides. Recent experimental evidence allowed refinement of the specific sequence requirements of the PhoP box (Yamamoto et al. 2002; Minagawa et al. 2003; Lejona et al. 2003). These studies confirmed that the PhoP regulatory protein is able to bind to a promoter region that does not display an intact dyad motif. A conserved thymine in the first half site (at position 3) and two conserved thymines together with one conserved adenine in the second half site (at positions 3, 4, and 6, respectively) were sufficient for detection of the DNA binding of the PhoP protein in in vitro DNA footprinting analysis.

However, as mentioned in these experimental studies, it cannot be excluded that alterations at some of these positions merely reduce the PhoPQ-dependent regulation to a level that cannot be observed by in vitro footprinting analysis (Lejona et al. 2003). Therefore, in order not to bias our initial genomewide screening toward previously made observations and to guarantee a maximal sensitivity, we first screened the selected promoter regions with a motif model that describes only one half site of the PhoP box (high-sensitivity, low-specificity screening). In addition, we performed a second, more stringent screening that also takes into account the specific sequence requirements described above (low-sensitivity, high-specificity screening). The first screening resulted in 130 and 147 putative target operons in E. coli and S. typhimurium, respectively. Taking into account the more stringent criterion, 42 and 34 operons potentially directly dependent on PhoP were retained in S. typhimurium and E. coli, respectively (see Fig. 2A and Fig. 2B).

Figure 2

(A) Alignment of the detected PhoP motifs in E. coli. Conserved positions are highlighted. (B) Alignment of the detected PhoP motifs in S. typhimurium. Conserved positions are highlighted.

Combination of Motif Screening Results and Microarray Data

In this section we compare the overlap of the PhoPQ regulon between E. coli and S. typhimurium, i.e., between the 193 operons of E. coli and the 189 operons of S. typhimurium that were identified as upregulated by PhoPQ in each of the respective organisms. The directly regulated operons were distinguished from the indirectly regulated ones by the presence of a PhoP motif in the region upstream of a gene. Results are based on the stringent screening criteria. Detailed information can be found at the supplementary Web site. A schematic representation is shown in Fig. 3.

Figure 3

(A) Overview of the PhoPQ-dependent genes differentially expressed in both S. typhimurium and E. coli. Each branch of the tree corresponds to a specific subcategory. The number of operons for the different subcategories is displayed. Subcategories for which the results need to be interpreted with caution are indicated by a ?. (B) Overview of the PhoPQ-dependent genes differentially expressed in E. coli but not in S. typhimurium. Legend as for A. (C) Overview of the PhoPQ-dependent genes differentially expressed in S. typhimurium but not in E. coli. Legend as for B.

Corresponding Genes in Both Datasets

When comparing both datasets, 13 operons were differentially expressed in both S. typhimurium and E. coli (Fig. 3A). Only 2 of these 13 orthologous operons contained a PhoP box in their upstream region in both S. typhimurium and E. coli (i.e., phoPQ and slyB). For both of these operons, a PhoP motif was also conserved in the evolutionarily related bacteria Shigella flexneri and Yersinia pestis, offering an extra validation for the biological relevance of the identified regulatory motif. For one gene (pagP), a PhoP box was found in S. typhimurium but was missing in the promoter region of the corresponding ortholog in E. coli (crcA). These results suggest that pagP is directly regulated in S. typhimurium, whereas crcA is indirectly regulated in E. coli. Alternatively, this motif might be a false-positive result. In the remaining 10 operons, no PhoP box was present in either E. coli or S. typhimurium, indicating that these genes are probably indirectly regulated by the PhoPQ two-component system. A detailed list of the operons that are differentially expressed in both organisms can be found in Table 1.

Table 1 List of PhoPQ-dependent operons that are differentially expressed both in E. coli and in S. typhimurium

E. coli Microarray Data

Besides the 13 operons that were differentially expressed in both organisms according to the corresponding microarray data, remarkable differences were found for other operons (Fig. 3B). Forty-five operons that were differentially expressed in E. coli do not have an orthologous operon in S. typhimurium or in evolutionarily related bacteria such as Yersinia pestis and Shigella flexneri. Eleven of these operons displayed a PhoP box in their promoter region, while the remaining 34 E. coli operons were most likely indirectly regulated by the PhoPQ system (no PhoP box found).

On the other hand, 135 E. coli operons that were differentially expressed in the microarray of Minagawa et al. (2003) and that had an orthologous operon in S. typhimurium were not differentially expressed according to our data in S. typhimurium. For 17 of these 135 orthologous operons, differential expression and direct regulation (i.e., a PhoP box found) were confined to E. coli only. Ninety-five of these 135 Mg2+-repressed E. coli operons appeared to be indirectly regulated by the PhoPQ two-component system (no PhoP box was detected upstream of these operons). Besides these results, some inconsistent observations were made. Nineteen of the PhoPQ-dependent differentially expressed E. coli operons did not contain a motif, while a PhoP box was present in the corresponding non-differentially expressed S. typhimurium orthologs. For a total of four operons, a PhoP box was found in both organisms but differential expression was observed in E. coli only. The latter two classes might be due to false-positive motif hits, measurement errors in the microarray data, or incomplete sampling of relevant PhoPQ-dependent conditions. Therefore, these results should be interpreted with caution.

S. typhimurium Microarray Data

A similar analysis for S. typhimurium (Fig. 3C) showed that 84 PhoPQ-dependent S. typhimurium operons do not have an ortholog in E. coli or in evolutionarily related species. In 21 of these 84 unique operons, we found evidence for direct PhoPQ dependency (i.e., a PhoP box is found in the promoter region of the first gene of the operon), while the remaining 63 operons were indirectly PhoPQ dependent (i.e., no PhoP box present in the promoter region of the gene). Ninety-two of the significantly differentially expressed S. typhimurium operons had an orthologous operon in E. coli. However, these operons in E. coli seemed not to be differentially expressed in a PhoPQ-dependent way. In 13 of these 92 S. typhimurium operons, evidence for direct PhoPQ-dependent regulation was found. On the other hand, 64 of the 92 operons did not display a PhoP motif in their promoter region, pointing toward indirect regulation of these S. typhimurium operons by the PhoPQ system. In this dataset, too, a limited number of inconsistencies were observed for which no clear conclusions could be drawn. For 9 of the 92 PhoPQ-dependent differentially expressed S. typhimurium operons that did not contain a PhoP motif, a PhoP box was present in the corresponding non-differentially regulated E. coli orthologs. Another six operons contained a PhoP box in both organisms, although differential expression was observed only in S. typhimurium.

Functional Classes

Based on the functional classification schemes of the EcoCyc and the S. typhi Sanger database, we retrieved for each PhoPQ regulated gene in E. coli and S. typhimurium its corresponding functional class. We subsequently determined which functional classes were significantly enriched in the sets of directly/indirectly PhoPQ-dependent genes in both E. coli and S. typhimurium. It should be mentioned that many of the detected PhoPQ-dependent targets (especially the unique genes) are not yet annotated in these databases, resulting most probably in an underestimation of the degree of overrepresentation of certain functional classes.

For both the stringent and the nonstringent estimation of the direct PhoPQ-dependent regulon, we calculated the enrichment of functional categories in the selected gene sets. The enrichment was most pronounced using the gene sets resulting from the stringent criteria (lower p-values). Moreover, the functional classes that were enriched in these gene sets better corresponded to the known functionalities of PhoPQ-dependent genes and the average size of the stringent direct PhoPQ-dependent regulon approximated the previously predicted direct PhoPQ regulon size of 40 loci (Soncini et al. 1996; Soncini and Groisman 1996). Therefore, the most stringent criteria seemed to be the best approximation of the true regulon composition and these were used for further characterization of this regulon.

In Table 2, the overrepresented functional classes in both the directly and the indirectly PhoPQ-dependent regulons are shown for S. typhimurium. The categories of “small molecule metabolism” (subcategories “degradation” [1.A] and “energy metabolism” [1.B]), “protein translation and modification” (3.A.8), “transport of anions and carbohydrates, organic acids and alcohols” (4.A.3 and 4.A.5), and “adaptations and atypical conditions” (5.F) are mainly overrepresented in the indirect regulon. Overrepresentation of the classes “aminoacyl tRNA synthetases” (3.A.5 containing pheS, pheT, rbn) and “transport of amino acids and amines” (4.A.1; containing artP, artI, nrdF, STM4157, artM, PSLT043, ybj) and the class “drug/analogue sensitivity” (5.D; to which belong pmrD, pqaA, pagP) is confined to the direct regulon. Significantly enriched within the subset of both direct and indirect PhoPQ target genes is the category for “central intermediary metabolism” (1.C), e.g., the directly regulated genes of sulfur metabolism (cysC, cysI, cysJ, cysN, cysD), the functional class involved in the “synthesis and modification of the cell envelope” (3.C.1), to which belong the directly PhoPQ regulated genes slyB, pmrF, yfbG, STM2303, STM1940, spy, STM1864, lpxO, STM2302, ybjY, ybaY, STM4065, yahO, and pagO, and the functional category responsible for “cation transport” (4.A.2), to which belong the directly regulated genes chaB, artQ, and kefB.

Table 2 Overview of enriched functional classes for PhoPQ-dependent S. typhimurium genes
Table 3 Overview of enriched functional classes for PhoPQ-dependent E. coli genes

In E. coli, a significant part of both the directly and the indirectly PhoPQ-dependent genes is also enriched in functional categories involved in general metabolism (7; e.g., “central intermediary metabolism and carbon utilization”), in “transport” (9), and in functions related to “cell wall and membrane structure” (2). Besides these classes, the indirect regulon is also enriched in the categories “cell processes” (1) and “information transfer” (5). Some genes, mainly those of the direct regulon, are involved in “regulation” (8). Note, however, that the functional analysis of these E. coli genes is only partial because 30 of the potentially directly regulated PhoPQ genes are not assigned to any functional class in the EcoCyc database.

Detailed Analysis of Novel Direct PhoPQ Targets

Most of the known PhoPQ regulated genes with an experimentally verified PhoP box could be retrieved by our analysis (phoP, slyB, pdgL/pcgL, and pmrD in S. typhimurium [Lejona et al. 2003] and phoP, slyB, yrbL, ybcU/vboR, and rstA in E. coli [Minagawa et al. 2003]). The mgtA gene was missing in our analysis because it was inaccurately measured (low significance) on our arrays and was not spotted on the microarray of Minagawa et al. (2003). Retrieving the known PhoPQ targets illustrates the predictive power of our analysis and allows us to suggest the presence of new, promising, directly PhoPQ regulated targets in S. typhimurium.

To the functional category related to the “cell envelope” belong several directly PhoPQ-regulated genes with, as the most promising novel targets, pagO, lpxO, and STM1940. pagO encodes an integral membrane protein, previously described as PhoPQ dependent (Gunn et al. 1998) that is similar to a product of the Yersinia virulence plasmid. According to our analysis, pagO would be directly PhoPQ dependent. lpxO codes for a dioxygenase that plays a role in lipid synthesis (Gibbons et al. 2000). Belonging to the same functional class (subcategory “membranes and lipoproteins”) is the novel potential direct PhoPQ target STM1940. This gene codes for a cell wall-associated hydrolase in S. typhimurium.

The set of directly PhoPQ-dependent genes we identified was also statistically enriched for the functional class related to “drug sensitivity.” Three of the genes we identified from this category (pmrD, pqaB, pagP) are important for resistance to antimicrobial peptides. The pmrD gene is shown to be directly PhoPQ regulated (Kato et al. 2003). The pagP gene was shown to be PhoPQ dependent and important for the modification of lipid A (Guo et al. 1997), but in addition, our results point to its direct regulation. A third gene, pqaB, is shown to be involved in antimicrobial peptide resistance in S. typhi but is suggested to be indirectly PhoPQ regulated via the PmrAB system (Baker et al. 1999). This means that the detected PhoP box is either false positive or points toward a complex dual regulation of this gene.

Although not significantly enriched within the set of directly PhoPQ-regulated genes, the functional class related to pathogenicity and virulence also contained potential PhoPQ-dependent targets (pagC, mgtC, virK, and STM0306). Although the three former genes have previously been related to PhoPQ dependency, we find here, in addition, evidence for a potential direct dependency on PhoPQ. PagC is a membrane protein (Gunn et al. 1995) that was shown to be essential for survival within macrophages and for virulence in S. typhimurium (Pulkkinen and Miller 1991). MgtC was also shown to be required for intramacrophage survival and growth under low-Mg2+ conditions (Blanc-Potard and Groisman 1997). The exact function of MgtC, however, is still unknown. VirK contributes to the resistance of S. typhimurium against polymyxin B and is important for systemic infection by the bacteria (Detweiler et al. 2003). Moreover, both pagC and mgtC are located on the SPI-3 pathogenicity island and were previously suggested to be acquired via horizontal gene transfer (Gunn et al. 1995; Blanc-Potard and Lafay 2003). This would imply that, after acquisition, both genes were integrated into the PhoPQ-dependent regulatory cascade of S. typhimurium. STM0306, a fourth PhoPQ-regulated gene that is involved in pathogenicity, is a paralog of the S. typhimurium SapA protein, which was shown to play a role in virulence (Parra-Lopez et al. 1993). An ortholog of SapA is present in E. coli and Erwinia chrysanthemi. In the latter, SapA was shown to play a role in the resistance of this organism to antimicrobial peptides (Lopez-Solanilla et al. 1998).

Discussion

Based on the combination of microarray and motif data, the PhoPQ-dependent regulon in both E. coli and S. typhimurium was reconstructed. This reconstruction is the best estimate of the true regulon composition that can be made at this stage, given the inherent variation commonly observed in microarray experiments, the uncertainty about the sequence requirements of the PhoP motif model, and the restricted availability of experimental data. Note that, for instance, the E. coli and the Salmonella microarray experiments were not performed under exactly the same conditions. However, we can expect that the conditions that were used trigger a large part of the PhoPQ regulon in both organisms. Indeed, the E. coli dataset tests the influence of Mg2+, the most important PhoPQ signal. The Salmonella experiment, on the other hand, makes use of a constitutive mutant and is therefore relatively independent of the conditions applied. Consequently, the size of the E. coli regulon might be underestimated compared to the size of the S. typhimurium regulon. This, however, does not prevent us from studying the overlap in regulon composition: a large overlap in regulon composition would imply that at least the targets induced by the major PhoPQ trigger, i.e., Mg2+, as detected in E. coli, would be contained within the PhoPQ regulon in Salmonella. The fact that this is not the case, as we observed in this study, indicates that the true overlap is indeed low.

Statistical analysis of the PhoPQ-related expression data of S. typhimurium clearly revealed the pleiotropic nature of the PhoPQ regulatory system. We found evidence for at least 2855 genes being affected by the PhoP mutation. Of these, only a limited subset of approximately 42 operons was directly regulated by PhoP, which is in accordance with previous predictions (Soncini et al. 1996; Soncini and Groisman 1996). The seemingly contradictory observations of the pleiotropic nature of the PhoPQ system, on the one hand, and the small size of the direct PhoPQ-dependent regulon, on the other hand, can be explained by the high number of regulatory proteins that are part of the direct regulon (e.g., mig-14, slyA, pmrD, traM, STM0859). Each of these regulators can activate other regulatory cascades (Lejona et al. 2003), allowing a combinatorial increase in the number of PhoPQ-affected genes. For example, pmrD encodes a protein that posttranscriptionally activates the two-component PmrAB system (Groisman 2001; Kato et al. 2003).

In an initial attempt to compare the PhoPQ regulon composition between E. coli and S. typhimurium, we used a very strict definition of overlap: a pair of orthologs counted as overlapping only when both genes were differentially expressed in their respective organisms (data not shown). Although such a stringent analysis yields a highly reliable set of potential PhoPQ targets, overlapping targets might escape detection because of type II errors in the expression data (false negatives in either organism). Therefore, we repeated the analysis using the low-stringency criteria described under Materials and Methods. This higher-sensitivity analysis, however, did not drastically increase the detected overlap in regulon composition, which supports the biological relevance of the detected low overlap. Indeed, when we compared the average regulon composition using nonstringent criteria, we observed an overlap of only 13 PhoPQ-regulated operons (i.e., 26 genes).
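The effect of the stringency setting on the detected overlap can be illustrated with a minimal sketch. It assumes that "differentially expressed" means a per-gene p-value below a cutoff; the actual criteria are those described under Materials and Methods and may differ. The gene names, p-values, cutoffs, and function names below are hypothetical and serve only to show the logic.

```python
from typing import Dict, Set, Tuple

# Toy illustration only: gene names, p-values, and cutoffs are invented
# and do not come from the expression data discussed in the text.

def differentially_expressed(pvalues: Dict[str, float], alpha: float) -> Set[str]:
    """Return the genes whose p-value falls below the cutoff alpha."""
    return {gene for gene, p in pvalues.items() if p < alpha}

def regulon_overlap(
    pvals_a: Dict[str, float],
    pvals_b: Dict[str, float],
    orthologs: Dict[str, str],
    alpha: float,
) -> Set[Tuple[str, str]]:
    """Ortholog pairs (a, b) called differentially expressed in BOTH organisms."""
    de_a = differentially_expressed(pvals_a, alpha)
    de_b = differentially_expressed(pvals_b, alpha)
    return {(a, b) for a, b in orthologs.items() if a in de_a and b in de_b}

# Hypothetical p-values for three ortholog pairs.
ecoli = {"geneA": 0.001, "geneB": 0.03, "geneC": 0.40}
salmonella = {"geneA_s": 0.002, "geneB_s": 0.008, "geneC_s": 0.04}
orthologs = {"geneA": "geneA_s", "geneB": "geneB_s", "geneC": "geneC_s"}

strict = regulon_overlap(ecoli, salmonella, orthologs, alpha=0.01)
relaxed = regulon_overlap(ecoli, salmonella, orthologs, alpha=0.05)
print(len(strict), len(relaxed))
```

In this toy case, relaxing the cutoff recovers one additional pair that the strict setting missed in one organism (a false negative), whereas a pair that is clearly unaffected in one organism stays outside the overlap at either setting.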

To estimate the direct regulon, we made approximations based on both stringent and nonstringent sequence requirements of the PhoP motif, assuming that the true sequence requirements lie somewhere in between. Apart from the mgtA gene, which was missing from our analysis for reasons explained earlier, only 2 of these 13 overlapping operons (phoPQ and slyB) were directly regulated by the PhoPQ two-component system in both organisms.

In addition to these overlapping operons, both organisms had a considerable set of operons for which the PhoPQ dependency was confined to one of the two organisms. These results point toward a high degree of specialization of the PhoPQ regulon in each species. This is perhaps not unexpected in view of the major role the PhoPQ system plays in determining the virulence phenotype of S. typhimurium, a phenotype that is absent in the related nonpathogenic strain E. coli K12 (Groisman 2001). It also explains why a large part of the direct regulon comprises genes that are unique to one of the two species. These general conclusions seemed relatively independent of the stringency of the definition of “regulon overlap” and of the motif requirements used, indicating that the true overlap between the PhoPQ regulon compositions in both species is indeed low.

In S. typhimurium, the functional classes significantly enriched within the set of directly PhoPQ-regulated genes are “central intermediary metabolism,” “synthesis and modification of cell envelope,” “cation transport,” and “drug sensitivity.” The most significantly enriched classes (i.e., those with the lowest p-values) in the subset of genes directly dependent on PhoPQ in E. coli are likewise related to “general metabolism” and “cell and membrane structure.” Despite the limited overlap in regulon composition between the two species, the PhoPQ regulatory system seems to have retained common functions in both species.
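As an illustration of how an enrichment p-value for a functional class can be obtained, the sketch below uses a one-sided hypergeometric test. This is an assumption: it is a common choice for this kind of analysis, but this section does not state which test was applied. The function name and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided enrichment p-value P(X >= k).

    N: total number of annotated genes in the genome
    K: genes in the genome that belong to the functional class
    n: genes in the regulon (the selected subset)
    k: regulon genes that belong to the functional class
    """
    upper = min(K, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 20 genes in total, 5 in the class, 4 in the regulon,
# of which 2 belong to the class.
p = hypergeom_enrichment_p(N=20, K=5, n=4, k=2)
print(p)
```

A class is then called enriched when this p-value falls below a chosen significance level, ideally after correcting for the number of classes tested.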

Conclusion

Our analysis shows how a regulatory system that is very well conserved (Kasahara et al. 1992) and that responds to the same extracellular signal can become integrated in a relatively short time period (120–160 million years; Cotter and DiRita 2000) into seemingly completely different pathways. This acquisition of novel target genes might explain the marked ability of prokaryotic organisms to evolve novel phenotypes and adapt to specific niches. The PhoPQ regulon might have recruited genes that contribute to a virulence phenotype in S. typhimurium but not in E. coli K12.