Introduction

The Toll gene was first discovered in the 1980s as an important component of the pathway that establishes the dorsoventral axis in the early Drosophila embryo (Anderson and Nusslein-Volhard 1986; Schupbach and Wieschaus 1989). It was later found to play an essential part in antifungal defense as well (Lemaitre et al. 1996). Sequencing of the Drosophila genome revealed nine proteins belonging to the Toll family (Tauszig et al. 2000). The first mammalian proteins structurally related to Drosophila Toll were identified and named Toll-like receptor (TLR) 1 and 4 (Nomura et al. 1994; Medzhitov et al. 1997). Toll and TLR family proteins are transmembrane receptors characterized by an extracellular domain with leucine-rich repeats (LRR) that participates in ligand recognition and an intracytoplasmic Toll/interleukin-1 receptor homology (TIR) domain that is critical to both Drosophila Toll and mammalian TLR signaling (Werling and Jungi 2003). The mammalian TLR family comprises 10 human and 9 murine transmembrane proteins (Akira et al. 2001; Zarember and Godowski 2002). TLR family members are crucial in the early phase of infection, when innate immunity is important, and they also link innate and adaptive immunity throughout the entire course of the host defense response (Takeuchi and Akira 2001; Werling and Jungi 2003). The TLR family apparently expanded early in vertebrate evolution, probably through large-scale gene duplications (Gu et al. 2003).

The broad functional roles of TLRs likely reflect the substantial diversity of their ligand-binding affinities. Although sequence evolution in the extracellular domain after gene duplication is generally believed to be the key to understanding how ligand-binding affinity evolves (Smirnova et al. 2000), this evolution remains largely uncharacterized. In this brief communication, we address this issue by conducting a phylogeny-based functional divergence analysis to formulate a testable hypothesis that may help direct further experimentation.

Materials and Methods

Comprehensive searches with Gapped BLAST and PSI-BLAST were performed in several major protein databases using human TLR2 as the query sequence. After partial and redundant sequences were removed, the final data set included 40 complete vertebrate TLR sequences and one homologous Drosophila sequence. These 41 sequences came from human (10), non-human primates (5), rodents (10), other mammals that are neither primates nor rodents (11), non-mammalian vertebrates (4) and Drosophila (1). The multiple alignment of the 41 TLR amino acid sequences was obtained with the software CLUSTAL X (Thompson et al. 1997).

Based on the CLUSTAL X multiple alignment of the 41 TLR amino acid sequences (Thompson et al. 1997), we inferred the phylogenetic tree using the neighbor-joining method (Saitou and Nei 1987) implemented in the software MEGA2.0 (http://www.megasoftware.net/); other methods, namely parsimony (PAUP) and likelihood (PHYLIP), gave almost the same results.
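The caption of Fig. 1 states that the neighbor-joining topology was built on Poisson distances. As a minimal sketch of what such a distance looks like for one pair of aligned sequences, the Python below applies the standard Poisson correction, d = -ln(1 - p), where p is the proportion of differing residues. The function names and the choice to skip gapped columns are illustrative assumptions, not a description of how MEGA2.0 handled these data.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues over columns where neither sequence has a gap.

    Assumption for illustration: gaps are written as '-' and gapped columns are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every compared site differs
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not taken from the TLR data set).
    print(poisson_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

A full distance matrix would apply `poisson_distance` to every pair of rows in the alignment before the neighbor-joining step.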

Gu (1999) developed a statistical method to detect critical amino acid residues that might be responsible for functional divergence by investigating whether the evolutionary conservation of these residues has changed between clades (in our case, between the three major TLR clades); for example, an amino acid residue can be highly variable in one clade but highly conserved in another. Statistically, the functional divergence between two clades is measured by the coefficient of functional divergence, θ, which ranges from 0 to 1. The null hypothesis θ = 0 means that the evolutionary rate at each site is virtually the same in the two duplicate genes (Gu 1999, 2001). If the null hypothesis is rejected, a site-specific profile is then used to predict the critical amino acid residues that are most likely responsible for the detected functional divergence. This method is implemented in the software DIVERGE (http://www.xgu.zool.iastate.edu) (Gu and Vander Velden 2002).
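To make the null hypothesis θ = 0 concrete, the sketch below shows one simple way to judge an estimate of θ against its standard error using a normal approximation (z = θ/SE, one-sided). This is an illustrative assumption only: it is not necessarily the test that DIVERGE performs, and the function name and the example numbers are hypothetical.

```python
import math


def theta_z_test(theta: float, se: float) -> tuple[float, float]:
    """Approximate one-sided z-test of H0: theta = 0 against theta > 0.

    Assumes the estimate is roughly normally distributed around the true value.
    Returns (z, p_value).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = theta / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    z, p = theta_z_test(theta=0.3, se=0.1)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

The one-sided form is used here because θ is bounded below by 0, so only positive departures from the null are meaningful.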

Results and Discussion

There are two important domains in the TLR gene family: the extracellular domain containing the LRR and the cytoplasmic domain containing the TIR. The LRR domain, consisting of many tandem LRR, plays a key role in binding ligands for defense against pathogens. The LRR motifs of TLR2, 4, and 9 were defined by the Interpro program from the SWISSPROT database. Only TLR2, 4 and 9 showed sufficient conservation and had enough amino acid sequences available to yield reliable multiple alignments. The LRR motif alignments among the human TLR2, 4 and 9 genes were constructed based on the sequence alignments of the three genes (see the online supplementary materials for the accession numbers of the sequences and the multiple alignment).

The inferred phylogeny of the TLR gene family (Fig. 1) shows three major clades supported by high bootstrap values: clade A includes TLR1, 2, 6, and 10; clade B includes TLR4; and clade C includes TLR3, 5, 7, 8 and 9. Interestingly, these three major clades correspond to diversified ligand properties: clade A recognizes Gram-positive bacteria, except for TLR10, whose ligand is unknown; clade B recognizes Gram-negative bacteria; and the ligands of clade C are mixed (Gram-positive, Gram-negative, virus, CpG-DNA, antiviral compound). Based on the ligand property of clade A, one might infer that the ligand of TLR10 is also of Gram-positive origin. The evolutionary closeness of TLR2, TLR1 and TLR6 implies that they may have similar functions, which is supported by the functional interactions between TLR2 and TLR1 or TLR6 in response to phenol-soluble modulin in mouse (Hajjar et al. 2001).

Fig. 1

The phylogenetic tree of the TLR gene family. The neighbor-joining algorithm was used to infer the topology based on the multiple sequence alignment with Poisson distance. Bootstrap scores >50% are presented. Accession numbers: TLR1: AAC34137.1 (human), NP_109607.1 (mouse); TLR2: NP_003255.2 (human), AAK91868.1 (macaca), AAD46481.1 (mouse), Q9R1F8 (hamster), AAL16722.1 (bovine), BAB16842.1 (chicken), Q8MIQ3 (rabbit); TLR3: NP_003256.1 (human), AAK26117.1 (mouse); TLR4: NP_612564.1 (human), AAD29272.1 (mouse), AAC13313.1 (rat), AAG32061.2 (bovine), BAB43947.1 (cat), AAF05320.1 (chimpanzee), Q8SPE9 (orangutan), AAF07059.1 (papio), Q8MIQ2 (rabbit), TLR4_HORSE (horse), AAD41891.1 (cricetulus); TLR5: NP_003259.2 (human), NP_058624.1 (mouse); TLR6: NP_006059.2 (human), BAA78632.1 (mouse); TLR7: NP_057646.1 (human), NP_573474.1 (mouse); TLR8: NP_057694.2 (human), NP_573475.1 (mouse), Q865R7 (pig); TLR9: NP_059138.1 (human), NP_112455.1 (mouse), BAC66473.1 (pig), AAN15751.1 (cat), BAC65192.1 (dog), CAD52053.2 (bovine); TLR10: NP_112218.1 (human); Toll-8: AAF86224.1 (Drosophila); TLR21: Q800I6 (pufferfish); TLR: Q801F9 (goldfish).

We used the method of Gu (1999) to explore the connection between TLR protein sequence evolution and the distinct ligand properties of the three clades. Several case studies, for example of the caspase gene family (Wang and Gu 2001) and the Jak protein kinase family (Gu et al. 2002), have shown the promise of this methodology for functional genomics. The results are presented in Table 1. Interestingly, although clades A and B are evolutionarily more closely related to each other, the coefficient of functional divergence between them is higher than that between clades A and C or between clades B and C. This pattern is consistent with the ligand properties of the clades: the ligands of clade A (Gram-positive) and clade B (Gram-negative) are different, so more functional divergence would be expected between them, whereas clade C shares some of its ligands with clade A or B. A similar pattern is obtained when TLR2, 4, and 9, which have relatively large sets of sequences available, are used to represent clades A, B and C, respectively.

Table 1 The θ values and SE from pairwise cluster comparisons in the Toll-like receptor (TLR) gene family.

To investigate the connection between sequence evolution and the different ligand properties of clades A and B, we predicted the critical amino acid residues responsible for the functional divergence by calculating the site-specific (posterior probability) profile for each pairwise comparison of TLR2, 4 and 9. We observed that the number of sites with higher profile values is much greater in the extracellular region than in the cytoplasmic region for all three gene comparisons. This pattern is not affected by the cut-off value, implying a potential connection with the diversity of ligand properties. Given an appropriate cut-off value (see the footnote of Table 2), the numbers of predicted critical residues within each domain of the TLR genes for each pair of gene clusters are presented in Table 2. These predicted sites are clearly conserved in one cluster but variable in the other. The χ² test shows significant differences between the extracellular and cytoplasmic regions in all comparisons; the signal and transmembrane regions were not included because of their short lengths.
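The comparison between domains can be illustrated with a 2×2 χ² test of independence, in which each domain contributes a count of predicted critical sites and a count of remaining sites. The sketch below is one plausible layout under stated assumptions: the table arrangement, the absence of a continuity correction, the function name and the example counts are all hypothetical and are not taken from Table 2.

```python
import math


def chi_square_2x2(crit_a: int, total_a: int, crit_b: int, total_b: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    crit_a / total_a: predicted critical sites and all sites in domain A.
    crit_b / total_b: the same for domain B.
    Returns (chi2, p_value).
    """
    observed = [
        [crit_a, total_a - crit_a],
        [crit_b, total_b - crit_b],
    ]
    n = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [crit_a + crit_b, n - crit_a - crit_b]
    if min(row_sums) <= 0 or min(col_sums) <= 0:
        raise ValueError("every row and column must have a positive total")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: extracellular domain vs cytoplasmic domain.
    chi2, p = chi_square_2x2(crit_a=30, total_a=100, crit_b=10, total_b=100)
    print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

Because the test works on proportions within each domain, it accounts for the extracellular domain being much longer than the cytoplasmic one.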

Table 2 Number of sites and predicted critical residue sites within each domain between pairs of gene clusters, and the χ² test (P value) of their distribution between the extracellular and cytoplasmic domains based on the cut-off value.

It is well known that duplication in the TLR gene family provides opportunities for the host to recognize a variety of pathogens. Our finding that the extracellular domain has significantly higher functional divergence than the cytoplasmic (TIR) domain supports the concept that the extracellular domain is biologically critical for host-pathogen interactions. Indeed, the site-specific analysis detected the highest level of functional divergence in the region between the LRR9 and LRR13 motifs of TLR4, which may potentially function as an RNA-binding domain (Kirschning and Schumann 2003). Fig. 2 shows the distribution of the number of predicted critical sites across the LRRs for four pairs of cluster comparisons. The numbers of predicted critical amino acid residues between the LRR10 and LRR15 motifs were generally higher than in the remaining motifs of the extracellular domain. This implies that the region between LRR10 and LRR15 might contain potential targets responsible for ligand binding, which is testable by further biological experimentation. On the other hand, the conserved cytoplasmic domain may not need to cope with a variety of ligands or with ligands of variable structure. Indeed, all TLRs share a common adaptor molecule, MyD88, that interacts with the TIR domain for signal transduction (Means et al. 2000); therefore, a more conserved structure of the cytoplasmic domain can ensure the specificity or affinity of binding.
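The per-motif counts summarized in Fig. 2 amount to thresholding the site-specific posterior profile and binning the retained sites by LRR motif. A minimal sketch of that bookkeeping follows; the data structures, function name, motif coordinates and cut-off are hypothetical placeholders rather than the values used in this study.

```python
def count_critical_sites_per_motif(
    posterior: dict[int, float],
    motifs: dict[str, tuple[int, int]],
    cutoff: float,
) -> dict[str, int]:
    """Count alignment sites whose posterior probability exceeds the cut-off, per motif.

    posterior: alignment position -> posterior probability of functional divergence.
    motifs: motif name -> (start, end) alignment positions, inclusive.
    Motifs are assumed not to overlap; sites outside every motif are ignored.
    """
    counts = {name: 0 for name in motifs}
    for position, probability in posterior.items():
        if probability <= cutoff:
            continue  # site is not predicted as critical at this cut-off
        for name, (start, end) in motifs.items():
            if start <= position <= end:
                counts[name] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical profile and motif boundaries, for illustration only.
    profile = {5: 0.91, 12: 0.40, 27: 0.88, 30: 0.95, 61: 0.72}
    lrr_motifs = {"LRR1": (1, 24), "LRR2": (25, 48), "LRR3": (49, 72)}
    print(count_critical_sites_per_motif(profile, lrr_motifs, cutoff=0.7))
```

Repeating the count over a range of cut-off values is a simple way to check whether the pattern across motifs depends on the threshold.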

Fig. 2

The distribution of the number of predicted critical sites among leucine-rich repeats (LRR) for four pairs of cluster comparisons: TLR2/4, TLR2/9, TLR4/9 and clade A/B.

In summary, the site-specific posterior profile approach was applied to predict only Type I functional divergence among homologous genes within a gene family. There are many other approaches to identifying functional divergence from an evolutionary perspective (Casari et al. 1995; Livingstone and Barton 1996; Pollock et al. 1999; Gaucher et al. 2001, 2002). As more sequence data accumulate, multi-species sequence analysis will allow more accurate and reliable predictions of functional divergence with the current approach. Further study will incorporate microarray data to evaluate the relative importance of expression divergence and protein function divergence after vertebrate gene duplications (Gu 2004; Gu et al. 2005).