Introduction

Cells of the innate immune system, such as macrophages and dendritic cells, express a limited number of germline-encoded pattern-recognition receptors (PRRs) that specifically recognize pathogen-associated molecular patterns (PAMPs) within microbes. These molecular patterns are unique to these microbes and are absent in the host [1]. Toll-like receptors (TLRs) are currently the best-characterized members of the PRRs [2]. The progress of genome sequencing projects has so far led to the identification of 13 groups of TLRs in mammalian genomes, ten in humans and 13 in mice [3], and more than 20 in non-mammalian genomes [4]. All TLRs have a common domain organization, with an extracellular ectodomain, a helical transmembrane domain, and an intracellular Toll/IL-1 receptor homology (TIR) domain [5]. The ectodomain is responsible for the recognition of common structural patterns in various microbial molecules. For example, lipoproteins or lipopeptides are recognized by TLR2 complexed with TLR1 or TLR6, viral double-stranded RNAs by TLR3, lipopolysaccharides by TLR4, bacterial flagellins by TLR5, single-stranded RNAs by TLR7 or TLR8, and microbial CpG DNAs by TLR9 [6, 7]. The TIR domains of TLRs are associated with the intracellular signaling cascade leading to the nuclear translocation of the transcription factor NF-κB [8].

A TLR ectodomain contains 19–27 consecutive leucine-rich repeat (LRR) motifs sandwiched between two terminal LRR modules (LRRNT and LRRCT) [4]. LRRs exist in more than 6000 proteins, and more than 100 crystal structures of these proteins have been deposited in the Protein Data Bank (PDB) [9, 10]. In every case, the protein adopts an arc or horseshoe shape. An individual LRR motif is defined as an array of 20–30 amino acids that is rich in the hydrophobic amino acid leucine. All LRR sequences can be divided into a conserved segment and a variable segment. The conserved segments, LxxLxLxxNxL, generate the concave surface of the LRR arc or horseshoe by forming parallel β-strands, while the variable parts form its convex surface, consisting of helices or loops. The terminal LRRNT and LRRCT modules stabilize the protein structure by shielding its hydrophobic core from exposure to solvent.

To date, only the crystal structures of the ectodomains of human TLR1–4 and mouse TLR2–4 have been determined [11–15]. High-throughput genome sequencing projects, however, have led to the identification of more than 2000 TLR sequences. Thus, the structures of most TLRs are still unknown because structure determination by X-ray diffraction or nuclear magnetic resonance spectroscopy experiments remains time-consuming. Protein structure prediction methods are powerful tools for bridging the gap between sequence determination and structure determination.

Homology modeling, also referred to as comparative modeling, is currently the most accurate computational method for protein structure prediction. This approach constructs a three-dimensional model for a target protein sequence from a three-dimensional template structure of a homologous protein. Thus, the quality of the homology model strongly depends on the sequence identity between the target and template. Below 30% identity, serious errors may occur [16]. Due to different repeat numbers and distinct arrangements of LRRs in the TLR ectodomains, a proper full-length template with a sufficiently high sequence identity to the target is often missing. This limitation can be overcome by assembling multiple LRR templates. In this approach, the most similar (on the sequence level) LRR with a known structure is searched for as a local template for each LRR in the target sequence. Such an LRR template may be derived from TLRs or from other proteins. Thereby, a suitable template may even be found for an insertion-containing irregular LRR. All local template sequences are then combined to generate a multiple sequence alignment for the complete target sequence. Thus, a high-quality model can be created, even if no adequate single template is available. To facilitate the multiple template assembly of LRR proteins, we developed the LRRML database [9], which archives individual LRR structures that were manually identified from all known LRR protein structures. In addition, we have developed TollML [4], a database of sequence motifs of TLRs. In TollML, all known sequences of TLR ectodomains were semi-automatically partitioned into LRR segments and made available for query. For newly sequenced TLRs that are not yet archived in TollML, we have implemented an LRR prediction program named LRRFinder on the TollML webpage. It requires an LRR-containing amino acid sequence as input, and returns the number and positions of the LRRs in the input sequence. 
LRRFinder recognizes LRR motifs based on a position-specific weight matrix scan, with sensitivity and specificity both higher than 93%. With the help of these two databases, LRR partitions of a TLR ectodomain can be obtained directly, and an optimal structure template for each LRR segment can be quickly found. A schematic flowchart of the modeling procedure is shown in Fig. 1.
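As a rough illustration of what a position-specific weight matrix scan involves, the following Python sketch builds a log-odds matrix from a handful of aligned 11-residue conserved segments (pattern LxxLxLxxNxL) and slides it along a query sequence. It is not the LRRFinder implementation: the training segments, the pseudocount, the uniform background, the score threshold, and all function names are assumptions chosen purely for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy training set of aligned conserved LRR segments.
# A real matrix would be derived from a curated collection of LRRs.
TRAINING_SEGMENTS = [
    "LKNLTLDLNSL",
    "LTSLQLSRNHL",
    "LPQLKLGYNEL",
    "LENLRLEDNKI",
    "FANLEIDSNRL",
]


def build_pswm(segments, pseudocount=1.0):
    """Return one {residue: log2-odds} dict per alignment column.

    Assumes a uniform background frequency of 1/20 for every residue.
    """
    width = len(segments[0])
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seg in segments:
            counts[seg[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return matrix


def scan_lrr_segments(sequence, matrix, threshold):
    """Slide the matrix along the sequence; return (start, score) for each hit.

    Residues outside the 20 standard amino acids contribute a score of 0.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Calling `scan_lrr_segments(query, build_pswm(TRAINING_SEGMENTS), threshold=8.0)` on a query that embeds one of the training segments returns the start position of that segment. In practice the threshold would have to be calibrated against annotated sequences to reach a given sensitivity and specificity.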

Fig. 1

Flowchart of the LRR template assembly method

In this study, we apply the multiple template assembly approach to TLRs. To demonstrate the potential of the method, we constructed two models of the mouse TLR3 ectodomain as a test case, one using our LRR template assembly method and one using a standard profile–profile alignment-aided full-length template recognition method. Both models were then compared with the crystal structure of mouse TLR3. The overall conformation and the ligand-binding site conformation of the template assembly-based model are closer to those of the crystal structure than are those of the model based on the standard method. We also modeled the human TLR5–10 and mouse TLR11–13 ectodomains, which represent mammalian TLR ectodomains with unknown structures. A comparison of the model for human TLR6 with the very recently reported crystal structure of mouse TLR6 shows very good structural agreement.

Methods

Template selection and sequence alignment

Amino acid sequences of the mouse TLR3, human TLR5–10, and mouse TLR11–13 ectodomains were extracted from TollML release 3.0 (IDs 627, 531, 571, 992, 575, 1022, 851, 703, 705, and 704). Their LRR partitions were annotated by TollML. For each LRR sequence contained in each target TLR, the three-dimensional LRR structure with the highest sequence identity was selected as a template from LRRML through a sequence similarity search. Then, a multiple sequence alignment of a target with all its local LRR templates was generated, with each template occupying one alignment line. For instance, mouse TLR3 has a total of 25 LRRs and accordingly required 25 templates. The associated multiple sequence alignment then has 26 lines (Fig. 1). Because of the characteristic consensus sequences of LRRs, manual alignment proved more accurate than automatic alignment. To generate an alternate model with standard methods, the widely used template recognition program pGenTHREADER [17] was executed to find templates for mouse TLR3. This method calculates sequence profiles from an input sequence and combines profile–profile alignments with secondary structure-specific gap penalties, pair potentials, and solvation potentials using a linear combination. The output consists of complete PDB structures that serve as candidate templates, ranked by P values. Each candidate sequence is aligned with the target sequence.
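The per-LRR template choice described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LRRML search itself: the segments are assumed to be pre-aligned to equal length, identity is counted over columns where neither sequence has a gap, and the template identifiers and sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def pick_templates(target_lrrs, template_library):
    """For each target LRR, return (template_id, identity) of the best template.

    template_library maps a template identifier to its aligned LRR sequence.
    """
    chosen = []
    for lrr in target_lrrs:
        best_id = max(
            template_library,
            key=lambda tid: percent_identity(lrr, template_library[tid]),
        )
        chosen.append((best_id, percent_identity(lrr, template_library[best_id])))
    return chosen


# Hypothetical example data: two target LRRs and three candidate templates.
EXAMPLE_TARGET_LRRS = ["LKNLTLDLNSL", "LTSLQLSRNHL"]
EXAMPLE_LIBRARY = {
    "tmplA": "LKNLTLDYNEL",
    "tmplB": "LTSLQLSRNQI",
    "tmplC": "FANLEIDSNRL",
}
```

With one template chosen per LRR, the resulting alignment has one line for the target plus one line per template, which is why 25 LRRs give 26 alignment lines.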

Model construction and validation

The initial three-dimensional coordinates of all models were calculated by MODELLER 9v7 [18]. The above-described alignment file and the corresponding template structures of the target model were supplied to the default “model” routine of MODELLER. A given number of three-dimensional models were calculated. The ectodomains of TLRs contain a number of insertion regions. Some of them corresponded to gaps of 4 to 15 amino acids in the alignments because their templates do not contain a corresponding insertion. During modeling, these gaps produced loop structures in the model, which reduced the model accuracy. ModLoop [19] was used to rebuild the coordinates of these loop regions. Finally, we used the model quality assessment programs ProQ [20] and MetaMQAP [21] to evaluate the output candidate models and selected the one with the best scores as the final model. The structure superimpositions and molecular electrostatics involved in the structural analysis were carried out using SuperPose v1.0 [22] and VMD [23], respectively. The docking studies of TLR11 and its ligand profilin were performed with GRAMM-X [24].

Results

LRR templates

The full-length ectodomains of mouse TLR3, human TLR5–10, and mouse TLR11–13 contained 25, 23, 21, 28, 28, 28, 21, 26, 25, and 27 LRRs, respectively. Consequently, a total of 252 individual LRR templates sourced from 41 different PDB structures were selected from LRRML. Their sequence identities with the targets ranged from 26.0% to 95.7% (43.8% on average), and their sequence similarities ranged from 39.0% to 100% (58.2% on average). Remarkably, all cases of relatively low sequence identity (<35%) were caused by highly irregular target LRRs. These highly irregular sequences include LRRNT/CTs, the highly mutated LRR15 of TLR7/8/9, and the insertion-containing LRRs, whose templates do not include a similar insertion. The sources (LRRML IDs) and sequence identities of all LRR templates are listed in Table 1.

Table 1 Sources and sequence identities (%) of the LRR templates

As the modeling of mouse TLR3 was carried out to verify our approach, we assumed that the crystal structure of the mouse TLR3 ectodomain was unknown and excluded the corresponding LRR entries of mouse and human TLR3 ectodomains from LRRML before the template search. The selected individual LRR templates for mouse TLR3 were associated with 18 PDB structures, 14 of which were from non-TLR proteins. The target–template sequence identities ranged from 33.3% to 50.0% (44.1% on average). By contrast, pGenTHREADER provided only complete PDB structures of LRR proteins as candidate templates, with each candidate possessing a pairwise sequence alignment with the target. Because no single template covered the entire sequence of mouse TLR3, we selected the first seven candidates by rank (excluding mouse and human TLR3) and combined them into a multiple alignment to avoid template gaps. These templates included PDB structures 2Z64, 2Z81, 1O6V, 3FXI, 2Z7X, 1JL5, and 3BZ5, which covered the closest homologs (mouse TLR2/4 and human TLR1/4) to mouse TLR3 among all proteins with known structures. Nevertheless, the sequence identities of the seven templates to mouse TLR3 ranged from 16.1% to 21.2% (18.7% on average), well below the cut-off value of 30% for homology modeling [16].

Structural models

Model of mouse TLR3

Recently, we constructed a model of mouse TLR3 with the LRR template assembly method as a test case for the LRRML database [9]. It revealed a horseshoe-shaped assembly that adopts a regular solenoid structure without disordered regions. The model was superimposed on the crystal structure of the mouse TLR3 ectodomain (PDB code: 3CIG) [14] at both of its ligand-binding regions, LRRNT–LRR3 and LRR19–LRR21. The backbone root mean square deviations (RMSDs) were 1.96 Å and 1.90 Å, respectively [9]. To verify the improvements in the database and the modeling process, we reconstructed the mouse TLR3 model (Fig. 2b) with up-to-date LRR templates. Compared with the old model, four of the 25 LRRs of TLR3 in the new model were assigned new templates with higher sequence identities (Table 1). Because these four LRRs are not involved in the TLR3 ligand-binding sites, the corresponding RMSD values of the new model were the same as for the previous model. These values indicate that the model predicted with our method closely matches the crystal structure and can be used to predict potential ligand-binding sites [25].
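For reference, the backbone RMSD quoted above is the root of the mean squared distance between paired backbone atoms after superposition. The Python sketch below computes only that final quantity. It assumes the two coordinate sets have already been superimposed (for example with a tool such as SuperPose) and that atoms are paired one-to-one in the same order; the function name and the input layout are illustrative assumptions.

```python
import math


def backbone_rmsd(coords_model, coords_crystal):
    """RMSD in the input length unit (e.g. Å) between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and atom-matched.
    """
    if len(coords_model) != len(coords_crystal) or not coords_model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_model, coords_crystal):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_model))
```

Because the function performs no fitting, passing coordinates that have not been superimposed would overestimate the deviation.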

Fig. 2a–d

Homology models and crystal structure of the mouse TLR3 ectodomain. a The homology model based on the standard method. The framed region exhibits serious disorder. b The homology model based on the template assembly method. c The crystal structure (PDB code: 3CIG). The dotted region is an insertion on LRR20 that is missing in the crystal structure. d The target-template sequence alignment of the disordered region of the standard method-based model. Mismatches and target gaps resulted in the disorder in a, where two to four template LRRs were wrongly assigned to five target LRRs

For comparison purposes, the mouse TLR3 ectodomain was also modeled with a standard profile–profile alignment-aided full-length template recognition method. All ten output models obtained from MODELLER for the standard alignment based on full-length templates showed a serious structural disorder spanning LRR6–10 (Fig. 2a). In the crystal structure, LRR6–10 form a regular solenoid structure with an α-helix in the variable segment of LRR8 (Fig. 2c). By contrast, the corresponding LRRs in the model completely lost the proper LRR shape and were interwoven with one another. This disorder was caused by mismatches or target gaps in the alignment, where only two to four template LRRs were assigned to five target LRRs (Fig. 2d). The standard alignment could not create a one-to-one correspondence between the target and template LRR units due to the irregularity of the LRRs. ProQ and MetaMQAP were used to evaluate the quality of the different models of mouse TLR3 (Table 2). Both programs make an integrative assessment of the structure quality, considering the geometries, stereochemistries, and energy distributions of the structures. Both of the template assembly-based models received better scores than the model based on the standard method.

Table 2 Evaluation of the crystal structures and models of the TLRs. Higher ProQ_LG/MS and MetaMQAP_GDT values indicate higher model qualities; higher MetaMQAP_RMSD values indicate lower model qualities

Models of human TLR5–10 and mouse TLR11–13

Using the LRR template assembly method, we modeled the human TLR5–10 and mouse TLR11–13 ectodomains. All of the resulting models are provided in supplementary file 1. Ramachandran plots of these models were created with PROCHECK [26] and are provided in supplementary file 2. Model evaluation data obtained with ProQ and MetaMQAP are listed in Table 2. The models reveal a horseshoe shape (Fig. 3), where a longer or shorter sequence (more or fewer LRR units) implies a smaller or larger horseshoe opening; e.g., TLR7/8 (smaller opening) and TLR6/10 (larger opening). Their overall structural similarity reflects the phylogenetic relationships among these TLRs. For example, TLR6 is similar to TLR10, while TLR7 is similar to TLR8, consistent with the molecular tree proposed by Roach and co-workers [27]. Mammalian TLRs are distinct from other LRR proteins in that they contain two to seven insertion-containing irregular LRRs, which may be necessary for ligand binding and receptor dimerization. Our models show that all insertions are located on the face of the horseshoe to which the concave-side β-strands point, whereas the other face is completely insertion-free.

Fig. 3

Models of the human TLR5–10 and mouse TLR11–13 ectodomains. The N-linked glycan sites of these TLRs were obtained from the NCBI protein database and are labeled with black balls

To highlight the applicability of these models to analyses of receptor–ligand interaction mechanisms, we performed molecular electrostatics calculations for the mouse TLR11 model and one of its ligands. TLR11 can recognize the profilin of some apicomplexan protozoan parasites. This protein is involved in parasite motility and invasion [28]. Expression of TLR11 is suppressed in humans [29]. The electrostatic analysis (Fig. 4a) shows that the entire surface of the profilin of Plasmodium falciparum (PDB code: 2JKF) is predominantly negatively charged, whereas TLR11 exhibits several positively charged patches (Fig. 4b). Protein docking studies using GRAMM-X showed that profilin and the positive patches on TLR11 possess compatible sizes and electrostatic complementarity (supplementary file 3).

Fig. 4a–b

Surface charge analysis (APBS electrostatics) of a the crystal structure of profilin (PDB code: 2JKF) and b the model of the mouse TLR11 ectodomain. Blue, positive charge; red, negative charge

Very recently, the crystal structure of the TLR6 ectodomain complexed with TLR2 and a Pam2CSK4 ligand was released in the PDB (PDB code: 3A79). The TLR6 structure is a hybrid structure of mouse and inshore hagfish, where 18 mouse LRRs (LRRNT–LRR17) were hybridized with two hagfish LRRs (LRR18 and LRRCT) [30]. This crystal structure served as an additional benchmark for our template assembly approach. The superimposition of the mouse part of the crystal structure with our human TLR6 model yielded a backbone RMSD of 1.94 Å, which indicates that the model is very similar to the crystal structure (Fig. 5). A second model of human TLR6 was generated with pGenTHREADER, following a procedure similar to that described for mouse TLR3 (supplementary file 1). The backbone RMSD between the crystal structure and this model is 1.89 Å. The only full-length template used for this model was the structure of human TLR1 (PDB code: 2Z7X). Because TLR6 and TLR1 possess the same number of LRRs and have a very high sequence identity (63.3%), the structure of TLR1 serves as an excellent full-length template. Under these favorable conditions, both the standard and the template assembly approaches provided high-quality models.

Fig. 5

Superimposition of the homology model and the crystal structure of the TLR6 ectodomain. Green, homology model of human TLR6 (LRRNT–LRR17); orange, crystal structure of mouse TLR6 (LRRNT–LRR17). The Pam2CSK4 ligand-binding site is located on the variable parts of LRR10–12. The overall backbone RMSD is 1.94 Å and the backbone RMSD of the ligand-binding region is 1.18 Å

Discussion

In template-based protein modeling, the overall sequence identity between the target and template is an important criterion for the selection of suitable templates [31]. For repetitive LRR proteins, however, an appropriate full-length template is often not available, due to different repeat numbers and distinct arrangements. This problem can be solved by combining individual repeating units that are locally optimal for the target sequence. In the method validation with mouse TLR3, the average target–template sequence identity achieved by our method was 44.8%, which was significantly higher than that (18.7%) achieved by a standard profile–profile alignment-aided template recognition method. However, both the standard and the template assembly methods produced models of human TLR6 that closely matched the crystal structure of mouse TLR6. A comparison between the models obtained with both methods highlights the potential of the template assembly approach. It can produce models of a similar quality to those obtained with the standard profile–profile alignment method. The template assembly method, however, reveals its particular strength in situations where no adequate full-length templates are available. In the case of TLR3, the standard profile–profile alignment method failed to predict a reliable model due to significant gaps in the template. Here, the template assembly method overcomes the difficulties and generates a realistic model.

In previous work, we constructed models of human TLR7/8/9 ligand-binding domains by combining LRR segments that were extracted from all known crystal structures of TLRs [32]. The average target–template sequence similarities for TLR7/8/9 were 47.7%, 47.2%, and 46.8%, respectively. The resulting models supported experimentally determined ligand-binding residues [33, 34] and provided a reliable basis to identify potential ligand-binding residues and potential receptor dimerization mechanisms. Here, we went a step further and extended the scope of the approach by checking LRR segments from all LRR-containing proteins with known structures in the LRRML database, because the same type of LRR can exist in different proteins [9]. Consequently, 33 of the 41 source PDB structures are non-TLR proteins (numbers derived from Table 1). The average target–template sequence similarities for TLR7/8/9 increased to 55.9%, 58.2%, and 59.2%, respectively.

Another key issue in LRR protein modeling is the sequence-level LRR partition of the target TLR sequence. The indicated number and beginning/end positions of the LRRs in TLRs vary a great deal across different databases or research reports, due to the irregularity and periodicity of the LRRs. TollML reports the most complete and accurate LRR motifs for TLRs as compared with a number of databases [4]. In addition, TollML provides a statistics-based LRR prediction program, LRRFinder, for new TLR entries that are not yet collected in TollML. It can recognize LRRs from an input amino acid sequence with high confidence.

Conclusions

In conclusion, this work describes an LRR template assembly approach to protein homology modeling. The comparison of a mouse TLR3 model with its crystal structure underlined the feasibility and reliability of the method. With this method, a series of mammalian TLR ectodomains was modeled. These models can be used to perform ligand docking studies or to design mutagenesis experiments, and hence to investigate TLR ligand-binding mechanisms. Our modeling approach can be extended to other repetitive proteins.