Main

The immunostimulatory activity of CpG-DNA is affected by its length, the number of CpG motifs, and the sequences flanking the CpG motif. The core CpG motif, which consists of a hexamer with a central unmethylated CpG, has the general formula RRCGYY (where R represents a purine and Y a pyrimidine)5. Unlike other TLRs, TLRs 7–9 each have a long insertion loop (Z-loop) between LRR14 and LRR15 (Extended Data Fig. 1). Several recent studies have shown that proteolytic processing at the Z-loop is necessary for the creation of functional TLR9 (refs 6–10). To reveal the molecular mechanism by which TLR9 specifically recognizes CpG-DNA and sends signals to the intracellular compartment, we performed biochemical, biophysical and crystallographic studies of the agonistic- and antagonistic-DNA-binding modes.
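The RRCGYY formula can be read as a simple pattern over the four DNA bases. The following is a minimal sketch for illustration only and is not part of the study: the function name `find_cpg_hexamers` is hypothetical, and the pattern checks sequence alone (it cannot represent methylation state).

```python
import re

# RRCGYY: R = purine (A or G), Y = pyrimidine (C or T).
# The zero-width lookahead allows overlapping hexamers to be reported.
CPG_HEXAMER = re.compile(r"(?=([AG][AG]CG[CT][CT]))")

def find_cpg_hexamers(seq):
    """Return (0-based start, hexamer) for every RRCGYY match in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CPG_HEXAMER.finditer(seq)]

print(find_cpg_hexamers("GACGTT"))  # hexamer with a central CG
print(find_cpg_hexamers("GAGCTT"))  # CG swapped for GC: no match
```

GACGTT satisfies the formula (G and A are purines, both T bases are pyrimidines), whereas the GC-swapped hexamer does not.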

We screened recombinantly expressed extracellular domains of TLR9 from various species. Of the proteins we examined, the purified yields of the horse (Equus caballus; Ec), bovine (Bos taurus; Bt) and mouse (Mus musculus; Mm) orthologues (Extended Data Fig. 1) were sufficient for crystallographic study. We used DNA1668_12mer, derived from the 20-mer single-stranded DNA1668 (ref. 3), as agonistic DNA (Extended Data Fig. 2a), and iDNA4084 (ref. 11) and iDNA_super as inhibitory DNAs (iDNAs) (Fig. 1a). DNA1668 and the iDNAs (iDNA4084 and iDNA_super) function as an agonist and as antagonists, respectively, for both horse and bovine TLR9 (Extended Data Fig. 2b). Unless otherwise noted, the DNAs used in this study were single-stranded. Both DNA1668_12mer and iDNA4084 were able to bind Z-loop-processed and -unprocessed TLR9, demonstrating that the binding of both types of DNA is independent of Z-loop processing (Fig. 1b, c). Z-loop-unprocessed TLR9 remained mostly monomeric, irrespective of the presence of agonistic DNA1668_12mer, whereas the proportion of Z-loop-processed TLR9 present as a dimer increased significantly in the presence of DNA1668_12mer (Fig. 1d and Extended Data Fig. 3). These results indicate that Z-loop processing is functionally significant for the ligand-dependent oligomerization of TLR9. By contrast, DNA1668_12mer_GC (DNA1668_12mer with the CpG motif swapped for GpC) and iDNA4084 could induce the formation of only small quantities of TLR9–DNA dimers (Fig. 1d).

Figure 1: DNA binding to TLR9 is independent of Z-loop processing, but subsequent oligomerization is dependent on processing.

a, DNA sequences used in this study. The DNA regions involved in the intramolecular base pairing in iDNA4084 and iDNA_super are underlined. b, SDS–PAGE analysis of EcTLR9 with unprocessed and processed Z-loops. c, Gel-filtration chromatography of EcTLR9. Black line and dashed line denote absorption at 280 nm and 260 nm, respectively. The ratios of absorbances at 260 nm and 280 nm are shown in parentheses. d, The oligomerization states of EcTLR9 analysed by SV–AUC. The original c(s) distributions were normalized against the height of the main peak (see Methods). kDa, kilodalton; S, Svedberg (unit of sedimentation coefficient).

We determined the crystal structures of the unliganded, agonistic-DNA-bound, and iDNA-bound forms of Z-loop-processed TLR9 molecules (Extended Data Table 1 and Extended Data Fig. 4). In the crystals, the unliganded and iDNA-bound forms of TLR9 are monomeric (Fig. 2a, b), whereas the agonistic-DNA1668_12mer-bound forms of TLR9 are dimeric (Fig. 2c and Extended Data Fig. 5). Similar to other TLRs12,13,14,15,16,17, the agonistic-DNA-bound forms form an m-shaped 2:2 complex in which the C-termini of the two TLR9 protomers are positioned in the centre (Fig. 2c and Extended Data Fig. 5d).

Figure 2: Structures of TLR9.

a, Monomer structure of unliganded EcTLR9 showing the lateral face (left) and the convex face from the N-terminal side (right). The N-glycan residues are shown in stick representations with their O, N and C atoms coloured red, blue and grey, respectively. The N- and C-terminal halves of EcTLR9 are coloured light green and orange, respectively. b, Structure of EcTLR9 in complex with iDNA4084 showing the lateral face (left) and the concave face of the N-terminal half (right). Bound iDNA4084 is shown in stick representation and semi-transparent surface representation. The 5′ and 3′ ends of the DNA are indicated. c, Dimer structure of the EcTLR9–DNA1668_12mer complex. EcTLR9 and its dimerization partner EcTLR9* are shown in green and cyan, respectively. The second TLR9 within the dimer and its residues are indicated with asterisks. The DNA1668_12mer molecules are shown in stick representation and semi-transparent surface representation.

The agonistic DNA binds to two equivalent positions in the dimer, and each DNA1668_12mer is recognized in a bent conformation by both TLR9 and TLR9* (the second TLR9 within the dimer; its residues are likewise indicated with asterisks) (Fig. 2c and Extended Data Fig. 6). DNA1668_12mer winds around the N-terminal fragment of TLR9 from the ascending lateral face to the concave face to interact with a region spanning from LRRNT to LRR10 (interface 1). This binding region is consistent with the results of previous reports18,19. Simultaneously, DNA1668_12mer interacts with the loop regions of LRR20–LRR22 in the C-terminal fragment of TLR9* (interface 2) (Extended Data Fig. 6b). Thus, agonistic DNA acts as ‘molecular glue’ to bridge the two TLR9 molecules. This structural feature strongly suggests that only single-stranded DNA can act as an agonist. Accordingly, double-stranded DNAs containing the CpG motif had greatly reduced affinity (Extended Data Fig. 7 and Extended Data Table 2).

In DNA1668_12mer, the G4–T9 sequence, which corresponds to the consensus hexamer GACGTT, is the part of the DNA mainly recognized by TLR9. The bases of the CpG motif are accommodated in the groove formed by LRRNT, LRR1 and LRR2 in the ascending lateral face of TLR9 (Fig. 3a). The CpG motif and the flanking bases are recognized via interactions with multiple amino acids, as well as via water-mediated hydrogen bonds (Fig. 3a). The C6 moiety in the CpG motif forms direct hydrogen bonds: the cytosine O2 atom with Met106 N and Ser104 Oγ, and the cytosine N3 atom with Ser104 Oγ. In addition, the cytosine N4 atom makes water-mediated (W1 and W2) hydrogen bonds with His76, Pro99 and Phe108, and the cytosine ring itself is wedged between Pro105, Phe108 and the neighbouring CpG guanine (G7) (Fig. 3a). Together with Trp47 and Phe49, C6 forms a three-walled cage that accommodates the neighbouring G7. The G7 N2 atom engages in hydrogen bonds with the Trp96 O atom and the O4 atom of T9 at the +2 position, anchoring the guanine ring to the bottom of the CpG-binding groove. Because the N2 atom is unique to guanine, these interactions define the specificity for guanine in the CpG motif. The thymine ring at the +1 position (T8) stacks with Trp47, and the thymine ring at the +2 position (T9) is inserted into the CpG-binding groove and sandwiched between Trp47 and Trp96. T9 also forms hydrogen bonds to Ser72 and G7. The adenine ring at the –1 position (A5) stacks with Phe108, and in turn G4 (–2 position) stacks onto A5 (Fig. 3a). Because purine bases can form more extensive contacts than pyrimidine bases, this three-layered stacking interaction favours a purine–purine sequence preceding the CpG motif, as revealed by gel-filtration analysis (Extended Data Fig. 8). The backbone phosphates of G4 and A5 are recognized electrostatically by the positively charged side chains of Lys51, Arg74, His76 and His77. The mutation of several residues important for the interaction with DNA (Trp47, Trp96 and Phe108) resulted in a protein with dramatically reduced binding affinity (Extended Data Table 2), in agreement with the results of the structural analysis.

Figure 3: Agonistic-CpG-DNA recognition by TLR9.

a, Left, CpG-DNA binding groove formed by LRRNT, LRR1 and LRR2. Middle, overview of CpG-DNA recognition by TLR9 (EcTLR9_DNA1668_12mer interface 1). Right, magnified view of the CpG recognition. The simulated omit electron densities are shown at a contour level of 3.0σ. b, Magnified view of EcTLR9_DNA1668_12mer in interface 2. c, Schematic summary of CpG-DNA recognition. Dashed lines in red and black depict hydrogen bonds and van der Waals interactions, respectively. d, Magnified view of the protein–protein interface. e, NF-κB activation of mouse TLR9 mutants stimulated by agonistic DNA1668 (1 μM). Residues in parentheses are derived from horse TLR9. ‘Ctrl’ represents the empty vector without mouse TLR9 cDNA. f, The NF-κB activation of mouse TLR9 by DNA1668, DNA1668_GC or DNA1668_met in HEK293T cells. Ctrl, no stimulation. In e and f, data from triplicate experiments are shown as mean ± s.d., and a two-tailed Student’s t-test was used to evaluate the statistical significance (**P < 0.01).

Interface 2 also plays an important role in the dimerization of TLR9. Residues of interface 2 engage in four hydrogen bonds and several van der Waals interactions (Fig. 3b, c). In contrast to interface 1, which recognizes the base moieties of the CpG motif, LRR20*–LRR22* from the C-terminal fragment primarily recognize the backbone of CpG motif-containing DNA. The insertion loops in the ascending lateral surface of LRR2, LRR5, LRR8, LRR11, LRR18* and LRR20*, which are characteristic of the TLR7–9 family, are involved in protein–protein interactions with complementary shapes (Fig. 3b, d).

We mutated the residues important for the recognition and examined the ability of the mutants to activate NF-κB (Fig. 3e). Most of the mutants exhibited reduced or completely abolished activation of NF-κB signalling in response to DNA1668. These results clearly demonstrate that interfaces 1 and 2 are both important for the functional integrity of TLR9. We converted the CpG motif into GC, UG, TG and CA and employed isothermal titration calorimetry (ITC) to determine the affinity of TLR9 for the resultant DNAs. The dissociation constant (Kd) values were 20 nM (CG), 569 nM (GC), 54 nM (UG), 163 nM (TG) and 883 nM (CA) (Extended Data Table 2), demonstrating that the CG sequence is important for binding. The mutations of the CpG motif reduced the affinity for TLR9: several interactions were disrupted by swapping CG for GC, whereas the direct interactions between G7 (N2) and Trp96 (O) and T9 (O4) were lost by the conversion of CG into CA. C6 (N4) forms a hydrogen bond with the water molecule (W1) that makes hydrogen bonds with Pro99 O and Phe108 O. The substitution of UG for CG would be unfavourable because W1 is surrounded by three hydrogen bond acceptors. The conversion of CG into TG would result in a weak affinity for the same reason as UG, and the methyl group might further weaken the affinity, possibly by disrupting water molecule clusters; the importance of water-mediated interactions for the recognition of methylated DNA by MeCP2 has been identified previously20. We also employed ITC to assess the pH-dependence of CpG-DNA binding to TLR9. Binding affinity decreased as pH increased, with Kd values ranging from 20 nM at pH 6.0 to 2500 nM at pH 8.0, revealing that the interaction was stronger under acidic than basic conditions (Extended Data Table 2). Consistent with this, the structural study revealed that His residues are concentrated around the DNA-binding region, resulting in a higher-affinity interaction under acidic conditions (Extended Data Fig. 6c). In addition, we examined the binding affinity of methylated CG (DNA1668_12mer_met) for TLR9. The Kd value for this interaction was 50 nM (Extended Data Table 2), demonstrating that methylated CG binds more weakly than unmethylated CG (20 nM). Accordingly, sedimentation velocity analytical ultracentrifugation (SV–AUC) analysis demonstrated that DNA with a methylated CpG motif had a reduced ability to induce TLR9 dimerization (Fig. 1d). Consistent with this, DNA1668_met exhibited reduced activation (Fig. 3f).
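To put the reported Kd values on a common scale, they can be converted into fold changes relative to the CG sequence and into differences in binding free energy, ΔΔG = RT ln(Kd,variant / Kd,CG). The sketch below is an illustrative calculation, not part of the study. It assumes that all values were measured under comparable conditions at 298 K (the ITC temperature given in Methods); the dictionary key `mCG` is a label introduced here for DNA1668_12mer_met, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.0      # assumed: ITC temperature stated in Methods

# Kd values (nM) quoted in the text; "mCG" is our label for methylated CG.
KD_NM = {"CG": 20.0, "GC": 569.0, "UG": 54.0, "TG": 163.0, "CA": 883.0, "mCG": 50.0}

def fold_weaker(variant, reference="CG"):
    """Ratio Kd(variant) / Kd(reference); values above 1 mean weaker binding."""
    return KD_NM[variant] / KD_NM[reference]

def ddg_kcal(variant, reference="CG"):
    """Binding free-energy penalty in kcal/mol relative to the reference."""
    return R_KCAL * T_KELVIN * math.log(fold_weaker(variant, reference))

for name in KD_NM:
    print(f"{name}: {fold_weaker(name):.1f}-fold, ddG = {ddg_kcal(name):+.2f} kcal/mol")
```

On this scale the GC and CA substitutions cost roughly 2 kcal/mol, whereas the UG substitution and methylation each cost well under 1 kcal/mol.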

We also determined the crystal structures of TLR9 bound to iDNAs (iDNA4084 and iDNA_super) (Fig. 2b and Extended Data Table 1). The binding site for iDNA partially overlaps with the binding site for agonistic DNA (Extended Data Fig. 6a, b), and this overlap between binding sites accounts for the antagonistic effect of iDNA. Of particular interest, both iDNA4084 and iDNA_super in complex with TLR9, interacting with LRR2–LRR11, form stem-loop structures that fit snugly into the interior of the ring structure of TLR9 (Fig. 4). The stem-loop structure of the iDNAs is formed by intramolecular base pairing between C1-C2-T3 and T7-G8-G9 (two GC pairs and one TT mismatch pair) in the iDNA4084 complex, and between C1-C2-T3-C4 and G13-A14-G15-G16 (three GC pairs and one AT pair) in the iDNA_super complex (Fig. 4 and Extended Data Fig. 4c–e). The length of the loop seems to be immaterial: iDNA_super, which has a long loop, binds similarly to iDNA4084, which has a short loop. The recognition of iDNAs is primarily mediated via the DNA backbone. Also, the base at the cohesive-end position (G10 in iDNA4084) was directly recognized (Fig. 4 and Extended Data Fig. 5f).
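The stem pairings listed above can be checked by aligning the two arms in antiparallel orientation. This is a minimal sketch for illustration: the helper `stem_pairs` is hypothetical, and only the arm sequences stated in the text are used (the full iDNA sequences appear in Fig. 1a and are not reproduced here).

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

def stem_pairs(five_prime_arm, three_prime_arm):
    """Pair two arms (both written 5'->3') in antiparallel orientation.

    Returns (base, partner, is_watson_crick) for each position, pairing the
    first base of the 5' arm with the last base of the 3' arm.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("arms must have equal length")
    pairs = zip(five_prime_arm, reversed(three_prime_arm))
    return [(a, b, (a, b) in WATSON_CRICK) for a, b in pairs]

# Arm sequences as given in the text.
idna4084 = stem_pairs("CCT", "TGG")      # C1-C2-T3 with T7-G8-G9
idna_super = stem_pairs("CCTC", "GAGG")  # C1-C2-T3-C4 with G13-A14-G15-G16

for name, stem in (("iDNA4084", idna4084), ("iDNA_super", idna_super)):
    print(name, stem)
```

The result reproduces the description in the text: two GC pairs plus a TT mismatch for iDNA4084, and three GC pairs plus one AT pair for iDNA_super.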

Figure 4: Recognition of iDNA by TLR9.

a, Magnified view of iDNA4084 bound to EcTLR9. b, Schematic summary of iDNA recognition. Dashed lines in red and black depict hydrogen bonds and van der Waals interactions, respectively.

The structures of the unliganded, agonistic CpG-DNA-bound and iDNA-bound forms of TLR9 described in this study reveal the structural basis of CpG-DNA recognition and signalling by TLR9, as well as the inhibitory mechanism of iDNA. These results will contribute to the development of therapeutic agents that target TLR9.

Methods

Protein expression, purification and crystallization

DNA encoding the extracellular domain of Toll-like receptor 9 (TLR9) from various species (human (Q9NR96, residues 25–818, 100%), monkey (F6UZJ0, residues 26–817, 95.7%), horse (Q2EEY0, residues 26–817, 83.6%), bovine (Q866B2, residues 25–815, 77.9%), pig (S5R6V0, residues 25–816, 80.4%), rat (M0RAA8, residues 26–818, 71.5%), mouse (Q9EQU3, residues 26–818, 73.4%) and zebrafish (B3DJW3, residues 23–844, 38.1%)), with a C-terminal thrombin cleavage site followed by a protein A tag, was inserted into the expression vector pMT/BiP/V5-His of the Drosophila Expression System. For each species, the values in parentheses correspond to the UniProt accession number, the region and the sequence identity (versus human) of the extracellular domain. For the preparation of crystallization samples of MmTLR9, a total of seven mutations (N200Q, N242Q, N309Q, N495Q, N568Q, N695Q and N752Q) were introduced to produce a protein with fewer glycosylation sites. Drosophila S2 cells were co-transfected with the TLR9 and pCoHygro vectors. Stably transfected cells were selected in Sf-900 II SFM medium containing 300 μg ml−1 hygromycin. Z-loop processing of TLR9 is important for its function6,7,8,9,10. Therefore, to mimic the Z-loop processing that occurs in the cell, the purification protocol included V8-protease treatment, which yielded the Z-loop-cleaved product. After proteolytic processing, the N- and C-terminal halves of TLR9 remained associated in subsequent purification steps. Protein secreted into the supernatant was captured by IgG Sepharose 6 Fast Flow (GE Healthcare) equilibrated with phosphate buffered saline (PBS), washed with ten column volumes of PBS, and eluted with 0.1 M glycine-HCl pH 3.5 and 0.15 M NaCl. The eluate was immediately neutralized by adding 1/20 volume of 1 M Tris-HCl pH 8.0, concentrated to 5–10 mg ml–1 and further purified by Superdex 200 gel-filtration chromatography equilibrated with 10 mM Tris-HCl pH 7.5 and 0.15 M NaCl. For BtTLR9 and EcTLR9, 1/10 volume of 1 M Na-acetate pH 5.0 was added to the concentrated TLR9, which was then incubated overnight at room temperature with 1–2 U of Endo Hf (New England Biolabs) per mg of protein for saccharide trimming. Monomeric fractions from Superdex 200 were collected and incubated with 1/20–1/50 (w/w) V8 protease (Wako) for 12 to 48 h to cleave the Z-loop and the protein A tag. Z-loop-processed TLR9 was further purified by HiTrap SP (GE Healthcare) cation-exchange chromatography. The column was equilibrated with 10 mM MES pH 6.0 and 0.1 M NaCl, and the bound protein was eluted with a linear gradient from 0.1 to 0.7 M NaCl.

For crystallization, purified TLR9 was concentrated to 4.0–6.8 mg ml–1 in 10 mM Tris (pH 7.5), 150 mM NaCl. To prepare the DNA complexes of TLR9, the protein solutions were combined with an approximately twofold excess of DNA (DNA1668_12mer, iDNA4084 or iDNA_super). Crystallization experiments were performed with the sitting-drop vapour-diffusion method at 293 K. The crystallization droplets were made by mixing equal volumes of protein solution and reservoir solution, typically around 0.5–2.0 μl, except in the case of the EcTLR9–DNA1668_12mer complex, where the protein solution and reservoir solution were mixed at a 3:1 ratio. In line with the observed pH dependency of TLR9, crystals of the agonistic forms of TLR9 complexed with DNA1668_12mer were obtained only in acidic conditions (pH 5.8 and pH 5.5 for horse and bovine TLR9, respectively), whereas crystals of the unliganded and iDNA-bound forms of TLR9 were obtained over a wide range of pH (4.5–8.0). The crystallization conditions are summarized in Extended Data Table 1.

Data collection and structure determination

Diffraction data sets were collected on beamlines PF-AR NE3A (Ibaraki, Japan) and SPring-8 BL41XU (Hyogo, Japan) under cryogenic conditions at 100 K. Crystals were soaked in the cryoprotectant solutions summarized in Extended Data Table 1 and then flash-cooled under a cold gas stream. The diffraction data sets were processed using the HKL2000 package21 or imosflm22. The initial phases for the unliganded form of MmTLR9 were determined by the molecular replacement method using the program Molrep23 with the coordinates of the human TLR8 structure (PDB ID: 3W3J)16. The model was further refined with stepwise cycles of manual model building using the COOT program24 and restrained refinement using REFMAC25 or phenix.refine26 until the R factors converged. The EcTLR9 and BtTLR9 structures were determined by the molecular replacement method using the program Molrep23 with the refined MmTLR9 structure as the search model. Ligand molecules, N-glycans and water molecules were modelled into the electron density maps in the later cycles of the refinement. The quality of the final structures was evaluated with MolProbity27. In the structures of EcTLR9 (unliganded), EcTLR9–DNA1668_12mer, EcTLR9–iDNA4084, BtTLR9–DNA1668_12mer, MmTLR9 (unliganded), MmTLR9–iDNA4084 (form1), MmTLR9–iDNA4084 (form2) and MmTLR9–iDNA_super, 100%, 100%, 98%, 99%, 99%, 99%, 100% and 99% of the residues were in Ramachandran favoured or allowed regions, respectively. The statistics of the data collection and refinement are summarized in Extended Data Table 1. The figures representing structures were prepared with PyMOL28.

Isothermal titration calorimetry

ITC experiments were performed at 298 K in a buffer composed of 50 mM phosphate (pH 6.0–8.0) and 250 mM NaCl using a MicroCal iTC200 (GE Healthcare). DNAs at a concentration of 50 μM were titrated into 5 μM wild-type or mutant horse TLR9. The titration sequence included a single 0.4 μl injection followed by 19 injections of 2 μl each, with a spacing of 120 s between the injections. OriginLab software (GE Healthcare) was used to analyse the raw ITC data. Thermodynamic parameters were extracted by curve fitting with a single-site binding model.
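A single-site model treats binding as a 1:1 equilibrium, for which the concentration of complex follows from a quadratic in the total protein and ligand concentrations. The sketch below illustrates only that equilibrium relation together with the stated injection schedule; it is not the fitting procedure used in the study. It ignores dilution of the cell contents and does not model heats or stoichiometry, and the function name `complex_conc` is hypothetical.

```python
import math

def complex_conc(p_total, l_total, kd):
    """Concentration of 1:1 complex for a single-site model.

    All arguments share one unit (here µM). Solves
    (P_t - PL)(L_t - PL) = Kd * PL for the physically meaningful root.
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0

# Injection schedule from the text: one 0.4 µl injection, then 19 of 2 µl.
injections_ul = [0.4] + [2.0] * 19
total_injected_ul = sum(injections_ul)
print(f"total injected volume: {total_injected_ul:.1f} µl")

# Fraction of 5 µM protein bound at equimolar DNA for the two Kd extremes
# reported in the text (20 nM at pH 6.0 and 2500 nM at pH 8.0).
for kd_um in (0.020, 2.5):
    fraction = complex_conc(5.0, 5.0, kd_um) / 5.0
    print(f"Kd = {kd_um} µM -> fraction bound = {fraction:.2f}")
```

Under these simplifying assumptions, equimolar DNA saturates most of the protein when Kd is 20 nM but only half of it when Kd is 2.5 µM.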

Analytical ultracentrifugation sedimentation velocity

SV–AUC analyses were performed in a ProteomeLab XL-I analytical ultracentrifuge (Beckman Coulter) equipped with a 4-hole An60Ti rotor at 20 °C using Beckman Coulter 12-mm double-sector charcoal-filled epon centerpieces and sapphire windows. Scanning at 42,000 r.p.m. was performed as quickly as possible between 6.0 and 7.2 cm from the axis of rotation with a radial increment of 30 μm. To analyse the dimerization induced by DNA binding, horse TLR9 Z-loop-unprocessed and Z-loop-processed samples were run at a loading concentration of 20 µM with or without equimolar concentrations of DNAs (Fig. 1d). To analyse the concentration dependence of the dimerization, the AUC measurements were performed at protein concentrations ranging from 1.5 µM to 30 µM in the presence of equimolar DNA1668_12mer (Extended Data Fig. 3). All sedimentation velocity experiments were conducted in a buffer containing 10 mM MES and 250 mM NaCl at pH 5.5.

The sedimentation coefficient distributions were obtained using the c(s) method of SEDFIT29. Sedimentation coefficients ranging from 0.1 to 50 S on a logarithmically spaced grid with a resolution of 500 were used. The frictional ratio, meniscus, and radial and time-invariant noise were floated during the fitting procedure, and a regularization level of 0.68 was used. The partial specific volume, the buffer density and the viscosity were calculated using the program SEDNTERP 1.09 and were 0.7407 cm3 g–1, 1.00852 g ml–1 and 1.0256 cP, respectively. The percentages of monomer and dimer were calculated by dividing the corresponding peak area by the sum of the areas under the two peaks.
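The grid and the peak-area calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not SEDFIT itself: "resolution of 500" is taken to mean 500 grid points, the function names are hypothetical, and the peak areas in the usage line are made-up numbers rather than data from the study.

```python
import math

def log_spaced_grid(s_min, s_max, n):
    """n sedimentation coefficients from s_min to s_max, evenly spaced in log10."""
    lo, hi = math.log10(s_min), math.log10(s_max)
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]

def normalize_to_main_peak(c_values):
    """Scale a c(s) distribution so that its highest point equals 1."""
    peak = max(c_values)
    return [c / peak for c in c_values]

def species_percentages(monomer_area, dimer_area):
    """Percentage of monomer and dimer from the areas under the two peaks."""
    total = monomer_area + dimer_area
    return 100.0 * monomer_area / total, 100.0 * dimer_area / total

grid = log_spaced_grid(0.1, 50.0, 500)   # 0.1 to 50 S, assumed 500 points
print(len(grid), grid[0], grid[-1])
print(species_percentages(3.0, 1.0))     # hypothetical peak areas
```

A logarithmic grid places more points at low sedimentation coefficients, where monomer and dimer peaks lie, than a linear grid of the same size would.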

NF-κB-dependent luciferase reporter assay

To check the mouse TLR9 response, HEK293T cells were seeded in collagen-coated 6-well plates at a density of 5 × 10^5 cells per well, and transiently transfected with wild-type or mutant mouse TLR9 cDNAs in pMX-puro-IRES-rat CD2 (1 μg), together with wild-type mouse Unc93B1 cDNA in pMX-puro (0.5 μg) and a pELAM1-luc reporter plasmid (5 ng), using PEI (Polyethylenimine “Max”, MW 40,000; Polysciences, Inc.) at 36 h before stimulation. To check the horse and bovine TLR9 responses, HEK293T cells were plated on collagen-coated 10-cm dishes at a density of 6 × 10^6 cells per dish, and transiently transfected with wild-type horse or bovine TLR9 cDNAs in pMX-puro-IRES-rat CD2 (3 μg), together with wild-type human Unc93B1 cDNA in pMX-puro (6 μg) and a pELAM1-luc reporter plasmid (30 ng), using PEI at 30 and 24 h before stimulation. The NF-κB luciferase reporter plasmid, pELAM1-luc, was provided by T. Muta (University of Tohoku, Japan)30. Twenty-four hours after the first transfection, cells were reseeded in collagen-coated flat 96-well plates (Corning) at a density of 1 × 10^5 cells per well. Then, after pre-culture for 4–6 h, attached cells were stimulated with various DNAs or 100 ng ml–1 recombinant human TNF-α (Wako Pure Chemical Industries) for 6 h. Stimulated cells were lysed with 40 μl of Cell Culture Lysis Reagent (Promega) and 6 μl of lysate was subjected to a luciferase assay using the Luciferase Assay System (Promega). The relative light units (RLU) of chemiluminescence were measured with a GloMax 96 Microplate Luminometer (Promega). Because the retroviral vector pMX-puro-IRES-rat CD2 allows indirect validation of the expression of the inserted cDNA through rat CD2 expression, the transfection efficiency of wild-type and mutant mouse TLR9 cDNAs in HEK293T cells was evaluated by examining the cell-surface rat CD2 expression level using a FACSCalibur flow cytometer (BD Biosciences). The activity of pELAM1-luc in each set of transfected HEK293T cells was also verified by stimulation with recombinant human TNF-α.

Oligonucleotides

Oligonucleotides for the gel-filtration, ITC, AUC and crystallographic analyses were single-stranded DNAs with normal phosphodiester linkage unless otherwise noted. Oligonucleotides for the luciferase reporter assay were single-stranded DNAs with phosphorothioate linkage. Phosphorothioate DNAs (DNA1668, DNA1668_GC, DNA1668_12mer and DNA1668_12mer_GC) were purchased from FASMAC (Kanagawa, Japan). Other phosphorothioate DNAs (iDNA4084, iDNA_super and DNA1668_met) and all phosphodiester DNAs were purchased from Eurofins MWG Operon (Ebersberg, Germany).

Statistical analysis

Data from triplicate samples in Fig. 3e, f and Extended Data Fig. 2 are shown as mean ± s.d. and were subjected to statistical analysis. Statistical significance was determined by two-tailed Student’s t-tests. A P value of less than 0.01 was considered to be significant. No statistical method was used to predetermine sample size.
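For two groups of triplicates, a two-tailed Student's t-test at the 0.01 level reduces to comparing the pooled-variance t statistic (4 degrees of freedom) with the critical value 4.604. The sketch below assumes the equal-variance form of the test, which the text does not state explicitly; the function name `student_t` is hypothetical and the luminescence values are invented for illustration, not taken from the study.

```python
import math
from statistics import mean, stdev

# Two-tailed critical t value for df = 4 at alpha = 0.01.
T_CRIT_DF4_P01 = 4.604

def student_t(a, b):
    """Equal-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df

# Hypothetical relative light units (illustration only).
wild_type = [100.0, 110.0, 90.0]
mutant = [20.0, 25.0, 15.0]

t, df = student_t(wild_type, mutant)
significant = df == 4 and abs(t) > T_CRIT_DF4_P01
print(f"mean ± s.d.: {mean(wild_type):.1f} ± {stdev(wild_type):.1f} "
      f"vs {mean(mutant):.1f} ± {stdev(mutant):.1f}")
print(f"t = {t:.2f}, df = {df}, P < 0.01: {significant}")
```

The critical-value comparison only applies to triplicate-versus-triplicate comparisons; other sample sizes would need the critical value for their own degrees of freedom or an exact P value from a statistics library.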