Abstract
Polymorphisms in the human leukocyte antigen (HLA) genes strongly influence autoimmune disease risk. HLA risk alleles may influence thymic selection to increase the frequency of T cell receptors (TCRs) reactive to autoantigens (central hypothesis). However, research in human autoimmunity has provided little evidence supporting the central hypothesis. Here we investigated the influence of HLA alleles on TCR composition at the highly diverse complementarity determining region 3 (CDR3), which confers antigen recognition. We observed unexpectedly strong HLA–CDR3 associations. The strongest association was found at HLA-DRB1 amino acid position 13, the position that mediates genetic risk for multiple autoimmune diseases. We identified multiple CDR3 amino acid features enriched by HLA risk alleles. Moreover, the CDR3 features promoted by the HLA risk alleles are more enriched in candidate pathogenic TCRs than control TCRs (for example, citrullinated epitope-specific TCRs in patients with rheumatoid arthritis). Together, these results provide genetic evidence supporting the central hypothesis.
Data availability
All raw TCR sequence and genotype data of the discovery and the replication datasets are available at the Adaptive Biotechnologies immuneACCESS site and EGA under accession nos. EGAD00001002671 and EGAD00001002663. All summary statistics of the cdr3-QTL analysis are available at https://github.com/immunogenomics/cdr3-QTL.
Code availability
All code used in this study is available at https://github.com/immunogenomics/cdr3-QTL and deposited at Zenodo (https://zenodo.org/badge/latestdoi/306212645).
Acknowledgements
This work was supported in part by funding from the National Institutes of Health (grant nos. 1-U01-HG012009 to S.R., AR063759-05 to S.R., U01-HG009379-04 to S.R., U19-AI111224-06 to S.R. and T32GM007753 to K.A.L.). K.I. was supported by The Uehara Memorial Foundation. We thank M. Brenner, A. H. Jonsson and D. Rao for helpful feedback.
Author information
Contributions
K.I. and S.R. conceived and designed the study. K.I. conducted all the analyses with support from K.A.L., Y.L. and S.R. E.A.J. and J.H.B. generated and managed the TCR data from patients with RA. K.I. and S.R. wrote the initial draft of the manuscript. All co-authors contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Corey Watson, Giulio Isacchini and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 TCR structure in the discovery dataset.
a, The amino acid positioning scheme used in this study. Additional amino acids in longer CDR3s align to middle positions. b, Schematic explanation of the structure of CDR3. During T cell development in the thymus, TCRs are generated by randomly recombining component genes (V, D and J genes for the beta chain). In addition, several nucleotides are randomly added or deleted at the junctional regions. c, The distribution of CDR3 amino acid length in the discovery dataset. d, The diversity and mutual information of amino acid composition at CDR3 positions (length = 15 amino acids). Normalized entropy (bar plot) and normalized mutual information (NMI, heatmap) of amino acid usage at each position of CDR3 and V/J gene usage were calculated in each individual, and the averaged values are provided. In the top heatmap, NMI is shown on a linear scale. In the bottom heatmap, NMI is shown on a log scale. CDR3 positions 107–116, which directly contact antigenic peptides, are highlighted in red (b and d).
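The two quantities in panel d can be illustrated with a minimal Python sketch. The caption does not state how entropy or mutual information were normalized, so both choices below are assumptions made for illustration: entropy is divided by the logarithm of the alphabet size (20 amino acids), and mutual information is divided by the mean of the two marginal entropies. The function names are hypothetical and are not taken from the study's code.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in nats) of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalized_entropy(symbols, alphabet_size=20):
    """Entropy scaled to [0, 1] by its maximum, log(alphabet_size).

    Assumption: the caption does not specify the normalization used.
    """
    return entropy(symbols) / math.log(alphabet_size)


def normalized_mutual_information(xs, ys):
    """Mutual information of paired symbols, divided by the mean entropy.

    Assumption: other normalizations (min, max, geometric mean) are
    equally plausible; the caption does not say which one was used.
    """
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # joint entropy of the pairs
    mi = hx + hy - hxy
    denom = (hx + hy) / 2
    return mi / denom if denom > 0 else 0.0
```

Here `xs` and `ys` would hold, for one individual, the amino acid seen at two CDR3 positions (or a position and the V or J gene) across that individual's clonotypes. Averaging the per-individual values would then give the quantities plotted in panel d.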
Extended Data Fig. 2 Statistical models used in this study.
a, Strategy to calculate amino acid frequencies in the main analysis. In this example, the alanine (A) usage ratio at CDR3 position 110 is calculated for each individual. b, Strategy to calculate amino acid frequencies for the linear mixed model (LMM) used to adjust for the effect of V genes. In this example, the alanine (A) usage ratio at CDR3 position 110 is calculated for each individual for each V gene. c, Schematic explanation of LM and MMLM, the two main linear models in this study. Each square indicates the dimensions of the matrix. In LM, the frequency of a single amino acid at a position of CDR3 is the response variable; the count of a single amino acid allele at a site of HLA is the explanatory variable. In MMLM, a vector of frequencies of all amino acids at a given position of CDR3 is the response variable; the counts of all amino acid alleles except one at the HLA site are the explanatory variables. When we have 20 CDR3 phenotypes and five HLA alleles, we need to conduct 100 LM tests to cover all combinations (as shown in d). On the other hand, the MMLM aggregates all 100 combinations into a single association test, maximizing the power to detect associations. cov, covariates. d, On the left, we provide an association plot between the allele count of arginine (R) in HLA-DRB1 site 13 and the frequency of lysine (K) in position 109 of L13-CDR3. The P value from the LM analysis is provided (n = 628 donors; two-sided linear regression test). On the right, we provide a heatmap showing P values from all 100 LM tests. We also provide a heatmap showing the P values from a permuted dataset. Within each boxplot, the horizontal lines reflect the median, the top and bottom of each box reflect the interquartile range (IQR), and the whiskers reflect the maximum and minimum values within each grouping no further than 1.5 × IQR from the hinge.
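The 20 × 5 grid of LM tests described in panel c can be sketched in Python as follows. This is a simplified illustration, not the study's implementation: it fits an ordinary least-squares slope of amino acid frequency on allele count for each pair, and it omits the covariates and the P value calculation. The function names and the input layout (dictionaries of per-donor lists) are hypothetical.

```python
from itertools import product


def ols_slope(x, y):
    """Least-squares slope of y on x (single predictor, with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def lm_grid(allele_counts, cdr3_freqs):
    """Fit one LM per (HLA allele, CDR3 amino acid) pair.

    allele_counts: {allele: [0, 1 or 2 copies per donor]}
    cdr3_freqs:    {amino acid: [usage frequency per donor]}
    Returns {(allele, amino acid): slope}. Covariates are omitted here
    for brevity; the study's LM includes them.
    """
    return {
        (allele, aa): ols_slope(allele_counts[allele], cdr3_freqs[aa])
        for allele, aa in product(allele_counts, cdr3_freqs)
    }
```

With five alleles at one HLA site and 20 amino acids at one CDR3 position, `lm_grid` returns 100 entries, one per cell of the heatmap in panel d. The MMLM replaces this grid with a single multivariate test, which a sketch at this level does not attempt to reproduce.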
Extended Data Fig. 3 Permutation analyses using the MMLM.
a, MANOVA test P values in cdr3-QTL analysis using the MMLM with permuted sample labels (n = 628; the discovery dataset). At each HLA site, P values of all CDR3 phenotypes are plotted. The black dashed line corresponds to the Bonferroni P value threshold (P = 0.05/24,360 total tests). b, QQ plots of MANOVA test P values in cdr3-QTL analysis using the MMLM with the real and the permuted sample labels (n = 628). Both have 24,360 data points. c, The distribution of minimum P values (Pmin) using the MMLM in each round of the 1,000 permutations (MANOVA test). We restricted this analysis to alleles at HLA-DRB1 site 13. In each round of permutation, we tested associations for all CDR3 positions (70 length-position combinations). The fifth percentile of Pmin was 8.6 × 10⁻⁴, close to the Bonferroni P value threshold (0.05/70 total tests = 7.1 × 10⁻⁴), which indicates that our P values are well calibrated. d, The distribution of variance in amino acid composition at position 109 of L13-CDR3 explained by the alleles at HLA-DRB1 site 13 in each round of the 1,000 permutations. The red vertical line denotes the observed variance explained in unpermuted data.
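The calibration check in panel c compares two thresholds: the Bonferroni threshold and the empirical fifth percentile of the per-permutation minimum P values. A minimal Python sketch of both is given below. The function names are hypothetical, and the percentile rule (taking the k-th smallest value, with k equal to alpha times the number of permutations) is an assumption, since the caption does not say how the percentile was computed.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling family-wise error."""
    return alpha / n_tests


def empirical_threshold(min_p_values, alpha=0.05):
    """Lower alpha-quantile of the minimum P values across permutations.

    Assumption: the k-th smallest value is used, where
    k = round(alpha * number of permutations), with a floor of 1.
    """
    ordered = sorted(min_p_values)
    k = max(1, int(round(alpha * len(ordered))))
    return ordered[k - 1]


# Thresholds quoted in the caption.
site13_threshold = bonferroni_threshold(0.05, 70)      # about 7.1e-4
global_threshold = bonferroni_threshold(0.05, 24360)   # about 2.1e-6
```

If the test statistics are well calibrated, `empirical_threshold` applied to the 1,000 permuted Pmin values should land near `site13_threshold`, which is the comparison the caption reports (8.6 × 10⁻⁴ versus 7.1 × 10⁻⁴).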
Extended Data Fig. 4 Variance explained in the MMLM analysis summarized across different lengths of CDR3.
Variance explained in the MMLM analysis (n = 628; the discovery dataset). The results for all HLA genes except HLA-DRB1 are provided. For each HLA site-CDR3 position pair, the largest variance explained across different CDR3 lengths is shown in a heatmap.
Extended Data Fig. 5 MMLM and LM results in the replication dataset.
a, Explained variance in the MMLM analysis in the discovery dataset (n = 628; peripheral blood) compared with that in the replication dataset (n = 169; naïve CD4+ T cells). All pairs of class II HLA sites and CDR3 phenotypes are shown without any filtering (9,735 data points). The results at HLA-DRB1 site 13 and the results with P < 0.05 in the replication dataset are highlighted. b, Explained variance in the MMLM analysis in the replication dataset (n = 169; naïve CD4+ T cells). For each HLA site-CDR3 position pair, the largest variance explained across different CDR3 lengths is shown in a heatmap. The results for HLA-DRB1 are provided. Only associations with P < 0.05 are colored in the heatmap. The results for both alpha and beta chains are provided. The pair with the largest variance is indicated by an asterisk. c, LM analysis using the replication dataset (n = 169; naïve CD4+ T cells). Effect sizes for non-transformed phenotypes from discovery and replication datasets are provided. The error bar indicates ± 2 × s.e. The nominally significant associations in the replication dataset are highlighted in red (P < 0.05). The analysis was restricted to the 388 CDR3 phenotypes (length-position-amino acid combinations) that had at least one significant association in the LM analysis (P < 0.05/1,249,742 total tests) and were testable in the replication dataset. For each CDR3 phenotype, we used the HLA amino acid allele that had the lowest P value for that phenotype in the LM analysis of the discovery dataset. We used P values from a two-sided linear regression test.
Extended Data Fig. 6 Six sites in HLA-DRB1 have independently significant cdr3-QTL effects.
a, Structure of the HLA-DRB1 protein and amino acid sites with independently significant cdr3-QTL effects (Protein Data Bank ID 2IAM). Positions 13, 71 and 74 are within the P4 binding pocket. On the left, we depict only HLA-DR molecules looking into the binding groove. In the middle, we depict the antigen (red) and CDR3 (dark blue) overlaid onto HLA-DR molecules. On the right, we depict HLA-DR, antigen, and CDR3 from a side view. b, Variance explained by six HLA-DRB1 amino acid sites with independently significant cdr3-QTL effects (n = 628; MMLM; the discovery dataset). The order of sites on the x-axis indicates the order of significance. c, The distances from HLA-DRB1 amino acid sites to antigen (Ag) or to CDR3. We analyzed five structures and the shortest distances in each structure were used. One-sided paired t-test P values are provided (n = 5). Within each boxplot, the horizontal lines reflect the median, the top and bottom of each box reflect the interquartile range (IQR), and the whiskers reflect the maximum and minimum values within each grouping no further than 1.5 × IQR from the hinge.
Extended Data Fig. 7 Conditional analysis using four-digit classical alleles.
Conditional analysis using four-digit classical alleles (n = 628; MMLM; the discovery dataset). In the first conditional analysis, to assess whether there were independent effects outside of the HLA-DRB1 locus, we conducted cdr3-QTL analysis using all four-digit classical alleles of HLA-DRB1 as covariates, and the strongest signal was found in the HLA-B region. In the second conditional analysis, we additionally included all four-digit classical alleles of HLA-B as covariates. We sequentially included as covariates all four-digit classical alleles of the gene with the strongest signal until we did not observe any further significant signal (P > 0.05/24,360 total tests). We excluded strongly correlated alleles among covariates (r² > 0.8). We reported MANOVA test P values.
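The sequential conditioning procedure described above follows a simple loop, sketched here in Python. Only the control flow is shown. The association test itself is passed in as a callback, `strongest_signal`, which is a hypothetical placeholder: given the genes already conditioned on, it is assumed to rerun the cdr3-QTL analysis with their four-digit alleles as covariates and return the gene with the lowest remaining P value together with that P value. Pruning of correlated covariate alleles (r² > 0.8) is assumed to happen inside that callback.

```python
def sequential_conditioning(strongest_signal, threshold, initial, max_rounds=20):
    """Add the top gene's alleles as covariates until nothing is significant.

    strongest_signal: hypothetical callback, list of genes -> (gene, p_value)
    threshold:        significance threshold, e.g. 0.05 / 24360
    initial:          genes conditioned on in the first round
    max_rounds:       safety cap on iterations (an added assumption)
    """
    conditioned = list(initial)
    for _ in range(max_rounds):
        gene, p_value = strongest_signal(conditioned)
        if p_value > threshold:
            break  # no further significant signal; stop conditioning
        conditioned.append(gene)
    return conditioned
```

In the analysis described by the caption, `initial` would be `["HLA-DRB1"]` and the first call to the callback would return HLA-B as the strongest remaining signal.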
Extended Data Fig. 8 The pair-wise distances of amino acids in MHC-peptide-TCR complexes.
The distances (in Å) between HLA-DRB1 sites and antigen (left), CDR3 amino acids of beta chains and antigen (middle), and HLA-DRB1 sites and CDR3 amino acids of beta chains (right) are shown in heatmaps.
Extended Data Fig. 9 CDR3 amino acids associated with MHC-wide risk of RA, T1D, and CD.
CDR3 amino acids influenced by HLA risk score. We conducted the LM analysis using HLA risk score; the CDR3 phenotypes were each amino acid at each position of each length of CDR3 (n = 628; the discovery dataset). a-c, The effect sizes of significant associations for each amino acid at a given position are illustrated by sequence logos (P < 0.05/1,354 total tests), separately for different CDR3 lengths (a, RA; b, T1D; and c, CD). We used P values from a two-sided linear regression test.
Extended Data Fig. 10 Amino acid features at each position of CDR3 influenced by HLA risk score.
We conducted the LM analysis using HLA risk score in which the phenotypes were amino acid features at a given position of each length of CDR3 (n = 628; the discovery dataset). a, Effect sizes were plotted separately for different lengths of CDR3. Within each boxplot, the horizontal lines reflect the median, the top and bottom of each box reflect the interquartile range (IQR), and the whiskers reflect the maximum and minimum values within each grouping no further than 1.5 × IQR from the hinge. b, Meta-analyzed effect sizes were plotted (the results for charge and hydrophobicity are shown in Fig. 6c). The error bar indicates ± 2 × s.e.
Supplementary information
Supplementary Information
Supplementary Note and Figs. 1–21.
Supplementary Tables
Supplementary Tables 1–14.
About this article
Cite this article
Ishigaki, K., Lagattuta, K.A., Luo, Y. et al. HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors. Nat Genet 54, 393–402 (2022). https://doi.org/10.1038/s41588-022-01032-z
DOI: https://doi.org/10.1038/s41588-022-01032-z