1 Introduction

Leprosy, like other infectious diseases, was widely accepted as hereditary in the pre-microbiological era. The revolutionary finding of a microorganism, originally named Bacillus leprae and now known as Mycobacterium leprae (M. leprae), in lesions of leprosy-affected individuals led Gerhard H. Armauer Hansen to fiercely refute the belief that leprosy was inherited. Today, it is well established that exposure to M. leprae is necessary but not sufficient to explain leprosy occurrence, and several genes and genomic regions have been implicated in the complex genetic mechanism controlling host susceptibility to leprosy at different stages of the disease (Fig. 3.1).

Fig. 3.1 Schematic representation of the clinical classification spectrum of leprosy. TT tuberculoid-tuberculoid, BT borderline-tuberculoid, BB borderline-borderline, BL borderline-lepromatous, LL lepromatous-lepromatous, I indeterminate, PB paucibacillary, MB multibacillary, MDT multidrug therapy, T1R type-1 reaction, T2R type-2 reaction

2 Genetics of M. leprae and the Origins of Leprosy

The complete sequence of the M. leprae genome was first published in the early 2000s. Compared to M. tuberculosis, the M. leprae genome shows strong reductive evolution as the bacterium specialized as an obligate intracellular parasite in humans [1]. Since then, whole genome analysis has provided insights into several aspects of leprosy, including the history of the disease. For example, in 2018, genome sequences of ten M. leprae DNA samples obtained from the remains of medieval Europeans produced a snapshot of the last 1500 years of leprosy history on the European continent. M. leprae from four distinct phylogenetic branches were found among the ten samples, some matching modern strains from different locations around the world. This study highlights the diversity of M. leprae strains in medieval Europe, and the authors proposed two new models for leprosy dissemination: (1) strains from different parts of the world were introduced into Europe, possibly before the medieval era, or (2) leprosy originated in Western Eurasia or in Europe, and not in western Africa, as previously proposed [2].

Regarding leprosy pathogenesis, bacterial genomics also identified a novel mycobacterial species named M. lepromatosis [3], a rare mycobacterium that apparently causes a distinct form of leprosy and is mainly found in Central America [4].

Comparative analysis of M. leprae isolates from different parts of the world confirmed the conserved nature of its genome. The low variability of the M. leprae genome suggests that the wide variety of responses observed upon exposure to the pathogen is largely controlled by host genetic factors. This hypothesis has been reinforced by observations such as familial aggregation of cases, a higher concordance rate of leprosy phenotypes in monozygotic as compared to dizygotic twin pairs [5], and the presence of a strong major gene effect controlling leprosy, as demonstrated by complex segregation analysis [6]. Although powerful for detecting the existence of a genetic component controlling a specific trait, these observational studies do not provide any information about the identity of the genes or the nature of the genetic variants underlying the identified effect; for that, molecular studies are necessary.
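The twin argument above rests on comparing concordance rates between monozygotic and dizygotic pairs. The following minimal sketch shows how a pairwise concordance rate can be computed. The function name and the pair counts are hypothetical and invented purely for illustration; they are not taken from the twin study cited as [5].

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of twin pairs in which both twins share the phenotype.

    Only pairs with at least one affected twin are counted:
    concordant_pairs = both twins affected,
    discordant_pairs = exactly one twin affected.
    """
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("at least one twin pair is required")
    return concordant_pairs / total


# Hypothetical counts, for illustration only (not real data).
mz_rate = pairwise_concordance(concordant_pairs=12, discordant_pairs=8)
dz_rate = pairwise_concordance(concordant_pairs=4, discordant_pairs=16)

# A higher rate in monozygotic than in dizygotic pairs is the pattern
# that is read as evidence for a host genetic component.
print(f"MZ concordance: {mz_rate:.2f}, DZ concordance: {dz_rate:.2f}")
```

A comparison of this kind indicates that a genetic component exists but, as noted above, says nothing about which genes are involved.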

3 Leprosy Genes and Genomic Loci

Genetic epidemiology approaches have successfully identified genes and genetic variants that influence susceptibility to infectious diseases, including leprosy [7]. The molecular nature of the genetic component controlling host susceptibility to leprosy has been intensively investigated by candidate gene studies, genome-wide linkage or association searches, and, more recently, genome, exome, and targeted DNA sequencing approaches. A brief description of selected genetic findings in leprosy is presented next.

3.1 Major Histocompatibility Complex (MHC) Genes

In leprosy, clinical manifestation of disease depends on the Th1/Th2 balance, which is partially controlled by antigen presentation and cell-cell interactions via MHC genes. The MHC locus, located on chromosome 6p21.32-p22.2, harbors the three classes of the human leukocyte antigens (HLA), which include genes that are key mediators of host immune responses. In fact, the first genetic risk factors described for leprosy susceptibility were variants within the MHC.

Perhaps the best-known genetic association with leprosy involves alleles of the HLA-DRB1 gene (rev. in [8]). Variants of HLA-DRB1 were associated with resistance or susceptibility to leprosy in samples from Brazil, Vietnam [9], and China [10], and markers near the HLA-DRB1 locus were the most significant association signal identified in the first genome-wide association (GWA) study in leprosy [11]. A case-control analysis in a New Delhi sample observed consistent association between leprosy and variations of HLA-DRB1 and HLA-DQA1, another well-described HLA class II leprosy susceptibility locus [12].

HLA class I (A, B, and C) has also been intensively studied in leprosy, and HLA-A*2, A*11, B*40, and Cw*7 are some examples of alleles detected more often among leprosy cases as compared to non-affected controls [13]. Class I HLA molecules interact with killer cell immunoglobulin-like receptors (KIR); in a south Brazilian cohort, KIR alleles were associated with tuberculoid leprosy [14]. Of note, the HLA-B*13:01 allele was shown to be associated with dapsone hypersensitivity syndrome [15], an observation that highlights the importance of HLA genes in the control of drug toxicity during treatment and opens the road for pharmacogenomics in leprosy.
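Statements that an allele is "detected more often among cases than controls" are usually quantified with an odds ratio from a 2x2 table. The sketch below illustrates one common way to do this, using the standard log-odds (Woolf) approximation for a 95% confidence interval. The function name and all counts are hypothetical and are not drawn from any of the studies cited here.

```python
import math


def allele_odds_ratio(case_carriers, case_noncarriers,
                      control_carriers, control_noncarriers):
    """Odds ratio and approximate 95% CI for carrying an allele.

    Uses the log-odds (Woolf) standard error, which requires
    every cell of the 2x2 table to be greater than zero.
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if min(cells) <= 0:
        raise ValueError("all four counts must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not real data).
or_value, ci_low, ci_high = allele_odds_ratio(40, 60, 20, 80)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An odds ratio above 1 with a confidence interval that excludes 1 is the usual signature of a susceptibility allele; a value below 1 suggests protection.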

Investigation of a cohort of 22 Vietnamese multiplex leprosy families resulted in evidence of linkage between leprosy type and two microsatellite markers of the TNF-α gene (TNFA) located in the HLA class III region [16]. This finding is in agreement with evidence of association between promoter polymorphisms of TNFA and clinical manifestation of leprosy (rev. in [8]). A study demonstrated that a functional single nucleotide polymorphism (SNP) located at base pair +80 of the LTA gene, located immediately upstream of TNFA, is associated with early-onset leprosy [17]. Finally, variants of additional HLA-linked genes, such as TAP, MICA [18], and MICB, have also been described in association with leprosy phenotypes in different populations, the latter two recently replicated in the New Delhi population sample [12].

Nowadays, the challenge is to dissect the exact nature of the HLA association with leprosy. The MHC/HLA locus is a highly polymorphic, gene-rich region presenting long-range linkage disequilibrium (i.e., cross-association between alleles). The complexity of the MHC/HLA locus makes pinpointing the actual causative variant very difficult; yet, a few studies have tackled this challenge. A fine mapping of the HLA complex in the Vietnamese and Indian populations narrowed the association to two intergenic SNPs close to HLA-C in the HLA class I region [19]. Two studies in 2020 investigated the HLA complex in depth. In the first, a family-based GWAS identified three independent signals, two in the HLA class I region and one in the HLA class II region close to HLA-DQA1 [20]. The second applied deep sequencing to study 11 HLA class I and II genes at the amino acid level. The authors identified haplotypes of HLA-DRB1, HLA-DQA1, HLA-DRB3, HLA-B, and HLA-C alleles associated with susceptibility or protection against leprosy. Furthermore, the authors were able to narrow down the association to four independent amino acids (i.e., HLA-DRβ1 57D and 13F, HLA-B 63E and HLA-A 19K), a major advance toward the understanding of the complex pattern of association of HLA genes with leprosy [21].
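Because linkage disequilibrium is central to why fine mapping of the HLA region is difficult, a small worked example may help. The sketch below computes the classic pairwise measures D, |D'|, and r² for two biallelic loci from haplotype and allele frequencies. The function name and the input frequencies are hypothetical and chosen only for illustration; the inputs are assumed to be mutually consistent frequencies.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the
          first locus and allele B at the second locus
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: deviation of the observed haplotype frequency from the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b
    # Largest magnitude D could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies, for illustration only.
d, d_prime, r2 = linkage_disequilibrium(p_ab=0.3, p_a=0.4, p_b=0.5)
print(f"D = {d:.3f}, |D'| = {d_prime:.2f}, r2 = {r2:.3f}")
```

When r² between two variants approaches 1, an association signal at one is statistically almost indistinguishable from a signal at the other, which is why dense genotyping, sequencing, and multi-population comparisons are needed to separate candidate causal variants.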

3.2 Non-HLA Genes

To date, numerous non-HLA variants of different genes have been described as leprosy genetic risk factors, with most of the early evidence being produced by hypothesis-driven, candidate gene studies. These types of studies are limited in scope but have been very effective at detecting relevant genetic associations between leprosy phenotypes and genes such as SLC11A1 (an iron transporter across the phagosome membrane), VDR (vitamin D receptor), IL10 (a Th2 cytokine), and TLR1 (a pattern recognition receptor), among others (rev. in [22, 23]).

More recently, hypothesis-free approaches have become established as an alternative to candidate gene studies, extending the reach of the investigation to the entire genome and allowing the discovery of previously unsuspected genes. In 2001, the first genome-wide linkage analysis for leprosy identified a paucibacillary leprosy susceptibility locus at chromosomal region 10p13 [24], but only in 2010 did the first candidate gene emerge from that chromosomal region: a non-synonymous SNP located at MRC1 was associated with leprosy in both Vietnamese and Brazilians [25]. The MRC1 gene was later associated with paucibacillary leprosy in individuals from southwest China [26]. Two years later, fine mapping of the 10p13 region identified the CUBN gene as associated with multibacillary leprosy in Vietnamese [27].

Interestingly, the linkage signal for paucibacillary leprosy at chromosome 10p13 was replicated in a second genome-wide scan that, most importantly, identified a strong linkage peak for leprosy per se on chromosome 6q25-q27 [28]. Subsequent fine mapping of the 6q25-q27 locus led to the first successful positional cloning of genetic variants influencing the risk of an infectious disease: two SNPs located at the shared regulatory region of the PRKN and PACRG genes were found to be independently associated with leprosy per se in two population samples from Vietnam and Brazil [29]. These findings triggered an exciting series of subsequent studies aiming to fully understand the impact of the 6q25-q27 locus in general and the PRKN/PACRG genes in particular upon leprosy risk and the disease pathophysiology. In addition, two studies successfully replicated the PRKN/PACRG associations [30, 31], and more sophisticated analyses have revealed interesting nuances of the exact nature of the association signals observed. For example, a study performed in Vietnamese and Indian populations showed that the linkage disequilibrium structure and the age at disease diagnosis are crucial for the association of PRKN/PACRG with leprosy per se [31]. Finally, an effort to completely dissect the strong linkage signal identified at the 6q25-q27 locus led to the identification of a second association hit with leprosy per se near the SOD2 gene, which encodes a superoxide dismutase, in two independent Brazilian population samples [32].

How parkin, an E3 protein-ubiquitin ligase encoded by PRKN, is involved in the pathophysiology of an infectious disease is a question that has been generating very exciting results. For example, a remarkable study demonstrated that parkin is a critical player controlling susceptibility to Mycobacterium tuberculosis infection in mice, with a particularly important effect upon autophagy; the same authors demonstrated that parkin also modulates susceptibility to other intracellular pathogens, such as L. monocytogenes, in different species, indicating a highly conserved evolutionary role in innate immunity for this protein [33]. Interestingly, leprosy patients who experienced excessive inflammatory responses shared PRKN mutations observed in Parkinson’s disease (PD) cases. This genetic overlap between leprosy and PD highlights the key role of PRKN as a mediator of host inflammatory responses [34].

In addition to the HLA-linked variants previously mentioned [11], a GWAS in the Chinese population reported polymorphisms of six non-MHC genes—TNFSF15, NOD2, RIPK2, LRRK2, CCDC122, and LACC1—significantly associated with leprosy. Since then, several studies have validated/replicated the original findings. The CCDC122 and LACC1 genes, both located at chromosome 13q14.11, were replicated in population samples from India, Mali [35], Vietnam [36], Brazil [37], and China [38]. The NOD2 gene was validated in Nepal [39], Vietnam [36], Brazil [37], and China [38]. The RIPK2 gene was replicated in Indian [40] and Vietnamese individuals.

Several suggestive findings from the original GWAS were later explored either by expanding the initial population sample or by applying hypothesis-driven approaches. As a result, many additional non-HLA genes were identified as significantly associated with leprosy, including IL23R [41], BCL10 [42], CCDC88B [43], MED30 [44], and TYK2 [45], among others (rev. in [46]). While these studies expanded the number of genes and pathways contributing to leprosy susceptibility, one of the most exciting findings has been the overlap of genes associated with both leprosy and inflammatory bowel disease (IBD) [47]; studying the genetic and molecular component shared between these two apparently distinct phenotypes may pave the road to drug repurposing and perhaps the development of alternative therapies for both diseases.

4 Leprosy Reactions

Permanent disabilities caused by leprosy reactions are a major disease burden likely to persist even under the unlikely scenario of leprosy elimination as a public health problem. Since leprosy reactions may occur years after completion of leprosy treatment, identifying predictive risk factors—genetic or otherwise—for leprosy reactions is a major research goal.

Genetic epidemiology studies on leprosy reactions are few compared to those on other leprosy phenotypes. Variants of the TLR2 and TLR1 genes were the first to be associated with leprosy type-1 reaction (T1R) [48, 49, 50]. Variants of the NOD2 gene were associated with both T1R and type-2 reaction (T2R) in Nepal [39]; however, these SNPs were not the same as those associated with leprosy per se in the leprosy GWAS [11]. In Brazilians, functional IL6 promoter variants that regulate IL6 plasma levels were associated with leprosy T2R [51]. A subsequent study using survival analysis showed that the same IL6 variants were associated with the time of leprosy reaction onset [52].

The observation that several studies failed to replicate the association of the TNFSF15/TNFSF8 and LRRK2 genes with leprosy per se led to investigations of these genes as candidates for T1R. Variants near the TNFSF15/TNFSF8 genes were associated with risk for T1R [53, 54] in both Vietnamese and Brazilian population samples. In Vietnamese, two LRRK2 amino acid changes (R1628P and M2397T) and a set of variants regulating gene expression were also preferentially associated with T1R [34, 55]. In 2019, using a targeted resequencing approach, researchers identified additional rare LRRK2 amino acid changes associated with T1R [34]. Remarkably, in the same study, the authors reported that T1R leprosy cases carried rare damaging PRKN mutations, while T1R-free leprosy cases did not. This observation places parkin as a central mediator of multiple leprosy phenotypes, as noncoding variants near PRKN are established risk factors for leprosy per se. A GWAS comparing T1R-affected versus T1R-free leprosy cases identified regulatory variants of a long noncoding RNA (lncRNA), ENSG00000235140, associated with T1R in Vietnamese and Brazilians. Apart from this novel lncRNA, all other genes reported for T1R had also previously been associated with leprosy per se.

5 New Insights

Given the above, it is difficult to overstate the contribution of genetics to the understanding of the molecular basis of leprosy susceptibility. However, it is also true that most of the identified associations make a small contribution to leprosy risk, thus explaining only part of the large heritability estimated for the disease by observational studies. This may be partially due to the fact that classic linkage and association studies (candidate gene-based and GWAS) rely on the use of informative, thus polymorphic, markers with a minor allele frequency (MAF) higher than 1% [56], which leaves out an entire fraction of the human genetic variation represented by rare variants (MAF < 1%). With the development of novel sequencing techniques, it is now possible to investigate rare or structural variations at a relatively low cost. Thus, analysis of complete genomes/exomes or targeted protein coding regions is likely to find additional genetic factors with an impact on leprosy risk. Using this strategy, a recent study involving whole-exome and targeted sequencing identified a rare missense variant in the HIF1A gene influencing host susceptibility to leprosy in Han Chinese [57]. Furthermore, susceptibility to leprosy is very likely to depend on other sources of variation such as differential methylation of Cs and Gs, histone modification, and DNA translocations, a field of research yet to be systematically explored.
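The 1% frequency threshold discussed above can be made concrete with a short sketch. It computes the minor allele frequency of a biallelic variant from allele counts and labels the variant as common or rare. The function names and the counts are hypothetical, and treating a variant at exactly 1% as common is an arbitrary choice made here for illustration.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site.

    total_alleles is twice the number of genotyped diploid
    individuals.
    """
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("invalid allele counts")
    freq = alt_allele_count / total_alleles
    return min(freq, 1 - freq)


def classify_variant(maf, threshold=0.01):
    """Label a variant as 'common' or 'rare' using a MAF cutoff."""
    return "common" if maf >= threshold else "rare"


# Hypothetical counts in 1000 diploid individuals (2000 alleles).
for alt_count in (30, 5, 1990):
    maf = minor_allele_frequency(alt_count, 2000)
    print(alt_count, round(maf, 4), classify_variant(maf))
```

Variants falling in the rare class are the ones that marker-based linkage and association designs tend to miss and that sequencing-based studies are designed to capture.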

Finally, new, creative, or better-defined phenotypes are beginning to be explored with exciting results. For example, it is known that continuously exposed patients may suffer from leprosy recurrence, a poorly explored disease phenotype. Recently, a pilot study revealed an enrichment of homozygous genotypes for the risk alleles of genes classically associated with leprosy among two out of three cases of leprosy recurrence when compared to three nonrecurring leprosy patients. The study, although limited to a description of a series of cases, suggests that some patients carry a genetic profile of particularly high innate leprosy susceptibility that may predispose to disease recurrence [58].

6 Perspectives

Genetics and genomics of complex traits in general and of infectious diseases in particular are a vibrant and productive field of medical research. The discovery of functional variants initially identified through genetic approaches and later confirmed in functional studies may lead to better protocols for diagnosis, treatment, and prevention of disease. One possibility is the development of laboratory tests using panels of reliable disease markers coupled with bioinformatics and artificial intelligence tools aiming at producing predictive indicators of prognosis or response to treatment. The description of variants and their impact on protein function can be an initial step toward identifying new therapeutic targets, eventually leading to the development of much-needed new and more efficient leprosy therapeutic protocols, with fewer side effects and better patient compliance. Moreover, the characterization of leprosy genetic susceptibility markers can lead to important advances in the field of other infectious, inflammatory, or chronic degenerative diseases such as tuberculosis and Parkinson’s and Crohn’s diseases [36, 46, 59, 60, 61].

In summary, our understanding of the genetic mechanisms controlling the classic leprosy phenotypes, such as disease per se and clinical subtypes, is fairly advanced, particularly as compared to other infectious diseases. However, secondary but interesting phenotypes, such as disease recurrence, age of onset, and even leprosy reactions, still need in-depth investigations as they represent the latest frontiers in leprosy genetic research.

7 Comments on Human Genetics of Buruli Ulcer

Buruli ulcer (BU), caused by Mycobacterium ulcerans, is the third most common mycobacteriosis in the world after tuberculosis and leprosy [62]. BU presents a wide spectrum of clinical manifestations ranging from single, small lesions to severe ulcers, osteomyelitis, osteitis, and joint involvement (see Chaps. 42 and 43).

As with T1R in leprosy, BU patients may develop an abrupt cell-mediated inflammatory reaction, known as a paradoxical reaction [63] (see also Chap. 43).

Host genetic susceptibility to BU is a relatively unexplored field; however, exciting results have been produced through different approaches following the leprosy model. Classic candidate gene studies, usually targeting genes and loci previously associated with tuberculosis and leprosy, have revealed association between BU and the genes SLC11A1 [64], PRKN, NOD2, ATG16L1 [65], iNOS, and IFNG [66]. Of note, the SLC11A1 gene was associated with both BU per se [64] and the paradoxical reaction [67], while NOD2 has only been associated with the most severe form of the disease [65]. A first BU GWAS led to the description of two loci containing the lncRNAs ENSG00000240095.1 and LINC01622 associated with the disease [68]. Finally, whole-exome sequencing of a pair of sisters belonging to a consanguineous family and displaying a severe form of the disease revealed a microdeletion on chromosome 8p23.1 as the most likely causative genetic variant [69].