Introduction

Hirschsprung disease (HSCR), or congenital intestinal aganglionosis, is a rare, complex and life-threatening birth defect of the intestine. It was named after Dr. Harald Hirschsprung, who in 1888 comprehensively described two unrelated infants who died from abdominal distension as a consequence of congenital megacolon, the dilatation and hypertrophy of the colon [1]. It was only recognized decades later that the disease is caused by the absence of the enteric nervous system (ENS) (also referred to as the ‘Second Brain’) in the distal narrowed colon rather than in the proximal dilated segment [2,3,4]. HSCR is by far the most recognized disease model of enteric neurocristopathy. The lack of enteric ganglia in the hindgut of HSCR patients arises from incomplete colonization by ENS progenitors derived from enteric neural crest cells (ENCCs), owing to underlying genetic defects in their migration, proliferation and/or differentiation.

Thanks to Professor Prem Puri’s untiring efforts, a comprehensive account of this complex condition can be found in his authoritative textbook [5]. The clinical presentation of HSCR is highly variable, with subtypes primarily defined by the length of aganglionosis and by comorbidity with other congenital malformations. The majority of HSCR patients are classified as short-segment HSCR (S-HSCR, 80%), in which the aganglionic segment is limited to the rectum and distal sigmoid colon. Cases with a more severe phenotype are classified either as long-segment HSCR (L-HSCR, 15%), when the aganglionosis extends proximal to the sigmoid colon, or as total colonic aganglionosis (TCA, 5%), when the entire colon is affected [6]. There is a sex bias, particularly in S-HSCR, with a male preponderance at a ratio of 4:1. HSCR typically presents sporadically and in isolation (70%), or concomitantly as part of the phenotypic spectrum of several neurodevelopmental syndromes (30%). It is well recognized as a multifactorial genetic disorder whose pattern of inheritance varies widely among these disease subtypes.

Since the first report of familial segregation hinted at the high heritability of the disorder, a number of linkage-based genetic studies have uncovered the genetic causes in a substantial fraction of HSCR patients. Recent advances in genotyping and massively parallel sequencing technologies have further highlighted the remarkable genetic complexity underlying the disease pathogenesis, including genetic predisposition by common genetic modifiers, mutational burden and genetic interaction. While recent reviews have summarized the major HSCR genes and their associated biological pathways in relation to the development of the ENS [6, 7], this review focuses on the implications of these genetic findings for clinical genetic testing and disease risk prediction.

The roadmap of genetic research on HSCR

Familial aggregation of HSCR has been noted since the 1920s. The substantial genetic contribution to its heterogeneous etiology was first supported by notable studies of HSCR families, which showed a higher incidence among siblings (4%) than in the general population (0.02%) [8, 9]. Later, the landmark segregation analysis of 487 probands and their families by Badner et al. (1990) provided precise estimates of the recurrence risk stratified by the extent of aganglionosis, the sex of the proband, and the sex of the siblings and children [10]. Recurrence risk is highest for children (male: 27–29%; female: 21–22%) of a female proband with L-HSCR/TCA and lowest for children of HSCR patients with aganglionosis restricted to the rectosigmoid region (< 1%). To date, these estimates remain the standard reference for informed genetic counselling and the most valuable epidemiological data for deciding when clinical genetic testing is indicated.
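The strength of the familial aggregation quoted above can be made explicit as a ratio of the sibling incidence to the population incidence, a quantity often written λs in genetic epidemiology. The following is a minimal sketch of that arithmetic only; the function name is illustrative and not taken from the cited studies.

```python
# Sibling recurrence-risk ratio: how many times more often siblings of an
# affected child are affected than members of the general population.

def sibling_risk_ratio(sibling_risk: float, population_risk: float) -> float:
    """Return sibling recurrence risk divided by population incidence."""
    if not 0 < population_risk <= 1:
        raise ValueError("population_risk must be in (0, 1]")
    if not 0 <= sibling_risk <= 1:
        raise ValueError("sibling_risk must be in [0, 1]")
    return sibling_risk / population_risk


# Figures quoted in the text: 4% among siblings, 0.02% in the general population.
lambda_s = sibling_risk_ratio(0.04, 0.0002)
print(f"lambda_s = {lambda_s:.0f}")  # prints "lambda_s = 200"
```

On these figures, siblings of an affected child are roughly 200 times more likely to be affected than the general population, which is why familial aggregation was read as evidence of a substantial genetic contribution.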

Over the past decades, family-based studies not only demonstrated the high heritability of the disease but also implicated a polygenic nature and non-Mendelian inheritance in the majority of HSCR cases. Indeed, familial HSCR cases have contributed to most of the discoveries of HSCR genes linked to the monogenic dominant forms of the disease (Fig. 1). The first HSCR gene, the RET receptor tyrosine kinase, was mapped in the early 1990s through linkage analyses of multiplex HSCR families, assisted by prior reports of co-occurrence with multiple endocrine neoplasia type 2 (MEN2) [11,12,13,14,15]. Similarly, EDNRB was identified as the second major HSCR gene by linkage analysis in an extended inbred Mennonite kindred with a high incidence of HSCR as one of the clinical features of Waardenburg syndrome type 4 (WS4) [16,17,18]. Using these early genetic techniques, candidate gene studies of comorbid disorders led to the discovery of a number of further HSCR genes, including PHOX2B, ZEB2, SOX10, and KIFBP, whose loss of function (LoF) is pathogenic for the syndromic forms of HSCR [19,20,21,22]. However, it was also realized that most of these mutations were family-specific and were unlikely to account for the majority of sporadic and isolated HSCR cases.

Fig. 1 Timeline of genetic discoveries of HSCR. The lower panel denotes the genetic technologies widely used at genome-wide scale for the genetic discoveries within each period. WS4 Waardenburg syndrome type 4, GWAS genome-wide association study, NGS next generation sequencing, WES whole exome sequencing, WGS whole genome sequencing, S-HSCR short-segment HSCR, L-HSCR long-segment HSCR, TCA total colonic aganglionosis

Common polymorphisms (variants with a frequency > 1% in the general population) are another key contributor to phenotypic variation, acting through regulation of gene expression and epigenetic modifications. The early findings that common single nucleotide polymorphisms (SNPs) in RET are associated with HSCR marked an important milestone in our understanding of the complex genetic landscape of the sporadic form of HSCR [23,24,25]. Represented by the non-coding intron 1 SNP (rs2435357), these RET common variants were found at higher frequencies in HSCR patients than in controls, and in Asians than in Caucasians, thereby accounting for the population differences in disease incidence. Mechanistically, by disrupting transcription factor binding and hence lowering gene expression, these RET common variants predispose to dysregulation of ENCC migration and impairment of neurogenesis, which results in an increased risk of HSCR. Over the past decade, with technological advances in SNP array-based genotyping, five GWAS and one multi-ethnic meta-analysis interrogating the association of millions of SNPs have been carried out [26,27,28,29,30,31]. In addition to RET, these genome-wide approaches identified two novel loci, neuregulin-1 (NRG1) and semaphorin 3C or 3D (SEMA3), confidently associated with HSCR. Of note, while the association of NRG1 was universal across populations, that of the SEMA3 locus was European-specific, with the risk allele being absent in Asians. Altogether, these common variants collectively explain 10–20% of the phenotypic variance of HSCR. Expanding the meta-analysis by increasing the sample size and including disease cohorts from diverse populations may further uncover the “hidden heritability”.

Rare variants in novel genes are another potential source of missing heritability. Recent genetic studies using NGS approaches, such as whole exome sequencing (WES) and whole genome sequencing (WGS), on tens of severe HSCR trios (proband and unaffected parents) and > 150 sporadic HSCR cases have uncovered a dozen new HSCR candidate genes, including DENND3, NCLN, NUP98, TBATA, ERBB2, ERBB3, BACE2, PTK2, ITGB4, ACSS2, ENO3, SH3PXD2A and UBR4 [32,33,34,35,36]. These candidate genes were mostly discovered by statistical enrichment of deleterious mutations or by segregation analysis in a simplex family. Unlike the HSCR genes identified from familial and syndromic cases, mutations in these new genes typically have moderate effect sizes and lower penetrance. Although some of these genes were shown to be involved in ENS development using zebrafish or human-induced pluripotent stem cell models, their causal molecular mechanisms in disease pathogenesis remain largely unknown. A closer understanding of the pathological cell biology is critical to an accurate interpretation of their contribution to disease risk. More genetic and functional studies are needed to firmly establish gene-disease validity and to evaluate which of these mutations lead to the clinical manifestation of HSCR, and how.

Identification of disease genes and interrogation of the underlying molecular mechanisms are the very first steps toward understanding the etiology of the disorder. Ultimately, translating these genetic findings into routine clinical practice to improve disease management is the primary goal of all genetic studies. While the clinical diagnosis of HSCR does not rely on genetic testing, these genetic findings are instrumental in clinical genetic testing as well as in polygenic risk prediction.

Potentials and challenges for clinical genetic testing in HSCR

Unlike research-based sequencing studies, clinical genetic testing is guided above all by clinical utility and medical actionability. For HSCR, the results of clinical testing are most informative for (i) evaluating the risk of developing comorbid hereditary cancer syndromes (e.g., medullary thyroid carcinoma (MTC)), and for guiding (ii) family planning, (iii) reproductive options, and (iv) preimplantation and prenatal genetic diagnosis for the patients and their family members.

The RET proto-oncogene is the major HSCR gene. Over 100 rare, damaging, germline protein-altering RET mutations have been reported, either as de novo or inherited events, in approximately 50% of familial and 15–20% of sporadic HSCR cases. In line with the recurrence risk, RET damaging mutations were found more frequently in patients with L-HSCR/TCA than in S-HSCR patients [37]. These damaging coding mutations are predominantly heterozygous mutations that inactivate RET. Paradoxically, RET-activating missense mutations were also found in HSCR patients. Among these, gain-of-function mutations in exon 10 (known as “Janus” mutations) affecting codons 609 (18%), 611 (2%), 618 (32%), and 620 (48%) are pathogenic for MTC (including MEN2A, MEN2B or familial medullary thyroid carcinoma) [38]. Collectively, these mutations were estimated to have a penetrance of 80% for MTC by the age of 50. In view of the incremental benefit of early clinical surveillance, both the American Thyroid Association and the European Thyroid Association recommended screening at least exon 10 of RET for MEN2-associated mutations in all patients with HSCR [39,40,41].

Regarding genetic testing for primary disease risk prediction, it was reported that a large majority of adult HSCR patients and parents of children with HSCR showed definite or possible interest in reproductive genetic counselling and prenatal testing, which may help guide their reproductive decision making [42]. Although recent advances in genetic studies have helped unravel the genetic architecture of the disease, there remain several barriers to the translation of these findings into clinical practice in HSCR. First, to minimize unnecessary anxiety from receiving inconclusive findings, clinical genetic testing is mainly limited to HSCR genes showing “definitive” gene-disease relationships, i.e. RET and other HSCR genes linked to syndromic HSCR (as summarized in Table 1), which restricts its application to familial and syndromic HSCR cases. Second, the large proportion of variants of uncertain significance (VUS), even in the well-established HSCR genes, introduces additional uncertainty into clinical genetic testing. According to the standard guidelines of the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP), the guidelines most widely implemented by clinical laboratories to date, only pathogenic and likely pathogenic variants are considered medically actionable. Taking RET as an example, of the 34 variants submitted to ClinVar from clinical testing of HSCR patients (as of Dec 2022), the majority (~59%) are classified as VUS, all of which are missense changes. The remaining pathogenic/likely pathogenic variants are either protein-truncating mutations or de novo missense changes. In addition, variable penetrance poses challenges for variant interpretation. Variable penetrance refers to the situation in which a rare damaging mutation in a known gene does not always manifest as disease. Although LoF of RET is confidently linked to HSCR, there were multiple reports in which rare, likely pathogenic LoF mutations were found in unaffected parents in heterozygous or mosaic forms [43, 44]. Several theories have been proposed to account for the incomplete penetrance, and the most appealing explanation is epistasis. Epistasis refers to a genetic interaction in which the effect of one genetic variant differs depending on other (modifier) variants. In the case of RET, the high-risk allele (T) of the common regulatory variant (rs2435357) may modify the penetrance of damaging mutations located in trans (on the other chromosome) through their joint effect on the final dosage of functional gene product. Compound inheritance of a rare damaging mutation and rs2435357 was found to explain the clinical manifestation in several HSCR families [41]. Meanwhile, it also conferred a fivefold increase in the risk of S-HSCR [33]. Such an epistatic effect not only supports a sensitized genetic background but also suggests the importance of examining the haplotype configuration with rs2435357 to fully interpret the pathogenicity of RET damaging mutations.
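The argument that haplotype configuration matters can be illustrated with a toy dosage model. The sketch below is an assumption-laden illustration, not a published model: it assumes that a LoF mutation abolishes the output of the chromosome carrying it, that the T allele of rs2435357 scales the output of its own chromosome by a hypothetical factor (`RISK_ALLELE_EXPRESSION`, an arbitrary value chosen for illustration), and that the two chromosomes contribute additively.

```python
from dataclasses import dataclass

# HYPOTHETICAL parameter: fraction of normal RET expression retained by a
# chromosome carrying the rs2435357 T allele. Not an estimate from the literature.
RISK_ALLELE_EXPRESSION = 0.5


@dataclass(frozen=True)
class RetHaplotype:
    """One parental copy of RET in this toy model."""
    has_lof_mutation: bool   # rare damaging loss-of-function coding mutation
    has_risk_allele_t: bool  # T allele of the regulatory SNP rs2435357


def functional_output(haplotype: RetHaplotype) -> float:
    """Functional RET product from one copy, relative to a normal copy (1.0)."""
    if haplotype.has_lof_mutation:
        return 0.0  # assumed: a LoF copy yields no functional product
    return RISK_ALLELE_EXPRESSION if haplotype.has_risk_allele_t else 1.0


def total_dosage(first: RetHaplotype, second: RetHaplotype) -> float:
    """Summed functional output of both copies (2.0 for two normal copies)."""
    return functional_output(first) + functional_output(second)


normal = RetHaplotype(has_lof_mutation=False, has_risk_allele_t=False)
lof_only = RetHaplotype(has_lof_mutation=True, has_risk_allele_t=False)
risk_only = RetHaplotype(has_lof_mutation=False, has_risk_allele_t=True)
lof_and_risk = RetHaplotype(has_lof_mutation=True, has_risk_allele_t=True)

dosage_lof_alone = total_dosage(lof_only, normal)  # LoF carrier, no T allele
dosage_cis = total_dosage(lof_and_risk, normal)    # LoF and T on the same copy
dosage_trans = total_dosage(lof_only, risk_only)   # LoF and T on opposite copies

print(dosage_lof_alone, dosage_cis, dosage_trans)
```

Under these assumptions, the T allele leaves the dosage unchanged when it sits on the same copy as the LoF mutation, but lowers it further when it sits on the opposite copy, which is the sense in which the configuration with rs2435357 could modify penetrance.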

Table 1 Known HSCR genes with definitive gene-disease association for clinical genetic testing

Future genetic research direction: validation of association of genes, variants, and polygenic risk scores

To overcome these hurdles to clinical translation, large-scale genetic and mechanistic studies should be performed to improve the diagnostic yield and to extend screening to sporadic L-HSCR/TCA patients, who have a higher recurrence risk. In line with the ClinGen clinical validity framework, expanding the gene panel to identify additional pathogenic variants requires more case-level or case–control-level evidence, together with replication of segregation or statistical association, to conclusively establish the gene-disease association for the new candidate genes. Gene editing in non-human model organisms or human-surrogate models (e.g., human-induced pluripotent stem cells) that demonstrates disease pathogenicity, followed by rescue in human or non-human models, is needed to provide experimental evidence of causality.

Likewise, to reduce the number of VUS and to increase actionability, a trio-based sequencing design or targeted Sanger sequencing of suspected VUS (ACMG-AMP criteria PS2: de novo occurrence, or PM3: detected in trans with a pathogenic variant for recessive inheritance), as well as follow-up standardized functional assays of the variant (PS3: well-established functional studies supportive of a damaging effect), are highly recommended to provide additional strong or moderate evidence of pathogenicity. In addition, a database should be set up to curate all mutations found in patients worldwide in known or candidate HSCR genes. Such a database would be instrumental in estimating the prevalence of variants recurrently found in cases compared to controls, and in providing a strong level of evidence for pathogenicity if the relative risk or odds ratio is high (PS4: prevalence of the variant in affected individuals is significantly increased compared with controls). For very rare variants found in multiple, unrelated HSCR patients without reaching statistical significance, the recurrence together with their absence from public databases can also be used as a moderate level of evidence.
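The case-versus-control comparison behind PS4 can be sketched as an odds ratio with a one-sided Fisher exact test. The example below is a minimal illustration using only the Python standard library; the carrier counts are hypothetical and are not drawn from any HSCR cohort, and the continuity correction for empty cells (adding 0.5 to every cell) is one common convention rather than a requirement of the ACMG-AMP guidelines.

```python
from math import comb


def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Odds ratio for carrying the variant, cases versus controls."""
    a, b = case_carriers, case_total - case_carriers
    c, d = control_carriers, control_total - control_carriers
    if 0 in (a, b, c, d):
        # Continuity correction so that an empty cell does not divide by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_one_sided(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int) -> float:
    """P(at least this many carriers fall among cases) under the null.

    Under the null hypothesis the carriers are distributed between cases and
    controls at random, so the count among cases is hypergeometric.
    """
    n_carriers = case_carriers + control_carriers
    n_total = case_total + control_total
    upper = min(n_carriers, case_total)
    favourable = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(n_total, case_total)


# HYPOTHETICAL counts: 5 carriers among 200 cases, 1 carrier among 1000 controls.
example_or = odds_ratio(5, 200, 1, 1000)
example_p = fisher_one_sided(5, 200, 1, 1000)
print(f"odds ratio = {example_or:.1f}, one-sided p = {example_p:.2e}")
```

With very rare variants the carrier counts are small, so the exact test is preferable to a chi-squared approximation; the thresholds applied to the odds ratio and p-value are left to the curating laboratory.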

Generally, broader genetic testing, such as exome-wide or genome-wide testing, should be performed only once in a lifetime, and the results should be well documented in the patient’s health record. Because genetic testing evolves rapidly, new genes may be added to the disease gene panel and additional evidence for variant classification may arise. Patients and clinicians should be aware of the possibility of reanalysis, particularly when there is a change in reproductive plans.

Like other complex diseases, HSCR is genetically heterogeneous. Genetic risk derived from the polygenic burden of both common and rare variants with small to moderate effects can be comparable to that derived from rare variants of large effect alone [45]. Recently, a polygenic risk model computed from the genetic data of 190 European HSCR patients and 740 controls suggested that the risk of HSCR conferred by rare and common variants together is larger than that conferred by rare coding variants alone or by common regulatory variants alone [32]. Overall, this study implied that a polygenic risk score (PRS), which aggregates the genetic risk of many variants with low penetrance, can help recover the missing heritability of HSCR. Although the use of PRS in the clinical setting is still immature, harnessing PRS across the whole genome may in future be as important as genetic testing of rare coding variants for stratifying patients at high risk of HSCR.
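In its simplest additive form, a PRS is the sum of risk-allele dosages weighted by per-allele effect sizes, usually log odds ratios. The sketch below shows that generic calculation only; it is not the model used in the cited study, and the odds ratios and the two placeholder variant identifiers are hypothetical values chosen for illustration.

```python
from math import log

# HYPOTHETICAL per-allele odds ratios, for illustration only.
HYPOTHETICAL_ODDS_RATIOS = {
    "rs2435357": 4.0,        # RET intron 1 SNP named in the text; value assumed
    "NRG1_locus_snp": 1.5,   # placeholder identifier, not a real rsID
    "SEMA3_locus_snp": 1.3,  # placeholder identifier, not a real rsID
}
WEIGHTS = {variant: log(ratio) for variant, ratio in HYPOTHETICAL_ODDS_RATIOS.items()}


def polygenic_risk_score(dosages: dict, weights: dict) -> float:
    """Additive PRS: sum over variants of log odds ratio times risk-allele count."""
    score = 0.0
    for variant, weight in weights.items():
        dosage = dosages.get(variant, 0)  # variants not genotyped count as 0
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2")
        score += weight * dosage
    return score


# One hypothetical individual: homozygous for the RET risk allele,
# heterozygous at the NRG1 placeholder, no risk allele at the SEMA3 placeholder.
example_dosages = {"rs2435357": 2, "NRG1_locus_snp": 1, "SEMA3_locus_snp": 0}
example_score = polygenic_risk_score(example_dosages, WEIGHTS)
print(f"PRS = {example_score:.3f}")
```

A score of this kind is only meaningful relative to its distribution in a reference population, and extending it to rare coding variants, as the cited study did, would require additional modelling choices that are not shown here.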

To summarize, genetic studies of HSCR have revealed new insights into the genetic architecture of the disease; however, the genetic factors underlying the variable disease prognosis and complications, as well as the association with chromosomal anomalies, remain unknown [46,47,48]. Global research efforts are needed to fill this gap in knowledge. In the coming years, genomic research will generate large amounts of multi-omics data from patients and from animal and stem cell-based models. It is envisioned that the development of deep learning and other machine learning approaches to explore and integrate these data will revolutionize disease risk prediction. Although challenges need to be overcome before these genetic findings can be translated clinically and artificial intelligence can be deployed in medical applications, there is optimism that leveraging these findings will pave the way toward precision medicine in the near future by facilitating the development of personalized genetic risk prediction and, eventually, alternative therapies.