Introduction

Intellectual disability (ID) is a common morbidity affecting at least 1% of the population, a fraction that represents the extreme end of the distribution curve of cognitive capacity in humans.1 The natural history of ID is usually a stable course of impaired cognition presenting in early childhood as delayed acquisition of speech and other cognitive domains, and persisting into adulthood with variable degrees of limited mental function. Alternatively, ID may follow a progressive course characteristic of neurodegenerative diseases of childhood (ID only applies to the developing brain by definition2) where loss of cognitive skills follows a period of normal development.

The exact contribution of genetics to ID is unknown. Previous estimates relied on the recognition of identifiable genetic syndromes or positive family history even though the absence of these criteria is still compatible with a genetic etiology of ID. The extreme genetic heterogeneity of ID was a major impediment to the establishment of molecular diagnosis in the absence of a recognizable clinical syndrome or positional mapping data that guide the search for the likely candidate gene. The development of genomic tools that are agnostic to the phenotype marked a dramatic change in the diagnostic approach to individuals with ID. Molecular karyotyping greatly improved our understanding of the role of copy number variants in cognitive phenotypes and is now the recommended first-tier diagnostic test in children with ID.3, 4 Genomic sequencing (whole-exome or whole-genome) is a more recent development and has been found to identify causal mutations in up to 50% of cases.5, 6, 7, 8 Thanks to these recent technological developments, the 'morbid genome' of ID, defined as the sum of genetic and genomic mutations identified in the context of ID phenotypes, has expanded greatly in recent years and is likely to expand further as more patients get tested.

Despite the established diagnostic yield of genomic sequencing in the setting of ID, previously published studies reported on pre-selected individuals, that is, those who had been through the 'routine' testing strategy that failed to identify the causal mutation. Studies that evaluate the clinical diagnostic yield of genomic sequencing on 'naive' ID individuals are needed, however, to inform recommendations regarding the routine use of this technology in clinic. In an attempt to address this gap in our knowledge about the clinical diagnostic yield of genomic sequencing, we report our experience with prospective application of genomic analysis on all individuals with ID (or delayed cognitive development in the case of younger children) who were referred to our clinical genetics service by their pediatricians or pediatric neurologists.

Materials and methods

Human subjects

Individuals with a documented intelligence quotient of 70 or less were eligible for the study. Younger children (<5 years) were eligible if developmental assessment by a pediatric neurologist revealed delayed acquisition of speech and other cognitive developmental domains regardless of whether other developmental domains were also involved (cases were labeled as developmental delay or global developmental delay accordingly). All subjects were evaluated by board-certified neurologists and clinical geneticists. Clinical evaluation included standard medical and family history and clinical examination. Subjects underwent brain imaging (magnetic resonance imaging or computed tomography), a metabolic screen (plasma carnitine, acylcarnitines, amino acids, ammonia and lactate), complete blood count, electrolytes, and liver, renal and thyroid function tests as part of standard clinical evaluation. Written consent was signed by the parents (or legal guardians) of all subjects prior to enrollment (KFSHRC RAC# 2121053). Once enrolled, blood was collected in ethylenediaminetetraacetic acid collection tubes for genetic analysis from the index and available family members in parallel with the standard clinical evaluation.

Genomic testing algorithm

Simplex cases underwent molecular karyotyping and sequencing by a multi-gene panel that encompasses 758 genes with a published link to various neurogenetic diseases as previously described.9 If negative, we proceeded with whole-exome sequencing (WES). Male simplex cases who lacked major facial dysmorphism also underwent FRAXA testing using a standard Southern blot protocol combined with triplet-repeat PCR. Familial cases underwent the multi-gene panel sequencing and, if negative, WES was carried out. Familial cases consistent with dominant inheritance also underwent molecular karyotyping while those consistent with X-linked inheritance also underwent FRAXA testing, in parallel with the multi-gene panel. When negative, WES was pursued (Figure 1). The exception to the above workflow was cases enrolled prior to the availability of the multi-gene panel, in which this step was replaced by WES directly (Figure 1). Although cases were analyzed for all applicable modes of inheritance, we routinely conducted autozygome analysis for additional support when the identified variants were recessive in nature as described before.10, 11, 12
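The branching described in this section can be summarized as a small decision function. The following is only an illustrative sketch: the `Case` fields, the test labels and the function names are hypothetical and are not part of the study's software; the code simply encodes the rules as stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # All field names are hypothetical labels chosen for this sketch.
    familial: bool
    male: bool = False
    major_facial_dysmorphism: bool = False
    inheritance: str = ""        # "dominant" or "x-linked" for familial cases
    panel_available: bool = True  # False for cases enrolled before the panel existed


def first_tier_tests(case):
    """Return the set of tests run in parallel at enrollment."""
    # Cases enrolled before the multi-gene panel was available went to WES directly.
    tests = {"multi_gene_panel" if case.panel_available else "WES"}
    if not case.familial:
        # Simplex cases: molecular karyotyping alongside the panel.
        tests.add("molecular_karyotyping")
        if case.male and not case.major_facial_dysmorphism:
            tests.add("FRAXA")
    elif case.inheritance == "dominant":
        tests.add("molecular_karyotyping")
    elif case.inheritance == "x-linked":
        tests.add("FRAXA")
    return tests


def follow_up_test(case, first_tier_negative):
    """WES follows a negative first tier unless it was already performed."""
    if first_tier_negative and case.panel_available:
        return "WES"
    return None


simplex_boy = Case(familial=False, male=True)
print(sorted(first_tier_tests(simplex_boy)))
print(follow_up_test(simplex_boy, first_tier_negative=True))
```

Keeping the first-tier selection separate from the follow-up step mirrors the two stages of the flowchart, but other encodings would be equally valid.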

Figure 1

Flowchart that summarizes the workflow of the study. WES, whole-exome sequencing.

The technical details of molecular karyotyping, multi-gene panel and WES, including the rationale of including specific genes in the multi-gene panel, are described elsewhere.9, 13 In brief, molecular karyotyping was performed using CytoScan HD (Affymetrix, Santa Clara, CA, USA). This array platform contains 2.6 million markers for copy number variation (CNV) detection, of which 750 000 are genotyping single-nucleotide polymorphisms and 1.9 million are non-polymorphic probes for whole-genome coverage. The analysis was performed using the Chromosome Analysis Suite version Cyto 2.0.0.195(r5758). Calling of pathogenic CNVs was in accordance with the ACMG (American College of Medical Genetics and Genomics) guidelines.14 The label 'solved' in the context of molecular karyotyping was only used if the CNV met the definition of 'pathogenic' or 'unknown significance-likely pathogenic' according to these guidelines. The multi-gene panel was designed using Ion AmpliSeq Designer software (Life Technologies, Carlsbad, CA, USA). Primers were then synthesized and pooled into two multiplex reactions based upon PCR compatibility, minimizing the likelihood of primer-primer interactions. For WES, each DNA sample was treated to obtain the Ion Proton AmpliSeq library using Exome Primer Pools, AmpliSeq HiFi mix (Thermo Fisher, Carlsbad, CA, USA). Libraries for the multi-gene panel and WES were run on an Ion Proton instrument (Thermo Fisher). Calling of variants in previously reported disease genes using genomic sequencing followed the recently published guidelines by the ACMG.15 The label 'solved' in the context of single-nucleotide variants (SNVs) was only applied to cases who harbored 'pathogenic' or 'likely pathogenic' SNVs, according to these guidelines, that explain the phenotype.

For novel recessive candidate disease genes (this category is not covered by the ACMG guidelines), we only report those with variants that meet the following criteria: (a) minor allele frequency <0.001 based on 1500 Saudi exomes, (b) full segregation with the disease on testing of all available family members, (c) locus supported by positional mapping data and (d) loss of function (LOF) or at least a likely pathogenic nature of the variants. For novel dominant candidate disease genes (this category is also not covered by the ACMG guidelines), we only report in this study those with variants that meet the following criteria: (a) de novo nature of the variant with confirmed paternity, (b) novel based on 1500 Saudi exomes and ExAC and (c) LOF or at least predicted pathogenic nature of the variants based on in silico prediction. LOF was defined as nonsense, frameshift indels and canonical splicing mutations. Missense mutations were only considered if at least two of three in silico tools (PolyPhen, SIFT and CADD) assigned a high pathogenicity score to the variant (PolyPhen score >0.90, SIFT score <0.05 and CADD score >20). Additional supportive evidence was sought from the published literature (for example, known link to brain development, neuronal function or animal models).
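The filtering criteria above lend themselves to a compact illustration. The sketch below is hypothetical (the function names, argument names and consequence labels are ours and do not correspond to any published pipeline); it only encodes the stated thresholds, the two-of-three rule for missense variants and the four criteria for novel recessive candidates.

```python
# Consequence labels are hypothetical; the text defines LOF as nonsense,
# frameshift indels and canonical splicing mutations.
LOF_CONSEQUENCES = {"nonsense", "frameshift_indel", "canonical_splice"}


def is_lof(consequence):
    return consequence in LOF_CONSEQUENCES


def missense_supported(polyphen=None, sift=None, cadd=None):
    """True if at least two of the three tools give a high pathogenicity score.

    Thresholds follow the text: PolyPhen > 0.90, SIFT < 0.05, CADD > 20.
    A missing score (None) is treated as no support from that tool.
    """
    votes = [
        polyphen is not None and polyphen > 0.90,
        sift is not None and sift < 0.05,
        cadd is not None and cadd > 20,
    ]
    return sum(votes) >= 2


def recessive_candidate_ok(maf, segregates, locus_mapped, consequence,
                           likely_pathogenic=False):
    """Criteria (a)-(d) for novel recessive candidate genes."""
    return (
        maf < 0.001                                        # (a) rare in local exomes
        and segregates                                     # (b) full segregation
        and locus_mapped                                   # (c) positional mapping support
        and (is_lof(consequence) or likely_pathogenic)     # (d) LOF or likely pathogenic
    )


print(missense_supported(polyphen=0.95, sift=0.01, cadd=15))   # two of three tools agree
print(recessive_candidate_ok(0.0, True, True, "nonsense"))
```

Treating a missing score as a non-vote is an assumption of this sketch; the text does not state how unavailable predictions were handled.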

We also conducted our own computational structural analysis of mutants (Supplementary Computational Biology Materials). Sequences were retrieved from the Uniprot database. BLAST and SwissModel16 were used to search for suitable structural templates in the Protein Data Bank. SwissModel and RaptorX17 were used to produce homology models. Models were manually inspected, and mutations evaluated, using the Pymol program (pymol.org). Disorder and secondary structure elements were predicted using RaptorX. Transmembrane helices were predicted using Phobius.18 Functional information was compiled from various resources, including Uniprot, InterPro,19 and publications associated with the model templates used.

Results

Genomic analysis is more sensitive than standard clinical evaluation

The total number of eligible cases was 337. The rate of consanguinity, defined as any degree of parental relatedness equal to or closer than third cousins, was 76% (255/337). The male-to-female ratio was 163:173, and 45% of males and 50% of females were below 5 years of age. The proportion of syndromic versus non-syndromic ID was 152:183 (two were equivocal). These and all other characteristics, including relevant clinical data, can be found in Supplementary Table S1. All simplex cases underwent molecular karyotyping (n=178), and simplex male ID cases who lacked major dysmorphism as well as familial cases potentially consistent with X-linked inheritance had Fragile-X testing (n=87). One or more specific clinical entities were suspected on clinical basis (standard clinical evaluation) in 54 cases (16% sensitivity) but only 38 were subsequently confirmed by genomic analysis (70% specificity).

On the other hand, the sensitivity of genomic tests (excluding Fragile-X testing) was 57% (193/337) based on previously reported disease genes or CNVs and 74% (249/337) if the variants identified in novel genes are included (see below). Molecular karyotyping revealed pathogenic or likely pathogenic CNVs in 27% of tested cases (48/178). The multi-gene panel revealed pathogenic (n=23) or likely pathogenic (n=31) SNVs in 34% of tested cases (54/157). WES uncovered pathogenic or likely pathogenic SNVs in 39% of tested cases (91/232) (Figure 1 and Supplementary Table S1). WES was only applied after a negative multi-gene panel and/or molecular karyotyping in many cases; however, we would like to highlight that in 129 cases, WES was applied directly because these cases were recruited prior to the availability of the multi-gene panel. This explains the overwhelming majority of cases in which WES identified a mutation in a known disease gene. In six cases, however, we note that the multi-gene panel failed to identify the causal mutation in a known disease gene that was subsequently identified by WES (Supplementary Table S1). The fact that we applied WES directly to 129 of the cases gives us the opportunity to also calculate the diagnostic yield of WES without prior application of the multi-gene panel at 60%. A detailed breakdown of the diagnostic yield of the various genomic tests based on age, gender, syndromic vs non-syndromic and consanguinity vs non-consanguinity is provided in Table 1. Of note, although most causal SNVs identified are recessive, the majority of these recessive mutations (65%) were 'private', that is, completely absent in the heterozygous state in 1500 ethnically matched exomes, which is highly consistent with our recent finding that, contrary to conventional assumptions, founder mutations account for a minority of recessive mutations in our population.20
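As a quick arithmetic check, the percentages quoted in this section can be recomputed from the reported counts. The snippet below is a minimal sketch; the dictionary keys are descriptive labels of our own choosing, and only counts stated in the text are used. The 70% figure is computed as 38/54, that is, the fraction of clinically suspected diagnoses that were confirmed.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


COHORT = 337  # total eligible cases

reported = {
    "consanguinity": (255, COHORT),
    "clinical suspicion": (54, COHORT),
    "clinical suspicion confirmed": (38, 54),
    "genomic tests, known genes/CNVs": (193, COHORT),
    "genomic tests, incl. novel genes": (249, COHORT),
    "molecular karyotyping": (48, 178),
    "multi-gene panel": (23 + 31, 157),  # pathogenic + likely pathogenic
    "WES": (91, 232),
}

for label, (n, d) in reported.items():
    print(f"{label}: {n}/{d} = {pct(n, d)}%")
```

Each denominator is the number of cases actually tested on that platform, not the whole cohort, so the per-platform yields are not additive.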

Table 1 Diagnostic yield for each platform based on gender, age, syndromic vs non-syndromic and consanguinity

Expanding the morbid genome of ID

Pathogenic and unknown significance-likely pathogenic CNVs

Of the CNVs identified in this cohort, eight (15%) are novel (seven were assigned as pathogenic according to ACMG guidelines and one as unknown significance-likely pathogenic), whereas 46 are known pathogenic CNVs. Pathogenic CNVs include a de novo deletion of 1476 kb (Chr18:47279692-48756541) in 13DG1493, which encompasses SMAD4 (MIM 600993), and a deletion of 502 kb in 15DG1036 (Chr17:43710395-44212416), which encompasses KANSL1 (MIM 612452), thus confirming the diagnosis of Myhre (MYHRS [MIM 139210]) and Koolen-De Vries syndromes (KDVS [MIM 610443]), respectively, although neither was suspected clinically. Similarly, the de novo chr3:70986209-71412654 deletion in 15DG1264 led to complete loss of FOXP1 (MIM 605515), which was not suspected clinically despite the overlapping dysmorphology profile with the very few cases that have been reported with de novo point mutations in this gene.21 A full list of the identified pathogenic and unknown significance-likely pathogenic CNVs is provided in Supplementary Figure S1 and Supplementary Table S1.
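The deletion sizes quoted above follow directly from the genomic coordinates. The helper below is a hypothetical illustration (the function name and the `chr:start-end` parsing are our own, not part of the Chromosome Analysis Suite); it accepts endpoints in either order and reports the span in kilobases.

```python
def cnv_span_kb(region):
    """Return (chromosome, span in kb) for a 'chr:start-end' string.

    Endpoints may be given in either order. The span is end - start;
    counting both endpoints inclusively adds a single base and does not
    change the size in kb.
    """
    chrom, interval = region.split(":")
    a, b = (int(x) for x in interval.split("-"))
    start, end = min(a, b), max(a, b)
    return chrom, (end - start) / 1000


for region in (
    "Chr18:47279692-48756541",  # deletion encompassing SMAD4
    "Chr17:43710395-44212416",  # deletion encompassing KANSL1
    "chr3:70986209-71412654",   # deletion encompassing FOXP1
):
    chrom, kb = cnv_span_kb(region)
    print(f"{chrom}: {int(kb)} kb")
```

For the first two regions this reproduces the 1476 kb and 502 kb sizes given in the text; the FOXP1 deletion, whose size is not stated, spans roughly 426 kb by the same calculation.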

Pathogenic and likely pathogenic SNVs

Expanding the allelic spectrum of established disease genes and supporting the candidacy of previously reported candidate genes: Of the 145 pathogenic or likely pathogenic SNVs identified in this study, 68 (47%) are novel and involve previously reported disease genes (Supplementary Table S1, Supplementary Figure S1). The correct clinical diagnosis was only suspected in a minority of the solved cases, partly because of marked phenotypic differences compared with the published phenotype. For example, the deep intronic mutation in COG5 (confirmed at the real-time polymerase chain reaction level) was associated with global developmental delay, microcephaly, cleft palate, ambiguous genitalia and agenesis of corpus callosum, a constellation that is distinct from the hypotonia, ataxia and cerebellar hypoplasia described in CDG2I.22 Similarly, Rett syndrome was not suspected in 14DG1903 with a de novo MECP2 truncating variant because the head circumference remained normal despite the progressive neuroregression. 13DG0035 is another unusual case of phenotypic expansion where a de novo GNAS variant was associated with global developmental delay, brain heterotopia, severely hypoplastic scrotum and thyroid agenesis (Table 2).

Table 2 Atypical presentations of known disease genes

In addition, we were able to identify additional likely pathogenic alleles that support the candidacy of previously reported candidate disease genes. These include ASTN1, HELZ, THOC6, WDR45B, ADRA2B and CLIP1, all of which were reported as candidate disease genes based on single mutations.23, 24, 25, 26 The homozygous LOF variant we identified in C12orf4 in case 16DG0275 is the same variant we previously published when we reported C12orf4 as a novel candidate.25 Both cases have non-syndromic ID (Supplementary Table S1).

Expanding the genetic heterogeneity of ID

Novel genes with two independent homozygous SNVs: 12DG1579, who presented with global developmental delay, microcephaly and epilepsy, was found to have a homozygous truncating variant in DENND5A NM_015213.3:c.3811del: p. (Gln1271Argfs*67). 16DG0219 presented with an identical phenotype and was also found to have a homozygous likely pathogenic variant in the same gene NM_015213.3:c.1622A>G: p. (Asp541Gly) (Table 3, Supplementary Table S1, Supplementary Clinical Data). NEMF was found to harbor a homozygous truncating variant NM_004713.4:c.1235_1236insC: p. (Pro413Serfs*10) in 12DG0891 and her sister, who both presented with ID and hypotonia. A homozygous truncating mutation in DNHD1 (NM_144666.2:c.12347dup: p. (Gln4117Alafs*14)) was identified in a case with global developmental delay and cerebellar dysgenesis (16DG0296) (Table 3, Supplementary Table S1, Supplementary Clinical Data). Through an international collaboration, we were able to identify additional patients with overlapping phenotypes and homozygous truncating variants in these two genes (NEMF: NM_004713.3:c.2517_2520del: p.Gly841Argfs*27, and DNHD1: NM_173589.3: c.103delC: p. (Leu36Trpfs*11), Supplementary Table S1, Supplementary Clinical Data). Comparison of the phenotype of these patients is provided in the Supplementary Clinical Data.

Table 3 List of novel/candidate genes and the corresponding clinical summary

Novel candidate genes: Thirty-two genes not previously linked to human diseases were found to have single candidate variants (Table 3, Supplementary Table S1 and Supplementary Clinical Data). These include a de novo nonsense mutation in TADA1, and homozygous LOF variants in the following novel candidate genes suggesting their complete or near-complete deficiency in the ID subjects who harbor them: CDH11, PIP5K1A, PIANP, NUDT2, AP3B2, PLK2, QRFPR, UBE4A, PROCA1, TUBAL3, TP53TG5, ATOH1, SLC39A14, BTN3A2, SYDE2 and ZMYM5. Previously unreported missense or in-frame variants that are predicted to be pathogenic were identified in the following additional novel genes: KLHL24, MAMDC2, USP2, C16orf90, CPNE6, UFC1, HIR, TRERF1, RGL1, FEZF2, ARFGEF3, FAM160B1, SLC45A1, ARHGAP33 and CAPS2. Of note, there was a sufficient number of affected members in the family of 10DG0264 that a single locus spanning NUDT2 could be established by positional mapping (Supplementary Figure S2), providing additional support for pathogenicity. Similarly, positional mapping of the two cases (16DG0295 and 16DG0606) with the candidate variant in AP3B2 revealed a single shared ROH with the same haplotype (Supplementary Figure S2). A genomic map of the novel variants identified in this study (CNVs, SNVs in known genes and SNVs in novel candidate genes) is shown in Supplementary Figure S1. In addition, 3D modeling data that support the pathogenic nature of missense variants we identified in novel candidate genes are shown in Supplementary Computational Biology Materials.

Discussion

Several cohorts have been published to describe the diagnostic yield of genomic sequencing in individuals with ID, which ranged from 27 to 50%.27, 28 Those studies clearly demonstrate the usefulness of genomic sequencing compared to molecular karyotyping, which has an average clinical diagnostic yield of 11%.3 However, because the subjects in those studies are typically pre-selected based on negative 'routine' workup that included sequencing of one or more likely candidate gene, they do not address the question of whether genomic sequencing can be utilized as a first-tier test. This study is an attempt to address this deficiency in the literature.

Multi-gene panels offer the advantages of relatively low cost and ease of interpretation compared to WES or whole-genome sequencing.9 By applying this technique in parallel with molecular karyotyping on 226 individuals with ID, we were able to provide a likely molecular diagnosis to 45%, compared with 21% by molecular karyotyping alone. The application of WES to those with negative results on molecular karyotyping and/or multi-gene panel provided a likely etiology in 22% (40% if novel candidate genes are counted). In the hypothetical scenario of having applied WES to all cases, we estimate an overall yield of 43% (60% if novel candidate genes are counted), assuming it would detect all the variants in the multi-gene panel and none of the CNVs, although it is very likely that larger CNVs would also have been identified. Reassuringly, this is consistent with the diagnostic yield we observed when WES was indeed applied in lieu of the multi-gene panel before the latter was available. Importantly, we show that the yield (based on known disease genes only) of a genomics-first approach remains high even if we limit our analysis to non-consanguineous cases (55%), which suggests that our findings have relevance to outbred populations as well.

Consistent with other studies, many of the molecular lesions identified by genomic techniques were not suspected clinically, which highlights their power in overcoming the limited sensitivity and specificity of unaided clinical evaluation of individuals with ID.29 This new trend of 'reverse phenotyping' or 'genotype to phenotype' made possible by the application of clinical genomics will continue to grow.30 As shown by the illustrative examples in Table 2, the potential of this approach to unravel the full spectrum of phenotypes associated with each disease gene will greatly enhance our ability to interpret the phenotypic consequences of variants.

One obvious advantage of WES is its ability to identify novel disease genes. As highlighted previously, it is critical that these candidates are made available to facilitate matchmaking, which in turn can establish their bona fide link to disease in humans.23, 25, 31 The majority (59%) of the novel candidate genes we report in this study harbor homozygous LOF variants that render the affected individual a natural knockout for the respective gene.32 In the case of DENND5A, an additional missense mutation was also identified in a second family with an overlapping phenotype. DENND5A encodes a guanine-nucleotide exchange factor that activates Rab39b, which is also mutated in ID patients.33 Similarly, NEMF, which we found to be homozygously truncated in two families with ID, encodes a protein that directly interacts with MECP2 in the brain to form a complex that was proposed to mediate the pathogenesis of MECP2, a gene with an established link to severe neurodevelopmental disorders in males and females.34 Although very little is known about the protein encoded by DNHD1, we note that this is another gene in which we identified homozygous truncating mutations in two independent families with ID phenotypes, which substantiates the link we propose between DNHD1 mutations and ID.

Strong links to brain development and function support the candidacy of other candidate genes. For example, PLK2 deficiency was found to prevent homeostatic shrinkage and loss of dendritic spines, and to impair memory formation, making its biallelic loss of function a likely cause of ID.35 QRFPR is one of four significantly downregulated genes in the prefrontal cortex of the spontaneously hypertensive rat, a model for schizophrenia and attention-deficit/hyperactivity disorder.36 A knockout mouse model is available for Ap3b2 and exhibits marked neurobehavioral abnormalities and epilepsy, likely due to abnormal synaptic vesicle protein composition.37 The pontocerebellar hypoplasia observed in the individual with ATOH1-related ID is faithfully recapitulated in the knockout mouse model.38 Similarly, CPNE6 is necessary for synaptic plasticity and the knockout mouse displays deficient hippocampal long-term potentiation.39 The knockout mouse model of FEZF2 displays abnormal development of the cortex and corticospinal tract.40, 41, 42

Although there is no available mouse model for SYDE2, knockout of its closely related paralog Syde1 results in reduced docking of synaptic vesicles at the active zone and impaired synaptic transmission.43 SYDE2 and SYDE1 are the mammalian orthologs of SYD-1, which is required for axonal guidance in Caenorhabditis elegans, and Syd-1, which regulates pre- and postsynaptic maturation in Drosophila.44, 45 KLHL24 is widely expressed in rat brain, particularly in the cortex and hippocampus, and is involved in glutamate receptor regulation.46 CDH11 is also expressed in cortical neurons and has been demonstrated in vivo to control migration and differentiation of neuroprogenitors.47

Even when the molecular diagnosis is based on a variant in a novel gene with essentially no published data on the natural history of the disease, there is a potential for the molecular diagnosis to influence the clinical management. A good example is our finding of a homozygous truncating mutation in SLC39A14 in 14DG0924, a girl with unexplained neurodegenerative disease that resulted in progressive dystonia and cognitive impairment with associated lesions in the basal ganglia. SLC39A14 encodes ZIP14, a transporter of trace elements.48 Evaluation of trace elements in the affected individual’s blood revealed a markedly elevated level of manganese (see Supplementary Clinical Data). This prompted us to initiate chelation therapy with excellent response in terms of manganese level. Clinical monitoring is ongoing.

Genomic testing of individuals with ID offers a higher diagnostic yield than the standard workup. Furthermore, recent studies show that it is cost-effective.49 The data we present in this study suggest that genomic sequencing should be considered early on in the diagnostic workup of these individuals in parallel with or after a negative result of molecular karyotyping.