Introduction

Epileptic encephalopathies (EEs) represent a large group of severe early-onset conditions that are characterized by intractable seizures, cognitive impairment, and neurological deficits associated with frequent epileptiform activity on electroencephalography (EEG); they generally have poor prognoses and often lead to developmental delay or regression [1–4]. Among the various forms of EE, West syndrome, or infantile spasms (IS), is one of the most commonly observed epilepsy syndromes and is characterized by epileptic spasms, psychomotor regression, and a specific EEG pattern called hypsarrhythmia [5–7]. West syndrome usually occurs during the first year of life and is estimated to affect 2–3.5 per 10,000 children per year [4, 8]. The etiology of West syndrome is variable and comprises hereditary and environmental conditions. To a certain extent, this variability makes it difficult to reach a precise diagnosis [4, 7]. Therefore, the investigation of etiological factors, especially genetic causes, can provide important insights into the mechanisms underlying West syndrome [7, 9].

A genetic etiology is believed to play a prominent role in most patients with West syndrome, and a set of genes linked to West syndrome has recently been reported to harbor mutations in several patients [2, 5, 10, 11]. First, mutations of two genes, ARX and CDKL5, have been identified in patients with X-linked familial West syndrome on the basis of separate family-based linkage and microsatellite-exclusion mapping analyses [12, 13]. Subsequent large studies have also confirmed that mutations in these two genes are the most common known causes of West syndrome [14–17]. In addition, genetic variations in several other genes, including GRIN1, SPTAN1, SLC25A22, and STXBP1, have been found in patients by using a candidate gene sequencing approach and have been strongly implicated in West syndrome [18–21]. The application of array comparative genomic hybridization (CGH) and SNP microarray technologies has identified a few strong but rare candidate genes and loci contributing to West syndrome risk, such as FOXG1, MAGI2, and MEF2C [22–24].

Furthermore, genetic studies in recent years have established that de novo dominant mutations are often responsible for many cases of sporadic West syndrome. De novo mutations (DNMs) in MEF2C, SCN2A, and CDKL5 have recently been identified in five patients with West syndrome, by using high-throughput targeted sequencing of candidate genes located in EE-related CNVs [25]. Furthermore, the application of a whole-exome sequencing (WES) approach based on large cohorts of parent-offspring trios and quartets has offered a cost-effective method for the discovery of DNMs associated with EE across all gene-coding regions in the genome, thus leading to a rapid increase in the identification of determinants underlying West syndrome [26, 27]. In a recent study, a total of 329 DNMs in 305 genes have been identified in patients with EEs, by using WES in 264 parent-child trios [10]. In addition to known EE genes harboring recurrent mutations, including STXBP1 (n = 4) and CDKL5 (n = 2), a statistically significant enrichment of mutations in two novel genes, GABRB3 and ALG13, has indicated that these genes may be implicated in West syndrome and Lennox-Gastaut syndrome (LGS). Other recent studies using exome sequencing have identified DNMs in several genes associated with West syndrome, such as GRIN2B [28], GNAO1 [29], KCNT1 [30], and SPTAN1 [31]. Owing to the marked genetic heterogeneity underlying West syndrome, each genetic variant probably accounts for a small proportion of cases [2]. Therefore, many more genes associated with West syndrome are likely to be discovered as a result of advances in DNA sequencing and increases in cohort size.

To gain insight into the characterization of DNMs in West syndrome, we performed a WES study with four unrelated Chinese proband-parent trios of subjects with West syndrome. Consequently, we identified two deleterious DNMs in DNMT3A and CDKL5 and two compound heterozygous mutations in KMT2A. In addition, integrated analysis of spatiotemporal expression patterns, co-expression networks, gene expression enrichment, and genetic interaction networks suggests an etiological role of DNMT3A in EE. DNMs in DNMT3A are shared among EE, autism spectrum disorder (ASD), and intellectual disability (ID), thus further indicating that DNMT3A may be involved in the onset of sporadic neuropsychiatric disorders.

Methods

Patient Recruitment

Four unrelated trios of Han Chinese subjects with West syndrome were recruited from the Second Affiliated Hospital and Yuying Children’s Hospital of Wenzhou Medical University. This research protocol was approved by the ethics committee of Wenzhou Medical University and was carried out in accordance with the approved guidelines. Written informed consent for clinical and genetic analyses was obtained from all the guardians of the subjects participating in the study. Clinical evaluations of all the patients consisted of an assessment of the presentation of clinical seizures and an EEG recorded by a pediatrician experienced in EEs.

Whole-Exome Sequencing

Genomic DNA was isolated from peripheral blood leukocytes through standard phenol/chloroform extraction protocols. DNA quality and quantity were assessed by agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). A total of 2 μg of genomic DNA from each sample was used for library construction with an Agilent SureSelect Library Prep Kit according to the manufacturer’s protocol. Exome libraries were captured using an Agilent SureSelect Human All Exon v5 Kit (Agilent Technologies, Santa Clara, CA, USA) and were sequenced with an Illumina HiSeq 4000 through a 150-bp paired-end run (Illumina, San Diego, CA, USA).

Data Processing and Variant Calling

The raw data generated by WES were processed using the Trim Galore program to filter out adapters and low-quality reads. The remaining reads were then aligned to the human reference genome (GRCh37/hg19) by using the BWA program (version 0.7.12) [32]. After alignment, duplicated reads were removed using Picard tools, and only uniquely mapped reads were used for the detection of variation. Three GATK tools (RealignerTargetCreator, IndelRealigner, and BaseRecalibrator) were then used for local realignment around InDels and for recalibration of base quality scores before the single-nucleotide variants (SNVs) and InDels were called [33]. Finally, the sequencing depth and coverage were calculated on the basis of the uniquely aligned reads.

Detection and Annotation of De Novo SNVs and InDels

The ForestDNM and mirTrios programs were used to detect de novo SNVs [34]. We also utilized mirTrios software to identify putative de novo InDels and rare inherited variants. All the variants were annotated with ANNOVAR and an in-house bioinformatics tool with UCSC annotation (http://www.ncbi.nlm.nih.gov/refseq/).
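The genotype pattern that any trio-based de novo search starts from can be written down in a few lines. The Python sketch below only illustrates that pattern; it is not the statistical model implemented in ForestDNM or mirTrios, and the function name and the genotype encoding (tuples of allele indices, with 0 denoting the reference allele) are assumptions made for this example.

```python
from typing import Tuple

# Assumed encoding: one allele index per chromosome copy; 0 = reference, >0 = alternate.
Genotype = Tuple[int, ...]


def is_candidate_de_novo(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True when the child carries a non-reference allele seen in neither parent."""
    child_alts = {allele for allele in child if allele != 0}
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child_alts)


if __name__ == "__main__":
    # Heterozygous child, both parents homozygous reference: candidate de novo.
    print(is_candidate_de_novo((0, 1), (0, 0), (0, 0)))
    # Same child genotype, but the father carries the allele: inherited, not de novo.
    print(is_candidate_de_novo((0, 1), (0, 1), (0, 0)))
```

A genotype check of this kind yields only candidates; dedicated callers additionally weigh evidence such as read depth and genotype quality, which is why the candidates here were passed on to Sanger validation.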

To further assess the effects of missense variants, SIFT, LRT, SiPhy, and VEST3 were applied to obtain functional predictions. We considered a missense variant deleterious if the variant was consistently predicted to be deleterious or damaging by all four prediction tools. For each variant, we obtained the minor allele frequency (MAF) according to normal population variant databases, including dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), the 1000 Genomes Project (http://www.1000genomes.org/), ESP6500 (http://evs.gs.washington.edu/EVS/), ExAC (http://exac.broadinstitute.org/), CG69 (http://www.completegenomics.com/public-data/69-genomes/), and GWAS Catalog (https://www.ebi.ac.uk/gwas/). Variants with a MAF >0.1% were removed.
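The two filtering rules described above (removal of variants with a MAF above 0.1% and the requirement that all four tools agree) can be sketched as follows. This is a minimal illustration, not the in-house pipeline: the record layout, the field names, and the reduction of each tool's output to a "D" (damaging) or "T" (tolerated) label are assumptions, and the example variants are placeholders rather than study data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

TOOLS = ("SIFT", "LRT", "SiPhy", "VEST3")
MAF_CUTOFF = 0.001  # 0.1%, the threshold stated in the text


@dataclass
class Variant:
    # Hypothetical record layout used only for this sketch.
    name: str
    predictions: Dict[str, str]   # tool -> "D" (damaging) or "T" (tolerated)
    max_maf: Optional[float]      # highest MAF across databases; None if never observed


def is_rare(variant: Variant) -> bool:
    """Keep variants that are absent from the databases or at or below the MAF cutoff."""
    return variant.max_maf is None or variant.max_maf <= MAF_CUTOFF


def is_consistently_deleterious(variant: Variant) -> bool:
    """True only when every one of the four tools labels the variant damaging."""
    return all(variant.predictions.get(tool) == "D" for tool in TOOLS)


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Apply the rarity filter, then the four-tool concordance rule."""
    return [v for v in variants if is_rare(v) and is_consistently_deleterious(v)]


if __name__ == "__main__":
    all_d = {tool: "D" for tool in TOOLS}
    toy = [
        Variant("placeholder_1", all_d, None),                    # rare, 4/4 damaging
        Variant("placeholder_2", {**all_d, "SIFT": "T"}, None),   # only 3/4 damaging
        Variant("placeholder_3", all_d, 0.02),                    # too common
    ]
    print([v.name for v in filter_variants(toy)])
```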

Primers and Sanger Sequencing Validation

Validation of each potential DNM was confirmed through conventional PCR assays and Sanger sequencing. All the primers used in these assays are listed in Table S1.

Construction of Gene Co-expression and Genetic Interaction Networks

To build the gene co-expression network, we utilized spatially and temporally rich transcriptome data generated from the BrainSpan project (http://www.brainspan.org/). The Pearson correlation coefficients (r) for the gene co-expression levels were calculated for each pairwise combination of genes. Strong co-expression (absolute correlation r ≥ 0.7) was selected between seed genes and correlated epilepsy candidate genes determined on the basis of the EpilepsyGene database [35]. To create the genetic interaction network, genetic interaction data were downloaded from GeneMANIA by utilizing the Cytoscape program plugin [36]. Afterward, we used GeneMANIA to calculate the numbers of nodes and edges between the seed genes and epilepsy candidate genes determined on the basis of the EpilepsyGene database. Both the gene co-expression and genetic interaction networks were visualized using the Cytoscape program [37].
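The co-expression criterion can be illustrated with a short, dependency-free Python sketch: compute the Pearson correlation between a seed gene and each candidate gene across samples, and keep pairs with |r| ≥ 0.7. The function names, the dictionary layout of the expression data, and the toy values are assumptions for illustration; they do not reproduce the BrainSpan data or the exact code used in this study.

```python
import math
from typing import Dict, Iterable, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpressed_partners(seed: str, candidates: Iterable[str],
                         expr: Dict[str, List[float]],
                         cutoff: float = 0.7) -> Dict[str, float]:
    """Return candidate genes whose |r| with the seed gene reaches the cutoff."""
    edges = {}
    for gene in candidates:
        if gene == seed or gene not in expr:
            continue
        r = pearson(expr[seed], expr[gene])
        if abs(r) >= cutoff:
            edges[gene] = r
    return edges


if __name__ == "__main__":
    # Toy expression values (one value per sample); gene names are placeholders.
    expr = {
        "SEED": [1.0, 2.0, 3.0, 4.0],
        "G1": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with SEED
        "G2": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with SEED
        "G3": [2.0, 1.0, 2.0, 1.0],   # weakly anti-correlated with SEED
    }
    print(coexpressed_partners("SEED", ["G1", "G2", "G3"], expr))
```

Using the absolute value keeps strongly anti-correlated genes as well, which matches the "absolute correlation" wording of the threshold.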

Gene Co-expression Enrichment in Specific Brain Regions

All the genes in the co-expression network were collected to analyze the spatiotemporal expression enrichment in the 12 developmental stages, each of which contained expression data for all 16 brain regions. We used a previously published method [38] based on a modified z-score ≥2 to compute a gene expression signature for the gene lists in all 192 combinations of brain regions and developmental stages (16 regions × 12 stages). The significance of enrichment was determined with a rank-based enrichment test of the spatiotemporal signatures with 100,000 permutations. The resulting P values were corrected for multiple testing across all 192 region-stage combinations by using the Benjamini-Hochberg false discovery rate (FDR) procedure to obtain Q values [39].
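The multiple-testing step can be sketched in Python as follows. The Benjamini-Hochberg adjustment is the standard step-up procedure; the empirical permutation P value uses a common add-one estimator, which is an assumption here because the text does not state the exact estimator, and the rank-based enrichment score of the cited method [38] is not reproduced. All function and constant names are illustrative.

```python
from typing import List, Sequence

N_REGIONS = 16
N_STAGES = 12
N_TESTS = N_REGIONS * N_STAGES  # 192 region-stage combinations


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """One-sided permutation P value with an add-one correction (assumed estimator)."""
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted values (Q values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


if __name__ == "__main__":
    toy_p = [0.01, 0.04, 0.03, 0.005]  # toy values, not study results
    print(benjamini_hochberg(toy_p))
```

In the setting described above, the input to the adjustment would be the 192 permutation P values, one per region-stage combination.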

Results

Detection of DNMs

Four trios, each comprising an affected child and two unaffected parents, were included in this WES study. After the removal of adapters and low-quality bases, each sample had 6.96–11.26 Gb of clean data, with 49.01 Mb of bases covering the target regions (Table S2). In general, each sample had at least 99.15% of reads aligned to the reference genome, and on average, approximately 56.00% of effective bases were located in the target regions after removal of PCR duplicates. A minimum of 95.20% of the target regions were covered at least 4-fold, 92.20% at least 10-fold, and 84.00% at least 20-fold. Overall, these data indicate that the sequencing was reliable and provide a sound basis for the follow-up analysis.
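Coverage statistics of the kind reported above (the fraction of target bases covered at least 4-, 10-, and 20-fold) can be derived from a per-base depth list. The following Python sketch shows the arithmetic only; the function name and the toy depth values are assumptions, and in practice the depths would come from a depth-of-coverage tool run over the capture targets.

```python
from typing import Dict, Sequence


def coverage_fractions(depths: Sequence[int],
                       thresholds: Sequence[int] = (4, 10, 20)) -> Dict[int, float]:
    """Fraction of target bases whose read depth meets each threshold."""
    total = len(depths)
    if total == 0:
        raise ValueError("no target bases supplied")
    return {t: sum(1 for d in depths if d >= t) / total for t in thresholds}


if __name__ == "__main__":
    # Toy per-base depths for ten target positions (not study data).
    toy_depths = [0, 3, 5, 12, 25, 30, 8, 19, 20, 40]
    print(coverage_fractions(toy_depths))
```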

De novo SNVs and InDels in coding regions were identified in each sample by using ForestDNM and mirTrios software. To minimize false-positive results, only the DNMs identified by both of these software approaches were considered to be putative DNMs and subjected to further annotation and Sanger sequencing validation. After PCR and Sanger sequencing validation, three de novo SNVs (Table 1) in coding regions of the four trios were confirmed. In addition, we detected two compound heterozygous mutations in one trio by utilizing mirTrios software.
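Requiring agreement between two callers amounts to intersecting their call sets on a shared variant key. A minimal Python sketch is given below; the key layout (chromosome, position, reference allele, alternate allele), the function name, and the toy calls are assumptions for illustration and do not reflect the actual output formats of ForestDNM or mirTrios.

```python
from typing import Iterable, List, Tuple

# Assumed variant key: (chromosome, 1-based position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


def consensus_calls(calls_a: Iterable[VariantKey],
                    calls_b: Iterable[VariantKey]) -> List[VariantKey]:
    """Return the variants reported by both callers, sorted for stable output."""
    return sorted(set(calls_a) & set(calls_b))


if __name__ == "__main__":
    # Toy call sets with placeholder coordinates (not study data).
    caller_1 = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chrX", 300, "G", "A")]
    caller_2 = [("chr2", 200, "C", "T"), ("chrX", 300, "G", "A"), ("chr5", 50, "T", "C")]
    print(consensus_calls(caller_1, caller_2))
```

The intersection trades sensitivity for specificity: a true DNM missed by either caller is dropped, whereas calls supported by only one caller, which are more likely to be artifacts, never reach validation.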

Table 1 Summary of DNMs detected by trios-based WES of EE

Identification of EE-Associated Genes with DNMs

Recently, WES of parent-offspring trios has established that DNMs are responsible for many EE phenotypes, although the size of this contribution is not yet known [2]. For all the DNMs in various genes detected in this study, a statistical assessment for potential causative genes was not performed because of the small sample size. Nevertheless, we assessed pathogenic de novo variants on the basis of the concordance of four functional prediction tools (SIFT, VEST3, LRT, and SiPhy). According to the mutation type and the effect of each mutation on protein function, two de novo mutations in two genes (CDKL5 and DNMT3A) were clearly predicted to be functionally deleterious by all four of these prediction tools (Table 1).

In the proband E3P, the validated de novo variant was a G>A nonsense transition (c.528G>A) in the coding sequence of CDKL5 (MIM 300203), which changed a tryptophan (Trp) at position 176 into a premature termination codon (p.Trp176X) (Fig. S1; Table 1). As previously reported, CDKL5 is expressed primarily in the brain, muscles, and thymus [40] and is known to cause encephalopathy, an X-linked dominantly inherited disorder characterized by early infantile epileptic encephalopathy or atypical Rett syndrome [41].

By Sanger sequencing, we also confirmed that a novel missense mutation consisting of a G to T substitution (c.1755G>T) in DNMT3A (MIM 602769) was present in the proband E2P. This single-nucleotide change produced a nonsynonymous substitution (p.M585I), and similarly, the mutation was also suggested to be a damaging missense variant according to the prediction software tools (Table 1). One other DNM was detected in MAMDC2 in the proband E1P consisting of a nonsynonymous mutation (c.1329C>G) resulting in an isoleucine (Ile) to methionine (Met) substitution at amino acid 443 (p.I443M) in the MAM3 conserved domain. Notably, this mutation was thought to be tolerable or nonconserved on the basis of the predictions of the SIFT, VEST3, and SiPhy software tools (Fig. S1; Table 1).

In addition to the above DNMs, we identified compound heterozygous mutations of KMT2A in the proband E4P (Fig. S1; Table 1). A single-nucleotide substitution (c.4166C>G) inherited from the father was found in KMT2A (MIM 159555), thus resulting in an amino acid alteration (p.S1389C). A second nonsynonymous mutation (c.9382G>A, p.G3128S) was also identified, which was inherited from the mother. The functional prediction revealed that the two variants were individually suggested to be damaging or deleterious by three out of four prediction tools (Table 1).
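The coding-DNA and protein-level descriptions of the variants reported above can be cross-checked with simple codon arithmetic: coding position c maps to codon (c − 1) div 3 + 1 and to position (c − 1) mod 3 + 1 within that codon. The Python sketch below performs this check. The wild-type codons used for the three DNMs are inferred from the standard genetic code (Trp and Met each have a single codon, and ATC is the only Ile codon that a C>G change at the third position converts to Met); they are assumptions of this example rather than values read from the reference transcripts, and the function names are illustrative.

```python
from typing import Tuple

# Only the codons needed for this check (standard genetic code).
CODON_TABLE = {"TGG": "Trp", "TGA": "Ter", "ATG": "Met", "ATT": "Ile", "ATC": "Ile"}


def codon_position(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, position within codon 1..3)."""
    if c_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, ref: str, alt: str) -> str:
    """Apply a single-base substitution to a codon after checking the reference base."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]


if __name__ == "__main__":
    # (gene, coding position, ref, alt, inferred wild-type codon)
    dnms = [
        ("CDKL5", 528, "G", "A", "TGG"),
        ("DNMT3A", 1755, "G", "T", "ATG"),
        ("MAMDC2", 1329, "C", "G", "ATC"),
    ]
    for gene, c_pos, ref, alt, wt_codon in dnms:
        codon_no, pos = codon_position(c_pos)
        mutant = substitute(wt_codon, pos, ref, alt)
        print(gene, codon_no, CODON_TABLE[wt_codon], "->", CODON_TABLE[mutant])
    # The two KMT2A variants: codon number and position only.
    print(codon_position(4166), codon_position(9382))
```

The same arithmetic places c.4166C>G at the second base of codon 1389 and c.9382G>A at the first base of codon 3128, in agreement with p.S1389C and p.G3128S.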

Expression Profile of DNMT3A in the Human Brain

One of the underlying phenotypes of EE is abnormal brain development. To further investigate the potential contribution to EE of the damaging DNM in DNMT3A, we studied the spatiotemporal expression patterns of this gene in the human brain, on the basis of two specialized human brain gene expression databases: BrainSpan and the Human Brain Transcriptome (HBT, http://hbatlas.org/). The expression levels of DNMT3A for 16 different brain regions (cerebellar cortex, mediodorsal nucleus of the thalamus, striatum, amygdala, hippocampus, and 11 areas of the neocortex) and 12–15 different stages from embryonic development to adulthood were obtained from the BrainSpan and HBT databases (Fig. 1). First, it was found that DNMT3A was widely expressed across all the human brain regions as well as all the different developmental stages analyzed. Interestingly, continuously decreasing DNMT3A expression levels were observed from early embryonic stage to late adulthood in all the brain regions analyzed. Notably, a drastic decrease in DNMT3A expression occurred near birth and the expression level was maintained at a stable and low level thereafter. In addition, the spatiotemporal expression profile of DNMT3A in the human brain described above was highly consistent between the two different databases used, thus indicating the reliability of this finding. Overall, the widespread expression of DNMT3A in different human brain regions, especially its markedly higher expression level at early embryonic stages, indicates an important role of DNMT3A in early brain development. Therefore, disruption of this gene may cause diseases related to brain dysfunction, such as EE.

Fig. 1

Expression analysis of DNMT3A in 16 human brain regions. Expression profiles of DNMT3A in BrainSpan (a) and HBT (b). The expression levels of DNMT3A are shown for the developmental stages from embryonic development to late adulthood. A solid line separates the periods into prenatal periods and postnatal periods. CBC cerebellar cortex, MD mediodorsal nucleus of the thalamus, STR striatum, AMY amygdala, HIP hippocampus, OFC orbital prefrontal cortex, DFC dorsolateral prefrontal cortex, VFC ventrolateral prefrontal cortex, MFC medial prefrontal cortex, M1C primary motor cortex, S1C primary somatosensory cortex, IPC posterior inferior parietal cortex, A1C primary auditory cortex, STC posterior superior temporal cortex, ITC inferior temporal cortex, V1C primary visual cortex

Co-expression and Genetic Interaction Network Analyses of DNMT3A

Epilepsy-related genes are often involved in brain development. To further interrogate the relationships of DNMT3A, CDKL5, and KMT2A with other genes associated with epilepsy in the context of human brain development, a co-expression network analysis was performed by utilizing spatially and temporally rich transcriptome data from the BrainSpan project. We observed a clear co-expression relationship between each of the three genes and candidate epilepsy genes obtained from the EpilepsyGene database (Fig. 2a). Among these networks, DNMT3A was co-expressed with 75 epilepsy candidate genes, including 13 predicted high-confidence genes for epilepsy, on the basis of gene prioritization conducted by adopting the annotations of 10 functional prediction tools from the EpilepsyGene database (Fig. 2a). Among the high-confidence genes, CHD2 was reported as a known causal gene for ASD and ID [25, 42, 43], and another three genes (NEDD4L, CASK, and DYRK1A) were considered to be associated with ASD [44]. In addition, KMT2A was co-expressed with 56 epilepsy candidate genes, including 13 high-confidence genes for epilepsy (Fig. 2a). Moreover, we found that 40 epilepsy candidate genes, including 8 high-confidence genes for epilepsy, were shared among the co-expression networks of DNMT3A and KMT2A (Fig. 2a). Notably, the shared high-confidence gene CHD2 was also predicted to be an ID candidate gene [25, 45]. The known causal gene for EE, CDKL5, was co-expressed with nine epilepsy candidate genes, most of which were also high-confidence genes (Fig. 2a).

Fig. 2

Co-expression analysis and specific spatiotemporal patterns of expression in the brain. a Co-expression network analysis of DNMT3A, CDKL5, and KMT2A. Gene co-expression levels were estimated using the Pearson correlation coefficients (r) between each pair of genes. Candidate epilepsy genes were extracted from the EpilepsyGene database. b Enrichment of spatiotemporal gene expression signatures with the top-ranking genes spatially and temporally co-expressed with DNMT3A in the brain at specific developmental stages. P values were calculated according to a rank-based score followed by a permutation test and then converted to Q values for multiple hypothesis testing on the basis of FDR

To determine whether the co-expressed genes are enriched in specific brain regions or developmental stages, we used a permutation test with 100,000 iterations based on gene expression during human brain development to investigate the enrichment of the co-expressed genes across 16 human cortical and subcortical structures during 12 human developmental periods, from early embryonic stages to late adulthood. DNMT3A and its co-expressed genes were significantly enriched (Q < 0.05) in two early developmental stages: the early prenatal (10–12 PCW) and early mid-prenatal (13–15 PCW) stages (Fig. 2b). In contrast, significant enrichment was widely distributed spatially across many brain regions, including the amygdala (AMY), mediodorsal nucleus of the thalamus (MD), striatum (STR), and six areas of the neocortex (the primary auditory cortex (A1C), posterior inferior parietal cortex (IPC), primary motor cortex (M1C), medial prefrontal cortex (MFC), orbital prefrontal cortex (OFC), and primary visual cortex (V1C)) (Fig. 2b). Specifically, the most significant enrichment was observed in the A1C, AMY, MD, and V1C regions during the early mid-prenatal stage (13–15 PCW, Q = 4.57 × 10^−4) (Fig. 2b). These findings also indicate that genetic changes in DNMT3A and the co-expressed epilepsy-associated genes may affect early human brain development.

Furthermore, to gain insight into the functional associations of DNMT3A, CDKL5, and KMT2A with other genes related to epilepsy, we constructed a gene-interaction network model based on the interaction dataset collected from the GeneMANIA database. We observed varying degrees of genetic interaction between DNMT3A and 20 candidate epilepsy genes, including 8 high-confidence genes for epilepsy, obtained from the EpilepsyGene database (Fig. 3). Among the high-confidence genes, GRIN2B was found to be shared with ASD and ID [46], and CNTNAP2 and UBE3A were considered highly likely candidate genes for ASD [44, 46]. Similarly, KMT2A displayed genetic interactions with 42 candidate epilepsy genes, including 17 high-confidence genes. The known causal gene CDKL5 was associated with 40 candidate epilepsy genes, including 19 high-confidence genes. Notably, eight genes had interactions with both DNMT3A and KMT2A, nine genes had interactions with both KMT2A and CDKL5, and five genes had interactions with both DNMT3A and CDKL5 (Fig. 3). In particular, three high-confidence genes for epilepsy, UBE3A, IER3IP1, and CNTNAP2, were present in the genetic interaction networks of all three genes (DNMT3A, CDKL5, and KMT2A). Furthermore, we found that KMT2A and SVIL not only have a genetic interaction but also display a physical interaction between their gene products. In addition, CDKL5 and the high-confidence gene MECP2 exhibit a strong physical interaction (Fig. 3). These results suggest that DNMT3A, CDKL5, and KMT2A may have similar functions with regard to their interactions with several candidate epilepsy genes.

Fig. 3

Genetic interaction network analysis of DNMT3A, CDKL5, and KMT2A. Nodes denote genes, and edges denote interactions between two genes. The thickness of an edge denotes the degree of genetic interaction. Candidate epilepsy genes were extracted from the EpilepsyGene database

Discussion

Advances in WES in recent years have accelerated understanding of the genetic etiology of sporadic EE, in which DNMs have been shown to be dominant contributors to causality [1]. Certain genes implicated in West syndrome have been identified in larger cohorts of trios with affected offspring [10, 14]. In this work, we used WES to search for DNMs in four parent-offspring trios affected by West syndrome in an effort to identify potential disease-causing DNMs on the basis of a bioinformatics analysis. As a result, we successfully identified three coding DNMs in three probands and two compound heterozygous mutations in one proband.

In proband E3P, one stop-gain DNM in CDKL5 was identified and is predicted to truncate the protein at position 176, thus suggesting that the function of the mutant protein would be impaired. Notably, CDKL5 is a well-documented causal gene for EE [12] and other severe brain disorders [47]. In one WES study of ten sporadic EE trios, a de novo stop-gain mutation at position 832 of the protein encoded by CDKL5 was found to be a likely cause in one proband [14]. The truncation of the CDKL5 protein at the upstream position 176 found in our study is expected to cause more severe defects than the truncation at the downstream position 832 found in the referenced study. In another large-scale WES study of 264 EE trios, three DNMs were found in CDKL5, and the significantly high mutation rate observed suggests that CDKL5 is one of the most common disease-causing genes [10].

DNMT3A is located on chromosome 2p23 and encodes a DNA methyltransferase that is essential for establishing methylation during embryogenesis and can methylate unmethylated and hemimethylated DNA, thereby playing important roles in genomic imprinting and X chromosome inactivation [48]. In this study, with the aid of trio-based WES, we found a de novo SNV (c.1755G>T, p.Met585Ile) located in exon 8 of DNMT3A in the proband E2P (Fig. 4a, b). This mutation was not present in any large control population database, including the Exome Aggregation Consortium database, 1000 Genomes Project, dbSNP 147, and the NHLBI Exome Sequencing Project. Similarly, according to a search of the published literature, this mutation has not been previously reported. Additionally, the residue Met585 is evolutionarily conserved among different vertebrate species, and the variant was predicted to be deleterious on the basis of multiple bioinformatic algorithms (Fig. 4c; Table 1). In the analysis of the spatial and temporal expression profiles of DNMT3A in the human brain, high expression levels of DNMT3A at the embryonic stage (4–8 PCW) across multiple brain regions indicated that the gene may play an important role in early brain development. The co-expression network analysis in BrainSpan suggested that DNMT3A is co-expressed with several genes known to be associated with epilepsy in different human brain regions and developmental stages. Furthermore, the genetic interaction network analysis provided additional evidence that DNMT3A can interact with genes known to be associated with epilepsy. These observations suggest that DNMT3A may play an important role in epilepsy.

Fig. 4

DNMs identified in DNMT3A. a The DNM of DNMT3A (c.1755G>T, p.M585I) was confirmed in E2P by using Sanger sequencing. b Schematic representation of the DNMT3A protein and the DNMs of DNMT3A identified in EE, ASD, and ID. Amino acid changes are shown in black for ID, brown for ASD, and red for EE. c The conservation of the variant c.1755G>T, p.M585I among different species

Notably, DNMs have been repeatedly reported in WES studies of probands with different sporadic neuropsychiatric disorders, such as EE, ASD, ID, and schizophrenia (SCZ) [46, 49]. In a recent study of 13 unrelated patients with human overgrowth syndromes, characterized by ID and a distinctive facial appearance, 13 different de novo heterozygous mutations were identified in DNMT3A (Fig. 4b; Table 2) [50]. Seizures were reported in two of these unrelated patients, thus further suggesting an important role of DNMT3A in epilepsy. Additionally, four other de novo nonsynonymous mutations and one de novo frameshift mutation in DNMT3A associated with ASD have been found in five different trios from three different cohorts (Fig. 4b; Table 2) [42, 43, 51]. These results suggest that DNMs of DNMT3A are shared among EE, ASD, and ID. In fact, a shared genetic etiology between EE and other neuropsychiatric disorders, including ASD and ID, has previously been reported [52], and in this study, we report a new gene shared among EE, ASD, and ID. The exact mechanisms underlying the genotype-phenotype correlation for different DNMT3A mutations remain unclear. However, it has been proposed that the nature of the different mutations, alternative neurobiological conditions, genetic background, and other factors, including environmental influence, mutation timing, and epigenetic factors, could affect the genotype-phenotype relationship [2, 25, 50, 53].

Table 2 The shared DNMs of DNMT3A among ID, ASD, and EE

On the basis of the predictions from the UniProt database, DNMT3A contains three functional domains: a PWWP domain, an ADD domain, and a C-terminal SAM-dependent MTase C5-type domain (Fig. 4b). Notably, most of the DNMs in DNMT3A identified in ASD, ID, and EE were located in these functional domains, with the exception of a frameshifting insertion (c.401dupA, p.Asn134fs) in ASD (Fig. 4b; Table 2). These domains have been shown to play an important role in DNMT3A function. In this study, the DNM resulting in Met585Ile occurred in the ADD domain, which inhibits the enzymatic activity of the catalytic domain by blocking the binding of this domain to DNA, according to previous structural and biochemical analyses [54]. The disruption of the ADD domain may affect the recognition of unmethylated lysine 4 of histone H3 and, therefore, the regulation of DNA methylation, thus indicating that the ADD domain of DNMT3A is essential for establishing methylation [54]. In addition, two paralogs of DNMT3A, DNMT3B and DNMT3L, were examined for their possible roles in EE or other neuropsychiatric disorders on the basis of previous studies. DNMT3B has been considered a candidate gene for ID [55]. As for DNMT3L, its interaction with DNMT3A and DNMT3B is well documented [50], which may indicate a role in similar genetic disorders.

In the proband E4P, we identified two compound heterozygous mutations (p.S1389C and p.G3128S) in KMT2A (MLL1), which were individually predicted to be pathogenic by three of the four bioinformatic algorithms. KMT2A encodes a histone methyltransferase that methylates histone H3 lysine 4 (H3K4) and mediates chromatin modifications, playing an essential role in the regulation of gene expression during early development and hematopoiesis [56]. It has been reported that MLL1-deficient mouse neural stem cells located in the subventricular zone could efficiently differentiate into glial lineage cells; hence, KMT2A might be implicated in neurogenesis in the mouse postnatal brain [57]. To date, this gene has not been clearly determined to be associated with any neuropsychiatric disorders. Nevertheless, in one recent study of 2270 trios with ASD, two different DNMs in KMT2A were described in two unrelated ASD probands [43]. Another DNM in KMT2A has also been identified in a proband with ASD by using a WES approach [42]. Moreover, one DNM (c.334A>G; p.M112V) in KMT2A has recently been described in a gene discovery study that conducted WES in 264 trios [10]. Additionally, analogously to the spatiotemporal expression and enrichment patterns for DNMT3A, KMT2A exhibited high expression levels and was co-expressed with epilepsy-associated genes that were also significantly enriched in early brain developmental periods across widespread brain regions (Fig. S2). These observations indicate that KMT2A may be a potential candidate gene for EE.

By performing high-coverage WES and an in-depth bioinformatics analysis, we identified three different de novo coding mutations in the four trios with West syndrome. In addition to the clear link of CDKL5 to EE, the heterozygous DNM of DNMT3A identified in this study suggests an important role of DNMT3A in EE, but elucidation of the molecular mechanism will require further genetic and functional studies. More broadly, the application of WES should aid further understanding of the genetic architecture of EE and facilitate the molecular diagnosis of EE.