Introduction

Autism spectrum disorder (ASD) describes a group of neurodevelopmental diseases with high heritability and is characterized by social communication impairment, repetitive behaviors and restricted interests.1, 2 Genome-wide association studies identified a handful of genomic loci, including 1p13.2 in the Han Chinese population that we previously reported,3 but their findings have continued to fall short in reproducibility.

Progress in whole-exome sequencing (WES)4, 5, 6, 7, 8, 9 and whole-genome sequencing (WGS)10, 11, 12, 13, 14 allowed their preferential employment in thousands of ASD subjects to detect de novo mutations (DNMs) involved in ASD15, 16 and other neuropsychiatric disorders,17, 18 and enabled detection of rare pathogenic variations and risk-genes. In addition, several studies employed protein–protein interaction (PPI) and co-expression networks for ASD-risk genes to investigate biological pathways they involve,8, 19, 20, 21, 22 and found that candidate genes were mainly implicated in chromatin remodeling, Wnt signaling, transcriptional regulation and synaptic functions.8, 16 Moreover, previous studies revealed that ASD-risk genes exhibited dramatic spatiotemporal expression patterns in human brain,19, 20, 23, 24 accelerating the understanding of ASD pathology. However, known ASD risk-genes can explain only a small fraction of the probands.8, 22 Therefore, prioritizing novel and effective ASD risk-genes remains a critical issue.

Owing to the extremely high heterogeneity and complex pathophysiology of ASD,16, 25 investigating gene panels exhibiting similar functional effects in large cohorts will greatly assist in identifying pathogenic variations. Recently, targeted sequencing studies26, 27, 28, 29 identified several high-confidence candidate genes,16, 26, 30 including CHD8 and PTEN, associated with macrocephaly (Mac), and CTNNB1 and DYRK1A, associated with microcephaly (Mic). Abnormal head circumference (HC), which is caused by aberrant head growth and brain development during the prenatal period and early childhood, is known to be highly comorbid with ASD.26, 31, 32 Some Mic-associated genes actually interact with Mac-associated genes.16 For example, the Mac-associated gene CHD8 negatively regulates β-catenin-targeted genes, including the Mic-associated gene CTNNB1, via the WNT pathway, and influences neuronal migration, polarity and synaptogenesis.33 Consistent with this interaction, ASD probands with CHD8 truncating variations have significant Mac, whereas ASD probands with CTNNB1 truncating variations exhibit Mic.16 Currently, more than 100 genes and loci have been found to be associated with Mic–Mac. However, no work to our knowledge has systematically elucidated the link between Mic–Mac-risk genes and ASD.

In this study, we hypothesized that some Mic–Mac-risk genes are involved in ASD and might therefore be important candidate genes to include when searching for mutations in ASD. To test this, we sequenced 294 candidate genes, including 136 Mic–Mac-risk genes and 158 possible ASD-risk genes, in 536 Chinese ASD patients. We then integrated genetic and clinical data from the Simons Simplex Collection (SSC) and Autism Sequencing Consortium (ASC) with the following aims: (1) to prioritize additional ASD-risk genes and ASD-related Mic–Mac-risk (AMM) genes, (2) to characterize the relationship between AMM and ASD genes and (3) to investigate their functional networks and expression patterns in developing human brains.

Materials and methods

Targeted sequencing of subjects with ASD

A total of 536 unrelated ASD trios (probands and their unaffected parents) from the Autism Clinical and Genetic Resources in China (ACGC)28 were recruited. Genomic DNA (2 μg) from each of 504 patients with ASD was used to construct a DNA library prior to targeted sequencing, using Agilent SureSelect target enrichment (Agilent Technologies, Santa Clara, CA, USA). In addition, 8 μg of DNA from 32 ASD trios (96 samples) was used to construct a genomic DNA library for WGS. For all DNA libraries, the Illumina Hiseq2000 sequencing system (Illumina, San Diego, CA, USA) was employed to generate the paired-end raw data (Figure 1). This study was approved by the Institutional Review Board of the State Key Laboratory of Medical Genetics, School of Life Sciences at Central South University, Changsha, Hunan, China. All subjects who participated in this study provided informed consent before the original sample collection.

Figure 1

Workflow of data analysis. Targeted sequencing of 294 genes was employed in 504 probands, and WGS was employed in 32 ASD trios. Rare damaging mutations in the 294 genes from 536 samples were validated by Sanger sequencing. Genetic variations in the 294 genes from publicly available WES/WGS studies (4167 probands plus 1786 controls) along with our Chinese population (536 cases plus 1457 controls) were used to prioritize risk-genes using the TADA model. Then, we employed genotype-phenotype, genetic functional, PPI and gene co-expression analysis to investigate the functional impact of candidate genes. Finally, we developed a functional network that converged on three pathways for AMM risk-genes and characterized their spatiotemporal-expression patterns in brain and laminar-expression profiles in developing neocortex. AMM, ASD-related Mic–Mac-risk; ASD, autism spectrum disorder; Mac, macrocephaly; Mic, microcephaly; PPI, protein–protein interaction; TADA, Transmission and De Novo Association; WES, whole-exome sequencing; WGS, whole-genome sequencing.

We manually collected 105 Mic and 31 Mac-risk genes from PubMed, with each gene reportedly associated with Mic or Mac in at least one study (Supplementary Table 1). In addition, 158 possible ASD-risk genes were chosen according to the following criteria: (1) they harbored damaging DNMs, including single-nucleotide variants (SNVs) and insertions and deletions (indels), or were located inside copy number variations in ASD or other neuropsychiatric disorders; (2) they were associated with ASD based on the SFARI Gene database;34 and (3) damaging variations in them were detected in the 32 Chinese ASD probands by WGS. This study mainly focused on prioritizing additional ASD and AMM risk-genes and on investigating their functional pathways and expression patterns. The targeted sequencing data reported in this paper have been deposited in the Genome Sequence Archive in BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, under accession number PRJCA000393.

Mutation detection and annotation

Burrows-Wheeler Aligner35 was employed to align clean reads to the human reference genome (hg19), Sequence Alignment/Map tools36 was used to remove duplicate reads and generate position-sorted files, and the Genome Analysis Toolkit HaplotypeCaller37 was employed to detect SNVs and small indels. ANNOVAR38 was then employed to annotate mutations as described previously.39, 40 Only loss-of-function (LoF) mutations (stop-loss, stop-gain, splicing-site SNVs and frameshift indels) and deleterious missense mutations (mutations predicted by ⩾9 out of 14 methods to be deleterious) with minor-allele frequencies<0.001 (based on ExAC41) were retained and validated by Sanger sequencing (Figure 1).
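The retention rule described above (LoF classes, or missense variants called deleterious by at least 9 of 14 predictors, each with an ExAC minor-allele frequency below 0.001) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the Variant record, its field names and the example gene labels are hypothetical, and real ANNOVAR output would first need to be parsed into such records.

```python
from dataclasses import dataclass

# Variant classes treated as loss-of-function in the text above.
LOF_CLASSES = {"stop-gain", "stop-loss", "splicing", "frameshift"}

@dataclass
class Variant:
    gene: str               # hypothetical gene label
    effect: str             # e.g. "stop-gain", "missense", "synonymous"
    deleterious_votes: int  # how many of the 14 predictors call it deleterious
    exac_maf: float         # ExAC minor-allele frequency (0.0 if absent)

def is_retained(v, maf_cutoff=0.001, min_votes=9):
    """Keep rare LoF variants and rare missense variants with enough votes."""
    if v.exac_maf >= maf_cutoff:
        return False
    if v.effect in LOF_CLASSES:
        return True
    return v.effect == "missense" and v.deleterious_votes >= min_votes

# Toy input; none of these records come from the study.
variants = [
    Variant("GENE_A", "stop-gain", 0, 0.0),
    Variant("GENE_B", "missense", 10, 0.0002),
    Variant("GENE_C", "missense", 5, 0.0),
    Variant("GENE_D", "frameshift", 0, 0.01),
]
retained = [v.gene for v in variants if is_retained(v)]
```

In this toy input, GENE_C fails the predictor threshold and GENE_D fails the frequency threshold, so only GENE_A and GENE_B are kept.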

Risk-gene prioritization

The Transmission and De Novo Association (TADA) program42 was employed to prioritize risk genes. Damaging variations in the 294 sequenced genes from 1457 controls without neuropsychiatric disorders or related diseases from an in-house Chinese exome database were imported to the TADA program to improve the power of risk-gene prioritization. This in-house Chinese exome database provides background variations in the Chinese population and has been used successfully in several studies.43, 44, 45 The DNMs of 4167 probands and rare inherited mutations (RIMs) of 2377 probands, as well as those of 1786 siblings by WES/WGS, were combined with genetic data from the Chinese cohort to acquire the best possible power for prioritizing ASD-risk genes (Figure 1). Publicly available DNMs were mainly sourced from SSC and ASC and can also be downloaded from the NPdenovo database (www.wzgenomics.cn/NPdenovo),39 and RIMs of individuals in SSC were obtained from a published study.46 The intelligence quotients (IQs) and HCs of ASD patients were sourced from SSC.

Functional networks and spatiotemporal expression patterns

Human PPI networks were obtained from the STRING database (http://string.embl.de/). The spatiotemporal transcriptomes of the human brain across anatomically different regions and developmental periods were downloaded from BrainSpan (http://www.brainspan.org/). We calculated Pearson correlation coefficients between any two genes based on their expression levels in all 524 tissue samples. Gene pairs with |R|⩾0.8 were regarded as being co-expressed and defined as a connection in the human brain, as in a previous study.47 The permutation test was performed similarly to a previous study47 to evaluate the functional connections of candidate genes (Figure 1). In brief, we compared the number of co-expressed genes within the gene set or with known ASD-risk genes, and their connections, with those of 1 000 000 random iterations. The permutation test took gene length and expected gene-level DNM rate48 into consideration.
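The co-expression rule and the permutation comparison described above can be illustrated with a short sketch. It is a simplified illustration under stated assumptions: the expression vectors are invented toy data, the function names are hypothetical, and the random sampling here is uniform, whereas the test described in the text also accounts for gene length and expected gene-level DNM rate.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coexpression_edges(expr, cutoff=0.8):
    """Return gene pairs whose |R| meets the cutoff."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= cutoff
    }

def edges_within(genes, edges):
    """Count edges whose two endpoints both lie in the gene set."""
    s = set(genes)
    return sum(1 for a, b in edges if a in s and b in s)

def permutation_p(candidates, background, edges, n_iter=1000, seed=0):
    """Empirical P: how often a random set is at least as connected."""
    rng = random.Random(seed)
    observed = edges_within(candidates, edges)
    hits = sum(
        edges_within(rng.sample(background, len(candidates)), edges) >= observed
        for _ in range(n_iter)
    )
    return (hits + 1) / (n_iter + 1)

# Toy expression profiles across five samples (not BrainSpan data).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with G1
    "G3": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as G1 rises
    "G4": [3.0, 1.0, 4.0, 1.0, 3.0],  # essentially unrelated to G1
}
edges = coexpression_edges(expr)
p_value = permutation_p(["G1", "G2", "G3"], sorted(expr), edges)
```

Because the cutoff is applied to the absolute value of R, the negatively correlated pairs involving G3 count as connections just as the positively correlated pair G1 and G2 does.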

We then merged the co-expression networks derived from the candidate genes with the PPI networks for the same genes. The connections between any two genes were defined by information from the PPI and/or co-expression networks. AMM genes were regarded as ‘seed genes’ that were directly connected and used to form an interconnected functional network. Known Mic–Mac- or ASD-risk genes based on the SFARI Gene database34 that were connected with at least two AMM genes were added to the above network (Figure 1). Network figures were created using Cytoscape 3.1.0 (http://www.cytoscape.org/). The biological process category of Gene Ontology (GO) was used to report the biological pathways in the functional network. Furthermore, human brain co-expression data from HBT49 (http://hbatlas.org/) and PPI data from InWeb_IM50 (http://www.intomics.com/inbio/map) were used to replicate the functional network. InWeb_IM is a recently released, scored experimental PPI network, with less than one third of its interactions overlapping with STRING.
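The network construction described above can be sketched as a small graph routine. This is an illustrative reading of the text rather than the authors' implementation: the function name and the toy gene labels are hypothetical, and counting links only to seed genes that are themselves connected to another seed is an assumption made for this sketch.

```python
def build_network(seeds, known_genes, ppi_edges, coexp_edges, min_links=2):
    """Union PPI and co-expression edges, keep connected seeds, add linked genes."""
    # An edge exists if either data source supports it; drop self-loops.
    edges = {frozenset(e) for e in list(ppi_edges) + list(coexp_edges)}
    edges = {e for e in edges if len(e) == 2}

    neighbours = {}
    for e in edges:
        a, b = tuple(e)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seeds = set(seeds)
    # Seed genes that are directly connected to at least one other seed gene.
    core = {g for g in seeds if neighbours.get(g, set()) & seeds}
    # Known risk genes linked to at least `min_links` of those seed genes.
    added = {
        g for g in set(known_genes) - seeds
        if len(neighbours.get(g, set()) & core) >= min_links
    }
    nodes = core | added
    return nodes, {e for e in edges if e <= nodes}

# Toy input: S* are seed (AMM-like) genes, K* are known risk genes.
nodes, net_edges = build_network(
    seeds={"S1", "S2", "S3"},
    known_genes={"K1", "K2"},
    ppi_edges=[("S1", "S2"), ("K1", "S1"), ("K2", "S2")],
    coexp_edges=[("K1", "S2"), ("S1", "S2")],
)
```

Here S3 has no connections and is left out, K1 is linked to two seed genes and is added, and K2 is linked to only one and is not.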

Signed hybrid-weighted gene co-expression network analysis (WGCNA)51 was performed across all tissue samples from BrainSpan using the standard method with a power of six to cluster the spatiotemporal-expression patterns and prenatal laminar-expression profiles52 of a given gene set.
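For readers unfamiliar with the signed hybrid option, the soft-thresholding step can be sketched as below: positive correlations are raised to the chosen power (six here) and non-positive correlations are set to zero. This shows only the adjacency transform, with an invented correlation matrix and hypothetical function names; the full WGCNA procedure (topological overlap, clustering, module detection) is not reproduced.

```python
def signed_hybrid_adjacency(cor, power=6):
    """Signed hybrid soft threshold: cor**power if cor > 0, otherwise 0."""
    return cor ** power if cor > 0 else 0.0

def adjacency_matrix(cor_matrix, power=6):
    """Apply the signed hybrid transform to every entry of a correlation matrix."""
    return [[signed_hybrid_adjacency(c, power) for c in row] for row in cor_matrix]

# Toy correlation matrix for three genes (not derived from BrainSpan).
cor_matrix = [
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.2],
    [-0.9, 0.2, 1.0],
]
adj = adjacency_matrix(cor_matrix)
```

A correlation of 0.9 keeps an adjacency of about 0.53, whereas 0.2 shrinks to below 0.0001 and -0.9 is discarded, which is how the power emphasizes strong positive co-expression.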

Results

Targeted sequencing and quality control

Targeted sequencing of 294 genes was employed in a cohort of 504 ASD probands from the ACGC (Figure 1). On average, we obtained 293.39 MB of clean data for each sample, about two-thirds of which were mapped to target regions, resulting in ~120 × sequence depth (Supplementary Table 2). In addition, an average of 97.90% and 95.52% of the target regions were covered by at least 10 and 20 reads, respectively. In another project, we employed WGS (>30 ×) in another 32 ASD trios to detect all classes of mutations in the genome. Here, to maximize the Chinese sample size, we added genetic variations found in the 294 genes from the 32 probands to those from the 504 probands (Figure 1). Together, variation data from 536 Chinese ASD trios were used for further analysis.

De novo mutations and X-linked variations

We detected and verified a total of 496 LoF or deleterious missense mutations with minor-allele frequency<0.001 in the 294 genes (Supplementary Table 3). After confirming the state of these mutations in parents by Sanger sequencing, we detected 22 DNMs, including four nonsense mutations, four frameshift indels and 14 deleterious missense mutations, in 20 genes (Table 1). Based on previously reported expected DNM rates,48 we observed significantly higher DNM burden for the 294 sequenced genes (observed n=22 vs expected n=3.98; odds ratio=5.53; binomial test P=9.23 × 10−11). In our recently developed NPdenovo database,39 nine of the 20 genes also harbored damaging DNMs in published WES/WGS results, including two genes (CHD8 and SCN2A) with recurrent DNMs and POGZ, MECP2 and DYRK1A with LoF DNMs. We also detected a novel frameshift mutation in NRXN2 (c.808dupG, p.A270fs), a known causative gene associated with ASD.
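The observed-to-expected ratio quoted above follows directly from the two counts (22/3.98 ≈ 5.53). The sketch below reproduces that ratio and, as a rough cross-check, computes an upper-tail probability under a Poisson model with mean 3.98. The Poisson model is a substitution made for illustration: the text reports a binomial test whose number of trials is not stated here, so the resulting value need not match the reported P.

```python
import math

def poisson_upper_tail(k, lam, n_terms=200):
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upward."""
    # First term computed in log space to avoid overflow for large k.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + n_terms):
        total += term
        term *= lam / (i + 1)  # ratio of consecutive Poisson probabilities
    return total

observed = 22    # DNMs detected in the 294 genes (from the text)
expected = 3.98  # expected count from published DNM rates (from the text)

ratio = observed / expected
p_poisson = poisson_upper_tail(observed, expected)
```

Summing the tail directly, rather than subtracting the lower tail from one, avoids loss of precision when the probability is very small.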

Table 1 Detected DNMs and hemizygous variations

In our previous study,28 we sequenced 189 ASD-candidate genes using single-molecule molecular-inversion probes (smMIPs), with 37 genes including 11 Mic–Mac-associated genes from 504 samples overlapping with this study. Here, we recaptured the five DNMs identified in the previous study, as well as two additional DNMs that were missed in the smMIP study (Table 1). The p.R282X site in MECP2 was covered by 96 reads here, compared with only two reads in the smMIP study (Supplementary Figure 1). The p.V1777L variant in ANK2 was missed because its inheritance had not been validated in the previous study. In addition, 11 genes harboring DNMs only in Chinese patients were detected in the present study, including RFX3, which carried a frameshift deletion (p.L612fs) and is essential for brain development.53 Furthermore, for the 37 genes overlapping with our previous study,28 an additional 20 DNMs in eight genes were identified in another independent Chinese cohort, including two Mic-associated genes (CDKL5, DYRK1A) and two Mac-associated genes (CHD8, MECP2) (Supplementary Table 4).

Among the 496 damaging variations, eight in seven risk genes (ATRX, CDKL5, FLNA, OFD1, PLXNA3, RAB39B and SLC6A8; Table 1) were inherited X-linked hemizygous variations. PLXNA3, which plays important roles in axon guidance in the developing nervous system,54 harbored one nonsense (c.2497 A>T, p.K833X) and one missense (c.5051C>T, p.S1684L) variation in two unrelated male patients. Meanwhile, a splicing-site DNM of this gene in a Caucasian male proband was detected (c.1928+1G>A) by WES,46 suggesting that PLXNA3 may play important roles in ASD molecular pathology.

To evaluate the functional impact of DNMs and hemizygous variations, we sourced gene-level haploinsufficiency rates55 and residual variation intolerance scores (RVIS)56 from original studies. Compared with all RefSeq genes (background), these genes exhibited significantly higher probabilities of haploinsufficiency (P=0.0098; Supplementary Figure 2a) and lower RVIS (P=0.0021; Supplementary Figure 2b). Specifically, 21 (77.78%) and 22 (81.48%) of 27 genes ranked in the top 50% of haploinsufficiency and RVIS, respectively (Table 1), indicating that they are likely to be intolerant of damaging mutations.

Prioritization of ASD-candidate genes and their functional impact

In addition to DNMs, RIMs play important roles in ASD etiology.46, 57, 58 Integrating datasets from multiple independent studies offers insight into ASD-candidate genes.8, 23, 59, 60 We therefore employed the TADA program42 to prioritize candidate genes. Based on genetic variations in 536 patients with ASD and 1457 Chinese controls, as well as DNMs from 4167 probands, RIMs from 2377 probands and 1786 controls from SSC or ASC (Supplementary Table 5), we prioritized 60 candidate genes (PTADA<0.01; Table 2 and Supplementary Table 6). Compared with random expectations (n=3), there was significant enrichment of genes with P<0.01 (n=60), corresponding to a false-discovery rate of 0.05 (Supplementary Figures 3a–b). In addition, seven genes carrying hemizygous variations were automatically listed as candidates due to their involvement in ASD, intellectual disability (ID) or other X-linked neurodevelopmental disorders. Together, 67 ASD-candidate genes were finally prioritized, and their DNMs and RIMs were detected in 2.76% (130 of 4703) and 15.10% (440 of 2913) of cataloged probands, respectively (Table 2). We also compared the 67 candidate genes to two previous TADA gene sets identified from large cohorts of WES data.8, 22 Stephan et al. identified 65 genes,22 of which 16 were included in our panel, and 15 of these 16 genes were prioritized in the present study. Another study by Silvia et al. identified 108 genes,8 of which 25 were included in our panel, and we prioritized 21 of these 25 genes (Supplementary Figure 4). We consider this to be high reproducibility.

Table 2 Prioritized ASD candidate genes

We found that the 67 candidate genes exhibited significantly higher probabilities of haploinsufficiency (P=0.00025; Supplementary Figure 2a) and lower RVIS (P=7.39 × 10−6; Supplementary Figure 2b). In addition, compared to all ASD patients, probands with DNMs in candidate genes exhibited a significantly lower full-scale IQ (P=0.00023), verbal IQ (P=0.0081) and non-verbal IQ (P=0.00021) (Supplementary Figures 5a–c). Probands with RIMs exhibited moderately lower IQ (P<0.05), consistent with previous findings.46 Furthermore, a permutation test was performed as in our previous study39 to investigate the relationships among the 67 candidates based on co-expression networks in the human brain and PPI data. It revealed significant co-expression (Supplementary Figure 6a–d) and interaction (Supplementary Figures 7a–d) between candidates, as well as with known ASD-risk genes. Since the 294 sequenced genes, including the 136 Mic–Mac genes, interacted and were co-expressed significantly more than expected by chance (P<0.01), we re-analyzed the connectedness of the 67 candidate genes by using the 294 sequenced genes as the background for random sampling. As a result, candidate genes were still co-expressed and interacted more than expected by chance (P<0.05, Supplementary Table 7). To replicate the strong interconnectedness of candidate genes, we sourced brain co-expression data from HBT49 and human PPI data from the recently developed InWeb_IM,50 and found that the candidates interacted and were co-expressed significantly more than expected by chance (Supplementary Figures 8a–d). These results indicated that the 67 candidates are more likely to be associated with ASD than would be expected by chance.

Mic–Mac-candidate genes in ASD

We noted that 21 of the 67 candidate genes were shared between Mic–Mac and ASD (Table 2). These included both previously well-studied genes, such as DYRK1A, CHD8 and PTEN, and several novel genes. In nine unrelated patients, we detected four frameshift indels (p.V531fs, p.L891fs, p.G1937fs and p.Q2594fs) and three damaging missense variations (p.R490C, p.E1132G and p.R1667H) in ASPM, which encodes a centrosomal protein that regulates the Wnt pathway during neurogenesis. In addition, 1 missense DNM (p.C1557R) and 15 RIMs of ASPM were detected in ASD patients from the SSC,46 indicating that the Mic-risk gene ASPM is associated with ASD (PTADA=0.0013).

We found that DNMs and RIMs in the 21 AMM genes were detected in 0.98% (46 of 4703) and 5.32% (155 of 2913) of ASD patients, respectively (Table 2). We calculated the standardized HC (Z-score) of probands from SSC based on age and gender, as in a previous study,26 and found that probands with DNMs or RIMs in ASD-related Mic-risk genes exhibited significantly lower HC, whereas those with DNMs or RIMs in ASD-related Mac-risk genes exhibited significantly higher HC (both P<0.05; Supplementary Figure 9). In addition, we found that Chinese probands harboring DNMs in CHD8 and MECP2 showed Mac and probands with DNMs in CDKL5 and DYRK1A showed Mic, consistent with previous reports.26, 61

Convergent networks of AMM genes

In addition to the 21 prioritized AMM genes, we cataloged 18 additional genes listed as both Mic–Mac- and ASD-risk genes, including DPP6, FOXG1, MBD5, UBE3A and VPS13B, based on the SFARI Gene database (Supplementary Table 8). Together, 39 AMM risk-genes were used as ‘seed genes’ to develop a functional network based on PPI and co-expression connections. Known Mic–Mac- or ASD-risk genes that co-expressed or interconnected with at least two AMM genes were merged into this network, resulting in a functional network comprising 178 genes, converging on chromatin modification (n=51), synaptic transmission (n=84) and the mitotic cell cycle (n=56) (Figure 2 and Supplementary Table 9). The three modules included 12, 17 and 11 AMM risk-genes, respectively.

Figure 2

Co-expression and PPI networks. AMM genes form a biological network involved in chromatin modification, synaptic transmission and the mitotic cell cycle. Only genes implicated in the above three functional subnetworks were shown. Biological process of GO was used to report the biological pathway. AMM genes were interconnected and co-expressed with one another, as well as with known ASD or Mic–Mac-risk genes. All nodes are sized based on the degree of connectivity. AMM, ASD-related Mic–Mac-risk; GO, Gene Ontology; Mac, macrocephaly; Mic, microcephaly; PPI, protein–protein interaction.

We noted that DYRK1A, a Mic causative gene in ASD, was clustered in the subnetwork associated with the mitotic cell cycle, which is involved in brain growth and neurogenesis.16 Variations that disrupt genes involved in mitotic cell cycle progression could lead to dysfunctional mitosis of neural cells and a depleted cell pool in the central nervous system, highlighting the important roles of cell cycle genes in the molecular pathology of ASD and Mic–Mac. In addition, we found that different subnetworks were connected by some hub genes (such as CTNNB1) that regulate multiple functional pathways. To replicate these results for AMM genes in independent data, we constructed a functional network based on data from InWeb_IM and HBT and found that network genes were enriched in nervous system development, synapse organization, chromatin organization, cell cycle and regulation progression (Supplementary Table 10).

Expression patterns involved in brain development

To characterize the spatiotemporal-expression patterns of the 178 genes in convergent networks during brain development, we performed WGCNA in the tissue samples from BrainSpan and identified three independent modules (M1–M3) comprising 146 genes (Figure 3a and Supplementary Table 9). We found that genes within M1 (n=72) were highly expressed in the human brain during the embryonic and early-to-middle fetal periods (8–26 post-conceptual weeks (pcw)), but their expression gradually decreased and bottomed out at 2 years of age. In contrast, the expression levels of M2 genes (n=46) were minimal during the embryonic period and gradually increased during the prenatal period, reaching a stable level after birth. M3 genes (n=28) were highly expressed from 8 pcw to 9 pcw in some subregions, including the frontal cortex, hippocampus and amygdaloid complex, followed by a sharp decrease in expression and stabilization at low levels. We replicated these spatiotemporal-expression patterns using HBT expression data (Supplementary Figure 10). Furthermore, we found that the three modules significantly overlapped with the three subnetworks identified previously (Figure 3b). Specifically, M1, M2 and M3 genes were significantly enriched in genes from subnetworks involving chromatin modification (hypergeometric test, P=4.71 × 10−6), synaptic transmission (P=0.00039) and the mitotic cell cycle (P=2.20 × 10−14), respectively.
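The hypergeometric enrichment test used for the module and subnetwork overlaps can be sketched with the standard library alone. The numbers in the example are a toy case chosen so the result is easy to verify by hand; they are not the overlap counts from Figure 3b, and the function name is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes in a module.

    N: genes in the background (e.g. all network genes)
    K: background genes belonging to the subnetwork of interest
    n: genes in the expression module
    k: module genes that also belong to the subnetwork
    """
    top = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / comb(N, n)

# Toy case: 10 genes, 4 in the subnetwork, module of 3 genes, 2 overlapping.
# By hand: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3.
p_overlap = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
```

The upper tail is used because enrichment asks how likely an overlap at least this large would be if module membership were unrelated to subnetwork membership.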

Figure 3

Expression patterns in the human brain. (a) The three spatiotemporal expression patterns (M1, M2 or M3) for network genes based on RNAseq data from BrainSpan, corresponding to 17 developmental stages across 16 subregions. (b) Genes in different functional subnetworks overlapped with M1, M2 or M3, with the number of genes and percentages indicated. Biological process of GO was used to report the biological pathway. (c) Characterization of neocortical expression profiles (Ma, Mb and Mc) for network genes. Four subregions of the developing neocortex, delineating nine layers per subregion, were analyzed. (d) For each functional subnetwork, the number of genes and their percentage of overlap with each module are shown. Expression levels are normalized according to the log2-fold of the average expression level of each gene. *P<0.05; **P<0.01; ***P<0.001. A1C, primary auditory cortex; AMY, amygdaloid complex; CPo, outer cortical plate; CPi, inner cortical plate; DFC, dorsolateral prefrontal cortex; GO, Gene Ontology; HIP, hippocampus; IPC, posteroinferior parietal cortex; ITC, inferolateral temporal cortex; M1C, primary motor cortex; MD, mediodorsal nucleus of thalamus; MFC, anterior cingulate cortex; MZ, marginal zone; OFC, orbital frontal cortex; STC, posterior superior temporal cortex; SG, subpial granular zone; SP, subplate zone; IZ, intermediate zone; STR, striatum; SZo, outer subventricular zone; SZi, inner subventricular zone; S1C, primary somatosensory cortex; V1C, primary visual cortex; VFC, ventrolateral prefrontal cortex; VZ, ventricular zone.

Since the expression levels of candidate genes fluctuated dramatically in the human cortex during the fetal period, as reported above, we then characterized prenatal neocortical-expression profiles based on laser-microdissection data from 526 prenatal neocortical samples52 (15–21 pcw). Interestingly, we identified three co-expression modules (Ma, Mb and Mc) having distinct laminar signatures (Figure 3c and Supplementary Table 9). The expression of genes within Ma (n=67) was high in the middle layers but lower from the marginal to the intermediate zone, whereas that of Mb (n=38) displayed the opposite trend, being lower in the middle layers and high in the deep cortical layers and from the outer subventricular to the ventricular zone. Moreover, we found that Mb significantly overlapped with cell cycle genes (P=8.82 × 10−9), indicating that intricate regulation of the mitotic cell cycle plays important roles in the developing cortex (Figure 3d).

Discussion

Recent large-scale WES/WGS studies illuminated the essential roles of DNMs and RIMs in ASD and accelerated the understanding of the neurobiological and genetic basis of ASD.4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 Here, we detected several pathogenic DNMs, including recurrences in known strong-risk genes, in ASD patients from a Chinese cohort. Generally, DNMs have strong effects on the function of genes, especially for genes with high probabilities of haploinsufficiency and genic intolerance. We combined genetic data from Chinese and Caucasian populations and prioritized 67 ASD-risk genes. Probands with DNMs or RIMs involving risk genes exhibited significantly lower IQ, demonstrating the functional effects of these genes. Although the pathogenesis and mechanisms of only a small fraction of ASD risk-genes have been validated by in vitro or in vivo experiments, such as ANK3,62 CTNND2,63 DYRK1A,34 GLRA2,64 PTEN,65, 66 PRICKLE2,67 SHANK3,68 SLC6A369 and TRPC6,70 our study provides a meaningful reference for further studies of ASD-risk genes. The fact that DNMs in these candidate genes accounted for only a small fraction of ASD patients suggests that more resequencing studies are needed to identify additional strong candidates and decipher genetic mechanisms. Of interest, we also prioritized several risk-genes in which DNMs were detected only in Chinese cohorts, and these genes warrant further study.

Although ASD is highly heterogeneous both clinically and genetically, some ASD candidate genes with DNMs can in fact define clinical subtypes of ASD with characteristic phenotypes.71 For example, individuals with DNMs in CHD8 were characterized by ID, macrocephaly, distinct faces and gastrointestinal complaints,72 while DNMs in DYRK1A are associated with ASD, ID, Mic, febrile seizures in infancy, impaired speech, hypertonia and a specific facial gestalt.30 Investigating the relationship between phenotypes and genotypes not only deepens the understanding of ASD etiology but also assists in clinical diagnosis and intervention. In the present study, we prioritized 39 genes associated with both Mic–Mac and ASD. Patients having DNMs or RIMs in these genes exhibited abnormal HC, highlighting the shared genetic mechanisms underlying ASD and Mic–Mac. Systematic discovery of genetic variants associated with both Mic–Mac and ASD may be helpful in testing the pathogenic effects of AMM genes and in determining certain phenotypes.

We also performed PPI and co-expression analyses to investigate the related functional pathways, with results converging on chromatin modification, synaptic transmission and the mitotic cell cycle. Previous studies demonstrated that chromatin and synaptic genes play important roles in ASD pathophysiology,8, 22 and our study highlighted cell cycle genes involved in ASD and Mic–Mac. Of interest, the cell cycle and the Wnt-signaling pathway exhibit complex interactions, and both are involved in the regulation of cell fate, growth and proliferation.73 DNMs in the β-catenin/Wnt-signaling network in ASD patients have been observed to be associated with both Mac and Mic.71 Consistent with this, we found that 14 AMM genes (ASPM, CHD8, CDKL5, CTNNB1, DYRK1A, FOXG1, HEPACAM, OFD1, PAX6, PTEN, MECP2, MEF2C, NIPBL and TCF4) were involved in the Wnt pathway. Of note, a recent study demonstrated that CHD8 regulates cortical development by stimulating transcription of cell cycle genes,74 providing new insight into the roles of the cell cycle and Wnt signaling in ASD and Mic–Mac.

We characterized spatiotemporal-expression patterns of genes in networks and correlated these patterns to functional pathways. The expression profiles of genes in the M1 and M2 clusters were similar to previous reports related to ASD.23 However, we identified a novel pattern in M3 genes, which were highly expressed during the early fetal period and significantly overlapped with the cell cycle subnetwork. Characterization of their cerebral cortex laminar features revealed that network genes were highly expressed in different cortical layers, corresponding to the three modules. As shown in module Mb, the cell cycle genes were highly expressed in deep cortical layers, highlighting their essential roles during brain development. Genes involved in similar molecular functions usually exhibited similar expression patterns in the human brain, providing new insights into ASD etiology.

In summary, through sequencing of Mic- and Mac-related genes and previously known ASD candidate genes in Chinese ASD cohorts, we identified new DNMs and RIMs in these genes. We further prioritized novel risk genes and identified their associated brain expression features and biological networks. Our results shed light on the etiologies of ASD and Mic–Mac and facilitate the discovery of information necessary for accurate genetic counseling and testing.