Main

ASD is a childhood-onset neurodevelopmental disorder that is characterized by cognitive, motor and sensory deficits1. ASD has a strong genetic component, with risk contribution from hundreds of genes2,3,4,5,6. Furthermore, the same mutation can result in varied clinical manifestations, probably reflecting a modulatory effect of the overall genetic and epigenetic background7,8. The shared developmental effects that cause this large and heterogeneous collection of genes to converge on the phenotypic features of ASD remain poorly understood.

Here we used reproducible organoid models of the developing human cerebral cortex9 to investigate the roles of three ASD risk genes across multiple human stem cell lines. SUV420H1, ARID1B and CHD8 have emerged repeatedly as top hits in studies of ASD genetic risk6,10,11,12,13. All three of these genes are associated with severe neurodevelopmental abnormalities, including high frequencies of macrocephaly11,14,15,16. We show that mutations in these genes converge on asynchronous development of shared neuronal classes, rather than on shared molecular mechanisms. The degree of expressivity varies depending on the risk gene and phenotype, highlighting the nuanced interactions between the genetic variants and the genomic contexts that produce the phenotypic manifestation of ASD.

Organoids as models of ASD risk genes

To investigate whether mutations in different ASD risk genes converge on shared phenotypes, we generated cortical organoids9 from different human induced pluripotent stem (iPS) cell lines (Methods) and profiled individual organoids using scRNA-seq at three stages: 1 month, when organoids contain mostly progenitors and neurogenesis is beginning; 3 months, when progenitor diversity increases and multiple subtypes of cortical excitatory neurons emerge; and 6 months, when interneurons and astroglia are present. We first verified that these organoids initiate appropriate neurodevelopment and express known ASD risk genes6 (Supplementary Notes and Extended Data Figs. 1 and 2).

We next selected three ASD risk genes, SUV420H1, ARID1B and CHD8 (Supplementary Notes), and engineered heterozygous protein-truncating indel mutations in multiple parental lines, targeting protein domains that are mutated in patients (Methods, Supplementary Table 1 and Extended Data Fig. 3a–c). Notably, for all genes, the different parental lines showed substantial differences in endogenous expression of the risk proteins, consistent with documented interindividual variability7,8, which in turn influenced the absolute amount of protein remaining in the heterozygote (Supplementary Notes and Extended Data Fig. 3d–f). These differences underscore the importance of investigating risk genes across multiple genomic contexts.

As all three genes are linked to macrocephaly and/or microcephaly in patients, we quantified organoid size in each background, at 2 weeks and 1 month (Supplementary Table 2). Mutant lines showed size defects resembling the abnormalities seen in patients, with varying severity between different genomic contexts (Supplementary Notes and Extended Data Figs. 3g–j and 4a). These data indicate that organoids can capture clinically relevant features of ASD pathology.

Asynchronous development in SUV420H1-mutant organoids

We profiled early stages of SUV420H1+/− and control Mito294 (30,733 cells, 35 days in vitro (d.i.v.)), PGP1 (37,510 cells, 35 d.i.v.) and Mito210 (two independent differentiations: 57,941 cells, 28 d.i.v.; and 33,313 cells, 35 d.i.v.) organoids using scRNA-seq. Strikingly, mutants showed a consistent presence of GABAergic neurons in all backgrounds (Fig. 1a–c and Extended Data Fig. 4b–e), although these neurons are rare or absent in controls until approximately 3.5 months9. The GABAergic population in mutant organoids at 1 month expressed broad markers of GABAergic identity (hereafter, GABAergic neurons).

Fig. 1: The SUV420H1+/− genotype induces the asynchronous generation of GABAergic neurons and deep-layer projection neurons, and changes in circuit activity.

a–c, Combined t-distributed stochastic neighbor embedding (t-SNE) analysis of all organoids (top left, with total cells per dataset) and the percentage of the indicated cell types per organoid (top right), colour-coded by cell type, for Mito294 (a; 35 d.i.v.), PGP1 (b; 35 d.i.v.) and Mito210 batch I (c; 28 d.i.v.) organoids. Adjusted P values were determined using logistic mixed models, comparing the difference in cell-type proportions between genotypes (Methods). Bottom, t-SNE analysis of individual organoids; cell types of interest are coloured. d, e, Pseudotime uniform manifold approximation and projection (UMAP) analysis of the Mito210 organoids (batch II, 35 d.i.v.), colour-coded by cell type (d), pseudotime (early, blue; late, yellow; e, left) or genotype (e, right). Insets: cells indicated by the dotted boxes, separated by genotype. f, Neuronal maturation and synapse formation module of highly correlated genes in Mito210 batch II cells from d and e, showing a UMAP plot of module scores (left) and the score distribution across the genotypes (right). The horizontal bars show the median scores, and the dots show the average score per organoid. Adjusted P values were determined using linear mixed models, comparing differences between the control and mutant organoids (Methods). g–i, Calcium imaging of neuronal activity in intact PGP1 organoids (128 d.i.v.). g, Left, representative organoid transduced with SomaGCaMP6f2. Scale bar, 100 μm. Insets: high-magnification image of individual cells (1–3). Scale bar, 10 μm. Right, spontaneous calcium signal for each example cell as ΔF/F (top) and a pseudocolour heat map (bottom). Scale bars, 10% (vertical), 30 s (horizontal). h, Representative heat maps of calcium signal for each condition (left). Right, spontaneous network burst frequency. The dots show the average values per organoid and the bars show the mean across all organoids. i, The population-averaged calcium transients (top left) and heat map for individual cells (bottom left). Scale bars, 2% (vertical), 5 s (horizontal). Right, spontaneous network burst duration. The dots show the average values per organoid and the bars show the mean across all organoids. aRG, apical radial glia; DL, deep layer; GABA, GABAergic; IPC, intermediate progenitor cells; N, neurons; NP, neuron progenitors; PN, projection neurons; SUV, SUV420H1+/−.

Despite the consistency of this phenotype across lines and differentiations, there were noticeable differences in phenotypic severity (expressivity) across genomic contexts. Specifically, the Mito294 SUV420H1 line showed the most substantial increase in GABAergic neurons, with over 50% of the cells in all of the mutant organoids belonging to the GABAergic lineage, and <5% belonging to the excitatory projection neuron lineage (n = 3 organoids per genotype, adjusted P = 0.002, logistic mixed models; Fig. 1a and Extended Data Fig. 4b, c). PGP1 SUV420H1 organoids showed intermediate severity, with up to 35% of cells in the mutants belonging to the GABAergic lineage (n = 2–3 organoids per genotype, adjusted P = 0.004, logistic mixed models; Fig. 1b and Extended Data Fig. 4d). Finally, Mito210 SUV420H1 organoids showed the mildest phenotype, with no more than 5% of cells in the mutants belonging to the GABAergic lineage in one batch (28 d.i.v., n = 3 organoids per genotype, adjusted P = 0.017, logistic mixed models; Fig. 1c and Extended Data Fig. 4e) and no GABAergic neurons in a second differentiation batch at 35 d.i.v. (Extended Data Fig. 4f). This suggests that, although these cell lines converge on the same phenotype of premature generation of GABAergic neurons, the genetic and epigenetic context of each cell line modulates phenotypic expressivity.
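The per-organoid percentages quoted above are descriptive quantities that can be derived from cell-type labels alone. The following is a minimal sketch of that bookkeeping, assuming each cell is recorded as an (organoid, cell type) pair; the function name, the organoid identifiers and the toy labels are hypothetical and are not the study's data. It does not reproduce the logistic mixed models that were used for significance testing.

```python
from collections import Counter, defaultdict

def celltype_fractions(cells):
    """Return {organoid: {cell_type: fraction}} from (organoid, cell_type) pairs."""
    counts = defaultdict(Counter)
    for organoid, cell_type in cells:
        counts[organoid][cell_type] += 1
    return {
        organoid: {ct: n / sum(c.values()) for ct, n in c.items()}
        for organoid, c in counts.items()
    }

# Toy labels (hypothetical): one mutant and one control organoid, 10 cells each.
cells = (
    [("mut_1", "GABA N")] * 6 + [("mut_1", "aRG")] * 4
    + [("ctrl_1", "aRG")] * 9 + [("ctrl_1", "GABA N")] * 1
)
fractions = celltype_fractions(cells)
print(fractions["mut_1"]["GABA N"])  # 0.6
```

Computing fractions per organoid, rather than pooling cells across organoids, keeps the organoid as the unit of replication, in line with the n values reported per genotype.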

We next investigated whether the increase in GABAergic neurons persisted at later stages. We profiled organoids from the two lines that showed the greatest difference in phenotypic severity (Mito294 and Mito210) at 3 months in vitro. At 3 months and beyond, GABAergic populations expressed clear molecular features of cortical interneurons (therefore indicated as GABAergic interneurons). The Mito294 SUV420H1+/− organoids still showed a disproportionately large GABAergic population (32,276 cells, n = 3 single organoids per genotype; Extended Data Fig. 5a). However, two distinct batches of Mito210 SUV420H1 organoids showed no GABAergic interneurons in the mutant or control (Extended Data Fig. 5b, c). This indicates that, depending on its expressivity, the GABAergic phenotype can resolve over development (Mito210) or persist (Mito294).

We next sought to examine the changes in other cell types. Owing to the pronounced overgrowth of the GABAergic lineage in the Mito294 SUV420H1+/− organoids, most other cell types had reduced proportions (Fig. 1a and Extended Data Figs. 4c and 5a). However, in 1 month Mito210 SUV420H1+/− organoids, the milder GABAergic phenotype enabled us to detect an increase in immature deep-layer projection neurons, the first-born neurons of the cortical plate17,18, in two differentiation batches (batch I, 28 d.i.v., adjusted P = 0.027; batch II, 35 d.i.v., adjusted P = 0.001; logistic mixed models, n = 3 single organoids per genotype; Fig. 1c and Extended Data Fig. 4e, f). Earlier cell types of the deep-layer projection neuron lineage (intermediate progenitor cells and early-postmitotic newborn deep-layer projection neurons) were also increased (Extended Data Fig. 4f). Similar to the transient GABAergic phenotype in this line, the deep-layer projection neuron phenotype was rescued after 3 months in vitro (two batches, 92 and 90 d.i.v.; Extended Data Fig. 5b, c).

In the PGP1 background, although the GABAergic phenotype was consistently observed, we did not observe an increase in the number of deep-layer projection neurons at 1 month (35 d.i.v.; Fig. 1b and Extended Data Fig. 4d). However, genes that were upregulated in the deep-layer projection neuron lineage in mutants from both the PGP1 and Mito210 lines were enriched in gene ontology (GO) terms related to neuronal differentiation and maturation (Methods, Supplementary Notes, Extended Data Fig. 4g and Supplementary Table 3), indicating a molecular profile that is consistent with more advanced neuronal maturation in both backgrounds. Interestingly, although the Mito210 line showed a more severe phenotype for the deep-layer projection neurons compared with the PGP1 line, the PGP1 line showed a more severe phenotype for the GABAergic neurons, indicating that different features of the mutant phenotype can be differentially modulated by the same genomic context.

Accelerated maturation of neuron classes

We next examined the developmental dynamics of the affected cell types within a specific lineage. Owing to the low numbers of GABAergic neurons in the control lines at 1 month, we focused on the deep-layer projection neuron phenotype. We calculated pseudotime trajectories, and identified the portion of the trajectory corresponding to the development of the affected cell types (the partition of interest; Methods and Extended Data Fig. 5d). The deep-layer projection neuron lineage in the combined 35 d.i.v. Mito210 SUV420H1+/− and control organoids (batch II) showed an increased distribution of mutant cells towards the end point of the trajectory (P < 2.2 × 10⁻¹⁶, one-sided Kolmogorov–Smirnov test; Fig. 1d, e), supporting accelerated development of these neurons in the mutants. Co-expression analysis using WGCNA19 (Methods, Supplementary Table 4 and Extended Data Fig. 5e) identified a module containing multiple genes associated with neuronal maturation and synapse formation that was positively correlated with pseudotime progression (Fig. 1f; Pearson correlation r = 0.94, P < 2.2 × 10⁻¹⁶), and was significantly upregulated in mutant organoids (adjusted P = 0.0017, linear mixed models; Fig. 1f). These results support an accelerated differentiation phenotype in deep-layer projection neurons induced by SUV420H1+/−.
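To illustrate what a one-sided Kolmogorov–Smirnov comparison of pseudotime distributions measures, the sketch below computes the largest amount by which the control empirical cumulative distribution exceeds the mutant one. It is a hedged illustration only: the function name and toy pseudotime values are hypothetical, the P value uses the standard large-sample approximation exp(−2D²nm/(n+m)) rather than an exact calculation, and the software used in the study may differ.

```python
import math
from bisect import bisect_right

def ks_one_sided(control, mutant):
    """Return (D+, approximate P) where D+ = max_x [F_control(x) - F_mutant(x)].

    A large D+ means mutant values are shifted towards later pseudotime.
    """
    c, m = sorted(control), sorted(mutant)
    n_c, n_m = len(c), len(m)
    d_plus = 0.0
    for x in c + m:  # the ECDF difference only changes at observed values
        f_c = bisect_right(c, x) / n_c
        f_m = bisect_right(m, x) / n_m
        d_plus = max(d_plus, f_c - f_m)
    # Large-sample approximation for the one-sided two-sample statistic.
    p = math.exp(-2.0 * d_plus ** 2 * n_c * n_m / (n_c + n_m))
    return d_plus, p

# Toy pseudotime values (hypothetical): mutant cells sit later on the trajectory.
control = [0.1, 0.2, 0.3, 0.4, 0.5]
mutant = [0.6, 0.7, 0.8, 0.9, 1.0]
d_plus, p_value = ks_one_sided(control, mutant)
print(d_plus)  # 1.0
```

Swapping the two arguments tests the opposite direction, which is the relevant one when mutant cells are depleted towards the end point of the trajectory.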

We next examined the mechanisms for the premature expression of maturation-associated genes in this mutant. As SUV420H1 is a histone-lysine N-methyltransferase20, we examined changes in chromatin accessibility. We performed a single-cell assay for transposase-accessible chromatin with high-throughput sequencing (scATAC-seq) on Mito210 SUV420H1 organoids at 1 and 3 months (31 d.i.v., 84,696 nuclei; 93 d.i.v., 23,669 nuclei; n = 3 single organoids per genotype and timepoint). Co-embedding scATAC-seq and scRNA-seq data showed that chromatin accessibility captures most of the cell types identified by gene expression (Extended Data Fig. 6a).

At 1 month, most of the significant differentially accessible regions (DARs) between the mutant and control overlapped across cell types (Supplementary Table 5). We therefore combined all cells, and identified 414 DARs (Methods). The genes that were nearest (within 10 kb) to DARs with increased accessibility in mutant organoids were enriched for GO terms associated with synaptic transmission and neuronal maturation, whereas the genes that were nearest to DARs with reduced accessibility were enriched for negative regulation of neuronal maturation and connectivity (Extended Data Fig. 6b–d and Supplementary Table 5), consistent with the phenotypes observed in the scRNA-seq analysis.
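The assignment of each DAR to its nearest gene within 10 kb can be sketched as a simple interval search. This is an assumption-laden illustration: it measures distance to the gene body (0 when the region overlaps the gene), whereas the study's pipeline may measure distance to the transcription start site; the function name, gene names and coordinates are hypothetical.

```python
def nearest_gene(region, genes, max_dist=10_000):
    """Return the closest gene on the same chromosome within max_dist, or None.

    region: (chrom, start, end); genes: list of (name, chrom, start, end).
    Distance is 0 when the intervals overlap.
    """
    chrom, r_start, r_end = region
    best_name, best_dist = None, None
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        dist = max(0, g_start - r_end, r_start - g_end)
        if dist <= max_dist and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name

# Toy annotation (hypothetical names and coordinates).
genes = [
    ("GENE_A", "chr1", 1_000, 5_000),
    ("GENE_B", "chr1", 30_000, 40_000),
    ("GENE_C", "chr2", 1_000, 2_000),
]
print(nearest_gene(("chr1", 6_000, 6_500), genes))    # GENE_A
print(nearest_gene(("chr1", 17_000, 18_000), genes))  # None
```

Regions with no gene inside the window return None and would simply drop out of any downstream GO enrichment.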

At a later developmental stage (93 d.i.v.), we detected only 43 significant DARs (adjusted P < 0.1) across all cells (Supplementary Table 5). However, regions that were more accessible in the mutant were enriched for GO terms linked to synaptic function (Extended Data Fig. 6b), suggesting that differential regulation of genes that are important to neuronal maturation and function remains.

Regions with increased accessibility in the mutant were enriched for transcription factor (TF)-binding sites for regulators of neurogenesis and patterning of the developing nervous system, including multiple genes involved in the development of the GABAergic lineage (Methods, Supplementary Table 5 and Extended Data Fig. 6e).

Our results show that, in SUV420H1+/− organoids, both GABAergic and deep-layer projection neurons exhibit accelerated development, and that the genomic context can differentially modulate phenotypic abnormalities affecting distinct cell types.

Reduced spontaneous circuit activity

The early developmental abnormalities in GABAergic and deep-layer projection neurons, along with the changes in expression and accessibility of synapse-associated genes, prompted us to investigate circuit activity.

We analysed spontaneous neuronal activity in a line with an intermediate phenotype (PGP1 SUV420H1; Fig. 1b), using adeno-associated viruses driving GCaMP (Methods) to record intracellular calcium dynamics in intact 4 month organoids (128 d.i.v.; Fig. 1g, Extended Data Fig. 7a and Supplementary Video 1). The predominant form of activity was a tetrodotoxin (TTX)-sensitive calcium signal (n = 10/10 organoids; Extended Data Fig. 7b), of which the large amplitude, slow kinetics and multipeak structure suggested that it was mediated by trains of action potentials (Extended Data Fig. 7c); this result was confirmed by extracellular single-unit recordings using a multielectrode array (MEA) (Extended Data Fig. 7d). These large calcium spikes occurred in periodic, synchronized bursts across most neurons (Extended Data Fig. 7e, f), resembling early network activity observed in the developing brain21,22. Network bursting was abolished after bath application of NBQX, an antagonist of non-NMDA glutamate receptors (Methods and Extended Data Fig. 7g), suggesting that coordinated network activity was driven by excitatory synaptic transmission.

Notably, after blockade of excitatory synapses with NBQX, only controls displayed calcium transients (Extended Data Fig. 7g, h), indicating that the control cells were more excitable, and probably more immature compared with the mutants. This is consistent with the accelerated molecular differentiation observed in SUV420H1+/− organoids.

Mutants showed a relative reduction in both frequency (P = 0.032, t-test; Fig. 1h) and duration (P = 0.026, t-test; Fig. 1i) of network bursts (Extended Data Fig. 7i, j), indicating that SUV420H1+/− organoids have reduced spontaneous activity, consistent with mouse models23. These data suggest that, beyond the molecular and cellular changes observed in mutant organoids, the SUV420H1+/− genotype can also induce long-term abnormalities in circuit activity.

ARID1B and SUV420H1 share target populations

We next investigated whether changes in the production of neuronal classes were a shared feature of ASD risk genes. We profiled individual Mito210 ARID1B+/− and control organoids from two independent differentiations at 1 month (35 d.i.v.) using scRNA-seq (batch I: 43,556 cells; batch II: 35,000 cells; Fig. 2a, b and Extended Data Fig. 8a–c).

Fig. 2: The ARID1B+/− genotype induces asynchronous generation of GABAergic neurons and deep-layer projection neurons.

a, b, Combined t-SNE analysis of all organoids (top left), the percentage of the indicated cell types per organoid (top right) and t-SNE analysis for individual organoids (bottom), as in Fig. 1a–c, for Mito210 batch I (a; 35 d.i.v.) and Mito210 batch II (b; 35 d.i.v.) organoids. c, d, Pseudotime UMAP analysis of Mito210 batch II (35 d.i.v.) organoids as in Fig. 1d, e, colour-coded by cell type (c) and pseudotime (d). e, Cell division and proliferation module of highly correlated genes by co-expression network analysis of the Mito210 batch II cells in c and d, as in Fig. 1f.

Although the controls had few or no GABAergic lineage cells at this age, Mito210 ARID1B+/− organoids showed a consistent increase in the proportions of GABAergic neurons and their progenitors (batch I and II, respectively; GABAergic neurons: adjusted P = 0.0057, P = 0.0076; GABAergic neuron progenitors: adjusted P = 0.0004, P = 0.0128; cycling GABAergic neuron progenitors: adjusted P = 0.0004, P = 0.0001; logistic mixed models, n = 3 single organoids per genotype; Fig. 2a, b and Extended Data Fig. 8b, c). In the first batch, GABAergic neurons constituted up to 50% of the profiled cells, making it difficult to draw conclusions about other cell types (Fig. 2a and Extended Data Fig. 8b). In the second batch, the GABAergic phenotype was less severe, enabling us to detect a significant reduction in newborn deep-layer projection neurons (adjusted P = 0.001, logistic mixed models; Fig. 2b and Extended Data Fig. 8c). Notably, these are the same two neuron populations that are affected in SUV420H1+/− organoids. Although the ARID1B+/− genotype had an opposite effect on the deep-layer projection neuron lineage, the data show convergence of two previously unrelated risk genes in the classes of cells that they affect. This phenotype of increased GABAergic populations was still detectable at a later developmental stage, 3 months, although it was less severe (Extended Data Fig. 8d–f).

To test the effect of genomic context, we generated control and ARID1B+/− organoids in the Mito294 background, and profiled 50,081 cells at 35 d.i.v. (n = 3 per genotype). Consistent with the Mito210 ARID1B phenotype, there was a decreased number of newborn deep-layer projection neurons in the mutant (P = 0.025, logistic mixed models; Extended Data Fig. 8g). However, there was no significant increase in the GABAergic population in this background (P = 0.24, logistic mixed models; Extended Data Fig. 8g). This line, Mito294, had the most pronounced increase in GABAergic neurons in SUV420H1+/− organoids, showing that the genomic context modifies the expressivity of each mutation differently.

Pseudotime analysis (Extended Data Fig. 8h) showed a decreased distribution of cells towards the end point of the trajectory, progressing from progenitors to deep-layer projection neurons in Mito210 ARID1B+/− organoids at 1 month (35 d.i.v., batch II; P < 2.2 × 10⁻¹⁶, one-sided Kolmogorov–Smirnov test; Fig. 2c, d). Gene module analysis (Extended Data Fig. 8i) identified a module containing multiple DNA-replication and cell-cycle genes (Supplementary Table 3) that was enriched in progenitor cells and was significantly upregulated in mutants (adjusted P = 0.018, linear mixed models; Fig. 2e). These data indicate delayed differentiation of deep-layer projection neurons in ARID1B+/− organoids.

In sum, similarly to SUV420H1+/− organoids, ARID1B+/− organoids exhibit both a phenotype of premature expansion of the GABAergic neuron lineage, and asynchronous development of deep-layer projection neurons. Notably, as in SUV420H1+/− organoids, these two phenotypes were differentially modulated in distinct parental lines in the ARID1B+/− organoids.

CHD8+/− genotype promotes interneuron development

To further examine the hypothesis of convergent phenotypes among ASD risk genes, we profiled HUES66 CHD8+/− and control organoids at 3.5 months (109 d.i.v., 67,024 cells, n = 3 single organoids per genotype; Fig. 3a and Extended Data Fig. 9a). Mutants had an increased number of GABAergic interneurons and their progenitors (GABAergic interneurons: adjusted P = 0.079; cycling GABAergic interneuron progenitors: adjusted P = 0.031; GABAergic interneuron progenitors: adjusted P = 0.0012, logistic mixed models; Fig. 3a and Extended Data Fig. 9a). A second independent batch of HUES66 CHD8 organoids showed an even more substantial increase (n = 2–3 single organoids per genotype; cycling GABAergic interneuron progenitors: adjusted P = 7.2 × 10⁻⁵; GABAergic interneuron progenitors: adjusted P = 1.8 × 10⁻⁵; GABAergic interneurons: adjusted P = 8.9 × 10⁻⁶, logistic mixed models; Fig. 3b and Extended Data Fig. 9b–d). Notably, at 6 months (190 d.i.v., 39,285 cells, n = 3 individual organoids per genotype), the GABAergic interneuron phenotype was still present (adjusted P = 0.002, logistic mixed models; Extended Data Fig. 9d–f).

Fig. 3: The CHD8+/− genotype leads to asynchronous generation of GABAergic interneurons.

a, b, Combined t-SNE analysis of all organoids (top left), the percentage of the indicated cell types for each organoid (top right) and t-SNE analyses for individual organoids (bottom), as in Fig. 1a–c, for HUES66 batch I (a; 109 d.i.v.) and HUES66 batch II (b; 107 d.i.v.) organoids. c, d, Pseudotime UMAP analysis of HUES66 batch I organoids (109 d.i.v.), as in Fig. 1d, e, colour-coded by cell type (c) and pseudotime (d). e, Interneuron differentiation module of highly correlated genes by co-expression network analysis of the HUES66 batch I cells in c and d, as in Fig. 1f. CFuPN, corticofugal projection neurons; INP, interneuron progenitors; IN, interneurons; oRG, outer radial glia; UL CPN, upper layer callosal projection neurons.

This increase in GABAergic populations is consistent with two recent reports showing that CHD8+/− affects the expression of GABAergic interneuron marker genes in two additional human parental lines24,25. However, as we found for SUV420H1+/− and ARID1B+/− organoids, the genomic context was able to modulate the expressivity of the CHD8+/− phenotype. We compared CHD8+/− and control organoids generated from four different parental lines spanning different basal levels of CHD8 protein expression (Extended Data Fig. 3c, f and Supplementary Table 1). Bulk RNA-seq analysis of 35 d.i.v. organoids showed that, although differentially expressed genes (DEGs) between the mutant and control did not significantly overlap between lines, DEGs from three out of the four lines (HUES66, GM08330 and H1) shared GO terms related to neurodevelopment and neuronal maturation (Extended Data Fig. 10a and Supplementary Table 6). However, scRNA-seq analysis of CHD8+/− and control organoids from GM08330 and H1 lines at 3.5 months showed no significant difference in the number of GABAergic interneurons (105–108 d.i.v., n = 3 individual organoids per genotype, 107,490 cells; Extended Data Fig. 10b–d).

Pseudotime analysis of the GABAergic lineage (progressing from radial glia to GABAergic interneurons) in 3.5 month HUES66 CHD8+/− and control organoids showed an increased distribution of mutant cells towards the end point of the developmental trajectory (P < 2.2 × 10⁻¹⁶, one-sided Kolmogorov–Smirnov test; Fig. 3c, d and Extended Data Fig. 9g). Gene module analysis of the GABAergic lineage (Extended Data Fig. 9h and Supplementary Table 4) identified a module of interneuron differentiation genes that was upregulated in HUES66 CHD8+/− organoids (adjusted P = 0.019, linear mixed models; Fig. 3e), and two modules related to progenitor biology that were downregulated in the mutant (Extended Data Fig. 9h).

Thus, similar to the SUV420H1+/− and ARID1B+/− genotypes, the CHD8+/− genotype leads to an accelerated development of the GABAergic lineage that, for CHD8, leads to a persistent increase in the proportion of these cell types. For all three risk genes, this phenotype occurs in multiple parental lines, but with different degrees of phenotypic expressivity.

Convergence through distinct mechanisms

We next investigated whether SUV420H1+/−, ARID1B+/− and CHD8+/− organoids converged on asynchronous development of the same neuronal lineages by acting through common molecular pathways. We compared gene expression changes across the three ASD risk genes in cell lines that showed a strong phenotype (Mito210 SUV420H1, Mito210 ARID1B and HUES66 CHD8). Although mutations shared enrichment for GO categories, DEGs from bulk or pseudobulk RNA-seq analysis showed little overlap (Supplementary Notes, Supplementary Table 7 and Extended Data Fig. 11a–d). Similarly, although related cell types within the same mutation shared overlapping DEGs, DEGs caused by different mutations rarely overlapped, even for identical or closely related cell types (Fig. 4, Supplementary Notes and Supplementary Table 7). Thus, although these three mutants share a degree of convergence in altered neurodevelopmental processes, they affect largely distinct genes.

Fig. 4: The SUV420H1+/−, ARID1B+/− and CHD8+/− genotypes act through distinct gene targets.

Overlap between the sets of DEGs (mutant versus control) in individual cell types from the scRNA-seq datasets. The colour and size of the boxed circles indicate the significance of the number of overlapping genes between the corresponding cell populations (Bonferroni-adjusted P value, determined using hypergeometric tests). CP/CH, choroid plexus/cortical hem.
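The overlap statistic described in this legend can be sketched as an upper-tail hypergeometric probability followed by a Bonferroni correction. The sketch assumes a shared gene universe for both DEG sets and treats one set as a random draw from it; the function names and toy numbers are hypothetical, and the choice of universe in the study is not specified here.

```python
from math import comb

def hypergeom_overlap_p(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn at random from the universe,
    of which set_a genes belong to the other DEG list (all arguments are counts)."""
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

# Toy example (hypothetical): 10 genes in total, two DEG lists of 5 genes each.
p_full = hypergeom_overlap_p(10, 5, 5, 5)  # all 5 genes shared
p_none = hypergeom_overlap_p(10, 5, 5, 0)  # any overlap at all
adjusted = bonferroni([p_full, p_none])
```

With many cell-type pairs compared at once, the Bonferroni step is what keeps a handful of chance overlaps from appearing significant.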

Whole-proteome mass spectrometry (MS) analysis of mutant and control single organoids (Methods) identified 233 significantly differentially expressed proteins (DEPs; false-discovery rate (FDR) < 0.1) for SUV420H1+/− (≥4,000 proteins detected per sample), 24 for ARID1B+/− (≥900 proteins) and 34 for CHD8+/− (≥2,800 proteins; Extended Data Fig. 12a–c and Supplementary Table 8) organoids. DEPs had a very low overlap between mutations, with only five proteins significantly dysregulated in at least two mutations (Supplementary Table 8). DEPs and enriched biological processes (gene set enrichment) for all mutations resembled the gene modules that were identified by scRNA-seq analysis (Supplementary Notes and Extended Data Fig. 12d–f).

To evaluate whether the affected proteins in the three mutants were predicted to interact with one another, or with shared target proteins, the top 50 DEPs (adjusted P) for each mutation were used to create a network of interacting proteins26,27, followed by clustering to identify subnetworks (Methods). Each subnetwork contained DEPs from multiple mutations (Supplementary Notes and Extended Data Fig. 12g–i), indicating that these three risk genes affect shared processes, albeit by dysregulating different proteins.

Discussion

The process by which mutations in ASD risk genes converge on the neurobiology of this multifaceted disorder remains unclear. Our results define two neuronal classes of the local cortical circuit (GABAergic and deep-layer projection neurons) as specifically affected populations. Excitatory/inhibitory imbalance of the cortical microcircuit is a major hypothesis for the aetiology of ASD28,29,30, and previous studies have implicated the dysregulation of GABAergic and glutamatergic cortical neurons in ASD patients and experimental models31,32,33,34,35,36,37. Notably, we show that different human genomic contexts modulate phenotypic expressivity, based on both the risk gene and the specific abnormalities caused by each mutation. This is interesting, as many ASD risk gene mutations show variable clinical manifestations in humans7,8,14,38.

Our finding that different ASD risk genes converge on a phenotype of asynchronous neuronal development but mostly diverge at the level of molecular targets suggests that a shared clinical pathology of these genes may derive from higher-order processes of neuronal differentiation and circuit wiring. These results encourage future investigation of therapeutic approaches aimed at the modulation of shared dysfunctional circuit properties in addition to shared molecular pathways.

Methods

PS cell culture

The HUES66 CHD8 parental hESC line39 and CHD8 mutant line (HUES66 AC2), a clone that has a heterozygous 13-nucleotide deletion, resulting in a stop codon at amino acid 1248 (CHD8 gRNA: 5′-TTCTTACTGTGTACCCGGGC-3′ (TGG)), were provided by N. Sanjana, X. Shi, J. Pan and F. Zhang (Broad Institute of MIT and Harvard). The psychiatric control Mito210 and Mito294 parental iPS cell lines were provided by B. Cohen (McLean Hospital); the parental PGP1 iPS cell line by G. Church (Harvard University)40; and the GM08330 iPS cell line (also known as GM8330-8), originally from the Coriell Institute, by M. Talkowski (MGH). The H1 parental hESC line (also known as WA01) was purchased from WiCell. Cell lines were cultured as previously described9,41. Among these cell lines, we included iPS cell lines from individuals with no known history of ASD or other psychiatric condition (Mito210 and Mito294 confirmed by structured psychiatric interview, PGP1 with publicly available records). All human pluripotent stem (PS) cell lines were maintained below passage 50, were negative for mycoplasma (assayed with MycoAlert PLUS Mycoplasma Detection Kit, Lonza) and were karyotypically normal (G-banded karyotype test performed by WiCell Research Institute). The HUES66 and PGP1 lines were authenticated using STR analysis completed by GlobalStem (in 2008) and TRIPath (in 2018), respectively. The Mito210 and Mito294 lines were authenticated by genotyping analysis (Fluidigm FPV5 chip) performed by the Broad Institute Genomics Platform (in 2017). The H1 and GM08330 lines were authenticated using STR analysis completed by WiCell (in 2021). In the Mito294 ARID1B control line, a CNV smaller than 0.5 Mb on chromosome 19 was detected by single-nucleotide polymorphism array analysis. The GM08330 parental line and edited lines all have an interstitial duplication in the long (q) arm of chromosome 20. All of the experiments involving human cells were performed according to ISSCR 2021 guidelines42, and approved by the Harvard University IRB and ESCRO committees.

CRISPR guide design

The CRISPR guides for SUV420H1 and ARID1B were designed using the Benchling CRISPR Guide Design Tool (Benchling Biology Software, 2017). The guides were designed to maximize on-target efficiency and minimize off-target sites in intragenic regions43,44. For SUV420H1, a guide was designed to target the N-terminal domain to create a protein truncation early in the translated protein in all known protein coding transcripts (SUV420H1 gRNA: 5′-CAAGAACCAAACTGGTTGCT-3′ (AGG)). The ARID1B guide was chosen to induce a stop codon immediately before the ARID domain (ARID1B gRNA: 5′-CTCTAGCCTGATGAACACGC-3′ (AGG)). For CHD8, all of the mutant lines were generated using the same gRNA previously used for the generation of the HUES66 AC2 (CHD8 gRNA: 5′-TTCTTACTGTGTACCCGGGC-3′ (TGG)).

CRISPR-mediated gene editing

Mutations in SUV420H1 were introduced in the Mito210, Mito294 and PGP1 iPS cell lines. For the Mito210 and Mito294 SUV420H1 mutant lines, nanoblades that were generated as previously described45 were mixed with 300 µl of mTeSR1 and 4 µg ml−1 of polybrene and added to 80% confluent cells. For selection of the edited clones, cells were enzymatically detached and plated at a ratio of ~5,000 cells per 60 mm dish with 10 μM of ROCK inhibitor (Y-27632, Millipore-Sigma) to increase single-cell survival. When the colonies started to appear, each clone was manually collected and transferred into a single well of a 96-well plate. During colony picking, some of the cells were reserved for DNA extraction and clonal screening. The PGP1 SUV420H1-mutant line was generated in collaboration with the Harvard Stem Cell Institute (HSCI) iPS Core Facility. In brief, parental cells were transfected using the Neon system (1,000 V, 1,100 V or 1,200 V, 30 ms, 1 pulse). For 100,000 cells, 6.25 pmol TrueCut Cas9 Protein v2 (Thermo Fisher Scientific, A36496) and 12.5 pmol of sgRNA (Synthego) were used. After transfection, the pools of cells were collected to test knock-out efficiency. The best pool was then selected for low-density plating (600 to 2,000 cells per 10 cm dish). A week later, colonies were picked into one 96-well plate. Clones were screened by PCR and Sanger sequencing. Heterozygous clones were expanded and the genotypes were reconfirmed after expansion.

Mito210 and Mito294 ARID1B-edited lines were generated by the Broad Institute Stem Cell Facility. The guide RNA and Cas9 (EnGen Cas9 NLS from New England Biolabs) were transfected using the NEON transfection system (Thermo Fisher Scientific; 1,050 V, 30 ms, 2 pulses, 2.5 × 10⁵ cells).

Mutations in CHD8 were introduced in the Mito210 and Mito294 lines using the Amaxa 4D-Nucleofector (Lonza), using the protocol optimized for PS cell lines. Parental cell lines were transfected with gRNA-CHD8-Cas92APuro and immediately plated in mTeSR1 for 24 h. Selection of transfected cells was performed by adding 0.25–0.5 μg ml−1 of puromycin after 48 h of transfection, for 2 days. Selection of the edited clones was performed according to the protocol described for the Mito210 and Mito294 SUV420H1 clones. The H1 and GM08330 CHD8-mutant lines were generated in collaboration with the HSCI iPS Core Facility according to the protocol that was used to generate the PGP1 SUV420H1-mutant line.

Sequence confirmation of edits

Insertions and deletions in individual clones were screened by PCR amplification using primers flanking the guide. Further details about the insertions/deletions are provided in Supplementary Table 1.

Organoid differentiation

Cortical organoids were generated as previously described9,41. Embryoid bodies were formed in the same pluripotent medium in which they were maintained for 1–2 days to better enable the formation of embryoid bodies from each line (except for the lines Mito210 SUV420H1 and HUES66 CHD8 where cells were plated directly in CDM 1 as previously described9,41).

Immunohistochemistry

Samples were prepared as previously described9. Cryosection thickness varied from 14 µm to 18 µm. A list of the primary and secondary antibodies is provided in Supplementary Table 9.

Whole-organoid imaging

Organoids in Extended Data Fig. 4a were processed using the SHIELD protocol46. In brief, organoids were fixed for 30 min in 4% paraformaldehyde (PFA) at room temperature and were then treated with 3% (w/v) polyglycerol-3-polyglycidyl ether (P3PE) for 48 h in ice cold 0.1 M phosphate buffer (pH 7.2) at 4 °C then transferred to 0.3% P3PE in 0.1 M sodium carbonate (pH 10) for 24 h at 37 °C. The samples were rinsed and cleared in 0.2 M SDS in 50 mM phosphate-buffered saline (pH 7.3) for 48 h at 55 °C. Organoids were stained with Syto16 (Thermo Fisher Scientific, S7578) and anti-SOX2 antibodies using the SmartLabel system (Lifecanvas). A list of the primary antibodies is provided in Supplementary Table 9. Tissues were washed extensively for 24 h in phosphate-buffered saline + 0.1% Triton X-100 and antibodies were fixed to the tissue using a 4% PFA solution overnight at room temperature. Tissues were refractive-index-matched in PROTOS solution (RI = 1.519) and imaged using a SmartSPIM axially swept light-sheet microscope (LifeCanvas Technologies). 3D image datasets were acquired using a ×15/0.4 NA objective (ASI-Special Optics, 54-10-12). Optical sections from whole-organoid datasets are shown in Extended Data Fig. 4a.

Microscopy and organoid size analysis

Images of organoids in culture were taken with an EVOS FL microscope (Invitrogen), Lionheart FX Automated Microscope (BioTek Instruments), or with an Axio Imager.Z2 (Zeiss). Immunofluorescence images were acquired with the latter two and analysed with the Gen5 (BioTek Instruments) or Zen Blue (ZEN 2.6 – blue edition, Zeiss) image processing software. ImageJ47 (v.2.0) was used to quantify organoid size. Area values were obtained by tracing individual organoids in ImageJ, which reported the traced area in pixels. Measurements were plotted as a ratio to the average value for control organoids of each experimental batch. A summary of the number of organoids and differentiations used for the measurements is provided in Supplementary Table 2.
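
The per-batch normalization described above can be illustrated with a minimal Python sketch. This is not the authors' code; the function name, the tuple layout and the example values are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean

def normalize_to_control(measurements):
    """Express each traced area as a ratio to the mean control area of its batch.

    measurements: list of (batch, condition, area_px) tuples (hypothetical layout).
    """
    control_areas = defaultdict(list)
    for batch, condition, area in measurements:
        if condition == "control":
            control_areas[batch].append(area)
    # Mean control area, computed separately for every experimental batch.
    control_mean = {b: mean(v) for b, v in control_areas.items()}
    return [(b, c, a / control_mean[b]) for b, c, a in measurements]

# Made-up pixel areas for two batches.
example = [
    ("batch1", "control", 1000.0),
    ("batch1", "control", 1200.0),
    ("batch1", "mutant", 1650.0),
    ("batch2", "control", 800.0),
    ("batch2", "mutant", 1000.0),
]
ratios = normalize_to_control(example)
```

Normalizing within each batch, as in this sketch, keeps differences between differentiations from being mistaken for genotype effects.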

Western blotting

Proteins were extracted from iPS cells using N-PER Neuronal Protein Extraction Reagent (Thermo Fisher Scientific) supplemented with protease (cOmplete Mini Protease Inhibitor Cocktail, Roche) and phosphatase (PhosSTOP, Sigma-Aldrich) inhibitors. Lysates were centrifuged for 10 min at 13,500 rpm at 4 °C. Protein concentration was quantified using the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific). Protein lysates (15–20 μg) were separated on a NuPAGE 4–12%, Bis-Tris Gel (Invitrogen) or Mini-PROTEAN 4–15% Gels (Bio-Rad) and transferred onto a polyvinylidene difluoride membrane. Blots were blocked with 5% non-fat dry milk (Bio-Rad) and incubated with primary antibodies overnight (Supplementary Table 9). The blots were then washed and incubated at room temperature with secondary horseradish peroxidase-conjugated antibodies (Abcam) for 1 h. The blots were developed using SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Fisher Scientific) or ECL Prime Western Blotting System (Millipore), and the ChemiDoc System (Bio-Rad). Densitometry band quantification was performed using Fiji software48 v.2.0 and normalized to housekeeping genes (GAPDH or ACTB). The bands used for quantification are marked with an asterisk in Extended Data Fig. 3d–f. Uncropped gel images of western blots are provided in Supplementary Fig. 1.

Calcium imaging

Organoids were transduced with pAAV-CAG-SomaGCaMP6f2 (Addgene, 158757) by pipetting 0.2 µl of stock virus into 500 µl Cortical Differentiation Medium IV (CDMIV, without Matrigel) in a 24-well plate containing a single organoid. The next day, each organoid was transferred to a 6-well plate filled with 2 ml of fresh medium. On the third day after transduction, organoids were transferred to low-attachment 10 cm plates and, on the seventh day, the medium was switched to BrainPhys (5790, STEMCELL Technologies) supplemented with 1% N2 (17502-048, Thermo Fisher Scientific), 1% B27 (17504044, Thermo Fisher Scientific), GDNF (20 ng ml−1, 78139, STEMCELL Technologies), BDNF (20 ng ml−1, 450-02, Peprotech), cAMP (1 mM, 100-0244, STEMCELL Technologies), ascorbic acid (200 nM, 72132, STEMCELL Technologies) and laminin (1 µg ml−1, 23017015, Life Technologies). The organoids were cultured in BrainPhys for at least 2 weeks before imaging.

Brain organoids were randomly selected and transferred to a recording chamber containing BrainPhys. Imaging was performed using a confocal scanner (CSU-W1, Andor) attached to an inverted microscope (Ti-Eclipse, Nikon) running NIS-Elements Advanced Research imaging software (v.4.51.01, Nikon), while the organoids were kept at 37 °C using a heating platform and a controller (TC-324C, Warner Instruments). The use of a ×10 objective (Plan Apo λ, ×10/0.45 NA) resulted in a field of view of 1.3 × 1.3 mm² and a pixel size of 0.6 μm. Imaging took place in fast-time-lapse mode, with an exposure time of 100 ms, resulting in an acquisition rate of 8 f.p.s. Spontaneous activity was recorded in three different z planes, for at least 22 min of baseline activity in total (with no pharmacology treatment).

Stock solutions of 2,3-dioxo-6-nitro-1,2,3,4-tetrahydrobenzo[f]quinoxaline-7-sulfonamide disodium salt (NBQX disodium salt, Abcam; 100 mM) and tetrodotoxin citrate (TTX, Abcam; 10 mM) were prepared in double-distilled H2O. NBQX (an antagonist of AMPA/kainate glutamate receptors) and TTX (a voltage-gated sodium-channel antagonist) were bath-applied to achieve final bath concentrations of 20 µM and 2 µM, respectively.

Data were converted from nd2 format to tiff, and automated motion correction and cell segmentation were performed using Suite2p49, followed by manual curation of segmented cells (we examined the spatial footprint and temporal characteristics of each candidate cell and manually added neurons with clear cell body morphology; Fig. 1g). The mean raw fluorescence for each cell was then measured as a function of time.

Analysis of calcium imaging data

Analysis was performed using custom MATLAB scripts. Raw calcium signals for each cell, F(t), were converted to represent changes from the baseline level, ΔF/F(t) defined as (F(t) – F0(t))/F0(t). The time varying baseline fluorescence, F0(t), for each cell was a smoothed fluorescence trace obtained after applying a 10-s-order median filter centred at t. Calcium events elicited by action potentials were detected based on a threshold value given by their peak amplitude (5 times the s.d. of the noise value) and their first time derivative (2.5 times the s.d. of the noise value).
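
As an illustration of the ΔF/F conversion and event thresholds described above, the following Python sketch may be helpful. It is not the authors' MATLAB code. It assumes that the running-median window is given in samples (a 10 s window at 8 f.p.s. would be 80 samples), that windows are truncated at the edges of the recording, that the time derivative is approximated by the sample-to-sample difference, and that a single noise s.d. (supplied by the caller) is used for both thresholds.

```python
from statistics import median

def delta_f_over_f(trace, window):
    """Return (F(t) - F0(t)) / F0(t), with F0 a running median centred at t.

    Assumption: windows are truncated at the start and end of the trace.
    """
    half = window // 2
    dff = []
    for i, f in enumerate(trace):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        f0 = median(trace[lo:hi])
        dff.append((f - f0) / f0)
    return dff

def detect_events(dff, noise_sd, amp_factor=5.0, deriv_factor=2.5):
    """Indices of local peaks passing the amplitude and first-difference thresholds."""
    events = []
    for i in range(1, len(dff) - 1):
        is_peak = dff[i] > dff[i - 1] and dff[i] >= dff[i + 1]
        rise = dff[i] - dff[i - 1]  # crude stand-in for the first time derivative
        if is_peak and dff[i] > amp_factor * noise_sd and rise > deriv_factor * noise_sd:
            events.append(i)
    return events

# Toy trace: flat baseline with one fast-rising, slowly decaying transient.
trace = [100.0] * 20 + [160.0, 130.0, 110.0] + [100.0] * 20
dff = delta_f_over_f(trace, window=9)
events = detect_events(dff, noise_sd=0.02)
```

A median baseline is insensitive to brief transients, so sparse events do not raise F0(t) as long as they occupy less than half of the window.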

The analysis of network bursting was performed on the basis of the population-averaged calcium signal across all of the segmented cells. A peak in the population signal was considered to be a network burst if it met the following criteria: (1) the peak amplitude was greater than 10 times the s.d. of the noise value; (2) a set of bursting cells composed of at least 20% of total cells were active during that population calcium transient; and (3) a cell was considered part of the set of bursting cells only if it participated in at least 50% of the network bursts. Under these criteria, 89.3 ± 14% (range from 60.5% to 100%) and 95.5 ± 6.8% (range from 77.6% to 100%) of segmented cells participated in network bursting in control and mutant organoids, respectively.
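
The three criteria can be read in more than one way because criteria (2) and (3) refer to each other. The Python sketch below shows one possible single-pass interpretation and is not the authors' implementation: candidate peaks are first filtered by amplitude, the set of bursting cells is then defined from participation in those candidates, and finally the 20% requirement is applied. The dictionary keys and the example values are hypothetical.

```python
def classify_network_bursts(candidates, n_cells, noise_sd,
                            amp_factor=10.0, min_active_frac=0.2,
                            min_participation=0.5):
    """Return (accepted bursts, set of bursting cells).

    candidates: list of dicts with hypothetical keys
        "amplitude"    - peak of the population-averaged signal
        "active_cells" - set of cell ids active during that transient
    """
    # Criterion 1: population peak above amp_factor x noise s.d.
    passing = [c for c in candidates if c["amplitude"] > amp_factor * noise_sd]
    if not passing:
        return [], set()
    # Criterion 3: cells that take part in at least half of the candidate bursts.
    counts = {}
    for c in passing:
        for cell in c["active_cells"]:
            counts[cell] = counts.get(cell, 0) + 1
    bursting_cells = {cell for cell, k in counts.items()
                      if k / len(passing) >= min_participation}
    # Criterion 2: at least 20% of all cells are active bursting cells.
    bursts = [c for c in passing
              if len(bursting_cells & c["active_cells"]) / n_cells >= min_active_frac]
    return bursts, bursting_cells

candidates = [
    {"amplitude": 0.50, "active_cells": {0, 1, 2, 3}},
    {"amplitude": 0.30, "active_cells": {0, 1, 2}},
    {"amplitude": 0.05, "active_cells": set(range(10))},  # fails the amplitude test
    {"amplitude": 0.20, "active_cells": {0, 9}},          # too few bursting cells
]
bursts, bursting_cells = classify_network_bursts(candidates, n_cells=10, noise_sd=0.01)
```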

The peaks of the network bursts were used to measure the interspike interval (ISI), and the burst frequency was obtained from the average ISI. The burst half-width was also measured from the population-averaged calcium signal by calculating the width of the transient at 50% of the burst peak amplitude.
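
A minimal Python sketch of these two measurements follows. It is illustrative only: it assumes that the population signal has a baseline of zero, and it measures the half-width at sample resolution without interpolating between frames, which the original analysis may have done differently.

```python
def burst_frequency(peak_times_s):
    """Burst frequency (Hz) as the reciprocal of the mean interval between peaks."""
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    return 1.0 / (sum(intervals) / len(intervals))

def half_width(signal, peak_idx, dt):
    """Width (s) of the transient at 50% of its peak amplitude.

    Assumptions: zero baseline, no sub-sample interpolation.
    """
    half = signal[peak_idx] / 2.0
    left = peak_idx
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    right = peak_idx
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    return (right - left) * dt

freq = burst_frequency([10.0, 30.0, 50.0, 70.0])      # peaks every 20 s
width = half_width([0.0, 0.2, 0.6, 1.0, 0.7, 0.4, 0.1, 0.0],
                   peak_idx=3, dt=0.125)               # 8 f.p.s. -> 0.125 s per frame
```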

For analyses of the synchronicity, the ΔF/F(t) signal was used to calculate the cross-correlation between all pairs of cells at zero lag (Extended Data Fig. 7e) as well as the cross-correlogram between a reference cell and the rest of the cells (Extended Data Fig. 7f). Along with the original signal, we randomly selected ten active cells, circularly shifted their ΔF/F(t) signal by random phases (keeping their internal temporal structure but altering their temporal relationship with the network) and used them as control.
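
The zero-lag comparison and the circular-shift control can be sketched in Python as follows. This is a simplified illustration rather than the authors' MATLAB code, and it assumes that the cross-correlation is normalized (that is, a Pearson correlation at zero lag).

```python
import random
from math import sqrt

def zero_lag_correlation(x, y):
    """Normalized cross-correlation of two equal-length traces at zero lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def circular_shift(x, k):
    """Rotate a trace by k samples, keeping its internal temporal structure."""
    k %= len(x)
    return x[-k:] + x[:-k]

def shuffled_control(x, rng):
    """Shift a trace by a random phase (never zero) to break its relation to the network."""
    return circular_shift(x, rng.randrange(1, len(x)))

a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
same = zero_lag_correlation(a, a)
shifted = zero_lag_correlation(a, circular_shift(a, 1))
control_trace = shuffled_control(a, random.Random(0))
```

Because the shift only rotates the trace, each control trace keeps the event rate and event shapes of the original cell, so any loss of correlation reflects timing relative to the network alone.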

Multi-electrode array

Extracellular neurophysiological signals were recorded using the Maxwell Biosystems CMOS-HD-MEA system50 (MaxOne System, MaxWell Biosystems). The MaxOne chip contains 26,400 platinum electrodes in a sensing area of 3.85 × 2.10 mm² with 17.5 μm centre-to-centre pitch, a density of 3,265 electrodes per mm², and 1,024 configurable low-noise readout channels (2.4 μV root mean square (r.m.s.) in the 300 Hz–10 kHz band) with a sampling rate of 20 kHz at 10-bit resolution. Acute recordings were performed at room temperature, with the intact organoid in fresh BrainPhys.

For the recordings, we used MaxLab Live Software (v.20.1.6. MaxWell Biosystems). In brief, spontaneous activity of neurons was measured using the Activity Scan Assay whereby the whole chip area was scanned with a sparse recording (30 s per configuration, seven configurations). Active neurons were automatically identified on the basis of the firing rate and spike amplitude obtained from the Activity Scan. On the basis of the activity of the neurons, the most active electrodes were routed for the creation of the network configuration based on units of 4 × 3 electrodes each, with 1,024 recording electrodes in total (Extended Data Fig. 7d (top)). Selected electrodes were then simultaneously recorded using the network assay to investigate the spontaneous neuronal network activity.

For spike detection, the software used a finite impulse response bandpass filter between 300–3,000 Hz to preprocess the raw data (Extended Data Fig. 7d (middle)). The r.m.s. noise per electrode was calculated and every negative peak larger than 6 r.m.s. was considered to be a spike.
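
The thresholding rule can be illustrated with a short Python sketch. It does not reproduce the MaxLab Live implementation; it assumes that the trace has already been bandpass filtered and that a spike is marked at the local minimum of each threshold crossing.

```python
from math import sqrt

def rms(samples):
    """Root mean square of a filtered trace segment."""
    return sqrt(sum(v * v for v in samples) / len(samples))

def detect_spikes(filtered, noise_rms, factor=6.0):
    """Indices of negative peaks exceeding factor x r.m.s. noise in magnitude."""
    threshold = -factor * noise_rms
    return [i for i in range(1, len(filtered) - 1)
            if filtered[i] < threshold
            and filtered[i] <= filtered[i - 1]
            and filtered[i] < filtered[i + 1]]

# Toy filtered trace in microvolts with one large negative deflection.
trace_uv = [1.0, -1.0, 1.0, -1.0, -20.0, -40.0, -10.0, 1.0, -1.0]
spikes = detect_spikes(trace_uv, noise_rms=2.4)
```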

When extracting the waveform of the electrodes in a single unit (set of neighboring 4 × 3 electrodes; Extended Data Fig. 7d (bottom)), we used the spike time of one selected electrode as a reference to extract the simultaneous signal across the different electrodes (instead of using their individual spike times).

All descriptive statistics and statistical tests were performed in MATLAB (v.9.5, R2018b, MathWorks), using the Statistics Toolbox (v.11.4, R2018b, MathWorks). The Lilliefors test was used to test for normality of data distributions. All datasets met the assumptions of the applied statistical tests. When comparing groups, the equality of the variance was tested at the 5% significance level using a two-tailed squared-rank test. All statistical tests applied to the electrophysiological data were two-tailed, with a 5% significance level.

Cell lysis and filter-aided sample preparation digestion for MS

For SUV420H1, 4 mutant and 4 control organoids were used; for CHD8, 3 mutant and 3 control organoids were used; and, for ARID1B, 5 mutant and 4 control organoids were used. Cells were placed into microTUBE-15 (Covaris) microtubes with TPP buffer (truXTRAC Protein Extraction Buffer TP, Covaris, 520103) and lysed using a S220 Focused-ultrasonicator instrument (Covaris) with 125 W power over 180 s at 10% max peak power. After precipitation with chloroform/methanol, extracted proteins were weighed and digested according to the filter-aided sample preparation protocol51,52 (100 μg for ARID1B and CHD8; 150 μg for SUV420H1). In brief, the 10 kDa filter was washed with 100 μl of 50 mM triethylammonium bicarbonate (TEAB). Each sample was added, centrifuged and the supernatant was discarded. Then, 100 μl of 20 mM Tris (2-carboxyethyl) phosphine at 37 °C was added for 1 h, centrifuged and the supernatant was discarded. While shielding from light, 100 μl of 10 mM IAcNH2 was added for 1 h followed by spinning and discarding the supernatant. Next, 150 μl of 50 mM TEAB + 3 μg of Sequencing Grade Trypsin (Promega) was added to each sample and left overnight at 38 °C. The samples were then centrifuged and the supernatants were collected. Finally, 50 μl of 50 mM TEAB was added to the samples, followed by spinning and supernatant collection. The samples were then transferred to HPLC.

TMT mass tagging protocol peptide labelling

Tandem mass tag (TMT) label reagents (TMTPro, Thermo Fisher Scientific, 16plex Label Reagent Set, A44521) were equilibrated to room temperature and resuspended in anhydrous acetonitrile or ethanol (for the 0.8 mg and 5 mg vials, 41 μl and 256 μl were added, respectively). The reagent was dissolved for 5 min with occasional vortexing. TMT label reagent (41 μl, equivalent to 0.8 mg) was added to each 100–150 μg sample. The reaction was incubated for 1 h at room temperature. The reaction was quenched after adding 8 μl of 5% hydroxylamine to the sample and incubating for 15 min. The samples were combined, dried in a speedvac (Eppendorf) and stored at −80 °C.

Hi-pH separation and MS analysis

Before submission to liquid chromatography with tandem MS (LC–MS/MS), each experiment sample was separated on a Hi-pH column (Thermo Fisher Scientific) according to the vendor’s instructions. After separation into 40 (20 for the ARID1B experiment) fractions, each fraction was submitted for a single LC–MS/MS experiment performed on a Lumos Tribrid (Thermo Fisher Scientific) system equipped with a 3000 Ultima Dual nanoHPLC pump (Thermo Fisher Scientific). The peptides were separated onto a microcapillary trapping column (inner diameter, 150 µm) packed first with approximately 3 cm of C18 Reprosil resin (5 µm, 100 Å, Dr. Maisch) followed by a PharmaFluidics micropack analytical 50 cm column. Separation was achieved by applying a gradient of 5–27% acetonitrile in 0.1% formic acid over 90 min at 200 nl min⁻¹. Electrospray ionization was enabled by applying a voltage of 1.8 kV using a custom-made electrode junction at the end of the microcapillary column and sprayed from stainless-steel tips (PepSep). The Lumos Orbitrap was operated in data-dependent mode for the MS methods. The MS survey scan was performed in the Orbitrap in the range of 400–1,800 m/z at a resolution of 6 × 10⁴, followed by the selection of the 20 most intense ions (TOP20) for CID-MS2 fragmentation in the Ion trap using a precursor isolation width window of 2 m/z, AGC setting of 10,000 and a maximum ion accumulation of 50 ms. Singly charged ion species were not subjected to CID fragmentation. Normalized collision energy was set to 35 V with an activation time of 10 ms. Ions in a 10 ppm m/z window around ions selected for MS2 were excluded from further selection for fragmentation for 90 s. The same TOP20 ions were subjected to HCD MS2 events in the Orbitrap part of the instrument. For each HCD MS2 scan, the fragment ion isolation width was set to 0.8 m/z, AGC was set to 50,000, the maximum ion time was 150 ms, normalized collision energy was set to 34 V and the activation time was 1 ms.

MS data generation

Raw data were submitted for analysis in Proteome Discoverer 2.4 (Thermo Fisher Scientific). Assignment of MS/MS spectra was performed using the Sequest HT algorithm by searching the data against a protein sequence database including all entries from the Human UniProt database53,54 and other known contaminants such as human keratins and common laboratory contaminants. Sequest HT searches were performed using a 10 ppm precursor ion tolerance and requiring the N/C termini of each peptide to adhere with Trypsin protease specificity, while allowing up to two missed cleavages. 16-plex TMTpro tags on peptide N termini and lysine residues (+304.207 Da) were set as static modifications while methionine oxidation (+15.99492 Da) was set as a variable modification. A MS2 spectra assignment FDR of 1% on the protein level was achieved by applying the target–decoy database search. Filtering was performed using a Percolator (64 bit version)55. For quantification, a 0.02 m/z window centred on the theoretical m/z value of each of the 6 reporter ions and the intensity of the signal closest to the theoretical m/z value was recorded. Reporter ion intensities were exported in the result file of Proteome Discoverer 2.4 search engine as Excel tables. The total signal intensity across all peptides quantified was summed for each TMT channel, and all intensity values were normalized to account for potentially uneven TMT labelling and/or sample handling variance for each labelled channel.
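
The channel-level normalization at the end of this step can be sketched in Python as follows. The sketch is an assumption about the form of the correction, not a description of Proteome Discoverer: each TMT channel is rescaled so that its summed intensity equals the mean of the channel totals.

```python
def normalize_channels(intensities):
    """Equalize total reporter-ion signal across TMT channels.

    intensities: list of rows (one per peptide), each a list of per-channel values.
    Assumption: the common target is the mean of the channel totals.
    """
    n_channels = len(intensities[0])
    totals = [sum(row[c] for row in intensities) for c in range(n_channels)]
    target = sum(totals) / n_channels
    factors = [target / t for t in totals]
    return [[v * f for v, f in zip(row, factors)] for row in intensities]

# Two peptides, two channels; channel 2 was loaded at twice the amount.
normalized = normalize_channels([[100.0, 200.0], [300.0, 600.0]])
```

After this correction the two channels in the toy example carry identical values, which is the expected outcome when the only difference between them is the amount of labelled material.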

MS data analysis

Potential contaminants were filtered out and proteins supported by at least two unique peptides for the SUV420H1 and CHD8 experiment and at least one for the ARID1B experiment were used for further analysis. We retained proteins that were missing in at most one sample per condition. Data were transformed and normalized using variance stabilizing normalization using the DEP package of Bioconductor56. To perform statistical analysis, data were imputed for missing values using random draws from a Gaussian distribution with 0.3 width and a mean that was down-shifted from the sample mean by 1.8. To detect statistically significant differential protein abundance between conditions, we performed a moderated t-test using the LIMMA package of Bioconductor57, using an FDR threshold of 0.1. Gene set enrichment analysis (GSEA) was performed using the GSEA software58. GO and KEGG pathway annotation were used to perform functional annotation of the significantly regulated proteins. GO terms and KEGG pathways with FDR-adjusted q < 0.05 were considered to be statistically significant.
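
The imputation step can be illustrated with a small Python sketch. It is not the DEP implementation. It assumes that the width (0.3) and the down-shift (1.8) are expressed in units of the s.d. of the observed values in each sample, and that the distribution is centred relative to the sample mean as stated above.

```python
import random
from statistics import mean, stdev

def impute_downshifted(sample, shift=1.8, width=0.3, rng=None):
    """Replace missing values (None) with draws from a narrow, down-shifted Gaussian.

    Assumption: shift and width are multiples of the observed-value s.d.
    """
    rng = rng or random.Random(0)
    observed = [v for v in sample if v is not None]
    mu, sd = mean(observed), stdev(observed)
    return [v if v is not None else rng.gauss(mu - shift * sd, width * sd)
            for v in sample]

# Toy log-scale abundances for one sample, with two missing proteins.
sample = [20.1, 21.3, None, 19.8, 22.0, None]
imputed = impute_downshifted(sample, rng=random.Random(0))
```

Drawing from the low tail reflects the assumption that proteins are missing mainly because their abundance fell below the detection limit.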

To build protein interaction networks, we used the prize-collecting Steiner forest algorithm26,59 using the top 50 DEPs (ranked by adjusted P value) from each mutation as terminals, with the absolute value of their log-transformed fold change as prizes. This algorithm optimizes the network to include high-confidence protein interactions between protein nodes with large prizes. We used the PCSF R package (v.0.99.1)60 to calculate networks, with the STRING database as a background protein–protein interactome27, using the parameters n = 10, r = 0.1, w = 2, b = 40 and mu = 0.01. As by default in that package, the network was subclustered using the edge-betweenness clustering algorithm from the igraph package, and functional enrichment was performed on each cluster using the ENRICHR API. Cytoscape software (v.3.8.2) was used for network visualization61. To assess relationships between the three sets of differential proteins, a protein–protein interaction (PPI)-weighted gene distance (pMM)62 was calculated between each pair of protein sets. A background distribution was calculated by drawing size-matched random lists of proteins from all of the detected proteins in each dataset and calculating the pMM between these sets. This was repeated 1,000 times, and an empirical P value was calculated by evaluating the number of times randomized pMMs were lower than the value calculated using DEPs.
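
The randomization used to obtain the empirical P value can be sketched generically in Python. The pMM distance itself is not implemented here; a Jaccard distance stands in as a placeholder statistic, and all names are hypothetical.

```python
import random

def empirical_p_value(observed, set_sizes, background, statistic,
                      n_iter=1000, seed=0):
    """Fraction of size-matched random draws whose statistic is lower than observed."""
    rng = random.Random(seed)
    lower = 0
    for _ in range(n_iter):
        # Draw one random protein list per original set, matched in size.
        random_sets = [rng.sample(background, k) for k in set_sizes]
        if statistic(*random_sets) < observed:
            lower += 1
    return lower / n_iter

def jaccard_distance(a, b):
    """Placeholder statistic (NOT pMM): 1 - |intersection| / |union|."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

background = ["P%03d" % i for i in range(200)]  # stand-in for all detected proteins
p_low = empirical_p_value(-1.0, [50, 50], background, jaccard_distance)
p_high = empirical_p_value(2.0, [50, 50], background, jaccard_distance)
```

With a distance measure, a small empirical P value indicates that random protein sets are rarely closer to each other than the observed sets of differential proteins.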

Dissociation of brain organoids and scRNA-seq

Organoids were dissociated as previously described41,63. Volumes of reagents were scaled down 25× for one-month-old organoids. Cells were loaded onto either a Chromium Single Cell B or G Chip (10x Genomics, PN-1000153, PN-1000120), and processed through the Chromium Controller to generate single-cell gel beads in emulsion. scRNA-seq libraries were generated using the Chromium Single Cell 3′ Library & Gel Bead Kit v3 or v3.1 (10x Genomics, PN-1000075, PN-1000121), with the exception of a few libraries in the earlier experiments that were prepared using the v2 kit (10x Genomics, PN-120237). Information on the estimated number of cells loaded and the version of kit used is provided in Supplementary Table 10. Libraries from different samples were pooled on the basis of molar concentrations and sequenced on a NextSeq 500 or NovaSeq instrument (Illumina) with 28 bases for read 1 (26 bases for v2 libraries), 55 bases for read 2 (57 bases for v2 libraries) and 8 bases for index 1. If necessary, after the first round of sequencing, libraries were repooled on the basis of the actual number of cells in each and resequenced with the goal of producing an approximately equal number of reads per cell for each sample.

scRNA-seq data analysis

Reads from scRNA-seq were aligned to the GRCh38 human reference genome and the cell-by-gene count matrices were produced using the Cell Ranger pipeline (10x Genomics)64. Cell Ranger v.2.0.1 was used for experiments using the GM08330 control ‘single cell map’ and for HUES66 CHD8-mutant and control organoids at 3.5 months, batch I, while v.3.0.2 was used for all of the other experiments. The default parameters were used, except for the ‘--cells’ argument. Data were analysed using the Seurat R package v.3.1.565 using R v.3.6. Cells expressing a minimum of 500 genes were retained, and UMI counts were normalized for each cell by the total expression multiplied by 10⁶ and log-transformed. Variable genes were found using the mean.var.plot method, and the ScaleData function was used to regress out variation due to differences in total UMIs per cell. Principal component analysis (PCA) was performed on the scaled data for the variable genes, and the top principal components were chosen based on Seurat’s ElbowPlots (at least 15 PCs were used in all cases). Cells were clustered in PCA space using Seurat’s FindNeighbors on top principal components, followed by FindClusters with resolution = 1.0 (in brief, a 20-nearest-neighbor graph was constructed and modularity optimization using the Louvain algorithm was performed to identify clusters). Variation in the cells was visualized by t-SNE analysis of the top principal components.
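
The cell filter and per-cell normalization can be illustrated with a minimal Python sketch. This is not Seurat code; it assumes that the log transform is log(1 + x), and the function names are chosen for illustration only.

```python
from math import log1p

def n_genes_detected(counts):
    """Number of genes with at least one UMI in a cell."""
    return sum(1 for c in counts if c > 0)

def keep_cell(counts, min_genes=500):
    """Retain cells expressing at least min_genes genes."""
    return n_genes_detected(counts) >= min_genes

def log_normalize(counts, scale=1e6):
    """Divide by the cell's total UMIs, multiply by scale, then apply log(1 + x)."""
    total = sum(counts)
    return [log1p(c / total * scale) for c in counts]

# Toy cell with three genes (far too few to pass the real 500-gene filter).
toy_counts = [0, 1, 3]
normalized = log_normalize(toy_counts)
```

Scaling by the total makes cells with different sequencing depths comparable, and the log transform compresses the range of highly expressed genes.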

In the case of the GM08330 1 month organoids (single-cell map), cells were demultiplexed using genotype clustering from cells from a different experiment that were sequenced in the same lane. To demultiplex, SNPs were called from CellRanger BAM files using the cellSNP tool v.0.1.5, and then the vireo function was used with the default parameters and n_donor = 2, from the cardelino R library (v.0.4.0)66,67 to assign cells to each genotype.

In two cases, one organoid was excluded from the analysis as an outlier. See the ‘Statistics and reproducibility’ section for details.

For each dataset, upregulated genes in each cluster were identified using the VeniceMarker tool from the Signac package v.0.0.7 from BioTuring (https://github.com/bioturing/signac). Cell types were assigned to each cluster by looking at the top most significant upregulated genes. In a few cases, clusters were further subclustered to assign identities at higher resolution. At 1 month, the excitatory projection neurons included a gradient of immature neurons, which were split into two clusters: we labelled the cluster representing the earlier developmental stage ‘newborn deep-layer projection neurons’ and the cluster representing the later stage ‘immature deep-layer projection neurons’. At 3 months and beyond, excitatory projection neuron clusters could be identified as deep-layer corticofugal neurons and upper-layer callosal projection neurons. For the GABAergic populations, 1 month organoids included neurons expressing broad markers of GABAergic identity (labelled GABAergic neurons), progenitor cells expressing markers of GABAergic lineage identity (GABAergic neuron progenitors) and progenitor cells with high expression of cell cycle markers in addition to the progenitor identity markers (cycling GABAergic neuron progenitors). At 3 months and beyond, GABAergic neurons expressed more specific markers of cortical interneurons (thus labelled GABAergic interneurons), and GABAergic lineage progenitors at these ages were divided into ‘GABAergic interneuron progenitors’ and ‘cycling GABAergic interneuron progenitors’ on the basis of the level of expression of cell cycle markers.

To assess gene expression of ASD risk genes in GM08330 and Mito210 control organoids across timepoints, datasets from 1, 3 and 6 months were merged using Seurat v.3.1.5, and then batch-corrected using Harmony v.1.0 with the default parameters68. As the 1 month data are dominated by cell cycle signal, the ScaleData function was used to regress out variation due to both total UMI count per cell and cell cycle stage differences, calculated using Seurat’s CellCycleScore. Variation was visualized using t-SNE on the first 30 harmony dimensions. Broad cell types were assigned as described above, and mutual information was calculated between cell type assignments and individual organoids using the mpmi R package69. Expression of the 102 ASD risk genes identified in the Satterstrom et al.6 study was evaluated using Seurat’s AddModuleScore function using the default parameters. This function calculates the average expression level per cell of the set of genes (based on log-normalized, unscaled data), and then subtracts the average expression of a randomly selected expression-matched control set of genes. A resulting score of greater than zero indicates that the ASD risk gene set is expressed more highly in that cell than would be expected, given the average expression of the gene set across the dataset.

To compare cell type proportions between control and mutant organoids, for each cell type present in a dataset, the glmer function from the R package lme4 (v.1.1-23)70 was used to estimate a mixed-effect logistic regression model71. The output was a binary indicator of whether cells belong to this cell type, the control or mutant state of the cell was a fixed predictor, and the organoid that the cell belonged to was a random intercept. Another model was fit without the control-versus-mutant predictor, and the ANOVA function was used to compare the two model fits. P values for each cell type were then adjusted for multiple-hypothesis testing using Benjamini–Hochberg correction.
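
The Benjamini–Hochberg adjustment applied to the per-cell-type P values can be written compactly. The Python sketch below is a generic implementation of the standard procedure, given for illustration; the example P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```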

Pseudotime, gene module and differential expression analysis

Pseudotime analysis was performed using the Monocle3 v.0.2.0 software package72 with the default parameters. The cells were first subset to contain an equal number of cells from control and mutant organoids. A starting point for the trajectory was chosen manually by finding an endpoint of the tree located in the earliest developmental cell type (generally, cycling progenitors). In cases in which the cells were split into more than one partition, the starting point was chosen within the partition of interest, and a new UMAP was calculated using just these cells. To test whether mutant trajectories were accelerated compared with the control, a one-sided Kolmogorov–Smirnov test was applied comparing the distribution of pseudotime values of control versus mutant cells, using the stats R package.
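
The logic of the one-sided test can be illustrated with a short Python sketch. It is not the R code used in the study. It assumes that acceleration corresponds to mutant cells having larger pseudotime values, so that the mutant cumulative distribution lies below the control one, and it uses the standard asymptotic approximation for the P value.

```python
from math import exp

def ecdf(sample, x):
    """Empirical cumulative distribution function of sample evaluated at x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_one_sided(control, mutant):
    """D = max_x [F_control(x) - F_mutant(x)] with an asymptotic one-sided P value."""
    grid = sorted(set(control) | set(mutant))
    d = max(ecdf(control, x) - ecdf(mutant, x) for x in grid)
    n, m = len(control), len(mutant)
    # Asymptotic approximation; exact methods are preferable for small samples.
    p = exp(-2.0 * (n * m / (n + m)) * d * d)
    return d, p

# Toy pseudotime values; the mutant cells sit later along the trajectory.
d_stat, p_value = ks_one_sided([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```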

To learn patterns of coordinated gene regulation across the cells, we applied WGCNA19 to each dataset. In cases in which cells were split into partitions in the above pseudotime analysis, only cells belonging to the partition of interest were used. Normalized gene expression data were further filtered to remove outlying genes, mitochondrial and ribosomal genes. Outliers were identified by setting the upper (>9) and lower (<0.15) thresholds to the average normalized expression per gene. After processing, the blockwiseModules function from the WGCNA v.1.69 library was run in R with the parameters networkType=“signed”, minModuleSize=4, corType=“Bicor”, maxPOutliers=0.1, deepSplit=3, trapErrors=T and randomSeed=59069. Other than power, the remaining parameters were left as the default setting. To pick an adequate power for each dataset, we used the pickSoftThreshold function from WGCNA to test values from 1 to 30. The final power was chosen as the value that captured the most variation in the fewest modules; this resulted in a power of 3 for SUV420H1 35 d.i.v., 9 for ARID1B 35 d.i.v. and 12 for CHD8 109 d.i.v.

To calculate differential expression of modules, Seurat objects were downsampled to have an equal number of cells per organoid, and then the AddModuleScore function was used, using gene lists from WGCNA results. For each module, linear mixed-effect models were fit to the data, with the modules scores as the output, the organoid the cell belongs to as a random intercept, and with or without the control-versus-mutant state as a predictor. The ANOVA function was used to compare the models, and P values were then adjusted across modules using Benjamini–Hochberg correction.

DEGs between control and mutant organoids were assessed after datasets were subset to the cells from the partition of interest in the above pseudotime analysis, to the cells from each individual cell type, or not subset at all for pseudobulk analysis. Reads were then summed across cells in each organoid. Genes with fewer than 10 total reads were excluded, and DESeq2 (ref. 73) was used to calculate DEGs, with each organoid as a sample74. The clusterProfiler75 R package was used to find enriched biological processes in these gene sets, with the enrichGO function and the compareCluster function to highlight processes the gene sets might have in common.
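
The aggregation that precedes the DESeq2 step can be sketched in Python as follows. This is an illustrative stand-in for the R workflow; the data layout and the example counts are hypothetical.

```python
from collections import defaultdict

def pseudobulk(cells, min_total_reads=10):
    """Sum counts over the cells of each organoid and drop lowly detected genes.

    cells: list of (organoid_id, {gene: count}) pairs (hypothetical layout).
    Returns {gene: {organoid_id: summed count}} for genes with at least
    min_total_reads reads across all organoids.
    """
    sums = defaultdict(lambda: defaultdict(int))
    for organoid, counts in cells:
        for gene, n in counts.items():
            sums[gene][organoid] += n
    return {gene: dict(per_org) for gene, per_org in sums.items()
            if sum(per_org.values()) >= min_total_reads}

cells = [
    ("org1", {"SOX2": 6, "TBR1": 1}),
    ("org1", {"SOX2": 3}),
    ("org2", {"SOX2": 5, "TBR1": 2}),
]
bulk = pseudobulk(cells)
```

Treating each organoid, rather than each cell, as the sample avoids counting cells from the same organoid as independent replicates.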

Single-nucleus isolation and single-cell ATAC-seq

Nuclei from 1 month and 3 month organoids were extracted using two different procedures, chosen according to organoid size. For the 1 month organoids, nuclei were extracted according to a protocol provided by 10x Genomics76 to minimize material loss, whereas a sucrose-based nucleus isolation protocol77 was used for the 3 month organoids to better remove debris. Single-nucleus ATAC-seq libraries were prepared using the Chromium Single Cell ATAC Library & Gel Bead v1 Kit (10x Genomics, PN-1000110) and around 15,300 nuclei per channel were loaded to give an estimated recovery of 10,000 nuclei per channel. Libraries from different samples were pooled on the basis of molar concentrations and sequenced with 1% PhiX spike-in on the NextSeq 500 instrument (Illumina) with 33 bases each for read 1 and read 2, 8 bases for index 1 and 16 bases for index 2.

Single-cell ATAC-seq data analysis

Reads from scATAC-seq were aligned to the GRCh38 human reference genome and the cell-by-peak count matrices were produced using the Cell Ranger ATAC pipeline v.2.0.0 (10x Genomics) with the default parameters. Data were analysed using the Signac R package (v.1.2.1)78 in R v.4.0. Annotations from the EnsDb.Hsapiens.v86 package79 were added to the object. After consideration of the quality control metrics recommended in that package, cells with 1,500–20,000 fragments in peak regions, at least 35% of reads in peaks, a nucleosome signal of less than 4 and a TSS enrichment score of greater than 2 were retained for further analysis. Latent semantic indexing (LSI) was performed to reduce data dimensionality (counts were normalized using term frequency inverse document frequency, all features were set as top features, and singular value decomposition was performed). The top LSI component was discarded as it correlated strongly with sequencing depth, and components 2–30 were used for downstream analysis. Cells were clustered using Seurat’s FindNeighbors, followed by FindClusters with the SLM algorithm (a 20-nearest-neighbor graph was constructed and modularity optimization using the smart local moving algorithm was performed to identify clusters). Variation in the cells was visualized using UMAP analysis of the top LSI components.
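
The per-cell quality control thresholds stated above can be expressed as a simple filter. This is an illustrative Python sketch and not the Signac code used in the study; the dictionary keys are hypothetical field names chosen for readability.

```python
def passes_qc(cell):
    """Apply the four scATAC-seq QC thresholds given in the text.

    The keys used here are ASSUMED names, not the study's metadata columns.
    """
    return (
        1500 <= cell["peak_region_fragments"] <= 20000
        and cell["pct_reads_in_peaks"] >= 35
        and cell["nucleosome_signal"] < 4
        and cell["tss_enrichment"] > 2
    )


# Hypothetical cells, for demonstration only.
cells = [
    {"id": "good", "peak_region_fragments": 8000, "pct_reads_in_peaks": 52.0,
     "nucleosome_signal": 0.8, "tss_enrichment": 5.1},
    {"id": "low_depth", "peak_region_fragments": 900, "pct_reads_in_peaks": 60.0,
     "nucleosome_signal": 0.7, "tss_enrichment": 6.0},
    {"id": "low_tss", "peak_region_fragments": 5000, "pct_reads_in_peaks": 40.0,
     "nucleosome_signal": 1.0, "tss_enrichment": 1.5},
]
retained = [cell["id"] for cell in cells if passes_qc(cell)]
print(retained)
```

A cell must satisfy all four criteria to be retained, so failing any single metric is sufficient for exclusion.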

scATAC-seq data were integrated with scRNA-seq data from the corresponding Mito210 dataset for each timepoint, using Seurat’s TransferData to predict cell type labels for the ATAC profiles. Concurrently, differentially accessible (DA) peaks per cluster were called using FindMarkers with the logistic regression framework and the number of fragments in peak regions as a latent variable. These DA peaks were mapped to the closest genes. The top genes per cluster were used to confirm and refine the cluster cell type assignments obtained by transferring RNA labels.

DA peaks between control and SUV420H1-mutant organoids were calculated per cell type, using the same method as described above. We noticed that most cell types had very few significant differentially accessible regions (DARs) (range 6–34, except for apical radial glia cells, the most prevalent and, therefore, the most powered cell type at this time point, which had 515 DARs), and that the DARs were almost entirely overlapping in all cell types. Therefore, DARs were calculated using all cells together to improve power. DARs were visualized using Signac’s CoveragePlot function with the default parameters.

To find transcription factor motifs enriched in DARs, the top 400 up- and downregulated differentially accessible peaks for each time point were supplied to the HOMER software (v.4.11.1)80, using a 300 bp fragment size and masking repeats. In the case of upregulated regions in 3 month mutant organoids, only 341 regions were supplied, as that was the total number of regions with log[FC] > 0.1 and P < 0.1. The top 5 de novo motifs per cell type found by HOMER with P ≤ 10^−10 are reported, along with all TFs of which the known binding sites match that motif with a score of ≥0.59.

Statistics and reproducibility

Organoid size analysis

Information about the number of organoids used is provided in Supplementary Table 2. In summary, for SUV420H1: n = 132 for total control organoids, n = 132 for total mutant organoids, from 6 experimental batches. For ARID1B: n = 109 for total control organoids, n = 122 for total mutant organoids, from 4 experimental batches. For CHD8: n = 472 for total control organoids, n = 482 for total mutant organoids, from 7 experimental batches. P values were calculated using two-sided t-tests and then adjusted using Bonferroni correction.

Proteomic analysis

Four mutant and four control organoids were used for SUV420H1. Three mutant and three control, and five mutant and four control organoids were used for CHD8 and ARID1B, respectively. To detect statistically significant differential protein abundance between conditions, moderated t-tests were performed as described in ‘MS data analysis’ (FDR threshold of 0.1; Extended Data Fig. 12a–c). GO terms and KEGG pathways were calculated using the GSEA software (Extended Data Fig. 12d–f) and FDR-adjusted q < 0.05 was considered to be statistically significant. For protein set distances between pairs of DEP sets (Extended Data Fig. 12h, i), a PPI-weighted protein set distance was calculated between all significant DEPs (FDR < 0.1) for each pair. To determine whether this distance was smaller than would be expected by chance, size-matched sets were randomly chosen from the proteins detected in each experiment, and the distance between these random sets was calculated 1,000 times per pair. P values were assigned by counting the fraction of times that this random distance was less than the actual distance between the differential sets.
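
The resampling procedure for the protein set distances can be sketched as follows. This is an illustrative Python sketch, not the code used in the study; in particular, a Jaccard distance is substituted here for the PPI-weighted distance, which is not reproduced, and all names are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Stand-in distance (ASSUMPTION); the study used a PPI-weighted distance."""
    return 1.0 - len(a & b) / len(a | b)


def permutation_p_value(set_a, set_b, detected_a, detected_b,
                        distance=jaccard_distance, n_iter=1000, seed=0):
    """Fraction of random, size-matched set pairs that are closer than observed.

    detected_a / detected_b are the proteins detected in each experiment,
    from which the size-matched random sets are drawn.
    """
    rng = random.Random(seed)
    observed = distance(set_a, set_b)
    closer = 0
    for _ in range(n_iter):
        # sorted() gives a reproducible sequence to sample from.
        rand_a = set(rng.sample(sorted(detected_a), len(set_a)))
        rand_b = set(rng.sample(sorted(detected_b), len(set_b)))
        if distance(rand_a, rand_b) < observed:
            closer += 1
    return closer / n_iter


# Hypothetical protein identifiers, for demonstration only.
background = {f"P{i}" for i in range(200)}
deps = {f"P{i}" for i in range(20)}
print(permutation_p_value(deps, deps, background, background))
```

In this toy case the two sets are identical, so the observed distance is zero and no random pair can be closer, giving a P value of 0 under this counting scheme.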

scATAC-seq analysis

Detailed information is provided in Supplementary Table 10. In summary, three SUV420H1 mutant and three control organoids were used for each of the 1 month and 3 month timepoints, for a total of twelve individually sequenced organoids. The total number of cells sequenced was 45,988.

scRNA-seq analysis

Detailed information is provided in Supplementary Table 10. In summary, in each dataset, three individual organoids per genotype were profiled. In two cases, one organoid was excluded from the analysis as an outlier: in PGP1 SUV420H1 organoids at 1 month, a mutant organoid was excluded due to very low average nUMI and nGene in that sequencing lane, and in the HUES66 CHD8 organoids at 3.5 months batch II, a mutant organoid was excluded because it contained mostly interneuron lineage cells, with very few projection neuron cells. Although an increase in interneuron-lineage cells was seen in all mutant organoids, this organoid was excluded to be conservative. This left a total of 112 single organoids that passed quality control and were considered in downstream analysis, with a total of 749,370 cells. Adjusted P values for differences in cell type proportions between control and mutant organoids (Figs. 1a–c, 2a, b and 3a, b and Extended Data Figs. 4c–f, 5a–c, 8b, c, e, g, 9a, b, e and 10b–d) were based on logistic mixed models (see the ‘scRNA-seq data analysis’ section). Adjusted P values for differences in the distribution of module scores between control and mutant organoids (Figs. 1f, 2e and 3e and Extended Data Figs. 5e, 8i and 9h) were based on linear mixed models (see the ‘Pseudotime, gene module and differential expression analysis’ section). In Fig. 4, for each comparison of two gene lists, the circles inside the box are coloured and sized according to the significance of the number of overlapping genes in those two lists, reported as the Bonferroni-adjusted P value determined using a hypergeometric test.
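
The gene list overlap test used for Fig. 4 can be illustrated with a minimal sketch of a one-sided hypergeometric test followed by Bonferroni adjustment. This is illustrative Python, not the code used in the study; the size of the gene universe is an assumption that is not stated in the text, and the numbers below are hypothetical.

```python
from math import comb


def hypergeom_overlap_p(n_universe, n_list1, n_list2, n_overlap):
    """P(overlap >= n_overlap) for two gene lists drawn from one universe.

    n_universe (the number of genes that could appear in either list) is an
    ASSUMED input; the text does not specify how it was defined.
    """
    max_overlap = min(n_list1, n_list2)
    total = comb(n_universe, n_list2)
    tail = sum(
        comb(n_list1, k) * comb(n_universe - n_list1, n_list2 - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical comparison: 10-gene universe, two 3-gene lists sharing all 3.
p = hypergeom_overlap_p(10, 3, 3, 3)
print(p, bonferroni([p, 0.5]))
```

The test asks how often an overlap at least as large as the observed one would arise if the second list were drawn at random from the universe.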

Bulk RNA-seq analysis

Three organoids were sequenced per genotype for a total of 30 individual organoids.

Calcium imaging analysis

Five organoids were analysed per genotype. Spontaneous activity was recorded in three different z planes (120 ± 83 neurons per plane (range from 25 to 294 neurons per plane) in control organoids, and 107 ± 75 neurons per plane (range from 32 to 255 neurons per plane) in SUV420H1+/− organoids). P values were calculated from two-tailed t-tests (Fig. 1h, i). P values for cumulative frequency distribution (Extended Data Fig. 7j) of ISI for control and SUV420H1+/− organoids were determined using two-sided Kolmogorov–Smirnov tests. Representative images in Fig. 1g and Extended Data Fig. 7a show one control organoid out of five control and five SUV420H1+/− organoids.

Immunohistochemistry

At least three organoids of each condition were used for verifying the expression of the indicated markers in Extended Data Figs. 1a–c, 3g, 4a, b, 8a, d, f and 9c, d, f.

Western blotting

Each control and mutant protein lysate was blotted at least twice in Extended Data Fig. 3d–f.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.