Main

To investigate the establishment of parvalbumin (PV) and somatostatin (SST) interneuron identities, we first sought to determine the earliest timepoint at which they can be distinguished from projection neurons, the other major derivative in the medial ganglionic eminence (MGE). Our previous work indicated that, at a transcriptional level, this distinction is first detectable in postmitotic populations, when cells diverge into three branches corresponding to specific fates: interneuron precursors (branch 1) and projection-neuron precursors (branches 2 and 3)8. To explore whether there is an earlier chromatin signature that is indicative of interneuron identity, we compared the RNA expression and chromatin accessibility in these precursor populations at embryonic day (E) 13, the peak of MGE interneuron neurogenesis.

Transcription and chromatin in the MGE

To this end, we performed single-cell RNA sequencing (scRNA-seq), single-cell assay for transposase-accessible chromatin with sequencing (ATAC–seq) and multiomic (dual RNA and ATAC) sequencing on the E13 MGE from Dlx6acre;Ai14 mice, using the expression of the Dlx6a-based reporter to discern mitotic progenitors from postmitotic precursors12 (Extended Data Fig. 1a–h, Supplementary Table 1). Diffusion map analysis of E13 MGE scRNA-seq and scATAC–seq datasets resulted in cells being ordered into a single trajectory, in which the primary source of variability was maturation state (Fig. 1a, b). Notably, promoter accessibility for developmentally relevant genes generally increases earlier and persists longer compared to gene expression (Fig. 1c, d, Extended Data Fig. 1i).

Fig. 1: Distal chromatin elements provide the earliest markers of interneuron identity.

a, b, Diffusion maps of MGE cells using scRNA-seq (a) and scATAC–seq (b). Cells order into a maturation trajectory using both methods. c, d, RNA expression of maturation-relevant genes (c) and the corresponding promoter accessibility (d) across the maturation trajectory. Scale denotes normalized row-scaled expression or promoter accessibility. Arrowheads indicate the mitotic–postmitotic transition. e, Diffusion map analysis of scRNA-seq data from postmitotic MGE revealed three distinct branches: branch 1, interneurons; branches 2 and 3, projection neurons. f, Detailed analysis of the Maf locus for gene expression, promoter and aggregated accessibility of distal elements shown for early postmitotic (trunk) and branch-specific cells. The heat maps at the bottom show Maf distal elements compared for their accessibility across branches. The timeline on the right indicates the earliest time at which each branch can be distinguished by gene expression. g, Gene expression (red), promoter accessibility (purple), and aggregated accessibility of distal elements (blue) for four branch 1 marker genes across the pseudotime.

To identify the earliest features that are indicative of an interneuron (branch 1) identity, we subsetted postmitotic neurons and used several maturation trajectory methods to identify the three developmental branches, using the multiomic dataset to ensure accurate label transfer between scRNA-seq and scATAC–seq analyses (Fig. 1e, Extended Data Figs. 2a, b, 3a–c). Next, we performed differential expression analysis to identify the earliest branch 1 genes expressed in postmitotic cells. To identify distal chromatin elements associated with these genes, we collated ATAC-seq peaks surrounding them and trained a classifier to identify those that were most informative for distinguishing branch identity (Methods). Further analysis confirmed that these peaks were predominantly accessible in branch 1 cells (Extended Data Fig. 4a, b, Supplementary Table 2).

Narrowing our focus to transcription factors, we found that Maf—a regulator of cell identity13,14—is both highly expressed in branch 1 interneurons and exhibits an enriched DNA-binding motif in their open chromatin (Extended Data Fig. 3). Examination of chromatin accessibility surrounding this gene revealed that distal loci become accessible prior to the initiation of gene expression (Fig. 1f). Similar dynamics were observed for several other branch-1-specific transcription factors (Fig. 1g, Extended Data Fig. 4a, b), suggesting that distal elements might provide the earliest indication of interneuron identity, perhaps as a form of priming15. We therefore explored the accessibility of these distal elements prior to cell cycle exit. To this end, we used our classifier to annotate mitotic and postmitotic cells collected at E13. This failed to detect branch 1 interneurons prior to cell cycle exit (Extended Data Fig. 4c). Therefore, there is no evidence at present of the divergence of interneurons from projection cells prior to cell cycle exit.

Remodelling upon cortical settling

To investigate how early interneuron identity diverges, we next collected scRNA-seq and scATAC–seq datasets from the E18, postnatal day (P) 2, P10 and P28 cortex of Dlx6acre;INTACT mice (Supplementary Table 1). In adult cortical interneurons, scRNA-seq and scATAC–seq profiles are closely correlated16,17,18. However, our investigation of these populations at E13 reveals a misalignment between transcriptional and chromatin signatures in developing cells. To evaluate when during development they come into concordance, we aligned these datasets at each timepoint using both Seurat canonical correlation analysis19 and Conos20 (Fig. 2a–d, Extended Data Figs. 5–7). Analysis of the more mature populations (P10, P28) confirmed that cluster labels can be transferred from scRNA-seq to scATAC–seq data with high confidence (Fig. 2c, d). By contrast, this was not the case at early postnatal or embryonic timepoints (E18, P2) (Fig. 2a, b).

Fig. 2: Remodelling of interneuron chromatin architecture during migration and post-settling.

a–d, UMAP of co-embedded scRNA-seq and scATAC–seq data (top) and prediction scores of scATAC–seq assignment to scRNA-defined clusters (bottom) at E18 (a), P2 (b), P10 (c), P28 (d). e, Jaccard distance analysis for PV and SST cell scATAC–seq peaks across timepoints. Ctx, cortex; ALM, anterior lateral motor cortex; V1, primary visual cortex. f, Average signal within cell-type-specific accessible peaks identified at P28 located for proximal elements (gene bodies or promoters: TSS ±2 kb) or distal elements across timepoints. SPMR, signal per million reads. g, Aggregated scores (AS) for branch-specific peaks. h, Motif enrichment in class-specific loci at each timepoint. Each transcription factor enrichment value is normalized by the largest enrichment value in the population. i, Relative distribution of P28 MEF2C CUT&RUN peaks in PV and SST interneurons. Peaks represent the intersection of peaks across two biological replicates for each cell type. j, Peaks identified exclusively in CUT&RUN (C&R; orange) or jointly in CUT&RUN and ATAC–seq (green) analyses. k, Number of genes linked to CUT&RUN peaks found uniquely in PV cells (blue), SST cells (red) or shared in both populations (purple). Peaks were assigned to genes based on scATAC co-accessibility with promoters.

P2 is the timepoint at which interneurons complete migration, reach their settling position in the cortex and begin circuit integration21. Consistent with our RNA/ATAC alignment comparisons, Jaccard analysis of scATAC–seq data across development segregates timepoints before and after P2 (Fig. 2e). To examine the emergence of subtype-specific features at this transition, we analysed peaks unique to PV (44,322) or SST (24,428) interneurons at P28 and found that these mature subtype-specific elements are first detectable in distal loci at P2 (Fig. 2f, Extended Data Fig. 8a–d). Therefore, the developmental window during which interneurons settle within cortical layers represents a period in which chromatin structure undergoes substantial remodelling.

To characterize this transition, we catalogued accessible chromatin elements specific to PV or SST interneurons at both migratory (E14–E18) and post-settling (P2–P28) timepoints. Branch-1-specific loci generally become less accessible at this juncture, which is consistent with their role in establishing common interneuron identity (Fig. 2g). Next, we identified transcription-factor motifs enriched during these two periods, the majority of which were shared between PV and SST interneurons: LHX and DLX motifs during migration, and FOS/JUN motifs post-settling (Supplementary Table 3). By contrast, few transcription factors were cell-type-selective during these periods. In SST cells, SOX and POU motifs were sequentially enriched during migratory and post-settling periods, respectively (Fig. 2h). In PV cells, MEF motifs became enriched at E18 and persisted post-settling. These dynamic patterns correspond with the known function of these transcription factors during these phases of interneuron development8,22,23,24.

Our scATAC–seq analysis suggests that certain transcription factors are utilized in a cell-type-specific manner despite not being selectively expressed. To investigate this possibility, we performed CUT&RUN sequencing for MEF2C in the mature PV and SST populations. This revealed both common and subtype-specific binding of MEF2C, with more loci specifically bound in PV cells compared to SST cells (Fig. 2i, Extended Data Fig. 9a–d). Most of the peaks identified with CUT&RUN were also found in our scATAC–seq analysis, although only a subset of these featured canonical MEF2C-binding motifs (Fig. 2j, Extended Data Fig. 9e). We could therefore use peak co-accessibility to link CUT&RUN peaks to gene promoters (Fig. 2k). Gene ontology analysis revealed that these putative MEF2C targets are involved in synapse assembly and organization (Extended Data Fig. 9j, k).

Divergence of gene regulatory networks

The divergence in the gene expression and chromatin structure of PV and SST interneurons greatly accelerates upon settling within the cortex. To characterize these dynamics, we constructed gene regulatory networks (GRNs) using a multitask inference methodology25 and separated shared regulatory interactions from cell-type-specific ones across development (E18, P2, and P28) (Fig. 3a, b). Each regulator–target interaction is represented in our GRNs as an edge linking transcription factors to genes. The number of detected genes and edges progressively increases at each sequential timepoint (Fig. 3c, d). Over development, cell-type-specific edges replace shared ones (Fig. 3e, f). Notably, transcription factors co-expressed within both cell types shift from regulating common to cell-type-specific genes (Fig. 3g). This observation suggests that the chromatin landscape has been sculpted such that cell-type-specific gene expression can be mediated without a requirement for selective transcription factor expression.

Fig. 3: The maturation of gene networks is characterized by the emergence of cell-type-specific regulatory interactions.

a, Schematic illustrating the key findings of the GRN analysis. Early in development, shared transcription factors generally target the same genes in both PV and SST cells. By adulthood, cell-type-specific programs take over, with transcription factors regulating genes in a cell-type-specific manner. b, Graphical representation of unique and common gene regulatory edges constructed using GRN analysis. Edges that explain at least 0.05% of the target’s variance are included. Each edge (green line) connects a transcription factor (black dot) to a target gene (red dot) or another transcription factor. c, Total number of genes detected in PV and SST interneurons at each timepoint. For P2 and P28, gene expression is divided into maintained from the earlier timepoint (bottom) versus newly expressed (top). Genes with fewer than 10 counts across all cells were excluded. d, Number of edges at each timepoint. In d–f, the x axis represents the edges that explain up to a certain value of their target’s variance. e, Edges shared between PV and SST. f, Proportion of the GRN unique to PV or SST cells at each age. g, The proportion of unique targets of common transcription factors. Transcription factors were considered common if they had an absolute log-transformed fold change ≤0.25 and were expressed in more than 10% of both PV and SST cells. Statistics within graphs represent the mean percentage of unique edges per common transcription factor.

Loss of Mef2c perturbs development

Having developed a comprehensive model of gene regulation in PV and SST interneurons, we aimed to investigate how the removal of a critical transcription factor affects gene expression and test our ability to predict these perturbations in silico. In a previous study, we found that Mef2c is essential for the development of PV interneurons, resulting in their loss after P6 (ref. 8). This is consistent with the enrichment of MEF2C motifs in PV-specific ATAC–seq peaks (Fig. 2h). However, our CUT&RUN analysis revealed considerable utilization of MEF2C within both PV and SST cells (Fig. 2i–k). Thus, Mef2c is an ideal exemplar with which to examine both the unique and shared aspects of gene regulation across these developing cell types.

Using an Lhx6iCre driver to remove Mef2c in both PV and SST cells, we examined the effect on gene expression and chromatin structure at P2. We first integrated scRNA-seq data from these P2 Mef2c conditional knockout (cKO) mice with that from P2 wild-type (WT) mice. Unbiased clustering analysis revealed that a greater number of cKO cells colocalize with wild-type SST interneurons than with PV cells, indicating that the loss of Mef2c disproportionately disrupts PV cell maturation (Fig. 4a–c). The fragility of cKO neurons necessitated collecting single nuclei for this experiment. To ensure that the differences seen between cells from WT and cKO mice were not a result of comparing whole-cell with nuclear RNA-seq data, we collected an additional nuclear dataset from P2 WT mice and confirmed that, when compared to cKO, both the clustering and differential gene expression were highly similar to the results using whole-cell WT data (Extended Data Fig. 10a–c, Supplementary Table 4).

Fig. 4: Loss of Mef2c disproportionately affects the gene regulatory landscape of PV cells.

a, Uniform manifold approximation and projection (UMAP) of integrated E18, P2 and P2 Mef2c cKO scRNA-seq data. b, UMAP in a segregated according to timepoint and colour-coded by cell type. c, Cluster composition delineated by, from top to bottom, marker gene expression, cell type, timepoint and cell number. d, Proportion of scATAC–seq peaks with MEF2C motifs in E18 WT, P2 WT, and P2 Mef2c cKO datasets. Charts are scaled to reflect the total number of peaks. e, Venn diagram showing all peaks with MEF2C motifs in WT P2 and cKO P2 in PV cells (left) and SST cells (right). f, Number of MEF2C-binding sites (identified in the MEF2C CUT&RUN experiment, see Fig. 2) that were either accessible in both the P2 WT and Mef2c cKO scATAC–seq dataset (light green) or were accessible in the P2 WT cells but inaccessible in the cKO (dark green). g, UMAP of simulated Mef2c knockout cells (P2SimKO) and cells from the true cKO RNA-seq dataset (P2cKO), P2 WT RNA-seq dataset (P2WT) and E18 WT RNA-seq dataset (E18).

Our scATAC–seq analysis of cKO cells confirmed the strong effect of Mef2c loss on PV cell development, but also revealed the more subtle effect on SST cells. In clustering analysis of scATAC–seq data, both PV and SST cKO cells segregate from WT cells (Extended Data Fig. 10d–g). Furthermore, during normal development, cell-type-specific and shared peaks featuring Mef2c motifs increase in number between E18 and P2. The overall number of peaks seen in the P2 Mef2c cKO cells, however, resembles that seen at E18, indicating a delay in their maturation (Fig. 4d, e). Notably, the proportion of PV-specific peaks was selectively reduced in the cKO cells (Fig. 4d, f, Extended Data Fig. 10e). These findings indicate that Mef2c sculpts the chromatin landscapes of both PV and SST interneurons during early postnatal development, and has a particularly important role in the opening of PV-specific loci.

To demonstrate that GRNs are more than a static representation of cell state, we queried whether we could use them to quantitatively predict changes in gene expression. First, we generated simulated Mef2c knockout neurons (P2SimKO) by computationally removing Mef2c in the P2 GRN. By integrating these cells with the experimental cKO (P2cKO) and wild-type P2 (P2WT) cells, we observed the formation of clusters primarily composed of P2SimKO and P2cKO cells (Fig. 4g, clusters 4 and 5), confirming that our simulated output matched that seen in true cKO cells. Next, we examined the 364 direct targets of Mef2c predicted by the GRN, and found that 81% of them were differentially expressed between P2WT and P2cKO datasets (Supplementary Table 4). These direct targets included 17 transcription factors. We next iteratively calculated the differential expression of their targets (4,205 in total) and found that 81% of these predicted second-order Mef2c targets were correctly inferred. The same analysis comparing cKO data to single-nucleus rather than whole-cell WT data similarly found that 79% of direct and 77% of indirect targets were differentially expressed (Supplementary Table 4). Finally, as an independent validation of our P2 GRN, we re-calculated Mef2c transcription-factor activity using gene expression values measured in the P2cKO dataset. This computational experiment reported a transcription-factor activity centred around zero, compared to the unaltered P2 values that centre around −0.1; this validates that removal of Mef2c eliminates its transcription-factor activity from the network. Taken together, these results demonstrate that, with sufficient scRNA-seq and ATAC–seq data, we are now able to accurately infer the effects of gene loss in silico.

Discussion

In this study, we examined how PV and SST interneurons arise from common progenitors to become molecularly and functionally distinct by adulthood. We observed that these cell types initially share a molecular program that establishes a general interneuron identity before becoming diversified through the actions of cell-type-specific transcription factors. The maintenance of their distinct fates is stabilized in mature cells through the crystallization of unique chromatin landscapes. Our analysis revealed that RNA expression gives an instantaneous measure of the developmental state of a cell, whereas chromatin provides both a history of the developmental progression of a cell and a predictor of its future identity.

As development proceeds, we found that RNA-seq and ATAC–seq data remain discordant until interneurons reach their settling positions and begin terminal differentiation at P2. This represents a critical inflection point at which cells transition from having predominantly shared chromatin architecture to establishing cell-type-specific landscapes. Notably, the comparison of GRNs at different developmental timepoints revealed that, in adulthood, shared transcription factors are able to direct cell-type-specific gene programs. This finding indicates the strong contribution of chromatin architecture to the maintenance of adult fate. Indeed, the ability to reprogram terminally differentiated cells requires the erasure of the unique chromatin marks that characterize them as mature populations26,27.

In addition to providing a global overview of cell-type-specific molecular programs, GRNs are powerful predictive models. With the anticipated improvements that will be gained through iterative analysis, this has far-reaching implications both for developmental biology and for assessing the effect of candidate genes on cell-type-specific development. In this regard, the fact that loss-of-function mutations of MEF2C in humans result in autism spectrum disorder and intellectual disability28,29 suggests that analyses such as this have the potential to provide important insight into the etiology of these disorders. Our work therefore not only reveals how specific interneuron identities arise but also provides a roadmap for understanding both normal and pathophysiological development.

Methods

Mouse lines

For embryonic timepoints, male hemizygous Dlx6acre mice30 (Jax stock 008199) were crossed with female homozygous Ai9 or Ai14 mice31 (Rosa-LSL-tdTomato, Jax stock 007909 or 007914) and checked daily for the presence of a vaginal plug, with the morning a plug was found being considered embryonic day 0. Embryos were screened using a fluorescence microscope for the presence of tdTomato. For embryonic timepoints using wild-type mice, timed pregnant CD-1 mice were used (Charles River 022). For postnatal timepoints, male hemizygous Dlx6acre mice were crossed with female homozygous INTACT mice32 (flox-Sun1-eGFP, Jax stock 021039) to yield Dlx6acre;INTACT offspring. In Mef2c conditional knockout experiments, for wild-type controls, Dlx6acre males were crossed with female homozygous Ai9 mice to yield offspring with both alleles collected at P2. For Mef2c conditional knockouts, male Lhx6iCre/o;Mef2cfl/+ mice33,34 (Jax stock 026555 and 025556) were crossed with INTACT;Mef2cfl/fl females to yield offspring with all necessary experimental alleles collected at P2. All mouse colonies were maintained in accordance with protocols approved by the Institutional Animal Care and Use Committees at Harvard Medical School and the Broad Institute of MIT and Harvard. Both male and female mice were used for all scATAC–seq and scRNA-seq experiments. Blinding and randomization of animal experiments were not applied here.

Single-cell preparation for scATAC–seq and scRNA-seq

For both ATAC–seq and RNA-seq library preparation from embryonic timepoints (E13–E18), embryos were collected from pregnant dams and relevant brain regions were dissected in ice-cold Leibovitz’s L-15 medium. Pooled tissue from several embryos was dissociated using the Neural Tissue Dissociation Kit (T) (Miltenyi Biotec, PN-130-093-231) according to the manufacturer’s instructions. Dissociated cells were then filtered through a 70 µm filter, centrifuged at 300g for 5 min at 4 °C and, where cell sorting was required, resuspended in 1% BSA in PBS for sorting of tdTomato+ cells. Sorting was performed on a Sony SH800S cell sorter with a 100 µm chip. Cells for scRNA-seq were then directly used for input to 10x Genomics scRNA-seq library prep. For scATAC–seq, nuclei were isolated from sorted cells as described in the 10x Genomics demonstrated protocol ‘Nuclei Isolation for Single Cell ATAC Sequencing’ for input to single-cell ATAC library preparation. For the P2 whole-cell WT RNA-seq dataset, cell preparation was performed as described above, except with ACSF used for the dissection instead of L-15.

For P10 and P28 RNA-seq datasets, as well as the Mef2c cKO P2 dataset and control wild-type nuclear P2 dataset, nuclei were isolated as described previously with some modifications32. In brief, brains were collected, sectioned coronally on a stainless-steel mouse brain slicer (Zivic Instruments), and regions of interest were dissected in ice-cold homogenization buffer (HB) (0.25 M sucrose, 25 mM KCl, 5 mM MgCl2, 20 mM Tricine-KOH, 1 mM DTT, 0.15 mM spermine, 0.5 mM spermidine). Dissected tissue was then transferred to a 2 ml Dounce homogenizer containing HB + 0.3% IGEPAL-CA630 + 0.2 U µl−1 RNase inhibitor and homogenized with 10 strokes of pestle A and 10 strokes of pestle B. Nuclei were then filtered through a 30 μm filter and centrifuged at 500g for 5 min at 4 °C. The pellet was resuspended in 1% BSA + 0.2 U µl−1 RNase inhibitor in PBS and centrifuged again at 500g for 5 min at 4 °C. The pellet was then again resuspended in 1% BSA + 0.2 U µl−1 RNase inhibitor in PBS and filtered through a 40 μm filter for sorting of GFP+ nuclei on a Sony SH800S cell sorter with a 100 µm chip. Nuclei were collected in 1% BSA + 0.2 U µl−1 RNase inhibitor for input to 10x Genomics scRNA-seq.

For all ATAC–seq postnatal timepoints (P2 WT and cKO, P10, P28), brains were collected, sectioned coronally on a stainless-steel mouse brain slicer (Zivic Instruments), and regions of interest were dissected in ice-cold ACSF. Tissue was then transferred to a Dounce homogenizer containing lysis buffer (10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, 0.01% Tween-20, and 0.01% IGEPAL CA-630, 0.001% digitonin). Tissue was homogenized with 10 strokes of pestle A, 10 strokes of pestle B, and incubated for 5 min on ice before being filtered through a 30 μm filter and centrifuged at 500g for 10 min at 4 °C. The pellet was resuspended in 1% BSA in PBS for sorting for GFP+ nuclei. For all timepoints, cells or nuclei were sorted into diluted nuclei buffer (10x Genomics) on a Sony SH800S cell sorter with a 100 µm chip.

Single-cell ATAC–seq and RNA-seq library preparation and sequencing

Single-cell ATAC–seq libraries for all timepoints were prepared on the 10x Genomics platform using the Chromium Single Cell ATAC Library & Gel Bead Kit v1.0 (PN-1000111), Chromium Chip E Single Cell kit (PN-1000156) and Chromium i7 Multiplex Kit N, Set A (PN-1000084) as instructed by the manufacturer. Single-cell RNA-seq libraries for E13, P2 Mef2c cKO, P10 and P28 datasets were prepared on the 10x Genomics platform using the Chromium Single Cell 3ʹ Library and Gel Bead Kit v3.0 (PN-1000075), Chromium Single Cell 3ʹ Library Construction Kit v3.0 (PN-1000078), Chromium Chip B Single Cell kit (PN-1000154) and Chromium i7 Multiplex Kit (PN-120262) as instructed by the manufacturer. The P2 nuclear RNA-seq library was prepared using the Chromium Single Cell 3ʹ Library and Gel Bead Kit v3.1 (PN-1000121). Single-cell RNA-seq libraries for E18 and P2 whole-cell wild-type datasets were prepared on the 10x Genomics platform using the Chromium Single Cell 3ʹ Library and Gel Bead Kit v2.0 (PN-120237), Chromium Single Cell 3ʹ Library Construction Kit v2 (PN-120267), Chromium Chip A Single Cell kit (PN-1000009) and Chromium i7 Multiplex Kit (PN-120262) as instructed by the manufacturer. Libraries were sequenced using the Nova-Seq 100 cycle kit (Illumina) by Broad Institute Genomic Services.

Single-cell ATAC–seq data processing

Raw scATAC–seq data was processed using the standard CellRanger ATAC v1.1.0 pipeline (10x Genomics). Reads were aligned to the GRCm38 (mm10) Mus musculus genome. Peak calling was performed using ChromA with default parameters “ChromA atac -i <file1> <file2> -spec mouse”. Peak calling on subpopulations of cells was performed with ChromA after isolating specific barcodes using “ChromA filter -c <cells> -i <file>”. Jaccard distance calculation was performed using bedtools jaccard.
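The quantity behind the Jaccard comparison of peak sets can be illustrated with a minimal sketch. It is not the bedtools implementation: it assumes half-open (start, end) intervals on a single chromosome, and the function names are hypothetical. The Jaccard statistic is taken as the number of base pairs in the intersection divided by the number in the union, and the distance as one minus that ratio.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total number of base pairs covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def jaccard_distance(peaks_a, peaks_b):
    """1 - (bp in intersection / bp in union) for two peak sets.

    Assumption: all peaks lie on the same chromosome.
    """
    a, b = merge(peaks_a), merge(peaks_b)
    i = j = inter = 0
    # Two-pointer sweep over the sorted, merged interval lists.
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            inter += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = total_length(a) + total_length(b) - inter
    return 1.0 - inter / union if union else 0.0
```

For real data, peaks would be grouped by chromosome first and the intersection and union lengths summed across chromosomes before taking the ratio.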

Single-cell RNA-seq data processing

Raw scRNA-seq data was processed using the standard CellRanger v3.0.0 pipeline (10x Genomics). Reads were aligned to the GRCm38 (mm10) M. musculus genome. For single-nuclei RNA-seq datasets, a custom pre-mRNA reference from the mm10 genome was generated allowing intronic sequences to be included in gene counts. It cannot be ruled out that some differences in gene expression could result from differences in 10x Genomics chemistry (v2 vs v3) or cell preparation techniques (whole-cell vs nuclear).

E13 Multiome (dual RNA/ATAC) library preparation and data processing

E13.5 embryos were collected from timed pregnant CD-1 mice and the MGE was dissected in ice-cold Leibovitz’s L-15 medium. Pooled tissue from several embryos was dissociated using the Neural Tissue Dissociation Kit (T) (Miltenyi Biotec, PN-130-093-231) according to the manufacturer’s instructions. Dissociated cells were then filtered through a 70 µm filter, and centrifuged at 300g for 5 min at 4 °C. Nuclei were then isolated as described in the 10x Genomics demonstrated protocol ‘Nuclei Isolation from Embryonic Mouse Brain for Single Cell Multiome ATAC + Gene Expression Sequencing’ for fresh tissue for input to single-cell Multiome library preparation. Single-cell multiomic (ATAC + Gene Expression) libraries were prepared on the 10x Genomics platform using the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Reagent Bundle (PN-1000285), Chromium Next GEM Chip J Single Cell Kit (PN-1000230), Single Index Kit N Set A (PN-1000212) and Dual Index Kit TT Set A (PN-1000215) as instructed by the manufacturer. Libraries were sequenced using the Nova-Seq 100 cycle kit (ATAC library) and Next-Seq 150 cycle kit (Gene Expression library) (Illumina) by Broad Institute Genomic Services.

Raw multiomic data was processed using the Cell Ranger ARC v1.0.0 pipeline (10x Genomics). Reads were aligned to the GRCm38 (mm10) M. musculus genome.

Assignment of mitotic and postmitotic states

We defined mitotic and postmitotic populations for cells within scRNA-seq datasets following a previously described procedure8. In brief, in each cell we compared the sum of phase-specific gene expression to a background distribution derived from 100 randomly selected genes. Phase-specific enrichment was z-scored against the background gene sets. Finally, the S-phase and G2/M-phase scores were subtracted to create the cell cycle score, which was used to select cycling cells. The cell cycle signature was calculated using the previously published cell cycle gene list35. We did not regress out the cell cycle signature. In scATAC–seq datasets, cells were assigned a developmental phase by extending labels from the multiome experiments: the scRNA-seq portion of the multiome data was annotated using the above procedure, and these labels were propagated to the corresponding scATAC–seq dataset.
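The z-scoring step can be sketched as follows. This is one possible reading of the procedure, not the published code: it assumes the background consists of 100 random gene sets, each the same size as the phase-specific set, drawn from the genes measured in the cell. The function name, the `expr` dictionary layout and the fixed random seed are illustrative assumptions, and the final combination of S-phase and G2/M-phase scores is deliberately left out.

```python
import random
import statistics


def phase_zscore(expr, phase_genes, n_background=100, seed=0):
    """Z-score the summed expression of a phase gene set in one cell.

    expr        : dict mapping gene name -> expression in this cell
    phase_genes : iterable of gene names marking one cell cycle phase
    Assumption: background = n_background random gene sets of equal size.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    present = [g for g in phase_genes if g in expr]
    observed = sum(expr[g] for g in present)
    # Summed expression of each random background gene set.
    background = [
        sum(expr[g] for g in rng.sample(genes, len(present)))
        for _ in range(n_background)
    ]
    mu = statistics.mean(background)
    sd = statistics.pstdev(background)
    # A flat background carries no information, so report no enrichment.
    return (observed - mu) / sd if sd > 0 else 0.0
```

A cell in which the phase markers are strongly expressed relative to random genes receives a large positive score, whereas a cell with uniform expression receives a score of zero.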

Pseudotime trajectory analysis

Branch analysis for postmitotic cells from the scRNA-seq datasets was performed as described previously8. In brief, postmitotic cells were isolated and the Seurat pipeline was run19. Unsupervised clustering was run using the Leiden algorithm. Next, diffusion map representation was calculated using the diffusionMap library in R. Finally, 10 principal component analysis components were fed into Palantir36 and pseudotime and branch probabilities were calculated using the following parameters (num_waypoints=1000, knn=30). Lineages were defined using slingshot37.

Generation of pseudotime time traces of gene expression and accessibility

Chromatin accessibility and gene expression time traces were smoothed over pseudotime with local polynomial regression fitting in R (loess) separately, then min–max normalized.
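A simplified stand-in for this smoothing and normalization is sketched below. It is not R's loess: it fits a local linear (degree 1) regression with tricube weights, and the `span` value and function names are assumptions chosen for illustration.

```python
def loess_like(x, y, span=0.5):
    """Local linear regression with tricube weights (simplified LOESS).

    x : pseudotime values, y : accessibility or expression values.
    span : fraction of points used in each local fit (assumed value).
    """
    n = len(x)
    k = max(2, int(round(span * n)))
    fitted = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest point (including x0).
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def min_max(values):
    """Rescale a trace to the [0, 1] range; a flat trace maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Each modality (expression, promoter accessibility, distal accessibility) would be passed through `loess_like` and then `min_max` independently, so that traces with different dynamic ranges can be compared on a common scale.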

Peak gene branch-specific association

To associate chromatin accessibility peaks annotated with ChromA38 to corresponding genes, we treated each gene independently. First, we selected peaks located within a 500 kb window around the gene and used them as features in an elastic-net-regularized multinomial classification task (sklearn SGDClassifier with the following parameters: class_weight='balanced', l1_ratio=0.95, loss='log', penalty='elasticnet') set to distinguish cells confidently assigned to each postmitotic branch. Hyperparameters were tuned using cross-validation, and performance was estimated using an 80% (train), 10% (validation), 10% (test) observation split. Finally, peaks with an absolute coefficient >0 were kept as the most classification-relevant and sorted. The distal aggregated accessibility score was built by first performing a min–max normalization of each peak time trace (through pseudotime). Next, time traces were added and min–max normalized again. The same classification procedure was used for annotated cells belonging to each branch or early mitotic cells. Mitotic cells were distinguished by the mitotic–postmitotic separation. Early mitotic cells were those mitotic cells that lacked expression of Sp9.
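The construction of the distal aggregated accessibility score from the retained peaks can be sketched as below. The sketch assumes that each peak's time trace has already been smoothed and sampled at the same pseudotime points; the function names are hypothetical.

```python
def min_max(trace):
    """Rescale one time trace to [0, 1]; a flat trace maps to zeros."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return [0.0] * len(trace)
    return [(v - lo) / (hi - lo) for v in trace]


def distal_aggregate(peak_traces):
    """Combine per-peak traces into one aggregated accessibility trace.

    peak_traces : list of equal-length lists, one per retained distal peak.
    """
    # Step 1: normalize every peak so that no single peak dominates.
    normalized = [min_max(t) for t in peak_traces]
    # Step 2: add the traces point by point along pseudotime.
    summed = [sum(vals) for vals in zip(*normalized)]
    # Step 3: normalize the summed trace again.
    return min_max(summed)
```

Normalizing before summing gives each selected peak equal weight in the aggregate regardless of its absolute read depth.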

Calculation of ATAC–seq aggregated scores

Aggregated scores were calculated by selecting a subset of peaks that are differentially accessible in the condition of interest (that is, peaks for the calculation of branch 1 aggregated scores are peaks differentially accessible in branch 1 cells compared to branch 2 and 3). Next, library size was corrected by normalizing the number of reads in each region by the total number of reads in the cell. The normalized read size was aggregated by summing reads in selected regions. Finally, an average aggregated accessibility was computed.
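A minimal sketch of this calculation follows. It assumes that the final average is taken across cells, which the description above does not state explicitly; the data layout and names are illustrative.

```python
def aggregated_score(cell_counts, cell_totals, selected_peaks):
    """Average library-size-normalized signal over a set of peaks.

    cell_counts    : list of dicts, peak id -> reads in that peak, per cell
    cell_totals    : list of total read counts, one per cell
    selected_peaks : peaks differentially accessible in the condition
    Assumption: the final average is taken across cells.
    """
    per_cell = []
    for counts, total in zip(cell_counts, cell_totals):
        if total == 0:
            continue  # skip empty cells to avoid division by zero
        # Normalize each region by library size, then sum the regions.
        per_cell.append(
            sum(counts.get(p, 0) / total for p in selected_peaks)
        )
    return sum(per_cell) / len(per_cell) if per_cell else 0.0
```

For example, two cells with 10 reads each, carrying 4 and 1 reads in the selected peaks, yield per-cell scores of 0.4 and 0.1 and an aggregated score of 0.25.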

scATAC–seq cell-type classification

Cardinal class assignment into PV and SST cell types at P28 and P10 was performed by analysing clusters for promoter accessibility of cardinal class marker genes. Next, to transfer labels to E18, we identified mutually exclusive peaks shared in each cardinal class at P28 and P10 (that is, running ChromA consensus on datasets from PV cells belonging to P10 and P28 or SST at the same time points). We used these peaks to build a classifier identifying cardinal classes and using CGE cells as a null hypothesis. We repeated the same procedure to transfer labels at E14 by using mutually exclusive peaks from E18 and P10.

Clustering of scRNA-seq datasets

E18, P2, P10 and P28 scRNA-seq datasets were processed using the Seurat pipeline19. The number of principal components used for clustering analysis was determined with the ElbowPlot function. Graph-based clustering was then used as described to assign cells to clusters. Clusters in the P28 dataset were then assigned to a cardinal class on the basis of the expression of marker genes. Cells within the P10, P2 and E18 datasets were assigned to a cardinal class via Seurat canonical correlation analysis (CCA) and label transfer of P28 dataset cardinal class assignments as described previously8.

Generation of snap objects and integration of scATAC–seq and scRNA-seq data

The fragments.tsv files from the output of the CellRanger ATAC pipeline were used to generate snap files for analysis using the snapATAC package as described previously18. The genome was partitioned into 5 kb bins to create the cell-by-bin matrix. Cells were clustered using k-nearest-neighbour graph-based clustering (k = 15). For visualization of marker gene loci, accessibility within gene body regions was calculated, counts were normalized by RPM, and the resulting cell-by-gene matrix was smoothed with MAGIC. For integration with scRNA-seq data, a cell-by-gene matrix was produced for the scATAC–seq dataset with the same method for all variable genes within the RNA-seq dataset. The scATAC–seq cell-by-gene matrix was then converted to a Seurat object using the snapATAC function snapToSeurat. Definition of transfer anchors and alignment of scATAC–seq and scRNA-seq data were then performed as described previously19.

Generation of accessibility heat maps

Accessibility heat maps (for cardinal-class-specific regions and comparison of proximal versus distal elements) were generated using deepTools computeMatrix and plotHeatmap. Reference files were bed files of peaks exclusively called within the given cardinal class. For comparison of proximal and distal elements, proximal elements were defined as those regions falling within gene bodies or promoter regions (transcription start site (TSS) ±2 kb), whereas distal elements comprised everything outside these regions.
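The proximal versus distal definition can be illustrated with the short sketch below. It assumes half-open genomic intervals, that any overlap with a gene body or promoter window counts as "falling within" it, and that the TSS sits at the 5′ end of the annotated gene; the function name classify_peak and the tuple layout are hypothetical.

```python
def classify_peak(peak, genes, flank=2000):
    """Label a peak 'proximal' or 'distal'.

    peak:  (chrom, start, end)
    genes: list of (chrom, start, end, strand), strand is '+' or '-'
    Assumption: overlap (not full containment) is enough to be proximal.
    """
    chrom, p_start, p_end = peak
    for g_chrom, g_start, g_end, strand in genes:
        if g_chrom != chrom:
            continue
        tss = g_start if strand == "+" else g_end
        # Union of the gene body and the promoter window (TSS +/- flank).
        lo = min(g_start, tss - flank)
        hi = max(g_end, tss + flank)
        if p_start < hi and p_end > lo:
            return "proximal"
    return "distal"


genes = [("chr1", 10000, 20000, "+")]
label_upstream = classify_peak(("chr1", 8500, 8900), genes)   # inside TSS - 2 kb
label_far = classify_peak(("chr1", 7000, 7500), genes)        # outside the window
```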

Motif enrichment analysis

Transcription factor motif enrichment analysis was performed on appropriate bed files using Homer39 with the following parameters: findMotifsGenome.pl input.bed mm10 output -size 200 -len 8. For identification of branch-specific transcription factors, the intersection was found between motifs significantly enriched in only one branch (within branch-exclusive loci) and for which the corresponding transcription factor was differentially expressed within the same branch compared to the other two branches. In Fig. 2, motif enrichment is quantified as the percentage enriched, normalized to the enrichment of the top percentage enriched motif to account for differences in the overall number of accessible peaks across developmental datasets.

Mef2c CUT&RUN library preparation

Nuclear isolation was performed as described previously32. The neocortex was dissected from adult Pvcre;INTACT and Sstcre;INTACT mice in ice-cold buffer HB (0.25 M sucrose, 25 mM KCl, 5 mM MgCl2, 20 mM Tricine-KOH pH 7.8, 1 mM DTT, 0.15 mM spermine, 0.5 mM spermidine, 1X cOmplete, EDTA-free Protease Inhibitor Cocktail (Sigma) and 10 mM sodium butyrate). Cortices were homogenized with a Dounce homogenizer in the presence of 0.3% IGEPAL. The homogenate was then filtered through a 30 µm MACS SmartStrainer (Miltenyi Biotec), mixed with working solution (5 volumes of OptiPrep + 1 volume of Diluent (150 mM KCl, 30 mM MgCl2 and 120 mM Tricine-KOH pH 7.8)) at a 1:1 ratio, and underlaid with 30% and 40% gradient solution (working solution diluted with buffer HB). Following ultracentrifugation at 10,000g for 18 min at 4 °C, nuclei were collected from the interface of the 30% and 40% layers and then resuspended in FACS buffer (1X PBS, 1% BSA, 1X Protease Inhibitor Cocktail and 10 mM sodium butyrate) for sorting GFP-positive nuclei on a Sony SH800S sorter.

CUT&RUN sequencing on sorted nuclei was performed as described previously40. Nuclei were resuspended in CUT&RUN wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.2% Tween 20, 1 mg ml−1 BSA, 0.5 mM spermidine, 1X protease inhibitor cocktail, and 10 mM sodium butyrate) and captured with Concanavalin A beads (Bangs Laboratories) at room temperature for 10 min. Bead-bound nuclei were then incubated with MEF2C antibody (182901-AP, Proteintech, 1:100) in antibody buffer (CUT&RUN wash buffer, 0.1% Triton X-100, and 2 mM EDTA) overnight at 4 °C. The next day, beads were washed twice with antibody buffer and then incubated with pA-MNase (0.7 µg ml−1) in antibody buffer at 4 °C for 1 h. After two washes with Triton wash buffer (CUT&RUN wash buffer and 0.1% Triton X-100), tethered cleavage was induced by the addition of CaCl2 at a final concentration of 3 mM on ice for 30 min. To release the protein–DNA complex, nuclei were combined with 2X STOP solution (340 mM NaCl, 20 mM EDTA, 4 mM EGTA, 0.04% Triton X-100 and 50 µg ml−1 RNase A) at a 1:1 ratio and incubated at 37 °C for 20 min. Solubilized chromatin was separated from bead-bound nuclei on a magnetic rack, supplemented with 0.1% SDS, and then digested with 0.2 mg ml−1 Proteinase K at 65 °C for 1 h. Released DNA was extracted with phenol chloroform. CUT&RUN libraries were constructed using NEBNext Ultra II DNA Library Prep Kit as described previously41. Libraries with different barcodes were pooled and sequenced on NextSeq with NextSeq 500/550 High Output Kit v2 (75 cycles), generating 7–14 million paired-end 42-bp reads per sample.

Mef2c CUT&RUN data processing and analysis

CUT&RUN data were processed as described previously42. In brief, paired-end reads were trimmed with Trimmomatic to remove adaptor sequences. Trimmed reads were then aligned to the mm10 genome using Bowtie2 with the --dovetail setting enabled. PCR duplicates, multi-mapped reads and reads overlapping with the blacklist regions were excluded from downstream analyses. To ensure all libraries were compared at the same read depth, BAM files were downsampled to 8.5 million reads before peak calling. For each sample, peaks were called and bigwig files were generated using MACS2 (-f BAMPE -g mm -q 0.05 -B) (v.2.2.7.1; https://github.com/taoliu/MACS). Because the majority of CUT&RUN peaks overlapped with scATAC–seq peaks, scATAC–seq co-accessibility data were used to assign peaks to gene promoters. For this, the Cicero package43 was used to generate co-accessibility scores (conns) for each peak as described previously for P28 V1 scATAC–seq data. The resulting table was first filtered for connections containing peaks falling within promoter regions (within 2 kb of a TSS), then for connections with a co-accessibility score greater than 0.2. Next, the peaks fitting these criteria were filtered for those overlapping with CUT&RUN peaks. Finally, genes associated with the promoters co-accessible with these peaks were used for downstream analysis (comparison of number of genes associated, gene ontology). Gene ontology analysis was performed using clusterProfiler (https://github.com/YuLab-SMU/clusterProfiler).
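The filtering of the co-accessibility table can be sketched as below. This is an illustration under stated assumptions rather than the pipeline itself: connections are given as (peak1, peak2, score) tuples with peak1 treated as the promoter-side peak and peak2 as the partner peak tested for CUT&RUN overlap, intervals are half-open, and the function names and gene labels are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_linked_to_bound_peaks(conns, tss_list, cutrun_peaks,
                                flank=2000, min_score=0.2):
    """Genes whose promoter is co-accessible with a CUT&RUN-bound peak.

    conns:        iterable of (peak1, peak2, coaccess_score)
    tss_list:     iterable of (gene, chrom, tss_position)
    cutrun_peaks: iterable of (chrom, start, end)
    Assumption: peak1 is the promoter peak, peak2 the partner peak.
    """
    genes = set()
    for peak1, peak2, score in conns:
        if score <= min_score:
            continue  # keep only co-accessibility scores greater than 0.2
        if not any(overlaps(peak2, c) for c in cutrun_peaks):
            continue  # partner peak must overlap a CUT&RUN peak
        for gene, chrom, pos in tss_list:
            if overlaps(peak1, (chrom, pos - flank, pos + flank)):
                genes.add(gene)
    return genes


conns = [
    (("chr1", 900, 1100), ("chr1", 50000, 50500), 0.35),
    (("chr1", 900, 1100), ("chr1", 80000, 80500), 0.10),
]
tss_list = [("GeneA", "chr1", 1000)]
cutrun_peaks = [("chr1", 50200, 50300), ("chr1", 80100, 80200)]
linked = genes_linked_to_bound_peaks(conns, tss_list, cutrun_peaks)
```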

Gene regulatory network inference

Gene regulatory network inference was performed using a previously described multitask learning algorithm25 implemented in the inferelator pipeline (https://github.com/flatironinstitute/inferelator)44. In brief, cellular gene expression (in matrix form, dimensions cells by genes) was represented as a linear multiplication between a transcription factor activity matrix (in matrix form, dimensions cells by transcription factors) and the gene regulatory network linking genes to transcription factors (in matrix form, dimensions transcription factor by genes). First, a prior structure of the gene regulatory network was calculated by using chromatin information to identify the transcription-factor binding motif and link transcription factors to nearby genes (inferelator-prior; http://github.com/flatironinstitute/inferelator-prior). Next, the transcription factor activity matrix was computed by inverting the linear system, using the already computed prior network matrix. Finally, gene expression and the activity matrix were fed into the multitask learning pipeline. Gene regulatory networks are displayed by using jp-gene-viz (https://github.com/simonsfoundation/jp_gene_viz). To simulate Mef2c KO expression data, we set to zero the Mef2c value in the transcription factor activity matrix and then multiplied this matrix by the gene regulatory network. These simulated expression values were then processed through the regular RNA-seq processing pipeline. Scripts for analysis are available at https://github.com/marianogabitto/ChromatinDynamics2020-Analysis.

Mef2c cKO RNA-seq data processing

Single-cell RNA-seq datasets collected from the cortex of wild-type mice at E18 and P2, and from Mef2c conditional knockout mice at P2 (P2Mef2c-KO), were aligned by different methods (Seurat, Conos; using default parameters), with both algorithms displaying similar results. The results shown in Fig. 4 represent Seurat alignment after regular pre-processing.

Mef2c cKO ATAC–seq data processing

The P2 scATAC–seq data from Mef2c cKO mice were processed to generate a SnapATAC object and to cluster the cells as described above. Clusters were assigned to the PV or SST cardinal class on the basis of the accessibility of marker genes for either class. Peaks were then called for each class using ChromA38. HOMER analysis was performed as described above to annotate peaks containing a Mef2c motif for comparison to E18 and P2 wild-type datasets. The overlap of peaks across datasets or cell types was determined using bedtools jaccard.
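For readers unfamiliar with the statistic, the sketch below shows a base-pair-level Jaccard index between two peak sets: the number of intersecting base pairs divided by the number of base pairs in the union. It is a simplified stand-in written for illustration, not the bedtools implementation, and it assumes half-open intervals; the function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def jaccard_bp(peaks_a, peaks_b):
    """Intersecting base pairs divided by base pairs in the union."""
    a, b = merge(peaks_a), merge(peaks_b)
    len_a = sum(end - start for _, start, end in a)
    len_b = sum(end - start for _, start, end in b)
    inter = 0
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b:
                inter += max(0, min(end_a, end_b) - max(start_a, start_b))
    union = len_a + len_b - inter
    return inter / union if union else 0.0


# Two toy peak sets sharing 50 bp out of a 150 bp union.
similarity = jaccard_bp([("chr1", 0, 100)], [("chr1", 50, 150)])
```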

Mef2c in silico cKO prediction

The P2-Mef2c-simKO dataset reconstructed from our network inference algorithm (equation 1) was computed in the following way.

$$X_{ij}=\sum_{k\in \mathrm{TFs}}\beta_{ik}\,\hat{A}_{kj}\qquad (1)$$

First, we modelled the expression X of a gene i in a sample j (X_ij) as the weighted sum of the activities of each transcription factor k in sample j (Â_kj). The magnitude and direction (activation or repression) of a regulatory interaction (β_ik) between transcription factor k and gene i were learned by solving for β. Next, in the A matrix, Mef2c activity was set to zero for every cell (A*). Finally, the P2-Mef2c-simKO reconstructed gene expression matrix (X*) was calculated by multiplying β by A*. To calculate knock-out transcription-factor activity, we solved equation 1 by inverting the relationship between the network and the transcription factor activity (Â = β_prior^(−1) X_Mef2c-cKO). In this case, we used gene expression values from the P2-Mef2c-cKO dataset, and the network was our prior network computed from the P2 WT dataset. Log fold change was then computed between P2-Mef2c-WT and P2-Mef2c-cKO. For this analysis, PV and SST cells were grouped together and the perturbation was performed on the population as a whole, and compared to the entire true KO and WT datasets containing both PV and SST cells.
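The in silico perturbation amounts to zeroing one row of the activity matrix and recomputing the product in equation 1. A minimal sketch with toy numbers follows; it assumes β is stored as genes by transcription factors and A as transcription factors by samples, and the names simulate_knockout and "TF_b" are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def simulate_knockout(beta, activity, tf_names, target_tf):
    """Return X* = beta . A*, where A* has the target TF activity set to zero.

    beta:     genes x TFs matrix of learned regulatory weights
    activity: TFs x samples matrix of estimated TF activities
    """
    k = tf_names.index(target_tf)
    a_star = [row[:] for row in activity]   # copy so the input is untouched
    a_star[k] = [0.0] * len(a_star[k])      # zero the TF in every sample
    return matmul(beta, a_star)


beta = [[1.0, 2.0],    # gene 1 weights for (Mef2c, TF_b)
        [0.5, 0.0]]    # gene 2 depends on Mef2c only
activity = [[1.0, 2.0],   # Mef2c activity in two samples
            [3.0, 4.0]]   # TF_b activity in two samples
x_wt = matmul(beta, activity)
x_sim_ko = simulate_knockout(beta, activity, ["Mef2c", "TF_b"], "Mef2c")
```

In this toy case gene 2, which is regulated only by Mef2c, drops to zero in the simulated knockout, whereas gene 1 retains the contribution of the second factor.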

To predict targets of MEF2C, genes with edges connected directly to MEF2C were considered direct targets. Of these, 17 were transcription factors. Genes connected to these 17 transcription factors were considered indirect targets of MEF2C. To determine the validity of these predicted targets, we compared experimental Mef2c cKO scRNA-seq data with both whole single-cell (WTwc) and single-nuclear (WTnuc) wild-type scRNA-seq data. Both comparisons were considered because the whole-cell WT dataset was used to compute the GRN, but the cKO dataset was collected using single nuclei, and we therefore wanted to ensure that differentially expressed genes were not a result of comparing whole-cell to nuclear data. To that end, batchelor and scran were used to preprocess and compute batch-corrected expression matrices. Two alignments were then computed: WTwc and cKO (WTK); WTnuc and cKO (WTNK). Each alignment was preprocessed with multiBatchNorm using the intersection of genes between each dataset pair, and corrected expression was generated with fastMNN. The average corrected expression by condition was then computed. Fold change of average corrected expression was then computed using the following formula: FC = (A−B)/B. We then scored each of the direct and indirect MEF2C targets identified in the P2 GRN, and considered our prediction accurate if the fold change was ≥0.5 or ≤−0.25.
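The scoring rule can be written out as a short sketch. The text does not state which condition is A and which is B, so the example simply takes two dictionaries of average corrected expression and leaves that assignment to the caller; genes with B equal to zero are skipped, which is also an assumption. The function names are hypothetical.

```python
def fold_change(a, b):
    """FC = (A - B) / B, as in the formula above."""
    return (a - b) / b


def score_targets(targets, mean_a, mean_b, up=0.5, down=-0.25):
    """Mark each predicted target as supported if FC >= up or FC <= down.

    mean_a, mean_b: {gene: average corrected expression} per condition
    Assumption: genes missing from either condition, or with B == 0, are skipped.
    """
    supported = {}
    for gene in targets:
        if gene not in mean_a or gene not in mean_b or mean_b[gene] == 0:
            continue
        fc = fold_change(mean_a[gene], mean_b[gene])
        supported[gene] = fc >= up or fc <= down
    return supported


mean_a = {"g1": 3.0, "g2": 1.1, "g3": 0.6}
mean_b = {"g1": 2.0, "g2": 1.0, "g3": 1.0}
calls = score_targets(["g1", "g2", "g3"], mean_a, mean_b)
```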

Statistics

No statistical methods were used to predetermine sample sizes. Clustering of scRNA-seq and scATAC–seq data was performed in an unbiased manner. For motif analysis, a P value of less than 0.01 was considered significant. A non-parametric Wilcoxon rank-sum test was used to find differentially expressed genes between clusters or cell types within scRNA-seq datasets. For scATAC–seq datasets, ChromA was used to call peaks within each dataset as described above and peaks called exclusively within a dataset were used to define mutually exclusive peaks.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.