Introduction

Genome-wide association studies (GWAS) have successfully identified hundreds of high-confidence genetic risk loci for neuropsychiatric disorders [1,2,3,4,5]. However, the DNA variants exhibiting the strongest evidence for association with these conditions are typically in noncoding sequence, making it difficult to predict the actual susceptibility genes. One possible solution is to use expression quantitative trait loci (eQTL) maps in relevant human tissues to determine whether identified risk variants are associated with altered expression of specific genes at these loci (e.g., [6]). An alternative approach is to first define the DNA variants associated with the cis-component of a gene’s expression in a given tissue, and then use these to predict the relative expression of that gene in cases and controls from a much larger GWAS dataset [7, 8]. This latter design, known as a transcriptome-wide association study (TWAS), recognizes that a gene might be regulated by multiple cis-acting variants, involves far fewer independent tests than a typical GWAS, and allows risk to be interpreted directly in terms of a potentially pathogenic mechanism, namely altered gene expression.
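To make the TWAS logic concrete, the following is a minimal Python sketch of the individual-level idea described above: a gene's genetically predicted cis-expression is a weighted sum of cis-SNP allele dosages, and these predictions can then be compared between cases and controls. The dosages, weights, and case labels are invented toy values, and `predict_cis_expression` is a hypothetical helper. In practice, weights are estimated in an expression reference panel (FUSION, for example, fits several single-SNP and penalized regression models), and the analyses reported here work from GWAS summary statistics rather than individual genotypes.

```python
def predict_cis_expression(dosages, weights):
    """Genetically predicted cis-expression per individual (hypothetical helper).

    dosages: one row per individual, one allele dosage (0-2) per cis-SNP
    weights: one expression weight per cis-SNP, learned in a reference panel
    """
    n = len(dosages)
    # Centre each SNP on its sample mean so predictions are relative to the sample
    means = [sum(row[j] for row in dosages) / n for j in range(len(weights))]
    return [sum(w * (x - m) for w, x, m in zip(weights, row, means))
            for row in dosages]

# Toy data: 4 individuals x 3 cis-SNPs (invented values)
dosages = [[0, 1, 2],
           [1, 1, 0],
           [2, 0, 1],
           [0, 2, 1]]
weights = [0.30, -0.10, 0.05]
is_case = [True, False, True, False]

pred = predict_cis_expression(dosages, weights)
case_mean = sum(p for p, c in zip(pred, is_case) if c) / is_case.count(True)
control_mean = sum(p for p, c in zip(pred, is_case) if not c) / is_case.count(False)
print(f"case - control difference in predicted expression: {case_mean - control_mean:.3f}")
```

A real TWAS would then test whether this difference (or, equivalently, the regression of case status on predicted expression) is larger than expected by chance.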

Prenatal brain development is hypothesized to be an important window of vulnerability for several neuropsychiatric disorders [9, 10], implying that regulatory effects of risk alleles might operate during this period. Indeed, common genetic risk variants for these conditions have been found to be enriched within epigenomic annotations indicative of active regulatory genomic sites in the fetal brain [11, 12]. Moreover, we have recently mapped eQTL operating in the human brain during the second trimester of gestation, showing these to be enriched among common risk variants for various neuropsychiatric disorders, and providing the means to link risk alleles with gene expression in the prenatal brain [13]. Here, we combine data from that study with large-scale GWAS data for neuropsychiatric disorders to identify, through TWAS, genes and individual transcripts where heritable cis-effects on their expression in the human fetal brain are associated with risk for attention deficit hyperactivity disorder (ADHD), autism spectrum disorder (ASD), bipolar disorder, major depressive disorder, and schizophrenia.

Methods

Datasets

Predictors of cis-regulated gene expression were derived from whole transcriptome RNA sequencing and genome-wide genotyping of brain tissue from 120 human fetuses aged 12–19 post-conception weeks (for a full description of samples and data generation, see ref. [13]). We performed TWAS using genome-wide association summary statistics from recent large-scale studies of schizophrenia (40,675 cases, 64,643 controls) [1], major depressive disorder (135,458 cases, 344,901 controls) [2], ADHD (20,183 cases, 35,191 controls) [3], ASD (18,381 cases, 27,969 controls) [4] and bipolar disorder (20,352 cases, 31,358 controls) [5].

TWAS analysis using FUSION

Single-nucleotide polymorphism (SNP)-weight predictors of Ensembl gene- and transcript-level cis-expression were generated from fetal brain RNA sequencing and genotype data using the FUSION pipeline (http://gusevlab.org/projects/fusion/). We restricted our analyses to genes and transcripts with significant evidence of cis-heritable expression at FUSION's default heritability threshold (P < 0.01). GWAS summary statistics were prepared for use in FUSION using the munge_sumstats.py script in LD Score Regression (https://github.com/bulik/ldsc). The extended major histocompatibility complex (MHC) region (GRCh37/hg19 coordinates chr6:28,477,797–33,448,354) was removed prior to analysis to avoid spurious associations driven by the linkage disequilibrium pattern in this region. TWAS was performed on autosomal chromosomes using the FUSION.assoc_test.R script with default parameters. Bonferroni correction for multiple testing was applied to TWAS P values separately within each phenotype. To compare our TWAS findings with GWAS risk loci, implicated genes and transcripts were mapped in relation to the genomic coordinates of the genome-wide significant loci from their corresponding GWAS (12 loci in ADHD [3], 5 loci in ASD [4], 30 loci in bipolar disorder [5], 44 loci in major depressive disorder [2], and 145 loci in schizophrenia [1]).
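The core association statistic computed for each gene or transcript can be sketched as follows, assuming the summary-statistic TWAS formulation on which FUSION is built: the weighted sum of GWAS Z-scores across cis-SNPs, scaled by its standard deviation under the LD matrix, Z_TWAS = w'z / sqrt(w'Rw). The helpers below are hypothetical and written in plain Python for illustration. They omit FUSION's allele alignment, its handling of SNPs missing from the GWAS, and its selection among weight models, and they apply the within-phenotype Bonferroni correction by multiplying each P value by the number of features tested for that phenotype.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS Z-score, w'z / sqrt(w'Rw) (illustrative helper).

    weights: expression weights for the cis-SNPs (reference panel)
    gwas_z:  GWAS Z-scores for the same SNPs, aligned to the same effect allele
    ld:      SNP-by-SNP LD correlation matrix R, as a list of rows
    """
    k = len(weights)
    numerator = sum(weights[i] * gwas_z[i] for i in range(k))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni(pvalues):
    """Bonferroni-correct P values within one phenotype (capped at 1)."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]

# Two SNPs in strong LD (r = 0.8), equal weights, both with GWAS Z = 3
z = twas_z([0.5, 0.5], [3.0, 3.0], [[1.0, 0.8], [0.8, 1.0]])
print(round(z, 3), two_sided_p(z))
```

In this toy case the TWAS Z is 3/sqrt(0.9), about 3.16, only slightly larger than either SNP alone, because SNPs in strong LD contribute little independent evidence.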

Conditional analysis

Genes and transcripts achieving TWAS-wide significance (Bonferroni-corrected P < 0.05) for each disorder were assigned to loci using a ±250 kb window. Loci containing multiple implicated genes/transcripts were subjected to conditional analysis, implemented by the FUSION.post_process.R script, to determine statistically independent signals.
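One way to implement the ±250 kb assignment is to place a ±250 kb window around each significant feature and merge overlapping windows on the same chromosome, so that features within 500 kb of one another fall into the same locus (consistent with the 500 kb window mentioned in the Results). This is a sketch under that interpretation only; the choice of feature coordinate (for example, gene start or midpoint) and the helper name `assign_loci` are assumptions. Loci with more than one member are the ones that would go forward to conditional analysis.

```python
def assign_loci(features, flank=250_000):
    """Merge overlapping +/- flank windows into loci (hypothetical helper).

    features: list of (name, chromosome, position_bp) for significant features
    Returns a list of loci, each a list of feature names.
    """
    loci = []
    current, cur_chrom, cur_end = [], None, None
    # Sort so features on the same chromosome are adjacent and ordered by position
    for name, chrom, pos in sorted(features, key=lambda f: (f[1], f[2])):
        start, end = pos - flank, pos + flank
        if current and chrom == cur_chrom and start <= cur_end:
            # Window overlaps the current locus: extend it
            current.append(name)
            cur_end = max(cur_end, end)
        else:
            # Close the previous locus (if any) and start a new one
            if current:
                loci.append(current)
            current, cur_chrom, cur_end = [name], chrom, end
    if current:
        loci.append(current)
    return loci

features = [("A", 7, 1_000_000), ("B", 7, 1_300_000),
            ("C", 7, 2_000_000), ("D", 12, 1_000_000)]
loci = assign_loci(features)
# Only multi-feature loci go forward to conditional analysis
to_condition = [locus for locus in loci if len(locus) > 1]
print(loci, to_condition)
```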

Results

Using FUSION software, we identified 1351 genes and 3985 individual Ensembl transcripts displaying significant cis-heritable expression (P < 0.01) in our reference panel of 120 fetal brains. We then determined genetic predictors of the cis-component of each gene/transcript’s expression within the reference panel and used these to impute the cis-genetic component of gene expression into GWAS summary data for five major neuropsychiatric disorders [1,2,3,4,5].

We identified 32 genes and 103 individual transcripts for which genetic cis-effects on fetal brain expression are associated with schizophrenia at a Bonferroni-corrected P < 0.05 (Supplementary Tables 1 and 2). Eleven of these genes and 34 of these transcripts are located within 16 of the 145 schizophrenia-associated loci identified in the corresponding GWAS [1]; a further 21 genes and 69 transcripts reside outside of these regions and had therefore not previously been implicated by GWAS. We excluded the MHC on chromosome 6 from our analyses due to extensive linkage disequilibrium in the region, but note that our previous analysis of these fetal brain gene expression data using summary data-based Mendelian randomization implicated expression of C4A (as well as that of other genes at the MHC locus) in risk for schizophrenia [13]. Outside of the MHC region, the most significant association was observed for predictors of increased expression of SMDT1 (Pcorrected = 4.31 × 10−7), encoding a mitochondrial calcium channel subunit. This association has not, to our knowledge, been reported in any schizophrenia TWAS performed using expression predictors from adult brain tissue and may therefore indicate a fetal-specific risk mechanism. Of the other 31 genes that we implicate in schizophrenia at Bonferroni-corrected significance, 20 are significantly associated with schizophrenia with the same direction of effect based on expression predictors from adult cerebral cortex [14] (Supplementary Table 1). These include increased expression of AS3MT, encoding arsenite methyltransferase (this study: Pcorrected = 7.55 × 10−7; Gandal et al. [14] study: Pcorrected = 0.0001), consistent with previous findings in adult and fetal brain using measures of allele-specific expression [15].
Of the 24 schizophrenia-associated genes that also have HUGO Gene Nomenclature Committee (HGNC) symbols, 6 (SMDT1, WBP2NL, CSPG4P11, DDHD2, PCDHA2, and ARL14EP) are also reported to be differentially cis-regulated in association with schizophrenia genetic risk in a recent TWAS using a distinct collection of human fetal brain samples [16], all with the same direction of effect. We also note associations with expression predictors for transcripts of genes that have been implicated in schizophrenia through splicing QTL in the adult [17, 18] or fetal brain [16], including transcripts of APOPT1 (Pcorrected = 1.52 × 10−9), SDCCAG8 (Pcorrected = 4.6 × 10−8), and GNL3 (Pcorrected = 1.02 × 10−5).

ADHD was found to be associated with fetal brain expression predictors for three genes and four individual transcripts at a Bonferroni-corrected P < 0.05 (Supplementary Tables 3 and 4). Only one of these associations (ST3GAL3; see below) resides within any of the 12 risk loci identified by the corresponding GWAS [3]. The strongest association at the gene level was with reduced expression of COL28A1 (Pcorrected = 0.012), encoding a collagen that is predominantly expressed in developing neural tissue [19]. Consistent with this being primarily an early neurodevelopmental risk mechanism, a previous TWAS of ADHD using expression predictors from multiple adult brain regions found only nominally significant evidence for association with COL28A1 expression in two brain areas (amygdala and cerebellum; Supplementary Table 3) [20]. The most significant association we observed for ADHD was with increased expression of a transcript of ST3GAL3 (Pcorrected = 1.19 × 10−9), a gene linked to intellectual disability [21], which has been reported to be differentially methylated at birth in association with later ADHD symptomatology [22]. ST3GAL3 has previously been implicated in ADHD through a TWAS combining results across multiple adult brain regions [20]. The second most significant association we observed for ADHD was for predictors of increased expression of a transcript of the tyrosine kinase TIE1 (Pcorrected = 0.0047), again supported by TWAS in adult brain, where association between ADHD and predictors of higher TIE1 expression in adult dorsolateral prefrontal cortex was reported [20].

We identified 17 genes and 29 transcripts for which predictors of cis-heritable expression in fetal brain were associated with ASD at a Bonferroni-corrected P < 0.05 (Supplementary Tables 5 and 6). None of these genes or transcripts reside within the 5 loci exhibiting genome-wide significant association with ASD in the corresponding GWAS [4]. Consistent with our previous TWAS of ASD using expression weights derived from a subset of the present samples [23] and a recent TWAS using predictors derived from adult cerebral cortex [14], the majority of implicated genes are located within a common polymorphic inversion on chromosome 17q21, which is associated with differential cis-regulation of numerous genes in the fetal brain [13]. The gene within this inversion for which expression predictors from the adult brain showed the strongest association with ASD is LRRC37A (Supplementary Table 5), encoding a leucine-rich repeat-containing protein. Expression of LRRC37A in the fetal brain has also been implicated in intracranial volume [16] and the personality trait of neuroticism [13]. The only association with ASD we observed outside of this inversion was with decreased expression of a transcript of TM2D2 (Pcorrected = 0.038), a protein-encoding gene of unknown function.

Bipolar disorder was found to be associated with fetal brain expression predictors for 8 genes and 19 transcripts at a Bonferroni-corrected P < 0.05 (Supplementary Tables 7 and 8). Five of these genes and 13 of these transcripts are located within 4 of the 30 loci associated with bipolar disorder in the corresponding GWAS [5]; a further three genes and six transcripts reside outside of these loci and therefore represent novel associations. Implicated protein-coding genes include NMB, encoding neuromedin B (Pcorrected = 0.0002), LRRC57, encoding Leucine Rich Repeat Containing 57 (Pcorrected = 0.001), and DDHD2, encoding DDHD Domain Containing 2 (Pcorrected = 0.013). NMB and LRRC57 also show nominally significant association with bipolar disorder in a previous TWAS based on expression predictors from adult cerebral cortex [14] (Supplementary Table 7); however, neither association is transcriptome-wide significant in that study, and the association with NMB is in the opposite direction (i.e., increased, rather than decreased, expression in adult brain in association with risk for bipolar disorder). The association between reduced DDHD2 expression and bipolar disorder has previously been reported based on gene expression predictors from the adult dorsolateral prefrontal cortex, cerebellum, pituitary, and caudate [24]. Our most significant observation in fetal brain for bipolar disorder was for predictors of increased expression of a transcript of PACS1 (Pcorrected = 3.8 × 10−5), encoding Phosphofurin Acidic Cluster Sorting Protein 1, a gene in which rare mutations cause intellectual disability [25]. Predictors of increased expression of PACS1 are also associated with bipolar disorder at the whole-gene level in the adult cerebral cortex (P = 0.000653) [14], although this was not highlighted in that study as it did not survive Bonferroni correction.

We identified 3 genes and 11 individual transcripts with expression predictors associated with major depressive disorder at a Bonferroni-corrected P < 0.05 (Supplementary Tables 9 and 10). One of these genes and 7 of these transcripts map to 7 of the 44 independent genome-wide significant loci identified by the corresponding GWAS [2]. The most significant association at the gene level was with predictors of decreased expression of IMMP1L, encoding a mitochondrial protein (Pcorrected = 5.45 × 10−5). We also observed association between depression and predictors of decreased expression of PCDHA7 and PCDHA8 (Pcorrected = 0.048 and 0.011, respectively), encoding members of the protocadherin alpha cluster, which has been implicated in the formation and maintenance of complex neural circuits [26]. None of these genes was highlighted in a previous TWAS of major depressive disorder based on expression predictors from adult prefrontal cortex [2]. However, that study did implicate adult brain expression of three genes in risk for major depressive disorder that we also find to be associated with the condition based on transcript-specific measures in the fetal brain, namely DENND1B (this study Pcorrected = 0.0004), XPNPEP3 (this study Pcorrected = 0.004), and DLST (this study Pcorrected = 0.014).

For loci that contained, within a 500 kb window, more than one gene or transcript implicated in a given disorder, we performed conditional analyses to identify independent associations (Supplementary Table 11). In interpreting these data, we note that genes/transcripts that are conditionally nonsignificant are not necessarily unrelated to risk for these disorders; their effects are simply not independent of more significant effects on other genes/transcripts at the same loci. Nevertheless, for schizophrenia, we were able to discern conditionally independent associations with transcripts of MRM2 and SNX8 at a locus on chromosome 7, transcripts of C12ORF65, RSRC2, and KNTC1 on chromosome 12, and the SMDT1 and WBP2NL genes on chromosome 22. Schizophrenia was also associated with independent effects on alternative transcripts of ST3GAL3, SDCCAG8, NCK1-DT, and MARK3, while bipolar disorder was associated with independent effects on GOLGA2P7 and a transcript of UBE2Q2P1 on chromosome 15, and transcripts of CDK10 and CHMP1A at a locus on chromosome 16.
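The intuition behind conditioning can be illustrated with two features at one locus. If the genetically predicted expression of the two features is correlated (correlation r), the Z-score of feature 1 in a joint model with feature 2 is (z1 − r·z2)/sqrt(1 − r²), the standard two-variable result for correlated summary statistics. The sketch below is a simplification that assumes r is estimated from the two features' expression weights and a reference LD matrix; FUSION.post_process.R applies its own joint/conditional procedure across all features at a locus, which this does not reproduce, and the helper names are hypothetical.

```python
import math

def expression_correlation(w1, w2, ld):
    """Correlation of predicted expression: w1'Rw2 / sqrt(w1'Rw1 * w2'Rw2).

    w1, w2 must be defined over the same SNP list (use 0 for SNPs a model omits).
    """
    def quad(a, b):
        return sum(a[i] * ld[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    return quad(w1, w2) / math.sqrt(quad(w1, w1) * quad(w2, w2))

def conditional_z(z1, z2, r):
    """Z-score for feature 1 in a joint model with feature 2."""
    if abs(r) >= 1:
        raise ValueError("predicted expression is perfectly correlated; cannot condition")
    return (z1 - r * z2) / math.sqrt(1 - r * r)

# Two features whose predicted expression is highly correlated (r = 0.9)
z_a, z_b, r = 5.0, 6.0, 0.9
print(conditional_z(z_a, z_b, r))  # A after conditioning on B
print(conditional_z(z_b, z_a, r))  # B after conditioning on A
```

Here feature A's marginal signal (Z = 5) largely disappears once B is accounted for (conditional Z ≈ −0.92), whereas B retains a conditional Z ≈ 3.44: A is correlated with risk, but its effect is not independent of the stronger effect at B.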

Cis-heritable effects on the expression of 8 genes and 13 transcripts were significantly associated with more than one neuropsychiatric condition (Figs. 1 and 2). For example, predictors of increased expression of the previously highlighted ST3GAL3 transcript ENST00000489897 were associated with schizophrenia (Pcorrected = 0.013) as well as ADHD (Pcorrected = 1.19 × 10−9), while predictors of decreased expression of the NMB gene were associated with both schizophrenia (Pcorrected = 0.002) and bipolar disorder (Pcorrected = 0.0002). Predictors of increased expression of a transcript of XPNPEP3, encoding X-Prolyl Aminopeptidase 3, were significantly associated with major depressive disorder (Pcorrected = 0.0043), bipolar disorder (Pcorrected = 0.0006), and schizophrenia (Pcorrected = 0.0023). Predictors of reduced expression of DDHD2, which encodes a brain triglyceride hydrolase associated with hereditary spastic paraplegia and intellectual disability [27], were nominally associated (P < 0.05) with all five tested neuropsychiatric conditions, although these associations survived Bonferroni correction only for schizophrenia (Pcorrected = 2.7 × 10−5) and bipolar disorder (Pcorrected = 0.013). Intriguingly, while predictors of low expression of the protocadherin genes PCDHA7 and PCDHA8 were associated with major depressive disorder (Pcorrected = 0.048 and 0.011, respectively), predictors of higher expression of these genes were associated with schizophrenia (Pcorrected = 0.019 and 0.0004, respectively). These genes reside at a locus on chromosome 5 where opposing effects on risk for schizophrenia and major depressive disorder have recently been reported [28].
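The cross-disorder comparison amounts to a simple tally over per-disorder results. The sketch below, using hypothetical feature names and Z-scores rather than the study's data, keeps features that pass the Bonferroni threshold in at least two disorders and flags those whose predicted-expression effects point in opposite directions, as observed here for PCDHA7 and PCDHA8 in schizophrenia versus major depressive disorder. The nested-dictionary input layout is an assumption for illustration.

```python
from collections import defaultdict

def shared_features(results, alpha=0.05):
    """Features passing Bonferroni-corrected P < alpha in at least two disorders.

    results: {disorder: {feature: (z, p_corrected)}}  (assumed input layout)
    Returns {feature: {disorder: z}}.
    """
    hits = defaultdict(dict)
    for disorder, table in results.items():
        for feature, (z, p_corrected) in table.items():
            if p_corrected < alpha:
                hits[feature][disorder] = z
    return {f: by_disorder for f, by_disorder in hits.items() if len(by_disorder) >= 2}

def opposing_directions(z_by_disorder):
    """True if predicted-expression effects have opposite signs across disorders."""
    return len({z > 0 for z in z_by_disorder.values()}) == 2

# Hypothetical results for three features across three disorders
results = {
    "SCZ": {"GENE_A": (5.1, 0.002), "GENE_B": (4.6, 0.010)},
    "MDD": {"GENE_A": (-4.5, 0.020), "GENE_C": (4.9, 0.003)},
    "BD":  {"GENE_B": (4.4, 0.030), "GENE_C": (1.0, 1.000)},
}
shared = shared_features(results)
for feature, zs in shared.items():
    print(feature, zs, "opposing" if opposing_directions(zs) else "concordant")
```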

Fig. 1: TWAS Z-scores across all five tested neuropsychiatric disorders for predictors of cis-heritable gene expression that are significantly associated with at least two conditions (gene-level analysis).

A Bonferroni-corrected P < 0.05 equates to a Z-score of ±4.12. ADHD attention deficit hyperactivity disorder, ASD autism spectrum disorder, BD bipolar disorder, MDD major depressive disorder, SCZ schizophrenia.

Fig. 2: TWAS Z-scores across all five tested neuropsychiatric disorders for predictors of cis-heritable gene expression that are significantly associated with at least two conditions (transcript-level analysis).

A Bonferroni-corrected P < 0.05 equates to a Z-score of ±4.36. The HUGO Gene Nomenclature Committee (HGNC) IDs for genes to which Ensembl transcripts are annotated are indicated on the left. ADHD attention deficit hyperactivity disorder, ASD autism spectrum disorder, BD bipolar disorder, MDD major depressive disorder, SCZ schizophrenia.
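The Z-score thresholds in the figure legends follow from converting the Bonferroni-corrected significance level into a two-sided standard normal quantile. A minimal sketch, assuming the number of tests per phenotype equals the number of cis-heritable features reported in the Results (1351 genes and 3985 transcripts), solves for |Z| by bisection using only the Python standard library. The results should come out close to the ±4.12 and ±4.36 quoted above; small differences could reflect rounding or a slightly different number of features tested per phenotype.

```python
import math

def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_z_threshold(n_tests, alpha=0.05):
    """|Z| at which a two-sided test reaches Bonferroni-corrected P = alpha (bisection)."""
    target = alpha / n_tests
    lo, hi = 0.0, 40.0  # two_sided_p decreases monotonically in |Z| on this interval
    for _ in range(200):
        mid = (lo + hi) / 2
        if two_sided_p(mid) > target:
            lo = mid  # not yet significant: threshold lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed test counts: cis-heritable genes and transcripts from the Results
gene_threshold = bonferroni_z_threshold(1351)
transcript_threshold = bonferroni_z_threshold(3985)
print(f"genes: |Z| > {gene_threshold:.2f}; transcripts: |Z| > {transcript_threshold:.2f}")
```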

Discussion

An essential first step in translating GWAS findings into an understanding of molecular risk mechanisms for neuropsychiatric disorders is to elucidate the genes that are primarily affected, how they are functionally impacted and when (and where) these effects take place [29]. We have here applied TWAS methodology to genome-wide association summary statistics for five major neuropsychiatric disorders in order to identify genes that are differentially cis-regulated in the human second trimester fetal brain in association with genetic risk variation for these conditions. Our findings are consistent with the hypothesis that altered gene expression in the prenatal brain plays a role in the later development of neuropsychiatric disorders and nominate genes and individual gene transcripts for further neurobiological investigation.

While previous TWAS of neuropsychiatric disorders have largely focused on individual diagnoses using gene expression predictors derived from the adult brain (e.g., [18, 20]), we have here performed TWAS on multiple neuropsychiatric conditions based on gene expression in the fetal brain. This allowed us to identify genetic risk-associated differences in prenatal gene expression that are unique to, or shared by, neuropsychiatric diagnoses. Our finding of numerous prenatal gene expression differences associated with genetic risk for schizophrenia is consistent with the long-hypothesized early neurodevelopmental component to this condition [9, 30]. Although bipolar disorder is generally considered to be less neurodevelopmental in origin, we identified prenatal cis-regulatory effects on a number of genes and transcripts associated with genetic risk for the condition, consistent with our previous finding that fetal brain eQTL are enriched among genetic risk variants for bipolar disorder as well as schizophrenia and ADHD [13]. Moreover, we observed multiple instances where schizophrenia-associated cis-effects on prenatal gene expression were shared with bipolar disorder, consistent with the high genetic correlation between the two conditions [31]. Apparent pleiotropic effects of prenatal cis-regulatory variation were also observed for schizophrenia and ADHD (affecting a transcript of ST3GAL3) and for schizophrenia, bipolar disorder, and major depression (affecting a transcript of XPNPEP3). In the case of the protocadherin alpha cluster genes PCDHA7 and PCDHA8, we observed significant opposing effects on their expression in fetal brain associated with genetic risk for schizophrenia and major depressive disorder. These findings may prove particularly important given recent evidence of intrinsic abnormalities in the expression of protocadherin alpha cluster genes in induced pluripotent stem cell-derived neurons from patients with these disorders [32, 33].

Our comparisons with TWAS findings from adult human brain suggest that some of the gene expression changes that we implicate in risk for neuropsychiatric disorders are specific to brain development, potentially reflecting genetic risk variants residing within development-specific regulatory elements (e.g., enhancers) [34]. However, many of the gene expression changes that we find to be associated with risk for neuropsychiatric disorders are also observed in adult brain, and these may therefore constitute ongoing risk mechanisms for these conditions. Experimental manipulations in model systems will be necessary to determine the relative impact of these gene expression changes at different developmental timepoints.

Around two-thirds of the genes (46 out of 63) and transcripts (111 out of 166) that we implicate in neuropsychiatric disorders at Bonferroni-corrected significance are located outside of known GWAS risk loci, illustrating the strength of the TWAS approach in revealing novel associations. However, a limitation of the current study is that, due to the modest sample size of the fetal brain gene expression reference panel (N = 120), we could only define genetic expression predictors for a fraction of genes/transcripts that are likely to be variably cis-regulated in the prenatal human brain. Future increases in GWAS as well as expression reference sample sizes will likely yield many additional associations between gene expression in the developing human brain and genetic risk for neuropsychiatric disorders, which could form the basis for pathway and other tests of biological enrichment. These might include tests of expression enrichment in specific cell types of the developing human brain based on single cell transcriptomic data. It should also be noted that, while associations between cis-regulated gene expression and genetic risk are consistent with causal involvement of the affected gene in the disorder, additional genetic and functional investigations will be required for their substantiation as risk mechanisms for these conditions. In this endeavor, our delineation of directional cis-regulatory effects on specific gene transcripts should provide a focus for investigations and guide modeling in neural systems.