Introduction

It has been known for many years on the basis of a large number of family, twin and adoption studies that genetics has an important role in conferring risk to schizophrenia.1 Identification of genetic risk at the level of DNA variation has, however, proved challenging: replicable findings have been hard to come by, and the field has endured cycles of optimism and disappointment. In the past few years, this has begun to change. Success has been built largely on three main developments in addition to the publication of the sequence of the human genome and the increasingly detailed documentation of population variation in that sequence. First, technology has enabled analyses of genetic variation on a genome-wide scale, both for common alleles through genome-wide association studies (GWAS) and for rare mutations through copy number variant (CNV) analysis. More recently, the task of seeking rare pathogenic single base and insertion–deletion polymorphisms has been facilitated by the development of whole exome and whole genome sequencing. Second, most geneticists have realised the importance of applying rigorous statistical criteria. Finally, with the realisation that common risk alleles confer very small individual risks has come the appreciation that very large samples are required to satisfy stringent statistical thresholds. This has led geneticists, psychiatric and otherwise, to collaborate to an extent that hitherto has been unusual in the biological sciences (http://www.med.unc.edu/pgc). In this paper, we review very recent genomic findings in schizophrenia and consider their implications.

Recent history

Common genetic variation

The Wellcome Trust Case Control Consortium study of seven common diseases, including 2000 bipolar cases, was a landmark in genetics. Although it did not identify a definitive association with bipolar disorder, it clarified the boundaries of expectation regarding the effect sizes of common risk alleles for a range of complex disorders. To many psychiatric geneticists, the implications were clear: much larger samples were required to capture the small effect sizes (odds ratios <1.2) typical of common risk alleles for complex disorders.
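
The relationship between effect size and required sample size can be illustrated with a rough calculation. The following is a minimal sketch, assuming a single allelic two-proportion test with equal numbers of cases and controls, a normal approximation, and the conventional genome-wide significance threshold of 5 × 10−8; the function name and its parameters are hypothetical, and real GWAS power calculations take more factors into account.

```python
from statistics import NormalDist


def cases_needed(odds_ratio, control_freq, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls) needed
    to detect a risk allele in a simple allelic two-proportion test.

    Assumes odds_ratio != 1, equal group sizes and a normal approximation.
    """
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    odds_cases = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = odds_cases / (1.0 + odds_cases)

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)

    pooled = (case_freq + control_freq) / 2.0
    numerator = (
        z_alpha * (2.0 * pooled * (1.0 - pooled)) ** 0.5
        + z_power * (case_freq * (1.0 - case_freq)
                     + control_freq * (1.0 - control_freq)) ** 0.5
    ) ** 2
    alleles_per_group = numerator / (case_freq - control_freq) ** 2

    # Each person contributes two alleles; round up to a whole person.
    return int(alleles_per_group / 2.0) + 1


if __name__ == "__main__":
    for odds_ratio in (1.5, 1.2, 1.1):
        print(odds_ratio, cases_needed(odds_ratio, control_freq=0.3))
```

Under these assumptions, the required number of cases grows steeply as the odds ratio approaches 1, which is the qualitative point made above.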

The first wave of successful schizophrenia GWAS, which between them identified fewer than five risk loci, echoed this conclusion.2, 3, 4, 5 They also showed that schizophrenia was highly polygenic, and even more so than was generally expected. The clearest demonstration of this came from the study of the International Schizophrenia Consortium.5 That group showed that an aggregate score representing the number of single-nucleotide polymorphisms (SNPs) selected for even weak evidence of association in their study was higher in schizophrenia cases than controls in independent studies. Modelling suggested that the signal from this ‘polygenic score’ was being driven by hundreds, and likely more than a thousand, of individual susceptibility SNPs, which together could explain approximately one-third of the genetic liability.5
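
The logic of a polygenic score can be sketched in a few lines. This is an illustration only, not the consortium's pipeline: it assumes SNPs are selected by a liberal P-value threshold in a discovery sample and that each risk allele is weighted by its log odds ratio, a common choice. All SNP names, effect sizes and genotypes below are hypothetical toy values.

```python
import math


def select_snps(discovery, p_threshold):
    """Keep SNPs whose discovery-sample P value passes a liberal threshold.

    `discovery` maps SNP id -> (odds_ratio, p_value) for the risk allele.
    Returns SNP id -> log odds ratio, used as the weight.
    """
    return {
        snp: math.log(odds_ratio)
        for snp, (odds_ratio, p_value) in discovery.items()
        if p_value < p_threshold
    }


def polygenic_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2) over the selected SNPs.

    SNPs missing from the genotype are skipped.
    """
    return sum(
        weight * genotype[snp]
        for snp, weight in weights.items()
        if snp in genotype
    )


# Hypothetical toy data, for illustration only.
discovery = {
    "snp_a": (1.10, 0.001),
    "snp_b": (1.05, 0.20),
    "snp_c": (1.08, 0.70),  # fails the threshold used below
}
weights = select_snps(discovery, p_threshold=0.5)

# Risk-allele counts for two hypothetical individuals in a target sample.
person_1 = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
person_2 = {"snp_a": 0, "snp_b": 1, "snp_c": 2}

if __name__ == "__main__":
    print(polygenic_score(person_1, weights))
    print(polygenic_score(person_2, weights))
```

In a real analysis, the scores would be computed in a sample independent of the discovery sample and their distributions compared between cases and controls.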

Over the next few years, studies (refs 2, 6, 7, 8, 9 and others reviewed in ref 10) from either informal collaborations or more formalised consortia detected incrementally more risk loci such that by the end of 2013 around 30 loci had been reported at genome-wide significance, albeit not always in samples that were well enough powered to give confidence in the findings. The largest single study prior to 2014 undertook a meta-analysis of new data with that from the Psychiatric Genome Wide Association Consortium (now known as the Psychiatric Genomics Consortium or PGC)11; including replication, the sample comprised more than 21 000 cases and 38 000 controls. In total, 22 nonoverlapping genomic loci reached genome-wide significance (P<5 × 10−8). These included 8 previously associated regions and 13 novel associations. It should be noted at this point that when we refer to risk loci, we refer to regions of the genome that contain one or more alleles that are associated with the disorder at a level corresponding to genome-wide significance. However, because of linkage disequilibrium, a region typically contains many strongly or partially correlated alleles, any of which might be the actual pathogenic DNA variant. Similarly, when we refer to a risk allele, we refer only to association; for the same reason, we do not intend to suggest that it is the variant that directly alters gene function. Moreover, as multiple correlated SNPs within a locus often span multiple genes, and sometimes do not span any known genes, it follows that association does not unequivocally implicate a specific causal gene. Nevertheless, genes involved in a number of broad biological themes were found to be enriched within the regions of association, particularly calcium signalling and genes predicted to be regulated by microRNA MIR137. Interesting as those results were, the evidence for any of the biological themes was not definitive.

Another important finding of the International Schizophrenia Consortium was that the schizophrenia polygenic score is not only higher in people with schizophrenia, it is also higher in people with bipolar disorder than in controls. This indicates that the polygenic contribution to the two disorders is substantially shared, a conclusion that was also reflected in earlier studies which found evidence for shared risk alleles at individual genes including ZNF804A2 and CACNA1C.12 Subsequent studies have compared and combined GWAS of schizophrenia, bipolar disorder, autism spectrum disorders, major depressive disorder and attention-deficit and hyperactivity disorder. Again, evidence for shared risk was observed for specific risk variants, but more importantly, substantial overlap was found between the three adult onset disorders, schizophrenia, bipolar disorder and major depressive disorder, and a reduced, yet still significant overlap between schizophrenia and autism spectrum disorder.13, 14, 15

The overall conclusions from GWAS can be summarised as follows. First, the common variant contribution is not only substantial but also highly polygenic. Second, despite a possible increase in heterogeneity, larger sample size leads to more genome-wide significant associations, each of which potentially offers a window into the biology of schizophrenia. Third, and in contrast to APOE in Alzheimer’s disease, and the major histocompatibility complex in some auto-immune disorders, there are no common variants that individually contribute substantially to liability. Fourth, the cross-disorder analyses reveal substantial genetic overlap between the adult onset disorders, and a more modest amount of overlap between schizophrenia and the childhood onset disorders. This shared risk highlights the importance and potential utility of genetic findings to inform the aetiological and pathophysiological relationships between these syndromically defined disorders.

Rare genetic variation

The GWAS design typically investigates the higher end of the allelic frequency spectrum (minor allele frequency>1%) but it has been known for some time that rare variation also plays a role in schizophrenia. The earliest definitive report of schizophrenia-associated rare variation concerned deletions at chromosome 22q11.2.16 This deletion CNV confers a substantial (about 25-fold) increase in risk for schizophrenia as well as other psychiatric and neurodevelopmental phenotypes.17 As GWAS technology began to permit genome-wide CNV scans, so evidence has accrued for a wider role in the disorder for CNVs.

In general, people with schizophrenia have an increased burden of large (>100 kb) rare (frequency<1%) CNVs compared with controls. They also have an increased frequency of de novo CNVs.17, 18, 19 To date, CNVs at several distinct loci have been strongly implicated in schizophrenia,17, 20, 21 and, just as for deletions at 22q11.2, the effects of these CNVs are not specific; all are associated with at least one other neurodevelopmental or psychiatric condition, including intellectual disability, autism spectrum disorders and attention-deficit and hyperactivity disorder.21, 22, 23, 24, 25, 26, 27 CNVs typically affect many genes, so cross-disorder effects cannot be attributed with confidence to a shared risk gene at any given multi-genic locus. Nevertheless, the findings are at least suggestive of partial sharing in genetic risk across multiple disorders.

Studies of CNVs, particularly of de novo CNVs, have yielded insights into biological processes that are perturbed in schizophrenia. CNVs preferentially disrupt genes involved in neurodevelopmental pathways.28 More specifically, there is strong evidence that they are enriched for genes in the postsynaptic density that play a role in modulating synaptic strength at glutamatergic synapses, particularly genes encoding members of the N-methyl-d-aspartate receptor (NMDAR) complex and the activity-regulated cytoskeleton-associated (ARC) protein complex.19 Recently, independent gene-set association analysis of case CNVs discovered in a genome-wide screen of cases and controls provided additional support for genes encoding protein members of the postsynaptic density21 as well as for enrichment among case CNVs for calcium channel signalling genes and targets of the fragile X mental retardation protein (FMRP).

In general, schizophrenia-associated CNVs have large individual effect sizes but are extremely rare in the population, whereas associated common variants have small individual effect sizes but are common in the population. The effect sizes of de novo point mutations are currently unclear. The aggregate effect of CNVs and SNPs, both de novo and inherited, across the allelic frequency spectrum must be considered in order to determine the contribution of genetic effects to schizophrenia.

Major GWAS and exome sequencing studies of 2014

GWAS

The recently published second GWAS paper from the Schizophrenia Working Group of the PGC29 comprised an analysis of 49 nonoverlapping samples containing 34 241 cases and 45 604 controls as well as 1235 parent affected-offspring trios. In total, this more than doubled the sample sizes used in the previously largest GWAS.11

Summary data for an additional sample of 1513 cases and 66 236 controls were obtained from deCODE genetics for linkage disequilibrium-independent-associated SNPs at P-value<1 × 10−6. Meta-analysis of these datasets resulted in a total of 128 statistically independent schizophrenia associations in 108 distinct genomic loci. The 108 loci included 25 previously reported, and 83 novel, loci. The continued and extended support for previously reported loci demonstrates the reliability of the earlier GWAS results built on large sample sizes, a point that had been demonstrated by extensive and fully independent replication of loci identified in the first PGC schizophrenia study.30

Of the associated loci, most (75%) contained one or more protein-coding genes. Perhaps most notably of all, one of the loci contains the dopamine receptor D2 (DRD2) gene, which encodes the main target of all effective anti-psychotic drugs. This is the first strong link between genetic susceptibility to schizophrenia and the mechanism of action underpinning its treatment. It also provides an important reminder that, despite the small effect sizes associated with individual risk alleles, GWAS can identify treatment targets, modulation of which can have profound effects on the disorder.31 Other associated loci are notable for containing glutamate receptors (GRIA1, GRIN2A and GRM3) and members of the voltage-gated calcium ion channel family of proteins (CACNA1C, CACNA1I and CACNB2) as well as many genes involved in synaptic plasticity. Together with recent findings from sequencing studies reviewed below, these associations provide a broader body of evidence that disruption of the glutamate system and neuronal calcium homeostasis contributes to the pathophysiology of schizophrenia. Though these genes are all plausible biological candidates, the authors were careful to point out that they may not necessarily be the causal elements within the associated loci.

Using histone acetylation markers from various sources, including the ENCODE project,32 to define enhancer elements, the PGC group found, reassuringly if not surprisingly, that associations were enriched in enhancers that were relatively brain-specific. However, they also found that associations were enriched in enhancers that are active in immune cells and tissues, even after removing those with prominent activity in brain as well as excluding the poorly localised signal in the major histocompatibility complex region. This provides support for the hypothesis that reports of abnormal immune function in schizophrenia may reflect a causal disruption in immunity rather than an epiphenomenon.33 However, caution is required here until links can be made between the associations at immune system enhancers and altered function in specific immune genes, and between changes in the functions of those genes and altered immune function. Nevertheless, the findings provide a further impetus for studying immune function in schizophrenia.

Polygene score analysis showed the amount of variance in case-control status explained by common additive genetic effects rising from previous estimates to 18% as measured by Nagelkerke R2. Even so, predicting diagnostic status by polygenic score results in a very high degree of diagnostic misclassification, indicating that such analyses are not yet clinically useful.
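
For readers unfamiliar with the measure, Nagelkerke R2 rescales the Cox-Snell pseudo-R2 of a logistic regression so that its maximum is 1. The sketch below computes it from two log-likelihoods. It assumes the null model is intercept-only; the function and argument names are hypothetical, and the log-likelihoods would in practice come from fitted models rather than being supplied by hand.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke pseudo-R2 for a logistic regression.

    loglik_null:  log-likelihood of the intercept-only model
    loglik_model: log-likelihood of the model including the predictor
                  (for example, a polygenic score)
    n:            number of individuals
    """
    # Cox-Snell R2: 1 - (L_null / L_model) ** (2 / n)
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Largest value the Cox-Snell R2 can take for this null model.
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


if __name__ == "__main__":
    n = 1000
    # Intercept-only log-likelihood for a balanced case-control sample.
    loglik_null = n * math.log(0.5)
    # A hypothetical fitted model that improves the fit modestly.
    print(nagelkerke_r2(loglik_null, loglik_null + 50.0, n))
```

A model that fits no better than the null gives a value of 0, and a model that predicts every individual perfectly (log-likelihood 0) gives a value of 1.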

Although the recent study represents a step change in discovering genetic susceptibility loci, an important limitation of the study was that plausible functional variants could not be identified for most of the findings. Only 10 of the index associations could potentially (though not definitively) be ascribed to a nonsynonymous coding variant, and only 12 could be credibly explained by a known expression quantitative trait locus (eQTL). This highlights the need for a richer resource of annotation data if GWAS results are to be fully exploited for biological understanding.

Sequencing studies

Another recent landmark in schizophrenia genetics was the back-to-back publication in 2014 of two fairly large studies that used sequencing technology to screen most of the known coding exome for rare single-nucleotide variants (SNVs) and small insertions or deletions (indels) that might affect risk for schizophrenia.34, 35

Prior to 2014, a series of small sequencing studies provided support for the hypothesis that, just as for de novo CNVs, the rate of de novo SNVs and indel mutations was increased in schizophrenia.36, 37, 38 Others also reached similar conclusions for autism39, 40, 41, 42 and, more so, intellectual disability.43, 44 Moreover, since this type of mutation preferentially occurs in older men,45 it was thought de novo mutations might partially explain the increased risks for schizophrenia in children whose fathers are relatively old at the time of conception.46

However, in the largest study of de novo mutations in schizophrenia to date, no evidence was found for an overall increase in the rate of nonsynonymous or loss-of-function de novo mutations, suggesting that de novo mutations play a lesser role in schizophrenia than has been indicated by the earlier studies, or for the other disorders.34

Despite the lack of evidence for a general elevation in the rate of de novo mutations, there was significant enrichment of nonsynonymous de novo mutations in glutamatergic postsynaptic proteins comprising the ARC and NMDAR complexes.19 De novo mutations were additionally enriched in other proteins that are hypothesised to modulate synaptic strength, specifically proteins regulating actin filament dynamics and those whose mRNAs are targets of FMRP. FMRP targets have also been shown to be enriched among de novo mutations in autism spectrum disorder42 and among case CNVs in a schizophrenia case-control study.21

A second observation, subsequently also reported in a much smaller study, was that de novo mutations in schizophrenia occurred more frequently in genes affected by de novo mutations in autism spectrum disorder and intellectual disability.47 Unlike CNV associations, the overlaps in de novo mutations point to overlapping risk at single gene rather than locus resolution, and thereby provide more definitive evidence for pleiotropy.

In the study of Fromer et al., although cases in general had no elevation in de novo mutation rates, de novo loss-of-function mutations occurred more frequently than expected in people with schizophrenia who also had lower premorbid educational attainment, suggesting that de novo loss-of-function mutations may have a role in neurodevelopmental impairment across diagnostic boundaries. However, on average, schizophrenia de novo mutations were predicted to be less damaging to protein function than those in people with autism or intellectual disability, a finding that is consistent with the hypothesis that schizophrenia occupies a less extreme position on a neurodevelopmental gradient of impairment than the other two disorders.34 The hypothesis of shared genetic risk has obtained further support from the recent GWAS of the PGC,29 in which it was noted that GWAS significant loci are enriched for genes affected by de novo mutations in autism and intellectual disability.

The second, and larger, exome sequencing study examined rare variants predicted to be damaging using a case-control study design of approximately 2 500 cases and 2 500 controls.35 No specific gene showed a genome-wide corrected significant excess of rare mutations in cases. However, a large set of around 2 500 genes selected a priori by the authors to be likely enriched for schizophrenia susceptibility genes (for example, genes with de novo mutations, genes in GWAS regions, mapping to CNVs, members of the ARC, NMDAR and postsynaptic density pathways) showed an increased burden of rare nonsense and disruptive variants in cases compared with the controls. This burden was attributable to mutations in a large number of genes suggesting that the mutational target of schizophrenia encompasses many hundreds of genes. Modelling suggests that the impact on disease risk of rare CNVs and disruptive mutations may be an order of magnitude smaller relative to common SNPs.
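
The idea of a gene-set burden comparison can be illustrated with a simple carrier-count test. The sketch below is not the procedure used in the study discussed above; it swaps in a one-sided Fisher's exact test on the number of individuals carrying at least one qualifying rare variant in the gene set, and it ignores covariates such as ancestry and sequencing coverage. The function name and the counts in the example are hypothetical.

```python
from math import comb


def burden_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of carriers among cases.

    A carrier is a person with at least one qualifying rare variant in the
    gene set. Returns the probability, given the table margins, of seeing
    at least `case_carriers` carriers among the cases under the null.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, carriers)
    upper = min(carriers, n_cases)
    # Sum hypergeometric probabilities for tables at least as extreme.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(burden_p_value(60, 2500, 40, 2500))
```

Because the test aggregates over the whole gene set, it can detect an excess that is spread thinly across many genes even when no single gene reaches significance on its own.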

In terms of implications for pathophysiology, notable findings were convergence with the other exome sequencing study34 in finding enrichment for rare damaging mutations in the ARC and NMDAR complexes, and in targets of FMRP. Other convergences were noted with GWAS data; most notably, genes encoding voltage-gated calcium ion channel proteins were also enriched. However, unlike the large de novo study,34 there was no overlap between the mutations observed in schizophrenia and those discovered in either autism spectrum disorders or intellectual disability.

As noted in the original publications, these studies lacked power to implicate specific genes and rare mutations in schizophrenia. However, they do allow us to conclude that rare, as well as common, single-nucleotide variation plays a role in schizophrenia. The relative contribution of the two classes of variant remains uncertain pending much larger sequencing studies, as does the contribution of rare variation outside the exome. In view of the lack of power to implicate specific genes and variants, the authors took a hypothesis-driven, gene-set approach. Both studies have provided support for specific gene sets that have been implicated in previous studies, and this convergence between the two exome sequencing studies is striking. However, they were limited in that they examined only a small proportion of the neurobiological processes and structures that could potentially be implicated in schizophrenia. This limitation was imposed by the desire to focus on circumscribed, well-annotated gene sets with strong a priori evidence for involvement in schizophrenia. It seems certain that other important areas of biology remain to be discovered but that this will require much larger samples and much better annotated genomic resources (see below).

Conclusions

The combination of new technology, extensive collaboration and persistence in the face of a sometimes challenging funding climate has been rewarding for schizophrenia genetics. Over 100 specific common risk loci29 and at least 11 rare risk alleles20 have been identified. Moreover, it is clear that these represent the tip of an iceberg of genetic complexity. Evidence for risk effects is seen across the entire allelic frequency spectrum, from private de novo mutations to common SNPs. In addition, evidence from both GWAS29 and early sequencing studies35 suggests that the mutational target for schizophrenia is likely to be extensive, involving hundreds and very likely thousands of genes.

The second finding, which is perhaps not surprising given the degree of genetic complexity and the continuous nature of many psychiatric traits,48 is that genetic risk does not map neatly onto psychiatric diagnoses. There is evidence for shared genetic risk between schizophrenia, bipolar disorder, autism spectrum disorders, intellectual disability and attention-deficit and hyperactivity disorder. These findings point to shared disease mechanisms and to the need for approaches to patient stratification for research that go beyond the current Diagnostic and Statistical Manual of the American Psychiatric Association/International Classification of Diseases categorical approaches used in the clinic.48, 49 Recent genetic findings also support the hypothesis that schizophrenia can be conceived of as part of a spectrum of neurodevelopmental disorders ordered by severity, with intellectual disability at one extreme and mood disorders at the other.50

Perhaps most encouragingly, despite the complexity of the genetic picture that is emerging, we are beginning to get glimpses of convergence onto a coherent set of biological processes. Results from GWAS,11, 29 CNV19 and sequencing studies34, 35 point to a functionally related set of synaptic proteins involved in synaptic plasticity, learning and memory. Among the gene sets that can be tied to these processes are the ARC and NMDAR complexes, targets of FMRP, and voltage-gated calcium channels, all implicated by rare variant studies, and in the case of FMRP targets and voltage-gated calcium channels, common variant studies as well, including the most recent PGC GWAS.29 Moreover, although the limited gene-set analyses reported by the PGC GWAS did not identify enrichments among the synaptic gene sets, associations to loci containing individual N-methyl-d-aspartate (GRIN2A), α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (GRIA1) and metabotropic glutamate (GRM3) receptor genes as well as affiliated proteins point broadly in that direction. A more detailed gene-set analysis of the PGC GWAS has been undertaken and is being prepared for publication.

Although it seems highly likely that there are other systems and mechanisms involved, including the dopamine system, and we can expect these to emerge with further and larger GWAS and sequencing studies, the current findings offer numerous entry points for neuroscientists to probe the biological basis of schizophrenia. Of course, the sheer genetic complexity poses many challenges for such studies,51 and we should not forget that, with respect to the GWAS findings, we have yet to identify the actual risk variants and their proximal functional consequences at the gene level. The relatively sparse annotation of the genome, regulome and proteome across multiple brain regions, specific cell types and developmental periods hinders the translation of genetic findings into mechanistic understanding. The bridging of this ‘annotation gap’, coupled with larger, well-powered genetic studies, will aid in elucidating both healthy and diseased brain function, and potentially provide further insights into drug discovery and nosology. However, the massive multiple testing burden faced by genomic studies, and the proliferation of analytical methods, mean that this endeavour will need to overcome methodological challenges to avoid a proliferation of false positive findings.

A further avenue for defining the effects of specific high-penetrance mutations will be to return to patients carrying the mutations for more extensive phenotyping using the plethora of approaches now available to clinical neuroscientists. Direct comparison with animal and cellular models of the same mutations, including those using induced pluripotent stem cells and new genomic editing approaches, will also likely be informative.52, 53 Detailed phenotyping studies of individual common variants have proliferated in recent years, but these face a number of methodological difficulties.54 The development of methods to measure the en masse effects of risk SNPs in individuals, such as the polygene score approach mentioned above, offers new approaches to studying the impact of genetic risk on brain function using studies of unaffected as well as affected individuals.

Finally, we should note that the successes in schizophrenia genetics are unlikely to have occurred because of fundamental differences in the genetic architecture of this disorder compared with many other psychiatric disorders. Although in the case of some very high-frequency disorders such as depression, the genetic architecture and/or heterogeneity may be particularly challenging, for many other disorders, similar success is likely to follow the application of larger sample sizes.