Introduction

The familiality of psychiatric illness has long been appreciated, but only in the past decade or so have the tools and resources become available to probe its genetic determinants directly. This new era of genetics is particularly meaningful in psychiatry, where, unlike other medical fields, disorders have often been diagnosed and treated subjectively and in the absence of a clear biological framework. Furthermore, the historically unappreciated link between mental illness and biology has led to centuries of stigma for those suffering. Using genetics to frame mental illness as a biomedical phenomenon offers hope that this stigma can be alleviated.

Because many aspects of psychiatric illness are uniquely human and have no adequate animal or cellular model, human genetic research studies are particularly crucial for illuminating the underlying biology of these conditions. As these studies fill in the gaps in our knowledge, we will be better equipped to design therapies in a rational and targeted way.

This review is intended for outsiders to the field of psychiatric genetics and perhaps even genetics in general. I begin by laying a foundation of basic aspects of genetic variation so that the most salient genetic findings from a number of major neurodevelopmental and neuropsychiatric conditions can be appreciated in their context in the latter half of the review. First, an overview of different kinds of genetic variation is given, followed by technologies currently used to probe them. General concepts that routinely come up in these genetic studies are then reviewed, and finally a brief synopsis of current trajectories of discovery in the major neurodevelopmental and neuropsychiatric conditions is given.

Types of Genetic Variation

Aneuploidy, the largest of the genetic variations (Fig. 1), is an abnormal chromosome count resulting from errors in chromosome segregation during cell division. Aneuploidy increases in the maternal germline with advancing maternal age [1]. Most aneuploidies that occur in the germline are embryonic lethal, but some result in viable individuals with developmental syndromes, the best-studied examples being Down syndrome (trisomy 21) and Klinefelter syndrome (47, XXY), as well as Turner syndrome (monosomy X), Edwards syndrome (trisomy 18), and Patau syndrome (trisomy 13). Exciting approaches that re-engineer X-inactivation mechanisms for potential treatment of some aneuploidies are currently under investigation [2, 3].

Fig. 1

Size distributions of classes of genetic variation, and the ability of microarray and sequencing technology to detect them. Solid circle = routinely detectable; (+) = detectable depending on platform, or with special protocols, or with limitations on size or coverage; (–) = not detectable; SNP = single-nucleotide polymorphism; SNV = single nucleotide variant; indel = insertion/deletion; STR = short tandem repeat; VNTR = variable number tandem repeat; SV = structural variant; CNV = copy number variant

Genomic structural variants (SVs) occur on a subchromosomal scale (Fig. 1) and include inversions and translocations (where there is no net loss of genetic material, such as the balanced translocation involving DISC1 in schizophrenia (SZ) [4]), as well as deletions and duplications (where there is a net loss or gain, respectively, of genetic material). Deletions and duplications comprise a subclass of SVs called copy number variants (CNVs). SVs are sometimes arbitrarily defined as larger than 1 kb [5, 6]; however, in actuality, their size distribution is continuous, with smaller variants being more numerous within an individual genome and larger variants being less numerous. Ascertainment of CNVs has played a crucial role in psychiatric genetics over the past decade, as it has helped to turn the spotlight from the common variants popularized by genome-wide association studies (GWAS) to rare variants of large effect [7]. CNVs are a normal form of human genetic variation [8, 9], but neuropsychiatric populations show enrichment for de novo and rare, large CNVs [7, 10,11,12,13,14,15,16]. Genes that support the development and function of the brain tend to be larger than other genes, and so are more frequently affected by structural variation [17]. Furthermore, segmental duplications, which can act as a catalyst for SVs, are expanding in the primate and specifically the human lineage [18], with brain genes particularly affected [19]. Taken together, structural variation is an important mode of genetic variation in human evolution and disease in general, and in neuropsychiatric conditions in particular.

Short tandem repeats (STRs) and variable number tandem repeats (VNTRs) are smaller genetic variations (typically a few to hundreds of base pairs, and up to thousands of base pairs for VNTRs; see Fig. 1), consisting of a number of repeats of a core polynucleotide sequence (2–6 base pairs for STRs and 7–100 base pairs for VNTRs). STRs arise chiefly as a result of slippage of DNA replication machinery and have high mutation rates compared with other forms of genetic variation [20]. Well-studied conditions with STRs as the causal factor include Huntington disease (HD) and Fragile X syndrome. In huntingtin (HTT), the gene that underlies HD, a CAG trinucleotide repeat encodes a polyglutamine tract in the corresponding huntingtin protein. If this STR expands past 35 repeats, the polyglutamine tract of the corresponding protein causes protein aggregation, which is pathogenic, and results in progressive neurodegeneration, chorea, psychiatric symptoms, and early death [21]. The CAG repeat is unstable and expansion between generations can lead to earlier onset of the disease, a phenomenon known as anticipation. In contrast to HD, the STR that underlies Fragile X syndrome lies in a noncoding regulatory sequence, upstream of FMR1. If this CGG repeat expands past ~200 units, the promoter becomes hypermethylated and gene expression of FMR1 is shut down [22]. FMR1 encodes a vital regulatory protein that binds mRNAs, in particular those related to synaptic function, and loss of FMR1 results in mass dysregulation of protein expression and autism-like neurodevelopmental features. Genotyping STRs at a genome-wide scale using high-throughput sequencing technology is not routinely done because read lengths are not currently long enough to read through many STRs while still having enough flanking sequence to reliably map them. Consequently, while specific markers are used for forensics or diagnosing well-known diseases, relatively little is known about the role of STRs in disease at the genomic scale.
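
As a rough illustration of the repeat-length threshold described above, the following sketch counts the longest uninterrupted run of CAG units in a sequence string and compares it with the 35-repeat cutoff quoted in the text. The function names and the toy alleles are hypothetical, and a plain string scan is an assumption made for clarity; real STR genotyping works from aligned sequencing reads with dedicated callers.

```python
import re

HD_THRESHOLD = 35  # repeat count quoted in the text for HTT expansion


def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def is_expanded(sequence: str, threshold: int = HD_THRESHOLD) -> bool:
    """True when the longest CAG run exceeds the threshold (illustrative only)."""
    return longest_repeat_run(sequence) > threshold


# Toy alleles (made-up data): flanking bases plus a CAG tract of a chosen length.
normal_allele = "ATG" + "CAG" * 20 + "CCGCCA"
expanded_allele = "ATG" + "CAG" * 42 + "CCGCCA"

print(longest_repeat_run(normal_allele), is_expanded(normal_allele))
print(longest_repeat_run(expanded_allele), is_expanded(expanded_allele))
```

The same scan also shows why short reads struggle here: a read must span the whole tract plus flanking sequence before the run length can be counted at all.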

Indels (insertion/deletions) are polynucleotide expansions (insertion) or contractions (deletion) relative to the reference genome. Like SVs, the size distribution of indels is continuous, but generally variants ranging from 1 to 50 base pairs are considered indels (with larger indels becoming indistinguishable from small SVs; see Fig. 1) [23]. In protein-coding sequence, indels that are not multiples of 3 are under selective pressure because they would induce a frame shift in the subsequent transcript and protein, either leading to nonsense-mediated decay or an entirely different structure downstream of the indel.
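
The multiple-of-3 rule above can be sketched in a few lines. This is a minimal illustration, assuming alleles are given as plain strings in the style of a variant call (reference allele, alternate allele); the function names and the toy coding sequence are hypothetical.

```python
from typing import List


def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """A coding indel shifts the frame when its net length change is not a multiple of 3."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


def codons(cds: str) -> List[str]:
    """Split a coding sequence into complete triplets, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


reference = "ATGGCCAAGTTTGGA"          # toy coding sequence (made up)
one_base_deletion = "ATGGCAAGTTTGGA"   # one C removed: downstream codons are all altered
in_frame_deletion = "ATGAAGTTTGGA"     # GCC removed: one codon lost, the rest preserved

print(codons(reference))
print(codons(one_base_deletion))
print(codons(in_frame_deletion))
```

Comparing the three codon lists makes the difference concrete: the in-frame deletion removes a single codon, whereas the one-base deletion rewrites every codon downstream of it.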

Single nucleotide variants (SNVs) are the smallest type of genetic variant (Fig. 1), and are by far the most widely used in genetic association studies. In protein coding sequence, SNVs can change codons to alter the eventual sequence of amino acids in the protein (missense variant), introduce a premature STOP codon (nonsense variant), or alter splicing properties (splice variants). SNVs may also change the codon to another codon for the same amino acid (synonymous variants). SNVs in a noncoding sequence may exert an effect by altering binding affinities for DNA-binding proteins, such as transcription factors, or by affecting epigenetic properties of the chromatin, such as DNA methylation. These single-letter changes to the genetic code are, from a technical standpoint, the easiest class of genetic variation to identify in a massively parallel way, which led to them being the basis for the first generation of GWAS. SNVs that are common in the population are called single-nucleotide polymorphisms (SNPs), and these are the variants that are probed by the microarrays used in massive studies like those of the Psychiatric Genomics Consortium (PGC; http://pgc.unc.edu). Because SNPs are common in the population, they have generally withstood a reasonable amount of selective pressure, and, consequently, associations between a SNP and a deleterious condition such as a disease will generally have a small effect size. Such small effects require large studies to provide the power necessary to declare a statistically significant association. However, rare SNVs (or any kind of rare genetic variant) have generally not undergone the same level of purifying selection, and so disease associations can have larger effect sizes. Owing to the rarity of these variants, they are often pooled within a gene or even a gene set or pathway to provide the necessary power to demonstrate a statistically significant association.

A number of properties of SNPs plagued early GWAS [24], including confounding effects with population structure, as well as underpowered designs (as common SNP effects are small and require large samples to achieve significance). Furthermore, common SNPs often merely tag causal variants by virtue of their “hitchhiking” together through recombination events (contributing to a phenomenon called linkage disequilibrium). This tagging effect can make locating and interpreting the causal variant a challenging exercise.
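
To make the tagging idea concrete, the sketch below computes r², a standard pairwise measure of linkage disequilibrium, from phased haplotypes at two biallelic sites. It assumes each haplotype is a pair of 0/1 allele codes; the function name and the toy haplotypes are hypothetical, and real analyses would normally estimate this from genotype data with established software.

```python
from typing import List, Tuple


def ld_r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 between two biallelic sites, given phased haplotypes coded as (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n                  # allele 1 frequency at site 1
    p_b = sum(h[1] for h in haplotypes) / n                  # allele 1 frequency at site 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n     # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                     # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denominator


# Toy data: the two alleles always travel together (a perfect tag) ...
perfect_tag = [(1, 1)] * 4 + [(0, 0)] * 6
# ... versus alleles that occur independently of each other.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(perfect_tag))
print(ld_r_squared(independent))
```

An r² near 1 means a genotyped SNP is a nearly perfect proxy for its neighbor, which is why an association signal at a tag SNP does not by itself identify the causal variant.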

Technologies

Two basic technologies dominate current genetic studies of psychiatric conditions: DNA microarrays and high-throughput sequencing (Fig. 1). Microarrays use DNA oligonucleotide probes, which give a sparse picture of the genome, to genotype SNPs and CNVs. The low cost of these arrays (typically around $100–200) enabled the massive GWAS projects of the past decade, to the point that “microarray” and “GWAS” have become nearly synonymous. While the future is clearly in sequencing technologies (see below), microarray platforms, particularly those with custom probe content (such as the Illumina PsychArray, which probes an assortment of variants nominated by researchers in psychiatric genetics) still have their place in certain study designs. Specifically, in studies where assessment of common genetic risk is prioritized over gene discovery, microarrays can be a cost-effective tool.

As DNA sequencing prices have fallen dramatically over the past decade, investigators have more options to consider when designing genetic studies [25]. Whole-exome sequencing (WES) was the first form of high-throughput sequencing to be widely adopted in psychiatric genetics studies, with early studies in autism demonstrating enrichment of deleterious protein-coding mutations [26,27,28,29]. In WES, a preliminary capture step is performed that limits input DNA to protein-coding exons, thus reducing the DNA to be sequenced by more than an order of magnitude. While cost was the major consideration that birthed WES as a stopgap technology, investigators found that in contrast to GWAS, where top hits were often in functionally ambiguous noncoding regions, data from WES were comparatively straightforward to interpret: changes to coding DNA sequence affect splicing and amino-acid sequence of the resulting protein in a relatively straightforward way.

As sequencing throughput continues to increase with new technology, the price differential between WES and whole-genome sequencing (WGS) is shrinking. A whole genome sequence currently costs around 2 to 3 times the cost of an exome, yet yields ~30 times more data. Furthermore, the protein-coding fraction of the genome is covered more uniformly in WGS [30], and SVs can be called far more reliably and comprehensively in WGS. However, compared with WES, all these additional data require a substantial investment in storage and computational infrastructure, as well as the expertise to interpret noncoding variation. By far the most widely used technology for WGS is Illumina’s sequencing by synthesis, which sequences about 150 base pairs (75–300 base pairs, depending on the instrument and chemistry) of the ends of DNA fragments. While Illumina’s technology has proven popular and cost-effective, short-read (SR) approaches to sequencing have limitations. SR cannot read through long STRs, and consequently cannot always genotype them. SR can only be used for short-range phasing of variants; it struggles to resolve complex structural variation and low-complexity sequences. Competitors have focused on developing long-read technologies in the hope of luring customers who have run into Illumina’s limitations. PacBio uses a single-molecule real-time approach to achieve reads averaging > 10 kb [31]. This technology was demonstrated to great effect in the sequencing of a hydatidiform mole genome, resulting in a far more comprehensive view into SVs than can be provided by Illumina’s technology [32]. This work also enabled the closure of many interstitial assembly gaps in the human reference genome. While PacBio’s technology delivers impressive results and throughput is improving, its high cost pushes it out of the reach of most laboratories and makes it a poor solution for sequencing large cohorts.

A somewhat more cost-effective approach to long-read sequencing, called linked-read sequencing, has been developed by 10X Genomics [33]. This approach uses microfluidics to partition high-molecular-weight DNA molecules into droplets where barcodes are added to amplified fragments, thus introducing the means to “link” the resulting sequencing reads back to a single DNA molecule. The library is sequenced on standard Illumina instrumentation, and a custom alignment and variant-calling pipeline is used to identify and phase SNVs, indels, and SVs. On average, 97% of SNVs were phased into phase blocks ranging from 0.9 to 2.8 Mb in length [33].

While WGS is a powerful technology for comprehensive discovery of genetic variations, targeted sequencing offers a more economical approach for replication studies or other hypothesis-driven designs that target a smaller number of genes and loci. Technologies such as Agilent’s SureSelect use RNA probes to capture targeted genomic regions prior to sequencing library preparation. Another cost-effective approach to targeted sequencing is molecular inversion probes [34]. These are linear DNA oligos whose ends target sequences that flank regions of interest of about 200 base pairs. A gap filling and ligation reaction fills in the targeted sequence and the probe is circularized, the residual linear DNA digested, and the circular capture products amplified through PCR. The pool of PCR products is then sequenced with standard Illumina chemistry. Although the up-front investment in the synthesis of the oligo probes can be substantial, the number of individuals that can be sequenced with the resulting probe pools is, for all practical purposes, limitless. Cost analyses suggest that a panel of 1000 targeted regions can be sequenced for < $10 per individual [35].

Concepts

Here I introduce some concepts that come up routinely in psychiatric genetics. While some of these concepts are illustrated with examples from the field, the major findings are presented in more detail in the “Current Trajectories” section.

GWAS data have seen a renaissance in recent years with an increasing appreciation of the role of common polygenic risk [36,37,38,39]. Focus has shifted from array-based GWAS data as the primary means of disease gene discovery to microarrays as a tool to calculate the aggregate genetic risk of individuals for various diseases. Predictive models are trained using large discovery cohorts, such as from the PGC, and then a linear combination of risk alleles is computed in the cohort of interest for each individual, essentially producing a single numerical value for each individual that represents their polygenic risk for the disease in question. These polygenic risk scores are then correlated with other variables of interest to draw inferences about the role of common polygenic risk in the phenotypes of interest. Such analyses, using either a polygenic risk score or an alternative approach called LD score regression [40], are often referred to as studies of genetic correlations [41, 42]. Notable examples of the application of polygenic risk scores in the general population include the finding that polygenic risk for autism is positively correlated with cognitive ability [37] and that polygenic risk for SZ is predictive of creativity [38]. Such examples may serve to explain, at least in part, the evolutionary double-edged sword of polygenic psychiatric risk. Furthermore, it has been shown through genetic correlations that major depressive disorder, bipolar disorder, and SZ share genetic risk factors; and that anorexia and SZ share genetic risk factors [41]. A further permutation of the concept of polygenic risk is the use of predictive models to infer gene expression (rather than disease risk) based on large-scale genotyping data [43, 44]. In this way, the collection of imputed gene expression values may be used as an intermediate phenotype to link trait or disease state to genotype.
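
The "linear combination of risk alleles" described above can be sketched as a weighted sum. This is a minimal illustration under stated assumptions: the variant identifiers, effect sizes, and genotypes are all made up, and treating a missing genotype as contributing zero is a simplification rather than a recommended practice.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    # Assumption: variants missing from an individual's genotypes contribute nothing.
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


# Hypothetical per-allele effect sizes, as would be estimated in a discovery cohort.
discovery_weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}

# Hypothetical target cohort: risk-allele counts for each individual.
target_cohort = {
    "person_1": {"snp_a": 2, "snp_b": 1, "snp_c": 0},
    "person_2": {"snp_a": 0, "snp_b": 0, "snp_c": 2},
}

scores = {
    person: polygenic_risk_score(genotypes, discovery_weights)
    for person, genotypes in target_cohort.items()
}
print(scores)
```

Each individual ends up with one number, which is what is then correlated with other variables of interest in the cohort.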

As parallel efforts to discover the genetic basis for different common psychiatric conditions have progressed, it has become clear that a substantial number of risk genes confer risk for multiple conditions [42], a phenomenon called pleiotropy [45]. Consequently, the notion that there are “schizophrenia genes” and “autism genes”, and so on, has evolved into the idea that there are simply “brain genes” that can be perturbed in ways and combinations so as to predispose individuals to either SZ or autism or any other neuropsychiatric condition [46]. Certain classes of genes may be more likely to be involved in one condition than another [e.g., chromatin and transcriptional regulators in autism spectrum disorder (ASD)], but the field is young enough that it is difficult to determine whether or not these perceptions of thematic segregation by condition are merely a product of ascertainment bias.

A very specific kind of pleiotropy has been observed in connection with some well-studied CNVs: reciprocal (or mirror) phenotypes. When a deletion and its reciprocal duplication (i.e., of the same genetic material) result in phenotypes that are correspondingly mirrored at opposite ends of a spectrum, that locus is said to be involved in a reciprocal phenotype. The implication is that phenotypes associated with these loci vary in a semi-quantitative way with gene dosage. Notable examples include variation of head circumference (i.e., trends toward microcephaly or macrocephaly) and body mass index with comorbid neuropsychiatric features observed in 16p11.2 [47, 48], head circumference in 1q21.1 [49], and head circumference, stature, and bone maturation rates at 5q35 [50]. Reciprocal CNVs at a number of genomic loci that confer reciprocal risk for autism or SZ (1q21.1, 16p11.2, 22q11.21, and 22q13.3) have led to a hypothesis that, in some respects, these 2 conditions might be considered reciprocal phenotypes [51].

As sequencing prices have continued to fall, it has become practical to sequence large numbers of trios with the aim of identifying putatively causal de novo mutations (DNMs). These are new mutations that do not affect the parents but that occur sporadically in either the sperm or egg haploid genomes, thus propagating to all cells in the offspring. This approach to gene discovery is particularly fruitful when family history is strongly suggestive of the condition being sporadic: the proband’s genome is compared with the parents’ genomes, and genetic variants that are absent in both parents are candidate DNMs (sequencing errors lead to many false-positives and best practices require confirmation of putative DNMs with an additional genotyping technology). Early exome studies of autism rapidly expanded the list of autism risk genes by identifying genes that were hit by de novo and presumed damaging mutations in multiple individuals [26,27,28,29]. Further, these studies showed that individuals with autism do not have a greater burden of exonic DNMs, although their DNMs are more likely to be damaging. WGS studies of autism and other samples showed that a disproportionate number of DNMs (~75%) are transmitted from the father, that DNM burden is positively correlated with paternal age (about 1 DNM per year of paternal age), that DNMs cluster together in a nonrandom fashion, that humans harbor about 50 to 100 single-nucleotide DNMs each, and that single-nucleotide mutation rate varies substantially across the genome [52, 53]. As DNMs have undergone only a single round of selective pressure, they are attractive candidates for potentially causal variants of large effect.
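
The trio comparison described above reduces to a simple filter. The sketch below assumes genotypes are already called and stored as alternate-allele counts (0, 1, or 2) keyed by a variant tuple; the data layout, function name, and example variants are hypothetical. As the text notes, anything this filter returns is only a candidate and would still need confirmation with an independent genotyping technology.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, reference allele, alternate allele)


def candidate_de_novo(
    child: Dict[Variant, int],
    mother: Dict[Variant, int],
    father: Dict[Variant, int],
) -> List[Variant]:
    """Variants carried by the child and confidently absent (count 0) in both parents."""
    candidates = []
    for variant, child_count in child.items():
        # A parent with no call at the site is treated as uninformative, not as absent.
        if child_count > 0 and mother.get(variant) == 0 and father.get(variant) == 0:
            candidates.append(variant)
    return sorted(candidates)


# Made-up genotypes for one trio.
v1 = ("chr1", 1000, "A", "G")   # inherited from the mother
v2 = ("chr2", 2000, "C", "T")   # present only in the child
v3 = ("chr3", 3000, "G", "A")   # father has no call at this site

child = {v1: 1, v2: 1, v3: 1}
mother = {v1: 1, v2: 0, v3: 0}
father = {v1: 0, v2: 0}

print(candidate_de_novo(child, mother, father))
```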

Depending on the comprehensiveness of the genetic study, there may be many thousands of variants of interest, and a variant annotation scheme must be used to prioritize candidates for further investigation. For protein-coding variants, indicators of the functional consequence (amino-acid change, premature STOP codon, splice-site disrupting, etc.) can be assigned in a straightforward manner. Variants are also annotated according to the frequency of the minor allele in a population sample, and this is used to classify the variant as common or rare (with thresholds varying, but usually either < 0.05 or < 0.01 qualifies as rare). Measures of selective constraint, such as PolyPhen [54], Genomic Evolutionary Rate Profiling (GERP) [55], or Combined Annotation Dependent Depletion (CADD) [56] are often used as an indicator of how deleterious the variant is. In addition, noncoding variants are sometimes annotated according to whether they intersect known regulatory elements or epigenetic marks, which may give further clues as to their regulatory consequences.
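
The frequency-based part of such an annotation scheme might look like the sketch below, which derives a minor allele frequency from allele counts and applies one of the two thresholds mentioned above. The function names and counts are hypothetical, and real pipelines would draw frequencies from a population reference panel instead of computing them from raw counts.

```python
def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    alt_frequency = alt_allele_count / total_alleles
    return min(alt_frequency, 1 - alt_frequency)


def frequency_class(maf: float, rare_threshold: float = 0.01) -> str:
    """Label a variant 'rare' or 'common'; the threshold (0.01 or 0.05) varies by study."""
    return "rare" if maf < rare_threshold else "common"


# Made-up counts: 3 alternate alleles among 1000 sampled chromosomes.
maf = minor_allele_frequency(3, 1000)
print(maf, frequency_class(maf))

# The same variant can change class depending on the threshold chosen.
print(frequency_class(0.03, rare_threshold=0.01), frequency_class(0.03, rare_threshold=0.05))
```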

The impact of a genetic variant does not occur in a vacuum; it is often modulated by other genetic factors or the environment. This modulatory effect can result in incomplete penetrance (when some carriers of a damaging variant do not show the corresponding phenotype) or variable expressivity (when the associated phenotype manifests itself differently in individuals). When the effect of a variant depends upon the genetic background of an individual or the genotype at another locus, the effect is said to be epistatic. Epistatic effects are often called gene–gene interactions, and epistasis has been used in model organisms as a means to flesh out functional gene networks [57], though in humans the phenomenon has not been observed as extensively. Nevertheless, clear examples of epistasis have been observed in human neuropsychiatric and developmental conditions [58,59,60,61,62,63,64].

In a way that is comparable to epistasis, the sex of the variant carrier can have a decisive effect on the phenotypic expression of the variant. Perhaps the most striking example of this is the well-known male bias in ASD [65]. This work has led to a broader idea of the “female protective effect” in neurodevelopment in general [66,67,68], where females can carry higher levels of genetic risk than males, while remaining largely asymptomatic. The idea of a female protective effect is attractive, in part, because of its therapeutic implications: if the biological pathways that make females more resilient in the face of genetic insult can be better understood, then perhaps they can be exploited to therapeutic effect [66, 68, 69]. The level of sexual dimorphism in neuropsychiatric conditions varies along a spectrum ranging from extreme male bias in autism and other neurodevelopmental conditions to extreme female bias in eating disorders and anxiety [70]. Because of the crucial role that sex plays in the phenotypic expression of genetic conditions, it should be a core factor in experimental designs, and not ignored for convenience's sake.

As with sex, the environment can play a crucial modulatory role in the expression of genetic disease, including autism [71, 72], major depression [73], SZ [74, 75], and others. The prototypical example of the gene–environment interaction in neuropsychiatry is post-traumatic stress disorder (PTSD) [76, 77], where not all individuals who suffer a traumatic event develop PTSD. It is thought that there is a latent genetic risk that, when combined with a traumatic event, manifests as PTSD.

Current Trajectories of Genetic Discovery

In the wake of the completion of the Human Genome Project, psychiatric geneticists have exploited the above technologies and concepts to learn more about the biological nature of neuropsychiatric conditions. Below I provide a brief synopsis of the recent trajectory of genetic discovery for some of the major neurodevelopmental and neuropsychiatric conditions. Emphasis is given to neurodevelopmental conditions, and the synopses are not intended to be comprehensive; indeed, the breadth of the field makes a comprehensive review impossible. Rather, the goal is to illustrate patterns of inquiry currently in use across conditions. Table 1 summarizes the various consortia and working groups (with websites, where available) undertaking research in each condition. While not exhaustive, Table 1 focuses mostly on consortia that have demonstrated their productivity through multiple peer-reviewed publications over the last 5 years.

Table 1 List of active consortia, initiatives, and working groups by condition

Intellectual Disability

Intellectual disability (ID) is a genetically heterogeneous neurodevelopmental disorder characterized by impaired adaptive functioning and low IQ. Prevalence of ID is estimated to be between 0.05% and 1.55% [78]. ID is often a comorbidity in other neuropsychiatric conditions [79, 80] and is especially prevalent in SZ and ASD, as well as attention deficit/hyperactivity disorder (ADHD) [81]. Because of its profound effect on fecundity, severe ID is presumed to be largely monogenic and not familial [82], though the total number of risk genes has yet to be enumerated. At the same time, ID has varying levels of severity and it has been shown that some less severe forms of ID are the result of complex polygenic inheritance [82], essentially representing the lower range of intelligence, which itself has been shown to have a heritability of 0.4 to 0.8 [83].

ID has classically been linked to large-scale structural variations in the genome, and these have been extensively reviewed [84]. Recent WGS and WES studies, which can resolve deleterious genetic variants at a much finer resolution, have shored up known ID genes, discovered new ones, and demonstrated emergent patterns. One recent study [85] performed WGS on 50 individuals diagnosed with severe ID and their unaffected parents and found a clear genetic cause in 42% of the subjects (this was estimated to generalize to a rate of 62% in a new sample where microarray and WES analyses had not already been performed). This study, like others, found an excess of protein-coding de novo mutations, as well as enrichment of de novo hits among a list of 528 known ID genes. The use of WGS was vindicated by the discovery of several de novo SVs that escaped detection by microarray technology. As expected, slightly lower diagnostic yields are obtained with WES [86]. More recently, a meta-analysis of > 2000 trios with ID found statistical associations of rare and de novo variants in 10 newly identified candidates: DLG4, PPM1D, RAC1, SMAD6, SON, SOX5, SYNCRIP, TCF20, TLK2, and TRIP12 [87]. These genes were shown to be intolerant to functional nonsynonymous variants.

As a complement to the unbiased, genome-wide studies, many recent studies have followed the “genotype-first” approach [88], where a sample of individuals carrying damaging mutations in the same gene is assembled, and then characterized extensively from a phenotypic standpoint. Examples include POGZ [89], DYRK1A [90], TBCK [91], EBF3 [92], CHAMP1 [93], and IARS [94].

Inborn errors of metabolism represent a well-known cause of ID [95], and are attractive from a research perspective because of the potential for addressing or even preventing negative outcomes through dietary supplementation or restriction [95, 96]. Well-studied examples include phenylketonuria [97], defects in creatine transport [98], branched-chain amino-acid metabolism [99, 100], glycosylation [101], and lysosomal storage disorders [102]. A recent network analysis of ID risk genes underscored the central role of metabolic pathways, and highlighted other areas of convergence in ID, including genes involved in nervous system development, RNA metabolism, transcription, hedgehog signaling, glutamate signaling, peroxisomes, glycosylation, and cilia [103].

A number of pharmacological strategies targeting ID are under development [104]. Risperidone was shown in early studies to significantly ameliorate problematic behaviors [105, 106], though significant side effects temper enthusiasm about its use [107]. Furthermore, because it does not directly target pathways known to be dysregulated in ID, it cannot be considered a specific therapy for ID. Other approaches justified by the known molecular pathologies have been investigated, including mGluR5 antagonists [108,109,110,111], ampakines [112,113,114], and γ-aminobutyric acid B agonists [115]. The mTOR pathway has also been a target of drug development in the context of ID [116], where rapamycin and other inhibitors of mTOR function show promise [117].

ASD

ASD is a phenotypically and genetically heterogeneous collection of neurodevelopmental conditions that entail impairments in social communication and restrictive and repetitive behaviors. Heritability of ASD ranges widely depending on study design and confounds, but recent estimates put it at 0.5 to 0.54 [36, 118]. Current estimates of prevalence are at 1 in 68 [119], and ASD encompasses enormous phenotypic diversity, from profoundly affected nonverbal individuals with comorbid ID to highly intelligent but socially impaired individuals. Correspondingly, there is no single genetic architecture for ASD. Some well-known monogenic syndromes show ASD as a common feature, including Rett syndrome (MECP2), Fragile X syndrome (FMR1), and Angelman syndrome (UBE3A). However, recent work has shown that most risk for ASD lies in the cumulative effect of thousands of common risk variants [36]. It is estimated that there are likely hundreds of ASD risk genes [26], and recent coordinated and individual efforts have been aimed at enumerating them. No other neurodevelopmental condition has been investigated with sequencing technology as intensively as ASD, and many early analytical and methodological advances were made to enable the analysis of ASD sequencing data [120,121,122].

In contrast to sequencing studies, ASD GWAS projects have yet to suggest robustly associated loci, and data from these studies have shown more traction in developing polygenic risk scores for ASD [37, 123] than for gene discovery. Array-based CNV studies have been more productive from a locus discovery perspective, through their focus on detecting rare variation [14, 124]. In 2012, a series of coordinated WES studies of ASD were published that used de novo mutation as a means for gene discovery [26,27,28,29]. Shortly thereafter came the first WGS studies of autism [52, 125]. These studies, especially the WES studies with their larger numbers, quickly expanded the list of candidate ASD genes, and multiple follow-up studies confirmed the association of some of these genes with ASD, notably CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1 [126], POGZ [89], and ADNP [127]. One emerging theme is that ASD risk genes are likely to be regulatory targets of fragile X mental retardation protein (FMRP), the protein encoded by the Fragile X gene, FMR1 [27]. Other risk genes are involved in synaptic structure and integrity, such as neuroligins/neurexins [128, 129], SHANK proteins [130], and contactin proteins [131]. Still other ASD risk genes are involved in broad regulation of chromatin structure and transcriptional regulation, such as CHD8 [28, 29, 126, 132, 133], TBR1 [126, 134], and TCF4 [132, 135].

Effective treatment options for ASD are likely to be highly individualized, owing to the extensive heterogeneity of the condition. Indeed, much of the most promising treatment development work is done in genetically defined forms of ASD, such as Fragile X [136], Rett syndrome [137,138,139], and others [100, 140,141,142,143].

Several consortia and foundations have been and continue to be driving forces in ASD genetics. The Simons Foundation has built extensive infrastructure and data sets for ASD researchers to use, including SFARI Gene, which offers a curated list of ASD candidate genes. In addition, sequencing data and biospecimens from the Simons Simplex Collection are available to qualified investigators. The Simons Foundation also sponsors the Variation in Individuals Project, which focuses on genetically defined forms of ASD such as those associated with copy number variation at 16p11.2 and 1q21.1. Other consortia and projects actively contributing to research in the genetics of ASD include the ARRA Autism Sequencing Collaboration, the Autism Genome Project Consortium, the 16p11.2 European Consortium, and the Psychiatric Genomics Consortium.

ADHD

ADHD is a highly heritable neurodevelopmental condition that presents with impairments in sustaining attention and an inability to control impulses and activity level. In addition to the previously mentioned comorbidity with ID, ADHD shows high comorbidity with ASD within individuals and family members [144]. Prevalence of ADHD is estimated at 5% to 7% in children [145, 146] and 3% to 5% in adults [147, 148]. Twin studies estimate high heritability (70–80% [149, 150]). ADHD is genetically heterogeneous, and a host of candidate genes have been suggested [151,152,153]. Candidate gene studies (i.e., where only one or a handful of polymorphisms are studied) suffer from ascertainment bias, and many of the positive associations in candidate gene studies of neurotransmitter pathway genes are not recovered in genome-wide studies, where a higher burden of proof exists owing to multiple hypothesis testing. However, some integrative analyses have indicated that pooling of common variants within these neurotransmission pathways may lead to statistical significance [154, 155].
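
The higher burden of proof in genome-wide studies can be made concrete with a Bonferroni correction, in which the significance level is divided by the number of tests. The sketch below is illustrative only: the function names, the count of 5 polymorphisms, and the example p-value are hypothetical, and the figure of roughly 1,000,000 independent common-variant tests is the conventional assumption behind the widely used genome-wide threshold of 5 × 10⁻⁸, not a value taken from the studies cited here.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that holds the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes(p_value: float, threshold: float) -> bool:
    """True if a single association would be declared significant."""
    return p_value < threshold


# Hypothetical candidate gene study testing 5 polymorphisms.
candidate_threshold = bonferroni_threshold(0.05, 5)

# Genome-wide study, ASSUMING about one million independent tests.
genome_wide_threshold = bonferroni_threshold(0.05, 1_000_000)

# A hypothetical association with p = 1e-4 clears the first bar only.
example_p = 1e-4
print("candidate gene study:", passes(example_p, candidate_threshold))
print("genome-wide study:   ", passes(example_p, genome_wide_threshold))
```

Under these assumptions, the same association would be reported as positive in the candidate gene setting and would fall well short of significance genome-wide, which is one reason such findings can fail to carry over.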

Perhaps the most compelling theme to emerge from the genome-scale studies of ADHD is glutamatergic neurotransmission. A large study of CNVs implicated metabotropic glutamate receptors, as well as functionally related genes, in the etiology of ADHD [156]. This work led to the repositioning of the drug NFC-1 (fasoracetam monohydrate), which stimulates metabotropic glutamate receptors, for treatment of ADHD in individuals with confirmed mutations in mGluR genes (clinical trial ID NCT02777931). A subsequent study of CNVs in ADHD also found a link to metabotropic glutamate receptors [157]. Imaging studies have suggested abnormal glutamate levels in cortical and subcortical brain regions [158]. Glutamatergic signaling has gradually emerged as a point of overlap between ADHD and ASD [159, 160]. Methylphenidate, a common medication for ADHD that acts as a dopamine–norepinephrine reuptake inhibitor, was found to modulate the number of surface glutamate receptor subunits in a dose-dependent, bidirectional fashion [161]. Furthermore, genetic variation in GRM7 was associated with response to methylphenidate [162].

The synaptic adhesion molecule LPHN3, which regulates synaptic density and development [163], was found to be associated with ADHD by a linkage study [164], and the association has been robustly replicated since then [165,166,167,168,169,170,171,172,173,174,175]. LPHN3 variants have been shown to be predictive of methylphenidate response in ADHD [173]. FLRT3 encodes a ligand for LPHN3 [176,177,178] and has itself shown suggestive genetic associations with ADHD [179,180,181,182].

Other studies of rare variation in ADHD have implicated signal transduction genes NT5DC1, PSD, SEC23IP, and ZCCHC4 [183], and a small-scale exome sequencing study found a significant excess of rare variation in 51 preselected ADHD candidate genes [184].

Developmental Language Disorder

Developmental Language Disorder (DLD), also known as specific language impairment, language impairment, or language disorder, is a neurodevelopmental condition marked by impairments in expressive and receptive language ability that are not attributable to hearing loss or severe ID. Prevalence is estimated at 7% [185] and heritability is moderate to high [186,187,188,189]. It is highly comorbid with ADHD, with ~40% of individuals with DLD also having an ADHD diagnosis [190].

Early genetic studies of language ability were driven by pedigrees with very pronounced phenotypes, and these led to seminal discoveries such as disruptive variation in FOXP2 being associated with impaired language abilities [191,192,193,194]. However, subsequent studies of common variation have not been able to provide strong support for these genes playing a major role in DLD [195, 196]. Genome-wide studies of common variation are generally woefully underpowered and have not produced findings that withstand multiple testing correction [197, 198].

A recent study that combined GWAS and exome sequencing in an isolated population implicated SETBP1 as a language-associated gene, and this association was replicated in an admixed sample [199]. Furthermore, exome sequencing of the most severely affected individuals in the sample revealed a number of putatively disrupting variants, and the genes harboring them, together with GWAS candidates, were enriched for transcriptional targets of MEF2A. Another study used WES to implicate NFXL1 and then confirmed the association with language ability in 2 other samples [200]. These studies illustrate that the movement toward sequencing technologies will bear fruit because of their ability to combine analysis of common and rare variation.

Tourette Syndrome

Tourette syndrome (TS) is a neurodevelopmental condition that presents with disruptive motor and vocal tics [201]. Although early work suggested a monogenic, autosomal dominant mode of inheritance [202], subsequent findings demonstrated substantial genetic and phenotypic heterogeneity [203,204,205]. TS shows strong comorbidities with obsessive–compulsive disorder and ADHD [206], with other mood, anxiety, and disruptive behavior disorders occurring in about 30% of probands. While many of the specific genes underlying TS remain elusive, over the last decade, histidine decarboxylase, a key enzyme in the biosynthesis of histamine, has emerged from human genetic studies as an important player in the etiology of the disease [207,208,209], with animal models displaying TS-like phenomenology [210] and implicating an interaction between dopaminergic and histaminergic systems in the basal ganglia. Administration of haloperidol and histamine was shown to rescue TS-like behavioral and molecular characteristics in this model.

A number of common variant studies of TS have been carried out, some at the genome-wide scale, with less conclusive results. The first GWAS of TS failed to reach significance for any SNP [211]. A follow-up study [212] that included 42 of the top candidates from the initial GWAS found a significant association at rs2060546, near the gene that encodes the axon guidance protein netrin 4 (NTN4), which shows strong expression in the striatum. A recent attempt at replicating this association failed [213]; however, a meta-analysis of 3 cohorts showed significance and consistent direction of effect.

Several consortia are currently undertaking large-scale genetic studies of TS. The European Multicentre Tics in Children Study (EMTICS) seeks to elucidate gene–environment interactions, including the involvement of infection and immune mechanisms in TS etiology. Two patient cohorts form the basis of EMTICS. The ONSET study involves follow-up of 375 high-risk children aged 3 to 10 years who have an immediate family member with a diagnosis of TS and who have no tics at study entry. COURSE is a longitudinal study that is following 700 children and adolescents (aged 3–16 years) with a known chronic tic disorder or TS. The study began in March 2013 and will conclude in 2017.

TS-EUROTRAIN is a training network that acts as a platform to unify large-scale TS studies and educate the next generation of experts. TS-EUROTRAIN is notable for completing the first epigenome-wide association study for tics, analyzing data from the Netherlands Twin Register [214]. This study interrogated 411,469 autosomal methylation sites in 1678 individuals. Although no site reached genome-wide significance, the top hits include several genes and regions previously associated with neurological disorders and warrant further investigation.

The Tourette International Collaborative Genetics (TIC Genetics) study [215, 216] is currently finishing analysis of WES data from 325 simplex TS trios, with the main focus being the detection of de novo SNVs and indels.
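
The core logic of de novo detection in a trio can be sketched in a few lines: a variant is a candidate de novo event when the child carries an allele that neither parent carries. The example below is a minimal illustration under stated assumptions and is not the TIC Genetics pipeline; the function name, the variant labels, and the genotype encoding (alternate-allele counts of 0, 1, or 2) are all hypothetical, and practical analyses typically add filters on read depth, genotype quality, and population allele frequency that are omitted here.

```python
from typing import Dict, List

# Genotypes are alternate-allele counts (an assumed encoding):
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
TrioGenotypes = Dict[str, int]


def candidate_de_novo(variants: Dict[str, TrioGenotypes]) -> List[str]:
    """Return IDs of variants where the child is heterozygous and both
    parents are homozygous reference."""
    calls = []
    for variant_id, gt in variants.items():
        if gt["child"] == 1 and gt["mother"] == 0 and gt["father"] == 0:
            calls.append(variant_id)
    return calls


# Hypothetical trio with three variant sites.
example_trio = {
    "var_A": {"child": 1, "mother": 0, "father": 0},  # absent from both parents
    "var_B": {"child": 1, "mother": 1, "father": 0},  # inherited from the mother
    "var_C": {"child": 0, "mother": 0, "father": 1},  # not transmitted
}

print(candidate_de_novo(example_trio))
```

Only the first site meets the criterion in this toy example; the second is transmitted from a carrier parent and the third is not present in the child.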

Adult-Onset Disorders

Schizophrenia

Schizophrenia (SZ) is a complex, highly heritable, and heterogeneous psychiatric disorder that presents with positive (psychosis, hallucinations) and negative (apathy, blunted affect, social withdrawal, poverty of speech, anhedonia) symptoms, with associated cognitive deficits. Its prevalence is estimated at 1% [217] and it is highly heritable (65–81% [218, 219]). Significant comorbidities include substance abuse (47%) and depression (50%), as well as anxiety disorders (29% PTSD, 15% panic disorder, and 23% obsessive–compulsive disorder) [220]. Despite the wide appreciation of the genetic roots of SZ, its etiology and origins are still not fully understood [221], though some recent exciting progress has been made that is beginning to chip away at the complex physiology that underlies SZ.

It is clear that SZ is polygenic [222, 223], and, indeed, the most compelling study of the last several years identified 108 loci as SZ-risk loci [224]. One of these hits resides in the major histocompatibility complex, a notoriously difficult region to resolve from a genotype standpoint. Despite this challenge, recent work that focused on elucidating the underlying source of the association signal in this region found that the complement component gene C4 drives the underlying association, and that risk alleles result in higher expression of C4A [225, 226]. C4 protein is expressed at the synapse and plays a role in synaptic pruning, thus implicating overactive synaptic pruning as a mechanism underlying schizophrenia risk.

A recent large-scale WES study of SZ [222] implicated voltage-gated calcium ion channels and proteins comprising the ARC postsynaptic protein complex as harboring an overabundance of putatively functional rare variation. As has been noted for genes affected by damaging variation in ASD [27], this study showed an enrichment of FMRP targets affected by damaging variation in SZ probands. These and other genetic findings implicate synaptic genes as a major theme in the genetic etiology of SZ [227], suggesting that development of therapeutics should involve the targeting of synaptic proteins and processes.

Bipolar Disorder

Bipolar disorder (BPD) is a heritable psychiatric condition marked by alternating episodes of mania and depression. Prevalence is 2% to 3% [228], and heritability may be as high as 80% [229]. Individuals with BPD are at an 8- to 10-fold increased risk for suicide [230]. BPD is a complex and genetically heterogeneous condition; however, compared with other conditions such as SZ, discovery of genetic risk factors robustly and specifically connected to the condition has been slow. The largest GWAS in BPD to date [231] showed associations in 4 known (MAD1L1, 6q16.1, DDN, and TRANK1) and 2 novel loci (intergenic 9p21.3 and intronic variants in ERBB2). Exome sequencing in a large cohort of familial BPD showed an enrichment of predicted damaging variation in genes previously associated with ASD and SZ, as well as targets of the fragile X protein, FMRP [232], suggesting overlapping genetic risk with these disorders. The trickle of genetic findings in BPD, in contrast to its high heritability, speaks to the extensive complexity and heterogeneity of the disorder and supports the need for larger studies of this condition. In the near term, perhaps one of the most promising approaches is in the genetics of lithium response, reviewed in detail in this issue.

Major Depressive Disorder

Major depressive disorder (MDD) is characterized by prolonged depressed mood or loss of interest or pleasure in nearly all activities, together with other disturbances in areas such as sleep, appetite, and psychomotor activity. Its prevalence varies by geographic location but is mostly 8% to 12% [233, 234], and women show more susceptibility than men. Significant comorbidities include dysthymia (20%) and anxiety disorders (21%) [235]. Heritability is estimated at about 37% [236, 237]; however, as with BPD, genetic findings that provide compelling biological insights into the disorder have been elusive [238]. Nevertheless, recent studies show promise. The largest MDD GWAS to date, which used data from direct-to-consumer genetics company 23andMe, uncovered 15 loci associated with MDD [239]. Another study used low-coverage WGS and identified SIRT1 and LHPP as potential MDD risk genes [240].

While genetic studies of MDD have not yet yielded enough robust results for pathway analyses to gain traction, a recent gene expression study of individuals with MDD implicated inflammation/immune pathways (specifically interleukin-6 and natural killer signaling pathways) as a marker of MDD [241]. DVL3, a gene that regulates cell proliferation and was previously implicated by PGC results [242], was also found to be differentially expressed in this analysis. These findings raise the possibility of targeting inflammation pathways as a means to treat depression [243].

Conclusion

Human genetic studies have been the driving force in bringing to light the underlying biology of psychiatric conditions. The complex nature and genetic heterogeneity of these conditions require vast sample sizes to power statistically robust associations, and such studies can only be accomplished through continued intra- and international collaboration and an increase in data sharing. While these massive collaborative efforts are necessary for gene discovery, individual laboratories and investigators still play a vital role in contextualizing and deepening our understanding of these genetic associations through focused and hypothesis-driven investigation.

As a complement to these gene-discovery efforts, it is clear that much of the future of psychiatric genetics lies in the “genotype-first” approach to studying genetically defined neuropsychiatric conditions. These studies, which are perhaps the greatest near-term boon that genetics can bestow on therapeutic research, can inform the design of clinical trials and improve the odds of their success while providing crucial insight into the penetrance and variable expressivity displayed among carriers of functionally comparable genetic variations. The Simons Variation in Individuals Project provides a useful model for considering the promise and the challenges of clustering patients by genetic etiology. Furthermore, emerging online patient networks such as patientslikeme.com and the Interactive Autism Network, though not explicitly genetically defined, also show potential in the way they remove geographic boundaries and allow self-clustering of patients.

From a technological standpoint, microarray technology is waning, though for certain applications it remains an economical option. Sequencing approaches, and, in particular, WGS, allow for comprehensive discovery of all modes and frequencies of genetic variation, not just those designed into an array. Cost and analytical expertise are the prevailing barriers for wider adoption, but these are rapidly diminishing.

Common polygenic risk is becoming a useful tool for comparing the shared genetic basis of disparate psychiatric conditions, for stratifying population samples according to their polygenic risk as part of the study design, as well as for studying genetic correlates in the general population while drawing conclusions about the link between psychiatric risk and traits that may be under positive selection (such as creativity or cognitive ability).
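
For readers unfamiliar with how polygenic risk is quantified, a polygenic score is commonly computed as a weighted sum of an individual's risk-allele dosages, with weights taken from GWAS effect size estimates. The sketch below shows that calculation and a simple rank-based stratification of a sample. Everything in it is hypothetical: the SNP names, weights, individuals, and function names are invented for illustration, and steps that published scores typically require (such as accounting for linkage disequilibrium and choosing a p-value inclusion threshold) are omitted.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per SNP).
    SNPs missing from an individual's genotypes are skipped."""
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def stratify(scores: Dict[str, float], n_groups: int) -> Dict[str, int]:
    """Assign each individual to a rank-based group, 0 = lowest scores."""
    ranked = sorted(scores, key=scores.get)
    return {person: (rank * n_groups) // len(ranked)
            for rank, person in enumerate(ranked)}


# Hypothetical per-SNP effect sizes (e.g., log odds ratios from a GWAS).
weights = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.20}

# Hypothetical genotype dosages for four individuals.
genotypes = {
    "p1": {"snp1": 2, "snp2": 0, "snp3": 1},
    "p2": {"snp1": 0, "snp2": 2, "snp3": 0},
    "p3": {"snp1": 1, "snp2": 1, "snp3": 2},
    "p4": {"snp1": 1, "snp2": 0, "snp3": 0},
}

scores = {person: polygenic_score(d, weights) for person, d in genotypes.items()}
groups = stratify(scores, n_groups=2)
print(scores)
print(groups)
```

In this toy sample, splitting into two groups places p2 and p4 in the lower-risk half and p1 and p3 in the higher-risk half, which is the kind of stratification by polygenic risk that a study design might build on.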

Finally, as genetic and other molecular studies of psychiatric conditions increase our understanding of the basic biology of these disorders, we may find that drugs (or supplements) already on the market may be repurposed to treat underlying causes that manifest as mental illness [100, 243].