Rediscovering the value of families for psychiatric genetics research

Large-scale genome-wide association studies (GWAS) comprise the dominant paradigm in psychiatric genetics research today [1]. Case/control GWAS, which compare the frequency of minor alleles from common polymorphisms between unrelated individuals [2], have provided numerous insights into the genetic architecture [1] and the interrelatedness [3, 4] of psychiatric disorders. However, like any experimental approach, the case/control GWAS design has relative strengths and weaknesses. Unfortunately, it is unlikely that any single design will be able to dissect all of the genetic influences on multifactorial traits [5, 6] such as mental illnesses [7, 8]. Rather, diverse complementary approaches may be necessary to garner the full spectrum of biological insights that genetics could provide to neuropsychiatry [5, 9,10,11,12]. Chief among these approaches is the use of whole genome sequencing (WGS), which catalogues almost all genomic DNA sequence variation within an organism [13]. Early sequencing efforts confirmed that the substantial majority of human genetic variation is rare (occurring in less than 1% of the population) or private (only occurring in a single individual and their close relatives) [13, 14]. There is a growing appreciation of the impact of rare variation on human disease [11, 15, 16], particularly given the excess of rare functional variants resulting from recent accelerated population growth and relatively weak purifying selection [17]. Rare variants, especially loss of function variants or those deleterious to protein expression, are far more amenable to biological experimentation, and subsequent molecular insights, than common loci [18,19,20,21,22], which are often localized outside of transcribed regions [23, 24]. As it is likely that both common and rare variation are relevant for complex diseases [11], both GWAS and WGS methods should be utilized in a complementary manner to dissect the genetic influences on mental illness.

The rate limiting factor for inferring an association between a particular rare variant and a phenotype is inevitably the total number of copies of that variant captured in the sample [25,26,27]. Typically, to have enough copies of a rare variant for statistical analysis, one must sequence very large samples of unrelated individuals (e.g., ~700,000 in the recent human height exome study [20]). Consistent with this notion, the Whole Genome Sequencing of Psychiatric Disorders (WGSPD) consortium estimated that sequences from at least 20,000 unrelated cases and controls are needed to adequately power a gene burden-type analysis [8], though far larger samples are necessary to identify specific risk variants for mental illness. However, by using alternative analytic strategies and studying related individuals, particularly those from large multiplex families, it is possible to reduce the required sample size while maintaining statistical power [28,29,30,31]. Given this and other benefits discussed below, we contend that WGS in extended pedigrees provides a cost-effective strategy for psychiatric gene mapping that complements GWAS and WGS in unrelated individuals. In fact, family-based methods may be the only feasible study design for specifically identifying the rarest functional variants that are private to family lineages. This was our impetus for forming the “Pedigree-Based Whole Genome Sequencing of Affective and Psychotic Disorders” consortium, an international group of scientists using family-based designs to identify rare variants that increase risk for psychiatric disorders.

In this review, we provide a rationale for the use of WGS with pedigrees in modern psychiatric genetics research. We begin with a focused review of the current literature, followed by a short history of family-based research in psychiatry. Next, we describe several advantages of pedigrees for WGS research, including power estimates, methods for studying the environment, and utilizing endophenotypes. We conclude with a brief description of our consortium and its goals.

The current state of psychiatric genetics

Large-scale GWAS meta-analyses have been successfully completed for schizophrenia (sample size: 36,989 cases/113,075 controls [32]), bipolar disorder (13,902/19,279 [33], 9,784/30,471 [34]), major depression (130,664/330,470 [35], 10,851/32,211 [36], 121,380/338,101 [37]), post-traumatic stress disorder (5131/15,092 [38]), attention deficit hyperactivity disorder (20,183/35,191 [39]), and autism (16,539/157,234 [40]). Together, these GWAS have localized over 200 genome-wide significant loci influencing mental illness risk [1]. Given the sample sizes listed above, it is quite possible that common loci with moderate to large effect sizes for the majority of mental illnesses have already been localized [10], at least among individuals of European ancestry. If so, this represents an important milestone for the field and provides an opportunity to explore alternate approaches for delineating the genetics of mental illness.

One lesson from GWAS is that mental illnesses, like other complex diseases, appear to be highly polygenic, involving large numbers of loci, most of which have a small or very small effect on risk [10, 41, 42]. This pattern of results is entirely consistent with Fisher’s multifactorial model [43], which predicts that as the number of risk loci grows, the contribution of each new locus correspondingly shrinks. Accordingly, results from meta-analyses have been used for individual risk prediction based on polygenic scores [44, 45] that include thousands to hundreds of thousands of variants to provide a risk index [46]. Additionally, loci from GWAS studies appear to be useful for selecting among potential therapeutic agents [47], a property which could have a significant impact in psychiatry [48] where novel drug development is at a near standstill [49].
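As a minimal sketch of how a polygenic score of this kind is formed, the snippet below computes a weighted sum of risk-allele counts. The variant identifiers, weights, and genotypes are invented for illustration and are not drawn from any of the GWAS cited here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    # Only variants present in both the genotype data and the weight table are used.
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


# Hypothetical per-allele effect weights (e.g., log odds ratios) for three variants.
weights = {"rs_a": 0.05, "rs_b": -0.02, "rs_c": 0.10}
# Hypothetical genotypes for one individual, coded as risk-allele counts.
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_score(person, weights)
```

In practice the weight table would hold thousands to hundreds of thousands of variants, but the arithmetic is the same.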

A case for rare genetic variants in mental illness

Arguably, our understanding of the genetic underpinnings of autism spectrum disorders has advanced more than that of other mental illnesses because investigators have focused more on rare nonsynonymous variants [50] than common genetic variation [40]. These studies, which often search for exonic de novo mutations [51, 52], have identified at least 50 potential risk genes for the disorder that together with copy number variants (CNV) explain more than 30% of the genetic variance of the illness [53, 54]. While the relative contribution of largest-effect common variants and of higher-penetrance rare variants probably varies across mental illnesses [1, 55, 56], the genetic architecture of autism is likely not unique. For example, Singh and colleagues identified a set of rare, putative loss-of-function variants in an exon of SETD1A that strongly increase risk for schizophrenia and intellectual disability [57]. Similarly, exome sequencing studies in schizophrenia have implicated genes expressed in neurons [58] and synapses [59] and shown that affected individuals have more rare protein-altering loss-of-function variants than unrelated controls [58].

Perhaps the strongest evidence that rare variation is important across mental illnesses [60] comes from findings that certain rare CNVs or insertion-deletions clearly influence risk for autism spectrum disorders [61], intellectual disability [62] and schizophrenia [63, 64], and may also contribute to bipolar disorder [65] and ADHD [66] risk. Indeed, the 22q11 CNV [67] is among the strongest genetic predictors of schizophrenia risk [63, 68].

Family studies in psychiatric genetics

Historically, the mapping of traits to genetic loci in humans depended almost exclusively on family studies. Early linkage studies posited simple, single major gene models of inheritance and utilized transmission of chromosome segments across generations in large pedigrees to map putative disease loci relative to a scaffold of a few hundred markers of known position. Later linkage approaches did not assume a Mendelian model and utilized identity-by-descent (IBD) allele sharing among relatives. Although these linkage methods successfully identified loci for some illnesses (e.g., Huntington’s disease, Alzheimer’s disease, macular degeneration, diabetes, and some forms of breast cancer), early attempts to localize the genetic influences on polygenic diseases were limited and often could not be replicated. Indeed, two early high profile reports of linkages for bipolar disorder, one on the X chromosome [69] and the other on 11p [70], could not be replicated [71, 72]. When reviewing this literature in 2008, Burmeister and colleagues [73] reported that no single locus was unequivocally replicated across multiple independent samples for any mental illness. This lack of results was likely due to underpowered studies that used suboptimal concordant sibling pair designs [73, 74] and were likely ineffectual where very rare or private mutations were causal. Nonetheless, discouraging progress with linkage analyses, combined with the simplicity of sampling unrelated cases and controls, undoubtedly added to the popularity of association methods and the field’s shift towards GWAS.

In an influential article, Risch and Merikangas [75] argued that linkage analysis has limited power to detect genes of modest effect (particularly in concordant sibling pair designs), but that family-based association methods have far greater power to detect the same loci, provided the locus is either directly genotyped or in strong linkage disequilibrium (LD) with a genotyped marker. The genome-wide application of this association strategy was made possible by the Human Genome Project’s identification and mapping of hundreds of thousands of common genetic variants and the characterization of patterns of LD between them. It draws on shared population history, rather than transmission among family members, to map loci of interest. This information, in turn, allowed investigators to estimate minor allele frequencies (MAF) and LD-structure for singletons, enabling GWAS in unrelated individuals [2]. Yet, the reliance on population-level knowledge has drawbacks. For example, GWAS are population-specific. Most published GWAS have been in European-derived populations, where the LD structure is well defined and represented on GWAS arrays. Although work is ongoing, sample sizes in non-European populations are yet to reach levels that would support powerful GWAS [76]. Carefully ascertained, very large families do not require population-level information (e.g., MAF or LD-structure), have the potential to provide sufficient copies of very rare alleles to identify their effects, and offer the opportunity to leverage both analytical approaches, combining genome-wide association and examination of familial transmission within the same analysis. Thus, while family-based designs were largely set aside in the GWAS era, the recurring focus on rare variants and functional genomics has renewed interest in pedigrees.

Rare variants and pedigrees

Pedigree-based studies represent an implicit enrichment strategy for identifying rare variants as transmission of a rare allele from parents to offspring follows Mendel’s laws, maximizing the chance that multiple copies of that allele exist in the pedigree. For example, 148 individuals from a single large pedigree sampled in our ongoing “Genetics of Brain Structure and Function” study [77, 78] are represented in Fig. 1. Based on the principles of Mendelian inheritance, the pedigree could maximally provide 105 copies of a rare or even private mutation originating in a single founder (founder and unilineal descendants). While the propagation of a particular variant within a pedigree is likely less extreme than this, the example provides an important heuristic for understanding how families enrich even the rarest of genetic variation where the segregation of rare variants in a pedigree provides multiple copies, facilitating their detection and effect estimation [29, 31, 79, 80]. For a known pedigree, each founding lineage can be directly assessed for the expected number of copies of a private variant originating at the top of the lineage using Mendelian transmission probabilities. The expected number of copies of a private variant originating in the focal founder of Fig. 1 is 13 (as is that of his founder spouse). While this founder pair exhibits the maximum number of potential copies, the founder female spouse of the third male sibling in generation II actually exhibits the highest expectation of potential copies with 14.125.
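The expected and maximal copy counts described above can be sketched for any known pedigree by applying the 1/2 transmission probability recursively. The code below is an illustrative sketch only: the toy pedigree and the function name `private_variant_copies` are hypothetical, it is not the pedigree in Fig. 1, and it assumes every non-founder has both parents recorded.

```python
from functools import lru_cache


def private_variant_copies(pedigree, founder):
    """Return (expected copies, maximum possible copies) of a private variant.

    pedigree maps each individual to a (father, mother) tuple; founders map to
    (None, None). The focal founder is assumed to carry exactly one copy, and
    each parental copy is transmitted with probability 1/2.
    """
    @lru_cache(maxsize=None)
    def dose(person):
        father, mother = pedigree[person]
        if father is None and mother is None:
            # Only the focal founder carries the variant among founders.
            return 1.0 if person == founder else 0.0
        # Expected copies in a child: half of each parent's expected copies.
        return 0.5 * dose(father) + 0.5 * dose(mother)

    doses = [dose(p) for p in pedigree]
    expected = sum(doses)
    maximum = sum(1 for d in doses if d > 0)  # founder plus all descendants
    return expected, maximum


# Hypothetical three-generation pedigree.
toy = {
    "F1": (None, None), "F2": (None, None),   # founder couple
    "C1": ("F1", "F2"), "C2": ("F1", "F2"),   # two children
    "S1": (None, None),                       # married-in founder
    "G1": ("C1", "S1"), "G2": ("C1", "S1"),   # two grandchildren
}
expected, maximum = private_variant_copies(toy, "F1")
```

For founder F1 in this toy pedigree the expectation is 1 + 2(1/2) + 2(1/4) = 2.5 copies, against a maximum of 5.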

Fig. 1

Demonstration of rare variant inheritance in a large extended pedigree. One hundred and forty-eight individuals from a single large pedigree sampled in our ongoing “Genetics of Brain Structure and Function” study are represented. Based on the principles of Mendelian inheritance, the pedigree could maximally provide 105 copies of a rare or even private mutation originating in a single founder (filled). The figure was created with CraneFoot [150].

For a fixed biological effect size, the power of pedigrees for capturing larger numbers of rare minor allele copies than that expected in an equivalent set of unrelated individuals is a direct function of pedigree structure. Basically, the variance of the number of minor allele copies (MACs) can be substantially larger (and therefore lead to potentially many more copies) in pedigrees than in a sample of unrelated individuals. Given that the expected correlation structure for the allelic dosages amongst family members is well represented by the coefficient of relationship matrix, R, standard covariance mathematics reveal that the expected excess in the variance of expected MACs in a pedigree can be approximated by a multiplicative variance inflation factor, \(\mathrm{VIF} = \frac{1}{n}\sum_{i,j} r_{ij}\), where \(r_{ij}\) is the coefficient of relationship between the i-th and j-th individuals in the pedigree and n is the number of individuals in the pedigree. The larger the VIF for a pedigree, the greater the expected power is for capturing larger numbers of a private variant, which itself determines the expected power to detect an association of a rare variant conditional on biological effect size. A sibship yields a VIF equal to \(1 + (n - 1)/2\); thus a large sibship of 10 siblings generates a VIF of 5.5 times that expected for 10 unrelated individuals. The pedigree shown in Fig. 1 generates a VIF of 8.6. Typically, large pedigrees with large lineages will yield the highest VIFs likely to be observed in humans. Thus, pedigrees are optimally suited for the examination of rare functional variants because in the limiting case of private variants, traditional epidemiological studies of unrelated individuals are highly unlikely to capture more than a single copy of such a variant (e.g., [58, 59, 81]). Pedigree-based studies could capture many more depending upon the size and structure of the pedigrees.
However, a potential negative for such studies is the more limited number of genomes being observed over that of unrelated samples. For example, the pedigree in Fig. 1 represents independent genomes from 44 founders versus the 148 that would be observed if all these individuals were unrelated. Thus, while more copies of rare variants can be captured in pedigrees, we also expect fewer such variants overall than in samples of unrelated individuals.
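The variance inflation factor can be sketched directly from a coefficient of relationship matrix. The snippet below assumes the sum runs over all pairs including the diagonal (where each individual's coefficient with themselves is 1), which is the reading that reproduces the sibship value of 5.5 quoted in the text. The helper names are illustrative.

```python
def variance_inflation_factor(r):
    """VIF = (1/n) * sum over all i, j of r[i][j], diagonal included."""
    n = len(r)
    return sum(sum(row) for row in r) / n


def sibship_matrix(n):
    """Relationship matrix for n full siblings: 1 on the diagonal, 1/2 elsewhere."""
    return [[1.0 if i == j else 0.5 for j in range(n)] for i in range(n)]


def unrelated_matrix(n):
    """Relationship matrix for n unrelated individuals: the identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


vif_sibs = variance_inflation_factor(sibship_matrix(10))      # 1 + (10 - 1) / 2
vif_unrel = variance_inflation_factor(unrelated_matrix(10))   # baseline of 1
```

For an extended pedigree, the same function would be applied to the full matrix of pairwise coefficients computed from the known genealogy.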

For rare variants in the absence of inbreeding, the number of heterozygotes captured is a primary determinant of statistical power to detect association. In this case, the number of heterozygotes is equivalent to the number of minor allele copies captured in the sample. Following theory developed in Blangero and colleagues [82], the expected association test statistic for private variants in pedigrees can be approximated (for small relative effects) as:

$$\chi_1^2 \approx N h_q^2 - c(h_T^2, h_q^2, \mathbf{R}) = N H (1 - H)\alpha^2 - c(h_T^2, h_q^2, \mathbf{R})$$

where N is the sample size, \(h_q^2\) is the heritability due to the variant in the sample, \(h_T^2\) is the total heritability of the trait, H is the proportion of heterozygotes in the sample, and α is the displacement of the heterozygote mean trait value from the common homozygote in standard deviation units. The parameter, α, directly measures the biological effect size of the variant. The symbol \(c()\) represents a function of parameters within the parentheses and is used here as a correction that accounts for the non-independence amongst related individuals and is defined in detail elsewhere [82]. The value of c is generally small for most reasonable genetic effect sizes [82]. Thus, power is dominated by the biological effect size and NH that gives the observed number of heterozygotes (or the number of captured minor allele copies) in the sample.
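As a rough worked example with hypothetical values (N = 148 individuals, 20 observed heterozygotes, α = 1 standard deviation unit), and ignoring the correction term c, the leading term evaluates as:

$$H = \frac{20}{148} \approx 0.135, \qquad N H (1 - H)\alpha^2 = 20 \times \left(1 - \frac{20}{148}\right) \times 1^2 \approx 17.3$$

Because H is small for rare variants, the factor (1 − H) is close to 1 and the leading term is approximately the number of minor allele copies multiplied by α².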

Figure 2 shows the biological effect size that can be detected at 80% power for a fixed number of observed heterozygotes in the pedigree in Fig. 1. We show the range of 5 to 70 heterozygotes/MACs. The lower bound of five minor allele copies required before testing is based on simulations that show that the resulting test distribution under the null hypothesis conforms with expectation (i.e., there is no excess type I error). As the number of captured MACs increases, power to detect moderate biological effect sizes improves. As a rough reference, a biological effect size of 4.5 SDU approaches nearly monogenic penetrance. Figure 2 also shows the effect of augmenting this pedigree with an additional 20,000 unrelated controls (the total sample size of the WGSPD consortium [8]). For the case of the rarest of variants (i.e., private variants), there is a relatively minor improvement in power with increased numbers of controls who are highly unlikely to harbor the rare variant. Thus, the recruitment of related individuals acts like an ascertainment bias to increase power by increasing the probability of capturing additional copies of rare variants that appear in the founders of the sampled lineages.
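The relationship between captured MACs and detectable effect size can be sketched numerically. The code below is an approximation under stated assumptions, not the calculation used to produce Fig. 2: it ignores the correction term c, treats NH(1 − H)α² as the noncentrality of a 1-df test, and takes the significance threshold as a user-supplied parameter (the value 5e-8 in the usage lines is an assumption, not a threshold stated in the text).

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(mac, n, alpha_effect, p_threshold):
    """Approximate power of the 1-df test, ignoring the relatedness correction c."""
    h = mac / n                                   # proportion of heterozygotes
    ncp = n * h * (1.0 - h) * alpha_effect ** 2   # N * H * (1 - H) * alpha^2
    z_crit = _STD_NORMAL.inv_cdf(1.0 - p_threshold / 2.0)
    shift = sqrt(ncp)
    # A 1-df noncentral chi-square exceeds z_crit^2 when |Z + shift| > z_crit.
    return (1.0 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)


def detectable_effect(mac, n, p_threshold, target_power=0.8):
    """Smallest effect (in SD units) reaching the target power, found by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if power(mac, n, mid, p_threshold) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical comparison: same 20 copies, with and without 20,000 added controls.
pedigree_only = detectable_effect(mac=20, n=148, p_threshold=5e-8)
with_controls = detectable_effect(mac=20, n=148 + 20_000, p_threshold=5e-8)
```

Under this approximation, adding controls who carry no copies changes only the factor (1 − H), so `with_controls` is only modestly smaller than `pedigree_only`, whereas increasing `mac` lowers the detectable effect substantially.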

Fig. 2

Biological effect size for rare variants as a function of minor allele copies (MAC). The blue dashed line shows the biological effect size that can be detected at 80% power for a fixed number of observed heterozygotes in the pedigree in Fig. 1. As the number of captured MACs increases, power to detect moderate biological effect sizes improves. The effect of augmenting this pedigree with an additional 20,000 unrelated controls is presented in the orange line. For the case of the rarest of variants, there is a relatively minor improvement in power with increased numbers of controls who are highly unlikely to harbor the rare variant

The prior discussion focuses on ascertainment of families simply through lineage size in order to maximize the capture of rare variants that originate in pedigree founders. However, additional power benefits accrue through additional ascertainment through disease or phenotype. For example, the co-segregation of rare variation and disease status in multiplex families can amplify association signals [31, 83, 84]. For the study of rare sequence variation, an implication of Mendelian transmission is that the required sample sizes can be orders of magnitude smaller for families than those required for designs based on unrelated subjects [85], particularly if sequence information is combined with linkage methods [28] in pedigrees of 20–25 individuals or larger [29], when comparing affected sibling pairs [30] or when searching for shared genomic segments [31]. For the rarest variants, large pedigrees have better power for detection of linkage/association when compared to equivalent-sized samples of smaller families [86] or unrelated subjects [80, 87, 88]. Family-based cohorts have substantially greater power than unrelated cases to detect rare genetic effects given an equivalent number of sampled individuals [89, 90].

An additional advantage to studying families is that, in contrast to unrelated individuals, the analysis of phenotypes among family members is constrained for genetic background (e.g., minimizes the impact of population admixture and stratification [91, 92]). Given that analytic techniques developed to correct for population stratification in common variant studies may be less effective when the focus is on rare variants [93, 94], observations that pedigree-based experiments appear to be robust to population stratification are of particular importance [92]. In addition, reduced environmental variation among family members can reduce noise, improving statistical power to observe genotype-phenotype associations [95]. Shared familial environments also can alter the potential to observe signals resulting from gene-environment interactions. Pedigree-based designs allow for the investigation of de novo mutations, parent-of-origin effects [96], transmission bias [97], phasing [98, 99], and compound heterozygosity [100, 101]. Finally, when pedigrees have multiple affected members it is often presumed that the same inherited mutation on a similar genetic background causes the illness in each case. This assumption appears to be better supported when a kindred includes at least three affected individuals [102, 103]. Although unambiguously demonstrating phenocopies is difficult in multifactorial phenotypes [104], it is possible that family-based studies provide a method for detecting phenocopies if a rare mutation appears to segregate with affection status in the pedigree [102]. To the extent that the segregating mutation also influences an illness endophenotype (see below), contrasting the endophenotype from the putative phenocopy and family members who carry the variant could provide further evidence of the non-genetic origin of the illness in that individual.

Rare variants, pedigrees and psychotic and affective disorders

Recently, Steinberg and colleagues [105] examined a single Icelandic pedigree with ten psychotic individuals (six schizophrenia, two schizoaffective disorder and two psychotic bipolar disorder) using WGS and long-range phasing. All affected individuals carried a rare nonsense mutation in RBM12 (RNA-binding-motif protein 12) resulting in a truncated protein lacking a predicted RNA-recognition motif, while few unaffected individuals had the mutation (p = 2.2 × 10⁻⁴). A Finnish family with a second loss of function RBM12 mutation replicated the finding (p = 0.020). Although the truncating mutation was not fully penetrant for psychosis, non-psychotic carriers were similar to their psychotic relatives in terms of neurocognitive endophenotypes, educational attainment and disability benefits received. Together, these data strongly associate RBM12 with psychosis risk and demonstrate the potential for gene identification using WGS and extended pedigrees.

Homann and colleagues [106] performed WGS on nine families with at least three members with schizophrenia. In one of these families, seven siblings with schizophrenia spectrum disorders carried a private missense variant within the SHANK2 gene. In a separate family, four affected siblings carried a novel private missense variant in the SMARCA1 gene. In a conceptually similar study, Timms and colleagues [107] used exome sequencing to examine rare nonsynonymous variants in five multiplex schizophrenia families. One pedigree carried a missense and frameshift substitution of GRM5, while another family had a missense substitution in PPEF2; both are genes that directly interact with the NMDA system [107]. Three pedigrees had missense substitutions within LRP1B, which is putatively related to the NMDA receptor. While these findings require replication and biological validation, nominated genes are reasonable empirical candidates for psychosis risk, warranting further research.

As can be seen in Table 1, an increasing number of family-based sequencing studies involving affective and psychotic disorders are being published, often with very small sample sizes. While findings from most of these studies have yet to be replicated, several of the more recent studies, particularly those conducted in population isolates [108] with larger sample sizes, provide strong candidate genes for these disorders.

Table 1 Extended pedigree-based sequencing studies of psychotic or affective disorders

The foregoing discussion focused on identifying individual rare variants or CNVs strongly associated with risk for affective or psychotic disorders. The focus on a single variant or CNV is analytically consistent with methods developed for monogenic disorders [109]. However, there is growing evidence that even in the case of a highly penetrant mutation, an individual’s genetic background contributes to illness risk [110]. For example, among individuals with a 22q11 deletion, rare CNVs outside of the 22q11 deletion region significantly contribute to schizophrenia risk [68]. Similarly, among members of a large multiplex pedigree with a balanced chromosomal translocation (1q42–11q14.3) associated with affective and psychotic disorders [111], common and rare variation in other areas of the genome appear to increase illness risk [112]. These findings are consistent with observations that genetic variation outside of the focal “causal” gene is often necessary for disease expression in monogenic disorders [103]. Together, these results serve as a reminder of the difficulty of making causal inferences in human genetics.

Cost-effectiveness of WGS in families

Family-based designs are cost effective. Given that genetic relationships between family members are known, WGS can be imputed [113] for individuals who have sparse genotype data, decreasing the effective cost per sample [114, 115]. This pedigree-based imputation or “pseudo-sequencing” is particularly effective for rarer, segregating variants [116]. Typically, this approach consists of two steps: (1) form optimal sub-pedigrees that maximize phase and IBD information and (2) pseudo-sequence each sub-pedigree. The resulting output will contain the expected number of copies for the tested allele (dosage), shown to yield the most power when used in association testing versus choosing most probable genotypes [90]. Livne and colleagues [114] applied similar methods (a combination of pedigree-based and LD-based imputation), reporting > 99% accuracy over the full range of allele frequencies. With data from the “Genetics of Brain Structure and Function” study, we found that pseudo-sequenced individuals show 97% accuracy for rare heterozygous variants and 99% for rare homozygotes compared to ExomeChip genotypes. Despite the accuracy of these “pseudo-sequencing” methods, once a rare variant is associated with a specific trait, we advocate directly genotyping that variant across the full sample to confirm the imputation.
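The difference between an expected dosage and a most probable genotype can be illustrated with a short sketch. The posterior probabilities below are invented, and the function names are illustrative rather than part of any imputation software.

```python
def expected_dosage(genotype_probs):
    """Expected alternate-allele count from P(0 copies), P(1 copy), P(2 copies)."""
    p0, p1, p2 = genotype_probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


def most_probable_genotype(genotype_probs):
    """Hard call: the allele count with the highest posterior probability."""
    return max(range(3), key=lambda k: genotype_probs[k])


# Hypothetical imputation posterior for one individual at one variant.
probs = (0.55, 0.40, 0.05)
dosage = expected_dosage(probs)             # 0.40 + 2 * 0.05
hard_call = most_probable_genotype(probs)   # genotype with 0 copies
```

Here the hard call discards a substantial probability of carrying the allele, whereas the dosage retains it as a fractional count, which is one intuition for why dosages can be preferable in association testing.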

Pedigree-based sequence data allow a level of quality control not available for population studies. Genotyping errors occur when the genotype determined by the assay is not identical to the “true” genotype. These errors can occur at every step of the genotyping process and cannot be fully eradicated as genotyping methods are not completely accurate [117]. Genotyping errors can lead to a number of possible biases, including an artificial excess of homozygotes [118], a false departure from Hardy–Weinberg equilibrium [119], an overestimation of inbreeding [118] or unreliable inferences about population substructures [120]. Incorporating evidence of Mendelian transmission of alleles between parents and offspring in pedigree data can dramatically reduce genotyping errors [121], even allowing for the detection of de novo mutation and the fact that 25% of typing errors may be Mendelian-compatible [122].
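A basic Mendelian consistency check for a parent-offspring trio can be sketched as follows. This is a simplified illustration for a biallelic autosomal site with genotypes coded as alternate-allele counts. A flagged trio may reflect either a genotyping error or a genuine de novo mutation, and errors that happen to be Mendelian-compatible will pass unnoticed.

```python
def mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent.

    Genotypes are alternate-allele counts (0, 1, or 2) at a biallelic site.
    """
    def transmissible(parent):
        # Alleles (0 = reference, 1 = alternate) this genotype can pass on.
        return {0: {0}, 1: {0, 1}, 2: {1}}[parent]

    return any(a + b == child
               for a in transmissible(father)
               for b in transmissible(mother))


# Hypothetical trio genotypes given as (father, mother, child).
trios = [(0, 0, 1), (0, 1, 1), (2, 2, 1), (0, 2, 1)]
flags = [mendelian_consistent(*t) for t in trios]
```

The first and third trios are flagged as inconsistent: two reference-homozygous parents cannot transmit an alternate allele, and two alternate-homozygous parents cannot produce a heterozygous child.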

Using families to model environmental risk factors

Mental illness results from multiple genetic and environmental factors and, likely, from their interactions. In contrast to genetic data, the environment is ever changing and its impact can vary with developmental stage, making the study of non-genetic influences on mental illness risk particularly challenging. Yet, most studies of environmental risk factors for mental illness (e.g., [123,124,125]) do not explicitly account for genetic background. For example, the incidence of schizophrenia is higher among individuals living in urban areas than among those living in rural areas [126, 127], which presumably reflects an environmental risk factor for psychotic disorders. However, even this classic environmental risk factor has an appreciable genetic component, where “urbanicity” is to some extent conditioned upon family history for schizophrenia [128] and individuals living in urban areas have higher polygenic risk for the disorder than those living in rural areas [129]. Thus, epidemiological studies designed to identify risk factors for mental illnesses should also include genetic information [124]. Pedigree-based designs, in addition to being of value for detecting genetic loci, enhance the study of environmental factors influencing mental illness as they provide a relatively straightforward method for optimally statistically controlling for genetic influences. Recently, we developed a best linear unbiased predictor estimation procedure to obtain individual-level estimates of genome-wide genetic effects [130]. This procedure uses all phenotypic information available for an individual and his or her relatives to infer the underlying genetic component of a phenotype. The estimated genetic value is then subtracted from the original phenotypic value to obtain an estimated environmental value devoid of the average additive genetic signal.
Polygenic effect estimates derived in this way can be used to control for genetic influence when investigating non-genetic (environmental) contributions to mental illness.
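The general idea can be sketched with a textbook best linear unbiased predictor; this is not necessarily the procedure of [130]. The sketch assumes a standardized trait with total variance 1, a known heritability `h2`, a known coefficient of relationship matrix `r`, and a simple sample mean as the only fixed effect. All function names and inputs are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def blup_genetic_values(y, r, h2):
    """BLUP of additive genetic values: g = h2 * R * V^{-1} (y - mean)."""
    n = len(y)
    mean = sum(y) / n  # assumption: sample mean stands in for the fixed effect
    centered = [v - mean for v in y]
    # Phenotypic covariance V = h2 * R + (1 - h2) * I for a unit-variance trait.
    v = [[h2 * r[i][j] + ((1.0 - h2) if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    w = solve(v, centered)
    return [h2 * sum(r[i][j] * w[j] for j in range(n)) for i in range(n)]


def environmental_values(y, r, h2):
    """Phenotype minus estimated genetic value, as described in the text."""
    g = blup_genetic_values(y, r, h2)
    return [yi - gi for yi, gi in zip(y, g)]


# Hypothetical data: two full siblings and one unrelated individual.
r = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1.2, 0.4, -1.6]
env = environmental_values(y, r, h2=0.6)
```

For unrelated individuals the predictor reduces to shrinking each centered phenotype by `h2`; relatives' phenotypes contribute only through the off-diagonal entries of `r`.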

Endophenotypes

An endophenotype is a trait influenced by some or all of the genes predisposing to an illness [131, 132]. As endophenotypes are measurable in both affected and unaffected individuals, they are theoretically capable of providing greater statistical power to localize and identify disease-related genes than affection status alone [26, 133]. Furthermore, as demonstrated by Steinberg and colleagues [105], endophenotypes can provide insight into unaffected carriers of putatively causal illness variants. Despite the consistent use of endophenotypes in other areas of human disease genetics (sometimes referred to as allied phenotypes or simply risk factors), their application in larger scale psychiatric genetics studies designed to identify novel risk loci has been limited [132]. However, methods for empirically selecting endophenotypes for specific illnesses based upon shared genetic covariance using related subjects [134, 135] or based upon common variants [46] have been developed, and overlapping genetic influences for cognitive, electrophysiological, neuroimaging and transcriptional measures and various psychiatric disorders have been discovered [132]. Regardless of the genetic design employed, we strongly advocate deep phenotyping, including quantitative diagnostic/symptom measures and cognitive, imaging and molecular endophenotypes. While a myriad of potential endophenotypes for psychotic and affective disorders exist, selecting those that are heritable, genetically correlated with illness risk and amenable to large scale data collection is critical [132]. Tools for such deep phenotyping are now available in the public domain (e.g., PhenX Early Psychosis Translational Research Collection https://www.phenxtoolkit.org/index.php?pageLink=browse.nimh.eptr).

Effects of ascertainment

How families are selected for study may influence both the phenotypic spectrum of the sample and the underlying genetic contributors. Probands recruited as part of families may differ from those recruited as singleton cases, presumably as an effect of selecting individuals with intact family relationships. For example, the Consortium on the Genetics of Schizophrenia (COGS) examined neurocognitive measures and other endophenotypes in families selected through a proband (COGS-1) and in a case-control (COGS-2) study [136]. Patients ascertained through the family-based design, compared to case-control, were younger, had higher educational attainment, better educated parents and superior performance on some neurocognitive tests. Thus, studies that use case-control ascertainment may tap into populations with more severe forms of illness that are exposed to less favorable factors compared to those ascertained through designs that require family participation. However, designs that require multiple affected individuals in a family may result in a more severe phenotypic profile and a different underlying genetic architecture as compared to simplex families. For example, a comparison of multiplex and simplex ASD families found an enrichment of CNVs in ASD risk loci in both but a lower rate of de novo CNVs in the multiplex families [137]. Family selection also impacts the distribution of phenotypes among unaffected family members, with members of multiplex families generally having greater endophenotype impairment than simplex family members [138,139,140]. In addition to enriching for inherited, as opposed to de novo, risk alleles, selection on multiplex families may enrich for loci of larger effect, which are presumably rarer [102, 107, 141].

Pedigree-based whole genome sequencing of affective and psychotic disorders consortium

To capitalize on the benefits of family-based designs for variant localization and gene identification, we formed an eight-site international consortium to use whole genome sequence data and novel analytic methods to identify rare variants that increase risk for affective and/or psychotic illness. Initially, participating studies included: individuals from large families of Amish and Mennonite descent ascertained for bipolar disorder and living in Pennsylvania, Ohio, and Indiana [142, 143]; individuals from 88 multiplex families living in Western Australia [144]; persons from extended families living in Costa Rica’s central valley who were identified via a sibling pair concordant for either schizophrenia or bipolar disorder [145]; large multiplex multigenerational families from Pennsylvania selected for schizophrenia [146]; individuals from Scottish families multiply affected with bipolar disorder or schizophrenia [147, 148]; and large extended Mexican–American pedigrees living in Texas and selected without regard to phenotype [77]. Our cost-effective approach leverages existing DNA, phenotypic data and some existing sequence data from extended pedigrees with at least three affected family members. Together, we have marshaled over 4000 individuals in approximately 269 families (see Table 2). Other research groups who have generated WGS in additional well-characterized families are encouraged to join us.

Table 2 Studies currently participating in the consortium

Conclusion

It is clear that common and rare variants, as well as environmental factors, play a role in risk for mental illness. Large meta-analytic GWAS have likely localized most or all of the common variants with moderate to large effect sizes for the major psychiatric disorders [10]. Following this logic, Boyle, Li and Pritchard [10] recently suggested that after the biggest hits from GWAS have been identified, “the next most promising step is to hunt for lower-frequency variants of larger effects” (page 1184). Given the recent progress with common variation, it would seem that the field of psychiatric genetics should now capitalize on those successes by identifying and characterizing analogous rare variation and confirming the rare variants previously identified [149]. Extended pedigrees represent an implicit enrichment strategy for identifying rare variants, since Mendelian transmission maximizes the chance that multiple copies will exist in the family. Given this enrichment, the associated improvement in statistical power, and the economic advantages of pseudo-sequencing through genotype imputation, we formed the “Pedigree-Based Whole Genome Sequencing of Affective and Psychotic Disorders” consortium, a group of international scientists dedicated to using family-based designs to identify rare variants that increase risk of psychiatric disorders. WGS in multiplex pedigrees provides an important complementary experimental approach for identifying genes that confer risk for mental illness.