Introduction

Schizophrenia is a devastating mental illness characterised by psychotic features, of which delusions, hallucinations and thought disorder are hallmarks. The prognosis of schizophrenia is poor in at least 20 % of cases [1]; sufferers of schizophrenia have a higher risk of comorbidity, they face social exclusion and the mortality rate is more than twice that of the general population [2].

Schizophrenia is estimated to be the most financially costly of psychiatric illnesses [3]. The global economic impact of schizophrenia can be measured in terms of healthcare expenditure and lost economic productivity [24]. The social impact of this disorder on sufferers, families, carers and the community is harder to measure but equally devastating [4, 5]. These issues give justifiable cause to develop strategies that may help identify high risk individuals and avert the poor outcomes associated with the subsequent onset of illness.

The baseline risk found in the general population is sustained by the persistence of social, environmental and biological factors, the study of which spans the two divergent disciplines of social science and genetics. The large population fractions attributed to socio-environmental risk factors for schizophrenia suggest that addressing these factors collectively would significantly alleviate the global burden of this illness [6]. However, without genetics, it is impossible to know for whom these risks can be averted.

A recent surge of new genetic data suggests that individually targeted prevention strategies are a realistic goal for schizophrenia [7]. Therefore, we begin this review by outlining an emerging genetic landscape of schizophrenia and how such knowledge can be converted into a simple risk score metric. This can be used to provide answers for questions concerning the genetic architecture of schizophrenia and related phenotypes. We outline the statistical infrastructure now allowing the clinical potential of this risk score to be assessed. Finally, we draw relevance to and discuss implications for GxE research.

Susceptibility to schizophrenia incorporates genetic and environmental risk

The scientific rationale for the genetic interrogation of a phenotype comes from the prior certainty of its genetic origin. Establishing that this is the case, by demonstrating the heritability of a trait, is the foundation for the subsequent exploration of genetic association.

Heritability expresses the proportion of phenotypic variance ascribed to genetic variance. Estimations can be derived in multiple ways. The archetypal approach uses twins and other family based designs [8]. Modern approaches to heritability estimation are a product of huge technological strides made in the ‘post-genomic’ era, a term used to describe the period since the human genome was first sequenced [9, 10]. These methods rely on information obtained from a DNA ‘chip’ to estimate the heritability associated with millions of genetic markers located throughout the genome (i.e. the entirety of one’s DNA sequence). The most common class of marker interrogated by DNA chips is the single nucleotide polymorphism (SNP): a single subunit change in the DNA sequence. Typically, the information contained by these markers creates a characteristic genetic profile of each individual within a study. DNA chips allow efficient screening of disease genomes, within a genetic study design known as a Genome-wide Association Study (GWAS) [10].
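As a worked illustration of the definition above, heritability can be written as a variance ratio. The second expression is Falconer's classical twin estimator, which is not given in the text and is included here only as an assumed example of a family-based approach; the symbols (genetic variance, phenotypic variance, and the monozygotic and dizygotic twin correlations) are introduced for illustration.

$$h^{2} = \frac{\sigma_{G}^{2}}{\sigma_{P}^{2}}, \qquad h^{2} \approx 2\,\left(r_{\text{MZ}} - r_{\text{DZ}}\right)$$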

A divergence exists between heritability estimates derived using family based study designs and those derived from population cohorts [11]. Heritability estimations from twins average around 0.8 [12], while population-based estimates are usually lower at around 0.65 [13]. As estimates of heritability from national records may be more representative of the samples comprising the large international GWAS consortia [14], it is conceivable that the upper estimated heritability of schizophrenia just mentioned may be methodologically enhanced. From this perspective, it is all too easy to overlook the importance of the contribution made by the environment. Its importance is underlined by the fact that not all carriers of highly penetrant genetic risk factors succumb to schizophrenia (or any other psychiatric illness) [15]. Key to understanding the expression of genetic risk, therefore, is an appreciation of contextual influences arising from the environment (as well as other background genetic modifiers).

The environment and schizophrenia

Environmental risk factors for schizophrenia include prenatal maternal nutrition, maternal infection during pregnancy, obstetric complications, season of birth, urbanicity, migration, socio-economic status and cannabis. Table 1 lists these and other important environmental influences on risk of schizophrenia.

Table 1 Environmental risk factors for Schizophrenia

One advantage environmental factors have over genetic variables is their estimated effects: for complex diseases such as schizophrenia, genetic risk factors tend to have very modest effects, while effects of environmental risk factors tend to be much larger. The typical odds ratios associated with the risk factors in Table 1 range between 1.5 and 11.0. In contrast, the effects of common genetic risk factors for schizophrenia are much smaller, with typical odds ratios of between 1.1 and 1.4. Apart from the subject-specific literature cited in Table 1, the wider range of non-genetic influences on risk is covered elsewhere [44] and hence is not discussed in depth here.

Debate about the relative importance of genes versus environment is never far away in schizophrenia and seems to be perpetuated by the zeal with which both sides cling to optimistic estimated contributions to population risk. Often, these may be flawed. For instance, a study that recently estimated the proportion of schizophrenia cases avoidable through social interventions [6] does not control for the influence of a positive family history (of schizophrenia), which is used as a fairly reliable proxy for genetic influence. Genetic studies, on the other hand, seldom acknowledge the potential for confounding due to hidden discrepancies in the environmental profile of comparison groups. Thus, in truth, the methodological biases present in both fields mean that the distinction between genetic and environmental influences may be less clear than it first seems. This means that a disorder resulting from GxE will more often be attributed to genes by geneticists and to the environment by social scientists.

The emerging molecular landscape of schizophrenia

A disorder of rare and common genetic effects

The infrastructure that has enabled GWAS to flourish was developed in the aftermath of the first human genome sequence [9, 45, 46]. GWAS was originally designed to test the Common Disease Common Variant hypothesis, a theory which explains the high burden of cases in a population through the joint effect of thousands of common risk alleles present at >5 % frequency in the general population. Both in theory and in practice, common variants individually explain only tiny amounts of actual risk.

After a slow start, the genetic landscape of schizophrenia has steadily been transformed by GWAS [47]. The total number of genome-wide ‘hits’ for schizophrenia currently stands at 128 [50]. In comparative terms, schizophrenia genetics outperforms all other psychiatric disorders [50]. This, in part, reflects the different genetic architecture of each disorder. For example, the heritability of major depression (~0.40 [51]), a disorder in which few GWAS hits have been identified, is approximately half that of schizophrenia. Hence, a 4- to 5-fold increase in current sample size would be required to give a GWAS of depression the same statistical power that a GWAS of schizophrenia currently has [50].

In the time since it was initially developed, the GWAS hardware (a ‘chip’ which contains thousands of biological features that facilitate whole-genome scanning) has become laden with newer features which allow rarer genetic novelties to be interrogated, either through imputation methods [52] or targeted assays [53, 54]. The most ready examples of this are copy number variants (CNVs). The term describes the duplication and deletion events that occur at hotspots present throughout the genome. This type of variation is consistent with the ‘Multiple Rare Variant’ theory of causality, which defines schizophrenia as a disorder resulting from a high burden of these structural events in the population. The rarity of these variants individually (they occur at frequencies much lower than 1 %) is explained by large effects on risk and an associated negative impact on fecundity [55], which results in purifying selection against these variants. A recognised source of the rare variants found in schizophrenia is spontaneous (or ‘de novo’) mutations [56]. This coincides with the observation that the fathers of individuals with schizophrenia tend to be older [57, 58]. Some of these resist negative selection pressure to persist in the population, by segregating within families. But because de novo events do not contribute to the phenotypic similarity between twins concordant for the disorder, such events fail to contribute to estimated heritability.

On rare occasions, the co-transmission of genetic markers with illness has allowed the bounds surrounding causal regions to be narrowed enough for schizophrenia genes to be identified. This has been done using methods purposefully designed to capture genetic signals of segregation within families [59, 60]. It is important to note the high level of consistency between current and previous efforts to target such effects. A comparison of previous meta-analytic evidence [61, 62] and current genomic findings [63–67] reveals a reassuring level of overlap between previous candidate regions and structural mutations on chromosomes 1q, 2p, 2q, 3q, 4q, 5q, 11q, 15q, 17q and 22q.

It is because the allelic spectrum of schizophrenia demonstrably ranges from common [49] to rare [48] that it is inaccurate to assume a delineation between common and rare genetic architectures [14]. While the catalogue of rare genomic events in schizophrenia is not currently extensive enough to tally with the genetic classification of schizophrenia as a common disorder, the initial screens for rare variants have been far from exhaustive and thus will have to wait for more of the sequencing initiatives that are currently underway (e.g. http://www.uk10k.org/studies/neurodevelopment.html) to run their course.

The broader common architecture can be purposefully leveraged

The key to the recent success of GWAS is a large international cohort which continues to expand. The total number of schizophrenia cases acquired by the international Psychiatric Genomics Consortium (PGC) has reached 35,000 [50]. In comparison, the earliest cohorts used for GWAS purposes were more than ten times smaller [68, 69]. The figure is projected to reach 60,000 cases by 2015. One hopes that the level of success (measurable by the count of new discoveries and the total variance explained by genome-wide data) will continue to match the effort required to assemble such cohorts. Current figures would suggest that the international movement has acquired a sufficient level of momentum to sustain these discoveries; at present, every 300 cases genotyped by the PGC generate a new GWAS finding (Gerome Breen, personal communication). Individually, typical effect sizes do not exceed an OR of 1.2, while explained variances range between 0.05 % (ZNF804A) and 0.67 % (RELN) [14]. Although CNVs are much more likely to be disruptive, they are also much less frequent and, therefore, account for similar proportions (e.g. 17p12: 0.02 %; 22q11: 0.21 %) of the overall variance.

A landmark study of schizophrenia demonstrated that a large chunk of heritability previously assumed to be ‘missing’ [70] has been hiding all along on the same chips used to reach this conclusion. This ‘dark’ variance may be harnessed by tethering annotated genetic risk to a single quantitative ‘polygenic’ risk score. The score may incorporate sub-threshold effects, which GWAS is underpowered to detect at the conventional threshold of P = 5 × 10⁻⁸ [47]. The additional variance that can now be recovered has been analysed in relation to the clinical phenotype, as well as against quantitative traits related to the disorder (known as ‘endophenotypes’), with varying success [71, 72].

The polygenic score is derived and constructed in a ‘training’ sample before implementation in a separate independent dataset. Deriving the score involves the use of permissive P value thresholds to aid the discovery of signals in the training sample. The P value thresholds used occupy a sliding scale ranging from P < 0.01 to P < 0.5 [47]. Genetic signals that pass these benchmarks will incorporate many true underlying associations initially missed because of low power. The polygenic risk score is calculated as follows [73]:

$$\text{Polygenic risk score} = \sum_{i} x_{i} \times \log\left(\text{OR}_{i}\right) \tag{1}$$

where OR_i is the allelic odds ratio as estimated in the discovery dataset and x_i is the number of risk alleles present at a single bi-allelic locus (i.e. 0, 1 or 2) within a subject. The log of the allelic odds ratio for the association of SNP i with the trait is multiplied by the number of risk alleles. The procedure is repeated for each SNP falling within a specified P value range. The weighted risk allele counts are then summed to create a per-person polygenic score. The distribution of score values is normalised by fitting to a standard normal distribution curve, which helps with the interpretation of the score in downstream analyses (for example, the per standard deviation increase on the corresponding risk scale). The polygenic score suffers from the mass infiltration of null GWAS effects. The negative impact on the statistical power of the score was demonstrated graphically both by Purcell et al. [47] and by Dudbridge [74]. As the composition of the score at a given inclusion threshold is a balance between true and null effects, it is common practice to present the change in the association statistic as the inclusion threshold is varied. Recent work by the PGC suggests that around 25 % of additive genetic liability to schizophrenia is explained by the information derived from its cohort [75]. The high proportion of heritability captured by the polygenic score explains the additional leverage it provides for association analyses.
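The calculation in Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the cited studies: the odds ratios, P values and genotypes are hypothetical, the function names are invented for this example, and standardisation to mean 0 and standard deviation 1 is assumed as a simple stand-in for the normalisation step described above.

```python
import math
from statistics import mean, pstdev


def polygenic_score(risk_allele_counts, odds_ratios, p_values, p_threshold):
    """Sum x_i * log(OR_i) over SNPs whose discovery P value is below the threshold."""
    score = 0.0
    for x, odds_ratio, p in zip(risk_allele_counts, odds_ratios, p_values):
        if p < p_threshold:
            score += x * math.log(odds_ratio)
    return score


def standardise(scores):
    """Rescale scores to mean 0 and standard deviation 1."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical discovery-sample summary statistics for four SNPs.
odds_ratios = [1.10, 1.20, 1.05, 1.15]
p_values = [0.001, 0.04, 0.30, 0.60]

# Hypothetical risk-allele counts (0, 1 or 2 per SNP) for three people
# in an independent target sample.
genotypes = [
    [0, 1, 2, 1],
    [2, 2, 0, 0],
    [1, 0, 1, 2],
]

# Scores at one permissive inclusion threshold (P < 0.5).
raw_scores = [polygenic_score(g, odds_ratios, p_values, 0.5) for g in genotypes]
z_scores = standardise(raw_scores)

# Repeating the calculation across thresholds shows how the score composition changes.
for threshold in (0.01, 0.05, 0.5):
    scores = [polygenic_score(g, odds_ratios, p_values, threshold) for g in genotypes]
    print(threshold, [round(s, 3) for s in scores])
```

Varying the threshold in the final loop mirrors the common practice of reporting results across a range of inclusion thresholds, since each threshold admits a different balance of true and null effects.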

New insights from polygenic risk score analysis of schizophrenia

Demographic differences in genetic risk are currently estimated to be small

Very few high-powered genetic studies have been performed in samples of African American, or other African populations to date. It is reasonable to expect that the genetic contribution to schizophrenia will vary by population on account of population genetic differences. Even if the genetic risk architecture is the same for Caucasian and non-Caucasian populations, difficulties associated with confirming this should be anticipated in practice. This is because the genomes of these two populations have fundamentally different correlation structures [76]. This correlation, known as linkage disequilibrium (LD), allows a pair of neighbouring (physically close) SNPs to be genetically close in one population but distant and uncorrelated in another. Because the genetic proximity between two loci is the basis of genetic association (between marker and disease locus), the efficiency with which disease loci are captured (or ‘tagged’) will also vary in different populations. Given these considerations, the level of correlation found between the two population risk profiles is surprisingly high so far; the heritability associated with the current genomic profile of schizophrenia is similar regardless of whether it is studied in African Americans (0.24, SE = 0.09) or in European ancestral groups (~0.28, SE = 0.03) [77]. The similarity in genetic risk architecture is also confirmed by the high degree of genetic correlation between European and African American schizophrenia genomes (i.e. African American vs. European: r = 0.61–0.66).

A gender-specific GWAS has been performed on only one occasion, by Shifman et al. [68]. This is curious, given that almost all features of schizophrenia, including prevalence, incidence, age of onset, presentation, course and treatment response, demonstrate a gender difference [78]. The correlation between male and female schizophrenia genomes was recently examined on the basis of the polygenic risk score for schizophrenia. Genomic ‘partitioning’ methods were used to evaluate a combined discovery/replication sample of 9,087 cases and 9,343 controls from the PGC-SCZ cohort. The estimated overlap found (r = 0.89, SE = 0.06) indicates that the majority of the genetic variance captured by chips is shared between sexes [75]. It was also determined that the variance explained by SNPs on the X chromosome is in proportion to its length. Neither piece of evidence is consistent with a skew in the distribution of genetic risk between genders; hence these data provide no evidence of a common-variant genetic basis for the gender differences widely observed in schizophrenia.

Cross-disorder overlap of common genetic influences

Mental disorders have traditionally been viewed as distinct, categorical conditions. This view is being challenged by emerging evidence that suggests psychiatric traits may fit into a single broader framework [79]. The organisation of this framework may be underpinned by genetic correlation [80]. For example, the common genetic architecture of schizophrenia is highly correlated with both bipolar disorder (r = 0.68 ± 0.04) and major depression (r = 0.43 ± 0.06) [80]. Beyond this trio of disorders, a correlation between schizophrenia and autism is also significant (r = 0.16 ± 0.06) [80].

Attempts have recently been made to pinpoint the molecular sources of cross-disorder overlap [81]. Four SNPs with genome-wide significant influence (P < 5 × 10⁻⁸) on cross-disorder risk were identified, implicating the genes ITIH3, AS3MT, CACNA1C and CACNB2. Only the effect of CACNA1C was not fully evident across all the disorders examined (schizophrenia, major depression, bipolar disorder, autism and attention-deficit/hyperactivity disorder). The SNP effect detected in this gene was restricted to bipolar disorder and schizophrenia (at P = 1.8 × 10⁻⁸). When the search for cross-disorder effects was broadened to include previous GWAS hits for both schizophrenia (n = 10) and bipolar disorder (n = 4), a disorder-specific model was found to provide the best fit for 7 of the 14 hits. The remaining 7 SNPs (6 schizophrenia and 1 bipolar disorder) showed genome-wide evidence (P < 5 × 10⁻⁸) of cross-disorder effects. Findings from these recent cross-disorder analyses confirm long-held suspicions of biological overlap between psychiatric traits that date back to the pre-GWAS era [82].

Clinical applications of the schizophrenia polygenic score

The schizophrenia polygenic score may have beneficial implications for future clinical decision making, and may form the basis for a coherent public health strategy for schizophrenia. Many options exist for evaluating the clinical efficacy of risk scores. The area under the receiver operating characteristic (ROC) curve (AUC) is the most widely recognised [7, 83, 84]. The ROC curve is obtained by ranking a set of individuals with known disease status according to their predicted risk of disease. The curve is derived by plotting sensitivity (the proportion of cases correctly ranked as high risk) against 1 − specificity (the proportion of controls incorrectly ranked as high risk by the score) as the risk threshold is varied. The AUC reflects the area under this curve and expresses the accuracy with which high- and low-risk groups can be distinguished according to a binary outcome (such as case–control status). Uninformative prediction corresponds to an AUC of 0.5. Classifiers with AUCs >0.75 are useful for identifying high-risk groups who might benefit from screening [85]. Classifiers exceeding an AUC of 0.99 can be used to diagnose a disease reliably in the general population [85]. The AUC has recently been formularised for genetic purposes [7, 83, 84] to take account of extrinsic influences, such as disease prevalence and heritability, which vary with trait architecture. It expresses risk using the liability threshold model, which assumes that the individuals of a population possess a naturally varying liability to disease, with clinical illness only developing in those whose excess risk exceeds a certain threshold [11].
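The ranking interpretation of the AUC can be made concrete with a short sketch. It relies on the standard result that the AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The scores below are hypothetical and the function name is invented for this example; the genetic formulation of the AUC cited above is not reproduced here.

```python
def auc_from_scores(case_scores, control_scores):
    """Empirical AUC: share of case-control pairs in which the case outranks the control.

    Tied pairs contribute 0.5, so an uninformative score gives an AUC of 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical standardised polygenic scores.
cases = [1.2, 0.4, 2.1, -0.3]
controls = [-0.5, 0.1, 0.9, -1.1]

auc = auc_from_scores(cases, controls)
print(auc)  # 13 of the 16 case-control pairs are correctly ordered -> 0.8125
```

The pairwise loop is quadratic in sample size and is used only for transparency; rank-based implementations give the same value far more efficiently in large cohorts.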

With a genomic profile accounting for roughly a quarter (23 %) of trait variance, the polygenic risk score for schizophrenia is estimated to be consistent with a genetic AUC of 0.75. If this is true, then the score could potentially already have clinical applications in undefined groups at high risk of schizophrenia [85]. However, even under the idealised assumption that all the heritability of schizophrenia can readily be translated into genetic findings, the AUC for schizophrenia is predicted to fall short of the level required for screening the general population (i.e. >0.99 [85]). Some of the extensions made to the AUC mean that the clinical utility of polygenic risk scores can be improved by allowing clinical questions to be refined [84]; however, ultimately the current GWAS strategy will need to be broadened to accommodate other complementary approaches. The effect of methodological constraints on current prediction power is not factored into these estimates. The following section explores these in greater detail.

Constraints on risk prediction accuracy

One key factor limiting the accuracy of polygenic risk prediction is the size of the discovery sample used to calculate the risk score weights. At limiting sample sizes, sampling error allows null SNPs unassociated with schizophrenia into the risk score. This comes at the expense of prediction power. When the purpose of deriving the score is to test for association, the impact of sampling error can be offset by replication in independent datasets. Risk prediction, however, is an exact science that penalises for every false discovery included in the final genetic model. It is similarly undone when true associations are omitted. Prediction of risk by genetic scores has failed when the sample size used to generate risk scores has been suboptimal [86], whereas the sample size improvements needed to increase the chances of successful outcome may be relatively modest [74]. The high heritability explained by the polygenic score for schizophrenia ensures it can provide a convenient preview of future GWAS hits. The impending arrival of genomics to clinical settings, as well as ongoing initiatives in biobanking, will have a vital role to play in the large-scale delivery of future samples for GWAS [87, 88].

GWAS chips only explain around 80 % of the known SNP variance in the Caucasian genome [89]. Increasing the level of coverage is, therefore, critical for unlocking the untapped sources of genetic variance. Current evidence suggests broad convergence of the genetic profile of schizophrenia in Caucasians and Africans. But only 23 % of liability variance has been explained so far. Therefore, it should not be assumed that a single genomic profile will satisfy the purposes of risk prediction in the global community; potentially some reconfiguration of the score will be needed to maximise the efficiency and accuracy. Much of this reconfiguration may be driven by new sequence-level insights from different populations.

Future strategies for enhancing prediction

The incorporation of family history

Family history of schizophrenia still remains the most reliable predictive instrument currently available to the clinician, but is of demonstrably limited use when implemented on its own. Optimistic simulations suggest that a family history of schizophrenia equates to an explained liability variance (or genetic AUC) of between 0.56 and 0.66 [7]. One of the practical drawbacks of family history is that while positive scores can be helpful and informative of genetic status, little can be inferred from a negative family history [90], as recall bias, varying family size and knowledge of family history all play a part in making the interpretation of a negative response not so straightforward. The interpretation of family history is further undermined by the assumption that it only reflects genetic liability, when in fact the familial environment contributes 6–11 % of this effect [12, 13].

Risk prediction can be boosted by supplementing the polygenic score profile with information on family history [7, 83]. Chatterjee et al. [83] were able to boost the AUC for polygenic scores for a number of GWAS disorders that encompass different prevalences, heritabilities and familial loading. Therefore, this work has potential relevance to schizophrenia. Despite the added efficacy of including family history, AUC scores remained below 0.9 for all but the rarest and most familial disorders (Type-1 diabetes and Crohn’s disease) in the analytical models explored by Chatterjee et al. It is also necessary to check that family history and the polygenic score are not correlated (e.g. Belsky et al. [91]), because any correlation between them means that the advantage gained by combining both sets of information will be overstated.

Environmentally informed genetic risk prediction

The strongest evidence for the existence of GxE comes from empirical studies which use family history as a proxy tool to assess the joint effects of environmental and genetic factors on risk. Family history has been shown to augment the psychotogenic effects of cannabis [92]. Another study found an identical rate of family history of schizophrenia among cases of cannabis-induced psychosis and cases of schizophrenia. This suggests that sensitivity to the psychotogenic effects of cannabis is modified by genetic liability to schizophrenia [93]. Similar GxE effects can be demonstrated for urbanicity [94] and prenatal infection [95], but seemingly not for obstetric complications [96]. The adoption study design is a convenient way to disentangle the genetic and environmental components of familial risk. It has been used to demonstrate the interaction between social determinants such as parental separation and heritable risk [97]. If these molecular effect modifiers could be captured and tethered to the existing polygenic score, then prediction performance could be boosted to make population screening feasible and relevant to the exposure profile of the individual. This point underlies why the modification of environmental effects by polygenic scores is poised to become a major paradigm for translational research in schizophrenia.

Gene–environment interactions in Schizophrenia

The central paradigms

Typically, statistical interaction is estimated via inclusion of an interaction term in a regression model. A wider debate surrounds how these regression models should be scaled [98, 99], as the presence or absence of statistical interaction depends on how one expects risk factors to combine. For multifactorial disease, researchers typically use logistic regression to look for disease risk factors. Under this model, risk factors are assumed to be additive on the log odds ratio scale. This is the same as saying risk factor odds ratios are multiplicative. For rare diseases, this is equivalent to saying relative risks are multiplicative. Logistic regression models multifactorial disease architecture well and typically there is no interaction on this scale. This lack of interaction has perplexed many because it flies in the face of knowledge that interaction is endemic in biological systems, but this stems from their lack of appreciation that additivity in logistic regression actually models a particular type of interaction (multiplicative odds ratios).
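The logistic model described above can be written out for a genetic factor G and an environmental factor E. The notation (D for disease status and the β coefficients) is introduced here for illustration only.

$$\operatorname{logit} P(D = 1 \mid G, E) = \beta_{0} + \beta_{G}\,G + \beta_{E}\,E + \beta_{GE}\,G E$$

When the interaction coefficient is zero, the log odds ratios add and the odds ratios therefore multiply:

$$\beta_{GE} = 0 \;\Rightarrow\; \text{OR}_{GE} = e^{\beta_{G} + \beta_{E}} = \text{OR}_{G} \times \text{OR}_{E}$$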

An alternative view of disease aetiology comes from the counterfactual model or potential outcomes model of disease [100]. This is perhaps the most coherent definition of cause for multifactorial phenomena. Rothman’s Causal Pies [100] conceptualise this; a sufficient cause is a set of criteria (component causes) which when satisfied guarantee the onset of the disease. There may be several ways for a person to get a disease (i.e. there may be several causal pies), and these may differ between individuals, so a risk factor for one person may not be relevant for another. This provides a framework for prediction of population health intervention efficacy. If the proportion of persons who require both risk factors (‘A and B’) is large, then removal of either risk factor A or B will save many from disease, whereas if this proportion is small and the proportion of ‘A or B’ persons is large, then perhaps only removal of both A and B will be efficacious. Estimation of population proportions of these person types is, therefore, desirable. Unfortunately, these proportions cannot be fully identified, because the number of possible risk exposure strata (from which one might estimate person type proportions) is less than the number of possible person types. For example, for exposure A, we have two strata (exposed or not) but four possible person types (causal, protective, immune, doomed). Nevertheless, some restrictions can be put on the ranges for person type proportions and a useful statistic has been derived (with some additional assumptions) for the interaction of two risk factors. This interaction statistic, synergy, also known as biological interaction, asserts the risk difference scale as the correct scale for measuring interaction.
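The dependence of interaction on the choice of scale can be shown with a small numerical sketch. The four stratum risks below are hypothetical, and the interaction contrast and relative excess risk due to interaction (RERI) are used as common measures of departure from additivity; whether they correspond exactly to the synergy statistic referred to above is an assumption.

```python
def interaction_contrast(r00, r10, r01, r11):
    """Departure from additivity on the risk difference scale."""
    return r11 - r10 - r01 + r00


def reri(r00, r10, r01, r11):
    """Relative excess risk due to interaction, computed from risk ratios."""
    return r11 / r00 - r10 / r00 - r01 / r00 + 1.0


def multiplicative_ratio(r00, r10, r01, r11):
    """Ratio of the joint risk ratio to the product of the single-exposure risk ratios."""
    return (r11 / r00) / ((r10 / r00) * (r01 / r00))


# Hypothetical absolute risks: neither exposure, A only, B only, both A and B.
r00, r10, r01, r11 = 0.01, 0.02, 0.03, 0.06

ic = interaction_contrast(r00, r10, r01, r11)
excess = reri(r00, r10, r01, r11)
ratio = multiplicative_ratio(r00, r10, r01, r11)

print(round(ic, 4))      # positive: interaction present on the additive scale
print(round(excess, 4))  # positive: same conclusion expressed in risk ratios
print(round(ratio, 4))   # 1.0: no interaction on the multiplicative scale
```

In this example the risk ratios multiply exactly (2 × 3 = 6), so a multiplicative model reports no interaction, while the same risks show a clear excess on the risk difference scale.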

The traditional GxE approach

The key to recent triumphs of the GWAS approach in schizophrenia is the fact that its implementation has been more rigorous and methodologically homogeneous than the previous generation of candidate-gene studies [101]. This contrasts with GxE research, where a diversity of methods and standards still exists. (A summary of studies that have explored GxE interactions in psychosis and related phenotypes can be found in Table 2; these studies have recently been reviewed by Modinos et al. [102].) This diversity has overshadowed some of the better study templates for this model [103] and has even precluded meta-analytical evaluation of the current GxE evidence [102]. This appears to be the legacy of a time in which geneticists and social scientists competed to demonstrate which science contributed most to schizophrenia liability. In doing so, opportunities to lay foundations for future GxE research were missed. The result today is a relative paucity of datasets able to adequately assess the effect of joint exposure to genes and environment.

Table 2 Published GxE studies in psychosis research

Longitudinal studies sit at the top of a complex methodological hierarchy of epidemiological study designs. But these too have failed to resolve ongoing controversies, such as the reported interaction between cannabis and the COMT gene in schizophrenia [104, 105]. One fundamental problem that classical approaches to GxE face is that the sample sizes required to find GxE effects are several times larger than those needed to detect the main effect of a gene [106]. Some estimates suggest that the average GxE effect would need to be ten times larger than those identified by GWAS, for the typical sample sizes used in candidate-gene research (typically in the hundreds) to be relevant [103]. This problem is compounded by the issue of publication bias. To refute a GxE discovery, a study must be, on average, six times larger than those which support the finding, to make it into the public domain [103].

Genome-wide approaches: GWEIS

An alternative to candidate GxE studies is to consider GxE on a genome-wide basis. A methodological approach that allows this to be done efficiently is the genome-wide environment interaction study design (GWEIS) [126]. The number of possible pairwise and higher-order interactions is vast. This necessitates simplifications in modelling which may be better and more flexibly expressed through machine learning and data mining approaches. Under most models, interaction terms contribute to marginal effects [127–129]. Some have recommended consolidating genome-wide detection power by exploring the marginal and interaction properties of the genome simultaneously [130]. Others seek to increase effect size and decrease multiple testing by considering gene by E, or pathway by E interaction. Two-stage GWEIS methods have been developed which increase power by reducing the multiple testing burden. Genetic loci are first selected by their associations with disease and environment separately, and then subsequently tested for interaction [131]. Power considerations also underlie the case-only study design which derives power from the assumption that G and E are independent in the population.
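The case-only design mentioned above can be illustrated with a minimal sketch. If G and E are independent in the source population (and the disease is rare), the odds ratio between G and E among cases alone estimates the multiplicative GxE interaction. The counts are hypothetical, the function name is invented for this example, and the confidence interval uses the standard Woolf (log odds ratio) approximation.

```python
import math


def case_only_interaction(a, b, c, d, z=1.96):
    """Case-only estimate of multiplicative GxE interaction.

    Counts are taken among cases only:
      a = carriers exposed, b = carriers unexposed,
      c = non-carriers exposed, d = non-carriers unexposed.
    Returns the G-E odds ratio with an approximate 95 % confidence interval.
    Valid only if G and E are independent in the source population.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts among 500 cases.
interaction_or, ci_lower, ci_upper = case_only_interaction(40, 60, 100, 300)
print(round(interaction_or, 2), round(ci_lower, 2), round(ci_upper, 2))
```

No controls are needed, which is the source of the design's efficiency, but any real correlation between G and E in the population would be mistaken for interaction.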

GWEIS methods are yet to take off in schizophrenia; thus the current crop of GxE findings for schizophrenia (reviewed in Modinos et al. [102]) has yet to face the same acid test used to put the previous generation of genetic association candidates on trial [132, 133]. Just one GWEIS has been conducted to date in schizophrenia [134]. This study of genome-wide interaction with maternal cytomegalovirus (CMV) infection returned one genome-wide significant hit, an SNP within the gene CTNNA3 (P value = 7.3 × 10⁻⁷), which had not previously been implicated in schizophrenia.

In theory, the polygenic principle described earlier can also be applied in the context of GWEIS. GWEIS may therefore have an important part to play in deriving an environmentally informed genetic risk profile for schizophrenia (one whose power can be enhanced by selective application to individuals who have the corresponding exposure profile). However, the constraints outlined earlier (power and coverage) will be limiting here too.
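As an illustration of what an environmentally informed score might look like, the sketch below adds a per-SNP interaction weight to the usual main-effect weight only when the individual carries the exposure. This is one possible construction under stated assumptions, not a published scoring method: the function name `exposure_informed_score`, the SNP identifiers and all weights are hypothetical, and a single binary exposure is assumed.

```python
def exposure_informed_score(dosages, weights, exposed):
    """Sum per-SNP risk contributions, adding the interaction weight when exposed.

    dosages: dict mapping SNP id -> risk-allele count (0, 1 or 2)
    weights: dict mapping SNP id -> (main_effect, interaction_effect),
             both on the log-odds scale (hypothetical values)
    exposed: True if the individual has the environmental exposure
    """
    score = 0.0
    for snp, (main_effect, interaction_effect) in weights.items():
        if snp not in dosages:
            continue  # skip SNPs that were not genotyped for this person
        per_allele = main_effect + (interaction_effect if exposed else 0.0)
        score += per_allele * dosages[snp]
    return score


# Hypothetical SNP ids and weights, for illustration only
weights = {"snp_a": (0.10, 0.20), "snp_b": (0.05, -0.02)}
dosages = {"snp_a": 2, "snp_b": 1}
unexposed_score = exposure_informed_score(dosages, weights, exposed=False)
exposed_score = exposure_informed_score(dosages, weights, exposed=True)
```

For unexposed individuals the score reduces to an ordinary weighted sum of risk-allele counts, so the interaction weights only change the ranking of people who share the relevant exposure.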

Systematic approaches could also be used to identify novel environmental risk factors. The potential to recover interactions between weak risk environments and rarer genetic variants has been demonstrated in theory [135], but in practice can only be tested in the largest schizophrenia datasets, in which rarer risk variants will be better represented. As scores in the extreme upper portion of the risk score distribution may be similar in frequency and penetrance to rarer structural mutations, the polygenic score for schizophrenia could also be leveraged to identify new environmental factors that contribute to GxE.

Future parallel strategies for improving the power of GxE

The polygenic risk score provides the ultimate genetic tool for detecting GxE. But further gains in detection power can be made by increasing the precision with which environmental measures are captured. Simulation studies of measurement error suggest that an increase in correlation with true exposure values from 0.4 to 0.7 can be equivalent to as much as a 20-fold gain in sample size [136]. Purpose-written tools have since been developed that allow the precision of exposure measurement to be factored into power calculations (see http://www.hsph.harvard.edu/faculty/peter-kraft/software/ or the ESPRESSO power calculator at http://www.p3gobservatory.org/powercalculator.htm).

The use of endophenotypes will be beneficial in this regard and is consistent with the P factor hypothesis [79], which proposes that all psychiatric traits map onto a single underlying psychopathological profile.

Conclusion

GWAS of schizophrenia can be considered to have delivered a satisfactory return on the initial capital outlay. The resulting genomic profile has already replaced family history as the most reliable genetic prediction tool in schizophrenia research. We have seen that while this tool has considerable clinical potential, it clearly has some way to go before it can be considered valid for widespread clinical use.

Hence, the ability to provide an accurate discrimination of genetic risk, based on a case-by-case understanding of historical environmental footprints, is an important clinical objective, and one which is consistent with the goal of personalised medicine. The polygenic score provides considerable scope to improve the detection of GxE at the statistical level.

The option to extend the search for meaningful interaction into other biological domains (e.g. gene expression) is attractive but currently under-resourced in the field of schizophrenia. Such approaches will help to build an understanding of the biological architecture that underpins schizophrenia in its various environments of origin.