Attention-deficit/hyperactivity disorder (ADHD) is a neurodevelopmental disorder with an estimated worldwide prevalence of 7.2% (Thomas et al., 2015). ADHD is reliably associated with a range of adverse social, behavioral, and emotional outcomes in later life, including low educational attainment (Kuriyan et al., 2013), depression and substance misuse (Agnew-Blais et al., 2018), criminality (Fletcher & Wolfe, 2009), and poor physical health (Kuriyan et al., 2013). The negative impact of childhood ADHD is pervasive in ways beyond its effects on the individual; it also affects society through the substantial economic burden it places on mental health systems and education services (Doshi et al., 2012).

Although decades of behavioral and molecular genetics research have indicated the critical role that genes play in the etiology of ADHD (Chang et al., 2013; Chen et al., 2017; Faraone & Larsson, 2018; Nikolas & Burt, 2010), recent studies leveraging genome-wide methods have begun to challenge some of our previous knowledge about the genetics of ADHD while ushering in a new set of tools for uncovering its genetic architecture. One method that has gained considerable traction in psychiatric genetics research is the polygenic score (PGS) (Wray et al., 2007). Adoption of PGS methods has been rapid; from 2007 to 2013, a total of 138 published studies cataloged on PubMed involved the use of PGS (three of which were related to ADHD). In 2018 alone, 279 studies were published on PGS, more than double the total number of PGS studies published prior to 2013. Given the high degree of heritability of many complex traits and disorders, there is optimism that PGS can eventually be used in the prediction of human health and behavioral outcomes (Anderson et al., 2019; Martin et al., 2019; Torkamani et al., 2018). Despite their tremendous promise, little is known about whether psychiatric PGS are useful for clinical prediction. Furthermore, as more studies involving PGS are published, it is paramount to consider the limitations and future directions of PGS research. This study examines the predictive performance of ADHD PGS by conducting the first meta-analytic review of the ADHD PGS literature.

Genome-Wide Association Studies of ADHD

Before discussing the PGS approach, we first summarize genome-wide association studies (GWAS) of ADHD, which supply the effect sizes used in PGS computations. GWAS examine the association between single nucleotide polymorphism (SNP) variation in the genome and a quantitative trait of interest (Hirschhorn & Daly, 2005). Most GWAS to date have employed microarrays to genotype hundreds of thousands to several million SNPs in individuals. Because of the volume of SNP-phenotype associations tested and the expected small effect of even a significant association, GWAS have far larger sample size requirements than behavioral genetic or candidate gene studies (Visscher et al., 2017). Thus, nearly all major GWAS of psychiatric outcomes are led by consortia, consisting of international research teams that combine their cohorts to form a large pooled sample (e.g., Psychiatric Genomics Consortium; PGC) (Sullivan, 2010).

PGC-led efforts have produced two meta-analyses specific to ADHD GWAS, the first of which was performed by B. M. Neale and colleagues (2010). This GWAS comprised four independent child and adolescent cohorts of predominantly Western European descent, resulting in a pooled study sample of 2,064 parent-ADHD proband trios, 896 ADHD probands, and 2,455 controls. Participants in all cohorts were assessed for ADHD using a semi-structured clinical interview (keyed to either the DSM-IV or ICD-10, depending on the population) conducted with a caregiver. The majority of ADHD probands were diagnosed with combined-type ADHD (n = 868). Genetic data were imputed using the HapMap Phase III reference panel, resulting in 1,206,463 SNPs analyzed in the GWAS. Test statistics from each cohort were transformed into Z scores, which served as the effect size statistic in the GWAS. The GWAS yielded no genome-wide significant findings (i.e., \(p < 5 \times 10^{-8}\)). Regions on chromosomes 7 (e.g., SHFM1), 8 (e.g., CHMP7), and 11 (e.g., DHCR7 and NADSYN1) were implicated based on having the most SNPs that were among the top 50 associations, but the authors noted that their sample was underpowered to detect small genetic effects. Other noted limitations of the GWAS included measurement variability and differences in ADHD referral patterns across cohorts, unmeasured effects from rare and copy number variants, and unaccounted environmental differences (e.g., diet, culture) due to having drawn from different populations (Neale et al., 2010).

The most recent meta-analysis of ADHD GWAS was conducted by Demontis and colleagues (2019). The research team analyzed genetic data from 12 child and adult cohorts, resulting in 20,183 ADHD probands and 35,191 controls. However, the vast majority of the probands and controls (n = 14,583 and 22,494, respectively) came from a single child and adult cohort from Denmark (iPSYCH). The 11 other cohorts were aggregated by the PGC and represented smaller child and adult samples from Europe, Canada, the United States, and China. To address the possibility of population stratification, genetic principal components were included in their analysis. Imputation was performed using the 1000 Genomes Project Phase 3 reference panel, resulting in 8,047,421 variants analyzed in the GWAS. In iPSYCH, ADHD status was determined by a psychiatrist according to the ICD-10. ADHD status in the PGC samples was assessed using semi-structured clinical interviews (e.g., Schedule for Affective Disorders and Schizophrenia for School-Age Children, K-SADS; Child and Adolescent Psychiatric Assessment, CAPA). Twelve unique genome-wide significant loci were identified in the full GWAS sample, including a locus in the FOXP2 gene, which is believed to play a role in learning and speech (Schreiweis et al., 2014), and a locus in SEMA6D, which is believed to play a role in embryonic brain development and educational attainment (Okbay et al., 2016). Notably, none of the regions that were implicated in the earlier B. M. Neale et al. (2010) GWAS were genome-wide significant in the more recent GWAS. Furthermore, one of the loci (in SPAG16 on chromosome 2) failed to pass the significance threshold when only the European ancestry subsample was meta-analyzed. Moreover, the 12 genome-wide significant loci captured only a small fraction of the variance in ADHD, with odds ratios for each of the loci ranging from 1.077 to 1.198 (the SNP heritability of ADHD was an estimated 0.22).
A number of important limitations should be noted, including the high degree of age heterogeneity in the pooled sample (combining children and adults), diagnostic heterogeneity from the use of different measures of ADHD, and an overrepresentation of cases and controls from a single Western European cohort (i.e., iPSYCH). Results from these GWAS have made it abundantly clear that many genes with individually small effects, rather than a few genes with large effects, are likely involved in the etiology of ADHD. As sample sizes for these GWAS have increased, so too has the rate of genetic discovery (Visscher et al., 2017). Given the prominence of consortia-led efforts towards gene identification, GWAS sample sizes are projected to be large enough to capture a substantial amount of the common genetic variation underlying the psychiatric disorders in time (Sullivan, 2010).

Polygenic Scores (PGS)

PGS distill our knowledge from GWAS into a single score that characterizes an individual’s polygenic (via common SNPs) liability for a trait of interest (Wray et al., 2007). Notably, PGS have been widely referred to as polygenic “risk” scores (PRS) in the broader literature (Anderson et al., 2019; Bogdan et al., 2018). The inclusion of the term “risk” is perhaps a misnomer because the traditional scoring approach does not differentiate true risk alleles (i.e., those that confer an increased liability to a trait in question) from unassociated variants (The International Schizophrenia Consortium, 2009). Furthermore, there is emerging evidence that PGS may also have promotive associations, particularly for those on the “low” end of the distribution (Krapohl et al., 2016; Li, 2019b; Plomin et al., 2009; Torkamani et al., 2018). We therefore prefer the more general term “polygenic score,” which is the nomenclature used throughout the review.

The traditional PGS approach takes an ensemble of genetic variants from a GWAS sample, referred to as the “discovery” population, and computes a weighted linear composite of these variants for individuals sampled from an independent “target” population (Anderson et al., 2019; The International Schizophrenia Consortium, 2009):

\({PGS}_{i} = \sum_{j}{SNP}_{ij} {\beta }_{j}\)

where the PGS for individual i in a target population is the sum, across SNPs j, of the number of effect alleles that individual i carries at SNP j multiplied by that SNP’s effect size, β, taken from a GWAS conducted on a separate discovery population. In PGS, the number of SNPs included in the computation is based on the p-value threshold (PT) set in the discovery GWAS. A more liberal threshold, e.g., PT = 1, captures much more of the genetic signal in the PGS computation, but also more unassociated variants. Conversely, setting a more conservative threshold, e.g., PT < 0.01, captures less of the genetic signal but also fewer unassociated variants that may contribute to a noisy signal (Dudbridge, 2013). Although most studies have used PT < 0.5 as a matter of convention (see example set by The International Schizophrenia Consortium, 2009), a “best fit” PT can also be empirically derived by selecting the PT that explains the largest amount of variance in the phenotype as measured in the target population (Euesden, Lewis, & O’Reilly, 2016). There are, however, issues with model overfit (Benjamini et al., 2001) and a lack of generalizability when using a PT that is optimally predictive only for a particular phenotype in a particular target population.
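The scoring rule described above can be illustrated with a minimal sketch. The function name, the toy genotypes, and the effect sizes below are hypothetical and chosen only for illustration; the sketch also assumes that SNPs have already been quality-controlled and pruned for linkage disequilibrium, a step that dedicated PGS software typically handles and that is not shown here.

```python
import numpy as np

def polygenic_scores(dosages, betas, pvalues, p_threshold=1.0):
    """Return one weighted allele-count sum per individual.

    dosages: (n_individuals, n_snps) effect-allele counts (0, 1, or 2)
    betas:   (n_snps,) effect sizes from the discovery GWAS
    pvalues: (n_snps,) discovery GWAS p-values
    Only SNPs with p-value <= p_threshold enter the score.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    keep = pvalues <= p_threshold            # SNPs passing the threshold
    return dosages[:, keep] @ betas[keep]    # sum over SNPs j of SNP_ij * beta_j

# Hypothetical toy data: 2 individuals, 3 SNPs
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.10, -0.05, 0.20])
pvalues = np.array([0.001, 0.30, 0.80])

pgs_all = polygenic_scores(dosages, betas, pvalues, p_threshold=1.0)
pgs_half = polygenic_scores(dosages, betas, pvalues, p_threshold=0.5)
print(pgs_all)   # all three SNPs contribute
print(pgs_half)  # the third SNP (p = 0.80) is excluded
```

With the liberal threshold every SNP contributes, whereas the threshold of 0.5 drops the third SNP, which changes both individuals' scores. This is the trade-off between retained signal and retained noise discussed above.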

Purcell and colleagues (2009) published the first major application of PGS in psychiatric genetics (The International Schizophrenia Consortium, 2009). Using a discovery GWAS on schizophrenia, which at the time consisted of 3,322 European proband individuals and 3,587 controls, schizophrenia PGS (at GWAS PT < 0.5) explained approximately 3% of the variance in schizophrenia in a completely independent target sample, with larger effects at increasingly more liberal GWAS PT. Moreover, they replicated the schizophrenia PGS association in two European ancestry target samples, once again showing that schizophrenia PGS (at GWAS PT < 0.5) explained between 2.3% and 3.2% of the variance in schizophrenia. A systematic review identified 31 studies involving the use of schizophrenia PGS in relation to a broad set of outcomes (Mistry et al., 2018b). The authors noted that a meta-analysis was not conducted because most PGS studies provided insufficient information and because of excessive variability in both the types of outcomes examined and the discovery populations across studies. The same authors also conducted a review of studies on bipolar and major depressive disorder PGS, drawing similarly limited conclusions with respect to the state of the field (Mistry et al., 2018a). Excessive outcome heterogeneity and the current lack of reporting standards in PGS studies of psychiatric disorders obscure our knowledge about the predictive utility of PGS for these outcomes, which has implications for future research in this area and its clinical applications.

The Current Study

Recent years have seen psychiatric genetics research develop rapidly, given the combination of publicly available GWAS summary statistics for psychiatric disorders, the relatively low cost of microarrays and genotyping (Visscher et al., 2017), and the availability of open-source software for GWAS and PGS computations (Euesden et al., 2016; Purcell et al., 2007). Despite the rapid emergence of ADHD PGS studies, there has yet to be a systematic or meta-analytic review of this literature. This study provides the first meta-analysis of the ADHD PGS literature and offers recommendations for future PGS studies.

Due to the possibility of heterogeneity in the effect sizes of ADHD PGS across studies, we explored a number of potential moderators, including the GWAS discovery sample, publication year, sampling method of the target sample (i.e., population-based vs. case–control), informant type of the ADHD measurement (i.e., parent only vs. multi-informant), and whether the ADHD outcome was measured categorically or continuously. Because of the greater power afforded by larger GWAS discovery samples (Chatterjee et al., 2013), we hypothesized that more recently published and larger discovery samples (i.e., Demontis et al., 2019) would be associated with larger PGS effect sizes than older studies using smaller discovery samples (i.e., Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013; Neale et al., 2010). We also hypothesized that studies that focused on clinical populations (i.e., case–control) would have larger effect sizes than studies that sampled from general populations, especially given prior research showing that psychiatric PGS are more predictive in clinical populations than non-clinical ones (Savage et al., 2018). With respect to the various measurement methods used to assess ADHD across studies (e.g., categorical vs. continuous, parent-only vs. multi-informant), we make no specific hypotheses regarding genetic prediction estimates given that we are not aware of any prior studies that have analyzed the effects of measurement method on PGS effect sizes specifically.

Method

Eligibility Criteria

This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009). Eligibility criteria for the meta-analysis were as follows: 1) published in a peer-reviewed journal, 2) written in English, 3) used an ADHD GWAS discovery sample, 4) used an independent target population (i.e., target participants cannot overlap with those in the ADHD GWAS discovery sample), 5) derived ADHD PGS on said target population, and 6) examined associations between ADHD PGS and ADHD (broadly defined). In the final meta-analyzed sample, we also removed studies that featured the same target sample but used an older (i.e., less well-powered) discovery GWAS sample. The decision to focus on phenotypic ADHD in the ADHD PGS meta-analysis was driven by 1) the larger number of studies having measured this phenotype relative to a non-ADHD phenotype, and 2) the importance of establishing the construct validity of the ADHD PGS, which remains a crucial endeavor in the psychological sciences (Cronbach & Meehl, 1955). In order to facilitate comparisons between ADHD PGS studies as well as to reduce some of the heterogeneity that typifies this literature, we only included ADHD PGS studies that employed ADHD GWAS disseminated by the PGC: B. M. Neale et al. (2010), the Cross-Disorder Group of the Psychiatric Genomics Consortium (2013), and Demontis et al. (2019).

Search Procedure and Data Extraction

A flow chart of the meta-analysis procedure is presented in Supplemental Fig. 1. Data collection was conducted by Q.H. The review focused on all studies published from January 2010 to January 2020. Potential studies were identified in PubMed and Google Scholar databases. Searches targeted studies with all possible variants of the term “ADHD” (e.g., “attention deficit disorder,” “attention,” “hyperactivity”) and variants of the term “polygenic score” (e.g., “polygen*”, “polygenic risk scores,” “profile scores,” “risk profile scores”). A full list of the search criteria employed in the current review is provided in Supplemental Table 1. Study authors were contacted in cases where relevant information about a study was missing (e.g., sample demographics, association statistics, p-values).

Fig. 1 Forest plot of ADHD PGS effect sizes

Fig. 2 p-curve analysis of publication bias and p-hacking. Note. The p-curve reflects the distribution of p-values as a function of the “true” underlying effect. For any given sample size, the bigger the effect, the more right-skewed the expected p-curve becomes (Simonsohn et al., 2014). If a “true” effect exists, one expects lower significant p-values than higher significant p-values. The observed p-curve indicates that at 99% power, all 12 statistically significant results (100%) had p < 0.025

Table 1 Abbreviated summary of ADHD PGS studies included in meta-analysis

Effect Sizes

To determine the overall effect size of the association between ADHD PGS and phenotypic ADHD, Pearson’s r effect sizes were computed for each study (using the “esc” package in R), thus allowing us to combine studies that used a case–control design (i.e., logistic regression) with population-based cohorts (i.e., linear regression or correlations). We used analyzed (rather than total) sample sizes in the effect size computation. Some studies reported odds ratios (ORs) or standardized betas (βs) for multiple ADHD phenotypes (e.g., parent versus teacher report, questionnaire versus interview) and at multiple GWAS PT. Rather than including each of these effect sizes in the meta-analysis, we included only the strongest reported association of the ADHD PGS with ADHD for each study. We assumed that these estimates would be more consistent with the effects reported in studies that did not test multiple ADHD PGS associations at multiple thresholds and phenotypes.
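To make the conversion step concrete, the following sketch shows one widely used route from an odds ratio to Pearson’s r: the log odds ratio is first converted to Cohen’s d with the logistic approximation, and d is then converted to r with a correction for unequal group sizes. This is an illustration of the general technique rather than the code used in this study; the function name and the example values are hypothetical, and the exact formulas applied by the “esc” package may differ.

```python
import math

def odds_ratio_to_r(odds_ratio, n_cases, n_controls):
    """Convert an odds ratio to Pearson's r via Cohen's d.

    Step 1: d = ln(OR) * sqrt(3) / pi   (logistic approximation)
    Step 2: r = d / sqrt(d^2 + a), with a = (n1 + n2)^2 / (n1 * n2)
    """
    d = math.log(odds_ratio) * math.sqrt(3) / math.pi
    n_total = n_cases + n_controls
    a = n_total ** 2 / (n_cases * n_controls)  # equals 4 when groups are balanced
    return d / math.sqrt(d ** 2 + a)

# Hypothetical example: OR of 1.5 per standard deviation of PGS,
# 500 cases and 500 controls
r_example = odds_ratio_to_r(1.5, n_cases=500, n_controls=500)
r_null = odds_ratio_to_r(1.0, n_cases=500, n_controls=500)
print(round(r_example, 3))
print(r_null)
```

Placing case–control and population-based results on the same r metric in this way is what allows them to be pooled in a single model. Note that a standardized beta from a regression with covariates is only an approximation to a zero-order correlation.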

Analyses

All analyses were performed in R, Version 3.6.2. The meta-analysis was performed in the R packages “meta”, “dmetar”, and “metafor”. We reported both fixed- and random-effects models to estimate the pooled effect size across ADHD PGS studies (Borenstein et al., 2010). Between-study heterogeneity was estimated using Cochran’s Q test, which measures the weighted sum of squared differences between the observed effect sizes and the pooled effect, and Higgins and Thompson’s \(I^{2}\), which assesses the percentage of variability in effect sizes not due to sampling error (Higgins & Thompson, 2002). High heterogeneity may be suggestive of excessive study differences, which can complicate interpretations of the meta-analytic pooled effect size (Higgins & Thompson, 2002). We also used meta-regression to explore whether the following potential moderators were associated with the effect sizes of the association between ADHD PGS and phenotypic ADHD: 1) the GWAS discovery sample that was used, 2) the publication year, 3) the sampling strategy of the target sample (i.e., case–control or population-based), 4) the primary method of measurement (i.e., parent report only vs. multi-informant), and 5) whether ADHD was measured categorically or continuously. Finally, we evaluated the possibility of publication bias in two ways: Egger’s test of the intercept (Egger et al., 1997), which tests asymmetry in the effect size estimates relative to sample sizes, and p-curve analysis (Simonsohn et al., 2014), which plots the distribution of the p-values across studies and estimates the true meta-analytic effect size after accounting for possible “p-hacking.”
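The pooling and heterogeneity statistics described above can be sketched as follows. This is not the analysis code used in the study, which relied on the R packages named above. The sketch assumes that correlations are pooled on the Fisher z scale and that the between-study variance is estimated with the DerSimonian-Laird method, neither of which is specified in the text. The function name and the toy inputs are hypothetical.

```python
import math

def pool_correlations(rs, ns):
    """Fixed- and random-effects pooling of correlations with Q and I^2."""
    zs = [math.atanh(r) for r in rs]          # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of z
    w = [1.0 / v for v in vs]                 # inverse-variance weights

    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)

    # Cochran's Q: weighted sum of squared deviations from the pooled effect
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1

    # DerSimonian-Laird estimate of between-study variance (assumed estimator)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    w_re = [1.0 / (v + tau2) for v in vs]     # random-effects weights
    z_random = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)

    # I^2: percentage of variability not attributable to sampling error
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "r_fixed": math.tanh(z_fixed),
        "r_random": math.tanh(z_random),
        "Q": q, "df": df, "tau2": tau2, "I2": i2,
    }

# Hypothetical toy inputs (not the studies in this meta-analysis)
rs = [0.10, 0.20, 0.45]
ns = [5000, 1200, 300]
result = pool_correlations(rs, ns)
print(result)
```

Because the random-effects weights add the between-study variance to every study's sampling variance, small studies receive relatively more weight than in the fixed-effects model. This is why the two pooled estimates can diverge when heterogeneity is high.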

Results

Summary of ADHD PGS Studies

There were 18 studies that met our criteria for the meta-analysis, although only 12 effect sizes were ultimately included in the meta-analysis, yielding a pooled N = 40,088 (see Fig. 1). Three of the 18 studies (Albaugh et al., 2019; Benca et al., 2017; Hawi et al., 2018) did not report the information needed to compute effect sizes and could not be included in the meta-analysis. We also identified several studies that published on the same target samples: Child and Adolescent Twin Study in Sweden (CATSS) (Brikell et al., 2018; Taylor et al., 2019), Generation R (Alemany et al., 2019; P. R. Jansen et al., 2018), the Avon Longitudinal Study of Parents and Children (ALSPAC) (Riglin et al., 2016; Stergiakouli et al., 2017), and a community case–control sample (Nigg et al., 2018, 2019). To avoid sample overlap in our effect sizes, we included from each pair the study that featured the most recent GWAS discovery sample (Generation R: Alemany et al., 2019; ALSPAC: Stergiakouli et al., 2017), the study that utilized a larger portion of the target sample (CATSS: Taylor et al., 2019; ALSPAC: Stergiakouli et al., 2017), or the study most recently published (Nigg et al., 2019). Table 1 provides an abbreviated summary of the 12 effect sizes included in the meta-analysis; more detailed characteristics of each study can be found in Supplemental Table 1.

We highlight several characteristics of the studies included in our meta-analysis (Table 1). The most commonly used GWAS PT was PT < 0.50. ADHD was measured across studies in a variety of ways, including parent-rated questionnaires (e.g., Child Behavior Checklist, CBCL; Strengths and Difficulties Questionnaire, SDQ), semi- or fully structured clinical interviews (e.g., Kiddie Schedule for Affective Disorders and Schizophrenia, K-SADS; Diagnostic Interview Schedule for Children, DISC), or a combination of both methods. Seven studies included in the meta-analysis used population-based target samples where ADHD was measured continuously (e.g., symptom counts or latent factors), except for Li (2019a, b). All studies included in the meta-analysis accounted for age, child sex, and population stratification effects via genetic principal components (PCs) as covariates in their association analyses. However, not all studies included the same number of genetic PCs (see Supplemental Table 3 for additional details about each study). All studies consisted of target populations that were predominantly or entirely of Western European descent.

Meta-analysis, Heterogeneity, and Moderation Analysis

In the random-effects model, the pooled effect size of the ADHD PGS was \(r_{random}\) = 0.201, 95% CI = [0.144, 0.288], p < 0.001 (\(r^{2}_{random}\) = 0.040) (see Fig. 1). In the fixed-effects model, the pooled effect size of the ADHD PGS was \(r_{fixed}\) = 0.190, 95% CI = [0.180, 0.199], p < 0.001 (\(r^{2}_{fixed}\) = 0.036). However, there was evidence of excessive heterogeneity, Cochran’s Q = 360.40, df = 11, p < 0.001; \(I^{2}\) = 96.9%, 95% CI = [95.9%, 97.8%]. A multiple meta-regression analysis showed significant associations of sampling [i.e., population-based (0) vs. case–control (1)] of the target sample, B = 0.128, se = 0.050, p = 0.015, 95% CI = [0.030, 0.225], and the use of continuous (1) vs. categorical (2) measures of ADHD, B = 0.115, se = 0.051, p = 0.025, 95% CI = [0.014, 0.216], with the effect sizes of the PGS across studies. That is, studies with target samples that featured cases and controls (n = 5 studies) and used categorical measures of ADHD (n = 5 studies) reported greater effect sizes than studies that used population-based (n = 7 studies) sampling methods and continuous measures of ADHD (n = 7 studies). There was no significant association of the discovery sample, B = 0.057, se = 0.075, p = 0.446, 95% CI = [-0.090, 0.205], publication year, B = 0.001, se = 0.027, p = 0.960, 95% CI = [-0.052, 0.054], or informant type, B = -0.048, se = 0.060, p = 0.426, 95% CI = [-0.166, 0.070], with the effect sizes. Notably, only 3 of the 12 studies in our meta-analysis used a non-Demontis et al. (2019) GWAS discovery sample, and only 2 of the 12 studies used a multi-informant measurement method as opposed to a parent-only method. Thus, one reason why we may not have detected any moderating effects of the discovery sample or informant method may be the relative lack of variability in these variables. Overall, 73.7% of the heterogeneity of the effect sizes was accounted for by the five moderators we tested.
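As a brief worked check, the variance-explained and heterogeneity figures follow directly from the statistics reported above. The check assumes the standard definition of Higgins and Thompson’s statistic as the proportion of Q in excess of its degrees of freedom:

\(I^{2} = \frac{Q - df}{Q} \times 100\% = \frac{360.40 - 11}{360.40} \times 100\% \approx 96.9\%\)

\(r^{2}_{random} = 0.201^{2} \approx 0.040, \qquad r^{2}_{fixed} = 0.190^{2} \approx 0.036\)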

Publication Bias

Egger’s test for effect size asymmetry showed a non-significant deviation from the Y = 0 intercept (B = -0.580, 95% CI = [-8.028, 6.868], p = 0.883) and thus no clear evidence of publication bias. We then performed a p-curve analysis, a tool that corrects for the inflated effect sizes that publication bias produces. For any given sample size, the bigger the “true” effect, the more right-skewed the expected p-curve (Simonsohn et al., 2014). If a “true” effect exists, one expects lower significant p-values than higher significant p-values. Figure 2 shows the results of the p-curve analysis of the 12 meta-analyzed effect sizes. The p-curve analysis indicated that at 99% power, 92% of the p-values are < 0.01. In fact, all 12 of the studies had p-values lower than 0.025, reflecting a significant right skew of the p-curve (p < 0.0001). Collectively, results from Egger’s test and p-curve analysis suggest no strong evidence of publication bias or “p-hacking” in the meta-analysis.
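For readers unfamiliar with the intercept test, the sketch below shows the regression that underlies Egger’s approach: each study’s standardized effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept that departs from zero suggests funnel-plot asymmetry. This is an illustration of the general technique and not the code used in this study; the function name and the toy inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def egger_test(effects, standard_errors):
    """Egger's regression: standardized effect regressed on precision."""
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    y = effects / se                      # standardized effects
    x = 1.0 / se                          # precision
    X = np.column_stack([np.ones_like(x), x])

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
    resid = y - X @ coef
    dof = len(y) - 2
    sigma2 = resid @ resid / dof                   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance

    intercept = coef[0]
    se_intercept = np.sqrt(cov[0, 0])
    t_stat = intercept / se_intercept
    p_value = 2 * stats.t.sf(abs(t_stat), dof)     # two-sided p-value
    return intercept, se_intercept, t_stat, p_value

# Hypothetical toy inputs (not the studies in this meta-analysis)
effects = [0.10, 0.22, 0.18, 0.30, 0.15]
standard_errors = [0.02, 0.05, 0.03, 0.08, 0.04]
intercept, se_intercept, t_stat, p_value = egger_test(effects, standard_errors)
print(intercept, p_value)
```

With only a dozen studies the intercept is estimated imprecisely, which is consistent with the wide confidence interval reported above, so a non-significant result is weak evidence of symmetry rather than proof of it.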

Discussion

To our knowledge, this is the first meta-analysis of ADHD PGS studies. The meta-analysis included 12 unique effect sizes spanning population-based and case–control target samples (albeit predominantly of European ancestry) with a pooled N = 40,088. Overall, ADHD PGS were consistently and significantly associated with phenotypic ADHD across studies. ADHD PGS accounted for between 3.6% (in the fixed-effects model) and 4.0% (in the random-effects model) of the variance in broadly defined phenotypic ADHD. This prediction estimate is in line with PGS estimates observed for other psychiatric disorders with similarly large GWAS samples, including schizophrenia (Mistry et al., 2018b), bipolar disorder (Mistry et al., 2018a; Stahl et al., 2019), and autism spectrum disorder (Grove et al., 2019). This estimate reflects the tremendous progress being made in psychiatric genetics, especially when considering that just over a decade ago the dominant methodology for directly quantifying genetic risk for complex traits like ADHD was via candidate genes, an approach that not only failed to capture much of the variation in ADHD, but also required us to suspend our long-held belief that most complex traits are driven by polygenic, rather than monogenic, influences (Lander & Schork, 1994). As we enter the post-GWAS era, our findings provide important insights into the state of the science in the genetics of ADHD and lay a foundation for future directions in this field.

Results indicate that ADHD PGS reliably predict ADHD symptoms and diagnosis across different samples, with every study reporting a statistically significant effect size. However, there was excessive heterogeneity, as we observed a wide range of reported effect sizes. For instance, ADHD PGS effect sizes were r = 0.062 (\(r^{2}\) = 0.004) on the lowest end (Stojanovski et al., 2019) and r = 0.490 (\(r^{2}\) = 0.240) on the highest end (Jansen et al., 2019). Stojanovski et al. (2019) may have been relatively underpowered given that they conducted their association analyses on subsamples of individuals with and without traumatic brain injuries in the Philadelphia Neurodevelopmental Cohort (PNC), a population-based dataset of children, adolescents, and young adults from the United States. In contrast, Jansen et al.’s (2019) target sample was ascertained from a psychiatric outpatient hospital in the Netherlands that oversampled for children with ADHD (controls were recruited from the general population). In fact, studies with the largest effect sizes in our meta-analysis were typically outpatient or case–control populations that oversampled or focused on individuals with ADHD and related disorders (Jansen et al., 2019; Nigg et al., 2019; Vuijk et al., 2019). The GWAS cohorts in the discovery sample were largely recruited from clinical populations themselves, potentially making them a better ‘match’ to other case–control studies (Savage et al., 2018).

Qualitative differences between the discovery sample and target populations might have contributed to stronger predictive performance of ADHD PGS in clinical or case–control samples relative to population-based samples. The classic liability threshold model (Gottesman & Shields, 1967) provides a plausible theory for why matching the composition of the discovery and target samples in PGS studies may be crucial for optimizing the PGS signal. Clinical populations may thus reflect a different genetic liability distribution than those in the general population, as individuals from clinical populations also typically present with more severe psychopathology and additional comorbidities (Savage et al., 2018). Thus, sampling from a clinical population in the ADHD GWAS may have attenuated the ADHD PGS predictive effect in the general population (Supplemental Fig. 2), where a higher PGS would be required to meet the liability threshold in a general population than a clinical population. Existing GWAS discovery samples for ADHD seem optimized for studies where the target population is also a case–control sample and/or a clinical population, and possibly less so for samples featuring a general population. We note, however, that there may be other differences besides sampling type that could contribute to differences in the PGS distribution (e.g., age of the sample, informants that were used to phenotype, ancestry, etc.). Future studies should examine these effects by comparing the performance of PGS based on various discovery samples as they pertain to different target populations. Furthermore, the excessive study heterogeneity could also be partly due to the different GWAS PT used across studies, although most studies used PT < 0.50. We already discussed the problem with this convention, including issues with model overfit (Benjamini et al., 2001) and a lack of generalizability when using a standard PT across studies and populations. 
As it stands, there is currently no established, universal, or agreed-upon GWAS PT to specify when deriving a PGS. Few, if any, psychiatric PGS studies have employed a separate discovery population, independent from the GWAS, to select an optimal PT for the target population. At the same time, selecting an optimal GWAS PT this way limits the validity and reliability of PGS as predictors since it means that a different GWAS PT will have to be used each time a different sample is analyzed. We recommend that future studies consider reporting PGS effect sizes using a GWAS PT of 1.0, which incorporates all available genotypic information without the need for model selection. While it is highly unlikely that all genes contribute to the various outcomes, the computation of the PGS accordingly downweights trivial variants based on their small (or null) GWAS effect sizes.

Limitations of the Meta-analysis

Some limitations of the meta-analysis should be noted. First, we focused our review on studies that exclusively used ADHD GWAS summary statistics that were publicly available via the PGC, rather than studies that used GWAS summary statistics from smaller/local samples. This decision was motivated by our desire to enhance the comparability across ADHD PGS studies and to subsequently perform a meta-analysis of these studies. However, we have already noted that the small overall effect size of the ADHD PGS across studies may have been due to the discrepancy between the ADHD discovery sample (i.e., case–control samples) and the target samples (predominantly population-based). Meta-analyzed GWAS of population-based samples for ADHD (e.g., UK Biobank, 23andMe) thus offer a compelling direction for a future GWAS in this area. Second, we noted that some studies reported multiple ADHD outcomes at multiple PT, thus providing multiple effect sizes that could have been included in our meta-analysis. As described in the Method, we selected the strongest effect size that was reported in each study, in part because we assumed that studies that did not report multiple effect sizes likely also reported only their most robust result. Thus, the pooled effect size reported in this meta-analysis is likely a more liberal estimate than if we had only included the least significant effect sizes across studies. Third, nearly every study in our meta-analysis (as well as the GWAS discovery cohorts) consisted of individuals of predominantly (Western) European ancestries. PGS are known to be imprecise in non-European populations because these populations are underrepresented in GWAS discovery samples (Martin et al., 2019).
Although well-powered transancestral GWAS of psychiatric outcomes have only recently emerged (Bigdeli et al., 2017; Walters et al., 2018), the lack of racial diversity is concerning and could exacerbate health disparities if and when genomic information becomes widely used for clinical applications (Martin et al., 2019). Fourth, all studies in our meta-analysis assumed an underlying linearity with respect to the PGS. This ignores the possibility of non-linear or non-additive allelic effects in the PGS computation, which is problematic given that not all genes have the same mode of inheritance. Additionally, emerging studies have shown that polygenic risks may not be linearly associated with psychiatric risks, especially at the upper end of the PGS distribution (Khera et al., 2018; Li, 2019b).

Future Directions

Accounting for comorbidity

There is shared genetic variation between most of the major disorders in the DSM (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). Psychiatric comorbidity complicates the interpretation of PGS for a single disorder because these scores likely contain a mixture of pleiotropic variants and variants that are unique to the disorder. It is important to disambiguate the genetic signals that are shared from those that are specific to discrete disorders in order to enhance the prediction accuracy of trait-specific PGS. Without accounting for this phenomenon, it is unsurprising that psychiatric GWAS, and by extension, PGS studies informed by these GWAS, have yet to yield high-impact discoveries and large effect sizes, respectively.

Most well-powered psychiatric GWAS do not allow for the direct identification of higher-order or pleiotropic SNP effects because they typically employ case–control samples, where cases are recruited and measured on the presence of a single disorder (or a few related disorders) of interest. Genomic Structural Equation Modeling (GSEM) was developed as a way to identify the variants that affect cross-trait liability (i.e., influencing a general factor) and trait-specific liability, leveraging only the GWAS summary statistics for these phenotypes (Grotzinger et al., 2019). One of the more promising applications of GSEM is in PGS. The proof of this concept was demonstrated by modeling a general factor PGS from GWAS summary statistics of schizophrenia and major depressive disorder in GSEM (Grotzinger et al., 2019). The general factor PGS was more predictive of the phenotypic general factor, psychotic experiences, depression, mania, anxiety and post-traumatic stress disorder in the completely independent UK Biobank dataset than any univariate version of the PGS (Grotzinger et al., 2019). Their findings show not only how substantial an effect the general factor has on any single psychiatric outcome, but also the potential gains in prediction accuracy and specificity when we control for the general factor in disorder-specific PGS studies.

Integrating bioinformatics into PGS computations

A common criticism of PGS is that the presence of a statistical association does not necessarily reveal information about the biological mechanisms underlying the trait (Schaub et al., 2012; Subramanian et al., 2005). While gene expression panels such as the Encyclopedia of DNA Elements (The ENCODE Project Consortium, 2011) and the Epigenomics Roadmap Project (Roadmap Epigenomics Consortium et al., 2015) have generated a wealth of data on the linkages between genetic variation and gene function, their applications for PGS have yet to fully develop and take hold in the field. Software is available to functionally annotate GWAS summary statistics from gene expression panels like ENCODE and the Roadmap Project (e.g., DAVID; Dennis et al., 2003; GenoSkyline; Lu et al., 2016). These approaches integrate bioinformatics with GWAS to partition the SNP heritability by how strongly enriched the genetic signals are in human cells and tissues. Using GenoSkyline, which corrects for multiple testing and linkage disequilibrium (LD), genes for ADHD have been shown to be significantly overrepresented in the anterior caudate of the brain, although other brain regions were also implicated to a lesser degree (Supplemental Fig. 3). Another gene expression assay for ADHD GWAS (which did not control for multiple testing or LD) detected gene enrichment across a broader range of brain regions, including the anterior cingulate gyrus (ACC), the anterior caudate, and the dorsolateral prefrontal cortex (DLPFC) (Demontis et al., 2019). These structures collectively play a crucial role in human reward processing and decision-making (Volkow et al., 2011). Thus, in the case of ADHD, functional annotations of the GWAS not only confirmed the regions of the brain that had long been suspected in its etiology (Li, 2018; Luman et al., 2005), but also provide clues as to which brain regions are more relevant to ADHD than others.

Once GWAS summaries have been mapped onto gene expression panels, the resultant information can then be directly leveraged in PGS computations, where SNPs that are overrepresented in functionally annotated sites can be prioritized and weighted in the PGS computation. This contrasts with the traditional approach to computing PGS, which prioritizes SNPs based solely on statistical significance. PGS using highly enriched GWAS signals via AnnoPred significantly outperformed prediction estimates from a traditional PGS approach in predicting several complex phenotypes, including Crohn’s disease, breast cancer, rheumatoid arthritis, Type-II diabetes, and celiac disease (Hu et al., 2017). We are not aware of any applications of this approach in psychiatric genetics at this time, but integrating bioinformatics into PGS computations could lead to new hypotheses or provide evidence in support of existing hypotheses with respect to the biological processes underlying many complex traits.

Testing mechanistic theories of psychiatric outcomes

PGS can also be leveraged to validate psychological constructs or to test new or existing models of heritable psychiatric disorders. One empirical framework is the Polygenic-Phenotypic Mediation Model (PPMM) (Li, 2019a; Li et al., 2019), which involves comparing two mediational models of a given trait B: 1) a phenotypic model, in which the independent variable for trait A is measured using traditional psychological methods, such as self-report or a behavioral paradigm, and 2) a polygenic model, in which the independent variable is characterized by a PGS of trait A. Both models are tested in relation to trait B, where concordance between the phenotypic and polygenic mediation models provides robust validation of a mechanism involved in trait B and robust evidence of a genetic relationship between the two traits (Supplemental Fig. 4). The strength of PPMM is that the two models complement each other by addressing their respective limitations. Polygenic models, which are derived mechanically from completely independent GWAS datasets, are impervious to many of the methodological confounds that typify a traditional phenotypic model (e.g., common method variance, self-report bias). Phenotypic models, on the other hand, address limitations of the polygenic model by providing construct validity for the PGS, whereby the genetic associations between the two traits also reflect observed associations. Furthermore, temporally separating measures of traits A and B allows researchers to test even more robust models of causality, particularly in cases where specific mechanisms are theorized to intervene in these pathways.
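The comparison at the heart of PPMM can be sketched as two indirect-effect estimates, one with a phenotypic measure of trait A as the predictor and one with a PGS of trait A. The code below is a minimal illustration under stated assumptions, not the procedure used in the cited studies: it uses a simple product-of-coefficients estimate with one mediator and no covariates, the function name `indirect_effect` is hypothetical, and the data are simulated with arbitrary path values. In practice, significance of the indirect effect would need to be assessed (e.g., with bootstrapped confidence intervals), which is omitted here.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _cov(u, v):
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the path x -> m -> y.

    a: slope from regressing the mediator m on x
    b: slope of m from regressing y on both m and x
    """
    var_x, var_m = _cov(x, x), _cov(m, m)
    cov_xm, cov_xy, cov_my = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cov_xm / var_x
    b = (cov_my * var_x - cov_xy * cov_xm) / (var_x * var_m - cov_xm ** 2)
    return a * b


# Simulated data (hypothetical): a latent liability drives both measures
# of trait A and the mediator; the mediator in turn drives trait B.
rng = random.Random(0)
n = 500
liability = [rng.gauss(0, 1) for _ in range(n)]
trait_a_pheno = [g + rng.gauss(0, 1) for g in liability]      # e.g., rating scale
trait_a_pgs = [0.3 * g + rng.gauss(0, 1) for g in liability]  # noisier genetic proxy
mediator = [0.5 * g + rng.gauss(0, 1) for g in liability]
trait_b = [0.4 * m + rng.gauss(0, 1) for m in mediator]

# Phenotypic model: trait A (measured) -> mediator -> trait B
phenotypic_ab = indirect_effect(trait_a_pheno, mediator, trait_b)
# Polygenic model: trait A (PGS) -> mediator -> trait B
polygenic_ab = indirect_effect(trait_a_pgs, mediator, trait_b)
```

Concordance would then be judged by whether both indirect effects are reliably non-zero and point in the same direction. The polygenic estimate is expected to be smaller in magnitude, given how little phenotypic variance a PGS typically explains.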

PPMM was first illustrated in the context of ADHD in examining the causal pathways of psychosocial risk from early childhood ADHD to later antisocial behaviors in adulthood (Li, 2019a). Using the PPMM framework, the study found that children with ADHD exhibited greater difficulties in school, which in turn predicted their antisocial development during adulthood even after accounting for the simultaneous effects of supportive parenting and peer effects during adolescence. Importantly, this indirect effect was completely replicated in the polygenic model of ADHD. Another study using the same PPMM framework showed that phenotypic and polygenic models of neuroticism indirectly predict late life depression via mid-life stressful life events and low social support in the Wisconsin Longitudinal Study (Li et al., 2019). Collectively, these studies show how PPMM could be used to validate theories about the psychosocial mechanisms involved in complex psychiatric development.

The two study examples focused on psychosocial factors as mechanisms rather than as moderators of genetic liability. This represents a significant departure from convention in the gene-environment literature. In fact, gene-environment correlation effects are quite pervasive in the psychopathology literature and are likely to explain why gene-environment interactions are so difficult to detect and replicate (Knafo & Jaffee, 2013). Furthermore, PPMM can be used to test more proximal mechanisms of genetic liability, including neurobiological or cognitive endophenotypes (Gottesman & Gould, 2003). One of the studies included in the meta-analysis (Nigg et al., 2018) showed that ADHD PGS was indirectly associated with phenotypic ADHD via working memory and arousal/alertness. New paradigms such as PPMM can be used to incorporate measures at multiple levels of analysis, including those rigorously assayed in the laboratory, to shed new light on biological as well as psychosocial mechanisms of risk for ADHD and other psychiatric outcomes (Hinshaw, 2018).

Conclusion

The rapid rise in PGS studies published in just the last half decade reflects the multidisciplinary enthusiasm the approach has garnered, from scientists who view PGS as a new model for explaining the genetic variation underlying complex traits (Wray et al., 2014), to those who view PGS as a tool for clinical prediction and diagnosis (Anderson et al., 2019; Martin et al., 2019; Torkamani et al., 2018). Prediction estimates from ADHD PGS should become even more robust as our GWAS sample sizes increase (Chatterjee et al., 2016). For instance, the increase from just a few thousand (B. M. Neale et al., 2010) to several tens of thousands of ADHD cases and controls (Demontis et al., 2019) in GWAS has only just enabled the detection of genome-wide significant findings for ADHD. We note that simulations of genetic data indicate that increases in the proportion of phenotypic variance explained by PGS in several complex traits are non-linearly associated with the GWAS sample size (Chatterjee et al., 2013). Greater prediction accuracy depends on the heritability of the trait as well, where more heritable traits (e.g., height) require fewer samples to achieve the same gains in prediction accuracy than less heritable and presumably more complex traits. PGS are projected to be a major part of precision medicine moving forward as our technology improves and the science becomes more refined.

At the same time, it is also important to temper any unbridled enthusiasm around this conceivably not-too-distant future where our genes are used to predict our health and behavior. The current lack of diversity in GWAS of ADHD crucially limits the utility of ADHD PGS to those with Western European ancestries. Unless substantially greater efforts are made to collect genetic samples from a broader range of populations (i.e., those of non-Western European descent), PGS studies will continue to be disproportionately focused on Western European-descent populations (Martin et al., 2019). Furthermore, history has been fraught with infamous examples of the social consequences that stemmed from strong beliefs about genetic determinism (Epstein, 2003; Galton, 1904). Prominent experts have already warned about the potential invasions of privacy and cavalier presentations of genetic information that direct-to-consumer genetic testing companies have introduced to mass audiences (Zettler et al., 2014). As psychiatric genetics continues to rise, we must be careful not to perpetuate the old and dangerous stereotype that our destiny is in our genes. Despite the valuable information they provide about us, genes alone do not determine who we become.