Introduction

The study of human genetics has established that the vast majority of genetically influenced complex traits are polygenic, with many small effects distributed across the genome (Visscher et al. 2021). Polygenic Scores (PSs) are a class of prediction methods that aggregate these polygenic effects for a given phenotype (e.g. height, BMI, psychiatric risk) and can explain a substantial amount of phenotypic variation using genetic data (Sugrue and Desikan 2019). Although PSs are not useful as stand-alone diagnostic measures, previous research has shown that, when combined with traditional risk measures, they can improve predictive accuracy for common diseases such as cancers (Jia et al. 2020; Kachuri et al. 2020), Coronary Artery Disease (Inouye et al. 2018; Klarin and Natarajan 2022), and Type 2 Diabetes (Ashenhurst et al. 2022; Ge et al. 2022).
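In its simplest form, the aggregation described above is a weighted sum: each individual's score adds up their effect-allele counts multiplied by per-allele weights taken from GWAS effect sizes (after whatever variant selection or shrinkage a given method applies). The sketch below assumes complete, already-aligned genotype and effect-size arrays; the function name and toy values are hypothetical. Some tools divide the sum by the number of alleles counted, which, with no missing genotypes, only rescales the scores and does not change how individuals are ranked.

```python
import numpy as np

def polygenic_score(dosages, effect_sizes):
    """Weighted sum of effect-allele counts for each individual.

    dosages: (n_individuals, n_variants) counts of the effect allele (0, 1 or 2).
    effect_sizes: (n_variants,) per-allele weights, e.g. GWAS betas.
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("number of dosage columns must equal number of effect sizes")
    # Score_i = sum over variants j of dosage_ij * beta_j
    return dosages @ effect_sizes

# Toy example: three individuals, four variants (hypothetical values).
genotypes = np.array([[0, 1, 2, 1],
                      [2, 0, 1, 0],
                      [1, 1, 1, 1]])
betas = np.array([0.02, -0.01, 0.05, 0.00])
scores = polygenic_score(genotypes, betas)
```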

A major challenge facing the application of PSs is their diminished performance when they are trained in one ancestry group and deployed in another. To date, the majority of well-powered Genome-Wide Association Studies (GWASs) have been conducted on individuals of European ancestry (Bitarello and Mathieson 2020; Lewis and Green 2021; Martin et al. 2019; Peterson et al. 2019), limiting their utility in non-European ancestry groups. The over-representation of European ancestry individuals in genetic studies has led many to fear that PSs may exacerbate health disparities (Martin et al. 2019). This drop in performance across ancestry groups is thought to be related to differences in linkage disequilibrium (LD) and allele frequencies (Wang et al. 2020), which may be addressed in part by conducting GWASs in more diverse samples. While groups such as the Hispanic/Latino Anthropometry (HISLA) Consortium (Fernández-Rhodes et al. 2022), the Population Architecture Using Genetics and Epidemiology (PAGE) Study (Matise et al. 2011), and the African Ancestry Anthropometry Genetics Consortium (AAAGC) (Ng et al. 2017), among others, have worked to increase the number of non-European GWAS samples, European ancestry samples are usually the largest available for a given phenotype, and methods for cross-ancestry PS remain limited.

Multiple methods exist for computing PSs (Choi and O’Reilly 2019; Ge et al. 2019, 2022; Ruan et al. 2022), each of which attempts to address two issues with GWAS summary statistics: (1) distinguishing impactful from non-impactful variants and (2) LD correlations across the genome. The classic PS method, Pruning and Thresholding (P + T), addresses these issues by (a) grouping genomic variants according to LD correlations and retaining one representative variant per group (pruning), then (b) restricting the remaining variants to those meeting a p-value threshold (thresholding) (Choi et al. 2020; Choi and O’Reilly 2019; Marees et al. 2018). Alternative PS approaches tackle these issues with shrinkage techniques, which have been shown to achieve superior performance (Ge et al. 2019; Privé et al. 2020). Finally, some PS methods are specifically designed to be deployed across multiple ancestries (Ge et al. 2022; Ruan et al. 2022). Benchmarking these methods in independent datasets of ancestrally diverse individuals is important for evaluating their relative performance.
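The two P + T steps can be sketched in a few lines. This is a simplified illustration, not PRSice2's implementation: it assumes a precomputed matrix of pairwise LD r² for a small set of variants (real tools clump within physical windows using a reference panel), and the cut-offs `r2_max = 0.1` and `p_max = 0.05`, the function name, and the toy data are illustrative assumptions.

```python
import numpy as np

def clump_and_threshold(pvals, ld_r2, r2_max=0.1, p_max=0.05):
    """Greedy LD clumping followed by P-value thresholding (simplified sketch).

    pvals: (m,) GWAS P-values.
    ld_r2: (m, m) pairwise LD r^2 between the m variants.
    Returns sorted indices of the variants kept in the score.
    """
    pvals = np.asarray(pvals, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)
    remaining = np.ones(pvals.size, dtype=bool)
    index_snps = []
    # Pruning (clumping): visit variants from most to least significant.
    for j in np.argsort(pvals):
        if not remaining[j]:
            continue
        index_snps.append(j)
        # Remove every variant in LD with this index variant (including itself,
        # since r^2 = 1 on the diagonal; it is already stored in index_snps).
        remaining &= ld_r2[j] <= r2_max
    index_snps = np.array(index_snps, dtype=int)
    # Thresholding: keep index variants whose P-value meets the cut-off.
    return np.sort(index_snps[pvals[index_snps] <= p_max])

# Toy data: variants 0 and 1 are in strong LD; 2 and 3 are moderately linked.
p = np.array([1e-6, 1e-3, 0.2, 0.04])
ld = np.array([[1.00, 0.80, 0.05, 0.00],
               [0.80, 1.00, 0.00, 0.02],
               [0.05, 0.00, 1.00, 0.30],
               [0.00, 0.02, 0.30, 1.00]])
kept = clump_and_threshold(p, ld)
```

Repeating the thresholding step over a grid of `p_max` values, as PRSice2 does, yields one candidate score per threshold.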

The Adolescent Brain Cognitive Development (ABCD) Study® is a longitudinal study with deep phenotyping of over 11,000 children aged 9 to 11 years, with wide sociodemographic and genetic diversity across the United States. This study provides an ideal opportunity to profile the performance of recently developed PS methods in an ancestrally diverse cohort. In the present study, we deploy four PS methods: PRSice2, PRScs, PRScsx, and PRScsx Meta. We use PRSice2 as our P + T method (Choi and O’Reilly 2019). PRScs (Ge et al. 2019) has emerged as a particularly effective shrinkage PS method, and PRScsx and PRScsx Meta leverage information from multiple ancestries to improve trans-ancestry PS generation (Ge et al. 2022; Ruan et al. 2022). We use each of these four methods to generate PSs for four phenotypes in the ABCD sample: height, body mass index (BMI), schizophrenia risk, and depression risk. For each phenotype, we use previous GWASs from up to five ancestral groups where available: European (EUR), East Asian (EAS), South Asian (SAS), Hispanic (HIS), and African (AFR). The PSs from these methods are then evaluated against relevant phenotypes separately in European, African, and admixed ancestry individuals in the ABCD cohort. We hope the results of this analysis will help guide future work aiming to use PSs in populations of diverse ancestries.

Methods

ABCD sample

Our sample consisted of 11,178 children with usable genetic data from release 4.0 of the Adolescent Brain Cognitive Development (ABCD) study (https://doi.org/10.15154/1523041). The ABCD cohort was recruited to be as close to nationally representative as possible and therefore exhibits large sociodemographic diversity (Garavan et al. 2018). The sample includes an embedded twin cohort and many siblings.

ABCD phenotypes

Dependent variables in this analysis come from the baseline visit of ABCD release 4.0. Two anthropometric traits were used: height, calculated as the mean of three measurements, and body mass index (BMI), calculated as weight divided by height squared. Two behavioral metrics were used: KSADS Total Symptoms, as reported by the participant’s caregiver, and CBCL Total Problems, based on an extensive battery of questionnaires and interviews. The Kiddie Schedule for Affective Disorders and Schizophrenia (KSADS) measure represents the combined outcome of a self-administered version of the K-SADS-5 assessment completed by the caregiver of the child enrolled in the ABCD study. The KSADS is a diagnostic tool commonly used to identify symptoms, behaviors, and impairments potentially related to psychiatric disorders; our variable is the total of all responses to a participant’s K-SADS-5 assessment and is used in this analysis as a single metric of the overall presence of psychiatric symptoms (Kaufman et al. 1997). This total was normalized using a rank-based inverse normal transformation. The K-SADS-5 has been shown to be valid, reliable, and replicable for children of diverse cultural and national backgrounds (de la Peña et al. 2018; Dun et al. 2022; Kaufman et al. 1997; Kim et al. 2004; Nishiyama et al. 2020; Shahrivar et al. 2010).

The Child Behavior Checklist (CBCL) is a caregiver-reported assessment of the child on eight syndrome scales: anxious/depressed, withdrawn/depressed, somatic complaints, social problems, thought problems, attention problems, rule-breaking behavior, and aggressive behavior. For this analysis, these were combined into a single score representing the total number of behavioral problems reported in the CBCL assessment, which was then normalized using a rank-based inverse normal transformation. The CBCL Total Problems score provides a single metric of a participant’s emotional functioning and well-being as evidenced through behavior (Achenbach and Rescorla 2004). The CBCL is part of the Achenbach System of Empirically Based Assessment (ASEBA), designed for school-aged children between 6 and 18 years of age (Achenbach and Rescorla 2004), and it has been found to be valid and reliable for children of diverse cultural and national backgrounds (Albores-Gallo et al. 2007; Dutra et al. 2004; Hartini et al. 2015; Leung et al. 2006).
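Both behavioral scores were normalized with a rank-based inverse normal transformation. The text does not state which rank offset was used, so the sketch below assumes Blom's offset c = 3/8, i.e. z = Φ⁻¹((r − c)/(n − 2c + 1)), with tied values sharing their average rank. It uses only the Python standard library, assumes no missing values, and the function names and example totals are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # convert 0-based positions to 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform: z = Phi^-1((r - c) / (n - 2c + 1))."""
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0))
            for r in average_ranks(values)]

raw_totals = [12, 0, 3, 3, 40]  # hypothetical raw symptom totals
normalized = rank_inverse_normal(raw_totals)
```

Because only ranks enter the formula, heavily skewed totals (many zeros and a long right tail, as is common for symptom counts) are mapped onto an approximately standard normal scale.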

Summary statistics data access

Our analyses were limited by the public availability of large and diverse GWASs of mental health phenotypes. Furthermore, our trans-ancestry method required summary statistics that were separable into distinct continental ancestries, which prevented us from using some large trans-ancestry meta-analyses. Height and BMI were selected to show the efficacy of these methods for some of the most well-powered anthropometric phenotypes. Depression was chosen because of its prevalence among adolescents (Goodwin et al. 2022) and because summary statistics are available in multiple ancestry groups. Schizophrenia was chosen despite its low prevalence in an adolescent population sample because summary statistics are available in non-European ancestries and because previous studies have found differences in experiential and mental health phenotypes in adolescents with a high genetic load for schizophrenia before clinical manifestation of the disorder (Jones et al. 2016; Woolway et al. 2022). Publicly available GWAS summary statistics were identified through the GWAS Catalog (https://www.ebi.ac.uk/gwas/home), which also provided additional information on existing and available summary statistics, and through Google Scholar. A full list of data sets used can be found in Supplementary Table 1. All summary statistics were aligned to genome build GRCh38.

Genetic data

Genetic data were collected from blood or saliva samples from ABCD participants (Uban et al. 2018). A total of 656,247 genomic markers were measured using the Smokescreen array (Baurley et al. 2016). Genetic principal components were calculated from these data using PC-AiR (Conomos et al. 2015) with default settings. We inferred participants’ continental genetic ancestry using SNPweights (Chen et al. 2013) with precompiled external genomic reference panels from the 1000 Genomes Project (Auton et al. 2015) and Indigenous reference panels (Reich et al. 2012). Participants were assigned to a continental group if their inferred genetic ancestry was at least 80% consistent with a single continental reference panel (African, East Asian, European, or Indigenous North and South American ancestry), and to an admixed group if their genetic ancestry did not meet the 80% threshold for any single ancestry because of admixture of two or more of these continental components. We chose inferred genetic ancestry over other methods because it allowed us to define not only individuals of continental ancestries but also an admixed ancestry group (i.e., those who did not meet the criteria for a single continental ancestry). Genetic PCA plots of our sample (see Supplementary Fig. 1) show that the inferred ancestry labels group individuals as expected. Because of their small numbers, participants of Indigenous ancestry (n = 53) and of East Asian continental ancestry (n = 158) were excluded from the testing sample. Our testing sample therefore consisted of an African ancestry group (AFR, n = 811), a European ancestry group (EUR, n = 6703), and an admixed ancestry group (MIX, n = 3664).

To increase the overlap between genetic variants in ABCD and summary statistics from previous GWASs, we imputed markers measured on the Smokescreen array using the TOPMed imputation server (Taliun et al. 2021). The imputed fractional dosages were converted to integer allele counts (hard calls) using a best-guess threshold of 0.9. This resulted in 280,850,795 imputed variants aligned to genome build GRCh38. After imputation, the target genetic data were restricted to autosomal variants with a minor allele frequency of at least 1% (0.01), leaving just under 11 million single nucleotide polymorphisms (SNPs) in the target data.
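A sketch of the hard-call conversion and the MAF filter follows. The text does not say exactly how the 0.9 best-guess threshold was applied; this sketch assumes it is a minimum posterior probability for the most likely genotype, with genotypes below it set to missing. Other tools define hard-call thresholds differently (for example, as a distance from the nearest integer dosage). The function names and probabilities are hypothetical.

```python
import numpy as np

def hard_call(genotype_probs, threshold=0.9):
    """Best-guess allele counts from genotype probabilities.

    genotype_probs: (n, 3) probabilities of carrying 0, 1 or 2 copies of the allele.
    Returns a float array of allele counts, NaN where no genotype reaches threshold.
    """
    probs = np.asarray(genotype_probs, dtype=float)
    calls = probs.argmax(axis=1).astype(float)
    calls[probs.max(axis=1) < threshold] = np.nan  # too uncertain: set to missing
    return calls

def passes_maf(calls, min_maf=0.01):
    """True if the minor allele frequency among non-missing calls is >= min_maf."""
    observed = calls[~np.isnan(calls)]
    allele_freq = observed.mean() / 2.0
    return min(allele_freq, 1.0 - allele_freq) >= min_maf

# One variant in three hypothetical individuals.
calls = hard_call([[0.95, 0.05, 0.00],
                   [0.10, 0.85, 0.05],
                   [0.00, 0.02, 0.98]])
keep_variant = passes_maf(calls)
```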

Polygenic score methods

Because some methods required hyperparameter tuning, we generated 100 50/50 cross-validation splits within each ancestry group. We ensured that family members were never split across training and testing folds (families were defined using the ‘rel_family_id’ variable). Each split provided a training fold for hyperparameter tuning and a testing fold for evaluating PS performance.
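One way to build such family-aware splits is to shuffle family identifiers and assign half of the families, with all of their members, to the training fold. This is a sketch under assumptions: `fam` stands in for the ‘rel_family_id’ column, the seed and function names are hypothetical, and because families differ in size, each split is 50/50 by family and only approximately 50/50 by individual. The study's own splitting code is not described.

```python
import numpy as np

def family_split(family_ids, rng):
    """Boolean mask (True = training fold) that keeps each family in one fold."""
    family_ids = np.asarray(family_ids)
    families = np.unique(family_ids)
    shuffled = rng.permutation(families)
    training_families = shuffled[: len(families) // 2]
    return np.isin(family_ids, training_families)

def make_splits(family_ids, n_splits=100, seed=0):
    """Generate n_splits family-aware 50/50 splits (50/50 by family, not by person)."""
    rng = np.random.default_rng(seed)
    return [family_split(family_ids, rng) for _ in range(n_splits)]

# Hypothetical family IDs standing in for 'rel_family_id'.
fam = np.array(["f1", "f1", "f2", "f3", "f3", "f4"])
splits = make_splits(fam, n_splits=100, seed=1)
train_mask = splits[0]
test_mask = ~train_mask
```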

We provide a brief description of each PS method used in this analysis:

  1. P + T: PRSice2

      PRSice2 is an LD-informed pruning and P-value thresholding method: it groups and thins SNPs according to LD and P-value, then retains only the SNPs that meet a given P-value threshold, i.e. those with P-values below it (Choi and O’Reilly 2019). For our analysis, we used default parameters with a range of P-value thresholds (0, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1), applying to the testing fold the threshold that maximized R² in the training fold. Reported R² values were calculated in the testing fold.

  2. CS: PRScs

      PRScs is a Bayesian polygenic prediction method that uses GWAS summary statistics, LD reference information, and a continuous shrinkage prior to infer posterior SNP effect sizes; it has been shown to be both reliable and computationally efficient across varying genetic architectures (Ge et al. 2019). For our analysis, we used default parameters except for the numbers of MCMC and burn-in iterations, which we increased to 10,000 MCMC iterations and 5000 burn-in iterations. These values were chosen because they have been shown to increase the stability of posterior effects between runs without being too computationally intensive (Schultz et al. 2022). LD reference panels based on 1000 Genomes Phase 3 data were matched to the ancestry of the respective summary statistics (AFR, AMR, EAS, EUR, or SAS). PLINK 2.0 was used to generate polygenic scores from the PRScs posterior effect sizes (Purcell et al. 2007).

  3. CSx: PRScsx

      PRScsx is a Bayesian polygenic modeling method that integrates GWAS summary statistics and LD information from multiple ancestrally diverse populations to improve the estimation of posterior SNP effects (Ruan et al. 2022). For our analysis, we used the recommended parameters (apart from MCMC and burn-in iterations) and the appropriate LD references from 1000 Genomes Phase 3 provided with the method for each set of summary statistics (AFR, AMR, EAS, EUR, or SAS). We used 10,000 MCMC iterations and 5000 burn-in iterations to increase stability (Schultz et al. 2022) and to be consistent with the other methods. PLINK 2.0 was used to generate ancestry-specific PSs from the PRScsx posterior effect sizes (Purcell et al. 2007). As advised by the method’s authors, the ancestry-specific PSs were combined linearly, with weights learned in the training folds by linear regression of the phenotype of interest:

      $$\text{PRS} = w_{\text{Ancestry1}}\,\text{PRS}_{\text{Ancestry1}} + w_{\text{Ancestry2}}\,\text{PRS}_{\text{Ancestry2}} + \cdots + w_{\text{AncestryN}}\,\text{PRS}_{\text{AncestryN}},$$

      where each w is the weight of the corresponding ancestry-specific PS. R² values were calculated in the testing fold after applying the weights learned in the training fold.

  4. CSx Meta: PRScsx Meta

      PRScsx Meta uses the same Bayesian polygenic model as PRScsx, but rather than requiring ancestry weights to be tuned, it uses an inverse-variance-weighted meta-analysis to produce a single set of posterior effects (Ge et al. 2022). For our analysis, we used the recommended parameters (apart from MCMC and burn-in iterations) and the appropriate LD references from 1000 Genomes Phase 3 provided with the method for each set of summary statistics (AFR, AMR, EAS, EUR, or SAS). We used 10,000 MCMC iterations and 5000 burn-in iterations to increase stability (Schultz et al. 2022) and to be consistent with the other continuous shrinkage methods. PLINK 2.0 was used to generate polygenic scores from the meta-analyzed PRScsx posterior effect sizes (Purcell et al. 2007).
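The two ways of combining ancestry-specific information described in items 3 and 4 can be sketched as follows. `fit_ancestry_weights` learns the CSx weights w by ordinary least squares in a training fold, and `combined_score` applies them in the testing fold. An intercept is included here as an assumption; it shifts every score equally. As a further simplification, these weights ignore the covariates used in the main models. `ivw_meta` shows the generic fixed-effect inverse-variance weighting formula for a single SNP. PRScsx performs its meta-analysis internally, so this sketch only illustrates the weighting idea and does not reproduce the software. All names and the simulated data are hypothetical.

```python
import numpy as np

def fit_ancestry_weights(train_scores, train_y):
    """OLS fit of the phenotype on ancestry-specific PSs in a training fold.

    train_scores: (n, k), one column per ancestry-specific PS; train_y: (n,).
    Returns [intercept, w_1, ..., w_k].
    """
    scores = np.asarray(train_scores, dtype=float)
    X = np.column_stack([np.ones(scores.shape[0]), scores])
    coef, *_ = np.linalg.lstsq(X, np.asarray(train_y, dtype=float), rcond=None)
    return coef

def combined_score(scores, coef):
    """Apply weights learned in training to ancestry-specific PSs in a testing fold."""
    return coef[0] + np.asarray(scores, dtype=float) @ coef[1:]

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted estimate and its standard error."""
    betas = np.asarray(betas, dtype=float)
    weights = 1.0 / np.asarray(ses, dtype=float) ** 2
    estimate = np.sum(weights * betas) / np.sum(weights)
    return estimate, np.sqrt(1.0 / np.sum(weights))

# Simulated example: two ancestry-specific scores with true weights 0.7 and 0.2.
rng = np.random.default_rng(42)
ps = rng.normal(size=(2000, 2))
y = 0.7 * ps[:, 0] + 0.2 * ps[:, 1] + rng.normal(scale=0.1, size=2000)
train, test = slice(0, 1000), slice(1000, 2000)
coef = fit_ancestry_weights(ps[train], y[train])
test_score = combined_score(ps[test], coef)
r2_test = np.corrcoef(test_score, y[test])[0, 1] ** 2
```

When the ancestry-specific scores are strongly correlated, or the training fold is small, least-squares weights like these can vary considerably from one fold to the next.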

Statistical analysis

To assess the association between each PS and the relevant dependent variable, Generalized Additive Mixed Models (GAMMs) were fitted in each testing fold using the gamm4 package in R (Wood and Scheipl 2022). Each model predicted a mental health or anthropometric outcome, depending on the PS of interest. Each model included participants’ age in months, sex, ABCD study site, and the top ten genetic principal components calculated using PC-AiR (Conomos et al. 2015) as fixed effects, and participants’ family ID as a random effect. Nagelkerke R² (Nagelkerke 1991) was calculated by comparing reduced (covariates only) and full (covariates + PS) models. Supplementary Tables were generated using the t-test function and False Discovery Rate (FDR) correction from the R package ‘stats’ (R Core Team 2022).
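For reference, Nagelkerke's R² rescales the Cox–Snell R² by its maximum attainable value. The sketch below computes it from the log-likelihoods of the reduced (covariates-only) and full (covariates + PS) models, treating the reduced model as the baseline; this is an assumption about how the comparison was set up. It also assumes both models were fitted by maximum likelihood rather than REML, because REML log-likelihoods are not comparable between models with different fixed effects. The log-likelihood values in the example are hypothetical.

```python
import math

def nagelkerke_r2(loglik_reduced, loglik_full, n):
    """Nagelkerke R^2 of a full model relative to a reduced (baseline) model.

    loglik_reduced, loglik_full: maximized log-likelihoods; n: sample size.
    """
    # Cox-Snell R^2 = 1 - (L_reduced / L_full)^(2/n), computed on the log scale.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_reduced - loglik_full) / n)
    # Largest value the Cox-Snell R^2 can reach given the baseline likelihood.
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_reduced / n)
    return cox_snell / max_cox_snell

# Hypothetical log-likelihoods for a testing fold of 1000 participants.
r2 = nagelkerke_r2(loglik_reduced=-1000.0, loglik_full=-950.0, n=1000)
```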

Genome-wide complex trait analysis (GCTA)

For some traits, we unexpectedly found higher performance in non-European ancestry cohorts. To understand this further, we used GCTA (Yang et al. 2011) in each ancestry subsample separately to estimate SNP heritability (\({h}_{snp}^{2}\)) for each measure analyzed in ABCD. This quantifies the genetic variability contributing to each trait independently of previous GWAS summary statistics or any specific PS method. For this, we constructed a genetic relatedness matrix (GRM) for each ancestry group using ‘gcta --make-grm’. We then filtered each GRM to unrelated individuals using the recommended threshold of ‘--grm-cutoff 0.025’. This left 5133, 683, and 414 individuals in the European, African, and admixed ancestry cohorts, respectively. We then performed GCTA analyses on these pruned GRMs, using the covariates described above, to obtain point estimates of \({h}_{snp}^{2}\). Because of the small sample sizes, the standard errors of these estimates are large; we therefore treat the point estimates only as indications of possible differences and interpret them with caution.
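For readers unfamiliar with these steps, the sketch below computes a standardized GRM from allele counts, using the off-diagonal form used by GCTA, A_jk = (1/m) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)). It then removes individuals until no pair exceeds the 0.025 cutoff. It is an illustration only: GCTA uses a different formula for the diagonal, handles missing genotypes, and applies its own rule for deciding which relative to drop. The greedy "drop the person with the most close relatives" rule, all names, and the simulated genotypes are assumptions.

```python
import numpy as np

def genetic_relatedness_matrix(genotypes):
    """Standardized GRM from an (individuals x SNPs) matrix of 0/1/2 counts.

    Off-diagonal entries follow
    A_jk = (1/m) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)).
    Assumes no missing genotypes; the diagonal is NOT GCTA's adjusted formula.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0                  # sample allele frequencies
    poly = (p > 0) & (p < 1)                  # drop monomorphic SNPs (zero variance)
    Z = (X[:, poly] - 2.0 * p[poly]) / np.sqrt(2.0 * p[poly] * (1.0 - p[poly]))
    return Z @ Z.T / poly.sum()

def unrelated_subset(grm, cutoff=0.025):
    """Indices kept after greedily dropping people until no pair exceeds cutoff."""
    A = np.array(grm, dtype=float)            # copy, so the input is not modified
    np.fill_diagonal(A, 0.0)                  # ignore self-relatedness
    keep = np.ones(A.shape[0], dtype=bool)
    while True:
        kept_idx = np.flatnonzero(keep)
        related = (A[np.ix_(kept_idx, kept_idx)] > cutoff).sum(axis=1)
        if related.size == 0 or related.max() == 0:
            return kept_idx
        # Drop the remaining person with the most relatives above the cutoff.
        keep[kept_idx[np.argmax(related)]] = False

rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(50, 200))  # simulated, not ABCD data
genotypes[1] = genotypes[0]                        # individuals 0 and 1 are identical
grm = genetic_relatedness_matrix(genotypes)
unrelated = unrelated_subset(grm)
```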

Results

This analysis uses the full baseline visit data from ABCD data release 4.0, an ancestrally diverse longitudinal cohort of children from 21 data acquisition sites across the United States (https://doi.org/10.15154/1523041). We restricted our analysis to the three largest ancestral groups in the ABCD sample: African (AFR), European (EUR), and admixed (MIX). Individuals were assigned to the AFR or EUR group if their inferred continental ancestry for that group met the 80% threshold, and to the MIX group if no single continental ancestry met it. Participants’ continental genetic ancestry was inferred using SNPweights (Chen et al. 2013) with precompiled external genomic reference panels from the 1000 Genomes Project (Auton et al. 2015) and Indigenous reference panels (Reich et al. 2012). Genetic ancestry estimates for each group are shown in Fig. 1. Together, these three ancestral groups cover 98% of the full ABCD baseline sample. Sample sizes and demographic information for these groups are presented in Table 1.

Table 1 Size and demographic information of the ancestry subsamples used in this analysis
Fig. 1

Genetic ancestry proportions for individuals in each ancestry subpopulation in ABCD, divided into African, East Asian, European, and Indigenous North and South American ancestry components. A–C represent the European, African, and admixed ancestry populations, respectively. D shows the mean genetic ancestry components of the admixed population.

Height polygenic score

Figure 2 shows the performance of the different PS methods in predicting height. The single-ancestry continuous shrinkage method based on European ancestry LD and summary statistics from a large European GWAS (Yengo et al. 2022), CS EUR, outperformed the other methods in the European and African ancestry groups, with a mean variance explained of 14.5% and 13.2%, respectively. In the admixed ancestry group, the best-performing method was the trans-ancestry continuous shrinkage meta-analysis method, CSx Meta, with a mean variance explained of 10.9%. CS EUR was roughly comparable to CSx Meta (\(R^{2}_{\text{CS EUR}}/R^{2}_{\text{CSx Meta}}\) = EUR: 1.05, AFR: 1.03, MIX: 0.96) and improved on both CSx (\(R^{2}_{\text{CS EUR}}/R^{2}_{\text{CSx}}\) = EUR: 1.11, AFR: 3.42, MIX: 1.11) and the best-performing P + T method, P + T EUR (\(R^{2}_{\text{CS EUR}}/R^{2}_{\text{P+T EUR}}\) = EUR: 1.21, AFR: 1.59, MIX: 1.65).

Fig. 2

Variance explained by PS methods in predicting height across 100 folds of 50/50 cross-validation. A–C represent performance in the AFR, EUR, and MIX ancestry populations, respectively. Methods marked ‘CSx Meta’ and ‘CSx’ represent the meta-analysis and hyperparameter-weighted trans-ancestry outputs of the continuous shrinkage method PRScsx. Methods marked ‘CS’ represent the single-ancestry outputs of the continuous shrinkage method PRScs (LD reference and summary statistics from the same single ancestry). Methods marked ‘P + T’ represent the single-ancestry outputs of the pruning and thresholding method PRSice2. The three-letter abbreviations in some method names denote ancestries as follows: European (EUR), African (AFR), East Asian (EAS), North or South American (AMR), and South Asian (SAS). Additional information about the summary statistics can be found in Supplementary Table 1, and additional phenotype data can be found in Supplementary Fig. 2.

Although CSx performed somewhat worse than the best methods in the EUR and MIX groups, its performance there was comparable; in the AFR group, however, it showed particularly low performance (see Fig. 2). This may be explained by the wide variability across training folds in the weights of the ancestry-specific PSs used to calculate CSx for the AFR group (compare panel A with panels B and C in Supplementary Fig. 6). P + T (PRSice2) showed the lowest performance in all ancestry groups except the African ancestry sample, where CSx (PRScsx) showed the lowest performance. Numeric summaries of these results are available in Supplementary Tables 2–4.

BMI polygenic score

When comparing the performance of PS methods in predicting body mass index (BMI), we found that the trans-ancestry continuous shrinkage meta-analysis method, CSx Meta, performed best in all ancestries. CSx Meta accounted for a mean variance explained of 11.7% in the European ancestry sample, 11.9% in the African ancestry sample, and 9.9% in the admixed ancestry sample. Numeric summaries of these results are available in Supplementary Tables 5–7.

As with height, we observed particularly low performance of the CSx method in the AFR group, together with wide variability in the ancestry-specific PS weights across training folds for this method (Supplementary Fig. 7). The P + T method (PRSice2) again exhibited the lowest performance across ancestry groups, except in the AFR group, where the CSx method had the lowest performance (Fig. 3).

Fig. 3

Variance explained by PS methods in predicting BMI across 100 folds of 50/50 cross-validation. Panels A, B, and C represent performance in the AFR, EUR, and MIX ancestry populations, respectively. Methods marked ‘CSx Meta’ and ‘CSx’ represent the meta-analysis and hyperparameter-weighted trans-ancestry outputs of the continuous shrinkage method PRScsx. Methods marked ‘CS’ represent the single-ancestry outputs of the continuous shrinkage method PRScs (LD reference and summary statistics from the same single ancestry). Methods marked ‘P + T’ represent the single-ancestry outputs of the pruning and thresholding method PRSice2. The three-letter abbreviations in some method names denote ancestries as follows: European (EUR), African (AFR), East Asian (EAS), North or South American (AMR), and South Asian (SAS). Additional information about the summary statistics can be found in Supplementary Table 1, and additional phenotype data can be found in Supplementary Fig. 3.

Depression polygenic score

In predicting CBCL Total Problems from depression GWASs, we found that the CS EAS method trained on an East Asian GWAS (Giannakopoulou et al. 2021) achieved the highest performance in the African ancestry sample, the CS EUR method trained on a European GWAS (Howard et al. 2019) achieved the highest performance in the European ancestry sample, and CSx Meta performed best in the admixed ancestry sample (Fig. 4). The mean variance explained by CS EAS was 10.1% in the African ancestry sample, 1.1% in the European ancestry sample, and 2.0% in the admixed ancestry sample. The mean variance explained by CS EUR was 9.8% in the African ancestry sample, 1.7% in the European ancestry sample, and 2.2% in the admixed ancestry sample. The mean variance explained by CSx Meta was 10.0% in the African ancestry sample, 1.6% in the European ancestry sample, and 2.2% in the admixed ancestry sample. Numeric summaries of these results are available in Supplementary Tables 8–10.

Fig. 4

Variance explained by PS methods applied to depression in predicting CBCL Total Problems scores across 100 folds of 50/50 cross-validation. A–C represent performance in the AFR, EUR, and MIX ancestry populations, respectively. Methods marked ‘CSx Meta’ and ‘CSx’ represent the meta-analysis and hyperparameter-weighted trans-ancestry outputs of the continuous shrinkage method PRScsx. Methods marked ‘CS’ represent the single-ancestry outputs of the continuous shrinkage method PRScs (LD reference and summary statistics from the same single ancestry). Methods marked ‘P + T’ represent the single-ancestry outputs of the pruning and thresholding method PRSice2. The three-letter abbreviations in some method names denote ancestries as follows: European (EUR) and East Asian (EAS). Additional information about the summary statistics can be found in Supplementary Table 1, and additional phenotype data can be found in Supplementary Fig. 4.

In predicting KSADS Total Problems from depression GWASs, we found that the CS EAS method trained on an East Asian GWAS (Giannakopoulou et al. 2021) achieved the highest performance in the African ancestry sample, the CS EUR method trained on a European GWAS (Howard et al. 2019) achieved the highest performance in the European ancestry sample, and the CSx Meta method achieved the highest performance in the admixed ancestry sample (Fig. 5). Mean variance explained by the CS EAS method was 10.7% in the African ancestry sample, 1.1% in the European ancestry sample, and 2.1% in the admixed ancestry sample. Mean variance explained by the CS EUR method was 10.1% in the African ancestry sample, 1.5% in the European ancestry sample, and 2.3% in the admixed ancestry sample. Mean variance explained by the CSx Meta method was 10.2% in the African ancestry sample, 1.4% in the European ancestry sample, and 2.3% in the admixed ancestry sample.

We observed low performance of the CSx method in predicting both CBCL Total Problems and KSADS Total Problems across all ancestry groups; this again may be due to unstable weights of the ancestry-specific PSs that make up CSx across cross-validation folds (Supplementary Figs. 8–9). Numeric summaries of the KSADS results are available in Supplementary Tables 11–13.

Fig. 5

Variance explained by PS methods applied to depression in predicting KSADS Total Problems scores across 100 folds of 50/50 cross-validation. A–C represent performance in the AFR, EUR, and MIX ancestry populations, respectively. Methods marked ‘CSx Meta’ and ‘CSx’ represent the meta-analysis and hyperparameter-weighted trans-ancestry outputs of the continuous shrinkage method PRScsx. Methods marked ‘CS’ represent the single-ancestry outputs of the continuous shrinkage method PRScs (LD reference and summary statistics from the same single ancestry). Methods marked ‘P + T’ represent the single-ancestry outputs of the pruning and thresholding method PRSice2. The three-letter abbreviations in some method names denote ancestries as follows: European (EUR) and East Asian (EAS). Additional information about the summary statistics can be found in Supplementary Table 1, and additional phenotype data can be found in Supplementary Fig. 5.

Schizophrenia polygenic score

In predicting CBCL Total Problems from schizophrenia GWASs, we found that the CS EAS method trained on the large East Asian GWAS (Trubetskoy et al. 2022) performed best in the African and admixed ancestry samples, and the CS EUR method trained on the large European GWAS (Trubetskoy et al. 2022) performed best in the European ancestry sample. Mean variance explained by the CS EAS method was 9.5% in the African ancestry sample, 1.0% in the European ancestry sample, and 1.9% in the admixed ancestry sample. Mean variance explained by the CS EUR method was 9.2% in the African ancestry sample, 1.0% in the European ancestry sample, and 1.8% in the admixed ancestry sample. In the European ancestry sample, CS EAS, CS EUR, and CSx Meta each explained approximately 1.0% of variance, and their results did not differ significantly (\(p_{\text{CS EAS vs. CS EUR}}\) = 0.39, \(p_{\text{CS EAS vs. CSx Meta}}\) = 0.88, \(p_{\text{CS EUR vs. CSx Meta}}\) = 0.08). A graphic summary of these results can be found in Fig. 6. Numeric summaries of these results are available in Supplementary Tables 14–16.
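The Methods note that comparisons in the Supplementary Tables used t-tests with false discovery rate (FDR) correction. As a reference for that correction step, a minimal Benjamini–Hochberg adjustment is sketched below; this is the procedure computed by the ‘BH’ (alias ‘fdr’) option of R's `p.adjust`. Whether the authors used exactly this procedure, and whether the p-values above are already adjusted, are assumptions. The function name and input p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each value becomes the minimum over itself and all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Hypothetical raw p-values from four pairwise method comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```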

Fig. 6

Variance explained by PS methods applied to schizophrenia in predicting CBCL Total Problems scores across 100 folds of 50/50 cross-validation. A–C represent performance in the AFR, EUR, and MIX ancestry populations, respectively. Methods marked ‘CSx Meta’ and ‘CSx’ represent the meta-analysis and hyperparameter-weighted trans-ancestry outputs of the continuous shrinkage method PRScsx. Methods marked ‘CS’ represent the single-ancestry outputs of the continuous shrinkage method PRScs (LD reference and summary statistics from the same single ancestry). Methods marked ‘P + T’ represent the single-ancestry outputs of the pruning and thresholding method PRSice2. The three-letter abbreviations in some method names denote ancestries as follows: European (EUR), African (AFR), East Asian (EAS), and North or South American (AMR). Additional information about the summary statistics can be found in Supplementary Table 1, and additional phenotype data can be found in Supplementary Fig. 4.

In predicting KSADS Total Problems from schizophrenia GWASs, we found that the CS EAS method trained on the large East Asian GWAS (Trubetskoy et al. 2022) performed best in the African ancestry sample, the CSx Meta method performed best in the European ancestry sample, and the CS EUR method trained on the large European GWAS (Trubetskoy et al. 2022) performed best in the admixed ancestry sample. Mean variance explained by the CS EAS method was 9.5% in the African ancestry sample, 1.0% in the European ancestry sample, and 1.9% in the admixed ancestry sample. Mean variance explained by the CSx Meta method was 9.0% in the African ancestry sample, 1.0% in the European ancestry sample, and 1.8% in the admixed ancestry sample. Mean variance explained by the CS EUR method was 9.3% in the African ancestry sample, 1.0% in the European ancestry sample, and 1.9% in the admixed ancestry sample. Notably, the results of CS EAS and CS EUR did not differ significantly in the MIX group (p = 0.87). A graphic summary of these results can be found in Fig. 7. Numeric summaries of these results are available in Supplementary Tables 17–19. As with the depression PSs, we observed particularly poor performance of the CSx method for schizophrenia PSs predicting both CBCL Total Problems and KSADS Total Problems; this may be due to unstable weights of the ancestry-specific PSs that make up CSx across cross-validation folds (Supplementary Figs. 9–10).

Fig. 7

Variance explained by PS methods applied to schizophrenia in predicting KSADS Total Problems scores across 100 folds of 50/50 cross-validation. A–C represent performance in the AFR, EUR, and MIX ancestry populations, respectively. Methods marked ‘CSx Meta’ and ‘CSx’ represent the meta-analysis and hyperparameter-weighted trans-ancestry outputs of the continuous shrinkage method PRScsx. Methods marked ‘CS’ represent the single-ancestry outputs of the continuous shrinkage method PRScs (LD reference and summary statistics from the same single ancestry). Methods marked ‘P + T’ represent the single-ancestry outputs of the pruning and thresholding method PRSice2. The three-letter abbreviations in some method names denote ancestries as follows: European (EUR), African (AFR), East Asian (EAS), and North or South American (AMR). Additional information about the summary statistics can be found in Supplementary Table 1, and additional phenotype data can be found in Supplementary Fig. 5.

GCTA analysis

For psychological traits, polygenic prediction in the African ancestry group showed surprisingly high performance. To explore these results further, we used Genome-wide Complex Trait Analysis (GCTA; Yang et al. 2011) to test whether SNP heritability (\({h}_{snp}^{2}\)) estimates for our mental health traits differed significantly between our samples.

We observed higher point estimates of \({h}_{snp}^{2}\) in the African ancestry cohort than in the European and admixed ancestry cohorts for CBCL Total Problems (AFR: \({h}_{snp}^{2}\) = 0.34, SE = 0.88; EUR: \({h}_{snp}^{2}\) = 0.19, SE = 0.08; MIX: \({h}_{snp}^{2}\) = 0.06, SE = 0.82) and for KSADS Total Problems (AFR: \({h}_{snp}^{2}\) = 1.00, SE = 0.93; EUR: \({h}_{snp}^{2}\) = 0.00, SE = 0.08; MIX: \({h}_{snp}^{2}\) = 0.00, SE = 0.09). This is consistent with greater genetic variability in these traits among African ancestry individuals, which may explain our unexpected finding of higher polygenic prediction in this sample. However, given the small sample sizes, the standard errors are large (0.88 and 0.93 in the African ancestry cohort), so we advise caution in overinterpreting this result. Full results of the GCTA analysis can be found in Supplementary Tables 20–25.
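To make the width of these error bars concrete, the sketch below converts the reported point estimates and standard errors into approximate 95% intervals. It assumes a normal approximation (estimate ± 1.96 × SE) truncated to the admissible range [0, 1]; these are illustrative intervals, not intervals reported by GCTA, and the helper name `normal_ci` is hypothetical.

```python
"""Approximate 95% intervals for the SNP-heritability estimates reported above.

Illustrative only: assumes estimate +/- 1.96 * SE, truncated to [0, 1].
"""

# (trait, ancestry) -> (h2_snp point estimate, standard error), as reported in the text
H2_SNP = {
    ("CBCL", "AFR"): (0.34, 0.88),
    ("CBCL", "EUR"): (0.19, 0.08),
    ("CBCL", "MIX"): (0.06, 0.82),
    ("KSADS", "AFR"): (1.00, 0.93),
    ("KSADS", "EUR"): (0.00, 0.08),
    ("KSADS", "MIX"): (0.00, 0.09),
}


def normal_ci(estimate, se, z=1.96):
    """Return (lower, upper) = estimate -/+ z * se, truncated to the range [0, 1]."""
    lower = max(0.0, estimate - z * se)
    upper = min(1.0, estimate + z * se)
    return lower, upper


if __name__ == "__main__":
    for (trait, ancestry), (h2, se) in H2_SNP.items():
        lo, hi = normal_ci(h2, se)
        print(f"{trait:5s} {ancestry}: h2_snp = {h2:.2f}, approx. 95% CI [{lo:.2f}, {hi:.2f}]")
```

Applied to the reported values, the intervals for both traits in the African ancestry cohort, and for CBCL in the admixed cohort, span the entire [0, 1] range, whereas the European ancestry CBCL interval is roughly [0.03, 0.35]. The normal approximation is poor near the boundaries of the parameter space (for example, the AFR KSADS estimate of 1.00), so these intervals are only a rough guide.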

Discussion

In ABCD, we find that PRScsx Meta (CSx Meta) and PRScs (CS), especially when run on large European ancestry samples, provide improved polygenic prediction over pruning and thresholding methods. In addition, the inclusion of multiple non-European ancestry reference panels and summary statistics appeared to provide greater predictive utility in African and admixed populations than in European populations.

Our results showed that PSs for height and BMI explained between 9.8% and 14.5% of the variance across all ancestry groups. For anthropometric traits, the methods used in this analysis performed best in the European ancestry sample. Differences in performance are likely due to differences in GWAS sample sizes across ancestries (Karunamuni et al. 2020; Wu et al. 2022). Additionally, for the admixed cohort, the proportion of European ancestry may be a factor; previous research has shown that in ancestrally diverse populations the predictive validity of PS increases linearly with an individual’s proportion of European genetic ancestry (Bitarello and Mathieson 2020). Unexpectedly, despite being a trans-ancestry method, PRScsx did not perform as well in this sample as in a previous study (Ruan et al. 2022). This relatively poor performance could be due to the genetic heterogeneity of the United States population (Adhikari et al. 2017) or the high degree of genetic diversity within continental ancestry groups (Adhikari et al. 2016; Campbell and Tishkoff 2008). Indeed, the weighting of the ancestral components that make up PRScsx appeared unstable in our analysis, particularly for psychiatric PSs (see Supplementary Figs. 6–11); this instability may be driven by these factors and likely explains the method’s low performance in our sample. Additionally, the two best-performing methods (PRScs and PRScsx-Meta) do not require hyperparameter tuning, which removes the need for cross-validation in the target population and makes them much easier to deploy.

For mental health traits, we found unexpectedly high performance in our African ancestry sample. GCTA analysis showed differences in SNP heritability between our samples that could potentially account for this improved performance; however, the wide error bars of these estimates prevent us from drawing any definitive conclusions. Additional research is needed to understand the factors that underlie this result.

The CS methods based on GWAS in East Asian populations (Giannakopoulou et al. 2021; Trubetskoy et al. 2022) also showed surprising predictive power for our mental health phenotypes, despite their smaller discovery sample sizes. Although the differences between these results are small and sometimes not statistically significant, we propose one potential explanation: there may be differences in environmental factors or confounding diagnostic practices for schizophrenia or depression across the regions in which the discovery GWAS were conducted. Previous research has highlighted geographic differences in the diagnostic criteria for psychiatric conditions (Mitchell et al. 2011; Saito et al. 2022). Because we are associating polygenic scores with measures of behavior and symptomatology rather than formal diagnoses, it is possible that the diagnostic criteria for schizophrenia and depression used in East Asia happen to be more closely related to variability in CBCL total problems and KSADS total problems than the criteria for these same disorders used in Europe. Such effects may explain the differences in genetic correlations for psychiatric disorders observed between eastern and western countries (Saito et al. 2022; THE BRAINSTORM CONSORTIUM et al. 2018). In any case, given the generally low performance of these models in the majority of our participants, we again caution the reader against overinterpreting our results. Additional, more targeted research is needed to fully understand the association between genetics and any behavioral or psychiatric outcome.

PS performance on anthropometric traits in our analysis may be low because of the age of our sample. Previous research has shown that genetic factors exert a particularly strong influence on body height between the ages of 14 and 18 (Silventoinen et al. 2008) and that PSs for BMI increase in efficacy as an individual passes from adolescence into adulthood (Sanz-de-Galdeano et al. 2020). Age may also affect the efficacy of our polygenic models based on GWAS of schizophrenia, as schizophrenia is in most cases diagnosed in late adolescence and early adulthood (Walker et al. 2004), and premorbid declines in cognition have typically been observed around the onset of puberty, between the ages of 13 and 16 years (Fuller et al. 2002). Because symptoms related to schizophrenia may not yet have fully manifested in our sample, our estimates of the genetic variance explained may underestimate associations at later ages.

This paper also represents an important step in the use of PSs in admixed individuals. Although research into GWAS methods tailored to admixed populations has had some success in improving PSs (Bitarello and Mathieson 2020; Hou et al. 2022), these analyses often consider only two-way admixture between African and European ancestry. Our analysis shows encouraging performance in our admixed sample without conditioning on the degree of admixture or the component ancestries.

While the precise causal mechanisms that underlie these interactions remain poorly understood, and many traits, especially cognitive and psychiatric traits, are strongly influenced by environmental factors, this article shows that Bayesian continuous shrinkage PS methods currently offer improved prediction of complex traits across ancestries. If PSs are to achieve clinical validity, it is important that they show equitable performance across diverse populations and within populations of individuals who do not fit categorically into continental ancestries (Lewis and Green 2021). Further work is needed to improve the power and validity of diverse GWAS summary statistics and to develop genomic datasets in individuals of diverse ancestry.

Limitations

It must be acknowledged that PSs account only for genetic influences. Any modeling based on PSs should also account for other known non-genetic causal factors. For many traits, genetic factors account for only a portion of the variance in outcomes. It is especially important to acknowledge that, for essentially all cognitive and psychiatric traits, environmental factors and gene-environment interactions will be major contributors to outcomes. ABCD has very rich phenotyping, but this analysis was limited by the availability of public GWAS summary statistics from non-European populations, as the majority of large public GWAS have been performed in European populations (Martin et al. 2019). Moreover, while ABCD is a diverse cohort, the majority of participants are of European ancestry.