Main

The predisposition to common diseases and traits arises from a complex interaction between genetic and non-genetic factors. In the past decade there has been enormous success at discovering disease-associated genetic variants, facilitated by many collaborative consortia and large cohorts of well-phenotyped individuals with matched genetic information1,2,3,4,5. In particular, genome-wide association studies (GWASs) have yielded summary statistics that describe the magnitude (effect size) and statistical significance of association between an allele and the outcome of interest4,6. GWASs have been applied to many complex human traits and diseases, including height, blood pressure, cardiovascular disease, cancer, obesity and Alzheimer’s disease.

The associations identified through GWASs can be combined to quantify genetic predisposition to a heritable trait, and this information can be used to conduct disease risk stratification or to predict prognostic outcomes and response to therapy7,8. Typically, information across many variants is combined by means of a weighted sum of allele counts, in which the weights reflect the relative magnitude of association between variant alleles and the trait or disease. These weighted sums can include millions of variants, and are frequently referred to as polygenic risk scores (PRSs), or genetic or genomic risk scores (GRSs), if they refer to risk estimates of disease outcomes; or, more generally, polygenic scores (PGSs) when referring to any outcome (Box 1). Although algorithms are actively being developed to decide how many and which variants to include and how to weigh them so as to maximize the proportion of variance explained or disease discrimination, there is an emerging consensus that inclusion of variants beyond those that meet stringent GWAS significance levels can boost predictive performance9,10. Methodological research has also established theoretical limits of the performance of PGSs and PRSs on the basis of the genetic architecture and heritability of the trait11,12,13,14,15.
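The weighted sum described above can be illustrated with a minimal sketch. The variant identifiers, weights and the function name `polygenic_score` are hypothetical and chosen only for illustration; real scores are computed over many more variants and require careful matching of effect alleles between the weights and the genotype data.

```python
from typing import Dict


def polygenic_score(allele_counts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of effect-allele counts for one individual."""
    missing = sorted(v for v in weights if v not in allele_counts)
    if missing:
        # Silently skipping variants would change the score, so fail loudly.
        raise KeyError(f"no genotype for variants: {missing}")
    return sum(weights[v] * allele_counts[v] for v in weights)


# Hypothetical variant IDs and per-allele weights, for illustration only.
weights = {"variant_A": 0.12, "variant_B": -0.05, "variant_C": 0.30}
# Effect-allele counts (0, 1 or 2; imputed dosages may be fractional).
individual = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_score(individual, weights)
print(round(score, 4))
```

The sketch raises an error for missing genotypes rather than imputing them; how missing variants are handled is itself a choice that affects the score and is worth reporting.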

In the last decade, the landscape of genetic prediction studies has transformed. Over 900 publications mention PGS or PRS, and there have been significant developments in how PRSs are constructed and evaluated, as well as many new proposed uses. The data available in the current era of biomedical research are larger and more consolidated than ever before. Biobanks and large-scale consortia have become dominant, yet researchers frequently have limited access to individual-level data. As individual-level data are often unavailable, most PRS models are developed from summary-level data (for example, GWAS summary statistics) in secondary datasets, each of which comes with its own methodological considerations16,17,18. At the same time, there has been a push towards open data sharing as outlined in the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles3,19, with an emphasis on ensuring that research is reproducible by all.

The capacity of PRSs to quantify genetic predisposition for many clinically relevant traits and diseases has begun to be established, with many potential clinical uses in settings related to disease risk stratification as well as proposed prognostic uses (for example, predicting responses to intervention or treatment). Readiness for implementation varies by outcome, and mature PRSs with potential clinical utility are available for only a few diseases—for example, coronary heart disease (CHD) and breast cancer (Boxes 2 and 3, respectively). There has also been a rapid rise of direct-to-consumer assays and for-profit companies (23andMe, Color, MyHeritage, and so on) that provide PGS and PRS results to customers outside of the traditional patient–provider framework. These concomitant developments have resulted in healthcare systems developing new infrastructures to deliver genetic risk information20. Individually and combined, these advances have raised substantial challenges for PRS reporting standards—from the very basic (for example, reporting performance metrics on an external validation dataset) to the complicated (for example, making raw variant and weight information for a PRS available)—and necessitate the updating of existing standards for reporting genetic risk prediction studies to convey the increased scope of PRSs and the complexity of their clinical applications.

Poorly designed and/or described studies call into question the validity of some PRSs to predict their target outcome21,22, and relatively few studies have externally benchmarked the performance of multiple scores. At present, there are no best practices for developing PRSs that are uniformly agreed upon, nor are there widely adopted standards or regulations that are sufficiently tailored to assess the eventual clinical readiness of a PRS. There are emerging applications of PRSs that further compound the heterogeneity in reporting; for example, using PRSs as tools for testing gene–environment interactions or shared aetiology between diseases23,24,25,26. The rapid evolution in both methodological development and applications of PRSs makes it challenging to compare or reproduce claims about the predictive performance of a PRS for a specific outcome when studies are not properly documented. These deficiencies are barriers to PRSs being interpreted, compared, and reproduced, and must be addressed to enable the application of PRSs to improve clinical practice and public health.

Frameworks have been developed to establish standards around the transparent, standardized, accurate, complete and meaningful reporting of scientific studies. In 2011, an international working group published the Genetic Risk Prediction Studies (GRIPS) Statement—a set of reporting guidelines for risk prediction models that include genetic variants, from genetic mutations to gene scores27. These guidelines are analogous to those developed for observational epidemiological studies (STROBE28) and genome-wide association studies (STREGA29), and are in line with the reporting guidelines for multivariate prediction models (TRIPOD30). Adherence to reporting statements has been low, and the same holds for GRIPS. One reason might be that researchers feel that the GRIPS Statement inadequately addresses PRSs. Researchers are frequently uncertain as to what precisely should be reported for a PRS study to be assessed as rigorous, reproducible and ultimately translatable, especially with the increased push for data availability and transparency. Most PRS studies follow a prototypical process (Fig. 1) that can be used as a template for standardizing reporting and benchmarking in the field.

Fig. 1: Prototype of PRS development and validation process.

The prototypical steps for PRS construction, risk model development and validation of performance are displayed with select aspects of the PRS-RS guidelines (labelled in bold). During PRS development, variants associated with an outcome of interest, typically identified from a GWAS, are combined as a weighted sum of allele counts. Methods for optimizing variant selection (PRS construction and estimation) are not shown. To predict the outcome of interest, the PRS is added to a risk model and may be combined with non-genetic variables (for example, age, sex, ancestry or clinical variables; collectively referred to as risk model variables). After fitting procedures to select the best risk model, this model is validated in an independent sample. The PRS distribution should be described, and the performance of the risk model demonstrated in terms of its discrimination, predictive ability and calibration. Though not displayed in the figure, these same results should also be reported for the training sample for comparison to the validation sample. In both training and validation cohorts, the outcome of interest criteria, demographics, genotyping and non-genetic variables should be reported (Table 1). HLA, human leukocyte antigen; HR, hazard ratio; IDI, integrated discrimination improvement; IQR, interquartile range; NRI, net reclassification improvement; OR, odds ratio; β, effect estimate from linear regression.

Here, the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog (Supplementary Note 1) jointly present the Polygenic Risk Score Reporting Standards (PRS-RS), an expanded and updated set of reporting standards for PRSs that addresses current research environments with advanced methodological developments to inform clinically meaningful reporting on the development and validation of PRSs in the literature, with an emphasis on reproducibility and transparency throughout the development process. Additional methods are detailed in Supplementary Note 2.

The PRS-RS

The PRS-RS is a set of standard items specifying the minimal criteria that need to be described in a manuscript to accurately interpret a PRS and reproduce results throughout the PRS development process31 (see Fig. 1 for a brief summary). It applies to PRS development and validation studies that aim to predict disease onset, diagnosis and prognosis, as well as response to therapies; however, other research uses of PRSs have overlapping steps that should be reported in a similar manner. Table 1 presents the full PRS-RS, with reporting items organized into key components along the developmental pipeline of PRSs for clear interpretation and to encourage their documentation from the inception of the study, well before publication.

Table 1 Polygenic Risk Score Reporting Standards (PRS-RS)

Reporting on the background of risk scores

The development and validation of a PRS tests a specific hypothesis with a defined outcome and study population. Therefore, authors should define a priori the ‘study type’ (for example, development and/or validation), ‘risk model purpose’ (for example, risk prediction versus prognosis) and ‘predicted outcome’ (for example, CHD) in enough detail for readers to understand why the study population and risk model selected are relevant (for example, the value of CHD risk stratification and primary prevention is higher in younger individuals than in those over 80 with lifetime accumulated risk). (In this and the following sections, inverted commas mark the reporting items listed in Table 1.) As the PRS-RS is focused on clinical validity and implementation, authors must outline the study and appropriate outcomes so that readers can understand what risk is measured, what the purpose of measuring risk would be and why this purpose may be of clinical relevance. To establish the internal validity of a study, authors should use the appropriate data for the intended purpose (for example, prediction of incident disease versus prognosis), with adequate documentation of dataset characteristics to understand nuances in measured risk.

Reporting on study populations

The applicability of any risk prediction to an external target population (the who, where, and when) depends on its similarity to the original study populations that were used to derive the risk model. Therefore, authors need to define and characterize the details of their study population (‘study design and recruitment’), and describe study ‘participant demographics’ for key variables (most often age and sex) and ancestry. Notably, there are often inconsistent definitions and levels of detail associated with ancestry, and the transferability of genetic findings between different racial and ethnic groups can be limited1,9,32. It is therefore essential for authors to provide a detailed description of the genetic ‘ancestry’ of participants—including how ancestry was determined—using a common controlled vocabulary where possible (for example, the standardized framework developed by the NHGRI-EBI GWAS Catalog1). Authors should provide a sufficient level of detailed criteria for defining all of the factors relevant to the ‘outcome of interest’, including but not limited to those used in the risk model (‘non-genetic variables’). These details should accompany information about how the population was genotyped (‘genetic data’), including assays and all quality control measures.

Reporting on the development of risk models

At present there are several commonly used methods to select variants that constitute the PRS and fine-tune their weights7,16,17,18,31. Methods using GWAS summary statistics should clearly cite the relevant GWAS, preferably using unique and persistent study identifiers from the GWAS Catalog (GCSTs)33. As the performance and limitations of the combined risk model are dependent on methodological considerations, authors must provide complete details including the method used and how variants are combined into a single PRS (‘PRS construction and estimation’). Apart from genetic data, authors should also describe the defining criteria for other demographic and non-genetic predictors (‘non-genetic variables’) included in the model. Often authors will iterate through numerous models to find the optimal fit. In addition to the estimation methods, it is important to detail the ‘integrated risk model(s) fitting’ procedure, including the measures that were used for the selection of the final model. Translating the continuous PRS distribution to a risk estimate, whether absolute or relative, is highly dependent on assumptions and limitations that are inherent to the specific dataset used. When describing the ‘risk model type’, authors should detail the timescale used for prediction, or the study period and follow-up time for a relative hazard model. Furthermore, if relative risk is estimated, the reference group should be well described. These details should be described for the training set, as well as for validation and sub-group analyses.
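As one concrete illustration of why the construction method must be reported, the sketch below selects variants from GWAS summary statistics by P-value thresholding, one of the simplest approaches in use. It is not a method prescribed by the PRS-RS. The `SummaryStat` record, the function name and all values are hypothetical, and the sketch deliberately omits steps that real pipelines usually include, such as pruning correlated variants and tuning the threshold in a separate dataset.

```python
from typing import Dict, List, NamedTuple


class SummaryStat(NamedTuple):
    """One row of GWAS summary statistics (hypothetical minimal layout)."""
    variant_id: str
    effect_allele: str
    beta: float      # per-allele effect estimate from the GWAS
    p_value: float   # association P value from the GWAS


def select_by_p_value(stats: List[SummaryStat], threshold: float) -> Dict[str, float]:
    """Keep variants with P <= threshold and use their GWAS betas as weights."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return {s.variant_id: s.beta for s in stats if s.p_value <= threshold}


# Made-up summary statistics, for illustration only.
gwas = [
    SummaryStat("variant_A", "T", 0.12, 3e-9),
    SummaryStat("variant_B", "G", -0.05, 2e-4),
    SummaryStat("variant_C", "A", 0.30, 0.04),
    SummaryStat("variant_D", "C", 0.01, 0.60),
]

strict = select_by_p_value(gwas, 5e-8)   # conventional genome-wide significance
relaxed = select_by_p_value(gwas, 0.05)  # a more inclusive threshold

print(sorted(strict), sorted(relaxed))
```

The two thresholds yield different variant sets from the same GWAS, and therefore different scores, which is why the threshold, the source GWAS and any further weight adjustment all belong under ‘PRS construction and estimation’.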

Reporting on the evaluation of risk models

Authors should report estimates for all evaluated models (including the methods used to derive them) to equip readers with the information necessary to evaluate the relative value of an increase in performance against other trade-offs. We recommend that authors provide summary information of the ‘PRS distribution’ to aid in model interpretation. The ‘predictive ability’, ‘calibration’, and ‘discrimination’ of the risk model should also be assessed and detailed with common descriptions including the risk score effect size, variance explained (R²), reclassification indices and metrics such as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The risk model ‘calibration’ and ‘discrimination’ should be described for all analyses, although their estimation and interpretation are most relevant for the PRS validation sample. It is imperative for the PRS and integrated risk models to be evaluated on a population that is external (independent, non-overlapping) to the individuals in the study population. The ability of the risk model to classify individuals of interest (‘risk model discrimination’) is commonly described and presented in terms of the area under the receiver operating characteristic (AUROC) or precision-recall curve (AUPRC), or the concordance statistic (C-index). Any differences in variable definitions or performance discrepancies between the training and validation sets should be described.
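To make several of the metrics named above concrete, the following sketch computes the AUROC as the proportion of case–control pairs in which the case has the higher predicted risk (ties count as half), together with sensitivity, specificity, PPV and NPV at a single risk threshold. The function names, the data and the threshold of 0.5 are hypothetical; in practice an established statistics library would normally be used, and confidence intervals would be reported alongside the point estimates.

```python
from typing import Dict, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of concordant case-control pairs (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def _ratio(numerator: int, denominator: int) -> float:
    # Undefined ratios (empty denominator) are returned as NaN.
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Classify scores >= threshold as high risk and summarise the 2x2 table."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        high_risk = s >= threshold
        if high_risk and y == 1:
            tp += 1
        elif high_risk:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Made-up predicted risks and observed outcomes (1 = case, 0 = control).
predicted = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
observed = [0, 0, 1, 1, 1, 0]

auc = auroc(predicted, observed)
metrics = threshold_metrics(predicted, observed, threshold=0.5)
print(round(auc, 3), metrics)
```

Note that PPV and NPV depend on the proportion of cases in the evaluated sample, so they transfer to a target population only if the case frequency is comparable; this is one reason the characteristics of the validation sample need to be reported with the metrics.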

Reporting on interpretation

By explicitly describing the ‘risk model interpretation’ and outlining potential ‘limitations’ to the ‘generalizability’ of their model, authors will empower readers and the wider community to better understand the risk score and its relative merits. Authors should justify the clinical relevance and the ‘intended uses’ of the risk model, such as how the performance of their PRS compares to other commonly used risk models, or previously published PRSs. This may also include comparisons to other genetic predictors of disease (for example, mutations in high or moderate risk genes associated with Mendelian forms of the disease), family history, simple demographic models or conventional risk calculators (see Boxes 2 and 3 for disease-specific examples). What indicates a ‘good’ prediction can differ between outcomes and intended uses, but should be reported with similar metrics to those described in the evaluation section.

Reporting on model parameters

The underlying PRS (variant alleles and derived weights) should be made publicly available, preferably through direct submission to an indexed repository such as the PGS Catalog, to enable others to reuse existing models (with known validity) and to facilitate direct benchmarking between different PRSs for the same trait (thus promoting ‘data transparency and availability’). The current mathematical form of most PRSs—a linear combination of allele counts—facilitates clear model description and reproducibility. Future genomic risk models may have more complex forms; for example, allowing for explicit non-linear epistatic and gene–environment interactions, or deep neural networks of lesser clarity. It will be important to describe these models in sufficient detail to allow their implementation and evaluation by other researchers and clinical groups.
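Because a linear PRS is fully described by its variant alleles and weights, it can be shared as a simple table. The sketch below writes such a table as tab-separated text and reads it back to recover the weights. The column names and variant records are hypothetical and are not a repository specification; a repository such as the PGS Catalog defines its own required format, which should be followed for an actual submission.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical column layout, chosen for illustration only.
COLUMNS = ["variant_id", "chromosome", "position",
           "effect_allele", "other_allele", "effect_weight"]


def write_scoring_file(variants: List[Dict[str, object]]) -> str:
    """Serialise variant records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(variants)
    return buffer.getvalue()


def read_weights(text: str) -> Dict[str, Tuple[str, float]]:
    """Recover {variant_id: (effect_allele, weight)} from the shared table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["variant_id"]: (row["effect_allele"], float(row["effect_weight"]))
            for row in reader}


# Made-up variants; positions and alleles carry no biological meaning.
variants = [
    {"variant_id": "variant_A", "chromosome": "1", "position": 1000,
     "effect_allele": "T", "other_allele": "C", "effect_weight": 0.12},
    {"variant_id": "variant_B", "chromosome": "2", "position": 2000,
     "effect_allele": "G", "other_allele": "A", "effect_weight": -0.05},
]

table = write_scoring_file(variants)
recovered = read_weights(table)
print(table)
```

Recording the effect allele next to each weight matters: applying a weight to the other allele reverses the direction of that variant's contribution to the score.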

Supplementary Note 3 provides explanations of reporting considerations in addition to the minimal reporting framework in Table 1. Authors intending downstream clinical implementation should aim for the level of transparent and comprehensive reporting covered in both Table 1 and the Supplementary Notes, especially concerning points related to discussing the interpretation, limitations and generalizability of results. The proper reporting of the development and performance of PRSs can also have implications for seeking regulatory approval of the PRS as a clinical test. Although not a comprehensive list of regulatory requirements, we highlight aspects of the PRS-RS that would be considered evidence of analytical and clinical validity from the perspective of the College of American Pathologists (CAP) and the Clinical Laboratory Improvement Amendments (CLIA) (Supplementary Table 1). CAP and CLIA approvals are additional incentives for the reporting adherence of researchers who wish to translate their work to the clinic, as well as a caution for researchers who want to avoid their findings being put to unintended use. Finally, we reiterate the need for both methodological and data transparency, and we encourage the deposition of PRSs (variant-level information necessary to recalculate the genetic portion of the score) in the PGS Catalog (www.pgscatalog.org; ref. 34), which provides an invaluable resource for the widespread adoption and distribution of a published PRS. The PGS Catalog provides access to PGSs and related metadata to support the FAIR principles of data stewardship19, enabling subsequent applications and assessments of PGS performance and best practices (see Supplementary Table 2 for a description of the metadata captured in the PGS Catalog and its overlap with the PRS-RS).
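The reporting items quoted in the preceding subsections lend themselves to a simple machine-checkable checklist, which may help authors document items from the inception of a study. The sketch below is an assumption-laden illustration: the grouping follows the subsection headings of this article rather than reproducing Table 1, and the names `PRS_RS_ITEMS` and `missing_items` are hypothetical.

```python
from typing import Dict, Iterable, List

# Items as quoted in the text, grouped by subsection; not a copy of Table 1.
PRS_RS_ITEMS: Dict[str, List[str]] = {
    "background": ["study type", "risk model purpose", "predicted outcome"],
    "study populations": ["study design and recruitment", "participant demographics",
                          "ancestry", "genetic data", "non-genetic variables",
                          "outcome of interest"],
    "development of risk models": ["PRS construction and estimation",
                                   "risk model type",
                                   "integrated risk model(s) fitting"],
    "evaluation of risk models": ["PRS distribution", "predictive ability",
                                  "discrimination", "calibration"],
    "interpretation": ["risk model interpretation", "limitations",
                       "generalizability", "intended uses"],
    "model parameters": ["data transparency and availability"],
}


def missing_items(reported: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per section, the checklist items not yet marked as reported."""
    seen = {item.strip().lower() for item in reported}
    gaps: Dict[str, List[str]] = {}
    for section, items in PRS_RS_ITEMS.items():
        absent = [item for item in items if item.lower() not in seen]
        if absent:
            gaps[section] = absent
    return gaps


# A hypothetical draft that has documented only three items so far.
draft = ["Study type", "Predicted outcome", "Ancestry"]
gaps = missing_items(draft)
print(gaps["background"])
```

A check of this kind only confirms that an item has been addressed; whether the description is detailed enough to interpret and reproduce the PRS still requires the judgement described in Table 1 and Supplementary Note 3.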

Improving PRS research and translation

We surveyed 30 publications (selecting for a range of disease domains, risk score categories and populations) to understand how the information in the PRS-RS is presented and displayed as part of the larger iterative process to clarify and improve minimal reporting item descriptions. For 10 of these publications, we provide detailed annotations using the final minimal reporting requirements (Supplementary Table 3) and use these annotations to illustrate the detail necessary for each PRS-RS item (further described in Supplementary Note 3). The heterogeneity in the PRS reporting we observed in this pilot highlights a series of challenges. Critical aspects of PRS studies—including ancestry, predictive ability, and transparency or availability of information needed to reproduce PRSs—were frequently absent or reported in insufficient detail. This underscores the need for the PRS-RS to clearly and specifically define meaningful aspects of PRS development, testing and intended clinical use. However, these deficits in reporting are not unique to PRSs; previous reports of underreporting have found that 77% of GWAS publications in 2017 did not share summary statistics35 and 4% of GWASs do not report any relevant ancestry information1. In line with the push towards a culture of reproducibility and open data in genomics, we as the ClinGen Complex Disease Working Group and PGS Catalog joined to create this set of reporting standards (Table 1), which is specifically tailored to PRS research and adapts the previous standards on the basis of the opinions of multidisciplinary and international experts.

Researchers using the PRS-RS may identify fringe cases that are inadequately captured by these reporting items, as we have modelled our guidelines on prototypical steps for PRS development (Fig. 1). Although we anticipate that the field may change further as novel methods and technologies are generated, the PRS-RS items can be expanded and adapted to encompass new considerations. By updating previous standards, drawing on the knowledge of leaders in the field and tailoring the framework to common barriers observed in recent literature, we aim to provide a comprehensive and pragmatic perspective on the topic. In line with previous standards, the PRS-RS includes elements related to understanding the clinical validity of PRSs and consequent risk models. Items such as ‘predicted outcome’ and ‘intended use’ bookend our guidelines with the intended clinical framing of PRS reporting. In addition, we have modelled the guidelines by steps in experimental design—from hypothesis to interpretation—to more clearly emphasize the importance of considering the risk model’s intended purpose in defining what needs to be reported and to inform documentation throughout the process. As a reference, we have included a guide to where PRS-RS items should be reported in a manuscript in Supplementary Table 4. These expansions will further facilitate the curation and expert annotation of published PRSs as we move towards widespread clinical use.

Although the scope of our work encompasses clinical validity, it does not address the additional requirements that are needed to establish the clinical or public health utility of a PRS, such as randomized trials with clinically meaningful outcomes, health economic evaluations or feasibility studies36. In addition, the translation of structured data elements into useful clinical parameters may not be direct. One example is that the case definitions used in training or validation in any particular PRS study may deviate (sometimes substantially) from those used in any specific health system. CHD symptoms commonly include angina (chest pain), whereas PRSs are frequently trained on stricter definitions excluding angina. Another example is that the definitions used for race or ancestry as outlined in the PGS Catalog and the GWAS Catalog1 may differ from structured terms used to document ancestry information in the clinic. Consistent mappings and potentially parallel analyses may be necessary to translate from genetically determined ancestries to those that are routinely used in clinical care. Such translation issues potentially limit generalizability to target populations and warrant further discussion, and we reiterate the need for authors to be mindful of their intended purpose and target audience when discussing their findings. Authors’ understanding of potential translational barriers can be aided by considering the current CAP and CLIA analytical and clinical validity evidence requirements of peer-reviewed literature to ensure that the PRS-RS has value in informing later steps of the clinical translation spectrum, including clinical utility (Supplementary Table 1). Finally, although the principles of this work are clear, its scope does not include the complex commercial restrictions, such as intellectual property, that may be placed on published studies with regard to the reporting or distribution of PGSs or the data that underlie them. We hope that our work will inform downstream regulation and transparency standards for PRS as a commercial clinical tool.

The coordinated efforts of the ClinGen Complex Disease Working Group and PGS Catalog provide a set of compatible resources for researchers to deposit PGS- and PRS-related information. The PGS Catalog (www.PGSCatalog.org) provides an informatics platform, with data integration and harmonization to other PGSs as well as the source GWAS study through its sister platform, the GWAS Catalog1,34. In addition, it provides a structured database of scores (variants and effect weights) that can be reused, along with metadata requested in the PRS-RS. With these tools, the PRS-RS can be mandated by leading peer-reviewed journals and, consequently, the quality and rigour of PRS research will be increased to a level that facilitates clinical implementation. We encourage readers to visit the ClinGen website (https://clinicalgenome.org/working-groups/complex-disease/) for any future changes or amendments to our reporting standards.

Although we have provided explicit recommendations on how to acknowledge study design limitations and their effects on the interpretation and generalizability of a PRS, future research should attempt to establish best practices to guide the field. Moving forward, supplementary frameworks should be developed for the reporting of new methods, such as deep learning, as well as requirements for clinical utility and readiness. Together, the PRS-RS enables the rapid development of PRSs as potentially powerful tools for the translation of genomic discoveries into clinical and public health benefits, and provides a framework for PRSs to transform multiple areas of research in human genetics.