Introduction

The transition to adulthood is a tumultuous period of biological, physical, social, and emotional changes [1]. In this period, adolescents consolidate their identity, achieve independence from parents, establish adult relationships outside the family, and find a vocation. For young people with disabilities, the transition period may be particularly difficult as they may be at a disadvantage due to their physical, cognitive, or psychosocial impairments, the extra health maintenance skills they need to acquire, lack of experience in activities and participation, social isolation, or by other environmental, family, and personal factors [2, 3]. Failure to make a successful transition to adulthood may result in unnecessary lifelong dependency, unemployment, lack of achievement, and poor quality of life [4].

Use of patient-reported outcomes (PRO) instruments in pediatric clinical medicine has increased in recent years but has yet to become standard practice as a way to assess physical, mental, and social outcomes [5]. Pediatric PRO instruments often have been developed in relative isolation from adult measures of similar constructs; the focus of pediatric PRO instrument development typically is on developing age-appropriate, contextually relevant items [5] rather than ensuring continuity of measurement throughout the lifespan. In cases where multiple versions of a measure are created for use with different age cohorts, it is common to have a parent proxy report, child, and adolescent versions of the same instrument (e.g., PedsQL) [6, 7] that contain different items and may measure slightly different aspects of the underlying construct. A major limitation to approaching pediatric PROs in this way is the inability to compare scores across the age groupings [5]. As the child transitions into adult care and responds to adult PRO instruments, there may be no parallel “adult” measure and therefore no mechanism to compare scores from previous pediatric PRO instruments to those from adult PRO instruments. Lack of comparability renders it impossible to track changes in health outcomes across the lifespan for children aging with a disability. Currently, the comparison of outcomes between pediatric and adult PRO instruments is not possible nor is there a mechanism to monitor outcomes of children as they age through childhood and adolescence and into adulthood. These limitations are major barriers to evaluating and comparing treatment effectiveness and prognosticating long-term outcomes, especially for children with disabilities.

Starting in 2004, the NIH launched the Patient-Reported Outcomes Measurement Information System® (PROMIS®) initiative to design state-of-the-art PRO measurement instruments for a wide range of physical, mental, and social outcomes for children and adults [8, 9]. However, the overlap between pediatric and adult instruments was limited and the development and calibration of the pediatric and adult instruments were largely independent research activities. While the PROMIS pediatric investigators developed newly written items, they also reviewed the adult PROMIS item banks (and created items with content that was relevant to children) and other existing pediatric measures (e.g., PedsQL) to supplement their original item pools. The PROMIS items underwent extensive psychometric testing including a large quantitative study of 8000 children with a range of chronic conditions and children in the general population [10]. Meanwhile, the PROMIS Version 1.0 adult instruments were similarly developed and administered to a sample of over 20,000 adults and calibrated using graded response model IRT using a subsample of participants with demographic characteristics representative of the 2000 US Census [11]. In effect, the PROMIS pediatric and PROMIS adult versions have been evaluated separately and are discrete sets of instruments. Having two independent versions of the PROMIS instruments (for Pediatric and Adults) is problematic as child research participants cannot be followed longitudinally as they move from childhood into adulthood.

The purpose of this study, therefore, is to use IRT to develop a transitional scoring link between the PROMIS adult and pediatric physical health item banks so that studies that follow individuals through the child–adult transition can compare scores on the pediatric instrument with scores from the adult instrument as the study population ages into adulthood. In addition, any study in which both pediatric and adult forms were used would be able to test hypotheses on a common metric. For this investigation, we included item banks assessing physical functioning (i.e., mobility and upper extremity; see Footnote 1) as well as physical symptoms (i.e., pain and fatigue). This manuscript describes the administration of PROMIS pediatric and adult physical health instruments to two independent samples of children and young adults and the subsequent use of IRT to produce score linking coefficients to transform PROMIS pediatric scores to their adult equivalency scores (and vice versa) for physical health domains. The linking coefficients for the PROMIS emotional health domains (i.e., adult and pediatric versions of PROMIS Depression, Anxiety, and Anger) have been reported in Reeve et al. [12].

Method

As part of the research consortium developing and validating PROMIS projects, two research studies (one at the University of North Carolina research site and another at the University of Michigan/Boston University site) performed linking studies of pediatric PROMIS instruments with the corresponding adult instruments. The samples, research design, and analytic methods have been described in a previously published study by Reeve et al. [12]. We will briefly review the methodological information below.

Research participants and data collection

Sample 1: Individuals with physical or cognitive disabilities

In the first sample, 188 adolescents (14–17 years old) and 453 young adults (18–25 years old) living with physical and/or cognitive disabilities due to spinal cord injury (SCI), traumatic brain injury (TBI), or cerebral palsy (CP) were recruited into the study at six participating sites: the University of Michigan, Boston University, Craig Hospital (Colorado), Rehabilitation Hospital of Michigan, and the Shriners Hospitals for Children (Philadelphia, Chicago). The University of Michigan served as the primary coordinating institution, and local site personnel were responsible for recruitment. Institutional Review Board (IRB) approval was obtained from all participating sites. Individuals were eligible to participate if they had (a) a confirmed diagnosis of SCI, TBI, or CP, (b) the ability to read and understand English, and (c) the ability to respond to self-report scales (e.g., by speaking, using a communication board, or gesturing). Participants with non-traumatic SCI and uncomplicated mild TBI (i.e., a Glasgow Coma Scale score between 13 and 15 with no positive neuroimaging findings) were not eligible to participate. Prior to beginning the study, assent was obtained from adolescents and informed consent from their parents; informed consent was obtained from young adults.

Data were collected by trained interviewers between June 1, 2011 and April 10, 2012. The interview format allowed the research team to include individuals with higher levels of physical or neurocognitive impairment while maintaining a consistent assessment modality across all participants. Interviewers met with participants in person or via telephone and entered their responses to items directly into the Assessment Center℠ data collection platform [13]. Response formats varied across items and measures; printed cards with item-specific response scales were provided by the interviewer to match each item when administered. Response format cards were sent to participants interviewed by phone, who were instructed by interviewers on which to use for each question.

Sample 2: Individuals with “special health care needs”

The second sample comprised adolescents (n = 415) and young adults (n = 459) living with health conditions that require specialized health services (e.g., hypertension, cancer, mental health conditions) [14]. These individuals, similar to those recruited in the first sample, are prime candidates for HRQOL assessment given the potential of their condition(s) to influence quality of life domains. Adolescents and young adults were recruited from public health insurance programs (Medicaid and Children’s Health Insurance Program [CHIP] in Florida) and the Opinions for Good (Op4G) panel, a research company that maintains an online participant pool and asks participants to donate a portion of their proceeds to charitable organizations. Participants were eligible for participation if they were identified with special health care needs (SHCN; defined by the Clinical Risk Groups [15] in the Medicaid/CHIP sample and by the Special Care Needs Screener [16] in the Op4G sample), were 14–20 years of age, able to read, write, and speak English, and able to access an internet-enabled computer. The University of North Carolina at Chapel Hill served as the coordinating site and the University of Florida was responsible for data collection. The study protocol was approved by the IRB at each institution. Data were collected between April 1, 2012 and September 30, 2013. Assent was obtained from adolescent participants and informed consent was obtained from young adult participants and parents of adolescent participants.

Measures

Demographic information (e.g., age, sex, race, ethnicity, education level) was obtained for all participants. Additionally, participants with disabilities were asked to provide information on their methods of mobility (e.g., wheelchair use) and secondary medical complications (e.g., neurogenic bowel/bladder). Participants in the SHCN sample provided additional information related to their health condition(s). All participants completed the PROMIS pediatric and adult short forms for the following domains: Pain Interference, Fatigue, Peer Relationships, Depression, Anxiety, and Anger. Participants also completed the adult Physical Functioning and pediatric mobility and pediatric upper extremity scales because there are two pediatric short forms based on two physical functioning sub-domains (Mobility and Upper Extremity), whereas there is only a single adult short form for Physical Function. All items utilize a 5-category Likert-type format; higher item responses reflect greater functional ability on the Physical Function short forms and more severe symptoms on the Pain Interference and Fatigue short forms. The measures examined in this study are listed in Table 1 along with example items and associated response options. Measures were administered in random order to minimize the likelihood of order effects. Expected a Posteriori (EAP) scores (see Footnote 2) for the item response patterns were calculated for each measure and transformed to the standard PROMIS T-score metric (M = 50, SD = 10) [10].
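The T-score transformation mentioned above can be sketched as a simple linear rescaling. This is a minimal illustration that assumes the EAP score is expressed on a standardized latent metric (mean 0, SD 1 in the reference population); the function names are hypothetical and not part of any PROMIS software.

```python
def theta_to_t_score(theta: float) -> float:
    """Rescale a standardized latent score (assumed mean 0, SD 1) to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def t_score_to_theta(t_score: float) -> float:
    """Inverse rescaling: T-score metric back to the standardized latent metric."""
    return (t_score - 50.0) / 10.0


def theta_sd_to_t_sd(theta_sd: float) -> float:
    """A posterior SD on the latent metric scales by the same factor of 10 (no shift)."""
    return 10.0 * theta_sd
```

Under this assumption, a participant 1.5 SD below the reference mean would receive a T-score of 35.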

Table 1 Pediatric and adult PROMIS® physical function measures and example items

Analysis

Analyses were conducted on each sample separately and results were compared. Descriptive statistics and correlations between T-scores on the pediatric and adult measures were used initially to indicate which type of linking was most applicable. In educational testing, a correlation of 0.866 or higher between measures has been proposed as a prerequisite of unidimensional linking [17], which benchmarks a 50% or greater reduction in the uncertainty that results from predicting one measure from another (i.e., the scale score standard deviation). Although we employ this criterion in the current study so that estimations can be made on an individual level, it is important to state that a less stringent criterion (e.g., r = 0.75–0.80) is likely acceptable for lower stakes settings where decisions are not made on individual people (e.g., health outcomes research at the group level) [17, 18].
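The link between the 0.866 criterion and a 50% reduction in uncertainty can be checked numerically. The sketch below assumes the usual definition of reduction in uncertainty as 1 − √(1 − r²), that is, the proportional shrinkage of the prediction error SD relative to the scale score SD; the function name is illustrative.

```python
import math


def uncertainty_reduction(r: float) -> float:
    """Proportional reduction in prediction-error SD when one measure predicts another with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return 1.0 - math.sqrt(1.0 - r * r)


# Compare the strict criterion with the less stringent range mentioned in the text.
for r in (0.75, 0.80, 0.866):
    print(f"r = {r:.3f}: reduction in uncertainty = {uncertainty_reduction(r):.3f}")
```

By this definition, correlations of 0.75 to 0.80 correspond to reductions of roughly 34% to 40%, compared with about 50% at r = 0.866.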

The Root Expected Mean Square Difference (REMSD) was calculated to further evaluate the tenability of linking. The REMSD statistic reflects the degree to which linking is invariant for important subgroups within a population. Given two subgroups (e.g., males and females, adolescents and young adults), the REMSD can be calculated by computing the standardized mean difference (SMD) between the subgroups on one measure and subtracting it from the SMD computed between the subgroups on the second measure. Subgroup invariance is achieved when the subgroups differ by about the same amount—and in the same direction—on each measure; that is, the difference between SMDs for the two measures is close to zero. In the present study, subgroup invariance was evaluated between males and females as well as between adolescents and young adults. Using data from college admissions (i.e., high-stakes) testing, Dorans and Holland [19] concluded that REMSD values below 0.08 were generally supportive of subgroup invariance.
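The two-subgroup comparison described above can be sketched as follows. This is a simplified illustration of the SMD-difference description given in this section, not the full REMSD formula of Dorans and Holland [19]. It assumes that each SMD is standardized by the SD of the combined subgroups, which the text does not specify, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def standardized_mean_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Mean difference between two subgroups, divided by the SD of the combined groups (an assumption)."""
    combined_sd = statistics.pstdev(list(group_a) + list(group_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / combined_sd


def smd_discrepancy(
    pediatric_a: Sequence[float],
    pediatric_b: Sequence[float],
    adult_a: Sequence[float],
    adult_b: Sequence[float],
) -> float:
    """Absolute difference between the subgroup SMD on the adult form and on the pediatric form."""
    smd_pediatric = standardized_mean_difference(pediatric_a, pediatric_b)
    smd_adult = standardized_mean_difference(adult_a, adult_b)
    return abs(smd_adult - smd_pediatric)
```

When the subgroups differ by the same standardized amount on both forms, the discrepancy is zero; values would then be compared against the 0.08 guideline.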

Disattenuated correlations were also calculated via the estimation of two-dimensional IRT models with pediatric items loading onto the first latent variable and adult items loading onto the second latent variable. Specifically, two-dimensional graded-response models were fit in the IRTPRO [20] software using maximum likelihood estimation. As Reeve et al. [12] found with the PROMIS emotional distress measures, the pediatric and adult measures were not expected to be perfectly correlated, even after obtaining disattenuated estimates. If correlations were trivially different from 1, we planned to proceed with symmetrical (e.g., scale alignment) linking methods. However, we anticipated that the PROMIS pediatric and adult physical health measures would be highly correlated yet below the level at which symmetrical linking is justified. We therefore planned to proceed with an asymmetrical linking method called calibrated projection which relies on predicting one measure from another, and vice versa, in order to unify metrics. Calibrated projection [21] is a newly developed linking method that does not require the assumption that underlying constructs to be linked are the same, as is the case with symmetrical linking procedures. Calibrated projection does not require values on the predictor variable to be fixed; score distributions (as opposed to point estimates) are projected from one scale to another. Thissen et al. [22] simplified this technique by introducing a linear approximation that is computationally simpler and makes explicit the use of regression in the linking predictions. Therefore, in this article we used the linear approximation to calibrated projection (LACP) method described by Thissen et al. [22]. We briefly review this procedure here and direct interested readers to Thissen et al. [22] and Reeve et al. [12] for further information.

To proceed with the LACP given two scales, we first let θ1 represent an arbitrary score on the latent construct underlying the first scale and let θ2 represent an arbitrary score on the latent construct underlying the second scale. Given θ1, the goal of calibrated projection is to predict an associated value θ2 on the metric of the second scale, and vice versa when given θ2 to predict θ1. The prediction is accomplished via the linear regression:

$$\widehat{\text{EAP}}[\theta_2] = \beta_0 + \beta_1\,\text{EAP}[\theta_1]. \tag{1}$$

In Eq. 1, the regression coefficients β0 and β1 (intercept and slope) are hereafter referred to as ‘linking’ coefficients. Because the linkage is asymmetrical, the coefficients used to predict \(\widehat {{{\text{EAP}}}}\left[ {{\theta _1}} \right]\) will differ from those used to predict \(\widehat {{{\text{EAP}}}}\left[ {{\theta _2}} \right]\). The prediction in Eq. 1 is computed implicitly using IRT methods following the estimation of a multi-dimensional IRT (MIRT) model. In this example, a two-dimensional model is used such that items from the first scale load onto the first latent variable, and items from the second scale load onto the second latent variable. The prediction in Eq. 1 can be carried out by explicitly calculating the linking coefficients using estimates of the mean and covariance matrix derived from the MIRT model, which is the approach used in the LACP method. The standard deviations of the projected scores, which are taken as estimates of the projected score standard errors (and thus used to compute confidence intervals around individual scores), are automatically produced using IRT methods in the calibrated projection approach originally proposed [21]. Subsequently, Thissen et al. [22] suggested the SDs can be approximated using the formula:

$$\widehat{\text{SD}}[\theta_2] = \sqrt{\beta_1^{2}\,\text{SD}^{2}[\theta_1] + \text{MSE}}, \tag{2}$$

where MSE is the mean squared error computed from Eq. 1 and SD²[θ1] is the variance of the posterior distribution for observed θ1. It can be seen in Eq. 2 that the approximation linearly combines two sources of error: the error variance associated with the observed θ1 estimate, and the error variance associated with the projection procedure (Eq. 1).
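Eqs. 1 and 2 can be illustrated with a short sketch. It assumes that the linking coefficients follow the standard regression formulas for two jointly normal latent variables (slope = covariance / variance of θ1, MSE = residual variance of θ2 given θ1); this is consistent with computing the coefficients from the MIRT mean and covariance estimates, but the exact expressions used in [22] should be checked against that source. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkingCoefficients:
    intercept: float  # beta_0 in Eq. 1
    slope: float      # beta_1 in Eq. 1
    mse: float        # residual variance used in Eq. 2


def lacp_coefficients(mean_1: float, mean_2: float,
                      var_1: float, var_2: float, cov_12: float) -> LinkingCoefficients:
    """Regression of theta_2 on theta_1 from latent means and (co)variances (assumed formulas)."""
    slope = cov_12 / var_1
    intercept = mean_2 - slope * mean_1
    mse = var_2 - cov_12 ** 2 / var_1
    return LinkingCoefficients(intercept, slope, mse)


def project(eap_1: float, sd_1: float, coef: LinkingCoefficients) -> tuple[float, float]:
    """Apply Eq. 1 for the projected score and Eq. 2 for its approximate SD."""
    projected = coef.intercept + coef.slope * eap_1
    projected_sd = math.sqrt(coef.slope ** 2 * sd_1 ** 2 + coef.mse)
    return projected, projected_sd
```

Because the linkage is asymmetrical, the reverse projection would be obtained by calling `lacp_coefficients` with the roles of the two scales swapped rather than by inverting the line.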

The scales used in this study were based on published item banks calibrated in independent samples from those reported here. To preserve the metric of the original scales, parameters in the MIRT model were fixed to those estimated in the original calibrations. The resulting mean and covariance matrix estimates for θ1 and θ2 were then estimated with the current data and used to calculate the coefficients for each linkage [22]. A summary of the LACP procedure as used in this study is provided in Fig. 1.

Fig. 1 Summary of Linking Procedure (Linear Approximation to Calibrated Projection). The diagram in step 1 of the example (Ex.) represents a path diagram—typically reserved for describing factor analytic or structural equation models—of the two-dimensional IRT model. In this diagram, circles represent the two latent variables underlying the set of combined items, which are represented by boxes. Terms in boldface in steps 2 and 3 of the example represent linking coefficients. Subscripts reflect whether the term is an intercept (0) or slope (1) as well as whether it is used to link the Pediatric form to the Adult form (P > A) or vice versa (A > P)

Linkages were evaluated by comparing participants’ projected scores on a given scale with their actual scores on the same scale (made possible by the fact that in both samples participants completed all PROMIS pediatric and adult measures). Comparisons were made in terms of confidence interval coverage; specifically, linkages were considered efficacious if approximately 68% of the observed values were within 1 SD of the associated projected values and approximately 95% of the observed values were within 2 SDs of the associated projected values. These criteria follow from the fact that the SDs were used as standard error estimates for the projected scores, and that scores are assumed to be normally distributed with repeated sampling. Thus, one would expect that 100 (1 − α)% of the time the “true” scores (i.e., participants’ actual scores on the measure) would reside within the projected scores’ confidence intervals for a chosen width.
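The coverage check described above can be sketched as a simple count. The function below is illustrative only; it takes each participant's observed score, projected score, and projected-score SD, and returns the proportions of observed scores that fall within 1 and 2 SDs of the projection.

```python
from typing import Sequence


def coverage_proportions(
    observed: Sequence[float],
    projected: Sequence[float],
    projected_sd: Sequence[float],
) -> tuple[float, float]:
    """Proportions of observed scores within +/- 1 SD and +/- 2 SD of their projected scores."""
    if not (len(observed) == len(projected) == len(projected_sd)):
        raise ValueError("all inputs must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one participant is required")

    within_1 = 0
    within_2 = 0
    for obs, proj, sd in zip(observed, projected, projected_sd):
        distance = abs(obs - proj)
        within_1 += distance <= sd
        within_2 += distance <= 2.0 * sd
    n = len(observed)
    return within_1 / n, within_2 / n
```

If the projected-score SDs are well calibrated and errors are approximately normal, the two proportions should be close to 0.68 and 0.95.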

Results

Table 2 displays the demographic characteristics for the two samples with values displayed separately for the adolescent and young adult participants. Males made up a greater share of the sample of individuals with disability (62.9%), reflecting population estimates that show that SCI and TBI are more common in males than females [23, 24]. The sample of individuals with disability was largely white (80.5%) and non-Hispanic (84.2%), with 37.8% of participants diagnosed with SCI, 31.5% with TBI, and 30.7% with CP.

Table 2 Demographic and clinical characteristics of the study samples

Descriptive statistics for the PROMIS pediatric and adult measures are shown in Table 3. Participants with a disability reported lower T-score averages on all measures. Correlations between the pediatric and adult versions in each sample overall ranged between 0.77 and 0.90 (Table 4; correlations are also provided for demographic subgroups); only one of the measures (Mobility, in the sample with a disability) exceeded Dorans’ [17] criterion of 0.866 for unidimensional equating. Table 5 displays REMSD statistics, estimated latent variable correlations (ρ) and associated correlation SEs. REMSD values were below the 0.08 threshold in all cases except when comparing genders on the Mobility (0.097) and Pain (0.170) scales in the SHCN sample. Although exceeding the 0.08 upper limit commonly used to establish subgroup invariance, these REMSD values were not replicated in the sample with a disability. Finally, estimated latent variable correlations (i.e., disattenuated correlations estimated from the 2-D IRT models) ranged between 0.84 and 0.95 for the PROMIS pediatric and adult scales. These values are large, suitable for health outcomes research, and most exceed the very conservative criterion of 0.866 that was recommended for high-stakes educational testing. However, statistical comparisons between unidimensional and multi-dimensional models that contained pediatric and adult items in the same model (e.g., Pediatric PROMIS Fatigue and Adult PROMIS Fatigue items) revealed that multi-dimensional models provided a better fit to the data. These results suggest that symmetrical equating methods, such as those based on unidimensional IRT models, are inappropriate. However, asymmetrical linking methods, such as the LACP method used in this study, were considered applicable.

Table 3 Descriptive statistics and correlations between PROMIS® Pediatric and adult measures
Table 4 Correlations between pediatric and adult PROMIS forms by age, sex, disability type, special health care needs type, education
Table 5 Root expected mean square difference (REMSD) by age and sex and estimated latent variable correlation ρ between the pediatric and adult constructs

LACP results are presented in Table 6. To evaluate the precision of the approximation, sum score conversion tables of the PROMIS pediatric and adult measures were created using calibrated projection and approximate calibrated projection [12, 22]. Across the entire range of sum scores, the two methods produced scores that are virtually identical. Standard deviations created under the approximation method were between 0.9 and 1.8 times those derived from calibrated projection. Therefore, the LACP worked well and the intercept and slope coefficients are presented in Table 6. Associated 95% confidence intervals as well as the estimated MSE from the projections are provided. Comparing the disabilities and SHCN samples, the regression slopes are quite similar. Across the entire range of observed scale scores, the regression lines never differed by more than one-third of a standard deviation (i.e., 3 T-score points). Given the apparent comparability, the linking coefficients were averaged across the two samples.
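The comparison and averaging of the two samples' regression lines can be sketched as follows. The coefficient values and the 20 to 80 T-score range in the usage lines are hypothetical placeholders, not values from Table 6, and the sketch assumes an unweighted average because the text does not state whether the samples were weighted.

```python
def max_line_gap(coef_a: tuple[float, float], coef_b: tuple[float, float],
                 lower: float, upper: float) -> float:
    """Largest absolute gap between two regression lines on [lower, upper].

    Each coefficient pair is (intercept, slope). The difference of two lines is
    itself linear, so its absolute value is largest at one of the endpoints.
    """
    def gap(x: float) -> float:
        return abs((coef_a[0] + coef_a[1] * x) - (coef_b[0] + coef_b[1] * x))

    return max(gap(lower), gap(upper))


def average_coefficients(coef_a: tuple[float, float],
                         coef_b: tuple[float, float]) -> tuple[float, float]:
    """Unweighted average of intercepts and slopes across two samples (an assumption)."""
    return ((coef_a[0] + coef_b[0]) / 2.0, (coef_a[1] + coef_b[1]) / 2.0)


# Hypothetical coefficients on the T-score metric, for illustration only.
disability_sample = (2.0, 0.95)
shcn_sample = (4.0, 0.90)
print(max_line_gap(disability_sample, shcn_sample, 20.0, 80.0))
print(average_coefficients(disability_sample, shcn_sample))
```

A maximum gap below roughly 3 T-score points would match the one-third SD benchmark described in the text.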

Table 6 Regression coefficients β0 and β1, and MSE for the calibrated projection from θ1 to θ2

In Table 7, the proportions of observed scores falling within ± 1 SD and ± 2 SDs of the projected scores are reported separately for the two samples. Estimates are provided both for the pediatric-to-adult and adult-to-pediatric linkages. Proportions based on same-sample coefficients (see Table 7 footnote) are provided in the first row block and reflect model fit. Proportions based on cross-sample coefficients are provided in the second row block and reflect cross-sample validation. Finally, proportions based on the averages of the linking coefficients across samples are presented in the third row block. Proportions were close to the confidence interval widths chosen (68% and 95%) with the exception of the Pediatric Upper Extremity–Adult Physical Functioning linkage. For this linkage and in both directions, proportions of observed values within 1 SD of projected values were well below 68% in most cases, with values ranging between 0.37 and 0.59. This is due to the distribution of scores for those scales: over 20% of the respondents have perfect (maximum) scores on both scales, which produces a single point mass in the distributions with over 20% of the data. The fact that these large blocks, and adjacent nearly perfect-score blocks, have residuals between 1 and 2 SDs from the mean reduces the observed proportion within ± 1 SD from the nominal 0.68 to 0.37–0.59. Excluding this linkage from consideration, proportions ranged between 0.61 and 0.75 for 1 SD and between 0.92 and 0.96 for 2 SDs when the linking coefficients were averaged, which is considered acceptable.

Table 7 Proportions of EAP[θ2] values within ± 1 and ± 2 SD of the values obtained using linear approximation to calibrated projection

Discussion

The PROMIS measurement system was designed to address several needs in PRO assessment [8]. Primary among these was the lack of a common set of standardized and validated instruments that could be used to measure important PROs in the general population and across a wide variety of populations with health conditions. While the initial and subsequent releases of PROMIS have made just such a set of measures available for both research and clinical use, at a more fundamental level, a common longitudinal metric of overall health and functioning for children and young adults was lacking. In the current study, we set out to create a bridge to connect individuals’ scores across PROMIS pediatric and adult physical health measures. This work allows investigators who enroll children in longitudinal studies to follow them over years, switching from the pediatric version of the scale to the adult version. Clinicians will also be able to administer either the pediatric or adult PROMIS measures as appropriate and convert scores from the pediatric version to equivalent adult version scores (and vice versa) using the newly created linking coefficients. A computer application has been developed to help researchers and clinicians compute these linkages and is available at https://sites.udel.edu/chs-chart/.

We first assessed the viability of linking PROMIS pediatric measures of Fatigue, Pain, Upper Extremity Functioning, and Mobility to commensurate adult measures in two healthcare-specific samples. Although pediatric and adult measures within each domain were highly correlated, construct equivalence could not be established. However, the REMSD index of subgroup invariance suggested that asymmetrical linking was possible. Therefore, we used the recently developed LACP procedure [22] to connect the pediatric and adult measures by establishing unidirectional prediction functions that can be used to convert scale scores from one instrument to another. The linear approximation was close, as observed scores and LACP-predicted scores on the same measure were nearly identical. Additionally, except for the PROMIS Pediatric Upper Extremity–PROMIS Adult Physical Functioning linkage, the approximation did not substantially inflate score error estimates.

The regression coefficients reported in Table 6, which can be used to convert PROMIS pediatric scale scores to PROMIS adult scale scores or vice versa, were evaluated by calculating the proportions of individuals’ observed scale scores falling within ± 1 or 2 SDs of associated projected scores (Table 7). The results provided general support for the pediatric-to-adult and adult-to-pediatric linkages: the proportions of observed scores within 1 SD (2 SDs) of the predicted scores were close to 68% (95%) as expected. One exception to this finding, again, was the linkage between the PROMIS Pediatric Upper Extremity scale and the adult PROMIS Physical Functioning scale: in both directions, the calculated proportions were below those expected based on the normal distribution. Notably, this linkage also results in the highest inflation of estimation error resulting from the linear approximation as described earlier. Both of these phenomena likely are due to the point mass in the distributions for Upper Extremity and Physical Function. While the coefficients in Table 6 that link the Pediatric Upper Extremity and Adult Physical Function measures may be useful in research settings when large data sets are compared and group-level inferences are of interest, it is not recommended to use these linking coefficients in any clinical or other high-stakes setting where decisions about an individual person are made. The linking coefficients do not provide the level of accuracy needed for such decisions, and the large prediction errors could result in inaccurate decisions for an individual.

As shown in Table 6, the MSE is smaller from pediatric to adult than adult to pediatric. The implication is that the adult-to-pediatric predictions will have more linking error than the converse. Although the Upper Extremity-Physical Function linkage results in the highest MSE discrepancy and lowest distributional overlap, this is likely due to a distributional artifact as explained above.

Given that Upper Extremity and Mobility-specific PROMIS adult items have been identified subsequent to the conduct of this study, and that Hays et al. [25] have identified a subset of these adult items that have corresponding pediatric items, future work could use the LACP projection procedure to produce coefficients to directly link the PROMIS Pediatric and Adult Mobility measures and, separately, link the PROMIS Pediatric and Adult Upper Extremity measures. It would be ideal to include a substantial subsample of individuals with upper extremity limitations in this work.

A clear strength of this study was the use of two samples which (a) allowed for descriptive comparisons and explicit cross-validation, and (b) represent distinct and important healthcare populations whose constituents are the primary beneficiaries of PRO research. Furthermore, LACP was able to produce successful linkages while being flexible enough to accommodate the special case of linking two pediatric physical functioning measures (Upper Extremity and Mobility) to the single adult physical functioning measure. However, future work including participants from a wider age range (< 14 and > 20) would add insight to the generalizability of the findings reported here.

Ideally, the same PRO measure could be used consistently over time to compare scores between individuals and, especially, within individuals over time. However, given the rapid pace of advances in PRO measurement and the diversity of developmentally appropriate constructs and items over the lifespan, it is necessary to employ PRO linking procedures to approximate the results that would be obtained through the perpetual administration of a static measure. The results of this study and those reported in Reeve et al. [12] represent a significant contribution to longitudinal research that assesses children across their lifespan and, as such, add to PROMIS and the PROMIS linkages published in recent years [18]. However, estimating scores through linking procedures never will be as accurate as directly comparing scores on the same measure. The projection procedures applied here will always involve prediction error, but are nevertheless one possible solution when the same measurement tool cannot be utilized across the lifespan. Linking efforts appear to be increasing in popularity among PRO researchers in a variety of health-related fields and, as a consequence, the field would benefit from more research on linking methods in the context of health outcomes assessment. Methodologies currently used (including those reported in this article) originally were designed for measure linking in educational and high-stakes testing settings and may need further development in the healthcare context [18]. For instance, one limitation of the current study is the use of the normal distribution to compare actual vs. predicted scores, which may be less informative when substantial floor/ceiling effects are present (as is common in health outcomes measures).

Conclusion

The results presented herein are part of a larger collective effort aiming to establish a common metric for PROs regardless of the measurement instrument administered or the characteristics of the person assessed. In particular, the linking coefficients provided in this study can be used by researchers to calculate scale scores on PROMIS pediatric or adult physical health scales given data on only one age-specific instrument. As a result, researchers may potentially achieve more accurate measurement in cross-sectional studies spanning multiple age groups or longitudinal studies that require comparable measurement across distinct developmental stages.