Bandura defines self-efficacy as individuals’ confidence in performing a required behavior or task to achieve a desired outcome [1]. To be specific, individuals’ self-efficacy is not related to the skills they have, but to their confidence in what they can perform [2]. For example, a patient may have the skills needed to manage medications but may not be confident in doing so.

Self-efficacy has consistently predicted the initiation of behavioral changes and maintenance of acquired health behaviors [3]. Several studies have demonstrated that self-efficacy is associated with individuals’ health outcomes (e.g., physical ability, behavioral dysfunction, depression, and anxiety) and disability [4,5,6,7,8]. For example, Meredith et al. [7] reported that pain self-efficacy was more predictive of patients’ disability than pain intensity. As such, self-efficacy is currently used to tailor self-management programs for managing health conditions (e.g., via goal setting, coping, or problem-solving skills) [9, 10].

It is important to note that self-efficacy is entirely behavior- and task-related [11]. For instance, an individual can have high self-efficacy for getting out of bed and low self-efficacy for climbing Mount Everest [3]. This is because the range of skills required for different behaviors and tasks varies by context [1]. Sometimes, self-efficacy is evaluated at a global level [12], or even with a single item [13], but these global judgements of self-efficacy can overlook variations across individuals’ behavioral complexity [1]. Inferences from general self-efficacy measures have less explanatory and predictive power [1]. Thus, when measuring individuals’ self-efficacy, we need to consider assessing it with respect to a specific behavior or task.

The Patient-Reported Outcomes Measurement Information System self-efficacy measure for managing chronic conditions (PROMIS-SE) is a behavior-specific self-efficacy measure. Funded by the National Institutes of Health (NIH), PROMIS-SE is a publicly available self-reported measure that is efficient and valid for individuals with chronic conditions [14]. Various qualitative methodologies (i.e., Delphi, focus groups, and cognitive interviews) were used to determine the PROMIS-SE’s five domains. The PROMIS-SE’s five domains measure respondents’ self-efficacy for managing (1) daily activities; (2) emotions; (3) medications and treatments; (4) social interactions; and (5) symptoms [14, 15]. The PROMIS-SE estimates respondents’ self-efficacy for each domain [14].

It is reasonable to expect patients’ responses across PROMIS-SE domains to be influenced by multiple domains. That is, a therapeutic program aiming to enhance patient self-efficacy for managing emotions might improve not only the patient’s self-efficacy levels for managing emotions, but also their self-efficacy levels for managing social interactions. This would indicate that perhaps these domains are not as unidimensional as previously assumed [14] and that other psychometric models (i.e., multidimensional models) might be more appropriate for calibrating these domains. Some investigators have argued that this might be the case for many of the existing patient-reported outcomes [16,17,18]. This multidimensionality of a construct (e.g., self-efficacy for managing chronic conditions) indicates that the level of one domain (e.g., self-efficacy for managing emotions) can also provide information about the level of another domain (e.g., self-efficacy for managing social interactions). Thus, simultaneous examination of PROMIS-SE domains might help clinicians better understand their patients’ self-efficacy. Patients’ self-efficacy profiles can be created based on interactions among domains. Profiles can be used by clinicians to develop targeted interventions that might include multiple strategies across domains. For instance, in education, student mathematics profiles for the division of fractions can help teachers design individually tailored instructional math programs by identifying target skills for improvement (e.g., ability to convert mixed numbers, multiply fractions) [19].

However, the current PROMIS-SE provides individual scores for each domain with no possibility of obtaining a global SE profile for managing chronic conditions. PROMIS-SE measures that are not in the same metric (i.e., they are individually calibrated) do not account for the potential influences of interrelated domains. Our previous study reported high correlations among all PROMIS-SE domains, suggesting individuals’ responses to a particular domain (e.g., managing symptoms) are heavily associated with their responses to another domain (e.g., managing emotions and social interactions, r > 0.75) [14]. While some advantages exist for treating domains as independent factors (lower computational demands, smaller sample size requirements, and simpler outcome interpretation), when the correlations among domains are strong, incorporating interactions across domains can result in more accurate estimations [20, 21], in turn leading to better therapeutic evaluation and planning.

Multidimensional models estimate respondents’ abilities or functions based on the premise of having more than one conceptually and empirically defined latent construct that influences their responses to items on the measure; thus, correlations among domains can be integrated into measure estimations. In the context of PROMIS-SE, these models would allow measures to account for inter-domain correlations in the calibrations while preserving self-efficacy’s behavior-specific characteristic (i.e., obtaining scores for each domain). Applying a multidimensional model to our PROMIS-SE patient data would enable users to (1) obtain simultaneous self-efficacy domain estimates that account for domain correlations; and (2) create a common metric for meaningful comparisons across domains. However, in order to apply a multidimensional model to a measure, one must first evaluate the assumption that multiple dimensions (i.e., factors; latent traits) underlie the item response data matrix. This can be evaluated by fitting a multidimensional model to the data and examining its model fit.

Therefore, the purpose of the study is to (1) investigate current domain metric discrepancies in PROMIS-SE; (2) test the PROMIS-SE’s multidimensional model fit; and (3) examine PROMIS-SE domain and item psychometric properties under a multidimensional framework. This study is part of a larger study that employs a multidimensional model to develop patient self-efficacy profiles concurrently across domains. Our two subsequent manuscripts use patient responses to demonstrate the feasibility of using a multidimensional model to estimate patient self-efficacy for managing chronic conditions.

Methods

Data collection

A total of 1087 patients were recruited from two sites. Patients were recruited from a clinical practice at the University of Maryland Neurology Ambulatory Center (n = 837) and a national online recruitment company, Op4G (n = 250; see op4g.com for more detail). Patients recruited from the University of Maryland Neurology Ambulatory Center had chronic neurologic conditions, whereas patients recruited from Op4G had general chronic conditions. Patients participated from April 2013 to April 2014 at the University of Maryland Neurology Ambulatory Center and from August to September 2013 through online recruitment.

Inclusion criteria for participants at the University of Maryland Neurology Ambulatory Center (treated by neurologists) were the following: 18 years of age or older, residing in the community, and diagnosed with one of the following chronic conditions: epilepsy, multiple sclerosis, Parkinson’s disease, peripheral neuropathy, or stroke. Patients at the University of Maryland Neurology Ambulatory Center were excluded from the sample if they had cognitive impairment (i.e., scored below 20 on the Montreal Cognitive Assessment), were unable to give informed consent, had severe or unstable medical conditions, or were pregnant, incarcerated, or institutionalized.

For online recruitment, participants were randomly selected from approximately 250,000 Op4G subjects. The selected subjects completed web-based surveys using their personal devices. Inclusion criteria for the online sample were (1) 18 years of age or older, (2) community residence, and (3) having one of the following general chronic conditions: coronary artery disease, heart failure or congestive heart failure, heart attack (myocardial infarction), stroke or transient ischemic attack (TIA), liver disease, hepatitis, or cirrhosis, kidney disease, arthritis or rheumatism, asthma, chronic obstructive pulmonary disease (COPD), chronic bronchitis or emphysema, migraines or severe headaches, diabetes or high blood sugar or sugar in your urine, cancer (other than non-melanoma skin cancer), HIV or AIDS, spinal cord injury, multiple sclerosis, Parkinson’s disease, neuropathy, or epilepsy. Participants were eligible for the study as long as they had at least one of the conditions listed above; additional health conditions (other than those listed) were allowed.

PROMIS-SE

The PROMIS-SE has three test forms (Computerized Adaptive Test (CAT), eight-item and four-item short forms) and five behavioral domains (daily activities (DA), emotions (EM), medications and treatments (MT), social interactions (SS), and symptoms (SX)). Test forms were created using domain-specific item banks. The five item banks included a total of 137 items (DA: 35 items, EM: 25 items, MT: 26 items, SS: 23 items, and SX: 28 items). With CAT, individual item responses determine the selection of subsequent items from the item bank. Respondents are required to answer a minimum of four and a maximum of 12 items to obtain scores for each PROMIS-SE domain. The CAT stops when the standard error of the estimated score falls below 0.3 or when the maximum number of administered items (i.e., 12) is reached. Short form items were selected from each domain item bank based on item discrimination, item difficulty, and local independence. The structural model for the original PROMIS-SE is illustrated in Fig. 1.

Fig. 1 PROMIS structural models
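The CAT stopping rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the PROMIS-SE implementation: the function name `cat_should_stop` and its parameters are hypothetical, and we assume that the standard error criterion is checked only after the minimum of four items has been administered.

```python
def cat_should_stop(n_administered, standard_error,
                    min_items=4, max_items=12, se_threshold=0.3):
    """Return True when a CAT session for one domain should end.

    Hypothetical helper: the limits (4, 12, 0.3) come from the text,
    the order in which they are checked is an assumption.
    """
    if n_administered < min_items:
        # Fewer than the required minimum: always administer another item.
        return False
    if n_administered >= max_items:
        # Maximum number of items reached: stop regardless of precision.
        return True
    # Otherwise stop only once the score estimate is precise enough.
    return standard_error < se_threshold


# Illustrative calls with made-up standard errors.
print(cat_should_stop(3, 0.20))   # False: minimum not yet reached
print(cat_should_stop(6, 0.28))   # True: standard error below 0.3
print(cat_should_stop(12, 0.45))  # True: maximum of 12 items reached
```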

Our previous article reported that patient scores from the two PROMIS-SE short forms were highly correlated with results using item banks (r > 0.85 for four-item and r > 0.90 for eight-item forms) [14]. Lists of items are provided in the Online Appendix. Our previous article also reported that none of the items in our five item banks had sufficient differential item functioning (McFadden’s pseudo R² > 10%) to bias item estimates by age (under 65/over 65 years old), gender (male/female), race (white/non-white), or data source (the University of Maryland Neurology Ambulatory Center vs Op4G) [14].

All final patient scores are estimated based on the Item Response Theory (IRT) Graded Response Model (GRM). GRM links items to a latent construct. In our analysis, GRM links items to each PROMIS-SE domain (i.e., DA, EM, MT, SS, or SX). All calibrations were conducted separately for each domain, and all results were provided as T-scores (centered at 50 with a standard deviation of ten). These T-scores are standardized based on the distribution of our clinical sample and will be referred to as Tclin-scores hereafter. Higher scores indicate higher levels of self-efficacy for the particular domain. CAT directly provides users with Tclin-scores, but for short forms, scoring conversion tables should be used to convert the summated raw scores to Tclin-scores. Scoring tables are provided on the PROMIS website (http://www.healthmeasures.net/images/promis/manuals/PROMIS_Self_Efficacy_Scoring_Manual.pdf).
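To make the scoring model more concrete, the following sketch computes response category probabilities for a single item under the graded response model and converts a latent estimate to the T-score metric. It is a minimal illustration, not the calibration used for the PROMIS-SE: the function names, the discrimination of 1.5, and the four thresholds (which imply five response categories) are hypothetical values chosen for the example, and the conversion assumes the latent estimate is standardized (mean 0, standard deviation 1) in the reference sample.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait level (e.g., self-efficacy for one domain).
    discrimination: item slope (hypothetical value in the example below).
    thresholds: increasing category boundaries b_1 < ... < b_{K-1}.
    """
    # Cumulative probability of responding in category k or higher;
    # it is 1 for the lowest category and 0 beyond the highest one.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


def to_t_score(theta):
    """Convert a standardized latent estimate to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


# Illustrative item with made-up parameters.
probs = grm_category_probs(theta=0.0, discrimination=1.5,
                           thresholds=[-2.0, -1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
print(to_t_score(-1.0))  # one SD below the reference mean
```

Because the probabilities are differences of adjacent cumulative curves, they always sum to one, and a respondent with a higher latent level shifts probability toward the higher ratings.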

Statistical data analysis

For descriptive statistics, PROMIS-SE’s Tclin-score ranges and distributions were examined to identify domain discrepancies across item banks and four-item and eight-item short forms. In addition, a frequency analysis of item ratings was conducted to compare percentages for each item rating (i.e., total response numbers for each item rating/total response number) to investigate response patterns across PROMIS-SE domains.
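The frequency analysis amounts to dividing the number of responses in each rating by the total number of responses for a domain. A minimal sketch is given below; the function name `rating_percentages`, the treatment of missing responses as `None`, and the toy responses are assumptions made for illustration and are not taken from the study data.

```python
from collections import Counter


def rating_percentages(responses):
    """Percentage of observed responses falling in each rating.

    responses: iterable of rating labels pooled over the items of one
    domain; None marks a missing response and is ignored (an assumption).
    """
    observed = [r for r in responses if r is not None]
    total = len(observed)
    if total == 0:
        return {}
    counts = Counter(observed)
    return {rating: 100.0 * n / total for rating, n in counts.items()}


# Made-up responses for one hypothetical domain.
toy = ["very confident"] * 3 + ["quite confident"] + [None]
print(rating_percentages(toy))
```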

The multidimensionality of the PROMIS-SE was tested using confirmatory factor analysis (CFA) for the item banks and the four-item and eight-item short forms (see Fig. 2) with the following criteria for fit-statistics: RMSEA (< 0.06 good, < 0.08 acceptable), CFI and TLI (> 0.95 good, > 0.90 acceptable), and SRMR (< 0.05 good, < 0.08 acceptable) [22]. Weighted least squares mean and variance adjusted (WLSMV) estimation was used to accommodate the Likert responses of PROMIS-SE, and to maximize the efficiency of the measure, we used pairwise deletion. We also examined domain correlations, item factor loadings, and R-squared values under a multidimensional model for PROMIS-SE item banks. For statistical analyses, R version 3.4.3, RStudio version 1.1.414, and R packages (lavaan, dplyr, tidyr, and ggplot2) were used.

Fig. 2 Hypothesized PROMIS structural models
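The fit criteria listed above can be expressed as a small lookup. The sketch below is illustrative only: the names `FIT_CUTOFFS` and `classify_fit` are hypothetical, the label "poor" for values meeting neither cutoff is our assumption (the criteria name only "good" and "acceptable"), and the example values are made up rather than taken from the results.

```python
# (direction, good cutoff, acceptable cutoff) for each fit index,
# following the criteria stated in the text.
FIT_CUTOFFS = {
    "RMSEA": ("lower", 0.06, 0.08),
    "CFI": ("higher", 0.95, 0.90),
    "TLI": ("higher", 0.95, 0.90),
    "SRMR": ("lower", 0.05, 0.08),
}


def classify_fit(index, value):
    """Label a fit statistic as 'good', 'acceptable', or 'poor'."""
    direction, good, acceptable = FIT_CUTOFFS[index]
    if direction == "lower":
        # Smaller values indicate better fit (RMSEA, SRMR).
        if value < good:
            return "good"
        if value < acceptable:
            return "acceptable"
    else:
        # Larger values indicate better fit (CFI, TLI).
        if value > good:
            return "good"
        if value > acceptable:
            return "acceptable"
    return "poor"  # assumed label for values outside both cutoffs


# Made-up values for illustration.
for name, value in [("RMSEA", 0.05), ("CFI", 0.93), ("SRMR", 0.09)]:
    print(name, value, classify_fit(name, value))
```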

Results

Demographics

On average, participants had 3.8 chronic conditions. The five most common chronic conditions were hypertension (40%), arthritis or rheumatism (33%), depression (33%), anxiety (29%), and migraines or severe headaches (26%). Table 1 provides the prevalence for 25 chronic conditions. The mean age of the sample was 53.6 years (SD = 14.7). Of the participants, 42.1% were male and 55.1% were female (2.8% missing data). Please see Hong et al. [15] and Gruber-Baldini et al. [14] for additional demographic information.

Table 1 Chronic conditions

Descriptive statistics

Our results indicate that the PROMIS-SE Tclin-score ranges had inconsistent patterns across the five domains. While PROMIS-SE EM had the highest (least negatively skewed) Tclin-scores (69, 65, and 63), MT had the lowest (most negatively skewed) Tclin-scores (13, 19, and 22) for item banks, eight-item short forms, and four-item short forms, respectively. Overall, Tclin-scores ranged from 13 to 69 for item banks, from 19 to 65 for eight-item short forms, and from 22 to 63 for four-item short forms. The average ranges of Tclin-scores were 51.2, 41.4, and 36.8 for item banks, eight-item short forms, and four-item short forms. Table 2 provides PROMIS-SE Tclin-score ranges and Fig. 3 illustrates Tclin-score distributions for PROMIS-SE domains.

Table 2 Tclin-score ranges of PROMIS-SE domains

Fig. 3 Tclin-score distributions of PROMIS-SE item banks (box plot)

Frequency analyses showed that patients were more inclined to report higher self-efficacy ratings for DA, MT, and SS than for EM and SX. The most frequently reported rating was “very confident” for all domains, and for three domains (DA, MT, and SS), “very confident” accounted for more than 50% of the total responses. The percentage differences between the most and the second most frequently reported ratings (“very confident” and “quite confident”) were 48.24%, 45.30%, and 37.10% for DA, MT, and SS, whereas they were 6.76% and 14.19% for EM and SX, respectively. Figure 4 shows the response frequencies for each rating for each PROMIS-SE domain.

Fig. 4 Histogram of ratings for PROMIS-SE domains

Confirmatory factor analysis

Out of 148,919 possible responses (1087 respondents × 137 items), a total of 147,859 responses (1,060 missing, < 1%) were used for the analysis. CFA confirmed that the multidimensional model adequately fit the item banks and the two short forms. Corresponding fit indices for item banks, eight-item forms, and four-item forms were RMSEA (0.045, 0.069, and 0.070), CFI (0.911, 0.952, and 0.976), TLI (0.909, 0.949, and 0.972), and SRMR (0.061, 0.051, and 0.037), respectively. Overall, all domains also demonstrated high correlations under the multidimensional model (r > 0.652). PROMIS-SE EM and SX domains showed the highest correlation among all bivariate combinations (r = 0.788), closely followed by SX and SS (r = 0.783) and MT and SS (r = 0.782). Please see Table 3 for additional domain correlations. Moreover, all items loaded moderately or highly on their respective domains (λ > 0.675), and the model explained at least 45.6% of the variance of every item. The Online Appendix (PROMIS-SE Item-Level Information) provides additional item-level information.
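The counts and the link between the smallest loading and the smallest explained variance can be checked with simple arithmetic. The sketch below uses only figures reported in this section; the variable names are ours, and the step from loading to explained variance assumes standardized loadings with each item loading on a single domain, in which case the explained variance is the squared loading.

```python
n_respondents = 1087
n_items = 137
n_missing = 1060

# Total possible responses and the number actually available.
total_responses = n_respondents * n_items
used_responses = total_responses - n_missing

# Share of missing responses, expressed as a percentage.
missing_pct = 100.0 * n_missing / total_responses

# With standardized loadings and one domain per item (an assumption),
# the variance explained in an item is the squared loading.
min_loading = 0.675
min_r_squared_pct = 100.0 * min_loading ** 2

print(total_responses, used_responses)
print(round(missing_pct, 2))
print(round(min_r_squared_pct, 1))
```

The missing share works out to about 0.71% of all possible responses (a proportion below 0.01), and a loading of 0.675 corresponds to roughly 45.6% explained variance, which agrees with the values reported above.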

Table 3 Domain correlations under a multidimensional model

Discussion

This study confirmed that the Tclin-score ranges and distributions varied across the five PROMIS-SE domains and that a multidimensional model adequately fit the PROMIS-SE. Differences in Tclin-score ranges and distributions verified our notion that Tclin-scores from each PROMIS-SE domain should not be compared. Also, adequate fit-statistics for the multidimensional model indicated that applying a multidimensional model is viable and recommended for estimating multiple domains of PROMIS-SE.

Some might assume that Tclin-scores can be compared across PROMIS-SE domains because Tclin-scores are standardized scores. Standardized scores represent an individual’s percentage with respect to a reference population (in our case, individuals with chronic conditions). The same Tclin-score, indeed, indicates the same percentage. However, each domain Tclin-score was calibrated on a different Tclin-score distribution, resulting in varied Tclin-score ranges, which may cause inaccurate interpretations of patient self-efficacy. For example, the maximum Tclin-scores for the MT and EM domains of the four-item short forms were 58 and 63, respectively. Different maximum Tclin-scores correspond to different portions of the population. For instance, the proportions of patients responding “very confident” to all items on MT (21.19%) and EM (9.68%) may indicate that, overall, EM is more difficult than MT for this population. When a patient receives a Tclin-score of 58 on the MT and EM domains of the four-item short forms, some might conclude that this patient has the same level of self-efficacy for MT and EM. However, whether this patient has the same level of self-efficacy for MT and EM is questionable, since the difficulties of the two domains may not be the same. We suggest that Tclin-score interpretation should be done with caution or that practitioners be trained extensively in how to interpret these norm-referenced scores, especially when multiple domains are used together (i.e., the Tclin-scores can be compared to see how far away a person lies from the average score in standard deviation units of that domain scale, but this information should not be compared across domains as the average score and standard deviation of scores can vary across domains).
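The correspondence between a maximum Tclin-score and a portion of the population can be illustrated with a normal approximation. The sketch below is an illustration under an explicit assumption, namely that scores in the reference sample follow a normal distribution with mean 50 and standard deviation 10; the function name `percent_at_or_above` is hypothetical.

```python
from statistics import NormalDist


def percent_at_or_above(t_score, mean=50.0, sd=10.0):
    """Percentage of a normal reference population scoring at or above t_score."""
    return 100.0 * (1.0 - NormalDist(mu=mean, sigma=sd).cdf(t_score))


# Maximum four-item short form scores reported for MT and EM.
print(round(percent_at_or_above(58), 2))  # 21.19
print(round(percent_at_or_above(63), 2))  # 9.68
```

Under this approximation, the shares at or above 58 and 63 are close to the 21.19% and 9.68% of respondents who answered “very confident” to all MT and EM items. The same calculation also shows the limit of cross-domain comparison: a score of 58 is the ceiling for MT but not for EM, so an identical number sits at different positions within each domain’s attainable range.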

Multidimensional psychometric models that take into account inter-domain dependencies can be used to avoid confusion when comparing Tclin-score distributions across domains. Our results indicate the factor structure for the different forms of the PROMIS-SE (the item banks and two short forms (four-item and eight-item)) can be considered multidimensional. This was expected based on the high correlations reported in our original study [14]. Of note, while our multidimensional model showed satisfactory fit-statistics for all four indices (CFI, TLI, RMSEA, and SRMR), in our original study (treating domains as independent constructs) not all fit indices were adequate [14]. Using a model with better fit-statistics is desired because it allows for less biased measurement estimates. Fayers [23] reported that multidimensional models are more useful when domains have high correlations (as is the case with the PROMIS-SE) by allowing users to identify unusual scoring patterns that are often hidden under a unidimensional framework.

Along these lines, health-related items typically represent multiple domains and those domains are often highly correlated. Chang and Reeve [16] argue that almost all health-related outcomes are multidimensional. Multidimensional IRT (MIRT) models have demonstrated enhanced measurement precision and efficiency for various health-related outcomes including quality of life, pediatric functional skills, depression, anxiety, and global physical health [24,25,26,27]. In addition to MIRT, other multidimensional models, such as the bifactor model, locally dependent unidimensional models, and diagnostic classification models (DCMs), have also been introduced in education and psychology [28, 29].

A MIRT model can provide a patient PROMIS-SE profile and an overall set of scores (calculated concurrently across domains) (e.g., a profile of 59-50-58-50-50 for DA, EM, MT, SS, and SX, respectively). Similarly, multidimensional categorical models such as DCMs can provide categorical PROMIS-SE profiles (e.g., high-low-high-low-low self-efficacy) for five domains (DA, EM, MT, SS, and SX, respectively), where high/low or other appropriate thresholds indicate the level of proficiency. These patient profiles can be used to examine common self-efficacy patterns across different demographic groups, such as diagnosis, gender, age, and others. Clinical decisions for certain interventions could be inferred from patient profile results.

With rapid advances in computational power and the development of new software applications, these more complicated models, such as MIRT or DCMs, are expected to be more frequently applied to health-related constructs. Future studies are encouraged to apply these more complicated models and investigate practitioner perceptions of the different scoring results.

Our study confirmed the multidimensional nature of the PROMIS-SE. Important information about the relationship among domains that might lead to improvements in self-efficacy can be obtained when applying a multidimensional model. Clinicians can use this information to customize treatment and provide interventions that target the most appropriate items across domains to improve a patient’s self-efficacy to manage their chronic condition. Other patient-reported outcome measures might face similar challenges. For example, we might expect measures such as the PROMIS depression and anxiety scales to be highly correlated. We suggest that future studies apply multidimensional models (e.g., Multidimensional IRT, Diagnostic Classification Modeling) to other health-related measures and further investigate the clinical utility of the measures from the clinicians’ and patients’ perspectives.