Introduction

Effective self-management requires knowledge and plays a pivotal role in achieving successful treatment outcomes. In order to improve the quality of care, providers must identify patients lacking self-management skills and intervene appropriately. To identify these patients, practitioners need a valid and reliable tool. The patient activation measure short form (PAM-13) [1] consists of 13 items measuring patients’ self-reported knowledge, motivation, and skills for health management (Appendix 1). It was developed using a Rasch model [2] and has been validated in the US general population.

Rural communities are often less studied than urban regions [3, 4]. Rural patients are more isolated than the general population [3–5] and could exhibit different health behaviors. For instance, rural patients have been reported to be hospitalized more often than non-rural patients [6] and face more barriers to access, including increased travel and limited specialty care [3, 4]. Rural areas have a larger elderly population [7, 8], less education [9], and increased chronic health conditions [10–12]. The goal of this study was to examine the validity of the PAM-13 in the rural population, specifically its dimensionality, differential item functioning (DIF), and convergent and discriminant validity.

Methods

Participants/data collection

We performed a demonstration project on integrating a personal health record with an electronic medical record (EMR), called the unified health resource (UHR). Four primary care clinics from the Intermountain West were recruited; they served rural communities ranging from 8,000 to 22,500 individuals. Two clinics used the UHR, while the other two used an EMR. We conducted a telephone survey of 812 patients from all 4 clinics using the PAM-13, the consumer assessment of healthcare providers and systems (CAHPS®) [13], and a self-management (SM) survey developed by our team. SM requires subject-based knowledge and motivation. The purpose of the SM survey was to measure patients’ knowledge and behavior in managing their personal health and to validate the PAM-13. The SM survey contains a total of 7 items (Appendix 3).

Statistical analysis

A one-parameter Item Response Theory model, known as the Rasch model [2, 14], was utilized to evaluate PAM-13. Rasch can correct some of the traditional assumptions (e.g., interval scale) made by the classical test theory models and may potentially create equal interval scores, overcoming methodological challenges to provide objective measurement.
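The equal-interval property mentioned above can be illustrated with the dichotomous form of the Rasch model, in which the log-odds of endorsing an item equal the person measure minus the item difficulty. The following is a minimal sketch for illustration only; the function names and the example values are assumptions and are not taken from the study.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of endorsing a dichotomous item under the Rasch model.

    theta and difficulty are both expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def log_odds(probability):
    """Convert a probability back to the logit scale."""
    return math.log(probability / (1.0 - probability))

# Two hypothetical persons one logit apart, compared on two hypothetical items.
# The gap in log-odds is the same on both items, which is what makes the
# logit scale behave like an interval scale.
for item_difficulty in (-0.5, 1.2):
    gap = (log_odds(rasch_probability(1.0, item_difficulty))
           - log_odds(rasch_probability(0.0, item_difficulty)))
    print(round(gap, 6))
```

In this sketch the printed gap does not depend on which item is used, so a one-logit difference between persons has the same meaning anywhere on the scale.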

We analyzed correlations between PAM and the SM subscales to investigate validity. The SM subscales consist of self-management knowledge (SMK) and self-management willingness to change (SMW). We hypothesized that patients with high PAM scores should have high SM scores.

We applied the Rasch partial credit model (PCM) [15–18] to our sample to examine model-data fit using WINSTEPS [19]. The Rasch PCM was chosen because the PAM items had more than 2 response options and showed different patterns of usage. There are various statistical indices within the Rasch framework that can be used to check whether the data fit the model. In this study, we examined these indices: unidimensionality, item difficulty, quality measures, category response functions, and differential item functioning (DIF).
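To make the PCM concrete, the sketch below computes category probabilities for one item from a person measure and that item's own set of thresholds, which is what allows each item to show its own pattern of category usage. It is an illustrative sketch, not the WINSTEPS estimation routine; the function name and the threshold values are assumptions.

```python
import math

def pcm_category_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta      : person measure in logits
    thresholds : item-specific step difficulties (logits), one per step,
                 so an item with 4 categories has 3 thresholds
    Returns a list with one probability per category, lowest category first.
    """
    # The numerator for category k is exp(sum of (theta - threshold_j), j <= k);
    # the lowest category has an empty sum, i.e. 0.
    cumulative = [0.0]
    for step in thresholds:
        cumulative.append(cumulative[-1] + (theta - step))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]

# Hypothetical 4-category item with ordered thresholds.
probabilities = pcm_category_probabilities(theta=1.0, thresholds=[-2.0, 0.0, 2.0])
print([round(p, 3) for p in probabilities])
```

A person whose measure lies between two adjacent thresholds is most likely to choose the category bounded by them, which is the behavior the category response functions are meant to verify.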

Category response functions assess whether each response category defines a distinct position on the scale. A functional scale should have no disordered thresholds, no category with fewer than 10 responses, and no category with outfit MNSQ > 2 [14].
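The three category criteria can be expressed as a simple screening function. This is a sketch under stated assumptions: the function name, the argument layout, and the example numbers are hypothetical, and the per-category counts, thresholds, and outfit values would in practice be read from the Rasch software output.

```python
def flag_category_problems(counts, thresholds, outfit_mnsq,
                           min_count=10, max_outfit=2.0):
    """Screen one item's rating scale against the three criteria.

    counts      : observed responses per category, lowest category first
    thresholds  : step thresholds in logits, in category order
    outfit_mnsq : outfit mean square per category
    Returns a list of problem labels (empty if the scale looks functional).
    """
    problems = []
    if any(count < min_count for count in counts):
        problems.append("sparse_category")
    # Thresholds are disordered when a later step is easier than an earlier one.
    if any(later < earlier for earlier, later in zip(thresholds, thresholds[1:])):
        problems.append("disordered_thresholds")
    if any(value > max_outfit for value in outfit_mnsq):
        problems.append("category_misfit")
    return problems

# Hypothetical item with a rarely used lowest category.
print(flag_category_problems(
    counts=[4, 60, 400, 340],
    thresholds=[-1.2, -1.9, 2.0],
    outfit_mnsq=[2.3, 1.1, 0.9, 1.0],
))
```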

DIF occurs when the difficulty levels of items vary systematically based on sample characteristics. It provides one source of evidence of item bias and answers the question of whether an item functions similarly across different subgroups of patients [15, 20–22]. In this study, we were particularly interested in whether patients with chronic diseases respond to individual items in the same way as patients without chronic diseases, given that they all have the same overall activation measure. We were also interested in assessing age DIF (i.e., age 45 or older vs. under 45 years old) and gender DIF (i.e., female vs. male). A t-statistic with p > .05 would indicate that the item shows no evidence of DIF.
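A uniform DIF contrast of this kind can be illustrated as the difference between an item's difficulty estimates in two subgroups, divided by the joint standard error of the two estimates. The sketch below is an illustration only and is not the WINSTEPS procedure: the function name and example estimates are hypothetical, and a normal approximation is assumed in place of the t distribution, which is reasonable only for large subgroups.

```python
import math

def dif_contrast(difficulty_a, se_a, difficulty_b, se_b):
    """Compare one item's difficulty between two subgroups.

    Returns (contrast in logits, test statistic, two-sided p-value).
    The p-value uses a normal approximation (an assumption of this sketch).
    """
    contrast = difficulty_a - difficulty_b
    joint_se = math.sqrt(se_a ** 2 + se_b ** 2)
    statistic = contrast / joint_se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return contrast, statistic, p_value

# Hypothetical estimates for one item in two subgroups.
contrast, statistic, p_value = dif_contrast(0.50, 0.10, 0.10, 0.10)
print(round(contrast, 2), round(statistic, 2), p_value < 0.05)
```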

Dimensionality analyses address whether multiple constructs are needed to explain all of the variance in the data. Dimensionality is evaluated using principal component analysis of the residuals after the initial Rasch factor is removed [14]. Two criteria were used to assess unidimensionality: (1) the variance explained by the first contrast in the residuals is <10 % and (2) the eigenvalue of the first contrast is <3.0 [19].
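The idea behind the first-contrast eigenvalue can be sketched as the largest eigenvalue of the inter-item correlation matrix of the Rasch residuals. The code below is a simplified, standard-library illustration using power iteration; it is not the WINSTEPS implementation, and all function names are assumptions. The percentage of variance explained by the first contrast is taken as an input rather than computed.

```python
import math

def correlation_matrix(residuals):
    """Inter-item correlations of residuals (rows = persons, columns = items).

    Assumes every item column has non-zero variance.
    """
    n_persons = len(residuals)
    n_items = len(residuals[0])
    means = [sum(row[j] for row in residuals) / n_persons for j in range(n_items)]
    centered = [[row[j] - means[j] for j in range(n_items)] for row in residuals]

    def dot(a, b):
        return sum(row[a] * row[b] for row in centered)

    norms = [math.sqrt(dot(j, j)) for j in range(n_items)]
    return [[dot(a, b) / (norms[a] * norms[b]) for b in range(n_items)]
            for a in range(n_items)]

def largest_eigenvalue(matrix, iterations=1000):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    size = len(matrix)

    def multiply(vector):
        return [sum(matrix[i][j] * vector[j] for j in range(size))
                for i in range(size)]

    # Non-uniform start vector, to avoid starting orthogonal to the answer.
    vector = [float(i + 1) for i in range(size)]
    for _ in range(iterations):
        product = multiply(vector)
        norm = math.sqrt(sum(value * value for value in product))
        vector = [value / norm for value in product]
    # Rayleigh quotient of the converged unit vector.
    product = multiply(vector)
    return sum(vector[i] * product[i] for i in range(size))

def meets_unidimensionality_criteria(first_contrast_eigenvalue, first_contrast_pct):
    """Apply the two criteria: eigenvalue < 3.0 and variance explained < 10 %."""
    return first_contrast_eigenvalue < 3.0 and first_contrast_pct < 10.0

# Small hypothetical correlation matrix of residuals for two items.
print(round(largest_eigenvalue([[1.0, 0.5], [0.5, 1.0]]), 4))
```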

The person separation index (PSI), item separation index (ISI), and item fit are indicators of the quality of measures. Item fit indicates whether a set of items fits the Rasch model, and it can be evaluated using the outfit mean square (MNSQ) statistic. An outfit MNSQ close to 1 is considered good fit, whereas a value >2 is considered misfit [14]; misfitting items should be excluded. The PSI refers to the reproducibility of the relative measure location of the persons, whereas the ISI refers to the reproducibility of the relative measure location of the items [19]. A separation index of 2 or higher (corresponding to a reliability of .80 or higher) is considered reliable.
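The outfit MNSQ for an item is conventionally the mean of the squared standardized residuals across persons, so a value near 1 means the observed responses vary about as much as the model expects. The following is a minimal sketch; the function name and the example values are assumptions, and the expected scores and model variances would in practice come from the fitted Rasch model.

```python
def outfit_mean_square(observed, expected, variance):
    """Outfit MNSQ for one item.

    observed : observed response of each person to the item
    expected : model-expected response of each person
    variance : model variance of each person's response
    """
    squared_z = [
        (obs - exp) ** 2 / var
        for obs, exp, var in zip(observed, expected, variance)
    ]
    return sum(squared_z) / len(squared_z)

# Hypothetical dichotomous example: residuals match the model variance exactly.
print(outfit_mean_square(observed=[1, 0], expected=[0.5, 0.5], variance=[0.25, 0.25]))
```

Because each squared residual is divided by its model variance, unexpected responses from persons far from the item's difficulty inflate the statistic, which is why outfit is sensitive to outliers.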

Results

Descriptive statistics

The sample was composed primarily of Caucasians, with 78 % having chronic disease. Half were from the UHR clinics. Over 60 % were women; 65 % were 45 years or older; and 63 % had at least some college (Table 1).

Table 1 Patient demographics

Among the 4 response categories in PAM (i.e., strongly disagree, disagree, agree, strongly agree), no single category was endorsed by over 70 % of people. The “strongly disagree” category was chosen by <1 %, indicating high activation levels. Less than 2 % of responses were missing across all items; however, Rasch measurement investigates responses at the item level and supports the use of incomplete data [15].

In addition to the PAM survey, the CAHPS and the SM surveys (Appendices 1, 2 and 3) were used to examine divergent and convergent validities. All items were first calibrated using a Rasch model, then scored (Table 2). The correlations between PAM and CAHPS subscales were small (r range = .007–.125), demonstrating divergent validity. The correlations between PAM and SM subscales were moderately high (r ~ .4), demonstrating convergent validity.

Table 2 Correlations between PAM, CAHPS, and SM subscales

Response category function

Table 3 displays the observed count per category, outfit MNSQ, and thresholds for each item. Three items showed disordered thresholds; two had outfit MNSQ >2; all had observed count of <10 in category 1, implying that the 4 categories should be collapsed into 3 to reduce patients’ cognitive burden. Hence, we collapsed categories 1 and 2 into a single category, which did not reveal further disordering in reanalysis (see Table 4). Subsequent Rasch analysis was based on the 3 category options.
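The collapsing step amounts to recoding the raw responses before re-running the analysis. The sketch below assumes responses are coded 1 to 4 with missing values stored as None; the mapping, names, and example data are illustrative assumptions rather than the study's actual data preparation.

```python
# Merge "strongly disagree" (1) and "disagree" (2) into a single lowest
# category, then renumber so the scale stays consecutive: 1, 2, 3.
COLLAPSE_MAP = {1: 1, 2: 1, 3: 2, 4: 3}

def collapse_categories(responses):
    """Recode 4-category responses to 3 categories, keeping missing values."""
    return [COLLAPSE_MAP[value] if value is not None else None
            for value in responses]

# Hypothetical responses from one patient to five items.
print(collapse_categories([1, 2, 3, 4, None]))
```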

Table 3 Item category function of PAM-13 (all 4 categories included)
Table 4 Item category function of PAM-13 (after collapsing categories 1 and 2)

Dimensionality

Rasch dimensionality analysis was conducted on the 13 items. Results indicated that the variance attributable to the first contrast was 6.3 % with a strength of 1.5 eigenvalue units, implying unidimensionality. Multidimensional models were not tested because our sample size was quite small and we were mainly interested in seeing whether our results replicate the developer’s using the same model.

Item difficulty

Figure 1 displays the spread of all items and patients along a standardized linear logit scale. The central vertical dashed line is a ruler separating items on the right from patients on the left. The top of the ruler corresponds to high activation levels, whereas the bottom corresponds to low activation. The map reveals that the items target the lower levels of patients’ activation very well. However, the majority of the sample fell at the upper end of the scale, where item coverage was lacking, reflecting a ceiling effect.

Fig. 1 Person–item map for the entire sample (in logit scale)

Quality of measures

The PSI was 2.36, corresponding to Cronbach’s reliability index of .85, while ISI was 9.15, equivalent to a reliability of .99. The outfit MNSQ ranged from .67 to 1.24, reflecting excellent item fit (Fig. 2) and construct validity.
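The correspondence between a separation index G and a reliability coefficient R follows the standard Rasch relation R = G² / (1 + G²). The short sketch below applies it to the separation values reported above; the function name is an assumption.

```python
def separation_to_reliability(separation):
    """Convert a Rasch separation index G to reliability, R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)

for label, separation in (("PSI", 2.36), ("ISI", 9.15), ("threshold", 2.0)):
    print(label, round(separation_to_reliability(separation), 2))
```

The same relation explains why a separation of 2 is used as the cutoff: it corresponds to a reliability of .80.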

Differential item functioning

We performed uniform DIF testing and considered items with a t-statistic with p < .05 as showing statistical evidence of DIF. Results indicated that 3 items showed chronic disease DIF; 2 showed gender DIF; and 3 showed age DIF (see Tables 5, 6 and 7, respectively). Non-uniform DIF testing was not conducted due to the small sample size [20].

Table 5 Item differential functioning—Chronic diseases
Table 6 Item differential functioning—gender
Table 7 Item differential functioning—age
Fig. 2 PAM-13 item fit

Discussion

This study utilized a Rasch model to validate the PAM-13 in rural populations. Results indicated that the PAM-13 performs well in some areas, but not all. The items had excellent fit statistics and largely conformed to unidimensionality. The person and item reliability indices were high, suggesting that person and item orderings were both replicable. The PAM-13 also demonstrated high convergent and divergent validities. However, the item hierarchy revealed considerable ceiling effects, posing several potential problems. This should be addressed in future tool refinement to better capture the responses of those with high activation and to track improvements. Items that showed flat category probability curves or disordered thresholds imply that some response categories were unnecessary. Only PAM_#13 showed consistent evidence of DIF across chronic disease, gender, and age, indicating a need for item refinement.

In summary, the PAM-13 showed ceiling effects and should be interpreted with caution when examining change over time. For future scale revision, this study suggests two areas for consideration: (1) collapse categories 1 and 2 for all items to improve parameter estimation and (2) add some high-end items to the scale to cover the upper end of the trait.