Introduction

Two key Triple Aim healthcare goals are enhancing the patient experience of care and improving outcomes [1]. Therefore, it is essential to assess patient perspectives of their healthcare environment and their personally relevant outcomes. Although understanding the specific effects of treatments is vital and the focus of most biomedical research, nonspecific or contextual features of health care are also important as they contribute to the care experience and often influence outcomes [2]. The nonspecific, contextual features of healthcare treatments are broad ranging and multifaceted and may include interactive processes such as rituals and conditioning [3, 4], as well as patients’ attitudes and their views of healthcare relationships and the treatment environment.

Trust in the healthcare provider, comfort with the clinical environment, and optimism regarding treatment are important contextual aspects of the patient experience of care. Such contextual factors may be independent of specific actions of a medication or procedure, yet there is growing recognition that they can influence treatment outcomes. A meta-analysis of 13 studies found significant effects of the patient–provider relationship on outcomes [5]. Trials that experimentally manipulate provider warmth show enhanced treatment effects even with inert treatments [6, 7]. Expectations regarding treatments are particularly influential in symptom improvement [6, 8–11]. For example, back pain patients with the highest treatment expectations exhibited an adjusted odds ratio of 5.3 for improvement [12]. Additionally, when given pills clearly marked ‘placebo’ and told of potential mind–body benefits, 59 % of irritable bowel syndrome patients reported adequate improvement [13]. Likewise, patients’ own characteristics shape their experience of care. Taking an active role can contribute to therapeutic outcomes [14], as can optimism [15, 16] and spirituality [17]. A review of the biological mechanisms of nonspecific factors supports their influence on outcomes, which is often greater than that of the specific treatment [18]. Thus, enhancing nonspecific effects may be the most direct method for enhancing the impact of all therapies.

Efforts to understand nonspecific factors are increasing, particularly through experiments designed to elucidate placebo effects. However, there has been little effort to measure nonspecific, contextual factors, particularly from the patient’s perspective. Instruments that measure these factors would have widespread value as they would allow researchers and clinicians to understand these dimensions as well as enhance them in the treatment setting.

This study reports on the development and initial validation of a set of patient-reported measures, the Healing Encounters and Attitudes Lists (HEAL), that assess several nonspecific factors in health care and healing. We used instrument development methodology of the Patient-Reported Outcomes Measurement Information System (PROMIS) [19, 20], which includes patient and clinician input, iterative steps of item revision, and modern psychometric methods (see Online Resource). HEAL instruments are designed to be administered as computerized adaptive tests (CATs), in which a small number of items can provide precise information. However, brief static (six- to seven-item) forms were also created using this methodology. The University of Pittsburgh IRB approved this study (PRO13090474).

Methods

Overview

HEAL item banks were developed using the rigorous instrument development methodology of PROMIS [19, 21–23]. Several iterative steps were involved: formulation of a conceptual model, development of item banks, and psychometric analyses including both classical test theory (CTT) and item response theory (IRT).

Formulation of the conceptual model

Multiple sources of information informed the model of contextual factors that contribute to health outcomes. We included the viewpoints of clinical experts and patients as well as existing scientific literature. Through interviews with over 20 healthcare providers, six patient focus groups that explored patients’ ideas about contributors to positive and negative treatment outcomes, and literature reviews of placebo mechanisms and patient characteristics that moderate outcomes, we developed a preliminary conceptual model of HEAL (Fig. 1). The model included within-person characteristics and interpersonal relationship perceptions, and consisted of seven initial domains: positive/negative attitudes, Spirituality, Locus of Control, Treatment Expectancy, Health and Wellness Attitudes, perceptions of the patient–provider relationship, and patient perceptions of the Healthcare Environment.

Fig. 1 Initial conceptual model of HEAL domains. Published originally as figure 1 in Greco et al. [25]

Development of item banks

Item bank development included (1) comprehensive literature searches to find existing questionnaires, (2) categorization, review and refinement of existing items, and writing new items, and (3) cognitive interviews with patients to establish clarity (see Fig. 2).

Fig. 2 Overview of HEAL item banks development

Comprehensive literature searches

Searches of PubMed, EMBASE, CINAHL, AMED, ICL, MANTIS, HaPI, and PsycINFO databases were conducted using a search methodology developed by the Pittsburgh PROMIS research site [24]. The searches generated 14,864 abstracts, which were individually reviewed. Articles documenting development and psychometric validation of instruments were examined, yielding 535 unique questionnaires with over 16,000 items potentially related to the domains in the HEAL model.

Categorization and review of items

The initial pool of existing questionnaire items was coded into the conceptual categories of the HEAL model. The codings were based on independent ratings by teams of 2–4 healthcare clinician–researchers (R.G., C.G., N.M., M.J.S.) and instrument development experts (P.P., K.J., N.D., J.C., C.M.), with disagreements resolved through discussion. Using the qualitative item review procedures of PROMIS [19], the pool of items was reduced by removing redundant items and those that were too narrow in focus (e.g., disease- or treatment-specific). Further refinement included removing or rewriting poorly written items (see [25] for examples) and writing new items. The retained items were standardized for first-person subject, verb tense (primarily present tense), simple vocabulary, and five-point response scales reflecting frequency (Never to Almost always) or intensity (Not at all to Very much). From the initial pool of items, a total of 359 were retained for further review by patients in the cognitive interview stage.

Cognitive interviews

Forty-two patients participated in cognitive interviews, during which patients ‘think aloud’ while reviewing one item at a time with a trained interviewer. Each of the 359 items was reviewed by at least six patients representing a broad range of ages, diverse races and reading levels, and both genders. Patients provided feedback on item clarity, vocabulary, and appropriateness of the response scale. Based upon participant feedback, 63 items (17.5 %) were rewritten or removed. All revised items were subsequently reviewed in further cognitive interviews. The item pool contained 296 items following the cognitive interview phase.

Sampling

We field tested the items on two samples of patients who received conventional or integrative medicine treatments for a medical or mental health condition (details in Online Resource). One sample of 1400 persons was provided through the Internet survey company, YouGov.com. Patients who reported receiving treatments within the past year were eligible. The second sample included 257 patients at the University of Pittsburgh Medical Center (UPMC) who had recently started a new integrative medicine (n = 127) or conventional medicine (n = 130) treatment. This clinical sample repeated the computerized assessment 6 weeks later.

Measures

The 296 HEAL items retained following cognitive interviews represented several domains relevant to interpersonal and contextual aspects of treatment as well as potentially important patient attitudes.

To reduce burden for the Internet sample to 30 min or less, only HEAL items and demographics were included. Also, we administered subsets of HEAL items to respondents in a blocked design, ensuring that each item was paired with every other item a comparable number of times without requiring that participants respond to all items. In the clinical sample, in order to explore concurrent and discriminant validity, the PROMIS 29 health status profile and several well-known ‘legacy’ questionnaires were also completed (see Online Resource). At the 6-week follow-up, clinical participants also completed the single-item Clinical Global Impression (CGI) scale, which asks for comparison of current symptoms to earlier symptoms [26] to assess predictive validity of the HEAL measures.

Classical test theory analyses

The initial CTT analyses involved descriptive statistics and factor analysis. Although our conceptual model included seven domains deemed relevant to patients’ experience of health care and healing, we made no assumptions regarding the most appropriate factor structure for the HEAL items. Our goal was to identify the most robust latent constructs with sufficient unidimensionality to proceed with IRT analyses. We also wanted to capture a variety of clinical indicators to ensure content validity. Thus, we intended to strike a balance between unidimensionality and adequacy of content.

The entire sample, both Internet and clinical, was randomly divided into two subsamples. The first subsample was used for exploratory factor analysis (EFA, n = 799), and the second was used for subsequent confirmatory factor analysis (CFA, n = 858) [27]. Factor loadings, scree plots, and eigenvalues were evaluated.
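The random division of the pooled sample can be sketched as follows. This is only an illustration, not the procedure actually used in the study: the function name, the seed, and the use of Python's random module are assumptions; only the subsample sizes (799 and 858) and the pooled total (1400 Internet plus 257 clinical respondents) come from the text.

```python
import random

def split_sample(ids, n_first, seed=0):
    """Randomly partition respondent IDs into two disjoint subsamples."""
    rng = random.Random(seed)   # fixed seed (an assumption) so the sketch is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]

# Hypothetical respondent IDs: 1400 Internet + 257 clinical = 1657 in total
all_ids = list(range(1400 + 257))
efa_ids, cfa_ids = split_sample(all_ids, n_first=799)
print(len(efa_ids), len(cfa_ids))
```

Keeping the two subsamples disjoint means that the factor structure suggested by the EFA is tested by the CFA on respondents who did not contribute to the exploratory step.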

IRT analyses

Item response theory (IRT) refers to a set of psychometric methods for developing and scoring tests based upon the idea that one’s response to each item reflects one’s level on the underlying domain of interest. The latent trait (for example, Treatment Expectancy) is scaled along a dimension called theta (θ). IRT models include discrimination parameters (i.e., how well the item distinguishes among individuals higher vs. lower on the θ scale) and location/threshold parameters (i.e., the value of θ at which an individual has the highest probability of choosing the particular response to the item). Thus, IRT provides psychometric information about each item separately, as well as information for the overall test (see Online Resource).
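To make the roles of the discrimination and threshold parameters concrete, the sketch below computes response-category probabilities for a single item under Samejima's graded response model, the model family used for calibration in this study. In this model, each threshold is the value of θ at which the probability of responding in that category or higher equals 0.5. The function name and the parameter values are hypothetical and are not HEAL estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    a          -- discrimination parameter
    thresholds -- increasing thresholds b_1 < ... < b_(m-1) for m response categories
    """
    # Cumulative curves P(X >= k), bracketed by 1 (lowest category or higher)
    # and 0 (beyond the highest category)
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference between adjacent cumulative curves
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical five-category item (illustrative values only)
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

A larger discrimination value makes the cumulative curves steeper, so the most likely response category changes more sharply as θ moves across a threshold.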

Items remaining in the pools after CTT were calibrated with the two-parameter graded response model (GRM) using MULTILOG 7.03 [28]. Each item’s fit to the IRT model was examined using the SAS macro IRTFIT [29]. Misfitting items (χ², p < 0.01) were considered for exclusion. Several additional outcomes from IRT analyses, such as differential item functioning (DIF) and indicators of local independence, were used to identify items for possible exclusion (see Online Resource).

Content-expert review

To ensure the relevance of the remaining items, content experts (C.G., P.P.) re-examined items from the clinical perspective to eliminate items with questionable psychometric properties or clinically overlapping content. Conversely, items with important clinical implications were considered for retention in the pool even if they did not meet some conventional psychometric guidelines.

Preliminary validity evidence

In order to evaluate concurrent and discriminant validity, we correlated θ scores on HEAL domains with corresponding legacy measures and the PROMIS health status measures. As an initial estimate of predictive validity, we correlated HEAL θ scores with participants’ perceptions of improvement on the CGI. The clinical sample was the primary source for validity information, given the more extensive data we had on presenting complaints and treatment history.

Results

Factor analyses

In the EFA, we found that a five-factor structure fit the data well. Correlations among the five factors ranged from 0.17 to 0.49, indicating that they were related but sufficiently distinct to deserve separate status. The items in the original HEAL domains of Patient–Provider Connection and Healthcare Environment loaded on the first factor. The items in the original HEAL domains of Optimistic Attitudes and Locus of Control loaded on factor 2. Spirituality items, Health and Wellness Attitudes, and Treatment Expectancy items loaded primarily on factors 3, 4, and 5, respectively. Nine items with cross-loadings (loadings of at least 0.40 on more than one factor) and 39 items with single-factor loadings <0.40 were dropped. Notably, 17 of the original 34 items in the Health and Wellness Attitudes item bank had low single-factor loadings (<0.40) and were among the items dropped. A second round of EFA was performed to confirm the five-factor structure and to re-assess the magnitude of the factor loadings. No further items were eliminated after the second EFA.

Single-factor CFAs were performed for each of the five factors, based on items retained after EFAs. Iterations of CFA were performed until all retained items had loadings >0.50. A total of 250 items were retained at this stage for subsequent IRT analyses. To ensure construct validity, all items were reviewed again for content relevance. Based on this review, factor 1, which had 94 items, was separated into 1a, Patient–Provider Connection (PPC), and 1b, Healthcare Environment (HCE). CFA was rerun for PPC and HCE, and all factor loadings were larger than 0.50.

IRT calibrations

The item banks corresponding to the six factors from CTT were calibrated separately using the two-parameter GRM. Item-parameter estimates for the final items for each item bank are shown in the Online Resource in Tables 1–6.

Item information functions (IIF), differential item functioning, and local independence

Item information curves, which reflect the overall performance of individual items, were examined. Items with limited information (i.e., with peaks on the IIF < 1.0) were removed. Following IIF-based refinement, 190 items remained (see Table 1). No items showed differential functioning based upon gender, age, or education. Therefore, no items were removed due to DIF. Five locally dependent items were removed.
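The information-based screening can be illustrated with a short sketch. It evaluates the graded response model item information function on a grid of θ values and keeps items whose peak information reaches 1.0. The θ grid, the function names, and the two example items are assumptions made for illustration; only the 1.0 cutoff comes from the text.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at a given theta."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]   # probability of category k
        # Derivative of the category probability with respect to theta
        d_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += d_k ** 2 / p_k
    return info

def peak_information(a, thresholds, lo=-4.0, hi=4.0, steps=161):
    """Largest information value on an evenly spaced theta grid (grid is an assumption)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grm_item_information(t, a, thresholds) for t in grid)

# Hypothetical items: (discrimination, thresholds); not HEAL parameter estimates
items = {
    "item_weak": (0.9, [-1.5, -0.5, 0.5, 1.5]),
    "item_strong": (2.5, [-1.5, -0.5, 0.5, 1.5]),
}
retained = [name for name, (a, b) in items.items() if peak_information(a, b) >= 1.0]
print(retained)
```

Because information in this model scales with the square of the discrimination parameter, weakly discriminating items cannot reach the cutoff regardless of where their thresholds fall.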

Table 1 Items retained following CTT and IRT analyses

IRT model fit

We examined each item’s fit to the IRT model. The SAS macro IRTFIT identified 17 misfitting items, and these were removed (see Table 1).

The domain Health and Wellness Attitudes was originally broadly defined and included social and family factors, attitudes toward diet and exercise, fear of illness, and views of conventional and complementary medicine. The multidimensionality of this domain rendered 17 of the original 34 items inappropriate for fitting to IRT models. A further 11 items in this domain were dropped due to low item information (10 items) and IRT misfit (1 item), leaving too few items for CAT administration of this item bank. A set of six items concerning views of complementary medicine were the only survivors, and they were retained as a short, static measure entitled Attitudes toward CAM.

Preliminary validity evidence

In the clinical sample, we examined concurrent validity between the six HEAL θ scores and legacy instruments measuring similar constructs. HEAL Patient–Provider Connection and Healthcare Environment correlated 0.38 and 0.39 (p < 0.01) with a measure of outpatient clinical care (ACES [30]), indicating similarity but not complete overlap. HEAL Treatment Expectancy was associated with the Credibility (0.71, p < 0.01) and Expectancy (0.58, p < 0.01) factors of the Credibility Expectancy Questionnaire [31]. HEAL Positive Outlook was inversely associated with PROMIS 29 depression (−0.71, p < 0.01) and anxiety (−0.54, p < 0.01). Likewise, Spirituality, Positive Outlook, and Attitudes toward CAM were significantly correlated with corresponding legacy measures (see Online Resource), with correlations ranging from 0.30 to 0.81 (all p’s < 0.01). Treatment-related item banks PPC and HCE were unrelated to PROMIS health status measures, providing preliminary support for discriminant validity.

To explore predictive validity, baseline HEAL scores were compared with follow-up CGI ratings in the clinical sample. Correlations ranged from 0.36 (p < 0.01) for HEAL Treatment Expectancy to 0.13 (p < 0.05) for HEAL Spirituality, indicating that HEAL scores account for some variability in patients’ perceived improvement across a broad range of treatments.
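One conventional way to read these correlations is to square them, which gives the proportion of variance in CGI ratings shared with each baseline HEAL score. The sketch below does this arithmetic for the two reported endpoints of the range. Treating the squared correlation as shared variance is an interpretive convention assumed here, not an analysis reported in the study.

```python
# Correlations with follow-up CGI ratings, as reported in the text
reported_r = {"Treatment Expectancy": 0.36, "Spirituality": 0.13}

# Squared correlation = proportion of shared variance (interpretive convention)
shared_variance = {name: r ** 2 for name, r in reported_r.items()}

for name, r2 in shared_variance.items():
    print(f"{name}: r = {reported_r[name]:.2f}, r^2 = {r2:.3f}")
```

On this reading, the shared variance is roughly 13 % for Treatment Expectancy and roughly 2 % for Spirituality.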

Selection of items for short forms

To administer the HEAL where computerized adaptive testing (CAT) is not available, static short forms can be a useful alternative. To develop short forms, we evaluated the items in the HEAL banks based on their psychometric properties: their discrimination parameters, the percentage of time the item would have been selected in a simulated CAT [32] based on the observed data from the calibration samples, the expected information under the standard normal distribution (mean 0, SD 1), and the expected information under an extended distribution (mean 0, SD 1.5) [33]. For five of the HEAL banks, we selected items for short forms based upon the convergence of the psychometric criteria together with review of the clinical importance and content balance of the items. Clinical experts (P.P., C.G.) performed these content reviews and chose items for short forms that had both excellent psychometric properties and represented the clinical breadth of the domains. The other HEAL item bank, originally conceptualized as Attitudes toward Health and Wellness, included only six items following CTT and IRT analyses, which is an inadequate number for CAT. Therefore, this bank is available only as a short form: Attitudes toward Complementary/Alternative Medicine (CAM). The short forms are provided in Table 2.
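The two expected-information criteria can be illustrated numerically. The sketch below averages an item information function over a normal distribution of θ with SD 1 and with SD 1.5. For brevity it uses a two-category logistic item as a simplified stand-in for the polytomous HEAL items; this stand-in, the integration grid, the function names, and the item parameters are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def info_2pl(theta, a, b):
    """Information of a two-category logistic item (simplified stand-in)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_information(info_fn, sd, lo=-6.0, hi=6.0, steps=1201):
    """Approximate the mean of info_fn(theta) for theta ~ Normal(0, sd) on a grid."""
    dist = NormalDist(0.0, sd)
    step = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        theta = lo + i * step
        total += info_fn(theta) * dist.pdf(theta) * step
    return total

def central(theta):
    # Hypothetical item located at the middle of the trait range
    return info_2pl(theta, a=2.0, b=0.0)

def extreme(theta):
    # Hypothetical item located at a high trait level
    return info_2pl(theta, a=2.0, b=2.5)

central_sd1 = expected_information(central, sd=1.0)
central_sd15 = expected_information(central, sd=1.5)
extreme_sd1 = expected_information(extreme, sd=1.0)
extreme_sd15 = expected_information(extreme, sd=1.5)
print(central_sd1, central_sd15, extreme_sd1, extreme_sd15)
```

Under the wider distribution, items located far from the mean receive more weight, so using both criteria helps a short form retain precision for respondents at the extremes as well as in the middle of the range.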

Table 2 Calibrated HEAL short form items

The short forms’ internal consistency alpha coefficients were 0.92 for Healthcare Environment and Positive Outlook, 0.96 for Patient–Provider Connection and Treatment Expectancy, and 0.97 for Spirituality. The correlations between the theta scores derived from the short forms and their corresponding full item banks were high, ranging from 0.93 to 0.97, indicating that the short form scores are highly consistent with those of the full banks.
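For readers unfamiliar with the internal consistency coefficient, the sketch below computes Cronbach's alpha from a respondent-by-item score matrix using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The response data are hypothetical and serve only to show the calculation; they are not HEAL data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses of five patients to a six-item short form (1-5 scale)
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 3],
    [1, 2, 1, 1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha approaches 1 when items rise and fall together across respondents, which is the pattern expected of a short form drawn from a unidimensional item bank.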

Discussion

Nonspecific or contextual aspects of healthcare treatments are complex and extensive and contribute to patients’ experience of care. Nonspecific factors may moderate and mediate treatment outcomes, and thus, despite the challenges, are important to assess in medical research. Barriers to such assessments include patient and clinician burden and a lack of precise measures that are relevant across a broad range of treatments and health conditions. Many existing measures of patient engagement and other relevant factors are lengthy, specific to a particular setting, or are not informative across the full range of the construct. We developed the HEAL item banks to overcome these barriers and provide researchers and clinicians with precise tools for assessing contextual factors that influence treatment outcomes and patients’ care experiences.

The HEAL banks of Patient–Provider Connection (57 items), Healthcare Environment (25 items), Treatment Expectancy (27 items), Positive Outlook (27 items), and Spirituality (26 items) demonstrated adequate unidimensionality, high item and test information, and initial construct validity. The items of our original model domains of Locus of Control and Self-Efficacy were not found to be separate factors distinct from Positive Outlook, and thus, these items are contained within the Positive Outlook item bank. The domain Attitudes toward Health and Wellness originally included many items describing health attitudes ranging from views of exercise to opinions regarding the role of advanced technology in health care. Due to this diversity, upon psychometric testing very few of the items in this domain formed a unidimensional set of items. This set of items concerned Attitudes toward CAM and was redefined as a short six-item form. We were able to derive short forms for the other five HEAL banks that provide brief, efficient measures of nonspecific moderating and mediating factors that may contribute to healing outcomes. Short forms may be especially useful in the clinic to assess the contribution of nonspecific factors to patients’ experiences and outcomes.

HEAL CATs and associated short forms fill an important gap in the measurement of nonspecific factors that may affect treatment outcomes. The HEAL are unique in that they provide precise information on potentially influential interpersonal and attitudinal factors that cut broadly across medical conditions and treatments. Because they are administered as CATs or short forms, using HEAL does not unduly burden patients. Investigators and clinicians can choose to use whichever HEAL instruments are most relevant and appropriate for their purposes. In research, the use of HEAL CATs may be a step toward dismantling the ‘black box’ of the placebo effect. In clinical applications, HEAL may inform clinicians regarding factors that can enhance health outcomes and improve the experience of care [34].

Future directions

One goal is to provide further validation of the HEAL item banks among patients who are initiating a treatment that involves ongoing meetings with a healthcare provider. Our initial validation work in the present sample provides useful clues to predictive validity of the HEAL item banks, yet is limited by the fact that treatments were of differing lengths and intensity, and clinical details were limited. We are currently validating the HEAL CATs and short forms in patients with chronic back or neck pain receiving new treatments. A related goal is to determine, through both qualitative and quantitative methods, the clinical utility of the HEAL for improving patient–provider engagement and clinical outcomes in an expanded group of patients. Future work to disseminate and implement HEAL as well as other PROMIS measures will involve their systematic inclusion in large behavioral and pharmacologic RCTs.