Introduction

Improving equitable access to healthcare is a priority for many institutions across the United States (US), yet well-documented disparities in treatment and outcomes persist in the US population. More than 37 million Americans have been diagnosed with diabetes, accounting for over $327 billion in annual healthcare costs [1, 2]. In addition, patients with type 2 diabetes mellitus (T2DM) experience over 7.7 million hospital admissions per year, and approximately 1 in 5 hospitalizations involves a patient with T2DM [3]. These admissions are due to complications driven by unmet health-related social needs, among them an inadequate built environment, economic instability, lack of community/social supports [4], and limited access to medical care. In a recent systematic review, financial constraints and limited access to health services and management were highlighted as two key barriers to T2DM care and management [5]. Furthermore, US studies of healthcare access have identified disparities for several subgroups, including individuals of Hispanic/Latino ethnicity [6,7,8,9,10], Black/African American individuals [9,10,11], and women [12]. In many cases, social determinants of health, including education and socioeconomic status, have been associated with these disparities [4, 13, 14].

Healthcare access has been defined by the Institute of Medicine as “the timely use of personal health services to achieve the best health outcomes” [15] and is broadly thought to encompass healthcare coverage (i.e., insurance), healthcare services (screening, treatment, prevention, and usual care), and timeliness (the ability to obtain services once a need is identified) [16]. Healthcare access has also been described in terms of the physical availability, affordability, and acceptability of healthcare services [17]. There is currently no consensus approach to evaluating healthcare access, and current approaches therefore vary. Many national survey studies use a series of self-report proxy questions to determine health insurance status, whether there was a recent time when the participant was unable to get healthcare they believed they needed, and/or recent emergency room use; yet these self-report questions are not standardized and vary across studies, yielding different results [18]. There are also concerns about whether these proxy questions adequately reflect the healthcare access concerns of patients themselves [19]. Another approach is to use statistical analyses of healthcare utilization or medication data (from hospital records or medical claims) alongside demographic variables (e.g., race/ethnicity, age, insurance) to determine whether rates of utilization, or outcomes such as morbidity and mortality, differ across groups [20]. This approach can yield robust quantitative information, but it still does not directly capture patients’ own reports about healthcare access. The same limitation applies to other analytical approaches, such as Geographic Information System (GIS) mapping and related spatial analytic techniques that create, manage, analyze, and map data on health services [21]; these, too, are devoid of patient-reported perceptions about healthcare access.

In addition to these commonly used but widely varying approaches, there are a handful of standardized and validated self-report surveys that examine perceived access to healthcare. For example, the Perceptions of Access to Health Care Services Questionnaire is a 31-item measure that assesses accessibility, affordability, accommodation, acceptability, and awareness. While the total score on this questionnaire meets established recommendations for internal consistency reliability, three of its six subscales fall below the level considered acceptable (i.e., Cronbach’s alpha < 0.70) [22]. Other limitations of this measure include the following: 1) its development was informed by experts (i.e., professional providers) rather than by key stakeholders (i.e., patients); and 2) performance validity, convergent/discriminant validity, known-groups validity, and predictive validity have not been established [22]. Other measures exist but are limited in scope: they focus on a specific medical condition (e.g., healthcare services for people with tuberculosis [23] or cancer [24]), on a single aspect of access (e.g., access to mental health services [25,26,27], access to the physical and built environment for people with disabilities [28], or geographic accessibility [29]), or on healthcare access within a specific system (e.g., the military healthcare system [30] or the healthcare system of a particular country [31, 32]). Regardless of the approach, there is general consensus that better methods for evaluating healthcare access are needed [18, 20].

To this end, we developed a new measurement system designed to evaluate important social determinants of health that contribute to healthcare disparities. Development of this system, the Re-Engineered Discharge for Diabetes Computer Adaptive Test (REDD-CAT), included a new patient-reported outcome (PRO) item bank that captures the experiences persons with T2DM have with access to healthcare services, including healthcare coverage, actual provision of healthcare services, and timeliness of receiving services. This new PRO was developed as an extension to the Patient-Reported Outcomes Measurement Information System (PROMIS) [33], a measurement system designed to assess concepts characteristic of health-related quality of life and appropriate for use in individuals with chronic diseases and conditions. PROMIS and its complementary measures offer several advantages over other systems because their development has involved both classical test theory and item response theory-based analytical approaches [34, 35]. Combining these approaches allows item banks to be administered as a long form (all items in the bank are administered), a static short form (SF; a fixed set of items, typically selected to span varying levels of endorsement difficulty), or a computerized adaptive test (CAT; a “smart” test that administers a standard first item and then selects each subsequent item based on the responses to previously administered items [36,37,38,39,40,41,42]). These formats allow sensitive assessment across a broad range of symptomatology with a minimal number of items (typically 4–6), without sacrificing precision [43]. In essence, CATs have the advantage of brevity (i.e., minimal administration burden) and tend to match or exceed the reliability and validity of traditional static measures, even when the number of items administered in each format is identical [44]. Furthermore, PRO measures developed using item response theory yield estimated scores even if only a single item is administered; threats to score validity from missing data (which are inherent to classical test theory-based scores, because these require all items to be completed to generate a score) are therefore considerably less relevant for these measures [45]. We describe the development of this new item bank, the REDD-CAT Healthcare Access Item Bank, below.

Methods

Study participants

This analysis included a total of 225 individuals with T2DM. Data were collected as part of a broader effort to develop the REDD-CAT measurement system, a new suite of patient-reported outcome measures designed to capture important social determinants of health. This effort included the development of a new measure of Healthcare Access (described herein), as well as two additional measures that are also included in this issue of Quality of Life Research (Illness Burden [46] and Medication Adherence [47]). We used three strategies to identify and screen potentially eligible participants at a safety-net urban hospital. First, weekly lists of diabetes outpatients with upcoming appointments were generated from Boston Medical Center’s Clinical Data Warehouse. Second, inpatient census reports were produced via the electronic health record to identify eligible inpatients with T2DM. Third, participants who had previously enrolled in a T2DM research study and had agreed to be contacted for future research opportunities were contacted. Study inclusion criteria were: age 18+, diagnosis of T2DM, ability to communicate in English, willingness to participate, and capacity to consent. Because study participation required, at minimum, a 5th-grade reading level (the recommended level for low health literacy precautions) [48], reading comprehension was assessed using the Wide Range Achievement Test 4th Edition (WRAT4) Reading Subtest [49]. Participants who read the first 10 words correctly on the WRAT4 completed the study assessments independently; otherwise, research assistants helped administer the questionnaire and recorded participants’ responses. All study activities were conducted in accordance with the Boston Medical Center/Boston University Medical Campus Institutional Review Board.

Measures

All measures that were used in this study were either publicly available (i.e., Neuro-QoL and PROMIS) or used with permission from the authors (i.e., HEAL measures and Econ-QOL).

Healthcare access

The Healthcare Access item bank was developed according to established PRO development methodology [50]. This new measure was designed to assess participants’ concerns and experiences accessing healthcare. An iterative process was used to refine the Healthcare Access item pool (i.e., an uncalibrated set of items); this process included expert review, item reading-level assessment, translatability review, cognitive interviews, and a final consensus meeting attended by study team investigators. It was followed by a second iterative process, described below (under Item bank development). Items used one of two 5-point Likert response scales (1 = never, 2 = rarely, 3 = sometimes, 4 = usually, 5 = always; or 1 = not at all, 2 = a little bit, 3 = somewhat, 4 = quite a bit, 5 = very much). The final item bank (i.e., the calibrated set of items) is scored on a T-score metric (mean = 50; SD = 10), with higher scores indicating greater ease in accessing healthcare services. Because T-scores are normalized relative to the calibration sample, Healthcare Access T-scores are normalized relative to a calibration sample of people with T2DM. For reliability and validity analyses, we examined T-scores derived from the full item bank, from computer adaptive testing (CAT; scores were simulated using Firestar Version 1.3.2 [51]), and from a 6-item short form.
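As a minimal sketch of the T-score metric described above, the function below rescales IRT theta estimates so that the calibration sample has mean 50 and SD 10. It assumes the calibration sample's theta mean and SD are known (by default 0 and 1, the usual IRT identification); the function name and arguments are illustrative and not taken from any scoring software.

```python
def to_t_scores(thetas, calib_mean=0.0, calib_sd=1.0):
    """Rescale theta estimates to the T-score metric (mean 50, SD 10).

    T-scores are relative to the calibration sample (here, people with T2DM),
    so a T-score of 40 lies one SD below that sample's mean.
    """
    return [50.0 + 10.0 * (theta - calib_mean) / calib_sd for theta in thetas]


# Hypothetical theta estimates for three respondents.
print(to_t_scores([-1.0, 0.0, 1.5]))  # [40.0, 50.0, 65.0]
```

Because higher Healthcare Access scores indicate greater ease of access, low T-scores are the ones that flag potential access problems.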

Mental health measures

Participants completed two measures from the Neuro-QoL measurement system: Neuro-QoL Depression (perceptions of sadness) and Neuro-QoL Anxiety (perceptions of worry, fear, and hyperarousal) [52,53,54]. Both measures were administered as computer adaptive tests; response options are on a Likert scale (1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = always). Resulting scores are based on a T-score metric (mean = 50; SD = 10) normalized relative to a calibration sample of people with neurological conditions, with higher scores indicating more depression and more anxiety, respectively. These measures were used to examine the discriminant validity of the new Healthcare Access item bank. Studies in the general population and in diverse clinical samples (e.g., adults with epilepsy, multiple sclerosis, and Huntington disease) support the reliability of the Neuro-QoL Depression and Anxiety item banks [54,55,56,57,58,59,60]. Across these studies, reliability is supported by excellent internal consistency (all Cronbach’s alphas ≥ 0.91, regardless of the sample examined) and by adequate to excellent test–retest reliability (ICCs from 0.68 to 0.82 for 7-day test–retest in the epilepsy, Parkinson’s disease, and multiple sclerosis cohorts, and 0.95 for 3-day test–retest in the Huntington disease cohort). Convergent validity was supported across all cohorts by moderate to large correlations with other measures of mental health and/or emotional well-being (all correlations ≥ 0.60 across measures and cohorts), and discriminant validity was supported by lower correlations between Anxiety or Depression and measures of social health or physical functioning. These measures also differentiated among people with epilepsy with differing levels of disease severity, and between people with multiple sclerosis reporting better versus worse global health. There are also data supporting responsiveness in Huntington disease and epilepsy.

Substance use/abuse measures

Participants also completed two PROMIS measures: PROMIS Alcohol Use [61] (alcohol use and the consequences and expectancies of drinking) and PROMIS Severity of Substance Use [62] (past 30 days). Resulting scores are based on a T-score metric (mean = 50; SD = 10) normalized relative to a calibration sample of people from the general population, with higher scores indicating more alcohol use and more severe substance use, respectively. These measures were used to examine the discriminant validity of the new Healthcare Access measure. Their reliability and validity have been supported by research both in the general population and among participants in community addiction treatment programs [62,63,64]: internal consistency reliability was excellent (all Cronbach’s alpha > 0.93), convergent validity was supported by moderate correlations with other alcohol use screening measures (all r > 0.55), and responsiveness was supported by modest improvements in scores.

Healing encounters and attitudes lists (HEAL) measures [65]

Two measures from the HEAL measurement system were also administered to participants: the 7-item Patient–Provider Connection SF (trust in and satisfaction with one’s healthcare provider) and the 6-item Health Care Environment SF (perceptions that staff are respectful and that the healthcare environment is comfortable). Responses for these measures are on a 5-point Likert scale (1 = not at all, 2 = a little bit, 3 = somewhat, 4 = quite a bit, 5 = very much). Resulting scores are based on a T-score metric (mean = 50; SD = 10) normalized relative to a calibration sample of people from the general population, with higher scores indicating more positive healing encounters. These measures were used to examine the convergent validity of the new Healthcare Access item bank. Initial development work supported the reliability and validity of the HEAL measures in the general population and in people with chronic conditions [65]. Specifically, internal consistency reliability was excellent for the Patient–Provider Connection and Health Care Environment measures (Cronbach’s alpha = 0.96 and 0.92, respectively), and both measures showed moderate correlations with a measure of outpatient clinical care (r = 0.38 and 0.39, respectively), providing preliminary support for validity.

Economic quality of life (Econ-QOL) short form [66]

The 8-item Econ-QOL short form was administered to evaluate perceptions of economic and financial health-related quality of life (HRQOL). Responses for this measure are on a 5-point Likert scale (1 = never, 2 = rarely, 3 = sometimes, 4 = usually, 5 = always). This measure is scored on a T-score metric (M = 50; SD = 10), with higher scores reflecting worse economic quality of life. We compared patients with “better” economic quality of life, i.e., scores at least one standard deviation below the mean (≤ 40), with those with “worse” economic quality of life, i.e., scores at least one standard deviation above the mean (≥ 60). The reliability and validity of the Econ-QOL measure are supported by measurement development papers in individuals with disabilities (e.g., traumatic brain injury, spinal cord injury, and stroke) [67,68,69], as well as in caregivers of persons with traumatic brain injury [70]: internal consistency reliability is excellent (all Cronbach’s alpha > 0.91), correlations with self-reported income are moderate (r > 0.46), and scores discriminate between those above versus below (or possibly below) the poverty line.

Hospital readmissions

Medical record data were used to identify each participant’s inpatient admissions during the previous 6-month period. Individuals with two or more inpatient admissions in the previous 6 months were considered “high risk” for readmission; individuals with zero or one inpatient admission in that period were considered “low risk” for readmission.

Data collection

All self-report data were collected using REDCap [71, 72]. Assessments were completed on a personal or study-owned mobile device or desktop/laptop computer, or by phone interview with study staff. Specifically, 200 participants completed the survey using a study device, one completed it via telephone with assistance from a study research assistant, and three completed it on a home device via a study survey link sent by email. The remaining 21 participants used more than one method. Fifteen of them began the survey in person at Boston Medical Center on a study device and finished it by phone, with staff administering the questions verbally and recording participant responses directly in REDCap (this group includes, but is not limited to, participants who received reading assistance from staff after failing the WRAT4). Three participants completed the survey partially in person on a study device and partially at home using the emailed study link. Three additional participants began the survey in person on a study device, continued it by phone with staff, and finished it using the emailed study link.

Statistical analyses

Item bank development: qualitative analyses

The item bank development process was conducted according to published measurement development standards [73] and is detailed in Supplementary File 2. The item pool was informed both by the concerns raised in previous qualitative studies and by a literature review that identified an existing measure specific to the military healthcare system: the TBI-CareQOL Military Frustration measure [30]. The final Healthcare Access item pool included items adapted from this measure [30], as well as newly drafted items that reflected the specific concerns raised in those qualitative studies. As noted previously, the item pool was revised using an iterative process that included expert review, cognitive interviews, reading-level assessment, and translatability review (to ensure future acceptability for adaptation into Spanish and other languages).

Item bank development: quantitative analyses

Following the literature review and qualitative analysis of interview data, classical test theory (CTT) and item response theory (IRT) analytic approaches were used to develop the calibrated item bank. A detailed summary of our analytical approaches can be found in Supplementary File 1; in brief, our quantitative analyses were as follows. We identified a unidimensional set of items using full-sample exploratory and confirmatory factor analyses (EFA, CFA), in conjunction with clinical input [81,82,83]; these analyses were conducted using Mplus (version 7.4) [84]. For EFA, we considered the item set to have unidimensional characteristics if the ratio of eigenvalue 1 to eigenvalue 2 was ≥ 4 and the proportion of variance accounted for by eigenvalue 1 was ≥ 0.40. We excluded items with sparse cells (response categories with n < 5 respondents), items with low item-adjusted total score correlations (< 0.40), and items that were non-monotonic (monotonicity was examined with non-parametric IRT item-rest plots and expected-score-by-latent-trait plots in Testgraf software [85]). For CFA, we considered an item set to be unidimensional if the comparative fit index (CFI) was ≥ 0.90, the Tucker–Lewis index (TLI) was ≥ 0.90, and the root mean square error of approximation (RMSEA) was < 0.10 [44, 82, 86,87,88,89,90]. For comparative fit purposes, we also obtained the model fit chi-square value and its associated p value. We deleted items with low factor loadings (λ < 0.50) and items that were locally dependent (i.e., residual correlation > 0.20; correlated error modification index ≥ 100) [81,82,83,91,92,93,94,95]. When CFA overall model fit criteria were not fully met, we conducted confirmatory bi-factor analyses (CBFA) [83, 96] to obtain comparators to traditional fit analyses. CBFA can be used to assess whether the data are “unidimensional enough” to fit a unidimensional measurement model [97].
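The EFA eigenvalue rule and the item-screening rules above can be illustrated with a small sketch. Two simplifying assumptions should be kept in mind: the eigenvalues here come from a Pearson correlation matrix, whereas Mplus analyses of ordinal items typically use polychoric correlations, and the Testgraf monotonicity check is not reproduced. The function names and the simulated data are hypothetical.

```python
import numpy as np


def efa_eigen_screen(responses, ratio_min=4.0, prop_min=0.40):
    """Eigenvalue screen: ratio of eigenvalues 1:2 and share of eigenvalue 1.

    Simplification: Pearson correlations; ordinal items are usually analyzed
    with polychoric correlations.
    """
    corr = np.corrcoef(np.asarray(responses, dtype=float), rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first
    ratio = eig[0] / eig[1]
    share = eig[0] / eig.sum()  # eig.sum() equals the number of items
    return ratio, share, bool(ratio >= ratio_min and share >= prop_min)


def screen_items(responses, n_categories=5, min_cell=5, min_r=0.40):
    """Flag items with sparse categories (n < min_cell) or low item-rest correlations."""
    x = np.asarray(responses, dtype=int)
    total = x.sum(axis=1)
    flags = {}
    for j in range(x.shape[1]):
        counts = np.bincount(x[:, j], minlength=n_categories + 1)[1:]  # categories 1..K
        r_rest = np.corrcoef(x[:, j], total - x[:, j])[0, 1]  # item-adjusted total r
        reasons = []
        if counts.min() < min_cell:
            reasons.append("sparse cell")
        if r_rest < min_r:
            reasons.append("low item-rest r")
        if reasons:
            flags[j] = reasons
    return flags


# Hypothetical data: 8 items driven by one trait, one pure-noise item (index 8),
# and one item that never falls below category 4 (index 9).
rng = np.random.default_rng(0)
n = 225
theta = rng.normal(size=n)
good = np.clip(np.rint(3 + 1.2 * theta[:, None] + 0.6 * rng.normal(size=(n, 8))), 1, 5)
noise = rng.integers(1, 6, size=(n, 1))
skewed = np.where(theta > 0, 5, 4)[:, None]
data = np.hstack([good, noise, skewed]).astype(int)

ratio, share, unidimensional = efa_eigen_screen(data[:, :8])
flags = screen_items(data)
print(round(ratio, 2), round(share, 2), unidimensional)
print(flags)
```

In this toy setup the noise item is flagged for a low item-rest correlation and the skewed item for empty (sparse) categories, while the eight trait-driven items pass both screens.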
CBFA provides a set of indices to assess factor strength, including omega, omega-hierarchical (omega-H), and explained common variance (ECV). For our purposes, omega-H provides a dimensionality index: a general-factor omega-H value > 0.80 has been recommended as the threshold for establishing a measure’s essential unidimensionality [98].
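For readers less familiar with these indices, the sketch below computes omega, omega-H, and ECV from standardized bifactor loadings using the usual formulas for orthogonal factors. It assumes each item loads on the general factor and on exactly one group factor; the loadings in the example are hypothetical, not estimates from this study.

```python
from collections import defaultdict


def bifactor_indices(general, specific, groups):
    """Return (omega, omega_h, ecv) from standardized bifactor loadings.

    general[j]:  item j's loading on the general factor
    specific[j]: item j's loading on its group (specific) factor
    groups[j]:   label of the group factor that item j loads on
    """
    sum_general = sum(general)
    group_sums = defaultdict(float)
    for loading, group in zip(specific, groups):
        group_sums[group] += loading
    # Unique (residual) variance of each standardized item.
    unique = sum(1.0 - g**2 - s**2 for g, s in zip(general, specific))
    specific_part = sum(total**2 for total in group_sums.values())
    denominator = sum_general**2 + specific_part + unique
    omega = (sum_general**2 + specific_part) / denominator
    omega_h = sum_general**2 / denominator  # share of total variance due to the general factor
    g2 = sum(g**2 for g in general)
    ecv = g2 / (g2 + sum(s**2 for s in specific))
    return omega, omega_h, ecv


# Hypothetical loadings: 6 items, general loading 0.7, two group factors with loadings 0.3.
omega, omega_h, ecv = bifactor_indices([0.7] * 6, [0.3] * 6, ["A", "A", "A", "B", "B", "B"])
print(f"omega={omega:.3f}, omega_h={omega_h:.3f}, ecv={ecv:.3f}")  # omega=0.884, omega_h=0.810, ecv=0.845
```

With these made-up loadings, omega-H just clears the 0.80 threshold, which is the pattern one would read as “essentially unidimensional.”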

Next, a constrained graded response model (GRM), i.e., a common-slope IRT model that is appropriate when sample sizes are less than N = 500 [99], was used to estimate item parameters. We excluded items with significant misfit (S-X² / df effect size > 3) [100,101,102,103]. We also excluded items with impactful differential item functioning (DIF), defined as (1) a statistically significant (p < 0.01) group-specific item parameter difference with a weighted area between curves (wABC) effect size > 0.30 [104] for any DIF candidate item tested, plus (2) more than 2% of respondents with DIF-corrected versus uncorrected score differences exceeding their individual uncorrected score standard errors. DIF analyses were conducted for factors theorized to be potentially biasing for which n ≈ 100 participants per subgroup were available [105]: age (< 60 vs. ≥ 60 years), sex (male vs. female), education (≤ high school vs. > high school), and socioeconomic status (pay rent/mortgage: never/rarely/sometimes vs. usually/always; pay bills on time: never/rarely/sometimes vs. usually/always). DIF analyses were conducted in IRTPRO (version 3.1.2) [106] using iterative Wald-2 testing, a process that establishes a set of DIF-free anchor items against which candidate items can be examined for DIF [107, 108]. In Wald-2 Step 1, we identified a DIF-free set of anchor items; in Step 2, we tested any identified candidate items for DIF. Subgroup-specific parameters were estimated for each candidate DIF item using the constrained GRM. These parameters were then compared across subgroups (total DIF, slope-related DIF, and threshold-related DIF) to identify statistically significant parameter differences with non-trivial effect sizes.
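The two-part rule for impactful DIF can be written as a simple decision function. This is a sketch of the decision logic only: it assumes the p value and wABC effect size come from the DIF software output (IRTPRO in this study) and that both criteria must hold, reading “plus” as a conjunction. The function name and example scores are hypothetical.

```python
def dif_is_impactful(p_value, wabc, uncorrected, corrected, se_uncorrected,
                     alpha=0.01, wabc_min=0.30, max_share=0.02):
    """Apply the two-part DIF impact rule; returns (impactful, share_exceeding).

    (1) significant group difference (p < alpha) with wABC > wabc_min, and
    (2) more than max_share of respondents whose DIF-corrected score differs
        from their uncorrected score by more than that score's standard error.
    """
    flagged = p_value < alpha and wabc > wabc_min
    n_exceed = sum(abs(c - u) > se for u, c, se in zip(uncorrected, corrected, se_uncorrected))
    share = n_exceed / len(uncorrected)
    return flagged and share > max_share, share


# Hypothetical T-scores: 3 of 100 respondents shift by more than one SE after DIF correction.
uncorrected = [50.0] * 100
se = [3.0] * 100
corrected = [54.0] * 3 + [50.5] * 97
print(dif_is_impactful(0.004, 0.35, uncorrected, corrected, se))  # (True, 0.03)
```

Under this rule, a statistically significant DIF effect that barely moves anyone's score (share ≤ 2%) would not be considered impactful.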
The suite of IRT-based analyses was followed by a final CFA analysis designed to confirm that the final item set was essentially unidimensional (using the same item-level and overall model fit criteria outlined above).
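The overall CFA fit criteria (CFI ≥ 0.90, TLI ≥ 0.90, RMSEA < 0.10) can be checked from the model and baseline chi-square statistics using the standard formulas sketched below. Software reports these indices directly (and may use N rather than N − 1 in RMSEA, or adjusted chi-square values for ordinal estimators), so this is only an illustration; the chi-square values in the example are hypothetical, not this study's results.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (this version uses n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index."""
    return (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)


def cfa_fit_ok(chi2_m, df_m, chi2_b, df_b, n):
    """Overall-fit criteria: CFI >= 0.90, TLI >= 0.90, RMSEA < 0.10."""
    return (cfi(chi2_m, df_m, chi2_b, df_b) >= 0.90
            and tli(chi2_m, df_m, chi2_b, df_b) >= 0.90
            and rmsea(chi2_m, df_m, n) < 0.10)


# Hypothetical chi-square values (model vs. baseline), N = 225.
print(cfa_fit_ok(1500, 902, 30000, 946, 225))  # True
print(cfa_fit_ok(6000, 902, 30000, 946, 225))  # False
```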

Calibration parameters (i.e., slope and threshold estimates) from the GRM analyses were used to program computer adaptive test (CAT) administration of the final item bank. For a more rigorous and realistic assessment of CAT performance, we simulated item responses for N = 2000 cases drawn from a clinical population (i.e., with a mean one SD in the direction of worse health status). CAT administration parameters (e.g., number of items to administer, targeted score reliability level) were optimized to balance response burden and score precision. In addition, a 6-item short form (SF) was constructed using clinician input and item-level statistics, including item score-level information values. SF items were purposefully selected to represent the full range of concept coverage while also referencing item calibration and calibration-related statistics (e.g., item slope, thresholds, average item difficulty, and item information). CAT scores were simulated using Firestar software [109].
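To make the CAT logic concrete, the following is a minimal sketch of a simulated CAT under a common-slope GRM. It is not Firestar's implementation. The assumptions are: expected a posteriori (EAP) scoring on a grid with a standard-normal prior, maximum-information item selection (so every simulee starts with the same first item), and the stopping rule used in this study (at least 4 items, at most 12, stop once reliability reaches 0.85, i.e., SE ≤ √0.15 on the theta metric). The 20-item bank is hypothetical; only the common slope of 2.08 is taken from the Results, and the thresholds are made up.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 161)   # quadrature points for theta
PRIOR = np.exp(-0.5 * GRID**2)       # standard-normal prior (unnormalized)


def boundary_curves(theta, a, b):
    """GRM P(X >= k | theta) for k = 0..K (K categories, K - 1 thresholds)."""
    theta = np.atleast_1d(theta)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    n = theta.size
    return np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])


def category_probs(theta, a, b):
    cum = boundary_curves(theta, a, b)
    return cum[:, :-1] - cum[:, 1:]


def item_information(theta, a, b):
    """Samejima's GRM item information: sum over categories of P_k'^2 / P_k."""
    cum = boundary_curves(theta, a, b)
    deriv = a * cum * (1.0 - cum)    # slope of each boundary curve
    probs = cum[:, :-1] - cum[:, 1:]
    return ((deriv[:, :-1] - deriv[:, 1:]) ** 2 / np.clip(probs, 1e-12, None)).sum(axis=1)


def eap(responses, items, a):
    """Posterior mean and SD of theta given (item, category) pairs."""
    post = PRIOR.copy()
    for j, x in responses:
        post *= category_probs(GRID, a, items[j])[:, x]
    post /= post.sum()
    mean = float(GRID @ post)
    return mean, float(np.sqrt(((GRID - mean) ** 2) @ post))


def simulate_cat(true_theta, items, a, rng, min_items=4, max_items=12, target_rel=0.85):
    target_se = np.sqrt(1.0 - target_rel)   # about 0.387 on the theta metric
    used, responses, theta_hat = set(), [], 0.0
    while True:
        # Maximum-information item at the current estimate (theta_hat = 0 at the start).
        candidates = [j for j in range(len(items)) if j not in used]
        j = max(candidates, key=lambda k: item_information(theta_hat, a, items[k])[0])
        used.add(j)
        p = category_probs(true_theta, a, items[j])[0]
        responses.append((j, rng.choice(p.size, p=p)))   # simulated response category
        theta_hat, se = eap(responses, items, a)
        n = len(responses)
        if n >= max_items or (n >= min_items and se <= target_se):
            return theta_hat, se, n


# Hypothetical 20-item bank: common slope 2.08, made-up thresholds from about -3.5 to +0.1.
A = 2.08
ITEMS = [loc + np.array([-1.0, -0.4, 0.2, 0.6]) for loc in np.linspace(-2.5, -0.5, 20)]
rng = np.random.default_rng(1)
true_thetas = rng.normal(-1.0, 1.0, size=200)   # clinical population: mean 1 SD lower
results = [simulate_cat(t, ITEMS, A, rng) for t in true_thetas]
lengths = np.array([n for _, _, n in results])
print("mean items:", lengths.mean(), "share at max:", (lengths == 12).mean())
```

Because the made-up thresholds are mostly negative (as in the reported bank), information is concentrated at low theta, so short CATs are expected there and longer ones at higher theta.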

Preliminary reliability and validity analyses

Healthcare Access score data were normally distributed and appropriate for parametric analyses. Internal consistency reliability was examined using Cronbach’s alpha for full bank and SF scores and an IRT-based estimate [110] for the simulated CAT scores (a priori criterion for an acceptable reliability level specified as ≥ 0.70 [111]). The percentages of participants who had the highest possible and lowest possible scores for the full bank and the newly developed SF were obtained to establish potential ceiling and floor effects, respectively. We divided the raw CAT item response score by the number of items administered in order to examine floor and ceiling effects for the CAT (i.e., a quotient score of “1” was considered a “floor effect” and a quotient score of “5” a “ceiling effect”). A priori criteria for acceptable floor and ceiling effects were specified as ≤ 20% [112, 113].
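The reliability and floor/ceiling computations above can be sketched as follows. Cronbach's alpha uses the standard formula. For the CAT, the exact IRT-based estimator of [110] is not reproduced; 1 − mean(SE²), with theta variance fixed at 1, is shown only as one common approximation. The floor/ceiling function applies the raw-sum-divided-by-items quotient described above. All names and data are hypothetical.

```python
import numpy as np


def cronbach_alpha(responses):
    """Cronbach's alpha for an (n_respondents, n_items) matrix of complete responses."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)


def irt_reliability(standard_errors):
    """One common IRT-based approximation: 1 - mean(SE^2), theta variance fixed at 1."""
    se = np.asarray(standard_errors, dtype=float)
    return 1.0 - np.mean(se**2)


def floor_ceiling(raw_sums, n_items, lowest=1, highest=5):
    """Percent at floor and ceiling, using raw sum / number of items answered."""
    quotient = np.asarray(raw_sums, dtype=float) / np.asarray(n_items, dtype=float)
    return 100 * np.mean(quotient == lowest), 100 * np.mean(quotient == highest)


# Hypothetical CAT records: raw summed scores and number of items administered.
floor_pct, ceiling_pct = floor_ceiling([4, 20, 30, 60], [4, 4, 12, 12])
print(floor_pct, ceiling_pct)              # 25.0 50.0
print(ceiling_pct <= 20)                   # False: exceeds the 20% criterion
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

For the full bank and the SF, the same function applies with a fixed number of items per respondent.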

Convergent and discriminant validity of the Healthcare Access item bank were examined using Pearson correlations. Convergent validity would be supported by moderate to strong correlations (“moderate” = r of 0.36–0.67; “high” = r of 0.68–0.89) between Healthcare Access and community factors related to healthcare (i.e., HEAL Health Care Environment and HEAL Patient–Provider Connection) [114]. Discriminant validity would be supported by weak correlations (“low” = r ≤ 0.35) between Healthcare Access and mental health and substance use measures (i.e., Neuro-QoL Depression, Neuro-QoL Anxiety, PROMIS Alcohol Use, and PROMIS Severity of Substance Use) [114].

Known-groups validity was examined using independent-samples t-tests to compare 1) those at high risk for readmission (i.e., ≥ 2 inpatient admissions in the past six months) versus those at low risk (zero or one inpatient admission in the past six months) and 2) those with “worse” economic quality of life (Econ-QOL scores ≥ 60) versus those with “better” economic quality of life (Econ-QOL scores ≤ 40). Known-groups validity would be supported if (1) those at high risk reported worse healthcare access than those at low risk for readmission and (2) those with “worse” economic quality of life reported worse healthcare access than those with “better” economic quality of life. Finally, we expected more than 16% of participants at high risk for readmission or with “worse” economic quality of life to have Healthcare Access scores ≥ 1 SD below the mean (i.e., T-scores ≤ 40), since roughly 16% of a normal distribution falls at least 1 SD below the mean [115].
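The grouping rules, the pooled-variance t statistic, and the 16% benchmark can be sketched with the standard library. The p value is not computed here; it would come from the t distribution with n1 + n2 − 2 degrees of freedom (e.g., `scipy.stats.ttest_ind` with its default `equal_var=True`). The helper names and group scores are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Independent-samples (pooled-variance) t statistic and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def readmission_risk(admissions_past_6_months):
    return "high" if admissions_past_6_months >= 2 else "low"


def econ_group(econ_qol_t):
    """Econ-QOL: higher = worse. Scores strictly between 40 and 60 are not compared."""
    if econ_qol_t >= 60:
        return "worse"
    if econ_qol_t <= 40:
        return "better"
    return None


def share_at_least_1sd_below(t_scores, mean_t=50.0, sd_t=10.0):
    return sum(s <= mean_t - sd_t for s in t_scores) / len(t_scores)


# Share of a normal distribution at or below -1 SD: Phi(-1), about 0.159.
EXPECTED_SHARE = 0.5 * (1.0 + math.erf(-1.0 / math.sqrt(2.0)))

# Hypothetical Healthcare Access T-scores.
high_risk = [38.0, 44.0, 41.0, 52.0]
low_risk = [49.0, 55.0, 51.0, 47.0, 58.0]
t, df = pooled_t(high_risk, low_risk)
print(round(t, 2), df, share_at_least_1sd_below(high_risk) > EXPECTED_SHARE)
```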

Sample size requirements

Sample size specifications were informed by the need to ensure stable parameter calibration for the constrained GRM and for the Wald-2 DIF analyses used in the item bank development process. Published guidelines indicate that a constrained GRM is appropriate for sample sizes less than N = 500 [99] and that sample sizes of at least N = 200 are recommended for stable constrained GRM parameter estimation [99, 116]. Published guidelines also indicate that DIF analyses using the iterative Wald-2 method are appropriate for subgroup sizes of approximately 100 or more participants [117].

Results

Study participants

Two hundred and twenty-five persons with T2DM participated in this study. Data come from a larger study focused on developing new patient-reported outcome measures that capture important social determinants of health, including the Illness Burden item bank [46] and the Medication Adherence item bank [47], which are also published in this issue of Quality of Life Research. Descriptive information is provided in Table 1. Briefly, participants were, on average, 57.7 years of age (SD = 11); 52% were female, the sample was predominantly Black/African American (75%), and 83% were non-Hispanic/Latino. Just under a quarter (24%) of the sample reported needing help reading materials from the hospital/doctor, and roughly half (47%) reported an annual income of less than $15,000.

Table 1 Descriptive data for study participants

Item bank development

Table 2 outlines the primary findings from the item bank development process. Briefly, EFA supported the unidimensionality of the item pool: the ratio of eigenvalue 1 to eigenvalue 2 was 9.48; eigenvalue 1 accounted for 56.65% of modeled variance, while eigenvalue 2 accounted for only 5.97%. Of the 54 items in the Healthcare Access item pool, one item was eliminated due to a low item-adjusted total score correlation (inclusion criterion r ≥ 0.40), and nine items were eliminated due to high residual correlations (inclusion criterion r ≤ 0.20). Subsequent IRT modeling of the 44 remaining items revealed no item misfit (see Supplemental File 3 for item fit chi-square values, degrees of freedom, p values, and chi-square/degrees of freedom quotients). No items had impactful DIF for any of the investigated DIF factors. A final CFA model of these 44 items indicated good item-level and overall model fit (Table 3). Because the CFA model fit criteria were fully met, it was not necessary to conduct CBFA.

Table 2 Unidimensional modeling and analyses for Healthcare Access Item Pool
Table 3 Final model fit and reliability characteristics for the Healthcare Access Item Bank

The final item calibration estimates are presented in Supplemental File 4. The common slope was 2.08, and thresholds ranged from −3.53 to +0.07 for the full item set. Test information was excellent (i.e., ≥ 10, with corresponding reliabilities ≥ 0.90) for T-scores between approximately 10 and 60 (i.e., from −4 SD to +1 SD; see Fig. 1 for the test information function and standard errors, plotted by theta); marginal reliability was 0.93. With a minimum of 4 items, a maximum of 12 items, and a targeted score-level reliability of 0.85, the CAT tended to administer the fewest items (i.e., 4) from approximately theta = −2.0 to theta = −0.2, and administered the maximum number of items (i.e., 12) at theta ≥ +0.5 (see Fig. 2, which displays the number of items administered to each simulated examinee plotted as a function of theta).
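The link between test information, standard error, and reliability that underlies these results (and the Fig. 1 benchmarks) can be shown in a few lines. This sketch assumes the theta metric has variance 1 in the calibration sample, so reliability = 1 − SE² and SE = 1/√I; the function names are illustrative.

```python
import math


def se_from_information(information):
    """Standard error of a theta estimate: SE = 1 / sqrt(I(theta))."""
    return 1.0 / math.sqrt(information)


def reliability_from_se(se):
    """Score-level reliability on a theta metric with variance 1: 1 - SE^2."""
    return 1.0 - se**2


def information_for_reliability(reliability):
    """Information needed to reach a target reliability: 1 / (1 - reliability)."""
    return 1.0 / (1.0 - reliability)


se10 = se_from_information(10.0)
print(round(se10, 3), round(reliability_from_se(se10), 2))  # 0.316 0.9
print(round(information_for_reliability(0.85), 2))         # 6.67 (CAT reliability target)
print(round(10 * se10, 1))                                  # 3.2 (SE in T-score points)
```

So an information value of 10 corresponds to an SE of about 0.32 theta units (about 3.2 T-score points), and the CAT's 0.85 reliability target corresponds to an information value of about 6.7.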

Fig. 1

Healthcare Access Test Information Plot. In general, we would like total test information per score level to be ≥ 10.0 and the resulting standard error to be ≤ 0.32 (which would provide a score-level reliability of ≥ 0.90). This figure shows excellent total test information (left y axis) and standard errors (right y axis) for Healthcare Access T-scores between approximately 10 and 60 (x axis: theta −4 to +1).

Fig. 2

Simulation data: Healthcare Access number of CAT items by CAT theta. In this figure, the number of items administered to each simulated examinee (individual blue circles) is plotted against theta, with the red curvilinear line showing the fitted function. From approximately −2.0 to −0.2 SD units, the CAT tended to use the minimum of four items from the item bank; at approximately ≥ +0.5 SD units, the maximum of 12 items from the item bank was used. (Color figure online)

A 6-item SF was constructed from the final item bank using calibration-based statistics (e.g., slope, item characteristic curves, item information, and average item difficulty) in conjunction with clinical content-coverage considerations. A look-up table to convert raw (summed) scores to T-scores is available in Supplemental File 5. The reliability of the SF was examined across a measurement continuum from approximately theta = −3.0 (T-score = 20) to theta = +1.0 (T-score = 60). Score-level reliabilities were very good (i.e., ≥ 0.80) for T-scores between 20 and 55, and at least good (i.e., ≥ 0.70) for T-scores between 20 and 59.
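A raw-to-T-score look-up table of the kind provided in Supplemental File 5 is commonly produced with the Lord-Wingersky recursion followed by EAP scoring of each summed score. The sketch below shows that general technique only; it is not the authors' table. It assumes a GRM with the reported common slope of 2.08, six hypothetical items with four thresholds each (so raw scores run from 0 to 24), higher categories indicating better access, and a standard normal prior. `SF_THRESHOLDS` and the function names are hypothetical.

```python
import math

A = 2.08  # common slope reported for the bank

# HYPOTHETICAL 6-item short form: 4 ordered thresholds per item, so each item
# is scored 0-4 and the raw (summed) score runs from 0 to 24.
SF_THRESHOLDS = [
    [-3.4, -2.5, -1.7, -0.8],
    [-3.1, -2.2, -1.4, -0.5],
    [-2.9, -2.0, -1.1, -0.3],
    [-3.5, -2.7, -1.9, -1.0],
    [-2.6, -1.8, -0.9, 0.0],
    [-3.0, -2.1, -1.3, -0.4],
]
GRID = [-5.0 + 0.05 * j for j in range(201)]  # quadrature points for theta


def category_probs(theta, thresholds, a=A):
    """GRM category probabilities for one item."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [star[k] - star[k + 1] for k in range(len(star) - 1)]


def summed_score_likelihoods(theta):
    """Lord-Wingersky recursion: P(raw score = s | theta) for s = 0..24."""
    dist = [1.0]  # before any item is added, raw score 0 has probability 1
    for thresholds in SF_THRESHOLDS:
        probs = category_probs(theta, thresholds)
        new = [0.0] * (len(dist) + len(probs) - 1)
        for s, p_s in enumerate(dist):
            for k, p_k in enumerate(probs):
                new[s + k] += p_s * p_k
        dist = new
    return dist


def lookup_table():
    """Map each raw score to (EAP T-score, posterior SD in T-score units)."""
    prior = [math.exp(-0.5 * t * t) for t in GRID]  # standard normal, unnormalized
    like = [summed_score_likelihoods(t) for t in GRID]
    table = {}
    for s in range(len(like[0])):
        post = [w * row[s] for w, row in zip(prior, like)]
        total = sum(post)
        mean = sum(t * p for t, p in zip(GRID, post)) / total
        sd = math.sqrt(sum((t - mean) ** 2 * p for t, p in zip(GRID, post)) / total)
        table[s] = (round(50 + 10 * mean, 1), round(10 * sd, 1))
    return table


for raw, (t_score, t_se) in lookup_table().items():
    print(raw, t_score, t_se)
```

Summed-score EAP trades a little precision (relative to pattern scoring) for a table that clinicians can apply by hand, which is the usual reason short forms ship with such a table.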

Preliminary reliability and validity analyses

Internal consistency reliability was excellent for both the CAT and SF administrations (Table 4). The different administration formats were generally free of floor and ceiling effects, although there was evidence for a slight ceiling effect for the SF administration (Table 4).

Table 4 Descriptive data for the different Healthcare Access administration formats

Correlations supported convergent and discriminant validity (Table 5). Correlations between Healthcare Access and community factors related to healthcare were moderate, supporting convergent validity. In addition, correlations between Healthcare Access and mental health and substance use/abuse measures were generally low, supporting discriminant validity.

Table 5 Convergent and discriminant validity for the Healthcare Access Item Bank

Known-groups validity was also generally supported (Table 6). Findings for those at high risk for readmission were in the expected direction, but this difference did not reach conventional levels of statistical significance (p = 0.06). As expected, those with “worse” economic quality of life reported significantly worse healthcare access than did those with “better” economic quality of life. Individuals at high risk for readmission, as well as those with “worse” economic quality of life, were also at greater risk of having problems with Healthcare Access (i.e., more than 16% of individuals in those groups had scores ≥ 1 SD below the mean of the general population of individuals with T2DM, where roughly 16% would be expected).

Table 6 Known-groups validity

Discussion

This manuscript describes the development and preliminary psychometric evaluation of a new patient-reported outcome measure, the REDD-CAT Healthcare Access item bank. This measure, designed to capture important patient-reported concerns related to access to healthcare services, can be administered as either a computer adaptive test (CAT) or a 6-item short form (SF). Factor analyses supported the unidimensionality of the measure, and differential item functioning (DIF) analyses indicated that items were free of bias by age, sex, education, and socioeconomic status. This new measure will be publicly available through healthmeasures.net, as well as the PROMIS Application Programming Interface (API), in early 2023.

Preliminary examination of the psychometric properties of this measure indicated that internal consistency reliability was excellent for both administration formats (i.e., CAT and SF). In addition, both formats were free of excessive floor or ceiling effects, although there was a slight ceiling effect for the SF (but not the CAT) administration. While this was not consistent with our a priori specification (we expected floor and ceiling effects to be ≤ 20% and found a ceiling effect of 21.78% for the SF, which slightly exceeded this expectation), this minimal ceiling effect is unlikely to diminish the measure's clinical utility. Because higher scores indicate better access, and we expect the measure to be used primarily as a screening tool to identify individuals with healthcare access problems, it is a floor effect, not a ceiling effect, that would compromise this purpose.
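A minimal sketch of the floor/ceiling check described above, assuming (as is common, though the exact operational definition is not stated here) that floor and ceiling effects are the percentages of respondents at the lowest and highest possible score, compared against the a priori 20% limit. The function name and the ten example scores are hypothetical illustrations, not study data.

```python
def floor_ceiling(scores, min_score, max_score, limit_pct=20.0):
    """Percent of respondents at the lowest/highest possible score vs. a limit."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100.0 * sum(s == max_score for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_ok": floor_pct <= limit_pct,
        "ceiling_ok": ceiling_pct <= limit_pct,
    }


# HYPOTHETICAL SF raw scores (0-24) for 10 respondents; 3 are at the maximum.
example = [24, 24, 24, 18, 15, 22, 9, 20, 17, 12]
print(floor_ceiling(example, 0, 24))
# {'floor_pct': 0.0, 'ceiling_pct': 30.0, 'floor_ok': True, 'ceiling_ok': False}
```

By the same rule, the observed SF ceiling of 21.78% fails the ≤ 20% criterion only narrowly, while the floor (the end that matters for screening) is the value to watch.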

With regard to validity, we found evidence for convergent, discriminant, and known-groups validity. Specifically, the overall pattern of correlations was as expected: Healthcare Access scores were moderately correlated with HEAL Health Care Environment and Patient–Provider Connection scores, supporting convergent validity. These findings are consistent with recent research examining barriers to healthcare access among a diverse sample of publicly insured adults, which identified health insurance coverage, logistical barriers, and patient–provider trust among the most frequently endorsed barriers to access [118]. Healthcare Access full bank scores also showed low correlations with Depression scores and with alcohol/substance use scores, supporting discriminant validity. The correlation with Depression scores (0.366), however, just exceeded the a priori discriminant validity cutoff (< 0.36). Given that this was the only relationship that did not meet its pre-specified criterion, and that more than 75% of our validity analyses were consistent with a priori expectations, Healthcare Access scores still meet established standards for construct validity [119].

In addition to convergent and discriminant validity, known-groups validity was also supported. For example, individuals with worse economic quality of life reported significantly worse healthcare access, and were more likely to have scores indicating healthcare access problems on the new REDD-CAT Healthcare Access measure, than those with better economic quality of life. This finding is consistent with a considerable body of literature documenting a robust relationship between socioeconomic status and healthcare access [4, 13, 14, 118, 120, 121]. In addition, there was a trend for those at high risk for readmission to report worse Healthcare Access than those at low readmission risk. This is consistent with findings showing that individuals who lack access to post-acute care services in the community are more likely to return to the hospital [122]. As hypothesized, individuals with worse economic quality of life, as well as those at high readmission risk, also had elevated rates of healthcare access problems.

This is, to our knowledge, the first comprehensive measurement system to include a patient-reported measure of healthcare access, an important social determinant of health that is related to readmission risk in persons with T2DM. The new REDD-CAT Healthcare Access measure can be administered as a computer adaptive test, a format in which each subsequent item is selected based on the participant’s previous responses, maximizing brevity without sacrificing the precision afforded by administering more items [44]. The measure is scored on a T-score metric (M = 50, SD = 10), with higher scores indicating better perceived healthcare access. This type of standardized score increases the clinical utility of the tool, because obtained scores can be directly compared to the reference group (in this case, other individuals with T2DM). For example, individuals with scores of 40 or less (i.e., ≤ 1 SD below the mean) report more significant barriers to Healthcare Access than 83.9% of the broader T2DM population, and individuals with scores of 30 or less (i.e., ≤ 2 SDs below the mean) report barriers at a level that exceeds 97.9% of their peers. When choosing among administration modalities (long form, SF, or CAT), we encourage users to focus on the SF or the CAT, because the small gain in precision from administering all items in the bank (long form) does not outweigh the added participant burden. When choosing between SF and CAT administration, there are practical factors to consider. For example, CAT administration requires an electronic delivery platform with a live internet connection to run the CAT administration programs that are included as part of the PROMIS API. In addition, these electronic data capture platforms (such as REDCap or healthmeasures.net) may incur additional costs for the researcher or clinician.
Nonetheless, given the slightly better psychometric performance of the REDD-CAT Healthcare Access CAT (i.e., the absence of a ceiling effect), we recommend CAT administration unless practical considerations warrant SF administration.
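The percentile interpretation of T-scores can be checked against a normal reference curve. This sketch assumes T-scores are approximately normally distributed with mean 50 and SD 10 in the T2DM reference population; `pct_above` is a hypothetical helper. Under that assumption, about 84.1% of the reference population scores above T = 40 and about 97.7% above T = 30, close to the 83.9% and 97.9% given in the text; the small differences may reflect the observed score distribution rather than a theoretical normal curve. The complement (about 15.9% below T = 40) is the roughly 16% benchmark used in the known-groups comparisons.

```python
import math


def pct_above(t_score, mean=50.0, sd=10.0):
    """Percent of a normal reference population scoring above t_score."""
    z = (t_score - mean) / sd
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z), as a percent


print(round(pct_above(40), 1), round(pct_above(30), 1))  # 84.1 97.7
print(round(100.0 - pct_above(40), 1))  # 15.9
```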

Bearing in mind the advances offered by this new measure for identifying individuals with T2DM who have significant Healthcare Access concerns, we acknowledge several study limitations. First, while the sample size met minimum requirements for the analyses (i.e., EFA, CFA, constrained GRM, and DIF) that informed the development of this new measure, and while this approach has been successfully applied to PRO measure development in other settings [30, 123], larger samples tend to yield more stable calibration parameter estimates and more robust estimates of DIF; as such, future work that confirms the EFA, CFA, calibration parameters, and DIF analyses in independent samples is needed. Similarly, the preliminary reliability and validity data reported herein should be replicated in an independent sample. The CAT data presented were also based on simulations and therefore need to be replicated in a clinical sample using an actual CAT engine. Furthermore, the study was conducted in an urban safety-net health system, which is likely to treat patients who lack adequate healthcare coverage and who have a high number of unmet social needs relative to patients in other types of healthcare systems [2]. As such, results may not generalize to other types of hospital systems. The study population also consisted mainly of non-white patients with T2DM, which may limit generalizability to white (Caucasian) patients.

In sum, the REDD-CAT Healthcare Access measure provides a brief, reliable, and valid assessment of healthcare access concerns among patients with T2DM. This new measure can aid discharge planning for individuals with T2DM who have recently been hospitalized by screening for those who may be experiencing difficulties within the healthcare system. In addition, although this measure was developed specifically for persons with T2DM, it may also have clinical utility in other medical populations.

Conclusions

The REDD-CAT measurement system is the first comprehensive system designed to assess social determinants of health among persons with T2DM. The new REDD-CAT Healthcare Access measure captures an important social determinant of health, namely, patient experiences with access to healthcare services, including healthcare coverage, actual provision of healthcare services, and timeliness of receiving services. This measure will be available for public use as part of the healthmeasures.net platform. Additional research is needed to elucidate the mediating and moderating effects of perceived healthcare access on outcomes in these individuals. Understanding these relationships will help inform targeted interventions designed to minimize readmission risk and improve patient HRQOL.