INTRODUCTION

Worldwide, the proportion of older individuals is increasing due to rising life expectancy and decreasing fertility rates.1 One consequence of this is an increasing prevalence of chronic diseases. This epidemiological shift has led to a growing recognition that we are not fully meeting needs in an increasingly complex population and has reinforced the notion that health systems must be systematic in meeting key needs to reduce avoidable adverse outcomes.2,3,4,5,6,7 However, a disease-based approach to healthcare resource planning has a tendency to lead to fragmentation of care.8

In this context, population segmentation into categories of related service needs is an important foundation for meeting needs in ways that are both effective and sustainable.7, 9, 10 A planning strategy based on a population segmentation perspective would, for example, facilitate the development of care packages built around sets of needs that are similar within a segment but differ between segments, resulting in a more integrated approach to addressing diverse healthcare needs11 and reducing the risk of over- or under-planning of services.7, 9, 10 For example, the Valcronic integrated care program was found to have led to a reduction in emergency care service use and an increase in patient satisfaction.9 Segmentation may also allow streamlining of programs, such as those involving large multidisciplinary teams, which can be very expensive, especially when they are hospital-based.12, 13

Approaches that have been proposed for population segmentation are often proprietary in nature, complex, and focus on risk predictions for healthcare utilization and adverse outcomes.14 Further, most approaches rely on the electronic medical record (EMR) where key factors known to predict utilization and healthcare needs, such as social support, cannot be obtained reliably.15, 16

The “Bridges to Health” segmentation scheme developed by Lynn et al. is an exemplar of a basic needs–based framework10 aimed at categorizing individuals not by their diseases (e.g., heart failure vs. chronic lung disease) but by the nature of chronic services that would best serve their needs. Specifically, “Bridges to Health” focuses on whether the person has a dominant condition and whether that condition is asymptomatic, symptomatic but stable, or advanced with frequent exacerbations. While the framework has strong clinical face-validity, there are no validated tools to make it operationally useful.

Based on the concepts represented in the “Bridges to Health” framework, we developed the “Simple Segmentation Tool” (SST), a brief instrument used to categorize elderly patients in a clinical setting for purposes of policy planning and evaluation. In order to gain acceptance, the SST needs to be reliable, valid, and easy to use. Thus, the aims of our study are (1) to evaluate the inter-rater reliability of the elements of the SST, both between physician pairs and between physician-nurse pairs, and (2) to assess the validity of the tool in predicting the time to adverse medical outcomes, namely, emergency department (ED) visits, non-elective hospitalizations, and mortality, within 90 days post-assessment.

METHODS

The SST Instrument

The SST16 (Appendix) enables patient population segmentation based on a clinician rater’s Global Impression (GI) of a patient’s medical condition relative to a prototypical set of categories.7, 16 The six GI population segments were adapted from the “Bridges to Health” population segmentation scheme,10 with a focus on older populations (e.g., aged 55 years and above) as well as long-term health states (i.e., requiring chronic services) (Table A1 of Appendix).

In addition to the core GI, the SST includes eight complicating factors (CFs) (Table A2 of Appendix). These factors were chosen through a focus group of four experienced clinicians representing general internal medicine, geriatrics, and family medicine. The group identified factors that could complicate medical management but could be managed through non-medical services, and for which published evidence was available to support the factor as a predictor of adverse medical outcomes.17,18,19,20,21,22,23 As shown in Fig. A1, CFs were rated on three levels of severity, namely, “low,” “moderate,” or “high”; a “low” level corresponded with the absence of a CF altogether, while “moderate” and “high” levels corresponded with varying degrees of intensity deemed indicative of needs for different types of services. The guidelines for triaging patients into GI and CF levels were placed on the reverse side of the SST instrument (see Fig. A2).

Subjects

The Singapore General Hospital (SGH) is the largest tertiary acute care hospital in Singapore with over 81,000 hospital admissions and 128,000 ED attendances per year.24 Subjects were recruited from the ED of SGH between May and June 2016 and the general medical (GM) ward of the Department of Internal Medicine on February 10, 2017 (Fig. 1). ED subjects were eligible if they were Singaporean citizens or permanent residents aged 55 years and above and had not been triaged to the most severe triage urgency category. Subjects were recruited sequentially, and informed consent could be obtained either from the subject or from their legal representatives. The target sample size for ED recruitment was 123, chosen to achieve a predetermined level of inter-rater reliability as measured by Cohen’s kappa.

Figure 1. Flowchart of subjects recruited and available for analysis from the emergency department and General Medical Ward.

GM subjects were eligible based on the same criteria except that there was no restriction on severity of their condition as long as they or their legal representatives could provide consent. All patients on the GM ward were assessed on February 10, 2017. The GM study was designed to be an observational study on all patients; based on historical admission rates, at least 100 patients were expected to consent.

SST Raters

SST raters in the ED consisted of four physicians assigned to provide patient care (service physicians), one independent physician to observe the encounter (observing physician), and two nurses to observe the encounter (see “Procedures” section). The ED raters familiarized themselves with the SST through completion of an online SST tutorial as well as a teaching day during which they had the opportunity to discuss their experience with the SST form and study procedures. Ratings from the teaching days were not included in the analysis. For inter-rater reliability, all ratings were used. For predictive validity, the rating from the service physician was used.

SST raters in the GM ward included 34 physicians. Their training consisted of an SST tutorial where they had the opportunity to clarify SST rating procedures with the study team. In both the ED and GM settings, the provided training modules can typically be completed in less than 15 minutes. For predictive validity, the SST was assessed whenever possible by the physician most directly responsible for the subject’s care.

Procedures

Ethics Approval

Both the ED and GM components of this study were approved by the Singapore SingHealth Centralised Institutional Review Board (CIRB) (CIRB/2016/2005 and CIRB/2016/2629 respectively). Written informed consent to participate and to allow access to the patients’ EMRs was obtained from participants or their legal representatives.

Inter-rater Reliability

A research nurse recruited eligible ED patients in the waiting area. Once the subject was brought to a clinic room to be seen by the service physician, a nurse and an observing physician entered the room and silently observed the interaction between the patient and the service physician. The observing physician and nurse were not involved with management of the patient. Consistent with standard clinical practice, all three raters had access to the subject’s EMR prior to the index ED visit. After the consultation, each rater completed the SST independently without discussion, resulting in three concurrent SST evaluations per patient.

Predictive Validity

Predictive validity was assessed by comparing the SST rating for both ED and GM subjects with our outcomes of interest over 90 days after the index SST assessment. The following outcome data were obtained through EMR review: ED visits, non-elective hospital admissions, and mortality within the aforementioned 90-day period. The ED encounter or hospital admission during which the SST was assessed was not counted as an event. The EMR review was performed by an investigator (CJL) without knowledge of either the ED or GM encounter itself or the responses on the SST.

Statistical Methods

Cohen’s kappa statistic was used to evaluate the degree of inter-rater agreement beyond chance through the concomitant assessments performed in the ED setting. The strength of agreement based on the kappa statistic was rated as follows: slight, ≤ 0.20; fair, 0.21–0.40; moderate, 0.41–0.60; substantial, 0.61–0.80; and almost perfect, 0.81–1.00.25
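As an illustration of the statistic described above, the following is a minimal sketch of unweighted Cohen's kappa for two raters, together with a helper that maps a kappa value onto the agreement bands quoted in the text. The function names and the six example ratings are hypothetical and are not study data; the study's own analysis software is not stated in the text, so this should not be read as the authors' implementation.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of subjects given the same category by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Map kappa onto the bands quoted in the text.

    Assumption: every value at or below 0.20 (including negative values)
    is placed in the lowest band.
    """
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical GI ratings for six subjects (illustrative only).
physician = ["I", "II", "II", "III", "I", "II"]
nurse = ["I", "II", "III", "III", "I", "I"]
kappa = cohens_kappa(physician, nurse)
print(round(kappa, 2), agreement_label(kappa))  # 0.52 moderate
```

In this toy example the raters agree on four of six subjects (observed agreement 0.67), chance agreement is 11/36, and kappa is therefore 13/25 = 0.52.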

Predictive validity of the GI segments and CFs was assessed using Cox proportional hazards models. Hazard ratios for adverse events were computed between the highest and lowest levels of both GI and CF variables, adjusted for age and gender, and censoring for mortality events. For example, hazard rates were compared between the GI categories of “healthy” vs. “short period of decline before dying” (see SST rating page in Fig. A1). Hazard rates for CFs were computed using the differences between “high” and “low” levels of each of the eight variables measured. Kaplan-Meier plots were constructed for the three outcomes stratified by GI category.
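To make the Kaplan-Meier step concrete, the sketch below computes a product-limit survival curve for a single stratum from follow-up times and event indicators. It is a simplified, standard-library illustration under stated assumptions: the function name and the six follow-up records are hypothetical, subjects without an event are treated as censored at their last follow-up time, and the study's actual software and stratification code are not described in the text.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate for one stratum.

    times  : follow-up time in days for each subject
    events : True if the outcome occurred at that time, False if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Subjects with an event and censored subjects both leave the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical 90-day follow-up for six subjects in one GI category.
days = [10, 25, 25, 40, 90, 90]
had_event = [True, True, False, True, False, False]
curve = kaplan_meier(days, had_event)
for t, s in curve:
    print(t, round(s, 3))
# 10 0.833
# 25 0.667
# 40 0.444
```

Running the same function separately on each GI category would give one curve per stratum, which is the form of display described for the three outcomes.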

Sample Size Calculation

The sample size for this study was estimated based on the primary objective of assessing the inter-rater reliability of GI ratings between physician-physician pairs. A sample size of 123 subjects was needed to achieve 80% power (significance level of 0.05) to detect a true kappa value of 0.60 (assuming a null value κ0 = 0.40) in a test of H0: κ = κ0 vs. H1: κ ≠ κ0 when there were six categories with a population distribution equal to 20% (healthy), 50% (chronic conditions, asymptomatic), 25% (chronic conditions, symptomatic), 3% (long course of decline), 1% (limited reserve and serious exacerbations), and 1% (short course of decline before dying).
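The assumed category distribution matters because it fixes the level of agreement expected by chance. The sketch below does not reproduce the power calculation itself; it only works through the arithmetic linking the stated distribution to chance agreement and to the raw agreement implied by the null and target kappa values. It assumes, for illustration, that both raters share the same marginal distribution; the function names are hypothetical.

```python
# Assumed population distribution across the six GI categories (from the text).
assumed_distribution = {
    "healthy": 0.20,
    "chronic conditions, asymptomatic": 0.50,
    "chronic conditions, symptomatic": 0.25,
    "long course of decline": 0.03,
    "limited reserve and serious exacerbations": 0.01,
    "short course of decline before dying": 0.01,
}


def chance_agreement(distribution):
    """Expected agreement by chance when both raters share these marginals."""
    return sum(p * p for p in distribution.values())


def implied_observed_agreement(kappa, p_e):
    """Invert kappa = (p_o - p_e) / (1 - p_e) to recover p_o."""
    return p_e + kappa * (1 - p_e)


p_e = chance_agreement(assumed_distribution)
p_o_null = implied_observed_agreement(0.40, p_e)
p_o_target = implied_observed_agreement(0.60, p_e)
print(round(p_e, 4), round(p_o_null, 4), round(p_o_target, 4))
# 0.3536 0.6122 0.7414
```

Under these assumptions, roughly 35% agreement would be expected by chance alone, and the null and target kappa values correspond to raw agreement of about 61% and 74%, respectively.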

Role of the Funding Source

The funding source was not involved in the design of the study, analysis and interpretation of the data, or decision to approve publication of the finished manuscript.

RESULTS

A total of 199 subjects from the ED were eligible to participate. After excluding subjects who did not consent or were recruited during teaching days, 142 ED subjects were included in the analysis. From the GM ward, 262 individuals were approached to participate, of whom 108 both consented and received an SST assessment. Thus, 142 individuals were available for assessing inter-rater reliability and 250 subjects were available for assessing predictive validity (142 from the ED and 108 from the GM ward) (Fig. 1). From the ED, twelve subjects with missing data elements (Global Impression and functional assessment) were excluded from analyses requiring those elements.

Demographics

Of the 142 patients surveyed in the ED, the largest age group was 55 to 64 years (44%), while on the GM ward, the majority were aged 75 or older (57%). In both groups, subjects were equally distributed between men and women and the majority were of Chinese ethnicity (83% in the ED and 69% in the GM ward) (Table 1).

Table 1 Patient Characteristics at Baseline

Distribution by GI and CF Assessments

The ED and GM ward subjects complemented each other in terms of GI segment with the majority in the ED in GI category I or II (76%) and the majority in the GM ward in GI categories II, IV, V, or VI (90%).

While the majority of ED subjects were deemed to have few CFs (e.g., 90% were free of any functional deficits, 99% were without disruptive behavioral issues, and 96% were without skilled nursing task needs), the GM ward subjects had more CFs implying substantial needs: 52% had functional deficits that would require external assistance, 71% were without coordination of a complex mix of medical services (significant health conditions without a main service provider, or with multiple non-coordinated providers), and nearly half (44%) had a skilled nursing-type task (e.g., wound or catheter care).

Inter-rater Reliability

The physician-physician inter-rater reliability for GI rating computed using Cohen’s kappa was 0.60 (SE 0.06) (Table 2). The inter-rater reliability values between service physician-nurse and observing physician-nurse were 0.71 (SE 0.06) and 0.68 (SE 0.06), respectively.

Table 2 Cohen’s Kappa Scores for Inter-rater Reliability of Global Impression Population Segments

Predictive Validity

Based on Cox regression, hazard ratios between the highest and lowest medical severity GI categories, adjusted for age and gender, were statistically significant for ED visits (6.94, p = 0.012), non-elective hospital admissions (10.81, p = 0.004), and mortality (7.44E+10, p < 0.001). With the exception of social support, all CFs revealed statistically significant hazard ratios in predicting mortality (Table 3). The Kaplan-Meier survival curves associated with ED visits, hospitalization, and mortality by GI categories are shown in Figures 2a, b, and c, respectively.

Table 3 Predictive Validity of SST Variables
Figure 2. a Kaplan-Meier survival estimates for ED visit outcome by Global Impression categories (n = 248). b Kaplan-Meier survival estimates for non-elective hospital admission outcome by Global Impression categories (n = 248). c Kaplan-Meier survival estimates for mortality outcome by Global Impression categories (n = 248). A, healthy; B, chronic condition, asymptomatic; C, chronic condition, symptomatic; D, long course of decline; E, limited reserve; F, short decline before dying.

DISCUSSION

In this study, we demonstrated that the inter-rater reliability of SST GI ratings, both between physicians and between physicians and nurses, is moderate to substantial.25 The SST GI and CFs tend to broadly discriminate between individuals with different risks for ED visits, non-elective hospitalization, and mortality (see Table 3 and Table 4). We observed that individuals rated as category “short decline before dying” experienced a remarkably higher mortality rate compared with individuals in the other categories, suggesting that trained clinicians are quite good at making this assessment. Notably, some of the GI categories, which were designed to represent features that reflect distinct clinical needs, show similar rates of adverse medical outcomes. This is not in itself an undesirable feature of a needs-based classification scheme. Unlike risk-based classification, the priority here is not to identify individuals with similar risk but rather individuals with similar response to specific actions. While different individuals may have similar risk for poor outcomes and similar needs to reduce those risks, many individuals with similar risk for poor outcomes will have quite different needs. For example, two patients may be readmitted to the hospital within 3 months; however, for one patient, the cause for the readmission was lack of adequate wound care while for the other, the reason was failure to closely monitor and rapidly treat a fragile medical condition. Because the SST captures basic features related to care needs, when collected systematically, it can inform healthcare policy makers regarding the nature and magnitude of health and social services required for their population.

Table 4 Hazard Rate of SST GI Variables, Controlled for Age and Gender

This is the first known validation study of a clinician-administered, healthcare needs–based population segmentation tool in a clinical setting. Commonly, healthcare needs–based population segmentation is performed using an EMR retrospectively even though such records may not always capture information in a reliable or accurate manner.26 A previous analysis which assessed the reliability of the EMR for administering the SST also found that due to high rates of missing information, inter-rater agreement for GI ratings for the same patients between physicians using the EMR and physicians making the assessment based on a face-to-face encounter in the clinic was low (kappa = 0.37).16

It is notable that even in the challenging environment of the ED, inter-rater reliability of the SST was good. The finding of substantial inter-rater reliability between physicians and nurses also supports the notion that trained nurses can administer the instrument reliably, increasing the range of sites in which the SST can be implemented practically.

The current work was focused on assessing the inter-rater reliability of elements of the SST and the predictive validity of both GI and CFs as a prerequisite to further application of the instrument. Work is ongoing to identify combinations of SST features that constitute a core group of needs segments based on typical sets of health and health-related social service needs. The goal is to identify unmet needs, the relationship between persistence of unmet needs and outcomes, and strategies for meeting those needs that would be both effective and sustainable at the clinical and organizational level. While we do not expect the SST to simply save costs under the current “model of care,” it may nonetheless promote cost savings when it is used as a key metric for health service innovation (see Appendix for implementation example).

A major strength of this study lies in the strong methodology used for testing inter-rater reliability. First, we provided all raters of the SST with identical stimuli consisting of exposure to a real patient-doctor interaction within an outpatient setting (ED). Second, the raters were working clinicians; demonstrating reliability and validity of SST ratings in this context increases confidence in the feasibility of using the SST in real-world clinical environments. This introduces the possibility that basic healthcare needs of a patient population can be monitored in a simple, ongoing way using the SST and this information could be useful in monitoring appropriate service levels at the organizational level.9 Third, by recruiting from the inpatient service as well as the ED, we were able to demonstrate the predictive validity of SST features for a diverse population with a wide range and severity of health conditions.

Regrettably, the evaluation of inter-rater reliability in the ED was restricted to individuals triaged as non-urgent; this limited the number of individuals with more complex medical needs and CFs. Because of the complexity of the inter-rater reliability assessment, we were unable to evaluate inter-rater reliability amongst clinicians on the GM inpatient service. Nonetheless, predictive validity was high across the broad population that included good representation of patients with more severe GI categories and CFs, suggesting that even if the instrument was somewhat less reliable in more complex patients, it retained predictive validity. A second potential limitation is that the population is older and predominantly Chinese, and thus primarily speaks one of several local languages and dialects rather than English, in which our raters are uniformly proficient.

There are several areas for improvement in this study which can be explored in future studies. They include the measurement of time taken for completion of SST by raters, assessment of raters’ understanding of SST administration, having clinicians and allied health providers complete different parts of the SST, evaluating characteristics of subjects who decline participation, and assessment of how using the SST affects healthcare utilization and outcomes (see Appendix for further discussion on areas for improvement).

In all, our study indicates that clinicians can use the SST to identify patient features related to patterns of clinical healthcare needs as well as complicating factors indicative of broader health and health-related social service needs. Such a simple, focused assessment can complement the EMR which, even when available, is not a reliable source for key information driving health service needs. A natural extension of this work is to identify packages of health and social services tailored to SST-defined population segments, followed by evaluations of whether receiving population segment–specific service packages improves health outcomes. This work will enable clinical and healthcare policy makers to more effectively gather actionable healthcare needs information from patients, which in turn would facilitate improved allocation of clinical and social services tailored to patient needs.