Introduction

Healthcare workers, both physicians and nonphysicians, report high levels of burnout that affect their own well-being and that of their patients.1,2,3,4 Thus, for the well-being of both healthcare workers and patients, we must measure workforce burnout reliably. The best-studied measure of burnout is the Maslach Burnout Inventory (MBI), introduced in 1981.5 However, the proprietary status of the MBI and its length of 22 items both hinder organizations that wish to embed the MBI within larger surveys of employee well-being and experience.6,7,8,9

These limitations of the MBI have led organizations to use the single-item burnout question (SIBOQ), derived from the “Z” Clinical Questionnaire and introduced in 1994.8 The SIBOQ is widely used, having been incorporated into the Physician Worklife Study,10 the MEMO study,11 and, in 2016, the Mini-Z survey.12 The American Medical Association’s (AMA) Steps Forward program has recommended the SIBOQ since 2015.13 The National Academy of Medicine14 recommends the Dolan9 variation from 2015 as valid, reliable, and freely available. The AMA words the SIBOQ as13 “Using your own definition of ‘burnout,’ please circle one of the answers…” with five Likert responses ranging from “I enjoy my work. I have no symptoms of burnout” to “I feel completely burned out. I am at the point where I may need to seek help.”

Unfortunately, the two studies often cited as establishing the validity of the SIBOQ validated it only against the emotional exhaustion (EE) subscale of the MBI, not against the full MBI.9,15 Thus, despite its frequent use, the ability of the SIBOQ to measure overall burnout is uncertain.

Our first aim is to create an equation to convert group rates of burnout as measured by the SIBOQ into estimated rates as measured by the MBI. By using generalized linear mixed regression, we can separately quantify the predictive (benchmarking) and explanatory (hot-spotting) capabilities of the SIBOQ. A reliable prediction equation would allow organizations that use the SIBOQ to benchmark their results against studies that report the MBI in large sample populations. Our second aim is to meta-analyze existing studies that correlate the overall SIBOQ with the full MBI (MBI-full) and its subscales. This second aim will help focus further development of the SIBOQ.

Methods

We followed the PRISMA reporting guidelines for systematic reviews (checklist in eTable 1).16 We did not prospectively register the review, for reasons detailed in the Online Supplement: Registration and reporting standards. We followed the methods of openMetaAnalysis and the Cochrane Collaboration.17,18

Eligibility Criteria

We included studies that met either of two criteria.

For the study’s first aim, to support benchmarking, we included studies reporting rates of burnout and respondent counts for both the SIBOQ and the overall MBI. In addition, we included studies that used the MBI dual items (MBI-DI) as the gold standard. This criterion was unplanned; the rationale for accepting the MBI-DI is in the Supplement: Inclusion Criteria.

For our second aim, we included studies reporting correlations (r) between the SIBOQ and any version of the MBI, including the overall correlation as well as correlations with one or more of the main MBI subscales: emotional exhaustion (EE) and depersonalization (DP).

Information Sources and Search Strategy

Our literature search was first developed in 2021, and the final search before submission occurred on April 7, 2023. Since that date, the search has continued to run daily at PubMed, as in our first living systematic review.19 Details of our search are in the Online Supplement.

One of two reviewers screened each study; if the first reviewer required the full text of the article to determine inclusion, the article was shared with the second reviewer.17 We included a study only if the second reviewer, who was unaware of the first reviewer’s decision, also approved it.

Data Collection Process and Effect Measures

Two reviewers abstracted the following for each study: the numbers of surveys distributed and of respondents; rates of burnout as measured by the SIBOQ, the MBI, and the MBI subscales; rates of confounders; and correlation coefficients between measures.

We abstracted any version of the MBI as our criterion standard.1 The initial MBI-Human Services Survey (MBI-HSS), published in 1981, contains 22 items, including 8 items in the personal accomplishment (PA) scale.5 The second version, the MBI-General Survey (MBI-GS), was published in 1996 with 16 items and replaced the PA scale with a 6-item professional efficacy (PE) scale.20 Both versions measure depersonalization (DP) with 5 items, but the emotional exhaustion (EE) scale was reduced from 9 items in the MBI-HSS to 5 items in the MBI-GS. We also collected descriptions of study populations, survey methods and incentives, and response rates.

Studies varied in the cutoff criteria used to determine burnout. The MBI originally required three criteria to deem burnout: abnormal EE, abnormal DP, and lack of personal accomplishment.21 Most studies, however, deemed burnout present if either the EE or the DP scale was abnormal. We therefore accepted the following definitions of an abnormal MBI: abnormal EE accompanied by an abnormal DP or PA scale,22 an abnormal EE or DP scale, or abnormal MBI dual items (MBI-DI: either the EE or DP single item abnormal).23
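
The most common rule, an abnormal EE or DP scale, can be written as a minimal sketch in R. The cutoffs below (EE ≥ 27, DP ≥ 10) are frequently cited for the MBI-HSS but are illustrative only; the included studies varied in their thresholds.

```r
# Illustrative cutoffs only; the included studies used varying thresholds.
ee_cutoff <- 27   # abnormal emotional exhaustion scale score
dp_cutoff <- 10   # abnormal depersonalization scale score

# Burnout under the "either scale abnormal" definition.
burned_out <- function(ee_score, dp_score) {
  ee_score >= ee_cutoff | dp_score >= dp_cutoff
}

burned_out(ee_score = 30, dp_score = 6)   # TRUE: EE alone is abnormal
```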

Risk of Bias Assessment

We based our assessment on the CLARITY tool (eTable 3).24 We did not include CLARITY’s fourth item, “Is the survey clinically sensible,” as we are comparing the administration of two surveys and not comparing their development.

Statistical Methods

Aim 1: create an equation for converting a group rate of burnout measured by the SIBOQ to a predicted rate measured by the MBI. As Debray et al. note, prediction models should be validated, and meta-analysis is needed to summarize a model’s predictive performance across different settings and populations.25 For this aim, we meta-regressed the rates of burnout measured by the SIBOQ against those reported by the MBI. When a study reported subgroups of respondents, we used the subgroup results rather than the study’s overall rates. We used the intercept and coefficient from the regression to derive an equation for predicting the MBI from the SIBOQ.

We created the conversion equation between the SIBOQ and the MBI with unweighted linear regression, fitting both a fixed-effects model and a generalized linear mixed-effects model estimated with restricted maximum likelihood (REML). To guard against overconfidence, we present the results with prediction intervals, which are more conservative than confidence intervals.25,26,27
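
A minimal sketch of this step in R, assuming a data frame rates with one row per study subgroup and illustrative column names (siboq_rate, mbi_rate, study) rather than the published analysis code:

```r
library(lme4)

# rates: one row per study subgroup with the percentage of respondents deemed
# burned out by each instrument and a study identifier (illustrative columns).

# Fixed-effects (ordinary least squares) regression of the MBI rate on the
# SIBOQ rate; its intercept and slope form the conversion equation.
fit_fixed <- lm(mbi_rate ~ siboq_rate, data = rates)
coef(fit_fixed)

# Prediction interval for a new group, wider than a confidence interval.
predict(fit_fixed, newdata = data.frame(siboq_rate = 40),
        interval = "prediction")

# Generalized linear mixed-effects model with a random intercept for study,
# estimated by restricted maximum likelihood (REML).
fit_mixed <- lmer(mbi_rate ~ siboq_rate + (1 | study), data = rates, REML = TRUE)
summary(fit_mixed)
```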

Aim 2: correlate the overall SIBOQ with the full MBI (MBI-full) and its subscales. We used generalized linear mixed regression to quantify separately the predictive (benchmarking) and explanatory (hot-spotting) capabilities of the SIBOQ, treating the study as a random effect. For this mixed model, we calculated the explanatory (total, conditional) R2 as well as the predictive (marginal, fixed) R2. We compared the explanatory power of the fixed and mixed regression models using analysis of variance (ANOVA) of their Akaike information criterion (AIC). We calculated the heterogeneity of the mixed regression results by dividing the between-study variance by the total variance.
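
A sketch of these calculations in R, continuing from the models above; r.squaredGLMM from the MuMIn package is one way to obtain the marginal and conditional R2, and the between-study variance comes from VarCorr (object and column names are ours, not from the published code):

```r
library(MuMIn)

# Marginal (fixed-effects only, "predictive") and conditional (fixed + random,
# "explanatory") R^2 for the mixed model.
r.squaredGLMM(fit_mixed)

# Compare the fixed and mixed models; the mixed model is refit with maximum
# likelihood so that the AIC values are comparable.
fit_mixed_ml <- update(fit_mixed, REML = FALSE)
AIC(fit_mixed_ml, fit_fixed)
anova(fit_mixed_ml, fit_fixed)

# Heterogeneity: between-study variance divided by the total variance.
vc <- as.data.frame(VarCorr(fit_mixed))
tau2 <- vc$vcov[vc$grp == "study"]
tau2 / (tau2 + sigma(fit_mixed)^2)
```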

To explore modulators of the relationship between the SIBOQ and the overall MBI, we added to the mixed regression terms for the number of respondents, the response rate, the rate of DP, and the proportion of physicians among respondents. In a post hoc revision, we replaced the rate of DP with the ratio of the DP rate to the EE rate, because we realized that the absolute rate of DP would not be informative unless compared with the rate of EE.
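
A brief sketch of the moderator model, with illustrative column names for the study-level covariates:

```r
# Mixed regression with moderator terms added (illustrative column names).
fit_moderators <- lmer(mbi_rate ~ siboq_rate + n_respondents + response_rate +
                         dp_ee_ratio + pct_physicians + (1 | study),
                       data = rates, REML = TRUE)
summary(fit_moderators)
```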

For our second aim, we meta-analyzed the correlation coefficients using random-effects estimation with inverse-variance weighting. As noted by Borenstein et al., the correlation coefficient itself can serve as the effect size index;28 however, as Borenstein recommends for validity, we transformed the correlation coefficients to Fisher’s z scale before pooling.
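
For reference, the Fisher transformation and its back-transformation are

$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right), \qquad r = \tanh(z)$$

and a minimal sketch of the pooling step with the meta package, assuming a data frame cors with illustrative columns r, n, and study (not the published analysis code), is:

```r
library(meta)

# cors: one row per study with the correlation between the SIBOQ and an MBI
# scale (r), the number of respondents (n), and a study label.
m <- metacor(cor = r, n = n, studlab = study, data = cors,
             sm = "ZCOR")   # pool on Fisher's z scale, then back-transform
# The Hartung-Knapp adjustment is requested through the package's corresponding
# argument (hakn or method.random.ci, depending on the installed meta version).
summary(m)
forest(m)   # forest plot of the random-effects meta-analysis
```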

In all analyses, we deemed a correlation coefficient to indicate reliability when it was at least 0.7 29,30 or when the corresponding proportion of variance explained (R2) was at least 50%.

We performed all statistics with the R software.31 Random-effects meta-analyses with the Hartung-Knapp approach32 used the metacor function of the R package meta,33 while the mixed-effects analysis for creating the conversion equation used the lmer function of the package lme4.34 The dominance analysis was performed with the package domin.35
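
A minimal sketch of the dominance analysis, following the calling pattern in the domin package documentation and again using our illustrative column names:

```r
library(domin)

# Apportion the fixed regression's R^2 between the SIBOQ rate and the DP/EE
# ratio; fitstat extracts r.squared from summary() of each lm sub-model.
da <- domin(mbi_rate ~ siboq_rate + dp_ee_ratio,
            reg = lm,
            fitstat = list("summary", "r.squared"),
            data = rates)
da   # prints general dominance statistics (shares of R^2)
```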

All data and statistical code are available online at https://ebmgt.github.io/well-being_measurement/.

Role of the Funding Source

We received no funding for this review.

Results

Study Selection and Characteristics

We included 17 studies reporting 15,600 survey recipients and 6,788 respondents (PRISMA flow diagram, Fig. 1). The daily PubMed alert identified two included studies before our manuscript submission36,37 and the study by Song, published after our submission.38 The 17 included studies are described in Table 1.

Figure 1. PRISMA flow chart.

Table 1 Characteristics of the Included Studies

As previously found by Rotenstein,1 studies defined burnout with the MBI in several ways: both the MBI:EE and MBI:DP scales abnormal in two studies,39,40 the MBI:EE and MBI:DP single items (MBI-DI) in two studies,39,41 and either the MBI:EE or MBI:DP scale abnormal in the remaining studies. Only the study by Ong used the updated Maslach criteria of a high EE scale plus either a high DP or a low personal accomplishment scale.39 Studies used from six42 to seven anchors for the MBI questions. Even between the two studies conducted in different years by the same organization, the Sierra Sacramento Valley Medical Society, the MBI anchors changed. The first survey omitted the conventional anchor of “a few times a month,”41,42 which was added in the second survey.43 The addition of this anchor was associated with an 8% drop in the rate of burnout, as 15% of respondents chose “a few times a month” for each MBI question.

The lowest rates of burnout tended to occur in the two studies that required both the MBI:EE and MBI:DP scales to be abnormal (20 to 26%),39,40 followed by the studies requiring either single item to be abnormal (27 to 48.1%),39,43 and then the studies requiring either scale to be abnormal, which tended to report the highest rates of burnout (6.7 to 85%).29,44

There were insufficient data to meta-analyze the studies that required both the MBI:EE and MBI:DP scales to be abnormal; therefore, this analysis focuses on the remaining papers.

Risk of Bias in Studies

No studies used standard reporting recommendations for surveys; however, all were published before the CROSS guideline for reporting survey studies became available.24

A high risk of bias was present in all studies, as no study met the CLARITY item of a response rate over 75% or a comparison of respondents and non-respondents (Online Supplement, eTable 3). However, six of the 17 studies had response rates between 50 and 75% (Hansen,28 Hansen,29 Kemper,32 Knox,2 Ong,39 Waddimba45), and all of these met the other four CLARITY items for quality. The response rate, pooled by random effects analysis, was 35% (95% CI 24 to 49%) and was significantly heterogeneous across studies (I2 = 99%).

Results of Syntheses

There were insufficient data to meta-analyze the studies that required both the MBI:EE and MBI:DP subscales to be abnormal; therefore, our analysis focuses on surveys that deemed burnout present if either scale was abnormal. The summary of the three regressions is in Table 2.

Table 2 Summary Correlations of the Single-Item Burnout Question (SIBOQ) with the Complete Maslach Burnout Inventory (MBI) and Components of the MBI. These Results Are Limited to Studies Using Surveys in English

For the first study aim, the fixed regression analysis of rates of abnormal SIBOQ versus rates of abnormal MBI yielded an equation to convert rates (Fig. 2):

Figure 2. Prediction of the MBI from the SIBOQ using fixed regression.

$$\mathrm{MBI\ rate} = 17.75 + 0.73 \times \mathrm{SIBOQ\ rate}$$

As shown in Fig. 2, the prediction limits are wide. The coefficient of determination, indicating the proportion of variance explained (R2) by the regression model, was 32%.
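
As an illustrative calculation with a hypothetical input, an organization observing a SIBOQ burnout rate of 40% would obtain a predicted MBI rate of

$$17.75 + 0.73 \times 40 \approx 47\%$$

subject to the wide prediction limits shown in Fig. 2.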

Across the 37 comparisons, the rates predicted from the SIBOQ differed from the observed MBI rates by a median of less than 1% (range, 24% lower to 31% higher than the MBI). In comparison, the reported SIBOQ rates fell a median of 6% below the MBI rates (range, 25% lower to 35% higher than the MBI).

The mixed regression, with the study treated as a random effect, yielded a substantial explanatory R2 of 69%, whereas the fixed regression yielded a predictive R2 of only 32% (Table 2). The proportion of variance explained by the generalized linear mixed regression model exceeds the 50% threshold for adequate explanatory ability; however, the mixed model’s predictive (marginal) ability falls below that threshold.

For the second study aim, the forest plots of random effects meta-analyses are in Figs. 3 and 4 and summarized in Table 2. Of the studies that used surveys in English, the correlations of the SIBOQ with the MBI subscales were MBI:EE 0.71 (95% CI 0.67 to 0.74) and MBI:DP 0.44 (95% CI 0.34 to 0.52). The studies whose surveys were not in English36,38 reported a significantly lower correlation with MBI:EE (Fig. 3) and a statistically nonsignificantly higher correlation with MBI:DP (Fig. 4). The I2 values, 88% and 89%, respectively, indicated substantial heterogeneity of results (Figs. 3, 4).

Figure 3. Correlation of the SIBOQ with the MBI:EE.

Figure 4. Correlation of the SIBOQ with the MBI:DP.

We could not assess publication bias as cross-sectional studies are not generally registered in advance.

For a sensitivity analysis, we repeated the mixed-effects analysis limited to the studies that used the full MBI as the gold standard; the results were very similar (Online Supplement, eTable 4).

The DP/EE ratio, which ranged from 0.74 to 1.53, tended to modulate the correlation between the SIBOQ and the MBI, with a beta-coefficient of 11.0 (95% CI − 2.9 to 24.9). Dominance analysis of the fixed regression yielded R2 values of 52%, 42%, and 10% for the complete model, the SIBOQ alone, and the DP/EE ratio alone, respectively. The p value for the DP/EE ratio was 0.114.

We planned to measure the impact on the fixed regressions of study size, response rates, and the rates of women and physicians among respondents. However, these data were available only at the study level, giving too few data points to analyze.

Discussion

The SIBOQ has statistically significant and borderline adequate reliability for predicting the MBI:EE. However, the SIBOQ has statistically significant but insufficient reliability for predicting the overall MBI or the MBI:DP. Accordingly, we found substantial heterogeneity across studies in the correlations between the SIBOQ and the two MBI scales. Our work complements that of Brady, who created a crosswalk between the SIBOQ and the MBI at the level of the individual respondent,29 whereas we compared the measures at the organizational level.

Unfortunately, we cannot recommend the SIBOQ as a short, reliable measure of burnout that can be embedded in larger workplace surveys to achieve the twin goals of identifying local bright spots and comparing results to external benchmarks. The mixed meta-regression, with its ability to separate predictive and explanatory abilities, supports that the SIBOQ can stratify levels of burnout across subgroups in a single fielding of a survey, helping organizations identify bright spots; however, it shows insufficient predictive ability of the SIBOQ against external benchmarks. Accordingly, the rate reported by the SIBOQ fell a median of 6% below the rate reported by the MBI (range, 25% lower to 35% higher).

The heterogeneity in response rates across studies is substantial (I2 = 100%). One explanation for the variable response rates may be recipients’ expectations of the survey’s impact, as found by Brosnan.46 If recipients have completed well-being surveys in the past and their effort did not lead to meaningful organizational change, they may be less likely to complete burnout screening measures again. The response rate to the survey may also modulate correlations. Brady, which used the AMA Masterfile, was the only study that reported a lower rate of burnout with the MBI than with the SIBOQ.29 Although the difference was less than 1%, no other study approached equivalence between the two measures; this is visible in Fig. 2, where the Brady study (black points) sits lower on the plot than the other studies. One possible explanation is that when response rates are low, the recipients who do respond may be more influenced by the subjective anchors of the SIBOQ than by the objective anchors of the MBI. In a prior study of respondents to the Staff Surveys of the English National Health Service (NHS), we found that sites with low response rates reported more work stress and less engagement on a survey that used subjective Likert anchors.47 This hypothesis would be difficult to test, as it would require an additional study like Brady’s in which both the SIBOQ and the MBI were administered to multiple sites with a range of response rates.

The DP/EE ratio may also modulate the ability of the SIBOQ to predict the MBI. This ratio varied from 0.68 among the pediatricians to 1.91 among the emergency medicine physicians in Brady’s study.29 Prior research identified two modulators of the DP/EE ratio. First, women respondents may report EE, whereas men tend toward DP or cynicism.5,48,49 Second, the nature of worksite demands may affect the ratio: Leiter found that workload contributes to exhaustion, whereas lack of values congruence with management contributes to cynicism.49 Lastly, the previously reported lower internal consistency of MBI:DP compared with MBI:EE50 may affect the ratio’s influence. The role of the DP/EE ratio should be explored further with larger studies or with an individual participant data meta-analysis.

The study by Kemper further supports the inadequacy of the SIBOQ for predicting rates in external populations. Kemper was the only study with a negative correlation between the rates of abnormal SIBOQ and abnormal MBI across its subgroups (Fig. 2 online, purple and largest points)51 and the only study whose subgroups were based on the year of survey administration. In the second year, Kemper found the unusual combination of a lower MBI rate but a higher SIBOQ rate; the sample size also increased by 20% as the number of study sites grew from 34 to 46. Although the differences between years are small, the absence of a positive relationship raises the question of whether the new sites had a different profile of stressors, specifically a lower rate of value incongruence relative to the rate of workload stress.

Due to the cost of the MBI and the low predictive ability of the SIBOQ, organizations might choose to switch to another non-proprietary scale. However, aside from the SIBOQ and MBI, few surveys reviewed by the National Academy of Medicine have extensive benchmarks, and all have at least six items. Increasingly, organizational psychologists recognize the need to combine short scales that measure issues relevant to organizations.52

We encourage survey developers to acknowledge the difficulty of survey creation and to support collaborative evolution of their work. A successful example is the measurement of psychological engagement at work with the UWES-3,53 which has derivatives used by national surveys of the CDC,54 National Health Service,55 and American Psychological Association.56 One way to promote collaboration is to publish surveys with a Creative Commons “copyleft” license without a “NoDerivatives” feature.57 Public funding may require, and can support, open-access publication of surveys with a Creative Commons license. Authors who cannot afford open-access fees can still publish their survey items prior to journal submission with a Creative Commons license in repositories that can create a digital object identifier (DOI), such as GitHub or the Open Science Framework. We strongly commend the studies in this review that display a Creative Commons copyright in their publications.36,37,39,58,59,60,61,62 This includes three studies that validated items to measure DP that could be studied with the SIBOQ.39,58,63

A copyleft license with a NonCommercial feature allows the developer to separately negotiate commercial licenses with payments. The entertainment industry provides a precedent: many songs and films have been created, performed, and recorded by one artist and then covered years later by another, sometimes achieving similar or greater success than the original. Imagine if Simon and Garfunkel had refused in 1964 to allow modifications to their song “Sounds of Silence,” which has since been covered or sampled over 100 times, including Disturbed’s cover 50 years later and the more recent sample by Eminem in “Darkness.”64 Similar approaches in academia could give survey developers continued success and financial gain as their work evolves in future surveys.

In conclusion, the SIBOQ can stratify levels of burnout across subgroups in a single fielding of a survey, helping organizations identify bright spots whose success can inform the improvement of struggling hot spots. However, the SIBOQ is less able to compare local results to external benchmarks. For survey developers, we encourage the use of copyright licenses that allow surveys to evolve.