Introduction

Traumatic brain injury (TBI) is an important cause of morbidity and mortality in Europe [15]. Among all TBIs, moderate and severe TBIs (msTBIs) admitted to the intensive care unit (ICU) have the poorest prognosis: 55% of these patients have an unfavorable outcome at 6 months [21].

To respect patients’ values and preferences and possibly withdraw or withhold inappropriate life-sustaining treatment, estimating neurological prognosis and informing surrogate decision-makers (SDMs) are paramount in the neuro-ICU [2, 23]. Although numerical prognostic estimates may not be advisable for informing SDMs [18] and should not be used directly to set a threshold beyond which life-sustaining treatments are discontinued, they are essential for assessing the reliability of physicians’ or scores’ predictions.

However, the ability of physicians to predict mortality is uncertain in critically ill patients [20, 26]. The prediction of functional outcomes is even more uncertain and less studied [5, 7], and few studies have investigated physicians’ ability to predict functional outcome specifically for patients with msTBI [25]. Prognostic models exist, such as the CRASH and IMPACT scores [6, 22], and have been validated in large cohorts [19], but they are rarely used in daily practice for several reasons. These scores are not well understood and are not valid for clinical decision-making. Clinicians do not know how to interpret them properly and therefore distrust their accuracy [18, 24]. In addition, a recent pilot study that compared physician prediction with IMPACT score prediction did not show a significant difference [1]. Therefore, the experience of the clinician seems to prevail over the implicit use of these scores [16].

Faced with the challenge of communicating prognosis without knowing their ability to predict, physicians experience anxiety, discomfort, or frustration [13, 18, 24, 25]. This situation also generates misunderstandings between families and physicians [5, 14, 18].

To better understand our ability to predict, and the role that clinical experience plays in this ability, we hypothesized that experience in a neuro-ICU would improve physicians’ ability to prognosticate an unfavorable outcome in the early phase of an msTBI. To test this hypothesis, we compared the prognoses made by junior physicians with those made by experienced physicians at the time of the initial management of msTBI.

Materials and methods

Design

PREDICT II was a prospective study in which clinicians were interviewed about patient records. We evaluated the accuracy of the prognoses made by physicians for ICU patients diagnosed with msTBI. The protocol was registered at ClinicalTrials.gov (NCT04810039) before the interviews. Approval was obtained from the Pitié Hospital Ethics Review Board and the Ethics Committee of the French Anesthesia Reanimation Society. Physicians and patients were recruited after obtaining their informed consent.

Scenarios

Sixteen patients with msTBI were randomly selected from 55 patients prospectively included in the PREDICT TBI study (NCT03874546) at the Pitié-Salpêtrière Hospital from April 2019 to December 2020 [1].

The inclusion criteria were as follows: msTBI patients over the age of 18 years with a Glasgow Coma Scale (GCS) score ≤ 12. Patients were excluded from the study for the following reasons: a decision to discontinue life-sustaining treatment within the first 24 h after ICU admission, patients under court protection, pre-existing disability defined by a score > 1 on the Rankin scale, and pregnancy. Scenarios were built from the data of the first 24 h after ICU admission, including clinical and biological data, the admission CT scan, and the last CT scan.

Physicians

Neuro-ICUs from 4 French hospitals in the Ile-de-France region recruited physicians in 2021: Beaujon Hospital, Lariboisière Hospital, Kremlin Bicêtre Hospital, and Pitié-Salpêtrière Hospital. Both junior and senior physicians were anesthesiologists and/or intensivists. We took care to ensure that physicians had not been involved in the care of the selected patients at the Pitié-Salpêtrière Hospital. All the senior physicians were practicing in hospitals other than the one where the 16 patients were recruited (Beaujon Hospital, Lariboisière Hospital, and Kremlin Bicêtre Hospital). They were selected if they had at least 3 years of experience in the neuro-ICU and had graduated from a critical care fellowship. Junior physicians were recruited from 2 hospitals (Pitié-Salpêtrière Hospital and Beaujon Hospital) more than 6 months after inclusion of the 16 patients. The French anesthesia critical care residency consists of internships that change every 6 months for 5 years, alternating between surgical ICU and anesthesia internships. Junior physicians were selected if they had completed at least 3 years of internship and had at least 6 months of experience in a neuro-ICU.

Administration of scenarios

Physicians’ predictions were obtained independently during a single session per physician. All physicians were given all 16 scenarios, presented on a computer together with the CT-scan images. Before each questionnaire, physicians were systematically reminded of the meaning of the GOS score. No time limit was imposed. The physicians and the investigator were blinded to patient outcome at 6 months.

Assessment of the 6-month outcome

Outcomes were assessed prospectively at the 6-month follow-up using the Glasgow Outcome Scale (GOS) [9]. Two trained interviewers collected the GOS during a telephone interview with the subject or his/her legal representative, using a standardized script.

Main outcome measures

Physicians were asked to predict for each patient the Glasgow Outcome Scale (GOS) at 6 months, the risk of unfavorable outcome at 6 months between 0 and 100, and their level of confidence in their prediction. Confidence described as “certain” or “very confident” was considered a high level of confidence, and confidence described as “confident,” “not very confident,” or “uncertain” was considered a low level of confidence. A high level of agreement between physicians was defined as agreement among more than 2/3 of physicians regarding the patient’s prognosis. To determine the actual GOS at 6 months, 2 trained raters, unaware of the clinicians’ predictions, interviewed patients or relatives by telephone using a standardized script if the GOS was 3 or 4. A single rater was sufficient for obtaining a GOS of 1, 2, or 5. Since we did not observe any difference between the 2 assessors, we did not need to perform a 3rd assessment.

The primary endpoint of the study was the correct prediction of an unfavorable outcome (defined as a GOS of 1–3). A physician’s prediction was considered correct when it was concordant with the actual outcome at 6 months.
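The definitions above can be written down compactly. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that agreement is measured on the dichotomized prediction (unfavorable vs. favorable), which the text does not state explicitly.

```python
from collections import Counter

# Confidence labels that the text counts as a high level of confidence.
HIGH_CONFIDENCE = {"certain", "very confident"}


def is_unfavorable(gos: int) -> bool:
    """Unfavorable outcome is defined in the text as a GOS of 1-3."""
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError(f"GOS must be between 1 and 5, got {gos}")
    return gos <= 3


def prediction_is_correct(predicted_gos: int, actual_gos: int) -> bool:
    """A prediction is correct when it falls on the same side of the
    favorable/unfavorable split as the actual 6-month outcome."""
    return is_unfavorable(predicted_gos) == is_unfavorable(actual_gos)


def is_high_confidence(label: str) -> bool:
    """'certain' and 'very confident' are high; all other labels are low."""
    return label.strip().lower() in HIGH_CONFIDENCE


def is_high_agreement(predicted_gos_values: list) -> bool:
    """High agreement: strictly more than 2/3 of physicians give the same
    dichotomized prognosis (assumption: agreement is on the dichotomy)."""
    votes = Counter(is_unfavorable(g) for g in predicted_gos_values)
    majority = max(votes.values())
    # Integer arithmetic avoids floating-point issues at exactly 2/3.
    return 3 * majority > 2 * len(predicted_gos_values)
```

For instance, with 36 physicians, 24 concordant predictions are exactly 2/3 and therefore do not reach the "more than 2/3" threshold, whereas 25 do.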

During the interview, physicians were also asked what prognostic elements they would communicate to the family. The physician’s expression of the patient’s risk to a family concerned about long-term prognosis was scaled from 1 to 5, from the lowest level of concern (level 1) to the highest level of concern transmitted to the family (level 5). For level 1, the physician emphasized only the good prognostic factors; for level 2, the good prognostic factors and the uncertainty of the prognosis; for level 3, prognostic uncertainty only; for level 4, the risk of disability or death and the uncertainty of the prognosis; and for level 5, only the risk of death and disability.

The questionnaire was translated into English and published in the Supplementary information.

Sample size

To avoid overly long interviews, we limited the number of charts to be analyzed to 16 patients per physician. A target sample of 18 junior physicians and 18 senior physicians was chosen to achieve 80% power with an α risk of 5%, with an estimated probability of error of 30% for senior physicians and 40% for junior physicians, in accordance with the previous PREDICT study.
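To give a feel for how such a sample size relates to power, the sketch below uses a textbook normal approximation for comparing two independent proportions. It is not the authors' calculation, whose details are not given here, so its result need not match the stated 80%. It assumes a two-sided test and treats the 18 × 16 predictions in each group as independent, which the clustered design (several predictions per physician and per patient) does not guarantee. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two independent
    proportions with equal group sizes (unpooled standard error)."""
    z = NormalDist()
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p2) / se
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Assumption: every prediction counts as one independent observation.
n_predictions = 18 * 16
power = power_two_proportions(0.30, 0.40, n_predictions)
print(f"{n_predictions} predictions per group, approximate power = {power:.2f}")
```

Changing `alpha`, the test's sidedness, or the independence assumption shifts the result noticeably, which is why the figure printed by this sketch should be read as an order of magnitude only.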

Statistical analysis

The first level of analysis corresponded to the physicians, who were interviewed independently. Each physician made a number of correct and incorrect predictions, which was modeled as the dependent variable using binomial regression. We compared the “senior” group with the “junior” group using the likelihood ratio test. The second level of analysis was the prediction made for each patient by all physicians (576 predictions). The physicians’ predictions were analyzed using a mixed-effects logistic model with a random intercept for each patient, a fixed effect for group (senior vs. junior), a fixed effect for confidence level (high confidence vs. low confidence), and a fixed effect for the level of agreement between clinicians on the patient’s prognosis (high level of agreement vs. low level of agreement). Because predictions made by the same physician are not independent, we initially introduced a random intercept for the physician in a cross-classified multilevel model. However, the likelihood ratio test for this second random effect was non-significant, so its inclusion was not justified.
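When group is the only covariate, the binomial likelihood depends on the data only through each group's total number of incorrect predictions, so the group comparison can be sketched without a modeling library. The Python example below is a minimal illustration, not the authors' R code: the function names are hypothetical and the counts in the usage line are invented for illustration, not taken from the study.

```python
from math import erfc, log, sqrt


def _binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood up to an additive constant (0*log(0) = 0)."""
    ll = 0.0
    if k > 0:
        ll += k * log(p)
    if n - k > 0:
        ll += (n - k) * log(1 - p)
    return ll


def compare_groups(errors_junior: int, n_junior: int,
                   errors_senior: int, n_senior: int):
    """Return (odds ratio of an incorrect prediction, junior vs. senior;
    likelihood ratio statistic; p value on 1 degree of freedom).
    Assumes both groups have at least one correct and one incorrect
    prediction, so that the odds ratio is finite."""
    p_j = errors_junior / n_junior
    p_s = errors_senior / n_senior
    p_pooled = (errors_junior + errors_senior) / (n_junior + n_senior)
    odds_ratio = (p_j / (1 - p_j)) / (p_s / (1 - p_s))
    # Full model: one error probability per group. Null model: one for all.
    ll_full = (_binom_loglik(errors_junior, n_junior, p_j)
               + _binom_loglik(errors_senior, n_senior, p_s))
    ll_null = (_binom_loglik(errors_junior, n_junior, p_pooled)
               + _binom_loglik(errors_senior, n_senior, p_pooled))
    lr_stat = max(2 * (ll_full - ll_null), 0.0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(lr_stat / 2))
    return odds_ratio, lr_stat, p_value


# Hypothetical counts for illustration only (not the study's data).
odds_ratio, lr_stat, p_value = compare_groups(40, 100, 30, 100)
print(f"OR = {odds_ratio:.2f}, LR statistic = {lr_stat:.2f}, p = {p_value:.3f}")
```

This sketch ignores the correlation between predictions made on the same patient, which is what the mixed-effects model with a patient random intercept is meant to handle.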

To measure inter-physician reliability, we calculated the intraclass correlation coefficient for the quantitative variable and Fleiss’ kappa coefficient for the qualitative variable. We defined a discordance between two physicians as one physician judging the evolution of a patient to be unfavorable while the other judged it to be favorable, regardless of the actual fate of the patient.
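For readers unfamiliar with Fleiss' kappa, the following Python sketch implements the standard formula from a table of rating counts. It is a minimal illustration rather than the authors' implementation (the analysis was run in R), and the small table in the usage lines is hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j. Every subject must be rated by
    the same number of raters (at least 2). Undefined (division by zero)
    when all ratings fall in a single category."""
    n_subjects = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")

    # Observed agreement: share of concordant rater pairs for each subject.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical example: 4 patients, 6 physicians,
# columns = (predicted unfavorable, predicted favorable).
example = [[6, 0], [5, 1], [3, 3], [1, 5]]
print(round(fleiss_kappa(example), 3))
```

Kappa equals 1 when all raters agree on every subject and falls to 0 or below when agreement is no better than expected by chance.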

To analyze the numeric variable summarizing the physician’s communication to the family about prognostic severity, we fitted a multivariable generalized linear mixed model with a random intercept for the patient. The explanatory variables were the physician’s experience (group: senior vs. junior) and the physician’s perceived severity (the numeric value between 0 and 100 estimating the risk of a GOS < 4 at 6 months).

Statistical analysis was performed using R software (version 4.1.0; R Foundation for Statistical Computing, Vienna, Austria). All tests were 2-tailed, and p values < 0.05 were considered statistically significant. The STROBE checklist was used.

Results

We interviewed 18 physicians in the senior group (8 at Beaujon Hospital, 4 at Lariboisière Hospital, and 6 at Kremlin Bicêtre Hospital) and 18 in the junior group (8 at Beaujon Hospital, and 10 at Pitié-Salpêtrière Hospital) from March to June 2021. Senior physicians had a median of 7 years of experience with an interquartile range of 5–12 years, a minimum experience of 3.5 years, and a maximum of 25 years. All junior physicians had at least 3 years in anesthesia critical care residency and had completed at least a 6-month internship in neuro-ICU. They had a median internship of 4 years, ranging from 3 to 5 years. Physician interviews to address the prognoses of the 16 charts varied in length from 40 min to 2 h per physician. The average time spent in the interview was 62 min, which was not significantly different between the two groups.

The characteristics of the 16 randomly selected patients are shown in Table 1, and all medical stories are summarized in the Supplementary information in a 16-vignette format. A majority of patients were male (94%). Their median age was 38 years [IQR 24–54]. A total of 50% underwent neurosurgery and 25% underwent decompressive craniectomy. At 6 months, 25% of patients had died, and 50% had an unfavorable outcome. Only one patient died following withdrawal of life-sustaining treatment (WLST) after 15 days of intensive care (vignette number 12). The decision was made collegially, and two brain MRIs were performed at 3 and 14 days after the accident.

Table 1 Patients’ characteristics. IQR interquartile range; GOS Glasgow Outcome Scale; SD standard deviation

The 18 senior physicians prognosticated the 6-month outcomes correctly more often than the 18 junior physicians: 73% (95% CI 65–79) vs. 62% (95% CI 56–67) of cases (p value = 0.006). The odds of an incorrect prediction were higher in the junior group than in the senior group, with an OR of 1.63 (95% CI 1.15–2.32). This result is illustrated in Fig. 1. The CRASH (core model and CT-scan model) and IMPACT (core model, extended model, and lab model) performances are shown for illustrative purposes in Fig. 1.

Fig. 1

Comparison of the percentage of correct predictions among junior and senior physicians (in red). For illustration, the three IMPACT scores (core, lab, and extended) and the two CRASH scores (core model and CT-scan model) are shown in blue

The 36 physicians made 576 predictions without missing data. The per-patient physician success rate had a median of 67% [IQR, 47–90%]. Some patients seemed more difficult for physicians to assess, with a wide disparity in prognosis: only 31% of physicians made a correct prognosis for patient number 1, whereas 94% did so for patients number 8, 10, and 12.

Mixed-effects logistic model analyses to explain incorrect prediction of prognosis found significant effects of the following: physician experience (being a junior presented significant risk for incorrect prediction (OR 1.71, 95% CI 1.15–2.55, p = 0.008)); confidence in prediction (low confidence was a risk factor for incorrect prediction (OR 1.76, 95% CI 1.18–2.63, p = 0.006)); and agreement between senior physicians (disagreement increased the risk of incorrect prediction (OR 6.78, 95% CI 3.45–13.35, p < 0.001)) (Table 2 and Fig. 2). The model’s total explanatory power (conditional R-squared) was 0.31, and the part related to fixed effects alone (marginal R-squared) was 0.25. The C-value was 0.79 and Somers’ Dxy was 0.58.
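The reported summary statistics can be cross-checked with a little arithmetic. The Python sketch below rests on assumptions that the text does not state: that the R-squared values follow the Nakagawa and Schielzeth definition on the latent scale, where the residual variance of a logistic model is π²/3 (about 3.29, the residual variance given with Table 2), and that Somers' Dxy relates to the C-statistic as Dxy = 2(C − 0.5). The fixed-effects variance is not reported, so it is back-solved from the marginal R-squared. All function names are hypothetical.

```python
from math import pi


def somers_dxy_from_c(c_statistic: float) -> float:
    """Somers' Dxy as a rescaling of the C-statistic: Dxy = 2 * (C - 0.5)."""
    return 2 * (c_statistic - 0.5)


def implied_fixed_variance(marginal_r2: float, random_var: float,
                           residual_var: float) -> float:
    """Solve marginal R2 = vf / (vf + vr + ve) for the fixed-effects
    variance vf, given random-effect variance vr and residual variance ve."""
    return marginal_r2 * (random_var + residual_var) / (1 - marginal_r2)


def conditional_r2(fixed_var: float, random_var: float,
                   residual_var: float) -> float:
    """Share of latent-scale variance explained by fixed + random effects."""
    return (fixed_var + random_var) / (fixed_var + random_var + residual_var)


residual_var = pi ** 2 / 3   # latent-scale residual variance, about 3.29
random_var = 0.28            # patient random-intercept variance (Table 2)
fixed_var = implied_fixed_variance(0.25, random_var, residual_var)

print(round(conditional_r2(fixed_var, random_var, residual_var), 2))
print(round(somers_dxy_from_c(0.79), 2))
```

Under these assumptions, the two printed values agree with the reported conditional R-squared of 0.31 and Somers' Dxy of 0.58, which suggests that the reported figures are mutually consistent.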

Table 2 Mixed logistic model to explain incorrect physician’s prognosis (random intercept for patients). Univariate mixed models: univariate analysis. Model 2: multivariate analysis with senior agreement, confidence, and group in the model (residual variance = 3.29, random effect variance = 0.28)

Fig. 2

Probabilities of correct predictions with 95% confidence intervals. Left panel: low level of agreement; right panel: high level of agreement. Red color: low level of confidence; blue color: high level of confidence

Physicians had fair agreement when predicting unfavorable outcome at 6 months for the same patient, with the Fleiss’ kappa coefficient at 0.27 (95% CI 0.25–0.29) for all physicians, at 0.25 (95% CI 0.21–0.29) for seniors and at 0.33 (95% CI 0.29–0.37) for juniors. The Fleiss’ kappa value for ordinal data without having dichotomized the GOS into favorable or unfavorable outcome was only 0.14 (95% CI 0.13–0.16). The intraclass correlation coefficient performed on the physician-given probability of an unfavorable outcome at 6 months between 0 and 100 was 0.38 (95% CI 0.26–0.57).

Physician responses to the family about prognostic severity emphasized prognostic uncertainty 79.6% of the time, and 2.3% of the responses emphasized only good prognostic factors. We found no significant difference in the way prognosis was expressed between the junior and senior groups (adjusted regression coefficient = −0.06, p = 0.305). However, physicians adapted their wording according to the probability of an unfavorable outcome (scaled between 0 and 100) that they had estimated (adjusted regression coefficient = 0.02, p < 0.001).

Discussion

As senior physicians performed better than junior physicians, physicians seem to be able to improve their prognostic ability with experience. Several studies have found the same correlation between accuracy and level of training in different patient populations [3, 4, 10, 26]. In a study of patients with neurological illnesses, neurocritical care attendings were more accurate than residents or nurses in predicting unfavorable outcomes [10]. In that study, neuro-ICU physicians were also compared with medical ICU physicians. In France, patients with msTBI are managed in neurosurgical intensive care units by anesthetist-intensivists in collaboration with neurosurgeons. It would have been interesting to compare these two specialties. We could also have studied the differences between physicians according to whether they are in contact with ICU survivors, either through post-ICU consultations or through a neuro-recovery clinic adjoining the neuro-ICU. This type of activity could greatly improve prognostic abilities, and future research showing such an association would prompt neurocritical care physicians to follow the long-term evolution of their patients. By extension, if the predictive ability of senior physicians is superior to that of junior physicians, we can assume that SDMs have a lower ability still. In a large study of mortality prognostication in the ICU, families had worse prognostic performance than physicians [26].

As reported in previous studies [10, 17], physicians predicted, on average, unfavorable outcomes better than vague prophecies. However, physicians’ ability to predict 6-month functional outcome seemed more accurate in the study of neurologic patients requiring mechanical ventilation for 3 days or more [10] and in the study of patients at 2 days after subarachnoid hemorrhage [17]. The heterogeneity of msTBI [8] and the presence of lesions not seen on initial scans may be partly responsible for the poorer performance observed here. In addition, the physicians’ assessment was based on paper records, not on the patient’s clinical examination. Potentially useful elements for a physician in the elaboration of the prognosis, such as the physiological age of the patient or biomarkers, could not be mentioned in these scenarios. Finally, we restricted the data to the first 24 h to mimic previous model studies (IMPACT, CRASH) [19]. This time point also corresponds to the first meeting with the SDMs after the correction of life-threatening failures. Future studies are needed to determine whether physicians’ ability to predict patient outcome improves several days after neuro-ICU admission. This is not self-evident: no such improvement in physician prediction over time was found in a study of patients with subarachnoid hemorrhage [17].

We observed great variability among physicians in their ability to predict functional prognosis. This variability should be known and shared, because disagreement among physicians is a sign of a greater risk of prognostic error. The value of collegiality for improving physicians’ prognostic capacities has been highlighted several times under the term “wisdom of the crowds” [11, 12]. We also found that physicians’ level of confidence was associated with the accuracy of their predictions, as shown in another study of patients hospitalized for more than 3 days in the ICU [7]. This feeling of confidence reflects the physician’s ability to appreciate probability.

To our knowledge, this study is the first to compare prognostic accuracy between senior and junior physicians in patients with TBI. Another strength of our study is its sufficient statistical power owing to the prior calculation of the number of subjects needed. We also ensured that the interviewer and the physicians were blinded to the patients’ outcomes.

Our study has some limitations. We do not believe that we can determine the factors that enable seniors to better predict patient prognosis. By analogy with artificial intelligence, neural network algorithms for prognosis learn from experience, but their internal process is hidden. In the same way, we cannot know what influences a physician’s prognosis for a given patient. Even if a physician stated that he or she was mainly influenced by the CT scan for a particular patient, the prognosis is clearly not based on a single criterion but on a multitude of factors, weighted differently for each criterion and each patient, without the physician really knowing how to quantify this weighting.

There may be a non-differential bias of self-fulfilling prophecy, even though we sought to reduce it by asking clinicians to prognosticate 6-month functional outcome rather than death. Indeed, we consider that WLST is performed only for patients with an extremely high risk of long-term progression to a GOS < 4. Among the 16 scenarios proposed to physicians, only one patient died following WLST (Supplementary information).

It could be argued that studying the outcome at 1 year after the trauma would have been more relevant for understanding the real fate of the patients. However, this does not undermine the finding that senior physicians better predict functional prognosis at 6 months.

With only 16 patient charts, our study was underpowered to compare IMPACT and CRASH scores with physician predictions and to identify patient-related risk factors for poorer predictive ability. Therefore, the finding that disagreement among senior physicians is a risk factor for incorrect prediction is less statistically robust, despite its importance.

Although these charts were randomly selected, we cannot claim that they reflect the full heterogeneity of patients with msTBI because of their limited number.

Finally, despite the multicentric nature of the study, there is a potential selection bias because physicians in Ile-de-France were not randomly selected but were recruited on the basis of their motivation to participate in the study.

Conclusions

Although prediction of 6-month functional outcome was highly variable among physicians, we observed that accuracy in predicting varied depending on the physician’s experience, confidence, and broad agreement among physicians.