Background

Traumatic injuries have become the fourth leading cause of mortality and disability around the world [1], resulting in more than 5 million deaths annually [2]. With the acceleration of global population aging, traumatic injuries in geriatric patients have become a worldwide concern [3]. It has been reported that the risk of mortality in geriatric patients is greater than that among young people with similar injury severities [4]. The mechanisms of injury and physiological responses in geriatric trauma patients are different from those in the younger population [5]. Multiple factors, including frailty, degraded physiological status, and comorbidities, contribute to both a greater demand for medical resources and a higher risk of poor prognosis in the geriatric trauma population [6,7,8,9].

To allocate medical resources rationally and minimize trauma-related mortality in geriatric trauma patients, an accurate assessment of these patients seems necessary. Multiple trauma scoring systems have been developed to describe the severity of injuries and evaluate the clinical outcomes and prognosis of trauma patients [10]. The Injury Severity Score (ISS), which considers the three most serious injuries of the body according to the Abbreviated Injury Scale (AIS), is the most widely used trauma scoring system [11,12,13]. Although the performance of the ISS in prognosis prediction in trauma patients has been well validated in some studies [13, 14], evidence on its ability to predict mortality in the geriatric trauma population is still conflicting [15]. The results of Tamim et al. showed that the AUROC of the ISS in predicting mortality in all-age trauma patients was 0.881 (0.816–0.945), while the AUROC of the ISS in predicting mortality in patients aged over 65 years was only 0.584 (0.401–0.767) [16]. In recent years, some studies have reported the use of other scoring systems in the elderly trauma population [5, 12, 17]. The Trauma and Injury Severity Score (TRISS), which includes age, the ISS, and the Revised Trauma Score (RTS), has also been widely used over the past 30 years [18]. Previous studies have suggested that the TRISS has good performance in predicting mortality in geriatric trauma patients [5, 12, 19,20,21]. The Geriatric Trauma Outcome Score (GTOS) was specifically developed for prognosis prediction in older patients [22]. Some studies reported good performance of the GTOS in predicting mortality in geriatric trauma patients [5, 12, 23,24,25], while the research conducted by Meagher et al. showed low accuracy of the GTOS in predicting 30-day mortality in trauma patients over 65 years old [26]. Another clinical study showed that the performance of the GTOS in predicting mortality in geriatric trauma patients with an ISS ≥ 15 was not better than that of age alone [27].

Given the potential limitations of the ISS in geriatric trauma patients, the present study sought to identify a scoring tool that is more applicable to this population. Therefore, the present meta-analysis was conducted to assess the performance of the ISS, TRISS, and GTOS in mortality prediction in geriatric trauma patients.

Material and methods

The present meta-analysis followed the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) statement (shown in Additional file 1: Table S1) [28].

Inclusion and exclusion criteria

Studies were eligible if they (1) included a separate group of geriatric trauma patients (aged 60 years or older) and (2) reported the area under the receiver operating characteristic curve (AUROC) of the ISS, TRISS, or GTOS for mortality prediction, or provided data from which the exact numbers of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) results could be extracted indirectly. Studies were excluded if they (a) were not conducted in a separate group of geriatric trauma patients; (b) had no clear definition of age in geriatric patients; (c) did not consider mortality as the outcome variable; or (d) lacked the data required for the present meta-analysis.

Search strategy

Two investigators (Liu and Qin) carried out a systematic search in the MEDLINE, Web of Science, and EMBASE databases. The MEDLINE database was searched through PubMed. The search covered studies published from January 2008 to October 2023, with no language restrictions. Furthermore, the bibliographies of all included studies were searched manually to identify potentially eligible articles.

Study selection and data extraction

Relevant studies in each database were merged, and duplicate records were removed. Two investigators (Liu and Qin) independently reviewed the titles and abstracts of all articles in the initial search to identify potentially relevant studies. They retrieved the full texts of potentially eligible studies. Two investigators (Liu and Tian) independently extracted the following data from the included studies: the first author’s surname, year of publication, location of study, number of participants, age, sex (proportion of male subjects), definition of geriatric, outcomes for prediction, AUROC and its 95% confidence interval (CI), cut-off value, sensitivity, and specificity. Discrepancies between the two investigators were resolved by reaching a consensus with a third investigator (Wei-Ming Xie). A consensus on all items was reached by all investigators through discussion and examination.

Assessment of the risk of bias

Two investigators (Liu and Qin) used the Critical Appraisal Skills Programme (CASP) Clinical Prediction Rule Checklist to assess the risk of bias [29]. The CASP Clinical Prediction Rule Checklist consists of 3 parts and 11 questions. The first two questions of the first part are screening questions, and the remaining questions are detailed questions. The first two questions could be answered quickly. If the two answers were “yes,” it was worth proceeding with the remaining questions; otherwise, the assessment was terminated. Discrepancies between the two investigators were resolved by reaching a consensus with the third investigator (Wei-Ming Xie).

Funnel plots were used to evaluate publication bias, and their symmetry was tested by Egger’s linear regression test [30].
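
As a rough illustration of the symmetry test described above, the following Python sketch implements the classical form of Egger's regression: each study's standardized effect (estimate divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. The function name `egger_test` and its inputs are hypothetical. The analyses in this study were run in STATA, so this is a sketch of the technique, not the code that was used.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: regress effect/SE on 1/SE.

    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching standard errors")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance of the ordinary least squares fit.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept
    return intercept, se_intercept, t_stat, n - 2
```

The P value follows from comparing the t statistic with a t distribution on n − 2 degrees of freedom.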

Assessment of the quality of recommendations

Two investigators (Liu and Qin) used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to assess the quality of evidence for each outcome [31]. Discrepancies between the two investigators were resolved by reaching a consensus with the third investigator (Wei-Ming Xie).

Data synthesis and analysis

All extracted data were analyzed with STATA 16.0. The results are shown as forest plots. The pooled AUROC and its 95% CI for each trauma scoring system were calculated from the combined studies. The AUROC ranges from 0 to 1, with an AUROC of 1 representing perfect predictive ability and an AUROC of 0.5 representing no predictive ability. A fixed-effects model was applied when there was no statistically significant heterogeneity; a random-effects model was applied when the heterogeneity between combined studies was statistically significant. The heterogeneity among the combined studies was evaluated by the Cochran Q statistic (P < 0.10 indicated statistically significant heterogeneity) [32]. As a supplement, the I² statistic was used as a quantitative measure of heterogeneity, with values of 25%, 50%, and 75% taken as thresholds for low, moderate, and high heterogeneity, respectively [33, 34]. If the heterogeneity was high, a sensitivity analysis was conducted to identify its potential source. Moreover, a hierarchical summary receiver operating characteristic (HSROC) curve was established by combining the TP, TN, FP, and FN results reported in the studies, from which the summary sensitivity, specificity, and diagnostic odds ratio (DOR) with its 95% CI for each trauma scoring system could be calculated [35].
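
To make the pooling and heterogeneity steps concrete, the Python sketch below computes a fixed-effects estimate, the Cochran Q statistic, I², and a random-effects estimate from study-level AUROCs and their 95% CIs. Several choices here are assumptions rather than details taken from this study: standard errors are recovered from the CIs by treating them as symmetric normal intervals, pooling is done on the raw AUROC scale, and the between-study variance uses the DerSimonian-Laird estimator. The function name `pool_auroc` is hypothetical.

```python
import math

def pool_auroc(aurocs, ci_lowers, ci_uppers, z=1.96):
    """Pool study-level AUROCs; returns fixed and random estimates, Q, I2, tau2."""
    # Assumption: each 95% CI is symmetric and normal-based, so SE = width / (2 * z).
    ses = [(hi - lo) / (2 * z) for lo, hi in zip(ci_lowers, ci_uppers)]
    weights = [1.0 / se ** 2 for se in ses]

    # Fixed-effects (inverse-variance) estimate.
    fixed = sum(w * a for w, a in zip(weights, aurocs)) / sum(weights)

    # Cochran Q and I2 (percentage of variation attributed to heterogeneity).
    q = sum(w * (a - fixed) ** 2 for w, a in zip(weights, aurocs))
    df = len(aurocs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate and its 95% CI.
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled_random = sum(w * a for w, a in zip(re_weights, aurocs)) / sum(re_weights)
    se_random = math.sqrt(1.0 / sum(re_weights))
    return {
        "fixed": fixed,
        "random": pooled_random,
        "random_ci": (pooled_random - z * se_random, pooled_random + z * se_random),
        "q": q, "df": df, "i2": i2, "tau2": tau2,
    }
```

The P value for Q would come from a chi-square distribution with `df` degrees of freedom; under the rule stated above, the random-effects estimate would be reported when that P value is below 0.10.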

Results

Search results

The search strategy identified 12,049 records, of which 6,951 duplicates were removed. Screening of titles and abstracts identified 606 studies that used the ISS, TRISS, or GTOS as a prognostic method for clinical outcome prediction, of which 4 full texts were not available. Among the remaining 602 studies, 20 were excluded due to the lack of a clear definition of elderly patients, 378 were excluded because they did not evaluate the performance of the ISS, TRISS, or GTOS in a group of elderly patients, 98 were excluded due to the lack of mortality outcomes, and 87 were excluded because neither the AUROC nor the exact numbers of TP, TN, FP, and FN results could be obtained. Ultimately, 19 studies evaluating the performance of the ISS, TRISS, and GTOS in the prediction of mortality in geriatric trauma patients were included in this meta-analysis [5, 12, 16, 17, 19,20,21, 23,24,25,26,27, 36,37,38,39,40,41,42]. The flowchart of the search process is presented in Fig. 1. The characteristics of the included studies are shown in Table 1. The baseline characteristics of the patients are shown in Tables 2 and 3.

Fig. 1 Flowchart of literature search and study selection

Table 1 Characteristics of the included studies involved in the pooled AUROC analysis
Table 2 Baseline characteristics of patients
Table 3 Characteristics of the included studies involved in HSROC analysis

Assessment of the risk of bias

The assessment of the risk of bias in the included studies is shown in Table 4. The quality of the included studies was moderate to high. Publication bias among the included studies reporting data on the ISS, TRISS, and GTOS was evaluated with funnel plots, shown in Additional file 1: Figures S1, S2, and S3, respectively. Egger’s linear regression tests for funnel plot symmetry for the ISS (P = 0.712), TRISS (P = 0.091), and GTOS (P = 0.624) all indicated no obvious publication bias.

Table 4 Detailed CASP checklist of the quality assessment for included studies

Study characteristics and results

The present meta-analysis included 19 studies conducted in 12 different countries, including the USA, Germany, Australia, Britain, Iran, India, Canada, Singapore, Korea, Spain, China, and Sweden. A total of 118,761 participants were evaluated. The majority of participants were male. Nine of the 19 studies (47.4%) assessed the ISS, 9 studies (47.4%) assessed the TRISS, and 11 studies (57.9%) assessed the GTOS. The included studies were retrospective cohort reviews. The endpoints of mortality prediction were different. Two studies (10.5%) used 30-day mortality as the clinical outcome, 9 studies (47.4%) used in-hospital mortality as the clinical outcome, and 8 studies (42.1%) used all-cause mortality as the clinical outcome.

Synthesis of results

Pooled AUROC for predicting mortality in geriatric trauma patients

Among the 19 included studies, 18 studies reported data about the AUROC and its 95% CI. One study did not report the 95% CI of the AUROC [36]. Eight studies [12, 16, 20, 21, 27, 37,38,39] reported the AUROC and 95% CI of the ISS, eight studies [5, 12, 17, 19,20,21, 40, 41] reported the AUROC and 95% CI of the TRISS, and 11 studies [5, 12, 17, 23,24,25,26,27, 37, 38, 42] reported the AUROC and 95% CI of the GTOS. Details of the included studies are shown in Table 1.

The pooled AUROC for the 8 studies that reported ISS data was 0.74 (95% CI: 0.71–0.79; shown in Fig. 2). The heterogeneity among the combined 8 studies was statistically significant (Cochran Q test, P < 0.001; I² = 92.8%; shown in Fig. 2); thus, the pooled AUROC of the ISS was determined by a random-effects model. A sensitivity analysis was also conducted (shown in Additional file 1: Figure S4). When studies were removed one by one, the pooled AUROC and 95% CI of the remaining studies did not change, and the heterogeneity among the remaining studies was still high. The sensitivity analysis therefore did not identify the potential source of heterogeneity.
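
The leave-one-out procedure described above can be sketched as follows. Each study is dropped in turn and the remaining studies are re-pooled, so that a single study driving the result would show up as a noticeable shift. The helper names are hypothetical, and the default pooling function is a plain inverse-variance mean used only to keep the example short; any pooling function with the same signature, including a random-effects one, could be passed in instead.

```python
def inverse_variance_mean(estimates, std_errors):
    """Inverse-variance weighted mean (a simple stand-in for the pooling model)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

def leave_one_out(estimates, std_errors, pool=inverse_variance_mean):
    """Re-pool the data with each study omitted once; one result per omitted study."""
    estimates = list(estimates)
    std_errors = list(std_errors)
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    results = []
    for i in range(len(estimates)):
        kept_estimates = estimates[:i] + estimates[i + 1:]
        kept_errors = std_errors[:i] + std_errors[i + 1:]
        results.append(pool(kept_estimates, kept_errors))
    return results
```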

Fig. 2 Result of pooled AUROC analysis for the ISS in predicting mortality among geriatric trauma patients. The square in the forest plot denotes the pooled AUROC with the corresponding 95% CI. AUROC, area under the receiver operating characteristic curve; CI, confidence interval; ISS, Injury Severity Score

The pooled AUROC of the TRISS indicated better performance in predicting mortality in geriatric trauma patients (AUROC = 0.82, 95% CI: 0.77–0.87; shown in Fig. 3). The result was determined by a random-effects model due to the significant heterogeneity (Cochran Q test, P < 0.001; I² = 97.0%; shown in Fig. 3). However, the sensitivity analysis did not identify a potential source of heterogeneity among the combined studies (shown in Additional file 1: Figure S5).

Fig. 3 Result of pooled AUROC analysis for the TRISS in predicting mortality among geriatric trauma patients. The square in the forest plot denotes the pooled AUROC with the corresponding 95% CI. AUROC, area under the receiver operating characteristic curve; CI, confidence interval; TRISS, Trauma and Injury Severity Score

The pooled AUROC of the GTOS indicated a performance between those of the TRISS and ISS in predicting mortality in geriatric trauma patients (AUROC = 0.80, 95% CI: 0.77–0.83; shown in Fig. 4). The heterogeneity among the combined 11 studies was also statistically significant (Cochran Q test, P < 0.001; I² = 98.5%; shown in Fig. 4), and the pooled AUROC was calculated from a random-effects model. Sensitivity analysis of the combined 11 studies did not identify a potential source of heterogeneity (shown in Additional file 1: Figure S6). The certainty of evidence evaluated by the GRADE approach was low. Details are shown in Additional file 1: Table S3.

Fig. 4 Result of pooled AUROC analysis for the GTOS in predicting mortality among geriatric trauma patients. The square in the forest plot denotes the pooled AUROC with the corresponding 95% CI. AUROC, area under the receiver operating characteristic curve; CI, confidence interval; GTOS, Geriatric Trauma Outcome Score

HSROC analysis for predicting mortality in geriatric trauma patients

Seven included studies [17, 20, 30, 32, 34, 38, 39] reported information on sensitivity and specificity that could be used in the HSROC analysis; detailed information about these studies is displayed in Table 3. HSROC curves of the ISS, TRISS, and GTOS for predicting mortality in geriatric trauma patients are illustrated in Fig. 5. Figure 5A shows the HSROC curve of the ISS; the pooled sensitivity and specificity of the ISS extracted from the HSROC analysis were 0.59 (95% CI: 0.16–0.92) and 0.81 (95% CI: 0.59–0.93), respectively. Figure 5B shows the HSROC curve of the TRISS; the pooled sensitivity and specificity of the TRISS extracted from the HSROC analysis were 0.91 (95% CI: 0.75–0.97) and 0.68 (95% CI: 0.49–0.82), respectively. Figure 5C shows the HSROC curve of the GTOS; the pooled sensitivity and specificity of the GTOS extracted from the HSROC analysis were 0.85 (95% CI: 0.79–0.89) and 0.46 (95% CI: 0.29–0.64), respectively. The pooled DORs of the ISS, TRISS, and GTOS from the corresponding HSROC curves were 6.27 (95% CI: 1.23–31.8), 21.5 (95% CI: 3.56–129.5), and 4.76 (95% CI: 2.81–8.06), respectively.

Fig. 5 HSROC curves for A ISS, B TRISS, and C GTOS in predicting mortality among geriatric trauma patients. HSROC, hierarchical summary receiver operating characteristics; GTOS, Geriatric Trauma Outcome Score; ISS, Injury Severity Score; TRISS, Trauma and Injury Severity Score

Information about the sensitivity, specificity, and DOR from the HSROC analysis is listed in Table 5. In summary, the sensitivity for the prediction of mortality in geriatric trauma patients was the best for the TRISS, and the specificity for the prediction of mortality among geriatric trauma patients was the best for the ISS. For the DOR, a combined index to evaluate the performance of prediction, the TRISS obtained the highest value among the three involved trauma scoring systems.

Table 5 Summary of detailed information based on HSROC analysis

Discussion

Typically, trauma scoring systems can be divided into anatomical injury scores, physiological scores, and combined scores with both anatomical and physiological factors [43]. The ISS is the most widely used anatomical injury score and is considered the “gold standard” for assessing the severity of anatomical injuries [44], while the TRISS is the most widely used combined score based on anatomical and physiological variables [18]. However, most general trauma scoring systems were designed for all-age populations, and their performance in the elderly population is uncertain [15]. In a study published in 2015, Zhao et al. established the GTOS, a combined score that aimed specifically to evaluate the prognosis of geriatric trauma patients [22]. However, the performance of the GTOS in predicting the prognosis of geriatric trauma patients is still controversial due to the short application period [12, 25, 26]. The present meta-analysis aimed to evaluate the performance of the ISS, TRISS, and GTOS in predicting mortality in geriatric trauma patients. The results showed better performance of the TRISS in predicting mortality in geriatric trauma patients than the ISS and GTOS in both the pooled AUROC and HSROC analyses.

Previous studies have suggested that the TRISS has good performance in predicting mortality in geriatric trauma patients [5, 12, 19,20,21, 40]. The results of this meta-analysis also showed that the AUROC, DOR, and sensitivity of the TRISS were higher than those of the ISS and GTOS. The DOR combines the TP, TN, FP, and FN values to indicate test accuracy [45]. Higher DOR values indicate better accuracy of a test [45]. In this meta-analysis, although the specificity of the TRISS was lower than that of the ISS, the DOR of the TRISS was much higher than that of the ISS and GTOS, which suggested better performance of the TRISS in predicting mortality in geriatric trauma patients.
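
For readers less familiar with the DOR, the Python sketch below shows how it is obtained from a single 2×2 table, together with a 95% CI computed on the log scale. It also shows the equivalent expression in terms of sensitivity and specificity. The function names are hypothetical, and the 0.5 continuity correction for empty cells is an assumption made for the example, not a detail reported in this study. The pooled DORs in this meta-analysis came from the HSROC model, not from this per-table calculation.

```python
import math

def diagnostic_summary(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, and DOR with a 95% CI from one 2x2 table."""
    # Assumption: add 0.5 to every cell when any cell is zero (continuity correction).
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)
    # Standard error of log(DOR) from the four cell counts.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (math.exp(math.log(dor) - z * se_log_dor),
          math.exp(math.log(dor) + z * se_log_dor))
    return sensitivity, specificity, dor, ci

def dor_from_rates(sensitivity, specificity):
    """The same quantity written as the ratio of positive to negative odds."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)
```

As a plausibility check, `dor_from_rates(0.91, 0.68)` with the rounded pooled sensitivity and specificity of the TRISS gives roughly 21.5, close to the pooled DOR reported for the TRISS.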

The comprehensive composition of the scoring indicators of the TRISS may contribute to its good performance [36]. The ISS evaluates injury severity only from an anatomical perspective. However, physiological indicators may also have an enormous impact on prognosis in geriatric trauma patients [14, 46,47,48]. Systolic blood pressure (SBP) and respiratory rate (RR) are among the physiological indices included in the TRISS. Multiple studies have suggested that the risk of mortality increases 3 to 5 times with an SBP of less than 90 mmHg [49]. Hranjec et al. suggested that an SBP of less than 130 mmHg may increase the risk of mortality in elderly patients [50]. The study of Wilson et al., which aimed to identify early predictors of mortality in geriatric trauma patients, reported no significant difference in SBP between the two groups [41]. However, the small sample size (n = 147) may have affected the results of Wilson et al. [41]. Many elderly patients have poorer physiological reserves, and they may have several coexisting medical conditions, which lead to a higher risk of death [48, 51]. The results of Tamim et al. showed a large gap between the performance of the ISS in the geriatric trauma group (AUROC = 0.584) and that in the all-age group (AUROC = 0.881) [16]. Moreover, Yousefzadeh-Chabok et al. reported that the AUROCs of the ISS and TRISS in predicting mortality in geriatric trauma patients were 0.76 and 0.94, respectively [20].

The GTOS was specifically designed to assess the prognosis of geriatric trauma patients based on age, ISS, and history of packed red blood cell (PRBC) transfusion in the first 24 h after trauma [22]. Although the performance of the GTOS in geriatric trauma patients is still controversial [5, 12, 24, 25], most studies comparing the GTOS and ISS in geriatric patients suggested that the GTOS performed better than the ISS in predicting mortality, which is consistent with the results of this meta-analysis [26, 27, 37, 38]. The improved performance of the GTOS over the ISS in predicting mortality in geriatric patients with trauma may also stem from the inclusion of physiological parameters. Age and blood transfusion after trauma were shown to be associated with morbidity and mortality during hospitalization in geriatric patients with trauma [23], which helps the GTOS evaluate the patient's condition more comprehensively. Park et al. concluded that the GTOS was superior to the TRISS in predicting mortality in Korean patients with trauma who were older than 65 years of age [5]. In contrast, studies conducted by Barea-Mendoza et al. and Madni et al. indicated that the performance of the GTOS was not as good as that of the TRISS for predicting mortality in geriatric patients with trauma [12, 17]. In addition to the controversy regarding the GTOS, the requirement of a history of PRBC transfusion in the first 24 h may delay an accurate prognostic calculation [12]. The TRISS can evaluate the risk of mortality earlier than the GTOS, which may be critical for patients with early hospital admission and a high death risk. On the other hand, the GTOS also has the advantage of being a single and simple measurement. The GTOS can be calculated easily and rapidly with the free calculator available at www.palliateconsortium.com. Because the GTOS is a “young” trauma scoring system, more studies are needed to validate its value in the field of geriatric trauma.
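
The three components named above combine linearly. The Python sketch below uses the weights as they are commonly cited for the GTOS (2.5 points per ISS point and 22 points for a PRBC transfusion within 24 h). These coefficients are not stated in this article and should be checked against the original publication [22] before any use; the function name and the input validation are illustrative only.

```python
def gtos(age_years, iss, prbc_within_24h):
    """GTOS as commonly cited: age + 2.5 * ISS + 22 if PRBCs were transfused."""
    # The ISS is bounded above by 75; reject values outside the valid range.
    if age_years < 0 or not 0 <= iss <= 75:
        raise ValueError("age must be non-negative and ISS must lie in 0..75")
    return age_years + 2.5 * iss + (22 if prbc_within_24h else 0)
```

For example, under these assumed weights an 80-year-old patient with an ISS of 20 who received a transfusion would score 80 + 50 + 22 = 152. The sketch also makes the timing issue visible: the transfusion term cannot be known until the first 24 h have elapsed.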

Durability is the primary advantage of the TRISS, which has been well validated over the past 30 years [12]. The TRISS was developed from the Major Trauma Outcome Study (MTOS) database [52]. The TRISS uses logistic regression, fitted to both elderly and non-elderly patients, to derive the β coefficients that determine the weight of its variables. Madni et al. showed that the TRISS still performed well in predicting mortality in a geriatric-specific cohort even with previous β coefficients, with AUROCs ranging from 0.8895 to 0.8869 [12]. The superior performance of the TRISS in the study of Madni et al. indicated that the predictive performance of the TRISS in geriatric patients did not seem to be impaired, even though non-elderly patients were included in the generation of the age coefficients. The study conducted by Yousefzadeh-Chabok et al. even reported an AUROC as large as 0.94 for the TRISS in predicting mortality in geriatric patients [20].
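
For orientation, the general logistic form of the TRISS is usually written as shown below. The symbols P_s (probability of survival) and A (a binary age index) are introduced here for illustration and do not appear elsewhere in this article. The β values depend on the injury mechanism and on which coefficient revision is used, so none are given.

```latex
P_s = \frac{1}{1 + e^{-b}}, \qquad
b = \beta_0 + \beta_1 \cdot \mathrm{RTS} + \beta_2 \cdot \mathrm{ISS} + \beta_3 \cdot A
```

In the original formulation, A is 1 for patients aged 55 years or older and 0 otherwise. Under that coding, every patient in a cohort aged 60 years or older has A = 1, so within such a cohort the age term acts as a constant shift and discrimination is driven by the RTS and ISS terms.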

There are some limitations in the utilization of the TRISS. The first recorded physiological data may vary because of out-of-hospital interventions [53]. The Glasgow Coma Scale (GCS) score may be difficult to determine for patients with prehospital intubation and patients under sedation [54]. The first acquisition of physiological data relies on the records of field and emergency workers, especially for patients with the worst injuries. Therefore, although the results of this meta-analysis suggested superior performance of the TRISS in predicting mortality in geriatric trauma patients, prospective assessment is needed to clarify the ability of the TRISS to predict mortality in patients with critically serious trauma.

There were several limitations in interpreting the results: (a) The present meta-analysis was based on observational studies, which ranked low in the GRADE evidence quality assessment. Clinical studies with multiple centers and larger sample sizes are necessary to provide higher-quality evidence for evaluating the accuracy of trauma scoring systems in predicting mortality in geriatric trauma patients. (b) The present meta-analysis only compared two representative trauma scores, the ISS and TRISS, and the GTOS, a trauma scoring system designed for elderly patients. Other trauma scoring tools, such as the RTS and the New Injury Severity Score (NISS), were not included, which may limit the generalizability of the results. Comprehensive meta-analyses are needed in the future to analyze the performance of more trauma scoring tools in predicting outcomes in elderly trauma patients. (c) The populations of the included studies were mostly from Western countries, and data from specific trauma centers in Asia and Africa were lacking. (d) The heterogeneity of the combined studies was high, but the potential source of heterogeneity was not determined.

Conclusions

The TRISS showed better accuracy and performance in predicting mortality in geriatric trauma patients than the ISS and GTOS. Further prospective studies are needed to determine the appropriate choice among the various trauma scoring systems for patients with different conditions.