Abstract
Objective
To evaluate whether the SOFA score can be used to develop a model to predict intensive care unit (ICU) mortality in different countries.
Design and setting
Analysis of a prospectively collected database. Patients with ICU stay longer than 2 days were studied to develop a mortality prediction model based on measurements of organ dysfunction.
Patients
748 patients from six countries.
Measurements and results
Two logistic regression models were constructed, one based on the SOFA maximum (SOFA Max model) and the other on variables identified by multivariate regression (SOFA Max-infection model). The H and C statistics had p values above 0.05 for both models, but the D statistic showed poor performance of the SOFA Max model when stratified for the presence of infection. Subsequent analysis was performed with the SOFA Max-infection model. The area under the curve was 0.853. There were no statistically significant differences between observed and predicted mortalities except for one country, which had a higher than predicted ICU mortality both in the overall population (28.3 vs. 19.1%) and in the noninfected patients (21.4 vs. 12.6%).
Conclusions
The SOFA Max adjusted for age and the presence of infection can predict mortality in this population, but in one country the ICU mortality was higher than expected. Our data do not allow us to determine the reasons behind these differences, and further studies to detect differences in mortality between countries and to elucidate the basis for these differences should be encouraged.
Introduction
The ability to evaluate the severity of the disease state in critically ill patients is important as a measure of intensive care unit (ICU) performance, as a means of providing patients’ relatives with information regarding outcome, as a guide for resource allocation, and as a way of stratifying patients in clinical research [1]. Currently available prediction models such as the Acute Physiology and Chronic Health Evaluation (APACHE) II [2], Simplified Acute Physiology Score (SAPS) [3], and Mortality Probability Models (MPM) [4] use values taken within the first 24 h of an ICU stay. However, these scores ignore the many factors that can influence patient outcome during the course of an ICU stay beyond the first 24 h. Some researchers [5, 6] have advocated the sequential application of these systems, possibly with correction for other factors, but such use is as yet experimental, except for MPM [7]. Sicignano et al. [8] observed that the discriminative power of SAPS decreases over time, retaining its predictive power only for patients who stay in the ICU no more than 5 days. These scoring systems have been evaluated in several large Canadian and European studies, which have confirmed their predictive accuracy in those settings. The area under the receiver operating characteristic (ROC) curve for the three models ranged from 0.74 to 0.86 in these studies [9, 10, 11]. Since these scoring systems were developed using large population databases, they may not fit well when their predictive accuracy is evaluated in isolated ICUs, because of smaller numbers of patients [12] and differences in case-mix [13, 14]. Comparing logistic regression models and artificial neural networks, Clermont et al. [12] observed that as the size of the development set decreases, model performance deteriorates rapidly.
Increasing the number of patients with a particular condition causes the discrimination and calibration of the MPM II to deteriorate [13] and the mortality ratios (observed deaths divided by predicted deaths) obtained with the APACHE II to vary widely [14]. Applying such models to assess the performance of individual ICUs is thus difficult. Patel and Grant [15] observed that although the mortality predicted by APACHE II, MPM II, and SAPS II is similar to the 95% confidence interval (CI) of the observed mortality, the models fit poorly, which impairs the validity of the result. This finding may be the result of differences in quality of care, differences in case-mix, small numbers of patients, or delay in ICU admission.
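The mortality ratio mentioned above (observed deaths divided by predicted deaths) can be illustrated with a minimal sketch. It assumes that predicted deaths are obtained by summing the individual predicted probabilities of death; the function name and the example cohort are hypothetical and are not taken from the cited studies.

```python
def mortality_ratio(observed_deaths, predicted_probs):
    """Observed deaths divided by predicted deaths.

    Predicted deaths are taken as the sum of the individual predicted
    probabilities of death (an assumption made for this sketch).
    """
    predicted_deaths = sum(predicted_probs)
    if predicted_deaths <= 0:
        raise ValueError("predicted deaths must be positive")
    return observed_deaths / predicted_deaths


# Hypothetical cohort: 8 patients, each with a predicted risk of 0.25,
# of whom 3 died. Predicted deaths = 2.0, so the ratio is 1.5.
ratio = mortality_ratio(3, [0.25] * 8)
print(ratio)
```

A ratio above 1 indicates more deaths than the model predicted, and a ratio below 1 indicates fewer.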
The Sequential Organ Failure Assessment (SOFA) assesses patients for organ dysfunction not only at ICU admission but serially during the ICU stay and was first developed to evaluate morbidity [16]. Organ dysfunction is related to both morbidity and mortality [17, 18], and it has been shown that the SOFA score, using either the sum of the maximum scores for each system (SOFA Max), the admission value, or the changes in the first 48 h, is related to mortality [19, 20]. The aim of this study was to develop an ICU mortality prediction model based on organ dysfunction measurements and, using this model, to compare differences in ICU mortality in different countries.
Materials and Methods
A database collected by a working group on sepsis-related problems of the European Society of Intensive Care Medicine was analyzed. The database consisted of data collected in 40 ICUs from 16 countries in Australia (1), Europe (35), North America (1), and South America (3). Data were collected on all 1,449 patients admitted to the participating ICUs during a 1-month period, excluding those with a length of stay (LOS) less than 2 days following uncomplicated surgery. Coronary patients were included. Further details of the database formation are provided elsewhere [21]. For each patient basic demographic data were collected at admission, and the variables needed to construct the SOFA score were collected at admission and every 24 h thereafter. A single missing value was replaced by the mean of the values immediately preceding and following it. When more than one consecutive value was missing, the values were treated as missing in the analysis. Patients with an LOS of at least 2 days were identified and classified by country. The six countries with the largest numbers of patients (Brazil, Belgium, France, Italy, Spain, United Kingdom) were selected for further analysis. Univariate analysis was performed using nonpaired t tests for continuous variables and χ² tests for categorical ones to assess those related to mortality. Variables with a p value less than 0.15 were entered into a logistic regression model to identify those significantly associated with mortality. Two logistic regression models were constructed, one based on the SOFA Max, which was the best predictor in previous studies [19, 20], and another (SOFA Max-infection) based on the variables that remained significant (p<0.05) after multivariate analysis. Both models were applied individually to obtain a predicted mortality for the whole population and for each country.
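The replacement rule for missing values described above can be sketched as follows. This is an illustration only, not the authors' code: the function name is hypothetical, missing values are represented by `None`, and gaps at the very start or end of a series are left missing because the text does not say how they were handled.

```python
from typing import Optional


def impute_single_gaps(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill an isolated missing value with the mean of its two neighbours.

    Runs of two or more consecutive missing values stay missing, as do
    gaps at either end of the series (the latter is an assumption).
    """
    filled = list(values)
    for i in range(1, len(values) - 1):
        before, current, after = values[i - 1], values[i], values[i + 1]
        # Neighbours are read from the original series, so a run of
        # missing values is never filled from an imputed value.
        if current is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled


# Hypothetical daily measurements with one isolated gap and one run of two.
series = [4, None, 6, None, None, 8]
print(impute_single_gaps(series))
```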
The variables chosen were the total SOFA score at admission, the Δ value between the SOFA score at 48 h and the admission SOFA, the SOFA Max (defined as the sum of the maximum value for each organ category during the ICU stay), and the Δ value between the admission SOFA and the SOFA Max. Calibration of the model was assessed with Hosmer-Lemeshow’s goodness-of-fit statistics [22] and the Pulkstenis-Robinson D statistic [23]. The Pulkstenis-Robinson D statistic should be used in conjunction with the Hosmer-Lemeshow statistic when both categorical and continuous variables are included in the analysis: because the data are cross-classified by the categorical covariates, it eliminates the risk of grouping together observations with similar fitted probabilities but different covariate patterns, which may occur with the Hosmer-Lemeshow approach. Assessment of calibration usually relies on p values being higher than 0.05. Another strength of the D statistic is that even when p values are above 0.05, significant flaws in the model can be perceived, and interactions can be added to the model to improve the p values, although this must be done with careful clinical reasoning. The interpretation of the Pulkstenis-Robinson D statistic involves two steps. First, the p value is examined. Second, if it is less than 0.05, the individual values for each stratum are inspected. Variables that have values deviating from zero (±5) should be suspected of being important covariates. Even if p is not less than 0.05, variables whose individual values deviate from zero may be considered important for the model if clinical reasoning is in agreement. ROC curves were constructed, and discrimination was assessed by the area under the curve (AUC) [24, 25].
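The SOFA-derived variables defined above can be made concrete with a short sketch. It assumes one record per ICU day, with index 0 as admission, index 1 as 24 h, and index 2 as 48 h; the function name, the dictionary layout, and the example patient are hypothetical.

```python
def sofa_summary(daily_scores):
    """Derive admission SOFA, SOFA Max, and the two delta values.

    daily_scores: list of dicts, one per ICU day (index 0 = admission),
    each mapping an organ system to its SOFA subscore.
    """
    totals = [sum(day.values()) for day in daily_scores]
    organs = daily_scores[0].keys()
    # SOFA Max: sum over organ systems of the worst subscore seen during the stay.
    sofa_max = sum(max(day[organ] for day in daily_scores) for organ in organs)
    admission = totals[0]
    summary = {
        "admission": admission,
        "sofa_max": sofa_max,
        "delta_max": sofa_max - admission,
    }
    if len(totals) > 2:
        # Change over the first 48 h (assumes index 2 is the 48 h record).
        summary["delta_48h"] = totals[2] - admission
    return summary


# Hypothetical patient with three daily records.
patient = [
    {"respiratory": 2, "coagulation": 0, "liver": 0, "cardiovascular": 1, "cns": 0, "renal": 1},
    {"respiratory": 3, "coagulation": 1, "liver": 0, "cardiovascular": 0, "cns": 0, "renal": 1},
    {"respiratory": 1, "coagulation": 0, "liver": 0, "cardiovascular": 2, "cns": 1, "renal": 0},
]
result = sofa_summary(patient)
print(result)
```

Note that the SOFA Max (8 in this example) can exceed the highest daily total (5), because the worst values for different organs need not occur on the same day.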
The 95% CI for the observed ICU mortality was calculated and compared to the ICU mortality predicted by the models. One-way analysis of variance was used to compare means for continuous variables across different countries, and when differences were found, Fisher’s least significant differences test was applied to compare groups. All tests were two-tailed. When multiple comparisons were made, Bonferroni’s adjustment was applied. A p value less than 0.05 was considered significant. Data are presented as mean ±SD, except where otherwise indicated.
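The comparison between the 95% CI of the observed mortality and the predicted mortality can be sketched as below. The text does not state how the interval was computed, so the normal-approximation (Wald) interval used here is an assumption; the function names and the counts in the example are hypothetical.

```python
import math


def wald_ci(deaths, n, z=1.96):
    """Normal-approximation 95% CI for an observed mortality proportion."""
    p = deaths / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def differs_from_prediction(predicted, ci):
    """True when the predicted mortality lies outside the observed CI."""
    low, high = ci
    return predicted < low or predicted > high


# Hypothetical group: 30 deaths among 100 patients, predicted mortality 19%.
ci = wald_ci(30, 100)
print(ci, differs_from_prediction(0.19, ci))
```

With small groups the Wald interval becomes wide, so a sizeable gap between observed and predicted mortality may still fall inside it.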
Results
The study included 748 patients from six countries. Their mean age was 54.9±18.6 years, 63.5% were men, and ICU mortality was 21.5%. Patient characteristics in the various countries are summarized in Table 1. Fewer than 1% of the data were missing. Differences among countries were observed in the age of the patients and in admission diagnosis. The differences in mean admission SOFA and mean SOFA Max scores among countries are shown in Table 1. Differences in ICU mortality did not reach statistical significance.
Differences between survivors and nonsurvivors were observed. Survivors were younger (age 53.3±18.9 vs. 60.7±17.0 years, p<0.0001), had a lower infection rate (27% vs. 55%, p<0.0001), lower admission SOFA (4.0±3.0 vs. 7.3±4.0, p<0.0001), lower SOFA Max (6.7±4.4 vs. 13.5±5.0, p<0.0001), lower Δ Max (2.8±2.9 vs. 6.3±3.8, p<0.0001), and lower Δ 48 h (0.0±2.4 vs. 1.3±3.4, p<0.0001). Medical admissions were more common in nonsurvivors (56 vs. 44%, p<0.01); trauma and coronary admissions were more common among survivors (14 vs. 6%, p<0.01 and 7 vs. 2%, p<0.01, respectively). After multivariate logistic regression analysis with ICU mortality as the dependent outcome the only variables that remained significantly associated with death were SOFA Max, infection, and age (Table 2). The Δ between admission SOFA and SOFA Max was not included in the model because of redundancy.
The logistic regression model based on the SOFA Max (SOFA Max model) showed good calibration, as assessed by the C and H statistics (p=0.54 and p=0.95, respectively). However, when assessed by the Pulkstenis-Robinson method for the presence of infection, analysis of the D statistics individually showed large values for both the infected (from −6.788 to +7.894) and noninfected (+7.464 to −6.565) patients, demonstrating an obvious underestimation of mortality for infected patients as well as an overestimation for noninfected patients, and highlighting the need to include infection as a covariate in the model, although the p value was not significant (p=0.157). After inclusion of infection, we no longer observed these discrepancies (values were within −1.782 to +1.850), and the p value was 0.796 for the D statistic. The AUC for this SOFA Max model was 0.840 (95% CI, 0.804–0.872) and 0.845 (95% CI, 0.809–0.876) after the addition of infection as a covariate.
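For readers unfamiliar with the Hosmer-Lemeshow C statistic reported above, the following is a minimal sketch of how it is commonly computed: patients are sorted by predicted risk, split into equal-sized groups, and observed deaths are compared with expected deaths in each group. It is not the authors' code. The function names and example data are hypothetical, and the p-value helper uses a closed form that holds only for an even number of degrees of freedom (groups minus 2 when the model is assessed on its development data).

```python
import math


def hosmer_lemeshow_c(outcomes, probs, groups=10):
    """Hosmer-Lemeshow C statistic over equal-sized groups of sorted risk."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(outcomes[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        # n_g * mean_p * (1 - mean_p), written in terms of expected deaths.
        variance = expected * (1 - expected / len(idx))
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Hypothetical data: four risk levels with 10 patients each.
probs = [0.1] * 10 + [0.2] * 10 + [0.5] * 10 + [0.8] * 10
calibrated = [1] * 1 + [0] * 9 + [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
# Same cohort, but every patient in the lowest-risk group dies.
miscalibrated = [1] * 10 + calibrated[10:]

stat_good = hosmer_lemeshow_c(calibrated, probs, groups=4)
stat_bad = hosmer_lemeshow_c(miscalibrated, probs, groups=4)
print(stat_good, chi2_sf_even_df(stat_good, df=2))
print(stat_bad, chi2_sf_even_df(stat_bad, df=2))
```

A p value above 0.05 is read as no evidence of poor fit, which is how the calibration results in this section are interpreted.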
Calibration for the model based on SOFA Max, infection, and age (SOFA Max-infection model) showed p values for the H and the C statistics of 0.72 and 0.37, respectively; the calibration curve is shown in Fig. 1. The D statistic showed a p value of 0.825. Discrimination was assessed by the AUC (0.853, 95% CI: 0.817–0.884). Subsequent analysis was performed with this SOFA Max-infection model.
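The AUC used here to assess discrimination can be read as the probability that a randomly chosen nonsurvivor receives a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison, counting ties as one half; the function name and the four example patients are hypothetical.

```python
def auc(outcomes, probs):
    """Area under the ROC curve by pairwise comparison of predicted risks."""
    died = [p for y, p in zip(outcomes, probs) if y == 1]
    survived = [p for y, p in zip(outcomes, probs) if y == 0]
    if not died or not survived:
        raise ValueError("both outcomes must be present")
    # Each nonsurvivor/survivor pair scores 1 if correctly ordered, 0.5 if tied.
    score = sum(
        1.0 if d > s else 0.5 if d == s else 0.0
        for d in died
        for s in survived
    )
    return score / (len(died) * len(survived))


# Hypothetical example: two survivors (0) and two nonsurvivors (1).
value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(value)
```

In this example three of the four nonsurvivor/survivor pairs are ordered correctly, giving an AUC of 0.75. The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size.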
The observed ICU mortality for each country was compared to the predicted ICU mortality using the SOFA Max-infection model. There were no statistically significant differences in observed and predicted ICU mortalities, except for one country, which had a higher than expected mortality for the whole population (28.3 vs. 19.1%, p<0.05) and for the noninfected patients (21.4 vs. 12.6%, p<0.05). The infected patients in this country had a higher than predicted ICU mortality but this was not statistically significant (44.8 vs. 34.9%; 95% CI for observed: 26.7–62.9%). Analyzing the different prediction levels, ICU mortality was higher in every decile in this country (Fig. 2). Calibration was not performed for individual countries because of the limited number of patients.
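The predicted ICU mortality for a group of patients can be obtained by averaging the individual probabilities given by the logistic model. The sketch below shows the mechanics only: the coefficients are hypothetical placeholders chosen for illustration (the fitted values are those of Table 2 and are not reproduced here), and the function names are assumptions.

```python
import math

# HYPOTHETICAL coefficients, for illustration only; not the fitted model.
COEF = {"intercept": -5.0, "sofa_max": 0.3, "age": 0.02, "infection": 0.5}


def predicted_risk(sofa_max, age, infected, coef=COEF):
    """Probability of ICU death from a logistic model with three covariates."""
    logit = (
        coef["intercept"]
        + coef["sofa_max"] * sofa_max
        + coef["age"] * age
        + coef["infection"] * (1 if infected else 0)
    )
    return 1 / (1 + math.exp(-logit))


def group_predicted_mortality(patients, coef=COEF):
    """Mean of the individual predicted probabilities in a group."""
    risks = [predicted_risk(p["sofa_max"], p["age"], p["infected"], coef) for p in patients]
    return sum(risks) / len(risks)


# Two hypothetical patients who differ only in infection status.
group = [
    {"sofa_max": 10, "age": 50, "infected": False},
    {"sofa_max": 10, "age": 50, "infected": True},
]
expected_mortality = group_predicted_mortality(group)
print(expected_mortality)
```

The group value can then be set against the observed mortality and its 95% CI for each country or subgroup.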
Discussion
Outcome prediction is a fundamental tool in critical care. The available severity scores such as the APACHE II and the SAPS predict mortality based on physiological variables collected in the first 24 h of ICU stay, ignoring the fact that morbidity and mortality are very closely correlated, and that changes in the initial parameters may influence patient outcome; indeed, using the MPM model, Rué et al. [7] observed that the best estimate of hospital mortality was the probability of death on the current day. In an ICU environment morbidity can be described as multiple organ dysfunction syndrome, which has been observed with several acute states commonly seen in the ICU, including hemorrhagic shock [26], infection [27, 28], acute pancreatitis [29], burns [30], shock [31], and trauma [32]. The SOFA score, based on six independent and simple-to-obtain variables, was initially presented for assessing morbidity in septic patients [16] but has also been validated in trauma [33] and in general ICU patients [21]. Recently it has been shown that the admission SOFA score, SOFA Max, and the changes in SOFA over the first 48 h are correlated with mortality [19, 20].
Consistent with the idea that mortality is related to severity of organ failure, our results show that although the admission SOFA score is related to ICU mortality, in a multivariate analysis only the SOFA Max is significantly related to ICU mortality. The SOFA Max model showed good calibration by the Hosmer-Lemeshow statistic, but, as observed by Pulkstenis and Robinson, this method may not detect poor calibration when an important binary covariate is missing [23]. Indeed, after stratification for the presence of all infections combined, we observed significant disparities between the observed and predicted mortalities for infected and noninfected patients. After adding infection as a covariate (SOFA Max-infection model) performance improved. Thus when using the SOFA score to evaluate the severity of the disease process, it should be adjusted for the presence of infection. Other studies have compared patients with the systemic inflammatory response syndrome with or without infection, noting higher mortality rates in infected patients [34]. Our data are of particular interest since the SOFA score represents a more complex analysis of organ function. The observation that infection influences the probability of death, independently of the degree of organ failure measured by the available clinical scores, leads us to believe that the presence of infection should be included in these scores, and in addition, if we want a more precise score, we should measure not only the degree of organ damage but also the inflammatory and coagulation abnormalities present in sepsis.
The calibration curve for the SOFA Max-infection model showed that the observed ICU mortality was always close to the line of identity, a finding that differs from what has been reported in the literature, in which overestimation or underestimation of mortality in sicker patients is usually observed [15, 35, 36]. This may be due to the fact that as the SOFA Max changes with the clinical course of the patient, greater accuracy can be obtained in all deciles of prediction.
Discrimination of the SOFA Max-infection model was good (AUC: 0.853) and similar to published values for the APACHE III [37, 38] and logistic organ dysfunction score [39] after customization, despite the fact that the SOFA score is simpler to compute and does not depend on admission diagnosis. This facet is highlighted by the observation that although medical, coronary, and trauma admissions had a significant univariate relationship with ICU mortality, after adjustment for the SOFA score their importance was lost, a finding that makes sense in view of the fact that the SOFA score actually measures organ dysfunction, and this may be a common pathway for many different disease states.
International differences were assessed, looking for variations in predicted vs. observed mortality. Since ICU selection was not random, these data should be interpreted carefully, as they may not be representative of each country. In this analysis, only one country had a higher ICU mortality than predicted, both in the overall population and in the noninfected patients, and this finding was present in all deciles of prediction. This discrepancy may be due to differences in case-mix, nonmeasured clinical and nonclinical [40] variables that may be strongly related to mortality [35, 41, 42], differences in quality of care, or differences in cultural aspects such as resource allocation and policies regarding the limiting of therapy. Our data cannot solve this difficult question, although differences in case-mix should not be important after customization [43].
Our study differs from others in that the models were applied to subpopulations of the original development data. This may impair external validation of the SOFA Max-infection model, but comparisons within groups should be enhanced, as opposed to the poor calibration found when other models were applied to independent populations [15, 44, 45]. It is also important to note that the present study included a limited number of patients from each country, which prevented a more precise analysis: simulations have shown that for smaller populations the fit of the model may deteriorate [12], and as yet we do not have adequate statistical methods to calibrate models in limited samples. Another potential limitation is that we convert a continuous variable (probability of death) into a dichotomous variable (dead or alive), but this is an inherent problem in studies of this type. In addition, while there was an apparently higher ICU mortality in patients from one country, the data do not allow any conclusions to be drawn regarding the reasons behind this finding; in particular, we cannot say from these results that the standard of care in that country is any lower than in any of the five other countries. Rather, our findings should encourage other studies into international differences in ICU mortality rates, both in terms of absolute numbers and cause. Importantly, too, this model should not be used to evaluate the risk of death in individual patients.
In conclusion, the SOFA Max, adjusted for age and infection, is significantly related to ICU outcome, independently of admission category, but nonmeasured variables may still play an important role in differences found between predicted and observed ICU mortality.
References
Shortell SM, Zimmerman JE, Rousseau DM, Gillies RR, Wagner DP, Draper EA, Knaus WA, Duffy J (1994) The performance of intensive care units: does good management make a difference? Med Care 32:508–525
Knaus WA, Draper EA, Wagner DP, Zimmerman JE (1985) APACHE II: a severity of disease classification system. Crit Care Med 13:818–829
Le Gall J-R, Lemeshow S, Saulnier F (1993) A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. JAMA 270:2957–2963
Lemeshow S, Teres D, Avrunin JS, Gage RW (1988) Refining intensive care unit outcome prediction by using changing probabilities of mortality. Crit Care Med 16:470–477
Chang RWS, Jacobs S, Lee B (1988) Predicting outcome among intensive care unit patients using computerized trend analysis of daily APACHE II scores corrected for organ system failure. Intensive Care Med 14:558–566
Rogers J, Fuller HD (1994) Use of daily Acute Physiology and Chronic Health Evaluation (APACHE) II scores to predict individual patient survival rate. Crit Care Med 22:1402–1405
Rué M, Quintana S, Alvarez M, Artigas A (2001) Daily assessment of severity of illness and mortality prediction for individual patients. Crit Care Med 29:45–50
Sicignano A, Carozzi C, Giudici D, Merli G, Arlati S, Pulici M, on behalf of ARCHIDIA (1996) The influence of length of stay in the ICU on power of discrimination of a multipurpose severity score (SAPS). Intensive Care Med 22:1048–1051
Wong DT, Crofts SL, Gomez M, McGuire GP, Byrick RJ (1995) Evaluation of predictive ability of APACHE II system and hospital outcome in Canadian intensive care unit patients. Crit Care Med 23:1177–1183
Castella X, Artigas A, Bion J, Kari A (1995) A comparison of severity of illness scoring systems for intensive care unit patients: results of a multicenter, multinational study. The European/North American Severity Study Group. Crit Care Med 23:1327–1335
Rowan KM, Kerr JH, Major E, McPherson K, Short A, Vessey MP (1994) Intensive Care Society’s Acute Physiology and Chronic Health Evaluation (APACHE II) study in Britain and Ireland: a prospective, multicenter, cohort study comparing two methods for predicting outcome for adult intensive care patients. Crit Care Med 22:1392–1401
Clermont G, Angus DC, DiRusso SM, Griffin M, Linde-Zwirble WT (2001) Predicting hospital mortality for patients in the intensive care unit: a comparison of artificial neural networks with logistic regression models. Crit Care Med 29:291–296
Murphy-Filkins R, Teres D, Lemeshow S, Hosmer DW (1996) Effect of changing patient mix on the performance of an intensive care unit severity-of-illness model: how to distinguish a general from a specialty intensive care unit. Crit Care Med 24:1968–1973
Goldhill DR, Withington PS (1996) The effect of casemix adjustment on mortality as predicted by APACHE II. Intensive Care Med 22:415–419
Patel PA, Grant BJ (1999) Application of mortality prediction systems to individual intensive care units. Intensive Care Med 25:977–982
Vincent JL, Moreno R, Takala J, Willatts S, de Mendonça A, Bruining H, Reinhart CK, Suter PM, Thijs LG (1996) The SOFA (Sepsis-Related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med 22:707–710
Tran DD, Groeneveld ABJ, Vander Meulen J, Nauta JJP, Strack Van Schijndel RJM, Thijs LG (1990) Age, chronic disease, sepsis, organ system failure, and mortality in a medical intensive care unit. Crit Care Med 18:474–479
Deitch EA (1992) Multiple organ failure: pathophysiology and potential future therapy. Ann Surg 216:117–134
Moreno R, Vincent JL, Matos A, de Mendonça A, Cantraine F, Thijs J, Takala J, Sprung C, Antonelli M, Bruining H, Willatts S (1999) The use of maximum SOFA score to quantify organ dysfunction/failure in intensive care. Results of a prospective, multicentre study. Intensive Care Med 25:686–696
Lopes Ferreira F, Peres Bota D, Bross A, Melot C, Vincent JL (2001) Serial evaluation of the SOFA score to predict outcome. JAMA 286:1754–1758
Vincent JL, de Mendonça A, Cantraine F, Moreno R, Takala J, Suter P, Sprung C, Colardyn FC, Blecher S (1998) Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: results of a multicentric, prospective study. Crit Care Med 26:1793–1800
Lemeshow S, Hosmer DWJ (1982) A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 115:92–106
Pulkstenis E, Robinson TJ (2002) Two goodness-of-fit tests for logistic regression models with continuous covariates. Stat Med 21:79–93
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36
Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839–843
Tilney NL, Bailey GL, Morgan AP (1973) Sequential system failure after rupture of abdominal aortic aneurysms: an unsolved problem in postoperative care. Ann Surg 178:117–122
Fry DE, Pearlstein L, Fulton RL, Hiram CP (1980) Multiple system organ failure: the role of uncontrolled infection. Arch Surg 115:136–140
Bell RC, Coalson JJ, Smith JD et al (1983) Multiple organ system failure and infection in adult respiratory distress syndrome. Ann Intern Med 99:293–298
Tran DD, Cuesta MA (1992) Evaluation of severity in patients with acute pancreatitis. Am J Gastroenterol 87:604–608
Marshall WG Jr, Dimick AR (1983) The natural history of major burns with multiple subsystem failure. J Trauma 23:102–105
Henao FJ, Daes JE, Dennis RJ (1991) Risk factors for multiorgan failure: a case-control study. J Trauma 31:74–80
Faist E, Baue A, Ditmer H, Heberer G (1983) Multiple organ failure in polytrauma patients. J Trauma 23:775–787
Antonelli M, Moreno R, Vincent JL, Sprung CL, Mendoca A, Passariello M, Riccioni L, Osborn J (1999) Application of SOFA score to trauma patients. Sequential Organ Failure Assessment. Intensive Care Med 25:389–394
Bossink AW, Groeneveld J, Hack CE, Thijs LG (1998) Prediction of mortality in febrile medical patients: how useful are systemic inflammatory response syndrome and sepsis criteria? Chest 113:1533–1541
Moreno R, Morais P (1997) Outcome prediction in intensive care: results of a prospective, multicentre, Portuguese study. Intensive Care Med 23:177–186
Sirio CA, Tajimi K, Tase C (1992) An initial comparison of intensive care in Japan and the United States. Crit Care Med 20:1207–1215
Rivera-Fernandez R, Vazquez-Mata G, Bravo M, Aguayo-Hoyos E, Zimmerman J, Wagner D, Knaus W (1998) The Apache III prognostic system: customized mortality predictions for Spanish ICU patients. Intensive Care Med 24:574–581
Markgraf R, Deutschinoff G, Pientka L, Scholten T, Lorenz C (2001) Performance of the score systems Acute Physiology and Chronic Health Evaluation II and III at an interdisciplinary intensive care unit, after customization. Crit Care 5:31–36
Metnitz PG, Lang T, Valentin A, Steltzer H, Krenn CG, Le Gall JR (2001) Evaluation of the logistic organ dysfunction system for the assessment of organ dysfunction and mortality in critically ill patients. Intensive Care Med 27:992–998
Bastos PG, Knaus WA, Zimmerman JE, Magalhaes M, Sun X, Wagner DP, The Brazil APACHE III Study Group (1996) The importance of technology for achieving superior outcomes from intensive care. Intensive Care Med 22:664–669
Apolone G, Bertolini G, D’Amico R, Iapichino G, Cattaneo A, De Salvo G, Melotti RM (1996) The performance of SAPS II in a cohort of patients admitted to 99 Italian ICUs: results from GiViTI. Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva. Intensive Care Med 22:1368–1378
Beck DH, Taylor BL, Millar B, Smith GB (1997) Prediction of outcome from intensive care: a prospective cohort study comparing Acute Physiology and Chronic Health Evaluation II and III prognostic systems in a United Kingdom intensive care unit. Crit Care Med 25:9–15
Teres D, Lemeshow S (1999) When to customize a severity model. Intensive Care Med 25:140–142
Ghuysen A, Lambermont B, D’Orio V (2000) [Application of the Simplified Acute Physiology Score II (SAPS II) in a medical intensive care unit]. Ann Fr Anesth Reanim 19:510–516
Buist M, Gould T, Hagley S, Webb R (2000) An analysis of excess mortality not predicted to occur by APACHE III in an Australian level III intensive care unit. Anaesth Intensive Care 28:171–177
Cite this article
Kajdacsy-Balla Amaral, A.C., Andrade, F.M., Moreno, R. et al. Use of the Sequential Organ Failure Assessment score as a severity score. Intensive Care Med 31, 243–249 (2005). https://doi.org/10.1007/s00134-004-2528-6