Abstract
Based on the MIMIC-III database of the Massachusetts Institute of Technology, this paper studies and analyzes the symptoms of trauma-related sepsis. Use SOFA score as the Inclusion and Exclusion Criteria, extract the relevant patient medical index data with the guidance of a professional clinician. Sequential forward search is applied to search the optimal index combination based on the eXtreme Gradient Boosting (XGBoost) algorithm. Twenty independent replicates perform to obtain 7 key risk indicators (Urea Nitrogen, Prothrombin Time, PO2, Sodium, Red Blood Cells, Carbon Dioxide, International Normalized Ratio). The time window prediction model builds by four machine learning algorithms (decision tree, random forest, decision tree-based adaptive reinforcement (Adaboost) algorithm, XGBoost). The results show that the time window prediction model of trauma-related sepsis has good generalization ability. The prediction effect of the random forest and XGBoost algorithm is better than the other two. Finally, using the multi-factor Logistic regression method build the risk scoring tool for sepsis-induced by trauma-related infection base on the key risk indicators and the opinions of professional clinicians. The results show that the data-driven risk scoring tool can effectively predict the outcome of patients with trauma-related sepsis, which has high clinical significance.
This work was partly supported by the National Key Research and Development Plan for Science and Technology Winter Olympics of the Ministry of Science and Technology of China (2019YFF0302301).
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
With the development of medicine and the application of Smart healthcare, many diseases are under control. But the number of trauma patients is increasing, and the number of deaths is also increasing. Trauma has become the first cause of death for patients between 1 and 44 years old [1]. Trauma-related infection is one of the most common complications of trauma patients. Due to the difference in traumatic environment and degree of trauma, most trauma patients will have different degrees of infection, among which sepsis is the result of out-of-control infection of trauma patients in the late stage. Difficulty in predicting, diagnosis, and treatment are the reasons for the extremely high mortality of sepsis. It can highly improve the treatment rate of sepsis if sepsis prediction in the early stage. Many researchers applied Smart healthcare in the study of sepsis, combined with the computer, big data, and clinical medicine, analyze the risk factors for sepsis, trying to find out the relationship between the specific indicators and sepsis, focus on the point in a certain time of sepsis judgment, lack of the forecast on time interval judgment [2]. This paper will focus on the possibility of sepsis in trauma patients over time. Combined with the MIMIC-III database, this paper will conduct data mining and statistical analysis, and machine-learning algorithms to construct time window prediction and risk scoring tools for trauma-related sepsis to reduce the risk of sepsis and improve the working efficiency of medical staff.
2 Methods
2.1 Study Population and Data Sources
All data used in this study obtained from the Medical Information Mart for Intensive Care III database (MIMIC-III). The data are patients aged 18 years or older who had been in the ICU for 4 h or more due to trauma. Blood culture should be performed within 24 h if the patient uses antibiotics first. If the patient is performed within blood culture first, antibiotics should use within 72 h, and the priority project time recorded as Tsuspicion. When the SOFA score of the patients at 12 h after Tsuspicion minus the SOFA score at 24 h before Tsuspicion is higher than 2, the patients are identified as having sepsis, that is, the experimental group, otherwise for the control group.
2.2 Data Processing
The SOFA score is used as the inclusion and exclusion criteria of the research experiment. Data is preprocessed by data transpose, outlier processing, missing value analysis, and data filling. Missing no module of Python is used as the main tool for missing value analysis, and indicators with a missing ratio of more than 80% are removed. And then time series data are filled based on two dimensions, namely linear interpolation and distance filling.
2.3 Feature Selection and Machine Learning
Feature Selection is the preliminary step of machine learning and data mining, and it is a process of data preprocessing. It eliminates redundant or irrelevant features to identify the most important features, thus reducing the complexity of the problem [3]. In this study, the greedy algorithm is used to design a feature selection algorithm, and the XGBoost algorithm is used to select features of 35 indicators by sequential forward search strategy search. XGBoost has good anti-over-fitting characteristics and high computational efficiency [4]. The tree model of XGBoost is characterized by providing a basis for quantitative feature selection and forming encapsulated feature selection. The time-series data of trauma-related sepsis are input into the key indicator screening model for iteration, and the results of each iteration are recorded to select the index with the highest performance.
Compared with the black-box model (uninterpreted algorithm taking neural network as an example), the Decision Tree is based on if-then-else rules and is easier to understand, realize, explain and visualize [5]. The neural network (the black-box model representation) has certain defects: difficult to optimize, result in the local-optimal solution rather than the global-optimal solution, and low generalization leads to overfitting problems, etc. To sum up, this study uses the decision tree algorithm to build the time window prediction model. This study also uses the random forest and Boosting method which is derived from the decision tree to carry out multiple groups of experiments, to improve the accuracy of time window prediction of trauma-related sepsis [6].
In this study, grid parameter iteration is used for parameter adjustment. Given a set of parameters, the enumeration search method is used to iterate over all possibilities to select the best result (Fig. 1).
The logistic regression model is a multivariate statistical method to study the relationship between explanatory variables and observed results. In this study, the scoring tool is based on the key index set, using the multi-factor logistic regression method and the clinical grading consensus of the indicators to quantitatively calculated the severity of the patient's illness, namely the score. And then the score is corresponding to the outcome probability. The specific steps of model construction are as follows:
-
1)
Calculate the risk index regression coefficient β of each index;
-
2)
Combined with the inherent medical knowledge to determine the scoring threshold of each index, and determine the reference value \({w}_{ij}\) of each group;
-
3)
The basic risk reference value \({w}_{iREF}\) of each risk indicator is determined. In the subsequent scoring model construction, \({w}_{iREF}\) is recorded as 0 points, and when it is higher than \({w}_{iREF}\), it is recorded as positive points; otherwise, the higher the score, the higher the risk is.
-
4)
Calculate the distance D between the reference value \({w}_{ij}\) of each risk indicator and basic risk reference value \({w}_{iREF}\):
$$D=\left({w}_{ij}-{w}_{iREF}\right)*{\beta }_{i}$$(1) -
5)
Set constant B, the change value of the index corresponding to the change of 1 point in the risk scoring tool;
-
6)
Calculate the score \({Points}_{ij}\), of each group of risk indicators, and round the calculated value as the corresponding score value of this group:
$${Points}_{ij}=\frac{D}{B}=\left({w}_{ij}-{w}_{iREF}\right)*{\beta }_{i}/B$$(2) -
7)
Calculate the total score and risk prediction probability:
3 Results
This study uses PostgreSQL to extract data from the MIMIC-III database. And finally obtained data of 177 patients in the experimental group and 369 patients in the control group, with 35 various examination and laboratory indicators, amount to 201189 records.
The Hosmer Lemeshow goodness of fit index (H-L) [7] is used to verify the time series data after filling, and the significance of the result is 0.553 which greater than 0.05. There is no significant difference between the predicted value and the observed value, which proves that the model has a good fit.
3.1 Key Risk Indicators
The key indexes of trauma-related sepsis are extracted by feature selection. After 20 separate repeated experiments, the key indexes with retention times more than or equal to 16 times are Urea Nitrogen, PTT, PO2, Sodium, Red Blood Cells, CO2, and INR. The key indicators for retention between 12 and 16 times are Lactate and White Blood Cells. Hematocrit, Chloride, Hemoglobin, Temperature, Base Excess are the key indicators of retention between 10 and 12 times. Key indicators with retention times between 8 and 10 are Heart Rate, PCO2, Glucose, Platelet, Creatinine, and Calcium. The more times a single indicator is retained, the greater the influence of this indicator on the outcome and the stronger the ability to identify patients.
In this study, the feature_importances function of XGBoost is used to obtain the characteristic importance of each index. The results are shown in the following Table 1.
Compared with the calculation results, find that Urea Nitrogen and PTT, with the most retention times of key indicators, rank the first and the sixth in the ranking of characteristic importance, respectively, which proved that there is a causal relationship between key indicators and the accuracy of the predicted results of trauma-related sepsis. The weight of characteristic importance does not represent the correlation degree between the indicators and the predicted results, but it proves the correlation between the indicators and the predicted results, which lays a foundation for the subsequent prediction of trauma-related sepsis (Table 2).
By summing up the weights of all the key indicators, the calculation results show that the weight summation of Blood Biochemical and Blood Routine is the highest, and the distribution is 0.308 and 0.21. To a certain extent, it proved the important value of Blood Biochemical and Blood Routine in the prediction of trauma-related sepsis, followed by Arterial Blood Gas and Coagulation Function, and Vital Signs had the least influence in the prediction of trauma-related sepsis (Fig 2).
3.2 Time Window Forecast
Under the time window model of the full index data set, the accuracy rate, recall rate, and precision rate of the four model algorithms are between 64% and 83%, which meet the requirements of clinical medicine. The best prediction effect is the Random Forest. From the perspective of time, although the prediction effect fluctuates slightly, the overall accuracy decreases with the increase of time, which is in line with the actual prediction logic. Moreover, the overall model performance increase with time, but the changing trend is not obvious, which proved the stability of the model and is more conducive to the earlier prediction and early warning of trauma-related sepsis (Table 3, Fig. 3).
According to the model AUC value, all the models are higher than 0.64 under different time window parameter Settings, which can meet the dynamic requirements (Table 4).
Under the time window model of “Key Indicator Set 1–4”, the performance results of each machine learning method are still above 63%, and the accuracy, recall, and precision rate of Random Forest are all the best. Although the effect of different indicators fluctuated slightly, in general, the prediction effect decreased with the decrease of the number of key indicators (Fig. 4).
According to the model AUC value, all models are higher than 0.63 under different time window parameter Settings, among which Random Forest performed the best, followed by XGBoost, Adaboost, and Decision Tree. With the decrease of the amount of key index data, the prediction effect has a certain tendency to decrease, but it is not obvious, which does not affect the application requirements of the dynamic real-time time window prediction model for trauma-related sepsis, which is in line with clinical practice, and proves the generalization ability of the time window prediction model for trauma-related sepsis (Fig. 5).
3.3 Risk Score
In this study, the two indexes with the highest weighted sum of feature importance, blood biochemical and blood routine, are selected to construct a risk scoring tool. Creatinine is taken as a constant reference index, and the results are as follows (Table 5).
Taking a wounded patient as an example, the sum of all index scores is 8, so the probability of the patient suffering from sepsis at that moment is 91.58% (Table 6).
Input the time series data of trauma patients into the risk scoring model can get the change of the risk probability of the trauma patients suffering from sepsis, which reflects the increase in the severity of trauma-related infection over time. Taking a certain trauma patient as an example, the data from 4 h before to 24 h after the treatment of antibiotics showed that the patient's condition improved after the treatment of antibiotics, but only lasted for a while. The subsequent changes in the patient's condition are not detected in time, leading to a rapid increase in the possibility of the patient suffering from sepsis (Table 7, Fig. 6).
The AUC value of the trauma-related sepsis risk scoring tool is 0.79, which proved that the model has a certain generalization ability and can meet the clinical needs.
4 Conclusion
Sepsis is one of the main causes of death in patients with trauma-related infection in clinical practice. The onset of sepsis is serious and develops rapidly. Once diagnosed, a large amount of time for diagnosis and treatment has been lost, and it is difficult to control the development of sepsis. At the same time, due to the particularity of sepsis, the diagnosis process is complicated and needs a long time, which leads to the delay in the judgment of trauma-related sepsis by medical staff. Therefore, this study realizes the trauma-related sepsis predict warning, as well as the risk score. The model has good generalization ability, and can basically meet the clinical practical application, could help doctors perceive ahead of trauma-related sepsis in patients development trend and extent, thus early medical intervention on the patients, Controlling the development of sepsis will greatly improve the treatment rate of sepsis.
The next step is to validate the model in clinical trials. The diagnosis of sepsis needs to base on the changes of various indicators of patients over some time, so the data obtained in this study is limited, and the conditions of sepsis patients are diverse. As the limited data cannot accurately display all the clinical manifestations of sepsis, the model obtained in this study cannot achieve higher performance. More effective and real clinical data can optimize and improve the model.
References
Brunicardi, F., et al.: Schwartz’s Principles of Surgery, 10th edn., p. 161. McGraw-Hill Education, New York (2015)
Desautels, T., et al.: Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med. Inf. 4(3), e28 (2016)
Lieber, D., Stolpe, M., Konrad, B., Deuse, J., Morik, K.: Quality prediction in interlinked manufacturing processes based on supervised & unsupervised machine learning. Procedia CIRP 7, 193–198 (2013)
Jalal Mussa, D., Jameel, N.M.: Relevant SMS spam feature selection using wrapper approach and XGBoost algorithm. Kurdistan J. Appl. Res. 4(2), 110–120 (2019)
Tofan, C.: Optimization techniques of decision making - decision tree. Adv. Soc. Sci. Res. J. 1(5), 142–148 (2014)
Sagi, O., Rokach, L.: Approximating XGBoost with an interpretable decision tree. Information Sciences (2021)
Hosmer, D., Lemesbow, S.: Goodness of fit tests for the multiple logistic regression model. Commun. Stat. Theory Methods 9(10), 1043–1069 (1980)
Acknowledgment
Thanks to Dr. Zhao of the Chinese PLA General Hospital for his help. Medical big data analysis cannot separate from the support of clinical medicine. Dr. Zhao guided and verified my research content with professional clinical experience, which made me realize the clinical significance of the research content fundamentally. I would like to express my gratitude to Ms. Li Jing of the Intelligent Diagnostic Team of the Winter Olympics for her help and guidance in the whole research stage of my thesis.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Luo, K., Li, J., Zhao, Y. (2022). Research on Time Window Prediction and Scoring Model for Trauma-Related Sepsis. In: Shi, X., Bohács, G., Ma, Y., Gong, D., Shang, X. (eds) LISS 2021. Lecture Notes in Operations Research. Springer, Singapore. https://doi.org/10.1007/978-981-16-8656-6_10
Download citation
DOI: https://doi.org/10.1007/978-981-16-8656-6_10
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8655-9
Online ISBN: 978-981-16-8656-6
eBook Packages: Business and ManagementBusiness and Management (R0)