1 Introduction

With the rapid development of modern cities, traumatic events, especially severe trauma, have become a serious social problem. Haemorrhagic shock is a serious complication of trauma that can lead to rapid death if the bleeding is not controlled in time [1]. Early condition assessment and prognosis determination for patients with traumatic haemorrhagic shock are therefore crucial for guiding clinical treatment. With the growing application of machine learning algorithms in medicine, prediction models based on machine learning have become an active research topic.

Currently, judgement and decision-making in traumatic haemorrhagic shock depend largely on physician experience. In clinical practice, the shock index (SI), the ratio of pulse rate to systolic blood pressure, is a common quantitative measure, and the degree of increase in SI is positively correlated with blood loss. In addition, some traditional clinical assessment tools are commonly used to determine the risk of traumatic haemorrhagic shock, such as the PHI score [2], the GCS score [3], the ISS score [4], the TRISS score [5], and the APACHE II score [6]. However, because the injuries of patients with traumatic haemorrhagic shock are often insidious and change and progress rapidly, failure to detect them and intervene in time may lead to uncontrolled haemorrhage or even death [1]. Developing an early prediction and warning model based on the patient’s vital signs and other indicators, so that the onset of shock can be predicted in advance, can therefore give healthcare professionals enough time to make effective clinical decisions.

Machine learning methods make it possible to analyse large amounts of medical data in a more detailed and rapid manner. Machine learning algorithms can identify patterns and extract key features from such data and use them to build disease prediction models that provide decision support. Some scholars have used simple machine learning methods to predict disease risk. Joshi RD et al. used logistic regression and decision trees to predict type 2 diabetes in Pima Indian women; the prediction accuracy was 78.26%, a level that supports reasonable prediction of type 2 diabetes [7]. Sun X et al. used LASSO to select the five most relevant features for predicting the pathological grade of glioma, and their logistic regression model showed good discriminatory ability, with an area under the ROC curve (AUC) of 0.919 [8]. Manore et al. used machine learning to identify significant predictors of West Nile virus and applied logistic regression to predict the probability of at least one West Nile virus case in a county in the following summer [9]. Ensemble learning, which combines multiple models to complete a learning task, can often achieve more accurate predictions than a single model [10]. Amir et al. used Random Forest (RF) to predict 30-day mortality after ST-segment elevation myocardial infarction, and the model significantly outperformed the Global Registry of Acute Cardiac Events (GRACE) score [11]. Kim et al. constructed an RF-based prediction model for Alzheimer’s disease, and the results showed that the model was effective and could be easily applied in clinical practice [12]. Deep learning, a popular research direction within machine learning, has developed rapidly in recent years. Doppalapudi et al. compared three popular deep learning models, ANN, CNN and RNN, with one another and with traditional machine learning models, and found that the deep learning models outperformed the traditional models in both classification and regression tasks [13].

However, the complexity of machine learning models raises the issue of interpretability. Simple machine learning models are interpretable by their very structure, but they cannot fit complex data well. Complex machine learning algorithms can achieve high accuracy but suffer from the “black box” problem: it is difficult to understand the basis and mechanism of their judgements. Because healthcare concerns human life, doctors will only trust a model’s predictions if they can see why the model makes a certain judgement, so interpretability is crucial. Explainability by design is achieved at the model design stage by choosing simple model structures or adding interpretable modules. Typical approaches use simple models such as logistic regression [14] or support vector machines [15], or include interpretable modules in complex models [16]. Post-hoc explainability, by contrast, does not modify the model itself but explains the trained black-box model through additional interpretation procedures such as LIME, SHAP and LRP. Among them, SHAP (SHapley Additive exPlanations) is a game theory-based model interpretation method proposed by Lundberg et al. in 2017. It measures the contribution of each feature to the prediction for each sample by calculating that feature’s SHAP value, and it can be applied to any model [17]. Wang et al. used machine learning (ML) to construct a 10-year all-cause mortality risk model for patients with heart failure (HF) caused by coronary heart disease (CHD) and applied SHAP to explain the model’s decisions. They concluded that the combination of ML and SHAP can clearly explain personalised risk predictions and enable doctors to understand intuitively the impact of key features in the model [18].

However, although machine learning has achieved some results in other medical fields, applying it to the prediction of traumatic haemorrhagic shock still faces many challenges. On the one hand, the pathogenesis of traumatic haemorrhagic shock is complex and is affected by a variety of factors, such as the degree of trauma, the patient’s age, and the underlying health status, yet effective evaluation indexes are still lacking, which makes prediction more difficult. On the other hand, traumatic haemorrhagic shock has a rapid onset, is critical, requires rapid intervention and treatment, and is uncommon in the clinic, so the training of machine learning algorithms may be limited by the small number of available samples.

Therefore, this study aims to develop a predictive model for traumatic haemorrhagic shock from real-world clinical data using machine learning algorithms, and to explain the model’s predictions with interpretability methods, thereby increasing the trust in and acceptance of the predictive model among doctors and clinical decision-makers. By learning from clinical data and features, the model can identify risk factors and warning signals before traumatic haemorrhagic shock occurs, which helps to identify high-risk patients in advance so that measures can be taken. The study can provide a reference for the prevention and management of traumatic haemorrhagic shock and has theoretical significance and application value.

2 Content and Methodology

2.1 Research Content

Based on the emergency database of the Chinese People’s Liberation Army General Hospital, this paper designs the inclusion and exclusion criteria of the study under the guidance of professional clinicians and extracts the medical index data of the relevant patients, including age, gender, underlying diseases, medication use and biochemical indicators. The collected data are then cleaned, missing values are imputed, and key indicators are screened. Five machine learning models, namely logistic regression, decision tree, random forest, XGBoost and MLP, were selected to construct prediction models for traumatic haemorrhagic shock, and their predictive performance was evaluated using indicators such as ROC curves and AUC. In addition, this study uses the SHAP method to interpret the importance of features in the XGBoost prediction model from both a global and an individual perspective. The research route of this paper is shown in Fig. 1.

Fig. 1. Technology roadmap

2.2 Data Sources

The experimental data of this study were obtained from the People’s Liberation Army General Hospital Emergency Rescue Database (PLAGH-ERD), which integrates data from several departments of the PLA General Hospital, including the laboratory information system, the emergency specialist system, the emergency nursing system, and the hospital information system. It covers patient information from 2014 to 2018, including demographic information, diagnosis and treatment information, nursing information, medical advice information, vital signs, laboratory test results, and other indicators [19]. Processing and modelling such “real world” medical data with machine learning and related methods, and predicting the occurrence of specific diseases from them, can provide a reference for doctors’ diagnosis and decision-making.

2.3 Inclusion and Exclusion Criteria

According to the objectives of this study and the content of the study, the inclusion and exclusion criteria for the extraction of experimental data were designed under the guidance of a professional physician, and trauma patients were divided into experimental and control groups using the occurrence of haemorrhagic shock as the outcome of the patients.

The inclusion and exclusion criteria for the experimental group were as follows:

  (1) Patients admitted to hospital for trauma were included;
  (2) Adult patients, i.e. aged ≥18 years, were included;
  (3) Patients who had shock during hospitalisation, i.e. Shock Index (SI) ≥1.0 and Mean Blood Pressure (MBP) ≤70 mmHg, were included;
  (4) Patients who experienced infectious (septic) shock, cardiogenic shock or anaphylaxis during hospitalisation were excluded;
  (5) Patients with a haemoglobin measurement of less than 90 g/L on admission (excluding anaemia that may be caused by malignancy, haematological disorders, immune system disorders, chronic kidney disease, chronic liver disease, etc.), or with a 20% decrease in haemoglobin around the onset of shock compared to the first haemoglobin measurement, were included.

The inclusion and exclusion criteria for the control group were as follows:

  (1) Patients admitted to hospital for trauma were included;
  (2) Adult patients, i.e. aged ≥18 years, were included;
  (3) Patients who had not experienced shock during hospitalisation were included;
  (4) Patients discharged from hospital in survival status were included.

According to the inclusion and exclusion criteria, the corresponding data of the experimental and control groups were extracted from the emergency database of the General Hospital of the Chinese People’s Liberation Army.
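As an illustration only, the vital-sign part of the criteria above can be sketched in Python. The class and function names are hypothetical, and the estimate of mean blood pressure as DBP + (SBP − DBP)/3 is an assumption: the paper does not state how MBP was obtained, and it may have been recorded directly.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    """One set of vital signs (hypothetical field names)."""
    heart_rate: float  # beats per minute, used here as the pulse rate
    sbp: float         # systolic blood pressure, mmHg
    dbp: float         # diastolic blood pressure, mmHg


def shock_index(v: Vitals) -> float:
    """Shock index: pulse rate divided by systolic blood pressure."""
    return v.heart_rate / v.sbp


def mean_blood_pressure(v: Vitals) -> float:
    """Assumed estimate of mean blood pressure: DBP + (SBP - DBP) / 3."""
    return v.dbp + (v.sbp - v.dbp) / 3.0


def meets_shock_criterion(v: Vitals) -> bool:
    """Criterion (3) of the experimental group: SI >= 1.0 and MBP <= 70 mmHg."""
    return shock_index(v) >= 1.0 and mean_blood_pressure(v) <= 70.0


def is_adult(age_years: float) -> bool:
    """Criterion (2) of both groups: aged 18 years or older."""
    return age_years >= 18


# Invented example values, not patient data.
example = Vitals(heart_rate=120, sbp=85, dbp=50)
example_flag = is_adult(45) and meets_shock_criterion(example)
```

The exclusion of other shock types and the haemoglobin criterion depend on diagnoses and laboratory records and are not covered by this sketch.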

2.4 Data Pre-processing

Data pre-processing is the process of cleaning the raw data, removing redundant information and forming structured data before analysis and modelling. The quality of data pre-processing strongly influences the effectiveness of the subsequent analysis. Because the emergency database of the Chinese People’s Liberation Army General Hospital combines data from several departments, heterogeneous data from multiple sources must be cleaned and fused. In this paper, data pre-processing mainly relies on manual cleaning, rule-based data cleaning and matching, and machine learning.

Firstly, manual screening is used to remove noisy records that contain meaningless text, symbols, etc. caused by irregular recording. Secondly, descriptive statistics (the maximum, minimum, mean, standard deviation and other statistics of each indicator) are calculated to find and eliminate outliers that do not meet the data standards. Finally, a machine learning imputation method is used to fill in the missing values. In recent years, missing values in medical data have mostly been filled using K-means clustering, multiple imputation, MissForest and similar methods. In this study, missing values were analysed using the missingno library, and many indicators were found to have a high missing rate. Therefore, the missing-rate threshold was set to 0.8, and the indicators with a missing rate of less than 80% were imputed using the MissForest algorithm.
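The thresholding and imputation step can be sketched as follows. This is a minimal sketch rather than the pipeline used in the study: it substitutes scikit-learn's `IterativeImputer` with a random forest regressor as a MissForest-like stand-in, and the function name `drop_and_impute`, the column `LAC` and the synthetic values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def drop_and_impute(df: pd.DataFrame, max_missing_rate: float = 0.8,
                    seed: int = 0) -> pd.DataFrame:
    """Keep indicators whose missing rate is below the threshold, then impute."""
    missing_rate = df.isna().mean()               # fraction missing per column
    kept = df.loc[:, missing_rate < max_missing_rate]
    # Random-forest-based iterative imputation (stand-in for MissForest).
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    filled = imputer.fit_transform(kept)
    return pd.DataFrame(filled, columns=kept.columns, index=kept.index)


# Synthetic demonstration data, not patient data.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "HR": rng.normal(90, 15, 50),
    "SBP": rng.normal(115, 20, 50),
    "LAC": rng.normal(2, 0.5, 50),   # hypothetical sparsely recorded indicator
})
demo.loc[rng.choice(50, 10, replace=False), "HR"] = np.nan    # 20% missing
demo.loc[rng.choice(50, 45, replace=False), "LAC"] = np.nan   # 90% missing
clean = drop_and_impute(demo)
```

In this demonstration `LAC` exceeds the 0.8 threshold and is dropped, while the gaps in `HR` are filled from the remaining indicators.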

2.5 Research Methodology

Machine learning algorithms can efficiently learn complex data patterns and build accurate predictive models. In this study, five classical machine learning models, namely Logistic Regression, CART, Random Forest, XGBoost and MLP, are used to construct predictive models for traumatic haemorrhagic shock. Logistic Regression is popular in industry for its simplicity, parallelisability and strong interpretability. CART is a classical decision tree algorithm and a supervised learning method. In addition, Random Forest and XGBoost, which are based on the ideas of Bagging and Boosting ensemble learning respectively, and the multilayer perceptron (MLP), which has a neural network structure, are used to improve model accuracy. The predictive performance of the models is evaluated using the Accuracy, Precision, Recall and F-score metrics. Based on this evaluation, the more effective prediction model is selected, and the importance of its features is explained using the SHAP method.

3 Results and Discussion

3.1 Model Construction and Evaluation

According to the inclusion and exclusion criteria in Subsect. 2.3, 604 patients were extracted from the emergency database of the General Hospital of the People’s Liberation Army for the study, of whom 102 were in the experimental group and 502 in the control group. After screening by professional doctors, the indicators that were not strongly correlated with traumatic haemorrhagic shock were excluded, and five indicators were included: Heart Rate (HR), Respiratory Rate (RR), Pulse Oximeter Oxygen Saturation (SPO2), Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP). The pre-processed indicators were used as input to the machine learning prediction models, and the data set was split into a training set (70%) and a test set (30%).
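A 70/30 split of a data set with these group sizes might look like the following sketch. The feature matrix is random placeholder data, and the use of stratification and of a fixed random seed are assumptions; the paper does not state whether the split was stratified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

FEATURES = ["HR", "RR", "SPO2", "SBP", "DBP"]

# Placeholder feature matrix with the reported group sizes (not patient data).
rng = np.random.default_rng(42)
X = rng.normal(size=(604, len(FEATURES)))
y = np.array([1] * 102 + [0] * 502)   # 1 = experimental group, 0 = control group

# Stratifying on y keeps the proportion of shock cases similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
```

With roughly one shock case for every five controls, stratification avoids a test set that happens to contain very few positive cases.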

In this study, the models were built using Python machine learning libraries, and different parameters were set during model construction in order to achieve better prediction results. Parameter settings have a significant impact on the performance of a model, so the tuning strategy is an important part of the model building process. In this paper, the parameters of the prediction models are tuned by grid search: the candidate parameter combinations are specified, and the best combination is selected on the basis of cross-validation scores to obtain the prediction results.
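A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The example below tunes a CART-style decision tree on synthetic data; the parameter grid, the five-fold split and the choice of AUC as the selection score are assumptions, since the paper does not list the grids or scoring it used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the five vital-sign indicators (not patient data).
X, y = make_classification(n_samples=604, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.83, 0.17], random_state=0)

# Hypothetical candidate values; every combination (3 x 3 = 9) is evaluated.
param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5, 10]}

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",   # assumed selection criterion
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_params = search.best_params_   # combination with the highest mean CV score
best_score = search.best_score_     # mean cross-validated AUC of that combination
```

The same pattern applies to the other four models by swapping the estimator and its grid.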

The models are evaluated using Accuracy, Precision, Recall, F1_score and the overall AUC under 10-fold cross-validation; the results are shown in Table 1.

Table 1. Comparison of prediction model results

The ROC curves for each algorithm are also plotted in this paper for visual comparison, and the results are shown in Fig. 2.

Fig. 2. AUC curve of the prediction model
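The ROC curve and AUC under 10-fold cross-validation can be computed along the following lines. This sketch uses synthetic data and logistic regression only, and it pools the out-of-fold predictions into a single curve, which is one possible reading of the "overall AUC"; averaging per-fold AUC values would be another.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in data (not patient data).
X, y = make_classification(n_samples=604, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.83, 0.17], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each sample is scored by a model that did not see it during training.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=cv, method="predict_proba")[:, 1]

# Points of the ROC curve: false positive rate against true positive rate.
fpr, tpr, thresholds = roc_curve(y, proba)

auc_from_curve = auc(fpr, tpr)           # trapezoidal area under the curve
auc_direct = roc_auc_score(y, proba)     # same quantity computed directly
```

Plotting `tpr` against `fpr` for each model on shared axes gives a comparison of the kind shown in Fig. 2.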

The experimental results show that the accuracy of all five traumatic haemorrhagic shock prediction models constructed in this paper is above 80%, meaning that the proportion of correctly classified samples among all samples is high. Precision is the proportion of patients predicted to develop traumatic haemorrhagic shock who actually developed it; the precision of the logistic regression algorithm is low. Recall is the proportion of patients who actually developed traumatic haemorrhagic shock that the model predicted correctly; the Recall of the random forest algorithm is relatively low. Because Precision and Recall tend to trade off against each other, F1_score is used to balance the two. XGBoost, CART and MLP have relatively high F1_score values. This study uses ROC curves to show the performance of the different prediction models. The area under the ROC curve is called the AUC (Area Under Curve): the larger the AUC, the better the model discriminates between the two groups, and a smoother ROC curve generally indicates less overfitting.
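For reference, the four threshold-based metrics follow directly from the confusion-matrix counts. The sketch below is illustrative; the function name is hypothetical and the counts in the usage line are invented, not taken from Table 1.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    tp: shock cases predicted as shock      fp: controls predicted as shock
    fn: shock cases predicted as no shock   tn: controls predicted as no shock
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts for a test set of 182 patients.
example_metrics = classification_metrics(tp=20, fp=10, fn=11, tn=141)
```

Because the control group is about five times larger than the experimental group, accuracy can be high even when precision or recall for shock cases is modest, which is why F1_score and AUC are reported alongside it.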

Taken together, these metrics show that XGBoost performs best among the five algorithms, with the MLP algorithm coming second, while LR and RF do not perform as well as expected. The ensemble learning models generally predict better than the simple machine learning models, but the most complex model is not necessarily the best. For example, XGBoost outperforms MLP, which shows that the choice of prediction model has to be matched to the actual data. The AUC values of all five models exceed 0.8, indicating that the constructed prediction models for traumatic haemorrhagic shock are reasonably accurate and well suited to this application scenario.

3.2 Global Feature Importance Analysis

Machine learning models can analyse large amounts of medical data, but they also raise the issue of interpretability. Interpretability is crucial for medical problems: doctors will only trust the results predicted by a model if they can see why the model makes certain judgements. This study uses the SHAP method to interpret the importance of features in the XGBoost prediction model.

Global feature importance is calculated on the full data set, and this type of feature attribution is usually expressed as a feature importance ranking. In Fig. 3, the vertical axis lists the key features of the predictive model and the horizontal axis shows the SHAP value of each feature, which reflects the size and direction of the feature’s contribution. As shown in Fig. 3, heart rate is a very important feature and is essentially positively correlated with the patient’s predicted risk. In addition, systolic and diastolic blood pressures also significantly affect the patient’s predicted risk.

Fig. 3. SHAP summary plot
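To make the notion of a SHAP value concrete, the sketch below computes exact Shapley values by enumerating all feature coalitions, which is feasible with only five features. It is not the SHAP library and not the study's XGBoost model: a scikit-learn gradient boosting classifier on synthetic data is used as a stand-in, features outside a coalition are replaced by rows of a background sample (an interventional value function), and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

FEATURES = ["HR", "RR", "SPO2", "SBP", "DBP"]


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at sample x, relative to background."""
    n = x.shape[0]
    # Value of a coalition: mean prediction when its features are fixed to x
    # and the remaining features take the values of the background rows.
    value = {}
    for size in range(n + 1):
        for coalition in combinations(range(n), size):
            data = background.copy()
            if coalition:
                cols = list(coalition)
                data[:, cols] = x[cols]
            value[coalition] = float(np.mean(predict(data)))
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = tuple(sorted(subset + (i,)))
                phi[i] += weight * (value[with_i] - value[subset])
    return phi


# Synthetic stand-in data and model (not patient data, not XGBoost).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
model = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)


def predict_shock_probability(data):
    return model.predict_proba(data)[:, 1]


background = X[:40]
explained = X[40:60]
phi = np.array([shapley_values(predict_shock_probability, row, background)
                for row in explained])

# Global importance: mean absolute Shapley value of each feature.
global_importance = dict(zip(FEATURES, np.abs(phi).mean(axis=0)))
```

For every explained sample, the Shapley values sum to the difference between the model output for that sample and the mean output over the background data. Ranking features by mean absolute value gives the ordering used in a summary plot.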

3.3 Attribution Analysis of Personalised Characteristics

The SHAP partial dependence plot illustrates the marginal effect of a feature on the predictions of a machine learning model. Each point in the plot represents one instance in the data set; the horizontal axis shows the feature value and the vertical axis shows the corresponding SHAP value. The plot therefore shows how much a change in the feature value changes the output of the predictive model, that is, the effect of an individual feature on the model.
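The points of such a plot are simply pairs of feature value and SHAP value. The sketch below builds them for a deliberately simple case, which is an assumption and not the study's setting: a logistic regression explained in log-odds space with features treated as independent, where the SHAP value of feature i has the closed form w_i (x_i − mean(x_i)). The data are synthetic and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data in which risk rises with heart rate and falls with SBP.
rng = np.random.default_rng(0)
hr = rng.normal(90, 20, 400)
sbp = rng.normal(115, 20, 400)
logit = (hr - 95) / 10 - (sbp - 110) / 15
y = (rng.random(400) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([hr, sbp])
model = LogisticRegression(max_iter=2000).fit(X, y)
w_hr = model.coef_[0, 0]

# Closed-form SHAP value of HR for a linear model (log-odds scale).
shap_hr = w_hr * (hr - hr.mean())

# One point per patient: (feature value, SHAP value).
dependence_points = np.column_stack([hr, shap_hr])

# Feature value at which the SHAP value changes sign in this linear case.
crossover_hr = hr.mean()
```

For a tree-based model the points no longer lie on a straight line, and the ranges in which most SHAP values are positive are read from the scatter.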

Heart rate (HR) is the number of heartbeats per minute. As shown in Fig. 4, when HR > 90, most samples have positive SHAP values and patients have a higher probability of shock due to blood loss, whereas when 50 < HR < 90, patients have a lower probability of deterioration. Respiratory Rate (RR) is the number of breaths per minute. As shown in Fig. 5, most samples have a positive SHAP value when RR < 20 or RR > 30, and early attention should be paid to such patients. Pulse Oximeter Oxygen Saturation (SPO2) is the oxygen saturation of the blood measured by pulse oximetry. As shown in Fig. 6, most samples have positive SHAP values when SPO2 < 94%; in this range the risk of deterioration in patients with traumatic haemorrhagic shock is increased, which warrants clinical attention.

In addition, since the two key features Systolic Blood Pressure and Diastolic Blood Pressure are relatively strongly correlated, this study depicts the effect of their interaction on the condition of patients with traumatic haemorrhagic shock. As shown in Fig. 7, when (DBP < 60 and SBP < 100) or (DBP > 90 and SBP > 150), the SHAP value is generally positive and the patient has a higher probability of experiencing traumatic haemorrhagic shock, which can serve as an early warning of how the patient’s condition is developing.

Fig. 4. Partial SHAP dependence plots for Heart Rate (HR)

Fig. 5. Partial SHAP dependence plots for Respiratory Rate (RR)

Fig. 6. Partial SHAP dependence plots for SPO2

Fig. 7. SHAP dependence plots for SBP and DBP

4 Conclusion

Against the background of disease risk prediction, this paper takes traumatic haemorrhagic shock as its research object and applies machine learning methods to the key steps of building a prediction model for it: data extraction and processing, construction of the risk prediction model, model evaluation and model interpretation. Firstly, based on patient information from the emergency database of the General Hospital of the People’s Liberation Army (PLA), the inclusion and exclusion criteria were designed under the guidance of professional doctors, and the experimental and control groups were extracted using database queries. In the data pre-processing stage, the extracted patient data were cleaned and pre-processed using Python tools, the missing values were analysed using the missingno library, and the MissForest method was used to fill in the gaps. In the model construction stage, five machine learning algorithms, namely Logistic Regression, CART, Random Forest, XGBoost and MLP, were used, and the classifiers were tuned by grid search, which iterates over all combinations of a given set of candidate parameter values, to obtain the prediction results. Finally, the models were evaluated using the Accuracy, Precision, Recall and F-score metrics. The results show that the XGBoost algorithm has the largest average AUC and that its overall predictive performance is better than that of the other models. The AUC values of all five models exceeded 0.8, indicating that the constructed traumatic haemorrhagic shock prediction models have high accuracy and potential for clinical application. In addition, this study used the SHAP method to explain the contribution of different features to the prediction model from both global and individual perspectives: the global explanation enables physicians to understand the overall feature importance, and the local explanation reflects the effect of individual features on the model, making the model more credible.

There are still some areas for improvement in this paper. On the one hand, because of the large number of missing values, few indicators remain available to the prediction model after the data pre-processing stage, which may make the results of model prediction and model interpretation one-sided. In future work, the public MIMIC dataset can be used for predictive model construction and validation. On the other hand, this paper provides risk prediction for the broad category of traumatic haemorrhagic shock; in the future, the risk of traumatic haemorrhagic shock can be investigated in specific patient groups, such as the elderly, pregnant women, patients with hepatic insufficiency, patients with chronic renal disease, and other special populations, in order to better guide clinical practice.