Introduction

Ischemic heart disease (IHD) is the leading cause of death and of emergency department (ED) admission, placing a heavy burden on society and individuals [1, 2]. Because a large proportion of ED visits are avoidable [3, 4], identifying the patients who require immediate attention among the many who do not need urgent care is critical to patient safety, especially in overcrowded environments. Doing so may also improve the appropriate use of the ED and help control health expenditures [5, 6].

Triage, as the first layer of emergency care, plays an essential role in assessing and stratifying patient risk [7]. The risk of adverse events in emergency patients should be identified with specific tools, such as early warning scores [8, 9]. One such tool is the HEART score, which estimates the probability of major adverse events within 30 days in patients presenting to the ED with symptoms of acute coronary syndrome [10]. Its components are based on medical history, past medical risk factors, age, ECG interpretation, and serum troponin results. However, some of these variables are not available immediately, and the score does not address shorter-term outcomes.

To address the need for more accurate and straightforward predictive tools, we aimed to develop scoring models based on a machine learning algorithm to improve the prioritization of IHD patients visiting the ED.

Methods

Study population

This was a retrospective study of patients with IHD from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database [11]. MIMIC-IV data are sourced from two inpatient database systems: the hospital's customized electronic health record (EHR) system and an intensive care unit (ICU) specific clinical information system. The database contains the medical records of patients who were admitted to an ICU or the ED between 2008 and 2019, reorganized and de-identified [12]. The Institutional Review Board at the Beth Israel Deaconess Medical Center reviewed the research resource, granted a waiver of informed consent, and approved the data sharing initiative.

Patients diagnosed with IHD were identified according to the 2016 edition of the International Classification of Diseases, Ninth Revision (ICD-9). IHD-related codes from chapter 7 were selected: code 410 (acute myocardial infarction), code 411 (other acute and subacute forms of IHD), code 412 (old myocardial infarction), and code 413 (angina pectoris). All patients who visited the ED were screened. Patients who died within 24 hours after ED admission or who were younger than 18 years were excluded. The included patients were randomly divided into a training set (70%), a validation set (10%), and a testing set (20%).
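The random 70/10/20 division can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the seed, and the use of integer stand-ins for patient identifiers are all assumptions.

```python
import random

def split_cohort(patient_ids, seed=0, fractions=(0.7, 0.1, 0.2)):
    """Shuffle IDs and cut them into training, validation and testing sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)       # seeded so the split is reproducible
    n_train = round(fractions[0] * len(ids))
    n_valid = round(fractions[1] * len(ids))
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]         # remainder, so no ID is dropped
    return train, valid, test

# Hypothetical IDs 0..8380 stand in for the 8381 included patients.
train, valid, test = split_cohort(range(8381))
print(len(train), len(valid), len(test))   # 5867 838 1676
```

With 8381 patients, these proportions give set sizes of 5867, 838, and 1676.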

Outcome

The primary outcomes used to develop and test the tool were ICU stay and 3d-death. ICU stay was defined as transfer to the ICU for advanced therapy after emergency admission. The 3d-death outcome was defined as death within 72 hours of admission. The secondary outcomes were death within 7 days and within 30 days after emergency admission. Both in-hospital and post-hospital deaths are recorded in the MIMIC-IV database.
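The outcome definitions above amount to comparing the time of death with the time of admission. The sketch below shows one way to derive the four binary labels. The function and field names are hypothetical rather than MIMIC-IV column names, and treating the window boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta

def died_within(admit_time, death_time, days):
    """Return True if death was recorded no later than `days` days after admission."""
    if death_time is None:                  # no death recorded
        return False
    elapsed = death_time - admit_time
    return timedelta(0) <= elapsed <= timedelta(days=days)

def label_outcomes(admit_time, death_time, transferred_to_icu):
    """Build the four binary outcome labels for one ED admission."""
    return {
        "icu_stay": bool(transferred_to_icu),
        "death_3d": died_within(admit_time, death_time, 3),
        "death_7d": died_within(admit_time, death_time, 7),
        "death_30d": died_within(admit_time, death_time, 30),
    }

admit = datetime(2150, 1, 1, 8, 0)          # made-up timestamp
death = admit + timedelta(hours=100)        # death a little over 4 days later
labels = label_outcomes(admit, death, transferred_to_icu=True)
print(labels)
```

In this made-up case the patient counts toward ICU stay, 7d-death, and 30d-death, but not 3d-death.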

Data/variables extraction

All data and variables were evaluated and recorded when patients visited the ED. Age and gender were included as demographic characteristics [13,14,15]. Vital signs were included because they are readily available and respond immediately to critical states; they comprised temperature, heart rate, respiratory rate, oxygen saturation, systolic blood pressure (SBP), and diastolic blood pressure (DBP) [16]. History of smoking was included as a common risk factor for cardiovascular disease [17]. The following major comorbidities were included: hypertension, atrial fibrillation, heart failure, chronic pulmonary disease, diabetes without chronic complication, diabetes with complication, renal disease, and malignant cancer. We also included variables that reflect the severity or urgency of IHD: arrival transport, pain_max score, and intubated state. Arrival transport was included because it can reflect the urgency of ED admission [18, 19]. The pain_max score is based on the pain level when the patient enters the ED [20], ranging from 0 (no pain) to 10 (intolerable pain). An intubated state suggests a severe condition [21, 22], such as respiratory failure or loss of consciousness.

Statistical analysis

In the description of baseline characteristics, categorical variables were reported as numbers with percentages. Continuous variables had nonnormal distributions and were reported as medians with quartiles (Q1, Q3). Vital sign values outside the clinically and physiologically plausible range were considered outliers. We used K-Nearest Neighbor to impute missing values.
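The two preprocessing steps, flagging implausible vital signs and filling missing values with K-Nearest Neighbor, can be illustrated with a small pure-Python sketch. Several points are assumptions rather than details from the study: the plausibility limits, the choice to treat outliers as missing, the unscaled Euclidean distance, the value of k, and all names.

```python
import math

# Hypothetical plausibility limits; the study does not list its cut-offs.
PLAUSIBLE = {"heart_rate": (20, 300), "sbp": (30, 300)}

def mask_outliers(row):
    """Replace physiologically implausible values with None (missing)."""
    cleaned = dict(row)
    for name, (low, high) in PLAUSIBLE.items():
        value = cleaned.get(name)
        if value is not None and not (low <= value <= high):
            cleaned[name] = None
    return cleaned

def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that variable in the k nearest rows."""
    def distance(a, b):
        # Root-mean-square difference over variables observed in both rows.
        shared = [n for n in a if a[n] is not None and b[n] is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((a[n] - b[n]) ** 2 for n in shared) / len(shared))

    imputed = []
    for row in rows:
        filled = dict(row)
        for name, value in row.items():
            if value is not None:
                continue
            donors = [r for r in rows if r is not row and r[name] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[name] = sum(r[name] for r in nearest) / len(nearest)
        imputed.append(filled)
    return imputed

rows = [
    {"heart_rate": 80, "sbp": 120},
    {"heart_rate": 82, "sbp": 124},
    {"heart_rate": 130, "sbp": 85},
    {"heart_rate": 81, "sbp": 999},   # made-up implausible SBP
]
filled = knn_impute([mask_outliers(r) for r in rows], k=2)
print(filled[3])   # {'heart_rate': 81, 'sbp': 122.0}
```

In practice the variables would usually be standardized before computing distances, so that no single vital sign dominates the neighbor search.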

AutoScore, a machine learning-based automatic clinical score generator, was applied to derive the scoring models in this study [23]. Applied to mortality prediction with EHR data, it produces scores that are easier to implement and validate in the clinic. The training set was used to generate a preliminary scoring model with the AutoScore framework. The validation set was used for parameter tuning and model selection among the candidate scoring models. Four scoring models were finally derived to predict the probability of ICU stay, 3d-death, 7d-death, and 30d-death, respectively.

The performance metrics of the final scoring models were analyzed in the testing cohort. In addition, all four scoring models were applied to predict mortality within 3 days after emergency admission in the testing set, and their performance was evaluated. The predictive power of the scoring models was measured as the area under the curve (AUC) in receiver operating characteristic (ROC) analysis. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated at the optimal threshold and reported with 95% confidence intervals (95% CI). The data were analyzed using R software (version 3.5.3). P < 0.05 was considered statistically significant.
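The evaluation metrics named above can be computed directly from risk scores and observed outcomes. The sketch below is illustrative and does not reproduce the R analysis. It assumes that the "optimal threshold" is the one maximizing the Youden index (sensitivity + specificity − 1), which the text does not specify; confidence intervals are omitted and the example data are made up.

```python
def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_at(scores, labels, threshold):
    """Confusion-matrix metrics when a score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    ratio = lambda a, b: a / b if b else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def youden_threshold(scores, labels):
    """Assumed definition of 'optimal': the cut-off with the largest Youden index."""
    def youden(t):
        m = metrics_at(scores, labels, t)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(scores)), key=youden)

# Made-up scores and outcomes (1 = event occurred).
scores = [1, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 1, 0, 1, 1]
best = youden_threshold(scores, labels)
print(round(auc(scores, labels), 4), best, metrics_at(scores, labels, best))
```

For the toy data the AUC is 11/12 and the selected cut-off is 4, giving a sensitivity of 1.0 and a specificity of 0.75.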

Results

Baseline characteristics

A total of 8381 IHD patients from the ED were included in the present study (median patient age, 71 years, 95% CI 62–81; 3035 [36%] female), of which 5867 episodes were randomly assigned to the training set (median patient age, 72 years, 95% CI 62–81; 2142 [37%] female), 838 episodes to the validation set (median patient age, 71 years, 95% CI 62–81; 311 [37%] female), and 1676 to the testing set (median patient age, 71 years, 95% CI 62–80; 582 [35%] female). The flow diagram of patient selection and study cohort formation is presented in Fig. 1. In total, 2551 (30%) patients were transferred to the ICU after visiting the ED. The mortality rates observed in the total cohort were 1% (123 cases) at 3 days, 3% (245 cases) at 7 days, and 7% (622 cases) at 30 days. The baseline characteristics of the training, validation, and testing cohorts were similar in terms of age, gender, vital signs, comorbidities, and other characteristics (Table 1).

Fig. 1 Flow of patient selection and study cohort formation

Table 1 Baseline characteristics of the study cohort

Selected variables and scoring models

We used AutoScore to select the most discriminative variables from all 20 baseline variables. Parsimony plots based on the validation set were used to determine the choice of variables (Supplemental Figs. 1–4). Based on a good balance in the parsimony plot, we chose 7 variables for the prediction of ICU stay: age, SBP, DBP, temperature, heart rate, respiratory rate, and arrival transport. Nine variables were chosen as the parsimonious choice for predicting 3d-death and 7d-death, with the addition of oxygen saturation and pain_max score. On this basis, 10 variables, with the addition of atrial fibrillation, were chosen for 30d-death estimation and achieved good performance in the scoring model. Adding more variables to the scoring models did not improve their performance significantly.

In the ICU-stay scoring model, the main contributors to the score are arrival by helicopter, SBP < 90 mmHg, respiratory rate ≥ 24 breaths per minute, heart rate ≥ 115 beats per minute (bpm), temperature < 35.8 ℃, DBP < 50 mmHg, and age < 45 or ≥ 65 years (Table 2). In the 3d-death scoring model, the main contributors are arrival by helicopter, temperature < 35.8 ℃, SBP < 90 mmHg, age ≥ 83 years, respiratory rate ≥ 30 breaths per minute, heart rate ≥ 96 bpm, oxygen saturation < 93%, DBP ≥ 95 mmHg, and pain_max score ≥ 7 (Table 2). In the scoring models of 7d-death and 30d-death, the contributions to the score followed trends similar to those for 3d-death: more urgent admission transport, older age, lower SBP, lower temperature and oxygen saturation, faster respiratory and heart rates, higher DBP, and painful symptoms. Patients with the comorbidity of malignant cancer received additional points in the prediction of 30d-death.
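A point-based model of this kind is applied by checking each variable against its cut-offs and summing the points of the categories that apply. The sketch below uses the high-risk categories named above for the ICU-stay model. The actual point values are given in Table 2 and are not reproduced here, so every category carries a placeholder weight of 1; the field names and the example patient are also hypothetical.

```python
# High-risk categories of the ICU-stay model as named in the text.
ICU_HIGH_RISK = {
    "arrival by helicopter": lambda p: p["arrival_transport"] == "helicopter",
    "SBP < 90 mmHg": lambda p: p["sbp"] < 90,
    "respiratory rate >= 24/min": lambda p: p["resp_rate"] >= 24,
    "heart rate >= 115 bpm": lambda p: p["heart_rate"] >= 115,
    "temperature < 35.8 C": lambda p: p["temperature"] < 35.8,
    "DBP < 50 mmHg": lambda p: p["dbp"] < 50,
    "age < 45 or >= 65 years": lambda p: p["age"] < 45 or p["age"] >= 65,
}

# PLACEHOLDER weights only: the real points are in Table 2, not shown here.
PLACEHOLDER_POINTS = {name: 1 for name in ICU_HIGH_RISK}

def score_patient(patient):
    """Return the total (placeholder) score and the categories that contributed."""
    met = [name for name, rule in ICU_HIGH_RISK.items() if rule(patient)]
    return sum(PLACEHOLDER_POINTS[name] for name in met), met

# Made-up patient; "ambulance" is an assumed transport category.
patient = {
    "arrival_transport": "ambulance", "sbp": 85, "resp_rate": 26,
    "heart_rate": 100, "temperature": 36.6, "dbp": 60, "age": 70,
}
total, met = score_patient(patient)
print(total, met)
```

Replacing the placeholder weights with the values from Table 2, and adding the lower-risk intervals of each variable, would give the full scoring table.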

Table 2 Scoring models derived from the primary and secondary outcomes

The scores produced by these models all ranged from approximately 0 to 69. The predicted risk, the proportion of patients, and the performance of each scoring model at different score intervals in the testing cohort are shown in Table 3. For the ICU-stay and 3d-death scoring models, most patients had a risk score between 10 and 34, and few patients had scores under 5 or above 45. For patients who died within 7 days and 30 days of emergency admission, the risk score appeared to increase to the range of 26–48, with few patients scoring below 23 or above 53.
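A stratification by score interval can be tabulated as in the following sketch, which reports the share of patients and the observed event rate in each interval. The interval edges, scores, and outcomes are made up for illustration and are not the cut-offs or data of Table 3.

```python
def stratify(scores, outcomes, edges):
    """Summarise patients per score interval [edges[i], edges[i+1])."""
    table = []
    for low, high in zip(edges[:-1], edges[1:]):
        members = [y for s, y in zip(scores, outcomes) if low <= s < high]
        table.append({
            "interval": (low, high),
            "share_of_patients": len(members) / len(scores),
            # Observed event rate; None when the interval is empty.
            "observed_risk": sum(members) / len(members) if members else None,
        })
    return table

# Made-up scores on the 0-69 scale and outcomes (1 = event occurred).
scores = [5, 12, 18, 25, 33, 47, 52, 61]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
for row in stratify(scores, outcomes, edges=[0, 20, 40, 70]):
    print(row)
```

In the toy data the observed risk rises from 0 in the lowest interval to 0.5 and then to about 0.67, which is the pattern a well-ordered score should show.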

Table 3 Varying cutoffs of predicted risk based on the scoring models of ICU stay, 3d-death, 7d-death, and 30d-death, the proportion of patients stratified for the primary and secondary outcomes, and the corresponding sensitivity, specificity, positive and negative predictive values on the testing cohort

Performance evaluation

We used the scoring models of ICU stay, 3d-death, 7d-death, and 30d-death to predict the probability of transfer to the ICU, death within 3 days, death within 7 days, and death within 30 days after emergency admission, respectively (Table 4). In the testing cohort, the scoring models achieved good performance for ICU transfer and for shorter-term and longer-term mortality prediction, with an AUC of 0.7551 (95% CI 0.7297–0.7805) for ICU stay, an AUC of 0.7856 (95% CI 0.7166–0.8545) for 3d-death, an AUC of 0.7371 (95% CI 0.6665–0.8077) for 7d-death, and an AUC of 0.7407 (95% CI 0.6972–0.7842) for 30d-death. In each of the three cohorts, the best performance was achieved for prediction of mortality within 3 days, with an AUC of 0.8243 (95% CI 0.7771–0.8716) in the training set, an AUC of 0.7877 (95% CI 0.6593–0.9161) in the validation set, and an AUC of 0.7856 (95% CI 0.7166–0.8545) in the testing set. We also applied the four scoring models to predict mortality within 3 days after emergency admission in the testing cohort; the performance is reported in Fig. 2. The 30d-death scoring model achieved the best performance for 3d-death prediction, with an AUC of 0.790.

Table 4 Comparison of AUC values achieved by different triage scores on the training, validation, and testing cohorts

Fig. 2 Receiver operating characteristic curves of scoring models for predicting mortality within 3 days after emergency admission. AUC area under the curve, ICU intensive care unit, 3d 3 days, 7d 7 days, 30d 30 days

Discussion

In this retrospective study of IHD patients presenting to the ED, newly developed scoring models were able to identify patients at high risk of ICU stay, 3d-death, 7d-death, and 30d-death. To the best of our knowledge, this is the first validated point-based clinical tool based on a machine learning algorithm developed specifically to identify patients who are transferred to the ICU or who die in the shorter or longer term after presenting to the ED. The scoring models showed good discriminative performance, with 7–10 variables selected to calculate each score from the following: age, arrival transport, temperature, SBP, DBP, respiratory rate, heart rate, oxygen saturation, pain_max score, and comorbidity of malignant cancer.

IHD remains a major threat to public health, and its overall burden is increasing globally [1]. The total number of prevalent cases, deaths, and DALYs due to IHD has increased steadily since 1990, reaching 197 million prevalent cases, 9.14 million deaths, and 182 million DALYs in 2019 [1]. In the present study, 30% of IHD patients were transferred to the ICU after visiting the ED, and mortality within 3 days, 7 days, and 30 days was 1%, 3%, and 7%, respectively. Although our study excluded patients who died within 24 hours in the ED to ensure the reliability of the analysis, a mortality of 0.98% was observed during screening. The disease burden caused by IHD is therefore heavy in practice. The more accurate and parsimonious scoring models in this study help stratify high-risk patients in the ED, which is conducive to patient safety and the rational allocation of medical resources.

In the crowded and rapidly changing ED, doctors need to judge the patient's condition as quickly as possible and provide the timeliest intervention [24]. Physicians must undergo dual specialist and general internal medicine training, as a high proportion of inpatients have multiple comorbidities [25]. By combining machine learning and logistic regression, AutoScore automatically develops a minimalistic sparse risk score model for predefined outcomes, enabling users to quickly build interpretable clinical scores that can be easily implemented and validated in clinical practice [23]. One study has shown that objective criteria may help physicians evaluate the appropriateness of admitting patients presenting to the ED to a short stay unit and predict the LOS, which can both shorten hospital stays and reduce the cost of hospitalization [26]. In addition, previously reported scoring systems have complex variable requirements and do not predict adverse events in the very short term after an ED visit [27,28,29,30]. In this study, both the candidate variables and those ultimately selected for the scoring models are readily available in the ED, making this clinical tool easier to implement. Demographics, vital signs, and arrival transport can all be obtained at first contact in the ED, whereas more complex laboratory indicators and imaging results were not considered because they are not available in time. The way points are assigned to the selected variables across cut-off segments corresponds to common clinical knowledge [24, 31]. For example, in our scoring models, patients transported to the ED by helicopter with shock-level blood pressure, tachypnea, and a rapid heart rate are at high risk of transfer to the ICU for advanced therapy. On this basis, patients with hypoxemia, elevated DBP, and painful symptoms have a higher risk of death within 3 days. Furthermore, the scoring models in our study show satisfactory sensitivity and accuracy for risk prediction, which supports the reliability of the predictions.

Transferring patients to the ICU for advanced therapy and assessing short-term outcomes are important parts of triage for emergency physicians, and the correct decision on whether to transfer a patient to advanced care matters for both patient safety and the rational use of medical resources [32,33,34]. Our study found that, among the scoring models, the 30d-death model showed the best performance for predicting 3d-death. This may be because more variables were included in this model, namely pain_max score, oxygen saturation, and the comorbidity of malignant cancer, which improves the identification of high-urgency patients and results in greater reliability and accuracy in 3d-death prediction.

There are several limitations in the present study. Firstly, the variables included in the scoring models are based on routine clinical collection, so some parameters may appear similar between high-risk patients and other patients during an ED visit, making the prediction models less accurate. Secondly, we did not compare the performance of the scoring models in this study with other scoring systems because of the limited types of variables in the MIMIC-IV database [11]. In addition, the scoring models are based on a database from the United States and have not been tested in other countries and regions, where prediction performance may differ because of differences in economic and medical conditions [35].

Conclusion

The point-based clinical tool that we have implemented based on a machine learning algorithm offers an innovative approach to optimizing the triage of IHD patients in the ED. These accurate and parsimonious scoring models show good discriminative performance for predicting transfer to the ICU, 3d-death, 7d-death, and 30d-death in IHD patients visiting the ED.