1 Introduction

Hemorrhage remains the leading cause of preventable mortality in trauma patients. For many of these patients, effective assessment begins with the accurate identification of injury type and severity so that appropriate treatment can be delivered in time. In this environment, delays in performing a life-saving intervention (LSI) may compromise a patient’s stability, complicate injuries, and increase morbidity or mortality. Although vital signs—such as a patient’s body temperature, heart rate, blood pressure, and respiratory rate—play significant roles in monitoring trauma patients and are generally used to assess patient condition, they alone may not reveal patient destabilization until late, and often irreversible, changes in state have taken place. The utility of available field vital signs depends heavily on concomitant interpretation by an expert provider, and vital signs are measured only intermittently during both the prehospital and hospital phases of care. Previous studies have shown that standard vital signs available from monitors are no better at determining true injury status and severity in trauma patients than a simple physical examination [9]. Nonetheless, measurement and interpretation of electronic vital signs have become routine during prehospital and hospital care. To achieve more accurate diagnostic capabilities, new approaches based on combinations of multiple vital signs, trends, and other information may be better suited for trauma diagnosis [5]. However, as the approaches required to improve diagnostic sensitivity become more complex, their use by providers may become increasingly difficult without adjuncts capable of deriving solutions from complicated mathematical formulas.

Because of the intricate relationships among vital signs, time, and other factors, developing new approaches that account for these items will also require advanced information systems and computer algorithms coupled with technologies capable of processing and fusing multiple parameters, weights, and trends. These algorithms can extract the maximum information content available from both single vital signs and combinations of multiple vital signs across time points. Machine learning (ML) is one technology that has been studied recently as a potential solution for the multivariate processing of vital signs required to accurately diagnose patient condition in the critical care environment [2, 10, 11, 14]. In addition, because such an environment requires rapid, accurate decisions within a short time frame, ML technology can potentially help identify possible interventions earlier. This approach uses information technology to mimic human decision making and provide an automated means of processing vital signs and other patient data, with the aim of rapidly predicting patient care needs. The use of ML technology may facilitate the triage of patients to appropriate trauma centers where experienced personnel can rapidly perform LSIs. By providing diagnostic support, computers and ML may fit well into a prehospital triage algorithm that focuses on LSIs as its optimal end point [3, 7]. Additionally, such technology may strengthen the ability to accurately triage trauma patients in the prehospital environment and hence improve the survival of patients who would otherwise have died.

The purpose of this study was to test and evaluate different types of artificial intelligence and ML methods for modeling injury severity (as defined by the need for an LSI during prehospital care and/or in the emergency department) in a set of retrospectively and prospectively collected trauma patients, based on data collected over 5 years, most recently from the Wireless Vital Signs Monitor (WVSM, Athena GTX, Des Moines, Iowa) trial. Methods were implemented as a software application module capable of real-time processing on a computer system with moderate performance specifications. A 90/10 cross-validation approach was used to evaluate the predictive capacity of the algorithm.

We hypothesized that because computers and ML can process large amounts of disparate data continuously, quickly, and accurately, they would not only benefit trauma diagnosis, especially in the context of prehospital triage, but also integrate well into an electronic system that may perform LSI predictions in real time.

A novelty of this study was the development and validation of an ML algorithm and hybrid system to predict the need for LSIs in trauma patients. While there have been numerous studies utilizing decision trees, conjunction rules, support vector machines, artificial neural networks, multilayer perceptrons, and logistic regression models to discriminate between different patient groups, to date, no study has investigated the possibility of predicting the need for LSIs in trauma patients in real time. The ultimate goal of this work was to address this shortcoming and provide physicians with a new tool for decision support.

2 Methods

This study was approved by the Institutional Review Boards of the US Army Institute of Surgical Research, Fort Sam Houston, TX, USA, and the University of Texas Health Science Center at Houston, Houston, TX, USA. We analyzed data from both the Trauma Vitals (TV) database and the WVSM to generate datasets for training and validating an artificial intelligence model, respectively.

2.1 TV database and protocol

Data in the TV database include severe trauma patients with blunt and penetrating injuries transported from the scene by helicopter service to a Level I trauma center in Houston, Texas, or San Antonio, Texas. Patients were monitored from the scene during transport using a Welch Allyn Propaq 206 (Welch Allyn, Skaneateles Falls, NY) monitor or Welch Allyn PIC 50 (Welch Allyn, Skaneateles Falls, NY) monitor. Propaq data were collected using a computerized personal digital assistant (PDA) attached to the monitor during transport. Data were stored in a nonvolatile memory card in the PDA for use during the study. All numeric Propaq data were stored at a rate of 1 Hz. Waveform data were recorded at a rate of 182 Hz. PIC 50 data were stored in a built-in flash memory card attached to the monitor. PIC 50 numeric data were stored at a rate of one measurement every 3 min, coinciding with the patient’s noninvasive blood pressure measurements. Waveform data were stored at a rate of 375 Hz. Data from the PDA and flash cards were extracted by research personnel and uploaded to the TV database for analysis. All nonelectronic data were manually recorded on the run sheet from the monitor’s screen by Emergency Medical Services medics, then collected on a standardized form, and entered into the TV database. These included demographic data, physical examination results, Glasgow coma scores, and interventions performed on the patients in the field. LSIs consisted of endotracheal intubations, transfusions, tube thoracostomies, cardiopulmonary resuscitations, needle decompressions, angio-embolizations, cricothyrotomies, thoracotomies, and cardioversions.

Data for the ML model were selected based on a generic population sample that would prevent the algorithm from training to only a subset of the general population (which would result in a nongeneralizable model). We selected over 30 h of data corresponding to 79 prehospital patient records from the TV database based upon three criteria necessary to maximize the input parameter values required to provide an optimal learning set to the ML engine. These were (1) availability of vital signs and patient status summary scores (Murphy Factor, Athena GTX, Des Moines, Iowa) with values from 0 to 5 for the patient on a second-by-second basis, (2) blood pressure measurements over a minimum of 15 min that also changed from initial measurements (had a baseline shift that yielded measurable slopes), and (3) heart rate measurements uncorrupted by electromechanical noise. Lengths of patient records varied from approximately 15 to 30 min. Records contained single episodes of data, sometimes missing one or more measurements from different vital signs over the episode’s duration. Standard vital signs used during trauma care for patient assessment included heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), and blood oxygenation (SpO2). Combinations of these vital signs were also used to derive other measurements, including shock index (SI = HR/SBP) and pulse pressure (PP = SBP − DBP). The Murphy Factor is a patient status summary alarm that provides the medic with decision support by combining all vital signs, trends, and pulse characteristics recorded by the monitor and applying a multivariate sensor fusion algorithm that generates a combined index of patient condition on a scale from 0 to 5. Results are interpreted as 0–1 in green, 2–3 in yellow, and 4–5 in red, indicating a patient in a normal, low-priority, or high-priority condition, respectively.
A baseline shift was defined as a dataset with at least one change from initial measurement over the data-recording time. This provided a learning dataset required for the ML algorithm to learn from changes in blood pressure during transport.
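As an illustration, the derived measurements and the Murphy Factor color bands described above can be sketched as follows (a minimal sketch of the stated definitions, not the monitor's proprietary fusion algorithm):

```python
def derived_vitals(hr, sbp, dbp):
    """Derive shock index and pulse pressure from standard vital signs."""
    si = hr / sbp   # shock index (bpm / mm Hg)
    pp = sbp - dbp  # pulse pressure (mm Hg)
    return si, pp

def murphy_color(score):
    """Map a Murphy Factor (0-5) to its display color band."""
    if score <= 1:
        return "green"   # normal condition
    if score <= 3:
        return "yellow"  # low-priority condition
    return "red"         # high-priority condition
```

For example, a patient with HR 100 bpm and SBP 125 mm Hg has a shock index of 0.8, at the upper edge of the normal range.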

2.2 WVSM database and protocol

Because a limitation of the initial TV cohort was that times of actual LSIs were not recorded and stored in the TV database, an additional 82 h of data corresponding to 24 prehospital patient records were chosen based upon the availability of recorded LSI times from the WVSM protocol. From June 27, 2011, to January 6, 2012, 305 consecutive patients transported from the injury scene via the Life Flight helicopter service to Memorial Hermann Hospital, a Level I trauma center in Houston, Texas, were enrolled in this protocol. This included data captured from 104 patients wearing the WVSM system during transport to the Houston Level I trauma center.

Data in the WVSM database include severe trauma patients with blunt and penetrating injuries transported from the scene by helicopter service to a Level I trauma center in Houston, Texas. WVSM data were collected using a computerized server system that collected and stored all transport data from the WVSM device through a wireless connection once a patient arrived in the emergency department. Numeric data from the WVSM device were stored at a rate of 1 Hz. In addition, ECG waveform data from a single lead and pleth waveform data from a thumb-mounted pulse oximeter connected to the WVSM were recorded at rates of 230 and 75 Hz, respectively. For trauma patients with concomitant lung injuries, respiration waveform data were also recorded at a rate of 10 Hz. Standard vital signs used during trauma care for patient assessment included the same vital signs recorded in the TV database (HR, SBP, DBP, MAP, RR, SpO2, SI, and PP). All nonelectronic data were manually recorded on an electronic run sheet (Tablet PCR, Zoll Medical, Chelmsford, MA, USA) by Emergency Medical Services medics, then collected on a standardized form, and entered into the WVSM database (OpenClinica). These included demographic data, physical examination results, Glasgow coma scores, and interventions performed on the patients in the field. LSIs consisted of endotracheal intubations, transfusions, tube thoracostomies, cardiopulmonary resuscitations, needle decompressions, angio-embolizations, cricothyrotomies, thoracotomies, and cardioversions. Patients for analysis were selected based upon two criteria: (1) direct transport of the patient from the injury scene to the hospital and (2) an injury requiring hospital admission. Of these 104 patients, 32 received at least one LSI, while only 24 patients had both recorded LSIs and corresponding LSI predictions. Actual LSIs were recorded only when the nurse/paramedic manually pressed a button on the WVSM data-capture-and-display interface.
Only the start of each LSI was recorded. These 24 patients provided a validation set for this project. Lengths of these records varied from approximately 3–4 h. Records contained single episodes of data, sometimes missing one or more measurements from different vital signs over the episode’s duration.

2.3 Design, validation, and analysis

Design of a hybrid system for LSI predictions employed two components: (1) a simple rule-based algorithm that would serve as a front end to handle obvious cases involving measurements that clearly indicated the need for some interventions and (2) an ML algorithm that would serve as an intelligent component to handle more obscure and complex cases involving measurements unrecognized by the front end (see Fig. 1). This configuration was a novel design proposed from our combined knowledge of ML and medicine. If a patient’s vital signs were clearly abnormal according to a set of basic rules, the patient would be classified as needing an LSI (see rules below). Rules were based on the analysis of data from the TV database. “Normal” measurements included those within the 95 % confidence interval of the database. On the other hand, if vital signs were not obviously abnormal, the data would be passed to the ML algorithm. The basic detection rules were meant to filter out patients who required immediate attention. The order of the rules also reflected the relative importance of measurements for discriminating patient instability and their potential for affecting system performance.
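The two-component dispatch logic can be sketched as below. The rule predicates and the MLP callable are illustrative stand-ins; the 0.9 output for rule-triggered cases reflects the arbitrary 90 % value noted in the Results:

```python
def predict_lsi(features, rules, mlp):
    """Hybrid LSI prediction: rule-based front end, ML back end.

    `rules` is a list of predicates over a feature dict.  If any rule
    fires, the measurements are treated as clearly abnormal and a high
    probability is returned directly; otherwise the case is deferred to
    the trained multilayer perceptron (`mlp`).
    """
    for rule in rules:
        if rule(features):
            return 0.9  # clearly abnormal: strong need for an LSI
    return mlp(features)  # obscure/complex case: use the ML model
```

For example, a hypothetical rule such as `lambda f: f.get("SpO2", 100) < 85` would route a severely hypoxic patient past the ML stage entirely.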

Fig. 1

A hybrid system for predicting the need for life-saving interventions in trauma patients. To predict the need for life-saving interventions, a hybrid system could employ the following components: (1) a component that extracts features from the measurements of various vital signs, (2) a simple rule-based algorithm that handles obvious cases (features) involving measurements that clearly indicated the need for some interventions, and (3) a machine learning algorithm (multilayer perceptron) that handles more obscure and complex cases (features) involving measurements unrecognized by the rule-based algorithm. If a patient’s vital signs were clearly abnormal according to a set of basic rules, the patient would be classified as needing a life-saving intervention. Standard vital signs used during trauma care for patient assessment often include heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), and blood oxygenation (SpO2). Combinations of these vital signs are also used to derive other measurements including shock index (SI = HR/SBP) and pulse pressure (PP = SBP − DBP)

In addition, distributions of initial nonzero BP measurements and mean non-BP measurements across all patient records in the TV database were used to formulate detection rules. First, identification of tail regions of BP-related distributions was combined with knowledge of expected BP ranges (normal SBP 90–120 mm Hg; normal DBP 60–80 mm Hg; normal MAP: DBP + (SBP − DBP)/3 mm Hg) to derive lower-bound and upper-bound thresholds. Similarly, identification of tail regions of non-BP-related distributions was combined with knowledge of expected ranges (normal HR 60–100 bpm; normal RR 12–20 breaths per minute; normal SpO2 94–100 % at sea level; normal SI 0.5–0.7 bpm/mm Hg) to derive thresholds. Decision tables, decision trees, and/or conjunction rules were then employed to tailor rules.
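A minimal sketch of threshold checks against the textbook ranges listed above. The published rules were further tailored using database tail regions, decision tables, and decision trees, which are not reproduced here:

```python
# Textbook "normal" ranges for adult vital signs, as listed in the text.
NORMAL_RANGES = {
    "SBP": (90, 120),    # mm Hg
    "DBP": (60, 80),     # mm Hg
    "HR": (60, 100),     # bpm
    "RR": (12, 20),      # breaths per minute
    "SpO2": (94, 100),   # % at sea level
    "SI": (0.5, 0.7),    # bpm / mm Hg
}

def outside_normal(vitals):
    """Return the names of measured vitals falling outside textbook ranges.

    `vitals` is a dict of available measurements; missing vitals are skipped.
    """
    flagged = []
    for name, (lo, hi) in NORMAL_RANGES.items():
        value = vitals.get(name)
        if value is not None and not (lo <= value <= hi):
            flagged.append(name)
    return flagged
```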

Rates of change (slopes) and mean and maximum measurements were used to train an ML algorithm so that it could respond quickly to measured trends and disparities in a patient’s vital signs. We used linear regression to calculate slope values, ignoring values equal to zero, to estimate the rate of change of numeric values across time. A sliding window of 180 s was used to calculate slopes for all non-BP-related vital signs. Because BP-related measurements were recorded every 3 min, a sliding window of 540 s was used to calculate slopes for these measurements. Data were configured for input into an ML modeler (WEKA, University of Waikato, New Zealand) to generate the ML model [8].
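The sliding-window slope calculation can be sketched as follows, assuming second-stamped samples with zeros marking missing measurements (an assumption consistent with the zero-ignoring rule above):

```python
def window_slope(timestamps, values, window_s):
    """Least-squares slope over the trailing `window_s` seconds,
    ignoring zero-valued (missing) samples."""
    t_end = timestamps[-1]
    pairs = [(t, v) for t, v in zip(timestamps, values)
             if v != 0 and t_end - t <= window_s]
    if len(pairs) < 2:
        return 0.0  # not enough valid samples to fit a line
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in pairs)
    den = sum((t - mean_t) ** 2 for t, _ in pairs)
    return num / den if den else 0.0
```

Per the protocol above, a 180 s window would be passed for non-BP vitals and a 540 s window for BP-related vitals.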

The main criteria for training the ML algorithm using the TV database were a strong correlation between inputs and outputs, preferably with a correlation coefficient greater than 70 %, and a low mean absolute error, with values less than 30 %. From a system perspective, the ML algorithm needed to produce smooth, continuous outputs (probabilities) between 0 and 1, indicating the need for an LSI. Because these outputs could not be binomial, nor could they jump sporadically up and down with discontinuities, data corresponding to patients with LSIs were not separated from data corresponding to patients without LSIs. Visual assessment of the outputs was required to evaluate system performance.

To design our ML algorithm, we used a tenfold cross-validation approach [1, 6, 13]. Because standardizing the inputs improves the numerical condition of the data for training, we preprocessed the data before training the classifier by replacing all unknown or missing features for each given patient record in the dataset with zeros and normalizing all other features so that they fall within the range −1 to 1. In particular, we used a maximum–minimum normalization rule as follows:

$$\bar{x}_{i,j} = \frac{x_{i,j} - \frac{1}{2}\left( x_{i,j_{\max}} + x_{i,j_{\min}} \right)}{\frac{1}{2}\left( x_{i,j_{\max}} - x_{i,j_{\min}} \right)}$$

∀ i = 1, …, N, ∀ j = 1, …, M and j ∈ M_i, where x_{i,j} denotes the jth feature value of feature set i, x_{i,j_max} and x_{i,j_min} denote the maximum and minimum values of feature j over the training data, N denotes the number of instances (feature sets) in the training data, M denotes the number of features in an instance, and M_i denotes the set of known features in feature set i. In addition, we replicated each feature set, replacing unknown or missing feature values for each set with values averaged over all values in the training data and then normalizing all feature values using the rule above. This second set was used only for providing a confidence interval for outputs from the first dataset, not for model training.
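The preprocessing rule can be sketched as follows, with `None` marking unknown features and the per-feature extrema taken over the training data (a simplifying reading of the notation above):

```python
def normalize_features(instances):
    """Max-min normalization to [-1, 1], with missing features set to 0.

    `instances` is a list of equal-length feature lists; None marks a
    missing value.
    """
    n_feats = len(instances[0])
    # Known values for each feature column, used to find extrema.
    cols = [[row[j] for row in instances if row[j] is not None]
            for j in range(n_feats)]
    normalized = []
    for row in instances:
        out = []
        for j, x in enumerate(row):
            if x is None or not cols[j]:
                out.append(0.0)  # unknown feature -> 0
            else:
                lo, hi = min(cols[j]), max(cols[j])
                mid, half = (hi + lo) / 2.0, (hi - lo) / 2.0
                out.append((x - mid) / half if half else 0.0)
        normalized.append(out)
    return normalized
```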

The following features formed a feature set of our training data for designing the multilayer perceptron (MLP): slope of SBP, current SBP, slope of DBP, current DBP, slope of MAP, current MAP, slope of SpO2, mean SpO2, slope of RR, mean RR, slope of HR, mean HR, slope of inverted SI, mean inverted SI, slope of PP, current PP, maximum SBP, maximum DBP, maximum MAP, maximum SpO2, maximum RR, maximum HR, maximum inverted SI, and maximum PP. Moreover, classifications were obtained by remapping Murphy scores to a scale between 0 and 1, i.e., a nominal probability. The final training data consisted of over 110,000 feature sets. Thus, these training data covered more than 30 h of data, ranging across different physiologic, temporal, and spatial conditions.
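The text states only that Murphy scores were remapped to a nominal probability; a linear scaling, as sketched below, is an assumption:

```python
def murphy_to_probability(score, max_score=5):
    """Linearly remap a Murphy Factor (0-5) onto [0, 1].

    A linear remapping is assumed here; the published work states only
    that scores were remapped to a nominal probability.
    """
    return min(max(score, 0), max_score) / max_score
```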

Validation involved the WVSM protocol and its patient records and was accomplished by determining the output of our hybrid system at the time of the recorded LSI and the maximum output of our system 60 s, 3 min, and 5 min prior to the recorded LSI. In other words, the observation window ended at the time of the recorded LSI. Further validation was done by sampling outputs during the first 5 min of each patient record and 5 min prior to the start of each LSI. An initial analysis classified prediction outputs (probabilities) >30 % as true positives (TPs) and otherwise as false negatives (FNs). Similarly, a second analysis classified outputs >50 % as TPs and otherwise as FNs.
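The windowed classification step can be sketched as follows, assuming the maximum system output within the pre-LSI observation window decides the outcome:

```python
def classify_window(probabilities, threshold):
    """Classify a pre-LSI observation window as TP or FN.

    `probabilities` are the hybrid system's outputs sampled in the
    window ending at the recorded LSI; the peak output is compared
    against the analysis threshold (0.30 or 0.50 in the text).
    """
    peak = max(probabilities)
    return "TP" if peak > threshold else "FN"
```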

3 Results

3.1 Model development

The demographics of the 79 patients included in this study are depicted in Table 1; likewise, the demographics of the WVSM patients are shown in Table 2. Quartiles were established for age. Race and age were not different between those patients who received at least one LSI and those who received none, nor did male gender predispose to an LSI. Likewise, increasing patient age did not increase the frequency of an LSI in this sample. Of the 79 patients, 24 (30 %) did not require an LSI. The other 55 patients received a total of 124 LSIs. Thirty-nine percent (48) of the LSIs were performed prehospital, 60 % (74) in the emergency room, and 1 % (2) elsewhere. Interventions consisted of the following: 42 endotracheal intubations, 42 transfusions, 18 tube thoracostomies, eight cardiopulmonary resuscitations, five needle decompressions, five angio-embolizations, two cricothyrotomies, two thoracotomies, and one cardioversion. Table 3 describes the hybrid system’s front-end component, that is, the basic detection rules that were used to identify patients who required immediate interventions. As a note, the value of 0.9 (90 %) in the table was arbitrary and was only used to indicate that abnormal measurements should draw the provider’s attention to a strong need for an LSI.

Table 1 Demographics of selected patients from the Trauma Vitals database
Table 2 Demographics of selected patients from the Wireless Vital Signs Monitor protocol
Table 3 Basic detection rules

We trained and compared several ML models, including decision trees, conjunction rules, support vector machines, artificial neural networks, multilayer perceptrons, and logistic regression models. Models were generated for the 110,000+ feature sets using WEKA and binary and continuous classes. In order to develop a real-time hybrid system to predict the need for LSIs, i.e., output a continuous probability, we limited comparisons to artificial neural networks, multilayer perceptrons, and logistic regression models and sought models that yielded the highest correlation and lowest error. Comparisons of top cross-validation results are shown in Table 4.

Table 4 Comparisons of cross-validation results for various machine learning models

Comparisons among the proposed models showed that a multilayer perceptron (MLP) would best implement the ML algorithm in the novel hybrid LSI prediction system. This ML model consisted of 24 inputs, 12 hidden nodes that each contained a set of 24 optimized weights, and one output that contained a set of 12 optimized weights. The back-propagation algorithm (learning rate 0.05, momentum 0.2) was used to train the MLP (as well as all other algorithms in Table 4). The activation function employed by the MLP was the sigmoid function. Given 111,028 feature sets, the WEKA tool took approximately 10.2 h (36,861.2 s) to generate weights for the MLP nodes. In Table 4, the high correlation coefficient of 0.8072 indicates that the predicted probabilities of the MLP matched the desired probabilities reasonably well. In addition, the MLP model achieved a relatively low mean absolute error of 0.1612, which equals the sum of the absolute differences between each desired probability and its predicted probability divided by the total number of instances during cross-validation.
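The reported 24-12-1 architecture corresponds to a forward pass like the one sketched below; the trained weight values, and WEKA's exact bias and threshold conventions, are assumptions here:

```python
import math

def sigmoid(z):
    """Sigmoid activation, as used by the reported MLP."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a 24-input, 12-hidden-node, single-output
    sigmoid MLP.

    `hidden_w` is 12 lists of 24 weights; `out_w` is a list of 12
    weights.  Bias terms are included as an assumption (the source does
    not state how biases were handled).
    """
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * hi for w, hi in zip(out_w, h)) + out_b)
```

The single sigmoid output is what allows the system to emit a smooth, continuous probability between 0 and 1 rather than a binary label.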

Our ML algorithm system was able to generate outputs commensurate with baseline changes in the patients’ vital signs in real time. Through a graphical interface, we analyzed these results, and for selected records, we plotted predicted probabilities against patient features in order to assess the influence of every feature on the prediction model. We illustrate our analyses by showing an example in Fig. 2.

Fig. 2

Plots of prenormalized features and predictions versus time for a trauma patient record. Standard vital signs used during trauma care for patient assessment included heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), and blood oxygenation (SpO2). Combinations of these vital signs were also used to derive other measurements including shock index (SI = HR/SBP) and pulse pressure (PP = SBP − DBP). The following features were extracted for a hybrid system in order to predict the need for life-saving intervention (LSI as a probability): slope of SBP, current SBP, slope of DBP, current DBP, slope of MAP, current MAP, slope of SpO2, mean SpO2, slope of RR, mean RR, slope of HR, mean HR, slope of inverted SI, mean inverted SI, slope of PP, current PP, maximum SBP, maximum DBP, maximum MAP, maximum SpO2, maximum RR, maximum HR, maximum inverted SI, and maximum PP. Linear regression was used to calculate slope values, ignoring those values equal to zero to derive an estimate of the rate of change for numeric values across time. A sliding window of 180 s was used to calculate slopes for all non-BP-related vital signs. Because BP-related measurements were recorded every 3 min, a sliding window of 540 s was used to calculate slopes for these measurements. For this particular patient, the hybrid system described in this paper yielded appropriate outputs corresponding to the input feature set. The region where the solid black line remained at 90 was a result of the rule-based algorithm of the hybrid system detecting BP-related measurements outside of “normal” range values

When all vital signs were available, RR and SpO2 were most discriminative in detecting patient instability and affecting system performance. These results agreed with the fact that measurements outside of the 95 % confidence interval of measurement distributions (from the TV database) would immediately trigger the basic detection rules of the hybrid system. When RR and SpO2 measurements were missing from the input set, BP-related vital signs (SBP, DBP, MAP, and PP) were most discriminative in detecting patient instability (see Fig. 2).

3.2 Validation

To validate the model, we employed an additional set of data derived from 305 patients, of whom 37.7 % required an LSI (Table 2). Of the 199 LSIs, 90 (45 %) were performed prehospital and 109 (55 %) in the emergency department.

There were 295,994 feature sets from 82 h of real-world patient data to validate the hybrid classification system. Table 5 shows confusion matrices for the initial analysis, as described in the “Design, validation, and analysis” section. Importantly, the system was able to obtain a sensitivity of 89.8 % within 5 min of recorded LSIs when a probability >30 % was denoted as a TP. Moreover, the system achieved a positive predictive accuracy of 96.4 % for observation windows described in the previous section.
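For reference, the reported figures follow the standard definitions over the Table 5 counts; the actual counts are not reproduced here, so the sketch below shows only the formulas:

```python
def sensitivity(tp, fn):
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def positive_predictive_value(tp, fp):
    """Positive predictive accuracy: TP / (TP + FP)."""
    return tp / (tp + fp)
```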

Table 5 Confusion matrices for the performance of the hybrid system

4 Discussion

Although the application of ML algorithms to datasets began over 50 years ago and now has roots in multiple disciplines [4, 12], only recently has this technology been introduced to trauma research. Furthermore, ML technology has rarely been applied to trauma diagnosis, decision support, or clinical practice for the trauma patient. This study was designed to advance trauma patient care through the development and validation of an ML algorithm and hybrid system to predict the need for LSIs in trauma patients. In previous work, only ML and new vital signs were explored for their utility in discriminating between LSI and non-LSI patients [2]; neither standard vital signs nor trends were used for identifying LSI patients. Likewise, numerous studies utilizing various ML techniques to discriminate between different patient groups have been conducted. However, to date, no study has investigated the possibility of predicting the need for LSIs in trauma patients in real time using ML and other information.

By producing over 110,000 feature sets from various vital sign measurements of a select cohort of trauma patients, we intended to capture the synergistic complexities among vital signs, derived statistics, time, and spatial/environmental factors—complexities that may not be understood by the health practitioner in an emergency situation. Since feature sets retrospectively scored patients on a per-second basis, time became an integral part of real-time system design. Furthermore, unlike previous work [2, 10, 11, 14], system design involved not only the development of an ML model but also formulation of basic detection rules.

In addition, we chose to develop a real-time ML algorithm system that incorporates an MLP based upon its ability to handle complex datasets and perform well on nonlinear data, including data with missing values. Moreover, MLPs yield numerical outputs interpretable as probabilities, allow easy real-time implementation in software, and learn through conventional techniques (such as the back-propagation algorithm). An MLP has a major strength over a single-layer perceptron in that it uses one or more hidden layers of nodes and transforms every weighted sum using a nonlinear function before making any threshold comparisons. Hence, MLPs not only distinguish, if possible, the instances of classes in some feature space, but also join isolated convex regions into a single class [13].

For this study, the term “prediction” denoted the probability that a patient needs an LSI at a particular time. While ML may help predict whether a patient should receive an LSI, the accuracy of that prediction and its confidence interval depend upon the availability of measurements and their buffered histories. In other words, we expected that the longer our system could buffer measurements and calculate features, the more reliable our system would perform. As such, initial predictions would only make sense with respect to their place in time, and confidence intervals would only improve as time goes on. As a part of the design process, we explored the types and numbers of features that would best assist ML. To add robustness to system design and validation, datasets included feature sets that contained missing vital sign measurements.

Interpretation of outputs during this study influenced the use and performance of the hybrid prediction system, as did the selection of outputs within a given time frame. When a probability >30 % was denoted as a TP, the system obtained a sensitivity of 89.8 % within 5 min of recorded LSIs. As this selection time frame was narrowed, the hybrid system achieved lower sensitivity. On the other hand, as the selection time frame increased, the simple rule-based algorithm played a greater role in indicating patient destabilization. When a probability >50 % was denoted as a TP, the system obtained a sensitivity of 69.5 % within 5 min of recorded LSIs.

4.1 Limitations

This study had a number of limitations. The sizes of the training and validation datasets were small, i.e., they contained fewer than 120 h of data from fewer than 110 patients in total. Moreover, the results were preliminary due to the dataset sizes, the criteria for selecting the data, the training dataset used to design our ML algorithm, and the fact that the absence of an LSI does not equate to the absence of a need for one. Therefore, we tended to err on the side that certain measurements may indicate the possible need for an intervention rather than indicate that an LSI is not required. This is the basic concept of overtriage, a central tenet of trauma care. To trade off the requirement of low outputs (<10 %) that indicated stable measurements against the requirement of high outputs (>90 %) that indicated patient needs, we accepted that the system would pull extreme outputs toward the center, and we chose, instead, to add an offset to ML outputs according to a power-law adjustment (see Table 6). In other words, outputs closer to 100 % had a smaller power of 2 subtracted, whereas outputs closer to 0 % had a larger power of 2 subtracted. This adjustment compensated for the bias in the training dataset so that the real-time ML system could yield a wide range of values, including very small (e.g., 0 %) and very large (e.g., 100 %) predictions.
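A purely illustrative sketch of such a power-of-two offset follows; the band boundaries and exponents below are hypothetical, since the actual Table 6 values are not reproduced in the text:

```python
def powerlaw_adjust(p):
    """Illustrative power-of-two offset adjustment (hypothetical bands).

    The idea, per the text: outputs near 0 have a larger power of 2
    subtracted (pushing them toward 0 %), and outputs near 1 a smaller
    one (leaving them near 100 %).  Exponents here are made up.
    """
    if p >= 0.75:
        offset = 2 ** -6   # small offset for high outputs
    elif p >= 0.5:
        offset = 2 ** -4
    elif p >= 0.25:
        offset = 2 ** -3
    else:
        offset = 2 ** -2   # large offset for low outputs
    return max(0.0, p - offset)
```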

Table 6 Power-law adjustment for system outputs as probabilities

Lastly, this study did not investigate the impact of noise and artifacts in the measurements on the real-time performance of our system. Although the training dataset contained missing data and erroneous measurements and system design employed safeguards against abnormal measurements, future studies using larger datasets and noisy measurements will be required to test system performance thoroughly and improve system robustness.

In summary, we developed and validated an algorithm and system to predict the probability of a trauma patient requiring an LSI. The system is composed of an MLP and rules for predicting the need for LSIs in both prehospital and emergency department trauma patients. The performance of our system demonstrates that ML technology combined with basic detection rules may provide valuable support in assessing trauma patients within the critical care environment. Future studies will expand on the described approach utilizing assigned prediction probabilities derived from this initial effort and include system validation in a clinical trial with both recorded LSIs and times of performance.