1 Introduction

Hemorrhage remains the leading cause of preventable mortality in trauma patients. For many of these patients, effective assessment begins with the accurate identification of injury type and severity so that appropriate treatment can be delivered in time. In this environment, delays in performing a life-saving intervention (LSI) may compromise a patient’s stability, complicate injuries, and increase morbidity or mortality. Although vital signs—such as a patient’s body temperature, heart rate, blood pressure, and respiratory rate—play significant roles in monitoring trauma patients and are generally used to assess patient condition, they alone may not reveal patient destabilization until late, and often irreversible, changes in state have taken place. The utility of available field vital signs depends heavily on concomitant interpretation by an expert provider, and vital signs are measured only intermittently during both the prehospital and hospital phases of care. Previous studies have shown that standard vital signs available from monitors are no better at determining true injury status and severity in trauma patients than a simple physical examination [9]. Nonetheless, measurement and interpretation of electronic vital signs have become routine during prehospital and hospital care. To achieve more accurate diagnostic capabilities, new approaches based on combinations of multiple vital signs, trends, and other information may be better suited for trauma diagnosis [5]. However, as the approaches required to improve diagnostic sensitivity become more complex, their use by providers may become increasingly difficult without adjuncts capable of deriving solutions from complicated mathematical formulas.

Because of the intricate relationships among vital signs, time, and other factors, developing new approaches that account for these items will also require advanced information systems and computer algorithms coupled with technologies capable of processing and fusing multiple parameters, weights, and trends. These algorithms can extract the maximum information content available from both single vital signs and combinations of multiple vital signs across time points. Machine learning (ML) is one technology that has been studied recently as a potential solution for the multivariate processing of vital signs required to accurately diagnose patient condition in the critical care environment [2, 10, 11, 14]. In addition, because such an environment requires rapid, accurate decisions within a short time frame, ML technology can potentially help identify possible interventions earlier. This approach uses information technology to mimic human decision making and provide an automated means of processing vital signs and other patient data, with the aim of rapidly predicting patient care needs. The use of ML technology may facilitate the triage of patients to appropriate trauma centers where experienced personnel can rapidly perform LSIs. By providing diagnostic support, computers and ML may fit well into a prehospital triage algorithm that focuses on LSIs as its optimal end point [3, 7]. Additionally, such technology may strengthen the ability to accurately triage trauma patients in the prehospital environment and hence improve the survival of patients who would otherwise have died.

The purpose of this study was to test and evaluate different types of artificial intelligence and ML methods for modeling injury severity (as defined by the need for an LSI during prehospital care and/or in the emergency department) in a set of retrospectively and prospectively collected trauma patients, based on data collected over 5 years, most recently from the Wireless Vital Signs Monitor (WVSM, Athena GTX, Des Moines, Iowa) trial. Methods were implemented as a software application module capable of real-time processing on a computer system with moderate performance specifications. A 90/10 cross-validation approach was used to evaluate the predictive capacity of the algorithm.

We hypothesized that because computers and ML can process large amounts of disparate data continuously, quickly, and accurately, they would not only benefit trauma diagnosis, especially in the context of prehospital triage, but also integrate well into an electronic system that may perform LSI predictions in real time.

A novelty of this study was the development and validation of an ML algorithm and hybrid system to predict the need for LSIs in trauma patients. While there have been numerous studies utilizing decision trees, conjunction rules, support vector machines, artificial neural networks, multilayer perceptrons, and logistic regression models to discriminate between different patient groups, to date, no study has investigated the possibility of predicting the need for LSIs in trauma patients in real time. The ultimate goal of this work was to address this shortcoming and provide physicians with a new tool for decision support.

2 Methods

This study was approved by the Institutional Review Boards of the US Army Institute of Surgical Research, Fort Sam Houston, TX, USA, and the University of Texas Health Science Center at Houston, Houston, TX, USA. We analyzed data from both the Trauma Vitals (TV) database and the WVSM to generate datasets for training and validating an artificial intelligence model, respectively.

2.1 TV database and protocol

Data in the TV database include severe trauma patients with blunt and penetrating injuries transported from the scene by helicopter service to a Level I trauma center in Houston, Texas, or San Antonio, Texas. Patients were monitored from the scene during transport using a Welch Allyn Propaq 206 (Welch Allyn, Skaneateles Falls, NY) monitor or Welch Allyn PIC 50 (Welch Allyn, Skaneateles Falls, NY) monitor. Propaq data were collected using a computerized personal digital assistant (PDA) attached to the monitor during transport. Data were stored in a nonvolatile memory card in the PDA for use during the study. All numeric Propaq data were stored at a rate of 1 Hz. Waveform data were recorded at a rate of 182 Hz. PIC 50 data were stored in a built-in flash memory card attached to the monitor. PIC 50 numeric data were stored at a rate of one measurement every 3 min, coinciding with the patient’s noninvasive blood pressure measurements. Waveform data were stored at a rate of 375 Hz. Data from the PDA and flash cards were extracted by research personnel and uploaded to the TV database for analysis. All nonelectronic data were manually recorded on the run sheet from the monitor’s screen by Emergency Medical Services medics, then collected on a standardized form, and entered into the TV database. These included demographic data, physical examination results, Glasgow coma scores, and interventions performed on the patients in the field. LSIs consisted of endotracheal intubations, transfusions, tube thoracostomies, cardiopulmonary resuscitations, needle decompressions, angio-embolizations, cricothyrotomies, thoracotomies, and cardioversions.

Data for the ML model were selected based on a generic population sample that would prevent the algorithm from training to only a subset of the general population (which would result in a nongeneralizable model). We selected over 30 h of data corresponding to 79 prehospital patient records from the TV database based upon three criteria necessary to maximize the input parameter values required to provide an optimal learning set to the ML engine. These were (1) availability of vital signs and patient status summary scores (Murphy Factor, Athena GTX, Des Moines, Iowa) with values from 0 to 5 for the patient on a second-by-second basis, (2) blood pressure measurements over a minimum of 15 min that also changed from initial measurements (had a baseline shift that yielded measurable slopes), and (3) heart rate measurements uncorrupted by electromechanical noise. Lengths of patient records varied from approximately 15 to 30 min. Records contained single episodes of data, sometimes missing one or more measurements from different vital signs over the episode’s duration. Standard vital signs used during trauma care for patient assessment included heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), and blood oxygenation (SpO2). Combinations of these vital signs were also used to derive other measurements, including shock index (SI = HR/SBP) and pulse pressure (PP = SBP − DBP). The Murphy Factor is a patient status summary alarm that provides the medic with decision support by combining all vital signs, trends, and pulse characteristics recorded by the monitor and applying a multivariate sensor fusion algorithm that generates a combined index of patient condition on a scale from 0 to 5. Results are interpreted as 0–1 in green, 2–3 in yellow, and 4–5 in red, indicating a patient in a normal, low-priority, or high-priority condition, respectively.
A baseline shift was defined as a dataset with at least one change from initial measurement over the data-recording time. This provided a learning dataset required for the ML algorithm to learn from changes in blood pressure during transport.
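As an illustration, the derived measurements and the Murphy Factor color bands described above can be sketched as follows (a minimal sketch of the stated definitions, not the monitor's proprietary fusion algorithm):

```python
def derived_vitals(hr, sbp, dbp):
    """Derive shock index and pulse pressure from standard vital signs."""
    si = hr / sbp   # shock index (bpm / mm Hg)
    pp = sbp - dbp  # pulse pressure (mm Hg)
    return si, pp

def murphy_color(score):
    """Map a Murphy Factor (0-5) to its display color band."""
    if score <= 1:
        return "green"   # normal condition
    if score <= 3:
        return "yellow"  # low-priority condition
    return "red"         # high-priority condition
```

For example, a patient with HR 100 bpm and SBP 125 mm Hg has a shock index of 0.8, at the upper edge of the normal range.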

2.2 WVSM database and protocol

Because a limitation of the initial TV cohort was that times of actual LSIs were not recorded and stored in the TV database, an additional 82 h of data corresponding to 24 prehospital patient records were chosen based upon the availability of recorded LSI times from the WVSM protocol. From June 27, 2011, to January 6, 2012, 305 consecutive patients transported from the injury scene via the Life Flight helicopter service to Memorial Hermann Hospital, a Level I trauma center in Houston, Texas, were enrolled in this protocol. This included data captured from 104 patients wearing the WVSM system during transport to the Houston Level I trauma center.

Data in the WVSM database include severe trauma patients with blunt and penetrating injuries transported from the scene by helicopter service to a Level I trauma center in Houston, Texas. WVSM data were collected using a computerized server system that collected and stored all transport data from the WVSM device through a wireless connection once a patient arrived in the emergency department. Numeric data from the WVSM device were stored at a rate of 1 Hz. In addition, ECG waveform data from a single lead and pleth waveform data from a thumb-mounted pulse oximeter connected to the WVSM were recorded at rates of 230 and 75 Hz, respectively. For trauma patients with concomitant lung injuries, respiration waveform data were also recorded at a rate of 10 Hz. Standard vital signs used during trauma care for patient assessment included the same vital signs recorded in the TV database (HR, SBP, DBP, MAP, RR, SpO2, SI, and PP). All nonelectronic data were manually recorded on an electronic run sheet (Tablet PCR, Zoll Medical, Chelmsford, MA, USA) by Emergency Medical Services medics, then collected on a standardized form, and entered into the WVSM database (OpenClinica). These included demographic data, physical examination results, Glasgow coma scores, and interventions performed on the patients in the field. LSIs consisted of endotracheal intubations, transfusions, tube thoracostomies, cardiopulmonary resuscitations, needle decompressions, angio-embolizations, cricothyrotomies, thoracotomies, and cardioversions. Patients for analysis were selected based upon two criteria: (1) direct transport of the patient from the injury scene to the hospital and (2) an injury requiring hospital admission. Of these 104 patients, 32 received at least one LSI, while only 24 patients had both recorded LSIs and corresponding LSI predictions. Actual LSIs were recorded only when the nurse/paramedic manually pressed a button on the WVSM data-capture-and-display interface.
Only the start of each LSI was recorded. These 24 patients provided a validation set for this project. Lengths of these records varied from approximately 3–4 h. Records contained single episodes of data, sometimes missing one or more measurements from different vital signs over the episode’s duration.

2.3 Design, validation, and analysis

Design of a hybrid system for LSI predictions employed two components: (1) a simple rule-based algorithm that would serve as a front end to handle obvious cases involving measurements that clearly indicated the need for some interventions and (2) an ML algorithm that would serve as an intelligent component to handle more obscure and complex cases involving measurements unrecognized by the front end (see Fig. 1). This configuration was a novel design proposed from our combined knowledge of ML and medicine. If a patient’s vital signs were clearly abnormal according to a set of basic rules, the patient would be classified as needing an LSI (see rules below). Rules were based on the analysis of data from the TV database. “Normal” measurements included those within the 95 % confidence interval of the database. On the other hand, if vital signs were not obviously abnormal, the data would be passed to the ML algorithm. The basic detection rules were meant to filter out patients who required immediate attention. The order of the rules also reflected the relative importance of measurements for discriminating patient instability and their potential for affecting system performance.
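The two-component dispatch logic can be sketched as below. The rule predicates and the MLP callable are illustrative stand-ins; the 0.9 output for rule-triggered cases reflects the arbitrary 90 % value noted in the Results:

```python
def predict_lsi(features, rules, mlp):
    """Hybrid LSI prediction: rule-based front end, ML back end.

    `rules` is a list of predicates over a feature dict.  If any rule
    fires, the measurements are treated as clearly abnormal and a high
    probability is returned directly; otherwise the case is deferred to
    the trained multilayer perceptron (`mlp`).
    """
    for rule in rules:
        if rule(features):
            return 0.9  # clearly abnormal: strong need for an LSI
    return mlp(features)  # obscure/complex case: use the ML model
```

For example, a hypothetical rule such as `lambda f: f.get("SpO2", 100) < 85` would route a severely hypoxic patient past the ML stage entirely.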

Fig. 1

A hybrid system for predicting the need for life-saving interventions in trauma patients. To predict the need for life-saving interventions, a hybrid system could employ the following components: (1) a component that extracts features from the measurements of various vital signs, (2) a simple rule-based algorithm that handles obvious cases (features) involving measurements that clearly indicated the need for some interventions, and (3) a machine learning algorithm (multilayer perceptron) that handles more obscure and complex cases (features) involving measurements unrecognized by the rule-based algorithm. If a patient’s vital signs were clearly abnormal according to a set of basic rules, the patient would be classified as needing a life-saving intervention. Standard vital signs used during trauma care for patient assessment often include heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), and blood oxygenation (SpO2). Combinations of these vital signs are also used to derive other measurements including shock index (SI = HR/SBP) and pulse pressure (PP = SBP − DBP)

In addition, distributions of initial nonzero BP measurements and mean non-BP measurements across all patient records in the TV database were used to formulate detection rules. First, identification of tail regions of BP-related distributions was combined with knowledge of expected BP ranges (normal SBP 90–120 mm Hg; normal DBP 60–80 mm Hg; normal MAP: DBP + (SBP − DBP)/3 mm Hg) to derive lower-bound and upper-bound thresholds. Similarly, identification of tail regions of non-BP-related distributions was combined with knowledge of expected ranges (normal HR 60–100 bpm; normal RR 12–20 breaths per minute; normal SpO2 94–100 % at sea level; normal SI 0.5–0.7 bpm/mm Hg) to derive thresholds. Decision tables, decision trees, and/or conjunction rules were then employed to tailor rules.
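A minimal sketch of threshold checks against the textbook ranges listed above. The published rules were further tailored using database tail regions, decision tables, and decision trees, which are not reproduced here:

```python
# Textbook "normal" ranges for adult vital signs, as listed in the text.
NORMAL_RANGES = {
    "SBP": (90, 120),    # mm Hg
    "DBP": (60, 80),     # mm Hg
    "HR": (60, 100),     # bpm
    "RR": (12, 20),      # breaths per minute
    "SpO2": (94, 100),   # % at sea level
    "SI": (0.5, 0.7),    # bpm / mm Hg
}

def outside_normal(vitals):
    """Return the names of measured vitals falling outside textbook ranges.

    `vitals` is a dict of available measurements; missing vitals are skipped.
    """
    flagged = []
    for name, (lo, hi) in NORMAL_RANGES.items():
        value = vitals.get(name)
        if value is not None and not (lo <= value <= hi):
            flagged.append(name)
    return flagged
```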

Rates of change (slopes) and mean and maximum measurements were used to train an ML algorithm so that it could respond quickly to measured trends and disparities in a patient’s vital signs. We used linear regression to calculate slope values, ignoring values equal to zero, to estimate the rate of change of numeric values across time. A sliding window of 180 s was used to calculate slopes for all non-BP-related vital signs. Because BP-related measurements were recorded every 3 min, a sliding window of 540 s was used to calculate slopes for these measurements. Data were configured for input into an ML modeler (WEKA, University of Waikato, New Zealand) to generate the ML model [8].
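The sliding-window slope calculation can be sketched as follows, assuming second-stamped samples with zeros marking missing measurements (an assumption consistent with the zero-ignoring rule above):

```python
def window_slope(timestamps, values, window_s):
    """Least-squares slope over the trailing `window_s` seconds,
    ignoring zero-valued (missing) samples."""
    t_end = timestamps[-1]
    pairs = [(t, v) for t, v in zip(timestamps, values)
             if v != 0 and t_end - t <= window_s]
    if len(pairs) < 2:
        return 0.0  # not enough valid samples to fit a line
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in pairs)
    den = sum((t - mean_t) ** 2 for t, _ in pairs)
    return num / den if den else 0.0
```

Per the protocol above, a 180 s window would be passed for non-BP vitals and a 540 s window for BP-related vitals.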

The main criteria for training the ML algorithm using the TV database were a strong correlation between inputs and outputs, preferably with a correlation coefficient greater than 70 %, and a low mean absolute error, with values less than 30 %. From a system perspective, the ML algorithm needed to produce smooth, continuous outputs (probabilities) between 0 and 1, indicating the need for an LSI. Because these outputs could not be binomial, nor could they jump sporadically up and down with discontinuities, data corresponding to patients with LSIs were not separated from data corresponding to patients without LSIs. Visual assessment of the outputs was required to evaluate system performance.

To design our ML algorithm, we used a tenfold cross-validation approach [1, 6, 13]. Because standardizing the inputs improves the numerical condition of the data for training, we preprocessed the data before training the classifier by replacing all unknown or missing features for each given patient record in the dataset with zeros and normalizing all other features so that they fall within the range −1 to 1. In particular, we used a maximum–minimum normalization rule as follows:

$$\bar{x}_{i,j} = \frac{x_{i,j} - \frac{1}{2}\left( x_{i,j_{\max}} + x_{i,j_{\min}} \right)}{\frac{1}{2}\left( x_{i,j_{\max}} - x_{i,j_{\min}} \right)}$$

∀ i = 1, …, N, ∀ j = 1, …, M and j ∈ M_i, where x_{i,j} denotes the jth feature value of feature set i, x_{i,j_max} and x_{i,j_min} denote the maximum and minimum values of feature j over the training data, N denotes the number of instances (feature sets) in the training data, M denotes the number of features in an instance, and M_i denotes the set of known features in feature set i. In addition, we replicated each feature set, replacing unknown or missing feature values for each set with values averaged over all values in the training data and then normalizing all feature values using the rule above. This second set was used only for providing a confidence interval for outputs from the first dataset, not for model training.
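The preprocessing rule can be sketched as follows, with `None` marking unknown features and the per-feature extrema taken over the training data (a simplifying reading of the notation above):

```python
def normalize_features(instances):
    """Max-min normalization to [-1, 1], with missing features set to 0.

    `instances` is a list of equal-length feature lists; None marks a
    missing value.
    """
    n_feats = len(instances[0])
    # Known values for each feature column, used to find extrema.
    cols = [[row[j] for row in instances if row[j] is not None]
            for j in range(n_feats)]
    normalized = []
    for row in instances:
        out = []
        for j, x in enumerate(row):
            if x is None or not cols[j]:
                out.append(0.0)  # unknown feature -> 0
            else:
                lo, hi = min(cols[j]), max(cols[j])
                mid, half = (hi + lo) / 2.0, (hi - lo) / 2.0
                out.append((x - mid) / half if half else 0.0)
        normalized.append(out)
    return normalized
```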

The following features formed a feature set of our training data for designing the multilayer perceptron (MLP): slope of SBP, current SBP, slope of DBP, current DBP, slope of MAP, current MAP, slope of SpO2, mean SpO2, slope of RR, mean RR, slope of HR, mean HR, slope of inverted SI, mean inverted SI, slope of PP, current PP, maximum SBP, maximum DBP, maximum MAP, maximum SpO2, maximum RR, maximum HR, maximum inverted SI, and maximum PP. Moreover, classifications were obtained by remapping Murphy scores to a scale between 0 and 1, i.e., a nominal probability. The final training data consisted of over 110,000 feature sets. Thus, these training data covered more than 30 h of data, ranging across different physiologic, temporal, and spatial conditions.
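The text states only that Murphy scores were remapped to a nominal probability; a linear scaling, as sketched below, is an assumption:

```python
def murphy_to_probability(score, max_score=5):
    """Linearly remap a Murphy Factor (0-5) onto [0, 1].

    A linear remapping is assumed here; the published work states only
    that scores were remapped to a nominal probability.
    """
    return min(max(score, 0), max_score) / max_score
```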

Validation involved the WVSM protocol and its patient records and was accomplished by determining the output of our hybrid system at the time of the recorded LSI and the maximum output of our system 60 s, 3 min, and 5 min prior to the recorded LSI. In other words, the observation window ended at the time of the recorded LSI. Further validation was done by sampling outputs during the first 5 min of each patient record and 5 min prior to the start of each LSI. An initial analysis classified prediction outputs (probabilities) >30 % as true positives (TPs) and otherwise as false negatives (FNs). Similarly, a second analysis classified outputs >50 % as TPs and otherwise as FNs.
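The windowed classification step can be sketched as follows, assuming the maximum system output within the pre-LSI observation window decides the outcome:

```python
def classify_window(probabilities, threshold):
    """Classify a pre-LSI observation window as TP or FN.

    `probabilities` are the hybrid system's outputs sampled in the
    window ending at the recorded LSI; the peak output is compared
    against the analysis threshold (0.30 or 0.50 in the text).
    """
    peak = max(probabilities)
    return "TP" if peak > threshold else "FN"
```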

3 Results

3.1 Model development

The demographics of the 79 patients included in this study are depicted in Table 1; likewise, the demographics of the WVSM patients are shown in Table 2. Quartiles were established for age. Race and age were not different between those patients who received at least one LSI and those who received none, nor did male gender predispose to an LSI. Likewise, increasing patient age did not increase the frequency of an LSI in this sample. Of the 79 patients, 24 (30 %) did not require an LSI. The other 55 patients received a total of 124 LSIs. Thirty-nine percent (48) of the LSIs were performed prehospital, 60 % (74) in the emergency room, and 1 % (2) elsewhere. Interventions consisted of the following: 42 endotracheal intubations, 42 transfusions, 18 tube thoracostomies, eight cardiopulmonary resuscitations, five needle decompressions, five angio-embolizations, two cricothyrotomies, two thoracotomies, and one cardioversion. Table 3 describes the hybrid system’s front-end component, that is, the basic detection rules that were used to identify patients who required immediate interventions. As a note, the value of 0.9 (90 %) in the table was arbitrary and was only used to indicate that abnormal measurements should draw the provider’s attention to a strong need for an LSI.

Table 1 Demographics of selected patients from the Trauma Vitals database
Table 2 Demographics of selected patients from the Wireless Vital Signs Monitor protocol
Table 3 Basic detection rules

We trained and compared several ML models, including decision trees, conjunction rules, support vector machines, artificial neural networks, multilayer perceptrons, and logistic regression models. Models were generated for the 110,000+ feature sets using WEKA and binary and continuous classes. In order to develop a real-time hybrid system to predict the need for LSIs, i.e., output a continuous probability, we limited comparisons to artificial neural networks, multilayer perceptrons, and logistic regression models and sought models that yielded the highest correlation and lowest error. Comparisons of top cross-validation results are shown in Table 4.

Table 4 Comparisons of cross-validation results for various machine learning models

Comparisons among the proposed models showed that a multilayer perceptron (MLP) would best implement the ML algorithm in the novel hybrid LSI prediction system. This ML model consisted of 24 inputs, 12 hidden nodes that each contained a set of 24 optimized weights, and one output that contained a set of 12 optimized weights. The back-propagation algorithm (learning rate 0.05, momentum 0.2) was used to train the MLP (as well as all other algorithms in Table 4). The activation function employed by the MLP was the sigmoid function. Given 111,028 feature sets, the WEKA tool took approximately 10.2 h (36,861.2 s) to generate weights for the MLP nodes. In Table 4, the high correlation coefficient of 0.8072 indicates that the predicted probabilities of the MLP matched the desired probabilities reasonably well. In addition, the MLP model achieved a relatively low mean absolute error of 0.1612, which equals the sum of the absolute differences between each desired probability and its predicted probability divided by the total number of instances during cross-validation.
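The reported 24-12-1 architecture corresponds to a forward pass like the one sketched below; the trained weight values, and WEKA's exact bias and threshold conventions, are assumptions here:

```python
import math

def sigmoid(z):
    """Sigmoid activation, as used by the reported MLP."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a 24-input, 12-hidden-node, single-output
    sigmoid MLP.

    `hidden_w` is 12 lists of 24 weights; `out_w` is a list of 12
    weights.  Bias terms are included as an assumption (the source does
    not state how biases were handled).
    """
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * hi for w, hi in zip(out_w, h)) + out_b)
```

The single sigmoid output is what allows the system to emit a smooth, continuous probability between 0 and 1 rather than a binary label.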

Our ML algorithm system was able to generate outputs commensurate with baseline changes in the patients’ vital signs in real time. Through a graphical interface, we analyzed these results, and for selected records, we plotted predicted probabilities against patient features in order to assess the influence of every feature on the prediction model. We illustrate our analyses by showing an example in Fig. 2.

Fig. 2

Plots of prenormalized features and predictions versus time for a trauma patient record. Standard vital signs used during trauma care for patient assessment included heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate (RR), and blood oxygenation (SpO2). Combinations of these vital signs were also used to derive other measurements including shock index (SI = HR/SBP) and pulse pressure (PP = SBP − DBP). The following features were extracted for a hybrid system in order to predict the need for life-saving intervention (LSI as a probability): slope of SBP, current SBP, slope of DBP, current DBP, slope of MAP, current MAP, slope of SpO2, mean SpO2, slope of RR, mean RR, slope of HR, mean HR, slope of inverted SI, mean inverted SI, slope of PP, current PP, maximum SBP, maximum DBP, maximum MAP, maximum SpO2, maximum RR, maximum HR, maximum inverted SI, and maximum PP. Linear regression was used to calculate slope values, ignoring those values equal to zero to derive an estimate of the rate of change for numeric values across time. A sliding window of 180 s was used to calculate slopes for all non-BP-related vital signs. Because BP-related measurements were recorded every 3 min, a sliding window of 540 s was used to calculate slopes for these measurements. For this particular patient, the hybrid system described in this paper yielded appropriate outputs corresponding to the input feature set. The region where the solid black line remained at 90 was a result of the rule-based algorithm of the hybrid system detecting BP-related measurements outside of “normal” range values

When all vital signs were available, RR and SpO2 were most discriminative in detecting patient instability and affecting system performance. These results agreed with the fact that measurements outside of the 95 % confidence interval of measurement distributions (from the TV database) would immediately trigger the basic detection rules of the hybrid system. When RR and SpO2 measurements were missing from the input set, BP-related vital signs (SBP, DBP, MAP, and PP) were most discriminative in detecting patient instability (see Fig. 2).

3.2 Validation

To validate the model, we employed an additional set of data derived from 305 patients, of whom 37.7 % required an LSI (Table 2). Of the 199 LSIs, 90 (45 %) were performed prehospital and 109 (55 %) in the emergency department.

There were 295,994 feature sets from 82 h of real-world patient data to validate the hybrid classification system. Table 5 shows confusion matrices for the initial analysis, as described in the “Design, validation, and analysis” section. Importantly, the system was able to obtain a sensitivity of 89.8 % within 5 min of recorded LSIs when a probability >30 % was denoted as a TP. Moreover, the system achieved a positive predictive accuracy of 96.4 % for observation windows described in the previous section.
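For reference, the reported figures follow the standard definitions over the Table 5 counts; the actual counts are not reproduced here, so the sketch below shows only the formulas:

```python
def sensitivity(tp, fn):
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def positive_predictive_value(tp, fp):
    """Positive predictive accuracy: TP / (TP + FP)."""
    return tp / (tp + fp)
```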

Table 5 Confusion matrices for the performance of the hybrid system

4 Discussion

Although the application of ML algorithms to datasets began over 50 years ago and now has roots in multiple disciplines [4, 12], only recently has this technology been introduced to trauma research. Furthermore, ML technology has rarely been applied to trauma diagnosis, decision support, or clinical practice for the trauma patient. This study was designed to advance trauma patient care through the development and validation of an ML algorithm and hybrid system to predict the need for LSIs in trauma patients. In previous work, only ML and new vital signs were explored for their utility in discriminating between LSI and non-LSI patients [2]; neither standard vital signs nor trends were used for identifying LSI patients. Likewise, numerous studies utilizing various ML techniques to discriminate between different patient groups have been conducted. However, to date, no study has investigated the possibility of predicting the need for LSIs in trauma patients in real time using ML and other information.

By producing over 110,000 feature sets from various vital sign measurements of a select cohort of trauma patients, we intended to capture the synergistic complexities among vital signs, derived statistics, time, and spatial/environmental factors—complexities that may not be understood by the health practitioner in an emergency situation. Since feature sets retrospectively scored patients on a per-second basis, time became an integral part of real-time system design. Furthermore, unlike previous work [2, 10, 11, 14], system design involved not only the development of an ML model but also formulation of basic detection rules.

In addition, we chose to develop a real-time ML algorithm system that incorporates an MLP based upon its ability to handle complex datasets and perform well on nonlinear data, including data with missing values. Moreover, MLPs yield numerical outputs interpretable as probabilities, allow easy real-time implementation in software, and learn through conventional techniques (such as the back-propagation algorithm). An MLP has a major strength over a single-layer perceptron in that it uses one or more hidden layers of nodes and transforms every weighted sum using a nonlinear function before making any threshold comparisons. Hence, MLPs not only distinguish, if possible, the instances of classes in some feature space, but also join isolated convex regions into a single class [13].

For this study, the term “prediction” denoted the probability that a patient needs an LSI at a particular time. While ML may help predict whether a patient should receive an LSI, the accuracy of that prediction and its confidence interval depend upon the availability of measurements and their buffered histories. In other words, we expected that the longer our system could buffer measurements and calculate features, the more reliable our system would perform. As such, initial predictions would only make sense with respect to their place in time, and confidence intervals would only improve as time goes on. As a part of the design process, we explored the types and numbers of features that would best assist ML. To add robustness to system design and validation, datasets included feature sets that contained missing vital sign measurements.

Interpretation of outputs during this study influenced the use and performance of the hybrid prediction system, as did the selection of outputs within a given time frame. When a probability >30 % was denoted as a TP, the system obtained a sensitivity of 89.8 % within 5 min of recorded LSIs. As this selection time frame was narrowed, the hybrid system achieved lower sensitivity. On the other hand, as the selection time frame increased, the simple rule-based algorithm played a greater role in indicating patient destabilization. When a probability >50 % was denoted as a TP, the system obtained a sensitivity of 69.5 % within 5 min of recorded LSIs.

4.1 Limitations

This study had a number of limitations. The sizes of the training and validation datasets were small, i.e., they contained fewer than 120 h of data from fewer than 110 patients in total. Moreover, the results were preliminary due to the dataset sizes, the criteria for selecting the data, the training dataset used to design our ML algorithm, and the fact that the absence of an LSI does not equate to the absence of a need for one. Therefore, we tended to err on the side that certain measurements may indicate the possible need for an intervention rather than indicate that an LSI is not required. This is the basic concept of overtriage, a central tenet of trauma care. To trade off the requirement of low outputs (<10 %) that indicated stable measurements against the requirement of high outputs (>90 %) that indicated patient needs, we accepted that the system would pull extreme outputs toward the center, and we chose, instead, to add an offset to ML outputs according to a power-law adjustment (see Table 6). In other words, outputs closer to 100 % had a smaller power of 2 subtracted, whereas outputs closer to 0 % had a larger power of 2 subtracted. This adjustment compensated for the bias in the training dataset so that the real-time ML system could yield a wide range of values, including very small (e.g., 0 %) and very large (e.g., 100 %) predictions.
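A purely illustrative sketch of such a power-of-two offset follows; the band boundaries and exponents below are hypothetical, since the actual Table 6 values are not reproduced in the text:

```python
def powerlaw_adjust(p):
    """Illustrative power-of-two offset adjustment (hypothetical bands).

    The idea, per the text: outputs near 0 have a larger power of 2
    subtracted (pushing them toward 0 %), and outputs near 1 a smaller
    one (leaving them near 100 %).  Exponents here are made up.
    """
    if p >= 0.75:
        offset = 2 ** -6   # small offset for high outputs
    elif p >= 0.5:
        offset = 2 ** -4
    elif p >= 0.25:
        offset = 2 ** -3
    else:
        offset = 2 ** -2   # large offset for low outputs
    return max(0.0, p - offset)
```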

Table 6 Power-law adjustment for system outputs as probabilities

Lastly, this study did not investigate the impact of noise and artifacts in the measurements on the real-time performance of our system. Although the training dataset contained missing data and erroneous measurements and system design employed safeguards against abnormal measurements, future studies using larger datasets and noisy measurements will be required to test system performance thoroughly and improve system robustness.

In summary, we developed and validated an algorithm and system to predict the probability of a trauma patient requiring an LSI. The system is composed of an MLP and rules for predicting the need for LSIs in both prehospital and emergency department trauma patients. The performance of our system demonstrates that ML technology combined with basic detection rules may provide valuable support in assessing trauma patients within the critical care environment. Future studies will expand on the described approach utilizing assigned prediction probabilities derived from this initial effort and include system validation in a clinical trial with both recorded LSIs and times of performance.