
1 Introduction

Preterm infants cannot reliably regulate their oxygen levels. Hypoxia or hyperoxia may lead to injury, in particular of the brain, and consequently to long-term disabilities. Therefore, oxygen saturation, the most commonly monitored proxy for oxygenation, and other vital parameters are continuously measured in neonatal intensive care units (NICUs). If the oxygen saturation leaves the set range, an alarm sounds and the staff react by adjusting the oxygen supply or stimulating the infant to breathe. However, 87.5% of the alarms have been shown to be false alarms, mostly caused by movements of the infants [1]. False alarms increase stress for patients, parents, and staff. In particular, if an infant triggers many false alarms and a truly critical situation then occurs, the response time of the caregivers may be increased. This phenomenon is known as ‘alarm fatigue’ [2,3,4] or the ‘crying-wolf syndrome’ [5]. Alarm fatigue is a real safety concern and may harm patients [2,3,4]. One option to address this problem is to use technology to suppress false alarms, for example fuzzy logic, a multivalued logic for classifying mathematical objects with blurred boundaries. To caregivers, the vital signs are signals with fuzzy boundaries. For example, an SpO2 value slightly below its 87% threshold may mean that the patient is still doing fine and the alarm can be dismissed as false. With a fuzzy-logic-based system [1], suppression of 99.4% of all false alarms was demonstrated. Unfortunately, neither fuzzy logic nor machine learning algorithms (MLAs) have been implemented in NICU monitoring devices, owing to insufficient accuracy and legal issues [6, 7]. Nevertheless, MLAs are a promising modern approach that may solve the problem of false alarms in NICUs.
The aim of our project, therefore, was to test how efficiently MLAs are able to reduce the rates of false alarms by intelligently analyzing data from standard physiological monitoring and from cerebral oximetry.

2 Methods

Data of 25 preterm infants were recorded in the NICU of the University Hospital Zurich (Switzerland) under a declaration of ‘no objection’ by the Ethics Committee of Zurich (KEK-ZH, Req-2016-00720). Parental consent was obtained prior to enrolment. Data of 11 participants were excluded because of missing actionable alarms or events labeled by the caregivers as real alarms, or because of poor signal quality. The demographic data of the remaining 14 subjects are shown in Table 1.

Table 1 Information on the 14 participants whose monitoring data were included

The standard monitoring device (SMD), the Infinity Delta XL (Dräger, Germany), provided the arterial oxygen saturation (SpO2) and the heart rate (HR). The data were streamed via the serial port for a mean duration of 55 min at a time resolution of 0.5 s. Simultaneously, our in-house-built near-infrared spectroscopy (NIRS) oximeter (OxyPrem) [8] measured the cerebral oxygen saturation (StO2) over the parietal cortex.

An additional research SpO2 sensor [9], on the opposite fronto-parietal side, measured HR and SpO2. The alarm limits of the SMD were set to: SpO2 < 87%, SpO2 > 95% (only for assisted ventilation or additional O2 supply), HR < 80 bpm and HR > 250 bpm. These limits were frequently crossed, which triggered many alarms. The staff labeled these alarms as true (C1) or false (C0) according to their professional judgment (an example is shown in Fig. 1 and Table 2).
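The alarm limits listed above can be written as a simple threshold check. The following is a minimal sketch in Python, not the logic of the monitoring device itself; the function name `violated_limits` and the flag `oxygen_support` are hypothetical names introduced only for illustration.

```python
# Alarm limits as stated in the text.
SPO2_LOW = 87.0    # %, lower SpO2 limit
SPO2_HIGH = 95.0   # %, upper SpO2 limit (only with assisted ventilation or extra O2)
HR_LOW = 80.0      # bpm
HR_HIGH = 250.0    # bpm


def violated_limits(spo2, hr, oxygen_support=False):
    """Return the names of all alarm limits crossed by one (SpO2, HR) sample.

    `oxygen_support` is a hypothetical flag: True if the infant receives
    assisted ventilation or additional O2, which activates the upper limit.
    """
    alarms = []
    if spo2 < SPO2_LOW:
        alarms.append("SpO2 low")
    if oxygen_support and spo2 > SPO2_HIGH:
        alarms.append("SpO2 high")
    if hr < HR_LOW:
        alarms.append("HR low")
    if hr > HR_HIGH:
        alarms.append("HR high")
    return alarms
```

Such a check only flags limit crossings; whether a crossing is a true (C1) or false (C0) alarm is exactly what the staff labels and the classifiers are meant to decide.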

Fig. 1

Example of a recording showing one alarm labeled by the staff and logged (see Table 2)

Table 2 Example of real-time transcriptions in the log-book (corresponding to Fig. 1)

The pre-processing of the 14 datasets was performed in two steps: (i) the research SpO2 sensor caused interference in the StO2 (NIRS) signal, and hence StO2 was filtered to reduce this noise (Fig. 2); (ii) the data had different sampling rates and were resampled to a common rate of 2 Hz.

Fig. 2

Original StO2 (green), affected by the interference from the research SpO2 sensor. The StO2 signal in blue was filtered with a 30-point (15 s) moving average filter
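The two pre-processing steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the original MATLAB code: the text does not say whether the moving average was centered or trailing (a trailing window is assumed here), nor which interpolation was used for resampling (linear interpolation is assumed). The function names are hypothetical.

```python
def moving_average(samples, window=30):
    """Trailing moving average (assumed); early samples use a shorter window.

    At 2 Hz, window=30 corresponds to the 15 s filter mentioned in Fig. 2.
    """
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(samples):
        running_sum += value
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= samples[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


def resample_linear(times, values, rate_hz=2.0):
    """Linearly interpolate (times, values) onto a uniform grid at rate_hz.

    Assumes at least two samples and strictly increasing `times` in seconds.
    """
    step = 1.0 / rate_hz
    n_out = int((times[-1] - times[0]) / step + 1e-9) + 1
    out, j = [], 0
    for k in range(n_out):
        t = times[0] + k * step
        # Advance to the input interval [times[j], times[j + 1]] containing t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out
```

For example, a signal sampled once per second at times 0, 1 and 2 s is mapped onto five samples at 0, 0.5, 1, 1.5 and 2 s.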

The HR and SpO2 of the SMD and the StO2 and the oxy- and deoxy-hemoglobin concentrations ([O2Hb] and [HHb]) of the OxyPrem were directly employed as MLA features. Three additional features, representing the area between the StO2, SpO2 and HR signals and their respective alarm limits, were calculated. The rationale is that the severity of a deviation depends on both its duration and its depth, which also reflects the reasoning of the caregivers. In addition, the last feature corresponds to the phase difference between [O2Hb] and [HHb].
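The area feature combines the depth and the duration of a deviation in a single number. A minimal sketch, assuming a rectangle-rule integration at the 0.5 s sample spacing and a hypothetical function name `area_beyond_limit`, is given below; the text does not specify the integration rule or the time window over which the area was accumulated.

```python
def area_beyond_limit(samples, limit, dt=0.5, below=True):
    """Area between a signal and its alarm limit (units: signal unit * s).

    Only the part of the signal beyond the limit contributes. With
    below=True, the excursion below the limit is integrated (e.g. SpO2 < 87%);
    with below=False, the excursion above it (e.g. HR > 250 bpm).
    """
    if below:
        excess = (max(0.0, limit - x) for x in samples)
    else:
        excess = (max(0.0, x - limit) for x in samples)
    return sum(excess) * dt
```

For instance, the SpO2 samples 90, 86, 84, 85 and 88% lie 0, 1, 3, 2 and 0 percentage points below the 87% limit, giving an area of 6 × 0.5 s = 3.0 %·s. A short, shallow dip thus yields a small value, whereas a long or deep desaturation yields a large one.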

To reduce the classifier’s complexity and improve the accuracy of the classification, the most relevant features were automatically identified in a feature selection step. First, Fisher’s discriminant ratio [10] reduced the set from 10 to 6 features. Sequential Backward Selection [11] then decreased the set further to 4 features: Area SpO2, HR, SpO2 and StO2. The feature selection was carried out at the individual subject level, and in two subjects solely HR and SpO2 were selected. The resulting reduced feature sets were used to train and test four supervised machine learning algorithms: (i) decision tree (DT), (ii) 5-nearest neighbors (5-NN), (iii) naïve Bayes (NB) and (iv) support vector machine (SVM). Each was implemented with the MATLAB Statistics and Machine Learning Toolbox. The learning phase was repeated 10 times, and the classifier was tested each time on different data. To avoid overfitting [12], a ten-fold cross-validation procedure was employed, i.e. in a rotational manner, nine subsets were used as the training set and the remaining one as the test set.
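The first selection step and the rotation of the cross-validation folds can be illustrated as follows. This is a sketch in Python rather than the MATLAB toolbox used in the study. It assumes the common two-class form of Fisher’s discriminant ratio, (μ1 − μ0)² / (σ0² + σ1²), and a simple interleaved assignment of samples to folds; neither detail is specified in the text, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(feature_values, labels):
    """Fisher's discriminant ratio of one feature for classes 0 and 1.

    Assumes both classes are present and at least one has non-zero variance.
    """
    class0 = [v for v, y in zip(feature_values, labels) if y == 0]
    class1 = [v for v, y in zip(feature_values, labels) if y == 1]
    spread = pvariance(class0) + pvariance(class1)
    return (mean(class1) - mean(class0)) ** 2 / spread


def rank_features(columns, labels):
    """Return feature names sorted from most to least discriminative."""
    scores = {name: fisher_ratio(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]
```

Keeping the highest-ranked features from `rank_features` corresponds to the first reduction step; Sequential Backward Selection would then repeatedly drop the feature whose removal harms the cross-validated performance least.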

3 Results

During a mean measurement time of 55 min in 14 subjects, 27 true and 578 false alarms occurred. The real alarms occurred at a median rate of 1.5 per measurement. The most important features to discriminate the type of alarm in the classification phase were the area below the SpO2 threshold, HR, SpO2 and StO2, in this order. To evaluate classification performance, the accuracy (percentage of correctly classified instances over the total number of instances), sensitivity (percentage of real alarms that were correctly classified as real alarms), and specificity (percentage of false alarms that were correctly classified as false alarms) are shown in Fig. 3 and Table 3. These values are averages over the 10 iterations. It is noteworthy that the cerebral StO2 improved the classification accuracy.
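In terms of confusion-matrix counts, with real alarms as the positive class, the three performance measures follow the standard definitions sketched below. The counts in the usage note are invented for illustration and are not results of this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: real alarms classified as real    fn: real alarms classified as false
    tn: false alarms classified as false  fp: false alarms classified as real
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of real alarms detected
        "specificity": tn / (tn + fp),  # fraction of false alarms suppressed
    }
```

With the illustrative counts tp = 8, fn = 2, tn = 90 and fp = 0, the accuracy is 0.98 although the sensitivity is only 0.8. With strongly imbalanced classes, a high accuracy therefore does not guarantee that real alarms are detected.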

Fig. 3

Mean accuracy, mean sensitivity and mean specificity for NB, 5-NN, SVM and DT classifiers. DT has the highest accuracy and sensitivity of 98.67% and 87.52%, respectively

Table 3 Mean (standard deviation) performance of the four classifiers

4 Discussion, Conclusions and Outlook

The achieved specificity of >99% and sensitivity of 87.52% are comparable to the results reported in [13,14,15] for (N)ICUs, which were obtained either with MLAs or with other approaches, e.g. multimodal descriptive statistics or Fourier and Hilbert transforms to categorize the alarms. It has to be kept in mind that MLAs generally require a large amount of data to be trained effectively. In our case, the amount of data was certainly not optimal: the dataset was small and, given the imbalance between true and false alarms, contained too few real alarms. Taking this into consideration, impressive accuracy, specificity and sensitivity were nevertheless achieved. Of course, for a clinical application the highest sensitivity of 87.52%, achieved by the DT MLA, would not be sufficient. Since missing a real alarm could be detrimental for patients, 100% detection of the real alarms is needed. To obtain a higher sensitivity in the future, we will include larger datasets and use strategies to handle the above-mentioned imbalance. We conclude that all four tested MLAs yield promising results, and we are convinced that 100% sensitivity can be achieved by increased training. This study shows that false alarms can indeed be eliminated with modern MLAs and that, once the legal problems are solved, MLAs can rapidly be implemented in the standard care of the NICU. This will lead to better working conditions for the staff and a higher quality of care, as well as less stress for the neonates and their parents. Ultimately, this will reduce brain injury and long-term disabilities.