Introduction

Risk stratification is an important element of emergency department (ED) care, as it informs assessments of patient acuity, resource allocation, and clinical decision-making. Chest pain patients present unique challenges for risk stratification because of the current ambiguity in diagnosing acute coronary syndrome (ACS). A high volume of ED visits each year is due to chest pain [1], but only approximately 9% of these visits are due to ACS [2]. Missing the diagnosis of ACS may have catastrophic clinical consequences, but over-treating chest pain patients may also lead to ED and hospital inefficiency and adverse clinical outcomes. Accurately risk stratifying chest pain patients is therefore essential to ensuring optimal patient care.

Physicians typically use subjective history and objective data such as vital signs, electrocardiogram (ECG) changes, and troponin, interpreted in the context of their clinical experience, to risk stratify patients. Physiologic measures such as heart rate variability (HRV) have also been employed to risk stratify chest pain patients, based on the hypothesis that HRV can measure cardiac stress through dynamic analysis of the autonomic nervous system [3], [4]. With all this available information, it is becoming important to explore clinically meaningful and validated methods of combining and assessing patient data beyond clinical gestalt. Risk scores, such as the HEART score, combine key aspects of patient information (e.g., age, ECG changes, medical history, and troponin readings) through simple but effective scoring methods [5]. The HEART score is an evidence-based method that has been proposed [5] and validated [6, 7] for risk stratifying chest pain patients. Other clinical scores that have been validated on ED chest pain patients include the TIMI score [8] and the GRACE score [9]. Additionally, studies have employed logistic regression to identify prognostic factors for adverse cardiac outcomes in chest pain patients [3, 4]. However, these techniques are static and cannot adapt to different patient populations and clinical settings.
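To illustrate how simple such scoring methods are, the sketch below encodes the HEART score's additive structure in Python. The component thresholds follow the commonly published version of the score [5], not values taken from this study, and the history and ECG components are assumed to be pre-graded by a clinician.

```python
def heart_score(history, ecg, age, n_risk_factors, troponin_ratio):
    """HEART score sketch: five components, each scored 0-2, summed to 0-10.

    history, ecg: clinician-graded points (0, 1, or 2).
    troponin_ratio: measured troponin divided by the assay's normal limit.
    """
    age_pts = 0 if age < 45 else (1 if age < 65 else 2)
    risk_pts = 0 if n_risk_factors == 0 else (1 if n_risk_factors <= 2 else 2)
    trop_pts = 0 if troponin_ratio <= 1 else (1 if troponin_ratio <= 3 else 2)
    return history + ecg + age_pts + risk_pts + trop_pts

# Example: moderately suspicious history (1), normal ECG (0), age 58,
# two risk factors, troponin within normal limits
print(heart_score(history=1, ecg=0, age=58, n_risk_factors=2, troponin_ratio=0.5))
```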

In contrast, machine learning (ML) provides a more dynamic process for combining data to risk stratify chest pain patients. Liu et al. showed that an ML-based scoring system employing HRV, vital signs, and ECG variables can accurately predict major adverse cardiac events in chest pain patients [10–12]. In the ensemble-based scoring system (ESS) [10], the support vector machine (SVM) was adopted as the individual classifier, and a hybrid sampling technique (under-sampling + over-sampling) was proposed to handle data imbalance and to combine outputs in the decision ensemble. The over-sampling component uses a method named SMOTE [13] to generate artificial data samples; however, this process increases the computational burden. In this paper, we adopt a fast neural network training method, the extreme learning machine (ELM) [14], as the individual classifier, and aim to improve the predictive performance of the ESS framework while reducing its computational complexity.

In recent years, ELM has attracted considerable attention, and many ELM variants and extensions have been developed, including pruned ELM algorithms [15, 16], evolutionary ELM [17], sparse Bayesian ELM [18], and ensemble-based ELM [19, 20]. ELM has also been extended to handle imbalanced data [21–23]. Furthermore, several studies have combined fuzzy logic with ELM for performance improvement [24, 25]. Beyond algorithmic extensions, ELM methods have been widely implemented in applications such as medical decision making [26–28], image processing [29, 30], bioinformatics [31, 32], and industrial applications [33, 34].

Given ELM's simplicity and flexibility of extension, we investigate the feasibility of combining ESS and ELM to facilitate predictive modeling and decision making. The remainder of this paper is organized as follows. The “Clinical Setting and Predictive Variables” section presents the study design, data collection, and selected predictive variables for model building. The “Prediction of Adverse Cardiac Events” section describes a novel ensemble-based framework with extreme learning machine for the prediction of major adverse cardiac events. The “Results” section presents the performance evaluation, and the “Conclusion” section draws the conclusions.

Clinical Setting and Predictive Variables

Study Design

Adult patients with chest pain suggestive of cardiac etiology were recruited prospectively by convenience sampling from September 2010 to July 2015 at the Emergency Department (ED) of Singapore General Hospital (SGH). We excluded patients who had an obvious non-cardiac etiology of chest pain (e.g., trauma, pneumothorax) as assessed by the primary ED physician. We also excluded patients whose one-lead ECG showed non-sinus rhythm (e.g., arrhythmia, ectopic beats >5%) or artifacts precluding adequate HRV analysis. The Singapore Health Services (SingHealth) Centralized Institutional Review Board approved the study, and patient consent was waived.

For each patient, demographic information and medical history were retrospectively acquired from the hospital's electronic health records (EHR). The first vital signs acquired in the ED or in triage were also obtained from the EHR. The first 12-lead ECG acquired in the ED was read by the primary ED physician as well as a second, independent reviewer for signs of ischemia (e.g., ST segment changes) and other significant abnormalities (e.g., bundle branch block, left ventricular hypertrophy, QTc prolongation). The first troponin measurement taken in the ED was recorded. At SGH, troponin-T is used, with an abnormal value defined as >99th percentile of the assay (0.03 ng/mL).

Five- to six-minute one-lead ECG signals were recorded with an X-Series Monitor (ZOLL Medical Corporation, Chelmsford, MA). ECG signals were then loaded into Kubios version 2.2 (Kuopio, Finland) for HRV analysis. The software automatically identified QRS complexes; ECG signals were also manually screened to ensure QRS detection accuracy, and detection was manually adjusted where necessary. The R-R interval time series was then created for each ECG and screened for irregular rhythm, ectopic beats, and artifacts. Time domain, frequency domain, and non-linear HRV variables were computed using the internal algorithms of the Kubios program, described by Niskanen et al. [35].

In this study, we define the primary outcome as a composite of adverse cardiac events within 72 h of presentation to the ED, including revascularization, cardiac arrest, cardiogenic shock, lethal arrhythmia, and mortality. We obtained the outcomes from patient EHR review.

Heart Rate Variability

We used three types of HRV parameters: time domain, frequency domain, and non-linear variables. Time domain variables are computed using traditional statistical and geometric methods. The average R-R interval (aRR), the standard deviation of the R-R time series (sdRR), and the square root of the mean squared differences between successive R-R intervals (RMSSD) are derived from statistical computations and are meant to depict the overall variability of the R-R time series. RMSSD is most sensitive to vagal influences, but may also be easily skewed by ectopic beats or irregular rhythms. Other time domain measures, such as the number of times that the absolute difference between two successive R-R intervals exceeds 50 ms (NN50) and NN50 divided by the total number of R-R intervals (pNN50), convey beat-to-beat variation as well as the variability of the total R-R time series. The baseline width of a triangle fitted to the R-R interval histogram by least squares (TINN) is a geometric variable, again used to convey overall variability.
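To make these definitions concrete, the following minimal sketch (in Python with NumPy; not the Kubios implementation used in the study) computes the time domain variables from an R-R interval series given in milliseconds. The pNN50 denominator follows the definition in the text; some implementations divide by the number of successive differences instead.

```python
import numpy as np

def time_domain_hrv(rr_ms):
    """Time domain HRV variables from an R-R interval series in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)                        # successive R-R differences
    nn50 = int(np.sum(np.abs(diff) > 50))     # differences exceeding 50 ms
    return {
        "aRR": rr.mean(),                     # average R-R interval
        "sdRR": rr.std(ddof=1),               # overall variability
        "RMSSD": np.sqrt(np.mean(diff ** 2)), # beat-to-beat variability
        "NN50": nn50,
        "pNN50": 100.0 * nn50 / len(rr),      # per the definition above
    }

# Example with a short synthetic series (values around 800 ms)
print(time_domain_hrv([812, 790, 845, 770, 828, 805]))
```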

Frequency domain variables are computed by transforming the R-R time series into the frequency domain using power spectral analyses such as the fast Fourier transform (FFT). HRV arises from different systems, such as respiration, baroreceptors, circadian rhythms, and the central nervous system, providing feedback to the sinoatrial node through neural circuits. Such negative feedback systems tend to oscillate, producing HRV in healthy adults [36]. The autonomic nervous system provides the majority of this input, so HRV frequency parameters give dynamic insight into the balance between parasympathetic and sympathetic tone. At the sinoatrial node, parasympathetic nervous system inputs do not require a second messenger and therefore produce higher-frequency oscillations. Consequently, high-frequency (HF) components (0.15 to 0.40 Hz) represent parasympathetic input, while low-frequency (LF) components (0.04 to 0.15 Hz) represent sympathetic inputs as well as a component of parasympathetic input. Very low frequency (VLF) components (0 to 0.04 Hz) may not be appropriate for the analysis of short R-R interval time series [37]. Low-frequency and high-frequency components are also normalized over total power minus very low frequency power. The ratio of normalized LF to normalized HF can furthermore depict autonomic balance.
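As a hedged illustration of this pipeline (Kubios applies its own spectral estimation, so the following is only an approximation), the sketch below resamples the R-R tachogram onto an even time grid, estimates the power spectrum with Welch's method, and integrates the conventional bands:

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def frequency_domain_hrv(rr_ms, fs=4.0):
    """Normalized LF/HF powers from an R-R series (ms), resampled at fs Hz."""
    rr = np.asarray(rr_ms, dtype=float) / 1000.0      # intervals in seconds
    t = np.cumsum(rr)                                 # beat occurrence times
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = interp1d(t, rr, kind="cubic")(grid)     # evenly sampled tachogram
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs,
                   nperseg=min(256, len(grid)))

    def band_power(lo, hi):                           # integrate PSD over a band
        mask = (f >= lo) & (f < hi)
        return np.trapz(pxx[mask], f[mask])

    vlf = band_power(0.00, 0.04)
    lf = band_power(0.04, 0.15)
    hf = band_power(0.15, 0.40)
    total = vlf + lf + hf
    return {"LF_nu": 100 * lf / (total - vlf),        # normalized LF
            "HF_nu": 100 * hf / (total - vlf),        # normalized HF
            "LF_HF": lf / hf}                         # autonomic balance
```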

Non-linear variables used to analyze HRV include the Poincaré plot, sample and approximate entropy, and detrended fluctuation analysis (DFA). The Poincaré plot is created by plotting each R-R interval against the subsequent interval. Multiple methods have been described to characterize HRV using this plot [13]. The technique used in our analysis fits an ellipse to the plot shape, with its axis positioned along the line of identity (LOI). The standard deviation of points perpendicular to the LOI measures short-term variability (SD1), while the standard deviation of points parallel to the LOI measures long-term variability (SD2). However, these two variables may be more representative of linear characteristics of the R-R interval series than of non-linear ones [38].
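The ellipse-fitting step reduces to two standard deviations of the rotated scatter, as the short sketch below shows (standard formulas, not taken verbatim from Kubios):

```python
import numpy as np

def poincare_sd(rr_ms):
    """SD1/SD2 from a Poincare plot of successive R-R intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    x, y = rr[:-1], rr[1:]                        # each interval vs. the next
    sd1 = np.std((y - x) / np.sqrt(2), ddof=1)    # spread across the LOI (short term)
    sd2 = np.std((y + x) / np.sqrt(2), ddof=1)    # spread along the LOI (long term)
    return sd1, sd2
```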

The short length and noise of the R-R interval time series used in our study present challenges for measuring entropy. Two methods have previously been described to estimate the degree of regularity in the R-R time series: approximate entropy (ApEn) and sample entropy (SampEn). Approximate entropy, described previously [39], searches for similar epochs within the time series. However, this measure has been shown to be biased with respect to the length of the R-R interval time series and to have difficulty maintaining relative consistency. Therefore, sample entropy was created and shown to be more consistent and less dependent on the length of the R-R interval time series [40].
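A brute-force sketch of sample entropy is given below, assuming the common parameter choices of embedding dimension m = 2 and tolerance r = 0.2 times the series' standard deviation (the study does not report its exact settings). Excluding self-matches is what makes SampEn less length-biased than ApEn:

```python
import numpy as np

def sample_entropy(series, m=2, r=None):
    """SampEn = -ln(A/B), where B and A count template matches of
    length m and m+1 under a Chebyshev distance tolerance r."""
    x = np.asarray(series, dtype=float)
    if r is None:
        r = 0.2 * x.std(ddof=1)

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(d <= r) - 1       # exclude the self-match
        return count

    B = count_matches(m)
    A = count_matches(m + 1)
    return -np.log(A / B)                     # undefined if no matches exist
```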

Detrended fluctuation analysis measures the long-range correlations in noisy signals, making it well suited to evaluating R-R interval time series. It has been studied in various contexts in cardiovascular physiology and pathology [41–43]. The method, described previously by Penzel et al. [44], finds correlations over different time scales. In our study, correlations were divided into short term (range 4–16 beats) and long term (range 16–64 beats), quantified by the slope of the log-log plot and represented by the variables α1 and α2, respectively. Given the short length of our R-R time series, α1 may be the more appropriate measurement for our data.
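A condensed sketch of the procedure (assumed details: linear detrending over non-overlapping windows) is shown below; α1 is the slope of log F(n) versus log n over the 4–16 beat range:

```python
import numpy as np

def dfa_alpha(rr, scales=range(4, 17)):
    """DFA scaling exponent over the given window sizes (alpha1 by default)."""
    x = np.asarray(rr, dtype=float)
    y = np.cumsum(x - x.mean())                 # integrated, mean-centered profile
    fluct = []
    for n in scales:
        f2 = []
        for w in range(len(y) // n):            # non-overlapping windows of size n
            seg = y[w * n:(w + 1) * n]
            t = np.arange(n)
            coef = np.polyfit(t, seg, 1)        # linear trend in this window
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        fluct.append(np.sqrt(np.mean(f2)))      # RMS fluctuation F(n)
    # alpha is the slope of log F(n) versus log n
    return np.polyfit(np.log(list(scales)), np.log(fluct), 1)[0]
```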

12-Lead ECG and Vital Signs

In this study, we measured the 12-lead ECG using a Philips PageWriter TC Series device. Some ECG parameters were automatically computed and displayed on the device; the remaining parameters were manually calculated by a trained medical practitioner using a continuous hardcopy paper printout of the electrical signals. An ECG cardiac cycle comprises the P wave, QRS complex, T wave, and U wave. We used the following 12 parameters as candidate variables in predictive modeling: ST segment changes (ST elevation and ST depression), T wave inversion, Q wave, QRS axis, corrected QT interval (QTc), left bundle branch block (LBBB), right bundle branch block (RBBB), intraventricular conduction delay (IVCD), left atrial abnormality (LAA), left ventricular hypertrophy, and right ventricular hypertrophy. A brief description of these parameters follows.

The ST segment represents the connection between the QRS complex and the T wave. Typically, the ST segment is isoelectric, matching the baseline. ST elevation is present when the ST segment lies abnormally high above the isoelectric baseline. It is obtained by measuring the vertical distance between the ECG trace of the ST segment and the baseline, and may correspond to damage or pathological change to the cardiac muscle. The QRS axis, the net vector of ventricular depolarization, can be determined from the QRS complex. The T wave occurs after the QRS complex and represents the repolarization (or recovery) of the ventricles. T wave inversion may be a sign of coronary ischemia or a central nervous system disorder, among other causes. The QT interval is measured from the initial deflection of the Q wave to the end of the T wave. A prolonged QT interval may indicate a risk of ventricular tachyarrhythmias and sudden death. The QT interval varies with heart rate and, for clinical relevance, requires correction for this, giving the corrected QT interval (QTc).
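The paper does not state which correction formula was applied; a common choice (an assumption here, not taken from the study) is Bazett's formula, which divides the QT interval by the square root of the preceding R-R interval expressed in seconds:

$$ \mathrm{QTc} = \frac{\mathrm{QT}}{\sqrt{\mathrm{RR}}} $$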

A left bundle branch block (LBBB) is a cardiac conduction abnormality in which the left ventricle contracts later than the right ventricle owing to delayed activation of the left ventricle. In an RBBB, the right ventricle is not directly activated by impulses traveling through the right bundle branch. An IVCD can be identified from QRS widening by a process of elimination: the widening is attributed to an IVCD if it is not caused by an LBBB or an RBBB. An IVCD may correspond to a myocardial infarction, a cardiomyopathy with ventricular fibrosis, or a chamber enlargement. Atrial abnormalities, such as atrial enlargement, dilatation, or hypertrophy, may also be detected on an ECG.

In addition to the HRV and 12-lead ECG parameters, we chose eight clinical vital signs for predictive modeling. We used a Propaq CS Vital Signs Monitor (Welch Allyn, Skaneateles Falls, NY, USA) to measure heart rate and systolic and diastolic blood pressure. While patients were in the ED, we recorded respiratory rate, Glasgow Coma Scale (GCS) score, and temperature. Additional vital signs included pain score and oxygen saturation (SpO2). Furthermore, we collected patients' medical history and other relevant information.

Prediction of Adverse Cardiac Events

The original ensemble-based scoring system (ESS) [10] was proposed for risk stratifying chest pain patients in the emergency department, with the support vector machine (SVM) implemented as the individual classifier. There is room for further improvement: first, the combined use of under-sampling and over-sampling techniques increases the computational load; second, as reported in the literature, ELM-based algorithms outperform SVM in various applications [45].

In this section, we present a brief introduction to the basic ELM algorithm, followed by a detailed description of the proposed ESS-ELM algorithm. ESS-ELM is more than a simple replacement of SVM with ELM; it integrates unique ELM features into the ESS architecture. We elaborate on the proposed algorithm in the “Ensemble-Based Scoring System with ELM (ESS-ELM)” section.

Extreme Learning Machine

ELM [46] was proposed as a learning algorithm for single-hidden-layer feedforward networks (SLFNs), in which the weights and biases of the hidden nodes are randomly assigned and the output weights are determined by a least-squares solution. Consider a training set L with N samples,

$$ L = \{(\mathbf{x}_{j},\mathbf{t}_{j})|\mathbf{x}_{j} \in \mathbf{R}^{p}, \mathbf{t}_{j} \in \mathbf{R}^{q},j=1,2,...,N\} $$
(1)

where \(\mathbf{x}_{j}\) is the input feature vector with p components and \(\mathbf{t}_{j}\) is a q-dimensional target vector. Let g(x) be the activation function of the hidden nodes and \(\mathbf{w}_{i}\) the weight vector connecting the input neurons to the ith hidden node; an SLFN with \(\tilde{N}\) hidden nodes can then be defined as follows,

$$ f_{\tilde{N}}(\mathbf{x}_{j})=\sum\limits_{i=1}^{\tilde{N}}\beta_{i}g(\mathbf{w}_{i} \cdot \mathbf{x}_{j}+b_{i})=\mathbf{t}_{j}, \quad j=1,2,...,N $$
(2)

Equation 2 can be written in matrix form as

$$ \mathbf{H}\hat{\beta}=\mathbf{T} $$
(3)

where \(\mathbf{H}(\mathbf{w}_{1},...,\mathbf{w}_{\tilde{N}}, b_{1},...,b_{\tilde{N}}, \mathbf{x}_{1},...,\mathbf{x}_{N})\) is the hidden layer output matrix, and \(h_{ji}=g(\mathbf{w}_{i} \cdot \mathbf{x}_{j}+b_{i})\) is the output of the ith hidden neuron with respect to \(\mathbf{x}_{j}\). Furthermore, the output weight matrix is \(\hat{\beta}=[\beta_{1},...,\beta_{\tilde{N}}]^{\mathrm{T}}\) and the target matrix is \(\mathbf{T}=[\mathbf{t}_{1},...,\mathbf{t}_{N}]^{\mathrm{T}}\). As proposed by Huang et al. [46], the parameters \(\mathbf{w}_{i}\) and \(b_{i}\) are randomly assigned, so the output weights can be estimated as \(\hat{\beta}=\mathbf{H}^{\dagger}\mathbf{T}\), where \(\mathbf{H}^{\dagger}\) is the Moore-Penrose generalized inverse [47] of \(\mathbf{H}\).
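For illustration, a minimal ELM sketch following Eqs. 1–3 is given below. It is a toy under stated assumptions (sigmoid activation, weights and biases drawn uniformly from [-1, 1]) rather than the exact implementation used in the study:

```python
import numpy as np

class SimpleELM:
    """Single-hidden-layer ELM: random hidden parameters, least-squares output."""

    def __init__(self, n_hidden, rng=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(rng)

    def _hidden(self, X):
        # g(w . x + b) with a sigmoid activation
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        p = X.shape[1]
        self.W = self.rng.uniform(-1, 1, (p, self.n_hidden))  # random w_i
        self.b = self.rng.uniform(-1, 1, self.n_hidden)       # random b_i
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T    # beta = H^dagger T (Moore-Penrose)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```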

Ensemble-Based Scoring System with ELM (ESS-ELM)

As described in [10], ESS was developed for risk stratification of ED chest pain patients and was specifically designed to handle imbalanced data, in which samples (i.e., patients) with a normal outcome greatly outnumber those with an abnormal outcome (in our study, a composite of adverse cardiac events within 72 h). Since conventional machine learning algorithms cannot handle imbalanced data well [48], the ESS algorithm [10] adopted a hybrid sampling technique (under-sampling + over-sampling). The main mechanism behind the ESS algorithm is the use of ensemble learning to combine individual classifiers for reliable decision making. In this paper, we integrate ELM into the ESS framework while removing the over-sampling component to reduce algorithmic complexity.

The structure of the proposed ESS-ELM algorithm is depicted in Fig. 1. In this ensemble architecture, each individual classifier is denoted \(\varphi_{t}\), and there are a total of T individual classifiers. A weight \(w_{t}\) is assigned to each classifier \(\varphi_{t}\) to indicate its significance in the decision ensemble. In the original ESS algorithm, the weight \(w_{t}\) is determined from over-sampled data generated using the SMOTE technique [13]. In the new ESS-ELM algorithm, we propose to derive the weights directly from the ELM learning process, which avoids over-sampling during model training. Given the imbalance in our dataset (146 of 797 patients met the outcome), we adopted the under-sampling structure in creating the decision ensemble.

Fig. 1
figure 1

The architecture of the proposed ESS-ELM algorithm

Given a training dataset \(L = \{\mathbf{x}_{1}, \mathbf{x}_{2}, ..., \mathbf{x}_{K}\}\), where each x is a feature vector (HRV parameters, 12-lead ECG parameters, and vital signs) representing a patient, we begin the analysis by normalizing [49] the original values to the interval [-1, 1]. The training set contains a minority set P (patients with a positive outcome) and a majority set N (patients with a negative outcome). We apply under-sampling to randomly select a subset \(N_{t}\) from N, with the same number of samples as in P. In this way, we create T balanced datasets \(S_{t}\) (\(P + N_{t}\)) for building a decision ensemble.

In the training dataset L, we have K samples \((\mathbf{x}_{k}, y_{k})\), where \(y_{k}\) is \(C_{0}\) or \(C_{1}\), with \(C_{1}\) indicating that patient \(\mathbf{x}_{k}\) met the outcome, i.e., a composite outcome of serious adverse cardiac events within 72 h. Given a testing sample x, we aim to predict its label y using an ensemble of single classifiers \(\varphi(\mathbf{x},L)\) trained on L. As depicted in Fig. 1, we build a total of T independent classifiers and combine them to make a final decision. As previously mentioned, the weights are derived from the ELM learning process and should represent the significance of the corresponding individual ELM classifiers. According to the ELM literature [19, 46], the norm of the output weight matrix, ∥β∥, is closely associated with ELM generalization performance: a smaller ∥β∥ leads to better generalization ability. This property indicates how important an ELM classifier is, so we use ∥β∥ as one key component in creating the decision ensemble. To achieve a trade-off between training accuracy and generalization performance, we define the weight \(w_{t}\) for the t-th ELM classifier as \(\text{Acc}_{t}/\|\beta_{t}\|^{2}\), where \(\text{Acc}_{t}\) is the training accuracy. The predicted outcome \(y_{t}\) of classifier \(\varphi_{t}(\mathbf{x},L)\) is either 0 or 1. A risk score for testing sample x is derived using the following equation,

$$ \begin{array}{ll} \text{Score}(\mathbf{x}) & =\frac{{\sum}_{t=1}^{T}\varphi_{t}(\mathbf{x},L) \cdot w_{t}}{{\sum}_{t=1}^{T}w_{t}} \times 100\\ & =\frac{{\sum}_{t=1}^{T}\varphi_{t}(\mathbf{x},L) \cdot \text{Acc}_{t}/\|\beta_{t}\|^{2}}{{\sum}_{t=1}^{T}\text{Acc}_{t}/\|\beta_{t}\|^{2}} \times 100 \end{array} $$
(4)
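A hedged sketch of how this ensemble could be realized, reusing the SimpleELM class sketched earlier, is shown below. The under-sampling, the weight \(\text{Acc}_{t}/\|\beta_{t}\|^{2}\), the randomized hidden node count, and the final score follow the description in this section, but details such as thresholding the ELM output at 0.5 are assumptions:

```python
import numpy as np

def ess_elm_score(X_train, y_train, x_test, T=100, node_range=(10, 100), seed=0):
    """Risk score (0-100) for one test sample, per Eq. 4."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y_train == 1)            # minority set P
    neg = np.flatnonzero(y_train == 0)            # majority set N
    votes, weights = [], []
    for _ in range(T):
        neg_t = rng.choice(neg, size=len(pos), replace=False)  # under-sample N
        idx = np.concatenate([pos, neg_t])        # balanced set S_t
        n_nodes = int(rng.integers(node_range[0], node_range[1] + 1))
        elm = SimpleELM(n_nodes, rng=rng).fit(X_train[idx], y_train[idx])
        y_hat = (elm.predict(X_train[idx]) > 0.5).astype(int)
        acc = float(np.mean(y_hat == y_train[idx]))   # training accuracy Acc_t
        weights.append(acc / np.sum(elm.beta ** 2))   # Acc_t / ||beta_t||^2
        votes.append(int(elm.predict(x_test[None, :])[0] > 0.5))
    votes, weights = np.array(votes), np.array(weights)
    return 100.0 * np.sum(votes * weights) / np.sum(weights)
```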

Compared with the original ESS algorithm, we use ELM-derived parameters (i.e., \(\text{Acc}_{t}\) and \(\|\beta_{t}\|^{2}\)) instead of over-sampled data for weight calculation. By using this weight, the computationally expensive over-sampling component of the original ESS is replaced with an efficient ELM-based computation. Moreover, we assign a randomly selected number of hidden nodes to each individual classifier to increase the diversity within the decision ensemble; in ensemble learning, a diversified decision ensemble is known to have better generalization ability [50]. A detailed description of the ESS-ELM algorithm is presented in Fig. 2.

Fig. 2
figure 2

Descriptions of the proposed ESS-ELM algorithm

Results

In our dataset, the average age of all patients was 60 years, and the majority of the cohort was male (68%) and Chinese (62%). Table 1 shows the predictive variables, including HRV, ECG, and vital signs. We found that among linear HRV predictors, only frequency domain parameters (normalized LF power, normalized HF power, and LF/HF) were statistically significant (p < 0.05). Among the non-linear HRV variables, approximate entropy, sample entropy, and DFA α1 were also significant. Among 12-lead ECG predictors, the proportions of patients with ST elevation, ST depression, and Q waves were much higher in the group of patients who met the outcome. Among vital sign predictors, only pain score was statistically significant.

Table 1 Vital signs, heart rate variability (HRV) variables, electrocardiogram (ECG) variables, and troponin variable of patients in the study population

Figure 3 illustrates the ROC curves generated by the ESS-ELM algorithm, the ESS-SVM algorithm, the HEART score, the TIMI score, and the GRACE score. For performance evaluation, we use the leave-one-out cross-validation (LOOCV) framework. Figure 3a shows the comparisons between ESS-ELM and the clinical scores, and Fig. 3b shows the comparisons between ESS-SVM and the clinical scores. Both the ESS-ELM and ESS-SVM algorithms significantly outperform the TIMI and GRACE scores. As shown in Table 2, ESS-SVM and HEART achieve similar AUC values, while ESS-SVM gives better specificity and PPV. ESS-ELM is the top performer among all scores. Both ESS-based algorithms use an ensemble size of 100, and ESS-ELM adopts [10,100] as the range of the number of hidden nodes for its individual ELM classifiers. The results presented in Table 2 are derived from the optimal cutoff score for each method; the optimal cutoff point on the ROC curve is defined as the point nearest to the upper-left corner.
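The cutoff selection just described can be computed directly from the ROC curve; the small sketch below uses scikit-learn for illustration (the study does not state which software was used):

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_cutoff(y_true, scores):
    """Threshold at the ROC point closest to the upper-left corner (0, 1)."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    dist = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)   # distance to (FPR=0, TPR=1)
    return thresholds[int(np.argmin(dist))]
```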

Fig. 3
figure 3

ROC curves by ESS-ELM algorithm, ESS-SVM algorithm, HEART score, TIMI score, and GRACE score

Table 2 Comparison of the predictive results. The range of number of hidden nodes for ELM algorithms was [10,100] and ensemble size was 100 for both ESS-ELM and ESS-SVM algorithms

Several factors determine algorithm complexity and predictive performance, for example, the number of hidden nodes and the ensemble size. As seen from the algorithm architecture (Fig. 1), the complexity of ESS-ELM grows linearly with the ensemble size. Compared with the original ESS algorithm [10], ESS-ELM reduces computation by removing the over-sampling component. Figure 4a demonstrates the impact of different numbers of hidden nodes on AUC. As mentioned in the previous section, we use a different number of hidden nodes for each individual ELM classifier, so the numbers indicated in the figure are ranges of the number of hidden nodes, i.e., [10,25], [10,50], [10,100], [10,150], and [10,200]. The best prediction performance was obtained when the range was [10,100]. Figure 4b depicts different ensemble sizes and their corresponding AUC values while keeping the range of hidden nodes at [10,100]. The highest AUC value was achieved with an ensemble size of 100, and in general, larger ensemble sizes corresponded to better prediction performance. In all of our experiments on the ESS-ELM algorithm, we chose an ensemble size of 100 and a hidden node range of [10,100] to produce the best trade-off between predictive performance and complexity.

Fig. 4
figure 4

The effects of the number of hidden nodes and ensemble size on area under the curve (AUC) values

Conclusion

In this paper, we presented an ensemble-based risk scoring method and conducted an observational study of 797 ED chest pain patients. We proposed a novel ELM-based ensemble scoring method, named ESS-ELM, and demonstrated that the new algorithm outperformed the original ESS-SVM algorithm and three established clinical scores, namely HEART, TIMI, and GRACE. Moreover, we investigated the effects of parameter changes through ROC analysis, using AUC, sensitivity, specificity, PPV, and NPV as performance indicators. ELM has shown flexibility in its integration with the ESS framework.