1 Introduction

Vital biological signals, such as heart rate and respiratory rate, are among the first-level means of evaluating an individual’s physical health. For example, cardiac motion, a primary indicator of an individual’s well-being, is often a unique identifier for each person, as no two individuals have the same heart size, anatomy, or position. While there are scientific tools to estimate basic health conditions from such biological signals, they are bulky and hard to use, so they are primarily used within a clinical environment under the supervision of a health professional. Hence, these signs are typically checked only rarely, at the annual doctor’s visit or when the patient’s physical health has already drastically deteriorated and symptoms are too prevalent to ignore. The situation becomes more complicated in a COVID-19-like pandemic scenario, where people around the world, specifically elderly patients, who are among the most vulnerable sections of the community, try hard to stay away from hospitals and clinics to ensure safety. So, the probability of missing the regular health check is now higher than ever. As a result, an intensive and expensive medical procedure often becomes unavoidable once a condition turns critical. However, with early detection and regular monitoring processes in place, such exorbitant events may be circumvented.

Fig. 1. The proposed multimodal anomalous pattern recognition framework

Unconstrained means to monitor these vital body signals have rapidly emerged as a popular alternative to the conventional health check process in the last decade [1,2,3,4,5,6]. However, these appliances require frequent charging and are mostly wearable, making the patient uncomfortable (e.g., by causing skin irritation), specifically the elderly population. Many users also find them awkward, as the devices’ external visibility often compromises the privacy of their personal health information. All these pose severe challenges to continual and accurate data collection. A set of works employs unobtrusive devices, which can be easily installed in frequently used furniture that is often in close body contact with the patients [7,8,9,10,11,12]. However, the quality of the signal recorded using such devices often relies on the frequency of direct contact between the device surface and the patient’s body.

As such, an obvious way to inconspicuously monitor a person’s physical health is to embed sensors into the objects most frequented by that individual. Studies have shown that people usually spend most of their day on activities that require them to be seated, such as attending meetings, eating, riding in cars, or watching television [13]. A set of recent works [14,15,16,17,18,19,20], which mainly focus on building a hardware system like a chair, often rely on measurements obtained from only one type of signal and thereby have access to only a limited amount of the user’s health information. Additionally, their prediction models frequently derive aggregated decisions on the user’s health condition without allowing enough personalization. On the other hand, works [21,22,23,24,25] focusing primarily on the recognition sub-task tend to fail in a real-life problem setting, where signals collected from the patients are often too noisy for these machine learning-based systems to make an accurate prediction. Toward this, we aim to develop a generic signal processing and anomaly detection framework that may be deployed in both obtrusive and unobtrusive environments to measure and analyze human vital signals in real time. While the proposed algorithm is invariant to its deployment environment, the proposed system has been installed within a SmartChair for real-life evaluation, as such a chair-like setting is known to be a complex application setting in this problem scenario. An extensive set of experiments demonstrates the effectiveness of the proposed cepstral-based peak fusion module by reporting a 7 to \(10\%\) improvement over the baseline of a time-domain analysis. Furthermore, the proposed deep anomaly detection reports an average accuracy of \(95.3\%\) with 8 classes and \(93\%\) (an improvement of \(3\%\)) with 17 classes.

An overview of the proposed method is illustrated in Fig. 1. In our experiments, we have used three vital body signals: respiratory rate, heart rate, and femoral pulse [10, 18, 26,27,28]. The primary contributions of the proposed system include:

  1. Generic Machine Learning Based Framework that may analyze both uni- and multi-modal signals within an integrated noise-tolerant framework. The proposed algorithm develops a robust peak detection module by fusing peaks in the time and cepstral domains to identify an exhaustive and accurate peak list, which serves as an input to the proposed deep learning-based prediction model to predict an individual’s personalized health pattern in an automated manner.

  2. Deep Anomaly Detection Strategy, which enables a continual deep learning-based monitoring process to precisely localize the anomalies in the time domain.

  3. Real-life Demonstration in an Unobtrusive Experiment Setting, wherein the proposed multimodal signal processing and analysis framework is deployed with a SmartChair health monitoring system that may simultaneously capture different types of vital signals from different parts of the seat occupant’s body (over a wide range of ages) without forcing an interruption in their daily work schedule.

  4. Extensive Evaluation and Comparative Study, which demonstrates improved performance on both the publicly available datasets and our real-life lab experimental settings.

The rest of the paper is organized as follows: Sect. 2 briefly describes related works. The proposed method is explained in Sect. 3. Sections 4 and 5 present the experimental results and the conclusion, respectively.

2 Related Works

In this section, we briefly describe a set of related research, which can be categorized into two parts: (1) methods focusing on building an intelligent software system, wherein the authors assume that a good-quality annotated data collection is always available for training a sophisticated machine learning model and that the signals captured at test time are reasonably noise-free, and (2) methods aiming to build a hardware system that collects streaming data for further analysis using machine learning-based methods.

2.1 Health Anomalous Pattern Recognition

Traditional machine learning methods, like Multi-Layer Perceptron (MLP) [29,30,31], Support Vector Machine (SVM) [32,33,34,35], and K-nearest neighbors (KNN) [30, 31, 36, 37], have already been used extensively to analyze vital health signals. A set of recent works introduces deep learning models [38,39,40,41,42,43] for improved performance. Deshmane and Madhe [44] have shown impressive results on ECG-based biometric human identification using a Convolutional Neural Network model. In contrast to traditional neural networks, a Recurrent Neural Network (RNN) can process sequential data (e.g., a cardiac signal) thanks to its internal memory state and the connections between its nodes. To explore the temporal granular details, models based on RNN and its variant Long Short-Term Memory (LSTM) [45,46,47,48] have also been introduced for the task. Given prior knowledge of the alignment between input and output, such models can map sequences to sequences. However, in a practical scenario, specifically in a home-based, computationally constrained setting, it is challenging to apply RNN-based methods to continual monitoring tasks due to their scalability issues. In contrast to these methods, to ensure computational tractability, we use a small set of hand-crafted features to compute a compact feature descriptor that is passed as input to the subsequent neural network-based prediction module. This not only yields a scalable prediction module but also ensures easy adaptability to an individual’s personalized health signal patterns.

2.2 Vital Signal Sensing Modality

Vital signals like ECG have been widely used to determine the health condition in many works [42, 49,50,51]. Kim et al. [52] study ECG measurement on a toilet seat for ubiquitous health care. Wu et al. [53] use a capacitive coupling ECG sensor to obtain the signal. While the signal capturing module of many of these methods is unobtrusive in nature, it may still demand the seat occupant’s attention to ensure a perfect connection between the body and the sensor and an accurate angular arrangement, which may interrupt the seat occupant’s daily routine. A set of recent works has installed multi-channel ECG sensing in a chair-based acquisition system to identify motion artifacts [54]. It is important to note that such a system requires either physiological activity in the same fashion as in the enrollment stage or periodic resampling of the training dataset [55]. Therefore, the signal capturing process is not sustainable enough to ensure long-term usage.

Another set of works designs radio frequency (RF) methods [56] based on signal reflection. These methods require an off-body reader with the antenna in the far field, which makes signal acquisition from multiple points on a single individual more challenging. A few studies have utilized the femoral pulse as a component of an active near-field coherent sensing (NCS) system [26, 57,58,59]. However, the arterial blood pressure depends on the individual’s personal characteristics (e.g., age, height, gender), health conditions, and the administration of vasoactive drugs to the patient. Therefore, it is important to have a personalized prediction model that can effectively utilize the femoral pulse as a vital health signal to evaluate a patient’s health condition. In this paper, we use the ECG, PPG, and SCG vital signals to evaluate the subject’s physical condition. These specific vital signals are chosen because other possible signal sources, such as Phonocardiography and Echocardiography, are either obtrusive or expensive, and are difficult to install on chairs.

3 Methodology

We design a generic machine learning-based classification model that performs a comprehensive and synchronized vital health signal analysis in either a uni-modal or a multi-modal environment, wherein each mode may represent a signal generated from a unique body part of the participant, and makes a comprehensive prediction on the health condition of the individual. More specifically, we are given an annotated data collection \(\mathcal D=\{(s^j,y^j)\}_{j}\), where each vital body signal \(s^j\) is described using an m-mode (\(m\ge 1\)) representation, i.e., \(s^j=\{x^j_l\}^m_{l=1}\), and \(y^j\) is the corresponding label of the signal \(s^j\). As shown in Fig. 1, each mode-specific signal is pre-processed in parallel by the proposed signal processing module, and the results may later be combined through feature fusion. In this section, we describe the process in detail.

3.1 Signal Preprocessing

Noise Filtering. In a real-life setting, the signals received via the sensors are often noisy due to the individual’s movements or shifts during measurement, dampening and noise from clothing, and noise introduced by the sensor itself. Therefore, any raw input signal is somewhat noisy. To address this challenge, we perform an initial noise filtering of each incoming signal using a Butterworth filter [60] to ensure an accurate prediction performance.

The Butterworth filter separates the high-frequency noise from the signal, such that frequency components within the passband are reflected in the output without a significant amount of change. The contribution of higher frequencies to the filtered output is reduced by a significant factor, which depends on the filter order. The sharpness of the transition from passband to stopband is controlled by the order, a predefined constant in our experiments. The low-pass Butterworth filter is designed as a rational function, defined as follows:

$$\begin{aligned} |H(j\omega )|^2=\frac{H_{0}}{1+(\omega /\omega _0)^{2n}}, \end{aligned}$$
(1)

where \(H_0=1\) is the maximum passband gain, \(\omega _0=1\) rad/sec is the normalized cutoff frequency, and n is the filter order. In our experiments, we have filtered the signal with a fifth-order Butterworth filter and a cutoff at 2.5 Hz, i.e., we have \(n=5\). The filtered signal x is treated as the input for further analysis. Unless mentioned otherwise, any reference to the signal x in the latter part of the paper assumes a filtered signal. We use the Scipy Python library [61] to implement the Butterworth filter.
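As a worked illustration of Eq. (1), the following sketch evaluates the squared magnitude response of a low-pass Butterworth filter in its standard form, \(|H(j\omega )|^2=H_0/(1+(\omega /\omega _0)^{2n})\), for the settings reported above (\(n=5\), cutoff at 2.5 Hz). The function name and the probe frequencies are illustrative assumptions; the sketch only evaluates the formula and does not reproduce the Scipy-based filtering step.

```python
import math

def butterworth_gain_sq(freq_hz, cutoff_hz=2.5, order=5, h0=1.0):
    """Squared magnitude response of a low-pass Butterworth filter."""
    return h0 / (1.0 + (freq_hz / cutoff_hz) ** (2 * order))

# Illustrative probe frequencies (Hz): passband, cutoff, and stopband.
for f in (1.0, 2.5, 5.0, 10.0):
    gain_db = 10.0 * math.log10(butterworth_gain_sq(f))
    print(f"{f:5.1f} Hz -> {gain_db:7.2f} dB")
```

At the cutoff the power gain is one half (about -3 dB) regardless of the order, while one octave above the cutoff (5 Hz) the fifth-order response is already attenuated by roughly 30 dB.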

Peak Identification. To analyze the signal characteristics, the first objective of this paper is to propose a robust peak detection scheme that identifies an exhaustive peak list within a signal, with minimum false positives, despite different noise sources (e.g., non-stationary effects, low SNR, or patient conditions such as the high heart rate exhibited after exercise). In this work, we compute a moving average based on a one-sided window whose length is proportional to the sampling frequency, where the proportionality constant is user-defined. In all our experiments, we have chosen the proportionality factor as 0.75 and the sampling frequency as 100 Hz. Within each window, any signal segment lying above the moving average (where the signal demonstrates a sharp change in gradient) is considered a peak. While this approach works well on an ideal signal, at low SNR the precision may still deteriorate significantly, generating some false peaks or losing some significant peaks of the input signal. An intuitive approach to mitigate the risk of false peak identification is to raise the moving average threshold. However, a universal threshold that works for all possible noisy signal settings is difficult to select and cannot easily be chosen automatically. Therefore, we employ an adaptive approach that sets the threshold dynamically by computing the standard deviation of RR intervals (RRSD) [62, 63]. In general, the standard deviation of RR intervals is not large. Marking an extra peak or misplacing an R peak may increase the standard deviation significantly, which indicates possible false peak identification. Therefore, minimizing RRSD is key to finding a threshold that yields the most accurate number of peaks. However, if RRSD is zero, there are two possibilities: either we have a perfect signal or we are seeing the consequences of undetected noise. So, to provide the best solution, we choose from a predefined range the threshold that minimizes RRSD subject to \(RRSD>1\), i.e., that satisfies both \(min(RRSD)>1\) and \(RRSD>1\). We use the Heartpy [64] Python library for the implementation.
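The adaptive thresholding described above can be sketched in plain Python as follows. This is a simplified illustration and not the Heartpy implementation: the helper names, the candidate list of threshold increments, the multiplicative way of raising the moving average, and the synthetic test signal are all assumptions made for the example.

```python
import math
from statistics import pstdev

def rolling_mean(x, window):
    """One-sided (trailing) moving average; early samples use a shorter window."""
    out = []
    for i in range(len(x)):
        chunk = x[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def detect_peaks(x, baseline, raise_pct):
    """Pick the maximum of every run in which x exceeds the raised baseline."""
    peaks, best = [], None
    for i, value in enumerate(x):
        if value > baseline[i] * (1.0 + raise_pct / 100.0):
            if best is None or value > x[best]:
                best = i
        elif best is not None:
            peaks.append(best)
            best = None
    if best is not None:
        peaks.append(best)
    return peaks

def rrsd_ms(peaks, fs):
    """Standard deviation of the peak-to-peak (RR) intervals in milliseconds."""
    rr = [(b - a) * 1000.0 / fs for a, b in zip(peaks, peaks[1:])]
    return pstdev(rr) if len(rr) > 1 else 0.0

def fit_peaks(x, fs=100, factor=0.75, candidates=(5, 10, 20, 50, 100), min_rrsd=1.0):
    """Return (rrsd, raise_pct, peaks) with the smallest RRSD above min_rrsd, or None."""
    baseline = rolling_mean(x, int(factor * fs))
    best = None
    for pct in candidates:  # hypothetical range of threshold increments (%)
        peaks = detect_peaks(x, baseline, pct)
        sd = rrsd_ms(peaks, fs)
        if sd > min_rrsd and (best is None or sd < best[0]):
            best = (sd, pct, peaks)
    return best

# Synthetic signal: narrow Gaussian beats with slightly irregular spacing.
centers, gaps, c = [], (78, 80, 83), 50
while c < 1000:
    centers.append(c)
    c += gaps[len(centers) % 3]
signal = [sum(math.exp(-((i - c) ** 2) / 18.0) for c in centers) for i in range(1000)]

rrsd, raise_pct, peaks = fit_peaks(signal)
```

On this clean synthetic signal every candidate threshold recovers the same beats, so the first candidate is kept; on noisy recordings the candidates would typically disagree and the RRSD criterion would select among them.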

Peak Identification in the Cepstral Domain. Note that peak detection in the input signal x is similar to detecting pitch in an audio signal. However, identifying peaks from x directly may not be sufficient in isolation, as some important peaks may be missed, which would in turn degrade the subsequent feature extraction task. Toward this end, for improved peak detection performance, we use the cepstrum of the signal x for a granular-level peak analysis. Cepstrum analysis, a nonlinear signal processing technique, is typically used for pitch detection (similar in some aspects to peak detection) in audio and speech. The real cepstrum of a signal x [65] is calculated as follows:

$$\begin{aligned} c_x(t)=\frac{1}{2\pi }\int _{-\pi }^{\pi } \ln |X(\omega )|\, e^{j\omega t}\,d\omega , \end{aligned}$$
(2)

where \(X(\omega )\) is the Fourier transform of the sequence x(t). The proposed peak detection scheme (as described in Sect. 3.1) is employed in parallel to capture a set of peaks in the cepstral domain representation \(c_x\) of the input signal x.
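A discrete counterpart of Eq. (2) can be sketched with NumPy as the inverse FFT of the log-magnitude spectrum. The function name, the small constant added before the logarithm, and the toy echo signal are assumptions made for illustration only.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum of x."""
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    log_mag = np.log(np.abs(spectrum) + eps)  # eps guards against log(0); an assumption
    return np.fft.ifft(log_mag).real

# Toy input: a unit impulse plus a half-amplitude echo 32 samples later.
x = np.zeros(256)
x[0], x[32] = 1.0, 0.5
c_x = real_cepstrum(x)

# The dominant cepstral peak (excluding index 0) sits at the echo delay.
echo_lag = int(np.argmax(c_x[1:128])) + 1
```

The example shows why the cepstrum is useful for periodicity analysis: a repetition with a fixed delay in the signal appears as a single pronounced peak at that delay.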

Note that, to ensure an accurate heart rate prediction, we first identify the peaks in an input signal, which are later used to compute several key features such as beats per minute (BPM), inter-beat interval (IBI), and root mean square of the successive differences (RMSSD). We discuss these features further in Sect. 3.2. In the cepstral domain, the magnitude of a cepstral coefficient is naturally related to the periodicity of the signal, which is the focus of heart rate estimation, and higher values of the cepstrum coefficients reflect an increased signal-to-noise ratio (SNR). Fusing the signal peaks with those found in the cepstral domain is therefore advantageous for producing a more exhaustive and accurate peak list, which forms the basis of the following feature fusion module (Fig. 2).

Fig. 2. Peak fusion process: an arrow displays the corresponding peak positions in the input signal (displayed in the graph at the top) and the cepstrum signal (displayed in the graph at the bottom).

Peak Fusion Algorithm. The cepstrum signal \(c_x\) is used as a derived representative for the original unimode input signal x. The proposed method uses both \(c_x\) and x to identify sets of peaks which are fused to obtain a more exhaustive set of peaks in the signal x.

Given \(\mathcal P\) as the set of identified peaks in x, as shown in Fig. 2, for every peak \(C_i\in \mathcal P\) with co-ordinate \((t_i, x_{(t_i)})\) in the signal x, there is a set of peaks in the corresponding time-domain neighborhood \(N_{t_i}\) around \(t_i\) for the cepstrum signal \(c_x\). Intuitively, multiple such peaks in \(c_x\) within the close neighborhood \(N_{t_i}\) around \(t_i\) do not provide any new peak information. Therefore, while fusing, we eliminate all such redundant peaks within \(N_{t_i}\), retaining only the common peak identified at \(C_i\). We refer to this scenario as Remove Upward. This process is repeated for all peaks in \(\mathcal P\), so that the only peaks retained in \(c_x\) are those not captured within any neighborhood \(N_{t_i}\) for any \(C_i\in \mathcal P\). The set of these remaining peaks in \(c_x\) is denoted as \(\mathcal P_{c_x}\). As illustrated in Fig. 2, to capture these missing peaks within the fused peak list for x, we analyze a close neighborhood \(N_{t_j}\) around every remaining peak in \(\mathcal P_{c_x}\). The time instant \(t_j\) within \(N_{t_j}\) at which the signal magnitude \(c_x[t_j]\) is maximum is mapped down to identify an additional peak \(D_j\) with magnitude \(x[t_j]\) in x. This process is referred to as Add Downward. The process is repeated for all elements of \(\mathcal P_{c_x}\). The combined peak list obtained after a sequence of Remove Upward steps followed by a sequence of Add Downward steps is treated as the fused peak list, which is used as the input to the following feature extraction module.
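The Remove Upward and Add Downward steps can be sketched as follows. The function name, the fixed symmetric neighborhood radius, and the toy peak lists are assumptions made for illustration; the neighborhood size is a free parameter of this sketch.

```python
def fuse_peaks(x_peaks, cx_peaks, cx, radius):
    """Fuse time-domain peaks with cepstral peaks that lie outside every neighborhood."""
    # Remove Upward: drop cepstral peaks within `radius` samples of a peak found in x.
    remaining = [q for q in cx_peaks if all(abs(q - t) > radius for t in x_peaks)]

    fused = set(x_peaks)
    # Add Downward: map each remaining cepstral peak to the instant where c_x is
    # largest inside its neighborhood, and add that instant as a peak of x.
    for q in remaining:
        lo, hi = max(0, q - radius), min(len(cx) - 1, q + radius)
        t_j = max(range(lo, hi + 1), key=lambda t: cx[t])
        fused.add(t_j)
    return sorted(fused)

# Toy example: the peak near sample 30 was missed in x but is present in c_x.
cx = [0.0] * 60
cx[11], cx[30], cx[49] = 1.0, 0.9, 1.0
fused = fuse_peaks(x_peaks=[10, 50], cx_peaks=[11, 30, 49], cx=cx, radius=3)
print(fused)
```

In this toy case the cepstral peaks at 11 and 49 are redundant with the peaks at 10 and 50 and are removed, while the peak at 30 is added to the fused list.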

3.2 Multimodal Feature Extraction

Given the fused peak list obtained from the processed signal x, we derive several handcrafted features, including RRSD, RMSSD, BPM, IBI, SDNN, SDSD, NN20, NN50, PNN20, and PNN50, to represent the incoming signal in terms of a compact feature descriptor \(f_x\in \mathbb R^d\). RRSD can be computed as the standard deviation of the RR intervals (the difference in time between the R-peaks) of a heart signal. RMSSD is defined as the root mean square of the successive differences between RR intervals: each successive difference is squared, the resulting values are averaged, and the square root of the average is taken. BPM can be calculated as the total number of peaks divided by the amount of time passed. IBI, the inter-beat interval, can be calculated as the overall average of the RR intervals. SDNN reflects the changes in heart rate due to cycles longer than 5 min. SDNN can be measured by computing the standard deviation of the time between consecutive R-peaks. SDSD can be computed as the standard deviation of the successive differences between adjacent RR intervals. NN20 and NN50 can be computed by counting the successive RR intervals that differ by more than 20 and 50 milliseconds, respectively. PNN20 and PNN50 can be obtained by dividing NN20 and NN50, respectively, by the total number of RR intervals.
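As a minimal sketch of the definitions above, the features can be computed from a list of peak indices as follows. The function name and the toy peak list are hypothetical, the population standard deviation is an assumption, BPM is obtained here as 60000 divided by the mean RR interval in milliseconds, and at least three peaks are required.

```python
import math
from statistics import mean, pstdev

def hrv_features(peaks, fs):
    """Compute a small dictionary of features from peak sample indices."""
    rr = [(b - a) * 1000.0 / fs for a, b in zip(peaks, peaks[1:])]  # RR intervals (ms)
    diffs = [b - a for a, b in zip(rr, rr[1:])]                      # successive differences
    nn20 = sum(abs(d) > 20 for d in diffs)
    nn50 = sum(abs(d) > 50 for d in diffs)
    ibi = mean(rr)
    return {
        "bpm": 60000.0 / ibi,          # beats per elapsed minute
        "ibi": ibi,
        "sdnn": pstdev(rr),            # same quantity as RRSD in this sketch
        "sdsd": pstdev(diffs),
        "rmssd": math.sqrt(mean(d * d for d in diffs)),
        "nn20": nn20,
        "nn50": nn50,
        "pnn20": nn20 / len(rr),       # divided by the number of RR intervals
        "pnn50": nn50 / len(rr),
    }

# Toy peak list at 100 Hz: RR intervals of 800, 800, and 900 ms.
features = hrv_features([0, 80, 160, 250], fs=100)
```

For the toy input, the two successive differences are 0 and 100 ms, so one of them exceeds both the 20 ms and the 50 ms thresholds.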

In a multimodal environment, the feature descriptor collection \(\{f^j_{x_l}\}^m_{l=1}\) representing the multiple unimode signals \(s^j=\{x^j_l\}^m_{l=1}\) is transformed into a fused feature \(f^j=\phi (\{f^j_{x_l}\}^m_{l=1})\). In this work, we use the vector concatenation function [66] as \(\phi \) to produce the \(md\)-dimensional fused feature \(f^j\).

3.3 Anomalous Pattern Recognition

Given an input signal x, we feed the feature vector \(f_x\) (as defined above) into a neural network model consisting of 3 fully connected (FC) layers with the rectified linear unit (ReLU) activation function. The activation of the last FC layer is fed into a softmax layer to obtain the probabilistic category membership scores, which serve as the anomaly scores of the incoming signal. While adding more layers makes the network more expressive, it also makes the network harder to train due to increased computational complexity, vanishing gradients, and model over-fitting. The standard backpropagation algorithm is employed to update the weight parameters of the fully connected layers. The loss function L is defined as follows:

$$\begin{aligned} L(\mathbf{W})=-\frac{1}{|\mathcal D|}\sum _{y \in \mathcal Y}\sum ^{|\mathcal D|}_{j=1}\mathbf{1}(y^{j}=y)\,\log p(y^{j}=y|s^{j}; \mathbf{W}), \end{aligned}$$
(3)

where \(\mathbf{1}(\cdot )\) is the indicator function, \(\mathbf{W}\) represents the neural network weight parameters, and \(p(y^{j}=y|s^{j}; \mathbf{W})\) is the predicted probability that the sample \(s^j\) belongs to the class \(y\in \mathcal Y\). The learning task is formulated as solving the minimization problem \(\underset{\mathbf{W}}{\min }\,L(\mathbf{W})\).
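The forward pass and the loss of Eq. (3) can be sketched with NumPy as follows. The layer widths, the weight initialization, the random input batch, and the choice to leave the last FC layer without a ReLU before the softmax are assumptions made for illustration; training by backpropagation is omitted.

```python
import numpy as np

def init_params(sizes, seed=0):
    """Random weights and zero biases for consecutive layer sizes (assumed init)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def forward(params, features):
    """FC stack: ReLU after every layer but the last, then a softmax."""
    h = features
    for k, (weights, bias) in enumerate(params):
        h = h @ weights + bias
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)
    h = h - h.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(h)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Eq. (3): mean negative log-probability assigned to the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Hypothetical sizes: 10 features, hidden widths 32 and 16, 2 classes.
params = init_params([10, 32, 16, 2])
batch = np.random.default_rng(1).normal(size=(4, 10))
labels = np.array([0, 1, 1, 0])
probs = forward(params, batch)
loss = cross_entropy(probs, labels)
```

Because the indicator in Eq. (3) selects only the true class of each sample, the double sum reduces to averaging the negative log-probability of the true class, which is what `cross_entropy` computes.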

Fig. 3. Peak detection performance in the OHSU ECG signal dataset [67] (shown in (a)) and our in-house dataset with 13 participants (shown in (b)).

4 Experiments

The proposed method is evaluated from two different perspectives: 1) accuracy evaluation of the peak detection module and 2) the effectiveness of its two-class neural network based prediction module, where the goal is to precisely identify the ‘anomalous’ signal characteristics of a participant in near real-time. Different datasets are used to evaluate the performance of the model.

4.1 Dataset

To evaluate the performance of our peak detection algorithm, which forms the core of the subsequent prediction module, we use two datasets: the publicly available Oregon Health and Science University (OHSU) ECG signal dataset with 28 participants [67] and our in-house dataset with 13 participants. The OHSU dataset was recorded at a sampling rate of 200 Hz and an amplitude resolution of 4.88 μV. We have used only the health signals from 26 participants; for the remaining 2 participants, the ECG signals were missing at several time instants, so we have not used their data. In our in-house dataset, collected via the prototype VitalChair (which has sensors at different positions for recording signals from the seat occupant; details follow in Sect. 4.2), the synchronized Femoral Pulse (FP), Wrist Pulse (WP), and ECG signals are collected from 13 participants sitting at 7 different positions in a chair for 30 seconds. Among the participants, 4 are high school students, 6 are healthy functioning adults, and 3 are senior adults who have gone through heart surgeries in the past year. The system performs sensor fusion, analyzing the signal patterns to highlight potential anomalous patterns, if any. Tests include two scenarios: 1) the heart rate of a person in the ‘calm’ state, and 2) the ‘after exercise’ state, i.e., an excited state following 30 min of exercise.

To evaluate the performance of the proposed neural network model, which uses a compact feature descriptor derived from the fused peak list as input to predict the participant’s health condition, we use the publicly available Mendeley ECG 1000 Fragments Dataset [25] and our in-house dataset. The Mendeley dataset has data from 45 different patients in different health conditions, comprising 2 types of normal rhythms (a pacemaker rhythm and a normal sinus rhythm) and 15 types of cardiac dysfunctions, including atrial premature beat, atrial flutter, atrial fibrillation, supraventricular tachyarrhythmia, pre-excitation (WPW), premature ventricular contraction, ventricular bigeminy, ventricular trigeminy, and ventricular tachycardia. All signals are recorded at a sampling rate of 360 Hz and a gain of 2200 [adu/mV]. In our experiments, we use the above-mentioned 2 types of normal rhythm signals as our ‘normal’ class, which combined is referred to as Class 8, while each of the other classes is treated as a specific type of ‘anomalous’ class. The class population ratio between the two types of classes (i.e., ‘normal’ and ‘anomalous’) is highly skewed: the Class 8 population has size 14,000. So, we refrain from using any ‘anomalous’ class with fewer than 1,000 samples. Therefore, our derived dataset contains only samples from Class 8, forming the ‘normal’ class population, and from 7 different ‘anomalous’ classes. For our binary prediction module, we repeat the experiments several times. In each session, Class 8 is used as the ‘normal’ class and one of the remaining 7 classes is treated as the ‘anomalous’ class. Note also that the signals in this collection are sampled at a high rate and the ratio of anomalous to normal samples is still very low. Therefore, to further balance the class population in every experimental session, 50 sub-sampled signals (of length 500) are randomly selected from each full signal of nearly 3600 samples to form the larger training collection. To maintain the balance, we randomly select an equal-sized subset of sub-sampled normal signals to represent the Class 8 population.

Note that in the derived Mendeley dataset [25], there are 7 anomaly classes and 1 normal class (namely, Class 1: Ventricular Bigeminy, Class 2: Ventricular Trigeminy, Class 3: Supraventricular Tachyarrhythmia, Class 4: Atrial Fibrillation, Class 5: Left Bundle Branch Block Beat, Class 6: Atrial Premature Beat, Class 7: Premature Ventricular Contraction, and Class 8: Normal Sinus Rhythm), and the ratio of anomalous to normal samples is exceptionally low. For the training and testing of the neural network, 50 segments of 500 samples were randomly selected from every 3600 samples of the ECG signal. This subsampling ensures that the neural network trained on this data is not biased or overfitted due to the lack of anomaly-class data. Furthermore, subsampling allows the neural network to indicate a more fine-grained interval in which the anomalies are prevalent.
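The subsampling step can be sketched as follows. The function name, the fixed random seed, the placeholder record, and the choice to allow overlapping segments are assumptions made for illustration.

```python
import random

def subsample_segments(signal, n_segments=50, segment_len=500, seed=0):
    """Draw random contiguous segments (overlap allowed) from one record."""
    rng = random.Random(seed)
    last_start = len(signal) - segment_len
    starts = [rng.randint(0, last_start) for _ in range(n_segments)]
    return [signal[s:s + segment_len] for s in starts]

# Placeholder record of 3600 samples standing in for one ECG fragment.
record = list(range(3600))
segments = subsample_segments(record)
```

Each returned segment keeps its samples in their original order, so a prediction made on a segment can be traced back to a 500-sample interval of the original record.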

Fig. 4. Comparing the peak detection performance using the processed signal x against that achieved using the fused peak list obtained by combining the peaks from x and the cepstrum signal \(c_x\).

4.2 Prototype Implementation: VitalChair

The custom-built circuit of the VitalChair used in our real-life study consists of several capacitors, resistors, and photodiodes. It includes an Arduino UNO; a breadboard; a USB cable; power supplies; jumper wires (M/M, M/F); 1.0 MΩ/4.7 MΩ resistors; piezoelectric elements; DS18B20 1-wire waterproof temperature sensors; heart rate pulse sensors; and LEDs of different colors. To build the software module, we have used the Arduino IDE and Python. Multiple biological signals, including the Electro-Cardiogram (ECG), the Photoplethysmogram (PPG) from the wrist, and the Femoral Pulse (FP), are recorded using the corresponding sensors placed at different parts of the chair, as illustrated in Fig. 1. The resulting signal from each sensor is passed to an Arduino microcontroller attached to the bottom of the chair, which collects the readings from each sensor. Data were acquired on a server connected to the Arduino over USB and analyzed in real time using the Arduino software. The outputs of the sensors at different positions on the SmartChair are collected in a synchronized fashion for a comprehensive understanding of the seat occupant’s overall wellbeing.

Fig. 5. A custom-built prototype of VitalChair, where vital signals are represented as: WP = Wrist Pulse, FP = Femoral Pulse, ECG = Electro-Cardiogram signal, and Temp = Body Temperature

4.3 Performance Evaluation

Peak Detection Accuracy Metric. The first type of experiments, which evaluates the peak detection module of the proposed method, uses Accuracy as the evaluation metric. Given g as the number of peaks hand-picked by an independent evaluator and p as the number of system-identified peaks, we compute \(Accuracy=1-\frac{|p-g|}{g}\). The quantitative results obtained on the OHSU ECG signal dataset and our in-house dataset are reported in Fig. 3(a) and (b), respectively. As observed in Fig. 3(a), the average accuracy achieved by the proposed module over 28 participants is around \(94.18\%\). Specifically, for participant id 26, the accuracy (approx. \(75\%\)) is considerably lower than for the rest, which is due to data missing at several time instants that resulted in missing some significant peaks. The deteriorated peak detection performance propagates to the subsequent prediction module.
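The accuracy metric above amounts to a one-line computation, sketched below with hypothetical counts. The function name and the example values are illustrative only.

```python
def peak_count_accuracy(p, g):
    """Accuracy = 1 - |p - g| / g for p detected and g hand-picked peaks."""
    if g <= 0:
        raise ValueError("the reference peak count g must be positive")
    return 1.0 - abs(p - g) / g

# Hypothetical counts: 47 detected peaks against 50 hand-picked peaks.
print(peak_count_accuracy(47, 50))
```

Note that the metric compares peak counts only, so an extra peak and a missed peak within the same recording cancel each other out.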

Peak Detection Performance. In Fig. 3(b), we notice that the accuracy in the ‘calm’ state is usually greater than the accuracy in the ‘after exercise’ state. This is because, after exercise, an individual’s heart rate increases significantly, which causes many additional consecutive peak occurrences. The system perceives this extra flow of peaks as noise, and thus some of the peaks are not counted. The resulting missing peaks reduce the overall accuracy of the prediction module. However, this period of elevated heart rate lasts for only a couple of minutes, and the individual (if indeed ‘healthy’) quickly regains their normal heart rate. To mitigate this noise-impacted response, we pause the prediction task during the initial minute, so that an alert regarding the participant’s health condition is generated only if the seat has been occupied for more than a minute.

As observed in Fig. 4, combining peaks from the processed signal x and the cepstrum signal \(c_x\) has improved the peak detection performance of the proposed method by an average of \(3\%\). In fact, in several instances (such as participants 1, 8, 15, 22, and 23), the reported improvement is around 7–10%.

Fig. 6. The performance of the proposed prediction module on the Mendeley dataset, reported using class-specific ROC curves (in (a)) and the classification accuracy (in (b)), wherein Class 1: Ventricular Bigeminy, Class 2: Ventricular Trigeminy, Class 3: Supraventricular Tachyarrhythmia, Class 4: Atrial Fibrillation, Class 5: Left Bundle Branch Block Beat, Class 6: Atrial Premature Beat, and Class 7: Premature Ventricular Contraction, with AUC scores of 0.98, 0.98, 0.97, 0.95, 0.94, 0.95, and 0.92, respectively.

Anomaly Detection Performance Metrics. The Classification Accuracy and Sensitivity score are used as the compact evaluation metrics computed by relating FP (False Positives), FN (False Negatives), TP (True Positives) and TN (True Negatives) and defined as:

$$\begin{aligned} Classification\_Accuracy=\frac{100\%}{N}\sum ^N_{i=1}\frac{TP_i+TN_i}{TP_i+TN_i+FP_i+FN_i}, \end{aligned}$$
(4)
$$\begin{aligned} Sensitivity=\frac{100\%}{N}\sum ^N_{i=1}\frac{TP_i}{TP_i+FN_i}, \end{aligned}$$
(5)

where the sum runs over the folds of an N-fold cross-validated test process, with the counts evaluated on the i-th fold. In our experiments, we have used \(N=5\). We also report the performance using the Area Under the Receiver Operating Characteristic Curve, known as the AUC score [68]. This metric is important because the population distribution across the different classes varies widely. While Classification_Accuracy (or Sensitivity) summarizes the overall performance at a given experimental parameter setting, the AUC metric provides greater insight into the class-specific performance of the proposed method. The Sensitivity score is also used to report the comparative performance of the proposed method.
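As a minimal sketch, Eqs. (4) and (5) can be computed from per-fold confusion counts as follows. The container `FoldCounts`, the two function names, and the counts in the usage note are illustrative assumptions; they are not values or code from our experiments.

```python
from typing import Iterable, NamedTuple


class FoldCounts(NamedTuple):
    """Confusion counts obtained on one cross-validation fold."""

    tp: int
    tn: int
    fp: int
    fn: int


def classification_accuracy(folds: Iterable[FoldCounts]) -> float:
    """Eq. (4): per-fold accuracy averaged over the folds, in percent."""
    folds = list(folds)
    per_fold = [(f.tp + f.tn) / (f.tp + f.tn + f.fp + f.fn) for f in folds]
    return 100.0 * sum(per_fold) / len(folds)


def sensitivity(folds: Iterable[FoldCounts]) -> float:
    """Eq. (5): per-fold TP / (TP + FN) averaged over the folds, in percent."""
    folds = list(folds)
    per_fold = [f.tp / (f.tp + f.fn) for f in folds]
    return 100.0 * sum(per_fold) / len(folds)
```

For example, two hypothetical folds with counts (TP, TN, FP, FN) of (40, 40, 10, 10) and (30, 50, 10, 10) have per-fold accuracies of 0.80 and 0.80 and per-fold sensitivities of 0.80 and 0.75, so the averaged scores are 80% and 77.5%.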

Anomaly Detection Performance. Figure 6 reports the performance of the proposed method using ROC curves, AUC scores (the total area under each ROC curve), and Classification Accuracy as the evaluation metrics [69]. As seen in the figure, the average performance over all seven anomalous classes is around \(95.29\%\). The performance on Class 5 is lower, at approximately \(87.22\%\), which is primarily attributed to the sparse signals (which also have missing ECG values) obtained from the participants.
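For reference, a class-specific (one-vs-rest) AUC score can be sketched using the standard rank interpretation of AUC: the probability that a randomly chosen sample of the class receives a higher score than a randomly chosen sample of any other class, with ties counted as one half. The function name and its inputs are illustrative assumptions; this is not the evaluation code used to produce Fig. 6.

```python
from typing import Sequence


def auc_one_vs_rest(
    scores: Sequence[float], labels: Sequence[int], positive_class: int
) -> float:
    """AUC of `positive_class` against all other classes.

    `scores[k]` is the classifier's score for `positive_class` on sample k,
    and `labels[k]` is the true class of sample k.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive_class]
    neg = [s for s, y in zip(scores, labels) if y != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs that are ranked correctly (ties count 0.5).
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a sorting-based rank computation gives the same value more efficiently on large datasets.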

4.4 Comparative Study

The performance of the proposed method is compared against that of several methods reported in [25], and the result is reported in Table 1. To attain an equivalent experimental setting for this experiment, we combine all 17 cardiac disorders into a single anomaly class, while the healthy signals form the second class. As seen in the table, the proposed method improves the Sensitivity score by \(3\%\) over the compared methods. An equivalent experiment is also performed using only 8 classes; as shown in the table, the proposed method attains a performance gain of \(2.5\%\) over the best result reported on the dataset.
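The binary setting described above amounts to relabeling every disorder class as a single anomaly class before scoring. A minimal sketch follows; the label strings (for example `"healthy"`) and the function names are hypothetical and do not reflect the actual label names in the dataset.

```python
from typing import Iterable, List, Sequence

# Assumption: the healthy class is identified by this (hypothetical) label.
HEALTHY_LABEL = "healthy"


def to_binary(labels: Iterable[str], healthy_label: str = HEALTHY_LABEL) -> List[int]:
    """Map the healthy label to 0 and every cardiac disorder label to 1."""
    return [0 if label == healthy_label else 1 for label in labels]


def binary_sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true anomalies (label 1) that are predicted as anomalies."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no anomalous samples in y_true")
    return tp / (tp + fn)
```

Applying `to_binary` to both the ground-truth and the predicted labels lets a multi-class model be scored on the two-class task without retraining.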

Table 1. A comparative study on the binary classification task performed using the Mendeley ECG 1000 Fragments Dataset [25].

In this scenario, it is also important to note that the performance varies from class to class (please refer to Fig. 6(b)). Such a coarse-level overall evaluation may therefore not give sufficient insight into the effectiveness of the proposed model. For example, accurately identifying samples from Class 5 is harder than identifying those from Class 6. Moreover, the above-mentioned paper obtains its accuracy on signals acquired in an obtrusive manner, which significantly reduces the amount of noise corrupting the signal. Therefore, the applicability of these methods is limited in various real-life environments, where signals can only be received in an unobtrusive manner. In contrast, the proposed method is generic, sufficiently robust to handle noisy signal inputs, and allows for sequential learning through a continual signal capturing process, which makes it more effective and efficient. As described earlier, to further investigate the robustness of the method, we perform experiments in a real-life environment by deploying the software in the SmartChair, which collects signals from sensors placed at different parts of the chair. The method is also evaluated at different levels of stress and physical activity of the participants to investigate the efficacy of the signal filtering and feature extraction methods.

5 Conclusion

In this paper, we have presented a framework that accurately classifies multiple input signals, such as ECG, PPG, and Femoral Pulse, from a specific individual into two categories: healthy or unhealthy. By continuously monitoring the patient’s vital signals, this system can have several life-changing effects, including the ability to identify pathological conditions before they turn into a serious threat to the person’s life or require severe measures, such as amputations, to cure them. To demonstrate the real-life feasibility of the proposed model, we have physically implemented this framework in a chair. However, the proposed method is sufficiently generic to be deployed in other frequently used furniture items, such as beds and sofas. We plan to extend this work to include other means of extracting pathological information, such as vocal signals, and to synchronize them all into a smart home system that can accurately classify the individual’s disease-specific pathological condition using multi-modal information.