Introduction

Monitoring fetal cardiac activity may allow screening of fetal well-being through analysis of the fetal heart rate (FHR) and of the morphology of the fetal electrocardiogram (FECG) waveform. For example, shortening of the fetal QT interval has been associated with intrapartum hypoxia resulting in metabolic acidosis.25 Doppler ultrasound is routinely used for measuring the FHR during pregnancy and delivery,21 even though it has not been demonstrated that ultrasound use is fully safe for the fetus.3 Moreover, ultrasound is less accurate than the FECG for tracking the FHR.12

The FECG can be recorded in two ways: through an electrode attached to the fetal scalp while the cervix is dilated (during delivery), or non-invasively through electrodes attached to the mother’s abdomen. Figure 1 shows an example of a maternal electrocardiogram (MECG), scalp FECG and an abdominal electrocardiogram (AECG) recorded simultaneously. The current primary use of the FECG is for FHR analysis during delivery. Morphological analysis of the FECG waveform is usually not performed, with the exception of the STAN monitor (Neoventa Medical, Goteborg, Sweden), which attempts to identify ST segment deviation through a proxy measure (the T/R amplitude ratio). However, this technology is invasive, can only be used during labour and relies on a single electrode (on the fetal scalp), which does not capture the three-dimensional electrical field emanating from the fetal heart. Conversely, non-invasive FECG (NI-FECG) extraction can theoretically be performed at almost any point in the pregnancy and with multiple electrodes, as long as the fetal cardiac dipole is large enough and the vernix caseosa layer does not insulate the FECG signal too strongly during the third trimester. This motivates research into NI-FECG extraction from the AECG, which exhibits a mixture of the MECG, noise and the FECG.

Figure 1

Example of (a) maternal chest ECG, (b) fetal scalp ECG and (c) abdominal ECG. Note that the abdominal ECG contains a mixture of both maternal and fetal ECG

Despite the rich literature on adult ECG, the significant advances in signal processing and the increased computational capabilities of digital processors, clinically useful extraction of the FECG from the mixture of abdominal signals is still a nascent field.30 This is due to the relatively low signal-to-noise ratio (SNR) of the FECG compared to the MECG, as well as the limited clinical knowledge on how fetal cardiac function and development map to changes in the FECG.

The primary feature that any algorithm needs to extract from the AECG signal mixture is the fetal QRS complex (FQRS) location. This peak detection is used for computing the FHR, detecting rhythm abnormalities, or as an anchor point for extracting features from the FECG waveform. Ascertaining the location of the FQRS is simplified by first separating the FECG from the AECG, and several approaches have previously been applied. These include principal component analysis (PCA),19 independent component analysis (ICA)7,16 and periodic component analysis (\(\pi \)CA),31 which makes use of the ECG’s periodicity. In essence, these approaches are a form of blind (or, in the case of \(\pi \)CA, semi-blind) source separation, which aims to separate the underlying statistically independent sources into three categories: MECG, FECG and noise. A key assumption of these methods is that of a linear stationary mixing matrix between these sources. Although the stationarity of the ICA mixing matrix could be unrealistic over long recordings, in practice the matrix can be regenerated for successive short time periods. The original signals are projected into the “source” domain, where the channels representing the MECG and noise can be canceled. The resulting back-projected signals should primarily consist of FECG components. Other techniques which operate in lower dimensions include adaptive filtering,36 template subtraction (TS)8,24,33 and Kalman filtering (KF).29 See Sameni and Clifford30 for an overview of these methods. Despite many interesting theoretical frameworks, the robustness of most of these methods has not been sufficiently quantitatively evaluated. This is mainly due to two factors: (1) the lack of gold standard databases with expert annotations and (2) the underdeveloped methodology for assessing the algorithms.

The echo state neural network (ESN), which belongs to the family of adaptive filtering techniques,17 was first used in the context of ECG processing by Petrenas et al. 27 for QRST cancellation during atrial fibrillation. In the present work, an FQRS detection method based on the ESN is introduced and compared to the following baseline techniques: the least mean squares (LMS) adaptive filter,36 the recursive least squares (RLS) adaptive filter36 and TS.8,24,33 The work presented here focuses on the relative performance of these single-channel time-based techniques for extracting the FQRS, in contrast to Blind Source Separation (BSS) techniques, which can be considered as spatially based. Focusing on single abdominal channel extraction techniques (i.e., techniques that require only one abdominal channel, with or without a reference chest channel) would enable the production of low-cost, easy-to-use devices for NI-FECG monitoring. In particular, it is worth noting that the methods are not compared against BSS-based approaches, which require many abdominal channels to be recorded simultaneously. The details for tuning the algorithms’ global parameters and assessing their performance are discussed, and a particular focus is given to the effect of the preprocessing step on the AECG mixture, where preprocessing refers to any signal filtering prior to applying a given FECG extraction algorithm.

Materials and Methods

Template Subtraction

Four variants of the TS method were implemented and evaluated.8,19,24,33 They are denoted TS\(_\mathrm{c}\),8 TS\(_\mathrm{m}\),24 TS\(_\mathrm{pca}\)19 and TS\(_\mathrm{lp}\).33 An MECG template cycle was built, centered on the maternal R-peak, considering a duration of 0.20 seconds (s) for the P wave, 0.10 s for the QRS complex and 0.40 s for the T wave. In the case of TS\(_\mathrm{c}\) the average MECG complex \(\underline{\mathbf {t}}\) was scaled for each individual MECG cycle with a constant \(a\) in order to reduce the mismatch between the template cycle and the MECG complex \(\underline{\mathbf {m}}.\) The scaling constant \(a\) was found by minimizing the mean square error (MSE) (\(e^{2}\)) between \(\underline{\mathbf {m}}\) and \(\underline{\mathbf {t}}\) (i.e., solving \(\mathop{\text{argmin}}\limits_{a}(e^{2}) = \mathop{\text{argmin}}\limits_{a}(||\underline{\mathbf {t}} a - \underline{\mathbf {m}}||^{2})\)).24 TS\(_\mathrm{m}\) is similar to TS\(_\mathrm{c}\) except that three scaling constants are estimated for each individual cycle (one for each of the P, QRS and T waves). TS\(_\mathrm{lp}\) was implemented as described in Vullings et al. 35: the template ECG was built by weighting the seven previous cycles, with the weights selected to minimize the MSE. This is in contrast to the other TS methods, where the cycles contributing to the template are weighted equally. In order for the methods to adapt to the non-stationary MECG morphology, the template was updated with incoming cycles.
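For illustration, a minimal sketch of the TS\(_\mathrm{c}\) variant is given below, assuming the maternal R-peak sample locations are already available. The window lengths follow the description above, but this is not the implementation used in this work and, for brevity, it uses a fixed average template rather than the cycle-by-cycle template update.

```python
import numpy as np

def ts_c(aecg, mqrs, fs=250, pre=0.25, post=0.45):
    """Template subtraction with a single per-cycle scaling constant (TS_c).

    aecg : abdominal ECG (1-D array), mqrs : maternal R-peak sample indices,
    fs   : sampling frequency in Hz. The window [R - pre, R + post] roughly
    covers the 0.20 s P wave, 0.10 s QRS and 0.40 s T wave mentioned above.
    """
    n_pre, n_post = int(pre * fs), int(post * fs)
    residual = aecg.copy()

    # Stack all complete maternal cycles to build the average MECG template.
    cycles = np.array([aecg[r - n_pre:r + n_post] for r in mqrs
                       if r - n_pre >= 0 and r + n_post <= len(aecg)])
    template = cycles.mean(axis=0)

    # Scale the template to each individual cycle in the least-squares sense:
    # argmin_a ||a*t - m||^2  =>  a = <t, m> / <t, t>.
    for r in mqrs:
        if r - n_pre < 0 or r + n_post > len(aecg):
            continue
        m = aecg[r - n_pre:r + n_post]
        a = np.dot(template, m) / np.dot(template, template)
        residual[r - n_pre:r + n_post] = m - a * template
    return residual  # contains the FECG plus residual noise
```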

PCA aims to identify a meaningful orthonormal basis to re-express a given dataset. It can be used for dimensionality reduction, source separation and visualization. In this work, TS\(_\mathrm{pca}\) was implemented as described by Kanjilal et al. 19 and the first two principal components computed on the aligned ECG cycles were kept. In order for the method to be adaptive, the PCA basis was updated every 10 AECG cycles.
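A corresponding sketch of the TS\(_\mathrm{pca}\) idea is shown below, assuming the AECG cycles have already been segmented and aligned on the maternal R-peak; the singular value decomposition is used here in place of any particular PCA routine, and the block-wise basis update every 10 cycles is omitted.

```python
import numpy as np

def ts_pca(cycles):
    """Remove the MECG contribution from aligned abdominal cycles (TS_pca).

    cycles : (n_cycles, n_samples) matrix of AECG segments aligned on the
             maternal R-peak. Returns the residual cycles (FECG + noise).
    """
    mean = cycles.mean(axis=0)
    centred = cycles - mean
    # SVD of the centred data; the rows of vt are the principal components.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:2]                       # keep the first two components
    # Project each cycle onto the PCA basis and subtract the reconstruction,
    # which captures the dominant (maternal) morphology.
    mecg_estimate = centred @ basis.T @ basis + mean
    return cycles - mecg_estimate
```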

The LMS and RLS Adaptive Filter

The classical method for removing noise from a corrupted signal is to pass it through a filter. The filter can be fixed (i.e., its transfer function is constant) or adaptive. In the context of FECG extraction, adaptive noise cancelation36 is commonly used to suppress noise from the mixture of signals in the AECG. The AECG \(y(n)\) is treated as the sum of the signal of interest (the FECG, \(s(n)\)) and noise (\(\eta (n)\)), i.e., \(y(n) = s(n) + \eta (n)\). \(\eta (n)\) corresponds to the combination of the MECG, other physiological signals such as muscle noise, and artifacts such as movement. As the signal recorded on the chest does not contain an FECG component due to its location, it serves as an observation of the noise field and as the reference input for noise canceling. The abdominal noise \(\eta (n)\) is adaptively removed by a filter whose coefficients \(\underline{\mathbf {w}}=[w_{1},\ldots ,w_{N}]\) form a finite impulse response filter, with \(N\) being the number of coefficients or weights that are recursively updated in order to minimise an error signal \(e(n)\). Thus the goal of the LMS, RLS and ESN methods is to learn a model with input \(u(n)\) and output \(\hat{\eta }(n)\), where \(\hat{\eta }(n)\) matches the target signal \(y(n)\) as closely as possible in the least MSE sense. Subtracting \(\hat{\eta }(n)\) from the abdominal mixture results in the suppression of the most disrupting source of noise: the MECG. This process is represented in Fig. 2.

Figure 2

Adaptive noise canceling block diagram in the case of one reference input \(u(n)\). On the diagram: the FECG \(s(n)\), the noise \(\eta (n)\), the abdominal ECG \(y(n) = s(n) + \eta (n)\), the chest signal \(u(n)\), the estimated noise \(\hat{\eta }(n)\), the estimation error \(e(n)\) and the output signal \(\hat{s}(n)\). \(n\) corresponds to a time index

The LMS Adaptive Filter

The LMS adaptive filter applied to NI-FECG extraction was first published by Widrow et al. 36 but only qualitative results were demonstrated. LMS is used to find filter coefficients that minimise the MSE \(e^{2}(n)\) between the filter output \(\hat{\eta }(n)\) and the desired response or target \(y(n)\). Let \(\underline{\mathbf {u}}(n)=[u_{1}(n-N+1),\ldots , u_{1}(n)]^{T},\) \( \forall \; n > N\) be a segment of the input signal (the last \(N\) input samples), \(\underline{\mathbf {w}}(n) = [w_{1}(n),\ldots ,w_{N}(n)]\) be the filter weights, and \(e(n) = y(n)-\underline{\mathbf {w}}^{T} \underline{\mathbf {u}}(n)\) the error at each step. The optimal weight vector \(\underline{\mathbf {w}}_\mathrm{o}\), also called the Wiener weight vector, is given by \(\underline{\mathbf {w}}_\mathrm{o}=\varvec{R}^{-1} \varvec{P}\), where \(\varvec{R}\) is the input correlation matrix and \(\varvec{P}\) is the cross correlation between the desired response \(y(n)\) and \(\underline{\mathbf {u}}(n)\). The LMS algorithm estimates the optimum filter weights that minimise the MSE by following the gradient of the MSE at each step. The weight update equation is given by \(\underline{\mathbf {w}}(n+1) = \underline{\mathbf {w}}(n)-\frac{\mu }{2} \nabla E[e^{2}(n)],\) where \(E[e^{2}(n)]\) is the expected value of the MSE and \(\mu \) is the step size that controls the stability and convergence rate. By approximating the expectation \(E[e^{2}(n)]\) with the instantaneous squared error \(e^{2}(n)\), this equation becomes \(\underline{\mathbf {w}}(n+1)=\underline{\mathbf {w}}(n)+\mu e(n) \underline{\mathbf {u}}(n)\). The key LMS adaptive algorithm steps can be summarized as:

$$\begin{aligned} {\hat{\eta }}(n)&= \underline{\mathbf {w}}^{T}(n-1) \underline{\mathbf {u}}(n) \end{aligned}$$
(1)
$$\begin{aligned} e(n)&= y(n)- {\hat{\eta }}(n) \end{aligned}$$
(2)
$$\begin{aligned} \underline{\mathbf {w}}(n)&= \underline{\mathbf {w}}(n-1) + \mu e(n) \underline{\mathbf {u}}(n) \end{aligned}$$
(3)

where (1) gives the filter prediction, (2) is used to evaluate the error and (3) is used to update the filter weights at each sample \(n\). Note that there are two parameters to set in order to design the adaptive filter: the filter length \(N\), and the step size \(\mu \). Both these parameters must be optimized on a training set. In the context of this work, usage of the LMS technique implicitly assumes that there is a linear relationship between the maternal waveform recorded on the chest and on the abdomen.
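Steps (1)–(3) translate almost directly into code. The following is a generic sketch of an LMS noise canceller, not the implementation used in this work; the chest and abdominal signals are assumed to be preprocessed, synchronized and of equal length.

```python
import numpy as np

def lms_cancel(chest, abdomen, N=20, mu=0.1):
    """LMS adaptive noise cancellation of the MECG.

    chest   : reference input u(n) (maternal chest ECG)
    abdomen : target y(n) (abdominal ECG)
    N, mu   : filter length and step size
    Returns the residual s_hat(n) = y(n) - eta_hat(n), containing the FECG.
    """
    w = np.zeros(N)
    s_hat = np.zeros(len(abdomen))
    for n in range(N, len(abdomen)):
        u = chest[n - N + 1:n + 1][::-1]     # last N reference samples
        eta_hat = np.dot(w, u)               # (1) filter prediction
        e = abdomen[n] - eta_hat             # (2) error = residual sample
        w = w + mu * e * u                   # (3) weight update
        s_hat[n] = e
    return s_hat
```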

The RLS Adaptive Filter

The RLS algorithm minimises the weighted sum of squared errors between the filter output and the target (desired) signal. In contrast to the LMS, which only considers the current error value to adapt its coefficients, the RLS considers the total error from the beginning of the signal to the incoming data point. The forgetting factor \(\lambda \in [0 \; 1]\) defines what proportion of past data contributes to the filter coefficient update. In the extreme case of \(\lambda =1\) all past data contribute equally, and as \(\lambda \) approaches zero only the most recent data points play a role. This translates into finding the filter coefficients such that the following “loss-function” \(\epsilon (n)\) is minimized:

$$\begin{aligned} \epsilon (n) = \sum _{i=1}^{n} \beta (n,i) e(i)^{2} = \sum _{i=1}^{n} \beta (n,i) [y(i) - \hat{\eta }(n,i)]^{2} \end{aligned}$$

with \(\hat{\eta }(n,i) = \underline{\mathbf {w}}^{T}(n) \underline{\mathbf {u}}(i)\) and where \(\beta (n,i) = \lambda ^{n-i}\) in the case of the exponentially weighted least squares solution. The RLS algorithm updates the filter coefficients at each iteration2,34 as follows:

$$\begin{aligned} e(n)&= y(n) - \underline{\mathbf {w}}^{T}(n-1) \underline{\mathbf {u}}(n)\\ \underline{\mathbf {k}}(n)&= \frac{\mathbf {P}(n-1) \underline{\mathbf {u}}(n)}{\lambda + \underline{\mathbf {u}}(n)^{T} \mathbf {P}(n-1) \underline{\mathbf {u}}(n)}\\ \mathbf {P}(n)&= (\mathbf {I} - \underline{\mathbf {k}}(n) \; \underline{\mathbf {u}}(n)^{T}) \mathbf {P}(n-1) \frac{1}{\lambda } \\ \underline{\mathbf {w}}(n)&= \underline{\mathbf {w}}(n-1) + \underline{\mathbf {k}}(n) e(n), \end{aligned}$$

where \(\mathbf {P}\) is the inverse correlation (covariance) matrix estimate and \(\mathbf {I}\) is the identity matrix. There are two important parameters in the RLS: the forgetting factor \(\lambda \) and the number of filter coefficients \(N\). RLS tends to converge faster than LMS and is usually more accurate. However, this comes at the price of a higher computational complexity. In the context of this work, and similar to the LMS approach, RLS assumes that a linear relationship exists between the maternal waveform recorded on the chest and on the abdomen.
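For completeness, a sketch of the exponentially weighted RLS canceller defined by the equations above is given below; the initialization of \(\mathbf {P}\) as a scaled identity matrix is a standard choice assumed here rather than taken from the text.

```python
import numpy as np

def rls_cancel(chest, abdomen, N=20, lam=0.999, delta=1e3):
    """RLS adaptive noise cancellation of the MECG.

    chest, abdomen : reference u(n) and target y(n) signals
    N, lam         : filter length and forgetting factor
    delta          : initial scaling of the inverse correlation matrix P
    """
    w = np.zeros(N)
    P = delta * np.eye(N)            # standard initialization P(0) = delta * I
    s_hat = np.zeros(len(abdomen))
    for n in range(N, len(abdomen)):
        u = chest[n - N + 1:n + 1][::-1]
        e = abdomen[n] - w @ u                      # a priori error
        k = (P @ u) / (lam + u @ P @ u)             # gain vector
        P = (P - np.outer(k, u @ P)) / lam          # inverse correlation update
        w = w + k * e                               # coefficient update
        s_hat[n] = e
    return s_hat
```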

The ESN

Recurrent neural networks (RNN) are a class of neural networks capable of non-linear modeling of dynamical systems. This is made possible by recurrent connections between neurons, visualized as cycles in the network topology, that allow processing of temporal dependencies.9 However, parameter estimation of the RNNs has proven to be a difficult task. Indeed, optimization methods that were originally used for training feedforward neural networks, such as the error backpropagation algorithm, do not generally perform as well for training RNNs.23 The ESN17 is a recently introduced approach to RNN training, with the RNN (or reservoir) being generated randomly. The reservoir is then fixed and only the weights of the output neurons are learnt and updated using online or offline linear regression. This method outperformed classic fully trained RNNs in many tasks.23 The ESN is introduced in the context of noise canceling as a non-linear medium into which the reference signals propagate before the “echo response” given by the network reservoir is weighted by a readout layer.

In the configuration presented here, the MECG recorded on the chest is projected onto a set of non-orthogonal basis functions through the ESN reservoir, comparable to the kernel in kernel learning approaches. The input signal(s) (chest ECG) drive the nonlinear reservoir resulting in a high-dimensional dynamical “echo response.”9 The reservoir also acts as a memory of the input signal thus providing temporal context.22 In a second step an adaptation algorithm is used to compute the weights of the output neurons. The RLS algorithm was used for this step. This readout layer maps the reservoir states to the output: the observed abdominal AECG.

For \(K\) input units, \(M\) internal units and \(L\) output units: \(\underline{\mathbf {u}}(n) = [u_{1}(n),\ldots ,u_{K}(n)]\), \(\underline{\mathbf {x}}(n) = [x_{1}(n),\ldots ,x_{M}(n)]\), \(\hat{\underline{\varvec{\eta }}}(n) = [\hat{\eta }_{1}(n),\ldots ,\hat{\eta }_{L}(n)]\), where \(\underline{\mathbf {x}}(n)\) is the reservoir state vector, \(\underline{\mathbf {u}}(n)\) is the vector of input signals and \(\underline{\hat{\mathbf {\eta }}}(n)\) is the vector of output signals. We define the extended system state as \(\underline{\mathbf {z}}(n) = [\underline{\mathbf {x}}(n)\;\underline{\mathbf {u}}(n)].\) The activation of internal units is updated using the following equation:

$$\begin{aligned} \underline{\mathbf {x}}(n+1) = \underline{\mathbf {x}}(n) + f(\mathbf {W} \underline{\mathbf {x}}(n) + \mathbf {W}_{i} \underline{\mathbf {u}}(n+1) + \mathbf {W}_\mathrm{b} \hat{\underline{\varvec{\eta }}}(n)), \end{aligned}$$

where \(\mathbf {W}\) \(\in {\mathfrak{R}}^{M \times M}\) is the reservoir weight matrix and \(\mathbf {W}_{i}\) \(\in {\mathfrak{R}}^{M \times K}\) is the input weight matrix (randomly generated and fixed). \(\mathbf {W}_\mathrm{b}\) \(\in {\mathfrak{R}}^{M \times L}\) is the back projection weight matrix and \(f\) is the reservoir neuron activation function, taken to be the hyperbolic tangent. Considering a purely input-driven dynamical pattern recognition task, the system is simplified by setting \(\mathbf {W}_\mathrm{b}=0\). The output is computed as:

$$\begin{aligned} \hat{\underline{\varvec{\eta }}}(n+1) = g(\underline{\mathbf {w}}_\mathrm{o}(n) \underline{\mathbf {z}}(n)), \end{aligned}$$

where \(g\) is the output neuron activation function (taken to be identity) and \(\underline{\mathbf {w}}_\mathrm{o}\) are the output weights which may be adaptive or fixed. An ESN with a leaky integrator neuron model18 was used. This satisfies:

$$\begin{aligned} \underline{\mathbf {x}}(n+1) = (1-\alpha ) \underline{\mathbf {x}}(n) + f(\mathbf {W} \underline{\mathbf {x}}(n) + \mathbf {W}_{i} \underline{\mathbf {u}}(n+1)), \end{aligned}$$
(4)

with \(\alpha \in [0 \; 1]\) being the leakage rate or forgetting factor. In the case of \(\alpha = 1\), the neurons do not retain any information about their previous state. Initial weights were generated from the uniform distribution on the interval \([-1 \; 1]\) for \(\mathbf {W}_{i}\) and \(\underline{\mathbf {w}}_\mathrm{o}\). \(\mathbf {W}\) is a random \(M \times M\) sparse matrix with approximately \(\psi \times M \times M\) uniformly distributed non-zero entries (\(\psi \in [0 \; 1]\) is the sparsity of the reservoir). Figure 3 shows a representation of the ESN based FECG extraction algorithm. In the figure, one chest signal \(u(n)\) is used as the input of the ESN and the abdominal channel is used as the target signal. The reservoir and input weights are randomly initialized once and the generated network is used independently for each available abdominal channel of each individual record. The predicted signal \(\hat{\eta }(n)\) is then subtracted from the \(j\)th abdominal signal \(y_{j}(n)\), giving the residual signal \(\hat{s}(n)\) containing the FECG. Using the ESN for NI-FECG extraction allows for a non-linear relationship between the maternal waveform as recorded on the chest and as recorded on the abdomen.
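A minimal sketch of the resulting single-channel ESN canceller is given below, assuming the fixed random matrices \(\mathbf {W}\) and \(\mathbf {W}_{i}\) have already been generated and scaled (a construction sketch follows the global parameter discussion below). The readout on the extended state is trained online with a plain exponentially weighted RLS, as for the adaptive variant; note that, as discussed later, this naive readout update can become unstable over long records.

```python
import numpy as np

def esn_cancel(chest, abdomen, W, W_i, alpha=0.4, lam=0.999, delta=1e3):
    """ESN-based MECG cancellation on a single abdominal channel.

    chest, abdomen : normalized chest (input) and abdominal (target) signals
    W              : (M, M) reservoir weight matrix (fixed, random)
    W_i            : (M,) input weight vector for the single chest input
    alpha          : leakage rate of the leaky-integrator neurons (Eq. 4)
    The readout weights act on z(n) = [x(n); u(n)] and are updated online
    with an exponentially weighted RLS (lam, delta).
    """
    M = W.shape[0]
    x = np.zeros(M)
    w_o = np.zeros(M + 1)            # readout weights on the extended state
    P = delta * np.eye(M + 1)
    s_hat = np.zeros(len(abdomen))
    for n in range(len(abdomen)):
        # Leaky-integrator reservoir update (Eq. 4), tanh activation.
        x = (1 - alpha) * x + np.tanh(W @ x + W_i * chest[n])
        z = np.append(x, chest[n])                  # extended state z(n)
        eta_hat = w_o @ z                           # readout prediction
        e = abdomen[n] - eta_hat                    # residual = FECG + noise
        k = (P @ z) / (lam + z @ P @ z)             # RLS readout update
        P = (P - np.outer(k, z @ P)) / lam
        w_o = w_o + k * e
        s_hat[n] = e
    return s_hat
```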

Figure 3

The ESN based FECG extraction algorithm showing the relationship between the chest signal \(u(n)\), predicted signal \(\hat{\eta }(n)\), the \(j\)th abdominal ECG signal \(y(n)\), and the residual signal \(\hat{s}(n)\) containing the FECG. Dashed lines represent adaptive weights. Image of the woman is adapted from Zaunseder et al. 37

There are a number of global parameters that have an important influence on the algorithm’s performance. The three main global parameters of the ESN are22: (1) the input scaling \(\gamma \), (2) the spectral radius \(\rho \) and (3) the leakage rate. The spectral radius determines how fast the influence of an input fades in the reservoir over time: choosing a small \(\rho \) means that the output depends more on recent history.18 It also affects how stable the reservoir activations are (see Eq. (4)). \(\mathbf {W}\) is first divided by \(\rho (\mathbf {W})\) (i.e., its largest absolute eigenvalue) so that it has unit spectral radius. Next, \(\mathbf {W}\) is scaled by \(\rho \). The input scaling determines the degree of non-linearity of the reservoir responses. Normalizing the input signal so that it lies in the range [\(-1\) 1] plays a similar role to scaling \(\mathbf {W}_{i}\) (see Eq. (4)). Table 1 summarises the ESN parameters.
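A sketch of the reservoir construction described above (sparse random \(\mathbf {W}\) rescaled to unit spectral radius and then multiplied by \(\rho \), input weights drawn from \([-1 \; 1]\) and scaled by \(\gamma \)) is given below for a single chest input channel; the random number generation details are an assumption of this illustration.

```python
import numpy as np

def build_reservoir(M=90, rho=0.4, gamma=1.0, psi=0.1, seed=0):
    """Generate the fixed random ESN matrices for a single input channel.

    M     : number of reservoir neurons
    rho   : target spectral radius of W
    gamma : input scaling applied to W_i
    psi   : reservoir sparsity (fraction of non-zero entries of W)
    """
    rng = np.random.default_rng(seed)
    # Sparse reservoir matrix with approximately psi*M*M non-zero entries
    # drawn from the uniform distribution on [-1, 1].
    W = rng.uniform(-1, 1, size=(M, M))
    W[rng.random((M, M)) > psi] = 0.0
    # Rescale to unit spectral radius (largest absolute eigenvalue),
    # then scale by the desired spectral radius rho.
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    # Input weight vector drawn from U(-1, 1) and scaled by gamma.
    W_i = gamma * rng.uniform(-1, 1, size=M)
    return W, W_i
```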

Table 1 Parameters of the ESN, LMS and RLS algorithms

The weights can be allowed to evolve (adaptive filtering) or can be fixed (initialized on a sub-segment of the signal and kept constant). Both options were explored for the ESN approach; these are denoted ESN\(_\mathrm{a}\) for the adaptive approach (i.e., weights are updated online) and ESN\(_\mathrm{na}\) for the nonadaptive approach (i.e., weights are determined on some initial training data and kept constant). This allows for assessment of whether online updating of the filter coefficients improves the filter performance.

Databases and Evaluation Protocol

Databases

Two databases were used in this study. The first was the Physionet non-invasive FECG database (PNIFECGDB)14 consisting of 55 multichannel AECG recordings taken from a single subject between 21 and 40 weeks of fetal gestational age. Each record consisted of two chest channels and 3–4 abdominal channels with different electrode configurations (electrode position was varied in order to improve the SNR). All signals were sampled at 1 kHz with 16-bit resolution. A bandpass filter (0.01–100 Hz) and a notch filter (50 Hz) were applied to the data during acquisition. A total of 14 records were manually selected where the FQRS complexes were visible on at least one channel. The gestational age for these records ranged from 22 to 40 weeks. One minute of signal, 30 s after the start of the record, was extracted for all available channels for each record.Footnote 1 A total of 2148 FQRS were manually annotated by the first author (with three years of FECG analysis experience) using the channel where the FQRS appeared to be the most visible for each record. These markers were considered to be the reference. For consistency, only the first three abdominal channels were kept so that all records had two chest and three abdominal channels. The three abdominal channels were considered independently for FECG extraction thus providing \(3 \times 14 = 42\) min of annotated data (i.e., \(3 \times 2148 = 6444\) reference FQRS). Data were of good quality with variable FECG/MECG SNR and some minor artifacts which were not manually discarded. This first database was denoted DB\(_{1}\).

The second database consisted of a subset of records from a private commercial database.Footnote 2 Each record consisted of 28 abdominal channels, one maternal chest channel, and the invasive scalp FECG signal. All signals were sampled at 1 kHz with 16-bit resolution. The chest channel as well as a subset of four abdominal channels where a FECG trace was visible were manually selected for processing. A total of eleven 5 min records from 8 pregnant women were used. An energy QRS detector based upon that of Pan and Tompkins (P&T)26 was applied to the scalp electrode and the corresponding markers were used as the reference. All records’ abdominal channels were considered independently for FECG extraction thus providing \(11 \times 5 \times 4 = 3\) h and 40 min of annotated data (overall \(4 \times 7752\) = 31,008 reference FQRS annotations). Some artifacts were present in a few records and the FECG/MECG SNR was variable. This second database was denoted DB\(_{2}\).

All ECG signals were downsampled to 250 Hz with an anti-aliasing filter prior to running the extraction algorithms and tuning the parameters. In the following work, DB\(_{1}\) constitutes the training database and DB\(_{2}\) the test database. Summary statistics for the fetal and maternal HR in DB\(_{2}\) are reported in Table 2. Note that the data in the training set (DB\(_{1}\)) and the test set (DB\(_{2}\)) were recorded with different hardware, following separate protocols, and at different stages of pregnancy for different subjects. This allows for assessment of whether the algorithms are sufficiently flexible to work with data that have similar, but not equivalent, recording configurations. It is particularly pertinent to assess the adaptability of the extraction methods to an unseen set of signals from distinct subjects, as there are many free parameters which are tuned on the training set. Furthermore, the PNIFECGDB was used as the training set because its manually annotated reference FQRS fiducial markers were not as accurate as those of the private database, which had a simultaneous fetal scalp signal.

Table 2 HR reference statistics for DB\(_{2}\) records

Because only one chest channel was available in DB\(_{2}\), only one of the two available chest channels was used for DB\(_{1}\). Each abdominal channel was considered individually. Thus \(\underline{\mathbf {u}}(n)=u(n)\) and \(\underline{\varvec{\hat{\eta }}}(n)=\hat{\eta }(n).\)

Evaluation Protocol

QRS detectors’ performances are usually assessed by beat-to-beat comparisons between the detected beats and the reference beats. The classical matching window for candidate fiducial points in adults is 150 milliseconds (ms).1 However, in order to account for the higher FHR, a matching window of 50 ms is commonly employed, as for example in Guerrero-Martinez et al. 15 and Zaunseder et al. 37 In accordance with the ANSI/AAMI guideline,1 the sensitivity (\(Se\)) and positive predictive value (\(PPV\)) are reported as follows: \(Se = TP/(TP+FN)\), \(PPV = TP/(TP + FP),\) where \(TP\), \(FP\) and \(FN\) are true positive, false positive and false negative detections respectively. For algorithm parameter optimization, the following performance index (PI) was suggested by Kotas et al. 20 in the context of FQRS detection: \(PI = (T-FN-FP)/T = (TP-FP)/(TP+FN)\), where \(T = TP+FN\) is the number of annotated FQRS. The F\(_{1}\)-measure can alternatively be used as a measure of an algorithm’s accuracy. In the context of binary classification:

$$\begin{aligned} F_{1} = 2 \cdot \frac{PPV \cdot Se}{PPV+Se} = \frac{2 \cdot TP}{2 \cdot TP + FN + FP}. \end{aligned}$$
(5)

From Eq. (5) one can observe that \(FN\) and \(FP\) play a symmetric role in penalizing the accuracy measure \(F_{1}\), which is not the case for the \(PI\) measure. Thus, in the “Results” section, \(Se\), \(PPV\) and \(F_{1}\) statistics are reported, and the \(F_{1}\) measure was used for parameter optimization. In addition, the performance of the algorithms is evaluated in terms of FHR. The FHR can be derived from the RR interval time series and compared with the reference HR derived from the reference annotations. Both the FHR and the reference HR were extracted as the reciprocal of the RR interval scaled by a factor of 60. At any given time, the extracted FHR is said to match the reference FHR if it is within \(\pm \)5 beats per minute (bpm) of the reference trace. The corresponding FHR measure is denoted \(HR_\mathrm{m}\).
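As an illustration, the beat-matching statistics and the \(HR_\mathrm{m}\) measure could be computed as sketched below; the one-to-one pairing of detections within the 50 ms window and the interpolation of the detected HR onto the reference beat times are assumptions of this sketch, not a specification of the exact procedure used.

```python
import numpy as np

def fqrs_stats(ref, det, fs=250, win=0.05):
    """Se, PPV and F1 of detected FQRS against reference FQRS.

    ref, det : sorted reference and detected FQRS sample indices
    win      : matching window in seconds (50 ms for fetal beats)
    """
    tol = int(win * fs)
    det = np.asarray(det)
    used = np.zeros(len(det), dtype=bool)
    TP = 0
    for r in ref:
        idx = np.where((np.abs(det - r) <= tol) & ~used)[0]
        if idx.size:                 # closest unused detection within the window
            used[idx[np.argmin(np.abs(det[idx] - r))]] = True
            TP += 1
    FN, FP = len(ref) - TP, len(det) - TP
    Se = TP / (TP + FN)
    PPV = TP / (TP + FP)
    F1 = 2 * TP / (2 * TP + FN + FP)
    return Se, PPV, F1

def hr_match(ref, det, fs=250, tol_bpm=5):
    """HR_m: fraction of reference RR-derived HR values matched within
    +/- 5 bpm by the HR derived from the detections."""
    ref, det = np.asarray(ref), np.asarray(det)
    ref_hr = 60.0 * fs / np.diff(ref)        # bpm from reference RR intervals
    det_hr = 60.0 * fs / np.diff(det)
    # Interpolate the detected HR onto the reference beat times.
    det_hr_at_ref = np.interp(ref[1:], det[1:], det_hr)
    return np.mean(np.abs(det_hr_at_ref - ref_hr) <= tol_bpm)
```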

Preprocessing and General Experimental Set Up

For each of the available abdominal channels, baseline wander was first removed. In the context of NI-FECG extraction, it is common to use a larger than standard low-frequency cut-off (\(\sim \)2 Hz) before performing MECG cancelation (see Martens et al. 24 and Zaunseder et al. 37 for example). This cut-off is acceptable when the aim of the analysis is not FECG morphology, yet there is no comprehensive study known to the authors that has assessed the effect of prefiltering on FQRS extraction. As a consequence, an exhaustive search was conducted to determine \(f_\mathrm{b}\) (an “optimal” cut-off frequency for removing the baseline using a high-pass filter) and \(f_\mathrm{h}\) (an “optimal” cut-off frequency for removing the high-frequency content using a low-pass filter). Two zero-phase Butterworth digital filters were cascaded for that purpose: one sixth-order high-pass filter and one tenth-order low-pass filter. The reference and input signals were then normalized according to the following procedure: (1) the first 5 s of the signal were used to derive the amplitude range of the ECG signal and the signal was then divided by this value, (2) the mean was computed over the first 5 s and subtracted from the signal, and (3) the resulting signal was transformed using the hyperbolic tangent function. Step (3) was applied in order to prevent outliers, likely attributable to signal artefact, from driving the reservoir state \(\underline{\mathbf {x}}(n)\) or the LMS/RLS weights \(\underline{\mathbf {w}}(n)\) to unexpected values. Not performing this last step could result in loss of useful memory or a highly unpredictive output.22 Following the preprocessing, each of the algorithms was used to filter out the MECG, resulting in a residual signal consisting of the FECG and some noise. FQRS detection was performed on the residual signal \(\hat{s}\) using a P&T QRS detector with a 150 ms refractory period. All parameter optimization was performed on DB\(_{1}\) while DB\(_{2}\) was used as the independent test set. The \(Se\), \(PPV\), \(F_{1}\) and \(HR_\mathrm{m}\) of each algorithm were evaluated.
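A sketch of this preprocessing chain (cascaded zero-phase Butterworth filters followed by the three normalization steps) is given below using standard scipy.signal routines; the exact filter realization used in this work may differ.

```python
import numpy as np
from scipy import signal

def preprocess(ecg, fs=250, fb=20.0, fh=95.0):
    """Baseline/high-frequency removal and normalization of one ECG channel.

    fb : high-pass cut-off (baseline wander removal)
    fh : low-pass cut-off (high-frequency noise removal)
    Zero-phase (forward-backward) filtering is used so that the QRS
    fiducial points are not shifted in time.
    """
    # Sixth-order high-pass and tenth-order low-pass Butterworth filters,
    # cascaded and applied with zero-phase filtering.
    sos_hp = signal.butter(6, fb, btype="highpass", fs=fs, output="sos")
    sos_lp = signal.butter(10, fh, btype="lowpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos_lp, signal.sosfiltfilt(sos_hp, ecg))

    # Normalization: (1) scale by the amplitude range of the first 5 s,
    # (2) remove the mean of the first 5 s, (3) squash outliers with tanh.
    n_head = int(5 * fs)
    x = x / np.ptp(x[:n_head])
    x = x - np.mean(x[:n_head])
    return np.tanh(x)
```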

TS techniques MQRS detection was run on the chest channel because of its higher SNR and negligible FECG contribution. Each MQRS location was then adjusted so that the R-peaks were accurately located on each abdominal channel. This was to ensure good construction of the channel-specific MECG template and good alignment between the ECG cycles and the MECG template.
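One simple way to perform this per-channel adjustment is sketched below; the \(\pm \)50 ms search window around the chest-detected R-peak and the use of the largest absolute deflection are assumptions of this illustration, as the exact adjustment procedure is not specified above.

```python
import numpy as np

def adjust_mqrs(abdomen, mqrs_chest, fs=250, win=0.05):
    """Refine chest-detected MQRS positions on an abdominal channel.

    Each R-peak is moved to the largest absolute deflection of the abdominal
    signal within +/- win seconds of the chest-detected location (window
    length is an assumption of this sketch, not taken from the text).
    """
    half = int(win * fs)
    adjusted = []
    for r in mqrs_chest:
        lo, hi = max(r - half, 0), min(r + half, len(abdomen))
        adjusted.append(lo + np.argmax(np.abs(abdomen[lo:hi])))
    return np.array(adjusted)
```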

LMS technique An exhaustive search over a range of values of \(N \in [1 \; 262]\) and \(\mu \in [0.01 \; 0.46]\) was performed.

RLS technique An exhaustive search over a range of values of \(N \in [1 \; 61]\) and \(\lambda \in [0.8 \; 1]\) was performed.

ESN technique The optimization of the ESN parameters was performed using a stage-wise grid search as follows: (i) a standard prefiltering range using \(f_\mathrm{b}=2\) and \(f_\mathrm{h}=100\) Hz was used to search for workable \(\rho \) and \(\alpha \) values, (ii) given \(\rho \) and \(\alpha \), a search was performed over a range of preprocessing cut-off frequencies to find optimal \(f_\mathrm{b}\) and \(f_\mathrm{h}\), (iii) using the optimized \(f_\mathrm{b}\) and \(f_\mathrm{h}\), the search for optimal values of \(\rho \) and \(\alpha \) was repeated, and finally (iv) the number of reservoir neurons (\(M\)) was optimized, as measured by the \(F_{1}\) score, with the parameters selected in (i–iii). Note that (i–iii) were performed with a high number of reservoir neurons (\(M=250\)). In the case of adaptive filtering, an RLS algorithm was used to update the weights of the readout layer which maps the reservoir states to the observed AECG. In the case of non-adaptive filtering, the Wiener–Hopf solution, the direct pseudo-inverse and ridge regression were considered to determine the weights on a 30 s signal epoch preceding the studied segments. When varying the number of neurons \(M\), the experiment was repeated 20 times in order to ensure that the variance of the \(F_{1}\) measure caused by the random initialization of the ESN reservoir connections and weights was negligible.

One chest channel was taken as the reference signal for the LMS, RLS and the ESN. Filter weights were initialized on 30 s preceding the annotated data for each record (i.e., preceding the 1 min of DB1 and the 5 min of DB2).

Adding Signal Quality Indices

In Clifford et al. 11 different signal quality indices (SQIs) for identifying bad quality ECG signals were evaluated. The most accurate SQI evaluates the agreement between two QRS detectors with different robustness to noise. This metric, termed bSQI \(\in \) [0 1] (with 1 representing a good quality signal), is difficult to apply to the abdominal signal because, in the case of high amplitude FECG traces, the QRS detectors can pick up some FQRS in addition to the MQRS. This makes bSQI a weak quality indicator if applied to the abdominal signal. Thus bSQI was not used on this signal but was used on the chest channel (where there is no FECG contribution) with a 10 s window and 9 s overlap. This provided a second-by-second SQI for the chest channel. When bSQI was lower than 0.8 the corresponding abdominal segments were removed.
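A sketch of the windowed bSQI masking of the chest channel is given below; the two QRS detectors are represented by placeholder functions, and the agreement formula (fraction of beats found by both detectors among all beats found by either, with a 150 ms matching tolerance) is an assumption of this illustration.

```python
import numpy as np

def bsqi_mask(chest, detector_a, detector_b, fs=250, win=10, step=1, thr=0.8):
    """Second-by-second quality mask of the chest channel using bSQI.

    detector_a, detector_b : placeholder QRS detector functions returning
    sample indices (e.g., P&T and a second, more noise-robust detector).
    bSQI is computed here as the fraction of beats found by both detectors
    among all beats found by either, in 10 s windows with 9 s overlap.
    Returns a boolean mask (True = keep), one value per `step` seconds.
    """
    qa = np.asarray(detector_a(chest, fs))
    qb = np.asarray(detector_b(chest, fs))
    tol = int(0.15 * fs)                 # 150 ms tolerance (maternal beats)
    n_windows = max(int((len(chest) / fs - win) / step) + 1, 0)
    mask = np.ones(n_windows, dtype=bool)
    for i in range(n_windows):
        lo, hi = int(i * step * fs), int((i * step + win) * fs)
        a = qa[(qa >= lo) & (qa < hi)]
        b = qb[(qb >= lo) & (qb < hi)]
        matched = sum(np.any(np.abs(b - r) <= tol) for r in a)
        denom = len(a) + len(b) - matched
        bsqi = matched / denom if denom else 0.0
        mask[i] = bsqi >= thr            # drop segments with bSQI < 0.8
    return mask
```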

Results

Parameter Optimization

TS

Among the TS variants, the TS\(_\mathrm{pca}\) method gave the best results on the training set (see Table 3); thus only results for TS\(_\mathrm{pca}\) are reported in Table 4 for the individual records of DB\(_{2}\).

Table 3 Overall statistics on DB\(_{1}\) (training DB) and DB\(_{2}\) (test DB)
Table 4 Per record statistics on DB\(_{2}\) (test DB)

ESN

Figure 4 illustrates the exhaustive search results obtained on the training database DB\(_{1}\) using the direct pseudoinverse method for computing \(\underline{\mathbf {w}}\). Based on this optimization step, the preprocessing and ESN parameters were determined to be: \(f_\mathrm{b}=20\) Hz, \(f_\mathrm{h}=95\) Hz, \(\alpha =0.4\), \(\rho =0.4\), \(M=90\). The best results on the training set using the above mentioned parameters were \(Se = 97.2 \%\), \(PPV = 97.3 \%\) and \(F_{1} = 97.2 \%\) (see Table 3).

Figure 4

Search space for the preprocessing parameters (\(f_\mathrm{b}\), \(f_\mathrm{h}\)), the ESN specific parameters (\(\alpha \), \(\rho \)) and the LMS parameters (\(\mu \), \(N\)) on the training set database DB\(_{1}\). (a) Search for ESN preprocessing parameters (\(\alpha =0.4\), \(\rho =0.4\), \(M=90\)), (b) search for ESN parameters \(\alpha \) and \(\rho \) (\(f_\mathrm{b}=20\) Hz, \(f_\mathrm{h}=95\) Hz, \(M=90\)), (c) search for RLS parameters \(\lambda \) and \(N\) (\(f_\mathrm{b}=20\) Hz, \(f_\mathrm{h}=95\) Hz), (d) search for LMS parameters \(\mu \) and \(N\) (\(f_\mathrm{b}=20\) Hz, \(f_\mathrm{h}=95\) Hz), (e) search for the number of ESN reservoir neurons required (\(f_\mathrm{b}=20\) Hz, \(f_\mathrm{h}=95\) Hz, \(\alpha =0.4\), \(\rho =0.4\); repeated 20 times for each value of \(M\) to look at the variance of the \(F_{1}\) measure caused by the random initialization of the ESN reservoir connections and weights)

LMS

The parameters \(\mu \) and \(N\) of the LMS adaptive algorithm were searched (Fig. 4). The parameters selected based on the grid search were \(\mu = 0.1\) and \(N=20\), with corresponding performance statistics \(Se = 95.8 \%\), \(PPV = 95.0 \%\) and \(F_{1} = 95.4 \%\) (see Table 3).

RLS

The parameters \(\lambda \) and \(N\) of the RLS adaptive algorithm were searched (Fig. 4). Parameter values selected based on the grid search were \(\lambda = 0.999\) and \(N=20\), with corresponding performance statistics of \(Se = 96.2\%\), \(PPV = 95.6\%\) and \(F_{1} = 95.9\%\) (see Table 3).

Qualitative Results

Results in Fig. 5 were produced with \(f_\mathrm{b}=20\) Hz and \(f_\mathrm{h}=95\) Hz. Figure 5b shows qualitative examples of the ESN algorithm’s performance on r154 in DB\(_{1}\) with optimal parameters.

Figure 5

(a) Effect of varying \(f_\mathrm{b}\) (r154, DB\(_{1}\), 3rd AECG, \(f_\mathrm{h} = 110\) Hz) for the preprocessing step. Note that at 10 Hz most of the frequency content of the P and T-waves of the MECG has been filtered out, leaving only the FQRS and the MQRS. (b) Example of the ESN algorithm performance with \(\alpha =0.4\), \(\rho =0.4\), \(M=90\) (optimal parameters). The signal (r154, DB\(_{1}\), third AECG) was prefiltered with \(f_\mathrm{b}=20\) Hz and \(f_\mathrm{h}=95\) Hz. Notice in particular the extraction of the FQRS embedded in the MQRS at t = 48.2 s (circled by a broken black line)

Results on Training and Test Sets

Table 3 summarises the results obtained on the training set DB\(_{1}\). It can be seen that ESN based techniques performed better than the TS and adaptive filter techniques. Both ESN techniques gave similar results, with \(F_{1}\) scores of \(97.2\%\) for ESN\(_\mathrm{na}\) and \(97\%\) for ESN\(_\mathrm{a}\). Among the TS techniques, TS\(_\mathrm{pca}\) gave \(F_{1}=95.4\%\) and outperformed all other TS techniques. Finally, among the adaptive filter techniques the LMS (\(F_{1}\) of \(95.4\%\)) was slightly outperformed by the RLS (\(F_{1}\) of \(95.9\%\)).

Table 3 also summarises the results obtained on the test set DB\(_{2}\). The adaptive ESN (ESN\(_\mathrm{a}\)) gave the best \(F_{1}\) score of \(90.2\%\), improving upon the TS\(_\mathrm{pca}\) technique (\(89.3\%\)) and the RLS technique (\(88.2\%\)). Table 4 presents the results for each individual record of DB\(_{2}\) for the optimal ESN, LMS, RLS and TS\(_\mathrm{pca}\) as determined on the training set DB\(_{1}\). Note that in Table 4 the performance of each algorithm was averaged over all individual channels of a given record. The best result in each case is underlined.

Results with Signal Quality Indices

Table 5 presents the results for the test database in terms of \(F_{1}\). Including the signal quality indices improved the results on DB\(_{2}\) by \(+1.07 \%\) for the TS\(_\mathrm{pca}\) approach, \(+0.95 \%\) for the LMS approach, \(+1.04 \%\) for the RLS approach and \(+1.11 \%\) for the ESN approach, while excluding \(3.6\%\) of the overall signal length. See Fig. 6 for an example of a segment excluded on DB\(_{2}\) using bSQI.

Table 5 Results on DB\(_{2}\) both with and without removing low quality record segments based upon a single SQI
Figure 6

Signal quality for identifying bad quality regions (record 123a, first abdominal channel). From top to bottom: chest MECG, abdominal ECG and residual after performing MECG cancelation using the ESN. The faded colour area marks the part of the signal that would be discarded because of its low quality (SQI \(\le \) 0.8; window size for assessing quality is 10 s with 9 s overlap)

Discussion

The focus of this study was to benchmark the ESN, RLS, LMS and TS techniques in their capacity to accurately extract the FECG from the AECG and facilitate robust FQRS detection. Better results may be achieved with an alternative QRS detector, specifically designed for FQRS detection as in Kotas et al.,20 but the relative ordering of the algorithms is unlikely to change.

It should be noted that in practice the FHR time series are smoothed prior to being displayed on the clinical monitor. This smoothing operation removes sudden drops or increases of the FHR due to missed beat detections. Higher performance would be expected on DB\(_{1}\) and DB\(_{2}\) in terms of \(HR_\mathrm{m}\) if this smoothing operation were applied.

Studying the performance of the algorithms on individual channels without smoothing the extracted FQRS time series allowed us to draw a direct comparison between the various time-based techniques. Blind source separation techniques such as ICA or PCA were not considered in this work, as its focus was on time-based techniques using a single abdominal channel. However, in a multi abdominal channel system any of the presented methods could be used to remove the MECG contribution from the different channels independently before performing the BSS step, as for example in Behar et al. 5 where higher performance was obtained by adding a BSS step after any of the single channel time-based approaches. In addition, the results in Table 4 showed that no algorithm systematically outperformed the others. This suggests that there is potential to combine different approaches for NI-FECG extraction and further improve the overall results.

The importance of the signal preconditioning, and in particular of the baseline wander cut-off frequency, was studied. Figure 5 shows an example of a segment (r154, DB\(_{1}\), 3rd AECG) which was preprocessed with various \(f_\mathrm{b}\) values. Note that at 10 Hz most of the frequency content of the P-wave and the T-wave of the MECG has been filtered out, leaving only the FQRS and MQRS. This higher cut-off showed improvement on both the training and test sets for all methods. As an example, the TS\(_\mathrm{lp}\) technique gave \(F_{1}=83.0 \%\) and \(F_{1}=80.0 \%\) with \(f_\mathrm{b}=2\) Hz, \(f_\mathrm{h}=95\) Hz on DB\(_{1}\) and DB\(_{2}\) respectively, and \(F_{1}=92.0 \%\) and \(F_{1}=85.7 \%\) with \(f_\mathrm{b}=20\) Hz, \(f_\mathrm{h}=95\) Hz on DB\(_{1}\) and DB\(_{2}\) respectively.Footnote 3 The cut-off selected in this work was inferred from the grid search in Fig. 4a. Although it was clear that choosing \(f_\mathrm{b}\) larger than traditionally used was beneficial in this case, whether to choose \(f_\mathrm{b}=20\) Hz (as in this work) or a lower value inevitably depends on the dataset and filter design considered.

One of the theoretical limitations of such adaptive filtering approaches is that any noise contained in the abdominal ECG signal, but not in the chest ECG signal, will not be removed. Indeed, the assumption behind adaptive noise canceling is that the noise contaminants on the abdominal channels are also present on the chest channels (considered as the noise field). Nevertheless, the method is well suited for removing the main noise contaminant from the abdominal signal, namely the MECG, and easing FQRS detection.

The ESN algorithm requires a minimum of one reference and one abdominal channel. This is an important advantage over the blind source separation techniques which, while very popular for this application, require a relatively high number of channels (usually between 8 and 16).29 Furthermore, the ESN does not require any prior information on the MQRS location, unlike the TS or KF approaches. In particular, the KF requires a precise MQRS detection technique, making it particularly susceptible to noise in the MECG.

When the traditional RLS implementation presented in the “The RLS Adaptive Filter” section was used to train the readout layer of the ESN\(_\mathrm{a}\), we observed a monotonic increase of the weights over time, leading to large weights after a few minutes. When the weights are too large they amplify small differences among the dimensions of \(\underline{\mathbf {x}}(n)\), which in turn can lead to instability in the presence of a small deviation from the conditions for which the network had been trained.23 As a consequence we implemented the RLS algorithm introduced in Douglas et al.,13 which uses RLS in combination with least squares prewhitening. This implementation made the ESN\(_\mathrm{a}\) stable.

Among the TS techniques, TS\(_\mathrm{pca}\) performed best on both the training and test sets; it is also certainly the most adaptive of the TS variants used. Among the other methods (LMS, RLS, ESN), the RLS was found to perform slightly better than the LMS, and the ESN better than the RLS. This suggests that the MECG cycle was better removed with more adaptive algorithms.

It is important to note that the TS techniques could introduce discontinuities due to the piece-wise template. Indeed, the MECG template cycle was built centered on the maternal R-peak, considering a duration of 0.20, 0.10 and 0.40 s for the P, QRS and T waves respectively.24 The choice of each ECG segment’s time interval is realistic although certainly not optimal. However, these discontinuities are assumed to be minimal, with limited influence on FQRS detection (as opposed to FECG morphological analysis). Performance could be improved by varying the length of each of these intervals with the maternal heart rate rather than keeping them constant.

The main drawback of the TS with respect to the LMS and ESN is that it relies on accurate MQRS detection (the best detectors typically achieve about 99% accuracy over a range of diverse databases). Indeed, a missed MQRS detection will most likely result in FPs (unless it is located in the FQRS refractory period) and possibly FNs (if the actual FQRS is located within the refractory period of the induced FP). Conversely, the main drawback of the LMS, RLS and ESN algorithms is that they are driven by the chest signal and are therefore subject to poor signal quality on this channel. This particularly motivates the use of SQIs. It should also be noted that the number of coefficients of the LMS/RLS adaptive filters, or of neurons in the ESN, is a function of the signal sampling frequency; as a consequence, parameter optimization should be repeated if a different sampling frequency is to be considered.

Usage of a SQI on the chest reference channel improved the accuracy measure by \(\sim \)1% while suppressing \(\sim \)3% of the overall record. It is expected that the added value in using SQI would be higher in the presence of noisier recordings, as the data used in this work were of relatively good quality with some minor local artifacts.

It should be noted that the number of ESN parameters that need tuning, in addition to the randomness of the reservoir initialization and connectivity (which gives no insight into the reservoir dynamics), makes the ESN design and implementation difficult. In an attempt to tackle these problems, Rodan et al.28 compared the performance of the “standard” ESN framework (the one used in this paper) with much simpler network structures and showed that similar performance to the standard ESN could be achieved using these deterministically constructed reservoirs. Their approach has two main advantages: (1) it simplifies the reservoir design and the number of parameters to optimise and (2) it builds a path to theoretical analysis of the ESN. This alternative ESN design will be compared, in the context of our application, to the standard ESN in future work. An alternative is to use random search for finding a set of acceptable parameters, as suggested in Behar et al.4 where we used the approach from Bergstra et al.6 Random search performed over the ESN and preprocessing parameters showed similar performance to the exhaustive grid search presented in this work while reducing the number of search iterations. The parameters that were assumed constant in this work in order to restrict the grid search to an acceptable number of free parameters (such as \(\psi \) and \(\gamma \)) could be searched if random search were employed. In practice it is likely that random search would be used as an initialization method to find which parameters are the most relevant, with further parameter tuning performed on a fine grid.

An important limitation for evaluating NI-FECG extraction algorithms is the absence of large publicly available databases with expert references. This limitation was recently partially addressed with the introduction of the Physionet/Computing in Cardiology 2013 challenge database.32 However, this database consists of four abdominal channels per record without any reference chest MECG channel, and thus does not allow for the evaluation of adaptive noise canceling techniques such as the LMS, RLS and ESN presented in this work.

The ESN performed slightly better than the LMS and RLS on the test database, but the results were not significantly different. One of the main differences between the approaches is that the ESN allows for a non-linear relationship between the chest and abdominal ECGs, whereas the LMS and RLS only consider a linear relationship. Thus the results could suggest that the mapping is mostly linear, or that the numerous ESN hyperparameters were over-tuned on the training set database. Recall that the data in the training set (DB\(_{1}\)) and test set (DB\(_{2}\)) were recorded with different hardware, following separate protocols and at different stages of pregnancy for different subjects. Since the LMS is conceptually simpler and computationally less expensive, it is most likely a more appropriate choice for low-cost devices with limited computational power. However, in the case of a more advanced hospital-based system, where computing power is generally not a concern, there is little incentive against using the more computationally complex and accurate solution.

Conclusion

This work compared non-invasive FECG extraction methods. The methods were qualitatively and quantitatively evaluated. In addition, the preprocessing performed for baseline wander and high-frequency removal was studied in some detail. These filters, along with various parameters for the LMS, RLS and ESN, were optimized through an exhaustive grid search on a training database. The findings of this research are: (1) the non-linear ESN\(_\mathrm{a}\) method showed slightly superior performance with respect to the LMS, RLS and TS methods; (2) using a high baseline wander cut-off frequency improved the performance of the extraction algorithms; (3) the SQI improved the performance of all methods by excluding bad quality chest ECG segments which would likely have led to an unexpected adaptive filter response or false QRS detections. In addition, this paper suggested a framework for assessing the algorithms’ performance in extracting the FQRS and FHR by using the \(F_{1}\) and HR\(_\mathrm{m}\) measures. Future work requires assessing the algorithms on a larger dataset, benchmarking them against additional methods employed for this task and evaluating the alternative ESN reservoir design suggested by Rodan et al.28 Open source code for the benchmark methods is made available to allow comparison and reproducibility on the public domain data. The source code is available on Physionet at http://physionet.org/.10