Introduction

Cardiotocography (CTG) is a technique for fetal state assessment that was introduced in the early 1970s [1]. CTG aims to identify pathological states, including early detection of congenital heart defects, fetal distress, or hypoxia, which is crucial for further treatment during pregnancy [2]. A CTG recording comprises fetal heart rate (FHR) and uterine activity (UA) signals. The fundamental challenges of CTG classification by experts are that it is time-consuming and that an accurate assessment depends on the expert's knowledge and clinical experience [1]. Therefore, given the progress in signal processing and pattern recognition techniques, computer-based decision-making methods offer an efficient way of evaluating fetal well-being.

Several methods have been proposed for fetal state assessment, such as the probabilistic neural network [3], the stochastic forest (SF) algorithm [4], and the extreme learning machine [5]. However, these solutions have to perform additional computation for CTG signal processing, such as data preprocessing, feature extraction, and feature selection, because they must address obstacles including irrelevant or redundant attributes, missing values, and external noise [6] to obtain optimal accuracy. Recently, the convolutional neural network (CNN) has been used to address these challenges in other domains, such as image processing and social network analysis [7], with remarkable success, since a CNN can automatically extract and learn useful attributes from input data and generate deep features that are robust against irrelevant or redundant attributes, missing values, and external noise [8].

This research extends our previous work [9], in which we introduced a deep 1-D CNN architecture with four convolution layers, three pooling layers, and three fully connected layers to diagnose arrhythmia automatically. In this research, by contrast, we propose a new shallow 1-D CNN architecture to improve fetal state assessment accuracy. The architecture is built on a single convolution layer, which reduces computational complexity. In addition, the pooling operation, a regular part of traditional CNNs, is omitted so that more features are available in the classification phase. Because of this larger number of features, the classification phase is carried out by three fully connected layers.

We used the cardiotocography (CTG) data set to evaluate the performance of the proposed architecture. The CTG data set is the best-known data set for fetal state assessment and includes three fetal states: normal, suspicious, and pathological. In addition to the CTG data set, four real-world data sets, namely Statlog (ST), lymphography (LY), breast cancer (BC), and cervical cancer (CC), were obtained from the UCI machine learning repository [10] and considered for further evaluation. We also implemented six classifiers: neural network, SVM, k-nearest neighbor, decision tree, logistic regression, and deep belief network. We compared the results of the proposed architecture with these classifiers in terms of accuracy, sensitivity, specificity, and area under the ROC curve (AUC). The results show that the proposed architecture outperforms the other implemented classifiers and, in comparison with previous research, achieves very competitive accuracy.

This article is structured as follows: “Related Work” briefly reviews previous research. “1-D CNN for Fetal State Assessment” introduces our proposed architecture for CTG classification. “Evaluation” presents the experimental results on five data sets, and “Conclusion and Future Work” concludes the study.

Related Work

This section briefly reviews CNN research for signal processing and previous applications of machine learning to fetal state assessment based on CTG signals. There is a growing tendency to apply machine learning methods in clinical decision support systems, as these methods offer appropriate solutions for accurate medical data analysis [11] (Fig. 1).

Fig. 1 Classification of clinical decision support system methods

As shown in Fig. 1, clinical disease diagnosis (CDD) solutions can be classified into machine learning, metaheuristic-based, and artificial neural network (ANN) methods. Metaheuristic-based methods combine feature selection algorithms with classifiers [12]. Well-known metaheuristic algorithms used in this category include particle swarm optimization (PSO) [1], genetic algorithms (GA) [13], and differential evolution (DE) [14]. These algorithms aim to select effective features in different applications, particularly for more accurate disease diagnosis [15]. Machine learning methods such as B-tree, naïve Bayes, k-nearest neighbor, SVM, decision tree, and logistic regression are usually simple to use and show acceptable results on clean data sets. Deep learning is a sub-branch of ANN that includes methods such as the deep belief network (DBN) [16], deep neural network (DNN) [17], recurrent neural network (RNN) [18], deep autoencoder (DA) [19], and convolutional neural network (CNN). For CDD, CNNs can be divided into three architectures: 1-D CNN [9], 2-D CNN [20], and 3-D CNN [21]. Other ANN methods include the feedforward neural network (FNN) [22], radial basis function neural network (RBFNN) [1], multilayer perceptron (MLP) [3], and extreme learning machine (ELM) [23].

Signal Processing Based on 1-D CNN

Giri et al. [24] used a different 1-D CNN model to diagnose ischemic stroke from EEG and EOG signals. In this model, batch normalization is used to accelerate training. They evaluated their model on a clinical data set consisting of 62 instances and 24 features, obtaining an average accuracy of 86%. Acharya et al. [25] implemented a 1-D CNN architecture with three convolutional layers, three max-pooling layers, and three fully connected layers to classify heartbeat categories from ECG. Their experiments were conducted on PhysioNet databases and reached an accuracy of 94.03%. Kiranyaz et al. [26] introduced a monitoring system to distinguish ventricular ectopic beats from supraventricular ectopic beats based on ECG. They evaluated their method on the MIT-BIH database and reported an accuracy of 98.9%.

Machine Learning Methods for Fetal State Assessment

Ravindran et al. [5] implemented a new clinical decision support system for detecting fetal state classes based on an extreme learning machine and a modified genetic algorithm. Their model reached an accuracy of 94%. Yılmaz [3] implemented three neural network models for fetal state assessment: probabilistic, generalized regression, and multilayer perceptron. That study shows that the probabilistic neural network is the most accurate of the three, at 92.15%. Tsouros et al. [4] presented a new stochastic approach for building independent and robust decision trees, in which the attributes at every tree node are selected according to a defined probability. Their model reached an accuracy of 88.66%. Comert et al. [27] evaluated five artificial neural network training algorithms for fetal state classification: Resilient Backpropagation, Gradient Descent, Quasi-Newton, Conjugate Gradient, and Levenberg–Marquardt. In that study, Levenberg–Marquardt backpropagation and Resilient Backpropagation achieved the best accuracy, 89.69% and 89.14%, respectively. Yilmaz and Kilikçier [1] used a binary decision tree and a least-squares support vector machine (LS-SVM) for CTG classification, with PSO for LS-SVM parameter optimization. The accuracy of LS-SVM was 88.66% under tenfold cross validation. Piri et al. [28] proposed a solution based on an evolutionary multi-objective genetic algorithm to extract essential features that lead to fetal death. The extracted features were classified by seven existing classifiers: LR, SVM, RF, DT, KNN, GNB, and XGBoost. The best accuracy, 94%, was obtained with the XGBoost classifier.

1-D CNN for Fetal State Assessment

This section introduces our proposed 1-D CNN architecture for more accurate fetal state assessment. The fundamental concepts of CNN derive from the neural network (NN), but a CNN differs from the traditional NN in that it uses operations such as convolution, pooling, and the ReLU activation function and applies a different method in the training stage [29].

A CNN architecture can increase computational complexity, which can lead to overfitting [30]. When overfitting occurs, the classifier cannot classify new instances effectively and does not achieve optimal accuracy. Therefore, first, we implemented a shallow network with the minimum number of convolution layers (one), which reduces computational complexity. Second, the pooling operation is omitted so that more features reach the classification phase. In medical data processing, especially with a shallow model, pooling can discard important medical features before the classification stage. A traditional CNN for image classification, by contrast, needs pooling to reduce the number of features by mapping an image region of a given size to a single entry of the feature map. Finally, three fully connected layers were used for the classification phase because of the larger number of features. Figure 2 shows the proposed architecture, which is configured specifically for each data set; these configurations are listed in Table 1.
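To make the effect of omitting pooling concrete, the following minimal sketch counts the features that reach the first fully connected layer with and without a pooling stage. The input length (21), filter size (3), number of filters (8), and pooling size (2) are illustrative assumptions and are not the configurations of Table 1. The output length uses the usual "valid" convolution convention, which is also an assumption of this sketch.

```python
def conv_output_length(input_size, filter_size, stride=1):
    """Positions a filter visits in a 'valid' 1-D convolution (assumed convention)."""
    return (input_size - filter_size) // stride + 1


def pooled_length(length, pool_size):
    """Length after non-overlapping pooling; an incomplete last window is dropped."""
    return length // pool_size


def flattened_features(input_size, filter_size, n_filters, pool_size=None):
    """Size of the feature vector handed to the first fully connected layer."""
    length = conv_output_length(input_size, filter_size)
    if pool_size is not None:
        length = pooled_length(length, pool_size)
    return length * n_filters


# Hypothetical configuration, chosen only for illustration.
without_pooling = flattened_features(21, 3, n_filters=8)
with_pooling = flattened_features(21, 3, n_filters=8, pool_size=2)
print(without_pooling, with_pooling)
```

Under these assumed sizes, skipping the pooling stage roughly doubles the number of features available to the classification phase.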

Fig. 2 Proposed architecture for Fetal State Assessment

Table 1 Summary of the proposed CNN for analyzing data sets

Architecture

A CNN for image classification extracts features in both the horizontal and vertical directions, since image data are correlated in both directions. However, for biomedical data organized as a matrix, such as CTG signals, a 1-D CNN performs convolution only in the horizontal direction, because the data are related only in this direction and are independent in the vertical direction [31]. Equation (1) shows a 2-D convolution operation, which reduces to the 1-D convolution operation in Eq. (2). In these equations, \(w\) is the kernel, a group of weights shared across the whole input space, \(x\) is the input \(({x}_{1},{x}_{2},\ldots ,{x}_{n})\), and \(n\) is the total number of instances. The output of the 1-D convolutional layer can then be computed as in Eq. (3), where \(\sigma\) is the activation function, \({w}_{k}\) and \({b}_{k}\) are the weights and bias of the kth activation map, and N is the filter size. Equation (4) shows the number of the filter in each layer with a stride size of 1.

$$ I^{\prime} = \mathop \sum \limits_{i, j} I\left( {x - i, y - j} \right) \cdot w(i, j) $$
(1)
$$ z\left[ n \right] = \mathop \sum \limits_{i} w\left[ i \right] \cdot x\left[ {n - i} \right] $$
(2)
$$ C_{k}^{t} = \sigma \left( {\mathop \sum \limits_{i = 1}^{N} x^{t - 1} \left( i \right)w_{k}^{t - 1} \left( i \right) + b_{k}^{t} } \right) $$
(3)
$$ {\text{The}}\;{\text{number}}\;{\text{of}}\;{\text{the}}\;{\text{filter}} = \frac{{{\text{Input}}\_{\text{size}} - {\text{Filter}}\_{\text{size}}}}{{{\text{Stride}}\_{\text{size}}}} $$
(4)
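As a minimal sketch of Eq. (3), the following pure-Python code slides each kernel along a 1-D input, adds the bias, and applies ReLU as the activation \(\sigma\). The input values, kernels, and biases are made up for illustration. As in most deep learning libraries, the kernel is applied without flipping, and the number of window positions follows the usual "valid" convention; both choices are assumptions of this sketch rather than details confirmed by the text.

```python
def relu(value):
    """ReLU activation: negative values are clipped to zero."""
    return value if value > 0.0 else 0.0


def conv1d_layer(x, kernels, biases, stride=1):
    """Return one activation map per kernel for the 1-D input x."""
    maps = []
    for w, b in zip(kernels, biases):
        # Assumed 'valid' convention: the window never leaves the input.
        n_positions = (len(x) - len(w)) // stride + 1
        activation_map = []
        for p in range(n_positions):
            start = p * stride
            weighted_sum = sum(x[start + i] * w[i] for i in range(len(w))) + b
            activation_map.append(relu(weighted_sum))
        maps.append(activation_map)
    return maps


# Hypothetical input and two hypothetical kernels of size N = 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernels = [[-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
biases = [0.0, -1.0]

maps = conv1d_layer(x, kernels, biases)
# Flattening concatenates all activation maps into a single feature vector.
flat = [value for activation_map in maps for value in activation_map]
print(maps, len(flat))
```

Because no pooling follows, every entry of every activation map survives into the flattened vector.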

Furthermore, complex features are learned by applying a ReLU activation function, which introduces nonlinearity into the network [32]. For classification, a flattening operation before the first fully connected layer converts the convolutional layer outputs into a single feature vector. In addition, the dropout technique [33] is used to reduce overfitting; it is a useful regularization method that prevents complex co-adaptations on the training data [33]. A SoftMax activation function is then applied to convert the network output into class probabilities, according to Eq. (5). Finally, a standard feedforward and backpropagation pass is performed in the training phase.

$$ F\left( {x_{i} } \right) = \frac{{\exp \left( {x_{i} } \right)}}{{\sum\nolimits_{j = 0}^{k} {\exp \left( {x_{j} } \right)} }}\quad \left[ {i = 0,1, \ldots ,k} \right] $$
(5)
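Eq. (5) can be illustrated with a short sketch that turns raw output scores into class probabilities. The three scores are hypothetical, and their assignment to the normal, suspect, and pathologic classes is an assumption made for the example. Subtracting the maximum score before exponentiation is a common numerical safeguard that does not change the result.

```python
import math


def softmax(scores):
    """Map raw output scores to probabilities that sum to 1, as in Eq. (5)."""
    shift = max(scores)  # avoids overflow in exp; the ratios are unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical network outputs for the classes normal, suspect, pathologic.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most probable class
print(probs, predicted)
```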

Evaluation

The performance of the proposed architecture is assessed in several experiments conducted on different data sets and validated using cross-validation. In the following, the experimental environment and data sets are introduced, and then the experimental results are explained.

Data Sets Description

We implemented the proposed architecture in the Google Colab environment and used five data sets from the UCI repository [10] for its evaluation. These data sets pose different challenges, such as class imbalance and missing values. Table 2 presents the statistical information of the data sets, and their descriptions are as follows.

Table 2 Data sets' statistical information

The Cardiotocography data set consists of 2126 CTG samples and 23 features extracted by experts from FHR and UC attributes. It has three fetal state classes: normal, suspect, and pathologic. Of the 2126 instances, 1655 were classified as normal, 295 as suspect, and 176 as pathologic.

The Statlog (Heart) data set has commonly been used for heart disease diagnosis. It contains 13 features and 270 clinical records indicating the presence (120 samples) or absence (150 samples) of heart disease.

The Lymphography data set has been used for the diagnosis of lymphatic diseases. It originally consists of 148 samples, 18 attributes, and four classes gathered from radiologists' judgments and estimations. In this research, we considered only two classes, metastases and malign lymph, with 81 and 60 instances, respectively, since the other classes have fewer than four instances.

The Breast Cancer Wisconsin (Diagnostic) data set includes features extracted from digitized images of breast masses. It consists of 32 tumor features and 569 instances (212 malignant and 357 benign).

The Cervical cancer data set includes cervical cancer features, such as demographic information, patient habits, and historical medical records, with 858 instances, 36 attributes, and 44% missing values. In this research, we used Biopsy as the target variable, with 55 cancer patients and 803 non-cancer patients.

Performance Evaluation Criteria

The proposed architecture addresses both binary and multi-class classification. Its performance is therefore evaluated using several criteria: accuracy (ACC), sensitivity (SE), specificity (SP), and area under the ROC curve (AUC) [3]. For the binary performance measures, we used the common formulas given in [3]. For the three-class CTG task, the confusion matrix is of size 3 × 3, as shown in Table 3. The multi-class performance measures are obtained from this confusion matrix by averaging the measures of the individual classes, as shown in Table 4.

Table 3 Confusion matrix
Table 4 Performance measures for multi-class classification
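The class-averaged measures can be sketched as follows. Each class is treated in turn as the positive class against the rest, and the per-class accuracy, sensitivity, and specificity are averaged. This one-vs-rest formulation is a common choice and is an assumption here, since the exact formulas of Table 4 are not reproduced in the text; the confusion matrix counts are invented for illustration.

```python
def per_class_counts(cm, k):
    """Return (TP, FN, FP, TN) for class k, treated as positive vs. the rest.

    Rows of cm are true classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def macro_measures(cm):
    """Average accuracy, sensitivity, and specificity over all classes.

    Assumes every class has at least one true instance and one true
    non-instance; otherwise a division by zero occurs.
    """
    n_classes = len(cm)
    acc = se = sp = 0.0
    for k in range(n_classes):
        tp, fn, fp, tn = per_class_counts(cm, k)
        acc += (tp + tn) / (tp + fn + fp + tn)
        se += tp / (tp + fn)
        sp += tn / (tn + fp)
    return acc / n_classes, se / n_classes, sp / n_classes


# Hypothetical 3 x 3 confusion matrix (normal, suspect, pathologic).
cm = [
    [90, 8, 2],
    [10, 35, 5],
    [2, 3, 45],
]
acc, se, sp = macro_measures(cm)
print(round(acc, 4), round(se, 4), round(sp, 4))
```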

Performance Analysis

We conducted several sets of experiments to assess the performance of the proposed architecture. In the first set, we implemented three 1-D CNN architectures, named 1-D CNN, 1-D CNNI, and 1-D CNNII (the proposed architecture). The 1-D CNN architecture is a traditional CNN with two convolution layers and two fully connected layers. The 1-D CNNI architecture uses a single convolution layer, which reduces computational complexity, and three fully connected layers for its classification phase; we use it to evaluate the effect of reducing the number of convolution layers (computational complexity) on performance. Finally, 1-D CNNII also uses one convolution layer and three fully connected layers, but omits the pooling operation so that more features are available in the classification phase.

In this performance analysis, tenfold cross validation is applied to examine the robustness of the architectures and to compensate for the limited number of instances. Tables 3, 4 and 5 show the average results of the performance analysis using tenfold cross validation. The comparative results of the different 1-D CNN architectures in terms of accuracy (ACC), sensitivity (SE), specificity (SP), and training time (TT) are shown in Table 5. The results in this table show that 1-D CNNII achieves the best performance among the 1-D CNN architectures. Computational time remains a significant bottleneck for experimental research [30], and this table shows that the training time of the proposed architecture is shorter than that of the two other CNN architectures. Table 6 shows that CNNII has the best performance in terms of correct predictions per class in the multi-class classification task. In addition, as shown in Fig. 3, both the training and validation errors of 1-D CNNII keep decreasing as the epochs progress. In contrast, the validation error of 1-D CNN and 1-D CNNI starts to increase at a certain epoch, which means that these two architectures suffer from overfitting.
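A tenfold split can be sketched in a few lines of standard-library Python. The version below is stratified, so each fold keeps roughly the same class ratios; stratification is an assumption of this sketch, as the text states only that tenfold cross validation was used. The class counts are those reported for the Cardiotocography data set, while the function name and the seed are hypothetical.

```python
import random


def stratified_kfold(labels, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs with roughly equal class ratios per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Shuffle each class separately, then deal its samples round-robin to folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Class counts of the Cardiotocography data set as reported in the text.
labels = ["normal"] * 1655 + ["suspect"] * 295 + ["pathologic"] * 176
splits = list(stratified_kfold(labels))
for train_idx, test_idx in splits:
    n_pathologic = sum(1 for i in test_idx if labels[i] == "pathologic")
    print(len(train_idx), len(test_idx), n_pathologic)
```

Each model would be trained on `train_idx` and scored on `test_idx`, and the reported figure would be the mean of the ten per-fold scores.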

Table 5 Experimental results of different 1-D CNN architectures
Table 6 Experimental results of any output CTG classes
Fig. 3 Training and validation error generated for analyzing various data sets

In the second experiment set, the accuracy and the area under the ROC curve (AUC) of CNNII were compared with those of other implemented classifiers: decision tree (DT), neural network (NN), logistic regression (LR), support vector machine (SVM), k-nearest neighbor (k-NN), and standard deep belief network (DBN). Tables 7 and 8 present the experimental results for the different data sets. The results show that 1-D CNNII is more efficient than five implemented classifiers and achieves very competitive accuracy in fetal state assessment compared to previous research, as shown in Table 9.
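For the binary data sets, the AUC can be illustrated with a short sketch based on its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented for illustration, and the function name is hypothetical. For the three-class CTG task, a one-vs-rest average of this quantity would be one option, which is an assumption and not a detail stated in the text.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = disease present) and classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = auc_score(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for data sets of the sizes considered here.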

Table 7 AUC comparison of various methods for different data sets
Table 8 Accuracy comparison of various methods for different data sets
Table 9 Accuracy comparison of the proposed architecture with previous works on CTG classification

Conclusion and Future Work

The accurate assessment of fetal well-being is crucial for further treatment during pregnancy and is also an indicator of fetal distress. Many methods have been proposed to improve fetal state assessment. However, because of the missing values and external noise in CTG signals, the performance of these methods is not optimal, and they have to implement feature selection and extraction as additional computations to address these challenges. In this article, a new shallow 1-D convolutional neural network architecture is proposed to enhance fetal state assessment accuracy. The architecture is built on a single convolution layer, which reduces computational complexity. In addition, the pooling operation, a standard part of traditional CNNs, is omitted so that more features are available in the classification phase. In medical data processing, especially with a shallow model, pooling can discard important medical features before the classification stage. We used five disease data sets, namely Statlog, lymphography, breast cancer, cardiotocography, and cervical cancer, to evaluate the proposed architecture. The results show that the proposed architecture is more efficient than the traditional 1-D CNN and five implemented classifiers in terms of accuracy, sensitivity, and specificity. It also achieves very competitive accuracy in fetal state assessment compared to previous research. Future directions of our research include using recent and effective metaheuristic algorithms to determine optimal CNN parameters and applying the proposed architecture to disease detection based on real-time stream processing.