1 Introduction

India, a lower-middle-income country and the second most populous nation in the world, has been drawing attention on account of its health profile. Globally, the Maternal Mortality Ratio (MMR) declined from 385 per 100,000 live births in 1990 to 216 per 100,000 in 2015. Over the same period, India's estimated MMR fell from 556 to 174 per 100,000 live births, and by 2015 the country accounted for 15% of global maternal deaths [7]. Maternal mortality highlights the health burden borne by women during and immediately after pregnancy, when they are at risk of complications, particularly in developing countries such as India.

The Maternal Mortality Ratio (MMR) is the standard measure of maternal deaths, which occur while a woman is pregnant or within 42 days of the termination of pregnancy, during labor or delivery, or after childbirth. These deaths are mostly due to preventable causes [28, 48], including ante-partum hemorrhage, postpartum hemorrhage, ruptured uterus, high blood pressure or eclampsia, severe bleeding, infection after termination of pregnancy and pulmonary embolism. Factors such as early marriage or pregnancy, poverty, malnutrition, illiteracy, unsafe abortion and short intervals between deliveries make the situation even more dangerous for mothers living in remote areas [25]. To address this, the Government of India has implemented a range of schemes aimed at reducing maternal mortality and improving the health of pregnant women of reproductive age. A multi-strategy initiative, the National Rural Health Mission (NRHM), was launched in 2005 to strengthen the health system with a focus on reproductive, maternal, newborn and child healthcare, and was renamed the National Health Mission (NHM) in 2012 [27]. Despite all these schemes, the application of recent technologies remains a requisite for reducing the burden of maternal mortality in India.

Owing to the increasing use of electronic health (e-health) systems by health organizations in India, the flow of medical information has also grown, necessitating intelligent automated systems for the early detection of problems. This study employs multiple data mining algorithms in the medical domain and analyses their outcomes. On this basis, a plan for the detailed evaluation of pregnancy-related problems is sketched out, since early detection and treatment of causes can reduce the number of deaths of women during childbearing; with this problem in mind, various methodologies can be developed with the help of data mining and machine learning algorithms.

Data mining refers to the identification and extraction of useful information from large collections of raw healthcare data [5]. It can uncover valuable information in the healthcare sector, which can be used for predicting various diseases, for automated decision systems, or for assisting doctors in making decisions [18, 30]. Depending on the type of dataset and how they are implemented, data mining algorithms differ in their power for classification, clustering and prediction. Since a single algorithm seems incapable of ensuring optimal results in terms of prediction and stability, the effectiveness of an ensemble approach combining different algorithms [6, 44] was explored. The ensemble approach has contributed to the growth of healthcare data mining and shows more promising predictive results than a single classification algorithm applied to training and testing data. The training dataset is used to train the learners and build the models, after which the trained learners are combined using the stacking ensemble technique and predictions are computed. The results are then evaluated by comparing the predictions of single learners and ensembles under K-fold cross-validation [9, 10]. As a further step, majority voting [41] is applied as a baseline method to wrap the combined learners and average the predictions of the learned combiners.
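As a brief illustration of this pipeline: the experiments in this paper were carried out in WEKA, so the following is only a hedged scikit-learn sketch on synthetic data, not the actual implementation; all names and parameters are illustrative.

```python
# Illustrative sketch only: stacking plus majority voting, evaluated with
# K-fold cross-validation, expressed with scikit-learn analogues.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a two-class (High/Low MMR) district-level dataset.
X, y = make_classification(n_samples=674, n_features=16, random_state=42)

# A stacked ensemble: base learners feed their predictions to a meta learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=DecisionTreeClassifier(random_state=42),
    cv=5,  # internal folds used to build the meta-level training data
)

# Majority voting wraps the combined learner together with a single learner.
vote = VotingClassifier(estimators=[("stack", stack),
                                    ("rf", RandomForestClassifier(random_state=42))],
                        voting="hard")

# K-fold cross-validation is used to compare single learners and ensembles.
print(cross_val_score(vote, X, y, cv=10).mean())
```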

The second section of this paper reviews work related to ensemble methodologies in medical science, while the third section describes the proposed ensemble methodology, architecture and algorithm. Section four details the experiments, followed by the results and discussion in the fifth section. The conclusion, future scope and benefits of the proposed methodology are presented in the last section.

2 Related work

Over the last few years, researchers have developed numerous ensemble methodologies for analysis and prediction in the medical domain. The following paragraphs review the related literature.

Abdar et al. [3] proposed a two-layer nested ensemble for the early detection of breast cancer employing classifiers and meta classifiers. They combined the stacking and voting ensemble techniques and varied the classification algorithms used in the meta classifier. Experiments were conducted on the Wisconsin Diagnostic Breast Cancer dataset and the models were evaluated using K-fold cross-validation; the results indicated that the two-layer nested ensemble outperformed single classifiers, and the SV-NaïveBayes-3-Meta classifier took less time to build the model and was found to be more efficient for breast cancer diagnosis. Esener et al. [22] presented a framework for breast cancer diagnosis and prediction employing a feature ensemble with a multistage classification scheme. They used a publicly available mammogram dataset collected during the Image Retrieval in Medical Applications (IRMA) project, and three groups of features, namely local configuration pattern-based, statistical and frequency-domain features, were concatenated to construct the feature vectors. After feature extraction, eight well-known classification algorithms were applied in one-stage, two-stage and three-stage studies with 11-fold cross-validation. The classifier outputs were combined via a majority voting technique to improve recognition accuracy, and the multistage classification scheme was found to be more effective than single-stage classification for prediction. Moreira et al. [40] created an ensemble of nearest-neighbor classifiers using the random subspace algorithm to classify an unbalanced pregnancy database. The performance of the proposed ensemble was evaluated using the Area Under the Curve (AUC) and other confusion-matrix indicators with ten-fold cross-validation. The approach predicted the Apgar score and gestational age at childbirth, which can be strongly associated with neonatal death risk, and also predicted fetus-related problems associated with hypertensive disorders of pregnancy. Cong et al. [19] proposed a selective ensemble method using KNN, SVM and Naive Bayes as base classifiers with ten-fold cross-validation to diagnose breast cancer. Ultrasound images were combined with mammography images to compute the gray level co-occurrence matrix (GLCM), which provides a means of generating texture features and extracting morphological features. The selective ensemble with classifier fusion was found to be more efficient in diagnosing breast cancer than the feature-fusion method.

Kabir and Ludwig [31] presented a technique called super learning, or stacked ensemble, which computes an optimal weighted average of diverse learning models and achieves better performance than the individual base classifiers. Bashir et al. [9, 10] applied Naive Bayes, decision trees based on the Gini index and information gain, an instance-based classifier and a Support Vector Machine to a heart disease dataset and achieved an accuracy of 87.37% with ten-fold cross-validation. Rahman et al. [45] utilized an ensemble of three data mining techniques, namely Logistic Regression, Naïve Bayes and Neural Network, in a Robust Intelligent Heart Disease Prediction System (RIHDPS), which could make accurate predictions simply by analyzing a patient's history of heart disease. The proposed RIHDPS produced an accuracy of 91.26%, and a logistic regression decision boundary was obtained using Principal Component Analysis (PCA). Bashir et al. [11] developed an application, 'IntelliHealth', based on a proposed model called 'HM-BagMoov', an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting applied to five different datasets, which may be used by hospitals and doctors for diagnosis and advice. Abdar et al. [1, 2] applied novel decision tree based algorithms to a liver disease dataset and observed that the C5.0 algorithm with boosting achieved an accuracy of 93.75%, outperforming the boosted CHAID algorithm. Shastri and Mansotra [46] designed a conceptual framework, KDD-MHCI, based on Knowledge Discovery in Databases (KDD) for discovering knowledge from maternal health and child immunization (MHCI) databases.

3 Methodology

3.1 Proposed stacking model

This section introduces the proposed methodology, architecture and algorithm, describing the fundamental design and the features of each of the three parts of the proposed method.

Let Ds be the dataset, fn the set of feature vectors, tn the set of target variables and L = {L1, L2, L3, …, Ln} the set of algorithms to be applied to Ds. In the proposed K-level nested stacking, two or more stacking techniques can be combined to achieve better performance than single learners. The proposed system has multiple levels of stacks (nested stacking), where stacking can be applied K times, and is flexible enough to use several Base Learners, as shown in Fig. 1. Assume we have the stacking learning technique and different combinations of algorithms {L1,1, L1,2, …, L1,N}, {L2,1, L2,2, …, L2,N}, …, {LM,1, LM,2, …, LM,N} as Base Learners together with Meta Learners. The Level-1 outputs obtained are {P1,1, P1,2, …, P1,N}; if further stacking is needed, we move to the next level of stacking. After training these combinations of learners, the majority voting technique is applied to compute the final output. If an optimal result is achieved at Level 1, there is no need to apply stacking further.
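The following is a hedged sketch of how K-level (nested) stacking could be expressed with scikit-learn analogues: a level-1 stack is itself used as a base learner of a level-2 stack, and a final majority vote combines the learned combiners. The learner choices, names and the helper make_stack are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of K-level nested stacking with scikit-learn stand-ins.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def make_stack(base_learners, meta_learner, cv=10):
    """One stacking level: base learners L_{m,1..N} combined by a meta learner."""
    return StackingClassifier(estimators=base_learners,
                              final_estimator=meta_learner, cv=cv)

# Level-1 stacks built from different combinations of base learners.
stack_1 = make_stack([("rf", RandomForestClassifier(random_state=1)),
                      ("lr", LogisticRegression(max_iter=1000))],
                     DecisionTreeClassifier(random_state=1))
stack_2 = make_stack([("cart", DecisionTreeClassifier(random_state=2)),
                      ("lr", LogisticRegression(max_iter=1000))],
                     DecisionTreeClassifier(random_state=2))

# Level-2 stacking (K = 2): the level-1 stacks act as base learners.
stack_level2 = make_stack([("s1", stack_1), ("s2", stack_2)],
                          LogisticRegression(max_iter=1000))

# Majority voting over the combined learners produces the final output;
# if the 1-level result is already optimal, deeper levels are not needed.
final_model = VotingClassifier(estimators=[("s1", stack_1), ("s2", stack_2)],
                               voting="hard")
```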

Fig. 1
figure 1

Proposed K-level nested stacking framework

The proposed K-level nested stacking framework is shown in Fig. 1 and the proposed algorithm for nested stacking with K-fold cross validation is depicted in Algorithm 1.

figure a

The present paper deals with a heterogeneous ensemble [42] method based on the Stacking and Voting techniques. The nested stacking with K-fold cross-validation [17, 35] includes two kinds of learners, namely Base Learners and Meta Learners, and the performance of the model is verified using K-fold cross-validation [52]. The ensemble in this study is called SV-(n-Base, n-Meta), where SV stands for Stacking and Voting and n denotes the number of learners used. The different combinations of these Base and Meta Learners are listed in Table 1.

Table 1 Combinations of selected base and meta learners

Each individual classification algorithm was applied to the extracted subset of features of the training data, and its performance was assessed using K-fold cross-validation. Following the stacking paradigm, the lower-level learners are trained first and then used for prediction. Test data are subsequently classified by first producing the outputs of the Base Learners and then passing these outputs to the Meta Learner to obtain the final prediction [15, 36]. The present work uses only one level of stacking, and the corresponding architecture is depicted in Fig. 2.
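A hand-rolled sketch of this 1-level stacking flow is given below, using scikit-learn stand-ins for the WEKA learners and synthetic data; it illustrates the assumed mechanics (out-of-fold base predictions forming the meta-level features) rather than the exact procedure used in the paper.

```python
# Sketch of the 1-level stacking flow: train base learners, then a meta learner
# on their outputs, then classify test data by passing base outputs to the meta.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=674, n_features=16, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

base_learners = [RandomForestClassifier(random_state=0),
                 LogisticRegression(max_iter=1000)]
meta_learner = DecisionTreeClassifier(random_state=0)

# 1. Train the base learners; out-of-fold predictions form the meta-level features.
meta_features_tr = np.column_stack([
    cross_val_predict(bl, X_tr, y_tr, cv=10) for bl in base_learners])
for bl in base_learners:
    bl.fit(X_tr, y_tr)

# 2. Train the meta learner on the base learners' outputs.
meta_learner.fit(meta_features_tr, y_tr)

# 3. Classify test data: base learner outputs are passed to the meta learner.
meta_features_te = np.column_stack([bl.predict(X_te) for bl in base_learners])
print("stacked accuracy:", meta_learner.score(meta_features_te, y_te))
```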

Fig. 2
figure 2

1 level stacking architecture

The proposed heterogeneous ensemble is implemented using eight different classification algorithms, viz. Random Forest, Logistic, CART, JRip and PART as Base Learners and Hoeffding tree, REPTree and J48 as Meta Learners. The techniques used are described in Table 2.
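As a hedged illustration of how one such combination could be assembled, the sketch below approximates the RF: [SV-(3-Base, 2-Meta)] configuration reported later. PART, Hoeffding tree and REPTree are WEKA learners with no exact scikit-learn counterpart, so plain decision trees are substituted purely for illustration; one stack is built per meta learner and majority voting wraps the stacks, mirroring the SV pattern.

```python
# Hedged scikit-learn sketch of an SV-(3-Base, 2-Meta) combination.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

base_learners = [("rf", RandomForestClassifier(random_state=0)),
                 ("logistic", LogisticRegression(max_iter=1000)),
                 ("part_like", DecisionTreeClassifier(random_state=0))]  # PART stand-in

# One stack per meta learner (Hoeffding-tree and REPTree stand-ins), ...
stack_hoeffding_like = StackingClassifier(
    estimators=base_learners,
    final_estimator=DecisionTreeClassifier(max_depth=5), cv=15)
stack_reptree_like = StackingClassifier(
    estimators=base_learners,
    final_estimator=DecisionTreeClassifier(ccp_alpha=0.01), cv=15)

# ... then majority voting wraps the two stacks to produce the final prediction.
sv_3base_2meta = VotingClassifier(estimators=[("m1", stack_hoeffding_like),
                                              ("m2", stack_reptree_like)],
                                  voting="hard")
```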

Table 2 Learners description table

4 Experiments

4.1 Dataset

The dataset used in this study was taken from the Health Management Information System (HMIS) portal of the Ministry of Health and Family Welfare (MoHFW), Government of India. It covered all 674 districts of India for the years 2014–18 and contained 33 parameters. Of the 674 districts, 386 showed high MMR and the remaining 288 were low-MMR districts. Of the 33 input parameters, the 16 most important were selected for modeling using the forward feature selection variant of the wrapper method, as depicted in Table 3. Additionally, a flag variable, MMR, with the two values High MMR and Low MMR was used as the class label for the present work.
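An illustrative sketch of the wrapper-based forward feature selection step is given below, using a scikit-learn analogue; a synthetic frame with placeholder column names stands in for the 33 HMIS parameters, which are not reproduced here.

```python
# Forward feature selection (wrapper method) sketch: retain 16 of 33 features.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

X_raw, y = make_classification(n_samples=674, n_features=33, n_informative=16,
                               random_state=0)
X = pd.DataFrame(X_raw, columns=[f"param_{i}" for i in range(1, 34)])  # placeholders

selector = SequentialFeatureSelector(
    RandomForestClassifier(random_state=0),
    n_features_to_select=16,   # the 16 parameters retained for modeling
    direction="forward",       # forward selection variant of the wrapper method
    cv=5,
)
selector.fit(X, y)
print(list(X.columns[selector.get_support()]))  # the selected feature subset
```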

Table 3 Selected feature set

4.2 Evaluation measures

To evaluate the performance, the following measures were used (a brief computation sketch is given after the list):

(a) Accuracy: the ratio of the number of correct predictions to the total number of predictions [20].

(b) Precision: the ratio of the number of true positives to the total number of predicted positives [29].

(c) Recall: the ratio of correctly classified positives to the total number of positives in that particular class [4, 8].

(d) F-measure: the weighted harmonic mean of precision and recall [39].

(e) ROC: the Receiver Operating Characteristic (ROC) curve is a probability curve showing the performance of a classification model by plotting the true positive rate against the false positive rate at varied threshold values [26, 33].
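A minimal computation sketch of these measures follows, using scikit-learn as a stand-in for the WEKA evaluation output; y_true, y_pred and y_score (positive-class scores) are placeholder names for labels and predictions obtained on held-out folds.

```python
# Sketch: compute accuracy, precision, recall, F-measure and ROC AUC for a
# binary High/Low MMR classification.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def report(y_true, y_pred, y_score):
    """Return the evaluation measures used in this study as a dictionary."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f_measure": f1_score(y_true, y_pred),
        "roc_auc":   roc_auc_score(y_true, y_score),
    }
```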

Furthermore, the nested stacking ensemble approach was applied to verify whether it yielded any substantial improvement in prediction.

4.3 Working environment

The experiments were carried out in the WEKA environment. Waikato Environment for Knowledge Analysis (WEKA) is free and open-source software used for the study, implementation and development of machine learning schemes [38]. It is freely available to the public, widely used for research in data mining and machine learning, and brings together a wide range of machine learning methods for data visualization, classification, clustering, regression and other tasks.

5 Results and discussion

5.1 Results without ensemble techniques

In this section, the individual learners were assessed on the maternal health dataset by varying the value of K in K-fold cross-validation. Table 4 compares the accuracy, precision, recall, F-measure and ROC results of the individual learners CART, Random Forest and JRip. It can be seen from Table 4 that Random Forest performed better than the other learners.
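A hedged sketch of this single-learner assessment with varying K is shown below; scikit-learn analogues and synthetic data are used (CART and JRip have no exact scikit-learn counterpart, so a decision tree stands in here purely for illustration).

```python
# Single-learner assessment while varying K in K-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and High/Low MMR labels.
X, y = make_classification(n_samples=674, n_features=16, random_state=0)

learners = {"CART-like tree": DecisionTreeClassifier(random_state=0),
            "Random Forest":  RandomForestClassifier(random_state=0)}

for k in (5, 10, 15):                       # the varied values of K
    for name, clf in learners.items():
        acc = cross_val_score(clf, X, y, cv=k).mean()
        print(f"K={k:2d}  {name:15s} accuracy={acc:.4f}")
```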

Table 4 Results without stacking

5.2 Results with ensemble techniques

The performance of SV-(n-Base, n-Meta) with 2 or 3 Base and Meta Learners is reported in this section. The aim of using the nested stacking ensemble was to determine which combinations of methods best suit the available data and to corroborate their effect on classification accuracy. The proposed ensemble model was applied to each test set with varied numbers of cross-validation folds, and the results were analysed to verify its superiority. In the present work, the batch size was 200. Table 5 reports the performance of the JRip algorithm as the Base Learner in various combinations, of which SV-(3-Base, 3-Meta) achieved the best accuracy of 90.06% at K = 15. Table 6 shows the performance of the Random Forest algorithm as the Base Learner in four combinations. Of these, three combinations, viz. SV-(2-Base, 2-Meta), SV-(2-Base, 3-Meta) and SV-(3-Base, 2-Meta), performed best at K = 15. The combinations SV-(2-Base, 2-Meta) and SV-(2-Base, 3-Meta) had the same accuracy of 90.80%, whereas SV-(3-Base, 2-Meta) gave an accuracy of 91.10%; the difference between 91.10% and 90.80% is very small. Therefore, to select the best model among these three combinations, other measures were also used, as shown in Sects. 5.3 and 5.4. It can further be seen from Table 7 that SV-(3-Base, 3-Meta) performed best with 90.65% accuracy using CART as the Base Learner at K = 15.
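The experimental loop itself is not published with the paper; the following hedged sketch merely illustrates how such a grid over SV-(n-Base, n-Meta) combinations and fold counts could be organised with scikit-learn stand-ins (tree substitutes for the WEKA rule and tree learners, synthetic data, and no equivalent of WEKA's batch-size option).

```python
# Sketch of a grid over SV-(n-Base, n-Meta) combinations and fold counts K.
from itertools import product
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=674, n_features=16, random_state=0)

base_sets = {"2-Base": [("rf", RandomForestClassifier(random_state=0)),
                        ("lr", LogisticRegression(max_iter=1000))],
             "3-Base": [("rf", RandomForestClassifier(random_state=0)),
                        ("lr", LogisticRegression(max_iter=1000)),
                        ("cart", DecisionTreeClassifier(random_state=0))]}
meta_sets = {"2-Meta": [DecisionTreeClassifier(max_depth=5),
                        DecisionTreeClassifier(ccp_alpha=0.01)]}

def sv_model(bases, metas, cv):
    # One stack per meta learner, wrapped by majority voting (the SV pattern).
    stacks = [(f"s{i}", StackingClassifier(estimators=bases, final_estimator=m, cv=cv))
              for i, m in enumerate(metas)]
    return VotingClassifier(estimators=stacks, voting="hard")

for (bn, bases), (mn, metas), k in product(base_sets.items(), meta_sets.items(), (5, 10, 15)):
    acc = cross_val_score(sv_model(bases, metas, k), X, y, cv=k).mean()
    print(f"SV-({bn}, {mn})  K={k:2d}  accuracy={acc:.4f}")
```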

Table 5 1-Level stacking results using JRip as base learner
Table 6 1-Level stacking results using random forest as base learner
Table 7 1-Level stacking results using CART as base learner

5.3 Overall comparison

Accuracy, ROC and F-measure were compared overall between the single learners and the proposed 1-level heterogeneous ensemble under varied K-fold cross-validation, as reflected in the histograms in Figs. 3, 4 and 5. As the figures show, the proposed 1-level nested stacking ensemble performed much better than the traditional classification techniques. The best result, achieved by RF: [SV-(3-Base, 2-Meta)], was an accuracy of 91.10% at K = 15, higher than that of the other models. Figure 4 shows a remarkable ROC of 95.10 obtained by the RF: [SV-(2-Base, 3-Meta)] model with K = 10 cross-validation. Figure 5 shows that the highest F-measure, 91.10, was obtained by RF: [SV-(3-Base, 2-Meta)]. In summary, the Random Forest combinations exhibited outstanding accuracy, ROC and F-measure across variations of Base and Meta Learners. To choose the best ensemble model among these Random Forest combinations, the training time was also compared, as discussed in Sect. 5.4.

Fig. 3
figure 3

Overall accuracy comparison

Fig. 4
figure 4

Overall ROC comparison

Fig. 5
figure 5

Overall F-measure comparison

5.4 Computational time comparison

The overall comparison of the time taken to train the models is shown in Table 8, where RF: [SV-(3-Base, 2-Meta)] was observed to perform best. Therefore, RF: [SV-(3-Base, 2-Meta)] was considered the best-performing model of all the models used in this work in terms of accuracy, F-measure and training time.

Table 8 Overall time comparison

6 Conclusion

In all walks of life, people increasingly focus on collecting and utilizing data, and experts from various fields now investigate datasets with data mining algorithms for the well-being of society. In this study, the aim was to help improve the current status of maternal deaths by undertaking prediction with Base and Meta Learners. Well-known data mining algorithms are considerably important in the medical field, and through this paper a novel attempt was made to investigate a maternal health dataset with a more effective nested stacking technique. There is limited research applying stacking techniques to India's Maternal Mortality Ratio (MMR); therefore, various combinations of Base and Meta Learners were explored to examine which learners are best suited to predicting maternal deaths. The reliability of the system was evaluated by computing the accuracy and ROC area of the algorithms without stacking and with the stacking ensemble. Among the Base Learners, Random Forest attained the highest accuracy, 87.98%. With the proposed nested stacking, however, an accuracy of 91.10% was achieved by the combination RF: [SV-(3-Base, 2-Meta)], and among the three Base Learners Random Forest achieved noteworthy prediction accuracy. This accuracy was obtained by varying K in K-fold cross-validation and by working on the best feature subset obtained from the feature selection process. Random Forest thus showed its potential in terms of efficiency and effectiveness based on accuracy, F-measure and training time. It is concluded from this study that Random Forest used as a Base Learner in combination with two other Base Learners, viz. Logistic and PART, and two Meta Learners, viz. Hoeffding tree and REPTree, was the best-suited model for classifying the maternal health data into High MMR and Low MMR. In future, the same model will also be applied to other healthcare datasets, and the current work will be extended to n-level stacking.