1 Introduction

Among severe medical emergencies, cancer is the most lethal disease induced by tumor cells. The uncontrolled growth of cells into tumors remains a major challenge for the technological world today. Cancer treatments such as chemotherapy and surgery also carry a high risk of destroying healthy tissue cells. Cervical cancer is the fourth most common cause of cancer death among women. According to the World Health Organization (WHO), there were 604,127 new cases of cervical cancer in 2020 and 341,831 deaths, accounting for 6.5% of all cancers diagnosed in women. In 2020, more than 83% of cervical cancer deaths occurred in low- and middle-income nations [1]. In India alone, cervical cancer accounted for 18.3% of cancer cases among women in 2020, which is 9.4% of all cancer patients; it ranked third among all cancers in India, with 123,907 newly registered patients and 77,348 deaths in 2020 [2]. Tumors are categorized as malignant or benign. The lack of early diagnosis, effective screening, and treatment programs makes cervical cancer one of the most lethal malignancies. Cervical cancer cells develop in the cervix of the uterus, and early-stage symptoms include abnormal vaginal bleeding, increased vaginal discharge, bleeding after menopause, pain during intercourse, and pelvic pain.

Cervical cancer occurs in women infected with the human papillomavirus (HPV), which causes cervical tissue to change abnormally. Multiple sexual partners, sexual activity at an early age, long-term use of oral contraceptives, and smoking increase the risk of cervical cancer [1, 3]. The Pap test and the HPV DNA test are the most widely recommended screening tests for cervical cancer. In the Pap test (a.k.a. Pap smear test), a cytology-based screening test, a sample of cells is taken from the cervix and examined for abnormal or cancerous cells, as well as for cell changes that increase the likelihood of cervical cancer. The HPV DNA test detects, in cells taken from the cervix, any HPV type responsible for leading to cervical cancer. The Pap smear and HPV tests can be performed at the same time, using either the same swab or a second swab. When cervical cancer is suspected, patients undergo detailed diagnostic tests such as biopsy [4]. At present, alongside conventional medical approaches, computer vision and machine learning algorithms within cyber-physical systems play a vital role in various medical applications such as disease diagnosis. In this paper, we apply some of the most popular machine learning (ML) approaches, namely NB, LR, KNN, SVM, LDA, MLP, DT, and RF, to cervical cancer data together with several preprocessing methods. Analyzing all the risk factors in disease diagnosis degrades the efficiency of a classification model and increases its computational complexity, so the selection of relevant features also plays a vital role in the performance of a classification model. This article therefore also evaluates some popular feature selection methods for obtaining optimized classification performance.

2 Background of ML Algorithms Used

2.1 Naive Bayes (NB)

NB is a supervised classification model based on a conditional probabilistic approach that applies Bayes' theorem to the instances in a dataset. This classification method is often well suited to high-dimensional datasets [5,6,7]. The approach classifies a problem based on the joint posterior probability distribution:

$$\begin{aligned} p(C \mid X) &= \frac{p(C)\, p(X \mid C)}{p(X)} \\ p(X \mid C) &= \prod_{i = 1}^{n} p(x_{i} \mid C) \end{aligned}$$

Here, \(p(C \mid X)\), \(p(C)\), and \(p(X)\) give the posterior probability, the prior probability of the class, and the probability of the attributes, respectively, and X is the vector of n attributes. Owing to the assumed statistical independence among features, these classifiers are highly scalable and can exploit limited training data with high-dimensional features. In [6], Weighted Principal Component Analysis (WPCA) is used along with the NB classifier to achieve improved performance in Pap smear cervix cell image classification on the Herlev dataset. References [8,9,10,11] compare the prognosis performance of the NB classifier on cervical data with other ML classification models.
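As a minimal illustration (not the authors' exact pipeline), the posterior computation above can be reproduced with scikit-learn's GaussianNB; the feature matrix X and labels y below are synthetic placeholders, not the cervical dataset.

```python
# Minimal Gaussian Naive Bayes sketch on synthetic data.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 samples, 5 toy risk-factor features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # toy binary target

clf = GaussianNB().fit(X, y)
# predict_proba returns p(C|X), the posterior from Bayes' theorem under the
# conditional-independence assumption p(X|C) = prod_i p(x_i|C).
print(clf.predict_proba(X[:3]))
print(clf.predict(X[:3]))
```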

2.2 Logistic Regression (LR)

LR is a statistical supervised learning method for binary classification that fits a linear model and maps its output to discrete binary outcomes through the logistic function. It uses maximum-likelihood estimation, via an iterative search procedure, to minimize the prediction error and find the optimal coefficient values for the data, so that the classification threshold can be easily adjusted [12]. The essence of the algorithm is the minimization of the cost function:

$$\begin{aligned} J(\theta) &= -\frac{1}{m}\sum_{i = 1}^{m}\left[ y^{(i)} \log h_{\theta}\!\left(x^{(i)}\right) + \left(1 - y^{(i)}\right)\log\!\left(1 - h_{\theta}\!\left(x^{(i)}\right)\right) \right] \\ h_{\theta}(x) &= \frac{1}{1 + e^{-\theta^{T} x}} \end{aligned}$$

Here, \(h_{\theta}(x)\) denotes the logistic hypothesis, and \(\log h_{\theta}(x^{(i)})\) and \(\log(1 - h_{\theta}(x^{(i)}))\) give the cost when the class y is '1' and '0', respectively, for m training examples. Reference [13] proposed an LR classifier with a fuzzy inference model using combined grayscale-texture features for Cervical Intraepithelial Neoplasia (CIN) image classification. Several studies [8, 10, 11, 14] used LR as one of the classifiers in comparative analyses of cervical cancer classification. Reference [15] used logistic regression for probability estimation of knowledge, attitude, and perception (KAP) of human papillomavirus (HPV) infection and cervical cancer.
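The sketch below (synthetic data, placeholder names) evaluates the cost J(θ) directly from the formula and checks it against scikit-learn's log_loss at the coefficients fitted by LogisticRegression.

```python
# Logistic hypothesis and cross-entropy cost J(theta) in NumPy, compared with log_loss.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X @ np.array([1.5, -2.0, 0.5, 0.0]) > 0).astype(int)

def cost(theta, b, X, y):
    h = 1.0 / (1.0 + np.exp(-(X @ theta + b)))           # h_theta(x), the sigmoid
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(cost(clf.coef_.ravel(), clf.intercept_[0], X, y))  # J(theta) at the fitted coefficients
print(log_loss(y, clf.predict_proba(X)))                 # same quantity via scikit-learn
```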

2.3 Linear Discriminant Analysis (LDA)

LDA is popularly known as a dimensionality reduction approach; however, it has also proved effective for classifying objects into two or more groups or clusters based on the measured features describing those objects. LDA is an alternative to LR when there are more than two classes, since LR is limited to binary classification problems. It computes statistical characteristics of the data for each class, which are used to make decisions based on Bayes' theorem [16]. Its objective is to predict, for an input x, the class k with the largest:

$$\delta_{k}(x) = \log \pi_{k} - \frac{1}{2}\mu_{k}^{T}\hat{\Sigma}^{-1}\mu_{k} + x^{T}\hat{\Sigma}^{-1}\mu_{k}$$

Here, \(\pi_{k} = p(y = k)\) is the prior probability of class k, and \(\mu_{k}\) and \(\hat{\Sigma}\) denote the mean vector of class k and the covariance matrix, respectively. Reference [17] implemented fuzzy-entropy-based prime feature discrimination from segmented cell nuclei on the Herlev dataset; the segmented features were used with LDA as one of the classification models for abnormal cell detection.
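A minimal sketch (synthetic two-class data, not the paper's experiments): the fitted LDA model evaluates the discriminant for each class and predicts the class with the largest value.

```python
# Minimal LDA sketch with scikit-learn on synthetic Gaussian clusters.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict(X[:5]))
print(lda.decision_function(X[:5]))   # signed score: positive values favour class 1
```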

2.4 K-Nearest Neighbor (KNN)

KNN is a non-parametric classification technique that uses a feature-similarity approach, searching for the most similar data points among the available data to assign a class. KNN finds the K nearest data points to a given query point by computing the Euclidean distance (other distance measures include the Manhattan, Minkowski, and Hamming distances) and assigns the class label that occurs most frequently among them. The value of K is chosen by parameter tuning so as to provide the best-suited prediction for the given data [18]. An input x is assigned to the class with the largest probability among all classes:

$$p(y = j \mid X = x) = \frac{1}{K}\sum_{i \in A} I\!\left(y^{(i)} = j\right)$$

Here, I(·) denotes the indicator function, which is '1' when its argument is true and 0 otherwise, and A is the set of the K nearest points to the input x. References [9,10,11, 17] used the KNN classifier in comparative analyses with other classification models for cervical cancer classification.
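A hedged sketch with scikit-learn (toy data): K and the distance metric are the main tuning choices, and predict_proba returns the neighbour vote fractions from the equation above.

```python
# KNN sketch: Minkowski metric with p=2 is the Euclidean distance; p=1 gives Manhattan.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2).fit(X, y)
print(knn.predict(X[:5]))
print(knn.predict_proba(X[:5]))   # fraction of the 5 nearest neighbours in each class
```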

2.5 Multilayer Perceptron (MLP)

MLP is a proficient and robust neural-network-based method used for solving nonlinear and complex classification problems. It comprises multiple neurons arranged into an input layer, hidden layers, and an output layer. Some of the nodes (neurons) use non-linear activation functions so that the network can also solve problems that are not linearly separable. The most difficult task is determining the hidden layer sizes [19]. The optimization objective of the MLP model is the minimization of:

$$\begin{aligned} & \min\; \left\| F(X, W) - d \right\|^{2} \\ & F(X, W) = Y = \left(y_{1}, y_{2}, y_{3}, \ldots, y_{n_{N+1}}\right) \end{aligned}$$

Here, F denotes the transfer function, X the input to the model, W the weight matrix, d the desired response, Y the computed output vector, N the total number of hidden layers, and \(n_{N+1}\) the number of output-layer neurons. An incorrect estimate of the network size may result in approximation error, generalization error, and overfitting. In [14], an MLP classifier was used for performance comparison with other classifiers for two-class classification on risk-factor cancer data, using an RFE- and RF-based ensemble method for feature selection.
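A minimal sketch (synthetic, non-linearly separable toy data): hidden_layer_sizes is the hard-to-choose hyperparameter mentioned above, and a non-linear activation lets the network separate classes that a linear model cannot.

```python
# MLP sketch with scikit-learn's MLPClassifier on a non-linearly separable toy problem.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 6))
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) > 1.5).astype(int)   # circular decision boundary

mlp = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)
print(mlp.score(X, y))   # training accuracy of the two-hidden-layer network
```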

2.6 Decision Tree (DT)

DT is a supervised learning method with a tree-like structure in which every node represents a test on an attribute value of the instance to be classified, and each branch corresponds to a value assumed by that node [7, 20]. The best split among the training samples is selected based on measures of the class distribution:

$$\begin{aligned} Entropy(t) &= -\sum_{i = 0}^{c - 1} p(i \mid t)\,\log_{2} p(i \mid t) \\ Gini(t) &= 1 - \sum_{i = 0}^{c - 1}\left[p(i \mid t)\right]^{2} \\ Classification\; error(t) &= 1 - \max_{i}\left[p(i \mid t)\right] \end{aligned}$$

Here, c is the total number of target classes and \(p(i \mid t)\) is the proportion of samples belonging to class i at a specific node t. Conventional DT algorithms [8, 10, 21], as well as DT variants such as ID3, C4.5, C5.0, CHAID, and CART [9] and J48 [10, 11], have performed efficiently in cervical cancer detection, both standalone and within ensemble approaches.
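The sketch below (toy values, not from the paper) evaluates the three split measures for a sample class distribution and shows that scikit-learn's DecisionTreeClassifier lets the split criterion be selected.

```python
# Split-quality measures for a toy node distribution, plus a decision tree with a chosen criterion.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

p = np.array([0.9, 0.1])                  # class proportions p(i|t) at a node t
entropy = -np.sum(p * np.log2(p))
gini = 1 - np.sum(p ** 2)
class_error = 1 - p.max()
print(entropy, gini, class_error)

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0).fit(X, y)
print(tree.score(X, y))
```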

2.7 Support Vector Machine (SVM)

Vapnik introduced the SVM approach to deal with both classification and regression problems. SVM is a supervised, discriminative, linear approach that accomplishes binary classification through an explicit hyperplane [7]. Optimization in SVM is based on minimizing:

$$\min_{\theta}\; C\sum_{i = 1}^{m}\left[ y^{(i)}\, z^{\prime}\!\left(\theta^{T} x^{(i)}\right) + \left(1 - y^{(i)}\right) z^{\prime\prime}\!\left(\theta^{T} x^{(i)}\right) \right] + \text{regularization term}$$

Here, C is the penalty factor for error, \(z^{\prime}(\theta^{T} x^{(i)})\) and \(z^{\prime\prime}(\theta^{T} x^{(i)})\) denote the cost functions when the class y equals '1' and '0', respectively, and m indicates the number of samples. In [22], SVM, support vector machine-recursive feature elimination (SVM-RFE), and support vector machine-principal component analysis (SVM-PCA) methods were used for cervical cancer detection, achieving 90-94% accuracy on the risk-factor cervical cancer data. Initially, SVM applications were constrained to two-class classification, but kernel functions for SVM were later introduced that are valuable for multiclass classification [17, 23, 24]. Reference [17] implemented SVM with a linear kernel (SVM-linear) and a radial basis function kernel (SVM-RBF) using a fuzzy-entropy-based feature extraction mechanism for abnormal cell detection in Pap smear images. References [8,9,10,11, 14, 25] performed comparative analyses of SVM with other prediction models for cervical cancer prognosis.
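A minimal sketch (synthetic data): C is the error-penalty factor from the objective above, and the linear and RBF kernels correspond to the SVM-linear and SVM-RBF variants cited.

```python
# SVM sketch with scikit-learn's SVC using linear and RBF kernels.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

svm_linear = SVC(kernel="linear", C=1.0).fit(X, y)
svm_rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm_linear.score(X, y), svm_rbf.score(X, y))
```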

2.8 Random Forest (RF)

Random forest, introduced by Breiman (2001), is an ensemble method used for both classification and regression problems. Ensemble methods combine weak learners to form a strong learner, using multiple learning approaches to produce an enhanced predictive result. RF trains a number of DTs and returns the class found in the majority within the ensemble of all DTs [26]. In RF, each DT predicts a class, and the most-voted class among all predictions becomes the RF model's prediction. The bagging approach in RF predicts the value for a sample x′ by averaging the predictions of the individual DTs:

$$\hat{f}\left(x^{\prime}\right) = \frac{1}{N}\sum_{n = 1}^{N} f_{n}\left(x^{\prime}\right)$$

Here, N is the number of trees (base estimators) in the ensemble. RF algorithms generally perform slightly better than SVMs in many classification problems [27]. The RF classifier [8, 10, 11, 14, 21, 25] performs efficiently for risk-factor cancer data as well as for Pap smear cervix images in cervical cancer detection.
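A hedged sketch (synthetic data): n_estimators is the number N of trees whose votes are averaged, as in the bagging equation above.

```python
# Random forest sketch with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.predict(X[:5]))
print(rf.predict_proba(X[:5]))   # averaged tree predictions (class-vote proportions for fully grown trees)
```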

3 Methodology

3.1 Data Description

The cancer patients' data used here for diagnosis is available at the UCI repository and was collected at 'Hospital Universitario de Caracas' in Caracas, Venezuela [28]. The dataset includes 858 instances with 36 variables: 32 risk-factor attributes and 4 target variables—Hinselmann, Schiller, Cytology, and Biopsy. The attributes of the cervical cancer data are described in Table 1. The Hinselmann test is colposcopy using acetic acid, the Schiller test is colposcopy using Lugol iodine, and Cytology and Biopsy are the remaining diagnostic tests. A malignant outcome is labeled '1' and a benign outcome '0'. Across the whole dataset, around 90-96% of the data belongs to the benign class for each of the four target variables. All attribute values are of boolean, integer, or float type. To build an efficient learning model, the data fed to it should be proper and complete. Since some samples in the dataset have missing values and each attribute has a different scaling range, the data must be preprocessed before being fed to a learning algorithm.

Table 1 Attributes description of Cervical Cancer dataset

3.2 Preprocessing of Data

In preprocessing, we eliminate the instances and attributes with missing values. After elimination, features 27 and 28 are removed, along with all samples having at least one missing value, leaving 668 samples with 30 features. Variation in the scaling ranges of features may cause a particular feature to dominate the rest when analyzing performance on a dataset. If the magnitude of an attribute's variance is of much higher order than that of the others, it may dominate the objective function and produce an estimator incapable of learning from the remaining attributes as expected. In the given cervical dataset, the attribute 'Age' has high mean, variance, and standard deviation (27.265, 76.168, and 8.727, respectively), while some attributes, such as 'STDs: cervical condylomatosis' and 'STDs: AIDS', have zero mean, variance, and standard deviation. To restrain the weighting effect of the attributes with larger statistical magnitudes and to obtain more numerically stable and better-conditioned optimization, all attributes should be brought to the same scale using feature scaling approaches. Scaling methods are also quite helpful in speeding up the computations within an algorithm. In this article, the ML algorithms are analyzed on unscaled as well as scaled data. For scaling, we use the Min-Max Scaler, the Standard Scaler, and Normalization on the available cancer data [29, 30]. The Standard Scaler, or Z-score normalization, rescales the data to a standard normal distribution with zero mean and unit variance: if µ and σ denote the mean and standard deviation, standardization maps ~N(µ, σ²) → ~N(0, 1), i.e., Z ~ N(0, 1), where N stands for the normal distribution. The standard score, or Z-score, of an instance is given by:

$$\text{Standardization:}\quad z = \frac{x - \mu}{\sigma}$$

Standard scaling is preferred where the distance contributions of all attributes are required to be equal, and it proves most valuable when the attribute distributions are nearly normal (Gaussian). The Min-Max Scaler and Normalization are alternatives to the Standard Scaler when the attribute distributions are not Gaussian and the attributes lie within a bounded range. Both the Min-Max Scaler and Normalization scale the data between 0 and 1, with the difference that the resulting distributions are bounded and of unit norm, respectively. Min-Max scaling preserves the shape of the original distribution and ends up with a smaller value of σ, which can suppress the effect of outliers. Normalization scales each instance (row) instead of each attribute (column), using the Euclidean norm (l2 normalization) or the Manhattan norm (l1 normalization).

$$\begin{aligned} \text{Min-Max scaling:}\quad & x^{\prime} = \frac{x - x_{min}}{x_{max} - x_{min}} \\ \text{Normalization:}\quad & x^{\prime} = \frac{x - x_{mean}}{x_{max} - x_{min}} \end{aligned}$$

Here, \(x\) is a value within a feature (column) for Min-Max scaling and within a sample (row) for Normalization, x′ is the scaled value, and \(x_{min}\), \(x_{max}\), and \(x_{mean}\) denote the minimum, maximum, and mean values of the given set.
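As a minimal scikit-learn sketch of these three options (a small synthetic array stands in for the cervical data; column names are illustrative only):

```python
# Column-wise scalers (StandardScaler, MinMaxScaler) vs. row-wise Normalizer.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

X = np.array([[18.0, 4.0, 0.0],
              [52.0, 1.0, 1.0],
              [34.0, 3.0, 0.0]])                 # toy stand-in rows of risk factors

X_std = StandardScaler().fit_transform(X)        # each column: zero mean, unit variance
X_minmax = MinMaxScaler().fit_transform(X)       # each column scaled to [0, 1]
X_norm = Normalizer(norm="l2").fit_transform(X)  # each row scaled to unit Euclidean norm
print(X_std, X_minmax, X_norm, sep="\n\n")
```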

Among all the samples, only around 4-10% belong to the malignant category for each of the four target variables in the given dataset. This has an adverse effect on the computation of performance metrics and may bias the prediction towards the majority class when one class greatly outnumbers the other, i.e., in the case of class-imbalanced data [31]. Generally, the number of patients with a positive disease diagnosis is much smaller than the negative group. The imbalance ratio for each target variable in the given dataset is shown in Table 2. This imbalance may result in a deceptively high accuracy for the model, owing to the weight of the majority class in the dataset, even when the minority class is wrongly predicted. Oversampling and undersampling are the two ways to achieve a balanced class distribution; since the dataset is not large, oversampling is the better choice. However, the traditional oversampling approach randomly replicates instances, which may cause overfitting; hence a hybrid approach, the synthetic minority oversampling technique (SMOTE), is used as a preprocessing method [32]. SMOTE creates new 'synthetic' minority instances by linear interpolation rather than duplication. A new minority instance is generated by SMOTE as follows:

$$x^{\prime} = x + rand(0, 1) \times \left| x - x_{k} \right|$$
Table 2 Imbalance ratio in cervical cancer dataset

Here, \(x\) is one of the minority instances in the minority-class set A; for each \(x \in A\), \(rand(0, 1)\) is a random number between 0 and 1 and \(\left| x - x_{k} \right|\) is the Euclidean distance between the instance x and its kth nearest neighbor (for k = 1, 2, …, N, where N is the sampling rate set in proportion to the imbalance).
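A hedged sketch of SMOTE with the imbalanced-learn library on a toy imbalanced problem (roughly 5% minority, as in the cervix data); the exact sampling settings of the paper are not reproduced here.

```python
# SMOTE oversampling sketch: minority samples are synthesized by interpolation.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5))
y = np.array([0] * 190 + [1] * 10)          # ~5% minority class

X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))           # class counts before and after balancing
```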

3.3 Implementation of ML Algorithms

On the preprocessed data, we apply the aforementioned ML algorithms with fivefold cross-validation to compute the performance metrics. Cross-validation is a performance-analysis method in which a set of samples is reserved for testing and the model is trained on the remaining data; this is repeated for every set of samples in the dataset, and the performance metrics are computed for each trained model. The final performance metric is the mean of the metrics of the individual trained models. K-fold cross-validation splits the whole dataset into K subsets; in each iteration one fold is used as the test set and the others as the training set, and this is repeated until each of the K folds has served as the test set. Stratified K-fold cross-validation, as available in the Scikit-learn library for Python [33], is used in this article to split the data into 5 folds (Fig. 1).

Fig. 1 K-fold cross-validation with K = 5

Apart from fivefold cross-validation, a parameter-selection method is used to determine the best parameters of each ML algorithm for the given data. For each algorithm, we specify a range of values for every parameter and then identify the most suitable combination. The GridSearchCV function available in Scikit-learn computes the finest parameters for each ML algorithm; a parameter grid containing all candidate parameter values is supplied to GridSearchCV, from which the best are selected. The parameter grids used for the simulation of the ML algorithms are given in Table 3.

Table 3 Parameters grid for different ML classifiers
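The sketch below illustrates how stratified fivefold cross-validation can be combined with GridSearchCV for parameter selection; the data and the small RF grid shown are toy placeholders, not the grids of Table 3.

```python
# Stratified 5-fold CV + grid search over a toy random-forest parameter grid.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, GridSearchCV

rng = np.random.default_rng(9)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```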

When it comes to performance, accuracy is not the only criterion for determining the best model. Since most of the data belongs to the benign category, overall accuracy is weighted towards the benign cases: it can be very high even if the accuracy on the malignant category is low. However, for a correct disease diagnosis, the prediction of malignant samples must also be accurate. Hence, Precision, Recall, F-score, the ROC curve, and AUC are also used here for performance analysis [34, 35].

$$\begin{aligned} Accuracy &= \frac{TP + TN}{TP + TN + FP + FN} \\ Precision &= \frac{TP}{TP + FP} \\ Recall &= \frac{TP}{TP + FN} \\ F\text{-}score &= \frac{2 \times Precision \times Recall}{Precision + Recall} \end{aligned}$$

Here TP, TN, FP, and FN refer to the True Positive, True Negative, False Positive, and False Negative counts. A True Positive is a malignant cervical cancer case detected correctly, while a True Negative is an uninfected patient predicted correctly. A False Positive is an uninfected sample found with a positive result, while a False Negative is a cervical-cancer-infected patient whose result is negative. Accuracy is the proportion of correctly diagnosed samples among all samples. Precision, also known as the Positive Predictive Value (PPV), is the ratio of actually infected persons to all samples detected as positive. Recall, also known as Sensitivity or the True Positive Rate (TPR), is the fraction of correctly detected cancer-infected patients among all samples actually infected with cervical cancer. The F-score is the harmonic mean of precision and recall, with a best value of 1. The ROC curve and AUC are related in that the ROC curve is plotted between the True Positive Rate (TPR) and the False Positive Rate (FPR), and the AUC score is the area under the ROC curve. The FPR is the ratio of uninfected samples detected as positive to the total number of uninfected samples. AUC lies between 0 (worst) and 1 (best); the closer it is to 1, the better the model performs.
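As a minimal sketch (toy labels and scores, not results from the paper), these metrics and the ROC/AUC quantities can be computed with scikit-learn as follows:

```python
# Computing accuracy, precision, recall, F-score, ROC points and AUC on toy predictions.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, roc_auc_score)

y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0, 0, 1])
y_prob = np.array([0.1, 0.2, 0.6, 0.8, 0.9, 0.4, 0.3, 0.7])   # scores for the positive class

print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
fpr, tpr, _ = roc_curve(y_true, y_prob)       # points of the ROC curve (FPR vs. TPR)
print(roc_auc_score(y_true, y_prob))          # area under that curve
```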

3.4 Feature Selection Methodology

The computational cost of a learning model is directly proportional to the dimensionality of its attributes. Further performance improvement can be achieved by selecting the relevant risk factors that contribute most to the classification model. Analyzing a learning method with irrelevant attributes may cause overfitting and increases the computational complexity of the model. Thus, to create an effective classification model, redundant features should be eliminated from the dataset. Selecting only the truly important features makes it possible to train an accurate model with enhanced performance. Filter, Wrapper, and Embedded methods are the three categories of attribute-selection methods [36, 37]. Among the many available methods, two popular feature selection methodologies, Univariate feature selection and Recursive Feature Elimination, are used here.

3.4.1 Univariate Feature Selection

Univariate feature selection is a type of Filter method based on examining, for each risk factor independently, the strength of the correlation between the attribute and the target variable. It relies on various statistical tests to select the attributes carrying the most importance and the most distinct information. Scikit-learn provides SelectKBest(), SelectPercentile(), and GenericUnivariateSelect() as transformer objects for univariate feature selection. SelectKBest() retains the K highest-scoring attributes and eliminates all others, SelectPercentile() retains only the top user-specified percentage of scoring attributes, and GenericUnivariateSelect() uses a configurable strategy to carry out attribute selection. For classification, the chi-square test is a popular statistical tool used with the SelectKBest() univariate approach, and it is used in this paper for feature selection. The chi-square test applies to discrete values and tests the independence of two samples; the intuition is that a risk factor is uninformative for classification if it is independent of the class variable. SelectKBest() with chi-square selects the attributes with the K highest chi-square scores computed between the attributes and the target categories.

$$Chi\text{-}square\quad \chi^{2} = \sum_{i = 1}^{n}\frac{\left(O_{i} - E_{i}\right)^{2}}{E_{i}}$$

Here, \(O_{i}\) and \(E_{i}\) are the observed and expected frequencies for category i among the n categories considered. This approach aims to select the attributes that are most strongly dependent on the categorical target.
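A hedged sketch of chi-square univariate selection with SelectKBest (synthetic data; K = 16 mirrors the choice used later in the paper). Note that chi2 requires non-negative feature values, which holds after Min-Max scaling.

```python
# SelectKBest with the chi-square score on Min-Max scaled toy features.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 30))
y = (X[:, 3] + X[:, 7] > 0).astype(int)

X_pos = MinMaxScaler().fit_transform(X)          # make features non-negative for chi2
selector = SelectKBest(score_func=chi2, k=16).fit(X_pos, y)
print(selector.scores_[:5])                      # chi-square scores of the first features
print(selector.get_support(indices=True))        # indices of the 16 retained risk factors
```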

3.4.2 Recursive Feature Elimination (RFE)

RFE is a Wrapper method that recursively eliminates features to single out the important risk factors for disease diagnosis. RFE is a kind of backward selection algorithm, with the difference that it selects features based on an attribute ranking, whereas backward selection eliminates them on the basis of p-value scores [38]. For a classification problem, RFE fits a learning model and retains the specified number of most important attributes, eliminating the weakest ones. An estimator is fitted to the initial set of attributes, and features are selected recursively by removing a few of them in each loop based on a ranking obtained from the estimator's 'coef_' or 'feature_importances_' attribute. RFE offers the option to select a specific number of features or, by default, to retain the strongest ones. Scikit-learn provides RFE for recursive feature elimination and RFECV for finding the optimal number of attributes using a cross-validation approach. RFECV is useful for finding the best set of attributes, ranked on the basis of the validation score obtained with K-fold cross-validation.
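A minimal sketch (synthetic data, illustrative settings only) of RFE with a fixed number of features and of RFECV choosing that number via cross-validation, using a random forest's feature_importances_ for the ranking:

```python
# RFE with n_features_to_select=16 and RFECV with stratified 5-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, RFECV
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(11)
X = rng.normal(size=(300, 30))
y = (X[:, 0] - X[:, 5] + X[:, 9] > 0).astype(int)

estimator = RandomForestClassifier(n_estimators=100, random_state=0)
rfe = RFE(estimator, n_features_to_select=16, step=1).fit(X, y)
print(rfe.get_support(indices=True))             # the 16 retained risk factors

rfecv = RFECV(estimator, step=1, cv=StratifiedKFold(5), scoring="accuracy").fit(X, y)
print(rfecv.n_features_)                         # CV-optimal number of features
```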

4 Experimental Analysis

The cancer patient data has four target variables, i.e., Hinselmann, Schiller, Cytology, and Biopsy, with 30, 63, 39, and 45 malignant samples, respectively, out of 668 samples overall. Here, the performance metrics of the ML algorithms NB, LR, KNN, SVM, LDA, MLP, DT, and RF are computed with fivefold cross-validation and parameter selection for unscaled and scaled data. Min-Max scaling, Standard scaling, and Normalization are the three methods applied to the oversampled data to obtain three kinds of scaled data. A tabular comparison of the evaluation parameters in terms of accuracy, precision, recall, F-score, and AUC is given in Tables 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19, along with the comparison of ROC curves shown in Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17, for all target variables, i.e., Hinselmann, Schiller, Cytology, and Biopsy. The abbreviations used are as follows:

Table 4 Performance metrics for Hinselmann test (unscaled data)
Table 5 Performance metrics for Hinselmann test (MinMax scaler)
Table 6 Performance metrics for Hinselmann test (standard scaler)
Table 7 Performance metrics for Hinselmann test (normalization)
Table 8 Performance metrics for Schiller test (unscaled data)
Table 9 Performance metrics for Schiller test (MinMax scaler)
Table 10 Performance metrics for Schiller test (standard scaler)
Table 11 Performance metrics for Schiller test (normalization)
Table 12 Performance metrics for Cytology test (unscaled data)
Table 13 Performance metrics for Cytology test (MinMax scaler)
Table 14 Performance metrics for Cytology test (standard scaler)
Table 15 Performance metrics for Cytology test (normalization)
Table 16 Performance metrics for Biopsy test (unscaled data)
Table 17 Performance metrics for Biopsy test (MinMax scaler)
Table 18 Performance metrics for Biopsy test (standard scaler)
Table 19 Performance metrics for Biopsy test (normalization)
Fig. 2 Comparison of ROC curves for Hinselmann test (unscaled data)
Fig. 3 Comparison of ROC curves for Hinselmann test (MinMax scaler)
Fig. 4 Comparison of ROC curves for Hinselmann test (standard scaler)
Fig. 5 Comparison of ROC curves for Hinselmann test (normalization)
Fig. 6 Comparison of ROC curves for Schiller test (unscaled data)
Fig. 7 Comparison of ROC curves for Schiller test (MinMax scaler)
Fig. 8 Comparison of ROC curves for Schiller test (standard scaler)
Fig. 9 Comparison of ROC curves for Schiller test (normalization)
Fig. 10 Comparison of ROC curves for Cytology test (unscaled data)
Fig. 11 Comparison of ROC curves for Cytology test (MinMax scaler)
Fig. 12 Comparison of ROC curves for Cytology test (standard scaler)
Fig. 13 Comparison of ROC curves for Cytology test (normalization)
Fig. 14 Comparison of ROC curves for Biopsy test (unscaled data)
Fig. 15 Comparison of ROC curves for Biopsy test (MinMax scaler)
Fig. 16 Comparison of ROC curves for Biopsy test (standard scaler)
Fig. 17 Comparison of ROC curves for Biopsy test (normalization)

C, Classifier; M, Performance metric; A, Accuracy (%); P, Precision (%); R, Recall (%); F, F-score; Ac, AUC.

The evaluation shows that the top three performing ML algorithms for all four targets are RF, SVM, and DT, of which RF is superior in terms of the aforementioned evaluation metrics. The analysis reveals that RF gives the maximum accuracy with standard-scaled data: 97.81%, 93.97%, 95.6%, and 96.18% for the target variables Hinselmann, Schiller, Cytology, and Biopsy, respectively. For SVM on standard-scaled data, the accuracy for the four targets is 93.63%, 90.64%, 91.52%, and 92.69%, respectively. DT achieves accuracies of 93.54%, 88.6%, 90.67%, and 90.97% for the four targets, respectively, which are its highest values, obtained on standard-scaled data. The other evaluation parameters are also quite good for RF, SVM, and DT, as observed from the tabulated data. The ROC curves give a visual comparison of all the ML algorithms and show that RF with standard-scaled data has the maximum AUC scores of 0.99, 0.98, 0.98, and 0.99 for the four target variables, respectively. The performance of the NB classifier is the worst among all eight predictors.

The observations show that ML algorithm performance is best when the data is standard scaled in most cases; however, unscaled data also provides high-quality results with only a small difference in the performance metrics. The Min-Max Scaler also performs nearly as well as the Standard Scaler with most of the algorithms, whereas Normalization performs worst of all. Concerning computation time, evaluation on unscaled data has poorer computational efficiency than on scaled data, as shown in Fig. 18. In terms of computational cost and performance, RF, SVM, and DT with standard-scaled data are the finest algorithms for the cancer diagnosis data when all the risk factors are involved in the computation. Computational efficiency can be further enhanced by eliminating less important features using the Univariate feature selection and RFE algorithms.

Fig. 18 Average computation time for unscaled, Min-Max scaled, standard scaled and normalized data

4.1 Feature Selection Using SelectKBest

SelectKBest is a univariate feature selection approach that selects the K risk factors having the highest correlation with the target variables. The chi-square statistical test is used here with the SelectKBest algorithm to determine feature importance, as shown in Fig. 19. The top ten attributes obtained by SelectKBest for the four target categories of the disease dataset are shown in Table 20. It is observed that attributes 6, 13, 14, 29, and 31 are common to all target variables. Table 1 shows that attribute 6 corresponds to years of smoking, attribute 13 to the number of STDs, attribute 14 to STDs related to condylomatosis, attribute 29 to the radiography test for cancer, and attribute 31 to the radiography test for HPV. Apart from these, attributes 26 and 32 are common to three target variables. To obtain optimized performance, the top 16 relevant risk factors are selected using SelectKBest. The performance analysis is then carried out with these 16 risk factors for the previously identified top three ML algorithms, i.e., RF, SVM, and DT, on standard-scaled data, using fivefold cross-validation and the classifier parameter grids listed in Table 3. Removing almost half of the risk factors from the dataset does not greatly affect the evaluation metrics. Table 21 shows the implementation results, which indicate that the performance of RF, SVM, and DT with 16 risk factors is approximately the same as that obtained with the complete set of attributes.

Fig. 19 Feature importance using univariate selection (SelectKBest)

Table 20 Top ten attributes selected by SelectKBest
Table 21 Performance metrics of DT, SVM and RF algorithms with SelectKBest for K = 16

4.2 Feature Selection Using RFE

RFE is implemented here with fivefold cross-validation for the three ML algorithms, i.e., DT, SVM, and RF, with 16 risk factors selected. SelectKBest retains important features based on chi-square test scores, after which the performance of the ML methods is analyzed, whereas RFE is a recursive sequential selection approach that uses the ML classifier itself to select the optimal risk factors. The top 10 attributes among all 30 risk factors for DT-RFE, SVM-RFE, and RF-RFE are shown in Tables 22, 23 and 24.

Table 22 Top ten attributes on DT-RFE
Table 23 Top ten attributes on SVM-RFE
Table 24 Top ten attributes on RF-RFE

Table 22 shows that for DT-RFE, risk factors 9, 13, 31, and 32 are common to all target variables, and attributes 7, 17, and 29 appear among the top 10 attributes for at least three target variables. Attributes 3, 7, 9, and 13 appear in every column of Table 23 among the 10 most important risk factors for SVM-RFE. In Table 24 for RF-RFE, attributes 6, 7, 9, 13, and 31 are common to all columns. The implementation results of DT-RFE, SVM-RFE, and RF-RFE are shown in Table 25 in terms of the performance metrics. An optimized performance is achieved with recursive feature elimination (RFE) using the reduced set of 16 risk factors, compared with the analysis of the complete set of 30 attributes. Random Forest (RF) again proves to be the best ML classifier for the diagnosis of the given cervix data. The predictor accuracy is 93.72%, 95.05%, and 99.21% for DT-RFE, SVM-RFE, and RF-RFE, respectively, in the Hinselmann test. For the Schiller test, accuracies of 89.33%, 92.17%, and 96.13% are achieved for the three classifiers, respectively. For the Cytology test, DT-RFE, SVM-RFE, and RF-RFE provide accuracies of 91.7%, 92.89%, and 97.01%, respectively. The accuracy is 91.11%, 93.81%, and 98.53%, respectively, for these ML predictors in the Biopsy test. The tabulated data shows the highest precision of 98.5% for RF-RFE in the Hinselmann test. RF-RFE also gives the highest recall score of 100% for the Hinselmann and Biopsy tests, and the maximum F-score achieved is 0.99, again for RF-RFE in the Hinselmann test prediction. RF-RFE further gives the highest AUC score of 0.99 in three tests, i.e., Hinselmann, Cytology, and Biopsy. Overall, RF-RFE gives the best results in terms of all performance metrics compared with DT-RFE and SVM-RFE for the four target variables.

Table 25 Performance metrics of DT-RFE, SVM-RFE and RF-RFE algorithms with 16 selected features

5 Comparison Analysis

Analysis with the complete cervix data shows that the best results are obtained with the RF, SVM, and DT classifiers when the data is standard scaled. The experimental results show that optimized performance is achieved by eliminating the most irrelevant risk factors. SelectKBest and RFE converge on the attributes that are most relevant for prediction. The most significant risk factors in both SelectKBest and RFE are attributes 6, 7, 9, 13, 29, and 31, which appear in most of the columns. These risk factors, which contribute most to the prediction, are listed in Table 26. To make a fair comparison, the 16 most relevant risk factors are chosen in both the SelectKBest and RFE feature selection approaches.

Table 26 Most relevant risk-factors

A detailed, comprehensive performance comparison of the three best ML predictors, i.e., RF, SVM, and DT, is given in Table 27 for all 30 risk factors, the 16 risk factors obtained with SelectKBest, and the 16 risk factors obtained with RF-RFE, SVM-RFE, and DT-RFE. All these results are obtained on standard-scaled cervix data with SMOTE for oversampling, GridSearchCV for parameter selection, and fivefold cross-validation for the computation of the performance scores. The tabulated data makes a clear comparison among the implemented RF, SVM, and DT classifiers with the different approaches used for selecting risk factors.

Table 27 Comparison of DT, SVM and RF with 30 features, 16 features (SelectKBest) and 16 features (RFE)

RF-RFE gives excellent results, with accuracies of 99.21%, 96.13%, 97.01%, and 98.53% for the four targets Hinselmann, Schiller, Cytology, and Biopsy, respectively. The other parameters for RF-RFE are a precision of 98.5%, 95.59%, 96.27%, and 98.07%, a recall of 100%, 98.4%, 98.41%, and 100%, an F-score of 0.99, 0.95, 0.97, and 0.97, and an AUC score of 0.99, 0.98, 0.99, and 0.99 for the four target variables, respectively. Figure 20 shows the accuracy comparison of the implemented ML classifiers. The performance metrics with the 16 risk factors obtained from the SelectKBest approach are almost the same as those obtained with the complete set of features. The RFE approach, however, significantly improves the performance metrics of the RF, SVM, and DT classifiers, specifically accuracy, precision, and recall.

Fig. 20 Accuracy comparison of implemented ML classifiers

6 Conclusion

This paper analyzes the performance of some of the most prominent ML algorithms on cervical cancer data and observes the effect of scaling on the performance metrics in order to efficiently predict samples of malignant type. NB, LR, KNN, SVM, LDA, MLP, DT, and RF are the ML classifiers used to make predictions with all 30 risk factors. The RF, DT, and SVM classifiers rank as the top three, making the best predictions for all four target categories with standard-scaled data; performance is not greatly affected by using unscaled or Min-Max scaled data, except in the case of Normalization. Furthermore, classification is performed with these predictors using the most relevant features identified by the feature selection algorithms. The RF, SVM, and DT classifiers using Univariate feature selection (SelectKBest) and RFE are more efficient than the same classifiers using the complete set of risk factors. There is a significant reduction in computational cost and time when low-information risk factors are removed from the disease diagnosis data. RFE proves to be a better approach than SelectKBest, and the performance of the RF-RFE algorithm with 16 risk attributes is superior to that of the other algorithms.