1 Introduction and Related Work

Cancer refers to the uncontrolled growth of certain cells in the human body [1]. These cells can spread into the surrounding tissue, forming a lump known as a tumor or malignancy [2]. After lung cancer, breast cancer is the second most common malignancy and cause of mortality among women worldwide [3]. Breast cancer (BC) is frequently observed in females of childbearing age. It is the most commonly diagnosed cancer, and its incidence is increasing rapidly every year [4, 5]. The nature of breast cancer is also changing with changes in the environment [6]. As a result, raising awareness of the benefits of screening and early detection is desirable. Ultrasound (US), mammography, contrast-enhanced (CE) imaging, breast tomosynthesis (3D mammography), magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) are the modalities currently used in clinical practice for the early diagnosis of BC. These methods are used to examine significant parameters such as the size, shape, and location of the tumor, the type and stage of the cancer, and how quickly it is growing. They are sometimes combined for a more accurate prognosis.

Classification is the most crucial task in this setting. Numerous studies on breast cancer datasets use multiple classifiers and feature selection strategies [7]. Different machine learning (ML) classifiers have been developed and applied to medical datasets. Machine learning is a subset of artificial intelligence, and its use is not confined to computer science but extends to many other fields.

1.1 Based on Breast Cancer

Sengar [8] compared machine learning algorithms such as Logistic Regression and Decision Tree on a selected dataset. Decision Tree reported the maximum classification accuracy of 95%. The main limitation of this work is that only two classifiers were evaluated. Anji Reddy Vaka [9] applied deep neural networks to data collected from Mahatma Gandhi Cancer Hospital and Research Institute, Visakhapatnam, India. As the dataset was limited, data augmentation was used to enlarge it. Gaussian filtering was used to remove noise as a preprocessing step, and neglected values were removed using entropy, followed by different ML algorithms for classification. The deep neural network reported the highest accuracy of 97.01%.

Moh’d Rasoul Al-hadidi [10] used radiography images of equal size, which made processing easier. A Wiener filter was used to remove image blurriness, followed by Logistic Regression and a back-propagation neural network. The back-propagation network attained the maximum accuracy of 93%. Bazazeh [11] used the WBCD dataset to train the model. Different machine learning algorithms such as support vector machine, Random Forest, and Bayesian networks were used for evaluation.

Sadhukhan [12] worked with fine-needle aspiration images, which were converted into grayscale images by removing hue. Thresholding was used for segmentation, and radii, smoothness, compactness, and texture were calculated. Adel [13] cropped images to separate B-mode images from elastography images. Features such as signal-to-noise ratio, width-to-height ratio, area difference, perimeter difference, solidity, contrast-to-noise ratio, and compactness were extracted. Dimensionality reduction was then applied, and the result was fed to a support vector machine, achieving an accuracy of 94.12%. Kaklamanis [14] applied a correlation matrix for feature selection. CART, KNN, Naive Bayes, and SVM were then used for classification, reporting accuracies of 93%, 96%, 89%, and 96%, respectively.

1.2 Based on Feature Selection

Perez [15] used two datasets of breast mammography images. Features were selected using the Mann–Whitney U Test, and the selected feature subset was fed as input to a feedforward back-propagation network. MacFarland [16] focused on the Mann–Whitney U Test, which is generally applied to non-parametric, independent samples. The test is illustrated with a study of 30 goats divided into two groups: one group received a mineral supplement in its diet, whereas the other was given a normal meal. At the end of the treatment, the goats that received the mineral supplement were shown to be healthier than the other group. Some details, such as the composition of the mineral supplement, how it was added to the meal, its cost, and the treatment regulation, are not disclosed.

1.3 Based on Machine Learning

Bhavsar [17] evaluated different machine learning classifiers, namely support vector machine, Decision Tree, Supervised Learning, and Nearest Neighbor Neural Network. The performance metrics accuracy, specificity, and sensitivity were evaluated. Morgan [18] evaluated performance using Gaussian process and Gaussian kernel ridge regression. Leave-Group-Out cross-validation root mean squared error was used to select pertinent features. Fatima [19] provides a comparative analysis of different machine learning algorithms for the prognosis of different diseases, emphasizing the use of machine learning for disease analysis and decision-making.

2 Material and Method

2.1 Material

In this article, the Wisconsin breast cancer (diagnostic) dataset (WDBC) [20], collected from the UCI repository, is used to differentiate benign from malignant samples. WDBC has 32 attributes and 569 instances, 357 of which are benign and 212 malignant. The features were computed from a digitized image of a fine-needle aspirate (FNA) and describe ten characteristics of each cell nucleus. Excluding ID and diagnosis, the mean, standard error, and “worst” or largest value (mean of the three largest values) are computed for each of the ten characteristics, giving 30 features. There are no missing data in the dataset. Table 1 shows the description of the WDBC dataset features.
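The attribute count above can be checked with a short sketch. The ten characteristic names follow the dataset documentation; the column-name spelling (`radius_mean`, `se`, and so on) is an illustrative assumption, not the official file header.

```python
# The ten nucleus characteristics measured in WDBC.
CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Three summary statistics are reported for each characteristic.
STATISTICS = ["mean", "se", "worst"]

# Assumed naming scheme: <characteristic>_<statistic>.
feature_names = [f"{c}_{s}" for s in STATISTICS for c in CHARACTERISTICS]

# ID and diagnosis are the two non-feature attributes.
all_columns = ["id", "diagnosis"] + feature_names

print(len(feature_names))  # 10 characteristics x 3 statistics
print(len(all_columns))    # features plus ID and diagnosis
```

This makes explicit why the dataset is described with 32 attributes while only 30 of them are candidate features for selection.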

Table 1 Description of the Wisconsin breast cancer dataset (WDBC)

2.2 Method

In this article, we describe a feature selection method for the WDBC diagnostic dataset that uses the Mann–Whitney U Test, followed by different machine learning classifiers for classification. Firstly, the WDBC dataset is taken and unwanted columns are removed as a preprocessing step. Secondly, to improve the classification accuracy, feature selection using the Mann–Whitney U Test is performed to choose relevant features. Finally, the selected features are fed to machine learning classifiers to categorize the tumor as benign or malignant. Figure 1 shows the proposed method for classifying breast tumors.

Fig. 1 Proposed methodology for classifying breast tumor: the WDBC dataset is preprocessed, features are selected with the Mann–Whitney U Test, and Logistic Regression, Decision Tree, and Random Forest categorize tumors as benign or malignant

The assessment is conducted on the above dataset with and without the feature selection method, and the results are compared and analyzed. The evaluation metrics sensitivity, specificity, and accuracy are calculated to assess the different machine learning classifiers. Experimental simulations were conducted using Jupyter Notebook.

2.2.1 Preprocessing

The first and most significant step is preprocessing. In imaging pipelines, preprocessing enhances image quality while retaining key elements, since artifacts, noise, and other factors in radiological images can lead to incorrect conclusions. For the WDBC dataset, preprocessing consists of removing the unwanted columns to obtain better results. No missing values are found in this dataset. The categorical diagnosis attribute is converted to numerical data for compatibility with the Mann–Whitney U Test [21].
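A minimal sketch of this preprocessing step is given below. The row layout, the column names `id` and `diagnosis`, the choice of `id` as the unwanted column, and the encoding benign = 0, malignant = 1 are all assumptions made for illustration; the sample values are illustrative only.

```python
# Illustrative rows: id, diagnosis, then numeric features.
rows = [
    {"id": 101, "diagnosis": "M", "radius_mean": 17.99, "texture_mean": 10.38},
    {"id": 102, "diagnosis": "B", "radius_mean": 13.54, "texture_mean": 14.36},
]

UNWANTED = {"id"}          # assumed non-informative column
LABELS = {"B": 0, "M": 1}  # assumed encoding: benign = 0, malignant = 1


def preprocess(rows):
    """Drop unwanted columns and encode the diagnosis as 0/1."""
    features, labels = [], []
    for row in rows:
        labels.append(LABELS[row["diagnosis"]])
        features.append({
            name: value for name, value in row.items()
            if name not in UNWANTED and name != "diagnosis"
        })
    return features, labels


X, y = preprocess(rows)
print(y)     # encoded diagnosis per row
print(X[0])  # remaining numeric features of the first row
```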

2.2.2 Feature Selection

Following preprocessing, we carried out feature selection to choose relevant features, because they have a direct impact on classifier performance. Removing redundant features reduces the size of the feature space and the computation time. Gain ratio, recursive feature elimination, Random Forest, the Chi-square test, and search algorithms are a few techniques frequently used for feature selection. To obtain effective and computationally less costly prediction models, the number of classifier inputs is limited. The Mann–Whitney U Test is used as the feature selection technique in this paper. It is a non-parametric statistical method suited to data that are not normally distributed.

In this test, a statistic U is calculated whose distribution under the null hypothesis is known. The test was run with a 95% confidence interval (CI), and features with a significance value below 0.001 (p < 0.001) were retained; because the test is non-parametric, it is appropriate for features that are not normally distributed [21]. Of the 32 attributes, the 30 features (F[1], F[2], …, F[30]) remaining after ID and diagnosis are excluded are provided as input, and 26 features are chosen using the Mann–Whitney U Test.
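The selection rule can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the SPSS procedure used in the study: it computes U from average ranks, applies the usual tie correction, and uses the normal approximation without continuity correction to obtain a two-sided asymptotic p-value. The function names and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(a, b):
    """Return (U, two-sided asymptotic p-value) for two independent samples."""
    a, b = list(a), list(b)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(a + b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sd = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sd == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))


def select_features(columns, labels, alpha=0.001):
    """Keep features whose two class groups differ at p < alpha."""
    selected = []
    for name, values in columns.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        _, p = mann_whitney_u(group0, group1)
        if p < alpha:
            selected.append(name)
    return selected


# Toy data: one feature separates the classes, the other does not.
labels = [0] * 20 + [1] * 20
columns = {
    "separated": list(range(1, 21)) + list(range(21, 41)),
    "interleaved": list(range(1, 40, 2)) + list(range(2, 41, 2)),
}
selected = select_features(columns, labels)
print(selected)
```

In the toy data only the feature whose class groups do not overlap passes the p < 0.001 threshold; the interleaved feature is eliminated, mirroring how uninformative features are dropped.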

2.2.3 Classification

Following feature selection, the classifier uses the pertinent features to categorize a breast tumor as benign or malignant. In any automated system, the classifier maps the input feature space to class labels [22]. Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (KNN), support vector machine (SVM), Random Forest, and Logistic Regression are examples of commonly used machine learning classifiers. Random Forest, Logistic Regression, and Decision Tree are evaluated in this study.

Logistic Regression transforms the linear regression model so that a binary outcome can be modeled probabilistically. It is a supervised procedure used to predict the likelihood of a target variable. Because the target (dependent) variable is binary, there are only two possible classes, with records encoded as 1 or 0. The Logistic Regression model predicts P(Y = 1) as a function of X [8].

Decision Tree is a popular supervised approach used for classification and prediction [23]. It is represented as a recursive partition of the instance space, where leaves represent class labels and branches represent the outcomes of tests on features. It is a top-down approach that divides the data into subsets at each split. This predictive model maps an item's attributes to its target value.

Random Forest (RF) is based on multiple Decision Trees whose outputs are merged to produce an accurate and stable prediction [24]. RF is an ensemble of classifiers grown with a certain amount of randomness; the term covers randomized ensembles of Decision Trees as a generic principle. Every observation is input into every Decision Tree, and the final result for each observation is the most common outcome across the trees.
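Two of the mechanisms described above can be illustrated in a few lines: the logistic function that turns a linear score into P(Y = 1), and the majority vote that merges the outputs of the trees in a Random Forest. The weights, bias, and per-tree predictions below are hypothetical values chosen for illustration, not fitted parameters from this study.

```python
import math
from collections import Counter


def logistic_probability(x, weights, bias):
    """P(Y = 1 | x): the linear score passed through the logistic function."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def majority_vote(tree_predictions):
    """Merge per-tree class labels into the most common label."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical model: a zero score maps to a probability of one half.
p = logistic_probability([0.0, 0.0], weights=[1.5, -0.7], bias=0.0)

# Hypothetical votes from five trees (1 = malignant, 0 = benign).
label = majority_vote([1, 0, 1, 1, 0])

print(p, label)
```

Using an odd number of trees avoids tied votes in a two-class problem such as benign versus malignant.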

3 Experimental Results and Discussion

This section discusses the findings of the proposed method and compares them with other related work. On the WDBC dataset, simulations were used to categorize the breast tumor as benign or malignant. The proposed method employed the Mann–Whitney U Test for feature selection, using the Statistical Package for the Social Sciences (SPSS) software with a 95% confidence interval, and the significance level was chosen to be less than 0.001. The values shown in Table 2 are the asymptotic significance values obtained from the Mann–Whitney U Test (a non-parametric test). A feature is eliminated if its asymptotic significance is greater than 0.001. The benign and malignant values, which are in categorical form, are converted into numerical form. Out of 30 features, 26 are selected based on the U test and four are eliminated. The selected features are then passed to the classifiers, and the results are evaluated with and without feature selection.

Table 2 Statistical analysis using Mann–Whitney U Test

The selected features are then fed as input to the Random Forest, Logistic Regression, and Decision Tree classifiers. The dataset is split into training and testing sections under a K-fold cross-validation protocol, with k = 10 used to compute the performance of the system. Table 3 shows the measures used to evaluate the classifiers: sensitivity, specificity, and classification accuracy. True positive (tp) is the number of patients correctly classified as the positive class, while true negative (tn) is the number of patients correctly classified as the negative class. False positive (fp) is the number of patients incorrectly predicted as positive, whereas false negative (fn) is the number of patients incorrectly predicted as negative.
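The evaluation protocol can be sketched as follows, assuming the conventional definitions sensitivity = tp / (tp + fn), specificity = tn / (tn + fp), and accuracy = (tp + tn) / (tp + tn + fp + fn). The fold generator below is a simplified illustration without shuffling or stratification, and the label vectors are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def confusion_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and accuracy from predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Ten folds over the 569 WDBC instances.
folds = list(k_fold_indices(569, k=10))
print([len(test) for _, test in folds])

# Hypothetical labels and predictions for one fold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

In practice the per-fold metrics would be averaged over the ten folds, and the samples would usually be shuffled (and often stratified by class) before splitting.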

Table 3 Performance measure

Table 4 shows the experimental results in terms of accuracy, specificity, and sensitivity obtained by applying the ML algorithms with feature selection using the Mann–Whitney U Test, and Table 5 shows the results obtained without feature selection. In both tables, the performance measures are evaluated under ten-fold cross-validation. It is observed that accuracy increases when Random Forest is used as the classifier on the selected features. An accuracy of 99.5%, sensitivity of 98.8%, and specificity of 94.5% are obtained.

Table 4 Performance metric obtained with feature selection
Table 5 Performance metric obtained without feature selection

A comparative analysis of the proposed technique with prior relevant work on the WDBC dataset is shown in Table 6. An accuracy rate of 97.2% was obtained for the proposed method. Asri et al. [25] used C4.5, SVM, NB, and KNN for classification. Saravana Kumar et al. [26] proposed a multi-layer perceptron based on deep learning.

Table 6 Comparison with related work

4 Conclusion and Future Scope

This study offered a methodology for breast cancer diagnosis based on features computed from fine-needle aspirate images. The study's primary contributions are as follows. Firstly, the WDBC dataset was taken and some unwanted columns were removed for better results. Secondly, an effective statistical approach, the Mann–Whitney U Test, was used to pick relevant features. Thirdly, different machine learning classifiers were trained on the selected features to differentiate the class labels. As future work, we plan to test the proposed approach on a substantially larger dataset, to use data augmentation to increase the data size, and to apply optimization techniques for feature selection. In conclusion, the proposed technique shows potential for classifying breast tumors, though better optimization techniques and larger datasets are still needed.