Hybrid Method for Breast Cancer Diagnosis Using Voting Technique and Three Classifiers

Saoud, Hajar; Ghadi, Abderrahim; Ghailani, Mohamed

doi:10.1007/978-3-030-37629-1_34


Part of the book series: Lecture Notes in Intelligent Transportation and Infrastructure (LNITI)

Included in the following conference series:

The Proceedings of the Third International Conference on Smart City Applications


Abstract

Breast cancer is one of the most dangerous types of cancer among women; it affects one woman in eight during her lifetime, one woman in thirty dies from it, and the rate keeps increasing. Early prediction of breast cancer can make a difference and reduce the mortality rate, but diagnosis is difficult because of the varying types of breast cancer and its differing symptoms. A decision-support solution that reduces the danger of this disease has therefore become an essential need. Machine learning techniques have proved their performance in this domain. In previous work we tested the performance of several machine learning algorithms for breast cancer classification, such as Bayesian Networks (BN), Support Vector Machine (SVM) and k Nearest Neighbor (KNN). In this work, we combine those classifiers using the voting technique to produce a better solution, using the Wisconsin breast cancer dataset and the WEKA tool.


1 Introduction

Breast cancer is a serious disease and its diagnosis is sometimes difficult: the patient must pass through several tests, starting with a clinical examination and ending with the extraction and analysis of biological samples from the breast. A decision-support solution has therefore become an essential need, both to shorten the diagnostic process and to reduce the mortality rate. In this paper, we propose a solution for breast cancer diagnosis based on machine learning techniques, chosen for their performance in the medical field.

In previous work [1] we classified breast cancer using several classifiers: Bayes Network (BN), Support Vector Machine (SVM), k-nearest neighbors (KNN), Artificial Neural Network (ANN), Decision Tree (C4.5) and Logistic Regression. The highest accuracy, 97.28%, was obtained by both Bayes Network (BN) and Support Vector Machine (SVM). We then tried to improve those accuracies in [2] by using the Best First feature selection technique: the accuracy of Bayes Network (BN) increased to 97.42%, but the accuracy of Support Vector Machine (SVM) decreased to 95.279%. We therefore need to look for other solutions that can further improve the accuracy of breast cancer classification.

The objective of this work is to improve the accuracy of breast cancer classification using the voting technique, which combines several classifiers. First, we combined Bayes Network (BN) and Support Vector Machine (SVM), but this gave no improvement. We then added the k-nearest neighbors algorithm (KNN), and the classification accuracy improved.

The rest of this paper is structured as follows. Section 2 presents breast cancer. Section 3 reviews similar research. Section 4 gives a theoretical presentation of the machine learning algorithms. Section 5 defines the voting technique. Section 6 explains our proposed approach. Section 7 describes the experiments performed with the WEKA software on the Wisconsin breast cancer dataset and their results. Finally, Section 8 gives the conclusion and perspectives.

2 Breast Cancer

Breast cancer can be defined as an abnormal production of cells in the breast that form cancerous masses, called tumors. When the cancer cells stay in the breast, the cancer is called non-invasive; such cancers can generally be cured and do not produce metastases. The other type of breast cancer is called invasive: these are dangerous cancers that can spread to other organs of the body and lead to metastases.

2.1 Types of Breast Cancer

Breast cancers are either invasive or non-invasive. Ductal Carcinoma In Situ is a non-invasive type; the others are invasive [3] (Table 1):

Table 1. Types of breast cancer.

2.2 Diagnosis of Breast Cancer

The diagnosis of breast cancer is difficult because of the varying types and symptoms of the disease, and because the patient must pass through several steps [4]. The first step is the physical examination, a palpation of the breast that can reveal signs of cancer. The next step is medical imaging, which allows the detection of tumor masses and adds detail to the clinical examination. There are several types of medical imaging, among them mammography, ultrasonography and MRI; the choice of technique depends on the patient's case. A diagnosis can only be made after biological samples of the lesions that appeared in the medical imaging have been studied at the microscopic level. The choice of sampling method depends on the characteristics of the lesion; the existing techniques are aspiration or cytological puncture, biopsy and macrobiopsy (Fig. 1).

The image obtained at the microscopic level is studied together with other images and features. A decision-support solution would therefore be valuable to reduce the number of steps in the diagnosis and to help avoid diagnostic errors. Machine learning techniques are powerful tools for this purpose because of their performance in the medical domain.

3 Related Works

Several approaches have been proposed for the diagnosis of cancer and other diseases using machine learning algorithms, the voting technique and other techniques such as bagging, stacking and boosting. In this section we cite some of them:

Khuriwal and Mishra [5] proposed an adaptive ensemble voting method using Artificial Neural Network (ANN) and Logistic Regression (LR) on the Wisconsin Breast Cancer database. They achieved an accuracy of 98.50%.

Kumar et al. [6] compared the performance of machine learning techniques in the classification of breast cancer using the Wisconsin Breast Cancer database, then combined those techniques using the voting technique. The three techniques tested in this research are Naïve Bayes, SVM and J48.

Latha and Jeeva [7] examined the ensemble algorithms bagging, boosting, stacking and majority voting for the prediction of heart disease using the Cleveland heart dataset from the UCI machine learning repository.

Leon et al. [8] analyzed the influence of several voting methods on the performance of the K Nearest Neighbor and Naïve Bayes algorithms on datasets with different levels of difficulty.

Rishika and Sowjanya [9] compared the performance of Decision Tree, Neural Network and Naïve Bayes, then combined them using a stacking approach.

Sri Bala and Rajya Lakshmi [10] implemented four models, AdaBoost, bagging and stacking or blending, on preliminary classifiers to improve the accuracy of breast cancer classification, giving 12 built models in total.

4 Machine Learning Algorithms

The machine learning techniques considered in this paper are Bayesian Network (BN), Support Vector Machines (SVM) and the k-nearest neighbors algorithm (KNN). We examine each algorithm separately, then combine them to improve the accuracy of breast cancer classification.

4.1 Bayesian Network (BN)

A Bayesian Network [11], also called a Bayesian belief network, is a directed acyclic graph (DAG) composed of nodes and edges: the nodes represent variables, and the edges represent the probabilistic dependencies between those variables. Bayesian Networks combine principles of statistics, graph theory, probability theory and computer science.
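As a brief illustration in standard notation (the symbols $X_i$ and $\mathrm{Pa}(X_i)$ are introduced here as assumptions and do not appear in the original text), the DAG structure implies that the joint distribution over the variables factorizes into one conditional distribution per node given its parents:

$$ P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big) $$

where $\mathrm{Pa}(X_i)$ denotes the set of parent nodes of $X_i$ in the graph.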

4.2 Support Vector Machines (SVM)

A Support Vector Machine is a supervised learning model built around the notion of a hyperplane. In two dimensions, this hyperplane is a line that divides the plane into two regions, each representing a class. Given training data, the Support Vector Machine searches for an optimal hyperplane that separates the data into the two classes.
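The role of the hyperplane can be illustrated with a minimal sketch of a linear decision rule. The weights, bias and points below are hypothetical values chosen for illustration, not a trained model, and the function name is an assumption; the experiments reported later use the Puk kernel in WEKA, which is not a linear rule of this form.

```python
def linear_decision(weights, bias, point):
    """Return +1 or -1 depending on which side of the hyperplane `point` lies."""
    # The hyperplane is the set of points where weights . point + bias == 0.
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 10 = 0 (illustrative, not learned from data).
weights, bias = (1.0, 1.0), -10.0
side_a = linear_decision(weights, bias, (2.0, 3.0))  # score = -5.0
side_b = linear_decision(weights, bias, (8.0, 9.0))  # score = 7.0
```

Training an SVM amounts to choosing the weights and bias so that the margin between the two classes is as large as possible; that optimization step is omitted here.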

4.3 K Nearest Neighbors Algorithm (KNN)

The k-nearest neighbors classifier is a supervised machine learning algorithm that can be used for both classification and regression. It captures the idea of similarity, measured as a distance. Its principle is to calculate the distance between a given test tuple and the other tuples in order to find the K closest tuples, which are called the k nearest neighbors.
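The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Euclidean distance, the value k = 3, the function name and the toy tuples are all choices made here, not details taken from the paper or from WEKA.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority label among its k nearest training tuples."""
    # Sort the training tuples by Euclidean distance to the query (assumed metric)
    # and keep the k closest ones.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]


# Toy tuples (hypothetical values, not taken from the Wisconsin dataset).
train = [
    ((1, 1), "benign"),
    ((2, 1), "benign"),
    ((1, 2), "benign"),
    ((8, 9), "malignant"),
    ((9, 8), "malignant"),
    ((9, 9), "malignant"),
]
prediction = knn_predict(train, (2, 2), k=3)
```

Here the three tuples closest to the query all carry the same label, so that label is returned.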

5 Voting Classifier Technique

Voting is a technique used in classification that combines several classifiers to improve classification accuracy. Its principle is that each machine learning technique produces a classification, or output, and the vote over those outputs is taken as the final classification (Fig. 2).

Take the example of three classifiers C1, C2 and C3, whose predictions are P1, P2 and P3 respectively. The final prediction is:

P_F = mode {P1, P2, P3}.
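The mode rule above can be sketched as follows. The function name and the example labels are hypothetical; the sketch shows plain majority voting over class labels and makes no claim about how WEKA combines classifiers internally.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three classifiers C1, C2 and C3 for one instance.
p1, p2, p3 = "benign", "malignant", "benign"
final_prediction = majority_vote([p1, p2, p3])
```

With three classifiers and two classes there is always a strict majority, so no tie-breaking rule is needed in this setting.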

6 Proposed Method

In our proposed method, we improve the classification accuracy of the three machine learning algorithms, Bayes Network (BN), Support Vector Machine (SVM) and K Nearest Neighbors (KNN), by combining them with the voting technique. Figure 3 represents the process of the proposed method: first we chose the Wisconsin breast cancer dataset, then we pre-processed the data to eliminate missing data, and finally we moved on to the classification step.

7 Experimentation and Results

7.1 Description of the Dataset

The database used in this research is the Wisconsin breast cancer dataset available in the UCI machine learning repository [12]. It contains 699 records (458 benign tumors and 241 malignant tumors). It is composed of 11 variables: 10 predictor variables and one result variable that shows whether the tumor is benign or malignant. The predictive attributes vary between 0 and 10, where the value 0 corresponds to the normal state and the value 10 to the most abnormal state.

Table 2 describes the 11 attributes of the Wisconsin breast cancer dataset:

Table 2. Attributes of WBCD.

7.2 WEKA Tool

The tool we used to apply the machine learning algorithms to the breast cancer database is WEKA [13], a collection of open source machine learning algorithms for carrying out data mining tasks on real-world problems. It contains tools for data preprocessing, classification, regression, clustering and association rules. It also offers an environment for developing new models.

7.3 K-Fold Cross-validation

To evaluate the performance of the machine learning algorithms on the breast cancer data, we used the K-fold cross-validation method. This method divides the database into K subsets (folds); in each of K rounds, one fold serves as the testing data to evaluate the model and the remaining folds serve as the training data to build it, so that every record is used for testing exactly once. This is the most widely used method for evaluating machine learning techniques (Fig. 4).
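A minimal sketch of how K-fold cross-validation partitions a dataset is given below. The value K = 10 is an assumption made for the example (the text does not state the value used), the function name is hypothetical, and the sketch neither shuffles nor stratifies the records, which a real evaluation tool may do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the samples as evenly as possible: the first n_samples % k folds
    # receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 699 records as in the dataset; k = 10 is an assumed value.
folds = list(k_fold_indices(699, 10))
```

Each record index appears in exactly one test fold and in the training set of every other round.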

7.4 Confusion Matrix

The confusion matrix makes it possible to evaluate the performance of each classifier by calculating its accuracy, sensitivity and specificity. It contains information about the actual and the predicted classifications (Table 3):

Table 3. Confusion matrix.

TP: cases predicted as benign tumors that are in fact benign.

TN: cases predicted as malignant tumors that are in fact malignant.

FP: cases predicted as benign tumors that are in reality malignant.

FN: cases predicted as malignant tumors that are in reality benign.

From the confusion matrix we can calculate:

$$ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} $$
$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$
$$ \text{Specificity} = \frac{TN}{TN + FP} $$
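These three measures translate directly into code. The sketch below uses hypothetical counts for illustration only; they are not the values of the confusion matrices reported in this paper, and the function name is an assumption.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts (not taken from the paper's tables).
metrics = confusion_metrics(tp=90, tn=80, fp=20, fn=10)
```

With benign taken as the positive class, as in the definitions above, sensitivity is the share of benign cases recognized as benign and specificity the share of malignant cases recognized as malignant.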

7.5 Bayesian Network (BN)

The accuracy obtained by Bayesian Network (BN) is 97.28%: 680 of the 699 instances are correctly classified and 19 are incorrectly classified, which represents 2.71%. Table 4 presents the confusion matrix of Bayesian Network (Figs. 5, 6 and Table 5):

Table 4. Confusion matrix of BN.

Table 5. Results of BN.

7.6 Support Vector Machines (SVM)

The accuracy obtained by Support Vector Machines (SVM) with Puk as the kernel function is 97.28%: 680 of the 699 instances are correctly classified and 19 are incorrectly classified, which represents 2.71%. These are the same results as for Bayesian Network (BN). Table 6 presents the confusion matrix of Support Vector Machines (SVM) (Figs. 7, 8 and Table 7):

Table 6. Confusion matrix of SVM.

Table 7. Results of SVM.

7.7 BN-SVM

The accuracy obtained by BN-SVM is 96.99%, so there is no improvement: 678 of the 699 instances are correctly classified and 21 are incorrectly classified, which represents 3%. Table 8 presents the confusion matrix of BN-SVM (Figs. 9, 10 and Table 9):

Table 8. Confusion matrix of BN-SVM.

Table 9. Results of BN-SVM.

7.8 K Nearest Neighbors Algorithm (KNN)

The accuracy obtained by the k-nearest neighbors algorithm (KNN) is 95.27%: 666 of the 699 instances are correctly classified and 33 are incorrectly classified, which represents 4.72%. Table 10 presents the confusion matrix of k-nearest neighbors (KNN) (Figs. 11, 12 and Table 11):

Table 10. Confusion matrix of KNN.

Table 11. Results of KNN.

7.9 BN-SVM-KNN

The accuracy obtained by the proposed combination of the three algorithms with the voting technique is 97.56%: 682 of the 699 instances are correctly classified and 17 are incorrectly classified, which represents 2.43%. Table 12 presents the confusion matrix of BN-SVM-KNN (Figs. 13, 14 and Table 13):

Table 12. Confusion matrix of BN-SVM-KNN.

Table 13. Results of BN-SVM-KNN.

Table 14 summarizes the results obtained by each algorithm:

Table 14. Results of all models.
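As a quick arithmetic check, the accuracies can be recomputed from the counts of correctly classified instances reported in Sects. 7.5 to 7.9. The sketch below is only an illustration: the variable and function names are assumptions, and it uses nothing beyond the counts and the dataset size stated in the text.

```python
TOTAL = 699  # number of records in the dataset, as stated in Sect. 7.1

# Correctly classified instances reported in Sects. 7.5 to 7.9.
CORRECT = {
    "BN": 680,
    "SVM": 680,
    "BN-SVM": 678,
    "KNN": 666,
    "BN-SVM-KNN": 682,
}


def accuracy_percent(correct, total=TOTAL):
    """Accuracy in percent: correctly classified instances over all instances."""
    return 100.0 * correct / total


# Map each model to (accuracy in percent, number of misclassified instances).
summary = {name: (accuracy_percent(n), TOTAL - n) for name, n in CORRECT.items()}

for name, (acc, errors) in summary.items():
    print(f"{name}: {acc:.3f}% correct, {errors} misclassified")
```

The printed values may differ in the last digit from the percentages quoted in the text, which appear to be truncated rather than rounded.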

8 Conclusion

To conclude, in this paper we classified breast cancer into its two types, benign or malignant, using machine learning algorithms and the voting technique. First we examined each algorithm, Bayes Network (BN), Support Vector Machine (SVM) and k-nearest neighbors (KNN), separately; then we combined them using the voting technique to improve the accuracy of breast cancer classification, reaching an accuracy of 97.56%. The algorithms were tested on the Wisconsin breast cancer dataset, available in the UCI machine learning repository, using the WEKA tool.

References

1. Saoud, H., et al.: Application of data mining classification algorithms for breast cancer diagnosis. In: Proceedings of the 3rd International Conference on Smart City Applications - SCA 2018, pp. 1–7. ACM Press, Tetouan (2018). https://doi.org/10.1145/3286606.3286861
2. Saoud, H., et al.: Using feature selection techniques to improve the accuracy of breast cancer classification. In: Ben Ahmed, M., et al. (ed.) Innovations in Smart Cities Applications, edn. 2. pp. 307–315. Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-030-11196-0_28
3. Le cancer du sein. https://www.passeportsante.net/fr/Maux/Problemes/Fiche.aspx?doc=cancer_sein_pm. Accessed 19 Oct 2019
4. Le diagnostic. https://rubanrose.org/cancer-du-sein/depistage-diagnostics/diagnostic. Accessed 19 Oct 2019
5. Khuriwal, N., Mishra, N.: Breast cancer diagnosis using adaptive voting ensemble machine learning algorithm. In: 2018 IEEMA Engineer Infinite Conference (eTechNxT), pp. 1–5. IEEE, New Delhi (2018). https://doi.org/10.1109/ETECHNXT.2018.8385355
6. Kumar, U.K., et al.: Prediction of breast cancer using voting classifier technique. In: 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 108–114. IEEE, Chennai (2017). https://doi.org/10.1109/ICSTM.2017.8089135
7. Latha, C.B.C., Jeeva, S.C.: Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform. Med. Unlocked 16, 100203 (2019). https://doi.org/10.1016/j.imu.2019.100203
8. Leon, F., et al.: Evaluating the effect of voting methods on ensemble-based classification. In: 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 1–6. IEEE, Gdynia (2017). https://doi.org/10.1109/INISTA.2017.8001122
9. Rishika, V., Sowjanya, A.M.: Prediction of breast cancer using stacking ensemble approach 11
10. Int. J. Adv. Res. Comput. Sci. Softw. Eng
11. Mahmood, A.: Structure learning of causal bayesian networks: a survey 6
12. UCI Machine Learning Repository: Breast Cancer Wisconsin (Original) Data Set. https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original). Accessed 19 Oct 2019
13. Weka 3 - Data Mining with Open Source Machine Learning Software in Java. https://www.cs.waikato.ac.nz/ml/weka/. Accessed 19 Oct 2019
14. Saoud, H., Ghadi, A., Ghailani, M.: Analysis of evolutionary trends of incidence and mortality by cancers. In: Ben Ahmed, M., Boudhir, A. (eds.) Innovations in Smart Cities and Applications. SCAMS 2017. Lecture Notes in Networks and Systems, vol. 37. Springer, Cham (2018)


Acknowledgement

H. Saoud acknowledges financial support for this research from the “Centre National pour la Recherche Scientifique et Technique” CNRST, Morocco.

Author information

Authors and Affiliations

LIST Laboratory, University of Abdelmalek Essaadi (UAE), Tangier, Morocco
Hajar Saoud & Abderrahim Ghadi
LabTIC Laboratory, University of Abdelmalek Essaadi (UAE), Tangier, Morocco
Mohamed Ghailani

Corresponding author

Correspondence to Hajar Saoud.

Editor information

Editors and Affiliations

Faculty of Sciences and Techniques of Tangier, Mediterranean Association of Sciences and Technologies, Tangier, Morocco
Mohamed Ben Ahmed
Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tangier, Morocco
Anouar Abdelhakim Boudhir
Polytechnic Institute of Castelo Branco, Castelo Branco, Portugal
Domingos Santos
Hassania School of Public Works, Casablanca, Morocco
Mohamed El Aroussi
Computer Engineering Department, Faculty of Engineering, Karabuk University, Karabük, Turkey
İsmail Rakıp Karas

Cite this paper

Saoud, H., Ghadi, A., Ghailani, M. (2020). Hybrid Method for Breast Cancer Diagnosis Using Voting Technique and Three Classifiers. In: Ben Ahmed, M., Boudhir, A., Santos, D., El Aroussi, M., Karas, İ. (eds) Innovations in Smart Cities Applications Edition 3. SCA 2019. Lecture Notes in Intelligent Transportation and Infrastructure. Springer, Cham. https://doi.org/10.1007/978-3-030-37629-1_34

DOI: https://doi.org/10.1007/978-3-030-37629-1_34
Published: 01 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37628-4
Online ISBN: 978-3-030-37629-1
eBook Packages: Engineering, Engineering (R0)
