Abstract
Invasive breast cancer is a complex global health issue and the leading cause of women's mortality. Multiclassification in breast cancer, especially with high-resolution images, presents unique challenges. Clinical diagnosis relies on the cancer's pathological stage, requiring precise segmentation and adjustments. Complex structural changes during slide preparation and inconsistent image magnifications further complicate classification. To address these challenges, we propose a hybrid machine learning framework for accurate breast cancer detection and grading using large-scale pathological images. Our approach includes an improved Non-restricted Boltzmann Deep Belief Neural Network for nuclei segmentation, followed by feature extraction and novel feature selection using the Giraffe Kicking Optimization algorithm to mitigate overfitting. We implement an Optimal Kernel layer-based Support Vector Machine classifier to identify mitotic cells and nuclear atypia, using the Nottingham Grading System. Validation on the MITOSIS-ATYPIA-14 database demonstrates the framework's effectiveness, with performance metrics including accuracy, precision, recall, specificity, and F-measure. This approach addresses the complexities of breast cancer classification and grading in a streamlined manner, enhancing diagnostic accuracy and prognosis prediction.
1 Introduction
In recent years, the application of machine learning to cancer grading, particularly for breast cancer, has become increasingly important for early detection and prognosis (https://www.iarc.who.int/cancer-type/breast-cancer/). Breast cancer is a leading cause of cancer-related deaths globally, with millions of new cases diagnosed each year. In 2020, breast cancer became the most commonly diagnosed cancer type in the world, with more than 2.26 million new cases and almost 685 000 deaths worldwide [1]. To combat this, machine learning techniques, such as deep learning algorithms, have been integrated into clinical image processing to enable early and accurate cancer detection [2].
Advancements in sensor technologies and machine learning have paved the way for recognition systems capable of performing tasks that were previously unimaginable [3, 4]. Breast cancer, being the most common malignancy in women, has been a focus of research for detection, prevention, and treatment. Various imaging modalities like mammography, ultrasound, and magnetic resonance imaging have been employed. Additionally, the stiffness of breast cancer tissue compared to normal tissue is now considered an early indicator of the disease. This stiffness information has been utilized for breast cancer classification using Shear Wave Elastography (SWE), a non-invasive technique that offers real-time assessment of breast tissue elasticity [5, 6].
Recent advances in quantitative microscopy and high-performance computing have enabled the development of high-throughput image-based measurements through techniques like High-Content Analysis (HCA) [7]. These measures not only provide precise quantitative data on parameters such as nuclear size, morphology, and DNA replication but also allow for the screening of thousands of cells [8]. Researchers have applied machine learning (ML) algorithms such as principal component analysis, random forests, k-nearest neighbors, and support vector machines, typically after averaging the measurements over the cell population [9, 10].
The emergence of DNA microarray technologies has revolutionized biological research, allowing for the simultaneous measurement of gene expression levels for thousands of genes [11]. This high-throughput data has enabled the detection of complex gene interactions within Gene Regulatory Networks (GRNs) [13]. Understanding the regulatory networks that maintain health and prevent cancer and other diseases is a critical recent development in cancer research [12].
Regulatory networks play a significant role in the pathogenesis of cancer at the molecular level [14]. Understanding these networks can improve diagnostic methods, identify gene therapy candidates, and elucidate drug targets. This deeper understanding of regulatory processes has shifted the focus of cancer drug discovery from chemicals that kill cancer cells to molecular targets underlying cell transformation [15].
Protein levels, including the presence or absence of three cell receptors (estrogen receptor ESR1, progesterone receptor PGR, and human epidermal growth factor receptor-2 ERBB2), are used to classify breast cancer types and guide chemotherapy decisions [16]. Although only a limited number of molecular features are assessed, this classification is valuable for treatment planning.
The rapid progress in artificial intelligence, especially deep learning, has significantly enhanced the use of these methods in accurate cancer diagnosis. Deep learning algorithms process data through multiple layers of neural network computations, progressively learning and refining image representations [17]. Parameters such as nuclear pleomorphism, nuclear atypia, and mitotic count are crucial for predicting malignant breast tumors on histopathology slides. However, detecting mitotic cells, particularly those in the 3rd nuclear row, remains challenging. In a related line of work, two machine learning models for whole-breast radiation therapy were evaluated against specific criteria [18]: the base model employs a custom-made array-based U-NET architecture, while the second model utilizes DPS [19].
2 Contributions
In this paper, we propose a hybrid machine learning technique for the automated detection of mitosis and nuclear atypia in histopathological images of breast cancer. The main technical contributions of our work can be summarized as follows:
1. An improved Non-restricted Boltzmann Deep Belief Neural Network (NB-DBNN) is used for nuclei segmentation and extracts hidden features from the segmented cells.
2. The Giraffe Kicking Optimization (GKO) algorithm is used for feature optimization, selecting the best subset among multiple features.
3. An Optimal layered Kernel-based Support Vector Machine (OK-SVM) classifier is introduced to detect the mitotic cell count and nuclear atypia.
4. The Nottingham Grading System (NGS) is used to compute the grading levels of breast cancer pathological images.
5. The proposed technique is validated on the MITOSIS-ATYPIA-14 database, with experimental and simulation results reported for accuracy, precision, recall, specificity, and F-measure.
2.1 Paper organization
The rest of this article is organized as follows. Section 3 reviews recent studies on the detection of mitosis and nuclear atypia in cancer. Section 4 presents the problem statement and system model. Section 5 describes the proposed methodology. Section 6 presents the simulation results and comparative analysis. The final section concludes the paper.
3 Related works
Detecting mitosis and nuclear atypia is vital for cancer diagnosis and staging. Researchers are working on automated methods to make this process faster, more reproducible, and less burdensome for operators, ultimately improving cancer staging.
3.1 Review on breast cancer detection
Zhang et al. [20] introduced an innovative learning structure that integrates mathematical calculations and autoencoder neural networks to identify various gene expression patterns. They developed a group classifier using the Principal Component Analysis-Autoencoder-AdaBoost (PCA-AE-Ada) algorithm for breast cancer risk prediction. In parallel tests, an additional classifier (PCA-Ada) following the same framework as the proposed method was implemented, with the primary distinction being the training input.
Qi et al. [21] advocated for the application of this capability to higher-order tasks, demonstrating a learning framework for the classification of histopathological images in breast cancer.
Fatima et al. [22] conducted a comprehensive analysis comparing artificial intelligence, deep learning, and data mining techniques for cancer prediction.
Zhang et al. [23] proposed a classification approach based on genetic markers analyzed from various microarray studies to predict clinical outcomes in cancer patients. However, they observed that multiple markers contributed to the study with a low density ratio when individual data points were widely dispersed, which is considered a significant finding.
Byra et al. [24] presented a deep learning-based approach for predicting the response to Neo Adjuvant chemotherapy (NAC) in ultrasound imaging. They employed deep Convolutional Neural Networks (CNN) to build predictive models for treatment response using transfer learning.
Arya et al. [25] introduced a robust model aimed at minimizing the adverse effects and clinical costs associated with unnecessary cancer treatments. This model utilizes clinical expertise for early diagnosis and the selection of the most appropriate cancer treatment plan. It employs robust deep learning models for classification, generating informative features through random jumps in multivariate data to enhance breast cancer detection.
Gopal et al. [26] proposed an algorithm for the early detection of breast cancer using the Internet of Things (IoT) and machine learning. Their approach achieved precision, recall, quantification, and prediction rates of 98%, 97%, 96%, and 98%, respectively. They also evaluated the minimum classification error rates, which were 34.21%, 45.828%, and 64.47% for mean absolute error (MAE), root mean square error (RMSE), and relative absolute error (RAE), respectively. The results suggest that the MLP classifier outperforms the LR and RF classifiers in terms of accuracy and error rates.
Bakx et al. [27] proposed an accurate U-NET model for assessing the distribution of whole breast Radiation Therapy (RT). The differences between the two models in the predicted distributions were relatively small and not significantly different from clinical applications. The results of both models were visualized in an automated plot generation process.
Surender et al. [28] introduced an early detection algorithm for breast cancer that leverages computer vision, image processing, clinical assessment, and neural processing.
Das et al. [29] presented a method that utilizes a deep learning modeling process to transform one-dimensional data into images. This approach is based on converting structured data into images and designing a systematic deep learning model that enhances performance compared to individual models. The proposed model was evaluated for breast cancer detection using gene expression datasets and breast histopathology images.
Wang et al. [36] presented a hybrid deep learning (CNN-GRU) model for the automatic detection of BC-IDC (+, −) using Whole Slide Images (WSIs) from the well-known PCam Kaggle dataset. The proposed model combines layers from CNN and GRU architectures to detect breast IDC (+, −) cancer. Quantitative validation used accuracy (Acc), precision (Prec), sensitivity (Sens), specificity (Spec), AUC, and F1-score. The model achieved an accuracy of 86.21%, precision of 85.50%, sensitivity of 85.60%, specificity of 84.71%, F1-score of 88%, and AUC of 0.89, which helps to overcome pathologist error and misclassification.
3.2 Review on breast cancer grading
Mitotic transitions can be studied effectively by combining fully connected layers with random jump classification [27], allowing the extraction of features from nuclear fragments and accurate prediction of nuclear cell classes. This approach adapts the threshold to accurately represent cell nuclei, even with limited training data, resulting in a high level of accuracy. The framework utilizes pre-processed models that are carefully designed and feature-based. Additionally, it employs restricted sampling to capture mitotic signatures in breast histopathology [25]. However, utilizing a deep learning framework within another deep learning framework presents three unique challenges. Comparative results demonstrate the effectiveness of the proposed approach, as evidenced by metrics such as F-score, accuracy, and recall when compared to other available strategies.
In a related study, a deep learning-based automated and precise mitosis detection method utilizes a semantic segmentation model to analyze 630 breast histopathology images [30]. This technique employs a filtering process to identify potential mitotic cells and utilizes focal loss as an annotation center to outline a semantic distribution network, achieving optimal performance. Another previously described approach [31] leverages prior data to perform mitosis detection with annotated points. Spatial location control, based on value weights derived from positive and negative ratios, mitigates mispredicted components and uniqueness types through multiple instance learning (MIL). Furthermore, a rapid and accurate method [32] was employed for the automatic detection of mitosis in histopathological images, with a higher threshold used to identify more mitotic lines and detect mitotic candidates.
In an observational retrospective study, Wang et al. [37] used routine WSIs stained with haematoxylin and eosin from 1567 patients for model optimisation and validation. DeepGrade provided independent prognostic information for stratification of NHG 2 cases in the internal test set, where DG2-high showed an increased risk for recurrence (hazard ratio [HR] 2.94, 95% confidence interval [CI] 1.24–6.97, P = 0.015) compared with the DG2-low group after adjusting for established risk factors (independent test data).
Jiang et al. [40] retrospectively collected data from 908 subjects with invasive breast cancer and preoperative MRI scans. The Rad-Grade showed independent prognostic value for re-stratification of NHG 2 tumors, where RG2-high had an increased risk for recurrence (HR 2.20, 1.10–4.40, p = 0.026) compared with RG2-low after adjusting for established risk factors. RG2-low shared similar phenotypic characteristics and RFS outcomes with NHG 1, and RG2-high with NHG 3, revealing that the model captures radiomic features in NHG 2 that are associated with different aggressiveness. Table 1 summarizes the current research gaps in breast cancer studies.
4 Problem statement
4.1 Research gaps
A rapid and precise method for the automatic detection of mitosis in histopathological images has been demonstrated [33]. This method leverages morphological regions that consider the spatial scale of cell division. By innovatively manipulating scale space, it enhances the interaction entropy between particles and matter, contributing to improved detection accuracy. Candidate cells were classified as mitotic or non-mitotic using a random forest with weighted votes. Through empirical analysis and performance comparisons, the method was reported to outperform other approaches for mitosis detection across a range of challenging datasets, with each step tuned through extensive comparisons. Additionally, a computer-assisted technique [34] was employed to assess nuclear atypia by grading high-power field hematoxylin and eosin-stained images. The technique first extracts various nuclear features, encompassing morphology and structural characteristics, from previously segmented nuclear regions. It then calculates histograms to capture statistical information about these features. Finally, a Support Vector Machine (SVM) classifier categorizes high-power field images into distinct grades of nuclear atypia.
Early detection of breast cancer, one of the most prevalent diseases among women worldwide, is crucial for improving treatment outcomes. Mitosis and nuclear atypia are key factors in the histopathological assessment that determines the diagnosis and staging of breast cancer. The conventional method of transmitting images to pathologists is time-consuming and subject to subjective interpretation. Although it is highly reliable for scoring cell nuclei with at least three points, its accuracy diminishes as chromatin density in the cell nuclei increases. Despite nuclear atypia being a primary criterion for cancer staging, it has not received adequate research attention, and existing systems often struggle to accurately detect and classify critical cancer cells. Therefore, there is a pressing need to address these research gaps. Recently, several methods have been proposed for detecting nuclei, tubules, and mitosis in breast cancer images, but imaging the nuclear core of high-power fields remains a challenging and underexplored area. To bridge these gaps, we propose an automated approach for the detection of mitosis and nuclear atypia in breast cancer staging.
4.1.1 Objectives
The main objectives of our proposed method are:
1. To exploit the synergy between mitosis and nuclear atypia in order to enhance the potential for early breast cancer detection and diagnosis.
2. To introduce a kernel learning technique based on novel input images that enhances recognition accuracy.
3. To create an optimization algorithm for feature selection that addresses the challenges of high-dimensional data.
4. To develop a hybrid machine learning approach for mitosis and nuclear atypia detection, as well as to assess the quality of cancer imaging.
4.2 Motivation
The motivation behind this research is to improve early breast cancer detection, focusing on mitosis and nuclear atypia in high-power field (HPF) images. Current manual assessments by pathologists are time-consuming and subjective, necessitating automated methods. Our approach combines these factors for enhanced detection, utilizing innovative kernel learning techniques, optimization algorithms, and a hybrid machine learning approach. The goal is to provide a robust and efficient solution for accurate breast cancer diagnosis, improving patient outcomes and reducing subjectivity. Our method incorporates the Non-restricted Boltzmann Deep Belief Neural Network (NB-DBNN) for nuclei segmentation, the Giraffe Kicking Optimization (GKO) algorithm for feature optimization, and the Optimal Kernel layer-based Support Vector Machine (OK-SVM) classifier for mitotic cell and nuclear atypia detection. It also employs the Nottingham Grading System (NGS) for comprehensive grading. Compared to traditional methods reliant on manual intervention, our approach offers objectivity, consistency, and potential accuracy. It leverages machine learning and deep learning to efficiently process histopathological data, enhancing breast cancer diagnosis and grading. Our methodology represents a novel and unexplored approach in breast cancer diagnosis and grading.
4.3 System model of proposed technique
Hybrid techniques combine the strengths of different methods and address diverse aspects of complex problems, leading to improved performance, robustness, and adaptability. Figure 1 illustrates the proposed approach to detecting mitosis and nuclear atypia for breast cancer staging. The process begins with feed classification and color segmentation applied to the original hematoxylin and eosin (H&E) stained RGB image. Since the nuclear regions are effectively separated from the background, a grayscale image derived from the H-stained image is chosen for further processing. Nuclear region segmentation is then carried out using NB-DBNN, considering nuclear compactness, spatial extent, and regional-scale morphology. Subsequently, hidden features are extracted from the segmented nuclear regions. A histogram is calculated for each feature to capture statistical information about the nuclei in each high-power field (HPF) image. These feature histograms are passed to the OK-SVM classifier, which performs the final step of detecting the mitotic cell count and nuclear atypia.
5 Proposed methodology
5.1 Nuclei segmentation
The samples on the surface exhibit some variation in density and cell phase, but this variation is less pronounced than in the patterns found at the edges of the cells and the background. These differences in cell characteristics, background, and edges are not only observed in RGB H&E-stained images but also in the red-channel images of these slides. In our preprocessing and segmentation, we exclusively utilize the red channel of histopathological images. To enhance relative entropy through a right triangle, we reposition the image to allocate information to the corners, reducing the information in the background. To address this, we introduce a deep neural network architecture called "NB-DBNN," based on the Restricted Boltzmann Machine (RBM), which extracts hidden features from segmented cells. Recently, the high-fidelity Deep Belief Network (DBN) has been widely adopted for segmentation tasks. The architecture consists of multiple DBN structures, and the top layer employs the RBM algorithm. Each NB-DBNN building block is a two-layer design with a visible layer and a hidden layer, where "w" represents the weight matrix connecting the two layers and separate vectors hold the states of the visible and hidden nodes. The energy of a joint configuration of the visible and hidden layers is then calculated (Figs. 2 and 3).
The energy function between the two layers is determined by the states of the nodes in the visible layer and the activations of the nodes in the hidden layer. Specifically, the nodes in both the visible and hidden layers take the values 0 or 1. The normalization function is a summation over all states of both the visible and hidden layers. It yields the joint probability distribution of the visible and hidden nodes.
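For reference, a standard binary RBM is usually written as follows. The symbols \(\mathbf{v}\), \(\mathbf{h}\), \(w_{ij}\), \(a_i\), \(b_j\), \(Z\), and the sigmoid \(\sigma\) are the conventional textbook ones and are assumptions here; NB-DBNN may use different notation or additional terms.

\[
E(\mathbf{v},\mathbf{h}) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i\, w_{ij}\, h_j
\]

\[
P(\mathbf{v},\mathbf{h}) = \frac{1}{Z}\, e^{-E(\mathbf{v},\mathbf{h})},
\qquad
Z = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}
\]

\[
P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i} v_i\, w_{ij}\Big),
\qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_{j} w_{ij}\, h_j\Big),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
\]

Because there are no connections within a layer, the hidden units are conditionally independent given the visible units (and vice versa), which is what makes the layer-wise conditionals above factorize.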
Since ν is the position of the apparent angles of the stack, i. Thus, the hidden process introduces the power of the node layer.
At the same time, when the locations of the hidden vertices of layer "g" are known, c. The raw material is usually distributed.
where F is the activation function. Other initialization parameters can be determined to enhance the network. The number of hidden layer nodes is not fixed during the validation process. In NB-DBNN, the maximum likelihood threshold θ is used to find the parameters. This threshold is an important factor in the network's performance and can be adjusted as needed.
Similarly, when updates are necessary, the corresponding increments are applied to each of the parameters.
Here η is the learning rate of the model, τ is the momentum factor, and the angle brackets < . > denote the average taken with respect to the given distribution. NB-DBNN employs greedy layer-wise training of the hidden layers followed by error backpropagation to train the network. Therefore, the parameters of Adaptive Sparse NB-DBNN are as follows:
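As a rough illustration of such an update, the sketch below performs one step of contrastive divergence (CD-1) with a momentum term on a plain binary RBM. It is not the NB-DBNN update rule: CD-1, the function names, and the default values `eta=0.1` and `tau=0.5` are all assumptions chosen for illustration, and the sparsity adaptation is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    """Draw a binary state for each unit from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_step(v0, W, a, b, velocity, eta=0.1, tau=0.5, rng=random):
    """One CD-1 update, applied in place to W, a, b and velocity.

    eta is the learning rate and tau the momentum factor (assumed roles).
    """
    ph0 = hidden_probs(v0, W, b)                 # positive phase
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, a), rng)    # one Gibbs step
    ph1 = hidden_probs(v1, W, b)                 # negative phase
    for i in range(len(a)):
        for j in range(len(b)):
            # <v h>_data - <v h>_reconstruction
            grad = v0[i] * ph0[j] - v1[i] * ph1[j]
            velocity[i][j] = tau * velocity[i][j] + eta * grad
            W[i][j] += velocity[i][j]
        a[i] += eta * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += eta * (ph0[j] - ph1[j])
    return velocity
```

In a greedy layer-wise scheme, a step like this would be repeated over the training patches for the first layer, after which the hidden activations serve as the visible input of the next layer.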
As an initial attempt to create a compact representation that captures the desired emotional state and neuron oscillation, a novel miniature representation using two Persian words is introduced. The basic workflow of the NB-DBNN algorithm is outlined in Algorithm 1.
5.2 Hidden feature extraction
Feature analysis is a quantitative method for measuring and identifying structural abnormalities in tissue. Because patterns and density are difficult to characterize from gray levels alone, additional texture features are required for further classification. Texture is an inherent characteristic of all surfaces; it defines visual patterns that carry detailed information about the structure of a surface and its spatial relationship to neighboring surfaces. Local features are extracted using curvilinear combinations and gray-level co-occurrence matrices (GLCM). A color histogram records how frequently each color occurs in an image. Color histograms can be computed in various color spaces, typically divided into multiple distinct channels, each with its own range of values. This approach effectively determines the number of pixels within a given color range. Fourteen features are computed from the GLCM, summarizing how gray values co-occur at a given pixel offset.
Haralick's features provide a statistical description of the distribution of gray values in a local region. They assess the local characteristics of each image and extract statistical parameters from them. The co-occurrence matrix counts how often a pixel with gray value "i" is adjacent to a pixel with gray value "j", and the entire matrix is then divided by the total number of counted pairs. Fourteen statistical terms computed from this single matrix collectively describe the texture.
Local Binary Patterns (LBP) is a texture descriptor used in computer vision. Combining LBP with the Histogram of Oriented Gradients (HOG) descriptor has been shown to substantially enhance detection performance. In this technique, every pixel within a cell is compared to its eight neighboring pixels. If the center pixel value is greater than that of a neighboring pixel, that neighbor is labeled "0"; otherwise, it is labeled "1". The eight labels form an 8-bit code for the pixel, and the histogram of these codes over the cell gives a 256-dimensional vector representation.
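The two texture descriptors above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the feature extractor used in the experiments: the function names are hypothetical, the GLCM is made symmetric, only three of the fourteen Haralick statistics are shown, and the clockwise bit order of the LBP code is an arbitrary choice.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[x / total for x in row] for row in counts]


def glcm_features(p):
    """Three of the Haralick statistics computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Neighbor offsets, clockwise from the top-left pixel (assumed bit order).
_NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code: bit is 0 if the center is greater than the neighbor."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_NEIGHBORS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a cell."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

Concatenating `glcm_features` for several offsets with `lbp_histogram` gives one feature vector per segmented region, which is the kind of input a feature selection stage can then prune.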
5.3 Feature optimization
Next, the Giraffe Kicking Optimization (GKO) algorithm is utilized for feature optimization, enabling the selection of the most appropriate feature among multiple options while addressing redundancy and data size concerns. To accomplish this, the algorithm combines the relevant pieces based on the distance measure learned previously, effectively classifying the leopard as a unified solid entity. GKO algorithm is employed to optimize features for the automated analysis of extensive observational data, particularly focusing on identifying abnormal behavior. This process begins with the selection of multiple features extracted from the data. GKO aims to choose features that are non-redundant, meaning they offer unique information, while also considering data size issues to reduce computational demands. A crucial aspect of GKO is the use of a predefined distance measure that quantifies the similarity or dissimilarity between features. This measure helps determine which features best represent the data. The goal is to classify the observed entity, in this case, a leopard, as a complete solid body based on these selected features and the distance measure, enabling automated analysis without the need for continuous human intervention. Ultimately, GKO streamlines the analysis of extensive data by optimizing feature selection [39] to enhance the system's efficiency in detecting abnormal behavior. The objective is to create a system capable of automatically analyzing extensive hours of observational data, focusing on small segments that human analysts typically assess for signs of abnormal behavior.
As it's impractical to anticipate all forms of abnormal behavior, we exclude maps associated with known animal behaviors and aim to identify the most trustworthy maps for the entire query.
Several frames in a video exhibit highly unusual behavior, denoted as S frames. Consequently, the capacity to recognize certain common animal behaviors is a crucial prerequisite for our approach. We calculated the time interval between two exposures. Following gj, gi should be employed to assess the instantaneous probability of accurate detection, the temporal sequence, and the detector's recall ratio (R).
In practice, processing all the exposures simultaneously, especially in the case of long videos, is not advisable. To address this, we illustrate the issue by partitioning the video into five-minute segments, each comprising 300 frames, and then applying the Hungry Multiframe algorithm to each of these segments. Of course, there are alternative approaches to creating video summaries. In this study, we compared two fundamental methods. The first prototype follows a uniform temporal pattern: a frame is added every five minutes.
In our second baseline approach, we utilize the motion history of the video frames to quantify the amount of motion. Frames with a significant history of kinetic energy, denoted as mg(s), are included.
A motion history image is generated by exponentially diminishing the dissimilarity between images from the previous time interval. Thus, we establish two-way associations within every stage of human development and across various environmental contexts. We define relatedness among individuals based on their group membership, specifically, individuals within the same group observed at daily intervals are considered related. To quantify this relatedness, we computed a half-weighted association index from the pairwise matrix, which measures the proportion of time pairs spent together versus time spent separately.
To incorporate socialization as a structural variable, it is essential to eliminate cycles. This is because the contact index between individuals j and i, when considering socialization, increases joint stability while reducing the correlation index between i and j. This is achieved by multiplying by the sum of all elements except for oneself.
To achieve this, a real-number representation should be employed, even when the random variable is discrete and non-continuous. Observations are categorized into three factors based on molecular experiments.
The aim of this analysis is to discern the primary effects and interactions of each factor and determine which of these effects should be disregarded or minimized.
A probability density function (PDF) is utilized to generate values for each classical gene, meaning the function should be within the range of possible values for the optimization variable. These values can be altered using a random walk technique known as a "Lévy flight," which involves a sequence of random steps with heavy-tailed lengths. Nevertheless, randomness plays a crucial role in practical applications and innovation.
Lévy flights are employed to enhance the local search efficiency of the algorithm during the exploitation phase and the global search efficiency during the exploration phase. The selection of flight paths may be assessed for optimization.
Both are randomly generated numbers within the range [0, 1]. The optimization process using the feature optimization algorithm GKO is outlined in Algorithm 2.
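To make the random-walk component concrete, the sketch below draws Lévy-distributed step lengths with Mantegna's algorithm and uses them to perturb a candidate solution. This is a generic Lévy-flight move as used in many metaheuristics, not the GKO update rule itself; the function names, the step scale `alpha`, the exponent `beta=1.5`, and the encoding of a candidate as per-feature values in [0, 1] are all assumptions.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """One Levy-distributed step length via Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid a division by zero in the rare degenerate draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_move(position, best, alpha=0.01, beta=1.5, rng=random):
    """Perturb a candidate relative to the best one, clipped to [0, 1]."""
    moved = []
    for x, b in zip(position, best):
        x_new = x + alpha * levy_step(beta, rng) * (x - b)
        moved.append(min(1.0, max(0.0, x_new)))
    return moved


def to_feature_mask(position, threshold=0.5):
    """Interpret a continuous candidate as a binary feature-selection mask."""
    return [1 if x > threshold else 0 for x in position]
```

Most steps are small, which refines a candidate locally, while the occasional very long step moves it to a distant part of the search space; that mix is why Lévy flights are associated with both exploitation and exploration.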
5.4 Breast cancer grading
An Optimal Kernel layer-based Support Vector Machine (OK-SVM) classifier is used to identify mitotic cells and nuclear atypia from the selected features. We also employ the Nottingham Grading System (NGS) to determine the grade of breast cancer from pathological images. The kernel function maps the training data into a higher-dimensional space, in which a non-linear decision surface can be found. Data Value Metric (DVM) is a classification formula used to partition data into evidence and predictions. Each instance in the training set consists of a target value and a set of attributes that can be utilized to forecast variances. Let Y and X be the input and output data sets, respectively; a simple system is established to find the appropriate x \(\in\) X given a previously observed value of y \(\in\) Y:
The kernel function parameters α need to be tuned carefully to obtain more accurate image classification results. The Optimal Kernel SVM (OK-SVM) classifier can adapt to various kernel functions, including linear, polynomial, and radial basis functions, to achieve efficient image segmentation. In OK-SVM, each hyperplane is characterized by a normal vector, which is perpendicular to the hyperplane, and a constant "B".
h is the ideal hyperplane expression, and h1 and h2 are finite planes.
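For reference, in the standard linear SVM formulation the three planes are usually written as below. This is the textbook form, given here as an assumption because the authors' own expression is not reproduced in this text; the symbol \(w\) for the normal vector is introduced here for illustration, while B is the constant named above.

\[
h:\; w \cdot y + B = 0, \qquad h_1:\; w \cdot y + B = +1, \qquad h_2:\; w \cdot y + B = -1
\]

Under this form, the distance between \(h_1\) and \(h_2\) is \(2 / \lVert w \rVert\), and training seeks the \(w\) and B that maximize it.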
When a linear kernel model is inappropriate, a multi-kernel method should be used, which efficiently fits more complex data. The expression of the polynomial kernel method is given as follows:
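In its standard textbook form, the polynomial kernel between an input \(y\) and a support vector \(y_1\) can be written as below. This is an assumption, since the authors' exact expression is not reproduced in this text; the degree \(d\), scale \(\gamma\), and offset \(c\) are symbols introduced here for illustration.

\[
K(y, y_1) = \Phi(y) \cdot \Phi(y_1) = \left( \gamma \, y \cdot y_1 + c \right)^{d}
\]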
where \(\Phi(y)\) is the mapping function into a high-dimensional space. A chord is considered open if it is preceded by other long chords or a syllable. For the remaining sequences, the frequencies representing normal and abnormal stacks, as well as their frequency of occurrence in the training data set, were calculated. Another communication system is employed.
It is used in code to detect malicious strings. Here, ε percentage sequences are selected in the dictionary system. Finally, space is a property of vectors.
The sliding window size K is predefined. The K value plays an important role in model discovery because it represents the feature extraction and model training problem. For each y, let the input be y and the support vector be y1. It is written as:
A kernel measures the similarity between an input and a support vector. A kernel formed from a linear combination of the inputs is called a linear kernel. Kernel types such as polynomial and radial basis function map the inputs into a higher-dimensional space. The optimal polynomial kernel is written as:
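The three kernel families named in this section can be sketched in plain Python as follows. These are the standard textbook definitions, not the authors' optimized kernel layer; the function names and the `coef0` and `degree` parameters are assumptions introduced for illustration, and `gamma=0.1` follows the default value quoted in this section.

```python
import math


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(y, y1):
    """Linear kernel: the plain inner product of input and support vector."""
    return dot(y, y1)


def polynomial_kernel(y, y1, gamma=0.1, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * <y, y1> + coef0) ** degree."""
    return (gamma * dot(y, y1) + coef0) ** degree


def rbf_kernel(y, y1, gamma=0.1):
    """Radial basis function kernel: exp(-gamma * ||y - y1||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(y, y1))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    y, y1 = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(y, y1), polynomial_kernel(y, y1), rbf_kernel(y, y1))
```

A larger `gamma` makes the radial basis function kernel more local, so each support vector influences a smaller neighborhood of the feature space.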
The accepted default value for gamma is 0.1, but gamma can take any value between 0 and 1. It can be used to create complex decision regions in two-dimensional space. The workflow and step-by-step procedure for breast cancer detection using OK-SVM are shown in Fig. 4.
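The Nottingham Grading System mentioned in this section combines three component scores into a final grade, which can be sketched as follows. The thresholds are the standard NGS ones (total 3 to 5 is grade 1, 6 to 7 is grade 2, 8 to 9 is grade 3). How the OK-SVM outputs are converted into component scores is not specified in this text, so the function takes the three scores as given inputs; the function and parameter names are assumptions.

```python
def nottingham_grade(tubule_score, nuclear_score, mitotic_score):
    """Map three NGS component scores (each 1, 2, or 3) to a grade of 1, 2, or 3."""
    scores = (tubule_score, nuclear_score, mitotic_score)
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each component score must be 1, 2, or 3")
    total = sum(scores)
    if total <= 5:
        return 1  # well differentiated
    if total <= 7:
        return 2  # moderately differentiated
    return 3      # poorly differentiated


if __name__ == "__main__":
    print(nottingham_grade(2, 3, 2))
```

In this pipeline, the nuclear atypia and mitotic count scores would come from the classifier, while the tubule formation score would have to be supplied separately.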
6 Results and discussion
In this section, we validate the breast cancer detection technique using the open-source test database MITOSIS-ATYPIA-14. First, we present a comparative analysis of the proposed and existing segmentation algorithms. We then discuss a comparative analysis of the feature selection process using performance measures such as precision, accuracy, recall, and F-measure. A comprehensive evaluation of breast cancer classification is performed using various statistical measures. The results are then compared with current state-of-the-art methods.
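The performance measures used throughout this section can be computed from a binary confusion matrix, as in the minimal sketch below. These are the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion-matrix counts."""

    def safe_div(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in classification_metrics(tp=90, fp=10, tn=85, fn=15).items():
        print(f"{name}: {value:.3f}")
```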
6.1 Dataset description
We used the MITOS-ATYPIA-14 dataset for this evaluation. The slides were stained with standard hematoxylin and eosin (H&E) and scanned with two slide scanners: Aperio Scanscope XT and Hamamatsu Nanozoomer 2.0-HT. On each slide, the pathologist selected several frames at X20 magnification, and each X20 frame is divided into four frames at X40 magnification. Our experiments use the X40 frames from the Aperio scanner. These files are 24-bit RGB bitmap images of 1539 × 1376 pixels with a resolution of 4.073 pixels per µm. Figure 2 displays sample images of mitotic and non-mitotic cells in the MITOS-ATYPIA-14 dataset. Our proposed framework was validated with five different objectives (objectives 1 to 5), whose details are summarized in Table 2.
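As a quick check on the frame geometry, the sketch below converts the stated image size to a physical field of view and lists the four quadrant regions of a frame. It assumes the stated resolution means 4.073 pixels per µm; the function names are hypothetical, and the quadrant split only illustrates the X20 to X40 subdivision without reproducing the dataset's actual file layout.

```python
def frame_size_um(width_px, height_px, pixels_per_um):
    """Convert a frame size in pixels to micrometres."""
    return width_px / pixels_per_um, height_px / pixels_per_um


def quadrant_boxes(width_px, height_px):
    """Return (left, top, right, bottom) boxes for the four quadrants of a frame."""
    mid_x, mid_y = width_px // 2, height_px // 2
    return [
        (0, 0, mid_x, mid_y),                # top-left
        (mid_x, 0, width_px, mid_y),         # top-right
        (0, mid_y, mid_x, height_px),        # bottom-left
        (mid_x, mid_y, width_px, height_px), # bottom-right
    ]


if __name__ == "__main__":
    w_um, h_um = frame_size_um(1539, 1376, 4.073)
    print(f"field of view: {w_um:.1f} x {h_um:.1f} um")
    print(quadrant_boxes(1539, 1376))
```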
6.2 Analysis of segmentation process
We compare the segmentation performance of our proposed NB-DBNN with that of Relative Entropy Maximized Space Segmentation (REMSS) [33]. Our proposed technique can avoid the maximum partition, as it employs a specific grayscale threshold corresponding to the maximum relative entropy between the nuclei and the background for the spatial organization of the image scale.
6.3 Comparative analysis of feature optimization
In this section, we evaluate the performance of the proposed optimization method using the MITOSIS-ATYPIA-14 dataset. Table 3 presents a comparative analysis of the proposed GKO algorithm with various models and existing experimental classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and Logistic Regression (LR) [35].
For the case without feature optimization
Relative to the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 9.212%, 1.011%, and 7.336%, respectively. The corresponding improvements are 1.028%, 9.368%, and 7.460% in precision; 2.532%, 7.428%, and 15.116% in recall; 1.504%, 12.060%, and 11.303% in specificity; and 1.79%, 8.403%, and 11.475% in F-measure.
For shape features only
Relative to the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 1.339%, 6.488%, and 10.758%, respectively. The corresponding improvements are 1.362%, 6.597%, and 10.938% in precision; 3.831%, 11.396%, and 14.387% in recall; 4.546%, 15.875%, and 12.601% in specificity; and 2.619%, 9.073%, and 12.707% in F-measure.
For GLCM features only
Relative to the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 2.455%, 9.205%, and 8.061%, respectively. The corresponding improvements are 2.496%, 9.359%, and 8.196% in precision; 0.781%, 7.636%, and 20.768% in recall; 0.59%, 9.227%, and 11.112% in specificity; and 1.641%, 8.501%, and 14.979% in F-measure.
For Intensity features only
Relative to the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 3.66%, 14.175%, and 6.141%, respectively. The corresponding improvements are 3.721%, 14.41%, and 6.243% in precision; 4.251%, 5.357%, and 12.488% in recall; 3.654%, 15.756%, and 13.065% in specificity; and 3.98%, 10.86%, and 9.49% in F-measure.
For GKO algorithm-based feature optimization
Relative to the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 2.795%, 7.842%, and 3.957%, respectively. The corresponding improvements are 2.842%, 7.971%, and 4.022% in precision; 1.88%, 3.401%, and 9.486% in recall; 2.72%, 10.83%, and 9.074% in specificity; and 2.361%, 5.729%, and 6.849% in F-measure.
6.4 Comparative analysis of existing and proposed classifiers
This section presents the classification results on the MITOSIS-ATYPIA-14 dataset. Table 4 reports the performance of the OK-SVM classifier on this dataset, and Figure 4 provides a graphical overview of its accuracy, precision, recall, specificity, and F-measure. The plot shows that the average accuracy of the proposed OK-SVM classifier is 96.674%, the average precision is 96.127%, the average recall is 95.893%, and the average specificity is 96.127%. The recommended feature-based OK-SVM classifier achieves an accuracy of 95.748% and an F-measure of 96.01%. Table 5 presents a comparative analysis of the proposed OK-SVM classifier with current state-of-the-art classifiers: DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], SegMitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33] (Fig. 5).
In Fig. 6, we compare the accuracy of the proposed OK-SVM classifier with that of the state-of-the-art DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], SegMitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33] classifiers for breast cancer grading. In Fig. 7, we compare the recall of the existing and proposed methods for breast cancer grading. The recall values reported for the proposed OK-SVM classifier against the DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], SegMitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33] classifiers are 2.673%, 6.01%, 7.991%, 68.705%, 53.782%, 50.132%, 47.118%, 47.598%, 51.5198%, 85.519%, 51.519%, and 51.519%.
In Fig. 8, we compare the performance of our proposed method with existing methods for breast cancer grading. The OK-SVM classifier achieved an accuracy of 92.75%, precision of 89.56%, recall of 87.66%, specificity of 29.44%, F1 score of 43.75%, and AUC of 47.25%. The current state-of-the-art classifiers achieved an accuracy of 86.60%, precision of 68%, recall of 29.44%, specificity of 43.75%, F-measure of 47.25%, and AUC of 50.14%. In Fig. 9, we compare the F-measure of the proposed technique and the current approaches for breast cancer grading. The F-measure of the proposed OK-SVM is 95.748%. The F-measures of the current state-of-the-art classifiers (DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], SegMitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33]) are 83.4%, 88.6%, 90.24%, 35.64%, 43.37%, 44.35%, 48.25%, 56.2%, 52.01%, 56.22%, 78%, and 90.24%, respectively.
7 Conclusion
A hybrid machine learning technique is proposed for breast cancer grading, involving the automatic detection of mitosis and nuclear atypia from histopathological images. This approach uses an improved NB-DBNN to extract hidden features from segmented cells and the GKO algorithm for feature optimization, selecting the best features among multiple options. To detect mitotic cells and nuclear atypia, an optimized kernel-based support vector machine (OK-SVM) classifier is introduced. The proposed technique also incorporates the Nottingham Grading System (NGS) to grade pathological images of breast cancer. To validate this approach, the MITOSIS-ATYPIA-14 test database is used. Simulation results demonstrate that the performance of the proposed OK-SVM classifier surpasses that of existing state-of-the-art classifiers. The average accuracy, precision, recall, specificity, and F-measure of the proposed OK-SVM classifier are 96.674%, 96.17%, 95.893%, 95.748%, and 96.01%, respectively. Future work will integrate deep learning techniques for even more accurate breast cancer detection and classification.
Data availability
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
References
Buumba BM, Bhardwaj S, Kaur P (2021) A critical review on recent development of techniques and drug targets in the management of breast cancer. Mini Rev Med Chem 21(15):2103–2129
Kaur G, Gupta R, Hooda N, Gupta NR (2022) Machine learning techniques and breast cancer prediction: A review. Wirel Personal Commun 125(3):2537–2564
Manju A, Arivukarasi M, Mahasree M (2022) AEDAMIDL: An Enhanced and Discriminant Analysis of Medical Images using Deep Learning. In 2022 Third International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE) 1–8. IEEE
Sonik D, Colarossi A (2020) Becoming Artificial: A Philosophical Exploration Into Artificial Intelligence and what it Means to be Human. Vol. 73. Andrews UK Limited
Hampson R (2021) Elasticity mapping for breast cancer diagnosis using tactile imaging and auxiliary sensor fusion
Moroni S, Casettari L, Lamprou DA (2022) 3D and 4D Printing in the Fight against Breast Cancer. Biosensors 12(8):568
Elumalai S, Managó S, Luca ACD (2020) Raman microscopy: progress in research on cancer cell sensing. Sensors 20(19):5525
Shahbandi A, Chiu F-Y, Ungerleider NA, Kvadas R, Mheidly Z, Sun MJS, Tian D et al (2022) Breast cancer cells survive chemotherapy by activating targetable immune-modulatory programs characterized by PD-L1 or CD80. Nat Cancer 3(12):1513–1533
Kushwah VS, Saxena A, Pahariya JS, Kumar SG (2021) Support Vector Machine Technique to Prognosis Breast Cancer. In Soft Computing: Theories and Applications: Proceedings of SoCTA 2020, 2, pp. 339–351. Springer Singapore
Singh G, Chaturvedi P, Shrivastava A, Vikram Singh S (2022) Breast Cancer Screening Using Machine Learning Models. In 2022 3rd International Conference on Intelligent Engineering and Management (ICIEM) 961–967. IEEE
Abd-Elnaby M, Alfonse M, Roushdy M (2021) Classification of breast cancer using microarray gene expression data: A survey. J Biomed Inform 117:103764
Singh S, Numan A, Maddiboyina B, Arora S, Riadi Y, Shadab Md, Alhakamy NA, Kesharwani P (2021) The emerging role of immune checkpoint inhibitors in the treatment of triple-negative breast cancer. Drug Discovery Today 26(7):1721–1727
Zhang Q, Xiao Y, Dai W, Suo J, Wang C, Shi J, Zheng H (2016) Deep learning based classification of breast tumors with shear-wave elastography. Ultrasonics 72:150–157
Ahmad FK, Deris S, Othman NH (2012) The inference of breast cancer metastasis through gene regulatory networks. J Biomed Inform 45(2):350–362
Lawrence RT, Perez EM, Hernández D, Miller CP, Haas KM, Irie HY, Lee SI, Blau CA, Villén J (2015) The proteomic landscape of triple-negative breast cancer. Cell Rep 11(4):630–644
Pérez NP, López MAG, Silva A, Ramos I (2015) Improving the Mann-Whitney statistical test for feature selection: An approach in breast cancer diagnosis on mammography. Artif Intell Med 63(1):19–31
Kim S, Choi Y, Lee M (2015) Deep learning with support vector data description. Neurocomputing 165:111–117
Krekel NMA, Zonderhuis BM, Stockmann HBAC, Schreurs WH, Van Der Veen H, de Klerk EDL, Meijer S, Van Den Tol MP (2011) A comparison of three methods for nonpalpable breast cancer excision. Eur J Surg Oncol (EJSO) 37(2):109–115
Renaudeau C, Lefebvre-Lacoeuille C, Campion L, Dravet F, Descamps P, Ferron G, Houvenaeghel G, Giard S, de Lara CT, Dupré PF, Fritel X (2016) Evaluation of sentinel lymph node biopsy after previous breast surgery for breast cancer: GATA study. The Breast 28:54–59
Pobiruchin M, Bochum S, Martens UM, Kieser M, Schramm W (2016) A method for using real world data in breast cancer modeling. J Biomed Inform 60:385–394
Garibaldi JM, Zhou SM, Wang XY, John RI, Ellis IO (2012) Incorporation of expert variability into breast cancer treatment recommendation in designing clinical protocol guided fuzzy rule system models. J Biomed Inform 45(3):447–459
Olsson N, Carlsson P, James P, Hansson K, Waldemarson S, Malmström P, Fernö M, Ryden L, Wingren C, Borrebaeck CA (2013) Grading breast cancer tissues using molecular portraits. Mol Cell Proteomics 12(12):3612–3623
Veta M, Van Diest PJ, Willems SM, Wang H, Madabhushi A, Cruz-Roa A, Gonzalez F, Larsen AB, Vestergaard JS, Dahl AB, Cireşan DC (2015) Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med Image Anal 20(1):237–248
Sanchez-Garcia F, Villagrasa P, Matsui J, Kotliar D, Castro V, Akavia UD, Chen BJ, Saucedo-Cuevas L, Barrueco RR, Llobet-Navas D, Silva JM (2014) Integration of genomic data enables selective discovery of breast cancer drivers. Cell 159(6):1461–1475
Sarkar S, Vinay S, Djeddi C, Maiti J (2022) Classification and pattern extraction of incidents: a deep learning-based approach. Neural Comput Applic 1–22
Kapsner LA, Ohlmeyer S, Folle L, Laun FB, Nagel AM, Liebert A, Schreiter H, Beckmann MW, Uder M, Wenkel E, Bickelhaupt S (2022) Automated artifact detection in abbreviated dynamic contrast-enhanced (DCE) MRI-derived maximum intensity projections (MIPs) of the breast. European Radiology 1–11
Iqbal A, Sharif M, Yasmin M, Raza M, Aftab S (2022) Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey. Int J Multimedia Inf Retrieval 1–36
Cevik M, Angco S, Heydarigharaei E, Jahanshahi H, Prayogo N (2022) Active Learning for Multi-way Sensitivity Analysis with Application to Disease Screening Modeling. J Healthc Inf Res 6(3):317–343
Rustam F, Imtiaz Z, Mehmood A, Rupapara V, Choi GS, Din S, Ashraf I (2022) Automated disease diagnosis and precaution recommender system using supervised machine learning. Multimedia Tools and Applications 1–24
Zhang D, Zou L, Zhou X, He F (2018) Integrating feature selection and feature extraction methods with deep learning to predict clinical outcome of breast cancer. IEEE Access 6:28936–28944
Qi Q, Li Y, Wang J, Zheng H, Huang Y, Ding X, Rohde GK (2018) Label-efficient breast cancer histopathological image classification. IEEE J Biomed Health Inform 23(5):2108–2116
Fatima N, Liu L, Hong S, Ahmed H (2020) Prediction of breast cancer, comparative review of machine learning techniques, and their analysis. IEEE Access 8:150360–150376
Zhang X, He D, Zheng Y, Huo H, Li S, Chai R, Liu T (2020) Deep learning based analysis of breast cancer using advanced ensemble classifier and linear discriminant analysis. IEEE Access 8:120208–120217
Byra M, Dobruch-Sobczak K, Klimonda Z, Piotrzkowska-Wroblewska H, Litniewski J (2020) Early prediction of response to neoadjuvant chemotherapy in breast cancer sonography using Siamese convolutional neural networks. IEEE J Biomed Health Inform 25(3):797–805
Arya N, Saha S (2021) Multi-modal advanced deep learning architectures for breast cancer survival prediction. Knowl-Based Syst 221:106965
Gopal VN, Al-Turjman F, Kumar R, Anand L, Rajesh M (2021) Feature selection and classification in breast cancer prediction using IoT and machine learning. Measurement 178:109442
Bakx N, Bluemink H, Hagelaar E, van der Sangen M, Theuws J, Hurkmans C (2021) Development and evaluation of radiotherapy deep learning dose prediction models for breast cancer. Physics Imaging Radiat Oncol 17:65–70
Surendhar SPA, Vasuki RJMTP (2021) Breast cancers detection using deep learning algorithm. Materials Today: Proceedings
Kaur J, Singara S (2018) Feature selection using mutual information and adaptive particle swarm optimization for image steganalysis. In 2018 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) 538–544. IEEE
Meng J, Li C-L, Luo X-M, Chuan Z-R, Chen R-X, Jin C-Y (2023) An MRI-based Radiomics Approach to Improve Breast Cancer Histological Grading. Acad Radiol
Ethics declarations
Conflict of interest
I have no conflict of interest.
About this article
Cite this article
Maheshwari, N.U., SatheesKumaran, S. Automatic Mitosis and Nuclear Atypia Detection for Breast Cancer Grading in Histopathological Images using Hybrid Machine Learning Technique. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-18078-8