1 Introduction

In recent years, the application of machine learning to cancer grading, particularly for breast cancer, has become increasingly important for early detection and prognosis (https://www.iarc.who.int/cancer-type/breast-cancer/). Breast cancer is a leading cause of cancer-related deaths globally, with millions of new cases diagnosed each year. In 2020, breast cancer became the most commonly diagnosed cancer type in the world, with more than 2.26 million new cases and almost 685,000 deaths worldwide [1]. To combat this, machine learning techniques, such as deep learning algorithms, have been integrated into clinical image processing to enable early and accurate cancer detection [2].

Advancements in sensor technologies and machine learning have paved the way for recognition systems capable of performing tasks that were previously unimaginable [3, 4]. Breast cancer, being the most common malignancy in women, has been a focus of research for detection, prevention, and treatment. Various imaging modalities like mammography, ultrasound, and magnetic resonance imaging have been employed. Additionally, the stiffness of breast cancer tissue compared to normal tissue is now considered an early indicator of the disease. This stiffness information has been utilized for breast cancer classification using Shear Wave Elastography (SWE), a non-invasive technique that offers real-time assessment of breast tissue elasticity [5, 6].

Recent advances in quantitative microscopy and high-performance computing have enabled the development of high-throughput image-based measurements through techniques like High-Content Analysis (HCA) [7]. These measures not only provide precise quantitative data on parameters such as nuclear size, morphology, and DNA replication but also allow for the screening of thousands of cells [8]. Scholars have employed machine learning algorithms such as principal component analysis, random forests, k-nearest neighbors, and support vector machines for population averaging prior to downstream analysis [9, 10].

The emergence of DNA microarray technologies has revolutionized biological research, allowing for the simultaneous measurement of gene expression levels for thousands of genes [11]. This high-throughput data has enabled the detection of complex gene interactions within Gene Regulatory Networks (GRNs) [13]. Understanding the regulatory networks that maintain health and prevent cancer and other diseases is a critical recent development in cancer research [12].

Regulatory networks play a significant role in the pathogenesis of cancer at the molecular level [14]. Understanding these networks can improve diagnostic methods, identify gene therapy candidates, and elucidate drug targets. This deeper understanding of regulatory processes has shifted the focus of cancer drug discovery from chemicals that kill cancer cells to molecular targets underlying cell transformation [15].

Protein levels, including the presence or absence of three cell receptors (estrogenic receptor ESR1, progesterone receptor PGR, and human epidermal growth factor receptor-2 ERBB2), are used to classify breast cancer types and guide chemotherapy decisions [16]. Although only a limited number of molecular features are assessed, this classification is valuable for treatment planning.

The rapid progress in artificial intelligence, especially deep learning, has significantly enhanced the use of these methods in accurate cancer diagnosis. Deep learning algorithms process data through multiple layers of neural network computations, progressively learning and refining image representations [17]. Parameters such as nuclear pleomorphism, nuclear atypia, and mitotic count are crucial for predicting malignant breast tumors on histopathology slides. However, detecting mitotic cells, particularly those at the third nuclear grade, remains challenging. One study therefore evaluated two machine learning models for whole-breast radiation therapy against specific criteria [18]: the base model employs a custom array-based U-NET architecture, while the second model utilizes DPS [19].

2 Contributions

In this paper, automated detection of mitosis and nuclear atypia in breast cancer histopathological images using a hybrid machine learning technique is proposed. The main technical contributions of our work can be summarized as follows:

  1.

    An improved Non-restricted Boltzmann Deep Belief Neural Network (NB-DBNN) is used for nuclei segmentation and for extracting hidden features from the segmented cells.

  2.

    The Giraffe Kicking Optimization (GKO) algorithm is used for feature optimization, selecting the optimal subset from among multiple features.

  3.

    An Optimal layered Kernel-based Support Vector Machine (OK-SVM) classifier is introduced to detect mitotic cell count and nuclear atypia.

  4.

    The Nottingham Grading System (NGS) is used to compute the grading levels of breast cancer pathological images.

  5.

    To validate the proposed technique on the MITOS-ATYPIA-14 database, exploratory and simulation results are prepared for accuracy, precision, recall, specificity and quantification.

2.1 Paper organization

The rest of this article is organized as follows: Section 3 reviews recent studies on mitosis and nuclear atypia diagnosis in cancer. Section 4 discusses the problem statement and system model. Section 5 presents the proposed methodology. Section 6 discusses the simulation results and comparative analysis, and Section 7 concludes the paper.

3 Related works

Detecting mitosis and nuclear atypia is vital for cancer diagnosis and staging. Researchers are working on automated methods to make this process faster, more reproducible, and less burdensome for operators, ultimately improving cancer staging.

3.1 Review on breast cancer detection

Zhang et al. [20] introduced an innovative learning structure that integrates mathematical calculations and autoencoder neural networks to identify various gene expression patterns. They developed a group classifier using the Principal Component Analysis-Autoencoder-AdaBoost (PCA-AE-Ada) algorithm for breast cancer risk prediction. In parallel tests, an additional classifier (PCA-Ada) following the same framework as the proposed method was implemented, with the primary distinction being the training input.

Qi et al. [21] advocated for the application of this capability to higher-order tasks, demonstrating a learning framework for the classification of histopathological images in breast cancer.

Fatima et al. [22] conducted a comprehensive analysis comparing artificial intelligence, deep learning, and data mining techniques for cancer prediction.

Zhang et al. [23] proposed a classification approach based on genetic markers analyzed from various microarray studies to predict clinical outcomes in cancer patients. However, they observed that when individual data points were widely dispersed, multiple markers contributed to the study with a low density ratio, which they considered a significant finding.

Byra et al. [24] presented a deep learning-based approach for predicting the response to Neo Adjuvant chemotherapy (NAC) in ultrasound imaging. They employed deep Convolutional Neural Networks (CNN) to build predictive models for treatment response using transfer learning.

Arya et al. [25] introduced a robust model aimed at minimizing the adverse effects and clinical costs associated with unnecessary cancer treatments. This model utilizes clinical expertise for early diagnosis and the selection of the most appropriate cancer treatment plan. It employs robust deep learning models for classification, generating informative features through random jumps in multivariate data to enhance breast cancer detection.

Gopal et al. [26] proposed an algorithm for the early detection of breast cancer using the Internet of Things (IoT) and machine learning. Their approach achieved precision, recall, quantification, and prediction rates of 98%, 97%, 96%, and 98%, respectively. Additionally, they evaluated minimum classification error rates of 34.21%, 45.828%, and 64.47% for mean absolute error (MAE), root mean square error (RMSE), and relative absolute error (RAE). The results suggest that the MLP classifier outperforms the LR and RF classifiers in terms of accuracy and error rates.

Bakx et al. [27] proposed an accurate U-NET model for assessing the distribution of whole breast Radiation Therapy (RT). The differences between the two models in the predicted distributions were relatively small and not significantly different from clinical applications. The results of both models were visualized in an automated plot generation process.

Surender et al. [28] introduced an early detection algorithm for breast cancer that leverages computer vision, image processing, clinical assessment, and neural processing.

Das et al. [29] presented a method that utilizes a deep learning modeling process to transform one-dimensional data into images. This approach is based on converting structured data into images and designing a systematic deep learning model that enhances performance compared to individual models. The proposed model was evaluated for breast cancer detection using gene expression datasets and breast histopathology images.

Wang et al. [36] presented a hybrid deep learning (CNN-GRU) model for the automatic detection of BC-IDC (+, −) using Whole Slide Images (WSIs) from the well-known PCam Kaggle dataset. The proposed model uses different CNN and GRU layer architectures to detect breast IDC (+, −) cancer. Validation tests for quantitative results were carried out using accuracy (Acc), precision (Prec), sensitivity (Sens), specificity (Spec), AUC and F1-score. The model showed strong performance (accuracy 86.21%, precision 85.50%, sensitivity 85.60%, specificity 84.71%, F1-score 88%, and AUC 0.89), which mitigates pathologist error and the misclassification problem.

3.2 Review on breast cancer grading

Mitotic transitions can be studied effectively by combining fully connected layers with random-jump classification [27], allowing the extraction of features from nuclear fragments and accurate prediction of nuclear cell classes. This approach adapts the threshold to accurately represent cell nuclei even with limited training data, resulting in a high level of accuracy. The framework utilizes carefully designed, feature-based pre-processed models, and additionally employs restricted sampling to capture mitotic signatures in breast histopathology [25]. However, nesting one deep learning framework within another presents unique challenges. Comparative results demonstrate the effectiveness of the proposed approach, as evidenced by metrics such as F-score, accuracy, and recall when compared with other available strategies.

In a related study, a deep learning-based automated and precise mitosis detection method utilizes a semantic segmentation model to analyze 630 breast histopathology images [30]. This technique employs a filtering process to identify potential mitotic cells and utilizes focal loss as an annotation center to outline a semantic distribution network, achieving optimal performance. Another previously described approach [31] leverages prior data to perform mitosis detection with annotated points. Spatial location control, based on value weights derived from positive and negative ratios, mitigates mispredicted components and uniqueness types through multiple instance learning (MIL). Furthermore, a rapid and accurate method [32] was employed for the automatic detection of mitosis in histopathological images, with a higher threshold used to identify more mitotic lines and detect mitotic candidates.

Wang et al. [37], in an observational retrospective study, utilised routine WSIs stained with haematoxylin and eosin from 1567 patients for model optimisation and validation. DeepGrade provided independent prognostic information for stratification of NHG 2 cases in the internal test set, where DG2-high showed an increased risk for recurrence (hazard ratio [HR] 2.94, 95% confidence interval [CI] 1.24–6.97, P = 0.015) compared with the DG2-low group after adjusting for established risk factors (independent test data).

In Jiang et al. [40], nine hundred and eight subjects with invasive breast cancer and preoperative MRI scans were retrospectively obtained. The Rad-Grade showed independent prognostic value for re-stratification of NHG 2 tumors, where RG2-high had an increased risk for recurrence (HR 2.20, 1.10–4.40, p = 0.026) compared with RG2-low after adjusting for established risk factors. RG2-low shared similar phenotypic characteristics and RFS outcomes with NHG 1, and RG2-high with NHG 3, revealing that the model captures radiomic features in NHG 2 that are associated with different aggressiveness. Table 1 summarizes the current research gaps in breast cancer studies.

Table 1 Summary of research gaps

4 Problem statement

4.1 Research gaps

A rapid and precise method for the automatic detection of mitosis in histopathological images has been demonstrated [33]. This method leverages morphological regions that consider the spatial scale of cell division; by manipulating scale space, it enhances the interaction entropy between particles and matter, improving detection accuracy. Cells separated into mitotic and non-mitotic categories were placed using random forest classification with weighted votes. Through empirical analysis and performance comparisons, the method exhibits significant superiority over other approaches for mitosis detection across a range of challenging datasets, with each step finely tuned through extensive comparisons. Additionally, a computer-assisted technique [34] was employed to assess nuclear atypia, involving the grading of high-power field hematoxylin and eosin-stained images. It first extracts various nuclear features, encompassing morphology and structural characteristics, from previously segmented nuclear regions. It then calculates histograms to capture statistical information about the numerical data. Finally, a Support Vector Machine (SVM) classifier categorizes high-power field images into distinct types of nuclear atypia.

Early detection of breast cancer, one of the most prevalent diseases among women worldwide, is crucial for improving treatment outcomes. Mitosis and nuclear atypia are key factors in the histopathological assessment that determines the diagnosis and staging of breast cancer. The conventional method of transmitting images to pathologists is time-consuming and subject to subjective interpretation, and its accuracy diminishes as chromatin concentration increases in cell nuclei. While this method is highly reliable for scoring cell nuclei with at least three points, its accuracy declines with higher chromatin density. Despite nuclear atypia being a primary criterion for cancer staging, it has not received adequate research attention, and existing systems often struggle to accurately detect and classify critical cancer cells. Therefore, there is a pressing need to address these research gaps. Recently, several methods have been proposed for detecting nucleus, tubule, and mitosis in breast cancer images, but imaging the nuclear core of high-power fields remains a challenging and underexplored area. To bridge these gaps, we propose an automated approach for the detection of mitosis and nuclear atypia in breast cancer staging.

4.1.1 Objectives

The main objectives of our proposed method are:

  1.

    To exploit the synergy between mitosis and nuclear atypia detection, enhancing the potential for early breast cancer detection and diagnosis.

  2.

    To introduce a kernel learning technique, based on novel input images, that enhances recognition accuracy.

  3.

    To create an optimization algorithm for feature selection that addresses high-dimensional data challenges.

  4.

    To develop a hybrid machine learning approach for mitosis and nuclear atypia detection, as well as to assess the quality of cancer imaging.

4.2 Motivation

The motivation behind this research is to improve early breast cancer detection, focusing on mitosis and nuclear atypia in high-power field (HPF) images. Current manual assessments by pathologists are time-consuming and subjective, necessitating automated methods. Our approach combines these factors for enhanced detection, utilizing innovative kernel learning techniques, optimization algorithms, and a hybrid machine learning approach. The goal is to provide a robust and efficient solution for accurate breast cancer diagnosis, improving patient outcomes and reducing subjectivity. Our method incorporates the Non-restricted Boltzmann Deep Belief Neural Network (NB-DBNN) for nuclei segmentation, the Giraffe Kicking Optimization (GKO) algorithm for feature optimization, and the Optimal Kernel layer-based Support Vector Machine (OK-SVM) classifier for mitotic cell and nuclear atypia detection. It also employs the Nottingham Grading System (NGS) for comprehensive grading. Compared to traditional methods reliant on manual intervention, our approach offers objectivity, consistency, and potential accuracy. It leverages machine learning and deep learning to efficiently process histopathological data, enhancing breast cancer diagnosis and grading. Our methodology represents a novel and unexplored approach in breast cancer diagnosis and grading.

4.3 System model of proposed technique

Hybrid techniques are used to combine the strengths of different methods and address diverse aspects of complex problems, leading to improved performance, robustness, and adaptability. Figure 1 illustrates a modern approach to detecting mitotic and nuclear atypia, which is recommended for breast cancer staging. The process begins with feed classification and color segmentation applied to the original hematoxylin and eosin stain RGB image. Since the nuclear regions were effectively separated from the background, a grayscale image derived from the H-stained image was chosen for further processing. Core region segmentation is then carried out using NB-DBNN, considering core compactness, spatial extent, and regional-scale morphology. Subsequently, hidden features from various segments are extracted from the root parts, which are subsequently divided. Each feature undergoes histogram calculation to capture statistical information about nuclei in each high-power field (HPF) image. These feature histograms are incorporated into the OK-SVM classifier, resulting in the final step of detecting mitotic cell count and nuclear atypia.

Fig. 1
figure 1

Overall system model of proposed technique
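As a rough illustration of this flow, the stages above can be sketched as placeholder functions. All names and the stand-in logic here (simple averaging, thresholding, and a histogram comparison) are hypothetical, not the authors' implementation; they only show how the pipeline's stages compose.

```python
import numpy as np

def to_grayscale_h_channel(rgb_image):
    # Placeholder: the paper uses a grayscale image derived from the
    # H-stained channel; here we simply average the RGB channels.
    return rgb_image.mean(axis=2)

def segment_nuclei(gray_image):
    # Placeholder for NB-DBNN core-region segmentation:
    # a global intensity threshold as a stand-in.
    return gray_image < gray_image.mean()

def extract_hidden_features(gray_image, mask):
    # Placeholder feature vector: per-HPF histogram of nuclear intensities.
    hist, _ = np.histogram(gray_image[mask], bins=16, range=(0, 255))
    return hist / max(hist.sum(), 1)

def classify_ok_svm(features):
    # Placeholder for the OK-SVM decision (mitotic vs non-mitotic).
    return int(features[:8].sum() > features[8:].sum())

def grade_hpf(rgb_image):
    gray = to_grayscale_h_channel(rgb_image)
    mask = segment_nuclei(gray)
    feats = extract_hidden_features(gray, mask)
    return classify_ok_svm(feats)

rng = np.random.default_rng(0)
hpf = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
label = grade_hpf(hpf)
print(label)
```

Each placeholder would be replaced by the corresponding stage described in the paper (NB-DBNN segmentation, hidden feature extraction, GKO selection, OK-SVM classification).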

5 Proposed methodology

5.1 Nuclei segmentation

The samples on the surface exhibit some variation in density and cell phase, but this variation is less pronounced than in the patterns found at the edges of the cells and the background. These differences in cell characteristics, background, and edges are observed not only in RGB H&E-stained images but also in the red-channel images of these slides. In our preprocessing and segmentation, we exclusively utilize the red channel of histopathological images. To enhance relative entropy, we reposition the image so that information is allocated to the corners, reducing the information in the background. We then introduce a deep neural network architecture called "NB-DBNN," based on the Restricted Boltzmann Machine (RBM), which extracts hidden features from segmented cells. Recently, the high-fidelity Deep Belief Network (DBN) has been widely adopted for segmentation tasks. The architecture consists of multiple DBN structures, and the top layer employs the RBM algorithm. The NB-DBNN is a dual-layer design with visible and hidden layers, where w represents the weight matrix connecting the visible and hidden layers, and v and g denote the visible and hidden layer vectors, respectively. The energy potential of nodes is then calculated between the visible and hidden layers (Figs. 2 and 3).

Fig. 2
figure 2

(a) mitotic cells (b) Non  mitotic cells and (c) nuclear atypia score (NAS = 1, 2 and 3 (right side)) from the MITOS-ATYPIA-14 dataset

Fig. 3
figure 3figure 3

Simulation results of segmentation process (a) input RGB Image (b) corresponding ground truth image, and segmented results of (c) REMSS [33] (d) our proposed NB-DBNN algorithm

$$e\left(v,g,\theta \right)=-\sum_{j=1}^{N}{b}_{j}{v}_{j}-\sum_{i=1}^{m}{a}_{i}{g}_{i}-\sum_{j=1}^{N}\sum_{i=1}^{m}{v}_{j}{w}_{ji}{g}_{i}$$
(1)

The energy function between the two layers is determined by the node states in the visible layer and the activations of the nodes in the hidden layer, each of which takes the value 0 or 1. The partition function sums over all configurations of both the visible and hidden layers, yielding the joint probability distribution of visible and hidden units:

$${\sum }_{v,g}{E}^{-e\left(v,g,\theta \right)}$$
(2)
$$q\left(v,g,\theta \right)=\frac{{E}^{-e\left(v,g,\theta \right)}}{{\sum }_{v,g}{E}^{-e\left(v,g,\theta \right)}}$$
(3)

Given the states of the visible layer v, the activation probability of each hidden node is:

$$Q\left({g}_{i}=1|v\right)=F\left({a}_{i}+\sum_{j=1}^{N}{v}_{j}{w}_{ji}\right)$$
(4)

At the same time, when the states of the hidden layer g are known, the activation probability of each visible node is:

$$Q\left({v}_{j}=1|g\right)=F\left({b}_{j}+\sum_{i=1}^{m}{w}_{ji}{g}_{i}\right)$$
(5)

where F is the activation function. Other initialization parameters can be tuned to enhance the network; the number of hidden layer nodes is not fixed during the validation process. In NB-DBNN, the maximum likelihood criterion is used to find the parameters θ. This criterion is an important factor in the network's performance and can be adjusted as needed.

$$\theta^\ast={\arg\max}_{\theta}\sum_{s=1}^{S}\ln Q\left(v^s\vert\theta\right)$$
(6)

Similarly, when updates are necessary, the parameter update rules are:

$${W}_{ji}^{l+1}=\tau {W}_{ji}^{l}+\eta {\left({<{v}_{j}{g}_{i}>}_{\mathrm{data}}-{<{v}_{j}{g}_{i}>}_{\mathrm{recon}}\right)}^{l}$$
(7)
$${b}_{j}^{l+1}=\tau {b}_{j}^{l}+\eta {\left({<{v}_{j}>}_{\mathrm{data}}-{<{v}_{j}>}_{\mathrm{recon}}\right)}^{l}$$
(8)
$${a}_{i}^{l+1}=\tau {a}_{i}^{l}+\eta {\left({<{g}_{i}>}_{\mathrm{data}}-{<{g}_{i}>}_{\mathrm{recon}}\right)}^{l}$$
(9)

η is the learning rate of the model, τ is the speed factor, and ⟨·⟩ denotes the expectation of the enclosed quantity under the given distribution (the data or the reconstruction). NB-DBNN employs a greedy layer-wise algorithm for hidden-layer training and error propagation. The parameters of the adaptive sparse NB-DBNN are therefore updated as follows:

$${\theta }_{\tau +1}={\theta }_{\tau }+\eta \left(\Delta {\theta }_{r1}+\Delta {\theta }_{r2}\right)$$
(10)
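To make the training step concrete, a single contrastive-divergence (CD-1) step implementing Eqs. (4), (5), and (7)-(9) for a plain binary RBM can be sketched as follows. The layer sizes and hyperparameters are arbitrary, and the τ-scaled update form is reproduced as the paper writes it; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))  # activation F in Eqs. (4)-(5)

N, m = 6, 4                      # visible / hidden layer sizes
W = rng.normal(0, 0.1, (N, m))   # weights w_ji
b = np.zeros(N)                  # visible biases b_j
a = np.zeros(m)                  # hidden biases a_i
eta, tau = 0.1, 0.5              # learning rate eta and speed factor tau

v0 = rng.integers(0, 2, N).astype(float)   # one binary training vector

# CD-1: infer hidden probabilities (Eq. 4), sample, reconstruct the
# visible layer (Eq. 5), then re-infer hidden probabilities.
h0 = sigmoid(a + v0 @ W)
h_sample = (rng.random(m) < h0).astype(float)
v1 = sigmoid(b + W @ h_sample)
h1 = sigmoid(a + v1 @ W)

# Parameter updates in the spirit of Eqs. (7)-(9):
# <v g>_data - <v g>_recon, etc.
dW = np.outer(v0, h0) - np.outer(v1, h1)
W = tau * W + eta * dW
b = tau * b + eta * (v0 - v1)
a = tau * a + eta * (h0 - h1)
print(W.shape, b.shape, a.shape)
```

In practice the update would loop over mini-batches, and the sparsity terms of Eq. (10) would be added to the gradient.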

This compact update combines the two sparsity-regularized gradient terms into a single parameter step. The basic workflow of the NB-DBNN algorithm is outlined in Algorithm 1.

Algorithm 1
figure a

NB-DBNN Algorithm

5.2 Hidden feature extraction

Feature analysis is a quantitative method for measuring and identifying structural abnormalities in tissue. Since gray-level patterns and densities are challenging to assess directly, additional texture features are required for further classification. Texture is an inherent characteristic of all surfaces, defining visual patterns that contain detailed information about surface structure and its spatial distribution over neighboring regions. Local features are extracted using curvilinear combinations and gray-level co-occurrence matrices (GLCM). A color histogram illustrates the frequency of each color in an image. Color histograms can be generated in various color spaces, typically divided into multiple distinct channels, each with its own color value; this approach effectively counts the pixels within a given color range. Fourteen texture features are computed from the GLCM, providing insights into the distribution of accumulated values in the offset data.

Haralick's features take a statistical approach to the distribution of gray values in a local region: they assess the local characteristics of each image and extract statistical parameters from these local features. The co-occurrence matrix counts, for each pair of gray levels "i" and "j", how often they occur in neighboring pixels at a given offset, and the matrix is then normalized by the total number of pairs. From this resultant matrix, fourteen statistical terms are derived that collectively describe the texture.

$$angular\;Second\;moment=\sum_j\sum_iq\left(j,i\right)^2$$
(11)
$${\mathrm{contrast}}=\sum_{N=0}^{{n}_{h}-1}{N}^{2}\left\{\sum_{j=1}^{{n}_{h}}\sum_{i=1}^{{n}_{h}}q\left(j,i\right)\right\},\left|j-i\right|=N$$
(12)
$${\mathrm{Correlation}}=\frac{{\sum }_{j}{\sum }_{i}\left(ji\right)q\left(j,i\right)-{\mu }_{y}{\mu }_{x}}{{\sigma }_{y}{\sigma }_{x}}$$
(13)
$${\mathrm{Variance}}=\sum_{j}\sum_{i}{\left(j-\mu \right)}^{2}q\left(i,j\right)$$
(14)
$$Inverse\;difference\;method=\sum \limits_j\sum \limits_i\frac1{1+\left(j-i\right)^2}q\left(j,i\right)$$
(15)
$$sum\;average=\sum \limits_{j=2}^{2n_h}{jq}_{y+x}\left(j\right)$$
(16)
$$sum\;Variance=\sum \limits_{j=2}^{2n_h}\left(j-F_t\right)^2 q_{y+x}\left(j\right)$$
(17)
$$sum\;entropy=\sum \limits_{j=2}^{2n_h}q_{y+x}\left(j\right)log\left\{q_{y+x}\left(j\right)\right\}=F_t$$
(18)
$${\mathrm{entropy}}=-\sum_{j}\sum_{i}q\left(i,j\right)log\left(q\left(j,i\right)\right)$$
(19)
$$Diff\;Variance=\sum \limits_{j=0}^{{n}_{h}-1}{j}^{2}{q}_{y-x}\left(j\right)$$
(20)
$$Diff\;entropy=-\sum \limits_{j=0}^{n_h-1}q_{y-x}\left(j\right)log\left\{q_{y-x}\left(j\right)\right\}$$
(21)
$$Info.\;Measure\;of\;Correlation\;1=\frac{GYX-GYX1}{Max\left\{GY,Gx\right\}}$$
(22)
$$Info.\;Measure\;of\;Correlation\;2=\left(1-exp\left[2\left(GYX2-GYX\right)\right]\right)^{1/2}$$
(23)
$$Max\;correlation\;Coeff=Square\;root\;of\;the\;Second\;largest\;Eigen\;values$$
(24)
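To make the first two measures concrete, a minimal GLCM computation with the angular second moment (Eq. 11) and contrast (Eq. 12) can be sketched as follows. A single horizontal offset is used, and the function names are illustrative.

```python
import numpy as np

def glcm(image, levels, dj=0, di=1):
    # Normalized gray-level co-occurrence matrix q(j, i) for one offset.
    P = np.zeros((levels, levels))
    rows, cols = image.shape
    for r in range(rows - dj):
        for c in range(cols - di):
            P[image[r, c], image[r + dj, c + di]] += 1
    return P / max(P.sum(), 1)

def angular_second_moment(q):   # Eq. (11): sum of squared entries
    return float((q ** 2).sum())

def contrast(q):                # Eq. (12): (j - i)^2-weighted sum
    levels = q.shape[0]
    j, i = np.mgrid[0:levels, 0:levels]
    return float(((j - i) ** 2 * q).sum())

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
q = glcm(img, levels=4)
print(round(angular_second_moment(q), 3), round(contrast(q), 3))
```

In practice the fourteen features would be computed over several offsets and angles and averaged.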

Local Binary Patterns (LBP) is a visual descriptor used in computer vision. Combining LBP with the Histogram of Oriented Gradients (HOG) descriptor has been demonstrated to substantially enhance detection performance. In this technique, every pixel within a cell is compared to its eight neighboring pixels: if a neighboring pixel's value is greater than or equal to the center pixel's value, it is labeled "1"; otherwise, it is labeled "0". The resulting 8-bit codes are accumulated into a 256-bin histogram, giving a 256-dimensional vector representation of the input image.
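A minimal sketch of this computation, using the conventional neighbour-versus-centre labelling and a 256-bin histogram (function names are illustrative):

```python
import numpy as np

def lbp_image(gray):
    # 8-neighbour LBP: a neighbour >= centre contributes a 1 bit.
    g = gray.astype(int)
    codes = np.zeros((g.shape[0] - 2, g.shape[1] - 2), dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = g[1:-1, 1:-1]
    for bit, (dr, dc) in enumerate(offsets):
        neigh = g[1 + dr:g.shape[0] - 1 + dr, 1 + dc:g.shape[1] - 1 + dc]
        codes |= (neigh >= centre).astype(int) << bit
    return codes

def lbp_histogram(gray):
    # 256-dimensional descriptor mentioned in the text.
    hist = np.bincount(lbp_image(gray).ravel(), minlength=256)
    return hist / hist.sum()

rng = np.random.default_rng(1)
h = lbp_histogram(rng.integers(0, 256, (32, 32)))
print(h.shape)
```

Library implementations (e.g. scikit-image's `local_binary_pattern`) additionally support circular neighbourhoods and rotation-invariant codes.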

5.3 Feature optimization

Next, the Giraffe Kicking Optimization (GKO) algorithm is utilized for feature optimization, enabling the selection of the most appropriate features among multiple options while addressing redundancy and data-size concerns. GKO begins with the set of features extracted from the data and aims to choose features that are non-redundant, i.e., that each offer unique information, while also reducing computational demands. A crucial aspect of GKO is a predefined distance measure that quantifies the similarity or dissimilarity between features; this measure helps determine which features best represent the data. Based on the selected features and the distance measure, the observed entity can then be classified as a unified solid body, enabling automated analysis without continuous human intervention. Ultimately, GKO streamlines the analysis of extensive data by optimizing feature selection [39], enhancing the system's efficiency in detecting abnormal behavior. The objective is a system capable of automatically analyzing many hours of observational data, focusing on the small segments that human analysts typically assess for signs of abnormal behavior.
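The paper does not specify GKO's distance measure, but a redundancy-aware selection of this kind can be sketched with a simple correlation-based distance. The greedy loop, the seed feature, and all names here are illustrative assumptions, not the GKO algorithm itself.

```python
import numpy as np

def select_features(X, k):
    # Greedily pick the feature least correlated with those already
    # chosen; 1 - |corr| serves as the "distance measure" stand-in.
    n_features = X.shape[1]
    chosen = [0]                      # assumed seed feature
    while len(chosen) < k:
        best, best_score = None, -1.0
        for j in range(n_features):
            if j in chosen:
                continue
            corr = max(abs(np.corrcoef(X[:, j], X[:, c])[0, 1])
                       for c in chosen)
            score = 1.0 - corr        # larger = less redundant
            if score > best_score:
                best, best_score = j, score
        chosen.append(best)
    return chosen

rng = np.random.default_rng(2)
base = rng.normal(size=(100, 1))
# Feature 1 is a near-duplicate of feature 0; features 2-3 are independent.
X = np.hstack([base, base + 0.01 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])
print(sorted(select_features(X, k=3)))
```

The near-duplicate feature is skipped because its correlation distance to an already-chosen feature is close to zero.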

$$f\subset \left\{s|1\le s\le S\right\}$$
(25)

As it's impractical to anticipate all forms of abnormal behavior, we exclude maps associated with known animal behaviors and aim to identify the most trustworthy maps for the entire query.


Several frames in a video exhibit highly unusual behavior, denoted as S frames. Consequently, the capacity to recognize certain common animal behaviors is a crucial prerequisite for our approach. We calculate the time interval Δs between two exposures; given detection g^j followed by g^i, the instantaneous probability of accurate detection depends on the temporal sequence and the detector's recall ratio R:

$${Q}_{\mathrm{temporal}}\left({g}^{j}\to {g}^{i}\right)=\left\{\begin{array}{cl}0 & \mathrm{if}\;\Delta s\le 0\\ {\left(1-R\right)}^{\left(\Delta s-1\right)} & \mathrm{otherwise}\end{array}\right.$$
(26)

In practice, processing all the exposures simultaneously, especially for long videos, is not advisable. To address this, we partition the video into five-minute segments of 300 frames each and apply the Hungry Multiframe algorithm to each segment. There are, of course, alternative approaches to creating video summaries; in this study we compare two baseline methods. The first baseline follows a uniform temporal pattern: one frame is selected every five minutes.

$${f}_{\mathrm{uniform}}=\left\{300,600,900\right\}$$
(27)

In our second baseline approach, we utilize the motion history of the video frames to quantify the amount of motion. Frames with a significant history of kinetic energy, denoted as mg(s), are included.

$${f}_{\mathrm{motion}}=\left\{s|mg\left(s\right)\ge \lambda \right\}$$
(28)
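Both baselines are straightforward to sketch; the helper names and the synthetic motion-history values below are illustrative.

```python
import numpy as np

def uniform_frames(n_frames, step=300):
    # Eq. (27): pick one frame per fixed interval.
    return [s for s in range(step, n_frames + 1, step)]

def motion_frames(motion_history, lam):
    # Eq. (28): keep frames whose motion-history energy mg(s) >= lambda.
    return [s + 1 for s, mg in enumerate(motion_history) if mg >= lam]

mh = np.zeros(900)
mh[[99, 450, 700]] = 5.0           # synthetic bursts of motion
print(uniform_frames(900))          # [300, 600, 900]
print(motion_frames(mh, lam=1.0))   # [100, 451, 701]
```

The uniform baseline ignores content entirely, while the motion baseline concentrates the summary on frames with significant kinetic history.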

A motion history image is generated by exponentially decaying the dissimilarity between images from the previous time interval. We then establish pairwise associations between individuals across observation periods: individuals within the same group observed at daily intervals are considered related. To quantify this relatedness, we compute a half-weight association index from the pairwise matrix, which measures the proportion of time a pair spends together versus apart.

$$Gwj=\frac{y}{y+{x}_{BA}+\frac{1}{2}\left({x}_{B}+{x}_{A}\right)}$$
(29)
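For illustration, the half-weight index of Eq. (29) with hypothetical sighting counts (the variable names mirror the equation's symbols):

```python
def half_weight_index(y, x_ba, x_b, x_a):
    # Eq. (29): y = sightings together, x_ba = both seen but apart,
    # x_b / x_a = sightings of only one individual of the pair.
    return y / (y + x_ba + 0.5 * (x_b + x_a))

# Hypothetical counts: together 10 times, apart 2, seen alone 4 and 2 times.
print(half_weight_index(y=10, x_ba=2, x_b=4, x_a=2))  # 10 / 15
```

The index ranges from 0 (never together) to 1 (always together).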

To incorporate gregariousness as a structural variable, it is essential to eliminate circularity: the association index between individuals j and i partly reflects how gregarious each of them is. This is addressed by multiplying the sums of each individual's association indices with all others, excluding the pair itself.

$$y{\left({\mathrm{gregariousness}}\right)}_{ji}=log\left(\sum_{K\ne j,i}{Gwj}_{jK}\times \sum_{K\ne j,i}{Gwj}_{iK}\right)$$
(30)

To achieve this, a real-number representation should be employed, even when the random variable is discrete and non-continuous. Observations are categorized into three factors based on molecular experiments.

$${y}_{j}=\frac{F-\left({f}_{\mathrm{low}}+{f}_{\mathrm{high}}\right)/2}{\left({f}_{\mathrm{high}}-{f}_{\mathrm{low}}\right)/2}$$
(31)
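Eq. 31 is the standard coded-variable transform that maps a factor's raw range [f_low, f_high] onto [-1, +1], with the midpoint mapping to 0. A minimal sketch (the helper name `code_level` is illustrative):

```python
def code_level(F, f_low, f_high):
    """Map a raw factor value F onto the coded interval [-1, 1] (Eq. 31):
    f_low -> -1, the range midpoint -> 0, f_high -> +1."""
    return (F - (f_low + f_high) / 2) / ((f_high - f_low) / 2)

print(code_level(10, 10, 30))   # -1.0  (low level)
print(code_level(20, 10, 30))   #  0.0  (midpoint)
print(code_level(30, 10, 30))   #  1.0  (high level)
```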

The aim of this analysis is to discern the primary effects and interactions of each factor and determine which of these effects should be disregarded or minimized.

$${p}_{j}=\left[{h}_{j1}={q}_{j1}\left(y\right),{h}_{j2}={q}_{j2}\left(y\right),\ldots ,{h}_{jH}={q}_{jH}\left(y\right)\right]$$
(32)

A probability density function (PDF) is used to generate values for each candidate gene, meaning the function should stay within the range of feasible values for the optimization variable. These values can be perturbed using a random-walk technique known as Lévy flight, which takes a sequence of random steps. Such randomness plays a crucial role in practical applications.

$$g=\alpha \times {\mathrm{levy}}\left({n}_{{\mathrm{var}}}\right)$$
(33a)
$$g=\beta \times {\mathrm{levy}}\left({n}_{{\mathrm{var}}}\right)$$
(33b)

Lévy flights are employed to enhance the local search efficiency of the algorithm during the exploitation phase and its global search efficiency during the exploration phase. The choice of flight path can itself be assessed for optimization.

$${\mathrm{levy}}\left({n}_{{\mathrm{var}}}\right)=0.01\times \frac{{RR}_{1}\times \delta }{{\left|{RR}_{2}\right|}^{1/\beta }}$$
(34)

Here RR1 and RR2 are randomly generated numbers within the range [0, 1] and δ is a scale factor. The feature optimization process using the GKO algorithm is outlined in Algorithm 2.
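A hedged sketch of the Lévy step of Eq. 34 and the scaled moves of Eqs. 33a-b: we assume Mantegna's formula for the scale δ and draw RR1, RR2 as standard normal variates (a common implementation choice, whereas the text states the range [0, 1]); all names here are illustrative:

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(n_var, beta=1.5, rng=None):
    """Lévy step for n_var variables (Eq. 34). delta follows Mantegna's
    scheme; RR1 and RR2 are standard normal draws (an assumption)."""
    rng = rng or np.random.default_rng()
    delta = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    rr1 = rng.standard_normal(n_var)
    rr2 = rng.standard_normal(n_var)
    return 0.01 * rr1 * delta / np.abs(rr2) ** (1 / beta)

# exploration / exploitation moves (Eqs. 33a-b) scale the same step
alpha, beta_coeff = 1.0, 0.5
g_explore = alpha * levy_step(5, rng=np.random.default_rng(1))
g_exploit = beta_coeff * levy_step(5, rng=np.random.default_rng(1))
print(g_explore.shape)   # (5,)
```

The heavy-tailed |RR2|^(1/β) denominator is what produces the occasional long jumps that distinguish Lévy flights from Gaussian random walks.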

Algorithm 2
figure b

Feature optimization using GKO algorithm

5.4 Breast cancer grading

An Optimized Kernel layer-based Support Vector Machine (OK-SVM) classifier is used to identify mitotic cells and nuclear atypia from the selected features, and the Nottingham Grading System (NGS) is employed to determine the grade of breast cancer from the pathological images. The kernel function maps the training data onto a non-linear decision surface in a higher-dimensional space. The Data Value Metric (DVM) classification rule partitions the data into evidence and predictions: each instance consists of a target value together with a set of attributes used to forecast it. Let Y and X be the input and output data sets; a simple system is established to find the appropriate x \(\in\) X given a previously observed value of y \(\in\) Y:

$$X=F\left(y,\alpha \right)$$
(35)

The kernel function parameter α must be tuned accurately to obtain reliable image classification results. The Optimal Kernel SVM (OK-SVM) classifier can adopt various kernel functions, including linear, polynomial, and radial basis functions, to classify the images efficiently. In OK-SVM, each hyperplane is characterized by a normal vector w, which is perpendicular to the hyperplane, and a bias constant a.

$$F\left(y\right)=w\cdot y+a$$
(36)

Here h is the optimal separating hyperplane, and h1 and h2 are the margin hyperplanes.

$$h:{y}_{j}.w+a=0$$
(37)
$$h1:{y}_{j}.w+a=-1$$
(38)
$$h2:{y}_{j}.w+a=+1$$
(39)
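A tiny sketch of the decision function and margin planes of Eqs. 36-39 (the names are illustrative, and the weight vector and bias are toy values):

```python
import numpy as np

def decision(y, w, a):
    """Decision value f(y) = w . y + a (Eq. 36); h, h1, h2 are the
    level sets f = 0, -1, +1 (Eqs. 37-39)."""
    return np.dot(w, y) + a

w, a = np.array([1.0, -1.0]), 0.0
print(decision(np.array([2.0, 1.0]), w, a))   #  1.0 -> lies on margin plane h2
print(decision(np.array([1.0, 2.0]), w, a))   # -1.0 -> lies on margin plane h1
```

Points with |f(y)| ≥ 1 are correctly classified with full margin; the support vectors are exactly those training points on h1 and h2.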

When a linear kernel model is inappropriate, a non-linear kernel should be used, which fits more complex data efficiently. The polynomial kernel formulation is given as follows:

$$F\left(Y\right)=w.\Phi \left(y\right)+a$$
(40)

where \(\Phi\)(y) is the mapping function into a high-dimensional space. For the remaining sequences, the frequencies of normal and abnormal samples, together with their frequency of occurrence in the training data set, were calculated as the ratio:

$${q}_{F}=\frac{{F}_{n}}{{F}_{b}}$$
(41)

This ratio is used to flag abnormal sequences; here, the top ε percent of sequences are retained in the dictionary. Finally, the features are summarized by a mean vector:

$$v=\frac{1}{M}\sum_{m=1}^{M}{z}_{m}$$
(42)
$$M=N-K+1$$
(43)
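An illustrative sketch of Eqs. 42-43 (the helper name `window_mean` and the toy data are ours): with N samples and window size K there are M = N - K + 1 sliding windows, whose per-window features z_m are averaged into v:

```python
import numpy as np

def window_mean(z, K):
    """Average feature over all length-K sliding windows (Eqs. 42-43).
    With N samples there are M = N - K + 1 windows; each z_m is the
    feature of window m (here, its mean), and v averages the z_m."""
    N = len(z)
    M = N - K + 1
    z_m = [np.mean(z[m:m + K]) for m in range(M)]
    return sum(z_m) / M

data = [1, 2, 3, 4, 5]
# windows (1,2),(2,3),(3,4),(4,5) -> means 1.5, 2.5, 3.5, 4.5 -> v = 3.0
print(window_mean(data, 2))
```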

The sliding window size K is predefined; its value plays an important role in model discovery because it couples feature extraction to model training. For an input y and a support vector y1, the linear kernel is written as:

$$k\left(y,y1\right)=sum\left(y\times y1\right)$$
(44)

The kernel measures the similarity between data points in feature space. A linear combination of inputs gives the linear kernel, while kernel types such as the polynomial and radial basis function map the input into higher-dimensional spaces. The polynomial and radial basis kernels are written as:

$$k\left(y,y1\right)={\left(1+sum\left(y\times y1\right)\right)}^{D}$$
(45)
$$k\left(y,y1\right)=exp\left(-{\mathrm{gamma}}\times sum\left({\left(y-y1\right)}^{2}\right)\right)$$
(46)

The accepted default value for gamma is 0.1, although gamma can take any value between 0 and 1. The radial basis kernel can form complex decision boundaries even in two-dimensional space. The workflow and step-by-step procedure for breast cancer detection using OK-SVM are shown in Fig. 4.
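The three kernels of Eqs. 44-46 can be sketched directly in NumPy; the function names are ours, and the polynomial kernel is written in its standard (1 + ⟨y, y1⟩)^D form:

```python
import numpy as np

def linear_kernel(y, y1):
    """Linear kernel (Eq. 44): plain inner product."""
    return np.sum(y * y1)

def poly_kernel(y, y1, D=2):
    """Polynomial kernel of degree D (Eq. 45), with the exponent taken
    over the shifted inner product, as is standard."""
    return (1 + np.sum(y * y1)) ** D

def rbf_kernel(y, y1, gamma=0.1):
    """Radial basis kernel (Eq. 46) with the default gamma = 0.1."""
    return np.exp(-gamma * np.sum((y - y1) ** 2))

a = np.array([1.0, 2.0])
b = np.array([0.5, 1.0])
print(linear_kernel(a, b))          # 2.5
print(poly_kernel(a, b))            # 12.25
print(round(rbf_kernel(a, b), 4))   # 0.8825
```

Larger gamma values make the RBF kernel more local, yielding more complex (and more overfit-prone) decision boundaries.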

Algorithm 3
figure c

Breast cancer grading using OK-SVM classifier

Fig. 4
figure 4

Graphical view of proposed OK-SVM classifier

6 Results and discussion

In this section, we validate the proposed breast cancer detection technique on the open-source MITOS-ATYPIA-14 test database. First, we present a comparative analysis of the proposed and existing segmentation algorithms. We then compare the feature selection process using performance measures such as precision, accuracy, recall, and F-measure. A comprehensive evaluation of breast cancer classification is performed using various statistical measures, and the results are compared with current state-of-the-art methods.

6.1 Dataset description

We utilized the MITOS-ATYPIA-14 dataset for this evaluation. The slides were stained with standard hematoxylin and eosin (H&E) and scanned with two slide scanners: Aperio ScanScope XT and Hamamatsu NanoZoomer 2.0-HT. On each slide, the pathologist selected several frames at ×20 magnification, and each ×20 frame is subdivided into four frames at ×40 magnification. We used the Aperio ×40 frames, which are 24-bit RGB bitmap images of size 1539 × 1376 pixels at a resolution of 4.073 pixels/µm. Figure 2 displays sample images of mitotic and non-mitotic cells from the MITOS-ATYPIA-14 dataset. Our proposed framework was validated on five different subsets (subsets 1 to 5), whose details are summarized in Table 2.

Table 2 Dataset description

6.2 Analysis of segmentation process

We compare the segmentation performance of our proposed NB-DBMM with Relative Entropy Maximized Space Segmentation (REMSS) [33]. Unlike REMSS, our proposed technique avoids over-segmentation because it employs a specific grayscale threshold, corresponding to the maximum relative entropy between the nuclei and the background, for the spatial organization of the image scale.

6.3 Comparative analysis of feature optimization

In this section, we evaluate the performance of the proposed optimization method on the MITOS-ATYPIA-14 dataset. Table 3 presents a comparative analysis of the proposed GKO algorithm against existing classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and Logistic Regression (LR) [35].

Table 3 Performance comparison of feature optimization techniques

For the case without feature optimization

Compared with the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 9.212%, 1.011%, and 7.336%; precision by 1.028%, 9.368%, and 7.460%; recall by 2.532%, 7.428%, and 15.116%; specificity by 1.504%, 12.060%, and 11.303%; and F-measure by 1.79%, 8.403%, and 11.475%, respectively.

For shape features only

Compared with the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 1.339%, 6.488%, and 10.758%; precision by 1.362%, 6.597%, and 10.938%; recall by 3.831%, 11.396%, and 14.387%; specificity by 4.546%, 15.875%, and 12.601%; and F-measure by 2.619%, 9.073%, and 12.707%, respectively.

For GLCM features only

Compared with the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 2.455%, 9.205%, and 8.061%; precision by 2.496%, 9.359%, and 8.196%; recall by 0.781%, 7.636%, and 20.768%; specificity by 0.59%, 9.227%, and 11.112%; and F-measure by 1.641%, 8.501%, and 14.979%, respectively.

For Intensity features only

Compared with the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 3.66%, 14.175%, and 6.141%; precision by 3.721%, 14.41%, and 6.243%; recall by 4.251%, 5.357%, and 12.488%; specificity by 3.654%, 15.756%, and 13.065%; and F-measure by 3.98%, 10.86%, and 9.49%, respectively.

For GKO algorithm-based feature optimization

Compared with the existing SVM, k-NN, and LR classifiers, the proposed OK-SVM classifier improves accuracy by 2.795%, 7.842%, and 3.957%; precision by 2.842%, 7.971%, and 4.022%; recall by 1.88%, 3.401%, and 9.486%; specificity by 2.72%, 10.83%, and 9.074%; and F-measure by 2.361%, 5.729%, and 6.849%, respectively.

6.4 Comparative analysis of existing and proposed classifiers

Table 4 presents the performance of the proposed OK-SVM classifier on the MITOS-ATYPIA-14 dataset, and Fig. 4 gives a graphical overview of its classification accuracy, precision, recall, specificity, and F-measure. The results show that the proposed OK-SVM classifier achieves an average accuracy of 96.674%, average precision of 96.127%, average recall of 95.893%, average specificity of 95.748%, and average F-measure of 96.01%. Table 5 describes the comparative analysis of the proposed OK-SVM classifier with current state-of-the-art classifiers: DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], Segmitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33] (Fig. 5).

Table 4 Results of proposed OK-SVM classifier for five subsets
Table 5 Comparative analysis of proposed and existing classifiers for MITOSIS-ATYPIA-14 dataset
Fig. 5
figure 5

Accuracy of proposed and existing breast cancer grading classifiers

In Fig. 6, we compare the precision of the proposed and existing methods for breast cancer grading; the figure plots the precision of the proposed OK-SVM classifier against the state-of-the-art DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], Segmitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33] classifiers. In Fig. 7, we compare the recall of the proposed and existing methods for breast cancer grading. The recall improvements of the proposed OK-SVM classifier over the DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], Segmitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33] classifiers are 2.673%, 6.01%, 7.991%, 68.705%, 53.782%, 50.132%, 47.118%, 47.598%, 51.5198%, 85.519%, 51.519%, and 51.519%, respectively.

Fig. 6
figure 6

Precision of proposed and existing breast cancer grading classifiers

Fig. 7
figure 7

Recall of proposed and existing breast cancer grading classifiers

In Fig. 8, we compare the specificity of our proposed method with existing methods for breast cancer grading. The OK-SVM classifier achieved an accuracy of 92.75%, precision of 89.56%, recall of 87.66%, specificity of 29.44%, F1-score of 43.75%, and AUC of 47.25%, while the current state-of-the-art classifiers achieved an accuracy of 86.60%, precision of 68%, recall of 29.44%, specificity of 43.75%, F-measure of 47.25%, and AUC of 50.14%. In Fig. 9, we compare the F-measure of the proposed technique and the current approaches for breast cancer grading. The F-measure of the proposed OK-SVM is 95.748%. The F-measures of the current state-of-the-art classifiers (DBN-MCS [27], CNN [27], CNN-RELU [25], CUHK [30], DeepMitosis [30], CanN (single) [30], CanN (average) [30], Segmitos-Random [30], PSPNet [31], CPCN [31], and RF-REMSS [33]) are 83.4%, 88.6%, 90.24%, 35.64%, 43.37%, 44.35%, 48.25%, 56.2%, 52.01%, 56.22%, 78%, and 90.24%, respectively.

Fig. 8
figure 8

Specificity of proposed and existing breast cancer grading classifiers

Fig. 9
figure 9

F-measure of proposed and existing breast cancer grading classifiers

7 Conclusion

A hybrid machine learning technique is proposed for breast cancer grading, involving the automatic detection of mitotic cells and nuclear atypia from histopathological images. The approach combines an augmented NB-DBMM to extract hidden features from segmented cells with the GKO algorithm for feature optimization, which selects the best features among multiple candidates. To detect mitotic cells and nuclear atypia, an Optimized Kernel-based Support Vector Machine (OK-SVM) classifier is introduced, and the Nottingham Grading System (NGS) is incorporated to grade the pathological images of breast cancer. The approach is validated on the MITOS-ATYPIA-14 test database. Simulation results demonstrate that the proposed OK-SVM classifier surpasses existing state-of-the-art classifiers, with average accuracy, precision, recall, specificity, and F-measure of 96.674%, 96.17%, 95.893%, 95.748%, and 96.01%, respectively. Future work will integrate deep learning techniques for even more accurate breast cancer detection and classification.