1 Introduction

Blood cells are essential for maintaining health, and accurate classification is crucial for diagnosing various haematological conditions. This work aims to create a robust system that can identify different subtypes of white blood cells (leukocytes) in ancient blood samples [1, 2]. By leveraging deep learning techniques, we can improve efficiency and reduce our reliance on expert manual analysis. Blood has held a central position in human understanding, perceived not only as the essence of life but also as a vessel of profound knowledge [3, 4]. In the realm of archaeology, the study of ancient blood cells has emerged as a fascinating avenue for unravelling the mysteries of bygone civilisations, shedding light on health, lifestyle, and even the evolutionary journey of humanity [5].

This article delves into the intersection of artificial intelligence and archaeology, exploring how convolutional neural networks (CNNs) and support vector machines (SVMs) can revolutionise the study of ancient blood cells. It also discusses the principles behind CNNs and SVMs, their adaptation to archaeological contexts, and the implications of their application in unlocking antiquity’s secrets. Through a comprehensive review of recent advancements and case studies, this work aims to illuminate the potential of CNNs as a transformative tool for elucidating the mysteries of ancient blood, offering new insights into the lives and legacies of our ancestors [6,7,8]. This research makes the following contributions.

  • The first contribution is a preprocessing step that augments the data and removes noise.

  • The second contribution is a nucleus segmentation algorithm that is both fast and accurate. Better segmentation in turn leads to better classification.

  • The third contribution involves extracting four colour features from the cytoplasm and the nucleus. This study uses the convex hull of the nucleus to extract colour features from the cytoplasm, eliminating the need for cytoplasm segmentation, which is a difficult task.

  • The fourth contribution is the feature extraction process, carried out with the aid of a CNN. This step reduces the number of features in the dataset, retaining only the most informative ones.

  • The work’s final contribution is a classification using SVM.

The remainder of this article is organised as follows: Section 2 reviews existing works and lists the challenges. Section 3 describes the proposed methodology. Section 4 presents the results with some comparative analysis. Section 5 concludes the work, followed by the references.

2 Literature review

Blood cell classification is the categorization of blood cell types by analyzing their morphological characteristics. This process is important in multiple domains of medicine and science, including disease diagnosis, overall health monitoring, and the study of evolutionary transformations in blood cells. Human blood comprises diverse cell types, namely red blood cells (erythrocytes), white blood cells (leukocytes), and platelets (thrombocytes). These cells possess distinct morphological attributes and fulfill specific functions vital to bodily processes [9,10,11].

Blood cell classification involves visually examining the shape, size, structure, and other properties of blood cells to determine their specific type. Microscopic examination of blood samples is commonly used to perform this classification. Using specialized staining techniques and high-resolution imaging, medical professionals and researchers can observe and differentiate between various blood cell types [12, 13].

Accurate blood cell classification is vital for diagnosing diseases such as anemia, leukemia, and infections. By identifying abnormal blood cell patterns, healthcare professionals can detect and monitor health conditions, prescribe appropriate treatments, and assess the effectiveness of therapies. Moreover, blood cell classification helps in understanding the evolution of blood cells throughout history. Analyzing the morphological changes in ancient blood cells provides valuable insights into the health and lifestyle of our ancestors, contributing to archaeological and anthropological research.

In recent years, advancements in technology and artificial intelligence have enabled the development of automated blood cell classification systems. Machine learning algorithms, including CNNs, have shown remarkable success in accurately categorizing blood cells based on digital images. These automated systems offer faster and more consistent results, reducing human error and providing efficient analysis in medical laboratories and research settings. Overall, blood cell classification plays a pivotal role in medical diagnostics, research, and understanding the fascinating world of blood cells. It enables us to uncover important insights into our health, history, and the intricate workings of the human body [14,15,16,17,18].

2.1 Challenges in ancient blood cell classification

Blood cell classification of ancient samples presents several unique challenges that need to be overcome for accurate analysis. These challenges include limited data availability, the deterioration of ancient blood cells, and the complex morphology exhibited by these cells.

2.2 Limited data availability

An issue in ancient blood cell classification is the limited availability of data. Ancient blood cell samples are scarce, and acquiring a substantial and diverse dataset for training classification models becomes a difficult task. The scarcity of samples hampers the development of accurate algorithms and poses challenges in achieving reliable results. Researchers must work with the available data and explore techniques to make the most of the limited samples.

2.3 Deterioration of ancient blood cells

Over time, ancient blood cells undergo degradation, leading to structural changes and the loss of vital information. Preservation conditions, environmental factors, and the effects of aging contribute to the deterioration of these cells. As a result, the cells’ characteristics may be altered, making it challenging to identify and classify them accurately. The degradation process can affect the cell’s shape, color, and internal structures, further complicating the classification task. Researchers must develop methods that can handle the variations and noise introduced by the deterioration process.

2.4 Complex morphology of ancient blood cells

Ancient blood cells often exhibit intricate morphological variations, adding to the complexity of classification. Preservation conditions, storage methods, and other factors can result in diverse morphologies among the cells. Some cells may be distorted, fragmented, or fused together, making it difficult to discern their original form. Additionally, the effects of aging can introduce unique characteristics and features not present in modern blood cells. The complex morphology requires sophisticated algorithms and techniques capable of capturing the subtle differences and variations in the cells. Researchers must consider the specific challenges posed by ancient blood cell morphology and develop strategies to address them effectively.

Overcoming these challenges is essential to unlock the valuable information contained within ancient blood cells. Advanced techniques such as CNNs have shown promise in handling these challenges and improving the accuracy of ancient blood cell classification. By leveraging these technologies and continuing research efforts, scientists can gain deeper insights into our ancestors’ health and evolutionary changes in blood cells over time.

3 Proposed methodology

The purpose of this work is to deliver a novel method that is lighter, faster, and more consistent than current methods for the classification of white blood cells (WBCs) in peripheral smear images. Because the algorithm is lightweight and fast, it can easily be implemented on minicomputers and mobile devices without a TPU or GPU. This work employs an approach that blends machine learning and deep learning. We first carry out preprocessing for data augmentation and noise removal. The next step uses Otsu thresholding for image segmentation. A CNN then automatically extracts the features from the segmented image, eliminating the need for manual feature design.

Finally, an SVM uses these features to categorise WBCs. Figure 1 displays the block diagram of the proposed method. For the nucleus detection phase, we devise a technique and compare it with other published approaches. For the feature extraction phase, we also construct four additional colour features and demonstrate how these new features enhance the classification accuracy. Note that the colour features developed in this research are specific to the WBC classification problem and are not applicable to other problems. The final stage assesses the proposed approach. Because the ability to generalise is crucial for intelligent systems in the real world, the study compares the proposed technique with existing models and investigates its potential for generalisation. The results section demonstrates that the proposed technique outperforms well-known CNN models in terms of generalisation power.

Fig. 1 Proposed process flow

3.1 Dataset

This study uses the Raabin-WBC [22] dataset for assessment. This sizable dataset was released free of charge in 2021. The Raabin-WBC dataset includes three sets of cropped WBC images: Train, Test-A, and Test-B. Two experts independently labelled each WBC in the Train and Test-A sets, whereas the labels for the Test-B images remain incomplete. This work therefore employed only the Train and Test-A sets. These two sets, totalling 14,514 WBC images, were gathered from 56 normal peripheral blood smears (for neutrophils, eosinophils, lymphocytes, and monocytes) and a smear from a chronic myeloid leukaemia (CML) patient (for basophils). All smears were stained using the Giemsa method. The normal peripheral blood smears were imaged with an Olympus CX18 microscope and a Samsung Galaxy S5 camera phone, and the CML slide with a Zeiss microscope and an LG G3 camera phone. All images were captured at a magnification of 100×.

3.2 Preprocessing

Existing works provide only small datasets, so this work first enlarges the dataset through data augmentation, using rotation, flipping, and zooming to overcome the small dataset size. The purpose of preprocessing is to improve training by removing noise from the blood cell images. Existing works used large input images, which lengthened the training period; hence, this work first reduces the size of the blood cell images to 120 × 120 × 3. To identify image edges, a value-based filter is applied to the images, which makes the edges visible. The third stage converts the BGR image to the YUV colour space, that is, to the luma component (Y), the blue projection (U), and the red projection (V). This process maintains the full resolution of the Y channel while reducing the resolution of the U and V channels, favouring brightness over colour. Lowering the resolution of the U and V channels reduces the data size considerably. Finally, edge smoothing and histogram normalization are applied, and the YUV images are converted back to RGB. These steps make up the overall preprocessing pipeline.
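The augmentation and colour-space steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`augment`, `bgr_to_yuv`, `subsample_chroma`) are hypothetical, rotations are restricted to multiples of 90 degrees, zooming is omitted, the BT.601 coefficients are assumed for the YUV conversion, and the chroma channels are reduced by averaging 2 × 2 blocks.

```python
import numpy as np

def augment(image):
    """Return simple augmented variants: four 90-degree rotations and two flips."""
    variants = [np.rot90(image, k) for k in range(4)]  # k = 0 is the original
    variants.append(np.flip(image, axis=1))  # horizontal flip
    variants.append(np.flip(image, axis=0))  # vertical flip
    return variants

def bgr_to_yuv(bgr):
    """Convert a BGR float image in [0, 1] to Y, U, V planes (BT.601, assumed)."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma
    u = 0.492 * (b - y)                    # blue projection
    v = 0.877 * (r - y)                    # red projection
    return y, u, v

def subsample_chroma(plane):
    """Halve the resolution of a chroma plane by averaging 2x2 blocks."""
    h, w = plane.shape
    trimmed = plane[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Usage on a random stand-in for a 120 x 120 x 3 blood cell image.
rng = np.random.default_rng(0)
image = rng.random((120, 120, 3))
y, u, v = bgr_to_yuv(image)
u_small, v_small = subsample_chroma(u), subsample_chroma(v)
```

Keeping Y at full resolution while storing U and V at half resolution in each direction cuts the chroma data to a quarter, which is the size reduction the paragraph refers to.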

3.3 Segmentation

After preprocessing, image segmentation is performed. Nucleus segmentation proceeds as follows. First, the preprocessed RGB input image is subjected to a colour balancing method. Next, a soft map is generated by computing and combining components of the CMYK and HLS colour spaces. Finally, Otsu’s thresholding technique is applied to this soft map to segment the nucleus. The specific stages of the nucleus segmentation algorithm are as follows:

  • Colour-balanced RGB image conversion to CMYK colour space

  • KM = (K component) – (M component)

  • Conversion of the colour-balanced RGB image to the HLS colour space

  • MS = Min (M component, S component)

  • Output soft map = MS – KM

  • Segmenting the nucleus using Otsu’s thresholding technique.
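The steps listed above can be sketched as follows. This is a minimal NumPy sketch rather than the authors' code. It assumes a colour-balanced RGB float image in [0, 1], the usual RGB-to-CMYK and RGB-to-HLS formulas, and saturating subtraction (negative differences are clipped to zero), which is what makes nucleus pixels vanish in KM. All function names are hypothetical.

```python
import numpy as np

def rgb_to_k_m(rgb):
    """Return the K and M components of CMYK for an RGB image in [0, 1]."""
    mx = rgb.max(axis=-1)
    k = 1.0 - mx
    denom = np.where(mx > 0, mx, 1.0)  # 1 - K equals max(R, G, B)
    m = np.where(mx > 0, (mx - rgb[..., 1]) / denom, 0.0)
    return k, m

def hls_saturation(rgb):
    """Return the S component of HLS for an RGB image in [0, 1]."""
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    denom = 1.0 - np.abs(mx + mn - 1.0)  # 1 - |2L - 1| with L = (max + min) / 2
    safe = np.where(denom > 1e-12, denom, 1.0)
    return np.where(denom > 1e-12, (mx - mn) / safe, 0.0)

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximising between-class variance."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(prob)              # weight of the lower class
    w1 = 1.0 - w0                     # weight of the upper class
    mu_cum = np.cumsum(prob * centers)
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros(bins)
    between[valid] = (mu_cum[-1] * w0[valid] - mu_cum[valid]) ** 2 / (w0[valid] * w1[valid])
    return edges[np.argmax(between) + 1]  # upper edge of the best bin

def segment_nucleus(rgb):
    """Build the soft map MS - KM and threshold it with Otsu's method."""
    k, m = rgb_to_k_m(rgb)
    s = hls_saturation(rgb)
    km = np.clip(k - m, 0.0, 1.0)        # near zero on the nucleus
    ms = np.minimum(m, s)                # near zero on background and RBCs
    soft = np.clip(ms - km, 0.0, 1.0)
    return soft > otsu_threshold(soft.ravel())
```

Because the method relies only on colour-space arithmetic and a histogram threshold, it has no trainable parameters, which is consistent with the speed claims made for it.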

Figure 2 shows the block diagram of the segmentation method. Red blood cells (RBCs) and the cytoplasm of white blood cells (WBCs) are more intense in the K component than in the M component, whereas the WBC nucleus is less intense in the K component than in the M component. Subtracting the M component from the K component therefore yields an image whose nucleus pixels are zero or nearly zero. In contrast, taking the minimum of the M and S channels produces an image with almost negligible background and RBC intensity. Subtracting KM from MS ultimately removes the cytoplasm, RBCs, and background from the image. This study employs a colour balancing method to reduce colour fluctuations. To obtain a colour-balanced representation of the image, the means of the R, G, and B channels and of the grayscale representation of the RGB image must be calculated [19].
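One possible reading of the colour balancing step is sketched below. The source only states that the channel means and the grayscale mean are required, so the way they are combined here (scaling each channel so that its mean matches the grayscale mean) is an assumption, as are the function name `colour_balance` and the BT.601 grayscale weights.

```python
import numpy as np

def colour_balance(rgb):
    """Scale R, G, B so each channel mean equals the grayscale mean (assumed scheme)."""
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    gray_mean = gray.mean()
    channel_means = rgb.reshape(-1, 3).mean(axis=0)
    gains = gray_mean / np.maximum(channel_means, 1e-12)  # avoid division by zero
    return np.clip(rgb * gains, 0.0, 1.0)

# Usage on a random stand-in image with values in [0, 1].
rng = np.random.default_rng(0)
balanced = colour_balance(rng.random((120, 120, 3)))
```

Equalising the channel means in this way reduces stain and illumination differences between slides before the CMYK and HLS components are computed.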

Fig. 2 Segmentation method block diagram

3.4 Feature extraction

This work uses a CNN for feature extraction, which is performed on the segmented image. A CNN is a versatile architecture that supports both classification and feature extraction; this study uses it only for feature extraction. CNNs are an extension of artificial neural networks and are most commonly used to extract features from grid-like data, for instance visual datasets such as images or videos. A CNN comprises several layers, including the input layer, convolution layer, pooling layer, and fully connected (dense) layer. The architecture is illustrated in Fig. 3.

Fig. 3 CNN architecture

The convolutional layer processes the segmented input image to extract features; the pooling layer reduces computation by downsampling the feature maps; and, in a standard CNN, the fully connected layer makes the final classification. The network employs gradient descent with backpropagation to learn the best filters. The feature extraction part of the network, which consists of several pairs of convolutional and pooling layers, separates and recognises the various features in the image for further analysis. The fully connected layer uses the output of the convolution process to predict the image’s class based on the previously extracted features. The goal of the CNN feature extraction model is to reduce the number of features in the dataset. It produces new features that summarise the features in the original feature set. The CNN architecture diagram shows the layers that make up the CNN.
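The convolution and pooling pair described above can be illustrated with a small forward pass in NumPy. This is a didactic sketch only: the kernels are supplied by hand instead of being learned by backpropagation, a single-channel image is assumed, and the function names are hypothetical. As in most deep learning frameworks, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Downsample a feature map by taking the maximum of each 2x2 block."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def extract_features(image, kernels):
    """Convolution -> ReLU -> max pooling per kernel, flattened into one vector."""
    maps = [max_pool2x2(np.maximum(conv2d_valid(image, k), 0.0)) for k in kernels]
    return np.concatenate([m.ravel() for m in maps])

# Usage: two hand-written 3x3 edge kernels on a random 8x8 single-channel image.
rng = np.random.default_rng(0)
vertical_edge = np.array([[1.0, 0.0, -1.0]] * 3)
features = extract_features(rng.random((8, 8)), [vertical_edge, vertical_edge.T])
```

The flattened vector plays the role of the feature set that is handed to the classifier in place of the fully connected classification layer.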

3.5 Classification

After the CNN has extracted features from the segmented data, the max-min approach is used to normalise the features before they are fed into an SVM classifier. Other classifiers, such as deep neural networks and K-nearest neighbour (KNN), were also tested; however, the SVM provided the best results. After much trial and error, the best overall accuracy was obtained when the class weight of neutrophils in training is set to more than one and the weights of all other classes are set to one. Three kernels are commonly used with SVMs: linear, polynomial, and radial basis function kernels. The regularisation parameter is another crucial training parameter of an SVM. To train the SVM model properly, three hyperparameters are therefore tuned: the regularisation parameter, the kernel, and the class weight.
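The normalisation and the search over the three hyperparameters can be sketched as below. The candidate values for the regularisation parameter and the neutrophil weight are placeholders chosen for illustration (the text only states that the neutrophil weight exceeds one), and the function names are hypothetical. Each resulting configuration could be passed to an SVM implementation, for example scikit-learn's `SVC(kernel=..., C=..., class_weight=...)`, and scored on held-out data.

```python
import itertools
import numpy as np

CLASSES = ["neutrophil", "eosinophil", "basophil", "monocyte", "lymphocyte"]

def min_max_normalise(train, test):
    """Scale each feature to [0, 1] using the training minimum and maximum."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features map to zero instead of dividing by zero
    return (train - lo) / span, (test - lo) / span

def hyperparameter_grid(kernels, c_values, neutrophil_weights, classes):
    """Yield every combination of kernel, regularisation parameter and class weight."""
    for kernel, c, weight in itertools.product(kernels, c_values, neutrophil_weights):
        class_weight = {name: 1.0 for name in classes}
        class_weight["neutrophil"] = weight  # only neutrophils are up-weighted
        yield {"kernel": kernel, "C": c, "class_weight": class_weight}

# Usage with placeholder candidate values (assumptions, not the paper's settings).
grid = list(hyperparameter_grid(["linear", "poly", "rbf"], [0.1, 1.0, 10.0], [2.0, 5.0], CLASSES))
```

Fitting the minimum and maximum on the training set only, and reusing them for the test set, keeps test information from leaking into the normalisation.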

4 Results and discussions

This research uses several measures, namely the Dice similarity coefficient (DSC), sensitivity, precision, f1score, and accuracy, to evaluate the effectiveness of the proposed method. These metrics are computed with the following equations, which are based on the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

$$DSC = 2 \times \frac{TP}{(FN + TP) + (FP + TP)}$$
(1)
$$sensitivity = \frac{TP}{FN + TP}$$
(2)
$$precision = \frac{TP}{FP + TP}$$
(3)
$$f1score = 2 \times \frac{precision \times sensitivity}{sensitivity + precision}$$
(4)
$$accuracy = \frac{TN + TP}{FP + TN + FN + TP}$$
(5)
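Equations (1) to (5) translate directly into code. The short sketch below uses a hypothetical function name and made-up counts purely for illustration; it assumes that none of the denominators is zero.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute Eqs. (1)-(5) from true/false positive and negative counts."""
    sensitivity = tp / (fn + tp)                                  # Eq. (2)
    precision = tp / (fp + tp)                                    # Eq. (3)
    return {
        "dsc": 2 * tp / ((fn + tp) + (fp + tp)),                  # Eq. (1)
        "sensitivity": sensitivity,
        "precision": precision,
        "f1score": 2 * precision * sensitivity / (sensitivity + precision),  # Eq. (4)
        "accuracy": (tn + tp) / (fp + tn + fn + tp),              # Eq. (5)
    }

# Usage with illustrative counts (not results from the paper).
metrics = evaluation_metrics(tp=90, fp=10, tn=880, fn=20)
```

For a single binary decision, Eqs. (1) and (4) both reduce to 2TP / (2TP + FP + FN), so the DSC and the f1score coincide; they differ in practice only in what is counted (pixels for segmentation, cells for classification).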

Table 1 displays the outcomes of the proposed segmentation technique. With an accuracy, sensitivity, and DSC of 0.9979, 0.9529, and 0.9681, respectively, the proposed segmentation approach reliably identifies the WBC nucleus. We compare the proposed segmentation technique with Mask R-CNN [21], a popular deep CNN for image segmentation (here with ResNet50 as its backbone), and with Mousavi et al.’s approach. For training, a professional extracted the ground truths of 989 randomly selected images from the Raabin-WBC dataset using the Easy-GT software. The training set comprises 199 neutrophils, 195 eosinophils, 197 basophils, 199 monocytes, and 199 lymphocytes. After 40 epochs of training, we assessed the aforementioned models on 250 ground truths. Mask R-CNN is trained in a supervised manner and therefore needs considerably more data for training. The proposed method, however, does not need to be trained. Additionally, the proposed segmentation technique is quicker and simpler than such models, which have many parameters and take longer to segment an image. The proposed technique can find the nucleus of a WBC in 43 ms.

Table 1 Segmentation results

As shown in Table 2, the accuracy of the proposed method is 99.79%, whereas the accuracies of MobileNet-V2 [23], MnasNet1 [24], and ShuffleNet-V2 [25] are 98.48%, 98.29%, and 98.36%, respectively. Careful inspection of the accuracy, recall, and f1-score criteria shows that the proposed approach achieves the best results across the majority of classes. Unlike conventional methods, the technique in this article is straightforward, inventive, and simple to implement. It uses the cytoplasm and nucleus to extract suitable shape and colour features. The weaker results of the pre-trained models are most likely due to the large number of redundant or zero-valued features that they extract. When pre-trained CNNs extract a large number of features ahead of the fully connected layers, the number of trainable parameters increases.

Table 2 Classification comparison

4.1 Discussion

As previously stated, the suggested approach consists of three stages. In the first stage, we segment the nucleus and find the portion of the cytoplasm that lies inside the convex hull of the nucleus. After shape and colour features have been extracted, we finally categorise the WBCs using these features. The segmentation phase aims to eliminate RBCs and cytoplasm. Table 1 demonstrates that the segmentation method detects the nucleus with high accuracy and a high DSC. Compared with the Mask R-CNN model, the suggested segmentation method is very quick. In the cytoplasm detection phase, we selected only the portion of the cytoplasm within the convex hull of the nucleus as a representative of cytoplasm (ROC), a departure from the standard procedure of segmenting the entire cytoplasm. This avoids the difficult task of cytoplasm segmentation, and the features extracted from the ROC improve classification accuracy. The CNN then extracts the significant characteristics during the feature extraction stage.
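The ROC idea can be made concrete with a short sketch: compute the convex hull of the nucleus pixels, fill it, and keep the hull pixels that do not belong to the nucleus. The hull algorithm (Andrew's monotone chain) and the half-plane fill are choices made for this illustration, not necessarily those of the authors, and the function names are hypothetical.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def representative_of_cytoplasm(nucleus_mask):
    """Return pixels inside the nucleus convex hull that are not nucleus."""
    ys, xs = np.nonzero(nucleus_mask)
    hull = convex_hull(list(zip(xs.tolist(), ys.tolist())))
    if len(hull) < 3:  # degenerate hull (point or line) encloses no extra pixels
        return np.zeros_like(nucleus_mask, dtype=bool)
    yy, xx = np.indices(nucleus_mask.shape)
    inside = np.ones(nucleus_mask.shape, dtype=bool)
    for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1]):
        # A pixel is inside when it lies on the left of every hull edge.
        inside &= (x1 - x0) * (yy - y0) - (y1 - y0) * (xx - x0) >= 0
    return inside & ~nucleus_mask.astype(bool)

# Usage: an L-shaped toy nucleus whose hull is a triangle.
nucleus = np.zeros((10, 10), dtype=bool)
nucleus[2:7, 2] = True
nucleus[6, 2:7] = True
roc = representative_of_cytoplasm(nucleus)
```

For lobed nuclei, the pixels between the lobes fall inside the hull and are mostly cytoplasm, which is why colour features can be sampled there without segmenting the whole cytoplasm.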

Analysing a patient’s blood sample is a crucial responsibility in the medical sector, since abnormalities in blood cells cause many health problems. Red blood cells (RBCs) are a key component of blood, and their classification helps diagnose many illnesses. Manually inspecting RBCs under a microscope is a laborious and time-consuming procedure that can lead to incorrect interpretations due to human error. Normal RBCs may vary in size, texture, and form depending on a number of different health conditions. The proposed approach uses a CNN for feature extraction and an SVM in conjunction with image processing to categorise the RBCs. The algorithm can extract and classify the features of each segmented cell image. We gathered blood slide images from the hospital and designed the technology to provide quick, precise findings that could potentially save patients’ lives. The last stage uses an SVM model for classification. To determine the optimal combination of hyperparameters, we trained the SVM model separately for each combination. We propose an automated, straightforward, and rapid technique that does not need image resizing or cytoplasm segmentation.

5 Conclusion and future work

To categorise WBCs, this study developed four colour features and a quick and accurate nucleus segmentation technique. The four colour features are designed for and extracted from both the cytoplasm and the nucleus. The colour features of the cytoplasm are extracted using the convex hull of the nucleus, eliminating the need to segment the cytoplasm. After segmentation, the CNN performs feature extraction, and the CNN features help the SVM model categorise WBCs more precisely. The proposed approach achieved high accuracy on the datasets used. The findings also show that the proposed approach is quicker and more generalisable than the compared models. Thus, we can infer that the proposed method is not only robust and dependable, but also suitable for use in laboratory settings.