1 Introduction

Over the last decade, researchers have proposed several methods for the automated detection of diseases [4, 43, 46, 47, 54]. Some of these methods combine several image processing steps, namely segmentation, feature extraction, and classification based on deep learning [15, 21, 37, 44, 45]. These methods have been extensively applied to the diagnosis of white blood cell (WBC) cancer. WBC cancer is one of the diseases that must be diagnosed and treated at an early stage to give the patient a better chance of recovery. It is caused by the rapid, uncontrolled growth of abnormal WBCs, which disrupts the proper functioning of the human immune system [48]. There are three main types of blood cancer, namely leukaemia, lymphoma, and myeloma [8]. Leukaemia is the most common, and it can be diagnosed using different methods such as molecular biology, cytochemistry, immunophenotyping, and the study of morphology under a microscope. The morphological study of WBCs in a blood smear under a microscope is the most widely available and cost-effective method for leukaemia diagnosis. However, this method is mainly conducted manually and is therefore influenced by human factors. Results for the same sample can differ from one operator to another, and this inter-operator variability can lead to either false positives or false negatives.

Many methods for the automated detection of leukaemia from blood smear images acquired under different conditions have been proposed to address this challenge. Segmentation of WBCs plays an important role in this regard, and segmentation methods vary from one author to another. Byoung et al. [23] applied the estimated probability density function, mean-shift clustering, a merge rule, and the snake algorithm for nucleus segmentation. They used 100 blood smear images from CellaVision and a private database in their study, and reported a low average accuracy of 88% for nucleus segmentation. Madhloom et al. [29] applied arithmetic calculations, followed by linear contrast stretching, histogram equalization, and a 3×3 median filter to highlight the leukocyte nucleus in peripheral blood smear images. Finally, a global threshold was applied to the resulting image to extract the nuclei.

Vogado et al. [57], on the other hand, used components of two color spaces (CMYK and L*a*b*), contrast adjustment and a median filter for nucleus segmentation. They also used K-means with 3 clusters, followed by morphological erosion and dilation as a final step to refine the boundary of the nucleus. Unfortunately, the morphological operations affected the natural shape of the nucleus. Three databases (ALL-IDB2, BloodSeg and Leukocytes) were used in their study, and they reported an average K-index of 0.902. This method also achieved a high average Dice similarity coefficient of 89.84% compared to six existing nucleus segmentation methods based on k-means [1, 5, 25, 30, 40, 56] in the comparative study of Ref. [6].

Hedge et al. [18] proposed an image pre-processing method based on contrast enhancement and a normalization process. The TissueQuant algorithm and adaptive thresholding were then applied successively to the result of the pre-processing step for nucleus extraction. However, this method depends on a single color space and was mostly tested on images derived from a private dataset. Tareef et al. [50] applied an arithmetical equation to the RGB and CIELAB components to generate a grayscale image, then used thresholding based on the Poisson distribution for WBC nucleus segmentation and a morphological filter for post-processing. These researchers also used decorrelation stretch enhancement and the discrete wavelet transform for cytoplasm extraction. This method was tested on the BloodSeg and LSCI databases and achieved a segmentation performance metric of 88.6%. Zeinab et al. [33] presented a computer-aided diagnosis system for leukaemia. In their system, the WBC nucleus was segmented by applying thresholding based on intuitionistic fuzzy divergence to the components of the L*a*b* color space. However, this nucleus segmentation method was quantitatively evaluated on a single database, namely LSCI. The authors reported an average segmentation performance metric of 76%.

More recently, Makem and Tiedeu [31] proposed a nucleus segmentation method based on two color spaces (HSV and CMYK), an adaptive PCA fusion process, and Otsu’s thresholding method. These authors avoided morphological operations in their segmentation method, and their algorithm was assessed on three databases (BloodSeg, CellaVision, and JTSC). They reported an average Dice similarity coefficient of 94.2%. Khamael et al. [2], on the other hand, applied a contrast enhancement technique, morphological opening and closing, and edge-based GACS with forces such as curvature, normal direction, and vector field for WBC nucleus extraction. They obtained an average F-score of 92.09% on three datasets (CellaVision, ALL-IDB2, and Wadsworth Centre). Sapna et al. [38] proposed a nucleus segmentation approach based on mathematical operations, fuzzy c-means clustering, and morphological opening and erosion. The authors achieved an overall accuracy of 88.1% on the LSCI database.

In addition to traditional image processing methods, deep learning networks have also been employed for WBC segmentation and classification [7, 9, 10, 11, 19, 22, 26, 28, 36, 52, 58]. Thanh et al. [52] proposed a deep learning architecture, namely SegNet, for the segmentation of both white blood cells and red blood cells in peripheral blood smear images. The authors reported the use of 45 images from the ALL-IDB1 database. Kutlu et al. [26] proposed a region-based convolutional neural network (CNN) for the identification and classification of WBCs. In their study, the AlexNet, VGG16, GoogLeNet and ResNet50 architectures were tested with full learning and transfer learning. The architectures were trained and tested using the LSCI and BCCD databases. Partha et al. [9] used color space conversion, a circular average filter and k-means clustering for nucleus extraction. However, this nucleus segmentation method did not cope well with the contrast variation of the JTSC database. The authors also used a CNN model for WBC classification, based on the concept of fusing the first and last convolutional features and propagating the input image to each layer. Their classification model was trained and tested on the BCCD database. Yusuf et al. [10] classified WBCs into five categories using capsule networks, considering the LSCI database. Amin et al. [22] proposed a deep learning method for the automatic detection of the nucleus and cytoplasm regions in peripheral blood smears. In their study, techniques such as regularization, transfer learning, and data augmentation were used to avoid overfitting. However, this method was trained and tested on only 87 images from a private database. Anita et al. [7] used an ellipse detection approach and the artificial electric field algorithm (AEFA) for the automatic detection and counting of white blood cells. They considered 68 images from the AHS database to evaluate the performance of their method, and reported an overall detection of 96.80%. Reena et al. [36] proposed a semantic segmentation technique based on the DeepLabv3+ architecture for leukocyte segmentation in blood smear images. The authors tested their method on three public databases, namely JTSC, CellaVision and LSCI, and reported an average accuracy of 96.15%. Deep convolutional neural networks were also used by Chen et al. [11] for leukocyte segmentation. However, the authors did not evaluate the performance of the method on the challenging LSCI dataset. They reported an average Dice coefficient of 96.68% on the CellaVision and JTSC databases.

From these related works, the main current limitations are:

  1. Most existing nucleus segmentation methods based on traditional image processing techniques use erosion and dilation as a final step to refine the boundary of the nucleus, although these operations directly affect the original shape of the nucleus [42]. Furthermore, some methods used private image databases with little variation in contrast and illumination [18, 22, 23] and were not assessed on other databases.

  2. Deep learning-based models require significant memory and computational time for training and testing [36]. They also require a large set of training data to achieve good segmentation accuracy and to generalize to other databases. Unfortunately, most authors who used a deep learning approach trained and tested their models on at most two databases.

  3. Many authors who used a deep learning approach focused on the analysis of the entire leukocyte, although it has been demonstrated that nucleus features are more useful than cytoplasm features in WBC classification and leukaemia diagnosis [20].

  4. According to references [, 39, 62], the major difficulties in leukocyte nucleus segmentation are the variation in shape and color of the nucleus and of the WBC, the presence of artifacts, and overlapping cells.

Faced with the above drawbacks, the question arises: is it possible to develop an efficient and robust segmentation algorithm, based on simple techniques, that can selectively segment the WBC nucleus for computer-aided diagnosis of leukaemia?

To conduct this work, we assume the following hypotheses:

  a. The developed algorithm can selectively segment the nucleus of WBCs.

  b. The combination of simple and well-known techniques, namely arithmetical operations, the Fourier Transform, mean shift smoothing, and adaptive k-means clustering, gives improved results compared to those in the literature.

  c. The developed algorithm is efficient and robust.

In this paper, we use a combination of known techniques, namely the Fourier Transform, arithmetical operations, mean shift clustering and k-means, to segment the WBC nucleus, a combination that proved efficient and robust. The novelty lies in the way these known techniques are combined, together with the fact that the proposed method yields very good results, as will be seen later (Section 4.2.3). The major contributions of our work are:

  1. An efficient WBC image enhancement based on the combination of the Fourier Transform, arithmetical operations, and mean shift clustering is presented.

  2. An adaptive approach for selecting the right number of clusters k, based on solidity and area parameters, is designed for better extraction of the nuclei using k-means clustering.

  3. The proposed algorithm can segment the WBC nucleus in five different databases acquired with different staining techniques and imaging conditions.

  4. The results obtained by testing our method on these databases show better performance than recently proposed methods in the literature.

The rest of this paper is organized as follows: the proposed method is presented in Section 2. Section 3 describes the image databases and the evaluation metrics. Section 4 presents the results and discussion. Finally, the conclusion and perspectives are given in Section 5.

2 Proposed method

The aim of our three-step method is to extract the WBC nucleus from five blood smear image databases, acquired under different conditions and exhibiting different colour variations. The block diagram of the proposed method is shown in Fig. 1. It is based on a new image enhancement principle which uses the Fourier Transform and arithmetical operations to enhance the contrast between the nucleus and the background. This enhancement is applied after localization of the WBC and is followed by the adaptive K-means segmentation method.

Fig. 1 Block diagram of the proposed method.

2.1 WBC localization

The objective of this step is to crop the input image I_RGB to obtain sub-images containing the WBCs. Cropping each WBC into a sub-image reduces the processing time and improves the segmentation accuracy [42]. The steps of our WBC localization method are listed below.

  1. Extracting the green (G) and blue (B) components of the input image I_RGB.

  2. Improving the contrast of the nucleus in the blue component using the relation I_div = 70 * B / G.

  3. Applying Otsu’s thresholding method to the grayscale image I_div.

  4. Applying an area filter to remove platelet fragments and artefacts.

  5. Applying morphological dilation with a diamond-shaped structuring element of size 10 pixels to extend the detected WBC region. The size 10 was found experimentally.

  6. Cropping the input image I_RGB using the dimensions of the binary mask obtained in the previous step.

In the WBC localization process, Steps 1 and 2 produce, from an input image (Fig. 2a), a grayscale image with two or three pixel intensity levels, as shown in Fig. 2b. The amplification parameter 70 was found experimentally to be suitable for all the images of the different databases used. Steps 3 to 6 locate the WBC, then estimate and apply the crop rectangle, as shown in Fig. 2c and d.
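Steps 2 and 3 of the localization can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Matlab implementation: the function names, the guard against G = 0 and the 8-bit saturation are assumptions, and the area filter, dilation and cropping steps are left out.

```python
def ratio_image(blue, green, gain=70.0, max_level=255):
    """Step 2: I_div = gain * B / G per pixel, rounded and clipped."""
    result = []
    for b_row, g_row in zip(blue, green):
        row = []
        for b, g in zip(b_row, g_row):
            # Assumption: avoid division by zero and saturate like an 8-bit image.
            value = gain * b / max(g, 1)
            row.append(min(max_level, int(round(value))))
        result.append(row)
    return result


def otsu_threshold(image, levels=256):
    """Step 3: grey level that maximises the between-class variance."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixels with level <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with level > t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binary_mask(image, threshold):
    """Pixels strictly above the threshold are kept as WBC candidates."""
    return [[value > threshold for value in row] for row in image]
```

Calling `binary_mask(i_div, otsu_threshold(i_div))` on the output of `ratio_image` gives a candidate mask in which pixels with a high B/G ratio are set, which is where the area filter and dilation of Steps 4 and 5 would then operate.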

Fig. 2 WBC localization: (a) original image, (b) \({I}_{div}\) image, (c) WBC cropping, (d) sub-images obtained.

2.2 Image enhancement

Since each color space captures specific characteristics [20], the proposed image enhancement method does not use all the components of the RGB and CMYK spaces. Indeed, the WBC nucleus has high pixel intensities in the C and M components of the CMYK space, but low intensities in the R and G components of the RGB space. These four components are therefore used in the image enhancement stages. Fig. 3 shows the difference in pixel intensity of the same nucleus for the C, M, R and G components.

Fig. 3 M, G, C and R components of the RGB and CMYK color spaces.

The proposed preprocessing method has two main steps: one performs arithmetic subtraction and addition operations between colour components, and the other computes the Fourier transform of the RGB image and applies a phase shift to all the elements of the phase [16]. These two steps produce three gray-level images (\({I}_{MG}\), \({I}_{MG\_CR}\) and \({I}_{FT\_MG}\)), from which the red blood cells are eliminated, as shown in Fig. 4 (f, g and m). The three images are then recombined to form a new colour image named \({I}_{{R}^\prime{G}^\prime{B}^\prime}\). The mean shift clustering method using colour information with a flat kernel, proposed by [3], is applied to smooth the new image \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) while preserving the nucleus region. Fig. 4 (n and o) respectively illustrate the image \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) and the image obtained after applying mean shift clustering.
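The smoothing effect of mean shift with a flat kernel can be illustrated on one-dimensional intensities. The sketch below is a simplified stand-in, not the colour-space procedure of [3]: the function name, the bandwidth and the stopping rule are assumptions chosen for illustration.

```python
def mean_shift_1d(values, bandwidth, max_iter=50, tol=1e-6):
    """Move every sample to the mean of the samples within `bandwidth` of it.

    With a flat kernel, all samples inside the window carry the same weight.
    The returned list holds, for each input sample, the mode it converged to.
    """
    modes = [float(v) for v in values]
    for _ in range(max_iter):
        shifted = []
        for mode in modes:
            window = [v for v in values if abs(v - mode) <= bandwidth]
            shifted.append(sum(window) / len(window))
        largest_move = max(abs(s - m) for s, m in zip(shifted, modes))
        modes = shifted
        if largest_move < tol:
            break
    return modes
```

Samples that belong to the same intensity group collapse onto a common value, while well-separated groups stay apart, which is the behaviour expected when the nucleus region is smoothed without being merged into the background.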

The creation of intermediate images \({I}_{MG}\) and \({I}_{MG\_CR}\)

  1. Converting the input image \({I}_{RGB}\) into \({I}_{CMYK}\) in the CMYK color space.

  2. Extracting the R and G components of \({I}_{RGB}\) and the C and M components of \({I}_{CMYK}\).

  3. Calculating: \({\alpha }_{1}=\max\left(M\right)/\max\left(G\right)\) and \({\alpha }_{2}=\max\left(C\right)/\max\left(R\right)\).

  4. Calculating: \({I}_{MG}=\begin{cases}M-G & \text{if } {\alpha }_{1}>1\\ M-0.5\times G & \text{if } {\alpha }_{1}<1\end{cases}\) and \({I}_{CR}=\begin{cases}C-R & \text{if } {\alpha }_{2}>1\\ C-0.3\times R & \text{if } {\alpha }_{2}<1\end{cases}\)

  5. Replacing the negative pixels of \({I}_{MG}\) and \({I}_{CR}\) by zero.

  6. Applying the circular average filter of radius r = 3 to the \({I}_{MG}\) and \({I}_{CR}\) images.

  7. Calculating: \({I}_{MG\_CR}=({I}_{MG}+{I}_{CR})/2\).
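Steps 3, 4, 5 and 7 of this list can be sketched as follows, assuming the C, M, R and G channels are already available as nested lists of floats in [0, 1]. The function names are hypothetical, the CMYK conversion and the circular average filter of Step 6 are omitted, and the case alpha = 1, which the steps above leave undefined, is treated here like alpha < 1 as an assumption.

```python
def channel_max(channel):
    """Largest pixel value of a 2-D channel."""
    return max(max(row) for row in channel)


def weighted_difference(high, low, reduced_weight):
    """Steps 3-5: high - w * low per pixel, with negative results set to zero.

    w = 1 when alpha = max(high) / max(low) > 1, otherwise w = reduced_weight.
    """
    peak_low = channel_max(low)
    # Assumption: an all-zero `low` channel is treated as alpha > 1.
    alpha = channel_max(high) / peak_low if peak_low > 0 else float("inf")
    weight = 1.0 if alpha > 1 else reduced_weight
    return [[max(h - weight * l, 0.0) for h, l in zip(h_row, l_row)]
            for h_row, l_row in zip(high, low)]


def average(first, second):
    """Step 7: pixel-wise mean of two images of equal size."""
    return [[(x + y) / 2 for x, y in zip(row_1, row_2)]
            for row_1, row_2 in zip(first, second)]


def intermediate_images(c, m, r, g):
    """Return (I_MG, I_CR, I_MG_CR) without the smoothing of Step 6."""
    i_mg = weighted_difference(m, g, 0.5)
    i_cr = weighted_difference(c, r, 0.3)
    return i_mg, i_cr, average(i_mg, i_cr)
```

A pixel that is bright in M and C but dark in G and R, as expected for a nucleus, keeps a high value in all three outputs, whereas a pixel that is brighter in G and R is clipped to zero.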

The creation of intermediate image \({I}_{FT\_MG}\)

  1. Calculating \({I}_{RGB\_FT}=\) Fourier Transform of \({I}_{RGB}\).

  2. Computing the phase \({I}_{phase}\) and the magnitude \({I}_{mag}\) of \({I}_{RGB\_FT}\).

  3. Computing \({I}_{RGB\_FTS}={I}_{mag}\times \exp\left(j\left({I}_{phase}+\frac{\pi }{2}\right)\right)\).

  4. Calculating \({I}_{RGB\_IFT}=\) Inverse Fourier Transform of \({I}_{RGB\_FTS}\).

  5. Normalizing \({I}_{RGB\_IFT}\) to the range [0, 1].

  6. Extracting the green component \({I}_{G\_IFT}\) of \({I}_{RGB\_IFT}\).

  7. Calculating \({I}_{G\_IFTR1}=Rescale\left({I}_{G\_IFT}, 0.3, 1\right)\).

  8. Computing \({I}_{G\_IFTR2}=Rescale\left({I}_{G\_IFT}, a, b\right)\), where \(a=\min\left({I}_{MG\_CR}\right)\), \(b=\max\left({I}_{MG\_CR}\right)\) and \(Rescale\left(I,a,b\right)=a+\left(\frac{I-\min\left(I\right)}{\max\left(I\right)-\min\left(I\right)}\right)\times (b-a)\).

  9. Calculating \({I}_{FT\_MG}=\frac{{I}_{G\_IFTR2}+{I}_{MG}}{2\times {I}_{G\_IFTR1}}\).

  10. Calculating \({I}_{FT\_MG}=Rescale\left({I}_{FT\_MG}, a, b\right)\).

  11. Applying the circular average filter of radius r = 3 to \({I}_{FT\_MG}\).
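The Rescale operation and Steps 7 to 10 can be sketched as below. The sketch takes \({I}_{G\_IFT}\) as an input, so the Fourier steps 1 to 6 and the final filter of Step 11 are not reproduced; the function names and the handling of a constant image are assumptions.

```python
def image_min(img):
    return min(min(row) for row in img)


def image_max(img):
    return max(max(row) for row in img)


def rescale(img, a, b):
    """Rescale(I, a, b) = a + (I - min(I)) / (max(I) - min(I)) * (b - a)."""
    low, high = image_min(img), image_max(img)
    if high == low:
        # Assumption: a constant image is mapped to the lower bound a.
        return [[a for _ in row] for row in img]
    scale = (b - a) / (high - low)
    return [[a + (value - low) * scale for value in row] for row in img]


def fuse_ft_mg(i_g_ift, i_mg, i_mg_cr):
    """Steps 7-10: combine the phase-shifted green channel with I_MG."""
    a, b = image_min(i_mg_cr), image_max(i_mg_cr)
    r1 = rescale(i_g_ift, 0.3, 1.0)   # Step 7
    r2 = rescale(i_g_ift, a, b)       # Step 8
    # Step 9: r1 >= 0.3 everywhere, so the denominator never reaches zero.
    fused = [[(x + y) / (2 * z) for x, y, z in zip(row_2, row_mg, row_1)]
             for row_2, row_mg, row_1 in zip(r2, i_mg, r1)]
    return rescale(fused, a, b)       # Step 10
```

One consequence visible in the sketch is that the lower bound 0.3 used in Step 7 keeps the division in Step 9 well defined, and the final rescaling brings \({I}_{FT\_MG}\) back to the same intensity range as \({I}_{MG\_CR}\).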

Fig. 4 (a) Original image; (b) R color component; (c) G color component; (d) C color component; (e) M color component; (f) \({I}_{MG}\) (subtraction of the M and G color components); (g) \({I}_{CR}\) (subtraction of the C and R color components); (h) \({I}_{MG\_CR}\) (average of \({I}_{MG}\) and \({I}_{CR}\)); (i) \({I}_{RGB\_IFT}\) (RGB image obtained with a phase shift of \(\frac{\pi }{2}\)); (j) \({I}_{G\_IFT}\) (G component of \({I}_{RGB\_IFT}\)); (k) \({I}_{G\_IFTR1}\) (rescaling of \({I}_{G\_IFT}\) to the range [0.3, 1]); (l) \({I}_{G\_IFTR2}\) (rescaling of \({I}_{G\_IFT}\) to the range [a, b]); (m) \({I}_{FT\_MG}\) (image obtained with \({I}_{MG\_CR}\), \({I}_{G\_IFTR1}\) and \({I}_{G\_IFTR2}\)); (n) \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) (new RGB image); (o) mean shift clustering result of \({I}_{{R}^\prime{G}^\prime{B}^\prime}\).

2.3 Adaptive K-means clustering

In the ideal case, the image \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) resulting from the proposed enhancement step is a two-colour image: one colour describes the nucleus and the other the background, as shown in Fig. 5 (a and j). In that case, the k-means algorithm [12] with a number of clusters (nb_clust) equal to two will effectively segment the nucleus. In some cases, however, the image \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) has more than two colours, describing the nucleus, background and cytoplasm, as shown in Fig. 5 (b-i). To find the suitable nb_clust for nucleus extraction in every image resulting from the preprocessing, the K-means algorithm is first applied to separate the image \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) into 3 clusters. The intensities of the clusters are saved separately. The cluster of maximum intensity is taken as the nucleus, the cluster of intermediate intensity as the cytoplasm, and the last as the background. A criterion based on area and solidity is then defined to establish the appropriate number of clusters for the extraction of the WBC nucleus in each image.

Fig. 5 Output of the proposed image enhancement for (a)-(b) CellaVision, (c)-(d) JTSC, (e) BloodSeg, (f)-(h) ALL-IDB2, and (i)-(j) LSCI databases.

The choice of the number of clusters is made according to the following steps:

  1. Applying K-means to the \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) image to group the pixels into 3 clusters.

  2. nuc_clust = the cluster of maximum intensity, taken to segment the nucleus.

  3. cyto_clust = the cluster of intermediate intensity, taken to segment the cytoplasm.

  4. \(nb\_clust=\begin{cases}2 & \text{if } \frac{Area(cyto\_clust)}{Area(nuc\_clust)}<\frac{2}{3}\\ 3 & \text{if } \frac{Area(cyto\_clust)}{Area(nuc\_clust)}>\frac{2}{3} \text{ and } solidity<0.7\\ 2 & \text{otherwise}\end{cases}\)

The threshold values (2/3 and 0.7) used to select nb_clust were defined experimentally for optimal extraction of the WBC nucleus. Finally, the K-means algorithm is applied to the image \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) with the nb_clust found. The cluster with the maximum intensity is then taken to segment the nucleus. An area filter is applied to remove the platelet fragments. As in Ref. [31], regions smaller than 800 pixels were eliminated for the BloodSeg and CellaVision databases, and regions smaller than 100 pixels for the JTSC database. For the LSCI and ALL-IDB2 databases, regions smaller than 500 pixels were removed.
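The selection rule and the per-database area limits described above can be written compactly as follows. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, the areas and the solidity are taken as already measured on the 3-cluster result, and an empty nucleus cluster falls back to two clusters.

```python
# Minimum region size in pixels kept by the area filter, as quoted in the text.
MIN_REGION_AREA = {
    "BloodSeg": 800,
    "CellaVision": 800,
    "JTSC": 100,
    "LSCI": 500,
    "ALL-IDB2": 500,
}


def choose_nb_clust(area_cyto, area_nuc, solidity,
                    ratio_limit=2 / 3, solidity_limit=0.7):
    """Return the number of clusters (2 or 3) for the final K-means pass.

    Three clusters are used only when the cytoplasm cluster is large relative
    to the nucleus cluster and the solidity is low; every other case uses two.
    """
    if area_nuc == 0:
        # Assumption: with no nucleus pixels the ratio is undefined, so use 2.
        return 2
    ratio = area_cyto / area_nuc
    if ratio > ratio_limit and solidity < solidity_limit:
        return 3
    return 2


def keep_region(area, database):
    """Area filter: keep a region only if it reaches the database limit."""
    return area >= MIN_REGION_AREA[database]
```

Because the first and third branches of the rule both return 2, the whole decision reduces to a single test for the 3-cluster case, which is how the sketch is organised.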

3 Image databases and evaluation metrics

3.1 Image databases

In this study, five public blood smear image databases, acquired under different conditions and widely used by researchers, were considered to assess the performance and robustness of the proposed method. These databases, commonly called ALL-IDB2 [41], BloodSeg [32], LSCI [35], CellaVision and JTSC [63], were respectively acquired in Italy, Sweden, China, Malaysia and Iran. The blood smear images of these databases contain red blood cells, platelets and WBCs. A WBC has a nucleus and a cytoplasm, and WBCs exist in five types (basophil, neutrophil, eosinophil, lymphocyte and monocyte). The CellaVision, JTSC, BloodSeg and LSCI databases come with manual segmentations performed by an expert. For the ALL-IDB2 database, we considered the expert segmentation provided and used in the work of Partha et al. [9]. The features of the five databases are summarized in Table 1.

Table 1 Features of image databases used to evaluate the segmentation methods

3.2 Evaluation metrics

The following parameters were computed to assess the performance of the proposed segmentation method: Accuracy (A), Precision (P), Recall (R), Specificity (S), Dice Similarity Coefficient (DSC), Kappa index (K), Classification Error (ME) and Jaccard Distance (JD) [6, 31]. We also calculated the confusion matrix and the true positive rate \({TPR}_{t}\) [6]. The DSC and K metrics assess, in different ways, the similarity between the segmentation performed by the expert and the one obtained with the proposed method. The values of these metrics range from 0 to 1. A DSC value close to 1 attests to the robustness of the segmentation method and corresponds to low values of ME and JD. According to [27], a specific qualifier can be given to the segmentation method based on the value of K: poor (K ≤ 0.2); reasonable (0.2 < K ≤ 0.4); good (0.4 < K ≤ 0.6); very good (0.6 < K ≤ 0.8); excellent (K > 0.8). \({TPR}_{t}\) represents the ratio between the number of white blood cells (w) with \({DSC}_{l} \ge t\) and the total number of white blood cells (n) in the database. The value of t considered is 0.9, as in the work of [6]. The mathematical expressions of the previously mentioned metrics are shown in Table 2. In this table, TP represents the region of the segmented nucleus which corresponds to the segmentation made by an expert (Ground Truth (GT)), TN the segmented background coinciding with that segmented by the expert, FP the background segmented as nucleus and FN the nucleus region segmented as background. In addition, \({A}_{exp}\) represents the area of the nucleus segmented by the expert (GT) and \({A}_{prog}\) the area of the nucleus obtained with the proposed method.
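Several of these metrics can be computed directly from the confusion counts, as sketched below. The formulas used are the standard textbook definitions and are assumed, not confirmed, to match Table 2; K and ME are left out because their exact expressions are not given in the text, and the function names are hypothetical. The qualifier thresholds and the value t = 0.9 are those quoted above.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard pixel-wise metrics from the four confusion counts."""
    return {
        "A": (tp + tn) / (tp + tn + fp + fn),   # accuracy
        "P": tp / (tp + fp),                     # precision
        "R": tp / (tp + fn),                     # recall
        "S": tn / (tn + fp),                     # specificity
        "DSC": 2 * tp / (2 * tp + fp + fn),      # Dice similarity coefficient
        "JD": 1 - tp / (tp + fp + fn),           # Jaccard distance
    }


def kappa_qualifier(k):
    """Qualifier attached to a Kappa value, using the intervals from [27]."""
    if k <= 0.2:
        return "poor"
    if k <= 0.4:
        return "reasonable"
    if k <= 0.6:
        return "good"
    if k <= 0.8:
        return "very good"
    return "excellent"


def tpr_t(dsc_per_cell, t=0.9):
    """Fraction of cells whose individual DSC is at least t."""
    hits = sum(1 for dsc in dsc_per_cell if dsc >= t)
    return hits / len(dsc_per_cell)
```

For instance, `tpr_t` applied to the per-cell DSC values of a database gives the share of nuclei segmented with a DSC of at least 0.9, which is a stricter summary than the mean DSC alone.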

3.3 Complexity analysis

The computational complexity of an algorithm is mainly evaluated from the number of arithmetic and logical operations it performs. The proposed method has three main steps. Firstly, the WBC is localized in the original image using Otsu’s thresholding, whose time complexity is \(O(L)\) [49], plus other operations such as multiplication, division and comparison, whose complexity is of the order of \(O(N)\) [60]. The time complexity of this step is \(O(L+N)\), where N is the number of pixels of the input image and L is the highest intensity level of a pixel. Secondly, the image is enhanced using arithmetical operations, the Fourier Transform and the mean shift operation, whose time complexities are respectively estimated at \(O(N)\), \(O(N^2)\) [53] and \(O(TN^2)\) [13], where T is the number of iterations. The time complexity of this step is \(O(N+N^2+TN^2) \approx O(N^2)\). Finally, the adaptive k-means operation is applied to segment the nucleus. According to reference [34], the k-means algorithm is known to have a time complexity of \(O(N^2)\), where N is again the number of pixels of the image. The total time complexity of the proposed method is therefore \(O(L+N+N^2+N^2) \approx O(N^2)\).
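The three contributions can be gathered in a single expression. The simplification in the last step holds under the assumptions, not stated explicitly in the text, that the number of mean shift iterations T is bounded by a constant and that L is small compared with \(N^2\):

```latex
\[
\underbrace{O(L+N)}_{\text{localization}}
+\underbrace{O\!\left(N+N^{2}+TN^{2}\right)}_{\text{enhancement}}
+\underbrace{O\!\left(N^{2}\right)}_{\text{adaptive k-means}}
= O\!\left(L+TN^{2}\right)
\approx O\!\left(N^{2}\right)
\]
```

If T were allowed to grow with the image size, the mean shift term \(O(TN^2)\) would dominate and the bound would have to keep the factor T.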

Table 2 Mathematical expression of the similarity metrics

4 Results and discussions

4.1 Results of the new proposed enhancement method

Our proposed method for the enhancement of blood smear images is robust to variations in colour and illumination. Figs. 6, 7, 8, 9, and 10 show the results of the proposed enhancement method for the five image databases.

Fig. 6 Output of the proposed image enhancement for ALL-IDB2 database: (a) Original image, (b) \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) (new RGB image), (c) mean shift of \({I}_{{R}^\prime{G}^\prime{B}^\prime}\).

Fig. 7 Output of the proposed image enhancement for CellaVision database: (a) Original image, (b) \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) (new RGB image), (c) mean shift of \({I}_{{R}^\prime{G}^\prime{B}^\prime}\).

Fig. 8 Output of the proposed image enhancement for BloodSeg database: (a) crop of original image, (b) \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) (new RGB image), (c) mean shift of \({I}_{{R}^\prime{G}^\prime{B}^\prime}\).

Fig. 9 Output of the proposed image enhancement for JTSC database: (a) Original image, (b) \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) (new RGB image), (c) mean shift of \({I}_{{R}^\prime{G}^\prime{B}^\prime}\).

Fig. 10 Output of the proposed image enhancement for LSCI database: (a) crop of original image, (b) \({I}_{{R}^\prime{G}^\prime{B}^\prime}\) (new RGB image), (c) mean shift of \({I}_{{R}^\prime{G}^\prime{B}^\prime}\).

4.2 Evaluation and comparison of the performance of the proposed method on the five databases

A computer with a Core i5 processor, 12 GB of RAM and the Windows 10 operating system was used for the experiments. The proposed method was implemented in Matlab 2018a with the appropriate toolboxes.

4.2.1 Presentation of the confusion matrix results

In this section, the nucleus region obtained with the proposed method is compared with that obtained by expert segmentation. For the ALL-IDB2 database, the comparison is made separately for leukaemic and healthy leukocytes. The mean values of the confusion matrix for the five databases are shown in Tables 3, 4, 5, 6, 7, and 8. These tables show that the proposed method can effectively segment the WBC nucleus in the five image databases, with a minimum average percentage of TP equal to 85.01% (Table 8) and a low percentage of FN equal to 0.26% (Table 5).

Table 3 Average confusion matrix for normal leukocyte of ALL-IDB2 database
Table 4 Average confusion matrix for abnormal leukocyte of ALL-IDB2 database
Table 5 Average confusion matrix for CellaVision database
Table 6 Average confusion matrix for JTSC database
Table 7 Average confusion matrix for BloodSeg database
Table 8 Average confusion matrix for LSCI database

4.2.2 Results of the evaluation metrics

For each image database considered, the nine metrics presented in Table 2 were calculated, and the results are presented in Tables 9 and 10. The analysis of these tables shows that the proposed method achieves much better performance for the abnormal leukocytes of the ALL-IDB2 database, with respective mean values of 96.52%, 96.69%, 96.09% and 0.96 for R, DSC, TPR and K (Table 9). In addition, the results obtained for the CellaVision database are higher than those of the four other databases in terms of A, P, R, DSC, K and TPR (Table 10). This superiority is confirmed by the lower values of the ME and JD metrics for this database compared to those of the four other databases (Table 10). The lowest performance is obtained for the LSCI database, with average values of R, DSC, TPR and K respectively equal to 85.16%, 86.60%, 60.74% and 0.85 (Table 10). Even so, since the lowest value of K, 0.85, exceeds 0.8, the proposed method qualifies as excellent according to [27]. The lower performance obtained for the LSCI database may be due to the presence in this database of nucleus regions which are very dark and difficult to discern [33].

Table 9 Results of evaluation metrics for the ALL-IDB2 database
Table 10 Results of evaluation metrics for the CellaVision, JTSC, BloodSeg and LSCI databases

4.2.3 Comparison of the proposed method with previous methods in the literature

In this section, the results of the nucleus segmentation obtained with the proposed method are visually compared with the expert segmentation and with three other competitive segmentation methods [31, 56, 57]. These three methods were selected because their implementations are available and because of their performance compared to 15 existing nucleus segmentation methods in the studies of Refs. [6, 31]. This therefore amounts to an indirect comparison with 15 other methods. The visual comparison is performed on 5 images from each of the databases used, and the illustrations are shown in Figs. 11, 12, 13, 14, and 15. In these figures, row (a) shows the images obtained after superimposing the contours of the nucleus segmented by the expert on the original image; row (b), the images obtained after superimposing the boundary of the nucleus obtained with the proposed method; and rows (c) to (e), the nucleus boundaries obtained with the methods of Vogado et al. [57], Makem and Tiedeu [31] and Vincent et al. [56], respectively, superimposed on the original image. As shown in Fig. 11 (b, 1-5), Fig. 12 (b, 1-5), Fig. 13 (b, 1-5), Fig. 14 (b, 1-5) and Fig. 15 (b, 1-5), the nucleus region segmented by the proposed method closely matches that of the expert for all 5 image databases considered. These results indicate that the proposed method is robust to the variation in nucleus contrast that results from the different acquisition processes of blood smear images. In contrast to the proposed method, in the ALL-IDB2 and BloodSeg databases, the methods of Makem and Tiedeu [31], Vogado et al. [57] and Vincent et al. [56] include some portions of cytoplasm in the segmented nucleus region, as illustrated in Fig. 11 ([c, d, e], 2), Fig. 11 ([c, d, e], 5), Fig. 14 ([c, d, e], 3) and Fig. 14 ([c, d, e], 4). For the CellaVision and LSCI databases, the method of Makem and Tiedeu [31] gives a better nucleus region than the methods of Vogado et al. [57] and Vincent et al. [56], as shown in Fig. 12 ([c, e], 1-4) and Fig. 15 ([c, e], 2-3). However, as shown in Fig. 12 ([b, d], 3-4) and Fig. 15 ([b, d], 1-2), the method proposed in this work gives more satisfactory results than that of Makem and Tiedeu [31]. Compared to the proposed method, the algorithm of Vincent et al. [56] gives poor nucleus segmentation results on most of the images of the JTSC database, as shown in Fig. 13 (e, [1, 2, 3, 5]).

Fig. 11 Visual comparison of the proposed method with the expert segmentation and the results of three state-of-the-art methods for the ALL-IDB2 database: (a) expert segmentation, (b) result of the proposed method, (c) Vogado et al. [57], (d) Makem and Tiedeu [31], (e) Vincent et al. [56].

Fig. 12 Visual comparison of the proposed method with the expert segmentation and the results of three state-of-the-art methods for the CellaVision database: (a) expert segmentation, (b) result of the proposed method, (c) Vogado et al. [57], (d) Makem and Tiedeu [31], (e) Vincent et al. [56].

Fig. 13 Visual comparison of the proposed method with the expert segmentation and the results of three state-of-the-art methods for the JTSC database: (a) expert segmentation, (b) result of the proposed method, (c) Vogado et al. [57], (d) Makem and Tiedeu [31], (e) Vincent et al. [56].

Fig. 14 Visual comparison of the proposed method with the expert segmentation and the results of three state-of-the-art methods for the BloodSeg database: (a) expert segmentation, (b) result of the proposed method, (c) Vogado et al. [57], (d) Makem and Tiedeu [31], (e) Vincent et al. [56].

Fig. 15 Visual comparison of the proposed method with the expert segmentation and the results of three state-of-the-art methods for the LSCI database: (a) expert segmentation, (b) result of the proposed method, (c) Vogado et al. [57], (d) Makem and Tiedeu [31], (e) Vincent et al. [56].

Figs. 16, 17, 18, 19, 20, and 21 present a quantitative comparison between the proposed method and the algorithms of Makem and Tiedeu [31], Vogado et al. [57] and Vincent et al. [56] for all the image databases used. This comparison was carried out using the parameters A, DSC and TPR. To perform the comparisons, the methods of Makem and Tiedeu [31], Vogado et al. [57] and Vincent et al. [56] were coded in Matlab 2018. Analysis of the comparison figures reveals that the proposed method gives the highest values of DSC and TPR for the CellaVision database (Fig. 18), the JTSC database (Fig. 19) and the abnormal leukocytes of the ALL-IDB2 database (Fig. 16). For the LSCI database (Fig. 21), the proposed method gives an average DSC value greater than those of the methods of Makem and Tiedeu, Vincent et al. and Vogado et al. [31, 56, 57], with a TPR value also greater than those obtained by the methods of Refs. [56, 57] but 2.89 lower than that of Ref. [31]. For the normal leukocytes of the ALL-IDB2 database (Fig. 17), the mean DSC obtained with the proposed method is lower than those of the methods of Makem and Tiedeu [31] and Vogado et al. [57], but higher than that of the method of Vincent et al. [56]. For the BloodSeg database (Fig. 20), the mean DSC of the proposed method is slightly lower than that of Makem and Tiedeu [31], but higher than those of Vogado et al. [57] and Vincent et al. [56].
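For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of how A, DSC and TPR could be computed from a predicted nucleus mask and an expert mask. It assumes the usual pixel-wise definitions (accuracy, Dice similarity coefficient and true positive rate); the function names `confusion_counts` and `segmentation_scores` are hypothetical, and the code is written in Python for illustration rather than in the Matlab used for the experiments.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN between two binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # nucleus pixel found by both
            elif p:
                fp += 1      # predicted nucleus, expert says background
            elif t:
                fn += 1      # expert nucleus pixel that was missed
            else:
                tn += 1      # background in both masks
    return tp, fp, fn, tn


def segmentation_scores(pred, truth):
    """Return A, DSC and TPR under the assumed standard definitions."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("masks must contain at least one pixel")
    accuracy = (tp + tn) / total
    # By convention here, two empty masks count as a perfect match.
    dsc = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    tpr = tp / (tp + fn) if (tp + fn) else 1.0
    return {"A": accuracy, "DSC": dsc, "TPR": tpr}


# Toy 4x4 example: the expert marks a 2x2 nucleus; the prediction
# recovers three of its pixels and adds one false positive.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 0]]
print(segmentation_scores(pred, truth))
# {'A': 0.875, 'DSC': 0.75, 'TPR': 0.75}
```

A low TPR combined with a high DSC, as reported for some databases, would correspond in this sketch to few false positives but a noticeable number of missed nucleus pixels.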

Fig. 16 Comparison between the proposed method and recent methods in the literature for leukemia WBCs using the ALL-IDB2 database.

Fig. 17 Comparison between the proposed method and recent methods in the literature for normal WBCs using the ALL-IDB2 database.

Fig. 18 Comparison between the proposed method and recent methods in the literature using the CellaVision database.

Fig. 19 Comparison between the proposed method and recent methods in the literature using the JTSC database.

Fig. 20 Comparison between the proposed method and previous methods in the literature using the BloodSeg database.

Fig. 21 Comparison between the proposed method and recent methods in the literature using the LSCI database.

We also compared the average performance of the proposed method with existing segmentation algorithms. The similarity measure for nucleus extraction is presented in Table 11.

Table 11 Performances of the state-of-the-art algorithms and the proposed method.

4.3 Discussion

On one hand, the shape of the nucleus of WBCs varies from round to lobulated depending on the type of WBC. On the other hand, the color of the nucleus in blood smear images varies from database to database, owing to differences between acquisition devices and smear preparation techniques. Segmenting WBC nuclei across different image databases is therefore a very difficult task. Comparing the results obtained with our initial hypotheses, we can make the following remarks:

  1. The algorithm developed in this paper successfully and selectively segmented the nucleus of WBCs. Its average accuracy remains very high (98.34%) across databases as diverse as the ones we have chosen. This confirms our first hypothesis.

  2. In comparison with other works, the proposed algorithm gave better results than 18 recent works in the literature (15 by indirect comparison and 3 by direct comparison, as explained in Section 4.2.3). This was the point of our second hypothesis.

  3. The proposed algorithm was tested on 5 databases and gave good results. In addition, we computed a metric (K) that measures the robustness of a segmentation algorithm. According to [27], a value of K > 0.8 is considered excellent. The proposed algorithm yields an average K of 0.91 over the 5 databases, which means that the proposed segmentation algorithm is very robust. This confirms the third hypothesis.
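The robustness metric K in the third remark can be illustrated with a short sketch. The code below assumes that K is a kappa-type agreement index (Cohen's kappa) computed between the predicted and expert masks from pixel counts; this interpretation is an assumption made here for illustration, and the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from pixel-wise confusion counts (assumed form of K)."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("at least one pixel is required")
    # Observed agreement: fraction of pixels labelled identically.
    p_o = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_e == 1.0:
        # Both masks are constant and identical; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy counts: 3 true positives, 1 false positive, 1 false negative,
# 11 true negatives on a 16-pixel image.
k = cohen_kappa(tp=3, fp=1, fn=1, tn=11)
print(round(k, 4))  # 0.6667
```

Under this reading, the toy example would fall below the K > 0.8 threshold quoted from [27], whereas the reported average of 0.91 lies above it.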

We recall that most of the existing methods using k-means achieve low results because they use a fixed number of clusters for all input images. The proposed method gave satisfactory results on the five databases (ALL-IDB2, JTSC, CellaVision, BloodSeg, and LSCI). Our results were compared, database by database, with those of 3 competitive methods from the literature [31, 56, 57]; these comparisons are illustrated in Figs. 16, 17, 18, 19, 20, and 21. An average performance comparison was also made between the proposed method and 12 existing nucleus segmentation methods in Table 11. That table shows that the proposed method performs better than the other methods that do not use private databases [9, 24, 31, 33, 38, 50, 51, 56, 57]. We obtained an average DSC of 92.89% over the five databases, which were acquired with different staining techniques and imaging conditions. This indicates that the combination of the Fourier Transform, arithmetic operations and mean shift for nucleus enhancement performs better on images with brightness and color variation. The higher performance reported by Refs. [17, 18, 22] may result from their evaluation relying mainly on private databases with less variation in contrast and illumination, and from the omission of the challenging LSCI database from their tests. The proposed method could give even better nucleus segmentation results on these five databases, or on more, with a more accurate choice of the number of clusters.

As far as complexity is concerned, the complexity of the proposed algorithm, \(O(L+N+N^{2}+N^{2})\approx O(N^{2})\), is high compared to that of the method proposed in [14], \(O(L+N\cdot K)\), but lower than those of the methods developed in [55], \(O(N^{c+l})\), [59], \(O\left(m{\left(\frac{N}{m}\right)}^{3}+N\log_{B}N\right)\), and [61], \(t\,O(N^{2})\).
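The simplification of the bound for the proposed algorithm can be spelled out as follows. The step assumes that \(L\) grows no faster than \(N^{2}\); this condition is an assumption made here for illustration and is not stated explicitly in the text.

\[
O\left(L+N+N^{2}+N^{2}\right)=O\left(L+N+2N^{2}\right)=O\left(N^{2}\right)\quad \text{when } L=O\left(N^{2}\right),
\]

since constant factors and lower-order terms are dropped in asymptotic notation.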

As far as the clinical significance of our findings is concerned, WBC segmentation is a critical step in preparing WBC classification for computer-aided diagnosis of leukaemia. The result of the segmentation affects the classification and therefore the computerized diagnosis. The proposed algorithm is able to segment the leucocyte nucleus of both leukemia and healthy cells (see Table 9). Based on this, our research can be used in the segmentation step of an automated system for leukemia diagnosis. A limitation of the proposed method is that the proposed criterion for selecting the number of clusters may not give a good result on some enhanced images.

5 Conclusion

This paper proposed a new method for the detection and extraction of the nucleus of white blood cells in blood smear images. To extract the WBC nucleus, images with a resolution greater than \(300\times 300\) were first cropped; then a new colour image named \(I_{R'G'B'}\) was obtained from the cropped image by exploiting arithmetic operations using a control parameter, the Fourier Transform, the average circular filter and the mean shift clustering method. A new criterion was established for choosing the appropriate number of clusters to segment the nucleus in the color image. After segmentation of the nucleus, an area filter was applied to eliminate artifacts in the binary image. The proposed method was tested on 5 totally different image databases, and the following means of [DSC, TPR] were obtained: [97.35%, 97%] for the CellaVision database, [96.63%, 96.09%] for the leukemia leukocytes of the ALL-IDB2 database, [93.48%, 83.74%] for the BloodSeg database, [93.17%, 89.33%] for the JTSC database, [90.63%, 73.30%] for the healthy leukocytes of the ALL-IDB2 database and [86.02%, 60.74%] for the LSCI database. These results demonstrate the robustness of the proposed method with respect to the variation of nucleus contrast in the different image databases. The proposed method yields a better average performance than at least nine competitive methods in the literature. WBC segmentation is an important part of a WBC classification system, and the segmentation results directly affect the accuracy of the system. Our new approach is able to segment the leucocyte nucleus of both leukemia and healthy cells. Based on this, our research can be used in the segmentation step of an automatic system for leukemia diagnosis.
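The final area filtering step mentioned above can be sketched as follows. This is a minimal, dependency-free illustration that assumes 4-connectivity and a simple pixel-count threshold; the function name `area_filter`, the connectivity and the value of `min_area` are assumptions made here and are not taken from the method itself.

```python
from collections import deque


def area_filter(mask, min_area):
    """Keep only 4-connected foreground components with >= min_area pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small components are treated as artifacts and dropped.
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out


# Toy binary image: one 2x2 "nucleus" and two isolated artifact pixels.
binary = [[1, 0, 0, 0, 0],
          [0, 0, 1, 1, 0],
          [0, 0, 1, 1, 0],
          [0, 0, 0, 0, 1]]
cleaned = area_filter(binary, min_area=3)
for row in cleaned:
    print(row)
```

In this toy image, only the 2x2 block survives the filter, because the two single-pixel components fall below the assumed threshold of 3 pixels.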

The ultimate goal of this project is computer-aided diagnosis of leukaemia. The first step is segmentation of the WBC nucleus. The second will deal with classification of the WBC nucleus, which will be carried out using convolutional neural networks.