1 Introduction

Segmentation is the process of subdividing an image into its constituent parts or objects. Accurate segmentation of the objects of interest in an image improves the subsequent analysis of these objects. The most widely used segmentation techniques are edge detection and clustering. Unsupervised clustering is required to identify interesting patterns or groupings in a given set of data. In pattern recognition and image processing, unsupervised clustering is used for “segmenting” images, and it can be a very effective technique for pinpointing natural groupings in a large data set, thereby allowing a concise representation of the relationships embedded in the data [1].

Leukemia is a deadly disease that claims thousands of lives. It is a group of cancers that begin in the bone marrow and result in excessive numbers of abnormal white blood cells. These abnormal cells are termed leukemia cells or “blasts”. Depending on how quickly the leukemia cells grow, the disease is termed acute or chronic. It is further classified by the type of cell affected, myelocytes or lymphocytes; accordingly, leukemia is classified as lymphocytic, chronic myelocytic and acute myelocytic.

There is an urgent need for automatic leukemia detection, because current practice relies on manual examination of the blood smear as the first step toward diagnosis; this is a lengthy process and its accuracy is unproven. Morphological, textural and color features are extracted from the segmented nucleus and cytoplasm regions of the leukemia images, which helps hematologists identify and detect leukemia from blood microscopic images in time to save the patient's life. A close review of related work shows that researchers have used a wide range of segmentation strategies [2].

Saha et al. [3] proposed a novel and robust method for segmenting nuclei in overlapping Pap smear images, which forms a basis for further processing of cell images. The method proved successful in detecting and segmenting the nuclei of isolated, touching and overlapping cells in Pap smear images; it incorporates a circular shape function into FCM clustering to improve the partition of the image data. The Dice coefficient, precision and recall were used for quantitative evaluation of the method.

Gu et al. [4] proposed a new clustering algorithm called sparse learning based fuzzy c-means (SL_FCM). It reduces the computational complexity of the SR based FCM method and removes redundant information from the discriminant features to improve clustering quality. The authors report that the algorithm achieves higher accuracy than other state-of-the-art methods on large-scale data sets and images.

Ananthi et al. [5] developed a new threshold-based segmentation technique centered on Interval-Valued Intuitionistic Fuzzy Sets (IVIFSs). The methodology addresses the problem of selecting membership function values to represent imprecise data. The main aim of that paper is to segment leukocytes in blood smear images with the help of IVIFSs: the IVIFS with maximum ultra-fuzziness is identified among 256 IVIFSs obtained by varying the threshold.

Liu et al. [6] introduced, for the first time, a fluid identification method for carbonate reservoirs based on a modified Fuzzy C-Means (FCM) clustering algorithm. Both the initialization and the globally optimal cluster centers are generated by the Chaotic Quantum Particle Swarm Optimization (CQPSO) algorithm, which avoids two drawbacks of the standard FCM clustering algorithm: sensitivity to initial values and a tendency to fall into local optima.

Chang et al. [7] developed a dynamic niching clustering algorithm based on individual-connectedness (DNIC), which comprises a compact k-distance neighborhood algorithm and an individual-connectedness algorithm. The algorithm adaptively selects the number of niches and dynamically identifies them. DNIC successfully processes many data sets with varying cluster volumes and noisy points.

Shang et al. [8] proposed a clone kernel spatial FCM (CKS_FCM), which improves segmentation performance by generating the initial cluster centers, incorporating spatial information into the FCM objective function, and using a non-Euclidean, kernel-based distance metric in place of the Euclidean distance traditionally used in FCM.

Jasmine Begum et al. [9] proposed a Hybrid Fuzzy C-Means Algorithm with Cluster Center Estimation (HFCMCCE), which hybridizes FCM with subtractive clustering for leukemia image segmentation. The method was found suitable even for images with noise densities of up to 90%; in addition, relative to both the input image and the FCM-segmented image, the HFCMCCE-segmented image shows a higher PSNR and a lower Mean Squared Error.

E. Rajaby developed a novel method for color image segmentation that uses only the hue and intensity components of the image and combines them with adaptively tuned weights in a specially defined fuzzy c-means cost function. The method specifies proper initial values for the cluster centers, with the aim of reducing the overall number of iterations and preventing FCM from converging to wrong centroids. The algorithm showed better segmentation performance and speed than other similar methods.

Jose L. Salmeron proposed a study that uses a well-known soft computing method, Fuzzy Cognitive Maps (FCMs), for the early diagnosis of Rheumatoid Arthritis (RA) in order to assist physicians. Particle Swarm Optimization (PSO) and FCMs, together with medical experts’ knowledge, were used to model the problem and calculate the severity of the disease. The results indicate that the tool can help General Practitioners (GPs) diagnose patients with RA in a timely manner.

The prime objective of this paper is to develop a medical diagnosis methodology that improves on the existing conventional and ambiguous approaches. To that end, the paper proposes a computer-aided diagnosis system that assists doctors in assessing medical images so that leukemia can be diagnosed at an early stage.

2 Standard Fuzzy C-Means Algorithm

Fuzzy C-Means (FCM) was initially proposed by Bezdek et al. It is a widely used tool for clustering objects in image processing. FCM allows each pixel to belong to several clusters with varying degrees of membership. Owing to this extra flexibility, FCM is also termed a soft clustering strategy. In hard clustering, by contrast, the data are partitioned into a specified number of mutually exclusive subsets. Fuzzy clustering is a simple methodology compared with hard clustering, and it produces a non-unique partitioning of the data into a collection of clusters [10].

The Standard Fuzzy C-Means Algorithm is as follows:

figure a
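
The alternating update of cluster centers and memberships that characterizes standard FCM can be sketched as follows. This is a minimal illustration of the textbook iteration, assuming scalar features (for example gray levels), a fuzzifier m = 2 and random initial memberships; the function name and parameters are hypothetical and it is not the authors' MATLAB implementation.

```python
import random


def fuzzy_c_means(data, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook FCM on 1-D data. Returns (centers, memberships)."""
    rng = random.Random(seed)
    n = len(data)
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])
    centers = [0.0] * n_clusters
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ik^m.
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            centers[k] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        # Membership update from the distances to the current centers.
        max_change = 0.0
        for i, x in enumerate(data):
            dist = [abs(x - c) for c in centers]
            if min(dist) == 0.0:
                # The point coincides with a center: assign it crisply.
                new = [1.0 if d == 0.0 else 0.0 for d in dist]
            else:
                new = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(new)
            new = [value / total for value in new]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if max_change < tol:  # Stop when memberships no longer change.
            break
    return centers, u
```

Calling `fuzzy_c_means([10, 12, 11, 200, 205, 198], n_clusters=2)` separates the dark and bright values into two clusters, and every membership row sums to 1, which is the "soft" assignment described above.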

3 Particle Swarm Optimization

Particle swarm optimization (PSO) was proposed by Eberhart and Kennedy. PSO is a simple yet robust search technique that is used in a range of search and optimization problems, including image processing problems such as image segmentation. Intuitively, a swarm of birds looking for food repeatedly searches the region around the bird that appears to be nearest to the food, and eventually reaches it. Over repeated iterations, the particles move towards the region around \( P_{Best} \) and \( G_{Best} \), so that the swarm arrives at the optimal point. Since each particle consists of several components, the movement of a particle essentially means that its components move in the direction of the corresponding components of the pBest and gBest particles. It follows that if there is no correspondence between the components of the particles, the evolution of the particles becomes disordered and cannot converge to the optimal solution [11, 12].

$$ V_{i}(t + 1) = \omega \cdot V_{i}(t) + c_{1} \cdot r_{1} \cdot \left( X_{i}^{l}(t) - X_{i}(t) \right) + c_{2} \cdot r_{2} \cdot \left( X^{g} - X_{i}(t) \right) $$
(5)
$$ X_{i}(t + 1) = X_{i}(t) + V_{i}(t + 1) $$
(6)

\( \omega \) is the inertia weight.

\( V_{i}(t) \) is the velocity of the ith particle in iteration t.

\( c_{1} \) and \( c_{2} \) are acceleration coefficients.

\( r_{1} \) and \( r_{2} \) are random numbers between 0 and 1.

\( X_{i}^{l}(t) - X_{i}(t) \) is the difference between the local best \( X_{i}^{l} \) of the ith particle and its position \( X_{i}(t) \) in iteration t.

\( X^{g} - X_{i}(t) \) is the difference between the global best \( X^{g} \) and the position \( X_{i}(t) \) in iteration t.

The Particle Swarm Optimization algorithm is as follows:

figure b
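
The velocity and position updates of Eqs. (5) and (6) can be sketched as a short minimization routine. This is an illustrative sketch of standard PSO only, not the coherence variant proposed in this paper; the function name, the parameter values (inertia weight 0.7, coefficients 1.5), the search bounds and the zero initial velocities are assumptions chosen for the example.

```python
import random


def pso(objective, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with standard PSO. Returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]               # local best positions X_i^l
    p_val = [objective(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]     # global best position X^g
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (5): inertia + pull to local best + pull to global best.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                # Eq. (6): move the particle with its new velocity.
                x[i][d] += v[i][d]
            value = objective(x[i])
            if value < p_val[i]:
                p_best[i], p_val[i] = x[i][:], value
                if value < g_val:
                    g_best, g_val = x[i][:], value
    return g_best, g_val
```

For example, `pso(lambda p: sum(c * c for c in p), dim=2)` drives the swarm towards the origin, the minimum of the sphere function.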

3.1 Coherence Particle Swarm Optimization with Specified Scrutiny of Fuzzy C-Means (CPSO-SSFCM) Algorithm

The proposed CPSO-SSFCM algorithm involves several stages: enhancement of the microscopic images, segmentation of the background cells, feature extraction and, finally, classification. Figs. 1 and 2 show the schematic diagram and the flow chart of the proposed algorithm, respectively. The proposed CPSO-SSFCM algorithm is given as follows:

Fig. 1. Schematic diagram of the proposed algorithm

Fig. 2. The flow chart for the proposed algorithm

figure c
figure d

4 Result Analysis

To validate the effectiveness of the proposed method, the coherence particle swarm optimization with Fuzzy C-Means clustering algorithm, based on the direct histogram, is used to detect normal and leukemia-infected blood microscopic images.

The proposed method is implemented using the Image Processing Toolbox of MATLAB R2013a. The blood smear images were collected from the dataset source [16]. These images were digitized with a digital camera connected to a Carl Zeiss photomicroscope at a magnification of 200x. The image size is 256 × 256. The test is conducted on 50 images; this paper shows the detection of an Acute Myelocytic Leukemia (AML) affected image and a normal image. The results of the proposed CPSO-SSFCM algorithm are compared with those of the standard FCM, HFCMCCE [8] and EHFCMCCE [17], using full-reference, pixel-based image quality measures such as PSNR and MSE, and statistical measures such as sensitivity, specificity and accuracy.

4.1 Pre-processing Step

For image denoising, linear filters are less effective at removing noise; a better solution is to use a nonlinear filter such as the median filter. Median filtering replaces the central pixel with the median of all pixel values in the current neighbourhood [18]. Fig. 3 illustrates the pre-processing stage, in which the median filter is applied to the input image to remove noise and equalize the gray levels of the image intensities.

Fig. 3. (a) Input image (leukemia infected) (b) Resized to 256 × 256 (c) Median filtered image
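
The median filtering step described above can be sketched in a few lines. This is an illustrative pure-Python version, assuming a gray-scale image stored as a list of rows, a 3 × 3 neighbourhood and edge replication at the borders; the window size and border handling are assumptions, since the paper does not state them.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    height, width = len(image), len(image[0])
    r = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clamping coordinates at the borders
            # (edge replication).
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out
```

A single bright outlier in an otherwise uniform patch is removed entirely, which is the behaviour that makes the median filter attractive for impulse noise.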

4.2 Segmentation Phase

In the proposed CPSO-SSFCM, the Fuzzy C-Means (FCM) algorithm is first tuned using the spatial hue value, which is determined from the color histogram, to detect the optimal number of clusters for segmenting the white blood cells (WBCs). The Coherence Particle Swarm Optimization algorithm is integrated into the FCM clustering algorithm to select effective cluster heads. The cluster heads undergo a crossover operation with the nodes in their clusters. The fitness function finds the minimum distance between the cluster nodes by comparing the maximum number of nodes in each cluster; the cluster heads are selected on this basis and updated in every iteration to obtain effective clustering results.

The Canny operator was designed to be an optimal edge detector. It takes a gray-scale image as input and produces as output an image showing the positions of the tracked intensity discontinuities [19]. The Canny edge detection method is applied to extract the nucleus and cytoplasm from the clustered image. Fig. 4 shows the segmentation of the AML image by the standard FCM, HFCMCCE, EHFCMCCE and the proposed CPSO-SSFCM.

Fig. 4. (a) Standard FCM image (b) HFCMCCE image (c) EHFCMCCE image (d) CPSO-SSFCM image (e) Nucleus and cytoplasm differentiation (f) Label1 image (g) Label2 image (h) Label3 image (i) Label4 image

4.3 Feature Extraction Phase

The most important task in pattern recognition is selecting proper diagnostic features that describe the image by numerical values and enable an automatic system to perform the recognition. In this step, geometrical and size ratio features are extracted from regions such as the nucleus, the cytoplasm and the whole WBC.

Geometric Features

  (i) Area: It is evaluated by counting all nonzero pixels within the image region.

    $$ Area = \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {b_{ij} } } $$
    (7)

where \( b_{ij} \) is the value (0 or 1) of the binary image at the pixel coordinate (i, j) within an n × m image.

  (ii) Perimeter: The perimeter is measured by computing the distance between successive boundary pixels.

  (iii) Circularity: The area-to-perimeter ratio is a measure of roundness or circularity, but local irregularities are not reflected by this feature. It is defined as:

    $$ Circularity = 4*\pi *Area\,\,/\,\,perimeter^{2} $$
    (8)

A circle gets the value of 1, while objects with bumpy boundaries get lower values.

  (iv) Eccentricity: This parameter measures the extent to which the shape of a nucleus deviates from being circular. It is an important feature since monocytes are more circular than monoblasts. The value of eccentricity ranges between 0 and 1: a value of 0 corresponds to a circle, and the closer the value is to 1, the more elongated the shape. Eccentricity is given by Eq. (9).

    $$ Eccentricity = \sqrt {a^{2} - b^{2} } \,\,/\,\,a $$
    (9)

where a is the semi-major axis and b is the semi-minor axis.

  (v) Solidity: The solidity is the ratio of the actual area to the convex hull area, and it is also an essential feature for the classification of a blast cell [20].
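
The geometric features above reduce to a few arithmetic expressions once the region has been measured. The sketch below is illustrative only: it assumes the binary mask is a list of rows of 0/1 values and that the perimeter, the semi-axes of the fitted ellipse and the convex hull area have already been obtained elsewhere; all function and parameter names are hypothetical.

```python
import math


def area(mask):
    """Eq. (7): count the nonzero pixels of a binary mask."""
    return sum(1 for row in mask for value in row if value)


def circularity(area_px, perimeter):
    """Eq. (8): 1 for an ideal circle, lower for bumpy boundaries."""
    return 4.0 * math.pi * area_px / perimeter ** 2


def eccentricity(semi_major, semi_minor):
    """Eq. (9): 0 for a circle, approaching 1 for elongated shapes."""
    return math.sqrt(semi_major ** 2 - semi_minor ** 2) / semi_major


def solidity(area_px, convex_area_px):
    """Ratio of the region area to the area of its convex hull."""
    return area_px / convex_area_px
```

As a quick check, an ideal circle of radius 10 has area 100π and perimeter 20π, so its circularity is exactly 1, and equal semi-axes give an eccentricity of 0.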

Size Ratio Measure

  (i) Nucleus to Cytoplasm Ratio (NCR): It is the ratio of the area of the nucleus to the area of the cytoplasm. It indicates the maturity of a cell because, as a cell matures, the size of its nucleus generally decreases. Pre-cancerous cells have an increased nucleus to cytoplasm ratio. Malignant cells occur in clumps and have irregularly shaped nuclei and cytoplasm [21].

    $$ NC\,\,Ratio = \text{Nucleus area}\,\,/\,\,\text{Cytoplasm area} $$
    (10)

The geometric features of the WBC are among the prime factors in deciding whether the segmented WBC should be treated as normal or leukemia infected. The experiment is conducted on 40 leukemia-infected and 10 normal blood microscopic images. The area, radius and diameter of the WBCs in the leukemia-affected input image are shown in Table 1. Eqs. (11)–(13) are used to calculate the diameter and radius of the WBC in pixels, which are then converted into micrometers (μm).

Table 1. Geometric features of WBC

Fig. 5 shows the sub-images of the WBCs. The diameter of a normal monocyte is 12–20 μm [21], but for the input image the diameter is greater than the normal size.

$$ WBC \,\,Radius = \sqrt {\frac{Area}{\pi }} $$
(11)
$$ WBC \,\,Radius = WBC \,\,Diameter\,\,/\,\,2 $$
(12)
$$ WBC \,\,Diameter = 2*\sqrt {\frac{Area}{\pi }} $$
(13)
Fig. 5. WBC sub image (a) Cell 1 (b) Cell 2 (c) Cell 3
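
Eqs. (11)–(13) treat the WBC as a circle with the same area as the segmented region. A minimal sketch of that calculation follows; the pixel-to-micrometer scale `um_per_px` is a hypothetical calibration parameter, since the paper does not give the value used for the conversion.

```python
import math


def wbc_diameter_px(area_px):
    """Eq. (13): diameter of the circle with the same area, in pixels."""
    return 2.0 * math.sqrt(area_px / math.pi)


def wbc_radius_px(area_px):
    """Eqs. (11)-(12): radius is half the equivalent diameter."""
    return wbc_diameter_px(area_px) / 2.0


def px_to_um(length_px, um_per_px):
    """Convert a length in pixels to micrometers (um_per_px is assumed)."""
    return length_px * um_per_px
```

For a region of 25π pixels the equivalent radius is 5 pixels and the diameter 10 pixels; the value in μm then depends entirely on the microscope calibration supplied as `um_per_px`.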

According to hematologists, the shape of the nucleus and cytoplasm is an essential feature for distinguishing blasts. The geometric features are therefore extracted for shape analysis of the nucleus and cytoplasm. The sub-images of the nucleus and cytoplasm of the leukemia-infected and normal blood cell images, extracted using the bounding box technique, are given in Fig. 6.

Fig. 6. (a) Nucleus extracted image (b) Cytoplasm extracted image (c) Sub image Nucleus 1 (d) Sub image Nucleus 2 (e) Sub image Nucleus 3

The extracted shape features (Area, Perimeter, Eccentricity, Circularity, Solidity and Nucleus to Cytoplasm Ratio) of Nucleus 1, 2 and 3 are shown in Table 2. The circularity value is less than 0.90, which indicates that the nucleus is not circular and that its shape is distorted. An eccentricity value close to one indicates that the nucleus is a blast. A solidity value below one indicates irregular boundaries, which again substantiates that the input image contains a blast. Since pre-cancerous cells have an increased nucleus to cytoplasm ratio, the high NCR values of Nucleus 1, 2 and 3 in turn indicate blasts.

Table 2. Geometric and size ratio features of nucleus and cytoplasm [Blast - Eccentricity ≈ 1, Solidity < 1 and Circularity < 0.90 & NCR > 1]
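
The blast criteria stated in the caption of Table 2 can be written as a simple rule check. This sketch is only an illustration of those criteria, not the SVM classifier used in the paper; in particular, the cut-off `ecc_threshold` that stands in for "Eccentricity ≈ 1" is an assumed value, and the function names are hypothetical.

```python
def nc_ratio(nucleus_area, cytoplasm_area):
    """Eq. (10): nucleus area divided by cytoplasm area."""
    return nucleus_area / cytoplasm_area


def meets_blast_criteria(ecc, sol, circ, ncr, ecc_threshold=0.8):
    """Check the Table 2 criteria; ecc_threshold is an assumed cut-off."""
    return (ecc >= ecc_threshold   # eccentricity close to 1
            and sol < 1.0          # irregular boundary
            and circ < 0.90        # nucleus not circular
            and ncr > 1.0)         # nucleus larger than cytoplasm
```

A nucleus with, say, eccentricity 0.92, solidity 0.85, circularity 0.70 and NCR 1.5 satisfies all four conditions, whereas a nearly circular nucleus with NCR below 1 does not.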

4.4 Classification Phase

The Support Vector Machine (SVM), proposed by Vapnik, is a well-known pattern recognition tool that has been applied in various areas such as text mining, bioinformatics, image classification, cancer diagnosis and feature selection [18]. After the features of the nucleus and cytoplasm have been extracted, SVM classification is performed to classify the input image as normal or abnormal.

To ensure the effectiveness of the classifier and segmentation the following parameters are calculated.

True Positive (TP) = Number of images that have leukemia and are detected as leukemia

True Negative (TN) = Number of images that do not have leukemia and are not detected as leukemia

False Positive (FP) = Number of images that do not have leukemia but are detected as leukemia

False Negative (FN) = Number of images that have leukemia but are not detected as leukemia [18].
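
Sensitivity, specificity and accuracy follow from these four counts by their standard definitions, sketched below. The counts in the usage note are hypothetical values chosen for illustration and are not taken from the paper's confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from the four counts."""
    sensitivity = tp / (tp + fn)                 # detected fraction of leukemia images
    specificity = tn / (tn + fp)                 # correctly rejected fraction of normal images
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall fraction classified correctly
    return sensitivity, specificity, accuracy
```

For instance, with the hypothetical counts TP = 8, TN = 5, FP = 1 and FN = 0, the sensitivity is 8/8, the specificity 5/6 and the accuracy 13/14.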

Fig. 7 shows the classification of the input image as AML. Table 3 shows the results of the proposed algorithm with the SVM classifier: a sensitivity of 100.00%, an accuracy of 92.85% and a specificity of 83.33%, which indicates good performance of the proposed method.

Fig. 7. Classification of input image as AML

Table 3. Performance analysis of proposed method using SVM classifier

Finally, the performance of the proposed method is analysed by calculating full-reference, pixel-difference-based measures, namely the Peak Signal to Noise Ratio (PSNR) and the Mean Square Error (MSE). The PSNR of an image is the ratio between the maximum possible power of the image and the power of the corrupting noise; the higher the PSNR, the better the quality of the image [17]. The MSE measures the gray-level difference between the pixels of the ideal and the distorted images [22]. From Table 4 and the comparison chart in Fig. 8, it can be seen that the proposed method yields high PSNR and low MSE values, which indicates that segmentation with the proposed method gives better results than the standard FCM, HFCMCCE [8] and EHFCMCCE [18].
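
The two measures can be computed as sketched below, using the usual definitions for 8-bit images (peak value 255, PSNR in decibels). The image representation as a list of rows and the function names are assumptions made for the example.

```python
import math


def mse(reference, test):
    """Mean squared gray-level difference between two equally sized images."""
    n_pixels = len(reference) * len(reference[0])
    total = sum((a - b) ** 2
                for ref_row, test_row in zip(reference, test)
                for a, b in zip(ref_row, test_row))
    return total / n_pixels


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)
```

A lower MSE always gives a higher PSNR for a fixed peak value, which is why the two columns of a comparison table move in opposite directions.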

Table 4. Performance analysis of full reference and pixel difference based measure on existing and proposed algorithm [Ideal values: PSNR-High, MSE-Low]
Fig. 8. Performance analysis of full reference and pixel difference based measure on existing and proposed algorithm

5 Conclusion

The results confirm that the proposed CPSO-SSFCM algorithm, developed for leukemia image segmentation, performs well in comparison with the existing methods. The cluster heads are optimized using CPSO, and they converge within 4 iterations, which demonstrates the efficiency of the proposed algorithm. The segmented WBC, nucleus and cytoplasm are then analysed using the geometric and size ratio measures. Subsequently, classification is performed to distinguish normal from leukemia-infected images. The results of the SVM classifier are examined in terms of sensitivity, specificity, accuracy and the ROC curve. Finally, the effectiveness of the algorithm is assessed using full-reference, pixel-based measures such as PSNR and MSE. As future work, the algorithm can be extended to identify the subtype of leukemia, and additional measures, such as statistical and textural features, can be included in the analysis phase.