1 Introduction

The purpose of medical imaging is to create an intelligible visual representation of information of a medical nature. This problem fits more broadly within the framework of scientific imaging and related cyber-physical systems [1, 2]. The objective is to represent a large amount of data from a multitude of sources in a relatively simple format suited to a given imaging mode or modes [3, 4]. Among the multitude of medical images, the present investigation focuses on the characterization of microscopic images by texture.

Texture is very often seen as a nuisance: it is characterized by intensity transitions, yet these transitions carry little information about object contours. Various methods exist to extract the characteristics of image textures [5,6,7,8,9].

Texture, almost omnipresent in images and particularly in microscopic medical images (IMM), plays a paramount role in the analysis, segmentation, classification, representation, and characterization of images. Although it has interested many researchers, and many recent works [10,11,12] have tackled these issues, the area is still not fully explored.

Feature extraction (FE) represents an image through a set of features (a feature vector), with or without some dimensionality reduction. When the acquisition algorithm handles input data that are too bulky and contain redundancy, the input can be transformed into a compact representation relying on a set of features. Hence, FE expresses the input data through a basis consisting of a set of image-related features. If the extracted features are chosen sensibly (manually designed features of this kind are known as handcrafted features), the feature set will capture the relevant information in the input data satisfactorily. Consequently, the desired task can be performed with the reduced representation instead of the full-size input. Features typically encompass information about gray intensities, shape, texture, or context. One must first extract some image features to categorize an object in a picture. Figure 9.1 depicts the processing stages of the proposed scheme [11,12,13,14].

Fig. 9.1 Depiction of the feature extraction process

The extracted blood cell features become the input to a classification stage that automatically categorizes the cells according to hematological models. The classification module should identify the blood cells relying on the features extracted from real images. Noise in the images can impair the classification.

The statistical methods [15, 16] study the relationships between each pixel and its neighbors. These procedures are well suited to the study of fine structures without apparent regularity. Three major statistical approaches will be studied in this work: first-order, second-order, and higher-order methods. In this manuscript, the order of a technique [17] is the number of pixels involved in the evaluation of each result [18].

The rest of the text is organized as follows. Section 9.2 overviews image analysis and the gray-level co-occurrence matrix (GLCM) for textural analysis applied to the biomedical field. This section summarizes the five main Haralick texture features computed from the GLCM for bio-image textural analysis with a specific database of blood cell images. Section 9.3 details the main steps of the proposed analysis, presents experimental results, and discusses a comparative study. This part of the text examines the detection of abnormal blood cells that may be cancerous. Finally, Sect. 9.4 presents the conclusion and future work.

2 Methods and Materials

Many applications employ optical microscopy, including blood cancer cell detection. These applications require high-quality data for accurate understanding and analysis of cancer cells. This work uses images provided by optical microscopy for the pre-diagnosis of abnormal blood cells, using textural features to distinguish between the different grades of cancer cells (Fig. 9.2). Analyzing the textures and structures present in the bio-images of the samples supports a diagnosis because different degrees of cancer malignancy correspond to different structural patterns and apparent textures. We propose to apply Haralick's texture features based on the GLCM to various types of blood cell images.

Fig. 9.2 Blood smear for normal cells (left) and cancerous cells (right) [9]

2.1 Textural Analysis Based on Gray-Level Co-occurrence Matrix (GLCM)

The authors consider the GLCM parameters as part of a statistical textural analysis of bio-images in order to extract the different characteristics of texture [19]. The GLCM helps to relieve the burden of dealing with long feature descriptors [3, 4].

2.1.1 Gray-Level Co-occurrence Matrix (GLCM)

The GLCM, otherwise known as the gray-level spatial dependence matrix, is a statistical way of examining textures that takes the spatial relationship of pixels into account. The GLCM characterizes the texture of an image by evaluating how often pairs of pixels with particular values and a specified spatial relationship occur in the image. These counts form the GLCM, from which statistical metrics are then derived. Unlike the GLCM, texture filter functions cannot provide information about shape, i.e., the spatial relationships of pixels within an image. The GLCM is an N × N square matrix, where N is the number of gray levels of the image. The GLCM holds the probabilities P_d(i, j) of transition from a pixel of gray intensity i to a pixel of gray intensity j. The two pixels are separated by a translation vector defined by a direction r and a distance d. Commonly used values are r ∈ {0°, 45°, 90°, 135°} and d ∈ {1, 2, 3, 4}. The GLCM is widely used in texture description and hinges on the repeated occurrence of some gray-level configuration in the texture, which varies rapidly with distance in fine textures and slowly in coarse textures. The GLCM definition is as follows:

$$ {P}_d\left(i,j\right)=\left|\left\{\left(\left(r,s\right),\left(t,v\right)\right):I\left(r,s\right)=i,\ I\left(t,v\right)=j\right\}\right|, $$
(9.1)

where

  • (r, s) and (t, v) are image coordinates with (t, v) = (r + dx, s + dy).

  • d is the displacement vector (dx, dy).

  • |.| denotes the cardinality of the set.
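As an illustration of Eq. (9.1), the following minimal Python sketch counts the co-occurrences for one displacement. It rests on several assumptions that are not stated in the text: r and t index rows while s and v index columns, the image is a list of rows of integer gray levels in [0, levels), and the direction r maps to a unit step in the usual way (0° to (0, 1), 45° to (−1, 1), 90° to (−1, 0), 135° to (−1, −1)). The names `UNIT_STEPS`, `displacement`, and `glcm` are illustrative and do not come from the authors' implementation.

```python
# Assumed mapping from a direction in degrees to a unit (dx, dy) step, where
# dx is the row offset and dy the column offset (row indices grow downward).
UNIT_STEPS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def displacement(angle, distance):
    """Return the displacement vector d = (dx, dy) for a direction and a distance."""
    ux, uy = UNIT_STEPS[angle]
    return ux * distance, uy * distance


def glcm(image, dx, dy, levels):
    """Count the pairs ((r, s), (t, v)) with I(r, s) = i and I(t, v) = j.

    (t, v) = (r + dx, s + dy); pairs whose second pixel falls outside the
    image are skipped. The result is a levels x levels matrix of raw counts.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for s in range(cols):
            t, v = r + dx, s + dy
            if 0 <= t < rows and 0 <= v < cols:
                counts[image[r][s]][image[t][v]] += 1
    return counts


# Small hypothetical 4 x 4 image with four gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
dx, dy = displacement(0, 1)  # horizontal neighbor at distance 1
for row in glcm(image, dx, dy, levels=4):
    print(row)
```

The sketch keeps the matrix as raw, non-symmetric counts, exactly as Eq. (9.1) is written; dividing every entry by the total count would give the transition probabilities mentioned above.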

The number of elements in the GLCM depends on the number of gray levels in the image. For instance, a 256-gray-level image yields a GLCM with 256 × 256 elements, which increases the effort needed to manipulate the GLCM.

For that reason, the gray-level resolution of the images is often reduced in practice to 8, 16, or 32 levels. Several parameters computed from the GLCM then characterize the spatial texture.
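The gray-level reduction can be sketched in a few lines of Python. This is only one possible scheme, a uniform requantization chosen here for illustration; the text does not say which quantization the authors used, and the function name `quantize` and its parameters are hypothetical.

```python
def quantize(image, in_levels=256, out_levels=16):
    """Map gray values in [0, in_levels) uniformly onto [0, out_levels).

    Assumes integer pixel values; each output level covers an equal-width
    range of input levels, so the resulting GLCM shrinks from
    in_levels x in_levels to out_levels x out_levels.
    """
    return [[pixel * out_levels // in_levels for pixel in row] for row in image]


# Hypothetical 8-bit pixel values reduced to 16 gray levels.
print(quantize([[0, 15, 16, 255]], in_levels=256, out_levels=16))
```

With 16 output levels, the co-occurrence matrix has 16 × 16 = 256 entries instead of 256 × 256 = 65,536.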

2.2 Haralick Parameters Extraction

The GLCM contains a large amount of information that is difficult to exploit directly. Therefore, 14 parameters (as defined by Haralick [12]) can be calculated from the GLCM to describe the textures more conveniently. This study computes and uses only the following five main Haralick coefficients (or parameters) from the GLCM for textural analysis: energy (ENE), contrast (CST), entropy (ENT), correlation (COR), and homogeneity (HOM), described as follows [19,20,21]:

  1. Energy: It measures the texture uniformity. It takes high values when the gray-level distribution is constant or periodic, as given by the expression below:

    $$ \mathrm{ENE}=\sum \limits_i\sum \limits_j{P}_d{\left(i,j\right)}^2. $$
    (9.2)
  2. Contrast: This feature measures the intensity contrast, or the local variations present in an image, and indicates the fineness of the texture. It is largely uncorrelated with energy, as illustrated by the next equation:

    $$ \mathrm{CST}=\sum \limits_i\sum \limits_j{\left(i-j\right)}^2{P}_d\left(i,j\right). $$
    (9.3)
  3. Entropy: This parameter gauges the disorder within the image. It attains high values for a random texture and is strongly and inversely correlated with energy. The following expression defines this coefficient:

    $$ \mathrm{ENT}=-\sum \limits_i\sum \limits_j{P}_d\left(i,j\right)\log \left({P}_d\left(i,j\right)\right). $$
    (9.4)
  4. Correlation: This feature estimates the linear dependency (relative to d) of the gray levels in the image. It is uncorrelated with the energy and entropy parameters. The equation below specifies this parameter:

    $$ \mathrm{COR}=\sum \limits_{i,j=1}^N{P}_d\left(i,j\right)\left[\frac{\left(i-{m}_i\right)\left(j-{m}_j\right)}{\sqrt{\sigma_i^2{\sigma}_j^2}}\right],\quad \mathrm{with} $$
    (9.5)
    $$ \left\{\begin{array}{l}{m}_i=\sum \limits_{i,j=1}^Ni\,{P}_d\left(i,j\right)\\ {}{m}_j=\sum \limits_{i,j=1}^Nj\,{P}_d\left(i,j\right)\end{array}\right. $$
    (9.6)

    and

    $$ \left\{\begin{array}{l}{\sigma}_i^2=\sum \limits_{i,j=1}^N{P}_d\left(i,j\right){\left(i-{m}_i\right)}^2\\ {}{\sigma}_j^2=\sum \limits_{i,j=1}^N{P}_d\left(i,j\right){\left(j-{m}_j\right)}^2\end{array}\right. $$
    (9.7)
  5. Homogeneity: It measures how close the distribution of the GLCM elements is to the GLCM diagonal. The next equation states this coefficient:

    $$ \mathrm{HOM}=\sum \limits_i\sum \limits_j\frac{{P}_d\left(i,j\right)}{1+{\left(i-j\right)}^2}. $$
    (9.8)

    All these attributes are defined for a given displacement d, whose choice is very important [11] for obtaining a meaningful result. Thus, for each pixel, we define a vector of attributes v_i (energy, entropy, and so on).
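The five coefficients of Eqs. (9.2) to (9.8) can be sketched in plain Python as follows. The sketch makes assumptions that the text leaves open: the co-occurrence counts are first normalized so that they sum to 1 (so that P_d behaves as a probability), the logarithm is the natural logarithm, terms with P_d(i, j) = 0 are skipped in the entropy (the usual 0 log 0 = 0 convention), and the correlation is reported as NaN when either variance is zero. Indices run from 0 rather than 1; this does not affect any of the five values, since they depend on i and j only through differences. The function name `haralick_features` is illustrative.

```python
import math


def haralick_features(counts):
    """Return ENE, CST, ENT, COR and HOM for a square co-occurrence matrix."""
    n = len(counts)
    total = sum(sum(row) for row in counts)
    # Assumption: normalize the raw counts into probabilities.
    cells = [(i, j, counts[i][j] / total) for i in range(n) for j in range(n)]

    ene = sum(p * p for _, _, p in cells)                       # Eq. (9.2)
    cst = sum((i - j) ** 2 * p for i, j, p in cells)            # Eq. (9.3)
    ent = -sum(p * math.log(p) for _, _, p in cells if p > 0)   # Eq. (9.4)
    hom = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)      # Eq. (9.8)

    m_i = sum(i * p for i, _, p in cells)                       # Eq. (9.6)
    m_j = sum(j * p for _, j, p in cells)
    var_i = sum(p * (i - m_i) ** 2 for i, _, p in cells)        # Eq. (9.7)
    var_j = sum(p * (j - m_j) ** 2 for _, j, p in cells)
    denom = math.sqrt(var_i * var_j)
    if denom > 0:
        cor = sum(p * (i - m_i) * (j - m_j) for i, j, p in cells) / denom  # Eq. (9.5)
    else:
        cor = float("nan")  # undefined for a constant window

    return {"ENE": ene, "CST": cst, "ENT": ent, "COR": cor, "HOM": hom}


# Hypothetical 2 x 2 co-occurrence counts concentrated on the diagonal.
print(haralick_features([[1, 0], [0, 1]]))
```

For the diagonal example, the contrast is 0 and the homogeneity is 1, which matches the interpretation of both coefficients given in the list above.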

2.3 Databases

The authors used images downloaded from [22] and a sample of blood cells. The results discussed next apply to the image in Fig. 9.3.

Fig. 9.3 Example of an input picture to the Haralick's FE process

3 Results and Discussion

The authors computed the parameters of the co-occurrence matrix in the MATLAB R2012a environment and tested them on a typical PC with a Pentium(R) Dual-Core 2.20 GHz CPU and 4 GB of RAM.

The analysis window size must fulfil two conflicting criteria: it must be as small as possible to lessen the risk of blending different textures, yet as large as possible to yield robust and significant statistics. After several trials, the chosen window size is 8 × 8. First, the GLCM representation for each processed image pixel is given below (Tables 9.1, 9.2, and 9.3):

Table 9.1 Representation of an 8 × 8 image window
Table 9.2 Calculated GLCM (8 × 8)
Table 9.3 Statistical comparison by texture indices of second order

The GLCM calculated for the previous image window is given in Table 9.2.

The GLCM analysis results demonstrate that the sum of the coefficients P(i, j) is the same for every block of the same image. In this case, Sp = ∑ P(i, j) = 56. Computing the five criteria on 8 × 8 blocks with a displacement of 1 gives the results below; the following figures show the distributions of the five parameters as histograms (Figs. 9.4, 9.5, 9.6, 9.7, and 9.8).
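The constant sum can be checked with a short Python sketch. The value 56 is consistent with a horizontal or vertical displacement of one pixel inside an 8 × 8 window, since each of the 8 rows (or columns) contains 7 neighboring pairs; this reading is an assumption, because the direction is not stated here. The function names are illustrative.

```python
def pair_count(height, width, dx, dy):
    """Closed form: number of pixel pairs separated by (dx, dy) in a window."""
    return max(height - abs(dx), 0) * max(width - abs(dy), 0)


def pair_count_brute_force(height, width, dx, dy):
    """Same quantity, obtained by enumerating every pair that stays inside."""
    return sum(
        1
        for r in range(height)
        for s in range(width)
        if 0 <= r + dx < height and 0 <= s + dy < width
    )


# 8 x 8 window, displacement of one pixel along a row: 8 * 7 pairs.
print(pair_count(8, 8, 0, 1), pair_count_brute_force(8, 8, 0, 1))
```

Because the count depends only on the window size and the displacement, and not on the pixel values, every 8 × 8 block of the image gives the same sum. Under the same assumptions, a diagonal displacement of one pixel would give 7 × 7 = 49 pairs instead.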

Fig. 9.4 Histogram of energy

Fig. 9.5 Histogram of contrast

Fig. 9.6 Histogram of entropy

Fig. 9.7 Histogram of correlation

Fig. 9.8 Histogram of homogeneity

The results above present the five GLCM parameters in the form of histograms. The visual quality of the bio-image textures from the database improved considerably, as shown by the evolution of the pixel gray levels for each computed index. The comparison between a healthy cell and cancerous cells for the detection of abnormal regions (Table 9.3) confirms the effectiveness of the chosen parameters, computed from the co-occurrence matrix as described above to form the histograms.

4 Conclusion

This work applies a method, grounded in a mathematical basis, to perform bio-image textural analysis successfully and accurately. The resulting feature vector has five entries: energy (ENE), contrast (CST), entropy (ENT), correlation (COR), and homogeneity (HOM). This software scheme employs a higher-order GLCM statistical method that relies essentially on the study of the relations between each pixel and its neighbors (to spot fine textures) and on the spatial distribution of the gray levels. This article offers only statistical information on the images and is therefore considered a higher-order strategy. Such methods make it possible to quantify non-visible evidence (data that humans cannot perceive) and thus increase the opportunity to interpret more of the image data, particularly for textured images. Although the GLCM scheme gives good results, it demands many calculations, each of which is already relatively heavy. The results obtained enrich the diagnostic data thanks to the calculated parameters. The authors also suggest a hardware implementation: FPGA technology can speed up the computation of the features to attain both high performance and flexibility in real-time processing [23]. Even though working with a reduced set of descriptors speeds up the diagnosis, in the future a suspicious result could be directed to further processing stages for additional analysis. Since the recommended method is simple and the feature vector contains only five entries, the algorithm has low dimensionality and is better suited to hardware implementation than some sophisticated deep learning realizations [24,25,26,27,28,29,30,31].