1 Introduction

Breast cancer is the leading cause of cancer deaths among women [1]. Today, the only way to reduce this mortality is early detection using imaging techniques [2, 3]. Mammography is one of the most effective tools for the prevention and early detection of breast cancer [4,5,6]. It is a screening tool used to localize suspicious tissues in the breast, such as microcalcifications and masses, and it also allows the detection of architectural distortion and bilateral asymmetry [7]. A mass is defined as a space-occupying lesion seen in at least two different projections [8]. Mass density can be high, isodense, low or fat-containing; mass margin can be circumscribed, microlobulated, indistinct or spiculated; and mass shape can be round, oval, lobular or irregular [9]. In recent years, screening campaigns have been organized in several countries. These campaigns generate a huge stream of mammograms, and it remains difficult for expert radiologists to provide accurate and consistent analysis. Therefore, computer-aided diagnosis (CAD) systems have been developed to help radiologists detect lesions and make diagnostic decisions [9,10,11,12,13,14].

A generic image-analysis CAD system includes two main stages: a feature extraction stage followed by a classification stage. In the literature, a variety of techniques have been studied to describe breast abnormalities in digital mammograms, and much research has addressed textural and shape analysis of mammographic images [1, 9, 14,15,16,17,18,19]. In this paper, the main objective is to determine which features optimize the classification performance.

In the process of pattern recognition, the goal is to achieve the best classification rate using the required features. Extracting these features from regions of the image is one of the important phases in this process. Features are the input variables used for classification, and the quality of a feature is related to its ability to discriminate observations from different classes. The characterization task often generates a large number of features, and the resulting feature space may include many irrelevant ones. This induces greater computational cost, occupies considerable storage space and decreases the classification performance. Thus, a feature selection phase is needed to avoid these problems. In this study, we propose an automated computer scheme to select an optimal subset of features for mass classification in digital mammography. The obtained results can also be used in other applications such as segmentation and content-based image retrieval.

The remaining part of this paper is organized as follows. The next section gives an overview of the proposed methodology. Sections 3 to 5 describe the process of selecting features. Sections 6 and 7 present the combination of three classifiers and the different measures used to evaluate the classification performance. The experimental results are evaluated and discussed in Sect. 8. Finally, concluding remarks are given in the last section.

2 Methodology

Our approach in this study is composed of two main stages: characterization and classification. Each of these stages is explained in detail by the flowchart given in Fig. 1.

Fig. 1

Flowchart of the proposed methodology performed in this study. a Characterization stage and b classification stage

2.1 Feature extraction

Various features have been proposed in the literature for the characterization of masses. These features are organized into families according to their nature [16]. The majority of studies focus on one family and analyze its performance. In this work, we propose to study the performance of a set of feature families. Then, we make a comparison between these families in order to select the best feature set. Finally, the most discriminant features are selected from the obtained feature set. The process is described in Fig. 2. {FS1, FS2, …, FSi} denotes the set of feature families. For each feature family, we selected its most relevant features using a number of feature selection methods (FSM1, FSM2, …, FSMj). After this step, we obtained for each FS the set {FSM1(FSi)}, {FSM2(FSi)}, …, {FSMj(FSi)} where FSMj(FSi) denotes FSi selected features using FSMj. Obtained results were used as input to the next step which is the selection of the best FSM that selects the most relevant features for each FS. Finally, we selected the optimal subset of features from the obtained results.

Fig. 2

Feature selection steps
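As an illustration, the two-level selection loop described above can be sketched as follows; all names are hypothetical, and `evaluate` stands for whatever classification score is used to compare the reduced feature sets:

```python
# Sketch of the two-level selection pipeline of Fig. 2 (hypothetical names).
# Each feature family FSi is reduced by every selection method FSMj; the
# best (FSMj, FSi) pair is kept according to an evaluation score.

def select_best(feature_sets, selection_methods, evaluate):
    """feature_sets: dict name -> feature matrix; selection_methods: dict
    name -> callable returning a column subset; evaluate: callable giving
    a classification score for a reduced matrix."""
    results = {}
    for fs_name, X in feature_sets.items():
        scored = {}
        for fsm_name, fsm in selection_methods.items():
            X_sel = fsm(X)                  # FSMj(FSi): reduced feature set
            scored[fsm_name] = (evaluate(X_sel), X_sel)
        best_fsm = max(scored, key=lambda k: scored[k][0])
        results[fs_name] = (best_fsm, scored[best_fsm][1])
    return results
```

The optimal subset of features is then assembled from the winning `(FSM, FS)` pairs.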

Texture and shape are the major criteria for the discrimination between the benign and malignant masses. In this study, we have followed two main kinds of description techniques. The first employs texture features extracted from ROIs. The second is based on computer-extracted shape features of masses, since morphology is one of the most important factors in breast cancer diagnosis.

2.1.1 Texture analysis

Texture analysis is performed in each ROI selected in the previous phase. The texture feature space can be divided into two subspaces: statistical and frequential features.

a) Statistical features: The statistical textural features that we have used in this study can be grouped into five sets based on what they are derived from: First-Order Statistics (FOS), Gray-Level Co-occurrence Matrices (GLCM), Gray-Level Difference Matrices (GLDM), Gray-Level Run-Length Matrices (GLRLM) and Tamura features.

First-Order Statistics features: FOS provides different statistical properties of the intensity histogram of an image [20]. They depend only on individual pixel values and not on the interaction of neighboring pixels values. In this study, six first-order textural features were calculated: Mean value of gray levels, Mean square value of gray levels, Standard Deviation, Variance, Skewness and Kurtosis.

Fig. 3

Four directions of adjacency for calculating the GLCM features

Denoting by I(x,y) the pixel matrix of the X × Y image subregion, the formulae used for the FOS feature metrics are as follows:

  • Mean value of gray levels:

    $$M = \frac{1}{XY}\mathop \sum \limits_{x = 1}^{X} \mathop \sum \limits_{y = 1}^{Y} I\left( {x,y} \right)$$
    (1)
  • Mean square value of gray levels:

    $${\text{MS}} = \frac{1}{XY}\mathop \sum \limits_{x = 1}^{X} \mathop \sum \limits_{y = 1}^{Y} \left[ {I\left( {x,y} \right)} \right]^{2}$$
    (2)
  • Standard Deviation:

    $${\text{SD}} = \sqrt {\frac{1}{{\left( {XY - 1} \right)}}\mathop \sum \limits_{x = 1}^{X} \mathop \sum \limits_{y = 1}^{Y} \left[ {I\left( {x,y} \right) - M} \right]^{2} }$$
    (3)
  • Variance:

    $$V = \frac{1}{{\left( {XY - 1} \right)}}\mathop \sum \limits_{x = 1}^{X} \mathop \sum \limits_{y = 1}^{Y} \left[ {I\left( {x,y} \right) - M} \right]^{2}$$
    (4)
  • Skewness:

    $$S = \frac{1}{XY}\mathop \sum \limits_{x = 1}^{X} \mathop \sum \limits_{y = 1}^{Y} \left[ {\frac{{I\left( {x,y} \right) - M}}{SD}} \right]^{3}$$
    (5)
  • Kurtosis:

    $$K = \left\{ {\frac{1}{XY}\mathop \sum \limits_{x = 1}^{X} \mathop \sum \limits_{y = 1}^{Y} \left[ {\frac{{I\left( {x,y} \right) - M}}{SD}} \right]^{4} } \right\} - 3$$
    (6)
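As an illustration, Eqs. 1–6 can be computed with NumPy as in the following sketch, assuming a 2-D grayscale ROI array; note that, as above, the Standard Deviation and Variance use the (XY − 1) denominator:

```python
# Minimal NumPy sketch of the six FOS features (Eqs. 1-6).
import numpy as np

def fos_features(roi):
    roi = np.asarray(roi, dtype=float)
    n = roi.size
    m = roi.mean()                          # Eq. 1: mean of gray levels
    ms = np.mean(roi ** 2)                  # Eq. 2: mean square
    v = np.sum((roi - m) ** 2) / (n - 1)    # Eq. 4: variance
    sd = np.sqrt(v)                         # Eq. 3: standard deviation
    s = np.mean(((roi - m) / sd) ** 3)      # Eq. 5: skewness
    k = np.mean(((roi - m) / sd) ** 4) - 3  # Eq. 6: kurtosis
    return {"M": m, "MS": ms, "SD": sd, "V": v, "S": s, "K": k}
```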

Gray-Level Co-occurrence Matrix features: The second feature group is a robust statistical tool for extracting second-order texture information from images [15, 21]. The GLCM characterizes the spatial distribution of gray levels in the selected ROI: an element at location (i,j) of the GLCM represents the joint probability of the occurrence of gray levels i and j at a specified orientation θ and a specified distance d from each other (Fig. 3). Thus, for different θ and d values, different GLCMs are generated. Figure 4 shows how a GLCM with θ = 0° and d = 1 is generated: the number 4 in the co-occurrence matrix indicates that there are four occurrences of a pixel with gray level 3 immediately to the right of a pixel with gray level 6.

Fig. 4

Construction principle of co-occurrence matrix. a Initial image, b co-occurrence matrix (θ = 0° and d = 1)

Nineteen features were derived from each GLCM. Specifically, the features studied are: Mean, Variance, Entropy, Contrast, Angular Second Moment (also called Energy), Dissimilarity, Correlation, Inverse Difference Moment (also called Homogeneity), Diagonal Moment, Sum Average, Sum Entropy, Sum Variance, Difference Entropy, Difference Mean, Difference Variance, Information Measure of Correlation 1, Information Measure of Correlation 2, Cluster Prominence and Cluster Shade.

Denoting by Ng the number of gray levels in the image, by p(i,j) the normalized co-occurrence matrix, and by px(i) and py(j) the row and column marginal probabilities, respectively, obtained by summing p(i, j) over its columns or rows:

$$p_{x} \left( i \right) = \mathop \sum \limits_{j = 1}^{{N_{g} }} p\left( {i,j} \right)$$
(7)
$$p_{y} \left( j \right) = \mathop \sum \limits_{i = 1}^{{N_{g} }} p\left( {i,j} \right)$$
(8)
$$p_{x + y} \left( k \right) = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} p\left( {i,j} \right);\quad k = i + j = 2,3, \ldots ,2N_{g}$$
(9)
$$p_{x - y} \left( k \right) = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} p\left( {i,j} \right);\quad k = \left| {i - j} \right| = 0,1, \ldots ,N_{g} - 1$$
(10)

The formulae used to calculate the GLCM features are as follows:

  • Mean:

    $$M = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} i p\left( {i,j} \right)$$
    (11)
  • Variance:

    $$V = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {\left( {i - M} \right)^{2} p\left( {i,j} \right)} \right]$$
    (12)
  • Entropy:

$${\text{Ent}} = - \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {p\left( {i,j} \right) {\text{log}}\left( {p\left( {i,j} \right)} \right)} \right]$$
    (13)
  • Contrast:

    $${\text{Cont}} = \mathop \sum \limits_{n = 0}^{{N_{g} - 1}} n^{2} \left\{ {\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} p\left( {i,j} \right) ; \left| {i - j} \right| = n } \right\}$$
    (14)
  • Angular Second Moment (also called Energy):

    $${\text{ASM}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left\{ {p\left( {i,j} \right)} \right\}^{2 }$$
    (15)
  • Dissimilarity:

$${\text{Diss}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left( {\left| {i - j} \right| p\left( {i,j} \right)} \right)$$
    (16)
  • Correlation:

    $${\text{Corr}} = \frac{{\left[ {\mathop \sum \nolimits_{i = 1}^{{N_{g} }} \mathop \sum \nolimits_{j = 1}^{{N_{g} }} \left( {ij} \right)p\left( {i,j} \right)} \right] - \mu_{x} \mu_{y} }}{{\sigma_{x} \sigma_{y} }}$$
    (17)

where μx and σx are the Mean value and Standard Deviation of px, and μy and σy are those of py:

    $$\mu_{x} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \left[ {i\mathop \sum \limits_{j = 1}^{{N_{g} }} p\left( {i,j} \right)} \right]$$
    (18)
    $$\mu_{y} = \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {j\mathop \sum \limits_{i = 1}^{{N_{g} }} p\left( {i,j} \right)} \right]$$
    (19)
$$\sigma_{x} = \sqrt {\mathop \sum \limits_{i = 1}^{{N_{g} }} \left[ {\left( {i - \mu_{x} } \right)^{2} \mathop \sum \limits_{j = 1}^{{N_{g} }} p\left( {i,j} \right)} \right]}$$
    (20)
$$\sigma_{y} = \sqrt {\mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {\left( {j - \mu_{y} } \right)^{2} \mathop \sum \limits_{i = 1}^{{N_{g} }} p\left( {i,j} \right)} \right]}$$
    (21)
  • Inverse Difference Moment (also called Homogeneity):

    $${\text{IDM}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {\frac{1}{{1 + \left( {i - j} \right)^{2} }}p\left( {i,j} \right)} \right]$$
    (22)
  • Diagonal Moment:

    $${\text{DM}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} (\frac{1}{2}\left( {\left| {i - j} \right|} \right) p\left( {i,j} \right))^{1/2}$$
    (23)
  • Sum Average:

    $${\text{SA}} = \mathop \sum \limits_{i = 2}^{{2N_{g} }} \left[ {ip_{x + y} \left( i \right)} \right]$$
    (24)
  • Sum Entropy:

    $${\text{SE}} = - \mathop \sum \limits_{i = 2}^{{2N_{g} }} \left[ {p_{x + y} \left( i \right) { \log }\left[ {p_{x + y} \left( i \right)} \right]} \right]$$
    (25)
  • Sum Variance:

    $${\text{SV}} = \mathop \sum \limits_{i = 2}^{{2N_{g} }} \left[ {\left( {i - SA} \right)^{2} p_{x + y} \left( i \right)} \right]$$
    (26)
  • Difference Entropy:

$${\text{DE}} = - \mathop \sum \limits_{i = 0}^{{N_{g} - 1}} \left[ {p_{x - y} \left( i \right) { \log }\left[ {p_{x - y} \left( i \right)} \right]} \right]$$
    (27)
  • Difference Mean:

$${\text{DMean}} = \mathop \sum \limits_{i = 0}^{{N_{g} - 1}} \left[ {i p_{x - y} \left( i \right)} \right]$$
    (28)
  • Difference Variance:

$${\text{DV}} = \mathop \sum \limits_{i = 0}^{{N_{g} - 1}} \left[ {\left( {i - {\text{DMean}}} \right)^{2} p_{x - y} \left( i \right)} \right]$$
    (29)
  • Information Measure of Correlation 1:

    $${\text{IMC}}1 = \frac{HXY - HXY1}{{max\left\{ {HX,HY} \right\}}}$$
    (30)

    where HX and HY are the entropy of px and py, respectively.

    $$HX = - \mathop \sum \limits_{i = 1}^{{N_{g} }} p_{x} \left( i \right)\log \left( {p_{x} \left( i \right)} \right)$$
    (31)
    $$HY = - \mathop \sum \limits_{j = 1}^{{N_{g} }} p_{y} \left( j \right)\log \left( {p_{y} \left( j \right)} \right)$$
    (32)

    and

    $$HXY = - \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {p\left( {i,j} \right) { \log }\left( {p\left( {i,j} \right)} \right)} \right]$$
    (33)
$$HXY1 = - \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {p\left( {i,j} \right) { \log }\left( {p_{x} \left( i \right)p_{y} \left( j \right)} \right)} \right]$$
    (34)
  • Information Measure of Correlation 2:

    $${\text{IMC}}2 = \left[ {1 - { \exp }\left[ { - 2\left( {HXY2 - HXY} \right)} \right]} \right]^{1/2}$$
    (35)

    where

    $$HXY2 = - \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left[ {p_{x} \left( i \right) p_{y} \left( j \right) { \log }\left( {p_{x} \left( i \right)p_{y} \left( j \right)} \right)} \right]$$
    (36)
  • Cluster Prominence:

    $${\text{CP}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left( {i + j - 2M} \right)^{4} p\left( {i,j} \right)$$
    (37)
  • Cluster Shade:

    $${\text{CS}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{g} }} \left( {i + j - 2M} \right)^{3} p\left( {i,j} \right)$$
    (38)
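To make the construction concrete, the following NumPy sketch builds a normalized GLCM for θ = 0° and d = 1 and evaluates three of the features above (Contrast, ASM and IDM, Eqs. 14, 15 and 22); gray levels are assumed to start at 0:

```python
import numpy as np

def glcm_0deg(img, levels):
    # Count horizontal pixel pairs (theta = 0 degrees, d = 1) and
    # normalize the counts into joint probabilities.
    img = np.asarray(img)
    P = np.zeros((levels, levels))
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        P[i, j] += 1
    return P / P.sum()

def glcm_features(P):
    i, j = np.indices(P.shape)
    return {
        "Cont": np.sum((i - j) ** 2 * P),       # Eq. 14: contrast
        "ASM": np.sum(P ** 2),                  # Eq. 15: angular second moment
        "IDM": np.sum(P / (1 + (i - j) ** 2)),  # Eq. 22: homogeneity
    }
```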

Gray-Level Difference Matrix features: The GLDM features are extracted from the gray-level difference vector of an image [22]. The GLDM vector is the histogram of the absolute differences of pixel pairs separated by a given displacement vector δ = (∆x,∆y), where Iδ(x,y) = |I(x,y) − I(x + ∆x, y + ∆y)| and ∆x and ∆y are integers. An element pδ(i) of the GLDM vector can be computed by counting the number of times each value of Iδ(x,y) occurs. In practice, the displacement vector δ = (∆x,∆y) is usually selected along one of the orientations 0°, 45°, 90° or 135° to obtain oriented texture features. The GLDM method is based on a probability density function of gray-level differences. In this study, five GLDM features were calculated: Mean, Contrast, Angular Second Moment, Entropy and Inverse Difference Moment.

Denoting by f(i|δ) the probability density associated with the possible values of Iδ, f(i|δ) = P(Iδ(x,y) = i), and by M the number of gray-level differences, the formulae used for the GLDM feature metrics are as follows:

  • Mean:

    $${\text{Mean}} = \mathop \sum \limits_{i = 1}^{M} i\,f(i|\delta )$$
    (39)
  • Contrast:

    $${\text{Cont}} = \mathop \sum \limits_{i = 1}^{M} i^{2}\,f\left( {i |\delta } \right)$$
    (40)
  • Angular Second Moment:

    $${\text{ASM}} = \mathop \sum \limits_{i = 1}^{M} \left[ {f(i|\delta )} \right]^{2}$$
    (41)
  • Entropy:

    $${\text{Ent}} = \mathop \sum \limits_{i = 1}^{M} - f\left( {i |\delta } \right)\ln \left( {f\left( {i |\delta } \right)} \right)$$
    (42)
  • Inverse Difference Moment:

    $${\text{IDM}} = \mathop \sum \limits_{i = 1}^{M} \frac{f(i|\delta )}{{i^{2} + 1}}$$
    (43)
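A minimal NumPy sketch of the GLDM vector and three of these features (Mean, Contrast and ASM, Eqs. 39–41), assuming an integer-valued ROI and a displacement (∆x, ∆y):

```python
import numpy as np

def gldm_features(img, dx, dy, levels):
    img = np.asarray(img, dtype=int)
    # Overlap the image with its shifted copy and take absolute differences.
    shifted = img[max(dy, 0):img.shape[0] + min(dy, 0),
                  max(dx, 0):img.shape[1] + min(dx, 0)]
    base = img[max(-dy, 0):img.shape[0] + min(-dy, 0),
               max(-dx, 0):img.shape[1] + min(-dx, 0)]
    diff = np.abs(base - shifted).ravel()
    f = np.bincount(diff, minlength=levels) / diff.size  # density f(i|delta)
    i = np.arange(f.size)
    return {"Mean": np.sum(i * f),       # Eq. 39
            "Cont": np.sum(i ** 2 * f),  # Eq. 40
            "ASM": np.sum(f ** 2)}       # Eq. 41
```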

Gray-Level Run-Length Matrix features: GLRLM provides information related to the spatial distribution of gray-level runs (i.e., pixel structures of same pixel value) within the image [22]. Each gray-level run can be characterized by its gray level, length and direction. Textural features extracted from GLRLM evaluate the distribution of small (short runs) or large (long runs) organized structures within ROIs. For each of the four directions θ (0°, 45°, 90° and 135°), we can generate a GLRLM. Figure 5 shows an example of constructing a GLRLM with θ = 135°.

Fig. 5

Construction principle of GLRLM. a Initial image, b GLRLM (θ = 135°)

Denoting by Ng the number of gray levels, Nr the maximum run length and p(i,j) the (i,j)th element of the run-length matrix for a specific angle θ and a specific distance d (i.e., pθ,d(i,j)), each element of the run-length matrix represents the number of pixels of run length j and gray level i. Gray-Level Run-Number Vector and Run-Length Run-Number Vector are defined as follows:

  • Gray-Level Run-Number Vector:

    $$p_{g} \left( i \right) = \mathop \sum \limits_{j = 1}^{{N_{r} }} p\left( {i,j} \right)$$
    (44)

    This vector represents the sum distribution of the number of runs with gray level i.

  • Run-Length Run-Number Vector:

    $$p_{r} \left( j \right) = \mathop \sum \limits_{i = 1}^{{N_{g} }} p\left( {i,j} \right)$$
    (45)

    This vector represents the sum distribution of the number of runs with run length j.

  • The total number of runs:

    $$N_{\text{runs}} = \mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} p\left( {i,j} \right)$$
    (46)

From each ROI, eleven GLRLM features are generated: Short Runs Emphasis (SRE), Long Runs Emphasis (LRE), Gray-Level Non-uniformity (GLN), Run-Length Non-uniformity (RLN), Run Percentage (RP), Low Gray-Level Run Emphasis (LGRE), High Gray-Level Run Emphasis (HGRE), Short Run Low Gray-Level Emphasis (SRLGE), Short Run High Gray-Level Emphasis (SRHGE), Long Run Low Gray-Level Emphasis (LRLGE) and Long Run High Gray-Level Emphasis (LRHGE).

The formulae used to calculate GLRLM features are as follows:

  • Short Runs Emphasis:

$${\text{SRE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} \frac{{p\left( {i,j} \right)}}{{j^{2} }} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{j = 1}^{{N_{r} }} \frac{{p_{r} \left( j \right)}}{{j^{2} }}$$
    (47)
  • Long Runs Emphasis:

    $${\text{LRE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} p\left( {i,j} \right).j^{2} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{j = 1}^{{N_{r} }} p_{r} \left( j \right).j^{2}$$
    (48)
  • Gray-Level Non-uniformity:

    $${\text{GLN}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \left[ {\mathop \sum \limits_{j = 1}^{{N_{r} }} p\left( {i,j} \right)} \right]^{2} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \left[ {p_{g} \left( i \right)} \right]^{2}$$
    (49)
  • Run-Length Non-uniformity:

    $${\text{RLN}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{j = 1}^{{N_{r} }} \left[ {\mathop \sum \limits_{i = 1}^{{N_{g} }} p\left( {i,j} \right)} \right]^{2} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{j = 1}^{{N_{r} }} \left[ {p_{r} \left( j \right)} \right]^{2}$$
    (50)
  • Run Percentage:

    $${\text{RP}} = \frac{{N_{\text{runs}} }}{{N_{\text{pixels}} }}$$
    (51)

    where Npixels is the total number of pixels in the image.

  • Low Gray-Level Run Emphasis:

    $${\text{LGRE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} \frac{{p\left( {i,j} \right)}}{{i^{2} }} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \frac{{p_{g} \left( i \right)}}{{i^{2} }}$$
    (52)
  • High Gray-Level Run Emphasis:

$${\text{HGRE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} p\left( {i,j} \right).i^{2} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} p_{g} \left( i \right).i^{2}$$
    (53)
  • Short Run Low Gray-Level Emphasis:

    $${\text{SRLGE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} \frac{{p\left( {i,j} \right)}}{{i^{2} . j^{2} }}$$
    (54)
  • Short Run High Gray-Level Emphasis:

    $${\text{SRHGE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} \frac{{p\left( {i,j} \right).i^{2} }}{{j^{2} }}$$
    (55)
  • Long Run Low Gray-Level Emphasis:

    $${\text{LRLGE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} \frac{{p\left( {i,j} \right).j^{2} }}{{i^{2} }}$$
    (56)
  • Long Run High Gray-Level Emphasis:

    $${\text{LRHGE}} = \frac{1}{{N_{\text{runs}} }}\mathop \sum \limits_{i = 1}^{{N_{g} }} \mathop \sum \limits_{j = 1}^{{N_{r} }} p\left( {i,j} \right).i^{2} . j^{2}$$
    (57)
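For illustration, the following NumPy sketch builds the horizontal (θ = 0°) run-length matrix and computes SRE and LRE (Eqs. 47–48); gray levels are assumed to lie in 1…Ng:

```python
import numpy as np
from itertools import groupby

def glrlm_0deg(img, n_gray):
    # P[i-1, j-1] counts runs of gray level i with run length j.
    img = np.asarray(img)
    n_run = img.shape[1]  # longest possible horizontal run
    P = np.zeros((n_gray, n_run))
    for row in img:
        for level, run in groupby(row):
            P[level - 1, len(list(run)) - 1] += 1
    return P

def sre_lre(P):
    n_runs = P.sum()                       # Eq. 46
    j = np.arange(1, P.shape[1] + 1)
    p_r = P.sum(axis=0)                    # Eq. 45: run-length run-number vector
    return (np.sum(p_r / j ** 2) / n_runs,  # Eq. 47: SRE
            np.sum(p_r * j ** 2) / n_runs)  # Eq. 48: LRE
```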

Tamura features: Tamura et al. [23, 24] defined six texture features: coarseness, contrast, directionality, line-likeness, regularity and roughness. The first three descriptors correspond to human visual perception; they are effective and frequently used to characterize textures.

b) Frequential features: The second texture feature subspace is based on transformations: texture is represented in a frequency domain rather than in the spatial domain of the image. Two transform-based methods are considered: the Gabor transform and the two-dimensional wavelet transform.

Gabor filters: Gabor filters are widely adopted to extract texture features from images [25, 26] and have been shown to be very efficient. Basically, Gabor filters are a group of wavelets, each capturing energy at a specific frequency and a specific direction. After applying Gabor filters to the ROI with different orientations at different scales, we obtain an array of magnitudes:

$$E\left( {m,n} \right) = \mathop \sum \limits_{x} \mathop \sum \limits_{y} \left| {G_{mn} \left( {x,y} \right)} \right|$$
(58)

where m, n and G represent, respectively, the scale, orientation and filtered image.

These magnitudes represent the energy content at different scales and orientations of the image. Texture features are obtained by calculating the mean and variation of the Gabor-filtered image. The following mean µmn and Standard Deviation σmn of the magnitudes of the transformed coefficients, where P × Q is the size of the filtered image, are used to characterize the texture of the region.

$$\mu_{mn} = \frac{{E\left( {m,n} \right)}}{P \times Q}$$
(59)
$$\sigma_{mn} = \frac{{\sqrt {\mathop \sum \nolimits_{x} \mathop \sum \nolimits_{y} \left( {\left| {G_{mn} \left( {x,y} \right) - \mu_{mn} } \right|} \right)^{2} } }}{P \times Q}$$
(60)

Five scales and six orientations are used in common implementation, and the features vector f is created using μmn and σmn as the feature components.

$$f = \left( {\mu_{00} , \sigma_{00} , \mu_{01} , \sigma_{01} , \ldots , \mu_{45} , \sigma_{45} } \right)$$
(61)
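The filter bank can be sketched in plain NumPy as follows; this uses a simplified real-valued Gabor kernel and direct correlation, and the kernel size and σ are illustrative choices, not values from this study:

```python
import numpy as np

def gabor_kernel(size, sigma, frequency, theta):
    # Real part of a Gabor wavelet: Gaussian envelope times a cosine
    # carrier oriented at angle theta.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * frequency * xr)

def correlate2d_valid(img, k):
    # Direct 'valid'-mode correlation (no padding).
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def gabor_energy_features(img, frequencies, orientations, size=5, sigma=2.0):
    # For each (scale, orientation) pair, compute the magnitude image
    # G_mn, then its mean (Eq. 59) and standard deviation (Eq. 60).
    img = np.asarray(img, float)
    feats = []
    for freq in frequencies:
        for theta in orientations:
            G = np.abs(correlate2d_valid(img, gabor_kernel(size, sigma, freq, theta)))
            mu = G.sum() / G.size                         # Eqs. 58-59
            sd = np.sqrt(np.sum((G - mu) ** 2)) / G.size  # Eq. 60
            feats += [mu, sd]
    return np.array(feats)
```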

Wavelets: Wavelets are very efficient for characterizing texture in images (especially mammographic images), which may contain different types of texture and require sufficiently detailed characterization. For this reason, they were chosen for use in this study. The discrete wavelet transform (DWT) of an image is a transform based on a tree structure (successive low-pass and high-pass filters) as shown in Fig. 6. DWT decomposes a signal into scales with different frequency resolutions. The resulting decomposition is shown in Table 1. The image is divided into four bands: LL (top left), LH (top right), HL (bottom left) and HH (bottom right). Low frequencies provide global information, while high frequencies provide the detailed information in the original image.

Fig. 6

Two-level wavelet decomposition tree (h: low-pass decomposition filter, g: high-pass decomposition filter, ↓2 down-sampling operation, A1, A2: approximated coefficient, D1, D2: detailed coefficient)

Table 1 Wavelet transform representation of an image (two levels)

Energy measures at different resolution levels are used to characterize the texture in the frequency domain.

To extract wavelet features, the type of wavelet mother function chosen is the Haar wavelet, and its mother function ψ(t) can be described as follows (Fig. 7):

Fig. 7

Haar wavelet

$$\psi \left( t \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill &\quad {0 \le t < \frac{1}{2}} \hfill \\ { - 1} \hfill &\quad {\frac{1}{2} \le t < 1} \hfill \\ 0 \hfill &\quad {\text{otherwise}} \hfill \\ \end{array} } \right.$$
(62)
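A one-level 2-D Haar decomposition and the corresponding sub-band energies can be sketched in plain NumPy (even image dimensions assumed; the averaging/differencing pairs play the roles of the h and g filters of Fig. 6):

```python
import numpy as np

def haar2d(img):
    img = np.asarray(img, float)
    # Columns: low-pass (average) and high-pass (difference) with down-sampling.
    lo = (img[:, ::2] + img[:, 1::2]) / 2
    hi = (img[:, ::2] - img[:, 1::2]) / 2
    # Rows: same filters applied to each half-band.
    LL = (lo[::2] + lo[1::2]) / 2
    LH = (lo[::2] - lo[1::2]) / 2
    HL = (hi[::2] + hi[1::2]) / 2
    HH = (hi[::2] - hi[1::2]) / 2
    return LL, LH, HL, HH

def subband_energies(img):
    # Energy of each sub-band, used as a frequency-domain texture feature.
    return [float(np.sum(b ** 2)) for b in haar2d(img)]
```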

2.1.2 Shape analysis

The shape features are also called morphological or geometric features. These features are based on the shapes of the ROIs. Seven of Hu's invariant moments were adopted for shape feature extraction due to their reportedly excellent performance in shape analysis [16, 27]. These moments are computed from the information provided by both the shape boundary and its interior region, and their values are invariant with respect to translation, scale and rotation of the shape. Given a function f(x,y), the (p,q)th moment is defined by:

$$M_{pq} = \mathop \int \limits_{ - \infty }^{ + \infty } \mathop \int \limits_{ - \infty }^{ + \infty } x^{p} y^{q} f\left( {x,y} \right)dxdy;\quad p, q = 0,1,2, \ldots$$
(63)

For implementation in digital form, Eq. 63 becomes:

$$M_{pq} = \mathop \sum \limits_{X} \mathop \sum \limits_{Y} x^{p} y^{q} f\left( {x,y} \right)$$
(64)

The central moments can then be defined in their discrete representation using the following formula:

$$\mu_{pq} = \mathop \sum \limits_{X} \mathop \sum \limits_{Y} \left( {x - \bar{x}} \right)^{p} \left( {y - \bar{y}} \right)^{q} f\left( {x,y} \right)$$
(65)

where

$$\bar{x} = \frac{{M_{10} }}{{M_{00} }}$$
(66)
$$\bar{y} = \frac{{M_{01} }}{{M_{00} }}$$
(67)

\(\bar{x}\) and \(\bar{y}\) are the coordinates of the image gravity center. The moments are further normalized for the effects of change of scale using the following formula:

$$\eta_{p,q} = \frac{{\mu_{p,q} }}{{\mu_{0,0}^{{\left( {1 + \frac{p + q}{2}} \right)}} }}$$
(68)

From the normalized central moments, a set of seven values Φi, 1 ≤ i ≤ 7, set out by Hu, may be computed by the following formulas:

$$\varPhi_{1} = \eta_{20} + \eta_{02}$$
(69)
$$\varPhi_{2} = \left( {\eta_{20} - \eta_{02} } \right)^{2} + 4\eta^{2}_{11}$$
(70)
$$\varPhi_{3} = \left( {\eta_{30} - 3\eta_{12} } \right)^{2} + \left( {\eta_{03} - 3\eta_{21} } \right)^{2}$$
(71)
$$\varPhi_{4} = \left( {\eta_{30} + \eta_{12} } \right)^{2} + \left( {\eta_{03} + \eta_{21} } \right)^{2}$$
(72)
$$\begin{aligned} \varPhi_{5} & = \left( {\eta_{30} - 3\eta_{12} } \right)\left( {\eta_{30} + \eta_{12} } \right)\left[ {\left( {\eta_{30} + \eta_{12} } \right)^{2} - 3\left( {\eta_{21} + \eta_{03} } \right)^{2} } \right] \\ & \quad + \left( {3\eta_{21} - \eta_{03} } \right)(\eta_{21} + \eta_{03} ) \times \left[ {3\left( {\eta_{30} + \eta_{12} } \right)^{2} - \left( {\eta_{21} + \eta_{03} } \right)^{2} } \right] \\ \end{aligned}$$
(73)
$$\varPhi_{6} = \left( {\eta_{20} - \eta_{02} } \right)\left[ {\left( {\eta_{30} + \eta_{12} } \right)^{2} - \left( {\eta_{21} + \eta_{03} } \right)^{2} } \right] + 4\eta_{11} \left( {\eta_{30} + \eta_{12} } \right)\left( {\eta_{21} + \eta_{03} } \right)$$
(74)
$$\begin{aligned} \varPhi_{7} & = \left( {3\eta_{21} - \eta_{03} } \right)\left( {\eta_{30} + \eta_{12} } \right)\left[ {\left( {\eta_{30} + \eta_{12} } \right)^{2} - 3\left( {\eta_{21} + \eta_{03} } \right)^{2} } \right] \\ & \quad + \left( {3\eta_{12} - \eta_{30} } \right)(\eta_{21} + \eta_{03} ) \times \left[ {3\left( {\eta_{30} + \eta_{12} } \right)^{2} - \left( {\eta_{21} + \eta_{03} } \right)^{2} } \right] \\ \end{aligned}$$
(75)
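As a check on Eqs. 64–70, the following NumPy sketch computes the raw moments, the central moments, the scale-normalized moments and the first two Hu invariants; by construction, Φ1 and Φ2 are unchanged when the shape is translated:

```python
import numpy as np

def raw_moment(f, p, q):
    # Discrete version of Eq. 64: M_pq = sum_x sum_y x^p y^q f(x, y).
    y, x = np.indices(f.shape)
    return np.sum((x ** p) * (y ** q) * f)

def hu_first_two(f):
    f = np.asarray(f, float)
    m00 = raw_moment(f, 0, 0)
    xb = raw_moment(f, 1, 0) / m00  # Eq. 66: centroid x
    yb = raw_moment(f, 0, 1) / m00  # Eq. 67: centroid y
    y, x = np.indices(f.shape)

    def eta(p, q):
        # Central moment (Eq. 65) normalized for scale by Eq. 68.
        mu = np.sum(((x - xb) ** p) * ((y - yb) ** q) * f)
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)                              # Eq. 69
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2  # Eq. 70
    return phi1, phi2
```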

Furthermore, fourteen additional significant descriptors are also computed [28]:

  • Area, defined as the number of pixels in the region.

  • Perimeter, calculated as the distance around the boundary of the region.

  • Compactness, calculated as:

    $$C = \frac{{P^{2} }}{4 \times \pi \times A}$$
    (76)

    where P and A represent the Perimeter and Area, respectively.

  • Major Axis Length, defined as the length (in pixels) of the major axis of the ellipse that has the same normalized second central moments as the region.

  • Minor Axis Length, defined as the length (in pixels) of the minor axis of the ellipse that has the same normalized second central moments as the region.

  • Eccentricity, defined as the scalar that specifies the eccentricity of the ellipse that has the same second moments as the region. The eccentricity is the ratio of the distance between the foci of the ellipse to its Major Axis Length. The value is between 0 and 1, where 0 and 1 are degenerate cases: an ellipse with eccentricity 0 is a circle, while an ellipse with eccentricity 1 is a line segment.

  • Orientation is defined as the angle (in degrees ranging from − 90 to 90 degrees) between the x-axis and the major axis of the ellipse that has the same second moments as the region.

  • Equivalent diameter specifies the diameter of a circle with the same area as the region. It is computed as:

    $${\text{Eq}}_{\text{diam}} = \sqrt {\frac{4 \times A}{\pi }}$$
    (77)
  • Convex polygon area specifies the number of pixels in the convex hull (the smallest convex polygon that can contain the region).

  • Solidity specifies the proportion of the pixels in the convex hull that are also in the region. It is computed as the Area/ConvexArea where ConvexArea is a scalar that specifies the number of pixels in the convex hull.

  • Euler Number specifies the number of objects in the region minus the number of holes in those objects.

  • Extent specifies the ratio of pixels in the region to pixels in the total bounding box (the smallest rectangle containing the region). It is computed as the Area divided by the area of the bounding box.

  • Mean Intensity, calculated as the mean of all the intensity values in the region.

  • Aspect ratio, the aspect ratio of a region describes the proportional relationship between its width and its height. It is computed as:

    $${\text{Aspect}}_{\text{ratio}} = \frac{{x_{ {\max} } - x_{ {\min} } + 1}}{{y_{ {\max} } - y_{ {\min} } + 1}}$$
    (78)

    where (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and lower right corner of the bounding box.
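Several of these descriptors follow directly from a binary region mask; the following NumPy sketch computes four of them (Area, Equivalent Diameter per Eq. 77, Extent and Aspect Ratio per Eq. 78):

```python
import numpy as np

def shape_descriptors(mask):
    mask = np.asarray(mask, bool)
    ys, xs = np.nonzero(mask)
    area = mask.sum()                       # Area: pixel count of the region
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    bbox_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    return {
        "Area": int(area),
        "EqDiam": float(np.sqrt(4 * area / np.pi)),    # Eq. 77
        "Extent": float(area / bbox_area),             # region / bounding box
        "AspectRatio": (x1 - x0 + 1) / (y1 - y0 + 1),  # Eq. 78
    }
```

Perimeter, the ellipse-based descriptors and Solidity need boundary tracing and a convex-hull routine and are omitted from this sketch.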

3 Feature selection

At the feature analysis stage, many features are generated for each ROI. The feature space is very large and complex due to the wide diversity of normal tissues and the variety of abnormalities, and only some of the features are significant. With a large number of features, the computational cost increases, and irrelevant and redundant features may affect the training process and consequently reduce the classification accuracy. The main goal of feature selection is to reduce the dimensionality by eliminating irrelevant features and selecting the most discriminative ones. Many search methods have been proposed for feature selection. These methods can be categorized as sequential or randomized. Sequential methods are simple and fast, but they cannot backtrack, which makes them prone to falling into local minima. Randomized feature selection methods avoid the problem of local minima, but choosing proper parameters for them is difficult.

To avoid the problems of local minima and of choosing proper parameters, we opted to use five feature selection methods; then, using an evaluation criterion, we retain the method that gives the best results among them. The feature selection methods we used are: tabu search (TS), genetic algorithm (GA), ReliefF algorithm (RA), sequential forward selection (SFS) and sequential backward selection (SBS). All extracted features were used as input for each selection method individually.

3.1 Tabu search

The tabu search is a meta-heuristic approach that can be used to solve combinatorial optimization problems. TS is conceptually simple and elegant, and it has recently received widespread attention [25]. It is a form of local neighborhood search. It differs from local search techniques in that tabu search allows moves to a new solution that make the objective function worse, in the hope of escaping local optima. Each solution S ∈ Ω has an associated neighborhood N(S) \(\subseteq\) Ω, where Ω is the set of feasible solutions. Each solution S′ ∈ N(S) is reached from S by an operation called a move to S′. Tabu search uses a short-term memory, called the tabu list, to record and guide the search process. To avoid cycling, solutions that were recently explored are declared forbidden, or tabu, for a number of iterations. In this study, the size of the tabu list is set to 2, which is reasonable to ensure diversity. Some additional precautions can be taken to avoid missing good solutions. These strategies are known as aspiration criteria. The aspiration level is a commonly used aspiration criterion: it overrides a solution's tabu state, allowing a tabu move when it results in a solution with an objective value better than that of the current best-known solution. In addition to the tabu list, long-term memories and other prior information about the solutions can be used to improve the intensification and/or diversification of the search.
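A minimal sketch of tabu search for feature selection might look as follows. A solution is a tuple of 0/1 feature flags, a move flips one flag, the tabu list stores recently flipped indices, and the aspiration criterion admits a tabu move that beats the best solution so far. All names are ours, and `objective` is a placeholder for a real evaluation such as cross-validated error:

```python
import random

def tabu_search(n_features, objective, n_iter=50, tabu_size=2, seed=0):
    """Tabu search over feature subsets (a sketch, not the paper's exact code)."""
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_features))
    best, best_val = current, objective(current)
    tabu = []  # indices of recently flipped features
    for _ in range(n_iter):
        candidates = []
        for i in range(n_features):
            neigh = list(current)
            neigh[i] = 1 - neigh[i]
            neigh = tuple(neigh)
            val = objective(neigh)
            # tabu moves are allowed only if they improve on the global best
            if i not in tabu or val < best_val:
                candidates.append((val, i, neigh))
        if not candidates:
            break
        val, i, current = min(candidates)  # best admissible move, even if worse
        tabu.append(i)
        if len(tabu) > tabu_size:
            tabu.pop(0)
        if val < best_val:
            best, best_val = current, val
    return best, best_val

# Toy objective: features 0 and 2 are relevant; every selected feature costs 0.1
target = (1, 0, 1, 0, 0)
obj = lambda s: sum((a - b) ** 2 for a, b in zip(s, target)) + 0.1 * sum(s)
sol, val = tabu_search(5, obj)
```

Note that, unlike plain hill climbing, the search always moves to the best admissible neighbor even when it worsens the objective; the `best` variable keeps the incumbent solution safe.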

[figure a: tabu search algorithm listing]

3.2 Genetic algorithm

The genetic algorithm relies on randomness in its search procedure to escape local minima. First, a population of candidate solutions encoded as chromosomes is created. Then, the solutions are evolved by applying genetic operators such as mutation and crossover to find the best solution according to a predefined fitness function (Fig. 8).

[figure b: genetic algorithm listing]
Fig. 8: Flowchart of the GA used in this study

The initial population of GA is created using the following formula:

$$P = round \left( {\left( {L - 1} \right) \times rand \left( {DF,200 \times DF} \right)} \right) + 1$$
(79)

where L and DF represent, respectively, the number of input features and the desired number of selected features. (In this work, DF = L/2.)

In GA, we use a fitness function based on the principle of max-relevance and min-redundancy (mRMR) [27]. The idea of mRMR is to select the set S with m features {xi} that satisfies the maximization problem:

$$\mathop {\hbox{max} }\limits_{S} \varPhi \left( {D,R} \right);\,\,\varPhi \left( {D,R} \right) = D - R$$
(80)

where D and R represent the max-relevance and min-redundancy, respectively, and are computed by the following formula:

$$D = \frac{1}{\left| S \right|}\mathop \sum \limits_{{x_{i} \in S}} I(x_{i} ,y)$$
(81)
$$R = \frac{1}{{\left| S \right|^{2} }}\mathop \sum \limits_{{x_{i} ,x_{j} \in S}} I(x_{i} ,x_{j} )$$
(82)

where I(xi,y) and I(xi,xj) represent the mutual information, a quantity that measures the mutual dependence of two random variables, computed using the following formula:

$$I\left( {x,y} \right) = H\left( x \right) + H\left( y \right) - H\left( {x,y} \right)$$
(83)

where H(.) is the entropy.
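The mRMR relevance and redundancy terms thus reduce to mutual information between variables, which Eq. (83) expresses through entropies. A sketch for discrete (e.g. quantized) features, assuming NumPy (the function names are ours):

```python
import numpy as np

def entropy(*cols):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(x, y) = H(x) + H(y) - H(x, y), as in Eq. (83)."""
    return entropy(x) + entropy(y) - entropy(x, y)

x = np.array([0, 0, 1, 1])
y = np.array([0, 0, 1, 1])   # y fully determined by x -> I = H(x) = 1 bit
z = np.array([0, 1, 0, 1])   # z independent of x    -> I = 0
```

Averaging such values over the candidate set gives D (features vs. class label) and R (feature pairs) of Eqs. (81)-(82).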

3.3 ReliefF algorithm

The key idea of the ReliefF algorithm is to estimate the quality of features according to how well their values distinguish between instances that are near to each other [29–32]. ReliefF assigns a grade of relevance to each feature, and the features valued over a user-given threshold are selected.
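The underlying idea can be sketched in its simplest two-class Relief form (ReliefF generalizes this to k neighbors, multiple classes and missing values); the names and the L1 neighbor distance are our choices:

```python
import numpy as np

def relief(X, y):
    """Simplified Relief for two classes. Features are assumed scaled to [0, 1].

    For each sample, each feature weight is increased by its distance to the
    nearest sample of the other class (miss) and decreased by its distance to
    the nearest sample of the same class (hit).
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)  # L1 distances to all samples
        dist[i] = np.inf                     # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

# Feature 0 separates the classes; feature 1 is noise
X = np.array([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
w = relief(X, y)
```

The discriminative feature receives a positive weight and the noisy one a negative weight, matching the thresholding rule described above.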

[figure c: ReliefF algorithm listing]

3.4 Sequential forward selection and sequential backward selection

SFS and SBS are classified as sequential searches. The main idea of these algorithms is to add features to, or remove features from, the vector of selected features, iterating until all features have been checked. Adding or removing a feature is based on one of several possible criterion functions; misclassification rate, distance, correlation and entropy are examples. The main disadvantage of these algorithms is their tendency to become trapped in local minima. In SFS, we start with an empty list of selected features and successively add one useful feature at a time until no useful feature remains in the extracted input list. SBS instead begins with all features and repeatedly removes the feature whose removal yields the maximal performance improvement.
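SFS can be sketched with a generic score function to maximize (the names are ours; in practice the score would be, e.g., cross-validated classification accuracy, and SBS is the mirror image starting from the full set):

```python
def sfs(features, score, n_select):
    """Sequential forward selection: greedily add the feature that most
    improves the subset score until n_select features are chosen or no
    remaining feature helps."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no remaining feature improves the score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: features 2 and 0 are useful, the rest add nothing
useful = {0: 0.3, 2: 0.5}
score = lambda subset: sum(useful.get(f, 0.0) for f in subset)
chosen = sfs(range(5), score, n_select=3)
```

The greedy additions cannot be undone later, which is exactly the local-minimum weakness mentioned above.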

[figure d: SFS algorithm listing]
[figure e: SBS algorithm listing]

4 Feature dimension reduction

After selecting the most relevant features, the next step is to reduce the dimensionality of the feature set in order to minimize the computational complexity. The dimension reduction is carried out using the principal component analysis (PCA) method [30]. PCA is a powerful tool for analyzing the dependencies between the variables and compressing the data by reducing the number of dimensions, without losing too much information.
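PCA can be sketched with an SVD of the centered feature matrix; this is a generic illustration (names are ours), not the paper's MATLAB code:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data matrix.

    Returns the projected data and the fraction of variance kept."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T                     # projection onto principal axes
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return scores, explained

# 3-D data that actually lies on a 2-D plane: two components keep all variance
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])  # third column is redundant
scores, explained = pca(X, 2)
```

The redundant third column illustrates how PCA removes linear dependencies between features without losing information.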

5 Feature performance

This stage has two main purposes: choosing the input parameters (distance, direction, scale, etc.) that give the best results, and comparing the types of features. Features are evaluated according to their discriminatory power. Five measures derived from the Rodrigues approach [24] are used as criteria for this purpose; they are described as follows:

5.1 The class classifier (CC) measure

The first measure is based on the detachment property: for a given class, the CC states whether or not that class is detached from the other classes. To determine this measure, it is necessary to identify the regions occupied by the elements of the different classes. Every element of a class corresponds to a feature vector and is represented by a point in a multidimensional space. Therefore, the elements of a given class form a cloud of points included in a minimum bounding sphere (MBS). To determine the MBS of a class, we must identify its most central element (mce) and its radius. The mce is the element closest to the center of the class's MBS; it is the element si that minimizes Eq. (84).

$$\mathop \sum \limits_{{s_{j} \in s_{c} }} d(s_{i} ,s_{j} )$$
(84)

where SC is the set of images of class C in the test dataset S and d is a distance function, with si, sj ∈ SC.

Then, the radius of the class C can be found as follows:

$$\mathop {\hbox{max} }\limits_{{s_{j} \in s_{c} }} (d(mce_{c} ,s_{j} ))$$
(85)

Once the mcec and the radius of a given class C are identified, it is necessary to evaluate how much the other classes invade the MBS of class C. This can be done by measuring the distance from the mcec to every element of the other classes: the elements whose distance from mcec is smaller than or equal to the radius of class C invade the MBS of class C. For a given class, the CC measure takes the value 1 (fully detached) if the class is not invaded by elements of other classes, and 0 otherwise.
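Eqs. (84)-(85) and the invasion test can be sketched as follows, assuming a Euclidean distance (the paper leaves the distance function d generic; names are ours):

```python
import numpy as np

def cc_measure(X, y, cls):
    """Class classifier (CC) measure for one class, following Eqs. (84)-(85).

    Finds the most central element (mce) of the class and its radius, then
    returns 1 if no element of another class falls inside the MBS."""
    X = np.asarray(X, float)
    members = X[y == cls]
    others = X[y != cls]
    # mce: member minimizing the sum of distances to the other members, Eq. (84)
    dists = np.linalg.norm(members[:, None] - members[None, :], axis=2)
    mce = members[dists.sum(axis=1).argmin()]
    radius = np.linalg.norm(members - mce, axis=1).max()  # Eq. (85)
    invaded = (np.linalg.norm(others - mce, axis=1) <= radius).any()
    return 0 if invaded else 1

# Two well-separated clusters: both classes are fully detached
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y = np.array([0, 0, 0, 1, 1, 1])
```

Adding a class-1 element inside class 0's sphere would flip the class-0 CC to 0.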

5.2 The class variance (CV) measure

The second measure is based on the condensation property: it states how condensed each class is. It corresponds to the variance of the distances of the elements to the center of the class. Knowing the mcec and the radius of class C, we can calculate the CV measure. The average distance of the elements of class C to the mcec is calculated as:

$$\bar{d} = \frac{{\mathop \sum \nolimits_{{s_{j} \in s_{c} }} d\left( {mce_{c} ,s_{j} } \right)}}{n - 1};\quad s_{j} \ne mce_{c}$$
(86)

where n is the cardinality of class C and the variance of the class is given by:

$$S^{2} = \frac{{\mathop \sum \nolimits_{{s_{j} \in s_{c} }} \left( {d\left( {mce_{c} ,s_{j} } \right) - \bar{d}} \right)^{2} }}{n - 1};\quad s_{j} \ne mce_{c}$$
(87)

From these two measures, we propose three other measures.

5.3 The total class classifier (TCC) measure

It corresponds to the sum of CCs.

$${\text{TCC}} = \mathop \sum \limits_{i = 1}^{t} {\text{CC}}_{{C_{i} }}$$
(88)

where t is the number of classes in ROIs database.

5.4 The weighted average of class classifiers (WACC) measure

It corresponds to the weighted average of CCs.

$${\text{WACC}} = \mathop \sum \limits_{i = 1}^{t} {\text{CC}}_{{C_{i} }} {\text{PC}}_{i}$$
(89)

where PCi is the number of elements in class Ci.

5.5 The weighted average of class variances (WACV) measure

It corresponds to the weighted average of CVs.

$${\text{WACV}} = \mathop \sum \limits_{i = 1}^{t} {\text{CV}}_{{C_{i} }} {\text{PC}}_{i}$$
(90)

6 Classification

The main goal of this stage is to test the ability of most relevant features to discriminate between benign and malignant masses. Once the features relating to masses have been extracted and selected, they can then be used as inputs to the classifier to classify the regions of interest into benign and malignant. In the literature, various classifiers are described to distinguish between normal and abnormal masses. Each classifier has its advantages and disadvantages. In this work, we used three classifiers: multilayer perceptron (MLP), support vector machines (SVMs) and K-nearest neighbors (K-NNs) which have performed well in mass classification [16]. Then, we make a combination of these classifiers in order to exploit the advantages of each one of them and to improve the accuracy and efficiency of the classification system.

6.1 Multilayer perceptron (MLP)

MLP is regarded as a prominent tool for training a system as a classifier [16]. It uses simple connections of artificial neurons to imitate neurons in the human brain, and it has a large capacity for parallel computing and a powerful memorization ability. An MLP is organized in successive layers: an input layer, one or more hidden layers and an output layer.

6.2 Support vector machines (SVMs)

SVM is an example of a supervised learning method used for classification [28]. It is a binary classifier: for each given input, it predicts which of two possible classes the input belongs to. It is based on the idea of looking for the hyperplane that maximizes the margin between the two classes.

6.3 K-nearest neighbors (K-NNs)

The K-NN classifier is well explored in the literature and has been proved to have good classification performance on a wide range of datasets [26]. The K-nearest neighbors of an unknown sample are selected from the training set, and the class label is predicted as the most frequent one occurring among the K neighbors. K is an adjustable integer parameter that specifies the number of nearest neighbors. In this work, K is equal to 3 and the Euclidean distance is used as the distance metric.
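A minimal K-NN sketch matching this setting (K = 3, Euclidean distance); the names and toy data are ours:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the majority label among the k Euclidean-nearest training samples."""
    d = np.linalg.norm(np.asarray(X_train, float) - np.asarray(x, float), axis=1)
    nearest = np.argsort(d)[:k]               # indices of the k closest samples
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
label = knn_predict(X_train, y_train, [5.5, 5.5], k=3)
```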

In order to accumulate the classifiers' advantages and to exploit the complementarity between the three classifiers, we applied six fusion methods: Majority Vote (MV), which is based on a voting algorithm, and five methods based on posterior probabilities: Maximum (Max), Minimum (Min), Median (Med), Product (Prod) and Sum (S). The best accuracy obtained after this combination is used as the final classification result.

MV technique is based on the principle of voting. The final decision is made by selecting the class with the greatest number of votes. MV is based on a majority rule expressed as follows:

$$E\left( x \right) = \left\{ {\begin{array}{*{20}l} {C_{i} \quad {\text{if}}\;\mathop \sum \limits_{j} \left[ {e_{j} \left( x \right) = C_{i} } \right] = \mathop {\hbox{max} }\limits_{{i \in \left\{ {1, \ldots ,M} \right\}}} \mathop \sum \limits_{j} \left[ {e_{j} \left( x \right) = C_{i} } \right] \ge \alpha K} \hfill \\ {{\text{reject}}\quad {\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(91)

where K is the number of classifiers (in our case K = 3), M is the number of classes (in our case M = 2), and ej(x) = Ci (i ∈ {1, …, M}) indicates that classifier j assigned class Ci to x.
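Eq. (91) can be sketched as follows, with α = 0.5 giving the usual majority rule (function names are ours):

```python
from collections import Counter

def majority_vote(predictions, alpha=0.5):
    """Return the most frequent class if it gets at least alpha * K votes,
    otherwise 'reject' (Eq. (91)); alpha = 0.5 is a simple majority rule."""
    K = len(predictions)
    winner, votes = Counter(predictions).most_common(1)[0]
    return winner if votes >= alpha * K + 0.5 else "reject"

# Three classifiers (e.g. MLP, SVM, K-NN) voting on one ROI
decision = majority_vote(["malignant", "malignant", "benign"])
tie = majority_vote(["malignant", "benign", "reject"])
```

With K = 3 and two classes, any 2-vs-1 split yields a decision, while a three-way disagreement is rejected.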

To calculate the five other methods (Max, Min, Med, Prod and Sum), we make the following assumptions:

  • The decision of the kth classifier is defined as dk,j ∈ {0,1}, k = 1, …, K; j = 1, …, M. If the kth classifier chooses class ωj, then dk,j = 1, and 0 otherwise.

  • X is the feature vector (derived from the input pattern) presented to the kth classifier.

  • The outputs of the individual classifiers are Pk(ωj|X), i.e., the posterior probability of belonging to class ωj given the feature vector X.

The final ensemble decision is the class j that receives the largest support μj(x). Specifically

$$h_{\text{final}} \left( x \right) = \arg \mathop {\hbox{max} }\limits_{j} \mu_{j} \left( x \right)$$
(92)

where μj(x) are computed as follows:

  • Maximum rule:

    $$\mu_{j} \left( x \right) = \mathop {\hbox{max} }\limits_{k = 1, \ldots ,K} \left\{ {d_{k,j} \left( x \right)} \right\}$$
    (93)
  • Minimum rule:

    $$\mu_{j} \left( x \right) = \mathop {\hbox{min} }\limits_{k = 1, \ldots ,K} \left\{ {d_{k,j} \left( x \right)} \right\}$$
    (94)
  • Median rule:

    $$\mu_{j} \left( x \right) = \mathop {med}\limits_{k = 1, \ldots ,K} \left\{ {d_{k,j} \left( x \right)} \right\}$$
    (95)
  • Sum rule:

    $$\mu_{j} \left( x \right) = \mathop \sum \limits_{k = 1}^{K} d_{k,j} \left( x \right)$$
    (96)
  • Product rule:

    $$\mu_{j} \left( x \right) = \mathop \prod \limits_{k = 1}^{K} d_{k,j} \left( x \right)$$
    (97)
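The five probability-based rules can be sketched as reductions over a K × M matrix of classifier outputs; here we apply them to the posteriors Pk(ωj|X) described above (names are ours):

```python
import numpy as np

def combine(posteriors, rule="sum"):
    """Combine classifier outputs with the fixed rules of Eqs. (93)-(97).

    posteriors : K x M array, row k holds the outputs of classifier k
    Returns the index of the winning class, as in Eq. (92)."""
    P = np.asarray(posteriors, float)
    ops = {"max": P.max(axis=0), "min": P.min(axis=0),
           "med": np.median(P, axis=0), "sum": P.sum(axis=0),
           "prod": P.prod(axis=0)}
    support = ops[rule]          # mu_j(x) for each class j
    return int(support.argmax())

# Three classifiers, two classes (benign = 0, malignant = 1)
P = [[0.6, 0.4],
     [0.3, 0.7],
     [0.4, 0.6]]
```

On this example, two of the three classifiers lean toward class 1, and all five rules agree on it.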

7 Classification performance evaluation

A number of different measures are commonly used to evaluate the classification performance. These measures are derived from the confusion matrix which describes actual and predicted classes as shown in Table 2.

Table 2 Confusion matrix
  • Rate of Positive Predictions:

    $${\text{RPP}} = \frac{{{\text{TP}} + {\text{FP}}}}{{{\text{TP}} + {\text{FN}} + {\text{FP}} + {\text{TN}}}}$$
    (98)
  • Rate of Negative Predictions:

    $${\text{RNP}} = \frac{{{\text{TN}} + {\text{FN}}}}{{{\text{TP}} + {\text{FN}} + {\text{FP}} + {\text{TN}}}}$$
    (99)
  • True Positive Rate (Sensitivity):

    $${\text{TPR}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}}$$
    (100)
  • False Negative Rate:

    $${\text{FNR}} = \frac{\text{FN}}{{{\text{TP}} + {\text{FN}}}}$$
    (101)
  • False Positive Rate:

    $${\text{FPR}} = \frac{\text{FP}}{{{\text{TN}} + {\text{FP}}}}$$
    (102)
  • True Negative Rate (Specificity):

    $${\text{TNR}} = \frac{\text{TN}}{{{\text{TN}} + {\text{FP}}}}$$
    (103)
  • Positive Predictive Value:

    $${\text{PPV}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}}$$
    (104)
  • Negative Predictive Value:

    $${\text{NPV}} = \frac{\text{TN}}{{{\text{TN}} + {\text{FN}}}}$$
    (105)
  • Accuracy:

    $${\text{AC}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FP}} + {\text{TN}} + {\text{FN}}}}$$
    (106)
  • Matthews Correlation Coefficient:

    $${\text{MCC}} = \frac{{{\text{TP}} \times {\text{TN}} - {\text{FP}} \times {\text{FN}}}}{{\sqrt {\left( {{\text{TP}} + {\text{FP}}} \right) \times \left( {{\text{TP}} + {\text{FN}}} \right) \times \left( {{\text{TN}} + {\text{FP}}} \right) \times \left( {{\text{TN}} + {\text{FN}}} \right)} }}$$
    (107)
  • F-measure:

    $$F = \frac{{2 \times {\text{PPV}} \times {\text{TPR}}}}{{{\text{PPV}} + {\text{TPR}}}}$$
    (108)
  • G-measure:

    $$G = \sqrt {{\text{TNR}} \times {\text{TPR}}}$$
    (109)
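Most of these measures follow directly from the four confusion-matrix counts; a sketch (the F-measure is written in its equivalent 2TP/(2TP + FP + FN) form, and the names are ours):

```python
import math

def metrics(TP, FP, TN, FN):
    """Confusion-matrix measures of Eqs. (100)-(108)."""
    return {
        "sensitivity": TP / (TP + FN),                    # TPR, Eq. (100)
        "specificity": TN / (TN + FP),                    # TNR, Eq. (103)
        "PPV": TP / (TP + FP),                            # Eq. (104)
        "NPV": TN / (TN + FN),                            # Eq. (105)
        "accuracy": (TP + TN) / (TP + FP + TN + FN),      # Eq. (106)
        "MCC": (TP * TN - FP * FN) / math.sqrt(
            (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)),   # Eq. (107)
        "F": 2 * TP / (2 * TP + FP + FN),                 # equivalent to Eq. (108)
    }

m = metrics(TP=9, FP=2, TN=8, FN=1)
```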

A receiver operating characteristic (ROC) curve is also used at this stage. It is a plot of the true positive rate as a function of the false positive rate. A higher ROC curve, approaching perfection at the upper left-hand corner, indicates greater discrimination capacity (Fig. 9).

Fig. 9: ROC curve

8 Experimental results

All experiments were implemented in MATLAB. The source of the mammograms used in this experiment is the MiniMIAS database [31], which is publicly available for research. It contains left and right breast images for 161 patients, i.e., 322 mammograms, of which 208 show normal breasts; the abnormal images comprise 63 ROIs with benign lesions and 51 ROIs with malignant (cancerous) lesions. 80% of the abnormal images are used for training and 20% for testing. Every X-ray film is 1024 × 1024 pixels in size, and each pixel is represented by an 8-bit word. The database includes a readme file, which details the expert radiologist's markings for each mammogram (MiniMIAS database reference number, character of background tissue, class of abnormality present, severity of abnormality, x,y image coordinates of the center of the abnormality and approximate radius (in pixels) of a circle enclosing the abnormality).

Figure 10 shows some cases representing three types of mammograms: normal (first line), benign (second line) and cancer (third line), downloaded from MiniMIAS database.

Fig. 10: Some samples from MiniMIAS database (normal (first line), benign (second line) and cancer (third line))

The first step in our experiments is to perform ROI selection. This involves separating from the image the suspected areas that may contain abnormalities. The ROIs are usually very small and limited to areas determined to be suspicious regions of masses. A suspicious area is an area that is brighter than its surroundings, has a regular shape with varying size and has fuzzy boundaries. Figure 11 shows two examples of masses (benign and malignant).

Fig. 11: Two examples of masses. a Benign and b malignant

The suspicious regions (benign or malignant) are marked by experienced radiologists, who specified the center location and an approximate radius of a circle enclosing each abnormality. We used this information and a graphic editing tool to locate and cut the smallest square that contains each marked suspicious region. Each ROI contains only one mass. After running this step, we obtain manually, for each mammogram, an approximate binary mask (zero for black and one for white). Figure 12c shows one of the outputs acquired from this operation, using Fig. 12a as input. In Fig. 12c, the white region indicates the mass and the black region indicates the background. The resulting binary image will be useful in shape description.

Fig. 12: Resulting binary image. a Selection of ROI specified by an expert radiologist, b ROI containing only one mass, c mass shape in binary format

By multiplying the mask image with the original image, we obtain the segmented mass as shown in Fig. 13.

Fig. 13: By multiplying the enhanced image (first row) with the mask (second row), we obtain the segmented mass shown in the third row

Textural and shape features are calculated and extracted from the cropped ROIs. The features used in this study are listed in Table 3. For each selected ROI, the total number of computed features is 6 + 152 + 100 + 22 + 3 + 60 + 16 + 7 + 14 = 380.

Table 3 Features used in this study

Tables 4, 5, 6, 7, 8, 9, 10, 11 and 12 show the obtained texture and shape features of two examples of selected ROIs (ROI of “Mdb001.pgm” and ROI of  “Mdb028.pgm”).

Table 4 Examples of FOS features
Table 5 Examples of GLCM features (d = 1 and θ = 0°)
Table 6 Examples of GLDM features
Table 7 Examples of GLRLM features
Table 8 Examples of Tamura features
Table 9 Examples of Gabor features
Table 10 Examples of wavelet features
Table 11 Examples of Hu’s invariant moments
Table 12 Examples of other shape features

To extract Gabor transform features, five scales (1/2, 1/3, 1/4, 1/5 and 1/8) and six orientations (0, π/6, π/3, π/2, 2π/3 and 5π/6) are used. Figures 14 and 15 show, respectively, the obtained filter bank and two examples of filtered images.

Fig. 14: Filter bank used in this study

Fig. 15: Two examples of filtered images (ROI of “Mdb069.pgm”)

Applying more levels of decomposition produces a very large number of wavelet coefficients, so it was decided to use only a two-level decomposition to reduce complexity. Figure 16 shows an example of a two-level decomposition using the Haar wavelet transform.

Fig. 16: Two-level decomposition of mammogram (ROI of “Mdb069.pgm”) through the Haar wavelet transform

With the aim of achieving optimum discrimination, a normalization procedure is performed so that all features are measured and compared on the same basis. Feature values are normalized between 0 and 1 according to the following formula:

$$Y = \left( {X - {\text{Min}}} \right)/\left( {{\text{Max}} - {\text{Min}}} \right)$$
(110)

where X is the initial feature value and Y is the feature value after normalization.
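Eq. (110) is applied column-wise, so each feature is rescaled independently over the dataset (a sketch; names are ours):

```python
import numpy as np

def minmax_normalize(F):
    """Min-max normalization of Eq. (110), applied per feature (column)."""
    F = np.asarray(F, float)
    mn, mx = F.min(axis=0), F.max(axis=0)
    return (F - mn) / (mx - mn)

# Two features on very different scales end up directly comparable
F = np.array([[1.0, 200.0],
              [3.0, 600.0],
              [5.0, 400.0]])
Y = minmax_normalize(F)
```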

The next step is to select the most relevant features. To further explore the search space and improve the optimal solution, we applied TS and GA ten times each for feature selection; each run produces slightly different results from the previous ones. Tables 13 and 14 show the selected features for each iteration. (Selected and rejected features are represented by 1 and 0, respectively.)

Table 13 Results of TS feature selection (FOS features)
Table 14 Results of GA feature selection (GLDM features)

Then, the obtained selected features (for both TS and GA) are ordered with respect to the number of occurrences of each feature in the 10 rounds and its score. The best solution corresponds to the highest score for both TS and GA. Tables 15 and 16 show the number of occurrences of each feature in the ten iterations.

Table 15 Selected features and the number of times it is selected by TS
Table 16 Selected features and the number of times it is selected by GA

The weight of each feature is calculated by applying the ReliefF algorithm to the feature sets. Table 17 shows the obtained weights of Hu's invariant moments. The most relevant features correspond to the highest weights; in this case, {Momi, i = 1, …, 4} are the features selected from Hu's invariant moments.

Table 17 Weights of HU’s invariant moments using ReliefF algorithm

Table 18 shows the selected features using SFS and SBS for GLRLM features. (Selected and rejected features are represented by 1 and 0, respectively.)

Table 18 Selected features using SFS and SBS (GLRLM features)

Our experiment involved a binary classification, in which the classifier should recognize whether a given ROI contains benign or malignant tissue. The goal of the experiments is to test which descriptor is best for describing mammographic images, i.e., to find out which combination of descriptors gives the best results. The effectiveness of the different feature sets is listed in Tables 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32 and 33.

Table 19 Performance obtained for FOS features
Table 20 Performance obtained for GLCM features
Table 21 Performance obtained for GLDM features
Table 22 Performance obtained for GLRLM features
Table 23 Performance obtained for Tamura features
Table 24 Performance obtained for Gabor features
Table 25 Performance obtained for wavelet features
Table 26 Performance obtained for HU’s invariant moments features
Table 27 Performance obtained for other shape features
Table 28 Performance obtained for all statistical texture features
Table 29 Performance obtained for all frequential texture features
Table 30 Performance obtained for all texture features
Table 31 Performance obtained for all shape features
Table 32 Performance obtained for all features
Table 33 Best features performance

The best performances obtained for the texture and shape features correspond to the GLRLM features selected using GA and Hu's invariant moments selected using SFS, respectively. The GLRLM performance is higher than the performance obtained for the combined GLRLM and Hu's invariant moments features, as shown in Table 34.

Table 34 Performance obtained for GLRLM and HU’s invariant moments features

Most relevant GLRLM and Hu’s invariant moments features are GLNU, RLNU, LGRE, SRLGLE, LRLGLE and Second Moment.

These features are then passed to the three combined classifiers, which determine whether the region of interest is benign or malignant. Fivefold cross-validation is performed in the classification phase; we used this approach because the dataset has a relatively small number of samples. Finally, the evaluation is performed using the measures presented in the previous section. The confusion matrices for the selected features are presented in Tables 35, 36 and 37.

Table 35 Confusion matrix for selected GLRLM features using GA
Table 36 Confusion matrix for selected HU’s invariant moments using SFS
Table 37 Confusion matrix for selected GLRLM and HU’s invariant moments features

Table 38 shows the obtained performance measures of selected features.

Table 38 Performance measures

Table 39 depicts the classification precision and the area under the ROC curve (AUC) of the combined classifiers. The results show that the selected GLRLM features give the best overall classification precision of 90.9%. The selected HU's invariant moments provide the worst performance, with 72.7%. The classification of the combined selected GLRLM and HU's invariant moments features also gives a lower accuracy of 77.2%.

Table 39 Evaluation results

9 Conclusion

Digital mammography is the most common method for early breast cancer detection. Automated analysis of these images is very important, since manual analysis is costly and inconsistent. In this paper, we analyzed nine different techniques for feature extraction. The proposed methodology is composed of two main stages that work in pipeline mode: the first stage consists of the characterization and selection of the most relevant features; the second stage consists of the classification of breast masses into benign and malignant. According to the examination provided, we can conclude that the best feature performance was achieved with the GLRLM descriptor. The obtained results can be used in other applications such as segmentation and content-based image retrieval.