Abstract
In this article, we propose an automatic method for the detection and segmentation of tumors in mammogram images. Most tumor detection methods require the extraction of a large number of texture features through multiple calculations. We first pre-process the images with Otsu's thresholding method, which eliminates items that do not belong to the breast. After thresholding, we estimate the number of classes using the LBP (Local Binary Pattern) technique. To automate the initialization task, the classification is performed with dynamic k-means, and the classes obtained are improved with a Markov method. We then calculate the correlation between these classes and the original image and deduce the class that contains the tumor and the pectoral muscle. Finally, we use the region growing method to eliminate the pectoral muscle. The results obtained with this approach show the quality and accuracy of the tumor extraction compared with existing approaches in the literature.
1 Introduction
Despite advancements in early detection and treatment [9, 20, 25,26,27], breast cancer remains a leading cause of death among women, largely because of the lack of early diagnosis. Mammography screening is the most widely adopted technique for the early detection of breast cancer. In mammography images, suspected breast cancer appears as white spots. Breast density and the presence of labels, artifacts or even the pectoral muscle influence the sensitivity of mammography.
In the literature, several studies have addressed the detection of regions of interest (ROI) in mammograms. Among them, Alayli et al. [2] employ a thresholding algorithm for the detection of breast cancer. This technique poses the problem of determining the threshold. Gumaei et al. [15] proposed a method based on K-means with a mixture of gamma distributions; Singh et al. [33] used K-means and Fuzzy C-means for the detection of the mass center in mammography; Siddheswar et al. [4] proposed a method based on image processing functions, K-means and Fuzzy C-means clustering. Approaches based on K-means and Fuzzy C-means have the disadvantage that the number of clusters and the initial centers must be chosen in advance. Elmoufidi [13] chose to combine the LBP (Local Binary Pattern) and the dynamic k-means algorithm. Liu et al. [19] used a GVF snake algorithm to extract the extrapolated breast region. Mustra et al. [22] proposed adaptive histogram equalization and polynomial curvature estimation. Agrawal et al. [1] proposed saliency maps for ROI segmentation and classified the ROIs using entropy features. Jen et al. [16] proposed a detection method for abnormal mammograms in mammographic datasets based on a novel abnormality detection classifier (ADC) that extracts a few discriminative features: first-order statistical intensities and gradients. Kwok et al. [17] used the Hough transform to identify the pectoral muscle. In their approach, the pectoral muscle edge was first estimated by a straight line and then refined to a curve. However, the pectoral muscle boundary cannot be found properly when complex texture exists in the muscle region, and the segmentation may become inaccurate for small pectoral muscles. Other techniques can be cited alongside these [6, 21, 24].
Nagi et al. [24] used morphological operations and seeded region growing as preprocessing to detect the pectoral muscle. However, the hypothesis that the edge of the pectoral muscle can be represented by a straight line segment is not always correct. Wang et al. [35] presented a method based on a discrete time Markov chain (DTMC) and an active contour model to detect the edge of the pectoral muscle.
In this paper, we propose new techniques that address the problems cited above. Our idea is outlined as follows. We start with Otsu's thresholding method. Next, we classify the image, estimating the number of classes with the LBP (Local Binary Pattern) technique. To automate the initialization task, we apply a dynamic k-means classification improved by a Markov method. The tumor image is the class that gives the maximum correlation with the original image.
This paper is an extension of our work presented in [11] and is organized as follows. In Section 2, we present the methods used. In Section 3, we describe the proposed approach. The results and their discussion are presented in Section 4, and the last section is dedicated to the conclusion.
2 Materials and methods
In this article, we use several methods chosen for their simplicity and for their efficiency as demonstrated in the literature. This work addresses two major problems: the elimination of the pectoral muscle and the detection of the tumor in mammogram images. First, we start with the Otsu method, which is the most widely used in the literature, to erase unwanted areas and labels in mammogram images. Then we use the LBP method to estimate the average number of classes. To extract the optimal number of classes, we adopt the algorithm proposed by Elmoufidi et al. [13]. After that, we use the Markov method to improve the classes obtained by dynamic k-means. Finally, we use the correlation to identify the classes of the pectoral muscle and the tumor.
2.1 Otsu method
The principle of the Otsu method is to find an optimal threshold that maximizes the separation between two classes [30]. It is based on the variance. The optimal threshold \( S_{optimal} \) is the one that maximizes one of the following equivalent criteria, written here in the standard form of [30]:
\[ \lambda(t)=\frac{\delta_B^2(t)}{\delta_W^2(t)},\qquad \kappa(t)=\frac{\delta_T^2}{\delta_W^2(t)},\qquad \eta(t)=\frac{\delta_B^2(t)}{\delta_T^2} \]
If \( \eta(t) \) is chosen, then
\[ S_{optimal}=\arg\max_{min\le t\le max}\ \eta(t) \]
with
\[ \delta_T^2=\sum_{i=min}^{max}{\left(i-m_T\right)}^2P_i,\qquad \delta_B^2(t)=P_{font}(t)\,P_{objet}(t)\,{\left(m_{font}-m_{objet}\right)}^2,\qquad \delta_W^2(t)=P_{font}(t)\,\delta_{font}^2(t)+P_{objet}(t)\,\delta_{objet}^2(t) \]
Where \( {\delta}_T^2,{\delta}_B^2,{\delta}_W^2 \) are, respectively, the total variance of the image, the inter-class variance (between-class variance) and the intra-class variance (within-class variance).
\( {m}_T={\sum}_{i=min}^{max}i\ast {P}_i \): the total average of all the image points.
\( P_i \): the probability of occurrence of the gray level i in the image.
\( P_{font}(t), P_{objet}(t) \): the sums of the probabilities of occurrence of the gray levels of the background pixels and of the object pixels for the threshold t.
\( m_{font}, m_{objet} \): the averages of the pixels belonging to the background and to the object.
\( {\delta}_{font}^2(t),{\delta}_{objet}^2(t) \): the variance of the background class and the variance of the object class.
[min, max] is the dynamic range of the image.
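As a minimal sketch of this thresholding step, the following pure-Python function scans every candidate threshold t and keeps the one with the largest between-class variance \( \delta_B^2(t)=P_{font}(t)\,P_{objet}(t)\,(m_{font}-m_{objet})^2 \), the standard form in [30]. Because \( \delta_T^2 \) does not depend on t, this is also the t that maximizes \( \eta(t)=\delta_B^2(t)/\delta_T^2 \). The function name `otsu_threshold`, the flat list of integer gray levels as input and the 256-level default are assumptions made for illustration, not the authors' implementation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximises the between-class variance.

    `pixels` is a flat list of integer gray levels in [0, levels - 1].
    Pixels with value <= t form the background class, the rest the object class.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_back = 0   # number of background pixels for the current t
    sum_back = 0     # sum of the gray levels of the background pixels
    for t in range(levels):
        count_back += hist[t]
        sum_back += t * hist[t]
        count_obj = total - count_back
        if count_back == 0 or count_obj == 0:
            continue  # one class is empty, the criterion is undefined
        m_back = sum_back / count_back
        m_obj = (sum_all - sum_back) / count_obj
        p_back = count_back / total
        p_obj = count_obj / total
        var_between = p_back * p_obj * (m_back - m_obj) ** 2
        if var_between > best_var:  # strict test keeps the first maximiser
            best_t, best_var = t, var_between
    return best_t


# Usage: two dark populations and one bright population.
sample = [20] * 40 + [30] * 40 + [220] * 20
t = otsu_threshold(sample)
bright = [v for v in sample if v > t]
```

In this usage example only the twenty bright pixels end up above the threshold. A production version would typically work on a histogram computed by an image library rather than on a Python list.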
2.2 LBP (local binary pattern)
The LBP (Local Binary Pattern) descriptor was proposed by Ojala et al. [28, 29] in 1996 for texture classification.
We consider an image I(x, y), where \( g_c \) is the gray level of the central pixel (x, y), \( g_p \) is the gray value of its p-th neighbor, P is the total number of neighbors considered and R is the radius of the neighborhood.
The LBP operator is defined as follows:
\[ LBP_{P,R}(x,y)=\sum_{p=0}^{P-1}S\left(g_p-g_c\right)\,2^p \]
The thresholding function S(x) is defined by:
\[ S(x)=\begin{cases}1, & x\ge 0\\ 0, & x<0\end{cases} \]
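In the usual formulation of [29], S(x) equals 1 when x ≥ 0 and 0 otherwise, and the LBP code of a pixel is the sum of \( S(g_p-g_c)\,2^p \) over its P neighbors. The sketch below computes this code for P = 8 and R = 1 on an image stored as a list of rows. The function name `lbp_8neighbours`, the clockwise neighbor ordering starting at the top-left pixel and the choice to leave border pixels at 0 are assumptions for illustration. How the codes are then turned into an estimated number of classes is not covered here.

```python
def lbp_8neighbours(image):
    """Basic LBP code (P = 8, R = 1) for every interior pixel of a 2D list."""
    # Neighbour offsets (row, column), clockwise from the top-left pixel.
    # The ordering only fixes which bit each neighbour contributes to.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = [[0] * cols for _ in range(rows)]  # border pixels stay at 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            g_c = image[r][c]
            code = 0
            for p, (dr, dc) in enumerate(offsets):
                g_p = image[r + dr][c + dc]
                if g_p - g_c >= 0:      # S(g_p - g_c) = 1
                    code += 1 << p      # weight 2^p
            codes[r][c] = code
    return codes


# Usage on a 3 x 3 patch: only the central pixel has a full neighbourhood.
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
centre_code = lbp_8neighbours(patch)[1][1]
```

Here the neighbors 9, 7 and 8 are not smaller than the center value 6, so bits 1, 3 and 4 are set.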
2.3 K-means
k-means is one of the simplest unsupervised learning algorithms that solve the classification problem.
Let X = {x1, x2, x3, …, xn} be the set of data points and V = {v1, v2, v3, …, vK} be the set of centers. k-means minimizes the sum of squared distances between all the points and their class centers [31]:
\[ J(V)=\sum_{i=1}^{K}\sum_{j=1}^{c_i}{\left\Vert x_j-v_i\right\Vert}^2 \]
Where:
- K: the number of cluster centers;
- ci: the number of data points in the ith cluster;
- ‖xj − vi‖: the Euclidean distance between xj and vi;
- vi: the mean of the points of the ith cluster, recomputed at each iteration as follows:
\[ v_i=\frac{1}{c_i}\sum_{j=1}^{c_i}x_j \]
The main steps of the K-means method can be summarized as follows:
1. Randomly select K objects as initial centers.
2. Assign each object to the nearest class; each class is characterized by a center.
3. Calculate the new representatives of the classes.
4. Repeat steps 2 and 3 until the centers stop moving.
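The intra-cluster distance is the sum of squared distances from all points to their cluster centers, normalized by the number of pixels (see eq. 16):
\[ intra=\frac{1}{N}\sum_{i=1}^{k}\sum_{x\in c_i}{\left\Vert x-v_i\right\Vert}^2\qquad (16) \]
Where N is the number of pixels in the image, k is the number of clusters, and vi is the cluster centre of cluster ci.
The inter-cluster distance is the minimum distance between cluster centers (see eq. 17):
\[ inter=\min {\left\Vert v_i-v_j\right\Vert}^2\qquad (17) \]
Where i = 1, 2, …, k − 1 and j = i + 1, …, k.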
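The four k-means steps and the two cluster distances can be sketched on gray levels as follows. This is an illustration under stated assumptions, not the dynamic k-means of [13]: the data are scalar gray values, the initial centers are drawn from the distinct values with a seeded generator, the intra-cluster distance is taken as the mean squared distance to the assigned center and the inter-cluster distance as the minimum squared distance between two centers (the usual definitions in [31]). The names `kmeans_1d`, `intra_distance` and `inter_distance` are hypothetical.

```python
import random


def kmeans_1d(values, k, seed=0, max_iter=100):
    """Plain k-means on scalar gray levels; returns (centers, clusters)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct starting objects.
    centers = [float(v) for v in rng.sample(sorted(set(values)), k)]
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # Step 3: recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 4: stop when the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def intra_distance(centers, clusters):
    """Mean squared distance of every point to the center of its cluster."""
    n = sum(len(c) for c in clusters)
    return sum((x - v) ** 2 for v, c in zip(centers, clusters) for x in c) / n


def inter_distance(centers):
    """Minimum squared distance between two distinct centers (i < j)."""
    k = len(centers)
    return min((centers[i] - centers[j]) ** 2
               for i in range(k - 1) for j in range(i + 1, k))


# Usage: two well separated groups of gray levels.
centers, clusters = kmeans_1d([10, 12, 11, 200, 202, 201], k=2)
compactness = intra_distance(centers, clusters)
separation = inter_distance(centers)
```

A small intra-cluster distance together with a large inter-cluster distance indicates compact, well separated classes, which is the kind of criterion a dynamic choice of the number of classes can rely on.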
2.4 Hidden Markov
Hidden Markov models (HMM) model random phenomena that are assumed to comprise, at a first level, a random process of transition between unobservable (hidden) states and, at a second level, another random process that generates observable values in each state. Assume that Z is a 2D gray-level matrix (M ∗ N), where \( {Z}_i^T \) denotes the intensity measurement at pixel i. Given an image y = (y1, y2, …, yN), each yi associated with pixel i has an unknown class label xi ∈ L, where L is the set of all possible labels. The Gaussian Hidden Markov Random Field (HMRF) can be specified as:
\[ p\left(y_i\left|X_{N_i},\theta\right.\right)=\sum_{l\in L}g\left(y_i,\theta_l\right)\,q\left(l\left|X_{N_i}\right.\right) \]
Where X = (X1, X2, …, XN), \( X_{N_i} \) denotes the labels of the neighbors of pixel i, g(yi, θl) is a Gaussian probability density function with parameters θl = (μl, σl²) and \( q\left(l\left|X_{N_i}\right.\right) \) is a conditional probability mass function for the class label l.
We use the MAP and EM algorithms to estimate the labeling x and the parameter set θ.
- MAP algorithm
We seek a labeling \( \hat{x} \) of an image, which is an estimate of the true labeling, according to the MAP criterion:
\[ \hat{x}=\arg\max_{x}\left\{P\left(y\left|x,\Theta\right.\right)P(x)\right\} \]
It is assumed that the pairs (yi, Xi) are pairwise independent, so
\[ P\left(y\left|x,\Theta\right.\right)=\prod_iP\left(y_i\left|x_i,\theta_{x_i}\right.\right) \]
and the probability density function for x, the so-called Gibbs distribution (proposed by Geman [14]), is given by:
\[ P(x)=\frac{1}{Z}\exp \left(-U(x)\right) \]
Where Z is a normalizing constant called the partition function, and U(x) is an energy function of the form:
\[ U(x)=\sum_{c\in C}V_c(x) \]
Where Vc(x) is the clique potential and C is the set of all possible cliques (see more details in [14]). In this paper, it is assumed that each pixel has at most 4 neighbors in the image domain. Then, on pairs of neighboring pixels, the clique potential is calculated by:
\[ V_c\left(x_i,x_j\right)=\frac{1}{2}\left(1-I_{x_i,x_j}\right),\qquad I_{x_i,x_j}=\begin{cases}0, & x_i\ne x_j\\ 1, & x_i=x_j\end{cases} \]
The MAP estimation is equivalent to minimizing the posterior energy function
\[ \hat{x}=\arg\min_{x}\left\{U\left(y\left|x\right.\right)+U(x)\right\} \]
Where \( U\left(y\left|x\right.\right)={\sum}_i\left[\frac{{\left({y}_i-{\mu}_{X_i}\right)}^2}{2{\sigma}_{X_i}^2}+\frac{1}{2}\log {\sigma}_{X_i}^2\right] \). To solve the MAP problem, we can use the same approach as proposed in [36].
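One simple way to minimize the posterior energy is a greedy pixel-by-pixel update in the style of iterated conditional modes (ICM). The sketch below is given under that assumption and is not necessarily the exact procedure of [36]. It further assumes that the pairwise clique potential is 1/2 when two neighboring labels differ and 0 when they agree, that the Gaussian parameters are already known, and that images are lists of rows. The names `local_energy` and `icm_map` are hypothetical.

```python
import math


def local_energy(label, value, neighbour_labels, mu, sigma):
    """Data term of one pixel plus the prior over its 4-neighbour cliques."""
    # (y_i - mu)^2 / (2 sigma^2) + (1/2) log sigma^2, with (1/2) log sigma^2 = log sigma
    data = (value - mu[label]) ** 2 / (2 * sigma[label] ** 2) + math.log(sigma[label])
    # Assumed clique potential: 1/2 for each neighbour carrying a different label.
    prior = 0.5 * sum(1 for n in neighbour_labels if n != label)
    return data + prior


def icm_map(y, labels, mu, sigma, max_sweeps=10):
    """Greedy MAP labelling: each pixel takes the label of lowest local energy."""
    rows, cols = len(y), len(y[0])
    x = [row[:] for row in labels]  # work on a copy of the initial labelling
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [x[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]
                best = min(range(len(mu)),
                           key=lambda l: local_energy(l, y[r][c], neighbours, mu, sigma))
                if best != x[r][c]:
                    x[r][c] = best
                    changed = True
        if not changed:  # a full sweep without change: local minimum reached
            break
    return x


# Usage: the central pixel (135) is closer to the mean of class 1 (160) than
# to the mean of class 0 (100), but all four of its neighbours belong to class 0.
observed = [[100, 100, 100],
            [100, 135, 100],
            [100, 100, 100]]
initial = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
refined = icm_map(observed, initial, mu=[100, 160], sigma=[20, 20])
```

In this example the prior term outweighs the data term at the center, so the isolated label is absorbed by its surroundings. This illustrates how the Markov step can correct an initial k-means classification.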
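- EM algorithm
We use the EM algorithm to estimate the parameters θ. It is briefly explained below:
- At the kth iteration, we have Θ(k), and we compute the EM functional:
\[ Q\left(\Theta \left|{\Theta}^{(k)}\right.\right)=E\left[\ln P\left(x,y\left|\Theta \right.\right)\,\middle|\,y,{\Theta}^{(k)}\right]=\sum_{x}P\left(x\left|y,{\Theta}^{(k)}\right.\right)\ln P\left(x,y\left|\Theta \right.\right) \]
- To obtain the next estimate, we maximize the EM functional:
\[ {\Theta}^{(k+1)}=\arg\max_{\Theta }Q\left(\Theta \left|{\Theta}^{(k)}\right.\right) \]
More details can be found in [3, 36].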
2.5 Cross-correlation
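The zero-mean normalized cross-correlation measure, denoted ZNCC (Zero mean Normalized Cross-Correlation), of two images \( f_g \) and \( f_d \) is given by:
\[ ZNCC\left(f_g,f_d\right)=\frac{\sum_i\left(f_g(i)-\overline{f_g}\right)\left(f_d(i)-\overline{f_d}\right)}{\sqrt{\sum_i{\left(f_g(i)-\overline{f_g}\right)}^2\ \sum_i{\left(f_d(i)-\overline{f_d}\right)}^2}} \]
where the sums run over the pixels i and \( \overline{f} \) denotes the mean of f.
The values of ZNCC(fg, fd) belong to the interval [−1, 1]. This measure corresponds to the linear correlation coefficient of classical statistics. It is one of the most widely used measures, particularly in [16]. It has the advantage of being invariant to gain and bias changes.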
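A minimal sketch of this measure, assuming the two images have the same size and are stored as lists of rows: the mean of each image is subtracted, and the sum of the products of the centered values is divided by the product of their norms. The helper `best_class`, which returns the index of the class image most correlated with the original image, illustrates how the measure can be used for selection. Both function names, and the convention of returning 0 for a constant image, are assumptions.

```python
import math


def zncc(f, g):
    """Zero-mean normalised cross-correlation of two equally sized 2D lists."""
    a = [v for row in f for v in row]
    b = [v for row in g for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a) *
                    sum((q - mean_b) ** 2 for q in b))
    return num / den if den else 0.0  # a constant image has no defined correlation


def best_class(original, class_images):
    """Index of the class image with the highest ZNCC against the original."""
    scores = [zncc(original, c) for c in class_images]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: gain and bias invariance, then class selection on a toy image.
f = [[1, 2], [3, 4]]
g = [[2 * v + 5 for v in row] for row in f]   # same pattern, other gain and bias
same_pattern = zncc(f, g)

original = [[0, 0, 9], [0, 9, 9]]
dark_class = [[1, 1, 0], [1, 0, 0]]     # mask of the dark pixels
bright_class = [[0, 0, 1], [0, 1, 1]]   # mask of the bright pixels
selected = best_class(original, [dark_class, bright_class])
```

The first part shows that multiplying an image by a positive gain and adding a bias leaves the score at 1, up to rounding. The second part selects the bright class, which is the behavior expected from a maximum-correlation rule.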
2.6 Region growing
The region growing segmentation method [10, 32] is still used in many applications. This technique makes it possible to take previously found positions into account to accelerate the segmentation. The method begins by sowing "seeds" in the image, which give birth to regions. The regions then grow and merge until stable regions are obtained. The original pixels are called "seeds" or "primers". Starting from a seed, a region is extended by adding adjacent pixels that satisfy the homogeneity criterion.
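The growth from a single seed can be sketched with a breadth-first traversal. The homogeneity criterion used here, an absolute gray-level difference from the seed value that does not exceed a tolerance, is an assumption chosen for simplicity, as are the 4-connectivity and the name `region_grow`.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose gray
    level differs from the seed's gray level by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= rr < rows and 0 <= cc < cols
                    and (rr, cc) not in region
                    and abs(image[rr][cc] - seed_value) <= tolerance):
                region.add((rr, cc))
                queue.append((rr, cc))
    return region


# Usage: a bright structure touching the top-left corner and a separate
# bright blob in the bottom-right part of the image.
image = [[200, 200,   0,   0],
         [200,   0,   0,   0],
         [  0,   0, 190, 195],
         [  0,   0, 195, 200]]
corner_region = region_grow(image, seed=(0, 0), tolerance=20)

# Erase the grown region and keep everything else.
cleaned = [[0 if (r, c) in corner_region else image[r][c]
            for c in range(len(image[0]))] for r in range(len(image))]
```

Because growth follows connectivity, the bright blob that is not attached to the corner is left untouched even though its gray levels satisfy the same criterion. This is the property that makes a corner-seeded region suitable for removing a structure such as the pectoral muscle while keeping other bright objects.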
2.7 Database
To test our approach, we used the mini-MIAS database [34]. This database contains 322 digital mammogram images of size 1024 * 1024 pixels in PGM format. The images are in grayscale, with pixel intensities in the interval [0, 255], and are classified into three major cases: normal, benign and malignant. Fig. 1 shows the various components of an image from this database.
3 Proposed approach
The problem we want to solve in this article is how to detect and extract the tumor region in mammogram images. As shown in Fig. 2, the proposed approach is based on the following steps:
- Step 1: The preprocessing phase. This phase is applied to all the images of the database. Each image of the MIAS database is preprocessed with Otsu's method for binarization and removal of unwanted areas. The resulting images are stored in a new database (treated MIAS).
- Step 2: The phase of recovering the average number of classes. In this step, we recover the average number of classes from the treated MIAS database for use in the next phase. To this end, we apply the LBP method.
- Step 3: The phase of recovering the optimal number of classes. To extract the optimal number of classes, we use the algorithm proposed in [13]. This algorithm takes as input an image and the average number of classes recovered in the previous phase, and gives as output the optimal number of classes for the input image.
- Step 4: The class extraction phase. The optimal number of classes is used to initialize the k-means algorithm. As a result of applying this algorithm, we obtain a set of images, each representing a class.
- Step 5: The class adjustment phase. To get a good classification, we adjust the classes obtained in the previous phase using the Markov method [12]. This method uses the original image as a reference image to correct the classes obtained in the class extraction phase.
- Step 6: The tumor class selection phase. In this step, we want to choose the class that contains the tumor automatically. To this end, for each image of the treated MIAS database, we compute the correlation of each class obtained in Step 5 with the original image. We observe that, for each image, the class with the highest cross-correlation represents the tumor class (Table 1). This criterion is therefore used to select the tumor class.
- Step 7: The pectoral muscle elimination phase. Most of the selected tumor classes contain several objects: some represent parts of tumors and others represent the pectoral muscle. To isolate the tumor objects, we eliminate the pectoral muscle in this phase by applying the region growing method. It begins with the search for the starting pixel, which is found in either the left or the right corner of the image. After the elimination of the muscle, the objects that remain in the tumor class represent only the tumor.
4 Results and discussion
4.1 Results
The algorithm described above is used to automatically segment breast tumors in a mammogram. As mentioned in Section 2, the images used in this study were obtained from the mini-MIAS database. Figure 3 presents some examples of breast tumor detection based on the criteria presented above. This figure shows the results obtained for three images. The output of each step of our algorithm is shown in a separate row. These results show that the proposed method can segment and detect the tumor with good quality.
The term pectoral relates to the chest. The pectoral muscle is a large fan-shaped muscle that covers much of the front upper chest. Hence, the pectoral muscle is also captured during mammogram acquisition. It represents a predominant density region and therefore severely affects the result of image processing. For better detection accuracy, the pectoral region should be removed from the mammogram image. The orientation of the breast must be determined in order to remove the pectoral region. After the artifacts are removed, the pectoral region is also removed using connected component labeling methods. Figure 3 shows the image after pectoral muscle removal. Table 2 shows the comparative analysis of pectoral muscle removal results. For the 322 mammograms evaluated, the mean values of accuracy and error are 91.92% and 8.07%, respectively.
4.2 Discussion
Most of the works (see Table 3) that treat the tumor extraction problem face two main problems. The first is the removal of the pectoral muscle and the second is the extraction of the tumor. In this work, we presented a solution that solves these two problems at the same time. We used the MIAS database to test our approach. For each image that contains an abnormality, we have information about its center and the approximate radius of a circle around it.
5 Conclusion
In this article, we have proposed a method for the classification and automatic detection of tumors in mammogram images. To improve the quality of tumor detection, we first presented a preprocessing technique based on the Otsu method to remove objects that do not belong to the breast. After the preprocessing step, we estimated the number of classes with the LBP (Local Binary Pattern) technique. Then we performed a k-means classification and improved the resulting classes with the hidden Markov method. Finally, we calculated the correlations between these classes and the original image to automatically detect the class that contains the tumor and the pectoral muscle. To eliminate the pectoral muscle, we applied the region growing method. Experimental results on the mini-MIAS database, compared with previous state-of-the-art methods, showed that our method achieves a pectoral muscle removal accuracy of 91.92%.
References
Agrawal P, Vatsa M, Singh R (2014) Saliency based mass detection from screening mammograms. Signal Process 99:29–47
Alayli RM, El-Zaar AY (2013) An iterative mammographic image thresholding algorithm for breast cancer detection. ACIT'2013
Aschwanden P, Guggenbuhl W (1992) Experimental results from a comparative study on correlation-type registration algorithms. Robust Computer Vision, 268–289
Bon AT (2009) Developing K-Means Clustering on Beltline Moulding Contours. J Appl Sci Res 5(5):2189–2193
Boss R, Thangavel K, Daniel D (2013) Automatic mammogram image breast region extraction and removal of pectoral muscle. arXiv preprint arXiv:1307.7474
Chan TF, Vese LA (2001) Active contours without edges. IEEE Trans Image Process 10(2):266–277
Chen Z, Zwiggelaar R (2010) Segmentation of the Breast Region with Pectoral Muscle Removal in Mammograms. Medical Image Understanding and Analysis (MIUA) 2010. The University of Warwick, Coventry, pp 71–76
David R, Arnau O, Joan M, Marta P, Joan E (2005) Breast Segmentation with Pectoral Muscle Suppression on Digital Mammograms, Springer-Verlag, Berlin Heidelberg, 471–478, LNCS 3523
Djukovic D, Zhang J, Raftery D (2018) Colorectal Cancer Detection Using Targeted LC-MS Metabolic Profiling. In: Colorectal Cancer. Humana Press, New York, pp. 229–240
Dokládal P, Lohou C, Perroton L, Bertrand G (1999) Liver blood vessels extraction by a 3-D topological approach. In Medical Image Computing and Computer-Assisted Intervention–MICCAI’99 (pp. 98–105). Springer Berlin/Heidelberg
El Idrissi el Kaitouni S, Abbad A, Tairi H (2017). Tumor extraction and elimination of pectoral muscle based on hidden Markov and region growing: applied based MIAS. In International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). IEEE Conference. In press
El Idrissi el Kaitouni S, Abbad A, Tairi H (2017) Automatic detection of the tumour on mammogram images based on hidden Markov and active contour with quasi-automatic initialisation. International Journal of Medical Engineering and Informatics 9(4):316–331
Elmoufidi A, El Fahssi K, Jai-Andaloussi S, Madrane N, Sekkaki A (2014). Detection of regions of interest's in mammograms by using local binary pattern, dynamic k-means algorithm and gray level co-occurrence matrix. In Next Generation Networks and Services (NGNS), 2014 Fifth International Conference on (pp. 118–123). IEEE
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Gumaei A, El-Zaart A, Hussien M, Berbar M (2012) Breast segmentation using k-means algorithm with a mixture of gamma distributions. In Broadband Networks and Fast Internet (RELABIRA), 2012 Symposium on (pp. 97–102). IEEE
Jen CC, Yu SS (2015) Automatic detection of abnormal mammograms in mammographic images. Expert Syst Appl 42(6):3048–3055
Kwok SM, Chandrasekhar R, Attikiouzel Y, Rickard MT (2004) Automatic pectoral muscle segmentation on mediolateral oblique view mammograms. IEEE Trans Med Imaging 23(9):1129–1140
Liu L, Wang J, Wang T (2011) Breast and Pectoral Muscle Contours Detection Based on Goodness of Fit Measure, 978–1–4244-5089-3/11/$26.00. IEEE
Liu CC, Tsai CY, Tsui TS, Yu SS (2012) An improved GVF snake based breast region extrapolation scheme for digital mammograms. Expert Syst Appl 39(4):4505–4510
Margolies LR, Salvatore M, Yip R et al (2018) The chest radiologist's role in invasive breast cancer detection. Clin Imaging 50:13–19
Mohanty AK, Sahoo S, Pradhan A, Lenka SK (2011) Detection of masses from mammograms using mass shape pattern. International Journal of Computer Technology and Applications 2(4):1131–1139
Mustra M, Grgic M (2013) Robust automatic breast and pectoral muscle segmentation from scanned mammograms. Signal Process 93(10):2817–2827
Mustra M, Bozek J, Grgic M (2009) Breast Border Extraction and Pectoral Muscle Detection using Wavelet Decomposition. EUROCON, IEEE, St. Petersburg, pp 1426–1433
Nagi J, Kareem SA, Nagi F, Ahmed SK (2010) Automated breast profile segmentation for ROI detection using digital mammograms. In Biomedical Engineering and Sciences (IECBES), 2010 IEEE EMBS Conference on (pp. 87–92). IEEE
Nakajima T, Yasufuku K (2018) Early lung cancer: methods for detection. In: Interventions in Pulmonary Medicine. Springer, Cham, pp. 245–256
Nie L, Wang M, Zhang L, Yan S, Zhang B, Chua TS (2015) Disease inference from health-related questions via sparse deep learning. IEEE Trans Knowl Data Eng 27(8):2107–2119
Nie L, Zhang L, Yang Y, Wang M, Hong R, Chua TS (2015) Beyond doctors: Future health prediction from multimedia and multimodal observations. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 591–600). ACM
Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recogn 29(1):51–59
Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics 9(1):62–66
Ray S, Turi RH (1999) Determination of number of clusters in k-means clustering and application in colour image segmentation. In Proceedings of the 4th international conference on advances in pattern recognition and digital techniques (pp. 137–143)
Selle D, Preim B, Schenk A, Peitgen HO (2002) Analysis of vasculature for liver surgical planning. IEEE Trans Med Imaging 21(11):1344–1357
Singh N, Mohapatra AG, Kanungo G (2011) Breast cancer mass detection in mammograms using K-means and fuzzy C-means clustering. Int J Comput Appl 22(2):0975–8887
Suckling J, Parker J, Dance D, Astley S, Hutt I, Boggis C, … Taylor P (1994) The mammographic image analysis society digital mammogram database. In Excerpta Medica. International Congress Series (Vol. 1069, pp. 375–378)
Wang L, Zhu ML, Deng LP, Yuan X (2010) Automatic pectoral muscle boundary detection in mammograms based on Markov chain and active contour model. Journal of Zhejiang University-SCIENCE C 11(2):111–118
Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 20(1):45–57
Cite this article
El Idrissi El Kaitouni, S., Abbad, A. & Tairi, H. A breast tumors segmentation and elimination of pectoral muscle based on hidden markov and region growing. Multimed Tools Appl 77, 31347–31362 (2018). https://doi.org/10.1007/s11042-018-6089-z
DOI: https://doi.org/10.1007/s11042-018-6089-z