Introduction

Breast cancer is one of the leading causes of cancer death in women. The malignancy is very common among women in developed countries, and its incidence continues to rise in less developed countries [1]. It is the most common cancer in women worldwide, comprising 16% of all female cancers [2]. Although breast cancer is thought to be a disease of the developed world, a majority (69%) of all breast cancer deaths occur in developing countries [3]. Breast cancer survival rates vary greatly worldwide, ranging from 80% or over in North America, Sweden, and Japan to around 60% in middle-income countries and below 40% in low-income countries [4]. The low survival rates in less developed countries may be due to the lack of early detection programmes, which results in a higher number of women being diagnosed with later-stage disease. The resulting increase in the death rate is a major concern. Survival rates decline as the disease progresses to later stages. Thus, to reduce breast cancer mortality and increase survival, early detection and treatment are crucial.

Various modalities are currently available to detect breast cancer. They are based on light (optical), sound (ultrasound), heat (thermogram), magnetism, attenuation, microwave, x-rays, nuclear, electrical impedance, and computer modeling (inverse simulation); fusions of different methods and computer-aided diagnosis also exist.

Mammography is considered the gold standard screening tool for the early detection of breast cancer. It can show tumors long before they progress to later stages or grow big enough to be felt by the doctor [5]. It uses doses of ionizing radiation to form images of the breast area, which are then used to detect abnormal breast masses. In the procedure, the breast is compressed from different sides by the mammography machine to flatten the tissue and hold the breast still. This increases image quality and prevents motion blur. Variability in interpretation and in tissue density may affect the results of mammography [6, 7]. Magnetic Resonance Imaging (MRI) has been shown to detect early breast cancers with high sensitivity, particularly in pre-menopausal women [8], but with reduced specificity [9].

The relationship between breast skin temperature and breast cancer has been studied [10, 11]. These studies detected and measured changes in the skin temperature of clinically healthy and cancerous breasts. The use of infrared imaging is based on the principle that the metabolic activity and vascular circulation in pre-cancerous tissue and its surrounding area are often higher than in normal breast tissue [12]. Cancerous tumors increase circulation to their cells in order to supply nutrients by opening existing blood vessels, dormant (inactive) vessels, and new ones. This results in an increase in the regional surface temperature of the breast that can be detected by infrared imaging. The procedure involves the use of medical infrared cameras and computers to detect and produce high quality images of temperature variations. Due to the sensitivity of infrared imaging, the earliest signs of breast cancer and the pre-cancerous state of the breast can be observed in the temperature spectrum.

Other methods of breast cancer detection such as mammogram, ultrasound, and MRI rely mostly on finding the tumor. Infrared imaging, however, focuses on finding thermal signs that suggest the presence of an early stage tumor which cannot be detected physically, or that suggest a pre-cancerous stage of the breast through minute variations in normal blood vessel activity. X-ray scan readings are affected by dense breasts or by hormonal changes that occur in women who have undergone hormone replacement therapy. Infrared imaging allows detection of both breast cancer and potential breast cancer risk, which sometimes cannot be done through mammogram, MRI, and ultrasound. In a study by Gautherie et al. [13], 1527 patients with initially healthy breasts and abnormal thermograms were followed for a period of 12 years. Of these patients, 44% developed cancer within the first 5 years. The group concluded that “an abnormal thermogram is the single most important marker of high-risk category for the future development of breast cancer”. Similar conclusions were reached in many other studies [14, 15]. However, the accuracy of thermography depends on many factors such as the symmetry of the breasts' temperature, temperature stability, physiological state, and menstruation [16].

Establishing the surface isotherm pattern of the breast and the normal range of cyclic variations of temperature distribution can assist in identifying the abnormal infrared images of diseased breasts. Therefore, Ng et al. [16] investigated the cyclic variation of temperature and vascularization of the normal breast thermograms under a controlled environment. The authors presented a method to segment the thermograms and to choose an ideal time for thermal examination. Nowadays, IR imaging is becoming an increasingly popular diagnostic tool to detect various diseases. It has been widely used to detect the malignant tumors in the breast by thermovision techniques [16–19].

The texture of an image can be simply defined as a function of the spatial variation in pixel intensities [20]. Numerous methods have been proposed to study texture, and these methods can be classified into four categories: statistical methods, model based methods, geometrical (structural, syntactic) methods, and signal processing methods [20]. In the medical field, analysis of texture plays an important role in a number of applications. Recently, Tan et al. used texture features to study the ocular thermograms of young and elderly subjects [21] and reported a significant difference in their respective texture parameters.

Figure 1 shows the block diagram of the proposed system used in this work. First, the thermogram image is acquired using the IR camera. Then the image is cropped and converted to a grayscale image. Subsequently, different texture parameters are extracted and subjected to ‘t’-test for selecting the best features. These selected features are fed to the Support Vector Machine (SVM) classifier for automatic detection.

Fig. 1 Proposed block diagram of our work

The aim of this work is to make use of infrared imaging to detect signs of breast cancer or abnormality automatically. As the temperature of cancerous cells is higher than that of normal cells, cancer cells can be better identified on infrared images. Texture analysis is adopted to extract features from breast thermograms and subsequently to detect the presence of cancerous tumors. Several statistical features, together with features derived from the co-occurrence matrix and the run length matrix, were extracted and fed into the SVM classifier for automatic classification. The paper is organized as follows: “Data Acquisition and Preprocessing” provides a detailed outline of the acquisition and preprocessing of the breast thermograms. A theoretical background on the texture analysis is given in “Texture Analysis”. The methodology and the results obtained on evaluating the performance of the classifier are presented in “Classification” and “Results”, respectively. The discussion is given in “Discussion”. Finally, the paper concludes in “Conclusion”.

Data acquisition and preprocessing

Field data were collected from the Department of Diagnostic Radiology, Singapore General Hospital, using non-contact thermography [22, 23]. Infrared thermograms were acquired using a NEC-Avio Thermo TVS2000 MkIIST System (Japan; URL: www.nec-avio.co.jp/en/contact/index.html): 3.0–5.4 μm short wavelength, 30 frames/sec, Stirling cooler, and InSb detector with 256 × 200 elements. The system has a measuring accuracy of ±0.4% (full scale) and a temperature resolution of 0.1°C at 30°C black body. The instrument was placed 1 m away from the chest with the lens (FOV 15° × 10°, IFOV 2.2 mrad) attached. Ninety patients were chosen at random to undergo the thermography examination. The examination was done in a temperature-controlled room with a temperature range of 20–22°C (within ±0.1°C). The humidity of the examination room was maintained at 60% ± 5% [24–28]. The patients were required to rest for at least 15 min to stabilize and reduce the basal metabolic rate, which results in minimal surface temperature changes and, therefore, satisfactory thermograms [29, 30]. The patients were also asked to wear a loose gown that does not restrict airflow. Furthermore, it was ensured that the patients were within the recommended period of the 5th to 12th and 21st day after the onset of the menstrual cycle, since during these periods the vascularization is at basal level with the least engorgement of blood vessels [16, 31].

In this work, we used a total of 50 thermograms: 25 from cancer patients (age: 51 ± 8 years) and 25 from normal subjects (age: 46 ± 10 years). In the malignant class, 15 patients had stage III cancer and the rest had stage II cancer. Of the lumps, 50% were found in the upper-outer quadrant, 35% in the area behind the nipple, and 15% in the upper-inner quadrant. We analyzed the cancerous breast in each of the 25 malignant cases and one normal breast in each of the 25 normal cases. Figure 2(a) shows the thermogram image of a malignant breast, Fig. 2(b) is the corresponding grayscale image, and Fig. 2(c) presents the 50 × 120 cropped images of the left and right breasts.

Fig. 2 Thermogram images: (a) original, (b) grayscale version, (c) cropped left and right breasts
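The grayscale conversion and cropping steps can be illustrated with a minimal sketch. The paper does not state how the grayscale conversion was done or where the crop windows were placed, so the luminance weights (the common 0.299/0.587/0.114 weighting), the crop coordinates, the orientation of the 50 × 120 window (taken here as 50 columns by 120 rows), and the synthetic frame below are all assumptions made for illustration only.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of integer gray levels in 0..255.

    Assumption: a standard luminance weighting; the source does not specify one.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def crop(image, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("crop window falls outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


# Synthetic 200-row x 256-column RGB frame standing in for a thermogram (not real data).
frame = [
    [((x + y) % 256, (2 * x) % 256, (3 * y) % 256) for x in range(256)]
    for y in range(200)
]
gray = to_grayscale(frame)

# Hypothetical crop positions for the two breast regions.
left_breast = crop(gray, top=60, left=40, height=120, width=50)
right_breast = crop(gray, top=60, left=150, height=120, width=50)
```

In practice the crop windows would be positioned per patient; fixed coordinates are used here only to keep the example self-contained.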

Texture analysis

Texture measures smoothness, coarseness, and regularity of pixels in an image. These features describe the mutual relationship among intensity values of neighboring pixels repeated over an area larger than the size of the relationship [32]. The texture recognition system can be grouped into two main classes: structural and statistical. Structural texture analyses are more complex compared to the statistical approach [32]. Statistical approaches yield characterization of textures as smooth, coarse, grainy etc. These methods are based on the relationship between intensity values of pixels; measures include entropy, contrast, and correlation based on the gray level co-occurrence matrix. In statistical methods, features are described using a spatial gray level dependency matrix. Some of the statistical features extracted from the thermograms are described in this section.

Co-occurrence matrix

Given an M × N image, the gray level co-occurrence matrix (GLCM) is defined [21] by

$$ {C_d}(i,j) = \left| {\left\{ {(p,q),(p + \Delta x,q + \Delta y):I(p,q) = i,I(p + \Delta x,q + \Delta y) = j} \right\}} \right| $$
(1)

where \( (p,q),(p + \Delta x,q + \Delta y) \in M \times N,\;d = (\Delta x,\Delta y) \) and \( \left| \cdot \right| \) denotes the cardinality of a set. Given a gray level i in an image, the probability that the pixel at offset (Δx, Δy) from it has gray level j can be expressed as

$$ {P_d}(i,j) = \frac{{{C_d}(i,j)}}{{\sum {{C_d}(i,j)} }} $$
(2)
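Equations (1) and (2) can be sketched in a few lines of plain Python. The function names, the 4-level toy image, and the convention that Δx moves along columns and Δy along rows are assumptions made for this illustration; the counts are directed (not symmetrized), as in Eq. (1).

```python
def glcm(image, dx, dy, levels):
    """Count C_d(i, j): pixel pairs (p, q), (p + dx, q + dy) with gray levels i and j."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for q in range(rows):        # q indexes rows (y direction)
        for p in range(cols):    # p indexes columns (x direction)
            p2, q2 = p + dx, q + dy
            if 0 <= p2 < cols and 0 <= q2 < rows:
                counts[image[q][p]][image[q2][p2]] += 1
    return counts


def normalize(counts):
    """Turn co-occurrence counts into probabilities P_d(i, j) as in Eq. (2)."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs for this offset")
    return [[c / total for c in row] for row in counts]


# Toy 4 x 4 image with gray levels 0..3 (illustrative only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
C = glcm(image, dx=1, dy=0, levels=4)   # each pixel paired with its right-hand neighbor
P = normalize(C)
print(C)  # [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
```

Here 12 horizontal pairs exist in the 4 × 4 image, so each entry of `P` is the corresponding count divided by 12.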

From the above matrix, the following moments \( m_1 \), \( m_2 \), \( m_3 \), and \( m_4 \) can be obtained using the equation

$$ {m_g} = \sum\limits_i {\sum\limits_j {{{(i - j)}^g}{P_d}(i,j)} } $$
(3)
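As a small worked example of Eq. (3), the sketch below evaluates the four moments on a hand-written 3 × 3 probability matrix. The matrix values and the function name are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def glcm_moment(P, g):
    """g-th moment of a normalized co-occurrence matrix: sum of (i - j)^g * P[i][j]."""
    return sum(
        ((i - j) ** g) * P[i][j]
        for i in range(len(P))
        for j in range(len(P[i]))
    )


# Hypothetical normalized (sums to 1), non-symmetric co-occurrence matrix.
P = [
    [0.5, 0.125, 0.0],
    [0.0, 0.125, 0.0],
    [0.25, 0.0, 0.0],
]
moments = [glcm_moment(P, g) for g in (1, 2, 3, 4)]
print(moments)  # [0.375, 1.125, 1.875, 4.125]
```

Note that the odd-order moments are signed: for a symmetric matrix the terms for (i, j) and (j, i) cancel, so non-zero values of moment1 and moment3 presuppose a directed (non-symmetrized) co-occurrence count.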

Difference statistics are “the distribution of the probability that the gray level difference is k between the points separated by δ in an image” [33]. They are a subset of the co-occurrence matrix and are obtained from it [33] by

$$ {P_{\delta }}(k) = \sum\limits_i {\sum\limits_j {{C_d}\left( {i,j} \right)} } $$
(4)

where \( \left| {i - j} \right| = k,\;k = 0,1, \ldots ,n - 1 \), and n is the number of gray levels [34].
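Equation (4) amounts to summing the co-occurrence entries along each diagonal band |i − j| = k. A minimal sketch follows; the count matrix is a hypothetical 4-level example, and the final normalization step is an assumption added so that the result reads as a probability distribution, as the quoted definition suggests.

```python
def difference_statistics(C):
    """Sum co-occurrence entries C[i][j] over all (i, j) with |i - j| = k, for k = 0..n-1."""
    n = len(C)
    hist = [0] * n
    for i in range(n):
        for j in range(n):
            hist[abs(i - j)] += C[i][j]
    return hist


# Hypothetical co-occurrence counts for an image with n = 4 gray levels.
C = [
    [2, 2, 1, 0],
    [0, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 0, 1],
]
counts_by_difference = difference_statistics(C)
print(counts_by_difference)  # [8, 3, 1, 0]

# Assumed normalization: divide by the total number of pairs to obtain probabilities.
total = sum(counts_by_difference)
probabilities = [c / total for c in counts_by_difference]
```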

Run length matrix

In the run length matrix, the gray level runs are characterized by the gray tone, the length, and the direction of the run. \( P_{\theta}(i,j) \) is the run length matrix: each entry holds the number of runs in which gray level i continues for a run length j in the direction θ [35]. Various textural features were calculated from the run length matrices for θ = 0°, 45°, 90°, and 135° [36]. The features computed for classification are given below.

$$ {\hbox{Gray}}\;{\hbox{level}}\;{\hbox{non}} - {\hbox{uniformity}}:\sum\limits_i {{{\left\{ {\sum\limits_j {{P_{\theta }}(i,j)} } \right\}}^2}} /\sum\limits_i {\sum\limits_j {{P_{\theta }}(i,j)} } $$
(5)
$$ {\hbox{Run}}\;{\hbox{percentage:}}\;\sum\limits_i {\sum\limits_j {{P_{\theta }}\left( {i,j} \right)} } /A. $$
(6)

where A is the area of the image of interest. Normalization is performed to scale down the values of the computed features.
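The run length matrix and the two features of Eqs. (5) and (6) can be sketched as follows. Only the θ = 0° (horizontal) direction is implemented, the area A is taken as the number of pixels, and the function names and toy image are assumptions for illustration.

```python
def run_length_matrix_0deg(image, levels):
    """rlm[i][j - 1] = number of horizontal runs of gray level i with length j."""
    max_run = len(image[0])
    rlm = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_value, run_len = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_len += 1
            else:
                rlm[run_value][run_len - 1] += 1   # close the finished run
                run_value, run_len = value, 1
        rlm[run_value][run_len - 1] += 1           # close the last run of the row
    return rlm


def gray_level_non_uniformity(rlm):
    """Eq. (5): sum over i of (sum over j of P(i, j))^2, divided by the total number of runs."""
    total_runs = sum(sum(row) for row in rlm)
    return sum(sum(row) ** 2 for row in rlm) / total_runs


def run_percentage(rlm, area):
    """Eq. (6): total number of runs divided by the area A (assumed: pixel count)."""
    return sum(sum(row) for row in rlm) / area


# Toy 4 x 4 image with gray levels 0..3 (illustrative only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
rlm = run_length_matrix_0deg(image, levels=4)
print(rlm)                                  # [[1, 2, 0, 0], [0, 2, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0]]
print(gray_level_non_uniformity(rlm))       # 2.25
print(run_percentage(rlm, area=16))         # 0.5
```

The other three directions follow the same pattern, scanning along columns or diagonals instead of rows.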

Classification

After normalization, the features were fed to the classifier. In this work, the Support Vector Machine (SVM) classifier was used for the automated diagnosis of breast cancer. The SVM classifier has shown excellent performance in many pattern recognition problems. The SVM is a supervised learning method which aims to determine a separating hyperplane that distinguishes positive examples from negative examples. Given a set of labeled training data, it generates input-output mapping functions which can be used either for classification or for regression. For classification, input data are often transformed to a high-dimensional feature space with the use of nonlinear kernel functions, so that the transformed data become more separable than the original input data.
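To make the idea of a separating hyperplane concrete, the sketch below trains a linear soft-margin SVM by full-batch sub-gradient descent on the hinge loss. This is a simplified stand-in and not the classifier used in the paper: no kernel is applied, and the hyperparameters (`lam`, `lr`, `epochs`), the function names, and the two-feature toy data are assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by full-batch sub-gradient descent.

    X: list of feature vectors; y: list of labels in {+1, -1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1.0:  # inside the margin or misclassified
                for k in range(d):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane w.x + b = 0."""
    score = sum(wk * xk for wk, xk in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already-normalized two-feature samples: +1 = malignant, -1 = normal.
X = [[1.0, 0.8], [0.8, 1.2], [1.2, 1.0], [-1.0, -0.8], [-0.8, -1.2], [-1.2, -1.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

A kernel SVM replaces the dot product with a kernel function so that the hyperplane is found in the transformed feature space; in practice a tested library implementation would normally be preferred over a hand-written optimizer like this one.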

Results

We extracted 16 texture features: homogeneity, energy, entropy, moment1, moment2, moment3, moment4, entropy, angular second moment, contrast, mean, short runs emphasis, long runs emphasis, run percentage, gray level non-uniformity, and run length non-uniformity. Of these, only four features (moment1, moment3, run percentage, and gray level non-uniformity) were selected, as they were statistically significant (low p-values) compared to the other features. Table 1 shows the ranges of the four selected texture features for normal and malignant breast thermograms.

Table 1 Selected features with the least p-values
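Feature selection by t-test can be sketched as ranking features by the magnitude of a two-sample t statistic. The paper does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the feature names and values are invented for illustration. Converting the statistic into a p-value additionally requires the t distribution, which is left to a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch's form)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical normalized feature values: (malignant samples, normal samples).
features = {
    "feature_a": ([0.82, 0.79, 0.91, 0.85, 0.88], [0.41, 0.52, 0.47, 0.39, 0.50]),
    "feature_b": ([0.30, 0.62, 0.45, 0.58, 0.40], [0.35, 0.55, 0.50, 0.42, 0.60]),
}

# Larger |t| means the two classes are better separated on that feature.
ranked = sorted(features, key=lambda name: abs(welch_t(*features[name])), reverse=True)
print(ranked)  # ['feature_a', 'feature_b']
```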

A three-fold stratified cross-validation method was used to test the SVM classifier. The whole dataset was split into three (roughly equal) parts such that each part contained approximately the same proportion of class samples as the original dataset. Two parts of the data (training set) were used for classifier development, and the built classifier was evaluated using the remaining part (testing set). This procedure was repeated three times (folds), using a different part for testing in each case. The results of the three folds were then averaged to obtain the accuracy, sensitivity, specificity, and positive predictive accuracy. Sensitivity is the probability that a test will produce a positive result when used on a diseased population. Specificity is the probability that a test will produce a negative result when used on a disease-free population. Accuracy is the ratio of the number of correctly classified samples to the total number of samples. The positive predictive value is the proportion of patients with positive test results who are correctly diagnosed. A Receiver Operating Characteristic (ROC) curve is obtained by calculating the sensitivity and specificity of a diagnostic test at different threshold values and plotting sensitivity vs. (1 – specificity). A test that perfectly discriminates between the two groups (normal and abnormal) would yield a curve that coincides with the left and top sides of the plot. Generally, the goodness of a diagnostic test is assessed by the area under the ROC curve (AUC), which lies between 0.5 (no better than chance) and 1 (perfect discrimination) for an informative test. The closer the area is to 1.0, the better the test; the closer it is to 0.5, the worse the test.
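The stratified splitting procedure described above can be sketched as follows. The round-robin assignment within each class and the function name are assumptions; a real experiment would usually shuffle the samples within each class first. With 25 samples per class, this particular scheme gives test folds of 18, 16, and 16 samples and does not attempt to reproduce any specific split used in the study.

```python
def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps the class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)   # deal indices out round-robin
    return folds


labels = [0] * 25 + [1] * 25          # 0 = normal, 1 = malignant (as in the dataset sizes)
folds = stratified_folds(labels, k=3)

splits = []
for test_fold in range(3):
    test_indices = folds[test_fold]
    train_indices = [i for f in range(3) if f != test_fold for i in folds[f]]
    splits.append((train_indices, test_indices))

print([len(fold) for fold in folds])  # [18, 16, 16]
```

Each `(train_indices, test_indices)` pair would be used to fit the classifier on the training part and score it on the held-out part, with the three scores averaged at the end.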

Table 2 shows the results of the classification. Thirty-six images (18 normal and 18 malignant) were used for training, and 14 thermograms (7 in each class) were used for testing. Our SVM classifier achieved a classification accuracy of 88.10%. Table 3 shows the values of sensitivity, specificity, positive predictive accuracy, and the AUC. As Table 3 shows, the classifier identifies the unknown class with a high sensitivity and specificity of 85.71% and 90.48%, respectively. These figures could be further improved by using more diverse thermograms and better texture features.
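The relationship between a confusion matrix and the reported measures can be checked with a short calculation. The counts below are hypothetical: the confusion matrix is not given in the text, and these values were chosen only because, when 42 test decisions (three test sets of 14) are pooled, they reproduce the reported accuracy, sensitivity, and specificity. The positive predictive value computed here is simply what these assumed counts imply, not a figure taken from Table 3.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),             # positives found among diseased cases
        "specificity": tn / (tn + fp),             # negatives found among disease-free cases
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                     # correct among positive calls
    }


# Hypothetical pooled counts: 21 malignant and 21 normal test decisions.
metrics = diagnostic_metrics(tp=18, fn=3, tn=19, fp=2)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
print(percentages["accuracy"], percentages["sensitivity"], percentages["specificity"])
# 88.1 85.71 90.48
```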

Table 2 Results of classification
Table 3 Values of sensitivity, specificity, positive predictive accuracy, and area under the ROC curve for the SVM classifier

Figure 3 shows a snapshot of the graphical user interface developed for our proposed system. A Load Image button is provided to load the test thermogram image. Once that button is clicked, the selected image is displayed. The grayscale images of the left and right breasts are also displayed. The patient information corresponding to the selected image (Name, Age, and Gender) is displayed on the top left hand side. On clicking the Support Vector Machine (SVM) push button, the features are automatically extracted from the image and fed into the SVM classifier, and the final class of the thermogram is displayed in the Result section. In the illustration, the class is Malignant.

Fig. 3 Graphical user interface of the proposed system

Discussion

Mammography is the most commonly recommended diagnostic modality for breast cancer detection. It is able to identify about 61–87% of breast cancer cases [37]. However, it has a lower sensitivity in women aged less than 50 years because of its inability to effectively image the dense breast tissue that younger women have [37]. Moreover, the false negative rates of mammography are between 5% and 15%. It is also difficult to differentiate a tumor from a postoperative breast scar using mammography. Thus, despite being the primary imaging method, mammography has its own limitations. In addition, ultrasound and mammography can only detect an already developed cancer that is big enough to be visible in the obtained images [37].

Owing to these limitations of the current popular imaging modalities, several other modalities are being continuously evaluated for breast cancer screening and diagnosis. One of the most popular modalities under study is the infrared thermogram. In a recent study by Schaefer et al. [38], features derived from the cross co-occurrence matrix, coupled with fuzzy classification, were used to analyze breast thermograms for cancer diagnosis. Their proposed algorithm was able to identify malignancies with an accuracy of 80%. Tan et al. [39] proposed a Complementary Learning Fuzzy Neural Network (CLFNN) as a Computer-Assisted Intervention (CAI) tool for breast thermogram analysis. Their experimental results show that the confluence of breast thermography and CLFNN not only provides a low cost alternative but also aids the physician in breast cancer detection and thermogram analysis with relatively superior accuracy. The application of k-means and fuzzy c-means to color segmentation of thermal infrared breast images was reported by EtehadTavakol et al. [40]. They suggested that fuzzy c-means is preferred because the fuzzy nature of IR breast images helps it to provide more accurate results with no empty cluster. Recently, Wiecek et al. [41] used the Discrete Wavelet Transform (DWT) with biorthogonal and Haar mother wavelets, together with neural networks, to classify normal and benign thermograms. They achieved a classification efficiency of 86.6%.

In another study [5], discrete temperature readings were recorded by placing 16 temperature sensors on the surface of the breast to detect normal, benign, cancer, and suspected cancer stages. The authors used five classifiers, namely the back-propagation algorithm, probabilistic neural network, fuzzy classifier, Gaussian mixture model, and support vector machine. They were able to achieve more than 80% accuracy in classifying the four different classes.

Similar to the results of the above mentioned studies, in this work we have also demonstrated the utility of breast surface temperature as an indicator of malignancy. This method is suitable for young women, for whom mammography has proved to be not very efficient. A thermogram presents a visual representation of ‘hot spots’ of the breast, and hence its interpretation may be subjective. Therefore, we extracted texture features from the thermograms and fed them into a classifier for automatic classification. This makes the interpretation more objective and automatic, and therefore the inter-observer variability of diagnostic prediction is greatly reduced. Using the SVM classifier and the texture features, we have demonstrated that our proposed technique achieves a high classification accuracy of 88.10% in differentiating normal and malignant breasts. The sensitivity and specificity were also high (85.71% and 90.48%, respectively).

Out of the 25 cancerous cases studied here, there were 10 carcinoma patients with stage II cancer and 15 with stage III cancer. To make the proposed approach more useful, more thermograms have to be obtained from women with early stage I small malignancies, and texture features should be extracted from them and fed to the SVM classifier. Besides, detection of malignant breasts using infrared thermography can be further improved by developing an algorithm that analyzes the relevant segmented part of the breast area instead of using the cropped image. Moreover, the accuracy and reliability of the system can be improved by increasing the number of training images used for classification.

Conclusion

We have developed an automatic diagnosis technique for the assessment of breast cancer based on thermograms, using texture features and the SVM classifier. In this work, we have proposed the use of four texture features, namely moment1, moment3, run percentage, and gray level non-uniformity, for representing the thermograms. These features reflect the pixel variation and contours in the images. On evaluating an SVM classifier with these features, an accuracy of 88.10%, a sensitivity of 85.71%, and a specificity of 90.48% were obtained. The accuracy of a diagnostic tool that uses classifiers depends on several factors, such as the size and quality of the training data and the features chosen as classifier inputs. The accuracy can be further improved by extracting better texture features and by using a larger sample size. To summarize, we have demonstrated that infrared thermography, with the help of an automatic classification algorithm, can prove to be a valuable and reliable adjunct tool for physicians to detect breast cancer.