Introduction

Early detection and diagnosis of breast diseases are crucial for reducing the mortality rate of breast cancer and improving survival. In India, the incidence of breast cancer is rising among younger women, in whom the cancer also tends to grow very aggressively [1]. Moreover, the vulnerability of young women to radiation risk and the high false negative rate of X-ray mammography, the traditional gold standard, make it necessary to explore the efficiency of non-invasive and non-radiating infrared breast thermography (IBT) in early breast abnormality detection. Since breast cancer is asymptomatic until late in the disease process, regular breast health examination is necessary to identify any change in breast health. With a reported sensitivity of 90% [2], IBT can be used for routine examination of asymptomatic patients to detect the cases that require urgent medical attention. The literature [3, 4] reports that thermograms with asymmetric temperature distributions signify physiological dysfunction in the patients’ breasts most of the time [5]. Compared to these, thermograms showing increased nipple temperature, hot patches, and vascular changes may be more suspicious and may indicate more severe breast problems [6, 7]. These hot patches and vascular changes appear because of the heat produced by the high metabolic activity of the blood vessels, changes in blood perfusion, etc. in the suspicious regions [7, 8]. However, the intensity of the heat radiation emitted by a breast mass may vary with its location within the breast [9], so a malignant tumor lying deep inside the breast may radiate less heat at the skin surface than a benign tumor near the surface. Besides, the subtle temperature differences in breast thermograms may lead to an incorrect analysis. Hence, to identify the presence of breast abnormality at an early stage, an efficient computer-aided detection (CAD) system that analyzes the thermal patterns may play a significant role.

As reported in the literature, the most common breast abnormality detection method is the bilateral asymmetry analysis of breast thermograms, which is based on the premise that the thermal patterns of the two breasts in an abnormal thermogram are noticeably asymmetric [10]. Although many contributions to breast abnormality detection from infrared thermograms can be found in the literature [3, 11, 12], differentiating the abnormal thermograms from the normal ones is still very challenging because of the subtle nature of the temperature patterns in breast thermograms. Early and accurate detection of breast abnormality is important not only for providing a second opinion to physicians in decision making but also for treatment planning.

In this study, we propose an automatic breast abnormality detection system to support radiologists in decision making. Figure 1 depicts the flow of the proposed system. The proposed method is motivated by the observation that abnormal thermograms have more high temperature regions, or thermal patches, in either breast than normal thermograms, and therefore carry more spatial information. To support this observation, the “Association of singular values with image complexity of breast thermograms” section analyzes the complexity of the processed breast thermograms and concludes that the abnormal thermograms possess more complex information than the normal ones. The proposed system aims to discriminate the thermograms with suspected abnormality from the normal ones and also to predict the degree of abnormality in these thermograms. Such an automatic system can help to reduce the radiologists’ burden and examination time, with the additional advantages of objectivity and diagnosis quality. Moreover, it can help to rapidly identify the cases that require urgent medical attention and further evaluation. The key contributions of this paper are summarized as follows.

Fig. 1 The overview of the proposed breast abnormality detection system

  1. We propose a simple yet effective breast abnormality detection method. Unlike the conventional method of using texture feature based bilateral asymmetry analysis, the proposed method makes use of the candidate thermal patches to signify the breast abnormality. A set of singular values (SVs) has been extracted from the pre-processed images to quantify the abnormality present in each thermogram.

  2. The classification performance of the proposed breast abnormality detection system has been evaluated with both in-house and public breast thermogram databases namely Department of Biotechnology-Tripura University-Jadavpur University (DBT-TU-JU) [13] and Database of Mastology Research (DMR) [14] by using a series of seven state-of-the-art classification systems.

  3. The potential of the SVs in breast abnormality detection is compared with some other feature sets and the evaluation results reflect that the proposed method outperforms the existing methods.

  4. A breast abnormality grading (BAG) index has been designed by using 1st and 2nd rank SVs to grade the abnormal thermograms into mild abnormal and severely abnormal based on the degree of abnormality present in it.

The rest of the paper is organized as follows. Related work on IBT based breast abnormality detection is reviewed in the “Related work” section. A detailed description of the proposed breast abnormality detection method is provided in the “Proposed method” section. The “Experimental results” section provides the details of the experimental databases and the experimental results. The comparative studies are presented in the “Discussion” section. Finally, the “Conclusion” section concludes the paper.

Related work

IBT, which shows abnormal thermal patterns or hot thermal patches as suspicious breast regions, has drawn the attention of researchers for use in breast abnormality detection. Bilateral asymmetry analysis is the most widely investigated and classical method of breast abnormality detection from thermograms, and the literature [3, 11] contains several prior investigations of it. A summary of the existing works on asymmetry analysis based breast abnormality detection is given in Table 1, which lists the system protocol of each existing method [5, 15–27]: the feature sets, the size of the experimental dataset, the classifier algorithm and the performance scores.

Table 1 Summary of existing bilateral asymmetry analysis based breast abnormality detection methods from thermograms

Even though a lot of work has been done on bilateral asymmetry based breast abnormality detection, this conventional method cannot detect breast abnormalities in thermograms where both breasts are affected. In such thermograms, the features that measure bilateral asymmetry differ only minutely between the two breasts, which leads to the conclusion that no abnormality is present. Hence, in contrast to bilateral asymmetry analysis, our proposed method uses the SVs to characterize the thermal patterns irrespective of their location in either breast. Thus, the proposed method may improve the accuracy of breast abnormality detection by increasing the number of true positives, i.e., by detecting the abnormalities in those breast thermograms where bilateral asymmetry analysis fails.

Proposed method

Given a breast thermogram, a set of efficient features must be extracted for the detection of abnormality. Our work therefore focuses on generating efficient features from breast thermograms that can provide better performance than the reported works. The proposed breast abnormality detection system primarily consists of four sub-steps. The first sub-step involves the pre-processing and the segmentation of the breast regions from raw breast thermograms. The second sub-step identifies the candidate thermal patterns in the segmented images by applying a hue–saturation–value (HSV) transformation and a power law transformation. The third sub-step decomposes the pre-processed images into SVs to quantitatively characterize the candidate thermal patterns of the breast thermograms. In the fourth sub-step, the extracted SVs are fed to seven different classification methods to find the most efficient classifier for our proposed method in differentiating the abnormal thermograms from the normal ones.

Furthermore, based on the 1st and 2nd rank SVs extracted from each breast thermogram, a BAG index has also been developed to predict the degree of abnormality it possesses. A detailed description of each sub-step is provided in the following sub-sections.

Pre-processing and segmentation of breast region

Pre-processing of breast thermograms

In designing a computer-aided breast abnormality detection system, the pre-processing of the raw thermograms is crucial, since raw breast thermograms generally contain irrelevant parts such as the neck, the area underneath the breasts and the background. Hence, to make the raw thermograms in the “Rainbow HC” color palette more suitable for further processing and to enhance the performance of the proposed system, the breast thermograms are manually cropped to discard the irrelevant regions. For the subsequent steps, all the breast thermograms are resized to a resolution of 200 × 400 pixels. The pre-processing of the breast thermograms is followed by the segmentation of the breast regions.

Segmentation of breast region

The primary objective of segmenting the breast region from a breast thermogram is to discard all portions that do not belong to the breasts, so that the prediction accuracy of the breast abnormality detection system improves. Infrared breast thermograms are poor in contrast, lack clear edges and are amorphous in nature [28], which makes automatic segmentation of the breast region very difficult. Although a lot of work has been done on automatically segmenting the breast region from a breast thermogram, these methods were not found to be robust on our dataset because the lower parabolic boundaries are unclear in most cases. Therefore, considering that the lower breast boundaries and the inframammary folds should not be lost during segmentation, we have adopted a semi-automatic segmentation method. This method requires human interaction for selecting the lower parabolic boundaries of both breasts. Algorithm 1 summarizes the procedure of segmenting the breast region from a breast thermogram. Figure 2 shows the segmented breast regions of some sample breast thermograms.

Algorithm 1: The Breast Region Segmentation Algorithm

Input: RGB image I

Output: Segmented Breast Region S

 Step 1. I′ ← Compute the blue difference Chroma component Cb by using the RGB to YCbCr Image conversion formulae described in [29, 30]

 Step 2. B ← Convert I′ to binary image to distinguish the body area from the background by using thresholding segmentation technique

 Step 3. P ← Select the lower breast boundary points manually on I′ where the coordinates of the last point must be the first point so that it creates a polygon

 Step 4. X ← Generate a binary mask of the polygon, P

 Step 5. M ← B − X, where M is the binary breast mask

 Step 6. S ← I * M
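The steps of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Cb coefficients are the common full-range BT.601 ones, the fixed `threshold` and its direction (body pixels having lower Cb than the background) are assumptions, and the manually selected boundary points are passed in as a list of (row, column) vertices. The function names are hypothetical.

```python
import numpy as np

def cb_channel(rgb):
    """Step 1: blue-difference chroma (Cb) of an 8-bit RGB image (full-range BT.601)."""
    r, g, b = (rgb[..., k].astype(np.float64) for k in range(3))
    return 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b

def polygon_mask(shape, vertices):
    """Step 4: binary mask of a closed polygon via even-odd ray casting.

    vertices is a list of (row, col) points; the last point is joined
    back to the first one, so the polygon is closed implicitly.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    for k in range(n):
        r0, c0 = vertices[k]
        r1, c1 = vertices[(k + 1) % n]
        if r0 == r1:
            continue  # an edge along a row never crosses a ray cast along that row
        crosses = (r0 > rows) != (r1 > rows)
        c_at_row = c0 + (rows - r0) * (c1 - c0) / (r1 - r0)
        inside ^= crosses & (cols < c_at_row)
    return inside

def segment_breast(rgb, lower_boundary, threshold):
    """Steps 1-6 of Algorithm 1 on an H x W x 3 uint8 image."""
    cb = cb_channel(rgb)                        # Step 1
    body = cb < threshold                       # Step 2 (threshold direction is an assumption)
    x = polygon_mask(cb.shape, lower_boundary)  # Steps 3-4 (points chosen by the user)
    m = body & ~x                               # Step 5: M = B - X
    return rgb * m[..., None]                   # Step 6: S = I * M
```

In practice the threshold would have to be chosen for the palette and camera at hand, and the polygon vertices would come from an interactive point-selection tool.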

Fig. 2 (a) Manually selected lower parabolic curve, (b) generated breast mask, (c) segmented breast region

Identification of candidate thermal patches

The goal of this step is to enhance the segmented breast thermograms so that the candidate thermal patches become more prominent and the unnecessary surrounding regions are suppressed. The candidate thermal patches are those thermal patterns that contrast with their surrounding regions in color and intensity, a property that can be used to separate them from the surrounding areas. As illustrated in Fig. 3a–c, the temperature radiation emitted by different regions of a breast is represented with different pseudo colors. For a better description of the candidate thermal patterns in thermograms, we have labeled each temperature region with a different color, as shown in Fig. 3d–f. Among all the pseudo colors, the white region corresponds to the highest temperature region (hotspot), indicating the anomalous region of a breast; this highest temperature region, along with its surrounding reddish white regions, is considered as the candidate thermal patches. Figure 3a, d shows a breast thermogram with a malignant tumor in the right breast. Figure 3b, e shows the breast thermogram of a healthy person with no candidate thermal patches, and Fig. 3c, f shows the breast thermogram of a healthy person with some candidate thermal patches. In order to make these candidate thermal patterns more prominent, two methods have been used here.

Fig. 3 (a–c) Some sample breast thermogram images; (d–f) corresponding labeled images with candidate thermal patches

RGB to HSV transformation

In a breast thermogram image, the color components do not carry important information, so the color information can be removed to reduce the processing complexity. Unlike the RGB color model, the HSV model decouples the intensity from the color information, which makes it easy to describe objects in terms of these components. Besides, hue and saturation are closely related to the way in which humans perceive color [30]. Figure 4 illustrates the histogram of each channel of a breast thermogram in the RGB and HSV color spaces. As depicted in Fig. 4a, in the RGB breast thermogram all three color components are dominant over the entire dynamic range, so no channel is efficient in highlighting the highest temperature regions. On the contrary, Fig. 4b illustrates that, after conversion to the HSV color space, each of the three components is dominant in a different range. Hue is more or less dominant over the entire dynamic range, saturation is dominant over the higher part of the scale, and value is dominant over the middle to higher part of the dynamic range. Based on the distribution range of each channel in the HSV breast thermogram, we found the saturation channel to be the most suitable for characterizing the higher temperature regions. Hence, for developing a human visual system based breast abnormality detection method, the saturation channel (S) has been extracted from each segmented breast thermogram by using Eq. (1) [30, 31],

$$S=\left\{ \begin{array}{ll} 1 - \frac{3}{R+G+B}\min (R,G,B), & \text{if } R+G+B>0, \\ 0, & \text{if } R+G+B=0. \end{array} \right.$$
(1)
Fig. 4 The histogram plot of each channel of a breast thermogram image in (a) RGB color space and in (b) HSV color space

Saturation refers to the pureness of a color, i.e., it decreases as the amount of white in the color mixture increases [30]; hence, the higher temperature regions, which have the maximum intensity values, are poorly saturated. The hue, saturation and value components obtained from a breast thermogram image are illustrated in Fig. 5. As shown in Fig. 5b, the candidate thermal patches have the highest contrast in the saturation channel, where they appear as groups of pixels with locally minimal intensity surrounded by higher intensity pixels. In other words, although a hot patch bears the highest temperature, the saturation channel assigns it the lowest intensity values because of its poor saturation. Hence, the saturation channels of all breast thermograms are complemented so that the higher temperature regions are represented with the higher intensity values, as shown in Fig. 5d; the result is denoted as Is. It is worth mentioning that only the pixel values inside the breast region of the saturation channel are complemented, so the non-breast region (the black region of the breast mask shown in Fig. 2b) remains the same before and after the complementation.

Fig. 5 (a) Hue channel, (b) saturation channel, (c) value channel and (d) complemented saturation channel of a breast thermogram
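A minimal sketch of the saturation extraction of Eq. (1) and the masked complementation is given below. It implements the expression exactly as written in Eq. (1); note that some image libraries compute HSV saturation as (max − min)/max instead, which would give different values. The function names, and the assumption that the saturation channel lies in [0, 1], are illustrative choices rather than details stated in the text.

```python
import numpy as np

def saturation_eq1(rgb):
    """Saturation as in Eq. (1): 1 - 3*min(R,G,B)/(R+G+B), and 0 where R+G+B = 0."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2)
    minimum = rgb.min(axis=2)
    s = np.zeros_like(total)
    nonzero = total > 0
    s[nonzero] = 1.0 - 3.0 * minimum[nonzero] / total[nonzero]
    return s

def complement_inside(s, breast_mask):
    """Complement S inside the breast mask only; pixels outside stay unchanged."""
    return np.where(breast_mask, 1.0 - s, s)
```

For a pure white pixel inside the mask, the saturation is 0 and the complemented value is 1, which matches the intent that the hottest regions receive the highest intensities.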

Power law transformation

In thermal images, due to the absence of sharp transition of intensity values from one region to another, the thermal patterns do not have sharp boundaries and it causes the appearance of some unnecessary regions in Is as shown in Fig. 5d. Hence to further improve the representation power of the channel and to make the candidate thermal patches more prominent by removing all the unnecessary portions, the power law transformation has been performed. The power law transformation reduces the insignificant spread of the patches in Is to make the candidate patches more distinguishable. The power law transformed image, Ip is obtained by,

$${I_p}\left( {i,j} \right)=c \times {I_s}{\left( {i,j} \right)^\gamma },$$
(2)

where c, γ are positive constants.

The power law transform changes the dynamic range of Is. For any γ > 1, the power law transform expands the range occupied by the high intensity values at the cost of the low intensity values, whereas for γ < 1 it enhances the low intensity values while compressing the range of the high intensity values. Since our motive is to suppress the low intensity pixels relative to the high intensity pixels, we have used a value of γ greater than 1. For our experiments, γ is set to 3.5. Moreover, c is fixed at 1, since any other value of c rescales the pixel values of Is and thereby alters their dynamic range.
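The effect of Eq. (2) with the reported settings can be illustrated with a short sketch. It assumes that Is has been scaled to [0, 1], which the text does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def power_law(i_s, gamma=3.5, c=1.0):
    """Eq. (2): I_p = c * I_s ** gamma, applied pixel-wise to an image in [0, 1]."""
    return c * np.power(np.asarray(i_s, dtype=np.float64), gamma)
```

With γ = 3.5, a mid-gray value of 0.5 is mapped to roughly 0.088, while 0 and 1 are left unchanged, so only the brightest patches remain prominent.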

Feature extraction

Feature extraction is probably the single most essential step in achieving high accuracy in breast abnormality prediction. In pattern recognition, it is desirable to extract features that focus on discriminating between classes. In this section, our main objective is to extract an efficient feature set from the candidate thermal patches that can significantly characterize the breast abnormality with limited number of features. To deal with this problem, we have utilized the image compression property of singular value decomposition (SVD), where it can represent an image with a limited number of SVs that can preserve the useful information of Ip. The extraction of SVs, their significance, their association with image complexity and normalization are detailed in subsections.

Singular value decomposition

SVD is a factorization of an M × N matrix into three component matrices. Thus, SVD decomposes Ip into three component matrices, U, S and V, such that

$${I_p}=US{V^*},$$
(3)

where U is an M × M orthogonal matrix containing the left singular vectors of Ip, V is an N × N orthogonal matrix containing the right singular vectors of Ip, and S is an M × N diagonal matrix whose nonnegative diagonal entries are the singular values of Ip. The singular values \({\sigma _1},{\sigma _2}, \ldots ,{\sigma _k}\), k = min(M, N), are unique. If r is the rank of the matrix Ip, then,

$$({\text{i}})\quad {\sigma _1} \ge {\sigma _2} \ge \ldots \ge {\sigma _r}>0,$$
$$({\text{ii}})\quad {\sigma _{r+1}}={\sigma _{r+2}}= \ldots ={\sigma _k}=0.$$
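As a sketch of how the SVs of Eq. (3) might be obtained in practice, the snippet below uses NumPy's `numpy.linalg.svd`, which returns the singular values in non-increasing order. The helper name and the default of keeping the first 20 SVs are illustrative assumptions.

```python
import numpy as np

def leading_singular_values(i_p, k=20):
    """Return the k largest singular values of the image matrix I_p."""
    # compute_uv=False skips U and V and returns only the singular values,
    # already sorted from largest to smallest.
    sv = np.linalg.svd(np.asarray(i_p, dtype=np.float64), compute_uv=False)
    return sv[:k]
```

For a 200 × 400 image there are min(200, 400) = 200 singular values in total, so keeping only the leading few yields a compact feature vector.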

Proposition

The low rank SVs of the abnormal thermograms are larger than the low rank SVs of the normal thermograms.

Since the presence of thermal patches signifies the presence of breast abnormality, let us assume that the abnormal thermograms bear more candidate thermal patches than the normal thermograms. Let the normal thermogram be I and the abnormal thermogram be I + Ω, where Ω denotes the additional candidate thermal patches.

Now, the properties of matrix norms can be used. Treating the M × N elements of any matrix A as the elements of an MN-dimensional vector and taking the 2-norm of that vector gives the Frobenius norm of A [32],

$${\left\| {\text{A}} \right\|_F}={\left\{ {\sum\limits_{{i=1}}^{M} {\sum\limits_{{j=1}}^{N} {{{\left| {{a_{ij}}} \right|}^2}} } } \right\}^{\frac{1}{2}}}.$$
(4)

For convenience, we denote I as A and I + Ω as B. Pixel intensities are nonnegative, so if A and B are of the same order and Ω denotes the additional candidate thermal patches in B, then,

$${b_{ij}}>{a_{ij}} \ge 0\;\;{\text{for }}(i,j) \in \Omega ,\quad {b_{ij}}={a_{ij}}\;\;{\text{otherwise}}$$
$$\Rightarrow \sum\limits_{{i=1}}^{M} {\sum\limits_{{j=1}}^{N} {|{b_{ij}}{|^2}} }>\sum\limits_{{i=1}}^{M} {\sum\limits_{{j=1}}^{N} {|{a_{ij}}{|^2}} } \Rightarrow ||B|{|_F}>||A|{|_F}.$$

Thus,

$${\left\| {I+\Omega } \right\|_F}>{\left\| I \right\|_F}.$$
(5)

The 2-norm (spectral norm) of a matrix is defined as follows [33, 34],

$$||{\text{A}}|{|_2}=\sqrt {{\text{Maximum eigenvalue of }}\text{A}^{\prime}{\text{A}}} ={\text{ Largest singular value of matrix A}}.$$
(6)

The Frobenius norm and the 2-norm are different quantities: the Frobenius norm equals the square root of the sum of all squared SVs, so Eq. (5) by itself states that the total SV energy of I + Ω exceeds that of I. For matrices with nonnegative entries, however, the 2-norm is also monotone under entrywise ordering, so together with Eq. (6),

$$||I+\Omega |{|_2} \ge ||I|{|_2}$$
$$\Rightarrow {\text{Largest singular value of }}I+\Omega {\text{ }} \ge {\text{ Largest singular value of }}I.$$

If we denote the largest SV or Rank1 SV of a matrix as \(\underline {\sigma }\), then

$$\underline {\sigma } (I+\Omega ) \ge \underline {\sigma } (I).$$
(7)

Based on Eq. (7), the first SV of an abnormal thermogram is at least as large as the first SV of the corresponding normal thermogram, and it is strictly larger when all entries of I are positive. The same argument does not extend directly to the other low rank SVs; that they are also larger in abnormal thermograms is an empirical observation, shown in Fig. 6 and Table 2. Here, we use the term low rank because, as the rank increases, the magnitude of the SVs tends to zero.
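The proposition can be checked numerically on synthetic data. The sketch below builds a random nonnegative matrix as a stand-in for a normal thermogram, adds a bright rectangular patch as a stand-in for Ω, and compares the largest SV and the Frobenius norm before and after. The two norms are reported separately because they are different quantities in general. All values are synthetic and the names are hypothetical; this is not thermogram data.

```python
import numpy as np

def spectral_and_frobenius(a):
    """Return (largest singular value, Frobenius norm) of matrix a."""
    sv = np.linalg.svd(a, compute_uv=False)
    # The Frobenius norm equals the square root of the sum of squared SVs.
    return float(sv[0]), float(np.sqrt(np.sum(sv ** 2)))

rng = np.random.default_rng(0)
normal = rng.uniform(0.05, 0.2, size=(20, 40))  # synthetic "normal" image
patch = np.zeros_like(normal)
patch[5:10, 8:16] = 0.7                         # synthetic additional hot patch
abnormal = normal + patch

s_normal, f_normal = spectral_and_frobenius(normal)
s_abnormal, f_abnormal = spectral_and_frobenius(abnormal)
```

In this construction both the largest SV and the Frobenius norm increase when the patch is added, in line with Eq. (7).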

The first 20 SVs (from Rank1 to Rank20) extracted from all breast thermograms of the normal and abnormal groups are plotted in Fig. 6. As illustrated in Fig. 6a, the magnitudes of the rank r SVs of the abnormal thermograms are larger than the magnitudes of the corresponding rank r SVs of the normal thermograms. However, as the rank increases, the discriminating power of the SVs is reduced. In Fig. 6b, the rank-wise average of the SVs obtained from the thermograms of the abnormal group is plotted against the rank-wise average of the SVs of the thermograms of the normal group, which also depicts the discriminating power of the lower rank SVs.

Fig. 6 (a) Rank-wise magnitudes of SVs of all normal and abnormal breast thermograms of DBT-TU-JU dataset, (b) Rank-wise average of the SVs of normal and abnormal breast thermograms of DBT-TU-JU dataset

Statistical significance of rank wise singular values

The extraction of SVs from breast thermograms is followed by a test of the statistical significance of these SVs in distinguishing breast thermograms. For this, a 2-sample t test with a significance level of 0.1% and the null hypothesis H0: “the abnormal and normal thermograms have equal means” has been used. Owing to space limitations, the statistical significance is shown only for the first 10 SVs, i.e., from Rank1 to Rank10, of the normal and abnormal thermograms in Table 2. Table 2 illustrates that the average magnitude of the SVs (in each rank) of the abnormal thermograms is significantly greater than the average magnitude of the SVs of the corresponding rank of the normal thermograms. Moreover, as shown in Table 2, the SVs of all ranks, obtained from all the normal and abnormal thermograms, are statistically significant with P-value < 0.001 in breast abnormality detection.

Table 2 Average of SVs in each rank
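The test statistic behind such a comparison can be sketched as follows. The text does not say whether a pooled-variance (Student) or unequal-variance (Welch) test was used; the pooled form is assumed here. The sketch returns only the t statistic and the degrees of freedom; a library routine such as `scipy.stats.ttest_ind` would additionally return the p-value.

```python
import numpy as np

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    nx, ny = x.size, y.size
    dof = nx + ny - 2
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof
    t = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return float(t), dof
```

Here `x` and `y` would hold the SVs of one rank for the abnormal and the normal group, respectively.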

Association of singular values with image complexity of breast thermograms

This section analyzes the association of the SVs with the degree of image complexity that a breast thermogram bears. In the absence of any standard definition of image complexity, here the complexity of a breast thermogram is considered as the amount of information present after pre-processing the thermograms. Hence, the image complexity is not measured from the raw breast thermograms; instead the thermograms possessing the candidate thermal patches are used for measuring the image complexity. Three different measures are used here to compute the complexity of processed breast thermograms. The first measure is based on the Shannon’s definition of information, known as entropy [35].

Let \(p(i)=n(i)/N\) be the probability of gray level i, where n(i) is the number of pixels having gray level i and N is the total number of pixels in a breast thermogram. Then, Shannon’s entropy can be defined as,

$$H= - \sum\limits_{{i=0}}^{{255}} {p(i) \times \log p(i)} ,\quad \forall i \in {I_p}.$$
(8)
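A minimal sketch of Eq. (8) for an 8-bit grayscale image is shown below. The base of the logarithm is not specified in the text; base 2 (entropy in bits) is assumed, and gray levels that do not occur are skipped so that 0 · log 0 is treated as 0.

```python
import numpy as np

def shannon_entropy(gray, levels=256):
    """Shannon entropy (in bits) of an integer-valued grayscale image."""
    counts = np.bincount(gray.ravel(), minlength=levels)
    p = counts[counts > 0] / gray.size  # probabilities of occurring gray levels
    return float(-np.sum(p * np.log2(p)))
```

An image split evenly between two gray levels has an entropy of 1 bit, while a constant image has an entropy of 0.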

The second measure is the spatial information (SI) [36], which is an indicator of edge energy and it is computed as follows.

Let \({H_s}\) and \({V_s}\) denote the edge images generated by applying the horizontal and vertical kernels of Sobel, then the spatial information \(S{I_{mean}}\) of a breast thermogram is given by,

$$S{I_{mean}}=\frac{1}{N}\sum {{S_i},\quad \forall i \in {I_p},}$$
(9)

where N is the total number of pixels in \({I_p}\) and

$${S_i}=\sqrt {H_{s}^{2}+V_{s}^{2}} ,\quad \forall i \in {I_p}.$$
(10)
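Equations (9) and (10) can be sketched with plain array slicing, as below. The border handling (replicating edge pixels) is an assumption, since the text does not specify it, and the function name is hypothetical.

```python
import numpy as np

def spatial_information(img):
    """Mean Sobel gradient magnitude of a 2-D image, following Eqs. (9)-(10)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Sobel response along the columns: right neighbors minus left neighbors.
    h_s = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Sobel response along the rows: lower neighbors minus upper neighbors.
    v_s = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    s = np.sqrt(h_s ** 2 + v_s ** 2)  # Eq. (10), per pixel
    return float(s.mean())            # Eq. (9)
```

A flat image gives a value of 0, and the value grows with the number and strength of edges, which is why it serves as an indicator of edge energy.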

The third measure is the fractal dimension (FD) [37], which observes how the number of boxes deviates as the grid becomes finer by applying a box-counting algorithm. To determine the FD of a pattern, it is imagined that the pattern is laid on an equally spaced grid, and then the number of boxes required to cover the whole set is counted [38]. Mathematically, it is computed as follows.

Let B be the number of boxes that cover the candidate thermal patches and r be the magnification, i.e., the inverse of the box size. Then FD is the slope of the line obtained when log(B) is plotted on the Y-axis against log(r) on the X-axis, and it is given by,

$$FD=\frac{{\log (B)}}{{\log (r)}}.$$
(11)
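A box-counting estimate of the FD can be sketched as follows. The sketch assumes that the candidate thermal patches are available as a non-empty binary mask (how the patches are binarized is not stated in the text) and estimates the slope of log(B) against log(r) by a least-squares line fit over a hypothetical set of box sizes.

```python
import numpy as np

def box_count(mask, size):
    """Number of size x size boxes that contain at least one foreground pixel."""
    h, w = mask.shape
    # Pad with background so that both dimensions are multiples of the box size.
    padded = np.pad(mask, ((0, -h % size), (0, -w % size)), mode="constant")
    ph, pw = padded.shape
    blocks = padded.reshape(ph // size, size, pw // size, size)
    return int(blocks.any(axis=(1, 3)).sum())

def fractal_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Slope of log(B) versus log(r), where r is the inverse of the box size."""
    counts = np.array([box_count(mask, s) for s in sizes], dtype=np.float64)
    r = 1.0 / np.array(sizes, dtype=np.float64)
    slope, _intercept = np.polyfit(np.log(r), np.log(counts), 1)
    return float(slope)
```

As a sanity check, a completely filled square has a dimension of 2 and a straight line of pixels has a dimension of 1.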

After computing the image complexity values of the thermograms of both the normal and abnormal groups, we observed that the abnormal thermograms possess significantly higher entropy (ent), SI and FD values than the normal breast thermograms; i.e., based on these three complexity measures, we can conclude that the abnormal thermograms are more complex than the normal thermograms. The mean and standard deviation of these complexity metrics for the thermograms of both groups are given in Table 3. Moreover, as illustrated in Table 3, all the complexity metrics are statistically significant (p < 0.001, t test) in showing that the abnormal thermograms have higher image complexity than the normal thermograms.

Table 3 Average values of image complexity metrics

Now, the association of the computed SVs with the complexity of the thermograms is evaluated by using the Pearson correlation measure [39], a statistical tool that measures the degree to which two variables are linearly associated with each other. The correlation between the rank r SVs of all thermograms and their corresponding image complexity measures was analyzed, and the SV of any rank was found to be positively correlated with the image complexity; that is, an image with higher complexity tends to have larger SVs. This is consistent with the abnormal breast thermograms having higher SVs than the normal thermograms. The correlation of the Rank1 SVs of all the normal and abnormal thermograms with the corresponding image entropy, SI and FD is illustrated in Fig. 7 for both the DBT-TU-JU and DMR databases.

Fig. 7 (a–c) Correlation of Rank1 SVs of all breast thermograms of DBT-TU-JU dataset with the image complexity measures, (d–f) correlation of Rank1 SVs of all breast thermograms of DMR dataset with the image complexity measures
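The Pearson correlation between, say, the Rank1 SVs and one complexity measure can be computed as in the sketch below. The input vectors are assumed to hold one value per thermogram; the function name is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))
```

The same value can be read from the off-diagonal entry of `numpy.corrcoef(x, y)`.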

Normalization of singular values

As demonstrated in Table 2, the SVs of different ranks exhibit significant variation in their range, which necessitates the normalization or rescaling of the SVs before feeding them to various classifiers. Moreover, the normalization of all SVs to a fixed range ensures that each feature contributes proportionately to the final match score.

Let f and f′ denote a feature vector before and after normalization. The normalized feature vector f′ is computed through,

$$f_{i}^{'}=\frac{{{f_i} - \hbox{min} (f)}}{{\hbox{max} (f) - \hbox{min} (f)}},\quad i=1, \ldots ,N,$$
(12)

where N is the number of thermograms.
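Equation (12) can be sketched as a column-wise rescaling, reading f as the vector of one feature (the SV of one rank) across all N thermograms. That reading, and the handling of a constant feature (mapped to 0 to avoid division by zero), are assumptions of this sketch.

```python
import numpy as np

def min_max_normalize(features):
    """Rescale each column of an (N thermograms x K features) array to [0, 1]."""
    f = np.asarray(features, dtype=np.float64)
    lo, hi = f.min(axis=0), f.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns are mapped to 0
    return (f - lo) / span
```

After this step every SV rank spans the same range, so that large Rank1 values do not dominate distance-based classifiers.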

Classification of breast thermograms

In a CAD system, feature extraction is followed by the classification of the images based on the extracted features. Although all the SVs are found to be statistically significant, using all of them may degrade the accuracy of the system, since the discriminating power of the SVs decreases with increasing rank. Hence, to find the combination of SVs and classifier that attains the best prediction performance, different combinations of SVs have been investigated with seven widely used classifiers: support vector machine (SVM) with three different kernels [radial basis function (RBF), polynomial, linear], K-nearest neighbor (KNN), decision tree (DT), artificial neural network (ANN), random forests (RF), AdaBoost (AB) and linear discriminant analysis (LDA).

Designing of breast abnormality grading (BAG) index

Determining the degree of abnormality in an abnormal thermogram is crucial for the early detection and diagnosis of the severely abnormal cases. Considering this, the classification of thermograms as normal or abnormal is followed by the design of a BAG index. For designing the BAG index, among all the SVs of a thermogram, we consider the first two SVs, SV1 and SV2, which are found to be highly discriminative, as shown in Table 2. Using these two SVs, the BAG index is defined as

$${\text{BAG index}}=S{V_1}+S{V_2}.$$
(13)

As illustrated in “Association of singular values with image complexity of breast thermograms” section, the SVs are highly correlated with the image complexity for which the SVs SV1 and SV2 are also highly correlated with abnormality. Hence, thermograms with less abnormality will have smaller values of BAG index compared to the BAG index values of the thermograms having severe abnormality. Thus, based on the values of BAG index, we can predict the degree of abnormality in a breast thermogram.
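Eq. (13) amounts to summing the two largest SVs of the processed image, as in the sketch below. Whether the raw or the normalized SVs enter the index is not stated here; raw SVs are assumed, and no grading cut-off is applied because none is specified in this section.

```python
import numpy as np

def bag_index(i_p):
    """BAG index of Eq. (13): sum of the Rank1 and Rank2 singular values of I_p."""
    sv = np.linalg.svd(np.asarray(i_p, dtype=np.float64), compute_uv=False)
    return float(sv[0] + sv[1])
```

A larger value would then indicate a thermogram with stronger or more extensive candidate thermal patches.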

Experimental results

Experimental databases

Both publicly available and in-house acquired databases were used to evaluate the performance of our proposed breast abnormality detection method. Two breast thermogram databases, namely DBT-TU-JU [13] and DMR [14], have been used. The DBT-TU-JU database is our own database, developed in collaboration with the Regional Cancer Centre (RCC), Agartala Government Medical College (AGMC), Tripura, India. To acquire the thermograms, a standard acquisition protocol suite was designed [13] that comprises a number of important parameters: patient preparation, patient acclimation, patient intake form, examination room condition, patient position and acquisition views. The breast thermograms were acquired by using the FLIR T650sc thermal camera with a thermal sensitivity of < 20 mK @ 30 °C and an image resolution of 640 × 480 pixels. Currently, the database comprises the thermograms of both healthy and pathological subjects in the age group of 21–80 years. Moreover, the database is also annotated with the ground truth images of the suspicious hot regions. From this database, a dataset of 120 breast thermograms (70 abnormal and 50 normal) in frontal view has been considered for the experiments. The set of abnormal thermograms comprises 35 thermograms with benign tumors, 15 thermograms with malignant tumors and 20 thermograms with other breast problems such as breast pain, feeling of a solid mass, bloody discharge, pus formation, etc.

To validate the performance of the BAG index in predicting the severity of breast abnormality, the abnormal thermograms are clinically categorized as mild abnormal (MA) or severely abnormal (SA). Because X-ray mammography could not detect the abnormality in every suspected case of the DBT-TU-JU database, the categorization is based on patient history, clinical findings, the X-ray mammography report and the Fine Needle Aspiration Cytology (FNAC) report. Abnormal thermograms whose mammography or FNAC reports show the presence of either a benign or a malignant tumor are labeled as SA. Abnormal thermograms whose FNAC reports are not available and whose mammography could not reveal any tumor or calcification, but which are clinically found to be abnormal or to show disease-related symptoms such as a feeling of solid and lumpy structures or pain over a long period of time, are labeled as MA. In addition, the abnormal thermograms of patients whose mammography reports are normal but who have experienced blood or pus discharge for a very long period of time are also labeled as SA.

The publicly available DMR database [14] contains the breast thermograms of 287 subjects. The breast thermograms of almost 47 subjects are labeled as ‘Sick’ and the remaining ones are labeled as ‘Healthy’. It is reported that the breast thermograms were captured using a FLIR SC-620 infrared camera with a temperature sensitivity of < 0.04 °C and an image resolution of 640 × 480 pixels. Here, a dataset of 100 normal (healthy) thermograms and 45 abnormal (sick) thermograms has been used for the experiments.

Evaluation metrics

To evaluate the performance of the proposed breast abnormality detection system, three evaluation metrics have been used: accuracy, sensitivity and specificity. These metrics are defined as follows,

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}},$$
(14)
$$\text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}},$$
(15)
$$\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}},$$
(16)

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. Sensitivity is the proportion of positive (abnormal) cases that are correctly identified as positive, specificity is the proportion of negative (normal) cases that are correctly identified as negative, and accuracy is the proportion of all cases that are correctly classified. Thus, if both sensitivity and specificity are high (low), the accuracy will also be high (low). If only one of them is high, the accuracy is biased towards the measure of the class with more cases, because accuracy is the average of sensitivity and specificity weighted by the numbers of positive and negative cases.
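As a minimal sketch of Eqs. (14)–(16), the three metrics can be computed directly from the confusion-matrix counts. The function name and the counts below are hypothetical and serve only as an illustration; they are not results of the proposed system.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (14)
    sensitivity = tp / (tp + fn)                 # Eq. (15)
    specificity = tn / (tn + fp)                 # Eq. (16)
    return accuracy, sensitivity, specificity


# Hypothetical counts for a set of 70 abnormal and 50 normal thermograms.
acc, sens, spec = classification_metrics(tp=63, tn=47, fp=3, fn=7)

# Accuracy is the mean of sensitivity and specificity weighted by class size.
weighted = (70 * sens + 50 * spec) / 120
```

The last line makes the dependence of accuracy on the class proportions explicit: with 70 positive and 50 negative cases, sensitivity carries the larger weight.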

Assessment of classification performance

The classification performance of the proposed breast abnormality detection system has been evaluated with seven ‘state-of-the-art’ classifiers, namely SVM, DT, KNN, ANN, RF, LDA and AB, employing tenfold cross-validation. A brief description of the design of each of these classifiers except LDA is provided below. For LDA, no parameter tuning is done.
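The tenfold cross-validation split can be sketched as follows. This is only an illustration under two assumptions that the text does not state: the folds are stratified (each fold keeps the class proportions) and the shuffling uses a fixed seed. The function name `stratified_kfold` is hypothetical.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class proportions per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal the indices of this class round-robin
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# 70 abnormal (1) and 50 normal (0) thermograms, as in the DBT-TU-JU dataset.
labels = [1] * 70 + [0] * 50
splits = list(stratified_kfold(labels, k=10))
```

With these class sizes every test fold holds 12 thermograms (7 abnormal and 5 normal) and the remaining 108 are used for training.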

Support vector machine

For the SVM classifier, the performance has been tested with three different kernels: RBF, polynomial and linear. The linear kernel has no parameter to tune, whereas the parameters of the polynomial and RBF kernels are varied to obtain better classification accuracy. Among the different values of sigma, the SVM with RBF kernel achieves the highest accuracy at sigma = 2. Similarly, the SVM with polynomial kernel provides the best accuracy at polynomial order 3.
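For reference, the three kernels can be written as below. The exact functional forms are an assumption, since the text only gives the parameter values (sigma = 2 and order 3): the commonly used forms exp(-||x - y||^2 / (2 sigma^2)) and (1 + <x, y>)^order are taken here, and the function names are hypothetical.

```python
import math


def linear_kernel(x, y):
    """Plain inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, order=3):
    """Assumed form: (1 + <x, y>) ** order."""
    return (1.0 + linear_kernel(x, y)) ** order


def rbf_kernel(x, y, sigma=2.0):
    """Assumed form: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two hypothetical feature vectors of two normalized SVs each.
u, v = (0.8, 0.3), (0.2, 0.1)
k_lin, k_poly, k_rbf = linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v)
```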

Decision tree

In DT, all parameters except ‘MaxNumSplits’ are left at their default values. To tune the DT, the maximum number of splits is varied from 1 to 15. We found that with a higher number of splits the model predicts the training data perfectly but fails to predict the test data, i.e., it overfits. The highest accuracy on both the training and the test data is obtained with a maximum of 5 splits.

K-nearest neighbor

To obtain the optimal number of neighbors K for the KNN classifier, the classification accuracy of the proposed system has been evaluated with cross-validation for K = 2 to K = 10. The highest classification accuracy was obtained with K = 9, so the KNN classifier with K = 9 is used as the optimal model.

Artificial neural network

For implementing the ANN, we used a pattern recognition network, which is a feed-forward network in which the sizes of the input and output layers are initially set to zero and are adjusted during training according to the training data. The number of hidden layers is varied from 1 to 10, and the best prediction performance on the testing set is obtained with ten hidden layers. The network is trained with the ‘Scaled Conjugate Gradient’ training function.

Random forest

RF is an ensemble method that combines multiple decision trees to obtain a more accurate and stable prediction. Here, the square root of the total number of features is used as the maximum number of features in each decision tree. To obtain better prediction performance, RF classifiers with different numbers of decision trees were built, and the maximum classification accuracy was obtained with 20 decision trees.

AdaBoost

AB constructs a robust classifier by iteratively adding multiple weak classifiers. We used the AdaBoost.M1 classifier for binary classification. In each round of training, the classification performance of the ensemble is enhanced by adding a new weak learner. Here, a one-level decision tree, also known as a decision stump, is used as the weak learner, and the ensemble undergoes 100 learning iterations to achieve the optimal performance.
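The boosting scheme described above can be illustrated with a small sketch of discrete AdaBoost with decision stumps for labels in {-1, +1}. It is not the implementation used in this work: the function names, the early stop on a perfect stump and the toy data (hypothetical pairs of normalized SVs) are assumptions made only for illustration.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for sign in (1, -1):
                pred = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign


def adaboost_fit(X, y, rounds=100):
    """Add one stump per round and re-weight the training samples."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        if stump[0] >= 0.5:            # no better than chance: stop
            break
        err = max(stump[0], 1e-12)     # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:              # perfect stump: nothing left to boost
            break
        # Increase the weights of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: -1 = normal, +1 = abnormal.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.7, 0.6], [0.8, 0.9], [0.9, 0.7]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y, rounds=100)
predictions = [adaboost_predict(model, row) for row in X]
```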

As mentioned earlier, the classification accuracy of the proposed system has been evaluated with different numbers of SVs in order to choose the number of SVs that gives the best classification performance. Instead of using all SVs, the system has been evaluated with a maximum of 30 SVs, since beyond this number its performance degrades noticeably. The maximum value of each evaluation metric obtained with each classifier for different numbers of SVs over 20 iterations is shown in Table 4. The observations made from Table 4 are as follows,

Table 4 The accuracy, sensitivity and specificity values of each classifier with different number of SVs
  1. Different classifiers show their best performance with 1, 2 or 5 SVs. However, for most of the classifiers the highest prediction performance is obtained with two SVs in both the DBT-TU-JU and DMR datasets. The performance of all the classifiers either degrades or remains the same when the number of SVs is increased further.

  2. If the classification accuracy of a classifier is the same with 1 and 2 SVs, the sensitivity is used to break the tie, and the setting with the higher sensitivity is considered to be better.

  3. In the DBT-TU-JU dataset, SVM with polynomial kernel and ANN give the highest prediction accuracy of 92.50% with two SVs. Moreover, in this dataset the proposed system attains a prediction accuracy of at least 90% with all other classifiers.

  4. In the DMR dataset, the highest prediction accuracy of 98.00% is also obtained with SVM using the polynomial kernel and with ANN, both with two SVs. Except for LDA, all other classifiers show a prediction accuracy > 95% in the DMR dataset.

  5. Like the accuracy, the sensitivity and the specificity of the proposed system are also very high in both datasets.

  6. Thus, in both the DBT-TU-JU and DMR datasets, ANN and SVM with polynomial kernel give the best classification results with two SVs.
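The selection rule used in these observations (highest accuracy first, sensitivity as tie-breaker) can be sketched as follows. The numbers in `results` are hypothetical placeholders and are not taken from Table 4.

```python
# Hypothetical (accuracy, sensitivity) pairs of one classifier per number of SVs.
results = {1: (0.90, 0.88), 2: (0.90, 0.93), 5: (0.89, 0.95)}


def best_sv_count(results):
    """Pick the SV count with the highest accuracy; break ties by sensitivity."""
    return max(results, key=lambda n: (results[n][0], results[n][1]))


best = best_sv_count(results)
```

In this example 1 and 2 SVs tie on accuracy, so the higher sensitivity decides in favor of 2 SVs.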

While analyzing the performance of the SVs in breast abnormality prediction, it is natural to also evaluate the prediction accuracy of the system with the highly correlated image complexity features, in order to investigate whether the SVs could be substituted by these features in the proposed system. To do so, we have evaluated the prediction performance of the image complexity features and listed in Table 5 the accuracy values obtained in both datasets with each individual complexity feature and with all three complexity features together. Like the SVs, the complexity features are normalized in the same way before they are fed to the classification algorithms. The best classification accuracy of each classifier is shown in boldface in Table 5. The listed accuracy of each classifier for each feature set is the maximum of the 20 accuracy values obtained in 20 consecutive iterations. As illustrated in Table 5, compared to the SI and FD, the entropy is more discriminative and provides higher prediction accuracy with all classifiers. When all complexity features are combined, they provide satisfactory performance with all the classifiers. However, when the performance of these complexity features is compared with the classification performance of the set of 2 SVs (as demonstrated in Table 5), the classification accuracy of the SVs is much better than that of the complexity features for all classifiers. Thus, the SVs are more discriminative than the image complexity features.

Table 5 The prediction accuracy of each classifier with the image complexity measures

Assessment of the grading performance of BAG index

This section evaluates the grading performance of the BAG index. The degree of abnormality or severity of a thermogram can be predicted from the magnitude of the BAG index. Using Eq. (13), the BAG index values of all breast thermograms have been computed and analyzed to obtain the BAG index range of each category of thermograms. The boxplots of the BAG index values for the normal (N), MA and SA thermograms are shown in Fig. 8. As depicted in Fig. 8, the dominant ranges of the BAG index for N, MA and SA thermograms are 0–2, 16–22 and 36–55 respectively, which clearly demonstrates the discriminative capacity of the BAG index in predicting the degree of severity. Along with the boxplots, the mean BAG index value of each group is also plotted in Fig. 8. The mean values likewise show that each class of thermograms maintains a different range of BAG index values.

Fig. 8 Boxplots of BAG index values for normal (N), mild abnormal (MA) and severely abnormal (SA) breast thermograms of the DBT-TU-JU database
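As an illustration of how the dominant ranges reported for Fig. 8 could be used for grading, the sketch below maps a BAG index value to a category. It is only a lookup under stated assumptions: the ranges are those of the DBT-TU-JU dataset, values that fall between the dominant ranges are left ungraded because no cut-off is reported for them, and the function name is hypothetical.

```python
# Dominant BAG index ranges reported for the DBT-TU-JU dataset.
DOMINANT_RANGES = {"N": (0, 2), "MA": (16, 22), "SA": (36, 55)}


def grade_from_bag(bag_index):
    """Return 'N', 'MA' or 'SA' if the value lies in a dominant range, else None."""
    for grade, (low, high) in DOMINANT_RANGES.items():
        if low <= bag_index <= high:
            return grade
    return None  # outside all dominant ranges: no grade is assigned here


grades = [grade_from_bag(v) for v in (1.2, 18.0, 40.0, 10.0)]
```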

Discussion

The traditional approach to breast abnormality prediction is bilateral asymmetry analysis, whose accuracy relies on the proper separation of the left and right breasts in a thermogram. The proposed method, on the other hand, uses the characteristic thermal patterns within the breast regions, whose presence signifies breast abnormality. The experimental results show the efficiency of our proposed method on both the public and the in-house dataset. To further evaluate the potential of the proposed method, this section compares it with existing works in two ways: first, the proposed method is compared with the conventional method of breast abnormality detection, bilateral asymmetry analysis, as reported in [40]; then, it is compared with other existing works. This section also discusses the limitations of our proposed method.

Comparison of the proposed method with conventional method of breast abnormality detection

In this section, we compare the proposed method with our previously reported method of breast abnormality detection [40], which was based on bilateral asymmetry analysis. For the bilateral asymmetry analysis, the extracted breast regions were further segmented into left and right breasts. Then, from each breast, a set of 7 first order statistical features (mean, entropy, skewness, kurtosis, variance, standard deviation, maximum intensity value) and 17 second-order texture features (contrast, correlation, dissimilarity, energy, entropy, sum entropy, difference entropy, homogeneity, variance, sum of variance, difference variance, autocorrelation, sum average, information measure of correlation1 and information measure of correlation2) were computed. The statistical significance of these extracted features was assessed using the non-parametric Mann–Whitney–Wilcoxon (MWW) test [41] with a significance level of 0.1%. Out of these 24 features (7 statistical and 17 texture), 13 and 16 features were found to be statistically significant (p < 0.001) for the breast thermograms of the DBT-TU-JU and DMR databases respectively. Based on these results, three different feature sets were formed: first order statistical features (FStat), second order texture features (STex) and statistically significant (SSigST) features. The performance of these three feature sets was then evaluated using different classifiers. It was found that, with all the classifiers, the statistically significant features gave better performance than both the FStat and STex feature sets in both datasets. Here, we compare these three feature sets with the proposed breast abnormality detection system to evaluate the superiority of the proposed method over them. Since, as mentioned earlier, the highest classification accuracy is obtained with only two SVs, we consider the results obtained using only two SVs for the comparison.

For the comparison, the classification accuracies obtained in 20 iterations with two SVs are statistically tested against the classification accuracies of each of the FStat, STex and SSigST feature sets over the same 20 iterations. Along with the FStat, STex and SSigST feature sets, the classification accuracies of the Ent, SI, FD and ESF (combination of Ent, SI and FD) feature sets are also statistically tested. The p-values obtained using the non-parametric MWW test with a significance level of 1% and the alternative hypothesis Ha: “the classification accuracy of SVs is higher than the classification accuracy of Ent/SI/FD/ESF/FStat/STex/SSigST” are listed in Table 6. With p-values < 0.01 for each pair of feature sets (SVs vs. Ent/SI/FD/ESF/FStat/STex/SSigST) and for each classifier, the MWW test rejects the null hypothesis in favor of Ha. It can thus be concluded that, for each classifier, the classification accuracies obtained with the SVs are significantly (p < 0.01) greater than those obtained with all other feature sets. Thus, the SVs are found to provide the highest classification accuracy among all the feature sets.
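The one-sided test used here can be sketched in plain Python as follows. This is an illustrative implementation based on the normal approximation with tie and continuity corrections, which is an assumption (the text does not say whether exact or approximate p-values were computed). The function name and the two accuracy lists are hypothetical.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test with Ha: values in x tend to exceed those in y.

    Returns (U, p), where p comes from the normal approximation with
    tie correction and continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0   # mean of the ranks i+1 .. j
        t = j - i                      # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                  # all values tied: no evidence for Ha
    z = (u - mean_u - 0.5) / math.sqrt(var_u)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail probability
    return u, p


# Hypothetical accuracies of two feature sets over 20 iterations each.
acc_svs = [0.90 + 0.001 * i for i in range(20)]
acc_other = [0.80 + 0.001 * i for i in range(20)]
u_stat, p_value = mann_whitney_greater(acc_svs, acc_other)
reject_null = p_value < 0.01
```

For samples of 20 values each, the normal approximation is usually considered adequate; an exact test would be preferable for much smaller samples.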

Table 6 Statistical significance of the classification accuracies of SVs against the accuracies of other feature sets

Besides the statistical significance test, the receiver operating characteristic (ROC) curves of all these feature sets with the classifiers that give the highest classification performance have been plotted in Fig. 9, and the area under the ROC curve (AUC) is used as the efficiency measure of the feature sets. Along with the ROC curves for the SVs and the first order, second order and statistically significant features, the ROC curves for the image complexity features are also plotted for comparison. As reflected in Fig. 9, for both the ANN and SVM_Polynomial classifiers the SVs yield higher AUC values than the remaining feature sets in both the DBT-TU-JU and DMR datasets, which indicates the superiority of the SVs over the other feature sets. However, even though the classification accuracies of the ANN and SVM_Polynomial are the same, the AUC values of SVM_Polynomial are higher than those of the ANN in both datasets. In a disease diagnosis system, a higher AUC value (AUC > 0.9) indicates better prediction accuracy of the system [42, 43]. Hence, considering both the accuracy and the AUC values, it can be concluded that the prediction performance of the SVM with polynomial kernel is better than that of the ANN in both datasets.

Fig. 9 a, b The ROC curves for ANN and SVM_P obtained using the DBT-TU-JU dataset and c, d the ROC curves for ANN and SVM_P obtained using the DMR dataset
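For readers who wish to reproduce the AUC measure, a minimal sketch is given below. It uses the rank interpretation of the AUC, i.e., the probability that a randomly chosen abnormal case receives a higher classifier score than a randomly chosen normal case, with ties counted as one half. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (1 = abnormal, 0 = normal).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

In this toy example 7 of the 9 positive-negative pairs are ranked correctly, so the AUC is 7/9.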

Comparison with other reported methods

In this section, the proposed method is compared with previously reported IBT based breast abnormality detection methods. As described in the literature section of this article, several bilateral asymmetry analysis based breast abnormality detection methods with different degrees of efficiency have already been reported. However, because the breast thermogram databases used in those works are not available, we are not able to provide a comparative study of our proposed method on the respective databases. There are, however, some bilateral asymmetry analysis based breast abnormality detection methods whose performance was evaluated using the publicly available DMR database. Hence, for comparison purposes, we have also evaluated our proposed system on the publicly available DMR database and report the results in Table 7.

Table 7 Prediction performances of the proposed and other reported IBT-based breast abnormality detection methods

As summarized in Table 7, the proposed method outperforms the other existing techniques. Unlike the bilateral asymmetry analysis based methods, the key advantage of the proposed method is that its accuracy does not rely on the accurate separation of the left and right breasts of a thermogram; instead, it uses the SVs to characterize the candidate thermal patches whose presence within the breast region indicates a breast anomaly. Moreover, in contrast to bilateral asymmetry analysis, the proposed method can detect breast abnormality even when the abnormality is present in both breasts of a thermogram and thus improves the accuracy by increasing the number of true positives. Above all, instead of using a large number of features, the proposed method can detect breast abnormality using only two SVs and, based on these two SVs, it can also predict the degree of abnormality of a thermogram.

Limitation of our proposed method and future work

Although the proposed method outperforms the existing methods with regard to the obtained performance measures, the ranges of the BAG index used for predicting the degree of abnormality are dataset dependent, so the BAG index ranges for normal, mild abnormal and severely abnormal thermograms may vary slightly between datasets. Moreover, the accuracy of the proposed method depends on the delineation of the breast regions in the thermograms, so that only the thermal patches within the breast region are characterized and not those of the non-breast regions. In our proposed method, the breast region is delineated using a semi-automatic segmentation method that needs human interaction for the selection of the lower parabolic boundaries. Hence, a potential area for future improvement of our proposed method is to delineate the breast regions automatically from breast thermogram images without human intervention.

Conclusion

This paper describes a new approach for computer-aided diagnosis of breast abnormality in asymptomatic patients. The task is to automatically distinguish abnormal thermograms from normal ones in order to identify the cases that require urgent medical attention. Because thermograms with physiological dysfunction bear more high-temperature regions than healthy breast thermograms, the proposed approach is based on the analysis of the thermal patterns of the breast thermograms. For this purpose, the SVD is used to characterize these thermal patterns and to distinguish the abnormal thermograms. The experimental results show that the proposed system performs better than other recently reported breast abnormality detection methods. Our contribution in this paper is twofold. First, we have used a single color channel instead of the three-channel RGB image to represent the candidate thermal patches, and we have proposed to use the SVs to characterize the thermal patterns, which has brought a remarkable improvement in breast abnormality detection accuracy. We also demonstrate that, although the image complexity features are highly correlated with the SVs, the SVs are more efficient in distinguishing the abnormal thermograms from the normal ones. Secondly, based on the extracted SVs, a BAG index has been designed to investigate the degree of abnormality that these abnormal thermograms bear.