
1 Introduction

The human body naturally manages the creation, growth, and death of cells in its tissues. When this process begins to function abnormally, cells do not die as quickly as they should, so the ratio of cell growth to cell death increases, which is a direct cause of cancer. Breast cancer, a disease well known all over the world, appears when breast cells divide and grow without reasonable control. In the United States, one in eight women is diagnosed with breast cancer in her lifetime, and more than 40,000 die from it each year [1]. This number can be reduced through early detection techniques, awareness campaigns, better diagnoses, and improved treatment options. In this article, we first explore infrared digital imaging, which assumes that a thermal comparison between a healthy breast and a cancerous breast always shows increased thermal activity in precancerous tissues and in the areas surrounding a developing breast cancer. This is due to the metabolic activity and vascular circulation surrounding the cancer cells. Over the last 20 years, several techniques have been proposed to detect breast cancer much earlier, such as [2], where the author presents a powerful approach to distinguish healthy from sick (cancerous) breasts. Our deep convolutional neural network (DNN) can make the same distinction, but with a novelty: the classification of the cancerous breast into four stages:

  • T1: The tumor in the breast is 20 mm or smaller at its widest area.

  • T2: The tumor is larger than 20 mm but not larger than 50 mm.

  • T3: The tumor is larger than 50 mm.

  • T4: Cancer has grown into the chest wall.

To the best of our knowledge, this is the first time that such a classification has been performed using thermal images of the breasts.

2 Previous Work

The authors in [3] presented an overview of computer-aided detection (CAD) systems based on state-of-the-art techniques developed for mammography and breast histopathology images. They also describe the relationship between histopathology phenotypes and mammography, taking the biological aspects into account, and propose a computer modeling approach to breast cancer that maps phenotypes/characteristics between mammographic abnormalities and their histopathological representation. Similarly, the authors in [4] studied the use of Local Quinary Patterns (LQP) for the classification of mammographic density over different neighborhood topologies. They adopted a multiresolution and multi-orientation approach, studied the effects of multiple neighborhood topologies, and selected dominant patterns to maximize texture information. A Support Vector Machine (SVM) classifier was then used to classify the resulting data.

Although mammography has been the preferred technique for several decades, new techniques have emerged to overcome its limitations. Research has shown near-infrared fluorescence (NIRF) imaging to be an essential part of the cancer diagnostic process [5], as well as of the ongoing observation of the disease and its treatment. It is vital that the imaging process produces a strong NIRF light signal so that the captured image contains information very close to the actual state of the breast. As we know, the earlier the tumor is detected and the sooner treatment is started, the better the chances of success. Other researchers have discussed the difficulty of obtaining tumor parameters such as metabolic heat, tumor depth, and tumor diameter from the thermogram [6]. Another article [7] mentions the limitations of computed tomography (CT) and magnetic resonance imaging (MRI), which have low sensitivity for lesions smaller than one centimeter because of their limited spatial resolution. Some research has also highlighted negative aspects of undergoing mammograms every year for ten years [8]: according to that study, the rate of false-positive diagnoses in women after annual mammograms for ten years is 49.1%. Another study showed that advising women to undergo a Sentinel Lymph Node (SLN) biopsy reduced the risk of disease progression (breast cancer) [9]. Moreover, other authors [10] cautioned against taking breast thermography results as sufficient information for decision-making on their own, although such images can be used by a powerful Computer Assist Device (CAD) to obtain a positive outcome.

In [7], a typical maximum transient thermal contrast during warming was presented for a breast with a 10 mm tumor at a depth of 5 mm, after the breast had been cooled for 1 min. When the computer processes the thermal curve, the amplitude of the transient peak, its corresponding time, and the response time are extracted from the maximum transient thermal contrasts for tumors of different diameters located at different depths. This analysis shows the peak temperature generated on the surface of the breast containing the tumor. As shown in Fig. 1, the area surrounding the tumor produces more heat during the warm-up phase before falling back to a stable temperature.
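
As a minimal illustration of this kind of feature extraction, the sketch below computes the peak amplitude, the time at which it occurs, and a simple response time from a sampled thermal contrast curve. The curve and sampling interval are synthetic placeholders, not data from [7].

```python
import numpy as np

def transient_peak_features(times_s, thermal_contrast):
    """Extract the transient peak amplitude, its time, and a simple
    response time (first instant at 50% of the peak) from a thermal
    contrast curve recorded during the warm-up phase."""
    contrast = np.asarray(thermal_contrast, dtype=float)
    times = np.asarray(times_s, dtype=float)

    peak_idx = int(np.argmax(contrast))   # index of the maximum contrast
    peak_amplitude = contrast[peak_idx]   # peak thermal contrast
    peak_time = times[peak_idx]           # time at which the peak occurs

    # Response time: first instant the contrast reaches half of its peak value.
    half_idx = int(np.argmax(contrast >= 0.5 * peak_amplitude))
    response_time = times[half_idx]

    return peak_amplitude, peak_time, response_time

# Synthetic warm-up curve (illustrative values only): rises to a peak, then decays.
t = np.linspace(0, 300, 61)              # 5 minutes, one sample every 5 s
c = 0.8 * (t / 60) * np.exp(1 - t / 60)
print(transient_peak_features(t, c))
```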

Fig. 1.
figure 1

A phase of breast warming [7].

Let us now explore other important aspects:

A Genetic Factor for Breast Cancer:

In [11], the analysis and synthesis of 12 short- and medium-term breast cancer clinical research projects suggested 12 areas of research aimed at improving detection rates and treatment of breast cancer. Our point of interest, developing better tools for identifying genetically predisposed patients, is explored in the third part of that article, where two essential genes (BRCA1 and BRCA2) were identified for the genetic testing of patients with a suggestive family history. It was found that confirming a genetic predisposition could facilitate the implementation of risk-reduction strategies. Besides, the use of new genetic testing tools, such as the high-risk hereditary breast cancer panel, should be accompanied by an appropriate interpretation of the results and their variants for better use in clinical decision-making. Recent studies show that poly (ADP-ribose) polymerase (PARP) inhibitors may be useful in treating tumors with BRCA1/2 mutations that develop into breast cancer.

Stages of Breast Cancer:

According to [12], breast cancer can be grouped into four stages, described below (a simple mapping of these definitions into code is sketched after the list):

  • T1: The breast tumor is 20 mm or less in its most extensive area, which is a little less than an inch. This category can be divided into four sub-categories depending on the size of the tumor:

    • T1mi: the tumor is 1 mm or smaller.

    • T1a: the tumor is larger than 1 mm but not larger than 5 mm.

    • T1b: the tumor is larger than 5 mm but not larger than 10 mm.

    • T1c: the tumor is larger than 10 mm but not larger than 20 mm.

  • T2: The tumor is larger than 20 mm but not larger than 50 mm.

  • T3: The tumor is larger than 50 mm.

  • T4: The cancer has grown into the chest wall or the skin. This category can be divided into four sub-categories depending on the extent of the spread:

    • T4a: the tumor has grown into the chest wall.

    • T4b: the tumor has grown into the skin.

    • T4c: the cancer has grown into both the chest wall and the skin.

    • T4d: inflammatory breast cancer.
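
The following sketch expresses these definitions as a simple mapping function. The function name, its parameters, and the use of boolean flags for the T4 sub-categories are illustrative choices, not part of [12].

```python
def t_stage(tumor_mm, chest_wall=False, skin=False, inflammatory=False):
    """Map the widest tumor dimension (in mm) and spread flags to a T stage,
    following the T1-T4 definitions listed above."""
    if inflammatory:
        return "T4d"
    if chest_wall and skin:
        return "T4c"
    if chest_wall:
        return "T4a"
    if skin:
        return "T4b"
    if tumor_mm <= 1:
        return "T1mi"
    if tumor_mm <= 5:
        return "T1a"
    if tumor_mm <= 10:
        return "T1b"
    if tumor_mm <= 20:
        return "T1c"
    if tumor_mm <= 50:
        return "T2"
    return "T3"

print(t_stage(18))             # -> T1c
print(t_stage(60))             # -> T3
print(t_stage(30, skin=True))  # -> T4b
```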

3 Proposed Model

For this work, the images were extracted from the Research Data Base (RDB) of frontal thermograms obtained using a FLIR SC-620 infrared camera with a resolution of 640 × 480 pixels [13]. The dataset includes images of individuals aged 29 to 85 years, with breasts of different shapes and sizes, such as medium, wide, and asymmetrical breasts (Table 1).

Table 1. Summary of subjects.

The statistical data of the subjects are presented in Table 2, with:

Table 2. Dataset repartition used for training
  • Total number of subjects (N) = 67

  • Total number of healthy/normal subjects (NH) = 43

  • Total number of sick/abnormal subjects (NS) = 24

The articles discussed earlier emphasize the importance of image processing, which artificial intelligence currently does not perform as well as a human being. This highlights the need for a better Computer Assist Device (CAD) that would help us better understand the thermal images captured by different thermal imaging cameras. In this context, our CAD is a deep neural network with a KNN model as a classifier, as shown in Fig. 2 (assuming it is already trained), which takes the thermal images and classifies them as non-cancerous (healthy) or cancerous, with the possibility of seeing in which stage the breast cancer is. This model, which is in fact an InceptionV3 combined with k-Nearest Neighbors (InceptionV3-KNN), has an extension that we call “CancerStage”.

Fig. 2.
figure 2

Flow chart of the proposed model.

CancerStage:

As shown in Fig. 2, this module sits after the KNN classifier and is used when our InceptionV3-KNN classifies a picture as Sick. That image is given as input to the “CancerStage” module, which thresholds it to reveal light regions, blurs the result, and then applies a series of erosions and dilations to remove any small blobs of noise, as shown in Fig. 3.
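
A minimal OpenCV sketch of this sequence of operations is given below; the threshold value, blur kernel size, and iteration counts are illustrative assumptions rather than the exact parameters of our implementation.

```python
import cv2

def reveal_warm_regions(gray, thresh_value=200):
    """Threshold a grayscale thermogram to reveal light (warm) regions,
    blur the result, then apply erosions and dilations to remove small
    blobs of noise."""
    # Keep only the brightest (warmest) pixels.
    _, thresh = cv2.threshold(gray, thresh_value, 255, cv2.THRESH_BINARY)

    # Blur the thresholded image to smooth region boundaries.
    blurred = cv2.GaussianBlur(thresh, (11, 11), 0)

    # Erosions followed by dilations remove small noisy blobs while
    # preserving the larger warm areas.
    cleaned = cv2.erode(blurred, None, iterations=2)
    cleaned = cv2.dilate(cleaned, None, iterations=4)
    return cleaned

# Usage (file name is illustrative):
# gray = cv2.imread("sick_breast_thermogram.png", cv2.IMREAD_GRAYSCALE)
# mask = reveal_warm_regions(gray)
```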

Fig. 3.
figure 3

The third column of images shows the result of thresholding (to reveal light regions), and the remaining images show the results of the blur, erosion, and dilation operations.

3.1 Pre-processing of Breast Thermal Images

We can subdivide this task as follows:

Pre-treatment of Thermal Images of the Breast:

We chose to use a public dataset [14]. The thermal images of this dataset were obtained following a dynamic protocol in which pictures are taken after cooling the breasts with an air flow. While the patient’s body returned to thermal balance with the environment, the author of the dataset captured 20 sequential images spaced 15 s apart. The pictures in their original format (640 × 480) are too large for our DNN, so it is important to crop them to eliminate unwanted areas.

Obtaining the Region of Interest:

From each grayscale image, the region of interest (ROI) was extracted. Each ROI image is converted into a matrix of features to be processed, and the areas most likely to contain cancer are transferred to the input of the next component.

Thus, pre-processing involves RGB-to-grayscale conversion and image cropping to remove unwanted regions such as the neck, the arms, and the sub-mammary fold.
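
A minimal sketch of this pre-processing step is shown below; the file name and crop coordinates are illustrative assumptions.

```python
import cv2

def preprocess_thermogram(path, crop_box):
    """Convert a thermogram to grayscale and crop it to remove unwanted
    regions (neck, arms, sub-mammary fold).
    crop_box = (top, bottom, left, right) in pixel coordinates."""
    image = cv2.imread(path)                        # original 640 x 480 frame (BGR)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale conversion
    top, bottom, left, right = crop_box
    return gray[top:bottom, left:right]             # keep only the breast region

# Usage (file name and coordinates are illustrative):
# roi = preprocess_thermogram("subject01_frame05.png", crop_box=(80, 420, 60, 580))
```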

3.2 Image Classification Framework

Considering the concept of transfer learning, we used a pre-trained Inception V3 model [15], which is powerful enough for feature extraction. Our DNN can be described as [10, 20, 10], meaning that there is a layer of 10 neurons, each connected to the 20 neurons of the next layer, which are in turn connected to the 10 neurons of the third layer. We retrained the final classification layer so that it could distinguish cancer from no cancer with considerable confidence (>0.65). If the output confidence of the last layer was between 0.55 and 0.65, we submitted the matrix of features to our k-Nearest Neighbors (KNN) classifier for the output prediction [17]. During the training of our model, we set the learning rate to 0.0001, the number of epochs to 14, and the number of steps to 4000 (all these values were obtained through several experiments). Furthermore, our training used a sample of 64 breasts, 32 healthy and 32 with some abnormality, with 20 sequential images per breast. This resulted in 1062 images (after the pre-processing phase) used to train and test our network, with respective splits of 0.8 and 0.2 as shown in Table 3. To validate the tests performed, we used 12 breasts that were entirely new to our model.
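
The sketch below illustrates this setup with tf.keras and scikit-learn. It is a simplified reconstruction under stated assumptions (a single sigmoid output for the retrained layer, five neighbors for the KNN, and a fallback to the KNN whenever the confidence is below 0.65), not the exact training script we used.

```python
import numpy as np
import tensorflow as tf
from sklearn.neighbors import KNeighborsClassifier

# Pre-trained InceptionV3 used as a frozen feature extractor (transfer learning).
base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                         pooling="avg", input_shape=(299, 299, 3))
base.trainable = False

# Retrained final layer: a single sigmoid unit for cancer vs. no cancer.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# x_train: (N, 299, 299, 3) pre-processed thermograms; y_train: 0 = healthy, 1 = sick.
# model.fit(x_train, y_train, epochs=14, validation_split=0.2)

# KNN trained on the extracted feature vectors, used when the final layer
# is not confident enough.
knn = KNeighborsClassifier(n_neighbors=5)
# knn.fit(base.predict(x_train), y_train)

def predict_with_fallback(x):
    """Use the retrained layer when confident (>= 0.65), otherwise fall back to KNN."""
    score = float(model.predict(x[np.newaxis])[0, 0])
    confidence = max(score, 1.0 - score)
    if confidence >= 0.65:
        return int(score >= 0.5)
    return int(knn.predict(base.predict(x[np.newaxis]))[0])
```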

Table 3. Dataset repartition used for the validation test

InceptionV3 Model:

When training our InceptionV3 model (Figs. 2 and 3), we observed an increase in accuracy and a reduction in entropy up to 3900 steps. Beyond this number of steps, the model becomes overfitted, which results in a decrease in accuracy (Fig. 4).

Fig. 4.
figure 4

The accuracy increases with the number of iterations; the training and validation accuracies become stable after 3900 training steps.

k-Nearest Neighbors (KNN):

Our representation of the extracted features via t-distributed stochastic neighbor embedding (t-SNE) indicates that the features are not fully separated, although two main groups can be distinguished: Healthy and Sick breasts. Our goal here is to add another classifier on top of our Deep Neural Network (DNN) to obtain good accuracy. For this reason, we tested several classifiers through several experiments, as shown in Fig. 7. As a result, we found that KNN has the best performance (see Fig. 6D).
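
The sketch below shows how such a comparison can be set up on the extracted feature vectors. The list of candidate classifiers and the cross-validation setting are illustrative assumptions; the exact set of five classifiers compared in Fig. 7 is not reproduced here.

```python
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# features: (N, 2048) InceptionV3 feature vectors; labels: 0 = healthy, 1 = sick.

def project_2d(features):
    """2-D t-SNE projection of the extracted features for visual inspection."""
    return TSNE(n_components=2, random_state=0).fit_transform(features)

def compare_classifiers(features, labels):
    """Cross-validated accuracy of several candidate classifiers placed
    on top of the frozen InceptionV3 feature extractor."""
    candidates = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(probability=True),
        "NaiveBayes": GaussianNB(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }
    return {name: cross_val_score(clf, features, labels, cv=5).mean()
            for name, clf in candidates.items()}
```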

After the model is trained, a validation test is performed using the images of the 12 new breasts mentioned earlier. Bearing in mind that we have 20 sequential images for each breast, we submitted 480 images for the validation test, organized according to Table 3. Figures 6 and 7 show the performance of our model.

During our experiments, we obtained similar ROC curves using a Support Vector Machine (SVM) [16] and k-Nearest Neighbors (KNN). To remove any doubt and add confidence to the prediction, we performed a probability calibration of all the classifiers that could be used on top of our DNN (see Fig. 7). We found that KNN is, in this case, a well-calibrated classifier, as its curve closely follows the diagonal (perfectly calibrated).
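
A minimal sketch of how such a calibration (reliability) curve can be produced with scikit-learn is given below; the number of bins is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

def plot_calibration(y_true, y_prob, label="KNN"):
    """Reliability diagram: a well-calibrated classifier stays close to
    the diagonal (perfectly calibrated) line."""
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
    plt.plot([0, 1], [0, 1], "k--", label="Perfectly calibrated")
    plt.plot(mean_pred, frac_pos, "s-", label=label)
    plt.xlabel("Mean predicted probability")
    plt.ylabel("Fraction of positives")
    plt.legend()
    plt.show()

# y_prob would come from, e.g., knn.predict_proba(features_test)[:, 1].
```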

CancerStage:

As mentioned before, the images classified as Sick by our InceptionV3-KNN are further processed by our “CancerStage” module. After performing the threshold, blur, and series of erosion and dilation operations, it identifies the areas containing a dense concentration of white pixels in the grayscale image. The radius of these zones is then computed and compared against the T1–T4 properties given in Sect. 2. The performance of our model is shown in Figs. 5, 6 and 7, while Figs. 8, 9 and 10 illustrate the output of the module.
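
A minimal sketch of this last step is given below. The pixel-to-millimeter calibration factor is an assumption that depends on the camera optics and the acquisition distance, and a T4 decision additionally requires evidence of spread to the chest wall or skin that cannot be derived from the blob size alone.

```python
import cv2

def estimate_stage(mask, mm_per_pixel):
    """Locate the dense warm regions in the cleaned binary mask, compute the
    radius of the largest one, and map its diameter to a T stage."""
    # OpenCV 4.x returns (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (_, _), radius_px = cv2.minEnclosingCircle(largest)
    diameter_mm = 2.0 * radius_px * mm_per_pixel
    if diameter_mm <= 20:
        return "T1"
    if diameter_mm <= 50:
        return "T2"
    return "T3"  # T4 additionally requires evidence of spread to the chest wall/skin

# mm_per_pixel depends on the acquisition distance and camera optics (illustrative):
# stage = estimate_stage(mask, mm_per_pixel=0.6)
```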

Fig. 5.
figure 5

The entropy decreases during the training of our model, which is a positive sign (the ambiguity in our model is reduced).

Fig. 6.
figure 6

Performance of our method: confusion matrix, precision-recall, and two-class precision-recall of our InceptionV3-KNN. A shows a 2D representation of our feature map; B shows the confusion matrix of our InceptionV3-KNN model, where we can see that our model easily classifies a breast as Sick or Healthy; C and D demonstrate the power of our model.

Fig. 7.
figure 7

Calibration plots of five classifiers applied on top of our InceptionV3.

Fig. 8.
figure 8

Cancerous area identified and classified as Stage 2 (T2). The analysis reveals that the tumor is still located on one side of the breast.

Fig. 9.
figure 9

Cancerous area identified and classified as Stage 3 (T3). The analysis reveals that the tumor has spread to almost all the breast tissues surrounding the nipple, causing abnormal heat.

Fig. 10.
figure 10

Cancerous area identified and classified as Stage 4 (T4). The analysis reveals that the tumor has spread to all the breast tissues, causing abnormal heat.

4 Conclusion

The literature review showed that work on breast cancer detection from the computer scientist’s angle can be a valuable contribution to the field. With this in mind, the most common techniques used to detect breast cancer were presented, along with their strengths and weaknesses. One method seems to have a promising future due to its non-invasive nature: infrared imaging coupled with a powerful Computer Assist Device (CAD) can lead to a very accurate tumor detector. In a future study, we will use a camera with a thermal sensitivity of 0.5 and will propose a model that addresses breast cancer detection and classification using a 3D structure of the breast.