
1 Introduction

Detecting pneumonia on chest X-rays is a challenging task, even for X-ray specialists. On X-rays, pneumonia can be difficult to distinguish from other pathologies in the chest area [1]. Making an initial diagnosis requires knowledge of chest-related pathologies, as well as expertise and experience in reading X-rays. Intelligent computer systems that support doctors have therefore become essential for diagnosing pneumonia from X-ray images more effectively.

Many recent studies in image processing have achieved good results in image classification, especially with Convolutional Neural Networks (CNNs). The CNN is a deep learning model that is widely used in image recognition because it can extract key features from images quickly and efficiently [2]. In medicine, many papers have applied CNN models [3,4,5,6,7,8,9], for example to breast cancer detection [3], skin cancer detection [4], and pneumonia detection [9].

To assist doctors in detecting pneumonia, Ayan [10] studied pneumonia diagnosis with CNN models, testing the pretrained Xception and Vgg16 models. To tune the model parameters during training, the authors combined transfer learning and fine-tuning. Xception achieved an accuracy of 82% and Vgg16 an accuracy of 87%. Ponnada [11] presented a pneumonia detection system in which the authors proposed a CNN with seven hidden layers. On the LIDC-IDRI and Mendeley datasets it achieved an accuracy of 86%, a precision of 79%, and a recall of 98%. Kadam [12] developed a deep neural network that predicts the presence of pneumonia from chest X-rays. A CNN model was employed to increase efficiency and accuracy. To further improve performance, optimal differential learning rates were selected using cosine annealing and stochastic gradient descent with restarts. The resulting accuracy, precision, and recall were 92.90%, 90.88%, and 99.27%, respectively.

Detecting pneumonia on chest X-rays is an image classification problem: the goal is to determine whether pneumonia is present in a chest X-ray image. The system input is a chest X-ray image, and the output labels it as either normal or pneumonia. In this paper, we propose an adaptive technique for lung disease image classification based on deep learning. We improve the convolutional neural network for lung disease image classification by building a training model with a suitable number of hidden layers and suitable optimization algorithms for detecting pneumonia images. The rest of this paper is organized as follows. Sect. 2 describes deep learning for classification. Sect. 3 presents the proposed method for lung disease image classification. Sects. 4 and 5 present the experimental results and the conclusion.

2 Background of Deep Learning for Classification

Deep learning uses neural networks with multiple hidden layers. A conventional neural network may have only a few hidden layers, whereas a deep learning model can have a very large number of them, up to hundreds of layers. Adding layers can increase the accuracy of a neural network. In deep learning, more input data generally leads to more accurate output [5]. Deep learning is a sub-branch of machine learning. Deep learning algorithms perform a task repeatedly and adjust the model a little each time to improve the results.

CNNs are a class of deep learning models. They are applied in image and video recognition [13], recommendation systems [14], image classification, medical image analysis [10,11,12], natural language processing [15], and financial time series [16]. A CNN classifies images by receiving input images, processing them, and assigning labels to them. The computer receives the input as an array of pixels whose size depends on the image resolution, and it represents the image as height × width × depth (h × w × d). For example, an image of size 150 × 150 × 3 has 150 × 150 pixels and three RGB color channels.

In a CNN model, each input image passes through convolutional layers, which are also known as hidden layers. Each convolutional layer includes filters (kernels), an activation function, and max pooling, and its result is a feature map. To classify the data, the feature map is passed to a fully connected (FC) layer, followed by an activation function.

Figure 1 illustrates the process of receiving input image, extracting features of the image through layers, and classifying the output object.

Fig. 1 A neural network with many convolutional layers

The convolutional layer extracts features from the image. The input of each convolutional layer is a 3-dimensional tensor of size (H × W × D). A kernel of size (Wk × Hk × Dk), with Dk = D, slides over this tensor with stride S and padding P. If the convolutional layer has N kernels, the output is a 3-dimensional tensor whose size is given by the formula in the lecture [17]:

$$\left( {\frac{{{\text{W }} - {\text{ Wk }} + { }2{\text{P}}}}{{\text{S}}} + 1} \right)* \left( {\frac{{{\text{H }} - {\text{ Hk }} + { }2{\text{P}}}}{{\text{S}}} + 1} \right)* {\text{N}}$$
(1)
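Eq. (1) can be checked with a small helper. This is a minimal sketch; the function name `conv_output_shape` and the 32-kernel example are hypothetical and do not come from the paper. The sketch assumes that the quotient is rounded down when (W − Wk + 2P) is not divisible by S.

```python
def conv_output_shape(w, h, wk, hk, n, s=1, p=0):
    """Output (width, height, depth) of a convolutional layer, per Eq. (1).

    w, h   -- input width and height
    wk, hk -- kernel width and height
    n      -- number of kernels (this is the output depth)
    s, p   -- stride and padding
    """
    out_w = (w - wk + 2 * p) // s + 1  # rounds down if not exact
    out_h = (h - hk + 2 * p) // s + 1
    return out_w, out_h, n


# 150 x 150 input, hypothetical 32 kernels of size 3 x 3, stride 1, no padding
print(conv_output_shape(150, 150, 3, 3, 32))       # (148, 148, 32)
# Padding 1 keeps the spatial size unchanged for a 3 x 3 kernel
print(conv_output_shape(150, 150, 3, 3, 32, p=1))  # (150, 150, 32)
# The 5 x 5 matrix with a single 3 x 3 kernel used in the text
print(conv_output_shape(5, 5, 3, 3, 1))            # (3, 3, 1)
```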

Consider a 5 × 5 matrix X containing image pixels with values 0 or 1, and a 3 × 3 filter K that slides over X with stride 1. Each time K slides over X, a 3 × 3 sub-matrix A is extracted from X, and A and K are combined element-wise according to:

$${\text{Y}} = {\text{X}} \otimes {\text{K}}$$
(2)

At each position of K, the sub-matrix A is the window of X that K covers, that is, the matrix of the same size as K centered on an element xij of X. The corresponding elements of A and K are multiplied, the products are summed, and the result is written into matrix Y. Figure 2 illustrates this element-wise calculation, which produces the feature map [18].

Fig. 2 Feature map

For example, y11 = sum(A ⊗ K) = x11 × k11 + x12 × k12 + x13 × k13 + x21 × k21 + x22 × k22 + x23 × k23 + x31 × k31 + x32 × k32 + x33 × k33 = 3. The remaining elements of Y are computed in the same way.
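The sliding-window computation can be written in plain Python. The matrices below are illustrative values and are not the ones in Fig. 2, so y11 here is 4 rather than 3. The function name `conv2d_valid` is hypothetical. Like CNN libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation. This matches the formula for y11 above.

```python
def conv2d_valid(x, k, stride=1):
    """Slide kernel k over matrix x (no padding) and return the feature map Y."""
    kh, kw = len(k), len(k[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    y = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # A is the kh x kw window of x whose top-left corner is (i*stride, j*stride);
            # multiply A and K element-wise and sum the products.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += x[i * stride + a][j * stride + b] * k[a][b]
            row.append(total)
        y.append(row)
    return y


X = [[1, 1, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 0],
     [0, 1, 1, 0, 0]]
K = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
print(conv2d_valid(X, K))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

With stride 1 and no padding, the 5 × 5 input gives a 3 × 3 feature map, as Eq. (1) predicts.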

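Color images, however, have three color channels: red, green, and blue. Such an image is represented as a 3-dimensional tensor, and the filter is also a 3-dimensional tensor of size (k × k × 3). The element-wise products are summed over all three channels, so each filter produces a single 2-dimensional feature map.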
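The following sketch extends the single-channel case to a multi-channel input, assuming that pixels are stored as `x[row][col][channel]`. The function name `conv2d_multichannel` is hypothetical. The sum runs over the window and over every channel, which is why one kernel yields one 2-D map.

```python
def conv2d_multichannel(x, k):
    """Valid convolution of an H x W x D input with one kh x kw x D kernel.

    x[i][j][c] is the pixel at row i, column j, channel c; k[a][b][c] has the
    same depth D as x. Products are summed over the window AND all channels.
    """
    kh, kw, depth = len(k), len(k[0]), len(k[0][0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    y = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            y[i][j] = sum(
                x[i + a][j + b][c] * k[a][b][c]
                for a in range(kh)
                for b in range(kw)
                for c in range(depth)
            )
    return y


# 4 x 4 RGB image where every pixel is (R, G, B) = (1, 2, 3)
img = [[[1, 2, 3] for _ in range(4)] for _ in range(4)]
k_all = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]  # weights on all channels
k_red = [[[1, 0, 0] for _ in range(3)] for _ in range(3)]  # weights on red only
print(conv2d_multichannel(img, k_all))  # [[54, 54], [54, 54]]  (9 pixels x (1+2+3))
print(conv2d_multichannel(img, k_red))  # [[9, 9], [9, 9]]
```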

The activation function is applied to the output of the convolutional layer, and the next convolutional layer takes the output of the previous layer as its input, as in the lecture [17]. Each kernel has dimensions (Wk × Hk × Dk) plus one bias coefficient, so a kernel has (Wk × Hk × Dk + 1) parameters in total. With N kernels in the layer, the total number of parameters in the layer is N × (Wk × Hk × Dk + 1).
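The parameter count is easy to evaluate directly. The kernel counts in this example (32 and 64) are hypothetical and are not taken from Table 2 or Table 3. The function name `conv_layer_params` is also illustrative.

```python
def conv_layer_params(wk, hk, dk, n):
    """Trainable parameters of a conv layer: N kernels, each Wk*Hk*Dk weights + 1 bias."""
    return n * (wk * hk * dk + 1)


# First layer: 3 x 3 kernels on an RGB input (Dk = 3), hypothetical 32 kernels
print(conv_layer_params(3, 3, 3, 32))   # 896 = 32 * (27 + 1)
# Next layer: its input depth is 32 (previous N), hypothetical 64 kernels
print(conv_layer_params(3, 3, 32, 64))  # 18496 = 64 * (288 + 1)
```

In the second call, Dk equals the N of the previous layer, because each kernel must span the full depth of its input.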

Pooling layers are placed between convolutional layers to reduce the data size while retaining important features, and the smaller data size reduces computation in the model. With a pooling layer of size 2 × 2, stride 2, and padding 0, the output width and height are halved, while the depth is unchanged. After the image has passed through several convolutional and pooling layers, the model has learned the features of the image. The output of the last of these layers, a tensor of size (H × W × D), is flattened into a vector of length H × W × D and fed to the fully connected layer. Finally, the fully connected layer combines these image features to produce the model's output.
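Max pooling with a 2 × 2 window and stride 2, followed by flattening, can be sketched in plain Python. The function names `max_pool2d` and `flatten` are hypothetical, and pooling is shown on a single channel for simplicity (each channel is pooled independently, so the depth does not change).

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling on one channel (no padding): keep the largest value in each window."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [
        [
            max(x[i * stride + a][j * stride + b] for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flatten(t):
    """Flatten an H x W x D nested list into a vector of length H*W*D (row-major)."""
    return [v for row in t for pixel in row for v in pixel]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 3, 2],
        [6, 2, 0, 1]]
print(max_pool2d(fmap))  # [[4, 5], [6, 3]]: a 4 x 4 map becomes 2 x 2

t = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]  # 2 x 2 x 3 tensor
print(len(flatten(t)))   # 12 = 2 * 2 * 3
```

With an odd size, the output is rounded down; for example, a 5 × 5 map pools to 2 × 2.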

3 Lung Diseases Image Classification

In this section, we propose a method for lung disease image classification based on deep learning. The proposed method is shown in Fig. 3.

Fig. 3 Process flow of the proposed method

As shown in Fig. 3, the proposed method consists of six steps, detailed as follows:

  1. Get training data from the dataset: chest X-ray images are read from the dataset's “train” directory and fed into the system in turn. The training set contains 5218 images. The parameters are configured as follows: batch_size = 32, epoch = 32, image_dimension = 150 × 150 px. In each epoch, the training data is therefore fed to the system in 163 batches (5218/32 ≈ 163.1, that is, 163 full batches of 32 images with 2 images left over).

  2. Data augmentation: deep learning techniques require a large amount of data to achieve good results, but in many cases, such as medical imaging, the available data is limited. In machine learning, a common solution to this problem is data augmentation, which helps avoid redundancy and improves algorithm accuracy [19]. In this work, we used several data augmentation techniques: rescaling, brightness adjustment, shearing, and random zooming (rescale = 1./255, brightness_range = [0.7, 1.0], shear_range = 0.2, zoom_range = 0.2). Data augmentation is an effective way to increase the amount of training data.

  3. Image analysis using CNN: the model was built with five convolutional blocks using kernel_size = 3 × 3, the “Relu” activation function, and pool_size = 2 × 2. “Dropout” with rate = 0.2 is used to reduce overfitting. The last layer uses the “Sigmoid” activation function because the task is binary classification. The optimizer chosen was “Adam”. Two callbacks are used: ModelCheckpoint saves a copy of the best model, and EarlyStopping stops training when the validation error stops decreasing and starts to increase, that is, when the gap between training and validation error begins to widen. The details of this step are shown in Fig. 4.

  4. Compile and build the model: compile the model and run the training process.

  5. Test the model on the test data: compute the “confusion matrix” and derive precision, recall, and accuracy from it.

  6. Check and evaluate the results: in this step, we check and evaluate the results of the proposed method.
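Two pieces of bookkeeping from steps 1 and 3 can be made concrete: the number of batches per epoch and the early-stopping and checkpointing logic. The sketch below is plain Python and is not the Keras `EarlyStopping` or `ModelCheckpoint` implementation; it only mirrors the idea. The patience value and the loss values are hypothetical, because the paper does not report the callback settings. The function names are illustrative.

```python
import math


def batches_per_epoch(n_images, batch_size):
    """Return (full batches, total batches including a final partial batch)."""
    return n_images // batch_size, math.ceil(n_images / batch_size)


def early_stopping_epoch(val_losses, patience=3):
    """Simulate patience-based early stopping plus best-model checkpointing.

    val_losses[e] is the validation loss after epoch e (0-based). Training stops
    once the loss has failed to improve for `patience` consecutive epochs.
    Returns (best_epoch, stop_epoch); the checkpoint keeps the best_epoch model.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # new best: save checkpoint
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stop training here
    return best_epoch, len(val_losses) - 1  # all epochs were run


print(batches_per_epoch(5218, 32))  # (163, 164): 163 full batches, 2 images left over

# Hypothetical validation losses: improve until epoch 3, then rise
losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.39, 0.41, 0.44]
print(early_stopping_epoch(losses, patience=3))  # (3, 6)
```

Whether the two left-over images form an extra batch depends on the data loader. Many loaders, including Keras directory iterators, emit a final partial batch.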

Fig. 4 Process flow of image analysis using convolutional neural network
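Combining Eq. (1) with the 2 × 2 pooling rule shows how the 150 × 150 input of step 3 shrinks through five convolutional blocks. This sketch makes two assumptions that the text does not state: each block is a single 3 × 3 convolution with stride 1 followed by one 2 × 2 max-pooling layer, and the padding mode is unknown. For that reason, both “same” (P = 1) and “valid” (P = 0) padding are shown. The function name `trace_shapes` is hypothetical.

```python
def trace_shapes(size=150, blocks=5, kernel=3, pool=2, padding="same"):
    """Spatial size after each block of conv (stride 1) + max-pool (stride = pool)."""
    p = (kernel - 1) // 2 if padding == "same" else 0
    sizes = []
    for _ in range(blocks):
        size = size - kernel + 2 * p + 1  # Eq. (1) with S = 1
        size = (size - pool) // pool + 1  # pooling halves the size, rounding down
        sizes.append(size)
    return sizes


print(trace_shapes(padding="same"))   # [75, 37, 18, 9, 4]
print(trace_shapes(padding="valid"))  # [74, 36, 17, 7, 2]
```

The depth at each stage equals the number of kernels in that block, which is listed in Table 3 and is therefore not modeled here.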

To evaluate the model, we use three metrics: accuracy, precision, and recall. A confusion matrix presents the values obtained after testing the model, and these values let us calculate the effectiveness of the trained model. The components of the confusion matrix are described in detail in Fig. 5.

Fig. 5 Confusion matrix [20]

We use pneumonia diagnosis to explain the four components of the confusion matrix. Pneumonia diagnosis has two classes, pneumonia and normal. The pneumonia class is labeled positive, and the normal class is labeled negative. We define the following quantities for evaluating the results: True Positive (TP), False Negative (FN), True Negative (TN), and False Positive (FP).

  • TP: Actual image is pneumonia, and the model predicts pneumonia.

  • FN: Actual image is pneumonia, but it is predicted to be normal.

  • TN: Actual image is normal, and it is predicted to be normal.

  • FP: Actual image is normal, but it is predicted to be pneumonia.
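The four counts can be tallied directly from lists of actual and predicted labels. This minimal sketch assumes string labels, with "pneumonia" as the positive class. The function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="pneumonia"):
    """Count TP, FN, TN, FP for a binary task where `positive` is the pneumonia class."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # pneumonia predicted as pneumonia
            else:
                fn += 1  # pneumonia missed (predicted normal)
        else:
            if predicted == positive:
                fp += 1  # normal predicted as pneumonia
            else:
                tn += 1  # normal predicted as normal
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


actual = ["pneumonia", "pneumonia", "normal", "normal", "pneumonia"]
predicted = ["pneumonia", "normal", "normal", "pneumonia", "pneumonia"]
print(confusion_counts(actual, predicted))  # {'TP': 2, 'FN': 1, 'TN': 1, 'FP': 1}
```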

Accuracy is the number of correctly classified images (pneumonia or normal) divided by the total number of images in the test dataset. Accuracy is suitable only for a general assessment, because it does not show how many predictions of each class are correct or incorrect. For example, a model with 95% accuracy can be either very good or poor, depending on the class distribution of the data and the classification requirements [20, 21]. Accuracy is calculated as in Eq. (3):

$$Accuracy = \frac{{{\text{TP }} + {\text{ TN}}}}{{{\text{TP }} + {\text{ TN }} + {\text{ FP }} + {\text{ FN}}}}$$
(3)

Precision is the number of images correctly predicted as pneumonia divided by the total number of images that the model predicted as pneumonia. It is calculated as in Eq. (4):

$$Precision = \frac{{{\text{TP}}}}{{{\text{TP }} + {\text{ FP}}}}$$
(4)

Recall is the number of correctly predicted pneumonia images divided by the total number of actual pneumonia images in the test dataset. It is calculated as in Eq. (5):

$$Recall = \frac{{{\text{TP}}}}{{{\text{TP }} + {\text{ FN}}}}$$
(5)
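Eqs. (3) to (5) translate directly into code. The counts in this example are hypothetical and are not the results in Fig. 8. The function name `metrics` is illustrative. The second call is a degenerate model that labels every image as pneumonia on a hypothetical test set of 95 pneumonia and 5 normal images. It reaches 95% accuracy and 100% recall yet never recognizes a normal image (TN = 0), which shows why a single metric can mislead.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, precision and recall as in Eqs. (3)-(5)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # undefined if no actual positives
    return accuracy, precision, recall


print(metrics(tp=90, fn=10, tn=80, fp=20))  # accuracy 0.85, precision ~0.818, recall 0.9

# Always predicts "pneumonia": 95 pneumonia images, 5 normal images
print(metrics(tp=95, fn=0, tn=0, fp=5))     # (0.95, 0.95, 1.0)
```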

In pneumonia diagnosis, a high recall is particularly valuable because it means that the model detects most pneumonia cases and misses few of them (a low FN count).

4 Experiments and Results

In this section, we evaluate the proposed method experimentally and compare its results with those of other methods, namely the Ayan method [10] and the Kadam method [12].

4.1 The Dataset for Training and Testing

We use the chest X-ray dataset provided by Kaggle [22]. This dataset contains 5863 chest X-ray images (JPEG) with resolutions ranging from 407 × 178 to 2916 × 2583 pixels. The dataset is divided into three parts (training, testing, and validation), and each part contains a subfolder for each category (pneumonia or normal). Doctors manually classified the X-ray images into these two categories. The dataset contains 1583 normal images and 4273 pneumonia images. In our study, all images were resized to 150 × 150 pixels so that they all have the same size and aspect ratio. Figure 6 shows fifteen normal images from the dataset, and Fig. 7 shows fifteen pneumonia images. The numbers of images for training, testing, and validation are given in Table 1.

Fig. 6 X-ray images of normal lungs

Fig. 7 X-ray images of lungs with pneumonia

Table 1 Number of images in each of the three parts of the dataset

We now test the proposed method on this dataset and compare it with other methods, namely the Ayan method [10] and the Kadam method [12].

4.2 Experiments

Our experiments were run on Google Colab with 13 GB of RAM, two 2.2 GHz CPUs, and a GPU and TPU with 12 GB. The X-ray images were stored on Google Drive. The training model was built and the images were classified following the six steps described in Sect. 3. Table 2 presents the configuration of the model parameters.

Table 2 The configuration of CNN networks

Table 3 describes the layers of the CNN: one input layer, five convolutional blocks, a fully connected layer, and an output layer.

Table 3 Convolutional neural network architecture

The test results of the model are presented as a confusion matrix in Fig. 8.

Fig. 8 Confusion matrix of the test results

We now compare the results of the proposed method with those of the other methods. The comparison is presented in Table 4.

Table 4 Comparison of results reported for chest X-ray image classification

As Table 4 shows, our experimental results are better overall than those of the Ayan method [10] and the Kadam method [12] on the same dataset.

Compared with the Ayan method [10], the proposed method performs better on all three metrics: accuracy, precision, and recall. Compared with the Kadam method [12], the proposed method has a slightly lower recall (98.72% versus 99.27%), but it has higher precision (91.23% versus 90.88%) and higher accuracy (93.27% versus 92.90%).

The proposed method therefore outperforms the other methods in the sense that it achieves higher scores on most of the evaluation metrics.

5 Conclusions

Lung disease image classification is a difficult and challenging task. In this paper, we developed an automated method that checks chest X-ray images with high accuracy, using the Chest X-ray Images dataset. Radiologists pre-classified these images into two types: pneumonia and normal. We use deep learning methods and build the model in steps such as pre-processing, image acquisition, image layering, and setting appropriate parameters for each layer (image resolution, number of training epochs, optimization algorithms, threshold changes, and state transitions during training) to obtain the best and most appropriate model. The test results show that the resulting image classification model is highly effective. The proposed method achieves high precision and performs better overall than recently published methods on the same chest X-ray dataset. In future work, we will extend the experiments by varying the number of layers and the training time, and by testing other pretrained models such as InceptionNet, ResNet, InceptionResNet, MobileNet, and CheXNet. These experiments aim to reduce training time and further increase the accuracy of the model compared with the above results.