1 Introduction

Coronavirus disease 2019 (COVID-19) has been declared a high-risk disease by the World Health Organization (WHO), and the decade has begun with a new strain of respiratory disease. People afflicted with it may show symptoms such as cold, fever, and chest tightness [1]. The recent Omicron and Deltacron variants are expected to have mild effects but high communicability among the population, resulting in a high spread rate.

Patients with mild symptoms or no symptoms are the easiest to treat during the pandemic [2]. The most important test for detecting the infection is RT-PCR, but testing kits are in short supply [3]. Detecting COVID-19 early lets high-risk patients start treatment without waiting for the RT-PCR result, which helps them recover sooner and lowers the mortality rate. Early detection can also help identify severely affected patients and screen them without RT-PCR [4]. Based on extensive clinical criteria, the CT technician should carefully decide whether to perform a CT scan to verify an abnormal finding from a chest X-ray [5].

The authors of [6] proposed a detail-oriented capsule network architecture capable of recognizing fine-grained, discriminative image features for classifying patients with COVID-19, combined with a data augmentation model. Their model obtained an accuracy of 87.6% and an F1-score of 0.871.

In [7], the authors proposed a model named EfficientCovidNet, which uses a voting-based method and a cross-dataset analysis. Using this model to identify COVID-19 from CT reports, they obtained an accuracy of 87.60% on their dataset. In [8], the authors built a CNN model for distinguishing COVID-19 CT scans from others. Their model, CNN-2, performs much better than the original SqueezeNet, with an accuracy of 85.03% and an F1-score of 0.862. The authors of [9] performed ensemble learning on a combined dataset using ResNet50, Inception V4, and EfficientNetB0, produced predictions by majority vote, and obtained an accuracy of 95.36%. The convolutional neural network (CNN) technique in deep learning [10] has shown significant utility in image classification and is therefore widely used by researchers today.

For CT-scan analysis [11] of the chest, deep learning techniques are popular because they work with low-cost imaging techniques and a large amount of relevant data is available for training models. Manual evaluation of CT scan and X-ray images requires specialized knowledge, is time consuming, and can sometimes be inaccurate; deep learning and machine learning algorithms can extract the relevant information and perform the same evaluation for COVID-19 more easily [12]. Two of the most effective pre-trained deep CNN models, InceptionV3 and ResNet50, have been explored for ensemble learning in earlier research, and their performance has been analysed on the basis of accuracy and other parameters [13]. The goal of the study in [14] is to provide a CNN-based transfer-learning strategy for identifying COVID-19 using multiple models, which can be more accurate. Ultrasound, dermoscopy, X-ray, magnetic resonance imaging (MRI), computed axial tomography (CAT), and positron emission tomography (PET) are dynamic and developing domains for research, especially in image-processing techniques and algorithms [15]. The model developed in this research is created on Google Colab using the Keras module and trained on a graphics processing unit (GPU).

The following are the primary contributions of this research. A hybrid model is proposed that uses the weighted average ensemble method to combine the strengths of two models, InceptionV3 and ResNet50, and thereby increase detection accuracy. InceptionV3 alone reaches an accuracy of 90.23% and ResNet50 alone reaches 89.65%, whereas the hybrid model reaches 94.23% with an F1-score of 91.56%. The SARS-CoV-2-CT-Scan-Dataset is used for training and testing with an 80:20 ratio.

The rest of the paper is organized as follows. Section 2 describes the methodology of the research and the models used. Section 3 presents and discusses the results of the proposed model. Finally, Sect. 4 concludes the paper.

2 Methodology

This section describes the methodology used for COVID-19 detection with ensemble deep learning algorithms.

2.1 ResNet50 

ResNet50 [16] uses skip connections to address the vanishing gradient problem. A skip connection links the input of a block to the output of its convolution layers, so the network can either train those layers or effectively skip them. ResNet50 mainly consists of two types of block: the identity block, shown in Fig. 2, and the convolution block, shown in Fig. 3. The identity block is used when the output size of the layers equals the input size, whereas the convolution block is used when the input and output sizes differ; it places a 1 × 1 convolution in the skip connection to make the sizes equal. ResNet50 is more precise with a large dataset than with a small one [17]. The architecture of ResNet50 is shown in Fig. 1.

Fig. 1
A flow diagram consisting of alternating layers of convolution and identity blocks, leading to average pooling, flatten and dense layers.

Architecture of ResNet50 with identity and convolutional block

Fig. 2
A block diagram of the identity block consisting of 3 convolution layers, with a skip connection.

Identity block

Fig. 3
A diagram of the convolution block comprises multiple layers of convolution.

Convolution block

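In the following equation, f(x) is the residual learned by the stacked layers, i.e. the difference between the block output and its input. Because the input is added back through the skip connection, driving f(x) toward zero leaves the block as an identity mapping, which keeps the information lost across the layers of the model to a minimum.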

$$y = f\left( x \right) + x,$$
(1)

where, in Eq. (1), x is the input to the convolution layers and y is the output of the block.
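As a minimal sketch of Eq. (1), the skip connection can be written in plain Python. The function `toy_layers` is a hypothetical stand-in for the stacked convolution layers f and is not part of ResNet50; the sketch covers only the identity-block case, in which f(x) and x have the same size.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Return y = f(x) + x, the skip-connection form of Eq. (1)."""
    fx = f(x)
    if len(fx) != len(x):
        # Identity-block case only: sizes must match for element-wise addition.
        raise ValueError("f(x) and x must have the same length")
    return [a + b for a, b in zip(fx, x)]


def toy_layers(x: Vector) -> Vector:
    """Hypothetical stand-in for the learned layers f (assumption)."""
    return [0.1 * value for value in x]


def zero_layers(x: Vector) -> Vector:
    """Layers that have learned f(x) = 0, so the block acts as an identity."""
    return [0.0 for _ in x]


y = residual_block([1.0, -2.0, 3.0], toy_layers)
y_identity = residual_block([1.0, -2.0, 3.0], zero_layers)
```

When f(x) is zero, `y_identity` equals the input, which is why a block that has nothing useful to learn can be skipped without degrading the signal.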

2.2 InceptionV3

The InceptionV3 [18] model, shown in Fig. 4, learns from filters of different sizes applied in parallel at multiple stages. The model uses both small and large filters to capture all vital information from the images, which is needed because the location of the information varies. Inception builds a wider network by using parallel filters of different sizes rather than a deeper network. The number of parameters is reduced by factorizing convolutions into smaller convolutions and asymmetric convolutions, which lowers the computational complexity and makes the algorithm more memory efficient. A small auxiliary CNN is attached to the middle layers during training, and the loss it computes is added to the main network loss, which reduces the vanishing gradient problem [19]. Figure 4 shows the architecture of InceptionV3 and the different blocks used to build this model.
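The parameter saving from factorized convolutions can be checked with simple arithmetic. The sketch below counts weights for a full convolution and for its factorized replacements; the channel count of 64 is an assumption chosen for illustration, and bias terms are ignored.

```python
def conv_params(kernel_h: int, kernel_w: int, c_in: int, c_out: int) -> int:
    """Number of weights in one convolution layer (bias terms ignored)."""
    return kernel_h * kernel_w * c_in * c_out


CHANNELS = 64  # illustrative assumption, not a value from the paper

# One 5x5 convolution versus two stacked 3x3 convolutions
# (both cover the same 5x5 receptive field).
full_5x5 = conv_params(5, 5, CHANNELS, CHANNELS)
two_3x3 = 2 * conv_params(3, 3, CHANNELS, CHANNELS)

# One 7x7 convolution versus an asymmetric 1x7 followed by a 7x1.
full_7x7 = conv_params(7, 7, CHANNELS, CHANNELS)
asymmetric_7 = (conv_params(1, 7, CHANNELS, CHANNELS)
                + conv_params(7, 1, CHANNELS, CHANNELS))

print(f"5x5: {full_5x5} weights, two 3x3: {two_3x3} weights")
print(f"7x7: {full_7x7} weights, 1x7 + 7x1: {asymmetric_7} weights")
```

With equal input and output channels the ratios do not depend on the channel count: two 3 × 3 layers use 18/25 of the weights of a 5 × 5 layer, and the 1 × 7 plus 7 × 1 pair uses 14/49 of the weights of a 7 × 7 layer.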

Fig. 4
A framework consisting of convolution layers, repeated Inception modules, global average pooling, and fully connected layers with regularization.

Architecture of InceptionV3

2.3 Hybrid Ensemble Model

Ensemble learning [20] provides better performance and reduces the dispersion, i.e. the variance, of the predictions compared with any single model. Ensemble learning has three types: bagging, boosting, and stacking. This paper uses the bagging approach, in which different models are trained on the same dataset and the ensemble model predicts the final class by averaging their validated results.

A deep learning procedure is used that combines predictions from different models, weighting each model's contribution in proportion to its effectiveness. This is commonly known as a weighted sum or weighted average ensemble. To achieve greater accuracy, the authors propose this hybrid weighted average approach to predict the result. The training set is used to fit the hybrid model in Fig. 5, and the validation set is used to evaluate it. The model weights are determined by the accuracy on the validation set.

$$W_{1} = \frac{{{\text{Accuracy}}_{1} }}{{{\text{Accuracy}}_{1} + {\text{Accuracy}}_{2} }}$$
(2)
$$W_{2} = \frac{{{\text{Accuracy}}_{2} }}{{{\text{Accuracy}}_{1} + {\text{Accuracy}}_{2} }}$$
(3)
$$Y = \frac{{\mathop \sum \nolimits_{i = 1}^{n = 2} {\text{Accuracy}}_{i} *W_{i} }}{{\mathop \sum \nolimits_{i = 1}^{n = 2} W_{i} }}.$$
(4)
Fig. 5
A diagram of the hybrid model proposed combines multiple machine learning models and assigns weights to them to improve its accuracy and robustness.

Architecture of proposed hybrid model using weighted average ensemble learning

Here, Accuracy1 is the accuracy of ResNet50 and Accuracy2 is the accuracy of InceptionV3. W1 in Eq. (2) and W2 in Eq. (3) are the calculated weights for ResNet50 and InceptionV3, respectively. Equation (4) describes the hybrid weighted approach used to predict the results.
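The weighting in Eqs. (2) and (3) can be sketched in a few lines of Python. Two assumptions are made here: the individual accuracies reported in this paper (89.65% and 90.23%) are used as stand-ins for the validation accuracies, and the weights are applied to per-class probabilities, which is a common way to realize a weighted average ensemble but is not spelled out in Eq. (4). The probability vectors are hypothetical.

```python
from typing import List, Tuple


def accuracy_weights(acc_1: float, acc_2: float) -> Tuple[float, float]:
    """Weights of Eqs. (2) and (3): each accuracy divided by their sum."""
    total = acc_1 + acc_2
    return acc_1 / total, acc_2 / total


def weighted_average(p_1: List[float], p_2: List[float],
                     w_1: float, w_2: float) -> List[float]:
    """Weighted average of two per-class probability vectors."""
    return [(w_1 * a + w_2 * b) / (w_1 + w_2) for a, b in zip(p_1, p_2)]


# Reported accuracies of ResNet50 and InceptionV3, used here for illustration.
w_resnet, w_inception = accuracy_weights(89.65, 90.23)

# Hypothetical softmax outputs for one image: [non-COVID, COVID].
probs_resnet = [0.30, 0.70]
probs_inception = [0.45, 0.55]

combined = weighted_average(probs_resnet, probs_inception,
                            w_resnet, w_inception)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

Because the two accuracies are close, the weights are both near 0.5, so in this setting the weighted average behaves almost like a plain average of the two models.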

3 Results and Discussions

3.1 Dataset

In the present paper, the proposed model is used to improve the accuracy of COVID and non-COVID detection from CT scan images. The SARS-CoV-2-CT-Scan-Dataset [17] from Kaggle is used for model training, validation, and testing. The experiment was performed on Google Colab using the Keras library. The dataset comprises 2481 CT scan images collected from 120 patients, including 1252 COVID images and 1229 non-COVID images. The dataset is divided into training and testing sets in an 80:20 ratio, and the training set is further split into training and validation sets in an 80:20 ratio.
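The nested 80:20 split can be illustrated by counting images. The sketch below assumes that fractional counts are rounded down for the larger part; an actual splitting utility may round differently by one image, and the paper does not state which rule was used.

```python
from typing import Tuple


def split_counts(n_items: int, first_percent: int) -> Tuple[int, int]:
    """Split n_items into two parts, rounding the first part down."""
    first = n_items * first_percent // 100
    return first, n_items - first


TOTAL_IMAGES = 2481  # 1252 COVID + 1229 non-COVID, as stated above

# First split: (training + validation) versus testing.
train_val_count, test_count = split_counts(TOTAL_IMAGES, 80)

# Second split: training versus validation.
train_count, val_count = split_counts(train_val_count, 80)

print(train_count, val_count, test_count)
```

Under this rounding rule the split yields 1587 training, 397 validation, and 497 testing images, so the model is fitted on roughly 64% of the dataset.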

3.2 Accuracy

The model is evaluated in terms of accuracy, sensitivity, specificity, precision, and F1-score, defined below. The comparison of the proposed hybrid model with the individual models is summarized in Table 1.

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{\left( {{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}} \right)}},$$
(5)
$${\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}},$$
(6)
$${\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}},$$
(7)
$${\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}},$$
(8)
$$F1\,{\text{Score}} = \frac{{\left( {2*{\text{TP}}} \right)}}{{2*{\text{TP}} + {\text{FN}} + {\text{FP}}}},$$
(9)
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Table 1 Comparison between InceptionV3, ResNet50, and the proposed hybrid model
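Equations (5) to (9) translate directly into code. The following sketch computes all five measures from the four confusion-matrix counts; the counts in the example call are hypothetical and are not results from this paper.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Evaluate Eqs. (5) to (9) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (5)
        "sensitivity": tp / (tp + fn),                 # Eq. (6)
        "specificity": tn / (tn + fp),                 # Eq. (7)
        "precision": tp / (tp + fp),                   # Eq. (8)
        "f1_score": (2 * tp) / (2 * tp + fn + fp),     # Eq. (9)
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity look at the two classes separately, so reporting them alongside accuracy shows whether errors are concentrated in missed COVID cases (FN) or in false alarms (FP).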

3.3 Discussion

Table 2 compares previous work in this field with the proposed model. InceptionV3 and ResNet50 are deep learning architectures that mitigate the vanishing gradient problem and help produce better accuracy than other CNN architectures. The hybrid model combines the strengths of both using the weighted average ensemble method, which yields higher accuracy and better predictions than either single model. The Decaps and Peekaboo model [6] achieved an accuracy of 87.6% on the dataset prepared by Zhao et al., consisting of 746 CT scan images, which is lower by 7.035% than the proposed methodology. Another method, DRE-Net [21], was evaluated on a small database of only 88 CT scan reports and reached an accuracy of 93%. It showed an unexpected result, with all the parameters having the same value. The proposed model combines ResNet50, InceptionV3, and ensemble learning.

Table 2 Comparison between the accuracy of models from different research works

The confusion matrix in Fig. 6 gives a visual idea of the accuracy of the proposed model.

Fig. 6
A confusion matrix of actual versus predicted classes. TP and TN are each approximately 1.6e+02, FP is 26, and FN is 8.

Confusion matrix for the proposed model

4 Conclusion

In this paper, a hybrid ensemble model using InceptionV3 and ResNet50 is proposed to distinguish COVID from non-COVID patients in CT scan images. The experiment was run on Google Colab with the Keras library on the SARS-CoV-2-CT-Scan-Dataset obtained from Kaggle. The proposed model uses bagging ensemble learning, which predicts the final accuracy by the weighted average of the individual model accuracies. It achieves an accuracy of 94.23%, which is higher than that of InceptionV3 and ResNet50, at 90.23% and 89.65%, respectively. This research can be extended to differentiate between different lung diseases.