1 Introduction

In December 2019, a transmissible pneumonia in humans was first identified in China. This atypical pneumonia causes a severe respiratory syndrome; it is caused by the SARS-CoV-2 coronavirus, and the resulting disease is called COVID-19. According to the WHO, 16 to 21% of infected individuals fell seriously ill and 2 to 3% died from the consequences of this highly contagious disease, whose transmissibility rate was around 3.77% [43]. To minimize the spread, it is therefore urgent to quickly identify individuals infected with the virus in order to quarantine them and start their treatment.

The diagnosis of COVID-19 is based on well-defined criteria such as clinical symptoms, epidemiological history, positive CT and X-ray images, and positive pathogen tests based on real-time RT-PCR and virus nucleic acid sequencing [1, 9, 12]. In many cases these last two tests have to be repeated several times before confirmation, which is a serious limitation on their practicality. To address this problem, we use chest X-ray images as a means to detect the virus. Indeed, the signature of the atypical pneumonia due to COVID-19 manifests in an image as ground-glass opacities in the first stage of the disease, progressing to pulmonary consolidation in the late stage [10, 25, 28]. Unfortunately, the features of the atypical pneumonia caused by COVID-19 can be confused with those of other typical inflammatory pneumonias, making it difficult for the radiologist to detect COVID-19. As a solution to this problem, researchers have used artificial intelligence and have already compiled a list of features extracted from images in order to detect the presence of the COVID-19 virus [22]. In the literature [4, 7], radiologists have clinically characterized several features of viral pathogens in chest radiographic images.

In fact, researchers noted that the presence of an uneven distribution of bilateral shadows and ground-glass opacity is a strong indication of COVID-19 infection. Therefore, we propose to train a convolutional neural network (CNN) on these characteristics extracted from radiographic images to accurately predict COVID-19 infection.

In this work, we build on our research in the field of artificial intelligence in the service of health [2, 3, 13, 15,16,17,18,19,20, 29,30,31,32, 34, 44]. We propose a new approach for the early detection of COVID-19 that can be made available to facilities in rural and remote areas equipped with a simple chest X-ray machine. To achieve this goal, we propose a transfer learning-based model to shorten the learning step and speed up the whole training process. Transfer learning offers a promising alternative: it refines a CNN already pre-trained on a large dataset such as ImageNet by reusing its weights, which speeds up convergence during the training stage. Transfer learning is a very useful technique and has obtained significant results in computer vision and other fields of application.

Moreover, our proposed model uses seven CNNs pre-trained on the ImageNet database. The use of these pre-trained models avoids the long process of learning models from scratch. We have chosen seven different models pre-trained on the ImageNet database: InceptionResNetV2 [6], DenseNet121 [24], MobileNet [21], InceptionV3 [40], Xception [8], and VGG16 and VGG19 [23]. Although these models have already been trained on the ImageNet dataset, we also train them on the chest X-ray image base that we collected. The architecture of the proposed model begins with a pre-processing of the input images followed by data augmentation. Then, the model performs feature extraction followed by the learning step. Finally, the model carries out classification and prediction with a fully connected network made up of several classifiers. To show the location of the important regions of the image that contributed most strongly to the prediction of the considered class, we use the Gradient Weighted Class Activation Mapping (Grad-CAM) approach [37]. A comparative study was carried out to show the robustness of our model's prediction compared to the visual prediction of radiologists.

The rest of the paper is organized as follows. Section 2 presents related work and our materials and methods; the development methodology of our classification model is composed of four stages, detailed in Sections 3 to 6: collecting the images to form the database, pre-processing and augmenting the collected data, building the transfer learning model, and presenting the pre-trained networks of our model. Section 7 defines the performance metrics and Section 8 presents our simulation results. Section 9 presents the regions of interest that contributed most to the classification, and Section 10 compares our predictions with those of a renowned radiologist. We end our article with a discussion followed by a general conclusion.

2 Related work and materials and methods

Recently, scientists have addressed COVID-19 classification with numerous methods and different datasets. Furthermore, many attempts adapt ready-made networks, with modifications in some cases, to enhance classification performance. Myriad works on COVID-19 classification are available, and we briefly discuss the most relevant ones. The American College of Radiology (ACR) advised against the use of CT and X-ray as a first-line diagnostic or screening tool for COVID-19 (the American College of Radiology 2020, March 22), indicating that the images could only show signs of infection that may be due to other factors. However, there have been plenty of studies where artificial intelligence was applied to test for COVID-19 based on chest X-ray images [23, 26, 27, 33, 35, 38, 42]. In [33], Narin et al. experimented with three different CNN models (ResNet50, InceptionV3, and Inception-ResNetV2), and ResNet50 achieved the best accuracy of 98% for 2-class classification. Since they did not include pneumonia cases in their experiment, it is unknown how well their model would distinguish between COVID-19 and other pneumonia cases. Hemdan et al. [23] used deep learning models on X-ray images to diagnose COVID-19 and suggested a COVIDX-Net model consisting of seven CNN models. Wang and Wong [42] presented a deep residual architecture called COVID-Net, one of the early works on COVID-19, which uses a deep neural network to classify chest X-ray images into three categories (COVID-19, Healthy, Non-COVID-19). In this study, we propose a Deep Transfer Learning (DTL) model for recognizing COVID-19 from chest X-ray images. The latter are less expensive and easily accessible to populations in rural and remote areas. In addition, the device for acquiring these images is easy to disinfect, clean, and maintain. We design a global transfer model composed of seven pre-trained convolutional neural networks (CNNs): DenseNet121, VGG16, VGG19, InceptionResNetV2, Xception, MobileNet, and InceptionV3. The proposed model addresses a three-class classification problem: COVID-19 class, pneumonia class, and normal class. To show the location of the important regions of the image that contributed most strongly to the prediction of the considered class, we use the Gradient Weighted Class Activation Mapping (Grad-CAM) approach. A comparative study was carried out to show the robustness of our model's prediction compared to the visual prediction of radiologists.

Our experiments were based on a chest X-ray image dataset collected from the COVID-19 radiographic image database developed by Cohen J.P. [11] and from the ChestX-ray14 database provided by Wang et al. [41].

CNN models were developed using TensorFlow with the Keras wrapper library in Python. The experiments were performed on a Lenovo ThinkPad P51 with an Intel® Core™ i7-7820HQ at 2.90 GHz, an NVIDIA Quadro M2200 GPU with 8 GB of memory, and 8 GB of RAM. The Adam optimization algorithm was used to train the CNN models, with cross-entropy as the loss function. The learning rate starts at 0.001 and is reduced after 4 epochs if the loss value does not improve, using a callback function. The models were configured to train for 60 epochs.
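A minimal Keras sketch of this training configuration is given below. The `model`, `train_data`, and `val_data` variables are hypothetical placeholders, and the learning-rate reduction factor is an illustrative assumption; only the optimizer, loss, initial learning rate, patience, and epoch count come from the text.

```python
import tensorflow as tf
from tensorflow.keras.callbacks import ReduceLROnPlateau

# model: a Keras model built as in the architecture sections below (placeholder)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # initial LR = 0.001
    loss="categorical_crossentropy",                          # cross-entropy loss
    metrics=["accuracy"],
)

# Reduce the learning rate when the loss has not improved for 4 epochs
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=4)

history = model.fit(
    train_data,                # hypothetical training generator/dataset
    validation_data=val_data,  # hypothetical validation data
    epochs=60,
    callbacks=[reduce_lr],
)
```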

The proposed methodology includes four steps. The first step deals with collecting the images to form the database. The second step deals with pre-processing and augmentation of the collected data using rotation range, width shift range, height shift range, zoom range, horizontal flip, and vertical flip; a sketch of this augmentation is given below. After these transformations, it was possible to balance the dataset across the COVID and non-COVID classes on the training and testing sets. This data augmentation happens at run time, when chest X-ray images are presented as input to the model. The third step deals with presenting our proposed convolutional neural networks (CNNs) and the use of Deep Transfer Learning (DTL). The fourth step deals with the pre-trained CNNs. We present these steps in the following paragraphs.
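A minimal sketch of the run-time augmentation with Keras' ImageDataGenerator; the numeric ranges and directory layout are illustrative assumptions, not the paper's values.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Run-time augmentation with the transformations listed above.
# The numeric ranges below are assumptions for illustration.
augmenter = ImageDataGenerator(
    rotation_range=15,       # rotation range
    width_shift_range=0.1,   # width shift range
    height_shift_range=0.1,  # height shift range
    zoom_range=0.1,          # zoom range
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1.0 / 255,       # pixel normalization (see Section 4)
)

train_flow = augmenter.flow_from_directory(
    "data/train",            # hypothetical layout: one sub-folder per class
    target_size=(224, 224),
    class_mode="categorical",
)
```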

3 Dataset

In this work, radiographic images were obtained from two different sources and collected to form the lung radiographic image database. The first source is the COVID-19 radiographic image database developed by Cohen J.P. [11]. This database contains images from various open-access sources and is constantly updated with images shared by researchers from different regions. At the time of writing, 230 radiographic images in the database had been diagnosed with COVID-19. We randomly selected 146 images of cases declared positive for COVID-19. The second source is the ChestX-ray14 database provided by Wang et al. [41]. This database contains 5863 JPEG radiographic pneumonia (viral pneumonia / bacterial pneumonia) images. We randomly selected 210 images of cases with pneumonia and 200 images of normal cases. Table 1 below summarizes the prepared dataset, and Fig. 1 shows some sample chest X-ray images from it. In total, we collected 556 radiological images classified into 3 classes: a COVID-19 class, a Pneumonia class, and a Normal class. In the experimental analysis, 70% of the dataset was used for training, 10% for validation, and 20% as the test set.

Table 1 Dataset summary
Fig. 1

X-ray images of the three classes: a COVID-19, b Normal, c Pneumonia

4 Data pre-processing and augmentation

Not all of the data available on the internet has been subject to the same pre-processing. In most of our positive COVID-19 data, the X-ray occupies most of the frame, with little or no black bars on the sides. This becomes an issue, as our models might later learn that it is enough to look at the black bars on the sides to decide whether the sample is a COVID-19, normal, or pneumonia case. After manual inspection of the dataset, it became apparent that 10 to 20 COVID-19 images, 90 to 95 normal images, and 90 to 95 pneumonia images have black bars. To solve this issue, we created a script that removes these black pixels from the samples; an example of this process is illustrated in Fig. 2, and a sketch of the script is given below. To combine images from different datasets for training, we first applied pixel scaling techniques (pixel normalization, pixel centering, pixel standardization) so that the different datasets have the same distribution. To overcome the unbalanced data problem, we used a resampling technique called random under-sampling, which randomly deletes examples from the majority class until the dataset becomes balanced. We then resized all the images to 224 × 224 pixels with a resolution of 72 dpi.

Fig. 2

Elimination of black bars on the sides of the image. a Original image, b Pre-processed image
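A minimal sketch of such a cropping script, assuming OpenCV/NumPy and a simple intensity threshold; the paper does not specify the exact method, so the threshold-based detection below is an assumption.

```python
import numpy as np
import cv2

def remove_black_bars(img: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Crop away near-black borders from a grayscale chest X-ray.

    The threshold is an illustrative assumption: rows/columns whose
    intensities all stay below it are treated as black bars.
    """
    mask = img > threshold
    rows = np.where(mask.any(axis=1))[0]  # rows containing lung content
    cols = np.where(mask.any(axis=0))[0]  # columns containing lung content
    cropped = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Resize to the common input size used throughout the paper
    return cv2.resize(cropped, (224, 224))
```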

5 Convolutional neural networks using deep transfer learning

In a traditional CNN, the input image usually goes through a sequence of convolution and pooling layers that squeeze the width and height dimensions while increasing the feature depth. The purpose of these stacked convolution and pooling layers is to learn a representation of the image, i.e. its features. The learned features are then the input to the fully connected layers for classification. In many previous architectures, researchers tried to stack more and more convolution and pooling layers together, hence the term Deep Learning, on the assumption that deeper networks learn better representations of the object of interest. In practice, however, this design suffers from the vanishing gradient issue, and performance degrades as the network gets deeper. Moreover, training a CNN from scratch is generally difficult, as this process requires large training data as well as significant expertise to select an appropriate model architecture for proper convergence.

As a solution to this issue, we use a deep transfer learning model. The latter has been used for different types of applications [14, 36]. This model offers a promising alternative: it refines a CNN already pre-trained on a large dataset such as ImageNet by reusing its weights, which helps speed up convergence during training. Transfer learning is a very useful technique and has achieved significant results in computer vision and image processing.

The proposed model is composed of two modes: a pre-trained front-end mode that transforms the input images into descriptor vectors, and a second mode consisting of several strongly connected classifiers, where each classifier produces its own prediction at its output. The prediction that obtains the maximum score is the one retained as the global system's output. More precisely, the proposed model consists of three main processes, as shown in Fig. 3. The first process is image pre-processing followed by data augmentation. The pre-processing consists of eliminating everything outside the zones delimiting the two lungs, while keeping an image of 224 × 224 pixels. The number of images is then increased through horizontal flipping, random cropping, and varying the intensity of each image.

Fig. 3

Proposed model process

The second process is feature extraction and learning. At this step, the process uses seven CNNs pre-trained on the ImageNet database. The use of these pre-trained models avoids the long process of learning models from scratch. Although these models have already been trained on the ImageNet dataset, we also train them on the chest X-ray image base that we collected.

The third process is classification and prediction through a fully connected network made up of several classifiers. At the output of the network, a prediction vector is obtained; the final prediction retained by the global system is the one that obtained the maximum score.

6 Pre-trained convolutional neural networks

The main advantage of the proposed model is avoiding the long process of learning models from scratch, by reusing the layer weights of models already pre-trained on the ImageNet database. In a transfer model, these predefined weights are assigned to the first layers, just after the model's input, while the weights of the last layers, at the model's output, are trained on the images of the new domain. In the following sections, we present the different architectures, each based on a convolutional neural network with a different depth.

6.1 VGG16 architecture

In this section, we use a model with trainable and frozen layers based on the convolutional neural network model proposed by K. Simonyan [39]. We used the layers from CONV1-1 up to CONV4-3 as pre-trained layers with weights that are already fixed. The remaining layers were trained on our dataset of radiographic lung images, as shown in Fig. 4; a sketch of this freezing scheme is given after the figure.

Fig. 4

VGG16 Model with trainable and frozen layers
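A minimal Keras sketch of this scheme, freezing everything up to and including block4_conv3 (the Keras layer name for CONV4-3); the classification head shown is an illustrative assumption.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained layers from block1_conv1 up to block4_conv3 inclusive
for layer in base.layers:
    layer.trainable = False
    if layer.name == "block4_conv3":
        break  # layers after this point remain trainable

# Hypothetical classification head for the 3-class problem
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(3, activation="softmax")(x)
model = models.Model(base.input, outputs)
```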

6.2 VGG19 architecture

The VGG19 model is a convolutional neural network based on the VGG16 architecture; the number 19 stands for the number of layers with trainable weights: 16 convolutional layers and 3 fully connected layers. In the proposed model, we used the layers from block1_conv1 to block5_conv1 as pre-trained layers with weights that are already fixed. The remaining layers were trained on our dataset.

6.3 MobileNet architecture

The MobileNet architecture was proposed by Google. In our proposed model, we used layers 1 to 75 as pre-trained layers with weights that are already fixed. As shown in Fig. 5, the remaining layers were trained on our dataset of radiographic lung images; a sketch of this index-based freezing is given after the figure.

Fig. 5

MobileNet model with trainable and frozen Layers
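Here the split is by layer index rather than by name; a minimal sketch:

```python
from tensorflow.keras.applications import MobileNet

base = MobileNet(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the first 75 layers; the rest remain trainable on the X-ray dataset
for layer in base.layers[:75]:
    layer.trainable = False
```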

6.4 InceptionV3 architecture

The InceptionV3 architecture is the third version introduced by Google to improve the Inception convolutional neural network; it is composed of 48 layers. In our proposed model, we used the layers from the first up to the mixed_10b layer as pre-trained layers with weights that are already fixed. The remaining layers were trained on our dataset of radiographic lung images, as shown in Fig. 6.

Fig. 6

InceptionV3 architecture with trainable and frozen Layers

6.5 Xception architecture

The Xception architecture is a convolutional neural network 71 layers deep, proposed by Francois Chollet [8]. In our proposed model, we used all the layers for training on our dataset of radiographic lung images, using the pre-trained ImageNet weights as a starting point, as shown in Fig. 7.

Fig. 7

Xception architecture with trainable Layers

6.6 InceptionResNetV2 architecture

The InceptionResNetV2 architecture was obtained following modifications made by researchers to the third Inception CNN version. The researchers drew on residual connections from Microsoft's ResNet network to offer a deeper, simpler, and more meaningful version of the Inception architecture. In our proposed model, we used the layers from the first up to the conv2D-58 layer as pre-trained layers with weights that are already fixed. The remaining layers were trained on our dataset of radiographic lung images, as shown in Fig. 8.

Fig. 8

Inception-ResNetV2 architecture with trainable and frozen Layers

6.7 DenseNet121 architecture

Finally, in the DenseNet121 architecture, each layer takes as input the outputs of all the layers that precede it, making the network architecture very dense and allowing deep supervision. In our proposed model, we used the layers from the first up to the conv3-block10-2-conv layer as pre-trained layers with weights that are already fixed. The remaining layers were trained on our dataset of radiographic lung images, as shown in Fig. 9.

Fig. 9

DenseNet121 architecture with trainable and frozen Layers

6.8 Stacking model

In this section, we present the new stacking model that we developed, in which the neural network sub-models are integrated into an overall stacking model. This model allowed us to find the best way to combine the predictions of several existing, already pre-trained models. We developed a stacking model using the 7 neural networks as sub-models and a scikit-learn classifier as a meta-learner. The basic idea of this approach is to consider the prediction of each network by assigning it a score. Once the seven networks have finished their predictions, the score obtained by each prediction is counted, and only the prediction with the highest score is selected as the output of the model, as shown in Fig. 10; a sketch of this voting scheme is given after the figure.

Fig. 10

Stacking model using seven different pre-trained architecture and majority voting
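A minimal NumPy sketch of the majority-voting step, assuming the seven fitted Keras sub-models are available in a list; the scikit-learn meta-learner variant would instead fit a classifier on the stacked prediction vectors.

```python
import numpy as np

def stacking_predict(models, x):
    """Hard majority vote over the class predictions of the seven sub-models.

    models: list of fitted Keras models sharing the same 224x224x3 input.
    x:      batch of pre-processed images, shape (n_samples, 224, 224, 3).
    """
    # Each model votes with its predicted label (0: COVID-19, 1: normal, 2: pneumonia)
    votes = np.stack([np.argmax(m.predict(x), axis=1) for m in models])  # (7, n)
    # For every sample, keep the class that received the most votes
    return np.array([np.bincount(votes[:, i], minlength=3).argmax()
                     for i in range(votes.shape[1])])
```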

7 Performance metrics evaluation

The following metrics were used to validate the CNN system:

  • Accuracy (ACC): rate of correct classifications over the total number of elements.

  • Precision (P): proportion of positive predictions that are true positives.

  • Recall/Sensitivity (SE): true positive rate.

  • Specificity (SP): true negative rate.

  • F1-Score: harmonic mean of precision and recall/sensitivity.

    These metrics are commonly used to assess the performance of classification algorithms. A standard, more visual way to display the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) is the confusion matrix.

    The confusion matrix allows us to determine the following metrics:

    $$ \begin{aligned} \text{Accuracy: } ACC &= \frac{\text{No. of images correctly classified}}{\text{Total no. of images}}\\[4pt] \text{Precision: } P &= \frac{TP}{TP+FP}\\[4pt] \text{Recall/Sensitivity: } SE &= \frac{TP}{TP+FN}\\[4pt] \text{Specificity: } SP &= \frac{TN}{TN+FP}\\[4pt] \text{F1-Score: } F1 &= \frac{2\times \left(P\times SE\right)}{P+SE} \end{aligned} $$
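These metrics can be computed directly from the confusion matrix; a minimal sketch using scikit-learn, with hypothetical label arrays:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical ground-truth and predicted labels (0: COVID-19, 1: normal, 2: pneumonia)
y_true = [0, 0, 1, 1, 2, 2, 0, 2]
y_pred = [0, 0, 1, 2, 2, 2, 0, 2]

print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall/sensitivity and F1-score, plus overall accuracy
print(classification_report(y_true, y_pred, digits=4))
```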

8 Simulation results

We used the Keras library with Python and TensorFlow to develop our CNN architectures. We used a 5-fold cross-validation approach to assess the performance of our main 3-class model: the training set was randomly divided into 5 equal sets, four of which were used to train the CNN model while the remaining set was used for validation. This strategy was repeated five times by shifting the validation and training sets, and the final performance of the model was reported by averaging the values obtained from each fold; a sketch of this procedure is given below.
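A minimal sketch of the 5-fold procedure with scikit-learn; `X`, `y`, and the `build_model()` factory are hypothetical placeholders, and the random seed is an assumption.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.utils import to_categorical

# X: array of pre-processed images, y: integer labels (0, 1, 2)
# build_model() is a hypothetical factory returning a compiled Keras model
scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, y):
    model = build_model()
    model.fit(X[train_idx], to_categorical(y[train_idx], 3), epochs=60, verbose=0)
    _, acc = model.evaluate(X[val_idx], to_categorical(y[val_idx], 3), verbose=0)
    scores.append(acc)

print("Mean 5-fold validation accuracy:", np.mean(scores))
```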

We also used the Adam algorithm to optimize our CNN architectures, and we defined the loss function through cross-entropy. The learning rate starts at 0.001 and is reduced after 4 epochs if the loss value does not improve, using a callback function. The models were configured to train for 60 epochs. We split our dataset into a training set and a test set using the stratify parameter, to preserve the class proportions of the original dataset for better prediction and reproducibility of results; a sketch of this split is shown below. The per-class prediction results obtained by our model during the validation of the learning phase are presented in Table 2. We found that the stacking model considerably improved the results obtained by each model taken individually.
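The stratified split, as a minimal scikit-learn sketch; the random seed is an assumption.

```python
from sklearn.model_selection import train_test_split

# 20% held out for testing; stratify=y keeps the COVID-19/normal/pneumonia
# proportions identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```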

Table 2 The prediction results for each class using the model set which combines the seven models (0: COVID-19, 1: normal, 2: pneumonia)

As shown in Table 3, VGG16 and VGG19 achieved the best test accuracies of 96.88% and 95.31%, respectively. The result for each neural network model is shown in Table 3. After analyzing the predictions of all the models, we combined the results of the 7 models: VGG16, VGG19, InceptionV3, Xception, DenseNet121, InceptionResNetV2, and MobileNet. We combine the predicted class of each model in a vector and take the class that was most frequently predicted across all models. Using this stacking model, our final classifier achieved the best performance, with a test accuracy of 98%, an F1-score of 98.33%, a precision of 98.66%, and a sensitivity of 98.33%, as presented in Table 3.

Table 3 Comparative results for each model on the test set

For simulation results and performance analysis, Fig. 11 presents plots of sensitivity against specificity for each model and for the stacking model on the test set. We notice that all models achieve good performance.

Fig. 11

Receiver Operating Characteristic (ROC) curves of our proposed models: a DenseNet121, b InceptionResNetV2, c InceptionV3, d MobileNet, e VGG16, f VGG19, g Xception, h Stacking model

Figure 12 displays the plots of accuracy and cross-entropy versus epoch for all trained models: plot (a) presents accuracy versus epoch and plot (b) presents cross-entropy versus epoch. Table 4 presents a performance comparison of existing deep learning methods and our proposed stacking model on the 3-class classification task.

Fig. 12

Accuracy and cross-entropy versus epoch for all trained models (a) Accuracy versus epoch, (b) cross-entropy versus epoch

Table 4 Comparison of the proposed Stacking with other existing deep learning methods

9 Localization of the pulmonary lesions

To locate the lesions in chest X-ray images caused by the disease, we highlight the points of interest that contributed strongly to the classification of our model. To do this, we first perform a linear combination of the gradient maps of the layers of the model. The region of interest of dimension n × m for a class i is obtained by first computing the gradient of the probability of presence P^i of the class with respect to the feature maps B^k of a given layer C_k of the model. The global average of these gradients fixes the weighting of the neurons relative to class i:

$$ {\beta}_k^i=\frac{1}{Z}\sum \limits_{r=1}^m\sum \limits_{s=1}^n\frac{\partial {P}^i}{\partial {B}_{rs}^k} $$
(1)

Then we apply the ReLU function:

$$ {L}^i= ReLU\left(\sum \limits_k{\beta}_k^i{B}^k\right) $$
(2)
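A minimal Keras/TensorFlow sketch of these two equations, assuming a fitted model and the name of its last convolutional layer:

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, layer_name, class_idx):
    """Grad-CAM heat map for one image.

    layer_name is the convolutional layer C_k whose feature maps B^k are used;
    class_idx selects the class probability P^i (0: COVID-19, 1: normal, 2: pneumonia).
    """
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        fmaps, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, class_idx]                       # P^i
    grads = tape.gradient(score, fmaps)                   # dP^i / dB^k_rs
    beta = tf.reduce_mean(grads, axis=(0, 1, 2))          # Eq. (1): global average
    cam = tf.nn.relu(tf.reduce_sum(beta * fmaps[0], -1))  # Eq. (2): weighted sum + ReLU
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()    # normalized heat map
```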

The regions of interest obtained by our algorithm for the three classes (Normal, Pneumonia, and COVID-19) are shown in Fig. 13. Pixels with a strong gradient, which contributed heavily to the classification of our model, are shown in yellow; pixels with a weak gradient, which did not contribute to the classification, are shown in blue. Note that in the images of patients declared positive for COVID-19, our algorithm focused on the region showing the ground-glass opacity, which is a clinical pathological sign of COVID-19. For the Pneumonia class images, our algorithm highlighted the pulmonary inflammation characterizing typical pneumonia. For the images of the Normal class, no region showed a strong gradient variation and consequently no region was detected by our algorithm.

In addition, we gave six test images to our partner team from the radiology department of the Fez University Hospital, made up of renowned radiologists. They circled with a red marker the lesions of the pathology present in each image, as shown in the top row of Fig. 13. We then compared their localization with that indicated by our algorithm, shown in the bottom row of Fig. 13. The comparison showed a perfect agreement between the localization of the radiologists and that of our algorithm.

Fig. 13

Visual results obtained by Grad-CAM on the different disease classes: a COVID-19 case, b Normal case, c Pneumonia case. Top to bottom: initial image and its Grad-CAM transform

10 Comparison between radiologist and machine prediction

To compare the performance of our proposed model with that of radiologists, we chose a sample composed of 10 images of the COVID-19 type, 10 of the Normal type, and 10 of the Pneumonia type. We then asked renowned radiologists to visually reclassify these 30 images. As shown in Table 5, radiologist 1 could not exceed an accuracy of 63.64%, with a sensitivity of 70% and an F1-score of 66.67%. Similarly, radiologist 2 could not exceed an accuracy of 37.5%, with a sensitivity of 30% and an F1-score of 34%.

Table 5 The proposed model performances compared to prognostic radiologists

11 Discussion

Given the limited availability of PCR tests as well as their exorbitant price, it is useful to provide healthcare personnel with a method based on artificial intelligence to predict COVID-19 quickly and precisely. In this article, we proposed an intelligent clinical decision support system for the early detection of COVID-19 from chest X-rays, which are cheaper, easily accessible to rural populations, and whose acquisition device is easily disinfected, cleaned, and maintained. This model uses deep learning through seven convolutional neural networks pre-trained on the ImageNet database. Each network predicts a class for the input image. The advantage of our model is that it assigns a score to each class prediction and outputs only the prediction that obtains the highest score. In this way, we obtained an overall precision of 98.66%, whereas the other models taken individually give a lower precision. In addition, our model achieved better performance, with a test accuracy of 98%, an F1-score of 98.33%, a precision of 98.66%, and a sensitivity of 98.33%, whereas the prediction of renowned radiologists could not exceed an accuracy of 63.64%, with a sensitivity of 70% and an F1-score of 66.67%. With these results, we can confirm that the proposed model can be used to detect COVID-19 precisely and quickly, especially since it succeeds in discriminating between typical pneumonia and that of COVID-19. The comparison of our model's prediction with that of the radiologists showed much higher precision and sensitivity.

The results obtained by our proposed model are superior to those of other studies in the literature. Table 4 presents a summary of studies conducted on the automated diagnosis of COVID-19 from chest X-ray images and compares them with our proposed stacking model.

These results demonstrate that an accurate diagnosis can be made at reduced cost with deep learning, especially in isolated rural areas where CT and polymerase chain reaction (PCR) tests are scarce. Among the limitations of this model is the limited availability of COVID-19 image data, which prevents complete learning and thereby the achievement of higher precision. Indeed, training deep neural networks with limited data leads to overfitting and prevents good generalization. To remedy this issue, researchers should look into augmenting the image data to further improve accuracy and avoid overfitting.

12 Conclusion

The objective of this research was to propose an intelligent clinical decision support system for the early detection of COVID-19 from chest X-rays. Given the lack of PCR and CT tests in rural and isolated regions of underdeveloped countries, we developed in this paper a system for the early detection of COVID-19 from chest X-ray images, which are accessible and inexpensive in these regions and whose acquisition equipment is easily disinfected, easily accessed, and easily maintained. For this, we used Deep Transfer Learning models to classify three classes, COVID-19, Pneumonia, and Normal, using the transfer learning concept. We used pre-trained architectures such as DenseNet121, VGG16, VGG19, InceptionResNetV2, Xception, MobileNet, and InceptionV3. We were able to determine characteristic features from chest X-ray images, and we took advantage of the seven models to build a stacking model that outperformed all the other models. The proposed model achieved the best performance, with a test accuracy of 98%, an F1-score of 98.33%, a precision of 98.66%, and a sensitivity of 98.33%, whereas the prediction of renowned radiologists could not exceed an accuracy of 63.64%, with a sensitivity of 70% and an F1-score of 66.67%. We observed that the performance could be improved further by increasing the dataset size and using data augmentation in the future. In addition, we used the Gradient Class Activation Map (Grad-CAM) concept to visualize, as a heat map, the areas that are most responsible for the final prediction. Our study supports the notion that Deep Transfer Learning (DTL) methods can be used to simplify the diagnostic process and improve disease management. It can act as assistive intelligence for medical practitioners and potentially help to reduce the burden on the healthcare system. The initial results show promise, but with an increase in the number of COVID-19 X-ray images and more diverse data, we will be able to make our models more accurate and generalized.