1 Introduction

In late 2019, a virus related to the severe acute respiratory syndrome (SARS) coronavirus was identified in Wuhan city, China. The virus is known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the infection it causes is called COVID-19. It is closely related to SARS and was initially identified by symptoms of fever with respiratory problems. The virus is highly infectious and transmits easily between humans [21]. After the first case was identified in Wuhan at the end of 2019, it spread rapidly to the rest of the world, from Europe to the United States. The rising mortality rate of COVID-19 has threatened the whole world [15]. The virus uses its exterior spike structure to bind easily to receptors [17]. The presence of COVID-19 in infected persons cannot be identified from the primary clinical symptoms alone [20]. Because of the fast spreading rate, a diagnostic procedure for COVID-19 is essential [22].

Healthcare professionals currently use the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test to detect COVID-19. The test requires respiratory specimens to be collected, and a major difficulty lies in the place where the specimens are received. The procedure is also prone to expert error [11] and is time-consuming. During testing and report generation, patients have to be isolated, often in unsuitable conditions, for hours or days until the test reports are received. Moreover, these tests have shown a poor detection rate in the range of 30% to 50%, so the test has to be repeated several times to confirm the result [6]. In this situation, the crucial requirement in tackling COVID-19 is the rapid treatment of patients. Because the pandemic spreads quickly, hospital admissions rise drastically, which creates the need for quicker diagnosis models. An alternative is to use radiological images such as Chest X-ray (CXR) or Computed Tomography (CT) images to diagnose SARS-CoV-2 infections.

Machine Learning (ML) approaches are now widely employed to diagnose diseases in an automated way. Deep learning (DL) is a well-known research area in artificial intelligence (AI) that allows dedicated models to be designed that attain better outcomes without manual feature extraction. It is commonly applied to problems such as breast cancer diagnosis, diabetic retinopathy, brain tumor identification, and pneumonia detection from CXR images. The COVID-19 pandemic has rapidly increased the need for such expertise and has raised interest in designing automatic AI-based detection models. Simple, precise, and rapid AI methods are useful for resolving the issue and for supporting patients at the appropriate time.

Owing to advancements in the healthcare sector, the number of predictive and diagnostic models has increased rapidly. Although effective feature extraction methods are required to achieve good performance, DL models are commonly employed in the healthcare sector for automated deep feature extraction [10]. Earlier research showed that COVID-19 abnormalities can be recognized in CXR and CT images. Though radiological image-based diagnosis is faster and attains a higher detection rate in the early stages of a pandemic, it requires medical experts to interpret the images. AI-based diagnosis can give the experts a fast and precise reading of the X-ray images when detecting COVID-19 [1]. With this in mind, several works have addressed AI- and DL-based diagnosis of COVID-19 using radiological images [2,3,4].

The convolutional neural network (CNN) is an effective DL model for medical image classification. Since the outbreak of COVID-19, much research has applied DL algorithms, especially CNNs, to the related data. Transfer learning, in turn, is a cornerstone of computer vision: with transfer learning, many image categorization tasks can achieve effective performance on datasets of limited size. Given the urgent need for solutions to cope with the COVID-19 pandemic and the recent efforts among researchers to design AI-based solutions, this study introduces a new multi-modal fusion of deep transfer learning (MMF-DTL) model for COVID-19 diagnosis on CXRs. The presented MMF-DTL model involves a pre-processing step that improves image quality by contrast enhancement, artefact removal, and image sharpening. Then, the MMF-DTL model fuses three DL models, namely VGG-16, ResNet 50, and Inception v3. Finally, a softmax classifier is used to classify the input images into six classes, namely acute respiratory distress syndrome (ARDS), COVID-19, No Finding, Pneumocystis, SARS, and Streptococcus. The proposed MMF-DTL model is provided with a user interface to simplify the diagnosis process and assist physicians, healthcare professionals, and researchers. A detailed experimental validation is carried out on the CXR dataset. The contributions of the paper are summarized as follows:

  • Preprocess the CXR images by contrast enhancement, artefact removal, and image sharpening.

  • Present a new MMF-DTL model for COVID-19 classification by fusing three DL approaches, namely VGG-16, ResNet 50, and Inception v3.

  • Classify the CXR images into six classes, namely ARDS, COVID-19, No Finding, Pneumocystis, SARS, and Streptococcus.

  • Develop a simple user interface for the MMF-DTL model to simplify the diagnosis process and help control the present pandemic.

2 Literature survey

DL models have gained significant attention among researchers in the computer vision community. DL models comprise multiple layers, which distinguishes them from conventional artificial neural networks. Xu et al. [1] devised a DL-enabled COVID-19 detection model to categorize pulmonary CT images. The authors used a 3D convolutional neural network (CNN) to segment candidate infection regions, and the regions were classified with corresponding confidence scores using a CNN with a local attention mechanism. Finally, the infection type and the total confidence score of each CT case were determined using a Noisy-or Bayesian function. This model achieved a maximum detection accuracy of 86.7%. Ali Narin et al. [2] examined three CNN models for the identification of COVID-19 and normal cases in CXR radiographs. The authors designed an end-to-end structure with automated feature extraction and selection. The experimental results indicated that the pre-trained ResNet-50 model outperformed the other two models with a maximum accuracy of 98%, whereas the InceptionV3 and Inception-ResNetV2 models reached accuracy values of 97% and 87%, respectively.

Ayyar et al. [4] presented a hierarchical classifier model for COVID-19 using CXR images. It comprises several binary classification models, and the authors showed that DL models combined with a global attention mechanism outperformed the baseline COVID-Net model. Ioannis et al. [3] performed an experimental validation of existing CNN models for the classification of COVID-19. In particular, the authors used transfer learning to identify different abnormalities in image datasets and reported significant results. The approach reached accuracy values of 96.78% and 94.72% on the first and second datasets, respectively. Ozturk et al. [13] presented a machine learning (ML) model for the identification of viral epidemics using X-ray images. First, features are extracted from the images using four feature extraction techniques. Then, the SMOTE algorithm is applied to address the class imbalance problem. Afterward, feature reduction is carried out using a stacked autoencoder (SA) and principal component analysis (PCA). Lastly, a support vector machine (SVM) model is used for the classification task. This model classified six classes with an accuracy of 94.23%. Nguyen et al. [12] investigated the severity of the problem using DL models for COVID-19 detection and classification. Rasheed et al. [16] presented a new ML-based COVID-19 detection model using CNN and logistic regression (LR) models. The authors in [14] performed a study on the detection of COVID-19 using ML models. Sharmila [9] presented a CNN combined with a deep convolutional generative adversarial network (DCGAN) that categorizes CXR images into different classes. Though several DL-based COVID-19 diagnosis models have been presented, there is still a need to improve the detection rate and thereby increase the survival rate of infected persons.

3 The proposed MMF-DTL model

In this study, a novel MMF-DTL technique has been developed to detect and classify COVID-19. The presented MMF-DTL model incorporates several processes such as preprocessing, feature extraction, fusion process, and classification. The working principle of the MMF-DTL technique is demonstrated in Fig. 1.

Fig. 1 Overall process of the MMF-DTL model

3.1 Preprocessing

The MMF-DTL model undergoes preprocessing in three stages, namely contrast enhancement, artefacts removal, and image sharpening.

3.1.1 Contrast enhancement

Contrast limited adaptive histogram equalization (CLAHE) has proved highly useful for enhancing the intensity of medical images. Its main aim is to allow the observer to examine the disease in an image easily [23]. It examines the histogram of intensities in a contextual region centered at every pixel and sets the displayed pixel intensity to the rank of that pixel's intensity in the corresponding histogram. This histogram is an extended form of the ordinary histogram in which the contrast enhancement produced at every intensity level is limited to a user-selectable extent. CLAHE is fundamentally a conventional contrast enhancement technique that effectively enhances the local information of an image. Here, CLAHE is applied to enhance the contrast relative to the original CXR image.

The procedures involved in CLAHE are:

  • Step 1: Divide the image into contiguous, non-overlapping tiles.

  • Step 2: Clip the histogram of every tile at a threshold and redistribute the clipped pixels uniformly over the entire histogram.

  • Step 3: Apply histogram equalization to every tile.

  • Step 4: Use bilinear interpolation to map between adjacent tiles. The mapped pixel is thus the result of interpolating the intensity mappings of the 4 neighboring tiles.
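Steps 2 and 3 can be illustrated on a single tile with a minimal sketch. The function names, the 4-level toy histogram, and the clip limit of 6 are assumptions made for illustration and are not taken from the paper; the bilinear interpolation of Step 4 is not shown.

```python
def clip_histogram(hist, clip_limit):
    """Step 2: clip a tile histogram and spread the excess evenly over all bins."""
    clipped = [min(count, clip_limit) for count in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, len(hist))
    # Every bin receives an equal share; the first `remainder` bins get one
    # extra count so that the total number of pixels is preserved.
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


def equalization_map(hist, levels):
    """Step 3: histogram-equalization lookup table (old grey level -> new grey level)."""
    total = sum(hist)
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / total))
    return mapping


tile_hist = [0, 12, 2, 2]                       # hypothetical tile: 16 pixels, 4 grey levels
clipped = clip_histogram(tile_hist, clip_limit=6)
lut = equalization_map(clipped, levels=4)
print(clipped, lut)
```

After a single redistribution pass a bin may again exceed the clip limit (bin 1 in this toy example); practical implementations often repeat the pass or accept the small overshoot.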

3.1.2 Artefacts removal

Medical images contain a wide variety of artefacts; some of them affect the diagnostic process, whereas others might be confused with pathology. The elimination of artefacts is therefore a mandatory operation in medical image examination [5]. The steps of the artefact removal technique are listed below.

  • Step 1. A grayscale medical image is acquired as the input.

  • Step 2. A threshold value for the image is estimated using the standard deviation (SD) method.

  • Step 3. The image is binarized using the threshold value: pixels whose values exceed the threshold are set to 1 and pixels below the threshold are set to 0.

  • Step 4. The binary image is labeled and the regions of connected components are evaluated using equivalence classes.

  • Step 5. The connected component with the largest area and the connected component with the second-largest area are identified.

  • Step 6. The ratio of the largest area to the second-largest area is determined.

  • Step 7. When the ratio is high, only the component with the largest area is retained and the remaining components are eliminated. Otherwise, when the ratio is low, the components with the largest and second-largest areas are retained and the rest are removed.

  • Step 8. A convex hull is computed for the 1-valued pixels of the image and the region inside the convex hull is set to 1.

  • Step 9. Then, the resulting mask is applied to the original image matrix to obtain the artefact-free medical image.
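The thresholding and component-selection steps can be sketched as follows. This is an illustrative sketch under several assumptions that the paper does not specify: the threshold is taken to be the population standard deviation of the pixel values, 4-connectivity is used, `ratio_cutoff` is a hypothetical tuning value, and the convex hull of Step 8 is omitted.

```python
from collections import deque
from statistics import pstdev


def binarize(image, threshold):
    """Step 3: pixels above the threshold become 1, all others 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


def connected_components(binary):
    """Step 4: return the 4-connected regions of 1-pixels as lists of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(region)
    return components


def remove_artefacts(image, ratio_cutoff=2.0):
    """Steps 2-7 and 9; ratio_cutoff is a hypothetical value, not from the paper."""
    threshold = pstdev([p for row in image for p in row])   # Step 2 (assumed rule)
    components = sorted(connected_components(binarize(image, threshold)),
                        key=len, reverse=True)
    keep = components[:1]
    # Steps 5-7: also keep the second-largest region when it is comparable in size.
    if len(components) > 1 and len(components[0]) / len(components[1]) < ratio_cutoff:
        keep = components[:2]
    mask = [[0] * len(image[0]) for _ in image]
    for region in keep:
        for y, x in region:
            mask[y][x] = 1
    # Step 9: apply the mask to the original image.
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


image = [
    [200, 200, 0, 0, 0],
    [200, 200, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 180],   # small isolated bright spot, treated as an artefact
]
cleaned = remove_artefacts(image)
```

In this toy image the bright 2 × 2 block is four times larger than the isolated pixel, so only the block survives.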

3.1.3 Image sharpening

Image sharpening is used to improve the contrast between bright and dark regions so that features can be obtained. Once the noise is removed from an image, the image is sharpened using a 2D filter, which convolves a kernel with the image. For each pixel, the filter computes a weighted combination of the pixel values that lie within a window. The 3 × 3 filter kernel (K) is given as follows:

$$ K=\frac{1}{9}\left[\begin{array}{ccc}0& -1& 0\\ {}-1& 5& -1\\ {}0& -1& 0\end{array}\right] $$
(1)

Sharpening an image with the predefined kernel operates as follows: for each pixel, a 3 × 3 window is centered on the pixel, each pixel that falls within the window is multiplied by the corresponding kernel coefficient, and the sum is divided by 9. The result replaces the value of the center pixel. This process is carried out on all pixels in the image to produce the resultant image.
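A minimal sketch of this filtering step, using the coefficients of Eq. (1) exactly as printed, is given below. The function name and the choice to leave border pixels unchanged are assumptions; because the kernel is symmetric, convolution and correlation give the same result here.

```python
KERNEL = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]
DIVISOR = 9   # the 1/9 factor in front of the matrix in Eq. (1)


def filter_3x3(image, kernel=KERNEL, divisor=DIVISOR):
    """Apply a 3 x 3 kernel to every interior pixel.

    Border pixels are copied unchanged (an assumption; the paper does not
    state how borders are handled).
    """
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            out[r][c] = acc / divisor
    return out


flat = [[9] * 3 for _ in range(3)]            # uniform region
edge = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]      # vertical edge
flat_center = filter_3x3(flat)[1][1]
edge_center = filter_3x3(edge)[1][1]
```

The matrix coefficients sum to 1, so with the 1/9 factor a uniform region is scaled to one ninth of its intensity, while pixels next to an edge are pushed away from their brighter neighbors. Negative results would normally be clipped to the valid intensity range.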

3.2 Feature extraction

The preprocessed input images undergo the feature extraction process by the use of VGG-16, ResNet 50, Inception v3, and fusion models. The processes involved in these models are discussed in the following subsections.

3.2.1 VGGNet-16

VGG-Net was presented in [19]; its uniform design reduces the complexity of the network. The first convolutional layers of AlexNet and ZFNet [24] use 11 × 11 kernels with stride 4 and 7 × 7 kernels with stride 2, respectively. VGG-Net shows that 2 successive 3 × 3 convolutions have the same receptive field as a 5 × 5 convolution, whereas 3 have the same receptive field as a 7 × 7 convolution. Using three 3 × 3 convolutions rather than a single 7 × 7 convolution has 2 major benefits: first, 3 ReLU layers are applied instead of one, which makes the decision function more discriminative; second, the number of parameters is reduced. The purpose of a 1 × 1 convolution layer is to increase the non-linearity of the decision function: although the 1 × 1 convolution itself is linear, the ReLU that follows it adds non-linearity. In this study, the VGG16 network has been selected and fine-tuned for the 6 classes of coronavirus diagnosis. The architecture of VGG16, as well as the parameter settings, are provided at every level. Assume that there is a training dataset with m instances, \( \left\{\left({x}^{(1)},{y}^{(1)}\right),\dots, \left({x}^{(m)},{y}^{(m)}\right)\right\} \). Over the entire sample, the overall cost function of the network can be represented as follows:

$$ J\left(W,b\right)=\left[\frac{1}{m}\sum \limits_{i=1}^m\left(\frac{1}{2}{\left\Vert {K}_{w,b}\left({x}^{(i)}\right)-{y}^{(i)}\right\Vert}^2\right)\right]+\frac{\lambda }{2}\sum \limits_{l=1}^{n_l-1}\sum \limits_{i=1}^{s_l}\sum \limits_{j=1}^{s_{l+1}}{\left({W}_{ji}^{(l)}\right)}^2 $$
(2)

where \( {K}_{w,b}\left({x}^{(i)}\right) \) represents the output of the NN model for the input \( {x}^{(i)} \), \( {W}_{ji}^{(l)} \) is the connection weight between the ith element of layer l and the jth element of layer l + 1, b refers to the bias term of the hidden layer neurons, and λ is the weight decay coefficient. The cost is minimized using the widely used batch gradient descent optimization approach.
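As a worked illustration of Eq. (2), the following sketch evaluates the cost for a toy case. The network outputs are supplied directly instead of being computed by a network, and all numbers, the function name, and the weight shapes are hypothetical.

```python
def cost(predictions, targets, weights, lam):
    """Eq. (2): mean of half squared errors plus (lam / 2) * sum of squared weights."""
    m = len(predictions)
    data_term = sum(
        0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))
        for pred, target in zip(predictions, targets)
    ) / m
    decay_term = (lam / 2) * sum(
        w ** 2 for layer in weights for row in layer for w in row
    )
    return data_term + decay_term


# Two training samples (m = 2) with two outputs each; values are illustrative.
predictions = [[1.0, 0.0], [0.0, 1.0]]     # stands in for K_{w,b}(x^(i))
targets = [[1.0, 0.0], [1.0, 0.0]]         # y^(i)
weights = [[[1.0, 2.0]], [[2.0], [1.0]]]   # two weight matrices W^(1), W^(2)
J = cost(predictions, targets, weights, lam=0.5)
print(J)
```

Here the first sample is predicted exactly and the second contributes 0.5 · (1 + 1) = 1, so the data term is 1/2; the squared weights sum to 10, so the decay term is 0.25 · 10 = 2.5.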

3.2.2 ResNet 50

ResNet employs residual blocks to resolve the degradation and vanishing gradient problems that often occur in deep CNN models. The residual block allows the network depth to be increased and enhances network performance. ResNet networks have produced the best results in ImageNet [7] classification competitions. A residual block adds its input to the output of the layers inside the block. The residual function can be represented as follows:

$$ y=F\ \left(x,W\right)+x $$
(3)

where x denotes the input of the residual block, W represents the weights of the residual block, and y denotes the output of the residual block. ResNet comprises different residual blocks in which the convolution kernel sizes of the convolution layers differ. The conventional ResNet structures include ResNet18, ResNet50, and ResNet101. The building block of ResNet50 is shown in Fig. 2.
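Eq. (3) can be illustrated with a small numeric sketch. For brevity, F is taken here to be two small fully connected layers with a ReLU in between, which is an assumption for illustration; in ResNet the function F consists of convolution layers.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """y = F(x, W) + x, with the hypothetical choice F(x, W) = W2 * relu(W1 * x)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    return [f + a for f, a in zip(fx, x)]


x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

y_zero = residual_block(x, zero, zero)           # F(x, W) = 0, so y = x
y_ident = residual_block(x, identity, identity)  # F(x, W) = relu(x)
```

With all weights at zero the block reduces to the identity mapping, which is the property that lets very deep stacks of such blocks avoid degradation.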

Fig. 2 Building block of ResNet50

ResNet50 is a 50-layer residual network. This model has produced very good results in the ImageNet and MS-COCO competitions. The structure consists of 49 convolutional layers and 1 fully connected (FC) layer, with an input map of size 224 × 224. The convolutional part includes 5 blocks with small filter sizes of 1 × 1 and 3 × 3. The initial block consists of 1 convolution layer and a max-pooling layer, which has a stride of 2. The remaining blocks are made up of stacked residual blocks, with a stride of 2 for the final 3 blocks. The ResNet50 structure ends with an FC layer and a softmax activation function. The general idea of transfer learning is to reuse a model pre-trained on one dataset for a new task. The key objective is to begin the learning task from patterns that have already been learned on a different problem, which leads to model generalization. Transfer learning is used for small datasets, with the pre-trained network applied as a feature extractor. The main aim is to retain the convolutional portion and replace the FC layers with the presented classifier.

3.2.3 Inception v3

The Inception family of CNNs marks a departure from earlier CNN designs. Many networks improve performance only by deepening the network, that is, by adding convolutional layers; the Inception network changed this principle. The Inception module applies filters of several sizes together with max pooling to reduce the data dimension. Its major benefit is that effective features are obtained with considerably fewer operations and fewer parameters. Inception v3 uses asymmetric factorization to decompose large convolution kernels into smaller ones: a 3 × 3 convolution kernel is replaced by 2 convolution kernels of sizes 1 × 3 and 3 × 1, which reduces the number of network parameters while preserving network performance. With asymmetric decomposition, the network can be made deeper and the non-linearity of the system is increased [8]. Conventionally, the CNN pooling operation is applied to reduce the grid size of the feature maps. To avoid the problems this causes, the activation dimension of the network is expanded before max pooling or average pooling is applied.
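The parameter saving obtained by replacing a 3 × 3 kernel with a 1 × 3 kernel followed by a 3 × 1 kernel can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) and assumes, purely for illustration, 64 input and 64 output channels.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


channels = 64   # hypothetical channel count, not taken from the paper

full = conv_params(3, 3, channels, channels)
factorized = (conv_params(1, 3, channels, channels)
              + conv_params(3, 1, channels, channels))
saving = 1 - factorized / full

print(full, factorized, round(saving, 3))
```

The factorized pair needs 6 weights per channel pair instead of 9, a reduction of one third regardless of the channel count chosen.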

3.2.4 Fusion model

In this paper, the MMF model is used to improve the classification of COVID-19 and to make the detection of the six classes in CXR images more effective. Since a single model may not be adequate to attain an effective detection rate, the three models are integrated by decision-based multimodal fusion to increase the detection rate. The MMF architecture benefits from the improvements made in the individual (unimodal) CNNs. The number of features in the fusion model is 256,512.
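The paper does not state the exact fusion rule. One common form of decision-level fusion, shown here only as an assumed illustration, averages the class-probability vectors produced by the individual models and selects the class with the highest fused score. The probability values below are invented for the example.

```python
CLASSES = ["ARDS", "COVID-19", "No Finding", "Pneumocystis", "SARS", "Streptococcus"]


def fuse_decisions(per_model_probs):
    """Average the class-probability vectors of several models (assumed fusion rule)."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def predict(probs):
    """Return the label with the highest probability."""
    return CLASSES[max(range(len(probs)), key=probs.__getitem__)]


# Hypothetical softmax outputs of the three networks for one image.
vgg = [0.10, 0.40, 0.30, 0.10, 0.05, 0.05]
resnet = [0.05, 0.30, 0.45, 0.10, 0.05, 0.05]
inception = [0.05, 0.60, 0.15, 0.10, 0.05, 0.05]

fused = fuse_decisions([vgg, resnet, inception])
label = predict(fused)
```

In this example one model on its own would choose "No Finding", but the fused decision follows the two models that agree on "COVID-19".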

3.3 Softmax classification

The extracted feature vectors are provided as input to the softmax classifier, which is generally employed for multiclass classification problems [25]. It maps an N-dimensional input vector i to K output classes, as defined in Eq. (4):

$$ {y}_j=\frac{\exp \left({\theta}_j^Ti\right)}{\sum_{k=1}^K\exp \left({\theta}_k^Ti\right)} $$
(4)

where \( {\theta}_k={\left[{\theta}_{k1},{\theta}_{k2},\dots, {\theta}_{kN}\right]}^T \) denotes the weights, which are tuned by an optimization algorithm. The number of outputs of the layer equals the number of classes, i.e. six.
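Eq. (4) can be evaluated with a few lines of code. The sketch below uses a toy weight matrix with K = 3 classes and N = 2 features; these values and the function name are illustrative assumptions. Subtracting the largest logit before exponentiation is a standard numerical safeguard and does not change the result.

```python
import math


def softmax_scores(theta, i):
    """Eq. (4): y_j = exp(theta_j^T i) / sum_k exp(theta_k^T i)."""
    logits = [sum(t * v for t, v in zip(theta_k, i)) for theta_k in theta]
    shift = max(logits)                      # avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


theta = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # hypothetical weights, one row per class
y = softmax_scores(theta, [2.0, 2.0])
print(y)
```

The outputs are positive and sum to one, so they can be read as class probabilities; the predicted class is the index of the largest output.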

4 Results and discussion

The MMF-DTL model was implemented on a PC with an 8th-generation Intel i5 processor, 16 GB RAM, MSI L370 Apro, and Nvidia 1050 Ti 4 GB. We used Python 3.6.5 along with pandas, sklearn, Keras, Matplotlib, TensorFlow, opencv, Pillow, seaborn, and pycm. The general parameter settings used in the simulation are provided in Table 1.

Table 1 General Parameters Settings

Besides, the model specific parameters of VGG-16, ResNet-50, and Inception v3 model are offered in Table 2.

Table 2 Model Specific Parameters

4.1 Dataset description

The effectiveness of the proposed MMF-DTL model for COVID-19 diagnosis is evaluated on the Chest-X-Ray dataset (https://github.com/ieee8023/covid-chestxray-dataset). It contains a total of 305 images in six classes, namely ARDS, COVID-19, No Finding, Pneumocystis, SARS, and Streptococcus. The classes and the corresponding image counts are listed in Table 3. A few sample test images from the six classes are displayed in Fig. 3.

Table 3 Dataset Description

Fig. 3 Sample test images: (a) ARDS (b) COVID-19 (c) No Finding (d) Pneumocystis (e) SARS (f) Streptococcus

4.2 Results analysis

Figure 4 shows sample images produced at the different stages of preprocessing. Figure 4a shows the original input images, and the corresponding contrast-enhanced images produced by CLAHE are shown in Fig. 4b. Fig. 4c shows the images with artefacts removed, and the sharpened images are shown in Fig. 4d. The class activation maps of the test images for the different classes are shown in Appendix 1 Fig. 13. The class activation map helps to detect the infected portions of the lungs for the respective class label (https://towardsdatascience.com/detection-of-covid-19-presence-from-chest-x-ray-scans-using-cnn-class-activation-maps-c1ab0d7c294b). In addition, the subprocesses of the MMF-DTL model in the classification of the six classes are visualized in Appendix 2 Figs. 14, 15, 16, 17, 18 and 19.

Fig. 4 (a) Original images (b) Contrast enhanced images (c) Artefacts removed images (d) Sharpened images

Figure 5 shows the accuracy graphs of the four DL models during training and validation. The training and validation accuracies improve as the epoch count rises, and the fusion model offers a higher accuracy than the other three methods. Likewise, Fig. 6 depicts the loss graphs of the four DL models during training and validation. The loss declines as the number of epochs increases, and the fusion technique attains the lowest loss.

Fig. 5 Accuracy graph: (a) VGG16 (b) ResNet50 (c) Inception V3 (d) Fusion method

Fig. 6 Loss graph: (a) VGG16 (b) ResNet-50 (c) Inception V3 (d) Fusion model

Figure 7 shows the receiver operating characteristic (ROC) curves of the individual DL models and the fusion model. The ROC analysis demonstrates that the DL models classify effectively, with high ROC values. In particular, the fusion model achieves the highest ROC values in the classification process.

Fig. 7 ROC curves: (a) VGG16 (b) ResNet-50 (c) Inception V3 (d) Fusion model

Figure 8 illustrates the four confusion matrices generated by the different models. Fig. 8a shows that the VGG-16 model correctly classified 10 instances as ARDS, 201 as COVID-19, 27 as No Finding, 15 as Pneumocystis, 7 as SARS, and 17 as Streptococcus. Fig. 8b shows that the ResNet model correctly classified 11 instances as ARDS, 202 as COVID-19, 27 as No Finding, 15 as Pneumocystis, 7 as SARS, and 17 as Streptococcus. Similarly, Fig. 8c indicates that the Inception v3 model classified 11, 206, 27, 15, 11, and 17 images under the ARDS, COVID-19, No Finding, Pneumocystis, SARS, and Streptococcus classes, respectively. Finally, Fig. 8d indicates that the Fusion model performed effectively and classified 9 images as ARDS, 215 as COVID-19, 27 as No Finding, 15 as Pneumocystis, 11 as SARS, and 17 as Streptococcus.

Fig. 8 Confusion matrix: (a) VGG-16 (b) ResNet-50 (c) Inception-v3 (d) Fusion model

Table 4 presents the values derived from the confusion matrix in terms of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

Table 4 Manipulation from Confusion Matrix
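The derivation of TP, TN, FP, and FN for each class of a multi-class confusion matrix, and of the measures computed from them, can be sketched as follows. The 3-class matrix is hypothetical and is not taken from Fig. 8; rows are assumed to be true classes and columns predicted classes.

```python
def per_class_counts(cm):
    """cm[i][j] = number of samples of true class i predicted as class j (assumed layout)."""
    total = sum(map(sum, cm))
    counts = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                    # rest of row k
        fp = sum(row[k] for row in cm) - tp     # rest of column k
        tn = total - tp - fn - fp
        counts.append({"TP": tp, "FN": fn, "FP": fp, "TN": tn})
    return counts


def metrics(c):
    """Sensitivity, specificity, precision, accuracy and F-score for one class."""
    sensitivity = c["TP"] / (c["TP"] + c["FN"])
    specificity = c["TN"] / (c["TN"] + c["FP"])
    precision = c["TP"] / (c["TP"] + c["FP"])
    accuracy = (c["TP"] + c["TN"]) / sum(c.values())
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f_score": f_score}


cm = [[8, 2, 0],
      [1, 18, 1],
      [0, 0, 10]]    # hypothetical 3-class confusion matrix (40 samples)

counts = per_class_counts(cm)
class0 = metrics(counts[0])
class2 = metrics(counts[2])
```

Averaging each measure over the classes gives one summary value per measure. A class that is never predicted would cause a division by zero in the precision, which a production version would need to guard against.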

Table 5 offers a comprehensive comparison of the classification results of the presented technique and the other DL models. Figure 9 displays the average classifier outcomes of the proposed models on the test images. The experimental outcome shows that the VGG-16 model classified the CXR images with an average sensitivity of 86.94%, specificity of 97.94%, precision of 82.46%, accuracy of 96.79%, F-score of 83.12%, and kappa value of 80.04%. The ResNet-50 model attained slightly higher results than the VGG-16 model, with an average sensitivity of 88.13%, specificity of 97.95%, precision of 84.07%, accuracy of 97.04%, F-score of 84.60%, and kappa value of 81.67%. The Inception v3 model attained even higher values, with an average sensitivity of 94.49%, specificity of 98.38%, precision of 87.28%, accuracy of 98%, F-score of 90.44%, and kappa of 88.24%. Finally, the Fusion model attained the best results on all measures except sensitivity, with an average sensitivity of 92.96%, specificity of 98.54%, precision of 93.60%, accuracy of 98.80%, F-score of 93.26%, and kappa of 91.86%.

Table 5 Performance Study of Proposed Method with different DL models

Fig. 9 Average classifier results analysis of proposed model

Table 6 and Figs. 10, 11 and 12 show a comparative examination of the classification outcomes of the presented and existing techniques on test CXR images. The table values indicate that the DLS-SCD model classifies the images into distinct class labels with sensitivity of 86.67%, precision of 86.86%, and accuracy of 86.70%. A slightly better result is offered by the ADCD-DCNN technique, which classifies two classes with sensitivity of 84%, specificity of 90%, precision of 91%, and accuracy of 87%. Likewise, the DLA-CVD technique classifies two classes with sensitivity of 88%, specificity of 87%, accuracy of 89.50%, and F-score of 86%. In addition, the AD-TLCNN (3 Class) method resulted in sensitivity of 98.66%, specificity of 96.46%, and accuracy of 96.78%.

Table 6 Comparison Study of proposed with different models [18]

Fig. 10 Comparative results analysis of different models in terms of sensitivity and specificity

Fig. 11 Comparative results analysis of different models in terms of accuracy

Fig. 12 Comparative results analysis of different models in terms of F-score

Besides, the AD-TLCNN (2 Class) method obtained sensitivity of 98.66%, specificity of 96.46%, and accuracy of 96.72%. The VGG-16 model outperformed the earlier models in terms of specificity and accuracy, with sensitivity of 86.94%, specificity of 97.94%, precision of 82.46%, accuracy of 96.79%, and F-score of 83.12%. The ResNet-50 model offered somewhat better results than the VGG-16 model, with an average sensitivity of 88.13%, specificity of 97.95%, precision of 84.07%, accuracy of 97.04%, and F-score of 84.60%. The Inception v3 model achieved near-optimal results, with sensitivity of 94.49%, specificity of 98.38%, precision of 87.28%, accuracy of 98%, and F-score of 90.44%. The proposed fusion model achieved the highest specificity, precision, accuracy, and F-score among all the compared methods, with sensitivity of 92.96%, specificity of 98.54%, precision of 93.60%, accuracy of 98.80%, and F-score of 93.26%.

The above results show that the MMF-DTL technique is an effective tool for COVID-19 diagnosis using radiological images. The simulation outcome shows that the MMF-DTL model offers better diagnosis results, attaining an average sensitivity of 92.96%, specificity of 98.54%, precision of 93.60%, accuracy of 98.80%, and F-score of 93.26%. The fusion process in the MMF-DTL model helps to achieve a higher detection rate.

5 Conclusion

This study has presented a new MMF-DTL technique to detect COVID-19 using CXRs. A new user interface has also been created to assist healthcare professionals in the classification process. The MMF-DTL model first preprocesses the images in three stages, namely contrast enhancement, artefact removal, and image sharpening. Then, the feature extraction and classification processes are performed. The MMF-DTL technique is validated on the Chest X-Ray dataset, and the images are classified into six classes. The experimental values demonstrate that the MMF-DTL model offers better diagnosis results, with an average sensitivity of 92.96%, specificity of 98.54%, precision of 93.60%, accuracy of 98.80%, and F-score of 93.26%. Therefore, the proposed MMF-DTL model is found to be more effective than the other DL models and can be employed for real-time classification. In the future, the performance of the MMF-DTL technique can be enhanced by using a larger dataset to improve the training process and thereby the detection rate. The proposed model will also be tested on real-time images in our future work.