1 Introduction

Pandemics are epidemics that spread rapidly across the globe. If not properly controlled, hospitals, physicians, and healthcare staff become overloaded, resulting in substantial morbidity and mortality and significant economic and social harm. During pandemics such as COVID-19, patients flock to hospitals at any time of the day, and many others check in with the mildest symptoms. Healthcare workers need to be present at all times, and severe fatigue can affect their performance. As the gold standard, the Polymerase Chain Reaction (PCR) test is used to confirm the presence of COVID-19. However, PCR is time-consuming and has high false-negative rates. Therefore, in some medical centers, it has been replaced by Computerized Tomography (CT)-scan imaging, a non-invasive imaging technique that gives detailed three-dimensional images of the body. CT-scan diagnosis by a specialized radiologist is faster, reveals more detail about pneumonia, and can provide a quantitative measurement of the severity of infection [1]. However, experts are not always available, and pandemics can exacerbate these conditions.

In these situations, accurate and rapid diagnosis of the disease effectively prevents the outbreak of a pandemic. However, factors such as the increasing workload of physicians and the lack of expert radiologists in a pandemic make screening and identifying suspicious patients difficult and slow [2]. At the same time, the high transmission rate of COVID-19 makes rapid diagnosis of the disease necessary. Artificial intelligence (AI) systems can take pressure off medical centers by assisting physicians in speeding up diagnosis and treatment procedures. They can also help physicians make early predictions and efficiently manage the available resources to control the spread of the disease.

Artificial intelligence is the ability of a computer to perform tasks commonly associated with intelligent beings [3]. The core of an AI system is its knowledge processing unit. The system uses this unit to acquire knowledge and perform specific tasks [4]. In machine learning (ML), an important domain of AI, rules are learned from data and used to make accurate decisions. Computers have advantages over humans that make them more suitable for specific tasks. They can perform calculations much faster, have more memory to retain what is essential, and are perpetually available. They never get tired or change mood, and no internal state affects their decision-making process. For example, diagnosing COVID-19 from a CT scan takes a radiologist up to 15 minutes, while an AI-based method needs only a few seconds [5]. Another challenge in the COVID-19 pandemic is distinguishing COVID-19 pneumonia from other bacterial and non-COVID-19 viral pneumonia types. AI can help inexperienced physicians improve their ability to correctly diagnose different types of infections [6]. In addition, AI can achieve higher sensitivity than experienced thoracic radiologists at the early stages of the disease, when human-visible abnormalities are absent from CT scans [7].

Hospitals use resource planning to determine whether they can care for a patient's needs. However, planning becomes a challenge during a pandemic with overburdened hospitals and limited resources. The goal is to save as many lives as possible. In this case, hospitals have no choice but to admit patients based on the severity of their condition and their likelihood of survival. Some patients do not need to be treated in the hospital, while for others, hospitalization is essential. Therefore, determining each patient's condition and predicting the required facilities becomes vital. AI can help with resource planning by learning complex patterns from symptoms and from historical and clinical data. Currently, many AI models are being used to predict overall mortality for the next few months or even a year [8]. These methods use various information such as blood biomarkers, age, gender, and the patient's disease background to achieve accurate predictions [9].

During a pandemic, patients with varying degrees of severity are admitted to hospitals. However, not all of them need to be hospitalized. Moreover, hospitals can be a source of further infection, especially during a pandemic. For example, one study reported that 41% of 138 hospitalized patients with COVID-19 were infected after being admitted to a hospital [10]. A Health Monitoring System (HMS) allows patients to be monitored remotely. An HMS is a technologically advanced alternative to traditional patient and health management. It consists of a wearable wireless device, such as a wristband with a sensor, together with software that gives the specialist access to important medical information [11]. AI can help monitor patients on two levels. First, it can assess a patient's condition more frequently than a physician and warn of irregularities. Second, it can predict the patient's health condition so that the required precautions can be taken in the coming days.

Infectious diseases such as COVID-19 can spread from person to person through physical contact, droplets, saliva, or airborne transmission [12]. The COVID-19 virus spreads rapidly and has a high rate of infection [13]. To prevent an outbreak, one approach is to implement a patient tracking system that immediately alerts individuals who have had recent contact with known cases during their viral period and prompts them to isolate [14]. There are several ways to track down individuals who have had contact or interaction with patients affected by the COVID-19 virus, for example, by collecting data from Bluetooth-based tracking applications, GPS and social graphs, video surveillance, and CCTV cameras. Further information can be gathered from card transaction data, internet search and social media monitoring, text data, and network-based Application Programming Interfaces (APIs), intermediary programs that connect applications together. Automated systems powered by AI can be designed to analyze these different data types and model a tracking system to control the pandemic [14,15,16]. However, challenges such as technical limitations, socioeconomic disparities, data privacy and security risks, and ethical issues still lie ahead [17, 18].

The following section describes the general concepts and guidelines for using an appropriate AI system to manage the COVID-19 pandemic, focusing on diagnostic and screening tasks. It then explains the potential problems and pitfalls of AI-based methods using experiments with actual data. Section three describes the State-Of-The-Art (SOTA) AI systems used to diagnose COVID-19. Section four elaborates on the use of AI to facilitate various other operations in a pandemic. The last section discusses the obstacles to exploiting AI's full potential in the current COVID-19 pandemic and proposes future work to make AI more effective in a pandemic.

2 A Guideline to Develop AI Models for Diagnosis and Screening

Continuous screening and rapid diagnosis are two of the most frequent tasks in controlling a pandemic. The function of the standard COVID-19 screening test is to identify people with a higher risk of spreading the disease. Since asymptomatic transmission plays a significant role in a pandemic, using screening tests to classify patients becomes very important to reduce the outbreak of the disease. The purpose of these tests is not to confidently detect the virus but to identify the suspicious patients and isolate them to minimize the rate of infection. For example, chest X-ray (CXR) images are used for screening COVID-19 as they are widely available, inexpensive, fast, and contain helpful information for detecting COVID-19 infections [19,20,21,22,23]. In addition, some studies have shown that magnetic resonance imaging (MRI), PET imaging, and ultrasound can be used for diagnosis and screening. However, they are not usually used in clinical practice [24,25,26,27,28,29,30].

On the other hand, the goal of diagnostic tests is to detect the presence or absence of the virus in patients suspected of having COVID-19. These tests may also identify the cause, location, and severity of the disease. If a screening result indicates the presence of the disease, diagnostic tests are required to confirm the infection. For instance, PCR or CT-scan tests can confirm the presence of COVID-19 [1, 31,32,33].

There is a wide range of data sources for the detection of a virus. Usually, the easier, faster, and more widely available the data collection, the less accurate it is. During the COVID-19 pandemic, this spectrum ranges from mobile applications, which are widely available and fast, to PCR tests and CT-scan images, which are more accurate but less available. As mentioned above, screening tests require simple and rapid data collection, while diagnostic tests require more accurate data.

As described above, screening is conducted over a larger population, and in many cases no expert is available to assess the test. Although diagnosis is performed over a selected population, their number is still high in a pandemic, leaving the experts exhausted. In addition, there is a shortage of experts in some regions. AI can help in both tasks and assist the medical staff in making more accurate and rapid diagnoses. We provide a guideline for how an AI system can be trained for such tasks. We also describe the general concepts and the common problems and pitfalls in screening and diagnosis with several real-world examples [34,35,36,37].

2.1 Training, Testing, and Further Validation of AI Models

In the supervised learning approach for an ML model, everything starts from data. We do not embed the knowledge of how to decide or how to calculate in the model. Instead, we let the model learn its optimal configuration from the data. ML problems fall into three general categories: classification, regression, and clustering. In classification problems, there is a limited number of groups, and the goal is to identify the group to which each sample in the dataset belongs, for example, determining whether a person has a specific disease. In regression, each sample is associated with a continuous number, and the goal is to calculate that number; calculating the percentage of infected lung regions in pulmonary diseases with pneumonia is an example of this category. In clustering, the samples have no assigned groups or numbers, and the goal is to group samples with similar patterns in the collected data. For example, clustering patients with different symptoms of an unknown disease can point experts toward its possible subtypes and progression paths.

The procedure of deploying an AI model for a classification problem is depicted in Fig. 1. In a classification problem, the dataset consists of multiple samples from each group or class. At first, the dataset is divided into three sets: training, validation, and test. Next, the training data is used to obtain the parameters of the model. A model may perform well on the training data simply because it memorizes the appearance of samples repeated during the training process; this phenomenon is called overfitting. To mitigate overfitting, multiple augmentations are applied to the input samples. For image data, rotation, geometric transformations, and changes in brightness are common augmentations. Finally, the trained model is evaluated on the validation data to make sure it has learned the general properties of the data. In addition to evaluation metrics, interpretations of the network may also be used as an extra check to validate the reasons behind the model's decisions.

Fig. 1 The procedure of deploying an AI model for a classification problem

Different architectures and training procedures lead to different performance on the training and validation sets. Among the trained models, the one with the best performance on the validation data is selected as the final model. Because the validation set is observed multiple times while evaluating, redesigning, and retraining the model, the model's performance on the validation set is not a fair measure of its generalizability. Therefore, a test set is held out to evaluate the model's performance on completely unseen data.

In some cases, models are designed to fit specific data appearances. As a result, they may fail on other samples, e.g., samples from another imaging device, which harms the model's generalizability. Therefore, collecting the test data from sources other than the training and validation data provides a more realistic estimate of the model's generalizability. In the final step, the model is deployed and tested by experts, and their feedback is collected and used to further improve the model.
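The pipeline described above can be summarized in code. The following is a minimal PyTorch sketch, not the chapter's actual implementation: it assumes a recent Torchvision release, uses FakeData as a stand-in for a real chest-imaging dataset, and keeps the checkpoint with the best validation accuracy before a single, final test evaluation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms

# Augmentations (training stream only) help mitigate overfitting.
train_tf = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])
eval_tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

# FakeData stands in for a real, labeled imaging dataset; swap in your own Dataset class.
train_base = datasets.FakeData(size=1000, image_size=(3, 224, 224), num_classes=2, transform=train_tf)
eval_base = datasets.FakeData(size=1000, image_size=(3, 224, 224), num_classes=2, transform=eval_tf)

idx = torch.randperm(1000).tolist()
train_set = Subset(train_base, idx[:700])
val_set = Subset(eval_base, idx[700:850])
test_set = Subset(eval_base, idx[850:])        # held out, evaluated only once at the very end

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

model = models.resnet18(weights=None, num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

best_acc, best_state = 0.0, None
for epoch in range(5):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

    # Model selection: keep the weights with the best validation accuracy.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    acc = correct / total
    if acc > best_acc:
        best_acc = acc
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)   # final model, ready for the one-time test evaluation
```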

2.2 Data Gathering and Soundness

As mentioned above, an ML algorithm learns the model from data. Therefore, it is essential to have a diverse dataset so that the model is exposed to different observations. Otherwise, it is limited to what it has seen and may perform poorly on other datasets. For example, for diagnosis it is crucial to include patients at different stages of the disease, as well as healthy people, in the dataset. If the training dataset consists only of the severe stage of the disease, the model may fail to detect the early stages.

Another important point is to ensure the labels are correctly assigned to the dataset samples, because the model learns from those labeled samples. If a single expert labels the samples, the model becomes biased toward that expert's opinion. Confirmation from multiple experts, laboratory tests, and reviewing samples after data gathering help ensure correctness and make the model more robust. Another problem induced by inexact labels is the trainability of the model: when there are many labeling mistakes, the model fails to train well. Therefore, we designed an experiment to investigate the effect of inappropriate data labeling on AI models. We selected two groups of data, one from the NIH chest X-ray dataset [38] and the other from the RSNA chest X-ray dataset [39]. The NIH dataset contains 112,120 images labeled using natural language analysis of the radiological reports; the labels may therefore be inexact. The RSNA dataset, on the other hand, is a subset of the NIH dataset annotated by at least two experts, making it more trustworthy. For the first group, we gathered the "No Finding" and "Pneumonia" samples of the NIH dataset, comprising 45,449 negative and 696 positive samples. For the second group, we collected the "Healthy" and "Pneumonia" samples of the RSNA dataset, comprising 8,851 negative and 9,555 positive samples.

To eliminate the effect of unequal group sizes, some of the negative samples of NIH and some of the positive samples of RSNA were randomly discarded, leaving 8,851 negative and 696 positive samples in each group. Next, 80% of each dataset was selected as the training set and the rest as the validation set. Finally, two well-known models, Inception V3 [40] and ResNet18 [41], were trained with the Adam optimizer [42]. We utilized transfer learning and initialized the weights using the pretrained models available in the Torchvision package [43]. For both models, we first tuned the last fully connected layer and then one block before it. To avoid the effect of imbalanced classes, batches of size 32 containing 16 positive and 16 negative samples were used for training. Data augmentation techniques such as random brightness, cropping and resizing, rotation, and shearing were used to expand the dataset and improve generalization.
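As a hedged illustration of this transfer-learning setup (not the exact experiment code), the sketch below loads an ImageNet-pretrained ResNet18 from Torchvision, unfreezes only the classification head and the last residual block, and builds a class-balanced 16+16 batch; it assumes a Torchvision version with the `weights` API, and the sample pools are placeholder tensors.

```python
import torch
from torch import nn
from torchvision import models

# ImageNet-pretrained backbone (assumes torchvision >= 0.13 for the `weights` argument).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze everything, then unfreeze only the new 2-class head and the last residual block,
# mirroring the two-stage fine-tuning described above.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # fresh head, trainable by default
for p in model.layer4.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def balanced_batch(pos_pool: torch.Tensor, neg_pool: torch.Tensor, k: int = 16):
    """Draw a class-balanced batch: k positive + k negative images."""
    pos = pos_pool[torch.randint(len(pos_pool), (k,))]
    neg = neg_pool[torch.randint(len(neg_pool), (k,))]
    x = torch.cat([pos, neg])
    y = torch.cat([torch.ones(k, dtype=torch.long), torch.zeros(k, dtype=torch.long)])
    return x, y

# Placeholder pools standing in for the (augmented) positive and negative images.
pos_pool = torch.randn(100, 3, 224, 224)
neg_pool = torch.randn(100, 3, 224, 224)
x, y = balanced_batch(pos_pool, neg_pool)
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```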

For all models, fine-tuning continued in the second stage until the model achieved more than 98% sensitivity and specificity on the training data, and the epoch with the best average of sensitivity and specificity on the validation data was selected as the final model. We tested the models on the COVID-19 Radiography Database [44], the Chest X-Ray Pneumonia dataset [45], and the Kaggle VinBigData dataset [46] to evaluate their generalizability. The results for the validation and test datasets are presented in Table 1.

Table 1 The effects of inappropriate data labeling. Sens, sensitivity; spec, specificity

As the results show, the models trained on the NIH dataset fail to reach performance metrics on the validation set as high as those of the models trained on RSNA. This shows that, for datasets of equal size, certainty about the labels helps greatly in training the model. Apart from the low sensitivity on the test data, caused by the shortage of positive samples in the selected groups, the models trained with RSNA samples achieved higher performance metrics on the test data.
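The selection criterion used above, the average of sensitivity and specificity on the validation set, can be computed directly from the confusion counts. The small helper below is an illustrative sketch rather than the original experiment code.

```python
import torch

def sens_spec(preds: torch.Tensor, labels: torch.Tensor):
    """Sensitivity and specificity for binary predictions (1 = positive class)."""
    tp = ((preds == 1) & (labels == 1)).sum().item()
    tn = ((preds == 0) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    sensitivity = tp / max(tp + fn, 1)
    specificity = tn / max(tn + fp, 1)
    return sensitivity, specificity

def selection_score(preds: torch.Tensor, labels: torch.Tensor) -> float:
    """Epoch-selection criterion: average of sensitivity and specificity."""
    s, p = sens_spec(preds, labels)
    return (s + p) / 2

# Example: predictions and ground truth from one validation pass.
print(selection_score(torch.tensor([1, 0, 1, 1, 0]), torch.tensor([1, 0, 0, 1, 0])))
```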

2.3 Data Diversity, the Problem of Batch Effect and Generalization

In the COVID-19 pandemic, many published papers have used small datasets and reported high evaluation results. However, in most cases, the samples of different classes were collected from different sources. Such models are susceptible to becoming biased toward the appearance of the samples from each source instead of learning to solve the general problem. As a result, the models lack generalization and may fail when applied to unseen data. This problem is not specific to small datasets; large datasets may have the same issue. For example, if the test set were gathered from the same sources as the training and validation data, the problem would not be identified in the final evaluation stage, and a problematic model would be deployed.

We designed an experiment to show the destructive effect of bias and the batch effect in an extreme setting. We selected two groups of data from Kaggle's Chest X-Ray Pneumonia and RSNA challenges. Both challenges were intended for detecting pneumonia from chest X-ray images. For the first group, we selected pneumonia samples from the RSNA dataset and healthy samples from the Chest X-Ray Pneumonia dataset, comprising 9,555 positive and 1,583 negative samples. For the second group, we selected the same numbers of positive and negative samples drawn from both datasets. We selected 80% of the data as the training set and 10% as the validation set, and 10% of each primary dataset was put aside as a shared test set. To highlight the differences, we eliminated the positive samples of the RSNA dataset and the negative samples of the Chest X-Ray Pneumonia dataset from the test set. Training and model selection were performed as in the scheme described in Sect. 2.2. The evaluation results of the models are presented in Table 2. As the table shows, the models trained on the first group reach high sensitivities and specificities on the validation set, which has the same bias as the training data. One might even think these models perform better than those of the second group, considering the validation metrics. However, as the evaluation on the test data with the inverse bias shows, the models of the first group have become entirely biased toward the appearance of the samples from each dataset. In contrast, the evaluation metrics of the second group on the test set are similar to its validation results.

Table 2 The problem of batch effect and generalization caused by a biased dataset

2.4 Interpreting the Black-Box Deep AI Models

Deep neural networks have shown excellent performance, achieving accuracies in many domains that can even exceed those of human experts. However, most of these models are black boxes, meaning the internal decision-making mechanism of the network at the intermediate layers is not known. Therefore, their high accuracy alone is not sufficient to build trust in them: a model may perform well for the wrong reasons, relying on features irrelevant to the domain-specific concerns [47]. In recent years, researchers have focused on interpreting black-box models. Interpreting means explaining the reasons behind a model's decision in a human-understandable way [48]. These interpretations help identify bias in the model's decisions, verify that the model is fair, monitor model performance based on the reasons behind its decisions, and even learn about unknown domains from the model [47].

To reveal the trace of bias in the models, we interpreted the decisions of the ResNet18 models from the previous experiment using Guided Grad-CAM [49]. The interpretations show which parts of the images each model considered when making a decision. Figure 2 presents the interpretation results on seven random pneumonia samples from the test dataset. As the figure shows, the model trained on the biased dataset pays more attention to regions outside the lungs, while the other model focuses on lung infections. This indicates that the biased model has learned the apparent differences between the two datasets, rather than the patterns of pneumonia, as the feature distinguishing patient images from healthy ones.

Fig. 2 Comparing interpretation results of ResNet18 using Guided Grad-CAM, trained on (a) the biased and (b) the unbiased datasets
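For readers who want to produce similar attributions, the hedged sketch below uses the Captum library's GuidedGradCam on a ResNet18; the model here is untrained and the input is a random tensor standing in for a preprocessed chest X-ray, so it illustrates the mechanics rather than reproducing Fig. 2.

```python
import torch
from torchvision import models
from captum.attr import GuidedGradCam  # pip install captum

# Placeholder 2-class ResNet18; in practice, load the trained weights from the experiment.
model = models.resnet18(weights=None, num_classes=2).eval()

# Attribute with respect to the last convolutional block, a common choice for ResNet-style nets.
guided_gc = GuidedGradCam(model, model.layer4)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # stands in for a preprocessed CXR image
attributions = guided_gc.attribute(x, target=1)      # target=1: the "pneumonia" class index

# `attributions` has the input's shape; high-magnitude pixels mark the regions the model
# relied on and can be overlaid on the image, as in Fig. 2.
print(attributions.shape)
```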

3 State-Of-The-Art AI Technologies for COVID-19 Diagnosis

Physicians in many countries have adopted imaging techniques and test kits to obtain more accurate COVID-19 diagnoses and to determine the severity of the disease for each patient. Among imaging techniques, CT-scan imaging is more accurate and sensitive, and many researchers have worked on the automatic diagnosis of COVID-19 using CT-scan images. Some have also focused on estimating severity metrics for the patients. In addition to distinguishing infected from healthy individuals, some have also aimed to solve the more difficult problem of distinguishing COVID-19 from other lung diseases such as community-acquired pneumonia. Unfortunately, a fair comparison between published results is impossible due to the lack of benchmark datasets; most studies applied similar steps, with a limited number of design choices, to different datasets. Consequently, we discuss the existing challenges and the general methods these studies used to overcome them in the case of COVID-19 diagnosis. The details and performance measures of the selected studies are given in Table 3.

Table 3 Details and performance metrics of the SOTA studies in diagnosing COVID-19

Diagnosis with CT-scan images is not a straightforward problem. It involves many challenges, and researchers have used different methods to overcome them. For example, CT-scan samples may contain different numbers of slices, from 30 to 800, depending on slice thickness. This poses a significant challenge to the learning process. Some researchers have focused on high-resolution samples to keep sensitivity and accuracy high [50], while others have trained their models on large cohorts with different thicknesses to obtain a more generalized model [6, 51, 52].

COVID-19 CT-scan datasets usually carry only sample-level labels. This is because carefully annotating the individual slices associated with infection or showing disease features is time-consuming, and radiologists must focus on their primary tasks during a pandemic. Having many slices per sample while being trained only on sample-level labels is like looking for a needle in a haystack for a model that knows nothing about the disease. When we train a model to distinguish between groups of samples, the model tries to learn the groups' differences, and searching within a large sample makes the training process harder and slower. Moreover, a single sample of almost 200 slices can fill the memory of a common GPU with 12 GB of RAM, making training even more difficult.

Integrating human knowledge about the disease into the method, the model architecture, and the training process can significantly reduce the training challenges. It can also help the model learn the relevant differences that distinguish the groups of samples. For example, since we know COVID-19 affects the lungs, it is logical to search for the marks of the disease in the lung areas of CT-scan images. Many researchers have incorporated this knowledge in a pre-processing step. Some have used image processing techniques such as automatic thresholding to separate the lung areas from the images, while others have trained a network on their own annotated private dataset to detect the lungs [6, 50,51,52,53,54]. In this way, everything except the lung regions is removed, and the model skips the irrelevant areas during training.
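As an illustration of the thresholding-based pre-processing mentioned above, the sketch below builds a rough per-slice lung mask with Otsu thresholding using scikit-image; the morphology parameters and the synthetic slice are illustrative assumptions rather than any cited pipeline.

```python
import numpy as np
from skimage import filters, morphology
from skimage.segmentation import clear_border

def rough_lung_mask(ct_slice_hu: np.ndarray) -> np.ndarray:
    """Very rough lung mask for a single CT slice given in Hounsfield units."""
    # Lung tissue and air are much darker than soft tissue; Otsu picks the split automatically.
    binary = ct_slice_hu < filters.threshold_otsu(ct_slice_hu)
    # Drop the air surrounding the body (connected to the image border) and tiny specks.
    mask = clear_border(binary)
    mask = morphology.remove_small_objects(mask, min_size=500)
    # Close small holes (e.g., vessels) inside the lungs.
    return morphology.binary_closing(mask, morphology.disk(5))

# Placeholder slice; a real pipeline applies this per slice and feeds only the masked
# lung regions (or their bounding boxes) to the classification model.
fake_slice = np.random.normal(-700, 300, size=(512, 512))
print(rough_lung_mask(fake_slice).sum())
```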

Researchers have tried to ease this training challenge with different methods. Some have adopted a fully supervised approach, providing slice-level labels and training their model to classify slices rather than whole samples [51]. While very helpful for training, this solution suffers from being biased toward the annotator's errors. Even if the error were reduced by aggregating the annotations of multiple radiologists, there would still be the problem of slices lacking human-detectable signs, meaning the model would at most be able to distinguish what radiologists can distinguish. In practice, we have many unlabeled samples and only a limited number of labeled ones. Therefore, some researchers have adopted semi-supervised training methods, using a two-term loss function: one term for the sample-level prediction and another for the slice-level prediction. The first term is calculated over all samples of the training batch, while the second term focuses only on samples with slice-level labels. Here, the sample-level loss accounts for slices that lack human-observable marks; thus, for the annotated samples, negatively labeled slices belonging to a positive sample should not be included as healthy slices in the training process. Finally, many researchers have trained their models merely on sample-level labels and achieved notable results [6, 50, 52, 53].
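A hedged sketch of such a two-term objective, combining sample-level supervision with optional slice-level supervision, is shown below; the tensor shapes and masking convention are illustrative assumptions, not a reproduction of any cited implementation.

```python
import torch
from torch import nn

bce = nn.BCEWithLogitsLoss(reduction="none")

def two_term_loss(sample_logits, sample_labels,
                  slice_logits, slice_labels, slice_mask,
                  slice_weight: float = 0.5):
    """Combine sample-level and (partial) slice-level supervision.

    sample_logits: (B,)    one logit per CT sample
    sample_labels: (B,)    0/1 label per sample
    slice_logits:  (B, S)  one logit per slice
    slice_labels:  (B, S)  0/1 label per slice (meaningful only where slice_mask == 1)
    slice_mask:    (B, S)  1 where a radiologist provided a slice label, else 0
    """
    # Term 1: computed over every sample of the batch.
    sample_loss = bce(sample_logits, sample_labels.float()).mean()

    # Term 2: computed only on the slices that actually carry labels.
    per_slice = bce(slice_logits, slice_labels.float())
    denom = slice_mask.sum().clamp(min=1)
    slice_loss = (per_slice * slice_mask).sum() / denom

    return sample_loss + slice_weight * slice_loss

# Toy usage: batch of 2 samples with 4 slices each, only the first sample slice-annotated.
loss = two_term_loss(torch.randn(2), torch.tensor([1, 0]),
                     torch.randn(2, 4), torch.zeros(2, 4),
                     torch.tensor([[1, 1, 1, 1], [0, 0, 0, 0]]).float())
print(loss)
```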

COVID-19 manifests mainly as ground-glass opacities in the peripheral regions of the lungs, and some researchers [52, 55] have built their methods on this knowledge. First, they utilize existing models and tools for pulmonary lesion detection to localize the lesion regions [56]. Next, they use deep learning models to classify the type of each lesion and aggregate the per-lesion results to decide on the whole sample. Unfortunately, these methods may also miss samples that lack human-detectable marks.

Diagnosing COVID-19 from CT-scan images is inherently a 3D problem, because CT images are cuts through a third dimension, depth. Some researchers have treated CT scans as 3D data, extended 2D models to work with 3D inputs, and used sample-level labels to train the model [6, 53, 54]. These models use 3D convolutional kernels, demand much more GPU memory than 2D models during training, and require more computation in the evaluation phase. In addition, they are more sensitive to slice thickness, as convolutional networks are not invariant to scale. Others have worked purely with 2D models and classified slices rather than whole samples [51], or aggregated the features extracted from the slices before making the final decision using a pooling layer [50] or an LSTM network [6]; however, these approaches miss the information that neighboring slices can add to each slice. Other researchers have adopted a hybrid 2D-3D model to combine the advantages of both schemes, using 2D models to extract information from individual slices and adding the neighboring slices as extra channels of the input slice. Another problem with the 2D and hybrid methods is their requirement for slice-level labels in the training phase. Some have overcome this by aggregating the features extracted from the slices before the decision-making part of the network. To cope with limited GPU memory, they subsample uniformly from different parts of each CT-scan's sequence of slices; with uniform subsampling it is likely that at least one of the chosen slices captures the disease-related marks. Others have used a weakly supervised approach to dynamically guess which slices relate to the disease in each round of training [52]. Moreover, for the 2D and hybrid models, an additional post-processing step must be considered to reduce false positives; many have used Markov models to derive more reasonable slice probabilities based on the probabilities of neighboring slices.
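The pooling-based variant of slice-feature aggregation can be sketched as follows; the backbone, the number of subsampled slices, and the max-pooling aggregator are illustrative assumptions rather than a reproduction of any cited model.

```python
import torch
from torch import nn
from torchvision import models

class SliceAggregationNet(nn.Module):
    """2D backbone applied per slice, followed by max-pooling over slices."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep the 512-d slice features
        self.backbone = backbone
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (B, S, 3, H, W) -- S subsampled slices per CT sample
        b, s, c, h, w = volume.shape
        feats = self.backbone(volume.view(b * s, c, h, w))   # (B*S, 512)
        feats = feats.view(b, s, -1)
        pooled, _ = feats.max(dim=1)           # aggregate slice features per sample
        return self.classifier(pooled)         # one sample-level prediction

def subsample(slices: torch.Tensor, k: int = 32) -> torch.Tensor:
    """Uniformly subsample a variable-length slice stack to k slices."""
    idx = torch.linspace(0, slices.shape[0] - 1, k).long()
    return slices[idx]

model = SliceAggregationNet()
fake_ct = torch.randn(2, 32, 3, 224, 224)      # 2 placeholder samples, 32 slices each
print(model(fake_ct).shape)                    # torch.Size([2, 2])
```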

To train models that localize infections in addition to diagnosing COVID-19, pixel-level labels are required, which demands even more annotation effort from radiologists. Researchers have used fully supervised, semi-supervised, and weakly supervised schemes to train such models on CT-scan slices. There have also been weakly supervised schemes that use the interpretations of a model trained for sample-level or slice-level prediction to find the input areas that affect the output; these areas are then used as ground truth to train the infection detection model [57]. For detecting infections, fully supervised and semi-supervised methods can lead to more accurate results, whereas weakly supervised methods detect only approximate regions, due to the large number of pixels and the inexact labels used in training.

4 AI and Pandemic

This section addresses other areas where AI can help speed up processes in managing the pandemic.

4.1 AI for Status Prediction

Two different approaches have been taken to study COVID-19 mortality: predicting large-scale mortality and predicting the death of each individual according to their condition. The first approach is commonly used to predict mortality in a city or country. It uses the distribution of mortality rates over recent days and weeks to predict the near future. To achieve higher accuracy, parameters such as hospital facilities, human mobility, non-pharmaceutical interventions, demographics, historical air quality, and econometrics of the area can also be considered [58].

The second approach predicts the probability of death for each individual. Different models, such as neural networks, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forests, and Decision Trees, are compared to obtain the highest accuracy. For example, in our study, the neural network model achieved 89.98% accuracy in predicting the mortality of COVID-19 patients.
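A minimal scikit-learn sketch of such a model comparison is shown below; the synthetic tabular data stands in for real clinical variables (age, gender, blood biomarkers, and so on) and the hyperparameters are illustrative, so the printed accuracies carry no clinical meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for tabular clinical data; class 1 plays the role of "deceased".
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Neural network": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: accuracy = {acc:.3f}")
```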

In addition, the authors of [59] trained an AI model on a large dataset from hospitals in Ontario, covering nearly 70 thousand patients. Data were collected from the Public Health Ontario (PHO) and Public Health Agency of Canada (PHAC) datasets. The model achieved an Area Under the Curve (AUC) of 90% for the PHO dataset and 93.5% for the PHAC dataset.

4.2 Utilization of AI in Vaccine Discovery

To combat the COVID-19 pandemic, vaccination is the most effective strategy. Most SARS–CoV-2 vaccines in advanced clinical trials are based on modern vaccine approaches that rely on introducing the specific parts of the virus or their genes into the body to generate a targeted immune response. Thus, current methods have shifted away from live-attenuated and inactivated whole-pathogen vaccines to purified antigens and epitopes.

Vaccine development is complex and often requires extensive, time-consuming, and resource-intensive studies to determine efficacy and potential side effects. AI can help speed up the lengthy and costly process of vaccine development.

The vaccine design process has been revolutionized by reverse vaccinology, which finds potential vaccine candidates by analyzing a pathogen's protein-coding genome (proteome). SARS-CoV-2 consists of four structural proteins, E (envelope), M (membrane), N (nucleocapsid), and S (spike), as well as several non-structural proteins. AI approaches can facilitate antigen selection, epitope prediction, immune response modeling, and prediction of affinity with human leukocyte antigen alleles in the context of COVID-19, helping to select the best candidates. Since the S protein mediates viral entry and provokes an immune response, numerous epitope prediction studies have focused on it. BNT162b2, mRNA-1273, and AZD1222 are three recently approved vaccines against SARS-CoV-2, all of which are based on the S protein. Thus, AI approaches can identify specific epitopes, among a large number of potential SARS-CoV-2 peptides, capable of inducing a robust and protective immune response.

Another critical problem that could be solved by AI-based methods is predicting the immunogenicity of a developed vaccine. In the search for SARS-CoV-2 proteins associated with an optimal immune response, computational biology can identify protein-coding genes associated with COVID-19 severity. In addition, a cellular immune response network can be constructed using host-virus and virus-host interaction data.

One of the significant challenges in vaccine development lies in the mutations of SARS-CoV-2 strains. Therefore, for vaccine development, there is a need to test whether a selected epitope is conserved across mutations and multiple populations. Despite the development of several machine learning-based classifiers for allergenicity and toxicology, there is currently no method for predicting the toxicity of all vaccine components in combination, which computational network analysis could achieve [60]. Furthermore, the emergence of SARS-CoV-2 variants resistant to the approved vaccines is not impossible. Therefore, more robust and precise in-silico AI approaches should be developed to design better vaccines against new variants of the virus.

4.3 AI in Controlling the Pandemic

COVID-19, as a global health crisis, has forced healthcare providers to seek new technologies to monitor and control the spread of the pandemic. The extraordinary amount of data derived from public health surveillance, real-time epidemic outbreak monitoring, trend forecasting, regular situation briefings, and medical records must be managed to control and anticipate new diseases.

AI-based methods can track the spread of the virus in real-time, plan public health interventions and monitor their effectiveness. Indeed, the flexibility, rapid analysis and identification of patterns, ability to adapt based on a new understanding of the disease process, self-improvement as new data become available, and lack of human bias in the analysis make AI a promising new tool for pandemic management. Table 4 indicates some possible applications of AI in the control and management of the COVID-19 pandemic.

Table 4 Possible applications of artificial intelligence and big data for the management of the COVID-19 outbreak

Information obtained from patient tracking plays a vital role in general public health governance for designing, planning, and organizing the response to the pandemic [15]. Researchers in [18] have listed 36 countries that have successfully implemented mobile-app-based patient tracking systems. There are several ways to achieve this goal, described below.

4.3.1 Patient Tracking Using Mobile Apps

A successful example is the QR-code-based screening app used in Hubei, China, to monitor people’s movement. A similar approach has been used in Taiwan to track high-risk individuals based on their travel history to affected areas [61, 62]. Tracking information from Internet-based searches can also help to predict future outbreaks [61]. For example, using WeChat text data in the context of COVID-19 was another successful approach to predicting disease outbreaks in China [63]. In addition, by searching for “fever” and “cough” in Google Trends, the researchers discovered that these words had a significant association with the COVID-19 outbreak and subsequent hospitalizations or deaths [22].

4.3.2 Patient Tracking Using Video Surveillance

Using video surveillance to detect proximity and social distancing has advantages over approaches such as Bluetooth or GPS, which suffer from a high rate of false positives due to their low spatial resolution. Other studies apply deep-learning approaches to CCTV footage in the workplace to monitor workers' activities and detect violations [64, 65]. In another study, a facial touch detection system (which detects when someone unconsciously touches their face) was also developed to ensure security and collect more data for related applications [66].
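As a rough illustration of how such a vision pipeline can flag distancing violations, the hedged sketch below detects people with a pretrained Torchvision detector and compares pairwise centroid distances in the image plane; the detector choice, confidence threshold, and pixel-distance threshold are illustrative assumptions, and a real deployment would calibrate pixels to physical distance.

```python
import itertools
import torch
from torchvision.models.detection import (FasterRCNN_ResNet50_FPN_Weights,
                                           fasterrcnn_resnet50_fpn)

# Pretrained COCO detector (assumes torchvision >= 0.13); COCO class 1 is "person".
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()

def distancing_violations(frame: torch.Tensor, min_pixels: float = 150.0):
    """Return index pairs of detected people whose centroids are closer than min_pixels."""
    with torch.no_grad():
        out = detector([frame])[0]
    keep = (out["labels"] == 1) & (out["scores"] > 0.7)
    boxes = out["boxes"][keep]
    centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                           (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
    pairs = []
    for i, j in itertools.combinations(range(len(centers)), 2):
        if torch.dist(centers[i], centers[j]) < min_pixels:
            pairs.append((i, j))
    return pairs

# Placeholder frame (values in [0, 1]); a real system would read frames from a CCTV stream.
frame = torch.rand(3, 480, 640)
print(distancing_violations(frame))
```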

4.3.3 Patient Tracking Using Natural Language Processing (NLP)

An important application of machine learning in patient tracking is determining public opinion and society's perception of social distancing [67]. For example, a text classification study showed that public opinion plays a vital role in guiding relevant decisions, using sentiment analysis to gauge how well distancing policies have been received [14].
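A hedged sketch of such a sentiment classifier, using TF-IDF features and logistic regression from scikit-learn, is shown below; the handful of example posts and their labels are made up purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; a real study would use thousands of labeled social-media posts.
texts = [
    "staying home to protect others, happy to follow the guidance",
    "social distancing is working, case numbers are dropping",
    "these restrictions are pointless and are ruining small businesses",
    "tired of lockdowns, nobody respects the rules anyway",
]
labels = [1, 1, 0, 0]   # 1 = supportive of distancing measures, 0 = opposed

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(texts, labels)

# Predict the stance of a new, unseen post.
print(classifier.predict(["I think keeping distance in shops is a good idea"]))
```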

There are also challenges with current mobile apps [14]. For example, technical limitations include a shortage of highly skilled developers and companies to build and deploy the tracking systems. Different countries, such as the US, UK, and Australia, have arrived at different solutions to overcome this problem [18].

Unfortunately, there is not enough quantitative information about the contact tracing apps used in the above countries to compare their performance and other features comprehensively.

However, in [68], the authors presented the COVID-19 Tracing App Scale (COVIDTAS) framework to compare these apps. COVIDTAS was adapted from a framework in [69] and developed based on features such as usability, technology, privacy, tracking effectiveness, and factors reflecting user experience and sentiment.

4.4 Wearable Sensors Application in COVID-19 Pandemic

Another trending and rapidly developing approach is COVID-19 wearable devices. Unlike the classic techniques used in clinical settings, such as PCR tests or imaging modalities, these devices have continuous, around-the-clock access to the health records of potential COVID-19 subjects. At present, devices like smartwatches and wristbands record health data used for screening purposes. Moreover, wearable devices such as smart lenses, smart on-teeth sensors, smart masks, and smart biosensors are gaining more attention. However, potential concerns with these technologies, such as data privacy, must be considered [70, 71].

More recently, observational studies on wearable biosensors for remote monitoring of COVID-19 subjects with AI algorithms have shown promising performance in detecting COVID-19 patients. Un and colleagues designed an observational study to show the potential of wearable biosensors and AI in clinical monitoring. They showed that wearable biosensors combined with AI correlated highly with manual procedures in predicting clinical worsening events as well as prolonged hospitalization [72].

Additionally, there are other related works in the literature, including newly developed wearable devices and state-of-the-art AI algorithms that can predict the potential outcomes of the disease. As another example, a wireless skin-interfaced device attached to the suprasternal notch, designed by Fitbit, can sense multiple features like body movement, heart rate, and respiratory-related signals, as well as other signs and symptoms such as body temperature and cough. Therefore, by continuously fusing these features with AI techniques, one can classify or predict disease status (e.g., COVID-19 or not) more accurately [73].

5 Future Directions

5.1 The COVID-19 Pandemic Experience

Since the COVID-19 outbreak in late 2019, the disease has become a potential threat to global health. Facing a pandemic is a cross-sectoral effort; the sectors involved, including economic, social, cultural, environmental, and political factors, are collectively called the social determinants of health.

In a 2000 report, the World Health Organization (WHO) outlined the primary role of health systems in achieving their goals through a set of six fundamental building blocks: service delivery; health workforce; health information systems; medical products, vaccines, and technologies; health financing; and governance. Information and AI technologies that serve these six principles can increase the resilience of health systems.

Countries' health-system strategies play a crucial role in controlling pandemics. The COVID-19 pandemic exposed fundamental weaknesses in the ability of international organizations and governments to face a pandemic. For example, the WHO made basic mistakes in managing the COVID-19 pandemic: without sufficient information, it declared that the virus was not communicable between humans and considered mask use unnecessary for a long time. A year and a half into the pandemic, COVAX vaccine distribution policies still could not provide comprehensive vaccination plans. In the COVID-19 pandemic, medical advances in specialized fields did not help significantly. Unfortunately, at the beginning of the crisis, due to incorrect policy-making, the influx of patients into hospitals and emergency rooms spread the virus and increased mortality.

Since it is time-consuming to find effective vaccines and treatments at the beginning of a pandemic, the most important global actions are identifying and tracking patients and implementing smart distancing between citizens. The experience of countries that managed the COVID-19 pandemic successfully shows that most successful management activities are based on the correct and timely application of AI solutions. The first step in managing the information of any epidemic is to identify and record the information of the target population. Accurate identification and registration of patients in different disease states enables the following:

1. Identifying the exact number of cases and the actual geographical prevalence,
2. Preparing a contact-tracing map of infected people and carriers, with the possibility of predicting the future pattern of outbreaks,
3. Controlling the movement of the population to choose the best type of social/physical distancing policy, and
4. Anticipating care and treatment needs and resources, focusing on the efficient distribution of resources.

The following steps are suggested for implementing a control and management system:

1. Implementing a national web-based pandemic system,
2. Identifying the status of each individual: without symptoms, with suspicious symptoms, definitively infected, or convalescent,
3. Launching a status inquiry system,
4. Requiring citizens to carry pandemic IDs,
5. Electronic screening by registering geographical positions and using an AI algorithm to compute the risk of infection from proximity to patients and suspected cases,
6. Tracking patients through their phones and GPS,
7. Establishing an AI-based face tracking system to prevent patients from entering crowded areas,
8. Developing AI-based screening systems that recognize patients through their temperature or other characteristics,
9. Developing AI-based chatbots, and
10. Preparing a telemedicine system to check up on patients in remote locations.

We should adopt new data collection and analysis strategies using emerging technologies. For example, the Internet of Things (IoT) refers to the interconnected network of physical objects such as sensors, health-measuring devices, home appliances, and automotive devices. IoT enables objects to sense, process, and communicate with each other and to interact automatically with people, providing intelligent services to users. IoT platforms can also be used over cloud computing platforms to provide systematic and intelligent prevention and control of COVID-19, which includes five steps: symptom detection, quarantine monitoring, contact detection and social distancing, disease prognosis, and disease mutation tracking. If IoT, cloud, and AI are appropriately utilized, they can provide rapid and efficient healthcare services, especially in the context of COVID-19.

5.2 Toward a Universal Crowd-Sourcing and Validating Framework for AI Models

AI methods have shown that they can come in handy during pandemics, although they have not had a significant impact in the case of COVID-19 [74]. AI methods for diagnosis and screening were expected to emerge before the first COVID-19 peak, which means scientists had only a few months to train sophisticated models. As mentioned before, a large and varied dataset is crucial for generalizable models, yet no public dataset existed in that period of the COVID-19 pandemic. This led some scientists to search the literature hoping to find sufficient data, while others searched among care centers looking for data. However, sufficient positive cases cannot be found in a single medical center at the emergence of a pandemic. Besides, considerable paperwork is needed to satisfy data privacy requirements, which takes even more time. In other words, much time is spent gathering the required data that could instead be spent on designing and training a suitable model. These problems could be solved by a universal crowd-sourced public dataset. There might be a lack of samples in a small area, but there is undoubtedly enough data across the world. Many people would be willing to donate their samples and contribute to a universal public dataset to help scientists battle the pandemic. Such a framework should have strict confirmation policies to ensure data validity and trustworthiness. Other scientists, such as physicians and domain experts, could also use this worldwide public dataset to study the disease comprehensively.

Unfortunately, the urgency of a pandemic causes a paper storm, making it difficult and time-consuming to find state-of-the-art methods. Review papers ease this process by summarizing many studies; nevertheless, there is a trade-off between the coverage of ideas and faster release. For example, in screening and diagnosing COVID-19 using medical images, there are more than 2000 papers, yet no review paper has covered more than a few of these methods. Another problem is that not many experts trust papers on arXiv, and publication in a peer-reviewed journal or conference takes a long time. These problems could be solved by a universal public framework for sharing data, papers, and results in a structured way. For example, in the case of diagnosing COVID-19, it could be a table containing the date, the number of training samples, the number of test samples, the classification groups, the evaluation metrics on the private test set and on public datasets, and extra tags for the general methodology, such as fully supervised, semi-supervised, weakly supervised, or unsupervised. Such tabulated data could make searching the literature much easier and faster with the help of AI-based filters and sorting tools. Websites like https://paperswithcode.com/ have aimed at a similar goal by introducing datasets and grouping the studies.

The experience of the COVID-19 pandemic showed that the potential of AI methods was not widely exploited to combat the pandemic. One reason is that people cannot easily trust the reported results. The results cannot be trusted unless one can trust the data and run the proposed trained model on the same data. Some organizations, like Kaggle and DREAM Challenges, aim to solve global problems by hosting challenges and validating trained models on private test datasets. During a pandemic, such challenges can help to a great extent. Unfortunately, in the case of diagnosing COVID-19 using CT-scan images, there were no such challenges.

In conclusion, in the case of the COVID-19 pandemic, the absence of the proposed framework is evident. As a result, there was a delay in the emergence of sophisticated and trustworthy AI models, ideas were duplicated on different private datasets, and no fair comparison was possible, even though the studies could have been complementary and helped in the evolution of working solutions. The recent experience proved that the world was not prepared for a pandemic. Efficient adoption of emerging technologies such as IoT, cloud, and AI can significantly help control and manage pandemics.

6 Conclusion

This chapter briefly introduced AI, its strong potential, and its capability to make manual procedures faster and more accurate during pandemics. Considering these benefits, AI has a high potential to help in a pandemic, where hospitals are overloaded and health experts cannot respond in time. In recent years, AI models have shown outstanding performance in many health applications, especially during pandemics. Therefore, it is essential for the health staff who work with AI systems to know these models and how they operate, so that they neither overestimate nor underestimate them. We demonstrated how AI models are trained. We set up an experiment on actual data to show that models cannot be trained perfectly on arbitrary data and that a more precisely labeled dataset leads to much better training. We also showed that not every model that reports high performance can be confidently trusted. Moreover, we set up an extreme experiment to show how a model can become biased toward dataset-specific features, resulting in inflated performance on that dataset and low performance on other, more general datasets. In this regard, we introduced the concept of explainability of AI models. We showed how explaining decisions helps one understand and trust a model's findings, revealing whether the model's decision on a single case is reliable and whether the model behaves rationally in general or has become biased. When a model is biased toward unrelated features, it will not behave as expected. In conclusion, models need to be evaluated on a diverse set of test data representing the actual population's distribution so that experts can trust the reported performance metrics.

We described the general applications of AI in pandemics and the corresponding SOTA studies. We focused more deeply on the role of AI models in diagnosis and screening, two essential requirements when a pandemic breaks out. Several methods of diagnosis and screening were used during the COVID-19 pandemic. As stated before, evaluating models on large test sets can indicate how they would perform on the general population and, therefore, their applicability in large-scale situations. Consequently, we selected SOTA methods that meet this standard. Due to the high overlap between the methods, we described the general methodologies and their differences rather than describing each method extensively.

Finally, we discussed the experiences of the COVID-19 pandemic that prevented AI's full potential from being exploited. We argued that the lack of valuable data and the tremendous effort required to gather datasets were the major factors delaying the delivery of effective AI systems by the pandemic's peak. We also described the effect of the so-called paper storm as another delaying factor. The literature review shows that most studies have reported similar performance on different private datasets; however, their performance on general datasets is unknown. Therefore, creating an open dataset could facilitate the delivery of effective AI solutions in a pandemic. We identified the lack of trust in reported performance metrics as a reason prohibiting experts from using AI models. To solve the problems mentioned above, we proposed a unified framework for gathering data, effectively managing the paper storm, and reliably evaluating models, which could be more effective in the next pandemic.