1 Introduction

Devastating pandemics have plagued humanity throughout history. These include the Black Death, which was brought on by fleas on infected rodents and killed up to 150 million people, the Spanish Flu, which killed over 50 million people and infected 25% of the world’s population, and smallpox, which wiped out 90% of the native population of the Western Hemisphere and killed hundreds of millions of people in the twentieth century alone. In ten years, smallpox was completely eliminated worldwide.

Symptoms of the novel Coronavirus Disease (COVID-19) include fever, cough, dyspnea, and myalgia (muscle aches). The disease follows previous epidemics caused by highly transmissible recurring respiratory viruses [16]. Strict quarantine rules were implemented in an effort to control the COVID-19 outbreak, but the spread of the virus has been rapid. COVID-19 is caused by the SARS-CoV-2 virus, which was initially found in a public market in Wuhan, China, and developed into a worldwide public health emergency [24]. The name stands for Severe Acute Respiratory Syndrome Coronavirus 2, after the acute respiratory condition that the virus causes. A positive-sense RNA virus, SARS-CoV-2 is related to other lineage B betacoronaviruses and to SARS-like bat coronaviruses.

Due to the alarming number of infections and deaths, the World Health Organization (WHO) declared COVID-19 a pandemic in March 2020 [4]. According to the WHO report on COVID-19, as of 1st January 2022 about 306,163,226 people had been affected by COVID-19 and about 5,503,704 people had lost their lives. The virus is spread by respiratory droplets, transmitted through the air or via surfaces from one individual to another [30]. It survives for a long time on a suitable surface, generally from several hours to multiple days. COVID-19 was found to have a wide incubation period of between 3 and 14 days, with symptoms such as cough, fever, dyspnea, loss of taste, loss of smell, and diarrhea.

Researchers and public health laboratories have used multiple diagnostic methods to detect COVID-19. Direct tests identify an infection by detecting viral RNA, whereas indirect tests investigate whether a host has developed antibodies against the virus after exposure [12]. During a pandemic, diagnostic tests should be sensitive and accurate enough to support appropriate clinical decisions quickly. A few diagnostic methods have been approved by the WHO and the Food and Drug Administration (FDA), and some new methods are being approved under emergency procedures [19].

Analytical performance, throughput, batching capacity and turnaround time differ between these diagnostic methods. Beyond the equipment and the method itself, additional factors that can influence the results include the collection protocol, the reagents used, the risk of cross contamination, and the way samples and reagents are stored [33]. These factors must be taken into account when selecting an accurate and rapid diagnostic method in order to make an informed decision and take prompt public health action.

The reverse transcription polymerase chain reaction (RT-PCR) is a widely used diagnostic technique that can detect viral nucleic acids in specimens such as oropharyngeal swabs, nasopharyngeal swabs, bronchoalveolar lavages, and tracheal aspirates. Recent surveys indicate that the sensitivity of RT-PCR is not sufficient to detect COVID-19, which can possibly be attributed to an insufficient amount of viral material in the specimen and to poor quality and stability of the samples [1].

Multiple lines of research are being pursued to develop effective and speedy COVID-19 detection schemes. Reverse Transcription Polymerase Chain Reaction (RT-PCR) has been used worldwide as the gold standard method for detecting COVID-19. This method has been reported to be highly sensitive, with approximately 95% accuracy in detecting COVID-19.

As a result of the public health emergency, an unprecedented global campaign is needed to increase testing capacity. Because of the worldwide spread of the virus, there is a large demand for RT-PCR tests [39]. Long waits for results expose the limitations of this type of diagnosis on a large scale, such as the need for certified laboratories, trained staff, equipment, and reagents, for which demand can easily exceed supply. Italy, for instance, ran short of reagents and specialized laboratories and limited swab testing to those with symptoms of severe respiratory syndrome, resulting in an underestimation of the number of infected individuals [12]. A chest X-ray can play a very important, non-invasive role in the preliminary diagnosis of various pulmonary disorders.

In this paper we explore multiple COVID-19 detection methods and techniques and the use of modern machine learning and deep learning based technologies. We review COVID-19 detection methods and compare their analytical efficiency, sensitivity, limit of detection, affordability, ease of use and other such characteristics. We also present our perspective on the future development and clinical testing of detection techniques for the SARS-CoV-2 virus and discuss their limitations.

1.1 Motivation

Machine learning is one of the most actively researched fields today, with applications and results in numerous domains. The main goal of machine learning is to enable computers to learn and improve without being explicitly programmed. In recent years, the data boom and affordable computing power have made AI algorithms easier to implement. Artificial intelligence includes many subfields, including machine learning (ML), deep learning (DL), natural language processing (NLP), and computer vision (CV), which are used to detect patterns, explain, and predict based on data with accuracy similar to that of humans. Using machine learning, computer programs can access data and use it to learn on their own.

In addition to being classified as a public health emergency of international concern, the coronavirus disease was declared a pandemic on 11th March 2020. As a result of the COVID-19 outbreak, worldwide efforts are underway to develop a vaccine, and AI researchers are also exploring ways to combat COVID-19 [34]. ML and DL have the potential to mitigate the effects of the pandemic, although any restrictions on their use must also be identified. These technologies contribute to combating the pandemic in six areas: early warnings and alerts, tracking and prediction, data dashboards, diagnosis and prognosis, treatments and cures, and social control. The research questions presented in Table 2 motivated us to examine the potential of ML in dealing with the COVID-19 pandemic. The prowess of machine learning and deep learning as applied to healthcare intrigued us [17]. To our knowledge, no reputable survey discusses these techniques for all six areas. In the present pandemic, these techniques have proven to be both cost-effective and highly effective, and they are needed at this moment. They are evolving to deliver significant results in terms of efficacy and resiliency, and they can significantly alter the healthcare industry as a result [35]. With the advent of newer deep learning models, these AI techniques have advanced significantly. It is for this reason that we decided to write a comprehensive review on the use of ML and DL techniques to counter COVID-19.

Table 1 Related reference papers

1.2 Contributions

The main contributions of this paper are as follows:

  • Our review includes a comparison of existing state-of-the-art detection methods published by researchers worldwide with the proposed survey, as well as an assessment of the severity of the COVID-19 pandemic and its potential threats to social life.

  • Multiple ML and DL techniques that address the problems encountered during COVID-19 detection are presented, and their affordability, ease of use, and performance are discussed.

  • ML/DL integration and future research directions are discussed, as well as issues that remain unresolved.

  • We present a case study on ML and DL techniques used for COVID-19 detection.

1.3 Methods and materials

Our comprehensive study of COVID-19 detection techniques in this pandemic situation required a broad overview of ML and DL techniques. We therefore limited our search to peer-reviewed and reputable journals only. We used Google Scholar, IEEE Xplore, ScienceDirect, and Springer (Fig. 1) to find relevant research. Our search criteria include keywords and search strings such as “Detection techniques for COVID-19”, “Applications of Machine Learning in COVID”, “Deep Learning technologies for COVID-19”, and others [33]. For several review papers in which the search string was present in neither the title nor the abstract, we had to look for it manually. ML and DL have applications in multiple domains, so the search strings “ML for COVID” or “DL for COVID” usually returned papers that are not directly related to our survey. We searched for the strings “ML for COVID detection” or “COVID Detection using DL” to get relevant papers. Table 1 lists the related reference papers in this field, together with the points they cover and their drawbacks.

Fig. 1 Process of selecting reference papers

1.4 Structure of this survey

The paper is structured as follows (Fig. 2). Section 2 explains the characteristics of ML and DL techniques and their potential use in the COVID-19 pandemic. Section 3 presents a dataset analysis, which explains the datasets and methodologies used. Section 4 highlights the challenges, ease of use and accuracy of the detection techniques. Section 5 presents a research gap analysis, and Section 6 discusses some possible challenges and limitations. Section 7 concludes the paper.

Fig. 2 Structure of the survey

1.5 Research questions

The paper answers a number of research questions. Table 2 gives a brief overview of the questions and their answers.

Table 2 Research questions

2 Detection techniques

2.1 Laboratory tests

RT-PCR, the gold standard: Molecular detection methods identify viruses by analyzing the nucleic acids present in samples. Real-time reverse transcription polymerase chain reaction (RT-PCR) is the most common laboratory method for diagnosing COVID-19. SARS-CoV and MERS-CoV can also be diagnosed and monitored using RT-PCR. A variety of primers and probes are used to detect the SARS-CoV-2 virus using RT-PCR [31]. Depending on the PCR version, the results of RT-PCR tests can be obtained right away or within a couple of days.

SARS-CoV-2 can be detected with either a one-step or a two-step RT-PCR. The one-step method, in which reverse transcriptase (RT) and DNA polymerase are used together in a single reaction tube, is more efficient and is the preferred method for RT-PCR. In the two-step method, reverse transcription of the RNA is conducted in one tube, and DNA polymerization is conducted in a second tube [37]. RT-PCR machines can examine one sample at a time or hundreds, depending on the format of the assay.

RT-PCR results are influenced by several factors, including the sample collection, the primers and probes used, the fluorescence curve analysis, and accurate temperature control. The negative control is used to determine whether samples have been cross-contaminated, and the positive control is used to determine whether the primers, probes, and reagents are chemically compatible. In centralized laboratories, RT-PCR assays are typically run in 96-well plates, with batch reading of signals [31]. Recent research has demonstrated that a newly developed 384-well assay system needs as little as 5 microliters for the detection of viral genomes. With this high-throughput method, sensitivity and specificity were both 100%.

Numerous studies have noted that the clinical sensitivity of RT-PCR assays depends on the type of specimen, the amount of virus in the swab, and the time between the onset of symptoms and the collection of the sample. To increase the viral load collected, it makes sense to swab the nose and throat together. Based on the kinetics of SARS-CoV-2, lung specimens often show peak viral loads during the first week of illness and a decline during the subsequent weeks. Consequently, it is important to collect samples at the best possible time to enhance sensitivity. Significantly higher viral titers have been reported in saliva samples than in nasopharyngeal (NP) swabs and, even more important, longitudinally collected saliva samples showed less temporal variation in viral titer than NP swabs. A rigorous, large-scale study that estimates shedding dynamics and their correlation across the course of an infection is essential to estimate diagnostic sensitivity accurately.

2.2 Blood test using machine learning

ML model development involves four steps: imputation, data normalization, feature selection, and classification. The pipeline is implemented in Python with the numpy, pandas, and scikit-learn libraries. A multivariate k-nearest neighbors algorithm with k = 5 was used for imputation [3]. Recursive feature elimination was used for feature selection, with hyper-parameter optimization used to tune the selection. Several algorithms were evaluated for classification, namely Naive Bayes, Support Vector Machine, Random Forest, k-nearest neighbors (KNN), and Logistic Regression. These algorithms have been evaluated extensively, have been shown to achieve very high performance, and can be interpreted to some extent. All hyper-parameters were optimized automatically with a grid search over the hyper-parameters of each classification algorithm. To minimize the risk of over-fitting, a two-step procedure is used to select, train, and evaluate the model. First, the dataset is split into a training set and a testing set by stratified sampling. Then, the hyper-parameters are optimized on the training set using a 5-fold stratified cross-validation grid search with AUC as the reference measure. To ensure repeatability of the experiment, randomization was controlled throughout the model development process [9]. The AUC, Brier score, and accuracy of the calibrated models are subsequently evaluated.
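
The steps described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the data are synthetic (generated with make_classification as a stand-in for a table of blood-test values), only one of the listed classifiers (Random Forest) is tuned, and the parameter grid, the logistic regression used inside the feature elimination step, and the sigmoid calibration method are illustrative choices.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0  # fixed seed so that the run is repeatable

# Synthetic stand-in for a blood-test table (assumption, not real patient data).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           random_state=SEED)
rng = np.random.default_rng(SEED)
X[rng.random(X.shape) < 0.05] = np.nan  # remove ~5% of values to mimic missing results

# Step 1: stratified split into a training set and a testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED)

# Imputation -> normalization -> feature selection -> classification.
pipeline = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=6)),
    ("clf", RandomForestClassifier(random_state=SEED)),
])

# Step 2: 5-fold stratified cross-validated grid search with AUC as reference.
param_grid = {
    "select__n_features_to_select": [4, 8],
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [3, None],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

# Calibrate the best pipeline, then evaluate on the held-out testing set.
calibrated = CalibratedClassifierCV(search.best_estimator_, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
brier = brier_score_loss(y_test, proba)
acc = accuracy_score(y_test, (proba >= 0.5).astype(int))
print(f"best params: {search.best_params_}")
print(f"AUC={auc:.3f}  Brier={brier:.3f}  accuracy={acc:.3f}")
```

Keeping the imputer, scaler and feature selector inside the pipeline means that they are refitted on each training fold, so no information from the validation folds leaks into the preprocessing.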

A bootstrap-based approach is used for internal-external validation in order to assess the models’ ability to generalize to a new situation when provided with a limited amount of new data. In the external validation procedure, the best models found for the COVID-specific dataset are trained on the combined dataset, which also contains the IOG dataset; this tests both their specificity and their ability to identify potential suspect cases. Sensitivity and specificity are evaluated separately for symptomatic and asymptomatic patients using the combined COVID dataset and the IOG dataset. After the Suspect feature is removed, the models have to be retrained, and the retrained models are again evaluated on symptomatic and asymptomatic patients separately.
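
To make the evaluation measures concrete, the sketch below computes sensitivity and specificity from binary labels and attaches a percentile bootstrap interval to each. It is a simplified illustration rather than the internal-external validation procedure itself: the labels are made up, and the function names and the choice of a percentile interval are assumptions introduced here.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def bootstrap_interval(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for (sensitivity, specificity)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        stats[b] = sensitivity_specificity(y_true[idx], y_pred[idx])
    # Resamples without positives (or negatives) yield NaN and are ignored.
    lower, upper = np.nanpercentile(
        stats, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lower, upper


# Made-up labels: 1 = COVID-positive, 0 = COVID-negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
lower, upper = bootstrap_interval(y_true, y_pred)
print(f"sensitivity={sens:.2f} [{lower[0]:.2f}, {upper[0]:.2f}]")
print(f"specificity={spec:.2f} [{lower[1]:.2f}, {upper[1]:.2f}]")
```

Running the same two functions separately on the symptomatic and the asymptomatic subgroup would give the per-group figures described in the text.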

2.3 Artificial intelligence

AI refers to the ability of computers to simulate human intelligence, i.e., to act intelligently and solve problems in a similar way to humans. In short, AI is the process of allowing machines to mimic human intelligence. Using artificial intelligence, a computer can convert data from an external source into a machine-understandable format, learn from it, and continue learning until it achieves a specific task or goal by making flexible adaptations. AI tools such as machine learning and deep learning are quite popular. Applications of machine learning include chatbots and predictive text, language translation apps, Netflix recommendations, and the way social media posts are presented. It also powers autonomous vehicles and image-based medical diagnosis software.

ML utilizes a variety of algorithm types, including logistic regression, linear discriminant analysis, random forest, support vector machines, k-nearest neighbour classifiers, cluster analysis, and modern deep learning, reinforcement learning, decision trees, etc. In ML, different types of learning methods are used, such as supervised or unsupervised learning. However, a different type of learning method has recently been developed: semi-supervised learning. Healthcare makes extensive use of machine learning. In the area of bioinformatics it enables the solution of complex biological problems such as DNA binding prediction, RNA sequence analysis and prediction, amino acid sequence prediction and enhancer-promoter interaction (EPI) identification. A major application of ML to bioinformatics is in genomics, which utilizes ML tools and methods to obtain useful information. Gene finding technique is the most promising application of ML to bioinformatics. By using clustering algorithms of ML, it is possible to classify DNA in a detailed manner by identifying groups of individuals having similar types of genes or to what extent they possess a certain gene.

Cancer can be defined as any unusual growth of cells in the body. More than 100 types of cancer are known to medical research today. Early detection of cancer can help to prevent a decline in the patient’s health and save many lives. To detect cancer, ML algorithms such as Bayesian networks, neural trees, and radial basis functions (RBFs) can be used to analyze gene expression in patient samples. Epilepsy is a condition characterized by recurrent seizures that occur without prior warning. The result is temporary convulsions of the whole body, as well as loss of concentration and judgment. Frequent seizures may cause physical injuries and may even result in death. Based on the patient’s medical condition, ML techniques can be used to develop detectors capable of detecting seizure onset rapidly and accurately using electroencephalograms (EEG). EEGs are non-invasive measures of brain activity.

CNN models can be used for neuroprosthesis control and for decoding movement intention with the help of kinematic and EMG data. Limb movements and robotic arm movements can also be estimated with the help of RNNs. Automated image analysis tools that use ML algorithms have the capacity to improve the quality of image analysis. High blood sugar, which is caused by diabetes mellitus, has two major causes: improper production of insulin by the pancreas and an improper response of body tissues to the insulin produced. Automated detection of diabetes using CNN and LSTM can be far better than manual detection, since it can give better accuracy. AI techniques can have a great impact on COVID-19 outbreaks if they are used to help with early diagnosis based on X-rays, computed tomography (CT), and ultrasound (US).

2.4 Neural networks

Deep learning, like other forms of machine learning and artificial intelligence, resembles the way humans gain knowledge. In addition to statistics and predictive modeling, data science is heavily reliant on deep learning. In a CNN, each layer uses a differentiable function to transform one volume of activations into another [20]. The layers together transform the input image volume into output class scores. CNNs are made up of several layers, including the convolutional layer, the pooling layer, the nonlinearity layer, and the fully connected layer. Basically, deep learning is a method for classifying images, texts, and sounds with the help of a computer model. The accuracy of deep learning models is sometimes better than human performance. To train these models, neural network architectures with many layers are used together with a large amount of labeled data.
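
The way these layers turn an input image into class scores can be illustrated with a tiny forward pass written in plain NumPy. This is a sketch for intuition only, assuming a single-channel 8-by-8 input, one 3-by-3 filter and random, untrained weights; a real CNN learns its filters and uses many channels and layers.

```python
import numpy as np


def conv2d(image, kernel):
    """Convolutional layer: slide one kernel over the image (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Nonlinearity layer: keep positive activations, zero out the rest."""
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Pooling layer: keep the maximum of each size-by-size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))


def softmax(z):
    """Turn raw scores into class probabilities that sum to one."""
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
image = rng.random((8, 8))               # stand-in for a grayscale image
kernel = rng.standard_normal((3, 3))     # one (untrained) convolution filter
weights = rng.standard_normal((2, 9))    # fully connected layer: 9 inputs, 2 classes
bias = np.zeros(2)

features = max_pool(relu(conv2d(image, kernel)))      # 8x8 -> 6x6 -> 3x3
scores = softmax(weights @ features.ravel() + bias)   # class scores
print(features.shape, scores)
```

Every function in this chain is differentiable almost everywhere, which is what allows the filter and the fully connected weights to be learned by backpropagation.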

The most popular type of deep neural network is the convolutional neural network (CNN) [26]. The combination of convolutional layers and learned features makes CNNs ideal for processing 2D data, such as images. With CNNs, features are extracted automatically rather than manually, directly from the images, which reduces the number of features needed to classify images. Deep learning models can perform computer vision tasks such as object classification with high accuracy, since the relevant features are extracted automatically as the models train on a collection of images.

The scanning of human organs in hospitals is made easier and faster by X-ray machines. It is normally the job of an expert radiologist to interpret X-ray images. Training deep learning models on X-ray images captured from COVID-19 patients will be a great help to medical experts in identifying the disease. This will be particularly helpful for developing countries, where X-ray facilities are readily available but access to an expert is still elusive. In accordance with that, one of our goals is to develop a deep neural network dubbed nCOVnet, which is capable of analyzing X-ray images of lungs and determining whether a person is infected [16]. Because of their capability of processing visual information, CNNs can be applied to the detection of COVID-19; they have already been used successfully to diagnose retinopathy, pneumonia, cardiomegaly, and several types of cancer.

3 Related technologies

3.1 Machine learning

  • CNN - DL classifiers for images and videos most commonly employ convolutional neural networks (CNNs). The layers of a CNN perform various tasks, including reducing the dimensionality of the data and vectorizing it.

  • RNN - An RNN is an updated and specifically altered version of a feedforward NN. As opposed to feedforward networks, in which each layer relies only on the output of the previous layer, this network is recurrent: its output at one step is fed back as an input to the next step.

  • K-nearest neighbor (KNN) - Among the most popular neighborhood classifiers, this algorithm assumes that items belonging to the same class are close to each other. Prediction and classification are among the possible applications of the algorithm [18]. The question of choosing an optimal value of k remains to be answered.

  • Decision Trees (DT) - In the statistical model of DT, a tree-like structure represents the dataset. All options are considered in the model, and the decision path can be tracked [4]. However, overfitting can occur when DT is used on large datasets.

  • Linear Regression - The model’s primary goal is to minimize prediction errors on testing data. It is useful for representing relationships between dependent and independent variables. Simple linear regression involves one independent variable, whereas multiple linear regression involves more than one. However, only linear relationships can be modeled.

  • Logistic regression - This method is used when the dependent variable is categorical. The predicted output is a probability in the range from 0 to 1. Logistic regression (LR) may be formulated as binomial, multinomial, or ordinal. For example, LR can be used to predict whether an email is spam: the output is binary, so linear regression, whose output is unbounded, fails in this case. The technique depends on a categorical dependent variable and does not work in its absence.
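
The open question of choosing k noted in the KNN item above is often handled empirically by cross-validation. The sketch below shows one way to do this with scikit-learn; the synthetic data, the candidate values of k and the use of accuracy as the score are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data standing in for a real clinical table (assumption).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Score each candidate k by its mean cross-validated accuracy.
scores = {}
for k in (1, 3, 5, 7, 9, 11):
    # KNN is distance based, so features are standardized first.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()

best_k = max(scores, key=scores.get)
print(scores)
print("selected k:", best_k)
```

The selected value depends on the dataset, so the search has to be repeated for each new problem.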

3.2 Deep learning

  • CNN - A Convolutional Neural Network (CNN) is a Deep Learning algorithm. It takes in an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and is able to distinguish one from the other. Compared to other classification algorithms, convolutional networks require much less preprocessing [24]. While primitive techniques rely on hand-engineered filters, convolutional networks are able to learn these filters when trained sufficiently.

  • ResNet50 - It is a convolutional neural network consisting of 50 layers. A version of the network pretrained on the ImageNet database is available. Having been trained on 224-by-224 images, this network can represent a wide range of images with rich features. ResNets are a subclass of convolutional neural networks that are most commonly used for the classification of images. Their primary innovation is skip connections [2]. It is well known that deep networks often suffer from vanishing gradients, which means that as the error is backpropagated through the model, the gradient gets smaller and smaller. Tiny gradients can render learning infeasible. A skip connection adds the input of a block to its output, which gives the gradient a direct path through the network.

  • VGG19 - VGG19 is a 19-layer convolutional neural network. A version of the network pretrained on the ImageNet database can be loaded. Its input size is 224 by 224. The pretrained network has learned rich feature representations for a wide range of image types, such as keyboards, mice, pencils, and many animals.

  • Federated Learning - The term Federated Learning refers to a decentralized or distributed form of Machine Learning. Models are trained locally on the data generated by devices such as mobile phones and laptops, and the results are then consolidated on a centralized server. This strategy differs from traditional centralized machine learning techniques, in which all data are uploaded to one server, and from more classical decentralized approaches, which often assume that local data samples are identically distributed.

  • Transfer Learning - A machine can reuse a pre-trained model to learn about a new problem by applying the knowledge acquired from a previous task. Transfer learning increases accuracy by using the knowledge of a previous task [25]. Through transfer learning, an ML model that has already been trained is transferred to a new, though closely related, problem. Transfer learning has a number of advantages, including the fact that training is faster, neural networks perform better (most of the time), and there is no need for a large amount of training data.
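
The effect of the skip connections mentioned in the ResNet50 item can be seen in a toy numerical experiment. The sketch below compares the gradient of a 50-layer stack with and without skip connections; the scalar tanh layers, the weight value of 0.01 and the finite-difference gradient are simplifying assumptions chosen for illustration and do not reproduce the ResNet architecture.

```python
import numpy as np


def plain_stack(x, weights):
    """Plain deep stack: each layer fully replaces its input."""
    for w in weights:
        x = np.tanh(w * x)
    return x


def residual_stack(x, weights):
    """Residual stack: each layer adds its output to its input (skip connection)."""
    for w in weights:
        x = x + np.tanh(w * x)
    return x


def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


weights = np.full(50, 0.01)  # 50 layers with small weights

plain_grad = numerical_gradient(lambda x: plain_stack(x, weights), 1.0)
residual_grad = numerical_gradient(lambda x: residual_stack(x, weights), 1.0)

print("gradient through plain stack:   ", plain_grad)
print("gradient through residual stack:", residual_grad)
```

In the plain stack the gradient is a product of 50 small factors and shrinks towards zero, whereas in the residual stack every factor has the form one plus a small term, so the gradient stays at a usable magnitude.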

4 Critical analysis

4.1 Dataset analysis

In this section, we describe the datasets and methodologies used to predict COVID-19. Data analysis involves examining information to draw conclusions from it, in order to make informed decisions or expand one’s knowledge of the subject. The datasets for the different COVID-19 detection techniques differ according to their technological requirements. A dataset analysis helps future researchers decide which dataset suits their training method. This saves time and gives researchers a larger range of datasets to choose from, including datasets that contain not only the main distinguishing factor but also supporting data. Incorporating such supporting data gives the model more features to be trained on, which helps increase the accuracy and sensitivity of the models and prevents false positives. There was an immediate need for large COVID-positive datasets so that researchers could apply different algorithms and techniques to diagnose the disease faster and more accurately. Due to the pandemic and its restrictions, such large datasets were not available at first; they became available only much later, in 2021, as more voluntary trials were held [11]. Researchers therefore had to resort to other methods to increase the amount of training and testing data needed to improve the accuracy of models for COVID-19 diagnosis. For X-ray datasets they used data augmentation techniques such as flipping the image, rotating it by some degree, or adding a small amount of distortion; these techniques multiply the dataset size. For blood test and RT-PCR data, synthetic data creation was used to generate a larger dataset.
Researchers have curated datasets from hospital patient data and used the resulting images to train and test their models. Some research teams had certified radiologists curate their datasets in order to build robust and reliable models for analyzing received images; training a model on such datasets increases its accuracy and sensitivity. Because reliable datasets were in short supply, researchers also used online databases hosted on sites such as Kaggle. The dataset analysis is divided into three categories: RT-PCR datasets, X-ray datasets and blood test datasets.
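
The augmentation operations mentioned above (flipping, rotating by a small angle, and adding a small distortion) can be sketched as follows. This is an illustrative example using NumPy and SciPy on a random array that stands in for a grayscale X-ray with pixel values in [0, 1]; the function name, the 10-degree angle and the noise level are assumptions, not values taken from the reviewed papers.

```python
import numpy as np
from scipy import ndimage


def augment(image, rng, angle=10.0, noise_std=0.02):
    """Return three augmented variants of a grayscale image scaled to [0, 1]."""
    flipped = np.fliplr(image)  # horizontal flip
    # Rotate by a small angle, keep the original shape, fill borders with edge values.
    rotated = ndimage.rotate(image, angle, reshape=False, mode="nearest")
    rotated = np.clip(rotated, 0.0, 1.0)  # spline interpolation can overshoot
    # Small distortion: additive Gaussian noise.
    noisy = np.clip(image + rng.normal(0.0, noise_std, image.shape), 0.0, 1.0)
    return [flipped, rotated, noisy]


rng = np.random.default_rng(0)
image = rng.random((16, 16))  # stand-in for a real chest X-ray (assumption)

variants = augment(image, rng)
dataset = [image] + variants  # one original image becomes four training samples
print(len(dataset), [v.shape for v in dataset])
```

Applied to every image, these three operations increase the dataset size fourfold, which is the multiplying effect described in the text.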

RT-PCR is the gold standard and the most commonly used method for detecting COVID-19. It is also used in the diagnosis of other viral infections and diseases, such as SARS-CoV and MERS-CoV. The RT-PCR method for testing COVID-19 involves collecting a sample from a patient in the form of swabs taken from the nasal cavity or oropharynx, and dispersing it in the medium. RT-PCR results are influenced by the sample collection, the probes and primers used, the use of appropriate controls, the fluorescence curve analysis, and the reliability of the temperature control. The RT-PCR method of detection has numerous drawbacks. Due to limited supply, test kits cannot meet the mounting demand, and the tests themselves take 1-2 days, which aggravates the spread of COVID-19.

To overcome these limitations, a machine learning model is used. It is trained using RT-PCR test data from both COVID-positive and COVID-negative patients. Patients with one or more symptoms, as well as those with no symptoms, are included in the dataset. Multiple other patient features, such as pulse, age and temperature, are also used, and health conditions and symptoms such as diabetes, cancer, asthma, fatigue, muscle pain, cough and shortness of breath are taken into account. Due to the havoc caused by the pandemic, the dataset is small and downsampling is not possible. In this scenario, downsampling would reduce the amount of data considerably, leaving too little data and losing critical COVID information.

The dataset is used to train and test the COVID detection model. These models offer highly accurate results in a very short time. As seen in Fig. 3, the availability of COVID data is very limited. The AI-based models are much faster than RT-PCR and have high accuracy. They can be refined much further if trained on a much larger dataset, and hence need to be applied to a large database before clinical trials.

Fig. 3 Comparative analysis of dataset size of RT-PCR related papers

Currently, blood tests are used to detect whether a person was infected by the COVID-19 virus in the past. Antibody tests and serology tests check the patient’s blood for antibodies that might be present from a past infection with the COVID-19 virus. It can take up to 3 weeks after the infection for the antibodies to become detectable in the blood. Antibodies can also be produced after vaccination, which would mean that the body has immunity to the virus. These tests, though, are not 100% accurate.

Blood tests are also done after COVID-19 has been detected in a person, to track the health of the patient. Severely infected patients have higher plasma levels of IL-6, platelets and D-dimer [1]. Elevated levels of LDH, ferritin and CRP, high levels of neutrophils (neutrophilia), low levels of lymphocytes (lymphopenia) and elevated serum AST and ALT levels are associated with greater disease severity, and such patients need immediate hospitalization. Elevated creatinine, ALT and AST levels might suggest severe cases of COVID-19 and might indicate an increased risk of impaired liver and kidney function [8]. Myoglobin, cardiac troponin and CRP were significantly increased in cases with mortality. An increase in CRP and a decrease in albumin can also be correlated with disease progression. Refractory patients had higher levels of LDH, CRP, NC and platelet count than general patients. Refractory patients are those who showed lung abnormalities indicating progression of the infection.

All the above-mentioned indicators were used to train the models in the papers mentioned in the figure above. The models trained on the blood test datasets analyzed all the parameters and predicted outputs based on the training dataset. These models can be very helpful not just for detection but also for monitoring the infection and the progression of the virus in the patient. Artificial intelligence is fast and can compare many more parameters at a time than humans can. This could help save patients’ lives and also reduce the workload of doctors and nurses. Multiple patient diagnoses can also be handled at the same time. This in turn could help with mass detection of COVID-19 in places where individual detection is difficult.

Restrictions imposed during the pandemic meant that large datasets were not available. The datasets used in the papers shown in Fig. 4 contained few COVID-positive samples, so the authors used synthetic data to train their models better and increase model accuracy. Large datasets became available only later in 2021, when more voluntary tests were taken for research purposes, so the datasets used in these papers lacked authenticity. Because the datasets were synthesized, results based on them cannot be relied upon.

Fig. 4 Comparative analysis of dataset size of blood test related papers

COVID-19 prediction using deep learning on X-ray data is the quickest way to detect COVID-19 in patients. A chest X-ray can be taken and the results obtained within 15 minutes of imaging [21]. This allows hospitals and healthcare workers to stay on top of the situation, as the diagnosis of patients can be sped up considerably using these techniques. Pneumonia and other diseases affecting the lungs have previously been diagnosed using the same deep learning analysis of chest X-ray images. Transfer learning can be used to reduce the training time of the networks and to cope with the lack of large, reliable datasets. Many of the papers use different image augmentation techniques to increase the size of the training and testing datasets. Data augmentation helps to increase the accuracy of the model being trained when reliable data is not available in abundance.
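As an illustration of the image augmentation idea described above, the following is a minimal sketch in plain Python. It is not taken from any of the reviewed papers: the function names (`shift_right`, `scale_brightness`, `augment`), the two chosen transformations, and the parameter ranges are all assumptions made for the example, and an image is represented simply as a list of rows of grayscale pixel values.

```python
import random


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    shifted = []
    for row in image:
        k = min(max(pixels, 0), len(row))  # clamp the shift to the row width
        shifted.append([fill] * k + row[:len(row) - k])
    return shifted


def scale_brightness(image, factor, max_value=255):
    """Multiply every pixel by `factor`, clipping to the range [0, max_value]."""
    return [[min(max_value, max(0, round(p * factor))) for p in row]
            for row in image]


def augment(image, n_variants, seed=0):
    """Create `n_variants` randomly shifted and brightness-scaled copies."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    variants = []
    for _ in range(n_variants):
        shifted = shift_right(image, rng.randint(0, 2))       # assumed range
        variants.append(scale_brightness(shifted, rng.uniform(0.8, 1.2)))
    return variants
```

Each original image yields several slightly different copies that keep the same label, which enlarges the dataset without new patients. In common practice, augmented copies are generated from the training split only, so that the test set still measures performance on unmodified images.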

Models trained on X-ray datasets have a very high accuracy rate but require quite sophisticated input data and take longer to train. The mini COVID-Net paper [30] presents a point-of-care deep learning algorithm that does not require a long training time. This can significantly reduce the time needed to diagnose a patient and get them appropriate help, and can help prioritize the more severe patients. This type of point-of-care testing can also help regions battling different strains of the same virus: training the models on data from patients affected by the same strain would further improve their accuracy.

In the X-ray dataset analysis we review the sizes of the datasets used by various deep learning papers. Fig. 5 shows that the models in these papers were trained on limited datasets, which does not give organizations and governments the confidence to implement these detection techniques on a large scale in hospitals and testing centers.

Fig. 5 Comparative analysis of dataset size of X-Ray related papers

These techniques can be used in the future to target other viruses, or other diseases with similar testing conditions or symptoms. This can lead to better management of future pandemics and fewer deaths.

The RT-PCR and blood test datasets have been compiled in Table 3. This collection helps researchers find newer datasets and train their models on a larger and more diverse database, which improves generalization and consequently increases the sensitivity and specificity of the model. It also saves researchers the time of searching for different datasets with parameters similar to those of their model.
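For readers less familiar with the two metrics named above, the following is a minimal Python sketch of how sensitivity and specificity are computed from binary labels. The function names and the toy labels are illustrative assumptions, not data from any dataset in Table 3; label 1 denotes COVID-positive and 0 denotes COVID-negative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of truly positive cases that the model detects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of truly negative cases that the model clears."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Toy example (made-up labels): 3 positive and 5 negative patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp))
```

In this toy example one positive patient is missed and one negative patient is flagged, so sensitivity is 2/3 and specificity is 4/5.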

Table 3 Collection of RTPCR and blood test datasets

Table 4 lists the various models trained using X-ray and LUS images and compares their accuracy and other parameters. It also lists the limitations of each referenced paper so that newer researchers can avoid the same mistakes.

Table 4 AI models using X-Ray datasets

4.2 Ease of use/affordability

RT-PCR is the current gold standard for COVID-19 detection and offers highly accurate detection of SARS-CoV-2. It requires medically trained personnel to use the testing kits, and results are returned in 2 to 48 hours. The availability of testing kits is a major issue for RT-PCR, especially in developing and under-developed countries. Mass detection is not possible, since only one test can be processed at a time. During a major outbreak there is a dire shortage of testing kits and, with the long turnaround times, patients are left vulnerable. Alternative means of detecting COVID-19 are therefore required during such emergencies. These shortcomings of RT-PCR pave the way for ML- and DL-based models as a cheaper and quicker solution.

Chest X-rays and CT scans of the lungs are used to detect pneumonia caused by COVID-19, which can help in the diagnosis of COVID-19 [6]. This approach has proved very useful in countries facing an imminent shortage of testing kits. X-ray and CT scan machines are widely available in hospitals across the globe, which facilitates this method of detection. Medical personnel are required to analyze the image and diagnose the disease [13]. Diagnoses made by personnel vary between individuals and therefore lack consistency. In addition, mass detection is not possible for human readers, and this method has high logistical requirements in terms of machinery. Machine learning and deep learning are used to tackle these problems, since they offer a cheap and fast detection technique.

Blood test based ML models use multiple blood test parameters such as WBC, LDH, CRP, platelets and D-dimer. These parameters are used to train the model to analyze and detect COVID-19, and they also help in assessing the severity of the infection in the patient. X-ray based ML and DL models are also used, and these are trained using image processing. These models are more accurate and compensate for human error, offering a very fast and highly accurate technique. Once trained, a model can be used by anyone with basic computer skills. Mass detection is possible because the images can be fed into the model and the results displayed in a very short time.

4.3 Analysis of detection techniques

This section presents a comprehensive study of different AI-based papers, comparing them on different aspects of their research and analysis. The accuracy analysis in Tables 5 and 6 lets researchers referring to this paper see directly how different optimization and loss-reduction techniques are used and how they affect accuracy and sensitivity, which gives them an idea of how to implement the same techniques in their own research.

Table 5 Analysis of detection techniques -1
Table 6 Analysis of detection techniques -2

5 Research gap analysis

Absence of uniformity across AI-based COVID detection algorithms is a significant research gap. AI can be applied in a variety of ways, such as image analysis, machine learning methods, and natural language processing, to detect COVID. Nonetheless, it can be challenging to compare and rank various approaches in the absence of a consistent methodology. A review of the current literature could look for any gaps or contradictions in the techniques applied in earlier investigations.

The lack of data for the development and testing of AI-based COVID detection algorithms represents another research gap. Numerous previous studies have relied on small datasets or on data that might not accurately reflect the population as a whole. As a result, the performance of AI-based COVID detection systems may be overestimated or underestimated. To find gaps or limitations in the data used in earlier studies, a review could examine the data sources that are now available.

The trained models have an accuracy of 85-90%, which is lower than that of the RT-PCR test currently used for COVID-19 detection, so model accuracy must increase before countries can adopt these models as an alternative to RT-PCR. In addition, the requirements for training and running the models are very specific, so not every hospital may be able to meet those specifications, which may make implementation impractical.

The ML models presented in the paper have lower accuracy than RT-PCR, the gold standard for COVID-19 detection. More research and better models are required before they can replace the method currently used. The study uses only X-ray images as its database; other medical inputs are not utilized.

There is no evidence of the efficiency of AI-based COVID detection systems in real-world contexts, despite the fact that many of them have been created and tested in lab settings. Any gaps in the evaluation of AI-based COVID detection methods in actual environments, such as clinical trials or community-based testing programmes, could be found by reviewing the body of existing literature.

Lastly, the fact that many AI-based COVID detection approaches lack transparency represents a significant research gap. Certain AI-based systems may employ complicated algorithms or models that are challenging to comprehend or interpret. As a result, evaluating the precision and dependability of these systems may be difficult. A review could examine the existing research on the application of AI-based COVID detection approaches in clinical practice to find any gaps in transparency.

6 Challenges

AI is a rapidly developing field and new problems arise every day. Researchers have to find new and innovative ways to work around these problems and build reliable and robust models. The most common challenge in training AI models is that they require a large training dataset to increase their accuracy, sensitivity and specificity. Owing to the pandemic there was a shortage of properly labeled and reliable datasets, which led researchers to train their models on smaller datasets and to use data augmentation techniques to increase dataset size, but these techniques can leave the model lacking in generalization.

The models used are very memory intensive and can be further optimized by tuning the hyperparameters. The training datasets could be larger: because few public X-ray and CT scan images were available, a larger database of COVID-19 patients is necessary to increase accuracy and avoid overfitting. Unbalanced datasets have the drawback of showing misleadingly high accuracy. The network design could be improved in order to enhance the sensitivity of COVID-19 detection. Since there are fewer COVID-19 chest X-ray images available, training deep learning models from scratch becomes more challenging [32]. Lighter models have lower accuracy than heavier CNN methods and would not by themselves constitute a conclusive test. A more robust system would integrate other features in addition to the image data, such as body temperature and details of chronic conditions such as diabetes and blood pressure. With a larger dataset, additional layers can be added to make the CNN architecture more complex and improve its ability to learn highly abstract features, thus reducing the likelihood of overfitting [27]. There is no clinical study behind the suggested methods. They therefore cannot replace medical diagnosis, and a more detailed investigation with a larger dataset is needed.
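The effect of an unbalanced dataset on reported accuracy can be illustrated with a small worked example. The Python sketch below uses made-up labels (95 negative and 5 positive cases, an assumption chosen purely for illustration) and a trivial classifier that always predicts "negative". Balanced accuracy, the mean of the per-class recalls, is shown as one commonly used alternative; it is not a metric that the reviewed papers are known to report.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall_for_class(y_true, y_pred, label):
    """Fraction of samples of class `label` that were predicted as `label`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds:
        return 0.0
    return sum(1 for p in preds if p == label) / len(preds)


def balanced_accuracy(y_true, y_pred, labels=(0, 1)):
    """Mean of per-class recall, so each class counts equally."""
    return sum(recall_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical unbalanced test set: 95 negative cases, 5 positive cases.
y_true = [0] * 95 + [1] * 5
# A useless model that labels every patient as negative.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The model detects no positive case at all, yet its plain accuracy is 95% because the negative class dominates. Balanced accuracy falls to 50%, which is what random guessing would achieve on a two-class problem.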

Using an unbalanced database tends to inflate the reported accuracy. The radiographic responses to different abnormalities such as tuberculosis, pneumonia and influenza are complex, which confuses the classifier and limits the system's diagnostic accuracy unless the model is adequately trained. Massive amounts of training data must be acquired to satisfy the model. Synthetic data was created to supply more COVID-positive patient records, and this data might not be accurate. The model was designed on a CPU, and applying it on a large scale will require a faster processing system. COVID-19 is currently diagnosed using RT-PCR, which is the gold standard for diagnostics [29]. The ML-based models achieve performance that approaches, but remains inferior to, that of RT-PCR. When severe data heterogeneity exists among clients, the standard configuration of federated learning introduces a high communication cost and cannot guarantee model performance. Severe data heterogeneity can occur in AI-based COVID identification because of variations in patient demographics, disease severity, and comorbidities, among other things. For instance, because of differences in patient population, testing procedures, and healthcare infrastructure, a machine learning model trained on data from one hospital or nation may not generalise well to data from another hospital or country. Sample sizes are often small, samples are frequently heterogeneous, and performance metrics may differ. As a result, clinicians have difficulty determining which algorithm will be most effective for a particular patient. Since the datasets used are very small, model reliability is in doubt. A larger dataset will be required to analyze and implement the models on a larger scale.
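To make the federated learning setting mentioned above more concrete, the following is a minimal Python sketch of the aggregation step in federated averaging, in which a server combines model parameters from several clients (for example, hospitals) weighted by the number of samples each client holds. The function name, the flat list representation of parameters, and the numbers are assumptions made for illustration; the sketch does not reproduce the configuration of any reviewed paper and omits local training and communication entirely.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of training samples held by each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals: one with 100 samples, one with 300.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_weights)  # [2.5, 3.5]
```

Because the average is weighted by dataset size, a large client pulls the shared model toward its own data distribution. When clients differ strongly in patient population or imaging equipment, this is one way the heterogeneity described above can degrade performance for smaller sites.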

7 Conclusion and future scope

AI-based detection techniques are the future of the medical industry. They facilitate the use of newer imaging techniques to detect abnormalities in patient data, and they can incorporate factors other than the key parameters, which further improves their accuracy. In this paper we have explored the various COVID-19 detection techniques, their advantages and drawbacks, the factors that limited them, and how they can be further improved using different datasets and parameters. We took inspiration from the similar review papers listed in Table 1 and used them to create a more analytical review that avoids repeating the same mistakes. With ML and DL, we can effectively combat the effects of the coronavirus. These technologies provide early detection and early warnings, and their high-accuracy predictions of case numbers in the coming days allow the pandemic to be tracked. As part of the decision-making process, they can collaborate with medical personnel while providing consistent treatment. In addition, they can detect infections, symptoms, and patterns within the patient, so that hospital resources can be judiciously allocated. ML can easily be scaled up with the help of existing surveillance technologies that ensure government rules and norms are being followed. ML can also be used to accelerate vaccine development. This paper examines the different COVID-19 detection techniques and the possible role of ML and DL in the detection process so as to combat the pandemic. Artificial intelligence is deeply impacting our daily lives in every domain, and one can expect it to hold massive potential in our fight against this once-in-a-lifetime pandemic. A detailed analysis of the detection techniques has been presented, including comparisons of their shortcomings, implementations, and future potential.
Despite still being in its early phase, research applying AI to the pandemic is certainly on the rise. Given the increasing use of AI in recent years, it is already evident that AI will contribute significantly to the future management of COVID-19.

AI can be applied to create models that perform advanced prediction of COVID-19 and other similar diseases. Models created for COVID-19 diagnosis can be reused to detect similar diseases, and transfer learning can be applied to those models to reduce their training time. CT and LUS based detection will be the future of diagnosis, as the models become more accurate when more data is provided. The principal drawback of the majority of papers in this domain is the lack of properly labeled and reliable datasets. As more such datasets become available, they can be used to generalize the models so that they give accurate results even under non-ideal conditions. AI can be applied in multiple use cases, such as early detection and diagnosis of the infection. Blockchain can be used for contact tracing by identifying clusters and hotspots. AI can also be used to project cases and mortality, and can assist in identifying the most susceptible places, people, and countries so that appropriate steps can be taken. AI can also reduce the workload of frontline healthcare workers and help predict and prevent future pandemics. The future of AI-led diagnosis of diseases is very bright, and it improves each day as new technologies are discovered.