1 Introduction

Coronaviruses are a family of medium-sized RNA viruses that cause illness in both animals and humans and have among the largest genomes of all known RNA viruses. A novel coronavirus, SARS-CoV-2, the cause of COVID-19, belongs to the same subgroup as MERS-CoV and SARS-CoV. COVID-19, as the disease is commonly known, was declared a pandemic by the World Health Organization (WHO) on March 11, 2020 [1], and it forced much of the world into mandatory lockdown.

As of May 2021, the spread of this virus had caused 3.35 million deaths worldwide and had brought the economy to a standstill. It has also introduced several challenges worldwide. To date, the mode of transmission of SARS-CoV-2 remains unresolved and is a topic of debate among researchers. Most researchers believe that it may be similar to that of SARS, which is transmitted through in-person contact or unsanitized surroundings in the form of aerosols and droplets. Studies have emphasized that patients with pulmonary symptoms are at higher risk of transmitting the virus [2, 3]. However, studies have shown that transmission from asymptomatic patients is also possible [4]. COVID-19 can therefore spread via symptomatic as well as asymptomatic patients. A major task in fighting COVID-19 in most countries is to find asymptomatic patients who might be carriers of the coronavirus. Currently, the widely used methods for diagnosing COVID-19 are RT-PCR (Reverse Transcription-Polymerase Chain Reaction) and X-ray or CT scans. Since X-rays require a chest scan at a well-equipped medical facility and are quite expensive, RT-PCR is more widely accepted. However, according to one study, this testing is not scalable and is sometimes inaccurate [5]. It is also costly, and most countries have faced difficulties buying more test kits. Thus, in the near future there will be a need for an alternative testing method that is simpler, non-intrusive, lab-free, and less expensive. Such a method should address the limitations of current preliminary diagnostic techniques. It must also be based on sound science and identify at-risk individuals effectively.

This research proposes a deep neural network that distinguishes COVID-19 positive from COVID-19 negative coughs using audio classification techniques. It takes raw audio files as input and provides a diagnosis of whether the cough comes from a COVID-infected individual. More precisely, the contributions of this research paper are as follows:

  • It provides a deep learning (AI) based pre-screening tool for the diagnosis of COVID-19 that can be made ubiquitously available. Its low cost, rapid results, and ease of access make it a solution that can be employed in offices and various institutions as a pre-screening step for entry. It can be used as an aid to increase diagnostic capability and to devise a treatment plan in areas where adequate supplies, healthcare facilities, and medical professionals are not available.

  • We increased our dataset up to fivefold by applying data augmentation techniques to the open-source cough audio dataset by Virufy, thus illustrating a potential way to overcome overfitting in machine learning models caused by a shortage of data.

  • The research uses features extracted from the samples using sound processing techniques. Four models were constructed using two main approaches: the time series waveform approach and the amplitude waveform approach. In the time series waveform approach, we extracted MFCCs, which were fed to an MLP, a CNN, and an RNN with LSTM. In the amplitude waveform approach, we extracted features from the flatten layer of VGG-19, which were then fed to an SVM. The results show that, of these four models, the MLP was the most successful in classifying COVID-19 positive and negative coughs, with an accuracy of 96%. This suggests that the time series waveform approach learned more robust features and generalized better than the amplitude waveform approach.

  • We were able to fine-tune the multilayer perceptron to such an extent that it outperformed some of the existing literature [6, 7].

  • It outlines several future directions for our analysis and for voice-based diagnosis in the context of COVID-19, which could open the door to pre-screening for COVID-19 and tracking its impact.

2 Background

The primary reason behind the intractability of COVID-19 is the significant delay between infection and diagnosis. There are two main types of COVID-19 diagnostic techniques: laboratory-based testing and radiography testing.

2.1 Laboratory-Based Testing

Laboratory testing can be further divided into two kinds: immunoassays and nucleic acid (molecular) tests. Immunoassays detect virus-associated proteins, whereas nucleic acid tests detect the genetic code of the virus. Compared with immunoassays, nucleic acid tests are more sensitive for early detection, and for that reason they are widely used during this pandemic. These tests often depend on classical technologies, one of which is RT-PCR (Reverse Transcription-Polymerase Chain Reaction) [18]. For laboratory-based testing, samples are obtained from throat swabs, nasopharyngeal swabs, deep airway material, or sputum. Although this technique is quite sensitive in the early detection of COVID-19, it has certain limitations:

  i. Geographical and temporal factors limit the availability of testing in various countries.

  ii. The massive, time-sensitive demand leads to a scarcity of clinical tests and increases their cost.

  iii. A personal visit to the medical facility is needed. Such a visit exposes many segments of the community to the coronavirus. This can be a major obstacle: according to one study, the virus remains stable for between three hours and one week in aerosols and on different surfaces, making it highly stable and hence contagious [8].

  iv. Many reputed newspapers recently highlighted that the turnaround time stretched to 6–7 working days in a few countries because laboratories were overwhelmed with COVID tests. As a result, the virus might already have been transmitted to many others by the time a patient is diagnosed and treatment starts [9, 10].

  v. Medical staff are often at higher risk of infection because of these in-person testing techniques. Failure to protect physicians can lead to further shortages of medical personnel and increase stress on the already distressed paramedical staff.

  vi. To protect others from potential exposure, many countries such as India have also approved at-home sample collection under the guidelines of ICMR [11]. However, once a patient collects a nasal sample, they need to put it in a saline solution and ship it overnight to a certified lab authorized to run specific tests on the kit. Hence, this approach also introduces delays and could compromise the quality of the sample if it is stored for too long.

2.2 Radiography Testing

Experts urge that more and faster testing is needed to control the coronavirus, and many have suggested that Artificial Intelligence (AI) is the solution. According to studies, multiple COVID-19 diagnostic tools in development that use AI to quickly analyze X-ray or CT scans have shown that radiographic tests provide higher sensitivity than laboratory tests [12, 13]. A thoracic CT scan, an optional imaging modality, can play a crucial role in managing the coronavirus. This type of CT scan is an important aspect of COVID-19 diagnosis because it has higher precision. To produce high-resolution medical images, X-rays passing through the patient's thoracic cavity are first picked up by radiation detectors, and the resulting radiographs are then reconstructed to form the medical images. Certain patterns in the thoracic cavity may reveal different symptoms. These are examined by a radiographer or, when integrated with AI-based analysis of the image, may detect COVID-19 with much higher specificity. This might be more efficient than a laboratory test such as rRT-PCR. One study reported promising results for a radiology-based diagnostic test: at a 95% confidence interval, it had a high precision of 94% but a lower recall of 37% [14]. However, these techniques require scanning the chest in a well-equipped and expensive medical laboratory. Indirectly, therefore, this method also does not solve the problems of in-person testing highlighted above.

2.3 Cough-Based Testing

Several studies have presented automated prognostic tools for the examination of respiratory infections [15,16,17]. They used various deep neural networks, such as Convolutional Neural Networks (CNNs), to recognize coughs within natural noise and to identify diseases such as bronchitis, bronchiolitis, asthma, and COPD from their distinctive cough sound features. Although cough is a frequent symptom of many pulmonary diseases, one study has demonstrated that, depending on the condition and the location of the underlying irritant, coughs from different pulmonary diseases have unique characteristics [7]. Other studies show that changes in the character of a cough sound can indicate conditions of lung disease [19, 20]. Pathological situations arise as a result of conditions such as obstruction, restriction, and combined patterns. Researchers have made numerous efforts to improve the objective classification of coughs in order to distinguish different respiratory infections. Isolating the cough audio signal helps to distinguish between COVID-19 positive and negative coughs based on these features. The analysis of recent neurological symptoms shown by COVID-19 patients suggested a link between the brain and COVID-19. This led MIT researchers to evaluate their Alzheimer's biomarkers for COVID-19 diagnosis. To detect COVID-19 coughs, they primarily used vocal cord strength, lung performance, sentiment, and muscular degradation in the human body [21] (Fig. 1).

3 Methodology

3.1 Proposed Architecture

Fig. 1. Proposed architecture

3.2 COVID-19 Cough Dataset

In medical research, finding data of sufficient quantity and quality is a difficult task. The dataset used in this study was combined from various sources; the COVID-19 cough samples were taken from the Virufy open-source audio dataset [22]. The dataset consists of 121 sound segments, which are digital audio files in .mp3 format, of which 48 are COVID positive and 73 are negative. Of the three attributes in the dataset, the two relevant discrete attributes for this domain were selected, as shown in Table 1. The cough audio samples were converted from .mp3 to .wav format. To ensure consistency across the dataset, three major sound properties (audio channels, sample rate, and bit depth) were preprocessed. The audio channels of the cough samples were merged into a single mono channel, and the sample rates were converted to the default sample rate of 22.05 kHz. In addition, to remove discrepancies in bit depth, the amplitude values of each audio file were scaled to the range −1 to 1.
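The three preprocessing steps described above can be sketched with NumPy alone. This is only an illustration under assumptions: the function names are hypothetical, the resampling uses plain linear interpolation as a crude stand-in for a proper resampler, and peak normalization is one possible reading of scaling amplitudes to the range −1 to 1. In practice, `librosa.load` with its default arguments performs comparable steps (mono conversion, 22,050 Hz sample rate, floating-point samples).

```python
import numpy as np

def to_mono(samples: np.ndarray) -> np.ndarray:
    """Average the channels of a (n_samples, n_channels) array into one channel."""
    samples = samples.astype(np.float64)
    if samples.ndim == 1:
        return samples  # already mono
    return samples.mean(axis=1)

def resample_linear(samples: np.ndarray, orig_sr: int, target_sr: int = 22050) -> np.ndarray:
    """Resample by linear interpolation (assumption: good enough for a sketch)."""
    n_out = int(round(len(samples) * target_sr / orig_sr))
    old_times = np.arange(len(samples)) / orig_sr
    new_times = np.arange(n_out) / target_sr
    return np.interp(new_times, old_times, samples)

def peak_normalize(samples: np.ndarray) -> np.ndarray:
    """Scale so that the largest absolute amplitude is 1 (silence is left unchanged)."""
    peak = np.max(np.abs(samples))
    return samples if peak == 0 else samples / peak

def preprocess(samples: np.ndarray, orig_sr: int, target_sr: int = 22050) -> np.ndarray:
    """Mono conversion, resampling, and amplitude scaling, in that order."""
    return peak_normalize(resample_linear(to_mono(samples), orig_sr, target_sr))
```

Normalizing after resampling keeps the final peak at exactly 1 regardless of the bit depth of the source file.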

Table 1. Selected attribute list from the dataset.

3.3 Data Augmentation

Some domains, such as medical image analysis or biomedical audio analysis, have limited access to large datasets. As a result, data is not readily available and the datasets are quite small. This can lead to a problem known as overfitting. Overfitting occurs when a network learns a function with very high variance, fitting the training data so closely that the model's performance on unseen data degrades. One way to address this problem is data augmentation.

Data augmentation comprises many strategies that improve the diversity and quality of the data available for training, so that deep learning models can be built on it without overfitting. Audio augmentation algorithms are used to generate synthetic audio data. In this study, noise injection, time shifting, pitch change, and speed change were applied to the dataset. The librosa package (library for Recognition and Organization of Speech and Audio) provides an easy way to manipulate pitch and speed, while the NumPy Python package was used to handle noise injection and time shifting. As a result, we were able to increase the dataset fivefold.
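The two NumPy-based augmentations mentioned above, noise injection and time shifting, could look roughly like the sketch below. The function names, the noise factor, and the maximum shift are illustrative assumptions, not values reported in this study. The pitch and speed changes are left out so that the sketch depends only on NumPy; with librosa they would typically go through `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`.

```python
import numpy as np

def inject_noise(samples, noise_factor=0.005, rng=None):
    """Add Gaussian noise scaled by noise_factor (hypothetical default value)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(samples))
    return samples + noise_factor * noise

def shift_time(samples, sample_rate, max_shift_s=0.5, rng=None):
    """Shift the signal left or right by a random offset and silence the wrapped part."""
    rng = np.random.default_rng() if rng is None else rng
    max_shift = int(sample_rate * max_shift_s)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(samples, shift)  # np.roll returns a new array
    if shift > 0:
        shifted[:shift] = 0.0   # samples that wrapped around to the start
    elif shift < 0:
        shifted[shift:] = 0.0   # samples that wrapped around to the end
    return shifted
```

Each call yields one new variant of a recording, so applying several such transformations to every original sample multiplies the size of the training set.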

3.4 Feature Extraction

Past studies have shown that the acoustics of cough sounds may carry important information related to diseases [16]. Two approaches are used in this study to extract these features. The first is to extract MFCCs (Mel Frequency Cepstral Coefficients) from the audio samples. Humans are better at discerning small changes in speech at lower frequencies, and MFCCs take advantage of this property. The MFCC computation converts the standard frequency to the Mel scale using Eq. 1. It thereby takes into account how human sensitivity varies with frequency, which makes it suitable for audio classification and sound processing. The Mel scale equation is given below:

$$\begin{aligned} \text{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \end{aligned}$$
(1)
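As a small worked illustration of Eq. 1, the conversion and its inverse can be written directly in Python. The function names are hypothetical, and the logarithm is taken to base 10, which is the usual convention for the constant 2595.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Eq. 1: convert a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of Eq. 1: convert a Mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to approximately 1000 mel, and equal steps in Hz correspond to progressively smaller steps in mel as the frequency rises, which reflects the finer resolution at low frequencies.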

The Mel frequency cepstrum (MFC) represents the short-term power spectrum of an audio signal. The first step in obtaining the MFC is a Fourier transformation. Taking the log of the magnitude of this Fourier spectrum, as shown in Fig. 2, and then applying a cosine transformation to obtain the spectrum of this log, we observe a peak wherever there is a periodic element in the original time signal [23]. MFCCs are derived from this cepstral representation of the sound samples; they are the coefficients that together make up the MFC. The study used the librosa Python package to calculate a series of 40 MFCCs for each sample, as shown in Fig. 3, and stored them in a pandas data frame.

Fig. 2. Fourier transformation of negative and positive cough samples.

Fig. 3. Mel frequency cepstrum of negative and positive cough samples.
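To make the chain of steps behind an MFCC concrete (Fourier transform, Mel-scale weighting, logarithm, cosine transform), the following NumPy sketch computes the coefficients of a single frame. It is a simplified illustration, not the librosa implementation: the Hann window, the triangular filterbank, the unnormalized DCT, and all function names are assumptions made for this sketch. The study itself would presumably have used a call such as `librosa.feature.mfcc(y=..., sr=..., n_mfcc=40)`, although the exact call is not given in the text.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(x):
    """Unnormalized type-II discrete cosine transform of a 1-D array."""
    n = len(x)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return (x[None, :] * np.cos(np.pi * k * (2 * i + 1) / (2 * n))).sum(axis=1)

def mfcc_frame(frame, sample_rate, n_mfcc=40, n_filters=40):
    """MFCCs of one frame: window -> power spectrum -> Mel energies -> log -> DCT."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energy = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_mel)[:n_mfcc]
```

A full feature extractor would slide this computation over overlapping frames of the recording, producing one 40-element vector per frame.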

The second approach was to extract important features from the last flatten layer of the VGG-19 model. After constructing the VGG-19 model, ImageNet images of size 64 × 64 were used for pre-training. The PIL image object was then converted to a NumPy array of pixel values. Next, this 3D array was expanded to a 4D array with dimensions [samples, rows, columns, channels]. The pixel values were then adjusted in the way the VGG-19 model requires. After this, all that remains is to extract the features.

In the VGG-19 model shown in Fig. 4, the last layer (1000-dimensional) is removed, and the flatten layer yields a 4096-dimensional feature vector representation of an input image. After extracting these features, a 60–40 train-test split was performed and the data were fed into the models.
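The array handling described above (3D image array to 4D batch, then VGG-style pixel adjustment) can be illustrated with NumPy. This sketch assumes the standard VGG preprocessing, in which channels are reordered from RGB to BGR and the ImageNet channel means are subtracted; in Keras this is what `preprocess_input` from `tensorflow.keras.applications.vgg19` does. The function name `prepare_for_vgg` is hypothetical, and the network itself is not reproduced here.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by the standard VGG preprocessing.
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def prepare_for_vgg(image_rgb: np.ndarray) -> np.ndarray:
    """Turn one (rows, columns, 3) RGB image into a (1, rows, columns, 3) VGG input."""
    pixels = image_rgb.astype(np.float64)
    batch = np.expand_dims(pixels, axis=0)  # 3D -> 4D: [samples, rows, columns, channels]
    batch = batch[..., ::-1]                # reorder channels from RGB to BGR
    return batch - VGG_MEAN_BGR             # zero-centre each channel
```

The resulting array has the shape [samples, rows, columns, channels] that the model expects, with a single sample in the batch.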

Fig. 4. VGG-19 architecture [24]

4 Model Architecture

Since their introduction for pattern recognition, Neural Networks (NNs) have often outperformed traditional algorithms. For instance, in a study on urban sound classification, the performance of an SVM was compared with different neural models, namely a deep neural network (DNN), a recurrent neural network (RNN), and a Convolutional Neural Network (CNN); the CNN and the DNN gave better results than the SVM and the RNN [25]. With this in mind, this research used three different neural network configurations and an SVM. In the end, the results of the models were compared and the best model was chosen.

4.1 Multilayer Perceptron

The Multilayer Perceptron (MLP) is a long-established neural network formed by combining multiple neurons. Data is fed in at the input layer and then processed by the hidden layers, which increase the level of abstraction. After the hidden layers have processed the data, the output layer gives the final predictions. The study used data augmentation (noise, shift, and stretch) to enlarge the audio dataset in order to overcome overfitting. The MLP was constructed using Keras with a TensorFlow backend. The model built in this research is sequential and consists of four layers: an input layer, two hidden layers, and an output layer. All four are of the dense type, which is the standard type in most cases. The input layer and the two hidden layers have 256, 128, and 64 nodes respectively, each with the ReLU activation function and a dropout value of 25%. ReLU has proven to perform extremely well in neural network frameworks; it is explained further in Appendix A.2. Dropout, which randomly excludes nodes during training, is used for better generalization and decreases the chance of overfitting. Finally, the output layer has 2 nodes, matching the number of class labels, with softmax as the activation function (explained further in Appendix A.1). Softmax transforms the outputs into probabilities, which is why it is widely used in machine learning models. Based on the highest probability, the model then classifies the cough as COVID-19 positive or negative.
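The layer structure described above can be illustrated with a forward pass written in plain NumPy rather than Keras. This is only a sketch under assumptions: the 40-dimensional input (one value per MFCC), the weight initialization, and the function names are not taken from the study, and no training loop is shown.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    shifted = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def init_mlp(n_inputs=40, layer_sizes=(256, 128, 64, 2), rng=None):
    """Random weights for dense layers of 256, 128, 64 and 2 nodes (assumed initialization)."""
    rng = np.random.default_rng(0) if rng is None else rng
    params, fan_in = [], n_inputs
    for fan_out in layer_sizes:
        weights = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        params.append((weights, np.zeros(fan_out)))
        fan_in = fan_out
    return params

def forward(params, x, dropout_rate=0.25, training=False, rng=None):
    """ReLU layers with 25% dropout, followed by a 2-node softmax output."""
    rng = np.random.default_rng() if rng is None else rng
    hidden = x
    for weights, bias in params[:-1]:
        hidden = relu(hidden @ weights + bias)
        if training:
            keep = rng.random(hidden.shape) >= dropout_rate
            hidden = hidden * keep / (1.0 - dropout_rate)  # inverted dropout
    weights, bias = params[-1]
    return softmax(hidden @ weights + bias)
```

Calling `forward(init_mlp(), features).argmax(axis=1)` on a batch of feature vectors gives the index of the class with the highest probability for each sample; dropout is only active when `training=True`.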

4.2 Convolutional Neural Networks

Another deep learning algorithm implemented in this study is the Convolutional Neural Network (CNN). It can take an image as input, assign importance to the various elements in the image, and distinguish one from another. As a precautionary measure, each input cough recording, processed with the MFCC package, was divided into 6-second audio clips and padded as required. The CNN was again built with Keras and TensorFlow as the backend. It is a sequential model comprising four Conv2D convolution layers and two dense layers. A pooling layer of the MaxPooling2D type is attached to the final convolutional layer. The pooling layer reduces the number of parameters as well as the subsequent computation requirements. This in turn reduces the dimensionality of the feature maps, shortens the training time, and reduces overfitting. Max pooling keeps the largest value in each window. The ReLU activation function, explained further in Appendix A.2, was used for the convolutional layers. A dropout value of 50% is applied after the final convolutional layer. The output layer has 2 nodes, one for each possible class (positive and negative). Softmax is the activation function used in the output layer, explained further in Appendix A.1. Softmax transforms the outputs into probabilities, which is why it is widely used in machine learning models. Based on the highest probability, the model then classifies the cough as COVID-19 positive or negative.
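The step of dividing each recording into 6-second clips with padding can be sketched as follows. Zero-padding the final clip and the function name `split_into_clips` are assumptions; the text states only that clips were padded as required. The default sample rate matches the 22.05 kHz used in preprocessing.

```python
import numpy as np

def split_into_clips(samples, sample_rate=22050, clip_seconds=6.0):
    """Cut a 1-D signal into fixed-length clips, zero-padding the last one."""
    clip_len = int(sample_rate * clip_seconds)
    n_clips = max(1, int(np.ceil(len(samples) / clip_len)))
    padded = np.zeros(n_clips * clip_len, dtype=samples.dtype)
    padded[: len(samples)] = samples  # the remainder stays zero (silence)
    return padded.reshape(n_clips, clip_len)
```

Fixed-length clips matter here because a convolutional network with dense layers expects every input to have the same shape.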

4.3 Recurrent Neural Networks with Long Short-Term Memory

A Recurrent Neural Network (RNN) is a category of neural network designed for sequential data. Derived from feedforward networks, RNNs work in a way that is often likened to the human brain. Put simply, RNNs are among the most accurate algorithms for producing predictions on sequential data. The model is sequential and consists of two LSTM layers and four time-distributed layers. All LSTM layers consist of 128 nodes. After the final LSTM layer, we used a dropout value of 50%. The four time-distributed layers are of the dense type, with 64, 32, 16, and 8 nodes respectively, and use the ReLU (Rectified Linear Unit) activation function, which is explained further in Appendix A.2. The output layer has 2 nodes, one for each possible class (positive and negative). Softmax is the activation function used in the output layer, explained further in Appendix A.1. Softmax transforms the outputs into probabilities, which is why it is widely used in machine learning models. Based on the highest probability, the model then classifies the cough as COVID-19 positive or negative.
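To show what a single LSTM layer computes at each time step, the sketch below implements the standard LSTM cell equations in NumPy. It is a generic illustration, not the Keras layer used in this study: the gate ordering inside the weight matrices, the random weights in the usage example, and the function names are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W (n_in, 4n), U (n, 4n), b (4n,)."""
    n = h_prev.shape[-1]
    gates = x_t @ W + h_prev @ U + b
    i = sigmoid(gates[..., 0 * n:1 * n])  # input gate
    f = sigmoid(gates[..., 1 * n:2 * n])  # forget gate
    g = np.tanh(gates[..., 2 * n:3 * n])  # candidate cell state
    o = sigmoid(gates[..., 3 * n:4 * n])  # output gate
    c_t = f * c_prev + i * g              # keep part of the old memory, add new content
    h_t = o * np.tanh(c_t)                # expose a gated view of the memory
    return h_t, c_t

def run_lstm(sequence, W, U, b):
    """Run the cell over a (time_steps, n_in) sequence and return every hidden state."""
    n = U.shape[0]
    h, c = np.zeros(n), np.zeros(n)
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs)
```

With 128 units and a sequence of MFCC frames as input, `run_lstm` returns one 128-dimensional hidden state per time step, which is the kind of per-step output that time-distributed dense layers then process.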

4.4 Support Vector Machines

Support Vector Machines (SVMs) belong to the category of data mining techniques used for both classification and prediction. An SVM separates two different classes. Given a set of labelled training data for each category, the SVM model can classify new samples by finding the hyperplane that distinguishes between the two classes. After extracting features from the VGG-19 flatten layer as explained in Sect. 3.4, a 70–30 train-test split was performed and the data were fed into a LinearSVM for classification.
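The idea of a separating hyperplane can be illustrated with a small linear SVM trained by sub-gradient descent on the hinge loss. The text names a LinearSVM without specifying the library or its settings, so this from-scratch version is only a stand-in: the optimizer, the learning rate, the number of epochs, and the helper names `split_70_30`, `train_linear_svm`, and `predict` are all assumptions.

```python
import numpy as np

def split_70_30(X, y, test_fraction=0.3, rng=None):
    """Shuffle the samples and hold out 30% of them for testing."""
    rng = np.random.default_rng(0) if rng is None else rng
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Fit w and b on the L2-regularized hinge loss; labels y must be -1 or +1."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violating = margins < 1.0  # samples inside the margin or misclassified
        y_v, X_v = y[violating], X[violating]
        grad_w = w - C * (y_v[:, None] * X_v).sum(axis=0) / n_samples
        grad_b = -C * y_v.sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify by the side of the hyperplane w.x + b = 0 on which each sample falls."""
    return np.where(X @ w + b >= 0.0, 1, -1)
```

In this setting, each row of `X` would be one feature vector from the VGG-19 flatten layer, and the labels would encode COVID-19 positive and negative as +1 and −1.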

5 Results

The models were expected to generalize well and to produce the appropriate category label for previously unseen data. The effectiveness of each classification model was assessed based on the number of correct and incorrect predictions it made on the unseen data. Accuracy, precision, and recall were the three evaluation metrics used to assess the predictions made by the machine learning models developed in this research.

5.1 Accuracy

Accuracy is the fraction of all predictions that the model gets right. It can be computed from the confusion matrix using the equation below (Table 2).

$$\begin{aligned} \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}} \end{aligned}$$
(2)

Fig. 5. Overall accuracy achieved by the models (graphical representation).

Table 2. Overall accuracy achieved by the models (tabular representation).

Figure 5 shows that the Multilayer Perceptron and the Convolutional Neural Network performed better than the rest of the models, with overall accuracies of 96% and 86% respectively. The SVM performed reasonably well with 81% accuracy, whereas the Recurrent Neural Network was not able to generalize well and had an accuracy of only 68%.
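As a small illustration of Eq. 2, the following sketch counts the four confusion-matrix cells from lists of true and predicted labels and then computes accuracy. The function names are hypothetical, and the labels in the usage note are made up for illustration; they are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Eq. 2: correct predictions divided by all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined without any predictions")
    return (tp + tn) / total
```

For example, with true labels `[1, 1, 1, 0, 0, 0, 0, 1]` and predictions `[1, 1, 0, 0, 0, 1, 0, 1]`, six of the eight predictions are correct, so `accuracy(*confusion_counts(...))` gives 0.75.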

5.2 Precision

In pattern recognition, information retrieval, and classification (machine learning), precision is the fraction of relevant instances among the retrieved instances. Precision is also known as the positive predictive value. In this study, it is the proportion of patients who actually had COVID-19 among all patients whom the model identified as positive. It was computed using the equation given below.

$$\begin{aligned} \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \end{aligned}$$
(3)

The precision achieved by each model for both the negative and the positive class is recorded in Table 3.

Fig. 6. Precision achieved by the models (graphical representation).

Table 3. Precision achieved by the models (tabular representation).

Higher precision corresponds to lower false-positive rates. Figure 6 shows that the Multilayer Perceptron and the Convolutional Neural Network have lower false-positive rates and classify COVID-positive patients very well, with precisions of 93% and 87% respectively. The RNN has a higher false-positive rate and is prone to false alarms. All the models have a lower false-negative rate and classify non-COVID patients very well.

5.3 Recall

Recall measures how well the model identifies the actual positives. It is also known as the sensitivity of the model. Thus, of all patients who actually have COVID-19, recall tells us how many the model correctly identified as COVID-19 positive. It can be computed using the following equation:

$$\begin{aligned} \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \end{aligned}$$
(4)

The recall achieved by each model for both the negative and the positive class is recorded in Table 4.

Fig. 7. Recall achieved by the models (graphical representation).

Table 4. Recall achieved by the models (tabular representation).

Higher recall corresponds to higher true-positive rates. Figure 7 shows that the Convolutional Neural Network and the Support Vector Machine have higher true-positive rates for the positive class: the CNN and the SVM correctly identify 90% and 87% of all positive cases respectively. The Multilayer Perceptron and the Convolutional Neural Network have higher specificity. The RNN can identify only 79% of all positive cases and 61% of all negative cases.
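Precision (Eq. 3), recall (Eq. 4), and specificity can all be computed from the same four confusion-matrix counts, as the sketch below shows. The function names are hypothetical, and the counts in the usage note are invented purely to show how a model can combine high precision with low recall; they are not taken from this study or from any cited work.

```python
def precision(tp, fp):
    """Eq. 3: share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Eq. 4: share of actual positives that the model finds (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Share of actual negatives that the model correctly rejects."""
    return tn / (tn + fp) if (tn + fp) else 0.0
```

Suppose, hypothetically, that 100 patients are truly positive and a model flags only 39 people, of whom 37 are truly positive. Then `precision(37, 2)` is about 0.95 while `recall(37, 63)` is only 0.37: almost every alarm is correct, yet most infected patients are missed, which is why both metrics need to be reported together.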

6 Conclusion

The Trace, Test, and Treat strategy has shown that governments need to be able to track the spread of the disease effectively and isolate infected people. This helps in flattening the curve of infection. However, most countries are not able to carry out enough rapid tests, which is why the proposed alternative can be very helpful. This paper presents an ML model for the initial diagnosis of COVID-19 from cough samples. The models used in this study were analyzed on the basis of performance evaluation metrics. This analysis revealed that the Multi-Layer Perceptron outperformed the others with an accuracy of 96%. The Convolutional Neural Network and the Support Vector Machine, on the other hand, performed fairly well in terms of accuracy. A model with higher precision and lower recall is very accurate in the cases it flags, but it misses a large number of hard-to-classify instances, which cannot be ignored in COVID-19 diagnosis. Thus, there is a need for models that have high precision and high recall at the same time for improved, generalized classification.

The results show that the precision and recall of the Multi-Layer Perceptron and the Convolutional Neural Network were broadly comparable. The Recurrent Neural Network with Long Short-Term Memory, on the other hand, was not able to generalize well on COVID-19 cough samples, owing to higher false-positive rates and lower true-positive rates.

Overall, the Multi-Layer Perceptron generalized well, with high precision and specificity that keep false alarms low. These results suggest that AI can be used in the clinic and at home as a support system for physicians and the general public in the early detection of COVID-19. It may play an important role in medical diagnosis. This achievement supports extensive testing for COVID-19 even in areas where health facilities are not readily available. As a result, it helps to reduce the burden on paramedical staff.