1 Introduction

Currently, Covid-19 is detected with Reverse Transcriptase Polymerase Chain Reaction (RT-PCR), which detects the presence of genetic fragments of SARS-CoV-2 in secretions from the nasal and pharyngeal epithelial mucous membranes. Both RT-PCR and immunoglobulin detection methods are limited to specific detection time windows. Before confirmation through RT-PCR, no method is available to assess Covid-19 infection during incubation or after the onset of symptoms. Consequently, a high transmission rate has been reported, and it needs to be reduced for effective containment.

Clinical manifestations of SARS-CoV-2 [1] appeared more variable than those of influenza. Symptoms of Covid-19 also vary slightly from region to region; abdominal symptoms were more frequent in the USA than in China. Asymptomatic, mild, and severe presentations were observed in various studies. Asymptomatic or milder cases did not seek medical intervention; mild symptoms initially included a temperature >37.5 °C and a dry cough, and could develop into moderate symptomatic cases. Fever, cough, abdominal discomfort, and deranged blood biomarkers were recorded in moderate cases. Severe cases presented with shortness of breath, dyspnea, and tachypnea and required mechanical ventilation. Persistent cough, fever, and fatigue were symptoms associated with an underlying or pre-existing pathology, including but not restricted to cardiovascular issues, hypertension, liver compromise, and diabetes. Blood pO2 levels decreased. Blood biomarkers showed lymphopenia, thrombopenia, and elevated aminotransferases in moderate and severe cases. White blood cell counts deteriorated in severe cases, which required mechanical ventilation. Persistent fever and a characteristic consistent cough (initially dry for several days, followed by a productive cough) are the main features in patients with pre-existing respiratory infections; a few symptoms varied with geographical region.

The current study proposes a novel method for the early detection of Covid-19 using cough spectral analysis with the discrete wavelet transform (DWT) and a deep convolutional neural network (DCNN). The rest of the article is organized as follows. Section 2 reviews the literature relevant to the current area of research in Covid detection. Section 3 defines the problem and describes the proposed system. Section 4 elaborates the implementation methodology, followed by results and discussion in Sect. 5. Finally, Sect. 6 concludes with the salient observations and the challenges ahead for further work.

2 Literature Review

Yu-Ting Shen and Hui-Xiong Xu [2] observed that several technologies are now available to extend the use of telemedicine systems and to augment traditional public health techniques in coping with the COVID-19 outbreak. Because of these technologies, the COVID pandemic may create a compelling “case” for incorporating the potential advantages of telemedicine into real-world clinical practice. Telemedicine provides an opportunity to take advantage of new technologies while maintaining a constant focus on objectives and efficiency. Ilana Harrus, Jessica [3] discussed the power of deep learning for Covid detection. AI applications have evolved from their pre-epidemic uses to diagnosis and drug development, prediction of the spread of the disease, and monitoring of human movement. New applications are also being designed to address new needs and major challenges of the epidemic, including the provision of health care.

Zhao, Jiang, and Qiu [4] created a model that detects Covid cases with 92% accuracy. Compared with the baseline architecture, their model is reported to perform well on all of the criteria considered. This helps ensure that patients who do not have COVID-19 are treated as such in most situations, reducing the chance of misidentifying illnesses other than COVID-19 and easing the strain on the health system. The authors also tested the model with limited data and found that it still performed properly. Ting DSW, Lin H, Ruamviboonsuk P, Wong TY, Sim DA [5] described a sophisticated and effective digital platform that can improve health communication by improving patient care and education, allowing speedier decisions, and reducing resource usage. Various machine learning domains, such as natural language processing, have been tested and employed in health care facilities to date. As conversational agents between the user and the service provider, AI-based chatbots play a critical role. Health-care systems are expected to be greatly influenced by such chatbot and response mechanisms.

Qurat-ul-Ain Arshad, Wazir Zada Khan, Faisal Azam, Muhammad Khurram Khan [6] noted that the standard Polymerase Chain Reaction (PCR) test for detecting COVID is expensive, time-consuming, and risky. Deep learning therefore plays an important role in assisting specialists and radiologists in diagnosing COVID-19. Numerous research efforts have been made to develop deep learning strategies and techniques for diagnosing or classifying patients with COVID-19, and these strategies have proven to be effective tools for detecting or diagnosing COVID cases.

Wong ZSY, Zhou J, Zhang Q [1] presented a review of the use of artificial intelligence and telehealth in alignment with public health responses in health care settings during the COVID-19 outbreak. Systematic scoping reviews were conducted to identify potential symptoms. Other work provides a larger body of evidence on a variety of health and medical applications delivered by telephone. A large number of reports have investigated the use of artificial intelligence (AI) and big data analysis, but weaknesses in research design and translation highlight the need for continued real-world research.

Samer Ellahham [7] suggested that deep learning provided limited support during the Covid crisis. It can be used for patient education, testing, and early detection of symptoms, such as those of the flu, through chatbot apps. AI can also be used to remotely manage mild symptoms in patients with depressive conditions through the implementation of telemedicine in practice. For major roles such as treatment planning and diagnosis, medical supervision is highly recommended.

3 Proposed System

In the recent past, several researchers [8, 9] have attempted to detect Covid from cough sounds; however, it remains challenging to find good-quality features of the cough signal that reflect Covid status. Finding the salient features that distinguish Covid-related coughs is still an open problem. The main objective of this paper is to develop a pre-screening technique that could enable automated identification of COVID-19 through frequency domain analysis with comparable performance. As shown in Fig. 1, the proposed system consists of acquisition of the cough sound, pre-processing of the cough audio signal, extraction of frequency domain features relevant to Covid characteristics, and classification using a machine learning algorithm. The proposed work has the following modules.

  i. Cough sound acquisition

  ii. Pre-processing

  iii. Feature extraction

  iv. Prediction and classification using DCNN model

Fig. 1. Proposed system for Covid detection using Cough Sound

The main objectives of this paper are as follows.

  • To develop a fast and accurate method for the automated identification of Covid-19 using cough spectral analysis.

  • To obtain the joint time-frequency domain features of the cough using DWT, leading to the characteristic identification of Covid-19.

  • To deploy a unique supervised model based on a deep convolutional neural network to detect Covid-19 from the observed features.

3.1 Preprocessing

The process of reducing or suppressing noise in the cough sound signal is referred to as pre-processing. The raw cough audio samples need to be pre-processed before they can be accepted by further steps such as feature extraction and classification. The essential preprocessing steps are:

  • Background noise removal

  • Normalization

  • Data augmentation

Noise Removal

Background noise is removed by noise filtering. The noise associated with cough signals is primarily Additive White Gaussian Noise (AWGN), a random noise with an even frequency distribution. In this work, we considered AWGN in the cough signals and filtered it with a first-order Butterworth high-pass filter (FOBHPF).
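
As an illustration of the filtering step, the following is a minimal sketch of a first-order Butterworth high-pass filter obtained through the bilinear transform. The paper does not state the cutoff frequency or the sampling rate, so the values used in the usage note (100 Hz and 8000 Hz) and the function name are assumptions made only for this example.

```python
import math


def butter_highpass_first_order(signal, cutoff_hz, sample_rate_hz):
    """Apply a first-order Butterworth high-pass filter (bilinear transform).

    Difference equation: y[n] = b0 * (x[n] - x[n-1]) + a1 * y[n-1]
    """
    # Pre-warped analog cutoff, normalised to the sampling rate.
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    b0 = 1.0 / (1.0 + k)
    a1 = (1.0 - k) / (1.0 + k)

    filtered = []
    prev_x = 0.0  # x[n-1], assumed zero before the signal starts
    prev_y = 0.0  # y[n-1]
    for x in signal:
        y = b0 * (x - prev_x) + a1 * prev_y
        filtered.append(y)
        prev_x, prev_y = x, y
    return filtered
```

For example, `butter_highpass_first_order(samples, 100.0, 8000.0)` would attenuate components well below 100 Hz (a constant offset decays to zero) while passing components near the Nyquist frequency with approximately unit gain.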

Normalization

Normalization is the process of bringing data values onto a standard scale. This is usually done when data attributes are on different scales. The standard formula for min-max normalization is given by:

$$x_{{{\text{std}}}} = \left( {x - x_{{{\text{min}}}} } \right)/\left( {x_{{{\text{max}}}} - x_{{{\text{min}}}} } \right)$$
(1)
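
Equation (1) can be sketched in a few lines of code. The function name and the handling of a constant signal (where the maximum equals the minimum) are assumptions for illustration; the paper does not say how that edge case is treated.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using x_std = (x - x_min) / (x_max - x_min)."""
    if not values:
        return []
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant signal is mapped to all zeros.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]
```

For instance, `min_max_normalize([2, 4, 6])` maps the smallest sample to 0, the largest to 1, and the middle sample to 0.5.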

Data Augmentation

The next step in data pre-processing is data augmentation. Often, the amount of available data is insufficient to accomplish the classification task adequately. In such circumstances, data augmentation is used. In deep learning problems, augmentation is frequently employed to increase the volume and variation of the training data. Only the training set should be augmented; the validation set should never be augmented.
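
The paper does not list the specific augmentation operations, so the sketch below assumes two common audio augmentations, additive Gaussian noise and a circular time shift, purely for illustration. All function names and default parameters are hypothetical. The sketch is meant to be applied to the training signals only, in line with the rule stated above.

```python
import random


def add_gaussian_noise(signal, sigma, rng):
    """Return a copy of the signal with zero-mean Gaussian noise added."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def time_shift(signal, shift):
    """Circularly shift the signal to the right by `shift` samples."""
    if not signal:
        return []
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return signal[-shift:] + signal[:-shift]


def augment_training_set(train_signals, sigma=0.005, max_shift=100, seed=0):
    """Return each training signal plus one noisy and one shifted variant."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    augmented = []
    for sig in train_signals:
        augmented.append(list(sig))                                   # original
        augmented.append(add_gaussian_noise(sig, sigma, rng))         # noisy copy
        augmented.append(time_shift(sig, rng.randint(1, max_shift)))  # shifted copy
    return augmented
```

Each variant keeps the label of the signal it was derived from; validation and test signals are passed through unchanged.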

3.2 Dynamic Feature Extraction

Feature extraction is the process of translating raw data into numerical features that can be processed while keeping the information in the original data set. It often produces better outcomes than applying machine learning to raw data directly. The process of feature extraction consists of the following elements.

  • Discrete wavelet transform (DWT)

  • Spectral centroid

  • Spectral analysis

  • Zero Crossing Rate (ZCR)

Chroma DWT

The chroma feature describes the pitch classes of a sound. To obtain chroma characteristics, the Chroma DWT applies a short-term transform to the signal. The DWT represents voice separation and signal structure, and the highest values appear as spikes.
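
The paper does not name the wavelet family used for the DWT. As a minimal sketch, the code below assumes the Haar wavelet, the simplest choice, to show how one DWT level splits a signal into approximation (low-pass) and detail (high-pass) coefficients. The function names are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Multi-level decomposition: [approx_L, detail_L, ..., detail_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # Coarsest approximation first, then details from coarse to fine.
    return [approx] + details[::-1]
```

Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the input, which is a convenient sanity check.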

Spectral Centroid

The spectral centroid is a metric used to characterise a spectrum in digital signal processing. It shows where the spectrum’s centre of mass is located. It is strongly linked to the perceived brightness of a sound.
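
The centre-of-mass idea can be made concrete with a short sketch: the centroid is the magnitude-weighted mean of the frequency bins. The code below uses a naive discrete Fourier transform for self-containment (a practical implementation would use an FFT); the function names are hypothetical and not taken from the paper.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0 .. N/2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate_hz):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, return 0 by convention
    n = len(frame)
    weighted = sum((k * sample_rate_hz / n) * m for k, m in enumerate(mags))
    return weighted / total
```

For a pure tone whose frequency falls exactly on a DFT bin, the centroid coincides with the tone frequency; brighter sounds, with more high-frequency energy, give a higher centroid.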

Zero-Crossing Rate (ZCR)

An audio frame’s Zero-Crossing Rate (ZCR) is the rate at which the signal’s sign changes during the frame. In other words, it is the number of times the signal’s value changes from positive to negative or back, divided by the frame’s length. ZCR is a key feature for classifying percussive sounds. It is given by the formula,

$$zcr = \frac{1}{T - 1}\sum\limits_{t = 1}^{T - 1} {\mathbb{1}_{\mathbb{R}_{< 0}} \left( {s_{t} s_{t - 1} } \right)}$$
(2)

where \(s\) is a signal of length \(T\) and \(\mathbb{1}_{\mathbb{R}_{< 0}}\) is the indicator function, which equals 1 when its argument is negative and 0 otherwise.
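
Equation (2) translates directly into code: count the adjacent sample pairs whose product is negative and divide by \(T - 1\). The function name below is an assumption for illustration.

```python
def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose product is negative (Eq. 2)."""
    length = len(signal)
    if length < 2:
        return 0.0  # no adjacent pairs to compare
    crossings = sum(
        1 for t in range(1, length) if signal[t] * signal[t - 1] < 0
    )
    return crossings / (length - 1)
```

A frame that alternates in sign at every sample has a ZCR of 1, while a frame that never changes sign has a ZCR of 0.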

Spectral Analysis

The difference between the upper and lower frequencies in a continuous range of frequencies is known as bandwidth. Because signals oscillate around a point, if that point is the signal’s centroid, the sum of the signal’s highest deviations on both sides of the point can be regarded as the signal’s bandwidth during that time frame. The mel spectrum is calculated by applying the DWT to the signal passing through the filter. The mel frequency is defined by,

$$f_{{{\text{Mel}}}} = 2595\;\;{\text{log}}_{10} \left( {1 + \frac{f}{700}} \right)$$
(3)
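
Equation (3) and its inverse can be sketched as follows. The inverse is derived algebraically from Eq. (3) and is not stated in the paper; the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel, obtained by solving Eq. (3) for f."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

The mapping is roughly linear at low frequencies and logarithmic at high frequencies, so equal steps on the mel scale correspond to progressively wider steps in Hz.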

Finally, feature extraction is carried out using Mel frequency cepstral coefficients (MFCC). The MFCC method applies a window to the discrete signal, applies the wavelet transform, computes the logarithm of the coefficient magnitudes (cepstral coefficients), and then warps the frequencies to the mel scale, as shown in Fig. 2. In our method, a Hanning window was used with a window size of 1024 and an overlap of 512. The feature vector obtained from MFCC had a size of 121 × 13.
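
The windowing step can be sketched as below: the signal is cut into frames of 1024 samples that advance by 512 samples (window size minus overlap), and each frame is multiplied by a Hanning window. The paper does not state the signal length or whether the signal is padded, so the sketch assumes no padding; under that assumption a signal of 1024 + 120 × 512 = 62,464 samples would yield 121 frames. The function names are hypothetical.

```python
import math


def hann_window(length):
    """Symmetric Hanning window: w[n] = 0.5 - 0.5 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(signal, frame_len=1024, hop=512):
    """Split the signal into overlapping, Hanning-windowed frames (no padding)."""
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each windowed frame would then pass through the transform, logarithm, and mel-warping stages to give one row of 13 coefficients, which is how a 121 × 13 feature matrix could arise.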

Fig. 2. Dynamic feature extraction using MFCC with DWT and DCT

3.3 Prediction and Classification

Deep learning [10, 11] is a developing branch of computation and prediction, which finds numerous applications in fields such as medicine, agriculture, weather forecasting, and the stock market. Deep learning based convolutional neural networks (DCNN) are frequently used for both feature extraction and time series forecasting. We used a DCNN for feature extraction and long short-term memory (LSTM) for multi-step time series forecasting of Covid from cough with multiple simultaneous inputs. In contrast to the majority of other forecasting algorithms, LSTMs can pick up on sequence nonlinearities and long-term dependencies. Therefore, LSTMs are less dependent on stationarity. The DCNN-LSTM model used for Covid-19 prediction is shown in Fig. 3.

Fig. 3. Feature extraction and Prediction with DCNN

4 Implementation Methodology

This section describes the implementation methodology. It covers cough audio capture, preprocessing, feature extraction, training, validation, prediction, and classification, as shown in Fig. 4. To implement the proposed system, we collected audio data sets from Kaggle and GitHub as input. We also recorded real-time cough sounds with Google Recorder and saved them in .wav format after conversion from .mp3 format. The audio recordings are preprocessed, noise filtered, and transcribed in real time by the app.

Fig. 4. Methodology for Covid detection with Cough Spectral analysis using DWT

4.1 Activation ReLU

In the neural network, the activation function g is responsible for converting the total input of a node into its local activation. ReLU stands for rectified linear unit. By default, this function returns max(x, 0), the element-wise maximum of 0 and the input tensor. It helps to prevent strong growth in the computation needed to run the network. ReLU networks are well suited to representing convex functions.
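
A minimal sketch of the element-wise ReLU on a list of activations follows; the function name is an assumption for illustration.

```python
def relu(values):
    """Element-wise rectified linear unit: max(x, 0) for each input."""
    return [max(0.0, x) for x in values]
```

Negative inputs are clipped to zero and positive inputs pass through unchanged, which is what makes the function cheap to evaluate.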

4.2 Dense Layer

A dense layer is fully connected to the preceding layer, meaning that each of the layer’s neurons is connected to every neuron in the prior layer. The input layer and the first hidden layer are both specified as parameters of the first layer’s constructor. Artificial neural networks make extensive use of dense layers. As a rule of thumb, the number of hidden neurons is taken as about 2/3 the size of the input layer. In our network, the dense layer has 576 input channels and 64 outputs, giving 36,928 parameters (576 × 64 weights plus 64 biases). The main purpose of the dense layer is to classify the inputs based on the output from the convolution layers. Each layer in the neural network consists of neurons that compute a weighted sum of their inputs and pass it through a non-linear function called the activation function.
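
The parameter count of 36,928 follows from one weight per input-output pair plus one bias per output: 576 × 64 + 64 = 36,928. The sketch below checks this arithmetic and shows a plain forward pass; the function names and the row-per-output weight layout are assumptions for illustration.

```python
def dense_param_count(n_inputs, n_outputs):
    """Trainable parameters of a dense layer: weights plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def dense_forward(inputs, weights, biases):
    """Compute W x + b, where weights[j] holds the weights of output neuron j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
```

Calling `dense_param_count(576, 64)` reproduces the figure quoted above; an activation function such as ReLU or sigmoid would then be applied to the output of `dense_forward`.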

4.3 Sigmoid Activation

The weighted sum of the inputs passes through the activation function, and this output serves as the input to the next layer. When a unit of the neural network uses the sigmoid as its activation function, the output of the unit is guaranteed to remain between 0 and 1. It is used to add non-linearity to a machine learning model and to map real numbers to probabilities. The input of the sigmoid function can be any value between −∞ and +∞, while the output can only be between 0 and 1.
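
The sigmoid, σ(x) = 1 / (1 + e^(−x)), can be sketched as below. The two-branch form is an implementation choice made here to avoid floating-point overflow for large negative inputs; it is not something described in the paper.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), computed in an overflow-safe way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    e = math.exp(x)
    return e / (1.0 + e)
```

For binary classification, the output can be read as the predicted probability of the positive class and thresholded, for example at 0.5.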

4.4 Dropout Layer

Dropout is also called dilution and is closely related to DropConnect. During training, the dropout layer randomly sets input units to 0 with a frequency given by the rate, which helps to prevent over-fitting; inputs not set to 0 are scaled up by 1/(1 − rate) so that the expected sum over all inputs is unchanged. It is typically placed on the fully connected layers, because they have the most parameters and are the most prone to co-adaptation, which causes over-fitting. It is a stochastic technique. It is used only during training to make the network generalize better, and it can also be applied after convolution layers and after pooling layers.
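
The 1/(1 − rate) scaling can be illustrated with a small sketch of inverted dropout. The function signature and the use of an explicit random generator are assumptions for illustration, not the API of any particular library.

```python
import random


def dropout(inputs, rate, rng, training=True):
    """Inverted dropout: zero each input with probability `rate` during training
    and scale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(inputs)  # dropout is a no-op at inference time
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * scale for x in inputs]
```

With a rate of 0.5, roughly half of the activations are zeroed and the rest are doubled, so the expected value of each activation stays the same between training and inference.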

4.5 Convolution Layer

We can extract features from the input picture using convolution. Simply put, we break the input picture down into smaller tiles, apply a transformation to each of these smaller tiles, and store the result as a new representation of the original tile. As a consequence, the most interesting parts of each original tile, i.e. of the input picture components, are extracted into each small output.
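
The tile-by-tile operation can be sketched as a "valid" 2D convolution in the sense used by CNN libraries (a sliding dot product without kernel flipping). The function name, the absence of padding, and the stride of 1 are assumptions for illustration; the paper does not give the kernel sizes of its network.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take the
    sum of element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output
```

A 3 × 3 input with a 2 × 2 kernel produces a 2 × 2 feature map, one value per tile position.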

4.6 Max-Pool

Max pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter. As a result, the output of the max-pooling layer is a feature map that contains the most prominent features of the preceding feature map. During the max-pooling operation, the kernel extracts the largest element in each region.
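
A minimal sketch of max pooling follows, assuming a 2 × 2 window with a stride of 2 as defaults; the paper does not state the pooling size, and the function name is hypothetical.

```python
def max_pool2d(feature_map, pool=2, stride=2):
    """Keep the maximum of each pool x pool region of the feature map."""
    output = []
    for i in range(0, len(feature_map) - pool + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - pool + 1, stride):
            row.append(
                max(
                    feature_map[i + di][j + dj]
                    for di in range(pool)
                    for dj in range(pool)
                )
            )
        output.append(row)
    return output
```

With these defaults, a 4 × 4 feature map is reduced to 2 × 2, halving each spatial dimension while keeping the strongest response in each region.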

4.7 Training and Validation

ML algorithms [12] need training data in order to attain a goal. The training dataset is analysed, the inputs and outputs are classified, and the dataset is then analysed once again. Given enough time, an algorithm will effectively memorise all inputs and outputs in a training dataset. The sizes of the training and test sets are key parameters of the process. The split is commonly stated as a fraction between 0 and 1 for the training or test set. A training set fraction of 0.65 (65%) indicates that the remaining 0.35 (35%) is assigned to the test set.
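
The 65/35 split can be sketched as a shuffled partition of the samples. The function name, the fixed seed, and the rounding rule are assumptions for illustration.

```python
import random


def train_test_split(samples, train_fraction=0.65, seed=0):
    """Shuffle the samples and split them into disjoint training and test sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    n_train = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test
```

With 100 samples, this yields 65 training samples and 35 test samples, and no sample appears in both sets.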

A test set is a held-out set used to evaluate the performance of a model against certain basic performance criteria. It is critical that the test set include no samples from the training set. If the test set incorporates instances from the training set, it is difficult to determine whether the algorithm has learnt to generalize from the training set or has just memorised it. Classification is a type of supervised machine learning and is the process of categorizing data into classes. Binary classification determines whether a target variable is 0 or 1.

5 Results and Discussion

This section discusses the empirical results obtained on the collected cough datasets with deep learning based models. The deep learning models tested are DenseNet-169, ResNet-50, InceptionV3, VGG-16 and VGG-19, each trained with the Adamax optimizer. The learning rates of the different models and the accuracies obtained are tabulated in Table 1.

Table 1. Performance comparison of Deep learning models

6 Conclusion

In this paper, we have devised a novel method for the early detection of the coronavirus using cough sound as input with the help of a deep learning model. The acquired cough signal was preprocessed, features were extracted with the DCNN model, and deep learning based prediction with the LSTM model then estimated the probability of Covid infection. The proposed DCNN-LSTM model was tested empirically and identified Covid-19 patients from cough sound with 93.6% accuracy, which is better than comparable methods. It was also observed that, among the deep learning models tested with the Adamax optimizer, the VGG-19 model performed best, with a prediction accuracy of 93.6%.