Introduction

One of the leading causes of death worldwide is respiratory disease (WHO 2019). Patients suffering from these diseases produce adventitious sounds during the breathing cycle. The World Health Organization (WHO) declared COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a global pandemic as it spread rapidly across more than 200 countries worldwide (Sanders et al. 2020). COVID-19 presents with symptoms such as fever, sore throat, dry cough, dyspnea, fatigue, and headache; in critical cases, it can progress to multiple organ failure (Kujawski et al. 2020; Chang et al. 2020; Sun et al. 2020). The disease also affects the voice, producing hoarseness, shortness of breath, and congestion in the upper airway, while repetitive dry coughing strains the lungs and degrades voice quality. Researchers have reported that COVID-19, through inadequate airflow in the vocal tract, results in pulmonary and laryngological involvement (Asiaee et al. 2020). Consequently, all the symptoms stated above leave an identifiable acoustic signature in the patient's lung sounds. Adventitious sounds fall into two main categories: continuous and discontinuous. Continuous adventitious breath sounds (wheezes, stridor, and rhonchi) last longer than 250 ms, whereas discontinuous signals (crackles) last around 25 ms, according to Islama et al. (2018). Proper assessment of COVID-19 patients is critical for mitigating and halting the rapid expansion of the disease across nations, and both the severity and mortality of COVID-19 increase in the presence of lung/pulmonary diseases. TB and COVID-19 differ in that TB is curable, whereas COVID-19 still lacks effective anti-viral agents and drugs (Pan et al. 2020; Cantini et al. 2020). Both COVID-19 and TB burden health systems, since both are airborne transmissible diseases that can be diagnosed rapidly. Both carry stigma and require public awareness and cooperation so that they can be prevented, diagnosed, and effectively treated. Most countries still lack information about COVID-19; compared with TB, few clinical and immunological parameters are available for understanding how the two diseases differ and interact. Moreover, the COVID-19 pandemic led to a notable fall in TB notifications (Migliori et al. 2020). The dominant symptom of COVID-19 is coughing, which is also a symptom of more than 100 other diseases. How a disease affects the respiratory system varies, however: lung diseases restrict or obstruct the airway, which influences cough acoustics, and since the glottis behaves differently under different pathological conditions, coughs due to TB, asthma, bronchitis, and pertussis can be distinguished (Pahar et al. 2021). The ongoing flare-up of COVID-19 requires well-planned testing of individuals to limit and arrest the disease as it spreads rapidly across the globe. Chronic pulmonary diseases have been observed to be a main cause of the severity and mortality of COVID-19-affected patients. One of the most feasible assessment approaches for pulmonary disorders, including COVID-19, is radiographic examination exploiting chest X-ray images.
Researchers have conducted DL image classification by developing DL classifiers on nine-class chest X-ray images (CXI) to predict pulmonary diseases alongside COVID-19 (Bhosale and Patnaik 2022b). To test for COVID-19 cases, radiography using chest X-ray images has proven among the most successful approaches. For analyzing diseases, we used SVM-LSTM-BO-based artificial intelligence to examine the various sounds of lung disease and studied how much improvement an algorithm-based machine-learning approach can achieve, particularly against obstructive pulmonary diseases (Bhosale et al. 2022; Bhosale and Sridhar Patnaik 2022). The main aim of DL methodology is to learn hierarchical features from data, allowing complex patterns to be handled skilfully. Ground-glass opacity (GGO), consolidation, pleural effusion, and bilateral lung involvement are patterns specific to COVID-19 infection in radiological images (Carotti et al. 2020), and these patterns can be identified using different DL architectures (Khan et al. 2021). DL models have been reported to achieve higher sensitivity and specificity values and more accurate predictions for COVID-19 detection. DL methods reduce false negative and false positive rates and provide medical specialists and radiologists with a quick, economical, and accurate diagnostic. Because DL models can be built, trained from scratch, and then fine-tuned, considerable time is saved on analysis-related tasks. The present research aims to uncover the non-linear characteristics of the adventitious sounds heard during the respiratory cycle. Wheezing sounds are known to differ between expiration and inspiration in intensity, pitch, position, and duration; depending on how the airway is narrowed or blocked, the pitch can be either high or low (Swapna et al. 2020; Taplidou and Hadjileontiadis 2007). Kevat et al. (2020) noted that manual auscultation detects adventitious breath sounds with low inter-observer reliability. Their study gathered 192 auscultation recordings of children with two digital stethoscopes (Clinicloud and Littman), categorized as wheezes and crackles, and used spectrogram and waveform analysis of the Clinicloud recordings to detect them. The Clinicloud recordings had a positive percent agreement (PPA) of 0.95 and a negative percent agreement (NPA) of 0.99, while the Littman-collected sounds had a PPA of 0.95; the corresponding PPA and NPA were both 0.82 (Fig. 1).

Fig. 1 Graphical abstract

Shi et al. (2019a) used temporal Mel spectrogram features with a Bi-GRU/VGGish classifier combination on a database of 384 subjects to achieve an accuracy of 87.41%. Aykanat et al. (2017) employed MFCC spectrograms with a CNN classifier to achieve an accuracy of 80%.

Niu (2019) describes a system for detecting the presence of sputum from acquired inhale and exhale respiratory sounds. The research extracted audio features and fed them to a logistic classifier in a tenfold cross-validation experiment, achieving a sensitivity of 92.26% and a specificity of 92.26%.

In the research conducted by Shi et al. (2019b), WCC features were combined with a BPNN classifier to achieve an accuracy of 92.5% on a database of 64 subjects.

Bardou et al. (2018) obtained the features in the form of spectrograms and fed these to a CNN-based classifier to achieve an accuracy of 95.56%.

Demir et al. (2020) combined time-frequency-based features with convolutional neural networks to achieve an accuracy of 65.5%.

The above researchers employed only one or two classifiers for testing their proposed features, and most of the features target the linear characteristics of adventitious sounds. The accuracy achieved is capped at about 95%, and the databases contain a limited number of subjects.

Taplidou and Hadjileontiadis (2010) used higher-order spectral features, classified on statistical attributes (SPSS tool), to detect adventitious sounds with a 96% accuracy rate. The current research proposes sixteen features (two sets of eight each, two of which are carried forward from Taplidou and Hadjileontiadis (2010)) based on WBS and WBP. These features are fed to the proposed classification model, SVM-LSTM with Bayesian optimization (BO). The SVM algorithm is trained with a loss function under k-fold cross-validation with k = 5. The findings show a gradual rise in accuracy from SVM to SVM-LSTM and then to the SVM-LSTM-Bayesian-optimization model for both types of features.

Remote automated auscultation systems may play a crucial role in overcoming the limited availability of expert physicians; artificial intelligence can thus be leveraged to assist physicians in performing auscultation remotely and more accurately. This paper proposes a new hybrid framework for lung sound classification in biomedical engineering by combining feature engineering (FE), LSTM, and SVM with Bayesian optimization (BO). The FE module comprises feature selection and extraction phases, and the SVM-LSTM with BO algorithm fine-tunes the model's control parameters, providing more accurate results by avoiding local optimal trapping. The proposed FE-SVM-LSTM-BO framework is designed to ensure stability, convergence, and accuracy. The model is tested on lung sound data for the categorical wheeze, crackle, and normal classes, with error-calculation parameters. The results show that the proposed model significantly improves accuracy with a fast convergence rate and outperforms previous studies on all statistical and error parameters (Zulfiqar et al. 2022).

Our proposed work tested adventitious sounds, i.e., crackles, wheezes, and both together, but it cannot detect other sounds such as rhonchi and squawks. Also, the ICBHI database on which our work is based contains only a limited number of respiratory cycles; recording respiratory sounds is a challenging process compared to other physiological signals such as ECG, so fewer studies focus on them. Moreover, DL strategies trained on noisy and small data suffer from significant deviation and generalization failures, so it is critical to assess their efficiency, as they are susceptible to noise, incorrect model interpretation, and the inductive implications inherent in cases of uncertainty (Bhosale and Patnaik 2022a).

The remainder of the paper is organized as follows: the Methods section explains the approach followed in the research, highlighting data acquisition and pre-processing techniques, and gives a broad overview of the feature set analysis; the experimental results are then presented and detailed in the Discussion; finally, the paper concludes, followed by the Acknowledgements.

Methods

The methodology adopted in this work is divided into the following points:

  1. Data analysis takes place in two subsections, data acquisition and data pre-processing, using the RALE database.

  2. Feature extraction phase: the research uses the mathematical extraction of features described in the feature analysis section.

  3. To record the features' numerical values, we constructed an Excel sheet.

  4. For categorical data based on wheeze, crackle, and normal sounds, the research adopts SVM, LSTM, and LSTM with the BO algorithm as the artificial intelligence stage.

  5. Running the algorithms generates the parameters that make up a confusion matrix.

  6. Using the confusion matrix, we calculated the error parameters.

  7. Based on these results, we discuss and conclude all points in the Results and Conclusion sections. A minimal runnable sketch of steps 4-6 follows this list.
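For concreteness, here is a minimal, runnable Python sketch of steps 4-6 using synthetic stand-in features; the array shapes, the random feature values, and the plain RBF-SVM stage are illustrative assumptions, not the study's actual data or tuned model.

```python
# Sketch of steps 4-6: classify 16-feature segments, then derive the
# confusion matrix and error parameters. Features here are random stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(372, 16))                      # 16 WBS/WBP features per segment
y = rng.choice(["wheeze", "crackle", "normal"], size=372)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)             # step 4: SVM stage
pred = clf.predict(X_te)
print(confusion_matrix(y_te, pred))                 # step 5: confusion matrix
print(classification_report(y_te, pred))            # step 6: error parameters
```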

Data acquisition

The data in this research mainly comprise the Respiration Acoustics Laboratory Environment RALE\(^{{\circledR }}\) (Pasternak 2008) lung sounds 3.2 (used with permission for academic research) and other resources (Huang 2005; Keroes 2018). The educational program RALE\(^{{\circledR }}\) aims to educate doctors, nurses, medical professionals, and students. It features about 50 recordings, a collection of adventitious sounds from people of various ages and conditions, plus a quiz area with 24 more instances. The Health Sciences Communications Association has given the collection a commendation award for computer-based products. In our data set, wheezes (normal, monophonic, and polyphonic) are represented by 252 recordings, crackles (coarse and fine) by 70, and normal sounds (bronchial, bronchovesicular, and tracheal) by 50.

Data pre-processing

The voltage range of the captured sound is −5 V to +5 V (−32,768 to +32,767 at 16 bits). The captured sound is sampled at 4 kHz, 16 bits, with 1024 points per segment. Following the computerized respiratory sound analysis (CORSA) guidelines, a first-order Butterworth high-pass filter at 7.5 Hz removes the DC offset, and an eighth-order Butterworth low-pass filter cuts the signals off at 2.5 kHz. The system uses a band-pass filter (150 Hz–2 kHz) for heart sound cancellation. The signals are divided into waveform segments using Goldwave\(^{{\circledR }}\) software. A pulmonologist at a medical clinic in Indore, India, manually validated the database.
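A sketch of this CORSA-style filtering chain using SciPy follows. It assumes the raw recording is available at 8 kHz (`fs` below) so that the 2.5 kHz low-pass is realizable before any down-sampling to 4 kHz, and it assumes a fourth-order band-pass, since the text does not state that filter's order; the study's exact acquisition chain may differ.

```python
# CORSA-style pre-processing sketch: DC removal, low-pass, and
# heart-sound band-pass, applied with zero-phase filtering.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 8000  # assumed raw sampling rate in Hz (before down-sampling to 4 kHz)

def preprocess(x):
    # First-order Butterworth high-pass at 7.5 Hz: removes DC offset
    hp = butter(1, 7.5, btype="highpass", fs=fs, output="sos")
    # Eighth-order Butterworth low-pass at 2.5 kHz
    lp = butter(8, 2500.0, btype="lowpass", fs=fs, output="sos")
    # Band-pass 150 Hz - 2 kHz for heart-sound cancellation (order assumed)
    bp = butter(4, [150.0, 2000.0], btype="bandpass", fs=fs, output="sos")
    for sos in (hp, lp, bp):
        x = sosfiltfilt(sos, x)
    return x

x = np.random.randn(fs)   # 1 s of stand-in lung-sound data
y = preprocess(x)
```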

Feature analysis

A feature is a set of values drawn from a signal to give it a distinctive identity. In this paper, we propose a feature set that captures non-linearity in the time-frequency domain: the signals exhibit non-stationary characteristics, and the quadratic phase coupling of their harmonic peaks is non-linear in nature. Higher-order spectra therefore offer a rich feature scope for non-linear signals.

Figures 2, 3, 4, 5, 6, and 7 show the higher-order spectra, wavelet bi-spectrum and bi-phase, of wheeze, crackle, and normal health sounds. They are marked with global max and min peaks together with their rise and fall times.

Fig. 2 Higher-order features of wavelet bi-phase for wheezes

Fig. 3 Higher-order features of wavelet bi-spectrum for wheezes

Fig. 4 Higher-order features of wavelet bi-phase for crackle

Fig. 5 Higher-order features of wavelet bi-spectrum for crackle

Fig. 6 Higher-order features of wavelet bi-phase for normal

Fig. 7 Higher-order features of wavelet bi-spectrum for normal

Wavelet bi-phase (WBP) and bi-spectrum (WBS)

In obstructive pulmonary disease, airway restriction introduces non-linearity in harmonic peak interactions, and wavelet analysis aids in the detection of non-linearity in signal analysis. In the wavelet transform, we convolve wave-like structures (wavelets) with the signal; this convolution procedure reveals the signal's transitory features. The mathematical formula for the CWT is as follows (Hadjileontiadis 2018):

$$ W_{x}(a,b)= \frac{1}{ \sqrt{a} } {\int}_{- \infty }^{+ \infty }x(t) \psi^{*}\left( \frac{t-b}{a}\right)dt $$
(1)

where x(t) represents the signal in the time domain (x(t) ∈ L2(R)), ∗ denotes the complex conjugate, and ψ(t) is the mother wavelet, scaled by a factor a (a > 0) and translated by a factor b, with a and b continuous. The Morlet wavelet has the advantage of joint time and frequency localization; it is also helpful in identifying measurable features in the time-frequency domain and is therefore preferred as the mother wavelet.

$$ \psi (t)= \frac{1}{ \sqrt{ \pi f_{b} } } e^{\frac{- t^{2} }{f_{b} } } e^{j2 \pi f_{c}t } $$
(2)

where fc and fb are the central wavelet frequency and bandwidth parameters, respectively. The wavelet bi-spectrum is defined as

$$ W B_{x}(a_{1}, a_{2})= {\int}_{T} W_{x}^{*}(a, \tau ) W_{x} (a_{1}, \tau ) W_{x}(a_{2}, \tau )d \tau $$
(3)

The preceding integration takes place over a finite time interval T: τ0 < τ < τ1, and a, a1, and a2 are the scale lengths of the wavelet components. The WBS captures the quadratic phase coupling between wavelet components in the interval T. Wavelet bi-amplitude and bi-phase refer to the magnitude and phase of the complex WBS, respectively.
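A sketch of Eqs. (1)-(3) follows: a complex-Morlet CWT (via PyWavelets) and the wavelet bi-spectrum over a finite window, with the bi-amplitude and bi-phase taken as magnitude and angle. The scale pairing uses the frequency-sum rule f = f1 + f2 implied by quadratic phase coupling; the grid sizes, wavelet parameters, and test signal are illustrative assumptions.

```python
# CWT with a complex Morlet wavelet, then the wavelet bi-spectrum WB(a1, a2)
# accumulated over the analyzed interval (Eq. 3).
import numpy as np
import pywt

fs = 4000.0
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)

# Eqs. (1)-(2): complex Morlet with bandwidth fb = 1.5, center freq fc = 1.0
scales = np.arange(2, 64)
W, freqs = pywt.cwt(x, scales, "cmor1.5-1.0", sampling_period=1 / fs)

def wavelet_bispectrum(W, freqs):
    """WB(a1, a2) = sum_tau W*(a, tau) W(a1, tau) W(a2, tau), where the
    coupled scale a satisfies f(a) = f(a1) + f(a2) (nearest grid point)."""
    n = len(freqs)
    WB = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            f_sum = freqs[i] + freqs[j]
            if f_sum > freqs[0]:          # coupled frequency outside the band
                continue
            k = np.argmin(np.abs(freqs - f_sum))
            WB[i, j] = np.sum(np.conj(W[k]) * W[i] * W[j])
    return WB

WB = wavelet_bispectrum(W, freqs)
A = np.abs(WB)       # wavelet bi-amplitude
phi = np.angle(WB)   # wavelet bi-phase
```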

Instantaneous wavelet bi-amplitude and bi-phase

The WBS defined in Eq. 3 corresponds to the time interval T; the instantaneous WBS (IWBS) is defined as follows:

$$ IWB_{x} (a_{1} , a_{2},t) = \mid IWB_{x} (a_{1} , a_{2},t)\mid e^{j \angle IWB_{x} (a_{1} , a_{2},t)} = A_{x} e^{j \phi_{x} } $$
(4)

As the above equation shows, the IWBS is a complex quantity whose magnitude is the bi-amplitude and whose phase is the bi-phase.

Global peaks (GPs) maxima, minima, and Euclidean distance

The following features are based on GPs and Euclidean distance:

Feature 1: Global max value in the amplitude domain (wavelet bi-spectrum), GMax WBx

Feature 2: Global min value in the amplitude domain (wavelet bi-spectrum), GMin WBx

Feature 9: Global max value in the amplitude domain (wavelet bi-phase), GMax ϕx

Feature 10: Global min value in the amplitude domain (wavelet bi-phase), GMin ϕx

If a peak's amplitude exceeds the average amplitude, it is classified as a GP. A global maximum or minimum is the extreme value attained by a function in the positive or negative direction. The global peaks (GPs) appear throughout the signal's lifespan (TTotal). Their features capture the proposed feature set's bi-frequency-related qualities.

$$ m A_{x} (\omega_{1}, \omega_{2} )= \overline{A_{x} (\omega_{1}, \omega_{2},t)} $$
(5)

where Ax(ω1,ω2) is the amplitude of the instantaneous wavelet bi-amplitude over the area Δ that exceeds the statistical noise.

$$ mAb_{x}^{GP_{i} }(\omega_{c1} , \omega_{c2})=mAb_{x}^{GP_{i} } (\omega_{1} , \omega_{2}) \mid_{mA_{x}^{GP_{i} } (\omega_{1} , \omega_{2})=\max } $$
(6)

where Ci = (ωc1,ωc2), i = 1,2,…,l. A function f defined on a domain D has a global maximum at c ∈ D if f(x) ≤ f(c) for all x ∈ D, and a global minimum at c ∈ D if f(x) ≥ f(c) for all x ∈ D. As seen in Table 1, the GMax of both bi-phase and bi-spectrum has values of −45.8678, −12.276, and −13.3763 for W, C, and N. The GMin of both bi-phase and bi-spectrum for crackle is −99.7943; the GMin of the bi-spectrum is −102.6085 and −111.2704 for W and N, and the GMin of the bi-phase is −102.609 and −111.27 for W and N.

Table 1 Parameters of higher-order spectrum

The following section sheds light on features three and eleven:

Feature 3: The distance of Ci from the contour S of the i-th GP in the bi-frequency domain in the wavelet bi-spectrum, DGPi WBx

Feature 11: The distance of Ci from the contour S of the i-th GP in the bi-frequency domain in the wavelet bi-phase, \( D^{GP_{i}} \phi _{x}\)

The distance of Ci from the contour S of the i-th GP in the bi-frequency domain is denoted by DGPi. The Euclidean distance (features 3 and 11) of Si from Ci is defined over the contour points:

$$ s^{j}= (\omega_{s1},\omega_{s2})^{j} \in S, \quad j=1,2,\ldots,m $$
(7)

where m is the number of points on the contour S of GPi. As seen in Table 1, the DGPi of both bi-phase and bi-spectrum has values of 56.7406, 630.5659, and 106.9485 for W, C, and N.
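A sketch of features 1-3 and 9-11 on one bi-frequency surface (|WBS| or bi-phase) follows: the global max/min and the Euclidean distance from the global-peak location Ci to the surrounding contour S. The contour level (the surface mean) and the use of the minimum distance are assumptions, since the text does not fix these conventions.

```python
# Global-peak features and peak-to-contour Euclidean distance.
import numpy as np
from scipy.ndimage import binary_erosion

def gp_features(surface, w1, w2):
    g_max = surface.max()                            # features 1 / 9
    g_min = surface.min()                            # features 2 / 10
    i, j = np.unravel_index(surface.argmax(), surface.shape)
    mask = surface >= surface.mean()                 # supra-mean region
    boundary = mask & ~binary_erosion(mask)          # its contour S
    ys, xs = np.nonzero(boundary)
    d = np.hypot(w1[ys] - w1[i], w2[xs] - w2[j])     # distances C_i -> s^j
    d_gp = d.min() if d.size else 0.0                # features 3 / 11
    return g_max, g_min, d_gp

w1 = w2 = np.linspace(0.0, 2000.0, 128)              # bi-frequency axes, Hz
print(gp_features(np.random.rand(128, 128), w1, w2))
```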

Amplitude above mean

This section discusses the fourth and twelfth features:

Feature 4: Amplitude above mean in the wavelet bi-spectrum, Amean WBx

Feature 12: Amplitude above mean in the wavelet bi-phase, Amean ϕx

The peak-to-peak (p-p) amplitude is the difference between the largest and the smallest points. Figure 8 depicts the signal amplitude measurement points, with the p-p amplitude denoted by "2". The mean of the spectrum (MS) is the peak-to-peak amplitude (denoted as "2" in Fig. 8) divided by 2, and the amplitude above mean is Amean = peak amplitude (denoted as "1" in Fig. 8) − MS.

Fig. 8 Location of measurement of signal amplitude

As seen in Table 1, the Amean of both bi-phase and bi-spectrum has values of 28.3703, 43.7591, and 48.9471 for W, C, and N.
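A minimal sketch of features 4 / 12 as defined above: MS is half the peak-to-peak amplitude, and Amean is the peak amplitude minus that mean level.

```python
# Amplitude above mean (A_mean), following the Fig. 8 construction.
import numpy as np

def amplitude_above_mean(surface):
    peak = surface.max()                  # point "1" in Fig. 8
    p2p = surface.max() - surface.min()   # point "2" in Fig. 8
    ms = p2p / 2.0                        # mean of the spectrum (MS)
    return peak - ms                      # A_mean

print(amplitude_above_mean(np.random.rand(64, 64)))
```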

Average instantaneous WBS/WBP

This section elaborates on features six and fourteen:

Feature 6: Average instantaneous wavelet bi-spectrum across the examined total time interval T, mWBx(ω1,ω2)

Feature 14: Average instantaneous wavelet bi-phase across the examined total time interval T, mϕx(ω1,ω2)

The maximum instantaneous wavelet bi-phase of the LPs in the time interval t is denoted as mϕx(ω1,ω2), and for the WBS as mWBx(ω1,ω2). The frequencies \(\omega _{c_{1} }\) and \(\omega _{c_{2} }\) where LPi has its maximum value vary with time; this time dependence of the wavelet frequencies is represented as \(\omega _{c_{1} } (t)\) and \(\omega _{c_{2} } (t)\).

As seen in Table 1, the average instantaneous value of both bi-phase and bi-spectrum has values of 5.18E+04, 7.01E+04, and 5.1134E+04 for W, C, and N.

Maximum WBS/WBP across time

This section sheds light on features seven and fifteen:

Feature 7: Maximum wavelet bi-spectrum across time related to LPs, \({\max \limits } WB_{x}^{LP}\)

Feature 15: Maximum wavelet bi-phase across time related to LPs, \({\max \limits } \phi _{x}^{LP}\)

The local peaks (LPs) are seen in the signal's detailed perspective based on the window-overlapping section Δ using IWBS analysis.

$$ Ab_{x}^{LP_{i} } (\omega_{c1} , \omega_{c2},t)=Ab_{x}^{LP_{i} } (\omega_{1} , \omega_{2},t) \mid_{mA_{x}^{LP_{i} } (\omega_{1} , \omega_{2},t)=\max } $$
(8)

where l is the number of LPs, and i is the maximum peak position. The maximum WBS/WBP across time is related to local peaks as follows:

$$ \max \phi_{x}^{LP} =\max (\phi_{x}^{LP}(t)) $$
(9)
$$ \max WB_{x}^{LP} =\max (WB_{x}^{LP}(t)) $$
(10)

As seen in Table 1, the max of both bi-phase and bi-spectrum has values of 1020, 1024, and 1021 for W, C, and N.

Arithmetic mean (AM) and standard deviation (SD)

This section discusses features five, eight, thirteen, and sixteen:

Feature 5: Mean wavelet bi-spectrum related to LPs, \(\text {mean} WB_{x}^{LP}\)

Feature 8: Standard deviation of the wavelet bi-spectrum related to LPs, \( stdWB_{x}^{LP}\)

Feature 13: Mean wavelet bi-phase related to LPs (Taplidou and Hadjileontiadis 2010), \( \text {mean} \phi _{x}^{LP}\)

Feature 16: Standard deviation of the wavelet bi-phase related to LPs (Taplidou and Hadjileontiadis 2010), \( std \phi _{x}^{LP}\)

The AM is the central value of a collection of data, while the SD measures the dispersion of the data about that mean:

$$ std WB_{x}^{LP} =std(WB_{x}^{LP}(t)) $$
(11)
$$ std \phi_{x}^{LP} =std(\phi_{x}^{LP}(t)) $$
(12)
$$ \text{mean} \phi_{x}^{LP} =\text{mean}(\phi_{x}^{LP}(t)) $$
(13)
$$ \text{mean} WB_{x}^{LP} =\text{mean} (WB_{x}^{LP}(t)) $$
(14)

As seen in Table 1, the mean of both bi-phase and bi-spectrum has values of 510.7143, 535.3702, and 511.34 for W, C, and N, and the std of both bi-phase and bi-spectrum has values of 290.4293, 289.4929, and 287.2834 for W, C, and N.
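A sketch of features 5-8 and 13-16 follows: statistics of the local-peak (LP) track across the analysis windows, plus the average instantaneous value over the whole interval T (features 6 / 14). The stand-in arrays are illustrative assumptions; in practice the track holds the LP value of |IWBS| (for the WBS features) or of the bi-phase (for the WBP features) in each window.

```python
# Statistics over the LP track and the instantaneous surface (Eqs. 9-14).
import numpy as np

def lp_statistics(lp_track, instantaneous_surface):
    return {
        "mean_LP": np.mean(lp_track),                # features 5 / 13
        "std_LP": np.std(lp_track),                  # features 8 / 16
        "max_LP": np.max(lp_track),                  # features 7 / 15
        "avg_inst": np.mean(instantaneous_surface),  # features 6 / 14
    }

rng = np.random.default_rng(0)
track = rng.random(100)                 # one LP value per analysis window
surface = rng.random((64, 64, 100))     # instantaneous surface over T
print(lp_statistics(track, surface))
```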

Results

The results section covers the confusion matrix, accuracy vs. iterations, loss vs. iterations, and the derivation of statistical measures.

As seen from Tables 2, 3, 4, 5, 6, and 7, precision, recall, specificity, and F1 values all improve for both WBS and WBP when LSTM is combined with Bayesian optimization, and WBS with LSTM with Bayesian optimization gives the best value for each of these measures.
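The per-class measures in Tables 2-7 can be derived from a 3x3 confusion matrix as sketched below (rows = true class, columns = predicted class); the counts used here are illustrative, not the study's results.

```python
# Per-class precision, recall (sensitivity), specificity, and F1 from a
# multi-class confusion matrix via one-vs-rest counts.
import numpy as np

cm = np.array([[118, 4, 2],    # wheeze
               [3, 62, 5],     # crackle
               [2, 1, 47]])    # normal

for i, name in enumerate(["wheeze", "crackle", "normal"]):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = cm.sum() - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: P={precision:.3f} R={recall:.3f} "
          f"Spec={specificity:.3f} F1={f1:.3f}")
```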

Table 2 Performance measures for SVM algorithm of WBP for each class
Table 3 Performance measures for LSTM algorithm of WBP for each class
Table 4 Performance measures for LSTM algorithm with Bayesian optimization of WBP for each class
Table 5 Performance measures for SVM algorithm of WBS for each class
Table 6 Performance measures for LSTM algorithm of WBS for each class
Table 7 Performance measures for LSTM algorithm with Bayesian optimization algorithm of WBS for each class

For both WBP and WBS, Tables 2, 3, 4, 5, 6, and 7 present the performance metrics for SVM, LSTM, and LSTM with Bayesian optimization for each class, i.e., wheeze, crackle, and normal sounds. The model applied in the current study is new to lung sounds (Anderson et al. 2021), and it also gives better results.

Table 9 shows the error calculations for SVM, LSTM, and LSTM with BO, computed from the confusion matrices in Figs. 9, 10, and 11. The MSE values for WBS and WBP under SVM and LSTM are 97.333 and 90.667, while LSTM with BO achieves 38.667 for WBS and 44.000 for WBP; since a lower MSE is better, the bi-spectrum with LSTM-BO performs best. The PSNR values under SVM and LSTM are 24.2482 and 28.5563, while LSTM with BO achieves 32.2579 for WBS and 31.6983 for WBP; since a higher PSNR is better, the bi-spectrum with LSTM-BO again performs best. The R-values under SVM and LSTM are 0.9958 and 0.9962, while LSTM with BO achieves 0.9984 for WBS and 0.9981 for WBP; a higher R-value is better, again favoring the bi-spectrum with LSTM-BO. The RMSE under SVM and LSTM is 9.8658 and 9.5219, while LSTM with BO achieves 6.2183 for WBS and 6.6332 for WBP (Table 8); a lower RMSE is better. Finally, the NRMSE under SVM and LSTM is 0.0453 and 0.0471, while LSTM with BO achieves 0.0308 for WBS and 0.0328 for WBP (Fig. 12); a lower NRMSE is better, and Table 9 confirms that the bi-spectrum is best for LSTM with BO.
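A sketch of these Table 9 error measures between a reference and a predicted array follows. The PSNR assumes an 8-bit peak value of 255, which is consistent with the reported MSE/PSNR pairs (e.g., MSE 90.667 gives PSNR ≈ 28.556 dB), and the NRMSE uses range normalization, one common convention; the paper does not state its normalization.

```python
# MSE, PSNR, correlation coefficient R, RMSE, and NRMSE.
import numpy as np

def error_metrics(ref, pred):
    mse = np.mean((ref - pred) ** 2)
    rmse = np.sqrt(mse)
    nrmse = rmse / (ref.max() - ref.min())       # range-normalized RMSE
    psnr = 10.0 * np.log10(255.0 ** 2 / mse)     # assumes 8-bit peak of 255
    r = np.corrcoef(ref.ravel(), pred.ravel())[0, 1]
    return {"MSE": mse, "PSNR": psnr, "R": r, "RMSE": rmse, "NRMSE": nrmse}

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, size=256)
print(error_metrics(ref, ref + rng.normal(scale=9.5, size=256)))
```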

Table 8 Comparative analysis of researches in terms of accuracy of detection
Table 9 Error calculation for WBS and WBP for confusion matrix
Fig. 9 Confusion matrix for SVM algorithm for WBS and WBP

Fig. 10 Confusion matrix for LSTM algorithm for WBS and WBP

Fig. 11 Confusion matrix for LSTM and Bayesian optimization algorithm for WBS and WBP

Fig. 12 Accuracy and loss vs. iteration plots for LSTM algorithm and LSTM with Bayesian optimization algorithm for WBS and WBP

Table 8 shows a comparative analysis with prior research, all of which achieved lower results than our proposed work: the digital-stethoscope study of Kevat et al. (2020) (PPA 0.95 and NPA 0.99 for Clinicloud, PPA 0.95 for Littman, and PPA/NPA 0.82 otherwise), the Bi-GRU/VGGish classifier of Shi et al. (2019a) (87.41% accuracy, 384 subjects), the MFCC-spectrogram CNN of Aykanat et al. (2017) (80%), the sputum-detection logistic classifier of Niu (2019) (92.26% sensitivity and specificity), the WCC-BPNN combination of Shi et al. (2019b) (92.5%, 64 subjects), the spectrogram CNN of Bardou et al. (2018) (95.56%), and the time-frequency CNN of Demir et al. (2020) (65.5%), all reviewed in the Introduction.

Table 9 shows that all error parameters have better values for LSTM with the BO model. Likewise, Tables 2, 3, 4, 5, 6, and 7 reflect that the LSTM with BO model performs best on all parameters, i.e., sensitivity, specificity, precision, F-measure, and accuracy.

Discussion

Automatic classification of adventitious sounds for identifying pulmonary obstruction is a challenge. Previous methods for detecting adventitious lung sounds mostly employed features based on linear characteristics; in this research, we propose features based on the non-linear characteristics of lung sounds. Table 8 compares the various research efforts using the RALE\(^{{\circledR }}\) database and shows that the classifiers previously used for testing such features are conventional. Here we use the SVM-LSTM-BO machine-learning combination for the first time to separate lung anomalies. The results compare the accuracy of the algorithm with and without Bayesian optimization and show that, with Bayesian optimization, the proposed model becomes more effective in detecting the targets. When using Bayesian optimization, the algorithm benefits from prior knowledge of the problem's structure, and the data yield a set of high-quality solutions; the prior information can be adjusted with information gathered during the run to produce new solutions. Tables 2, 3, 4, 5, 6, and 7 show that the accuracy of SVM for both WBS and WBP is 94.086% and of LSTM 94.624%, while LSTM with Bayesian optimization reaches 95.161% for WBP and 95.699% for WBS. We therefore conclude that the major improvement comes from LSTM with Bayesian optimization and that the best improvement in accuracy is seen for the wavelet bi-spectrum. For the other parameters, the same tables show that LSTM with Bayesian optimization on the wavelet bi-spectrum is the most efficient in F-measure, sensitivity, specificity, and precision for each class, i.e., wheeze, crackle, and normal, with macro and micro averages respectively. In terms of the confusion-matrix counts, LSTM with Bayesian optimization improves the TP values to 118.67 (WBS) and 118 (WBP) and the TN values to 242.67 (WBS) and 242 (WBP), and reduces both the FP and FN values to 5.3333 (WBS) and 6 (WBP); since lower FP and FN mean greater improvement, WBS with LSTM with Bayesian optimization performs best throughout.

The results show that SVM parameters such as the penalty and kernel parameters positively affect the SVM model's correctness and complexity. The findings also suggest that the proposed method might be employed as an aid to diagnosing COVID-19 disease: the suggested strategy behaves well in increasing classification accuracy and selecting optimal features, and it can be considered a useful clinical decision-making tool for clinicians. With the increasing popularity of LSTMs, various alterations to the conventional LSTM architecture have been tried to simplify the internal design of the cells, make them work more efficiently, and reduce computational complexity.
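To illustrate the tuning of the SVM penalty (C) and RBF kernel width (gamma) mentioned above, here is a sketch using scikit-optimize's BayesSearchCV; the choice of library, the search ranges, the iteration count, and the random stand-in features are assumptions (the study does not name its BO implementation), and the LSTM stage is omitted.

```python
# Bayesian optimization over SVM hyperparameters with 5-fold CV (k = 5,
# matching the cross-validation used in the paper).
import numpy as np
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real

rng = np.random.default_rng(0)
X = rng.normal(size=(372, 16))                            # 16 WBS/WBP features
y = rng.choice(["wheeze", "crackle", "normal"], size=372)

opt = BayesSearchCV(
    SVC(kernel="rbf"),
    {"C": Real(1e-2, 1e3, prior="log-uniform"),
     "gamma": Real(1e-4, 1e1, prior="log-uniform")},
    n_iter=25, cv=5, random_state=0,
)
opt.fit(X, y)
print(opt.best_params_, opt.best_score_)
```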

Conclusion

We proposed two sets of features based on WBS and WBP to detect adventitious lung sounds. The results reveal that the WBS and WBP feature sets obtained an accuracy of 94.086% for SVM and 94.684% for LSTM, and of 95.699% (WBS) and 95.161% (WBP) for LSTM with Bayesian optimization. The research confirms the concept that adventitious sounds have distinct non-linear features. We conclude that combining LSTM with Bayesian optimization improved each class's accuracy and all statistical parameters, and the model achieved accurate AI-aided detection of lung diseases suitable for lightweight devices. As the results show, SVM with LSTM with Bayesian optimization achieved improvement in all parameters, i.e., accuracy, specificity, sensitivity, precision, and recall, for each class, i.e., wheeze, crackle, and normal sounds; we also found that WBS shows more improvement under LSTM with Bayesian optimization than WBP, and that the proposed SVM-LSTM-BO method on WBS improves on previous work. Future work will focus on increasing the data set size to include more subjects and a wider range of diseases such as COVID-19, which will improve the credibility of the proposed model. Although the proposed classification model achieves high performance metrics, it may be further improved by adjusting the pre-processing techniques and the training structure.