1 Introduction

Respiratory diseases (also known as pulmonary or lung disorders) are disturbances of the tissues and cells of the lungs that make breathing difficult. More specifically, any disease that affects a part of the respiratory system, such as the trachea, bronchi, bronchioles, alveoli, pleurae, pleural cavity, or the nerves and muscles of breathing, falls under the category of respiratory diseases [1]. Mild respiratory diseases include the common cold, influenza, and pharyngitis. Serious ones include bacterial pneumonia, pulmonary embolism, tuberculosis (TB), severe asthma, lung cancer, and severe acute respiratory syndromes such as COVID-19. Respiratory illnesses can be classified by the affected organ or tissue, by the type and pattern of their signs and symptoms, or by their cause [2]. The medical specialty that deals with respiratory diseases is called pulmonology, and a physician in this field is known as a pulmonologist or respirologist. International respiratory societies meet regularly to control and monitor respiratory diseases worldwide [2, 3]. These include the American College of Chest Physicians (ACCP), the International Respiratory Societies (IRS), the European Respiratory Society (ERS), the Asian Pacific Society of Respirology (APSR), the International Union against Tuberculosis and Lung Disease, and other relevant societies and associations. Together, these societies have 70,000 professional members worldwide who work to highlight the burden of respiratory diseases. Early diagnosis makes it possible to keep many respiratory diseases from progressing. If patients take precautions and adhere to Standard Operating Procedures (SOPs), the disease can be reduced and, in many cases, avoided [3,4,5].
Therefore, basic treatment guidelines and medical supervision are vital to control respiratory diseases at an early stage. If patients cannot obtain proper guidance and treatment in time, the disease may progress and become life-threatening. To close this gap, several researchers have proposed systems known as Clinical Decision Support Systems (CDSS). A CDSS is software that performs four steps:

  • it takes patient information and symptoms;

  • it processes this information with artificial intelligence algorithms and SOP-based rules;

  • it predicts the likely disease;

  • it suggests possible treatments [6,7,8,9].

Because a CDSS is computer-based software that also records and stores patients’ data from previous visits, it may improve patient care by providing detailed prescriptions based on the patient’s history. As a decision support system, a CDSS helps clinicians make better decisions and hence reduces errors in medical treatment. Such support is needed. In the US, an assessment based on 439 quality indicators found that adults receive only about half of the recommended care [10]. The US Institute of Medicine estimated that as many as 98,000 US residents die each year from preventable medical errors [11]. Similarly, a review of UK hospital records found that 11% of admitted patients experienced adverse events, 48% of these events were judged preventable, and 8% led to death [12]. To reduce such cases and make medical treatment more efficient, healthcare organizations are turning to CDSS. Compared with manual approaches, CDSS have been shown to be more effective and more likely to produce lasting improvements in clinical practice. The benefit is not universal, however: in one review, 66% of CDSS improved clinical practice while 34% did not [13]. To make CDSS more effective, researchers are working on improving the rules embedded in the software. Artificial intelligence, and machine learning in particular, is a promising way to do so [6, 7].

Artificial Intelligence (AI) emerged as a field in 1956. It is described as the study of intelligent agents, that is, systems that perceive their environment and take actions that maximize their chance of achieving their goals [14]. AI in healthcare began with expert systems, in which rules were acquired by interviewing medical experts and then programmed into software [15]. An early expert system named MYCIN, developed in 1976 with about 450 rules, was used to suggest antibiotics for bacterial infections. Because of its large volume of rules, this expert system was never used in clinical practice. To overcome the limitations of expert systems, Machine Learning (ML) techniques were developed and adapted for CDSS [15]. In ML, manually written rules are replaced by rules that the system generates itself. ML algorithms learn patterns from example data and apply them to new cases, which makes them more helpful in CDSS. A well-known and effective family of ML techniques is deep learning, which is based on artificial neural networks [16]. ML depends on the availability of large volumes of high-quality data; consequently, ML-based systems are sometimes called data-intensive systems. Healthcare data is growing rapidly, especially since the emergence of the Internet of Medical Things (IoMT) and the availability of inexpensive internet-connected electronic devices. Most CDSS are still expert systems, but ML-based CDSS are becoming more popular because they learn and predict efficiently [17]. ML is widely used in healthcare, for example to analyze X-ray and MRI images and to locate cancerous lung nodules and tissues. These developments suggest that ML techniques can enhance CDSS for respiratory diseases, which are the sole focus of this article.
Currently, researchers are using different ML techniques to diagnose and treat respiratory diseases, such as artificial neural networks, deep reinforcement learning, and convolutional neural networks (CNN). Judging by the reported results, ML-based CDSS still need further work to improve their efficiency and accuracy. To the best of the authors’ knowledge, there is no previous survey of machine learning CDSS for respiratory diseases. The contributions of this work are as follows:

  • The article provides an overview of the ML techniques used by CDSS for respiratory diseases.

  • The latest research on diagnosing respiratory diseases with clinical decision support systems is presented. It is followed by recent research on the early detection of respiratory diseases using machine learning, together with its pros and cons.

  • The article gives a systematic comparison and discussion of the results of state-of-the-art ML techniques used for respiratory disease treatment and precautionary measures. A detailed analysis and the future directions drawn from the literature are provided in the discussion section.

The rest of the paper is organized as follows:

  • Section 1 has given a brief background and the aim of this paper.

  • Section 2 describes how the studies included in this systematic review were recruited, including the guidelines followed and the inclusion and exclusion criteria.

  • Section 3 presents the results and summarizes the studies in a table.

  • Section 4 gives a comprehensive overview of the ML techniques used for the early detection of respiratory diseases.

  • Sections 5 to 7 review research based on spirometry, forced oscillation, and other pulmonary function methods, including the latest techniques, case findings, and limitations.

  • Section 8 discusses future directions in detail, and the final section concludes the systematic review.

2 Materials and Methods

In recent decades, the number of studies published in the biomedical literature, particularly in tropical medicine and health, has increased significantly. However, these studies are often heterogeneous in design, quality, and subject matter, and they may address the research question in different ways. This complicates the synthesis of evidence and makes it harder for findings to converge [18]. Systematic reviews and meta-analyses (SR/MAs) sit at the top of the evidence-based pyramid and provide the highest standard of evidence. A well-conducted SR/MA is therefore considered a viable way to keep healthcare practitioners up to date with evidence-based medicine. Despite the growing guidance on performing systematic reviews, the key steps remain the same:

  • framing the question;

  • identifying relevant studies, which involves defining eligibility criteria and searching for articles;

  • assessing the quality of the included studies;

  • summarizing the data;

  • interpreting the findings.

A researcher can handle most of the problems in these steps even without detailed guidance [19]. This review was planned, conducted, and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [20,21,22].

2.1 Search Engine and Keyword Strategy

Synonyms and alternative phrases were combined with Boolean operators to form a set of search strings. AND narrows and limits the search, while OR broadens it. We refined our search with the following Boolean string: (CDSS) AND (asthma OR COPD OR Cystic Fibrosis OR lung cancer) AND (machine learning OR computer vision OR neural network OR pattern recognition OR artificial intelligence).
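A search string like this can be assembled programmatically and reused across databases. The following Python sketch joins each group of synonyms with OR and joins the groups with AND. The helper `build_query` and the list layout are hypothetical. The terms are copied from the string above, and database-specific syntax such as field tags or phrase quotes is deliberately not modelled.

```python
def build_query(groups):
    """Join synonyms with OR inside parentheses, then join the groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


# Term groups copied from the search string reported above.
SEARCH_GROUPS = [
    ["CDSS"],
    ["asthma", "COPD", "Cystic Fibrosis", "lung cancer"],
    ["machine learning", "computer vision", "neural network",
     "pattern recognition", "artificial intelligence"],
]

query = build_query(SEARCH_GROUPS)
print(query)
```

Keeping the synonyms in lists makes it easy to add a term to one group without retyping the whole string.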

PubMed, IEEE Xplore, and ScienceDirect were used to search for peer-reviewed papers. Results in ScienceDirect were limited to review articles, peer-reviewed research publications, and conference abstracts. All three sources were searched with the specified keywords for papers published up to June 2020 (Fig. 1).

Fig. 1

Flowchart for the final selection of studies based on PRISMA guidelines [20,21,22]

2.2 Inclusion Criteria

  • Studies predicting respiratory diseases.

  • Studies that use respiratory measures/parameters, or images of the lungs obtained with different imaging techniques, to predict the diseases.

  • Studies based on machine learning methods.

  • Studies that identify diseases from images using image segmentation algorithms, software, or applications.

  • Studies written in English.

  • Studies published in a journal or in conference proceedings.

2.3 Exclusion Criteria

  • Studies that do not predict respiratory disease.

  • Studies that did not use respiratory parameters or pictures of the lungs as data to identify the condition.

  • Research articles that use neither machine learning nor detection software.

  • Studies published in any language other than English.

  • Articles that were not published in an academic journal or conference proceedings.

3 Results

The studies recruited by following the PRISMA guidelines are briefly summarized in Table 1. The authors independently assessed the quality of every retrieved paper that met the inclusion criteria. They evaluated research quality and extracted relevant data on study design and statistical methodology. The details extracted from the chosen studies, shown in separate columns of Table 1, are the essence of any machine learning study. Column 3 of Table 1 gives the disease chosen in each study. Its most important use is to show which disease has been chosen most often and has been the main focus of researchers. For each ML article, we looked specifically for the algorithm, the feature extraction method, and the accuracy, because these three are correlated. For example, the authors of [23] achieved an accuracy of 96% using a CNN, whereas the authors of [24] achieved 92%. Both worked on the same disease but used different neural network architectures.

Table 1 Summarized details of the studies recruited after following PRISMA [20,21,22] guidelines

4 Discussion

This section discusses in detail the ML techniques that researchers around the world use for the early detection of respiratory diseases. A typical ML process starts with collecting relevant data, which is then preprocessed, analyzed, and examined for useful information [41, 42]. This information consists of the distinctive features exhibited by the collected data, and the ML algorithm is trained on these features. Finally, the trained model is deployed and tested in real time. The complete process is summarized in Fig. 2.
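The final testing step is only meaningful if the model is evaluated on data it did not see during training. With small clinical datasets, a common way to achieve this is k-fold cross-validation; several studies reviewed below use the ten-fold variant. The following minimal Python sketch only generates the train/test index splits. The helper `k_fold_indices` is hypothetical and not taken from any reviewed study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                 # held-out fold
        train = indices[:start] + indices[start + size:]   # all other folds
        yield train, test
        start += size


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))  # 8 2 for every fold
```

Each sample is used for testing exactly once, and the k test scores are usually averaged.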

Fig. 2

Steps required for machine learning system

4.1 Machine Learning Techniques

Numerous ML techniques can be used to detect respiratory diseases, but only a few of them have attracted researchers’ attention (Fig. 3). These techniques are discussed in detail in this section.

Fig. 3

Widely used machine learning techniques

4.1.1 Logistic Regression

Logistic regression is a statistical model that uses the logistic function to model a binary dependent variable, that is, an outcome with two values such as pass/fail or disease/no disease. It applies a linear model to the input variables and passes the result through a logistic link function, which maps it to a probability. Thresholding this probability yields a linear classifier. Logistic regression is a straightforward ML technique that can be applied to detect respiratory diseases by examining the affected areas of the body [43,44,45,46].
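The following from-scratch sketch makes the linear model and the logistic link concrete. It trains binary logistic regression by batch gradient descent on the mean log-loss. The single feature and its values are made-up toy data (think of a standardized symptom score), and the learning rate and epoch count are arbitrary assumptions. A real study would use a tested library and proper validation.

```python
import math


def sigmoid(z):
    """Logistic link: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b  # linear model
            err = sigmoid(z) - yi                           # d(log-loss)/dz
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: one standardised feature, label 1 = disease present.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
labels = [1 if predict_proba(w, b, xi) >= 0.5 else 0 for xi in X]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the toy data are symmetric around zero, the learned bias stays near zero. The decision boundary, where the probability equals 0.5, therefore falls at a feature value of about 0.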

4.1.2 K Nearest Neighbor (KNN)

KNN is one of the standard classification methods in pattern recognition. It is an instance-based (memory-based) learner: instead of building an explicit model, it stores the training dataset and classifies each new sample using the stored examples. KNN finds the K training points closest to the query point, usually measuring closeness with the Euclidean distance, and assigns the majority class among them. It can be used to detect lung cancer and other respiratory diseases [47,48,49].
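The sketch below follows the KNN procedure described above. The stored training set is the model, distances are Euclidean, and the query receives the majority label of its K nearest neighbours. The two features and their values are hypothetical, already-scaled toy data. In practice, features must be put on comparable scales before Euclidean distances are computed.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k stored training points nearest to query."""
    # No training step: the stored dataset itself is the model.
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled toy features.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]

print(knn_predict(train_X, train_y, (1.1, 0.9)))  # healthy
print(knn_predict(train_X, train_y, (4.0, 4.0)))  # disease
```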

4.1.3 Decision Tree (DTREE)

A decision tree is an important ML technique that arranges decisions hierarchically as a tree of nodes and sub-nodes. It has three types of nodes: the root node, internal nodes, and terminal (leaf) nodes. A sample is classified by following the tree’s rules from the root down to a leaf, which holds the predicted class. This is an efficient ML model that can be used to detect respiratory diseases [50,51,52].
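The root, internal, and terminal node structure can be illustrated with a small CART-style tree that chooses its splits by Gini impurity. This is a simplified sketch with the following assumptions:

  • the helper names are hypothetical;

  • splits are binary thresholds on numeric features;

  • a node becomes a leaf when no split lowers its impurity or the maximum depth is reached;

  • the two features and the class labels are made-up toy values, not clinical thresholds.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)  # a split must improve on the node itself
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    split = best_split(X, y) if depth < max_depth else None
    if split is None:  # terminal (leaf) node holds the majority class
        return {"leaf": Counter(y).most_common(1)[0][0]}
    f, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return {  # root or internal node
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def predict(tree, x):
    while "leaf" not in tree:  # follow the rules from the root down to a leaf
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


X = [(2.0, 1.0), (1.5, 2.0), (3.0, 1.5), (6.0, 1.0), (7.0, 2.5), (6.5, 7.0), (2.5, 7.5)]
y = ["normal", "normal", "normal", "obstructive", "obstructive", "restrictive", "restrictive"]
tree = build_tree(X, y)
print(predict(tree, (6.8, 1.8)))  # obstructive
```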

4.1.4 Artificial Neural Networks (ANN)

An ANN is a computational technique built from many interconnected neurons. Its behaviour is defined by the network architecture, the connection paths, and the processing performed at each neuron. The model learns from the available data and stores this knowledge in the weights of its connections [1]. ANNs have multiple architectures, of which the multilayer perceptron (MLP), shown in Fig. 4, is the most popular. ANN models can be used for respiratory disease detection and treatment [53].
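The forward pass of a tiny multilayer perceptron shows how an ANN stores knowledge in weighted connections. This network has two inputs, two hidden neurons, and one output, and it computes XOR, a function no single linear unit can represent. The weights are hand-set for illustration rather than learned; training them, for example by backpropagation, is not shown. The layer helper `dense` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: each neuron = sigmoid(weighted sum + bias)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-set (not learned) weights: hidden neuron 1 acts like OR, hidden
# neuron 2 like NAND, and the output like AND of the two, giving XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def mlp_forward(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(mlp_forward(x)))  # 0, 1, 1, 0
```

Changing any weight changes what the network computes, which is the sense in which its knowledge lives in the connections.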

Fig. 4

Artificial neural network multilayer perceptron

4.1.5 Support Vector Machines (SVM)

A support vector machine (SVM) is an ML technique grounded in statistical learning theory and is widely used for prediction in clinical decision support systems (CDSS). It is designed for two-class classification problems. In its basic form it is a linear classifier that separates the classes with the hyperplane that maximizes the margin between them [2, 3]. With kernel functions, an SVM can also perform nonlinear classification. SVMs can be used for the early detection of respiratory diseases.
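The maximum-margin idea can be sketched with a linear SVM trained by Pegasos-style sub-gradient descent on the regularized hinge loss. This minimal training method was swapped in because it fits in a few lines of standard-library Python. It is not the solver used in any reviewed study. It also regularizes the bias for simplicity and has no kernels, so it covers only the linear case. The two-feature points are made-up toy data with labels +1 and -1.

```python
def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Pegasos-style sub-gradient descent on the regularised hinge loss.

    Labels must be +1/-1. A constant 1.0 feature is appended so the last
    weight acts as a bias (it is regularised too, a simplification).
    """
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, yi in data:  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularisation term
            if margin < 1:  # hinge loss active: move towards a larger margin
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```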

4.1.6 Random Forest (RF)

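Random forest is an ensemble learning method that builds many decision trees (DTREEs) and combines their outputs. Each tree splits the dataset hierarchically, as shown in Fig. 5, and every node is split further until a final decision is reached at a leaf. The forest then aggregates the decisions of its trees, for example by majority vote. The technique was developed by Breiman and Cutler and combines two fundamental ideas: bagging and random feature selection. “Bagging” stands for “bootstrap aggregating”. Each tree is trained on a bootstrap sample drawn with replacement from the training data, and the trees’ predictions are aggregated [54,55,56,57,58,59].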
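A deliberately reduced forest can illustrate bagging and random feature selection. In this sketch each "tree" is a decision stump (a depth-1 tree) grown on a bootstrap sample using one randomly chosen feature. The forest predicts by majority vote. Real random forests grow deep trees and sample a feature subset at every split, so this is a simplification. The helper names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Best single-threshold rule on one feature (a depth-1 decision tree)."""
    values = sorted(set(row[feature] for row in X))
    if len(values) < 2:  # nothing to split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_label for l in left) + sum(r != right_label for r in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, t, left_label, right_label))
    return best[1]


def stump_predict(stump, x):
    feature, t, left_label, right_label = stump
    return left_label if x[feature] <= t else right_label


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feature = rng.randrange(len(X[0]))                     # random feature
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def forest_predict(forest, x):
    votes = Counter(stump_predict(s, x) for s in forest)  # aggregation step
    return votes.most_common(1)[0][0]


X = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.0), (1.2, 1.5),
     (4.0, 4.5), (4.5, 3.8), (3.8, 4.2), (4.2, 4.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(forest_predict(forest, (1.0, 1.0)), forest_predict(forest, (4.3, 4.3)))  # 0 1
```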

Fig. 5

Random forest machine learning top-down classification

4.1.7 AdaBoost

AdaBoost is another ML technique that can predict future events after training on the available datasets. It combines weak classifiers into a single strong classifier and can be used on top of other classifiers to make classification more efficient [60,61,62,63,64]. It works iteratively. It first fits a classifier on the original dataset and then fits additional copies of the classifier on the same dataset. In each round, the weights of incorrectly classified instances are increased, so that later classifiers focus on the difficult cases. Each classifier is also assigned a weight according to its accuracy, to maximize overall performance. A classifier with 50% accuracy receives zero weight, and one with accuracy below 50% receives a negative weight. The final prediction combines the weighted votes of all the weak classifiers. Given proper rules/SOPs, this technique can be trained to detect respiratory and other diseases.
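The weighting scheme described above corresponds to the standard discrete AdaBoost formulas for binary labels. The sketch below performs three tasks:

  • it computes a classifier’s vote weight from its weighted error;

  • it reweights the training samples after one round;

  • it combines the weighted votes.

The weak learners themselves are omitted, and all function names are hypothetical.

```python
import math


def classifier_weight(error):
    """AdaBoost vote weight: alpha = 0.5 * ln((1 - error) / error).

    error = 0.5 (50% accuracy) gives alpha = 0; error > 0.5 gives alpha < 0.
    Assumes 0 < error < 1.
    """
    return 0.5 * math.log((1.0 - error) / error)


def reweight(weights, correct, alpha):
    """Up-weight misclassified samples, down-weight correct ones, renormalise."""
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]


def strong_predict(alphas, weak_votes):
    """Sign of the alpha-weighted sum of weak-classifier votes (+1 or -1 each)."""
    score = sum(a * v for a, v in zip(alphas, weak_votes))
    return 1 if score >= 0 else -1


# One boosting round on five equally weighted samples, one misclassified.
weights = [0.2] * 5
correct = [True, True, True, True, False]
error = sum(w for w, ok in zip(weights, correct) if not ok)  # 0.2
alpha = classifier_weight(error)                              # 0.5 * ln(4) = ln(2)
weights = reweight(weights, correct, alpha)
print(weights)  # the misclassified sample now carries half of the total weight
```

After reweighting, the previous classifier would score exactly 50% on the new weights. This forces the next weak learner to contribute something new.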

4.2 Machine Learning Techniques Used for Respiratory Diseases Through Clinical Decision Support System

This section gives a detailed overview of ML research on respiratory disease detection and prevention. Two popular methods for testing respiratory function are spirometry and the forced oscillation technique (FOT). Accordingly, the ML techniques used for respiratory diseases are divided into three groups, as shown in Fig. 6: those applied to spirometry, those applied to forced oscillation, and those applied to other pulmonary function methods.

Fig. 6

ML Techniques used for respiratory disease through CDSS

5 Research Work Done for “Spirometry”

In [32], a multiclass Support Vector Machine (SVM) combined with Error-Correcting Output Codes (ECOC) classified spirometry patterns as obstructive, restrictive, or normal. It used three parameters: forced expiratory volume in the first second (FEV1), forced vital capacity (FVC), and the FEV1/FVC ratio. According to the authors, the model diagnosed respiratory diseases in simulation with an accuracy of 97.32%. Its main limitation is that it was not evaluated on a real-world dataset, so the reported accuracy was obtained only on simulated data. Similarly, in [4] the authors used normal and abnormal FEV1 values collected from one hundred patients to diagnose respiratory diseases with an ANN. This model relied on FEV1 alone and ignored other parameters such as FVC. In [5], the authors developed a second-order transfer function to model reduced airflow in COPD. They studied 336 COPD patients and used five ML techniques to predict COPD: KNN, SVM, a linear Bayes classifier, DTREE, and a radial basis function neural network (RBFNN). The diagnostic accuracy for COPD was 88.2%, with 85% sensitivity and 98.1% specificity. The model performed well, but its time complexity was too high because several techniques were combined in one algorithm.
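The sketch below shows how the three inputs used in [32] relate to the three spirometry patterns. It computes FEV1/FVC and applies a simple fixed-threshold rule. This is not the SVM-ECOC model of [32] but a hypothetical rule-based baseline, built on three assumptions:

  • the commonly used fixed FEV1/FVC cut-off of 0.70 marks obstruction;

  • an FVC below 80% of predicted hints at restriction;

  • FVC % predicted is already available from reference equations.

Spirometry alone can only suggest a restrictive pattern; confirming restriction requires measuring lung volumes.

```python
def spirometry_features(fev1_l, fvc_l):
    """Return the three inputs used in [32]: FEV1, FVC and the FEV1/FVC ratio."""
    return fev1_l, fvc_l, fev1_l / fvc_l


def rule_based_pattern(fev1_l, fvc_l, fvc_percent_predicted,
                       ratio_cutoff=0.70, fvc_cutoff=80.0):
    """Hypothetical fixed-threshold labelling, NOT the SVM-ECOC model of [32]."""
    _, _, ratio = spirometry_features(fev1_l, fvc_l)
    if ratio < ratio_cutoff:
        return "obstructive"  # FEV1 low relative to FVC: airflow limitation
    if fvc_percent_predicted < fvc_cutoff:
        return "restrictive"  # ratio preserved but FVC reduced (suggestive only)
    return "normal"


print(rule_based_pattern(2.0, 3.5, 95))  # ratio 0.57 -> obstructive
print(rule_based_pattern(2.4, 2.8, 65))  # ratio 0.86, low FVC -> restrictive
print(rule_based_pattern(3.2, 4.0, 98))  # ratio 0.80 -> normal
```

A learned classifier such as the one in [32] replaces these hand-set cut-offs with decision boundaries fitted to labelled data.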

In [35], an SVM classifier similar to the model described in [34] was used to classify COPD when the formal spirometry criteria were discordant. Other diseases were also classified by the proposed mechanism. The model correctly classified and diagnosed 68% of the discordant subjects (n = 53). The system also considered non-discordant subjects (n = 370) and correctly detected and diagnosed COPD in 95% of them. In [36], the authors proposed an ML technique for the early diagnosis of asthma that combines clinical-epidemiological and spirometry data. They considered 42 features for diagnosis and used MLP neural networks as well as a random decision tree method for classification. Seventeen features were found useful for detecting asthma. The proposed classifier achieved an accuracy of 96% using only seven features selected iteratively and automatically. The main drawback of this model is that it processes features sequentially; a parallel feature extraction model might increase accuracy further. Topalovic et al. [37] proposed an ML method that detects lung diseases from clinical variables with an overall accuracy of 68%. The model was built on a DTREE and reported positive predictive value and sensitivity (PPV/sensitivity, %) for COPD (78/83), asthma (66/82), neuromuscular disorders (54/100), and interstitial lung disease (52/59).

6 Research Work Done for “Forced Oscillation”

Researchers at the State University of Rio de Janeiro are working on ML techniques that enhance the forced oscillation technique (FOT) for detecting respiratory diseases [65]. They first developed classifier systems to assist in the diagnosis of respiratory diseases, aiming to detect COPD from FOT measurements. Several ML techniques were compared: Support Vector Machines (SVM), KNN, ANN, a Bayes normal classifier, and DTREE. The comparison used performance parameters such as the area under the ROC curve (AUC), sensitivity (Se), and specificity (Sp). Of these, SVM, KNN, and ANN performed best and reached high diagnostic values (AUC > 0.95, Se > 87%, and Sp > 94%) [66]. The same methodology was then used to build an automatic classifier that improves the accuracy of FOT for the early detection of respiratory diseases in smokers. The authors of [6] used a genetic algorithm (GA) with ten-fold cross-validation and evaluated it with the average AUC. The enhanced model performed better, indicating high accuracy in respiratory disease detection.
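The performance parameters used in these studies can be computed directly. The sketch below derives sensitivity and specificity from thresholded predictions. It computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as one half). The labels and scores are made-up toy values, and the 0.5 threshold is an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Se = TP / (TP + FN), Sp = TN / (TN + FP) for binary labels 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.5)
print(roc_auc(y_true, scores))                  # 0.75
```

Unlike Se and Sp, the AUC does not depend on the chosen threshold. This is why the studies above report it alongside the other two.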

ML techniques have also been used to enhance CDSS by improving the accuracy of FOT in classifying airway obstruction levels in patients with COPD [7]. That study followed a two-step approach. The first step showed that FOT parameters alone did not provide sufficient accuracy in identifying obstruction levels in COPD subjects. In the second step, several ML techniques were applied to the cases where FOT diagnosis was not accurate enough. The results showed that the KNN and RF classifiers were more accurate than the other ML techniques, and in this second step the required accuracy was reached. However, the approach was not evaluated on a large real-world dataset. Similarly, ML techniques were used with FOT for the early detection of airway obstruction in asthma [8]. That study found that the best single FOT parameter was the resonance frequency, which achieved adequate accuracy [8]. In its second phase, various ML techniques were applied, and all of them improved detection accuracy to an acceptable level. However, a properly trained model is still required for the future detection of respiratory diseases.

7 Research Work Done for “Miscellaneous Pulmonary Function Methods”

This section reviews research on respiratory disease detection using methods other than spirometry and forced oscillation. In [9] (2013), the authors proposed using a neural network to match normal lung properties from airflow measurements. This was a very basic model from the early era of ML in CDSS, through which the authors enhanced the use of IT in this area. Similarly, in [10] the authors proposed an ML-based model that detects exacerbations and performs subsequent triage in COPD patients. The model uses a physician-defined clinical prediction rule to train supervised prediction algorithms. Of the ML techniques used, Logistic Regression (LR) and Gradient-Boosted DTREEs (GB DTREEs) performed best. In the analysis, the system outperformed physicians. When predicting a patient’s need for emergency care, it also achieved better sensitivity, specificity, and positive predictive value. The authors noted that the system is not a substitute for physicians, but it can be used at home in an emergency when no physician is available.

The distinction between asthma and various wheezing subtypes in childhood was studied in [11]. In the proposed method, subjects are not assigned to a single class. Instead, each subject has a probability of membership in every class, which captures different perspectives of the desired output. The authors concluded that the model disambiguates the complex symptom patterns shared by these distinct diseases. Furthermore, in [12] the authors proposed an ML-based model using an ANN that can evaluate and detect respiratory disease. The system is based on 27 critical questions prepared for respiratory diseases such as COPD, asthma, tuberculosis, and pneumonia, and its dataset consists of 60 cases. It achieved more than 90% accuracy, suggesting that it could be useful in CDSS for the early detection of respiratory diseases. In [13], the authors proposed an ML-based model that differentiates between normal lung function, asthma, and COPD. The system works from the provided knowledge and the symptoms of the diseases, and it achieved an accuracy of up to 98.71%.

7.1 Latest ML Techniques Used for the Early Detection of Respiratory Diseases

This subsection presents the most recent work on the early detection of respiratory diseases using ML techniques. In [14], the authors proposed a CDSS model that detects COVID-19 from chest X-ray images using a three-stage architecture. The first stage preprocesses the images and augments the data. The second stage performs learning and feature extraction. The third stage generates predictions and classifications using different classifiers. On chest X-ray images, the model achieved an AUC of 0.97 in internal validation and 0.95 in external validation. In [15], the authors proposed an early detection system that predicts the mortality of COVID-19 patients from five features: neutrophils, high-sensitivity C-reactive protein (hs-CRP), age, lymphocytes, and lactate dehydrogenase (LDH). Different ML techniques were used to predict mortality. A neural network predicted mortality with 96% accuracy over the entire course of the disease and with 90% accuracy as early as sixteen days before the outcome. The model was evaluated in three different scenarios, and in each it achieved more than 90% accuracy. Its main limitation is that it still needs to be evaluated on more diverse real-world data to check whether it maintains the same accuracy.

In [16], the authors proposed a new ML-based technique named “BOMLA” for respiratory disease patients. The dataset was collected from asthma patients in Khulna, Bangladesh. The technique applied various ML classifiers and evaluated them with accuracy (ACC), sensitivity (SE), the kappa index, the Matthews correlation coefficient (MCC), and other metrics, alongside ROC analysis. The model detected asthma with an accuracy of 94.35% when the classes were balanced with the ADASYN (adaptive synthetic sampling) technique. In [67], the authors presented a user-friendly, low-cost tool for the early detection of asthma. Based on this technique, a practical decision support application (DSS) was developed to help clinical staff detect asthma. The method can also serve as the basis of a recommender system that is easily incorporated into mobile applications. Its limitation is that the dataset used is very small, so the model still needs to be deployed in a real environment to verify its accuracy.
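Both the kappa index and MCC can be computed from the four cells of a binary confusion matrix. The following sketch uses the standard formulas. The counts in the example are hypothetical and are not results from [16].

```python
import math


def cohen_kappa(tp, fn, fp, tn):
    """Agreement between predictions and truth, corrected for chance agreement."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and actual classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def matthews_cc(tp, fn, fp, tn):
    """Correlation between predicted and actual labels, from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion matrix: 40 TP, 10 FN, 5 FP, 45 TN.
print(round(cohen_kappa(40, 10, 5, 45), 3), round(matthews_cc(40, 10, 5, 45), 3))  # 0.7 0.704
```

Both metrics are less flattering than raw accuracy (85% here) because they discount agreement that could occur by chance. This matters for the imbalanced datasets discussed below.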

In [17], the authors used three classifiers to detect lung cancer at an early stage: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). They implemented the model with the WEKA tool. SVM, CNN, and KNN achieved accuracies of 95.56%, 92.11%, and 88.40%, respectively. Applied to other respiratory diseases, this model might yield better results than existing models. Other authors developed an ML-based CDSS technique that can be used in the clinic for respiratory disease detection. They presented a biomarker signature that could classify systemic sclerosis (SSc) patients with and without pulmonary arterial hypertension (PAH), and the findings were validated in an external cohort. Serum samples were randomly selected from the DETECT clinical study: patients with SSc and PAH (n = 77) and patients with SSc without pulmonary hypertension (non-PH) (n = 80). These samples underwent proteomic screening with the Myriad RBM Discovery platform, which covers 313 proteins. Samples from an independent validation SSc cohort (PAH n = 22, non-PH n = 22) were obtained from the University of Sheffield [68].

7.2 Case Findings to Validate the ML Techniques on CDSS

Min et al. used medical claims data from the Geisinger Health System, covering January 2004 through September 2015, to compare the effectiveness of various methods for predicting COPD readmission. Their ML models use two kinds of features. Knowledge-driven features are derived from clinical knowledge possibly connected to COPD readmission, and data-driven features are extracted from the patient data itself. With the one-year claims history before discharge, prediction performance (AUC) improved from roughly 0.60 with knowledge-driven features alone to 0.653 when both feature types were combined. The authors also showed that the best AUC achievable for these predictions is about 0.65, and that even the most complex deep learning models did not improve on it [69]. Karthikeyan et al. (2021) proposed using blood test data and ML techniques to estimate the mortality of COVID-19 patients. Mortality could be predicted with 96% accuracy using a combination of neutrophils, lymphocytes, lactate dehydrogenase (LDH), high-sensitivity C-reactive protein (hs-CRP), and age. The authors trained and evaluated several ML models: neural networks, logistic regression, XGBoost, random forests, support vector machines, and decision trees. Their goal was to identify the model with the highest accuracy over the whole course of the illness. The best approach, based on XGBoost feature importance and neural network classification, achieved an accuracy of 90% as early as 16 days before the outcome. Its high predictive performance and applicability were verified by robust testing in three scenarios based on the number of days to outcome. The authors also analyzed these primary biomarkers in depth and detected trends to provide actionable insights. The findings provide methods that might improve the timeliness, precision, and reliability of healthcare decisions for targeted medical treatment [70].
In a prospective study in a real-world inpatient setting, Segal et al. assessed the accuracy, validity, and clinical value of medication error alerts. The alerts were generated by a new system based on outlier detection screening algorithms, which the authors integrated into the existing electronic medical record system of a tertiary care hospital unit. Over 16 months, the system monitored every medication order. Department staff evaluated all alerts for accuracy, clinical relevance, and usefulness, and the physicians’ immediate responses to the alerts were recorded. The system issued alerts for only 0.4% of all medication orders, a very low alert burden. Sixty percent of the alerts were raised after the drug had already been administered. These alerts reflected changes in patient status, such as changes in vital signs, that required medication changes. The clinical validity of 85% of the alerts was confirmed, 80% were considered clinically valuable, and 43% led to changes in subsequent medical orders [65].

7.3 Limitations

If one class contains far more samples than another, the resulting model can be biased toward the majority class or overfitted. Ideally, every class would have the same number of samples, but this is rarely possible in the dynamic environment of diseases. For example, in multi-class classification of COVID-19, pneumonia, and normal lungs, pneumonia images usually outnumber COVID-19 images. Classifying and evaluating images at their original size would be preferable, but it requires more storage and computing power [71, 72]. Researchers therefore usually downscale or compress images, and large datasets of high-resolution images remain complex and difficult to process. To achieve high accuracy and an efficient ML-based respiratory disease detection mechanism, a dataset needs thousands of images, since more data generally yields a more accurate classifier. However, because available datasets are limited, no sufficiently accurate respiratory disease detection algorithm exists yet. This limitation is a major challenge for ML-based methods and will push research toward alternative solutions.

Ensembles of classifiers face a related problem: they perform best when their base classifiers make different errors. The base classifiers should therefore be minimally correlated, so that their errors differ and can offset one another when combined, yielding more accurate and efficient results. Most surveyed studies, however, combine only classifiers trained on similar features, which leads to highly correlated errors among the base classifiers [73].
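The two remedies discussed above can be sketched briefly in Python, assuming inverse-frequency class weighting as the imbalance remedy (oversampling and undersampling are common alternatives) and pairwise disagreement as the diversity measure for base classifiers. The helper names, labels, and predictions below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def balanced_class_weights(labels):
    """Inverse-frequency weights n / (k * count): every class then carries equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}

def disagreement(pred_a, pred_b):
    """Fraction of samples on which two base classifiers predict different labels."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

# Hypothetical, imbalanced chest X-ray labels (illustrative only).
labels = ["pneumonia"] * 6 + ["normal"] * 3 + ["covid"] * 1
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})
# {'pneumonia': 0.556, 'normal': 1.111, 'covid': 3.333}

# Hypothetical predictions of three base classifiers on the same six images.
preds = {
    "clf_a": [1, 0, 1, 1, 0, 0],
    "clf_b": [1, 0, 1, 1, 0, 1],
    "clf_c": [0, 1, 1, 0, 0, 1],
}
for (name_a, pa), (name_b, pb) in combinations(preds.items(), 2):
    print(name_a, name_b, round(disagreement(pa, pb), 3))
# clf_a clf_b 0.167
# clf_a clf_c 0.667
# clf_b clf_c 0.5
```

Here `clf_a` and `clf_b` almost always agree, so combining them adds little; pairing either with `clf_c` offers more diversity, although disagreement is only useful when each base classifier is also reasonably accurate.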

8 Future Directions

The following major future directions are drawn from the reviewed literature and from general trends in the early detection of respiratory diseases using machine learning techniques:

  • Dataset availability is essential for ML techniques that detect respiratory diseases early. For example, without a sufficiently large lung cancer image dataset, even a well-designed ML technique will not achieve the required accuracy, because the training data is too weak. Training on more examples reduces generalization error, since the model generalizes better. Healthcare data is difficult to obtain; making datasets public would therefore give researchers access to more data.

  • Most researchers use CNNs for automatic feature extraction. A few other features have been studied, such as HOG, Gabor, GIST, and SIFT, but many others remain to be explored, such as quadtrees and image histograms. Effort can be directed toward different types of features, which can address the problem of highly correlated errors in integrated (ensemble) methods: more diverse features yield more diverse base classifiers, and combining many such variants usually gives better results. Feature engineering allows more information to be extracted from existing data, which can better describe the variance in the training data.

  • Developments in Natural Language Processing (NLP, i.e., the capacity of computers to analyze human language) may help integrate open medical literature into future decision systems [74]. Where data are scarce in some regions or in resource-limited contexts, ML-CDSS that use a minimal set of variables may be valuable. Close attention should be paid to the variables an ML-CDSS uses to make its predictions; for instance, one ML-CDSS has been reported to rely on the administration of antibiotics in the intensive care unit to make its predictions.

  • The quality and availability of the clinical data used to develop and validate ML-CDSS remain limitations. Future ML methods will only be helpful to physicians if extensive datasets containing relevant clinical data are made available.

  • Future ML-CDSS in infectious diseases (ID) should be integrated into clinical settings through a systematic process and developed across a variety of health settings, including primary care and low- and middle-income countries (LMICs), which are presently underrepresented. Future research should focus on clinical outcomes after long-term implementation in routine clinical care.
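As an illustration of one of the less-explored features mentioned above, the following Python sketch computes a normalized intensity histogram from a grey-level image and concatenates it with another feature vector, one simple way of combining feature types. The function name, the image patch, and the embedding values are hypothetical, and quadtree features are not shown.

```python
def intensity_histogram(image, bins=8, max_value=255):
    """Normalized grey-level histogram of a 2-D image given as nested lists."""
    counts = [0] * bins
    total = 0
    for row in image:
        for px in row:
            if not 0 <= px <= max_value:
                raise ValueError(f"pixel value {px} outside 0..{max_value}")
            # Map the intensity range 0..max_value onto equal-width bins.
            counts[px * bins // (max_value + 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [c / total for c in counts]

# Tiny hypothetical 2x4 grey-level patch from a chest image.
patch = [[0, 10, 200, 255],
         [128, 130, 64, 32]]
hist = intensity_histogram(patch, bins=4)
print(hist)  # [0.375, 0.125, 0.25, 0.25]

# Hypothetical learned (e.g., CNN) embedding for the same image; concatenation
# ("early fusion") gives a downstream classifier both views of the image.
cnn_embedding = [0.12, -0.50]
fused = hist + cnn_embedding
print(len(fused))  # 6
```

Because the histogram is normalized, it is independent of image size, which makes it easy to combine with features from images that were downscaled to different resolutions.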

9 Conclusion

In this article, a review of the early detection of respiratory diseases through clinical decision support systems using machine learning techniques was conducted following the PRISMA guidelines. First, an overview of the machine learning techniques used for the early detection of respiratory diseases such as asthma and other lung diseases is given. Then, ML techniques such as KNN, DTREE, SVM, and ANN are explained, and the pros and cons of existing work on respiratory disease diagnosis are discussed. Furthermore, detailed work on the early detection of respiratory diseases using machine learning, such as COVID-19 mortality prediction and early-stage lung cancer detection, is presented. Finally, the reviewed literature is discussed, and open issues and future directions for early detection through ML techniques are highlighted.