Introduction

A large quantity of data is generated in the health care domain, and the rate at which healthcare information is stored in clinical databases is growing rapidly. Hence, it becomes important to find the relevant information in this fast-growing volume of data for effective treatment of patients and for disease diagnosis and prediction. Presently, the research community gives more attention to healthcare data and tries to establish the relationship between health care data and machine learning algorithms in order to mine significant information for better treatment of diseases. A large number of machine learning and medical data mining models have been presented in the literature. The application of soft computing techniques is also gradually increasing in diverse areas of medical services, such as assisting medical experts, owing to the effectiveness and continuous improvement of classification and detection systems, improved diagnostic rates and more.

Data mining and soft computing

Knowledge discovery is the non-trivial process of extracting potentially useful, novel, valid and ultimately understandable patterns from huge collections of data [1]. According to Cios et al. [2] and Cios & Moore [3], the knowledge discovery process is a six-step model that involves multiple two-way iterations and interactions, since each step may affect other steps in the series. The six steps of the knowledge discovery process are shown in Fig. 1; they include feedback loops, which reflect the continuous improvement of the model. The block diagram for the knowledge discovery process is shown in Fig. 2. The objective of the knowledge discovery process is to define a set of steps that help to plan, work through and reduce the cost of a project by describing the procedures to be undertaken as part of each step. However, owing to the rapid increase in the number of existing databases and in the size of information, it is a tough task to extract information manually from these databases [5]. Data mining is one of the most important processes for finding valuable information and patterns in databases, and the extracted information can then be used in the decision-making process. It is iterative in nature and follows an interactive methodology that draws on many fields, such as information visualization, database techniques, pattern recognition, neural networks, artificial intelligence, machine learning and statistical techniques.

Recently, various soft computing techniques have also been combined with data mining to handle the most vital, challenging and motivating areas of medical research. The last decade has witnessed major advancements in methods of diagnosis. Soft computing techniques involve multiple methodologies working synergistically to provide significant information processing capacities for managing realistic but ambiguous situations. The prime challenges in traditional approaches are mainly their requirements of precision, certainty and rigor. In soft computing, by contrast, the principal notion is to devise methods of computation that lead to low-cost solutions. It is found that adopting soft computing for data mining tasks improves the accuracy rate and reduces the time required. Earlier, data processors built knowledge-based decision support systems in the clinical area by collecting data from medical experts and manually transferring it into the form of computer algorithms, ultimately leading to knowledge creation. This was a time-consuming process that also depended on the manual collection of medical expert opinion. One way to overcome this problem is to apply soft computing technologies, such as genetic algorithms, neural networks and fuzzy logic, to medical expert opinion [69].

Fig. 1 Six step knowledge discovery process

Fig. 2 Block diagram for knowledge discovery

Further, in developing countries like India, medical challenges are increasing day by day, with growing pressure on resources due to an ever-increasing population. A shortage of medical specialists adds to the problem of high mortality and various diseases. It is also seen that patients sometimes ignore the symptoms of diseases in the early stages and seek medical care only when it is too late or the condition has become extremely critical. One solution for handling such problems is to apply intelligent soft computing prediction models to medical data and to build medical support systems that can mimic the doctor’s reasoning and draw conclusions from symptoms without consulting specialists. The integration of computer technology with the medical field is one of the most dynamic research areas.

Method for literature review

Research question (RQ)

The primary aim of this review is to answer the following research questions (RQ):

  • RQ1: What are the various soft computing techniques available for disease diagnosis and prediction?

  • RQ2: What is the overall accuracy achieved by these techniques?

  • RQ3: What are the existing research gaps in the context of soft computing techniques and disease diagnosis and prediction?

  • RQ4: Which technique can improve the diagnosis or prediction rate for the physician’s decision-making process?

Source of information

Owing to the wide scope of review articles, it is recommended that various research databases be explored for existing work. In this study, six databases are searched to find existing related work in the domain of soft computing and medical data mining.

Inclusion and search criteria

The aim of this study is to outline the relationship between various soft computing techniques and the healthcare domain, in particular the applicability of soft computing techniques to disease diagnosis and prediction. Hence, studies on disease diagnosis and prediction using soft computing techniques are included in this review if the following criteria are met.

  • Related to soft computing techniques

  • Includes the data on diseases diagnosis and prediction

  • Computerized implementation and systems to detect diseases

  • Includes single and multiple occurrences of diseases

  • Work published between 2008 and 2016

  • Published in SCI listed journals

The initial search considers all relevant existing work in the area of soft computing techniques and disease diagnosis and prediction. The systematic literature review (SLR) process of this study is shown in Fig. 3. The search string used for this work is (((Soft Computing Techniques) <OR> (Machine Learning) <OR> (Evolutionary methods) <OR> (Swarm Based Techniques)) <AND> ((Diseases diagnosis) <OR> (Diseases Prediction) <OR> (Diseases detection)) <AND> ((Classification Algorithm) <OR> (Classification Model) <OR> (Expert System) <OR> (Rule Based System) <OR> (Fuzzy System) <OR> (Neuro Fuzzy System) <OR> (Cases based Reasoning))).

Fig. 3 Methodology and evaluation of review on soft computing techniques and health care data

To find all existing literature, the above search string is applied to the full text rather than only the title or abstract. The publication range is set from January 2008 to February 2016. Several research databases are searched, and because different databases have different search conventions, the search strategy is adapted accordingly. Some databases do not offer the “AND” operator. For those, the search string is split and the search is performed using the following keyword groups.

  1. ((Soft Computing Techniques) <OR> (Machine Learning) <OR> (Evolutionary methods))

  2. ((Diseases diagnosis) <OR> (Diseases Prediction) <OR> (Diseases detection))

  3. ((Classification Algorithm) <OR> (Classification Model) <OR> (Expert System) <OR> (Rule Based System) <OR> (Fuzzy System) <OR> (Neuro Fuzzy System) <OR> (Cases based Reasoning))
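
The way the search string is assembled from keyword groups can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and intersecting the per-group result lists is an assumed way to emulate <AND> for databases that lack it, since the text does not state how the three result lists were combined.

```python
# Keyword groups copied from the search string given in the text.
TECHNIQUES = ["Soft Computing Techniques", "Machine Learning",
              "Evolutionary methods", "Swarm Based Techniques"]
TASKS = ["Diseases diagnosis", "Diseases Prediction", "Diseases detection"]
SYSTEMS = ["Classification Algorithm", "Classification Model", "Expert System",
           "Rule Based System", "Fuzzy System", "Neuro Fuzzy System",
           "Cases based Reasoning"]


def or_group(terms):
    """Join terms as ((a) <OR> (b) ...), the notation used in the text."""
    return "(" + " <OR> ".join(f"({term})" for term in terms) + ")"


def full_query(groups):
    """Combine several OR-groups with <AND> into one search string."""
    return "(" + " <AND> ".join(or_group(group) for group in groups) + ")"


def intersect_results(result_sets):
    """Assumed emulation of <AND>: keep ids returned by every group query."""
    result_sets = [set(results) for results in result_sets]
    return set.intersection(*result_sets) if result_sets else set()


query = full_query([TECHNIQUES, TASKS, SYSTEMS])
```

For a database without the “AND” option, each group would be submitted on its own with `or_group`, and the returned article identifiers could then be passed to `intersect_results`.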

Exclusion criteria

The exclusion criteria are applied after the search process. They exclude content from books, magazines, newsletters and educational courses; case studies; articles published in national and international conferences, symposia and workshops; and articles published in non-SCI listed journals.

Extraction of articles

Initially, 2658 research articles are collected from all the sources mentioned in subsection 2.2. This large number results from the generic terms “soft computing techniques” and “diseases diagnosis and prediction”; one reason is that the research databases also match articles on the individual keywords computing, techniques, methods and detection. In the next step, articles irrelevant to our search criteria, i.e. soft computing techniques and diseases diagnosis, are removed, leaving 1784 articles. Further, only articles published in SCI listed journals are considered in this study. Since the search string cannot differentiate between SCI listed and non-SCI listed journals, this step is done manually: all articles published in non-SCI journals, books and magazines are removed from the list, resulting in the exclusion of 1560 articles. The remaining articles are assessed in terms of title, abstract, full text and keywords. As a result, only 118 articles are considered for review. It is further noticed that 26 articles do not fit our predefined search criteria well, and these articles are excluded from the list. Finally, 95 articles are taken up for the review study. All these articles have been published in SCI listed journals, and Table 1 shows the details of the journals, publishers and number of articles published in each journal.

Table 1 Categorization of research article with journals

Further, for scrutiny of the collected articles, a team of five researchers is formed. This team is responsible for selecting research articles using our predefined search criteria. Initially, three researchers select the articles, and the selected articles are then cross-checked by the fourth and fifth members of the team. Conflicts are resolved by a collective decision of all team members. This process is repeated at each step of study selection. It is found that twenty-three different diseases are diagnosed and predicted using soft computing techniques. Out of the 95 research articles, 36 have been published with a new technique, whereas 47 different feature selection/parameter optimization and rule generation/extraction methods are reported in the finalized articles. It is also observed that SVM techniques are widely employed for disease diagnosis and prediction: SVM is applied in fifteen research articles, whereas ANN techniques are used in eleven. PSO is the most prominent technique applied for parameter optimization of SVM and ANN, whereas PCA is reported as the most reliable technique for feature reduction.

Data extraction strategy

All the finalized research articles are categorized into five different groups of techniques and explored thoroughly to find common key properties for comparison. The data extracted from these articles are presented in Tables 1, 2, 3, 4, 5 and 6, and the five architectures based on the information contained in these articles are shown in Figs. 4, 5, 6, 7 and 8. Each article is studied and evaluated to find the following.

  • Technique/Methodology Adopted

  • Type of Disease

  • Utility/ Accuracy

  • Authors

Table 2 Summary of medical data mining techniques using Classification Approach
Table 3 Summary of medical data mining techniques using Expert Systems
Table 4 Summary of medical data mining techniques using Fuzzy and Neuro-fuzzy Based Approach
Table 5 Summary of medical data mining techniques using Rule Based Approach
Table 6 Comparison of rule based and case based system

Fig. 4 Categories of clinical support system

Fig. 5 Classification model

Fig. 6 Block diagram of expert system

Fig. 7 Fuzzy based system

Fig. 8 Rule based expert system model

Literature review

This section describes the literature on the use of soft computing techniques in medical data analysis. It covers various techniques related to classification, clustering, regression and association for disease prediction and diagnosis. For this study, a substantial body of research literature on mining medical data using soft computing is collected. The collected literature is divided into five subsections on the basis of information and data. Each subsection summarizes the latest research and progress in the area of soft computing techniques and their applicability in the healthcare domain. Figure 4 shows the various categories of clinical support system used in this study.

Classification method based systems

Classification is a data mining method that assigns sample items to target categories, with the goal of ensuring that each item is assigned to its correct category as accurately as possible. For example, classification can be used to determine whether the condition of a patient is better or worse on a particular day based on disease symptoms. Classification methods can be broadly categorized into binary and multilevel classification. The most elementary classification problem, in which only two potential values exist for the target attribute, is known as binary classification, for instance a high or low credit rating. In the multilevel approach, the target class takes multiple values, such as a low, medium, high or unknown credit rating. In classification, the data set is divided into training and test sets: the training set is used to build the model, whereas the test set is used to estimate the accuracy of the model and to validate the trained model. The classification model is one of the most commonly used models in medical data mining. Figure 5 shows the process of classification method based systems. Table 2 summarizes the information on medical data mining techniques based on classification methods.
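
The division into training and test sets can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic two-feature records, the classifier is a simple nearest-centroid rule rather than any method from the reviewed articles, and the 70/30 split ratio is arbitrary.

```python
import random


def make_toy_data(n_per_class=50, seed=0):
    """Synthetic records: label 0 clusters near (0, 0), label 1 near (3, 3)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((point, label))
    return data


def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Build the model: the mean feature vector of each class."""
    sums = {}
    for (x1, x2), label in train:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += x1
        acc[1] += x2
        acc[2] += 1
    return {label: (a[0] / a[2], a[1] / a[2]) for label, a in sums.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids,
               key=lambda lab: (point[0] - centroids[lab][0]) ** 2
               + (point[1] - centroids[lab][1]) ** 2)


def accuracy(centroids, records):
    """Fraction of records whose predicted label matches the true label."""
    hits = sum(1 for point, label in records if predict(centroids, point) == label)
    return hits / len(records)


train, test = train_test_split(make_toy_data())
model = fit_centroids(train)          # built on the training set only
test_accuracy = accuracy(model, test)  # estimated on unseen records
```

The point of the sketch is that `test` is never seen by `fit_centroids`, so `test_accuracy` estimates performance on new records.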

For automatic detection of normal and Coronary Artery Disease (CAD) conditions, Giri et al. have designed a methodology that uses heart signals rather than cardiac signals [10]. The heart signals are decomposed into several frequency sub-bands using the Discrete Wavelet Transform (DWT). To retrieve the heart signal information from the DWT and to reduce the dimension of the data, three statistical methods are applied to the set of DWT coefficients. The normalized information is then passed through a set of four classifiers, namely Support Vector Machine (SVM), Probabilistic Neural Network (PNN), Gaussian Mixture Model (GMM) and K-Nearest Neighbor (KNN), for decision making. The results indicate that the GMM coupled with ICA provides a higher accuracy rate. Further, to improve classification performance for CAD, Babaoglu et al. have developed a classification model based on BPSO, GA and SVM using stress testing data [11]. In the proposed model, BPSO and GA techniques are applied to obtain the relevant set of features for predicting CAD, whereas SVM is adopted as the classifier to classify patients with CAD. To enhance the diagnostic rate, 10-fold cross-validation is incorporated in the SVM classifier. Results indicate that the proposed classification model achieves a higher accuracy rate with lower complexity. To investigate the characteristics of diabetic patients and to predict type-2 diabetes effectively, Patil et al. have developed a hybrid prediction model for diabetes [12]. In the proposed prediction model, the k-means clustering algorithm is applied to validate the class labels of the diabetic dataset, and the classical C4.5 method is selected to build the final classifier using k-fold cross-validation. The results show that the proposed system attains higher sensitivity and specificity than the other methods compared.

An intelligent hepatitis diagnostic system for the care and treatment of hepatitis patients is reported in [13]. The proposed diagnostic system works in two parts: feature extraction and reduction, and classification of hepatitis disease. In the first part, features are extracted from the hepatitis database and the more relevant features are then obtained using the PCA method. In the classification part, an LSSVM classifier is applied to the reduced set of features to obtain the final results. The PCA-LSSVM based diagnostic system not only improves the accuracy rate for hepatitis disease but also enhances sensitivity and specificity. Another application of the SVM technique is reported for effective diagnosis of cancer patients [14]. In this work, the QPSO algorithm is applied to train the parameters of the SVM, and an active set strategy is also introduced to obtain good results.
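
Several of the studies above report results under k-fold (often 10-fold) cross-validation. A minimal sketch of the splitting logic follows; the helper names are hypothetical, and the majority-class "model" is only a placeholder for whichever classifier (SVM, C4.5, etc.) a study actually uses.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(records, fit, score, k=10, seed=0):
    """Average the held-out score of models fitted on the other k-1 folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(records), k, seed):
        model = fit([records[i] for i in train_idx])
        scores.append(score(model, [records[i] for i in test_idx]))
    return sum(scores) / len(scores)


def fit_majority(train):
    """Placeholder model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def score_accuracy(model, test):
    """Accuracy of the constant prediction `model` on the held-out fold."""
    return sum(1 for _, label in test if label == model) / len(test)
```

Calling `cross_validate(records, fit_majority, score_accuracy, k=10)` returns the mean accuracy over ten held-out folds; replacing the two placeholder functions is enough to evaluate a different classifier in the same way.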

For accurate diagnosis of erythemato-squamous disease, Abdi et al. [15] have developed a diagnosis model based on particle swarm optimization (PSO), support vector machines (SVMs) and association rules (ARs). In this model, ARs are used to determine an optimal subset of features, whereas the PSO algorithm is adopted to optimize the parameters of the SVM. Finally, the optimized SVM is applied to identify whether or not patients have erythemato-squamous disease. The proposed model has a 98.91 % accuracy rate. For early detection and effective treatment of breast cancer patients, Bhardwaj et al. [16] have proposed a genetic algorithm inspired neural network model for diagnosing breast cancer. In their work, the authors optimize the weights and the neural network structure using a genetic algorithm. The results of the proposed model are compared with BPNN and the Koza model of neural network. It is observed that the genetically optimized neural network provides better results. Fei has presented a classification model based on PSO and SVM techniques for effective treatment of arrhythmia [17]. In this model, PSO is integrated with the SVM to optimize its parameters. The results illustrate that the PSO-SVM model attains a higher diagnostic rate than ANN for the diagnosis of arrhythmia cordis. For a better diagnostic rate of hepatitis disease, Chen et al. have described a hybrid classification algorithm based on local Fisher discriminant analysis (LFDA) and support vector machine [18]. In their work, the authors adopt the LFDA method as a feature selection tool, whereas SVM is applied to improve the diagnostic rate of hepatitis disease. The proposed hybrid classification algorithm provides a high accuracy rate of 96.7 %. An automated disease classification system has been reported in [19]. In this work, the author adopts an LVQ based ANN for effective diagnosis and treatment of various diseases. The results report that LVQ-ANN gives better results than other established studies. A computer aided diagnosis system has been developed for automatic detection of Alzheimer’s disease (AD) [20]. In this work, neuroimages of the brain are used for the evaluation of Alzheimer’s disease. The PCA method is adopted to find the relevant features in the neuroimages, and these features are further enhanced by applying the LDA technique. Finally, NN and SVM methods are employed to detect patients affected by Alzheimer’s disease. The results show that the NN and SVM models provide a significant improvement in accuracy rate compared with the classical voxels-as-features (VAF) reference approach. To overcome the shortcomings of classical case-based reasoning (CBR) systems, Park et al. have presented a cost-sensitive system for disease diagnosis, called CSCS [21]. In the proposed system, a genetic algorithm is adopted to find the optimal classification cut-off points. The results of CSCS are compared with two other cost-sensitive methods, i.e. C5.0 and CART. It is found that the CSCS method gives better overall results than C5.0 and CART.
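
Several of the models above use PSO to tune classifier parameters. The following is a minimal sketch of a standard global-best PSO, not the implementation of any cited article. The inertia and acceleration coefficients are common textbook choices, and `toy_validation_error` is a hypothetical stand-in for the cross-validated error of an SVM as a function of two log-scaled parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Global-best PSO: returns (best_position, best_value) found."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_validation_error(params):
    """Hypothetical error surface with its minimum (0.05) at (1.0, -2.0)."""
    log_c, log_gamma = params
    return 0.05 + 0.01 * ((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best_params, best_error = pso_minimize(toy_validation_error,
                                       bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

In a PSO-SVM setting the objective would instead train an SVM with the candidate parameters and return its validation error; the swarm update itself stays the same.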

For the diagnosis of atherosclerosis, Latifoglu et al. have developed a medical diagnostic system based on PCA, k-NN and the artificial immune recognition system (AIRS) [22]. In this work, carotid artery Doppler signals are used to diagnose atherosclerosis. The proposed medical diagnostic system consists of four phases. In the first phase, the fast Fourier transform (FFT) is applied to obtain the features of the disease. In the second phase, PCA is applied to reduce the dimensions of the data and to find the more suitable and important features in the feature set. In the third phase, a k-NN based weighting technique is adopted for preprocessing of the data. Finally, in the fourth phase, the AIRS classifier is applied to identify patients with atherosclerosis. The proposed system obtains almost 100 % accuracy rate.

Expert systems

An expert system is an application that helps healthcare providers make clinical decisions by analyzing medical data. It can be understood as a computer program that simulates the judgement and imitates the behaviour of a human or an organization with experience and expert knowledge in a particular subject area. The use of artificial intelligence methods is increasing because of their effectiveness in classification and detection schemes that assist experts in the medical field. The architecture of an expert system is shown in Fig. 6, and Table 3 summarizes the utility of different expert system based approaches.

It is a well-known fact that classical systems do not handle imprecise and vague information effectively, because they work with sharp boundaries. Fuzzy systems, in contrast, are capable of representing imprecise and vague information. Drawing on fuzzy theory, Lee et al. have developed a fuzzy based expert system for diabetes [38]. In their work, the authors propose a five-layer fuzzy ontology based expert system for effective diagnosis of diabetes, which provides remarkable results. Owing to the increased applicability of artificial intelligence methods in medical diagnosis, Sengur has developed a heart valve disorder detection system to distinguish normal and abnormal heart valves [39]. In this study, Doppler heart sounds are used to detect patients with normal and abnormal heart valves. The proposed system consists of three layers. The first layer performs data pre-processing tasks, viz. filtering, normalization and denoising. In the second layer, features are extracted from the Doppler heart sounds using the wavelet packet decomposition method, and dimension reduction is then performed with the PCA technique. The third layer is responsible for classifying patients with normal and abnormal heart valves; for this purpose, artificial immune system (AIS) and fuzzy k-NN techniques are implemented. Results report that the proposed detection system provides higher sensitivity and specificity. In a continuation of this work, Sengur has also developed another detection system for heart valve disorder using linear discriminant analysis (LDA) and an adaptive neuro-fuzzy inference system (ANFIS) [40]. To exchange information between medical devices, Arsene et al. have developed a collaborative expert system for medical diagnosis [41]. The proposed expert system is a collaborative framework of software agents, and Bayesian networks are applied for representation of the knowledge. The system consists of three key concepts, i.e. knowledge management, uncertainty reasoning and software components. The authors conclude their work with a hybrid architecture of AI techniques, software agents and Bayesian networks. In medical science, it is still a tough task to identify the early symptoms of diseases. To address this, Pal et al. have developed a prescreening expert system for diagnosis of coronary artery disease (CAD) at an early stage [42]. In their work, the authors identify the risk factors responsible for CAD, and these risk factors are then encoded as fuzzy rules with the help of doctors and a fuzzy inference system. Finally, the prescreening expert system is presented for diagnosis of CAD. The proposed system is capable of identifying the risk of CAD at an earlier stage and achieves a sensitivity of up to 95.85 %. Hariharan et al. have presented an expert diagnostic system for the prediction of Parkinson’s Disease (PD) [43]. In this work, PCA and LDA methods are used to extract features, SFS and SBS methods are then applied to the extracted features for dimension reduction, and the selected attributes are classified with LS-SVM, PNN and GRNN to find patients with PD. The proposed diagnostic system attains an accuracy rate of 100 %.

For earlier detection and effective treatment of thyroid disease, Keles et al. have presented a thyroid disease expert system called ESTDD [44]. In ESTDD, a neuro-fuzzy classification method is adopted for detection of thyroid disease. The ESTDD system attains a 95.33 % accuracy rate in diagnosing thyroid disease. To improve the detection rate of Parkinson’s disease (PD) and to build an effective and efficient diagnosis system for PD, Chen et al. have presented a diagnostic expert system for PD patients [45]. In this work, the fuzzy k-nearest neighbor method is employed for PD diagnosis, and the PCA technique is adopted to construct new features for Parkinson’s disease in order to improve the detection rate. The results show that the proposed diagnostic system obtains a 96.07 % accuracy rate using 10-fold cross-validation.
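
The systems reviewed above are compared mainly by accuracy, sensitivity and specificity. As a reminder of how these three rates follow from a binary confusion matrix, here is a short sketch using the standard definitions; the function name and the example labels are made up for illustration and are not data from any cited study.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary diagnosis task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(num, den):
        # Undefined rates (no cases of that kind) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # diseased cases correctly flagged
        "specificity": ratio(tn, tn + fp),   # healthy cases correctly cleared
    }


# Made-up example: 4 diseased (1) and 6 healthy (0) patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

In this example one diseased patient is missed and one healthy patient is flagged, so sensitivity is 3/4, specificity is 5/6 and accuracy is 8/10.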

Fuzzy and neuro-fuzzy based system

Fuzzy or neuro-fuzzy based systems used in medical decision making allow more detailed medical examination data to be examined in a shorter time. In the last two decades, the use of fuzzy methods in medical analysis has increased, as classification and detection systems have improved greatly in assisting medical experts in their tasks. The block diagram of a fuzzy and neuro-fuzzy based system is presented in Fig. 7. Table 4 summarizes the information on different fuzzy and neuro-fuzzy based systems.

Applying fuzzy set theory, Ucar et al. have proposed a new hybrid machine learning method for identification of tuberculosis [46]. The proposed method combines an adaptive neuro-fuzzy inference system with rough sets. The results indicate that the proposed method provides more viable results than the other algorithms compared.

To predict heart valve disorder, Uguz has developed an adaptive neuro-fuzzy based system [47]. The proposed system has three layers, for feature extraction, feature selection and classification. Feature extraction is done by DWT, whereas the Shannon entropy algorithm is adopted for feature selection. Finally, all selected features are classified using an ANFIS classifier. The proposed system obtains 98.3 % classification accuracy. For effective diagnosis of coronary artery disease (CAD), Muthukaruppan et al. have presented a particle swarm optimization (PSO)-based fuzzy expert system [48]. In the proposed system, a decision tree is first used to find the best features for predicting CAD. These features are then converted into fuzzy if-then rules, which form a database of fuzzy rules. Converting a crisp set into a fuzzy set requires a fuzzy membership function, and this function has a significant impact on the fuzzy output. Hence, the authors adopt the PSO algorithm to tune the values of the fuzzy membership function. The results show that the proposed fuzzy system is more capable of diagnosing CAD than the other methods compared. Seera and Lim have described a fuzzy min-max neural network for the medical data classification task [49]. The proposed system comprises the Fuzzy Min–Max neural network, Regression Tree and Random Forest algorithms. The efficacy of the proposed system is tested on Breast Cancer, Diabetes and Liver Disorders data, and it provides significant results. To achieve better accuracy and diagnostic rate for breast cancer patients, Ubeyli et al. have proposed an adaptive neuro-fuzzy based inference system (ANFIS) for breast cancer detection [50]. In this work, a first-order Sugeno model is applied with two fuzzy if-then rules, and this model is integrated with a neural network architecture for effective detection of breast cancer. The ANFIS achieves a 99.08 % accuracy rate. A survey of applications of neuro-fuzzy systems (NFS) is reported in [51]; it includes applications in medical systems and briefly covers disease diagnosis. Papageorgiou et al. have investigated the application of augmented fuzzy cognitive maps to design a novel framework for a dynamic decision support system (DDSS) for medical informatics [52]. In the proposed framework, fuzzy rule extraction methods are integrated with fuzzy cognitive maps. The proposed DDSS is capable of handling a variety of data and knowledge from various sources and provides more accurate results than other existing decision support systems. To pre-determine the risk of cardiovascular disease, Sanza et al. have developed a fuzzy rule-based classification system using a fuzzy rule base and interval-valued fuzzy sets [53]. In their work, the authors first alter the linguistic labels of the classifier using interval-valued fuzzy sets. An operator (Ka) is then introduced to determine the degree of ignorance in the fuzzy inference process. Further, a genetic algorithm is applied to find the optimized value of the parameter of the operator (Ka), and this operator is applied within each rule of the classification system. It is observed that the proposed fuzzy rule-based classification system is a suitable tool for medical diagnosis of cardiovascular diseases. In a continuation of their work, Ubeyli et al. have also proposed an adaptive neuro-fuzzy based inference system for detection of six types of erythemato-squamous diseases [54]. The proposed system obtains a higher accuracy rate than a neural network. Kannathal et al. have also shown the applicability of a neuro-fuzzy network for assessing and examining different activities of the autonomic nervous system (ANS) [55]. In their work, the authors use heart signals to find various heart abnormalities. The results indicate that the proposed neuro-fuzzy tool is effective and efficient for diagnosing different heart abnormalities.
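
To make the idea of a first-order Sugeno model with two fuzzy if-then rules concrete, the following sketch evaluates such a model for a single input. All membership shapes and consequent coefficients are hypothetical values chosen for illustration; in an ANFIS they would be learned from data, which this sketch does not attempt.

```python
def ramp_down(x, lo, hi):
    """Membership of 'LOW': 1 at lo, falling linearly to 0 at hi."""
    return min(1.0, max(0.0, (hi - x) / (hi - lo)))


def ramp_up(x, lo, hi):
    """Membership of 'HIGH': 0 at lo, rising linearly to 1 at hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


# Each rule: (membership function, (p, r)) meaning
# IF x is A THEN y = p * x + r   (first-order consequent).
RULES = [
    (lambda x: ramp_down(x, 0.0, 10.0), (0.02, 0.0)),  # IF x is LOW
    (lambda x: ramp_up(x, 0.0, 10.0), (0.05, 0.5)),    # IF x is HIGH
]


def sugeno_output(x, rules=RULES):
    """Weighted average of rule outputs, weighted by firing strength."""
    weights = [membership(x) for membership, _ in rules]
    outputs = [p * x + r for _, (p, r) in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * y for w, y in zip(weights, outputs)) / total
```

For an input of 5.0 both rules fire with strength 0.5, so the output is the average of 0.1 and 0.75, i.e. 0.425; at the ends of the range only one rule fires and its consequent alone determines the output.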

Rule-based system

Rules are one of the most popular paradigms for knowledge representation. A rule based expert system consists of a knowledge base that holds the domain knowledge coded in the form of rules. The architecture of a rule based system is described in Fig. 8. The utility of rule based systems is summarized in Table 5. Some of the key advantages of rule based systems are:

  • Modularity: Allows encapsulation of knowledge and expansion of the expert system.

  • Explanatory: Simply by keeping track of the rules that are fired, an explanation facility can show the chain of reasoning that leads to a certain conclusion.

  • Human process resemblance: Rules help to explicate the structure of knowledge to the experts.
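
The modularity and explanation properties listed above can be seen in a minimal forward-chaining sketch. The three rules and fact names are hypothetical and carry no clinical meaning; they only show how each rule is an independent unit and how recording fired rules yields an explanation.

```python
# Each rule: (name, set of required facts, fact to conclude).
# Hypothetical rules for illustration only.
RULES = [
    ("R1", {"fever", "cough"}, "suspected_infection"),
    ("R2", {"suspected_infection", "chest_pain"}, "refer_for_chest_xray"),
    ("R3", {"rash"}, "refer_to_dermatology"),
]


def forward_chain(facts, rules=RULES):
    """Fire rules until no new fact is derived; return (facts, fired rule names)."""
    known = set(facts)
    fired = []  # trace used by the explanation facility
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                fired.append(name)
                changed = True
    return known, fired


known, fired = forward_chain({"fever", "cough", "chest_pain"})
```

Here `fired` lists the rules in the order they were applied, which is exactly the chain of reasoning an explanation facility would display, and adding a new rule to `RULES` requires no change to the inference loop.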

A considerable amount of research has already been performed in the area of medical sciences using the rule based technique; some of the literature applying this methodology to date is reviewed here. The basic model for rule based systems can be seen in Fig. 8.

Ciabattoni et al. [58] examined three mathematical frameworks, introduced in [59, 60], to interpret the numbers and the inference mechanism of Computer-Assisted DIAGnosis (CADIAG-2), based on fuzzy logic, probability theory and probabilistic logic, respectively. The overall precision of the diagnoses proposed by CADIAG-2, in comparison with the actual diagnoses, is approximately 80 %. Barakat et al. have presented an SVM based system for the diagnosis of diabetes [61]. In the proposed system, rules are extracted using a black box model and the diagnostic rate of the proposed system is 94 %. For effective diagnosis of Alzheimer’s disease, an association rule based model is reported in [62]. The proposed system evaluates the rules based on association mining. In this system, PCA is applied to find the significant features of Alzheimer’s disease and these features are then classified using the SVM. Results indicate that the proposed system obtains an accuracy rate of up to 91.7 %.

Ell et al. [63] have examined patients with either focal Basal Ganglia (BG) lesions or Parkinson’s Disease (PD) on tasks that differ in their demands on selective attention and working memory. Individuals with focal BG lesions were impaired on the task with high working memory demand and performed similarly to healthy controls on the task with high selective-attention demand. People with PD were impaired on both tasks. The data indicated that the patients with BG lesions exhibited an early-training impairment on a rule-based task in which the demands on working memory were high, but not on a rule-based task that required selective attention to one dimension. In contrast, the PD impairment on the task with high working memory demand was driven by the use of suboptimal decision strategies.

To reduce the possibility of errors in decisions, Astrom and Koker have presented a rule-based parallel neural network for effective diagnosis of Parkinson’s Disease (PD) affected patients. The proposed system contains two phases. In the first phase, parallel feed-forward neural networks are implemented for classification of PD data, whereas the rule base is designed on the basis of a voting decision scheme. Results show that the classification rate for Parkinson’s Disease is increased by 8 % in comparison with the other methods considered.
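A voting decision scheme over several parallel classifiers can be sketched as follows. This is a generic majority vote, assumed here for illustration; the stub classifiers and thresholds are hypothetical stand-ins for trained feed-forward networks and do not reproduce the rule base of the system described above.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label


def ensemble_predict(classifiers, sample):
    """Run every classifier on the sample and combine the votes."""
    votes = [clf(sample) for clf in classifiers]
    return majority_vote(votes), votes


# Hypothetical stand-ins for trained networks: each one thresholds a
# different feature and returns 1 (PD) or 0 (healthy).
stub_classifiers = [
    lambda s: int(s[0] > 0.5),
    lambda s: int(s[1] > 0.5),
    lambda s: int(s[2] > 0.5),
]

decision, votes = ensemble_predict(stub_classifiers, (0.7, 0.2, 0.9))
```

Using an odd number of voters avoids ties in a two-class problem.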

Kong et al. have developed a clinical decision support system (CDSS) for risk stratification of patients affected by cardiac chest pain [65]. In the proposed CDSS, a belief rule-based inference concept is adopted to obtain better and more viable results. The belief rule-based inference is implemented using a belief rule base learning method. The proposed CDSS can handle uncertainties in both clinical domain knowledge and clinical data in an effective manner. Kumar et al. have described an alternative approach to deal with problems such as high complexity, changing medical conditions and inexperienced new staff [66]. The proposed approach is a mixture of case-based reasoning and the rule based method, and it differs from the pure rule based reasoning method. Experimental results demonstrate the suitability of the proposed method for handling ICU data.

To investigate the factors responsible for heart disease in men and women, Nahar et al. have described an association rule mining based system that identifies heart disease factors in patients [73]. The proposed system combines the advantages of association rule mining and computational intelligence. The rule base is constructed using the Apriori, Predictive Apriori and Tertius rule generation algorithms. Results indicate that the proposed system is a significant tool for heart disease prediction. Lisboa et al. have examined breast cancer survival data using a rule based method [67]. In this work, the analysis of breast cancer data is done in three distinct stages: time-to-event modeling, risk stratification and model interpretation. Time-to-event modeling is achieved by selecting a benchmark linear model, whereas risk stratification is measured using Cox regression and a partial logistic ANN. An automatically generated rule base is used to interpret the breast cancer data with the linear model; the authors adopt the Orthogonal Search Rule Extraction (OSRE) method to generate the rules automatically. The authors also claim that the proposed model classifies patients into low and high risk groups in terms of breast cancer survivability. For clinical data extraction, Mykowiecka et al. have presented a rule-based information extraction system [68]. Mammography reports and hospital records of diabetic patients are used for this purpose. The results show that the proposed information extraction system is reliable and useful for automated clinical data processing. To identify and resolve discrepancies regarding PD affected patients, rule based category learning is reported in [69]. The rule based learning considers various factors such as pattern generation, maintenance, shifting and selection for effective treatment of PD patients.
Further, neuroimaging and computational modeling are also integrated with rule based learning to get more accurate results. For monitoring heart patients, Seto et al. have developed a rule-based expert system to predict heart failure symptoms [70]. The proposed rule based expert system is incorporated in a mobile phone. The rules are designed with the help of expert heart failure clinicians. The performance of the rule base is assessed through a randomized controlled trial. It is found that the proposed rule based expert system improves clinical management, self-care and quality of life. For effective treatment of knee arthritis, Wei et al. have described a rule based system based on rough set theory [71]. In the proposed system, rules are extracted using rough set theory and specialist opinions. The proposed system consists of three stages. In the first stage, data preprocessing is done with the help of specialist opinions. In the second stage, two-level feature selection methods are adopted to find the relevant features of knee arthritis. In the third stage, rules are generated and discretized using rough set theory. The proposed system is reported to provide valuable and robust results for knee arthritis. For diagnosing type 2 diabetes, Sevastianov et al. have presented a rule based framework with evidential reasoning [72]. The proposed framework also addresses two collateral problems: handling interval-valued belief structures from different sources and comparing belief intervals. The first problem is handled using the interval extended zero method, while Dempster–Shafer theory is applied to overcome the belief interval comparison problem. Results demonstrate that the proposed framework improves the classification rate for type 2 diabetes.
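As background for the evidential reasoning mentioned above, the basic operation of Dempster–Shafer theory, Dempster's rule of combination, can be sketched as follows. This is the classical rule for point-valued mass functions, not the interval-valued method of [72]; the frame of discernment and the mass values are assumptions made up for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to empty intersections is treated as conflict and
    removed by normalization.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Hypothetical evidence from two sources over the frame {diabetic, healthy}.
D = frozenset({"diabetic"})
H = frozenset({"healthy"})
THETA = D | H  # the whole frame represents ignorance

source_1 = {D: 0.6, THETA: 0.4}
source_2 = {D: 0.5, H: 0.2, THETA: 0.3}

fused = combine(source_1, source_2)
```

In this example the conflicting mass is 0.12, so each combined mass is divided by 0.88 and the fused masses again sum to one.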

Case based reasoning (CBR) system

Case-based reasoning (CBR) is a novel AI problem solving paradigm that has drawn the attention of both the soft computing and data mining communities [82]. For a long time, research focused on rule-based expert systems, which use production rules to capture domain knowledge and apply an inference algorithm to derive conclusions. In recent years, the focal point of research has shifted from this principle-based approach to new approaches to knowledge-based reasoning, one of which is case-based reasoning. The architecture of a case based reasoning system is given in Fig. 9. The advantages of case-based reasoning over reasoning with rules are summarized in Table 6. CBR has been applied successfully in various medical domains [83–87] and is considered an important instance of decision support systems [88]. It is one of the reasoning methodologies that simulate human reasoning by using past experiences to resolve novel problems. Table 7 summarizes the CBR systems.

Fig. 9 Case based reasoning system

Table 7 Summary of medical data mining techniques using Case Based Reasoning Approach

Generally, the classical CBR model consists of a four-step problem-solving cycle [89]:

  • RETRIEVE: To retrieve the stored cases most similar to the new case.

  • REUSE/ADAPT: To reuse the solution of the most similar retrieved case for the new case. An adaptation step can also be included to adjust the solution of the retrieved case so that it fits the new case.

  • REVISE: To revise and confirm the suggested solution.

  • RETAIN: To retain the learned cases.
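The four-step cycle above can be sketched in a few lines. The class name, the Euclidean similarity measure, the feature vectors and the solution labels are all assumptions for illustration; real CBR systems typically use domain-specific similarity measures and adaptation rules.

```python
class CaseBase:
    """Minimal case base implementing retrieve, reuse, revise and retain."""

    def __init__(self, cases=None):
        # Each case is a (feature_vector, solution) pair.
        self.cases = list(cases or [])

    @staticmethod
    def _distance(a, b):
        # Squared Euclidean distance (assumed similarity measure).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def retrieve(self, query):
        """RETRIEVE: find the stored case closest to the query."""
        return min(self.cases, key=lambda case: self._distance(case[0], query))

    def reuse(self, case):
        """REUSE: propose the retrieved solution (no adaptation here)."""
        return case[1]

    def revise(self, proposed, confirmed=None):
        """REVISE: replace the proposal if an expert supplies a correction."""
        return proposed if confirmed is None else confirmed

    def retain(self, query, solution):
        """RETAIN: store the solved problem as a new case."""
        self.cases.append((tuple(query), solution))

    def solve(self, query, confirmed=None):
        proposed = self.reuse(self.retrieve(query))
        solution = self.revise(proposed, confirmed)
        self.retain(query, solution)
        return solution


# Hypothetical cases: two features per patient and a diagnosis label.
case_base = CaseBase([((1.0, 1.0), "A"), ((5.0, 5.0), "B")])
first = case_base.solve((1.2, 0.9))
second = case_base.solve((4.8, 5.1), confirmed="C")
```

The second call shows the revise step: the nearest case suggests "B", the expert correction "C" overrides it, and the corrected case is retained for future queries.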

For medical data classification, Fan et al. have presented a hybrid model based on fuzzy decision tree and case-based reasoning methods, which they call a decision making system [90]. In the proposed system, the case based reasoning method is employed in the data preprocessing phase, whereas a fuzzy decision tree and a genetic algorithm are applied to construct rules for the decision making system. The proposed system can make more accurate decisions in comparison with the other models considered. To diagnose chronic obstructive pulmonary disease, Guessoum et al. have developed a case-based decision support system named RESPIDIAG [91]. RESPIDIAG combines case based reasoning and expert opinions. Five data imputation methods are adopted to handle missing data; three of the five compute similar data for the missing information. Results indicate that the proposed system provides satisfactory performance. Marling et al. have explained four synergistic systems based on case based reasoning and AI techniques [92]. These are CARE-PARTNER, 4DSS, RHENE and the Malardalen Stress System. CARE-PARTNER is a decision support system developed for oncology patients who have undergone stem cell transplantation. 4DSS is developed to help patients with type 1 diabetes; it detects problems related to blood glucose control and gives valuable suggestions to overcome these problems. RHENE is presented to handle patients with ESRD. The Malardalen Stress System (MSS) is developed for the diagnosis and treatment of stress. For effective treatment and to reduce prediction errors, Huang et al. have proposed a case based reasoning classifier [93]. The performance of the proposed classifier is compared with a PSO optimized ANN and an adaptive neuro-fuzzy inference system. It can be seen that the proposed classifier provides better performance in comparison with PSO-ANN and ANFIS.
To reduce the threat of chronic disease, Huang et al. have suggested a model for the prognosis and diagnosis of chronic diseases [94]. The proposed model incorporates data mining techniques into case based reasoning. A data mining technique is adopted to devise rules for examining the health of patients, and these rules are then used to detect the presence of chronic diseases, whereas CBR is applied for effective diagnosis and treatment of the chronic diseases. Results show that the proposed model is very helpful for doctors as well as patients in the treatment of chronic diseases. Lin and Chuang have presented an intelligent liver diagnosis model for detecting the presence of liver disease in patients [95]. The proposed diagnosis model integrates CBR, ANN and the analytic hierarchy process to predict liver disease affected patients. The proposed model is also capable of diagnosing various types of liver diseases. To increase the accuracy rate and improve transparency in disease diagnosis, McSherry has developed a conversational case-based reasoning (CCBR) model [96]. In the proposed model, the iNN(K)-L method is applied to select the relevant features, and the CCBR algorithm is then implemented to diagnose the disease. Results demonstrate the applicability of the proposed model in the field of medical diagnosis. A case-based reasoning system is also reported for the diagnosis of liver disease [97]. The proposed system integrates a back-propagation neural network (BPN), regression tree, logistic regression and discriminant analysis for early diagnosis of liver disease and also enhances the classification rate. Another application of case based reasoning is reported in [98]. In the proposed system, GA is applied to obtain the appropriate features of hypertension patients, and the CBR technique is then adopted for the detection of hypertension affected patients. Only a few techniques, such as ANN, have been combined with CBR systems.
An evidence based decision support system for personal health index is reported in [102].

Conclusion

The main aim of this study is to identify, categorize and assess the role of different soft computing techniques in disease diagnosis and prediction. Ninety-two research articles are examined and arranged into five different classes on the basis of information and data. The main findings of this review are summarized as follows:

  • (RQ1) Many soft computing techniques have been presented for disease diagnosis and prediction, including SVM, ANN, decision tree, CART, fuzzy and neuro-fuzzy methods. In this work, these techniques are grouped into five classes: classification models, expert systems, fuzzy and neuro-fuzzy based models, rule based models and case based reasoning approaches. Among classification models, SVM and ANN are widely adopted for disease diagnosis and prediction, and both techniques give viable results. Several studies have also been reported on the parameter optimization of SVM and ANN, in which PSO and GA are applied to obtain optimized parameter values. In expert systems, rules are obtained with the help of fuzzy logic and neuro-fuzzy methods. The rules are described in an if-then structure, and k-nearest neighbor and PCA/LDA methods are adopted to find the significant parameters of diseases. The fuzzy and neuro-fuzzy models investigate the applicability of fuzzy logic and neuro-fuzzy concepts together with other soft computing techniques, for purposes other than rule generation. Fuzzy logic and neuro-fuzzy concepts have been widely used for parameter optimization of different techniques via AIS, PSO, k-NN, etc. In rule based systems, rule bases have been designed for the prediction and diagnosis of diseases; decision trees and some learning methods are adopted for designing the rules. Case based reasoning has also been combined with various algorithms such as ANN, GA, PSO and AIS for better accuracy and prediction.

  • (RQ2) Among classification models, PSO-SVM, DT-BN-ANN, GA-ANN and K-Means-SVM have achieved high accuracy for various types of diseases, i.e. 98.91 %, 99.48 %, 98.24 % and 97.38 %, respectively. The fuzzy k-nearest neighbor method has obtained an accuracy rate of 96.07 % in the expert system class. Among fuzzy and neuro-fuzzy based systems, k-NN-Fuzzy-AIS has obtained 99.1 % accuracy for disease diagnosis. Among CBR systems, BPNN-LR-CBR has achieved 95 % accuracy, while DT-GA has obtained 97.44 % accuracy for disease diagnosis and prediction. Overall, it is concluded that SVM and ANN based techniques provide more accurate prediction of diseases than the other approaches.

  • (RQ3) In classification model based systems, most researchers have applied SVM, ANN and tree based techniques for disease diagnosis. A number of studies have been presented on optimizing the parameters of SVM and ANN, and researchers have mostly adopted statistical methods such as PCA and LDA for this purpose; very few studies have been reported on GA and PSO. It can be concluded that optimized parameter values enhance the performance of the techniques. On the other hand, evolutionary computational methods are reported to have an advantage over traditional methods, so other evolutionary algorithms can be applied to optimize the parameters of SVM and ANN. It is also observed that DT works efficiently with linearly separable data, whereas its performance is very poor on linearly non-separable data; SVM, in contrast, works well with linearly non-separable data. For some diseases SVM works better, and for others DT provides better results. Other issues related to SVM performance are the optimal selection of the kernel, decreased accuracy in multi-class classification and extensive memory requirements. Hence, to build an effective decision support system, the proposed technique must work efficiently with both linearly and non-linearly separable disease data. For the ANN technique, the number of hidden layers and the number of input neurons affect the accuracy rate. The main disadvantage of ANN is over-fitting and under-fitting of data; other issues are its black box nature and its computational cost. In expert system approaches, very few soft computing algorithms, such as AIS and GA, have been applied to the rule generation process, although a large number of evolutionary and soft computing methods, such as ABC, ACO and TLBO, have been reported in the literature and provide better results for a large number of problems.
In fuzzy and neuro-fuzzy based approaches, parameter optimization is one of the major areas of concern and affects the performance of the system. Optimized parameter values work better than user-defined values and provide more accurate results and predictions. Another issue concerns how fuzzy logic is applied: whether it is used for designing rules or for fuzzifying the parameter values. In rule based systems, rules are designed for the diagnosis and prediction of diseases, and the criteria for selecting disease symptoms are one of the main concerns for an effective rule designing process. Feature selection techniques therefore play an important role in rule design: if features are not selected correctly, both the accuracy rate and the diagnosis rate are affected. In CBR systems, very few techniques, such as ANN, DT and PSO, have been integrated to diagnose diseases.

  • (RQ4) Thirty-eight research articles have presented frameworks and models for effective diagnosis and prediction of various diseases. Twenty-four of these models have obtained an accuracy rate higher than 95 %. Hence, these models can help doctors and clinicians in the effective treatment of patients and in the early detection of diseases. It is also stated that improvement in clinical practice lies in the development of computerized systems oriented to disease symptoms. The rest of the articles have described either new classification algorithms for better diagnosis of diseases or new methods for selecting the significant symptoms of diseases.
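The parameter optimization discussed in the findings above can be illustrated with a minimal particle swarm optimization (PSO) sketch. The objective below is a hypothetical smooth surrogate standing in for a classifier's validation error as a function of two parameters; in practice it would be replaced by a cross-validated error of the SVM or ANN being tuned. The inertia and acceleration coefficients are common textbook choices, assumed here rather than taken from any reviewed study.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_error(params):
    """Hypothetical stand-in for validation error; minimum at (2, -1)."""
    a, b = params
    return (a - 2.0) ** 2 + (b + 1.0) ** 2


best_params, best_error = pso(surrogate_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

The fixed seed makes the run repeatable; other evolutionary or swarm methods could be swapped in behind the same `objective` interface.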

Apart from these, several further points can be highlighted from this study:

  • Twenty-six papers are based on unique diseases, which shows the significant applicability of soft computing techniques for disease diagnosis and prediction.

  • Many articles have been reported on effective treatment and diagnosis of cancer and heart diseases.

  • Both clinical and environmental parameters are considered for the diagnosis of various diseases.

  • SVM, ANN and FL techniques are widely used in the health care domain for diagnosis as well as for earlier identification of disease symptoms.

  • At present, case based reasoning systems act as an effective tool in medical diagnosis.

  • The SVM technique obtains better diagnosis accuracy and prediction rate in comparison with the other methods considered.

  • The inbuilt parameters of the techniques also affect their performance in terms of accuracy, sensitivity and specificity.

  • Parameter optimization is also an active area of research for enhancing the performance of techniques, and a number of algorithms have been applied to it.

  • Among the methods reviewed, the PSO algorithm provides the best results for parameter optimization.

  • Very few articles have been reported on the application of evolutionary, swarm and nature inspired algorithms in the field of medical data analysis.
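The accuracy, sensitivity and specificity measures referred to in the points above can be computed from a confusion matrix as sketched below. The function name and the example labels are hypothetical and serve only to show the arithmetic; label 1 denotes a diseased patient and label 0 a healthy one.

```python
def diagnostic_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: share of diseased patients correctly detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of healthy patients correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Made-up labels for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

accuracy, sensitivity, specificity = diagnostic_metrics(y_true, y_pred)
```

In this example the classifier is 80 % accurate but detects only three of the four diseased patients, which is why accuracy alone can be a misleading basis for comparing diagnostic models.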