
1 Introduction

Artificial Intelligence (AI) is the emulation of human intelligence on machines so that they act and think like human beings. It encompasses reasoning, planning, intelligent search, machine learning and perception building. Traditional AI problem-solving methods focus on problem states and on designing rule sets that define the transitions between those states. AI techniques are better suited to inductive and analogy-based learning than to supervised learning [1]. They are less feasible for optimization and for machine learning on uncertain data. These shortcomings of traditional AI opened the way for Computational Intelligence (CI) techniques as solutions to real-world problems. CI adopts nature-inspired techniques that can learn and deal with new situations at high computational speed. They are also less sensitive to noisy information sources [2].

1.1 Motivation

There are several data mining techniques for analyzing data. Still, there is always scope for new approaches that analyze data for better decision making. The current trend is to develop data analysis techniques based on the domain, volume and type of the data. This approach has a better chance of supporting good decisions than generic approaches. Hence, techniques tailored to the domain (banking, finance, social media, medical etc.) would perform better. Every domain has some form of uncertainty that has to be interpreted properly. This research focuses on the medical domain.

In general, medical data is uncertain because whether a patient suffers from a specific illness cannot be completely determined from one or more symptoms; a given set of symptoms can only indicate the probability of a particular illness.

Broadly, medical misdiagnosis can be classified into three classes.

  • False positive: misdiagnosis of a disease that is not actually present.

  • False negative: failure to diagnose a disease that is present.

  • Equivocal results: inconclusive interpretation without a definite diagnosis [60].

General data mining methods are inefficient in dealing with cognitive uncertainties such as vagueness and ambiguity.

Proper analysis of medical datasets is one of the best ways to discover potentially useful information for diagnosis and for drug discovery. This research work aims to address the problem of uncertainty in medical datasets. Fuzzy-rough set based data preprocessing methodologies are proposed for handling medical datasets. Finally, an ensemble of rule-based fuzzy-rough classifiers is built for analyzing medical datasets.

The proposed work has been carried out to achieve the following objectives that address the above mentioned problem.

1.2 Objectives of the Proposed Work

To study the role of uncertainty in real world problems and to understand the scope of rough set theory (RST) for analyzing uncertain data.

To design an efficient classifier that handles ambiguity and vagueness in medical datasets for better diagnosis of illness.

To design and build a rule-based fuzzy-rough classifier for analyzing uncertainty in medical datasets.

To perform experimental analysis to evaluate the performance of the proposed classification model.

To test the performance of the proposed approach with existing state-of-the-art approaches.

1.3 Computational Models for Prediction

In general, systems built on CI techniques are used for decision making. Decision making is essential in all fields of human activity and affects every sphere of our life. Today’s real world problems are mostly data driven. Data accumulation in every domain of life is a current challenge for data engineers. Often, data is imprecise, ambiguous, vague and uncertain, which makes decision making difficult. The data first has to be freed from ambiguity and uncertainty; only then can it be used for decision making. These situations are better handled with CI-based solutions. CI techniques comprise neural networks (NN), genetic algorithms (GA), fuzzy systems, rough sets and hybrid systems (combinations of NN, GA, fuzzy systems and rough sets).

Neural networks learn from instances. NN architectures are trained on known instances of a problem before they are tested for their ‘inference’ capability on unknown instances of the problem [3]. They can therefore identify new instances on which they were not trained, generalizing to predict new outcomes.

Genetic algorithms are computerized search and optimization techniques based on the mechanics of natural genetics and natural selection. GA and evolutionary strategies mimic the principles of natural genetics and natural selection to construct search and optimization procedures [4].

Fuzzy set theory is a proven approach to handling uncertainty in data. A rough set is an approximation of a crisp set; it deals with uncertainty in data by using approximations [5]. Uncertain data in a given model can be handled through indiscernibility in rough sets and through vagueness in fuzzy sets.

2 Uncertainty in Data

Uncertainty may be due to lack of knowledge or insufficient information [6–9]. Vagueness and ambiguity are the two major forms of uncertainty. Vagueness is associated with the difficulty of making sharp or precise distinctions in the real world [10, 11].

Ambiguity is associated with two or more alternatives such that the choice between them is left unspecified. A broad classification of uncertainty is depicted in Fig. 1.

Fig. 1 Classification of uncertainty

Information-based uncertainty arises from a lack of information. It is classified into discord and ambiguity. Discord is associated with conflict in choosing among several alternatives of an attribute, and is handled by probability theory [12]. Ambiguity can be addressed by rough set theory, which discovers hidden patterns in data: it finds partial or total dependencies in databases, eliminates redundant data and deals with missing data.

Linguistic uncertainty arises in natural language, which is vague and in which the precise meaning of words can change over time. Linguistic variables are variables whose values are words or sentences in a natural or artificial language. The uncertainty in the meaning of a linguistic variable is vagueness, that is, the existence of objects that cannot be uniquely classified relative to a set or its complement [1].

2.1 Classical Set Theory Versus Fuzzy Set Theory Versus Rough Set Theory

This section provides an overview of classical set, fuzzy set and rough set theories which includes the basic concepts, notations, applications and limitations. A summary of the same is given in Table 1.

Table 1 Classical set theory versus fuzzy set theory versus rough set theory

Classical Set Theory

A set is defined as a collection of objects that share certain characteristics. Set theory is the branch of mathematical logic that studies sets. It was initiated by Georg Cantor and Richard Dedekind in the 1870s. After the discovery of paradoxes in naive set theory, numerous axiom systems were proposed in the early twentieth century, of which the Zermelo–Fraenkel axioms, with the axiom of choice, are the best known [47]. Table 1 summarizes the basics, operations and applications of set theory.

Fuzzy Set Theory

Fuzzy set theory permits the gradual assessment of the membership of elements in a set, with values in the real interval [0, 1]. It treats classical bivalent sets as crisp sets [13]. It is used when information is vague, incomplete or imprecise. In real life, human imprecision causes the available information to be vague or fuzzy. Vagueness is handled by the “soft” boundaries of fuzzy sets, i.e., graded membership, which brings subjective knowledge into the definition of these attributes. Fuzzifying attributes is tedious when precise information is available. The basics, operations and applications of fuzzy set theory are summarized in Table 1.

Rough Set Theory

Rough set theory is an extension of conventional set theory that supports approximations in analyzing decisions [14]. A rough set is represented as a pair of sets, the lower and upper approximations of a crisp set. The lower approximation identifies objects that certainly belong to the subset of interest, whereas the upper approximation identifies objects that possibly belong to the subset [15–18].

Rough set theory based models efficiently handle incomplete or imperfect knowledge [19–21]. Table 1 outlines the basics, operations and applications of classical set, fuzzy set and rough set theories.
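The lower and upper approximations described above can be illustrated with a minimal Python sketch. The toy patient table, the attribute names and the helper functions `equivalence_classes` and `approximations` are hypothetical and serve only as an illustration; they are not taken from the cited works.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Group object ids that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of the crisp set `target`."""
    lower, upper = set(), set()
    for eq_class in equivalence_classes(objects, attributes):
        if eq_class <= target:   # class lies entirely inside the target
            lower |= eq_class
        if eq_class & target:    # class overlaps the target
            upper |= eq_class
    return lower, upper

# Hypothetical toy data: p1 and p2 have identical symptoms.
patients = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
flu = {"p1", "p3"}
flu_lower, flu_upper = approximations(patients, ["fever", "cough"], flu)
```

In this toy example p1 and p2 are indiscernible but only p1 is in the target set, so neither is certain: only p3 falls in the lower approximation, while p1, p2 and p3 all fall in the upper approximation.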

2.2 Combining Fuzzy and Rough Set Theories

Fuzzy and rough set theories have evolved as successful approaches to representing and computing with imperfect data in real world applications [22–24]. The two theories are not competitive; rather, they are complementary to each other [25–27]. Together with this complementary nature, the similarities between them have motivated the development of a hybrid theory that covers the strengths of each. Lynn Deer et al. have proposed a process for fuzzifying the lower and upper approximations [28–30].

In this theory, given incomplete information \((A)\) representing a subset of given universe \((U)\) containing examples of a concept \(\left( C \right),\) the lower and upper approximations, along with an equivalence relation \((R)\) that models indiscernibility can be expanded in the following two ways:

  i. Objects in set \(A\) can belong to a concept to varying degrees, i.e., the set becomes a fuzzy set.

  ii. Objects are classified into classes with soft boundaries, using similarities among the objects that are represented by a fuzzy relation \(R\). Here, the fuzzy relation is used instead of the indiscernibility relation.

This article does not intend to cover fuzzy-RST in its entirety. Rather, it is confined to a brief introduction to rough-fuzzy and fuzzy-rough concepts.

Rough-Fuzzy Sets

A rough-fuzzy set is a generalization of a rough set, derived from the approximation of a fuzzy set in a crisp approximation space, where the decision attribute values are fuzzy and the conditional attribute values are crisp [42]. In a rough-fuzzy set, the lower and upper approximations are defined as

$$\mu_{{{{\underline{P}}}X}} \left({\left[x \right]_{P} } \right) = inf \left\{{\mu_{X} \left(x \right) |x \in \left[x \right]_{P} } \right\}$$
(1)
$$\mu_{{{\overline{P}}X}} \left( {\left[ x \right]_{P} } \right) = sup\left\{ {\mu_{X} \left( x \right) |x \, \in \, \left[ x \right]_{P} } \right\}$$
(2)

where \(\left[ x \right]_{P}\) is a crisp equivalence class, \(\mu_{X} \left( x \right)\) is the degree to which \(x\) belongs to the fuzzy equivalence class \(X\), and the tuple \(<{\underline{P}}X,\overline{P}X >\) is called a rough-fuzzy set. Rough-fuzzy sets generalize to fuzzy-rough sets when the equivalence classes are fuzzy.
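As a hedged illustration of Eqs. (1) and (2), the following Python sketch takes the infimum and supremum of the membership degrees over each crisp equivalence class. The partition, the membership values and the function name `rough_fuzzy_approximations` are assumptions made for this example only.

```python
def rough_fuzzy_approximations(partition, membership):
    """Eqs. (1)-(2): per crisp class [x]_P, take inf / sup of mu_X over the class."""
    lower, upper = {}, {}
    for eq_class in partition:
        degrees = [membership[x] for x in eq_class]
        key = frozenset(eq_class)
        lower[key] = min(degrees)  # inf over a finite class
        upper[key] = max(degrees)  # sup over a finite class
    return lower, upper

# Hypothetical crisp partition of four objects and fuzzy membership mu_X.
crisp_partition = [{"p1", "p2"}, {"p3"}, {"p4"}]
mu_X = {"p1": 0.9, "p2": 0.4, "p3": 0.7, "p4": 0.0}
rf_lower, rf_upper = rough_fuzzy_approximations(crisp_partition, mu_X)
```

For the class containing p1 and p2 the lower and upper memberships differ (0.4 and 0.9), which reflects the uncertainty inside that class; for singleton classes the two values coincide.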

Fuzzy-Rough Sets

Alongside rough-fuzzy set theory, researchers have focused on fuzzy-rough set theory [44, 45]. This type of hybridization has been used for data analysis on information systems. Dubois suggested fuzzy-rough set theory, which approximates fuzzy sets [43]. Rough sets are defined in terms of a fuzzy membership function that represents the positive, boundary and negative regions: objects in the positive region have membership value 1, objects in the boundary region have membership value 0.5, and objects in the negative region have membership value 0.

2.3 Neural Networks

Neural networks are simplified models of the biological nervous systems. They can be defined as data processing systems, consisting of a large number of simple, highly interconnected processing elements (artificial neurons), in an architecture inspired by the structure of the cerebral cortex of the brain.

Learning methods in NN are broadly classified into three types: supervised, unsupervised and reinforced [31].

In supervised learning, every input pattern that is used to train the network is associated with an output pattern, which is the target or the desired pattern.

In unsupervised learning, the target output is not presented to the network.

In reinforced learning, an indication of whether the computed output is correct or incorrect is given. A reward is given for a correct output and an incorrect output is penalized. The classification of learning algorithms is depicted in Fig. 2.

Fig. 2 Classification of learning algorithms

Table 2 shows the classification of the NN systems listed above, according to their learning methods and architectural types.

Table 2 The classification of NN system with respect to type of architecture

2.4 Genetic Algorithms

Genetic algorithms are adaptive heuristic search methods based on the evolutionary ideas of natural selection and genetics [40]. GAs encode the decision attributes of a search problem as fixed-length strings over an alphabet of a certain cardinality. The strings that are candidate solutions to the search problem are referred to as chromosomes, the alphabets as genes and the values of genes as alleles. Once the problem is encoded in a chromosomal manner and a fitness measure for discriminating good solutions from bad ones has been chosen, solutions to the search problem evolve through Initialization, Evaluation, Selection, Recombination and Mutation [41].

The initial population of candidate solutions is generated randomly across the search space. Once the population is initialized, the fitness values of the candidate solutions are evaluated. Selection allocates more copies to solutions with higher fitness values and thus imposes a survival-of-the-fittest procedure on the candidate solutions. Recombination combines parts of two or more parental solutions to create new, possibly better solutions. The offspring produced by recombination will not be identical to any particular parent. Table 3 presents the most commonly used selection methods, recombination (crossover) operators, mutation operators and replacement schemes of GAs [39].

Table 3 Genetic algorithm operators (GAO)

While recombination operates on two or more parental chromosomes, mutations are changes in genetic sequence of a chromosome. These changes occur at many different levels of individual chromosomes. The offspring population created by selection, recombination, and mutation replaces the original parental population.
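The cycle of initialization, evaluation, selection, recombination, mutation and replacement can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the bit-counting fitness function, roulette wheel selection, one-point crossover, bit-flip mutation, and all parameter values are choices made for the example, not operators prescribed by the text.

```python
import random

def fitness(chromosome):
    """Toy fitness (assumed): number of 1-alleles in the bit string."""
    return sum(chromosome)

def select(population, rng):
    """Roulette wheel selection of two parents; +1 keeps every weight positive."""
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=2)

def recombine(parent_a, parent_b, rng):
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def run_ga(n_genes=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initialization: random fixed-length bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = select(population, rng)
            offspring.append(mutate(recombine(a, b, rng), rate, rng))
        population = offspring  # offspring replace the parental population
    return max(population, key=fitness)

best_chromosome = run_ga()
```

Generational replacement is used here because it matches the description above; elitist or steady-state replacement would be equally valid design choices.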

2.5 Hybrid Systems

A hybrid system is a combination of two or more techniques that overcomes the limitations of the individual techniques. Such a system cannot be treated as either a purely continuous or a purely discrete-event system without ignoring important phenomena that result from the combination of its continuous and discrete movements [35].

Mohammed Hamed Ahmed Elhebir described how the minimum support requirement dictates the efficiency of association rule mining [48]. If the support threshold is low, rules that are not truly interesting are generated. On the other hand, if the support threshold is high, interesting rules are missed from the rule set. Eliminating redundant rules and clustering decreased the size of the generated rule set, which helps in obtaining interesting rules.

Abraham et al. presented the biological motivation behind particle swarm optimization and ant colony optimization algorithms, and explained basic data mining terminologies using swarm intelligence techniques [49].

Benxian Yue et al. investigated optimal reducts using a particle swarm optimization approach. The approach observes the change in the positive region as particles move through the search space in order to identify the best attributes [50].

Dong-Hwa Kim et al. proposed a hybrid approach combining genetic algorithms and bacterial foraging algorithms for function optimization problems [51]. The performance of the hybrid approach is studied with respect to mutation, chemotactic steps, crossover, variation of step sizes and lifetime of the bacteria.

Various applications of hybrid systems are given in Table 4.

Table 4 Hybrid systems and their applications

3 Classification Modeling Using CI Techniques

Fuzzy and rough sets are well known for analyzing different aspects of uncertainty. Combining these techniques makes it possible to build a robust mathematical approach for combating problems of uncertainty. This work suggests a fuzzy-rough theory based solution for classifying medical datasets. Medical datasets are treated as a special domain because medical data is, most of the time, incomplete, vague and noisy. Fuzzy-rough rule induction (FRRI) is developed to generate rules for classification. Further, the proposed FRRI is tested to compare its performance with other CI techniques such as Multi-Layer Perceptron (MLP) and Fuzzy-Genetic Modeling (Table 6). The metrics used for performance analysis of the classifiers are shown in Table 5.

Table 5 Performance measures

3.1 Proposed Algorithm for Rule Generation

This section gives the proposed algorithm for rule generation, which uses Fuzzy-Rough Rule Induction. In the proposed hybrid classifier, the lower and upper approximations of RST are fuzzified by adopting the approach given below:

  • A set \({\text{A}}\) in \({\text{X}}\) is generalized to a fuzzy set in \({\text{X}}\), allowing objects to belong to a class label to different degrees.

  • Alternatively, rather than assessing the indiscernibility of objects, their approximate equality, represented by a fuzzy relation R, may be measured. Subsequently, objects are classified into classes with ‘‘soft” boundaries based on their equality to each other.

By definition, abrupt transitions between classes are replaced by gradual ones, allowing an element to belong (to varying degrees) to more than one class.

Fuzzy Indiscernibility

An information system \(I\) is considered. A fuzzy indiscernibility relation \({\text{R}}_{\text{a}}\) is any fuzzy relation that determines the degree to which two objects are indiscernible. For a quantitative attribute a, the following fuzzy tolerance relation \({\text{R}}_{\text{a}}\) is used:

$${\text{R}}_{\text{a}} \left( {{\text{x}},{\text{y}}} \right) = { \hbox{max} }\left( {\hbox{min} \left( {\frac{{{\text{a}}\left( {\text{y}} \right) - {\text{a}}\left( {\text{x}} \right) +\upsigma_{\text{a}} }}{{\upsigma_{\text{a}} }},\frac{{{\text{a}}\left( {\text{x}} \right) - {\text{a}}\left( {\text{y}} \right) +\upsigma_{\text{a}} }}{{\upsigma_{\text{a}} }}} \right),0 } \right)$$
(3)

for all \({\text{x}},{\text{y}} \, \in \, {\text{U}}\), where \(\upsigma_{\text{a}}\) denotes the standard deviation of a. In fuzzy-rough set theory, the lower and upper approximations are defined by an implicator I and a t-norm τ. The fuzzy B-lower and B-upper approximations of a fuzzy set A in U are as follows.

$$\left( {{\text{R}}_{\text{B}} \downarrow {\text{A}}} \right)\left( {\text{y}} \right) = \inf_{{{\text{x}} \, \in \, {\text{U}}}} {\text{I}}\left( {{\text{R}}_{\text{B}} \left( {{\text{x}},{\text{y}}} \right),{\text{A}}\left( {\text{x}} \right)} \right)$$
(4)
$$\left( {{\text{R}}_{\text{B}} \uparrow {\text{A}}} \right)\left( {\text{y}} \right) = \sup_{{{\text{x}} \, \in \, {\text{U}}}} \uptau\left( {{\text{R}}_{\text{B}} \left( {{\text{x}},{\text{y}}} \right),{\text{A}}\left( {\text{x}} \right)} \right)$$
(5)

\({\text{R}}_{\text{B}} \,\downarrow \,{\text{A }}\) is the set of elements necessarily satisfying the concept (strong membership), while \({\text{R}}_{\text{B}} \,\uparrow \,{\text{A }}\) is the set of elements possibly belonging to the concept (weak membership). These approximations were designed mainly to deal with uncertainty in data. The fuzzy B-positive region is defined from the fuzzy B-indiscernibility relation, for \({\text{y}} \, \in \, {\text{X}},\) as

$${\text{POS}}_{\text{B}} \left( {\text{y}} \right) = \left( {\bigcup\nolimits_{{{\text{x}} \, \in \, {\text{U}}}} {{\text{R}}_{\text{B}} \,\downarrow \,{\text{R}}_{\text{d}} {\text{x}}} } \right)\left( {\text{y}} \right)$$
(6)

The degree of dependency of \({\text{d }}\) on \({\text{B}}\), \(\upgamma_{\text{B}}\), is given by

$$\upgamma_{\text{B}} = \frac{{\left| {{\text{POS}}_{\text{B}} } \right|}}{{\left| {\text{U}} \right|}} = \frac{{\sum\nolimits_{{{\text{x}} \, \in \, {\text{X}}}} {{\text{POS}}_{\text{B}} \left( {\text{x}} \right)} }}{{\left| {\text{U}} \right|}}$$
(7)
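To make Eqs. (3), (6) and (7) concrete, the following Python sketch computes the fuzzy tolerance relation, the fuzzy positive region and the dependency degree on a tiny dataset. Several choices here are assumptions not fixed by the text: the Łukasiewicz implicator is used for the lower approximation, the minimum t-norm combines the per-attribute relations into \(R_B\), the decision attribute is crisp, the population standard deviation is used for \(\sigma_a\), and the attribute name `chol` and all data values are hypothetical.

```python
import statistics

def similarity(a_x, a_y, sigma):
    """Eq. (3): fuzzy tolerance relation for one quantitative attribute."""
    return max(min((a_y - a_x + sigma) / sigma,
                   (a_x - a_y + sigma) / sigma), 0.0)

def lukasiewicz_implicator(a, b):
    """Assumed implicator: I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def dependency_degree(rows, labels, attributes):
    """Eq. (7): mean membership of all objects in the fuzzy positive region."""
    n = len(rows)
    # Assumes no attribute is constant, so that sigma > 0.
    sigma = {a: statistics.pstdev([r[a] for r in rows]) for a in attributes}

    def r_b(x, y):
        # Assumed: minimum t-norm over the attributes in B.
        return min(similarity(rows[x][a], rows[y][a], sigma[a])
                   for a in attributes)

    def positive_region(y):
        # Eq. (6): union (max) over decision classes of the lower approximation.
        best = 0.0
        for label in set(labels):
            lower = min(
                lukasiewicz_implicator(r_b(z, y),
                                       1.0 if labels[z] == label else 0.0)
                for z in range(n)
            )
            best = max(best, lower)
        return best

    return sum(positive_region(y) for y in range(n)) / n

rows = [{"chol": 1.0}, {"chol": 1.1}, {"chol": 5.0}, {"chol": 5.2}]
gamma_separable = dependency_degree(rows, [0, 0, 1, 1], ["chol"])
gamma_mixed = dependency_degree(rows, [0, 1, 0, 1], ["chol"])
```

When similar objects share a decision class, the dependency degree reaches 1; when nearly identical objects carry different labels, the positive region shrinks and the degree drops below 1.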

The algorithm for classifying a medical dataset with a hybrid approach based on fuzzy-rough set theory is called Fuzzy-Rough Rule Induction (FRRI).

Algorithm: Fuzzy-Rough Rule Induction (FRRI)

Given a medical dataset \({\text{M}}\), randomly select a subset \(B\) of the conditional attributes \({\text{A}}\).

  (a) Initially, the subset \({\text{B }}\) of conditional attributes, the ruleset \(R\) and the cover set \({\text{Cov}}\) are empty.

  (b) For each attribute \({\text{a}} \, \in \, {\text{M}},\) repeat the following steps.

  (c) For each object \({\text{o}}_{1} \, \in \, {\text{M}}\), repeat the following steps.

  (d) Compute the \(\upgamma\) degree \({\text{D}}_{1}\) of belongingness of \({\text{o}}_{1}\) to the positive region of attribute \(a\).

  (e) Compute the \(\upgamma\) degree \({\text{D}}_{2 }\) of belongingness of \({\text{o}}_{1}\) to the positive region for the given dataset M.

  (f) If degree \({\text{D}}_{1}\) equals \({\text{D}}_{2},\)

  (g) then construct the rule \({\text{r}}\) for the object \({\text{o}}_{1}\) and the attribute subset \(B \cup a\).

  (h) Add the rule to the ruleset \({\text{R}}\) if r does not have the same or more coverage than the existing rules in \({\text{R}}\). Update the coverage set.

This algorithm generates fuzzy-rough if-then rules for classifying a given medical dataset. However, in such rules the antecedents cover almost all attributes of the decision system, which increases the computational time of classification. It is therefore favorable to combine rule induction with the attribute selection process, so that fuzzy rules are generated from the best attributes that maximally cover the decision system.

3.2 Multi-layer Perceptron for Classification

The multilayer perceptron (MLP) classifier is based on the feed-forward artificial neural network [32]. Here, information moves only in the forward direction, from the input nodes through the hidden nodes to the output nodes. There are no cycles or loops in the network. It consists of multiple layers of nodes, and each layer is fully connected to the next layer in the network. In the input layer, nodes represent the input data. All other nodes map inputs to outputs by taking a linear combination of the inputs with the node’s weights \(w\) and bias \(b\) and applying an activation function. This can be written in matrix form for an MLP with \(K + 1\) layers as follows:

$$y\left( X \right) = f_{K} \left( { \cdots f_{2} \left( {w_{2}^{T} f_{1} \left( {w_{1 }^{T} X + b_{1} } \right) + b_{2} } \right) \cdots + b_{K} } \right)$$
(8)

A multi-layer neural network can compute a continuous output rather than the binary output of a step function. A common choice of activation is the so-called logistic function; the sigmoid function refers to a special case of the logistic function. Nodes in the intermediate layers use the sigmoid (logistic) function \(f\left( {Z_{i} } \right)\).

The nodes of the network are arranged in multiple layers that are interconnected in a feed-forward direction: every neuron in one layer has a directed connection to the neurons of the subsequent layer. The sigmoid function is used as the activation function in this network.

$$f\left( {Z_{i} } \right) = \frac{1}{{1 + e^{{ - Z_{i} }} }}$$
(9)

Nodes in the output layer use softmax function:

$$f\left( {Z_{i} } \right) = \frac{{e^{{Z_{i} }} }}{{\sum\nolimits_{K = 1}^{N} {e^{{Z_{K} }} } }}$$
(10)

Number of nodes in the output layer corresponds to the number of classes.
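A minimal forward pass corresponding to Eqs. (8) to (10) can be sketched in plain Python. The network size, the weight values and the helper names `affine` and `forward` are assumptions chosen for illustration; training (for example by backpropagation) is not shown.

```python
import math

def sigmoid(z):
    """Eq. (9): logistic activation for hidden nodes."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Eq. (10): output activation; subtracting the max avoids overflow."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, x):
    """One linear combination w^T x + b per node (one weight row per node)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def forward(layers, x):
    """Eq. (8): sigmoid on hidden layers, softmax on the output layer."""
    for weights, bias in layers[:-1]:
        x = [sigmoid(z) for z in affine(weights, bias, x)]
    weights, bias = layers[-1]
    return softmax(affine(weights, bias, x))

# Hypothetical network: 2 inputs, 3 hidden nodes, 2 output classes.
mlp_layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2], [-0.6, 0.3, 0.9]], [0.05, -0.05]),
]
class_probabilities = forward(mlp_layers, [1.0, 2.0])
```

The softmax outputs are non-negative and sum to one, so they can be read as class probabilities, with one output node per class.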

3.3 Genetic-Fuzzy Modeling (GFM)

This section presents a hybrid CI approach that has attracted a lot of attention in the past decade and has successfully offered solutions to many real world problems. In this work, a hybridization of fuzzy logic and a genetic algorithm (GA) is used to construct the classification model. GAs are stochastic search methods based on natural selection and genetics. The fitness function takes a candidate solution to the problem as input and produces as output how “fit” or how “good” the solution is with respect to the problem under consideration. The fitness value is calculated repeatedly in a GA, so its computation should be sufficiently fast. The significant attributes selected from the dataset help the diagnosing system to build a fuzzy inference model for classification. The rules for the fuzzy system are generated from the dataset, and the most significant and optimal subset of these rules is selected using the genetic algorithm. The Genetic-Fuzzy Logic (GFL) model combines the benefits of GA and the fuzzy inference system for effective prediction of heart disease in patients [33].

Genetic-Fuzzy Logic

GFL uses the genetic algorithm to find the fitness of each chromosome in a population, which also decides the termination criterion. The attributes in the dataset are selected using the GA, and the fuzzy inference system is used for classification. The fitness value measures how close a candidate is to the optimal solution. Genetic operators used in genetic algorithms are analogous to those in real life: survival of the fittest, or selection; reproduction (crossover, also called recombination); and mutation.

Selection is the process of choosing parents from the population, here by roulette wheel selection. Intermediate crossover takes the selected parents and interchanges values based on a fixed crossover point. The values before the fixed point in one chromosome form the first part of the new chromosome and the remaining values are taken from the second chromosome, so the offspring inherits the features of both parents. Gaussian mutation is a process of changing gene values based on the given probability, 0.05 and 1. The stochastic search of the genetic algorithm stops based on the convergence criteria. GAs combine high-performance notions to achieve better performance in obtaining the optimal solution.

Membership Functions

A membership function represents a fuzzy set and measures the degree of similarity. A fuzzy set \(A\) on the universe of discourse \(X\) is defined by \(\mu_{A} :{\text{X}} \to [0,1],\) where each element of X is mapped to a value between 0 and 1. This value, called the membership value or degree of membership, quantifies the grade of membership of the element of \(X\) in the fuzzy set \(A\).

$$\mu_{A} \left( {x,c,s,m} \right) = exp\left[ { - \frac{1}{2}\left| {\frac{x - c}{s}} \right|^{m} } \right]$$
(11)

where \(c\) is center, \(s\) is width and \(m\) is fuzzification factor.
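Eq. (11) translates directly into a one-line Python function. The sample parameter values below are hypothetical and chosen only to show the behavior at the center and one width away from it.

```python
import math

def membership(x, c, s, m):
    """Eq. (11): generalized Gaussian membership with center c, width s, factor m."""
    return math.exp(-0.5 * abs((x - c) / s) ** m)

# Hypothetical fuzzy set centered at 5 with width 2 and m = 2 (Gaussian shape).
degree_at_center = membership(5.0, c=5.0, s=2.0, m=2)
degree_one_width_away = membership(7.0, c=5.0, s=2.0, m=2)
```

The membership is 1 at the center and decays symmetrically on both sides; with \(m = 2\) the value one width away from the center is \(e^{-1/2}\).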

Fuzzy Inference System

A fuzzy inference system maps inputs to outputs using predefined fuzzy rules available in a knowledge base. The knowledge base consists of if-then fuzzy rules that specify the relationship between the input and output fuzzy sets. If \(a1, a2, \ldots ,an\) are the attributes and \(c1, c2, \ldots ,cm\) are the class labels, then the fuzzy rules are expressed in terms of linguistic values. As the system requires its input in fuzzy values, the input is fuzzified and the output of the inference system is defuzzified.

Fuzzy Classifier

The fuzzy classifier learns from data in the form of rules and predicts the target value for a set of test data. It uses tenfold cross-validation to estimate the test error. This process holds out one subset for testing and uses all the other subsets for training; the trained model then classifies the testing subset. This is repeated ten times with different training and testing data. The actual labels are those given in the dataset, whereas the predicted labels are those produced by the fuzzy classifier. Here, defuzzification replaces fuzzy values with their corresponding crisp values using the centroid method.

In GFL, the genetic algorithm performs a stochastic search on the dataset to reduce the number of features. The fuzzy inference system predicts test data using the fuzzy Gaussian membership function and the centroid defuzzification method.
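The tenfold procedure can be sketched as an index split in Python. The function name `k_fold_splits` is hypothetical, the sample count of 303 is the size of the Cleveland dataset described in Sect. 4.2, and the absence of shuffling or stratification is a simplifying assumption of this sketch.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs; each index is tested once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

cv_splits = list(k_fold_splits(303, k=10))
```

Each of the ten iterations trains on nine folds and tests on the remaining one; the test error estimate is then the average over the ten test folds.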

3.4 Rule Classifier Using Fuzzy Ant Colony Optimization

Ant Colony Optimization (ACO), a nature-inspired algorithm, has been successfully applied to classification tasks in data mining [54–56]. Mostafa Fathi Ganji and Mohammad Saniee Abadeh proposed a rule-based system for medical data mining that combines ACO and fuzzy set theory, called Fuzzy ACO-Miner [52].

Fuzzy ACO-Miner operates in two phases: rule generation and rule optimization. Initially, an ACO algorithm is applied to learn fuzzy rules. This algorithm uses artificial ants to explore the training samples and gradually derive fuzzy rules. The ants learn the fuzzy rules for each class separately.

An ant constructs a rule randomly by adding one term at a time, and in subsequent iterations the ants modify the rule. Each ant chooses the term to modify (or to add to the current rule in the first iteration) with the following probability:

$$P_{i,j} = \frac{{\tau_{i,j} \left( t \right)\,\cdot\,\eta_{i,j} }}{{\sum\limits_{i}^{a} {\sum\limits_{j}^{{b_{i} }} {\tau_{i,j} \left( t \right)\,\cdot\,\eta_{i,j } } } }}$$
(12)

where \(\eta_{i,j}\) is a problem-dependent heuristic value for the term, given by a problem-dependent heuristic function, \(\tau_{i,j}\) is the amount of pheromone currently available (at time t) between attributes, and \(I\) is the set of attributes that are not yet used by the ant.

It is necessary to decrease the pheromone of terms that have not participated in the construction of rules. For this purpose, pheromone evaporation is simulated. To simulate the pheromone evaporation in real ant colony, the amount of pheromone associated with each term that does not occur in the constructed rule must be decreased. The pheromone of unused terms is decreased by dividing the amount of the value of each \(\tau_{i,j}\) by the summation of all \(\tau_{i,j}\) [53].
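The term selection probability of Eq. (12) and the normalization-based evaporation can be sketched as follows. The dictionary layout keyed by (attribute, value) terms, the function names and all numeric values are assumptions for illustration; the reinforcement amount applied to the used term is arbitrary here, and the actual update rule of Fuzzy ACO-Miner is not reproduced.

```python
def term_probabilities(pheromone, heuristic, unused_attributes):
    """Eq. (12): probability of each term whose attribute is still unused."""
    weights = {term: pheromone[term] * heuristic[term]
               for term in pheromone if term[0] in unused_attributes}
    total = sum(weights.values())
    return {term: w / total for term, w in weights.items()}

def normalize_pheromone(pheromone):
    """Evaporation: divide every tau by the sum of all tau values."""
    total = sum(pheromone.values())
    return {term: tau / total for term, tau in pheromone.items()}

# Hypothetical terms (attribute, linguistic value) with uniform pheromone.
tau = {("age", "old"): 0.25, ("age", "young"): 0.25,
       ("bp", "high"): 0.25, ("bp", "low"): 0.25}
eta = {("age", "old"): 0.8, ("age", "young"): 0.2,
       ("bp", "high"): 0.5, ("bp", "low"): 0.5}
probs = term_probabilities(tau, eta, {"age", "bp"})

# Assumed reinforcement of the term used in a rule, followed by normalization.
tau_reinforced = dict(tau)
tau_reinforced[("age", "old")] = 0.5
tau_next = normalize_pheromone(tau_reinforced)
```

After normalization the reinforced term holds a larger share of the pheromone, while every term that was not reinforced ends up with less than before, which is the evaporation effect described above.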

Fuzzy ACO-Miner also has additional features that distinguish it from existing classifiers based on the ACO meta-heuristic. Unknown data is classified by Fuzzy ACO-Miner based on averaging both the number of rules and the covering value.

3.5 Fuzzy Discrete Particle Swarm Optimization Classifier for Rule Classification

In contrast to traditional mining approaches, biologically inspired algorithms such as evolutionary algorithms and swarm intelligence approaches can also be used for classification tasks. Accordingly, Min Chen and Simone A. Ludwig proposed a fuzzy discrete particle swarm optimization classifier with local search (fuzzy DPSO-LS) to handle imprecise and uncertain data [57].

The FDPSO-LS classifier uses a rule base to represent a ‘particle’ and evolves the rule base over time. It is implemented as a matrix of rules representing fuzzy IF-THEN classification rules that have conjunctive antecedents and one consequent. It is applied to both discrete and continuous data sets [58, 59]. In addition, a local mutation search strategy is incorporated to counter the premature convergence of PSO. As the number of rules rises, an efficient algorithm that can automatically find the fuzzy rules becomes important and necessary. Normally, several rules of the rule base are fired in a fuzzy rule classification system. The predicted class for a given instance is determined by the membership degrees of the input variables. Specifically, for each class \(k\),

$$\upbeta_{\text{Classk}} = \mathop {\arg { \hbox{max} }}\limits_{\text{k}} \sum\limits_{{1 \le {\text{i}} \le {\text{n}}}} {\prod\limits_{{1 \le {\text{j}} \le {\text{m}}}} {\upmu_{\text{ij}} } }$$
(13)

where \(\upmu_{\text{ij}}\) is the input membership degree of the jth antecedent of the ith rule. The class that has the largest \(\upbeta\) value is selected as the predicted class. The limitations of this approach are that it may not efficiently normalize discrete datasets using linguistic terms, which can lead to overfitting of the model and may decrease classification accuracy.
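The class decision of Eq. (13) can be illustrated with a short Python sketch: for each class, the products of antecedent membership degrees are summed over the rules of that class, and the class with the largest sum is predicted. The rule representation, the linear membership functions `low` and `high` and the attribute names are assumptions made for this example.

```python
from math import prod

def predict(rules, instance):
    """Eq. (13): sum of products of antecedent memberships per class, then argmax."""
    scores = {}
    for antecedents, label in rules:
        # Firing strength of one rule: product over its conjunctive antecedents.
        strength = prod(mu(instance[attr]) for attr, mu in antecedents.items())
        scores[label] = scores.get(label, 0.0) + strength
    return max(scores, key=scores.get), scores

# Assumed linear membership functions on inputs scaled to [0, 1].
def low(x):
    return 1.0 - x

def high(x):
    return x

fuzzy_rules = [
    ({"chol": high, "age": high}, "disease"),
    ({"chol": low}, "healthy"),
    ({"age": low}, "healthy"),
]
predicted_class, class_scores = predict(fuzzy_rules, {"chol": 0.8, "age": 0.7})
```

Here one rule fires for the first class with strength 0.8 × 0.7, while two weaker rules fire for the second class, so the first class obtains the larger aggregate value.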

4 Experimental Evaluation of the Classification Model

The degree of belief in a classifier’s capability to classify unknown instances can be gauged from its generalization ability on trained instances. A classifier learned on a training dataset has to be tested experimentally for its performance. The experimental procedures adopted for evaluating the proposed classification model and the analysis of the results are presented in this section. The experiments use various classification evaluation metrics to assess the classifier’s performance on a dataset.

4.1 Evaluation Metrics

Most studies on medical data analysis use binary classification models. This study treats medical data analysis as a multiclass classification problem, since in most cases automated medical diagnosis cannot be formulated as a binary classification problem. In multiclass classification, each instance is assigned to exactly one class from a set of non-overlapping classes. Accordingly, a fuzzy-rough classifier for medical datasets with uncertain data is developed. The experimental evaluation compares the performance of the proposed classification algorithm with that of existing computational intelligence techniques.

The choice of appropriate metrics for evaluating classifiers is vital for measuring their actual performance. Here, the focus should be on the measured performance of the trained classifier rather than on how it is perceived. The most common metrics for evaluating classifiers are accuracy and error rate. Accuracy is the proportion of instances that the classifier classifies correctly. Error rate is the proportion of instances that it classifies incorrectly. Although specificity and sensitivity are popular metrics for indicating success in medical diagnosis classification models and medical tests, they are not preferred here because the experimental evaluation addresses multiclass classification problems. Precision and recall are established quality metrics for multiclass classification problems. Precision measures exactness: for each class, the proportion of instances assigned to that class that actually belong to it. Recall measures completeness: for each class, the proportion of instances of that class that the classifier identifies.
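The metrics described above can be computed as in the following minimal sketch. The function name `evaluate` and the toy labels are illustrative assumptions; precision and recall are reported per class, and a class with no predicted (or no actual) instances is given a value of 0.0 by convention here.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, error rate and per-class (precision, recall)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        true_pos = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for p in y_pred)  # instances assigned to c
        actual = sum(t == c for t in y_true)     # instances belonging to c
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        per_class[c] = (precision, recall)

    return accuracy, 1.0 - accuracy, per_class


# Hypothetical labels for a three-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

accuracy, error_rate, per_class = evaluate(y_true, y_pred)
print(accuracy, error_rate)
for label, (precision, recall) in per_class.items():
    print(label, precision, recall)
```

Reporting precision and recall per class, rather than a single pooled figure, shows which disease levels the classifier confuses with one another.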

The performance of the proposed approach is also assessed by comparing its results with those of existing well-known methods.

4.2 Description of the Dataset Used

The main focus is to analyze uncertainty in a medical dataset so that medical practitioners can diagnose diseases efficiently. This study uses the Cleveland heart disease dataset retrieved from the UCI machine learning repository [34]. The dataset has 303 instances with 14 attributes. The attribute indicating the level of heart disease is taken as the response attribute, which is distributed into five classes.

4.3 Experimental Analysis

In the experimental evaluation process, the Cleveland heart disease dataset is randomly partitioned into training and test sets. The holdout method is used for the split, with 2/3 of the instances assigned to the training set and 1/3 to the test set. Having more instances of each class helps the classifier attain good generalization capabilities. The training set is given as input to the classifiers to build the classification model, and the test set is used to evaluate the classifier. The results obtained are presented below. The performance of the proposed classifier is also evaluated by comparing its results with those of other classifiers built using computational intelligence techniques on the same dataset.
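A holdout split of this kind can be sketched as follows. The function name, the fixed seed and the use of integer placeholders for the 303 instances are illustrative assumptions; the sketch shuffles uniformly at random and does not stratify by class, which the text does not specify.

```python
import random


def holdout_split(instances, train_fraction=2 / 3, seed=0):
    """Randomly partition `instances` into a training set and a test set."""
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = round(len(indices) * train_fraction)
    train = [instances[i] for i in indices[:cut]]
    test = [instances[i] for i in indices[cut:]]
    return train, test


# Placeholder records standing in for the 303 instances of the dataset.
records = list(range(303))
train_set, test_set = holdout_split(records)
print(len(train_set), len(test_set))
```

With 303 instances, a 2/3 to 1/3 split gives 202 training instances and 101 test instances.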

Fuzzy-Rough Rule Induction

Rule-based classifiers are known for producing classification models that are easy to represent and interpret. A rule-based classifier (FRRI) is built on the Cleveland heart disease dataset. The proposed FRRI generated 4 rules, and classification with these rules gave an accuracy of 90.8%. The performance of FRRI is compared with that of other CI techniques. The techniques used in the comparative analysis are the multi-layer perceptron and the fuzzy-based hybrid classification techniques genetic-fuzzy modeling (GFM), fuzzy ant colony optimization (FACO-Miner) and fuzzy particle swarm optimization (Fuzzy DPSO). Their results are observed to be inferior to those of FRRI.

Table 6 reports the classification performance of the aforementioned computational intelligence techniques.

Table 6 Classification performance

A graphical representation of the classification models is depicted in Fig. 3.

Fig. 3 Classification models

5 Conclusions and Future Research Directions

The problem of analyzing medical data for better diagnosis by using computational intelligence techniques has been studied. This work emphasizes the analysis of uncertainty in medical data, which is inevitable when collecting information from patients. A predictive hybrid model is built using fuzzy-rough classifiers; it classifies a new patient record and indicates whether the patient might suffer from the disease (this research is carried out on a heart disease dataset). To achieve this, the model is trained to generate rules for classification. Fuzzy-rough concepts are used for the entire process, as they have been shown to handle uncertainty efficiently. A rule-based fuzzy-rough classifier, FRRI, which classifies uncertain medical datasets, has been built.

The framework can also be extended to domains other than the medical field. For example, the behavior of a netizen can be analyzed by such hybrid models, and the impact of social media can be studied. Computational intelligence techniques can be used to build robust decision making systems. Further, an ensemble of random forests can be used for classification, and parallel processing can be adopted to reduce the computational cost of random forests.