1 Introduction

Decision analysis often deals with complex data that have different characteristics and structures. The most problematic data are uncertain data, for which the criteria values are practically difficult to determine. This problem remains unsolved if the decision-making process involves complex data or, more specifically, a big data set [31]. A powerful method needs to be selected in order to avoid computational inefficiency and to produce the best computational results. Based on the literature, one of the most widely investigated approaches is the parameterization method, which helps decision makers to simplify a complex data set. Even when the best method has been selected to handle this kind of problem, it still has some disadvantages. For instance, classical rough set theory, a well-known method that is capable of handling complex problems [17], still needs assistance from other methods to deal with the parameterization problem. Thus, various theories and concepts such as granular computing, deep learning, mathematical theories, artificial intelligence (AI), and hybrid approaches have been proposed in order to overcome these problems.

The purposes of this study are: (i) to highlight the existing works on hybrid rough set and soft set theories in decision analysis, especially in the parameter reduction (PR) process; (ii) to propose a new hybrid of rough set and soft set theories as a parameter reduction method; and (iii) to evaluate the performance of the proposed hybrid rough set and soft set parameter reduction method on medical data sets. Several factors inspired this study to integrate the rough set and soft set theories as a parameter reduction method. Firstly, based on the literature study conducted, soft set theory and other existing hybrid methods have not been tested with a complex medical data set. Secondly, AI techniques such as rough set theory and fuzzy c-means are claimed to be the best techniques for handling complex multi-criteria decision problems [23]. Thirdly, soft set theory is one of the emerging mathematical tools and has been shown to be a good parameterization tool by many researchers [10, 19]. Therefore, it was selected to be integrated with rough set theory. Finally, the medical data sets were selected because their complex and vague structure makes them suitable for testing the uncertainty and complexity issues. It is also believed that new medical knowledge can be discovered by studying the relationships and patterns in medical data. Furthermore, the analysis of medical data usually involves handling incomplete and uncertain information as well as organizing different levels of data representation [15].

This paper is divided into four sections. Section 1 introduces the purposes of the study, and Sect. 2 briefly explains the basic concepts of the parameter reduction process, rough set theory, soft set theory, and the existing hybrid methods of these two theories. Section 3 presents the experimental work and its results, which benchmark the existing rough set and soft set hybrid parameterization methods. Lastly, Sect. 4 concludes the work.

2 Implementation of Hybrid Rough Set and Soft Set Theories in Decision-Making

This section discusses the related theories and the existing hybrid rough set and soft set methods for handling medical data. It begins with a basic explanation of the parameter reduction process, followed by rough set theory, soft set theory, and an in-depth discussion of hybrid rough set and soft set theories in the decision-making process. Parameter reduction is one of the important processes that can be applied in decision-making. It is applied in the pre-processing phase, where it helps to reduce the volume of the data set before other tasks, such as classification or ranking, are executed. It is also used to eliminate less important attributes and uncertain values. The reduction process can be divided into two parts: parameter reduction and parameter value reduction. Most publications deal with parameter reduction rather than parameter value reduction [19].

2.1 Rough Set Theory

Rough set theory (RST) deals with the concept of approximation. Many researchers have shown that rough set theory has a strong capability for handling big and uncertain data [21]. According to [22], the philosophies of rough set theory are to minimize the data size and to deal with uncertain data. RST constructs a set of rules that provides useful information from uncertain and inconsistent data [14]. The capability of rough set theory in handling uncertain and ambiguous data has been successfully tested by many researchers, either by extending rough set theory or by combining it with other theories.
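The approximation concept mentioned above can be illustrated with a minimal sketch. It follows the standard textbook definitions of lower and upper approximations; the patient table, attribute names, and function names are hypothetical and are not taken from the experiments in this paper.

```python
from collections import defaultdict

def equivalence_classes(table, attributes):
    """Group object ids that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())

def approximations(table, attributes, target):
    """Return the (lower, upper) approximation of the target set of objects."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Hypothetical toy table: four patients described by two symptoms.
patients = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
sick = {"p1", "p3"}  # assumed target concept
lower, upper = approximations(patients, ["fever", "cough"], sick)
boundary = upper - lower  # objects that cannot be classified with certainty
```

In this toy case p1 and p2 are indiscernible but only p1 belongs to the target set, so both fall into the boundary region; this is the kind of uncertainty that the rough approximations make explicit.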

RST can also be applied as a parameter reduction technique to remove unnecessary attributes while preserving the original information. Many researchers have shown RST to be a good parameter reduction technique, and it has been extensively used in areas such as medical diagnosis, decision-making, image processing, and economic and data analysis. RST is also recommended as a tool that can effectively reduce unnecessary attributes when it is integrated with other techniques in the decision-making process. Recently, various parameter reduction techniques based on rough set theory have been proposed, such as the dominance-based rough set approach (DRSA), the variable consistency dominance-based rough set approach (VC-DRSA), and the variable precision dominance-based rough set approach (VP-DRSA) [12]. Each of these techniques has its own abilities and limitations.

2.2 Soft Set Theory

Soft set theory is a theory that shares the advantages of rough set theory in handling imprecise and vague data [6, 18]. Soft set theory allows an object to be defined without any restrictive rules. In other words, adequate parameters are needed to identify the membership function [13]. It is a mathematical tool that was proposed by Molodtsov, and it is free from the inadequacy of parameterization tools that is inherent in approaches such as rough set and fuzzy set theories [31]. Guan et al. stated that a soft set comprises a set of records, a set of parameters, and a mapping from the selected parameters to the power set of the universe [9].
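As an illustration of this definition, a soft set can be sketched as a mapping from each parameter to a subset of the universe. The patients, symptom parameters, and helper names below are hypothetical; the choice value (the number of parameters an object satisfies) is a simple score commonly used in soft set decision-making examples and is not necessarily the procedure used in this study.

```python
# Hypothetical universe of patients and symptom parameters (illustrative only).
universe = {"p1", "p2", "p3", "p4"}
soft_set = {
    "fever": {"p1", "p2"},
    "cough": {"p2", "p3", "p4"},
    "fatigue": {"p2", "p4"},
}

def to_binary_table(universe, soft_set):
    """Tabular form of a soft set: entry is 1 if the object satisfies the parameter."""
    params = sorted(soft_set)
    return {
        obj: {e: int(obj in soft_set[e]) for e in params}
        for obj in sorted(universe)
    }

def choice_values(universe, soft_set):
    """Count, for each object, how many parameters it satisfies."""
    return {
        obj: sum(obj in members for members in soft_set.values())
        for obj in sorted(universe)
    }

table = to_binary_table(universe, soft_set)
scores = choice_values(universe, soft_set)
best = max(scores, key=scores.get)  # object with the highest choice value
```

The binary table is the representation on which soft set parameter reduction methods typically operate: a parameter is a candidate for removal when dropping it does not change the decision derived from the table.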

As stated by [13], the theory of soft sets has recently attracted many researchers to further improve the theory or to apply it in various areas such as operational research, medical research, and decision-making, especially in environments with uncertain information. Soft set theory has emerged recently due to its functionality and ability in handling uncertainties. Molodtsov claimed that soft sets are better than fuzzy sets and rough sets in the decision-making process. Besides, soft set theory does not need any parameterization tools [30]. Consequently, motivated by this theory, researchers have produced many works related to decision-making. Most of the published works investigated and demonstrated the ability of soft set theory to assist decision makers in making good decisions.

2.3 Hybrid Methods

Method hybridization is the process of integrating one method with other methods. Hybridization offers an alternative for overcoming the limitations of a single method in a particular process. For example, fuzzy set theory relies on the expert's knowledge, rough set theory suffers from the nondeterministic polynomial-time hard (NP-hard) problem in attribute reduction and optimal rule discovery, and the genetic algorithm (GA) might face convergence problems [3]. Recently, many researchers have proposed several enhancements, one of which is the integration of artificial intelligence (AI) methods in order to maximize the functionality of the methods and to minimize the shortcomings of the original method. For instance, interval-valued fuzzy ANP (IVF-ANP) was developed to solve multiple attribute problems by determining the weights of each defined criterion [28], and rough AHP was proposed to measure system performance [2]. The following are some of the hybrid methods that integrate either soft set theory or rough set theory in their proposed works.

(1) Fuzzy soft sets: The fuzzy soft set is another soft set approach, which was introduced in 2001 to solve many problems including uncertainties. It is an extension of the classical soft set approach and was proposed to solve decision-making problems in real-world situations. Many publications have contributed to fuzzy soft sets. In one of them, Agarwal et al. [1] proposed an expert system that generalizes the intuitionistic fuzzy soft set approach to solve medical diagnosis problems. Meanwhile, Xiao et al. [29] initiated an optimization approach based on interval-valued fuzzy soft sets for solving multi-attribute group decision-making problems in an uncertain environment. In 2011, Geng et al. [7] proposed a model that provides an approximate description of objects in an intuitionistic fuzzy environment, with additional information on attribute weights, for solving multi-attribute decision-making problems. In the same year, [8] also proposed a method that considers multiple-parameter group decision-making by implementing the interval-valued intuitionistic fuzzy soft set approach. Furthermore, some publications have applied fuzzy soft sets to particular application problems to accomplish different objectives. For example, these research works were conducted in order to reduce the chances of piracy in image transmission [25], to rank the technical attributes in quality function deployment [30], to apply a new soft information order algorithm to solve problems [9], and to solve ranking problems by using the concept of the intuitionistic multi-fuzzy soft set [4].

(2) Soft fuzzy rough set and soft rough fuzzy set: The soft fuzzy rough set is a combination of three mathematical tools: soft set theory, fuzzy set theory, and rough set theory. These tools are closely related when dealing with uncertainty and vagueness problems. It was introduced by Feng et al. [5], who investigated the problem and the consequences of integrating these three theories, inspired by Dubois and Prade's research work on rough fuzzy sets. Later, in 2011, Meng et al. [20] redefined the concept proposed by Feng et al. [5] and introduced a new soft approximation space by considering several issues that arose from the previous research works. Then, still in 2011, another definition of the soft fuzzy rough set was presented by Sun et al. [27]. They proposed a new concept of the soft fuzzy rough set by integrating soft sets, rough sets, and fuzzy sets with the traditional fuzzy rough set. Later, in 2012, an enhancement of the intuitionistic fuzzy soft set approach with rough set theory was proposed by Zhang et al. [31]. A new alternative related to intuitionistic fuzzy soft set problems was suggested to help decision makers make scientific and appropriate decisions. The authors showed that their work was more suitable than the other works in dealing with crisp soft set decision-making problems.

(3) Rough set, modified soft rough set and rough soft set: The rough soft set and the soft rough set were introduced by Feng et al. in conjunction with integrating the three approaches (rough set, soft set, and fuzzy set) to produce one hybrid model named the soft rough fuzzy set [24]. As pointed out by Feng et al. in [6], the rough soft set is the approximation of a soft set in the rough approximation space introduced by Pawlak, whereas the soft rough set is grounded in soft rough approximation in a soft approximation space. According to [16], the soft rough set introduced by Feng et al. is a generalization of the rough set model over the soft set model, which provides a better approximation in certain cases compared to Pawlak's work. Based on these findings, researchers are encouraged to enhance and modify this work by exploring the associations between other rough set models and soft rough sets [6]. In 2013, Shabir et al. made an improvement to the soft rough set approximation theory, called the modified soft rough set (MSR set) [24]. This work claims that the proposed approach is more robust and that the granules of information produced are finer than those of the original soft rough sets.

3 Experimental Work and Results

Based on the literature on rough set and soft set hybrid methods, most of the hybrid research works have only provided theorems and algorithms without any experimental work to prove whether the proposed theory can be applied to real data. Therefore, in this section, an experimental work was conducted using real data sets with a newly proposed hybrid framework. The proposed hybrid framework integrates the rough set and soft set theories in the parameter reduction process to help the classifier deal with complex medical data, as mentioned in Sect. 1. Four medical data sets were used in the experimental task; they can be downloaded from the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/. The data sets used were breast cancer Wisconsin, BUPA liver disorders, heart disease, and Pima Indians diabetes. The characteristics of the data sets are depicted in Table 1.

Table 1. Data sets description.

Figure 1 presents the framework of the proposed work, which has four phases: (1) data cleaning and formatting; (2) parameter reduction process; (3) classification process; and (4) performance evaluation. The software used in the experimental work was MATLAB version 2014a, Rough Set Exploration System (RSES), and WEKA 3.6. MATLAB was used to execute the soft set parameter reduction algorithm and the neural network classification process. RSES, which can be downloaded from http://www.mimuw.edu.pl/~szczuka/rses/about.html, was used to execute the rough set reduction process. Meanwhile, WEKA 3.6 was used to execute the other hybrid parameter reduction methods listed in Table 3.

Fig. 1. Proposed framework for the implementation of hybrid rough set and soft set parameter reduction method in the classification process

3.1 Experimental Work

In phase 1, the raw data sets were cleaned and formatted according to the format required by the software. The attributes and the number of instances are defined during this pre-processing task. In phase 2, the data sets then went through the hybrid parameter reduction process in order to reduce the uncertainty and complexity of the data. During this phase, two levels of parameter reduction were executed. The first level was the rough set parameter reduction process and the second level was the soft set parameter reduction process. Essentially, all the attributes that contain missing values and are ranked as less important based on the specified formulations are removed. For the rough set parameter reduction process, RSES provides several algorithms that can be selected to execute the reduction. After several tests with the available algorithms, the exhaustive algorithm was used to generate the reduct sets because it helped the classifier to generate good classification results. The reduct sets contain several sets of attributes which can be used in the next processing task. One of the reduct sets generated by the rough set parameter reduction process was used as the input for the soft set parameter reduction process. The soft set parameter reduction process applied the parameter reduction algorithm introduced by [10, 11]. It was used to determine the best attributes to be used in the classification task. Both parameter reduction processes identified the important attributes to be used in the classification task. Only one reduct set for each parameter reduction level was used in the classification task. Figure 2 demonstrates the flow of the two-level reduction process in detail with a simple example. Meanwhile, Fig. 3 provides an example of a data set after it went through the hybrid parameter reduction process. The heart disease data set, which has 14 attributes, was reduced to 6 attributes after it went through the hybrid PR process. Most of the uncertain attributes and missing values were eliminated after the parameter reduction process was executed.
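The first reduction level can be illustrated with a generic sketch of exhaustive reduct search. This is a textbook-style formulation (a reduct is a minimal attribute subset that preserves the positive region of the decision), not the RSES implementation; the decision table, attribute names, and function names are hypothetical. The soft set algorithm of [10, 11] used at the second level is not reproduced here.

```python
from itertools import combinations

def positive_region(table, attributes, decision):
    """Indices of rows whose indiscernibility class has a single decision value."""
    groups = {}
    for i, row in enumerate(table):
        groups.setdefault(tuple(row[a] for a in attributes), []).append(i)
    pos = set()
    for members in groups.values():
        if len({table[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def exhaustive_reducts(table, conditions, decision):
    """Enumerate all minimal attribute subsets that preserve the positive region."""
    full = positive_region(table, conditions, decision)
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # contains a smaller reduct, so it cannot be minimal
            if positive_region(table, subset, decision) == full:
                reducts.append(subset)
    return reducts

# Hypothetical decision table (not one of the UCI data sets).
rows = [
    {"age": "young", "bp": "high", "chol": "high", "disease": "yes"},
    {"age": "young", "bp": "normal", "chol": "low", "disease": "no"},
    {"age": "old", "bp": "high", "chol": "low", "disease": "yes"},
    {"age": "old", "bp": "normal", "chol": "high", "disease": "no"},
]
reducts = exhaustive_reducts(rows, ["age", "bp", "chol"], "disease")
selected = reducts[0]  # one reduct would be passed on to the second level
```

Because the search enumerates all attribute subsets, its cost grows exponentially with the number of attributes, which is acceptable for small tables such as this one but is the reason heuristic alternatives exist.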

Fig. 2. Hybrid rough set and soft set parameter reduction process

Fig. 3. Example of the obtained result from the hybrid rough set and soft set parameter reduction process for heart disease data set

After the new simplified data set had been generated through the hybrid parameter reduction process, the classification task was executed in phase 3. At this stage, a neural network was used as the classifier to train and test the performance of the proposed hybrid parameter reduction method. Before the classification process was executed, the processed data sets were randomly divided into three groups: 70 % for training, 15 % for testing, and 15 % for validation. The training process was repeated several times until the best classification result was obtained. Finally, in phase 4, the classification results obtained for each data set were evaluated using six standard performance measures: accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and receiver operating characteristic (ROC) curves [26]. The classification results are presented in Tables 2, 3, 4 and Figs. 4, 5.
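The data division and the five scalar performance measures can be sketched as follows. The formulas are the standard confusion-matrix definitions; the function names, the random seed, and the example labels are hypothetical and do not correspond to the MATLAB routines or the results of this study.

```python
import random

def split_70_15_15(n_samples, seed=0):
    """Randomly divide sample indices into training, testing, and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(0.70 * n_samples)
    n_test = round(0.15 * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation

def binary_metrics(y_true, y_pred, positive=1):
    """Compute ACC, SENS, SPEC, PPV, and NPV from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "SENS": ratio(tp, tp + fn),   # true positive rate
        "SPEC": ratio(tn, tn + fp),   # true negative rate
        "PPV": ratio(tp, tp + fp),    # precision
        "NPV": ratio(tn, tn + fn),
    }

train, test, validation = split_70_15_15(100)

# Hypothetical labels: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

The ROC curve is not a single number: it is obtained by sweeping the classifier's decision threshold and plotting sensitivity against 1 minus specificity at each threshold.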

Table 2. Overall performance before implementation of hybrid parameter reduction method.
Table 3. Overall performance after implementation of hybrid parameter reduction method.
Table 4. Performance of existing and proposed hybrid parameter reduction methods.
Fig. 4. Overall classification results

Fig. 5. Receiver operating characteristic (ROC) plot for each data set

3.2 Results and Discussion

Based on the accuracy rates in Tables 2 and 3, the data sets delivered satisfactory results after the implementation of hybrid PR in the classification task. Three data sets (breast cancer, heart disease, and diabetes) showed a clear increase in the classification accuracy rate after the implementation of the proposed hybrid parameter reduction method. Heart disease showed the largest improvement among the data sets: its accuracy rate improved by 17.3 %, from 64.3 % to 81.6 %. However, the accuracy rate for the BUPA liver disorders data set decreased by 9.3 %, from 75.7 % to 66.4 %, after the implementation of the parameter reduction process. The differences in accuracy are presented in Fig. 4, where ACC denotes the accuracy rate before the hybrid PR is implemented and ACC1 denotes the accuracy rate after the hybrid PR is implemented. The performance of each data set is presented in Fig. 5 via receiver operating characteristic (ROC) curve plots. Among all the ROC plots, breast cancer showed the curve nearest to the upper-left corner, with a classification accuracy of 97.1 %, which is close to 100 %. It can be concluded that the neural network classifier performed well with the breast cancer data set.

To demonstrate the significance of this study, the obtained results were compared to the results of selected existing works that applied similar or nearly similar data sets. The results from two existing works, [15, 26], were used for comparison and benchmarking, since many existing works did not use real data sets to test their proposed work. However, not all data sets were tested by the selected existing works. Such cases are marked as "NA" (not available). Unfortunately, these two existing works did not apply the same research method as the proposed work. Therefore, two additional hybrid parameter reduction methods were applied to all data sets as a comparison to the proposed work. The comparison results are presented in Table 4.

The proposed hybrid parameter reduction method produced significant results for two data sets, breast cancer and heart disease, when compared to the other two hybrid PR methods (information gain with soft set and principal component with rough set). Meanwhile, with regard to the existing works, there are several possible reasons why their classification results are better than those of the proposed work. Firstly, the objective of the work proposed by [15] was to test the ability of the bijective soft set to generate classification rules rather than to use it as a parameter reduction method. Secondly, the data sets used by [26] came from their own collection, in which there were no missing values or other limitations such as a small number of instances. Among the four data sets used, it can be concluded that breast cancer is one of the best data sets to apply in a decision analysis problem, since it helps in producing good results as shown in Table 4. A number of factors that might cause a low accuracy rate, such as the characteristics of the data set, the data size, and the chosen classifier, should be further investigated. Since the work proposed by [15] produced good classification results, it might be considered for implementation in the proposed framework in order to increase the classification performance.

4 Conclusion

To ensure that a good decision is made, one of the most important approaches that researchers should apply in the decision analysis task is the parameter reduction process. This process eliminates complex data such as uncertain values and reduces the data volume to an acceptable data format. Recently, many research works have proposed different hybrid parameter reduction methods to overcome the limitations of a single parameter reduction method and to improve on the performance of previous works. Driven by the existing works highlighted in the previous sections, this study proposed a new framework of hybrid parameter reduction which exploits the advantages of rough set and soft set theories in dealing with complex data problems. Consequently, this study analysed the existing hybrid works and also investigated the ability and the performance of the proposed hybrid parameter reduction method in processing complex medical data for classification problems.

Most of the publications preferred to explore and apply the fuzzy concept within soft set theory instead of integrating rough set theory with soft set theory in solving multi-criteria decision-making problems. Each of these methods has shown that a good solution can be obtained for the given numerical examples, but without conducting experimental work on real data sets. Evidently, most of these works used a simple and small data set in the validation test, as in [6], which provided a "life expectancy" example that consists of six people with four decision parameters. It is therefore difficult to establish whether the proposed hybrid methods are really efficient in producing the best solution without facing any computational problem. Thus, this study tested the performance of the proposed hybrid method with larger data sets that consist of more than 100 instances, as described above.

The results indicate that a parameter reduction method is needed when the data used are complex and contain uncertain or missing values. It helps in reducing the complexity without changing the structure and meaning of the data. In conclusion, the hybrid rough set and soft set parameter reduction method has great potential for researchers to further their research in this area, especially in solving big data, uncertainty, and data complexity problems. It would be beneficial if all the proposed hybrid methods could be applied to other application areas such as medical science, social science, and economics.