1 Introduction

The liver is the largest internal organ of the human body and plays a major role in human life [1, 2]. Non-alcoholic fatty liver disease (NAFLD) is among the most common liver diseases and has several severity levels, ranging from simple steatosis to cirrhosis [3, 4]. Early and accurate detection of the disease is very important because it can prevent serious complications later on [5, 6].

Liver biopsy was the first method introduced for detecting fatty liver disease [6, 7]. Recently, however, optical methods (or image analyses [2]) have become more popular because they are less invasive and less risky for the patient. FibroScan is the least risky method for measuring liver elasticity [8, 9], and the elasticity measure indicates the severity level of NAFLD [1, 4, 10, 11]. Table 1 presents the different levels of NAFLD.

Table 1 The results of the FibroScan for detecting five levels of NAFLD

FibroScan is more accurate than other NAFLD detection methods. However, it is expensive and not widely accessible. Therefore, researchers have sought to build disease severity detection systems that rely on simpler and less expensive tests [12,13,14,15,16]. The main problem of these systems (such as the Forns score system in 2002 [14] and the Angulo method in 2007 [13]) is that they can detect only certain levels of the disease, with accuracy much lower than that of FibroScan. In this research, we introduce a new, simple, and inexpensive method for NAFLD detection based on artificial neural networks (ANN). In the new method, the severity level of fatty liver disease is inferred from clinical features obtained from a complete blood count (CBC) and an ultrasonography test. The CBC is a common test whose results contain parameters that describe the condition of organs by measuring the levels of different substances in the blood (such as sugar, cholesterol, and urea).

The dataset used in this research was obtained from patients who visited Sayad Shirazi Hospital in Golestan province, Iran, in 2011. It contains the blood test, ultrasonography, and FibroScan results of these patients. All participants gave oral informed consent for the use of these data for scientific purposes, and the study was approved by our ethics committee.

In this research, after preprocessing, the blood test and ultrasonography parameters were applied as inputs to a neural network. The known disease level from FibroScan serves as the class label (the network output) in the training phase, and the relationship between input and output is learned by training the network. A neural network usually classifies the training data well, but its predictions are difficult for humans to interpret [17,18,19]. Various methods have been developed in the literature for extracting rules from neural networks [20, 21]. The outcome of such methods is a collection of rules that can be used as an alternative to the original neural network in operation [17], with the benefit of being more comprehensible to the user. In this paper, a Four-Step Rule Extraction (FSRE) method is introduced to derive a rule set from the neural network. Compared with several other methods, this method has lower complexity and equal or higher accuracy. The results of applying FSRE to NAFLD severity detection demonstrate the efficiency and usefulness of this rule extraction method in comparison with previous work.

The rest of this paper is organized as follows. Section 2 reviews previous work on liver fibrosis diagnosis and rule extraction. Our method for extracting rules from neural networks is presented in Sect. 3. Section 4 reports the evaluation and simulation results. Finally, Sect. 5 concludes the paper.

2 Related works

In this section, we first review studies related to NAFLD detection, namely methods that researchers have proposed for computing a fibrosis diagnosis score from blood test parameters, and then discuss studies on rule extraction using neural networks.

Forns et al. [14] examined the relationship between laboratory test values and liver fibrosis in 351 patients. They concluded that age, gamma-glutamyl transpeptidase (GGT), total cholesterol (T-CHOL), and platelet count (PLT) are independent predictors of fibrosis, and derived the following formula from these parameters:

$${\text{Score}} = 7.811 - 3.131 \times \ln \left( {\text{PLT}} \right) + 0.781 \times \ln \left( {\text{GGT}} \right) + 3.467 \times \ln \left( {\text{Age}} \right) - 0.014 \times \left( {\text{TCHOL}} \right)$$
(1)
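Eq. (1), which reappears later as the Forns_Score feature, can be evaluated directly. The sketch below is illustrative only: it assumes the units of the original Forns index (PLT in 10^9/L, GGT in IU/L, age in years, T-CHOL in mg/dL), and the function name and patient values are hypothetical.

```python
import math

def forns_score(plt: float, ggt: float, age: float, tchol: float) -> float:
    """Forns index as written in Eq. (1).

    Assumed units: plt in 10^9/L, ggt in IU/L, age in years, tchol in mg/dL.
    """
    return (7.811
            - 3.131 * math.log(plt)   # natural logarithm, as in Eq. (1)
            + 0.781 * math.log(ggt)
            + 3.467 * math.log(age)
            - 0.014 * tchol)          # cholesterol enters linearly

# Hypothetical patient values, for illustration only
print(round(forns_score(plt=200, ggt=50, age=45, tchol=190), 3))
```

Because PLT enters with a negative coefficient, a lower platelet count raises the score, which is consistent with its use as a marker of more advanced fibrosis.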

This method provides a cutoff value that can detect only the significant levels of the disease (F2–F4). Wai et al. [16] reported that the ratio of glutamic oxaloacetic transaminase (GOT), expressed as a multiple of its upper limit of normal, to PLT is useful for assessing liver fibrosis. The formula obtained in their research is shown below.

$${\text{Score}} = \frac{{\text{GOT}}/{\text{ULN}}_{\text{GOT}}}{\text{PLT}} \times 100$$
(2)

in which ULN_GOT is the upper limit of normal for GOT.
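The Wai et al. score is a one-line computation once the upper limit of normal for GOT is known. In this minimal sketch, that limit is passed in as `got_uln` because it is laboratory-specific. PLT is assumed to be in 10^9/L, and the names and values are hypothetical.

```python
def apri(got: float, got_uln: float, plt: float) -> float:
    """Wai et al. score: GOT as a multiple of its upper limit of normal, divided by PLT, times 100.

    Assumes plt is given in 10^9/L.
    """
    if got_uln <= 0 or plt <= 0:
        raise ValueError("got_uln and plt must be positive")
    return (got / got_uln) / plt * 100

# Hypothetical values: GOT at twice its upper limit of normal, PLT = 100
print(apri(got=80, got_uln=40, plt=100))
```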

Like the Forns et al. method, the Wai et al. method can detect only the significant levels of fibrosis (F2–F4). Lok et al. [15] used the following formula to predict cirrhosis rather than to grade fibrosis:

$${\text{Log odds}} \left( {\text{predicted cirrhosis}} \right) = - 5.56 - 0.0089 \times {\text{PLT}} + 1.26 \times \frac{\text{GOT}}{\text{GPT}} + 5.27 \times {\text{PT}}\_{\text{INR}} ,$$
(3)

in which the PT_INR is the international normalized ratio of prothrombin time.
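Eq. (3) yields log odds, and the logistic function converts them into a predicted probability of cirrhosis. The following is a minimal sketch. It assumes PLT in 10^9/L (numerically equal to 10^3/mm^3), and the function name and inputs are hypothetical.

```python
import math

def lok_cirrhosis(plt: float, got: float, gpt: float, pt_inr: float) -> tuple[float, float]:
    """Return (log odds, probability) of cirrhosis according to Eq. (3)."""
    log_odds = -5.56 - 0.0089 * plt + 1.26 * (got / gpt) + 5.27 * pt_inr
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic transform of the log odds
    return log_odds, probability

# Hypothetical values: PLT = 150, GOT/GPT ratio = 1, INR = 1
print(lok_cirrhosis(plt=150, got=40, gpt=40, pt_inr=1.0))
```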

Rule extraction from artificial neural networks is an appropriate way to obtain rules for disease detection or classification. First, a neural network is selected according to the problem at hand. Second, training yields the relationship between the input and output features. Finally, this relationship is expressed as a small set of simple and efficient rules. The many existing algorithms for extracting rules from neural networks can be classified along several dimensions, as described below.

The type of attributes in the dataset is an important issue for a rule extraction method. Some methods are suitable only for datasets containing continuous attributes (e.g., temperature, height, and weight) [22]. Other methods support only datasets containing discrete attributes (e.g., sex, education level, and religion) [23, 24]. Some methods can handle both discrete and continuous attributes [21, 25].

Rule extraction methods can also be classified based on the form of the extracted rules. Some methods express their results as an M-of-N rule set. In this case, N conditions (without any priority or ordering) are considered for an output class, and an input sample is a member of this class if it satisfies at least M of them. For example, the M-of-N rule for the two-bit XOR problem may be expressed as: “if (exactly 1 of 2 inputs is true) then Odd parity, else Even parity” [23]. However, the majority of available methods produce IF–THEN rules, in which each rule states a condition for belonging to a specific output class. For example, the IF–THEN rule for the two-bit XOR problem is: “if (X1 = 1 and X2 = 0) or (X1 = 0 and X2 = 1) then Odd parity, else Even parity” [26, 27]. Other methods employ fuzzy conditions instead of crisp ones, which requires fuzzification and defuzzification operations [21, 28, 29].
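The two XOR rules quoted above can be checked against each other on all four inputs. Note that the quoted M-of-N rule says 'exactly 1 of 2', whereas the general M-of-N form counts 'at least M'. This sketch follows the quoted wording, and the function names are hypothetical.

```python
from itertools import product

def parity_m_of_n(x1: int, x2: int) -> str:
    # Count-based rule: "if (exactly 1 of 2 inputs is true) then Odd parity, else Even parity"
    return "Odd" if sum((x1 == 1, x2 == 1)) == 1 else "Even"

def parity_if_then(x1: int, x2: int) -> str:
    # "if (X1 = 1 and X2 = 0) or (X1 = 0 and X2 = 1) then Odd parity, else Even parity"
    if (x1 == 1 and x2 == 0) or (x1 == 0 and x2 == 1):
        return "Odd"
    return "Even"

for x1, x2 in product((0, 1), repeat=2):
    print(x1, x2, parity_m_of_n(x1, x2), parity_if_then(x1, x2))
```

Both forms encode the same decision. The M-of-N form states it as a single count over unordered conditions, while the IF–THEN form enumerates the satisfying cases explicitly.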

Some rule extraction methods use self-organizing map (SOM) neural networks [30, 31]. The outcome of such methods is generally a very simple rule set that can be applied to simple datasets. Because this type of neural network is used only for data representation and for deriving neighborhood relations, it is generally not useful for classification problems. Malone et al. [31] applied a SOM neural network to the Iris dataset to show the differences among samples of different classes and their relations to nearby samples of the same class, and then derived classification rules. Other methods are designed for radial-basis function (RBF) neural networks. These methods can be used for more complex classification/clustering problems, and their results also have higher accuracy [32]. Most existing rule extraction methods leverage multilayer perceptron (MLP) neural networks, which are useful for both simple and complex classification/clustering problems and generally achieve higher accuracy [27, 33].

Most existing rule extraction methods are used for classification. In these problems, the desired output is discrete, with each value representing one of several distinct categories, so the number of neurons in the output layer of the neural network equals the number of categories [26, 34, 35]. A small number of procedures can also be used for regression problems. In these studies, unlike classification, the desired output is continuous, and a single neuron in the output layer produces the result [36, 37].

There are three overall approaches to rule extraction. The Pedagogical approach treats the entire neural network as a black box and extracts the desired rules only from its inputs and outputs, regardless of the operations within the network [27, 38]. In the Decompositional approach, the rules are obtained by decomposing the neural network into parts (input, hidden, and output) and considering the operations performed in each part [39,40,41]. Finally, the Eclectic approach combines the decompositional and pedagogical approaches [20].

3 Rule extraction using neural networks

Existing rule extraction methods are useful for simple problems and can handle only less complex datasets. Although a neural network can classify more complex data, extracting rules from it leads either to a very large number of rules or to rules with low accuracy. On the other hand, fatty liver severity detection based on clinical data is a highly complex problem, and existing methods cannot produce an appropriate rule set for it. We use a Four-Step Rule Extraction (FSRE) algorithm that can derive rules from an ANN in complex problems. This method follows the decompositional approach, applies to multilayer perceptron (MLP) neural networks, and extracts IF–THEN rules for classification problems.

FSRE has four main phases. The first is the Data preprocessing/representation phase, in which the input data are normalized using data mining techniques. The second is the Model learning phase, in which the main classification is performed by training an MLP neural network. Next comes the Pruning phase, in which the less important connections and neurons of the network are pruned. The last phase is Rule extraction, which derives rules from the pruned neural network. The details are explained in the following subsections.

3.1 Data preprocessing/representation

In most problems, raw data in their initial form may not lead to ideal classification results, and some preprocessing is usually required to simplify the data and increase accuracy. The goal of this phase is to transform the initial data so that the neural network can classify them with maximum accuracy. To this end, various data mining techniques may be used, including aggregation, sampling, dimensionality reduction, feature subset selection, feature creation (extraction, mapping, and construction), discretization, and feature transformation [42]. The specific operations performed on the dataset of this study are described in Sect. 4.

3.2 Model learning

The second phase includes the creation, training, and validation of a learner model. The method uses an MLP neural network with a standard structure (a single hidden layer) as the learner model for data classification, namely a fully connected three-layer (input, hidden, and output) network whose structure is determined by the dataset. The number of neurons in the input and output layers equals the number of features and the number of output classes, respectively. The hidden layer size is gradually increased until the desired accuracy is obtained on the validation subset. All weights (edge values) in the network are initialized to small random values. The bias value is 1, and only the hidden layer contains a bias node. The hyperbolic tangent and linear functions are used as the activation functions of the hidden and output layers, respectively.

The back-propagation algorithm is used to train the neural network, and network performance is measured by the mean squared error (MSE). The generalization error is calculated on unseen samples of the test set.
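The learner described above can be sketched in NumPy as a single-hidden-layer network with tanh hidden units and linear outputs, trained by back-propagation on the MSE. The hidden layer grows one neuron at a time until a validation-accuracy target is met. This is a minimal illustration, not the authors' implementation. Full-batch gradient descent, the learning rate, the epoch count, the initialization range, and all function names are assumptions. We also read 'only the hidden layer contains a bias node' as a constant input of 1 feeding the hidden neurons, so the output layer here has no bias.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng, scale=0.5):
    # Small random weights; the last column of W1 belongs to the bias node (constant input 1)
    W1 = rng.uniform(-scale, scale, size=(n_hidden, n_in + 1))
    W2 = rng.uniform(-scale, scale, size=(n_out, n_hidden))  # no bias feeding the output layer
    return W1, W2

def forward(W1, W2, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append the bias input (value 1)
    H = np.tanh(Xb @ W1.T)                          # hidden layer: hyperbolic tangent
    Y = H @ W2.T                                    # output layer: linear
    return Xb, H, Y

def mse(W1, W2, X, T):
    return float(np.mean((forward(W1, W2, X)[2] - T) ** 2))

def accuracy(W1, W2, X, T):
    return float(np.mean(forward(W1, W2, X)[2].argmax(axis=1) == T.argmax(axis=1)))

def train(X, T, n_hidden, epochs=3000, lr=0.1, seed=0):
    """Full-batch back-propagation on the mean squared error."""
    rng = np.random.default_rng(seed)
    W1, W2 = init_mlp(X.shape[1], n_hidden, T.shape[1], rng)
    for _ in range(epochs):
        Xb, H, Y = forward(W1, W2, X)
        G = 2.0 * (Y - T) / Y.size        # dMSE/dY
        dW2 = G.T @ H                     # gradient for hidden -> output weights
        dZ = (G @ W2) * (1.0 - H ** 2)    # back-propagate through tanh
        dW1 = dZ.T @ Xb                   # gradient for input (+ bias) -> hidden weights
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W1, W2

def grow_hidden(X_tr, T_tr, X_val, T_val, target_acc, max_hidden=10):
    """Add hidden neurons one at a time until validation accuracy reaches target_acc."""
    for h in range(1, max_hidden + 1):
        W1, W2 = train(X_tr, T_tr, h)
        acc = accuracy(W1, W2, X_val, T_val)
        if acc >= target_acc:
            break
    return h, W1, W2, acc

# Toy data: two well-separated 2-D clusters with one-hot targets
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
T = np.repeat(np.eye(2), 20, axis=0)
h, W1, W2, acc = grow_hidden(X, T, X, T, target_acc=0.95)
print(h, acc)
```

For brevity, the toy example reuses the training data for validation. In the setting described here, the separate validation subset would be used instead.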

3.3 Pruning

Neural networks are usually fully connected, but not all of these connections are significant; some may even be ineffective for predicting the network output [19]. Rule extraction from a fully connected neural network is very difficult (or impossible), and the resulting rules have low comprehensibility. The purpose of pruning is to remove the connections and nodes that are less useful for predicting the output. After pruning, the network structure is simpler and contains only the most dominant connections and nodes.

First, the connections between the hidden and output layers are pruned, which identifies the effective hidden neurons for each output class. Afterward, the ineffective input connections of each hidden neuron are removed. Pruning continues as long as the performance of the resulting network remains acceptable.
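The text does not specify how a connection is judged ineffective. The sketch below assumes a simple greedy criterion: try removing connections in order of increasing weight magnitude, and keep each removal only if a caller-supplied score (for example, validation accuracy of the whole network) stays at or above a threshold. Following the order described above, it would be applied to the hidden-to-output weights first and then to the input-to-hidden weights. `prune_greedy` and `evaluate` are hypothetical names.

```python
import numpy as np

def prune_greedy(W, evaluate, min_acc):
    """Zero out connections of W, smallest |weight| first, keeping a removal only
    if evaluate(W) stays >= min_acc; otherwise the weight is restored."""
    W = W.copy()
    for flat in np.argsort(np.abs(W), axis=None):  # flat indices, smallest magnitude first
        idx = np.unravel_index(flat, W.shape)
        saved = W[idx]
        W[idx] = 0.0
        if evaluate(W) < min_acc:
            W[idx] = saved                           # this connection is needed: restore it
    return W

# Toy check: a one-neuron scorer whose label depends only on the first feature
X = np.array([[1.0, 5.0, 0.0], [-1.0, 5.0, 0.0], [2.0, -3.0, 1.0], [-2.0, -3.0, 1.0]])
y = np.sign(X[:, 0])
evaluate = lambda W: float(np.mean(np.sign(X @ W[0]) == y))
pruned = prune_greedy(np.array([[2.0, 0.1, -0.05]]), evaluate, min_acc=1.0)
print(pruned)
```

A neuron whose incoming and outgoing weights have all been zeroed can then be removed from the network entirely.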

The continuous output values of the hidden neurons are a critical issue in rule extraction because they make the network's computations difficult to interpret. Because of the hyperbolic tangent activation function, the hidden neuron outputs lie in the interval [−1, 1]. For each input sample, these outputs are converted to discrete values using one or more threshold values (as in Fig. 1).

Fig. 1 Discretization of the output value for one hidden neuron according to the threshold values

3.4 Rule extraction

The last phase of FSRE is rule extraction, in which a set of simple rules is extracted from the structure of the pruned neural network. This phase consists of the four steps described below.

Step 1 Determining the hidden layer patterns of input samples: the input samples are applied to the network to obtain the discrete output values of the hidden layer, which together constitute the hidden layer pattern of each sample. Figure 2 shows an example of pattern generation for input samples in a pruned neural network. In this example, the input samples have three distinct features and belong to two output classes (C1 and C2). The trained neural network initially had four neurons in the hidden layer, but only two (H2 and H4) remained after pruning. Moreover, the second feature was also pruned and was not used in the later steps of rule extraction. Discretizing the hidden layer outputs produced two discrete values for neuron H2 and four for H4. Applying each sample to this network yields its pattern, as shown in the rightmost table of Fig. 2.

Fig. 2 Determining the pattern for each input sample
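The threshold discretization of Fig. 1 and the pattern construction of Step 1 can be sketched together. The sketch assumes that each retained hidden neuron has a sorted list of thresholds. The threshold values below are hypothetical, since the text does not say how they are chosen. A neuron with k thresholds yields k + 1 discrete levels, so one threshold gives H2 two levels and three thresholds give H4 four, matching the level counts in Fig. 2.

```python
from bisect import bisect_right

def discretize(value: float, thresholds: list[float]) -> int:
    """Map a continuous hidden output in [-1, 1] to a level 0..len(thresholds).
    A value exactly on a threshold goes to the upper level."""
    return bisect_right(thresholds, value)

def hidden_pattern(hidden_outputs: dict[str, float],
                   thresholds: dict[str, list[float]]) -> tuple[int, ...]:
    """Discrete hidden-layer pattern of one sample over the neurons kept after pruning."""
    return tuple(discretize(hidden_outputs[name], thresholds[name])
                 for name in sorted(thresholds))

# Hypothetical thresholds: H2 has two levels, H4 has four
thresholds = {"H2": [0.0], "H4": [-0.5, 0.0, 0.5]}
print(hidden_pattern({"H2": 0.7, "H4": -0.2}, thresholds))
```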

Step 2 Sample classification: each group of samples with identical hidden layer patterns constitutes a category. Each category is associated with one output class according to the majority of the network's predicted outputs for the samples in that category. Hence, input samples with the same hidden layer pattern are assigned to the same class. Notably, several different categories may be assigned to one particular class.

Figure 3 shows the classification of the samples in the previous example based on the patterns obtained from the neural network. Three distinct patterns are produced for these samples, and samples with the same pattern belong to the same category. The categories (and their samples) are then associated with the output classes.

Fig. 3 Classifying input samples according to their pattern
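Step 2 amounts to grouping samples by pattern and taking a majority vote over the network's predicted classes. The following is a minimal sketch with hypothetical patterns and predictions. Ties are broken in favor of the class encountered first, which is a choice made here and is not specified in the text.

```python
from collections import Counter, defaultdict

def categorize(patterns, predicted):
    """Group samples by identical hidden-layer pattern and label each group by
    majority vote over the network's predicted classes.
    Returns {pattern: (class, [sample indices])}."""
    groups = defaultdict(list)
    for i, pattern in enumerate(patterns):
        groups[pattern].append(i)
    return {
        pattern: (Counter(predicted[i] for i in idx).most_common(1)[0][0], idx)
        for pattern, idx in groups.items()
    }

patterns = [(0, 1), (0, 1), (1, 3), (1, 3), (1, 3), (0, 2)]
predicted = ["C1", "C1", "C2", "C2", "C1", "C1"]
for pattern, (label, members) in categorize(patterns, predicted).items():
    print(pattern, label, members)
```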

Step 3 Pruning and combining: several categories may be associated with the same class, and because input features have been pruned, some categories may contain duplicate samples. These issues are handled as follows to create the final rule set:

  • Removing duplicate samples in each category. In Fig. 4a, two samples become identical because the second feature has been pruned; in this case, one of them is deleted.

    Fig. 4 Management of classes for the production of final rules

  • Combining similar samples to create a more general case (or rule). In Fig. 4b, two samples have been combined because they have the same value in the third feature and successive values in the first feature.

  • Combining categories that belong to the same output class. In Fig. 4c, the two categories created for class C2 are combined.

  • Combining or pruning similar cases (or rules) in the new categories to create more general cases (or rules). In Fig. 4d, two samples in the new category have been combined because they have the same value in the third feature and successive values in the first feature.

  • Presenting the classification rule set for each output class. In this example, the final rules are produced from the categories obtained for class C1 (Fig. 4b) and the category produced for class C2 (Fig. 4d). Because the values of the first feature already separate the two output classes, the third feature is not needed to produce the rules (Fig. 4e).
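The duplicate removal and interval merging of Step 3 can be sketched for a single feature. The sketch assumes discrete integer levels and uses `None` to mark a pruned feature. Combining categories of the same class is modeled simply by pooling their samples before merging. The function names and sample values are hypothetical.

```python
from collections import defaultdict

def merge_runs(levels):
    """Collapse integer levels into runs of successive values, e.g. {1, 2, 4} -> [(1, 2), (4, 4)]."""
    runs = []
    for v in sorted(levels):
        if runs and v == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], v)
        else:
            runs.append((v, v))
    return runs

def merge_on_feature(samples, j):
    """Drop duplicate samples, then combine samples that agree on every feature except j
    and have successive values in feature j into one interval condition (lo, hi)."""
    groups = defaultdict(set)
    for s in dict.fromkeys(samples):        # removes duplicates, keeps first-seen order
        groups[s[:j] + s[j + 1:]].add(s[j])
    merged = []
    for rest, levels in groups.items():
        for run in merge_runs(levels):
            merged.append(rest[:j] + (run,) + rest[j:])
    return merged

# Discrete samples (feature 1, feature 2, feature 3); None marks the pruned second feature
cat_a = [(1, None, 2), (1, None, 2), (2, None, 2)]
cat_b = [(4, None, 1)]
print(merge_on_feature(cat_a + cat_b, j=0))   # two categories of the same class pooled first
```

Merging along one feature at a time mirrors the examples in Fig. 4. A fuller version would repeat the merge over every feature until no further combination is possible.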

Step 4 Default rule: the class with the largest number of samples, the largest number of rules, or the highest classification error is identified and used as the default response (rule). If an input sample is not covered by any other rule, the default rule is applied. This reduces the number of rules and improves the results.
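Step 4 can be sketched as an ordered rule list with a fallback class. Of the three criteria mentioned above, this sketch uses only the largest number of samples. Rules are represented as hypothetical (condition, label) pairs.

```python
from collections import Counter
from typing import Callable

Rule = tuple[Callable[[dict], bool], str]  # (condition, class label); hypothetical representation

def default_class(train_labels: list[str]) -> str:
    """Pick the default class; here, the class with the largest number of samples."""
    return Counter(train_labels).most_common(1)[0][0]

def classify(sample: dict, rules: list[Rule], default: str) -> str:
    """Return the class of the first rule the sample satisfies, else the default class."""
    for condition, label in rules:
        if condition(sample):
            return label
    return default

rules = [(lambda s: s["x"] >= 3, "C2")]           # a single hypothetical rule
default = default_class(["C1", "C1", "C2"])
print(classify({"x": 5}, rules, default), classify({"x": 1}, rules, default))
```

Because the most frequent class needs no explicit rule, the rule list covers only the remaining classes, which is how the default rule reduces the number of rules.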

Finally, the rule set is constructed and can be applied to unseen (test) data; if its performance is acceptable, it can be used on new data instead of the neural network. We further analyze the performance of FSRE on two common benchmark datasets and compare its accuracy with that of other methods (Sect. 4.5).

4 Evaluation results

In this research, the proposed method is used for NAFLD severity detection based on clinical parameters. The FibroScan result is considered the desired output for each sample (patient) and takes one of the five severity levels listed in Table 1. Four different neural networks were used to detect the four severity levels F1–F4. If an input sample is not classified into any of these four levels, it is labeled as “healthy” and assigned to the default level (F0). In this section, we present the evaluation and the obtained results.

4.1 Dataset

The collected dataset contains 726 samples (patients). The following 17 features were recorded for each sample: personal information (sex, age, height, and weight), blood test parameters (total cholesterol (T-CHOL), high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglyceride (TG), blood creatinine (CRE), glomerular filtration rate (GFR), blood urea (BU), alkaline phosphatase (ALP), gamma-glutamyl transpeptidase (GGT), fasting blood sugar (FBS), white blood cell count (WBC), and platelet count (PLT)), and the ultrasonography result (Ultra_Score). Each patient is assigned to one of the five severity levels. Table 2 presents the output classes and the number of samples in each class.

Table 2 Output classes and number of samples in each class

4.2 Efficiency parameters

Accuracy, sensitivity, and specificity are used to measure the efficiency of the proposed method. These metrics are calculated as follows:

$${\text{Accuracy}} = \frac{{\left( {{\text{TP}} + {\text{TN}}} \right)}}{{\left( {{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}} \right)}} = \frac{\text{Number of Correctly Classified samples}}{\text{Total number of samples}}$$
(4)
$${\text{Sensitivity}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}}$$
(5)
$${\text{Specificity}} = \frac{\text{TN}}{{{\text{TN}} + {\text{FP}}}}$$
(6)

In Eqs. (4)–(6), TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative results, respectively.
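For a one-vs-all detector, Eqs. (4)–(6) can be computed directly from the true and predicted labels. This is a minimal sketch. The `positive` argument (a hypothetical name) identifies the detector's own class, and the division raises ZeroDivisionError if the labels contain no positives or no negatives.

```python
def confusion_counts(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def efficiency(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (4)
        "sensitivity": tp / (tp + fn),                 # Eq. (5)
        "specificity": tn / (tn + fp),                 # Eq. (6)
    }

# Hypothetical labels: 1 = the detector's class, 0 = all other classes
print(efficiency([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```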

4.3 Preprocessing

The dataset includes 17 basic features, some of which may be irrelevant for classification. Therefore, the importance and priority of these features are examined first, and then the appropriate features are selected. We used information gain (IG) [43] to rank the features. Information gain is an entropy-based statistic: it measures how much knowing the value of a feature reduces the entropy (uncertainty) of the class label. If the values of a feature vary widely among samples of the same category, the feature is unstable and therefore not appropriate for identifying that category. Table 3 presents the information gain results and the priorities of the features in the different categories.
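Information gain can be computed as the class entropy minus the weighted class entropy within each value of the feature. The following sketch illustrates this standard definition and is not necessarily the tool the authors used. It assumes discrete feature values (continuous features would be binned first) and base-2 logarithms, and the example data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG = H(class) - sum over v of P(feature = v) * H(class | feature = v)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# A feature that separates the classes perfectly vs. one that carries no information
labels = ["F0", "F0", "F2", "F2"]
print(information_gain(["low", "low", "high", "high"], labels))
print(information_gain(["low", "high", "low", "high"], labels))
```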

Table 3 Information gain score for some superior features

The results in Table 3 show that even the top-ranked features have IG scores close to zero (<0.05). Evaluating the method with these features yields low sensitivity values (Table 4).

Table 4 Results of the neural network on the 17 basic features

Therefore, these features alone are not suitable for disease severity detection. To increase the efficiency of the system, another feature, Forns_Score, proposed by Forns et al. [14], is added:

$${\text{Forns}}\_{\text{Score}} = 7.811 - 3.131 \times \ln \left( {\text{PLT}} \right) + 0.781 \times \ln \left( {\text{GGT}} \right) + 3.467 \times \ln \left( {\text{Age}} \right) - 0.014 \times \left( {\text{TCHOL}} \right)$$
(7)

The study in [14] designed a noninvasive method to discriminate between patients with and without significant liver fibrosis (levels F2–F4 vs. F0–F1). Using the Forns_Score on our dataset, 82.35% of the patients at the significant levels (F2–F4) can be detected successfully. The body mass index (BMI) was also computed with the following formula and added to the dataset:

$${\text{BMI}} = \frac{{{\text{Weight}}_{\text{kg}} }}{{{\text{Height}}_{\text{m}}^{2} }}$$
(8)
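Adding the two derived features of Eqs. (7) and (8) to a patient record might look as follows. The key names and units are assumptions: weight is in kg and height in m, as Eq. (8) requires, and the Forns inputs are taken in the units of the original index. The record values are hypothetical.

```python
import math

def add_derived_features(record: dict) -> dict:
    """Return a copy of a patient record with Forns_Score (Eq. 7) and BMI (Eq. 8) added.

    Assumed keys/units: PLT (10^9/L), GGT (IU/L), Age (years), T_CHOL (mg/dL),
    Weight (kg), Height (m).
    """
    out = dict(record)  # leave the input record unchanged
    out["Forns_Score"] = (7.811 - 3.131 * math.log(record["PLT"])
                          + 0.781 * math.log(record["GGT"])
                          + 3.467 * math.log(record["Age"])
                          - 0.014 * record["T_CHOL"])
    out["BMI"] = record["Weight"] / record["Height"] ** 2
    return out

patient = {"PLT": 200, "GGT": 50, "Age": 45, "T_CHOL": 190, "Weight": 70, "Height": 1.75}
enriched = add_derived_features(patient)
print(round(enriched["BMI"], 2), round(enriched["Forns_Score"], 2))
```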

Class F0 is considered the default answer because it has the largest number of samples among the classes. Thus, an input sample is assigned to this class by default if it does not satisfy any other rule. Four separate detector systems are used for the four disease levels F1–F4 (see Table 1). The seven most effective features selected by the information gain score are listed in Table 5.

Table 5 Features applied to the neural network rule extraction

After the features are chosen, an effective method is needed to convert continuous values into discrete values. Many methods have been proposed for this purpose in recent years (e.g., CAIM [44], Fast-CAIM [45], Modified-CAIM [46], and ur-CAIM [47]). In this research, ur-CAIM is used because it performed better than the other methods. The discretization is performed separately for each detector. Afterward, the new feature values are normalized using the standard (z-score) normalization:

$${\text{NormData}} = \frac{{\left( {{\text{Data}} - \mu_{\text{Data}} } \right)}}{{\sigma_{\text{Data}} }}$$
(9)
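Eq. (9) is the standard z-score. The sketch below uses the population standard deviation. The text states neither whether the population or the sample estimate was used, nor whether the mean and standard deviation were computed on the whole dataset or only on the training subset. A constant feature is mapped to zeros here to avoid division by zero, which is also a choice made for this sketch.

```python
import statistics

def zscore(values: list[float]) -> list[float]:
    """Eq. (9): subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]

print(zscore([1.0, 2.0, 3.0]))
```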

After these preprocessing operations, the data are ready to be applied to the neural network. For the implementation, 70% of the dataset is used for training, 15% for validation, and the remaining 15% for testing. The proportion of each class in each subset matches its proportion in the whole dataset.
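A 70/15/15 split that preserves class proportions can be obtained by splitting each class separately. This is a minimal sketch. The per-class rounding, the fixed seed, and the function name are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15), seed=0):
    """Split sample indices into train/validation/test while preserving class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # the remaining samples of this class
    return train, val, test

labels = ["A"] * 20 + ["B"] * 40  # hypothetical class labels
tr, va, te = stratified_split(labels)
print(len(tr), len(va), len(te))
```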

4.4 Results

At this stage, four detector systems are built for the four disease levels other than F0. The structure of the neural network for each detector is decided independently, and its parameters are set according to the method described in Sect. 3.2. Each detector decides only on its own class (one vs. all). After the training phase, the efficiency is calculated on the whole dataset, and then all unnecessary connections and neurons are pruned. An input sample is applied to the detectors hierarchically (that is, sequentially rather than simultaneously), and the order in which the detectors are applied is based on the probability of each class, as shown in Fig. 5.

Fig. 5 Applying the input sample to the detector systems in hierarchical form

The advantage of the hierarchical scheme over the simultaneous one is that there is no ambiguity or overlap among the decisions of the detector systems. Errors are also reduced in this manner, because each sample is assigned to exactly one class. Table 6 presents the structure of the neural networks and the results of the training and pruning phases. Finally, the discrete patterns are created for the samples, and classification rules are obtained for each class (see the results in Table 7).

Table 6 The obtained results after training and pruning phases
Table 7 The obtained results from extracted rules

The results in Table 6 indicate that the neural networks used as detector systems classify the training data very well. These networks also achieve better results with simpler structures for the more severe stages of the disease (F3 and F4). The classification accuracies for classes F1 and F2 are above 80%, which is better than previous work. The results of the pruning phase also show that the sensitivity of the detectors increased; even after some connections were eliminated, the results remained consistent and acceptable.

Table 7 shows that neural networks with simpler structures produce fewer rules, and the results of these rules are almost equal to those of the pruning phase (Table 6). These results are excellent for the significant levels of the disease (F2, F3, and F4) and acceptable for its early stage (F1). The set of extracted rules is as follows:

  • Class F4:

    \({\text{if}} \left( {{\text{LDL}} < 216} \right)\,{\text{and}}\,\left( {{\text{FornsScore}} \ge 115.01} \right)\,{\text{then ClassF}}4\)

  • Class F3:

    \({\text{if}} \left( {{\text{UltraScore}} = \left[ {1 {\text{or}} 7} \right]} \right)\,{\text{and}}\,\left( {{\text{LDL}} < 178.5} \right)\,{\text{and}}\,\left( {{\text{FornsScore}} \ge 109.035} \right)\,{\text{then ClassF}}3\)

    \({\text{if}} \left( {{\text{UltraScore}} = \left[ {1 {\text{or}} 3 {\text{or}} 5 {\text{or}} 7} \right]} \right) {\text{and}} \left( {{\text{LDL}} \ge 178.5} \right) {\text{and}} \left( {69.8 \le {\text{FBS}} < 220.7} \right) {\text{and}} \left( {{\text{FornsScore}} \ge 109.035} \right) {\text{then ClassF}}3\)

    \({\text{if}} \left( {2 \le {\text{UltraScore}} \le 6} \right) {\text{and}} \left( {{\text{LDL}} < 178.5} \right) {\text{and}} \left( {69.8 \le {\text{FBS}} < 220.7} \right) {\text{and}} \left( {{\text{FornsScore}} \ge 109.035} \right) {\text{then ClassF}}3\)

  • Class F2:

    \({\text{if}} \left( {{\text{LDL}} \ge 71.5} \right) {\text{and}} \left( {{\text{FornsScore}} \ge 106.485} \right) {\text{then ClassF}}2\)

    \({\text{if}} \left( {3 \le {\text{UltraScore}} \le 5} \right) {\text{and}} \left( {{\text{LDL}} < 65.5} \right) {\text{and}} \left( {{\text{FornsScore}} \ge 106.485} \right) {\text{then ClassF}}2\)

    \({\text{if}} \left( {{\text{UltraScore}} = 1} \right) {\text{and}} \left( {65.5 \le {\text{LDL}} < 71.5} \right) {\text{and}} \left( {{\text{FornsScore}} \ge 106.485} \right) {\text{then ClassF}}2\)

  • Class F1:

    \({\text{if}} \left( {{\text{LDL}} \ge 105.5} \right) {\text{and}} \left( {{\text{TRIG}} < 275.5} \right) {\text{and}} \left( {{\text{BMI}} < 32.108} \right) {\text{and}} \left( {105.085 \le {\text{FornsScore}} < 106.725} \right) {\text{then ClassF}}1\)

    \({\text{if}} \left( {{\text{LDL}} < 104.5} \right) {\text{and}} \left( {{\text{TRIG}} < 263} \right) {\text{and}} \left( {{\text{BMI}} < 32.108} \right) {\text{and}} \left( {105.085 \le {\text{FornsScore}} < 106.725} \right) {\text{then ClassF}}1\)

    \({\text{if}} \left( {{\text{LDL}} < 104.5} \right) {\text{and}} \left( {{\text{TRIG}} \ge 275.5} \right) {\text{and}} \left( {{\text{BMI}} < 32.108} \right) {\text{and}} \left( {105.085 \le {\text{FornsScore}} < 106.725} \right) {\text{then ClassF}}1\)

    \({\text{if}} \left( {{\text{LDL}} < 104.5} \right) {\text{and}} \left( {{\text{TRIG}} < 241.5} \right) {\text{and}} \left( {{\text{BMI}} \ge 32.667} \right) {\text{and}} \left( {105.085 \le {\text{FornsScore}} < 106.725} \right) {\text{then ClassF}}1\)

  • Class F0:

    \({\text{Default is Class F}}0\)
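The extracted rule set can be encoded directly as an ordered list of detectors with F0 as the default. The thresholds are copied verbatim from the rules above. Three points are assumptions: the detector order (the text bases it on the class probabilities shown in Fig. 5, so the listed order F4, F3, F2, F1 is used here); that each record carries values on the same scales those rules use; and that 'UltraScore = [1 or 7]' means membership in {1, 7}. The record keys are hypothetical.

```python
FORNS_F1 = (105.085, 106.725)  # FornsScore interval shared by all F1 rules

RULES = [
    ("F4", [lambda r: r["LDL"] < 216 and r["FornsScore"] >= 115.01]),
    ("F3", [
        lambda r: r["UltraScore"] in (1, 7) and r["LDL"] < 178.5 and r["FornsScore"] >= 109.035,
        lambda r: (r["UltraScore"] in (1, 3, 5, 7) and r["LDL"] >= 178.5
                   and 69.8 <= r["FBS"] < 220.7 and r["FornsScore"] >= 109.035),
        lambda r: (2 <= r["UltraScore"] <= 6 and r["LDL"] < 178.5
                   and 69.8 <= r["FBS"] < 220.7 and r["FornsScore"] >= 109.035),
    ]),
    ("F2", [
        lambda r: r["LDL"] >= 71.5 and r["FornsScore"] >= 106.485,
        lambda r: 3 <= r["UltraScore"] <= 5 and r["LDL"] < 65.5 and r["FornsScore"] >= 106.485,
        lambda r: r["UltraScore"] == 1 and 65.5 <= r["LDL"] < 71.5 and r["FornsScore"] >= 106.485,
    ]),
    ("F1", [
        lambda r: (r["LDL"] >= 105.5 and r["TRIG"] < 275.5 and r["BMI"] < 32.108
                   and FORNS_F1[0] <= r["FornsScore"] < FORNS_F1[1]),
        lambda r: (r["LDL"] < 104.5 and r["TRIG"] < 263 and r["BMI"] < 32.108
                   and FORNS_F1[0] <= r["FornsScore"] < FORNS_F1[1]),
        lambda r: (r["LDL"] < 104.5 and r["TRIG"] >= 275.5 and r["BMI"] < 32.108
                   and FORNS_F1[0] <= r["FornsScore"] < FORNS_F1[1]),
        lambda r: (r["LDL"] < 104.5 and r["TRIG"] < 241.5 and r["BMI"] >= 32.667
                   and FORNS_F1[0] <= r["FornsScore"] < FORNS_F1[1]),
    ]),
]

def nafld_level(record: dict) -> str:
    """Apply the detectors hierarchically; the first one whose rules fire decides."""
    for level, rules in RULES:          # assumed order: F4, F3, F2, F1
        if any(rule(record) for rule in rules):
            return level
    return "F0"                          # default rule

base = {"LDL": 100, "FornsScore": 100, "UltraScore": 3, "FBS": 100, "TRIG": 200, "BMI": 25}
print(nafld_level(base), nafld_level({**base, "FornsScore": 120}))
```

Because the F1 interval for FornsScore ([105.085, 106.725)) overlaps the F2 threshold (≥ 106.485), the order in which the detectors are consulted matters. In this sketch, a record that satisfies both is reported as F2.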

4.5 Discussion

This section compares the experimental results of FSRE with those of other works. Table 8 compares the FSRE results for NAFLD severity detection on our dataset with those produced by Forns_Score [14] and by the JRip rule induction [48] and J48 [49] algorithms.

Table 8 Accuracy of the different methods for NAFLD severity diagnosis

The Breast cancer and Wine datasets from the UCI repository [50] are used to further test the proposed method and evaluate its performance. The results are compared with those of other popular rule extraction methods, including SV-DT [51], RGANN [52], Rex-P [53], and Rex-M [53]. As Tables 9 and 10 show, FSRE produces a rule set with higher accuracy and comprehensibility than the existing methods.

Table 9 Comparison of the results obtained for Breast cancer problem
Table 10 Comparison of the results obtained for Wine problem

5 Conclusion

In this research, a new technique was introduced for liver fibrosis diagnosis using neural networks. The technique consists of four main phases: data preparation, construction and training of the detectors (neural networks) for the various severity levels of the disease, pruning, and rule extraction. The method was used to determine the relationship between clinical parameters and the severity of non-alcoholic fatty liver disease (NAFLD). After the detectors were trained and their unnecessary connections eliminated, the relationships between the clinical parameters and the levels of NAFLD were obtained as a set of rules. A hierarchical system was used to detect the various severity levels of the disease appropriately. The final accuracies of these rules for severity levels F1, F2, F3, and F4 are 80.58, 93.94, 99.31, and 100%, respectively. These results are superior to those of the existing scoring systems.

In this study, only a few clinical parameters were used for NAFLD diagnosis, and other parameters were not considered; these parameters will be investigated in future work. Another direction for future work is to use other base networks and training models, such as ELM/kernel ELM, instead of MLP to obtain a better and more useful rule set for real clinical environments.