1 Introduction

Diabetes is increasing rapidly in developed countries, and the increase is widespread: according to 2005 data from the International Diabetes Federation, over 246 million people worldwide were being treated for this chronic disease. It was also reported that by 2025 at least 300 million people worldwide may develop diabetes [1]. The actual number of people with diabetes mellitus is higher since, according to an epidemiological study, for each diagnosed patient there is one undiagnosed patient [2]. A large number of patients with diabetes belong to the working population.

According to the International Diabetes Federation (2010), Nigeria, one of Africa’s most populous countries, has the continent’s largest number of people with diabetes at 3.0 million, followed by South Africa with 1.9 million, Ethiopia with 1.4 million, and Kenya with 769,000. Remedying inadequate medical care by training more personnel takes a long time and is costly, which can result in an increased patient morbidity rate.

Meanwhile, the need for preventive methods has attracted the interest of health care providers, researchers, and medical personnel around the world [3,4,5]. Such preventive methods would support a well-planned assessment that could moderate or lower the risk of transitioning from pre-diabetes to diabetes. Diabetes mellitus is classified as “type 1,” “type 2,” “gestational diabetes,” and other forms [6].

Diabetes mellitus (DM) is both a medical and a socio-economic problem in modern society [7,8,9,10]. The disease is widespread in developed and developing countries. An estimated 175 million people worldwide had diabetes in 2004, and this number was projected to reach 354 million by 2030 [11]. According to the International Diabetes Federation (2010), Nigeria records the highest number of DM cases in Africa at three million, followed by South Africa with 1.9 million, Ethiopia with 1.4 million, and Kenya with 769,000 [12]. Addressing inadequate medical services by educating human resources requires a long time and high expenses, which may result in an increased morbidity rate in patients [13, 27].

Artificial Intelligence (AI) in medicine has provided numerous advantages in the diagnosis, management, and prediction of highly complicated and uncertain diseases [14, 15]. Despite the high degree of complexity and uncertainty in this field, computational intelligence systems such as neural networks (NN), fuzzy logic (FL), and genetic algorithms (GA) have been used to improve health care, minimize treatment expenses, and improve quality of life [16,17,18]. A newer category of system is the hybrid intelligent system, which combines artificial intelligence techniques such as fuzzy logic, neural networks, expert systems, and genetic algorithms [19, 20].

The use of a single technique in the diagnosis of diabetes has been comprehensively investigated and shows some level of accuracy [12, 21]. Researchers have also investigated hybridizing more than one technique to obtain better results in the diagnosis of diabetes [22], but to the best of our knowledge none used GA for feature selection. Therefore, this work developed an improved neuro-fuzzy inference system that combines NN and FL and uses GA to optimize the parameters before the classification of DM.

2 Related Work

Different types of intelligent system techniques have been applied to detect, classify, and diagnose diabetes and its complications. An expert system (ES) is built using one or a combination of these techniques: artificial neural networks (ANN), genetic algorithms (GA), and fuzzy logic (FL). For instance, [23] used fuzzy logic to build an expert system for diabetes diagnosis with a decision mechanism. Triangular membership functions with Mamdani’s inference were used in the fuzzy decision mechanism, and fuzzy values were converted into crisp values by defuzzification. The decision mechanism applied the rules to decide on the probability that a person has diabetes and presented the result with descriptions. The experimental results show that the expert system can encode acquired knowledge as expertise to emulate human reasoning and can analyze data from the database [24].

[25] used an adaptive neuro-fuzzy inference system (ANFIS) and principal component analysis (PCA) to improve the accuracy of diabetes diagnosis. The system was divided into two stages: in the first stage, the 8-feature dataset was reduced to 4 features by PCA, and in the second stage the reduced dataset was used to diagnose patients with diabetes using the designed classifier (the adaptive neuro-fuzzy inference system). The proposed system recorded 89.47% classification accuracy.

[26] proposed a fuzzy logic expert system to improve classification accuracy on the Pima diabetes dataset. Two artificial neural networks (ANN1 and ANN2) were combined with a fuzzy neural network (FNN) to form a hybrid system, which was trained with the back-propagation algorithm. The inputs were divided into two groups: fuzzy inputs, such as age and blood pressure, and the rest, which were treated as crisp data. In the first stage, the standardized crisp input values are fed to the first ANN (ANN1), and the fuzzified values of the fuzzy inputs are presented to the FNN. To calculate the final output, the results from ANN1 and the FNN are fed to the second ANN (ANN2). If the output value differs from the actual value, the weights of these networks are adjusted, and the process is repeated until reasonable results are reached. The developed system was tested using K-fold cross-validation and achieved an accuracy of 84.24%.

This hybridized intelligent system used a fuzzy expert system in addition to the neural network base. The inputs were separated into two groups: fuzzy inputs, such as blood pressure and medical tests, and the rest, which were treated as crisp data. A fuzzy system was applied to integrate the fuzzy inputs, which were then fed to the ANN together with the crisp inputs, and the ANN was used for the prediction of DM. This paper adds GA for feature selection before applying ANN and FL.

3 An Overview of the Techniques

3.1 Brief Description of Some Computational Intelligence Techniques Used

Neural Network (NN)

Neural networks can easily handle both continuous and discrete data and can gather data from available indicators. A neural network was used to train and test the designed fuzzy system, as well as to improve the performance of the overall system. The input consists of 8 features, namely: diastolic blood pressure, plasma glucose concentration at 2 h in an oral glucose tolerance test, number of pregnancies, triceps skinfold thickness, 2-h serum insulin (INS), body mass index (BMI), diabetes pedigree function (DPF), and age. A weight Wi indicates how much each attribute contributes to the diagnosis. The attributes of the patient were fed into the input layer of the neural network, and the contribution of each group of variables was calculated at the hidden layer as follows:

$$ CAT_{i} = \sum\nolimits_{i}^{n} {A_{i} * W_{Ai} } $$
(1)

The output of the neural network is then obtained as the weighted sum of the group contributions:

$$ Output_{Neural Network} = \sum\nolimits_{i}^{n} {CAT_{i} * W_{CATi} } $$
(2)

where Ai is the value of attribute i, WAi is its weight, and WCATi is the connection weight of CATi.
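Equations (1) and (2) can be illustrated with a minimal Python sketch. The attribute values, weights, and the grouping of attributes below are hypothetical and chosen only to show the arithmetic; they are not taken from the trained system.

```python
def group_contribution(attributes, weights):
    """Eq. (1): weighted sum of the attribute values A_i in one group."""
    return sum(a * w for a, w in zip(attributes, weights))


def network_output(contributions, connection_weights):
    """Eq. (2): weighted sum of the group contributions CAT_i."""
    return sum(c * w for c, w in zip(contributions, connection_weights))


# Hypothetical, normalized attribute values and weights for two groups.
cat_1 = group_contribution([0.5, 0.2], [0.4, 0.6])   # 0.5*0.4 + 0.2*0.6
cat_2 = group_contribution([0.8], [0.5])             # 0.8*0.5
output = network_output([cat_1, cat_2], [0.7, 0.3])  # cat_1*0.7 + cat_2*0.3
```

In this sketch the hidden layer only computes weighted sums; any activation function applied by the actual network is not specified in the text and is therefore left out.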

Fuzzy Logic

Fuzzy logic can handle the imprecise and incomplete data that characterize medical records, and it is a superset of conventional Boolean logic. It arrives at a definite solution to a given problem, and its ability to work from approximate reasoning resembles human decision making. The stages involved in the diagnosis and management of diabetes by fuzzy logic are as follows:

  1. Fuzzification of the attributes input by the patient.

  2. Formation of a fuzzy rule base system.

  3. Inference engine: the building of decision making for the fuzzy logic component.

  4. Defuzzification of the output results from the inference engine into crisp values.

Algorithm for Fuzzy Logic

Phase 1: Take Glucose, INS, BMI, DPF, and Age as crisp input values.

Phase 2: Define the triangular membership functions for the fuzzy numbers.

Phase 3: Construct the fuzzy numbers for the input and output sets of the five (5) attributes.

Phase 4: Execute the fuzzy inference process using the Mamdani method.

Phase 5: Enter the rules and compute the corresponding firing strength of each rule for the fuzzy input sets using the “OR” (disjunction) operator (Glucose_low, Glucose_medium, Glucose_high, INS_low, INS_medium, INS_high, BMI_low, BMI_medium, BMI_high, DPF_low, DPF_medium, DPF_high, Age_young, Age_old).

Phase 6: Aggregate the fuzzy output sets of the fired DM rules (DM_verylow, DM_low, DM_medium, DM_high, DM_veryhigh).

Phase 7: Defuzzify the aggregated output into a crisp value using:

$$ z^{*} = \frac{\int \mu_{A}(z) \cdot z \, dz}{\int \mu_{A}(z) \, dz} $$
(3)

where ∫ denotes algebraic integration, μA(z) is the membership degree of the aggregated fuzzy output set for DM, and z is the output value weighted by μA(z).

Phase 8: Present the resulting information in human language.
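Phases 1 to 7 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implemented system: it uses only two inputs (glucose and BMI), three output levels on a 0 to 2 scale, hypothetical membership ranges, and one hypothetical rule per level. The centroid in Eq. (3) is approximated by a discrete sum.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical ranges (a, b, c); the real ranges come from physician ratings.
GLUCOSE = {"low": (40, 80, 120), "medium": (100, 140, 180), "high": (160, 200, 240)}
BMI = {"low": (10, 18, 26), "medium": (22, 28, 34), "high": (30, 40, 50)}
DM = {"low": (-1, 0, 1), "medium": (0, 1, 2), "high": (1, 2, 3)}  # output scale 0..2


def diagnose(glucose, bmi, steps=200):
    # Phase 5: firing strength of each hypothetical rule
    # "IF glucose is L OR bmi is L THEN DM is L", with OR taken as the maximum.
    firing = {
        level: max(tri(glucose, *GLUCOSE[level]), tri(bmi, *BMI[level]))
        for level in DM
    }
    # Phases 6-7: clip each output set (min), aggregate (max), then take the
    # centroid of the aggregated set as a discrete approximation of Eq. (3).
    num = den = 0.0
    for k in range(steps + 1):
        z = 2.0 * k / steps
        mu = max(min(firing[level], tri(z, *DM[level])) for level in DM)
        num += mu * z
        den += mu
    return num / den if den > 0 else None  # None when no rule fires


high_risk = diagnose(glucose=190, bmi=38)
low_risk = diagnose(glucose=70, bmi=16)
```

With these assumed ranges, the first call yields a crisp value near the High end of the scale and the second a value near the Low end; inputs outside every range fire no rule and return `None`.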

Fuzzification

Fuzzification is the first step in the fuzzy inference method. It is a domain transformation that converts crisp inputs into fuzzy inputs. In fuzzification, the fuzzy sets for the indicators and for the output of diabetes diagnosis and management, along with their membership functions, were established.

The fuzzy sets for the indicators and the output of diabetes mellitus are as follows:

Number of Pregnancies: {Absent, Normal, Risk}.

Diastolic Blood Pressure: {Low, Medium, High, Very High}.

Triceps Skin Thickness: {Good, Average, Below Average}.

Glucose: {Low, Medium, High}.

Insulin: {Low, Medium, High}.

Body Mass Index (BMI): {Low, Medium, High}.

Diabetes Pedigree Function (DPF): {Low, Medium, High}.

Age: {Young, Medium, Old}.

Output: {Low, Medium, High}.

For the output fuzzy set, the system used 0 = Low, 1 = Medium, and 2 = High. The ranges of the output fuzzy set are given in Table 1.

For the final result, this study considered Low as no diabetes, and Medium and High as diabetes. After identifying the indicators and their fuzzy sets, the range of values for each indicator’s fuzzy sets was prepared, and the data used for the model were rated by physicians. Once the ranges were ready, equations were built from them to produce the membership functions. Triangular, trapezoidal, and bell-shaped functions are among the types of membership function that have been proposed. In this work, the triangular membership function was used because it is easy to compute and its shape is simpler, more versatile, and less complex than other membership functions when dividing values into low, medium, and high sets. The triangular membership function is given in Eq. (4).

$$ \mu_{A}(x) = \begin{cases} 0, & x \le a \\ \frac{x - a}{b - a}, & a < x \le b \\ \frac{c - x}{c - b}, & b < x < c \\ 0, & x \ge c \end{cases} \quad \text{(triangular membership function)} $$
(4)

After generating the membership functions, the most appropriate fuzzy set for each indicator was chosen by taking the maximum of the membership grades computed for that indicator’s fuzzy sets. The maximum was used because this study followed the Mamdani method to develop the fuzzy inference system (FIS).
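As an illustration of fuzzification with the triangular membership function and the maximum-membership choice described above, consider the following Python sketch. The glucose ranges are hypothetical placeholders; the ranges used in the study were derived from physician ratings.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical (a, b, c) ranges for the Glucose indicator.
GLUCOSE_SETS = {
    "Low": (40, 80, 120),
    "Medium": (100, 140, 180),
    "High": (160, 200, 240),
}


def fuzzify(x, fuzzy_sets):
    """Return the membership grade of x in every fuzzy set of an indicator."""
    return {label: triangular(x, *abc) for label, abc in fuzzy_sets.items()}


def best_label(grades):
    """Pick the fuzzy set with the maximum membership grade."""
    return max(grades, key=grades.get)


grades = fuzzify(150, GLUCOSE_SETS)
label = best_label(grades)
```

For a glucose value of 150, only the Medium set has a non-zero grade under these assumed ranges, so it is selected as the label.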

3.2 Genetic Algorithm

A genetic algorithm was used to select the optimal attributes from the diagnostic parameters that serve as input, as well as to optimize the system. The Pima Indian diabetes dataset has eight attributes. The genetic algorithm chose which of these attributes to use as input to the neural network, reducing the computational complexity.
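A minimal sketch of GA-based attribute selection is given below, assuming a binary chromosome with one bit per attribute. The operators (tournament selection, one-point crossover, bit-flip mutation, elitism), their parameters, and the `toy_fitness` function are all assumptions made for illustration; in practice the fitness would be the classification accuracy obtained with the selected attributes.

```python
import random

N_FEATURES = 8  # the Pima dataset has eight attributes


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy on the selected attributes."""
    useful = {1, 5, 6, 7}  # assumed informative attribute indices
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in useful)
    return hits - 0.25 * sum(mask)  # small penalty per selected attribute


def select_features(fitness, generations=30, pop_size=20, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [pop[0][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, N_FEATURES - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best_mask = select_features(toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
```

The returned mask marks which attributes would be passed on to the neuro-fuzzy classifier; fixing the seed makes the run reproducible.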

3.3 Genetic-Neuro-Fuzzy Inference System (GNFIS)

The proposed approach incorporates ANN, fuzzy logic, and a genetic algorithm to construct an inferential system called the Genetic-Neuro-Fuzzy Inference System (GNFIS), a self-learning and adaptive system designed to handle ambiguous and imprecise diabetes diagnostic data. To construct the inference system, a feed-forward learning technique composed of nine layers of neurons was employed. Both the hidden and output layers consist of active nodes, where computations occur, whereas the input layer consists of inactive nodes. The reasoning algorithm of the inference engine used Mamdani’s inference mechanism, which is rule-based. The input nodes represent the inputs to the system and form one of the seven layers. Numeric values are used as diagnostic variables to reflect the severity of the patient’s condition. For every input value, the layer outputs the corresponding linguistic labels.

The second layer, composed of adaptive nodes, takes the output of the preceding layer as input and generates the corresponding membership grades using the formula below:

$$ L_{2} \left( {x_{i} } \right) = \mu_{Ai} \left( {X_{i} } \right) $$
(5)

For an increasing variable, the fuzzy value is calculated using the triangular MF, given as:

$$ \mu_{Ai} \left( {X_{i} } \right) = \frac{{x_{i} - b}}{a - b} $$
(6)

4 Methodology

4.1 Model Diagram

A block diagram of the theoretical model for diabetes diagnosis is shown in Fig. 1; it demonstrates the flow of the diabetes mellitus diagnostic model.

Fig. 1. Block diagram for the proposed system model

4.2 Data Used for the System

In this work, the direct rating method was used to acquire data; it is an effective way of constructing a membership function and a direct means of collecting data. To rate the membership functions, all the rating indicators (subjects) were presented, as sequences of data (objects), to the domain experts. The responses of several physicians were collected to construct each membership function. The lowest and highest values reported were then used to form the ranges for membership function calculation: the lowest value was taken as the minimum and the highest as the maximum. All the data used fell within these ranges, so no data were lost. Table 2 shows the ratings of the data for the proposed system for the diagnosis of diabetes mellitus.
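The way the lowest and highest physician ratings form a membership range can be sketched as follows. The ratings shown are hypothetical values for a single fuzzy set and are not taken from Table 2.

```python
def rating_range(ratings):
    """Combine several (lowest, highest) ratings into one range.

    The minimum is the lowest value given by any physician and the maximum
    is the highest, so every rated value falls inside the range.
    """
    lows = [low for low, _ in ratings]
    highs = [high for _, high in ratings]
    return min(lows), max(highs)


# Hypothetical ratings from three physicians for one fuzzy set.
physician_ratings = [(70, 110), (65, 105), (72, 115)]
low, high = rating_range(physician_ratings)
```

Because the range spans all responses, no physician's rating is excluded when the membership function is built.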

Indicators of Diabetes Mellitus

  i. Pregnancy number: This is graded as absent, normal, and risk. If a person is male, there is no pregnancy number.

  ii. Diastolic blood pressure: This field has four fuzzy sets: low, medium, high, and very high.

  iii. Triceps skinfold thickness: It is a measurement used to estimate body fat.

  iv. Glucose: Glucose, or blood sugar, is the principal source of energy carried in the blood. It is measured as the glucose concentration 2 h after breakfast (American Diabetes Association, 2014). Glucose has three fuzzy sets: low, medium, and high.

  v. Insulin (INS): Insulin is the hormone that the pancreas secretes to help transfer glucose from the blood into the cells, where it is used for energy. It is measured as the 2-h serum insulin (INS), taken 2 h after breakfast (American Diabetes Association, 2014). If the cells do not respond well to insulin, glucose cannot enter the cells. As a result, the cells lack the fuel they need, and glucose builds up in the bloodstream. INS has three fuzzy sets: low, medium, and high.

  vi. Body Mass Index (BMI): BMI is a measure of a person’s body weight relative to height. In the developed system this field consists of three fuzzy sets: low, medium, and high.

  vii. Diabetes Pedigree Function (DPF): DPF is the statistical classification of a certain data category (American Diabetes Association, 2014). For example, for the age group 40–45, data are analyzed and calculated to determine statistical values for that age group. DPF consists of three fuzzy sets: low, medium, and high.

  viii. Age: Age is considered a further predictor of diabetes. Age has three fuzzy sets: young, medium, and old.

5 Result and Discussion

A program was developed to diagnose diabetes mellitus, and the Pima Indian Diabetes Database (PIDD) was used to test the performance of the proposed system. The database has 8 input attributes and a ninth attribute as the target variable. The attributes of the PIDD are: triceps skinfold thickness (mm), number of times pregnant, 2-h serum insulin (mu U/ml), diastolic blood pressure (mm Hg), diabetes pedigree function, body mass index (weight in kg/(height in m)^2), plasma glucose concentration at 2 h in an oral glucose tolerance test, and age (years). The attributes were used to determine whether a patient tests positive or negative. A total of 768 patient records are available in the PIDD.

One hundred patients from the Pima diabetes dataset were used for training and stored as rules in the database (patient medical records) to assess the efficiency of the system. To get a simple estimate, 100 patients from the dataset were used to test the proposed system. Human medical experts assessed the output value of each rule from the records obtained for the system. Expert knowledge was used to determine the sensitivity of the input variables. The severity of each variable reflects its contribution to diabetes.

The Java programming language was chosen to build the system because it is a high-level, object-oriented language whose classes and functions can be used to develop and link a graphical user interface for user interaction and responsiveness. The diagnostic system employed several Java components in its implementation.

5.1 Performance Evaluation Metrics

The basic evaluation criteria are based on classification accuracy, using True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP). Table 3 displays the performance evaluation metrics used in the paper.

Where:

  • TN = the number of healthy people who were correctly diagnosed as not diabetic

  • TP = the number of people who actually have diabetes among those who were diagnosed as diabetic

  • FN = the number of people who have diabetes but were diagnosed as healthy

  • FP = the number of healthy people who were diagnosed as diabetic
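The four counts combine into the usual classification metrics under the standard confusion-matrix definitions. The sketch below assumes accuracy, sensitivity, and specificity as the metrics of interest, since the contents of Table 3 are not reproduced here; the counts are hypothetical.

```python
def evaluate(tp, tn, fp, fn):
    """Compute standard metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all correct diagnoses
        "sensitivity": tp / (tp + fn),   # share of diabetic patients detected
        "specificity": tn / (tn + fp),   # share of healthy people cleared
    }


# Hypothetical counts for 100 test patients.
metrics = evaluate(tp=40, tn=50, fp=5, fn=5)
```

Reporting sensitivity and specificity alongside accuracy shows whether errors fall mainly on missed diabetic patients or on healthy people flagged as diabetic.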

5.2 Genetic Neuro-Fuzzy Inference Diagnostic System Experimental Results

The results gathered from the developed GNFIS diagnostics system for Diabetes Mellitus are shown in the following sections.

The Experimental Results of the GNFIS

Fifty (50) specimens were chosen from the sample for test runs. To evaluate the different output metrics, the one trial (run) closest to the average classification accuracy was selected. The overall classification accuracy was 97.79%. The performance metrics of the GNFIS for classifying the reduced dataset are shown in Table 2 and Table 3; they demonstrate that the classification accuracy of the developed system is better than that reported in existing research.

A genetic algorithm (GA) was used to remove samples and cases that were outliers, noisy, or inconsistent, and missing values were replaced by the mean. With 10-fold cross-validation used for classification, the GA served as a feature selection method, and its output was fed to the NFIS. During each run, the GA selected different features from the original set of attributes, and the classification accuracy for each run was recorded. To obtain a satisfactory result, the experiment was repeated 50 times. Table 4 and Fig. 2 show the evaluation metrics obtained from the proposed system.
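Mean replacement of missing values and a 10-fold split can be sketched as below. This is only an outline under stated assumptions: missing values are represented as `None`, and folds are formed by striding over the sample indices without shuffling; the actual preprocessing and fold assignment used in the experiments are not specified in the text.

```python
def impute_mean(column):
    """Replace missing entries (None, by assumption) with the column mean."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


filled = impute_mean([1.0, None, 3.0])
splits = list(k_fold_indices(768))  # 768 records, as in the PIDD
```

Each record appears in exactly one test fold, so every sample is used once for testing and nine times for training.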

Table 1. Ranges of the output fuzzy set for diabetes mellitus application
Fig. 2. Graphical representation of evaluation metrics for GNFIS

The accuracy obtained from the system for diabetes mellitus diagnosis is displayed in Table 5 and Fig. 3. The results show that the hybridized system performed better than either single technique.

Table 2. Results of rating for the indicators of the diabetes mellitus by physicians
Fig. 3. The accuracy (%) of the three systems for diabetes diagnosis

6 Conclusion and Recommendation

Test findings showed that the GNFIS performs best, with 96% accuracy over the entire dataset used. The fuzzy logic model achieved 94% accuracy, and the ANN model registered the lowest accuracy of the three models at 92%. From the sensitivity study, as shown in Table 3, age, family history of diabetes, BMI, and salty food preference played a significant role in the occurrence of diabetes in the dataset, and the findings provide evidence for diabetes prevention through community interventions. This research introduced genetic algorithms for dimensionality reduction, together with a neural network and fuzzy logic for diagnosing diabetes from the dataset. The accuracy of the proposed GNFIS is 2.08% higher than that of the neural network and fuzzy logic models. However, the accuracy of the system could be increased using other combinations of machine learning algorithms. In addition, this application runs on a standalone system. In future work, the system can be deployed as high-performance client-server and mobile applications, so that clients can request and receive information over the network, making it easier for all users to access.

Table 3. Performance evaluation metrics
Table 4. Evaluation metrics obtained from GNFIS diagnosis
Table 5. The accuracy obtained from the three systems for diagnosis