
1 Introduction

Diabetes mellitus is one of the most serious health challenges in both developing and developed countries. According to the International Diabetes Federation (IDF), 382 million people suffered from diabetes in 2013, a figure expected to rise to 592 million by 2035. Diabetes mellitus is a complex metabolic disorder which is characterised by persistent hyperglycemia and results from defects in insulin secretion, insulin action or both [1]. Diabetes can be either Type-1 (insulin dependent) or Type-2 (non-insulin dependent). The dataset we focus on in this paper originates from a population of Pima Indians and comprises Type-2 positive and negative instances. According to [2], the Pima Indians of Arizona have the highest reported incidence of diabetes in the world, and the predominant Type-2 diabetes observed there is typically characterised by a slow and gradual onset. Consequently, traditional diagnosis, which is partially based on the plasma glucose test, may be delayed by up to ten years [1].

Biomedical signal classification can be defined as the categorisation of recorded input data into distinct classes, often based on extracting significant features [3]. The objective is to group the observed signals, thus generating decision boundaries that separate the involved categories. Typical examples include diagnostic checks to identify whether or not a patient suffers from a certain disease such as diabetes.

While the traditional approach to designing pattern classification systems is based on an experimental comparison of several classification algorithms in order to select the best one [4], more recently combinations of multiple classifier decisions have been investigated [4–12]. A combined classifier can perform better than any of its constituent base classifiers due to the complementary performance of the base classifiers, which leads to a more stable combination. A diverse set of base classifiers can be generated by parameter variation of the base classifier and/or variations of the training sets, as in bagging and boosting approaches [13–15]. Ensembles of neural network classifiers have been introduced in [16–18], where the base classifiers differ from each other in one or more aspects such as the training parameters, initial weights, network architecture or training algorithm.

In this paper, we present an effective approach to diabetes classification based on a committee of neural network classifiers. We enforce diversity in the ensemble by varying the architectures of the neural networks, with the involved parameters chosen adaptively to generate a pool of classifiers. The final decisions are combined using a majority voting rule, and the resulting ensemble is shown to give excellent classification performance on the Pima Indian diabetic database from the UCI machine learning repository.

2 Diabetes Dataset

The dataset used for our study is the Pima Indians Diabetes Database (PIDD) from the UCI machine learning repository. The dataset is a collection of medical diagnostic reports from 768 records of female patients of Pima Indian heritage in Arizona. Each record comprises the following attributes: number of times pregnant, plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, 2-hour serum insulin, body mass index, diabetes pedigree function, and age. Each subject was tested as either diabetes-positive or diabetes-negative, with 500 non-diabetic patients and 268 diabetic ones.
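As a minimal illustration, assuming a local comma-separated export of the PIDD records (the file name and column labels below are our own and not prescribed by the repository, whose file is distributed without a header), the data can be loaded as follows:

```python
# Minimal sketch of loading the PIDD records; file name and column labels
# are illustrative assumptions.
import pandas as pd

columns = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]
data = pd.read_csv("pima-indians-diabetes.csv", header=None, names=columns)

X = data[columns[:-1]].values   # the eight input attributes
y = data["outcome"].values      # 1 = diabetic (268 cases), 0 = non-diabetic (500 cases)
print(X.shape, y.shape)         # expected: (768, 8) (768,)
```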

3 Proposed Classifier Committee

A variety of classifiers has been used for the diagnosis of diabetes, including nearest neighbour, multi-layer perceptron, decision tree, Bayes, and fuzzy classification algorithms, to name a few. However, rather than relying on a single predictor, combining multiple classifiers can lead to improved and more robust classification performance. In this paper, we design such a combined classifier, or classifier ensemble, which makes use of parameter-based diversity enforcement, for the purpose of diabetes diagnosis.

3.1 Multi-layer Perceptron (MLP) Neural Network

Multi-layer neural networks (MLNNs) have been successfully used to replace conventional pattern recognition methods in a variety of applications including disease diagnosis [19, 20]. In the first stage of this study, a multi-layer neural network structure was used for diabetes identification on the Pima Indian dataset. The network consists of an input layer, one or more hidden layers, and an output layer, where the hidden and output layer neurons use non-linear sigmoid activation functions. There are eight input features, while the two outputs relate to the two classes. We used the Levenberg-Marquardt algorithm [21] for training the neural networks.
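As an illustrative sketch (not the exact configuration used in our experiments), a single MLP base classifier on the eight PIDD features could be set up as below. Note that scikit-learn's MLPClassifier does not provide the Levenberg-Marquardt algorithm, so the L-BFGS solver is used here as a stand-in, and the hidden-layer size is an arbitrary choice; the arrays X and y are those loaded in the dataset sketch above.

```python
# Illustrative single MLP base classifier; L-BFGS stands in for Levenberg-Marquardt.
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(12,),   # one hidden layer; size is illustrative
                  activation="logistic",      # sigmoid units, as in the paper
                  solver="lbfgs",
                  max_iter=500,
                  random_state=0))
mlp.fit(X_train, y_train)
print("single MLP accuracy:", mlp.score(X_test, y_test))
```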

3.2 Cascade-Forward Back Propagation Network (CFBN)

A cascade type of feed-forward neural network consists of a layer of input neurons, a layer of output neurons, and one or more hidden layers. As in a general feed-forward network, the first layer has weights coming from the input. However, each subsequent layer has weights coming from the input and all previous layers, and all layers have biases. The last layer is the network output. It was found [22] that the CFBN can be more effective than feed-forward back propagation methods.
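To make the connectivity pattern concrete, the following minimal numpy sketch (with arbitrary, untrained weights and illustrative layer sizes) shows how each layer receives the original input concatenated with the outputs of all previous layers:

```python
# Numpy sketch of cascade-forward connectivity; weights are random placeholders,
# the real network would be trained with back propagation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, hidden_sizes, n_out = 8, [12, 6], 2

weights, biases = [], []
cascade_dim = n_in
for h in hidden_sizes + [n_out]:
    weights.append(rng.normal(size=(cascade_dim, h)) * 0.1)
    biases.append(np.zeros(h))
    cascade_dim += h                              # later layers also see this layer's output

def cascade_forward(x):
    cascade = x                                   # running concatenation of input and layer outputs
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(cascade @ W + b)
        cascade = np.concatenate([cascade, h], axis=1)
    return cascade @ weights[-1] + biases[-1]     # linear output layer

print(cascade_forward(rng.normal(size=(4, n_in))).shape)   # (4, 2)
```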

3.3 Proposed Classifier Ensemble

Combining multiple classifiers has been intensively studied and is widely regarded as an effective technique to overcome the limitations of individual classifiers [31, 32] with many well-known models having been proposed [33, 34]. Importantly, these classifier ensemble methods, such as neural network ensembles, have been applied successfully in many real-world applications [24, 25].

In general, there are two main categories of classifier ensembles. The first aims at learning multiple classifiers at the feature level, and usually trains and combines multiple classifiers during the learning process, as in boosting [26] and bagging [14]. The second tries to combine classifiers at the output level, where the results of multiple available classifiers are combined, e.g., multiple classifier systems or mixtures of experts [27]. In this paper, we focus on the second approach. In particular, given multiple classifiers (available or sequentially learned, homogeneous or heterogeneous), the classifier ensemble is learned by combining these component classifiers in an intelligent way.

Classifiers differing in feature representation, architecture, learning algorithm or training data exhibit complementary classification behaviour, and the fusion of their decisions can yield higher performance than the best individual classifier. It has been observed that, although one classifier may yield the best overall performance, the sets of patterns misclassified by different classifiers do not necessarily overlap. This suggests that different classifiers potentially offer complementary information about the patterns to be classified, which can be exploited to improve the overall classification performance. The idea is thus not to rely on a single predictor but instead to combine the individual opinions of multiple classifiers to arrive at a consensus decision.

In [28], three approaches to building multiple classifier systems were presented: the divide and conquer approach, the sequential approach and the parallel approach. In the divide and conquer approach, the input space is partitioned into regions on which each specific classifier performs well, and a new input is directed to the appropriate classifier. In the sequential approach, one classifier is used at the beginning and other classifiers are invoked only if the first classifier fails, while in the parallel approach, all individual classifiers are applied in parallel to the same input data, and a combination method is applied to aggregate their decisions. It is this last technique that we employ in this paper.

In particular, we train a pool of individual neural network classifiers whose parameters are chosen at random. Each classifier differs from the others in the number of training epochs, the learning rate, and/or the number of hidden neurons of the network. At the end, we combine the classifier results using the majority voting technique (also known as plurality voting). Both the multi-layer neural network and the cascade-forward back propagation network are used as base classifiers in our proposed ensemble systems; a sketch of this procedure is given below.
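As a hedged illustration of this parameter-varied pool (the ranges for hidden-layer size, learning rate and epoch count are our own choices, scikit-learn MLPs again stand in for the Levenberg-Marquardt-trained networks, and the train/test split is the one from the MLP sketch above), the ensemble could be built and combined by majority voting as follows:

```python
# Parameter-varied pool of base MLPs combined by majority (plurality) voting.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
pool = []
for k in range(16):                                           # pool of 16 base classifiers
    clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(int(rng.integers(5, 30)),),
                      learning_rate_init=10.0 ** rng.uniform(-3, -1),
                      max_iter=int(rng.integers(100, 1000)),  # "epochs" vary per member
                      activation="logistic",
                      random_state=k))
    clf.fit(X_train, y_train)
    pool.append(clf)

votes = np.stack([clf.predict(X_test) for clf in pool])       # (n_classifiers, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)             # binary majority vote; ties go to class 0
print("ensemble accuracy:", (majority == y_test).mean())
```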

3.4 Combination Rules

Fixed combiners are widely studied in the literature on combining classifiers [29, 30]. Denoting by \(p_{ij}(x)\) the confidence output of the \(i\)-th classifier for class \(j\), the combined confidence \(q_j(x)\) for class \(j\) is computed as

$$\begin{aligned} q_j(x)=\frac{h_j(x)}{\sum _{j'} h_{j'}(x)} \end{aligned}$$
(1)

with

$$\begin{aligned} h_j(x)=\text {rule}_{i}\left( p_{ij}(x)\right) . \end{aligned}$$
(2)

The final classification can be made by

$$\begin{aligned} w(x)= \arg \max _j \, q_j(x) . \end{aligned}$$
(3)

When the rule in Eq. 2 is the maximum over the classifiers, this combiner selects, for each class, the classifier producing the highest estimated confidence.

In contrast, in ensemble majority voting, which is a popular way of combining classifiers, the votes for each class over the input classifiers are counted and the majority class selected. In terms of the rule of Eq. 2 this equates to

$$\begin{aligned} h_j(x)=\sum _i I\left( \arg \max _{j'} \, p_{ij'}(x) = j\right) , \end{aligned}$$
(4)

where I() is the indicator function defined as

$$\begin{aligned} I(y)= \left\{ \begin{array}{ll} 1, &{}\quad \text {if } y \text { is true;} \\ 0, &{}\quad \text {otherwise.} \end{array} \right. \end{aligned}$$
(5)
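As an illustrative, non-authoritative implementation of these rules, the following sketch combines a small matrix of per-classifier confidences \(p_{ij}(x)\) for a single input using both a fixed maximum rule and majority voting; the numbers are made up for demonstration.

```python
# Fixed combiner (Eqs. 1-3) and majority voting (Eqs. 4-5) over a confidence
# matrix p of shape (n_classifiers, n_classes) for a single input x.
import numpy as np

def combine_fixed(p, rule=np.max):
    """Fixed combiner: h_j = rule over classifiers i of p_ij, then normalise."""
    h = rule(p, axis=0)                  # e.g. maximum rule in Eq. (2)
    q = h / h.sum()                      # normalised confidences, Eq. (1)
    return int(np.argmax(q))             # decision w(x), Eq. (3)

def combine_majority_vote(p):
    """Majority voting, Eq. (4): count classifiers whose top class is j."""
    votes = np.argmax(p, axis=1)         # each classifier's predicted class
    h = np.bincount(votes, minlength=p.shape[1])
    return int(np.argmax(h))

p = np.array([[0.60, 0.40],              # classifier 1 confidences
              [0.30, 0.70],              # classifier 2
              [0.55, 0.45]])             # classifier 3
print(combine_fixed(p), combine_majority_vote(p))   # maximum rule vs. majority vote
```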

4 Experimental Results

4.1 MLP Ensemble

In the first experiment, the MLP is used as the base classifier for our proposed ensemble. In order to generate diversity in the classifier ensemble, a pool of 16 different classifiers is generated and the final decision is obtained using majority voting. The classification accuracy of the best individual classifier on the test data is 80.21 % while the proposed combined classifier achieved a classification accuracy of 95.31 %. We note that the proposed combined classifier compares favourably with results from the literature. Figure 1 shows the accuracy of the base MLP versus the number of individual classifiers, while Figs. 2 and 3 show the accuracy of the base MLP versus the number of hidden neurons and the number of training epochs, respectively.

Fig. 1: Accuracy of base MLP versus the number of individual classifiers

Fig. 2: Accuracy of base MLP versus the number of hidden neurons

Fig. 3: Accuracy of base MLP versus the number of training epochs

4.2 CFBN Ensemble

In the second experiment, a cascade-forward back propagation network (CFBN) is used as the base classifier for the combined classifier. The best individual classifier gave a classification accuracy of 81.77 % on the test data. In contrast, the proposed combined classifier achieved a performance of 96.88 %. The CFBN approach thus gave slightly better results than the MLP, both for the best individual classifier and for the generated ensemble. Figure 4 shows the accuracy of the base CFBN versus the number of individual classifiers, while Figs. 5 and 6 show the accuracy of the base CFBN versus the number of hidden neurons and the number of training epochs, respectively.

Fig. 4: Accuracy of base CFBN versus the number of individual classifiers

Fig. 5: Accuracy of base CFBN versus the number of hidden neurons

Fig. 6: Accuracy of base CFBN versus the number of training epochs

5 Performance Measurements

There are a number of metrics commonly used to measure the performance of classification methods in medical diagnosis tasks. These are based on the definitions of true positives (TP, the number of positive instances correctly predicted as positive), true negatives (TN, the number of negative instances correctly predicted as negative), false positives (FP, the number of negative instances incorrectly predicted as positive), and false negatives (FN, the number of positive instances incorrectly predicted as negative).

Recall (sensitivity) is the percentage of real positive cases that are correctly predicted positive. Conversely, precision (confidence) indicates the percentage of predicted positive cases that are real positives. The F-measure combines precision and recall as their harmonic mean. Specificity indicates the percentage of real negative (normal) cases that are correctly predicted negative. In Fig. 7, we summarise the experimental results in terms of these measures. As we can see, the CFBN ensemble outperforms the MLP ensemble.
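For reference, these measures follow the standard definitions in terms of TP, TN, FP and FN:

$$\begin{aligned} \text {Recall}=\frac{TP}{TP+FN}, \qquad \text {Precision}=\frac{TP}{TP+FP}, \qquad \text {Specificity}=\frac{TN}{TN+FP}, \qquad F\text {-measure}=\frac{2\cdot \text {Precision}\cdot \text {Recall}}{\text {Precision}+\text {Recall}} . \end{aligned}$$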

Fig. 7: Performance measures of the proposed methods on test data

6 Conclusions

Automatic identification of diabetes is a challenging task in pattern recognition. Combinations of weak classifiers have shown the potential to achieve good generalisation performance. In this paper, we have introduced a classifier ensemble to address the instability of individual pattern classifiers in the context of diabetes diagnosis. The proposed classifier consists of multiple individual classifiers which differ in both architecture and training parameters. As confirmed in the experimental evaluation, the classification accuracies achieved by the combined classifiers clearly outperform those of the best individual classifiers, and also exceed those of methods reported in the literature. Of the two neural network models employed as base classifiers, cascade-forward back propagation networks were found to outperform multi-layer perceptron networks, both as individual classifiers and in terms of the resulting ensembles.