Keywords

1 Introduction

Nowadays, with the increasing volume of data, many fields have benefited from advances in computational intelligence. More techniques and methods are being employed for big data, knowledge extraction and decision support in diverse applications. One example is medicine, where better data management, faster performance and improved accuracy in disease detection have become evident [1].

Different proposals from computational intelligence have been applied to problems in medical and biomedical applications. Learning systems such as artificial neural networks (ANN) are commonly used in applications such as image analysis and pattern recognition, biochemical analysis, drug design and, mainly, diagnostic systems [2,3,4]. In this last field, ANN have been used to assist the diagnosis of appendicitis in patients presenting with acute right iliac fossa pain [5], and comparisons with traditional models such as decision trees have also been carried out [6, 7]. Exploring different variants of these learning systems, a database of 801 patients was applied to several ANN models, namely radial basis function, multilayer and probabilistic networks, with accuracy rates reaching 99% [8]. Another study worth noting analyzed data from a rural location, collected over a period of 12 months from 156 patients, and also employed an ANN [9]. For tuberculosis, ANN have been widely used in diagnosis support systems. Proposals with different input information have been presented, with sensitivity values above 80% and reaching 100% in some cases. Specificity results have been less satisfactory, dropping to 40% in the worst case [10,11,12,13]. These results differ according to the information available in each study: some studies used just a couple of variables, and very few used all the medical data of the patients. The results also depend on the quality of the information system used in each study.

Neuroevolution of augmenting topologies (NEAT) is a technique for evolving ANN using genetic algorithms. It was proposed at the beginning of the present millennium and modifies both the weights and the structure (nodes and synaptic connections) of the network. Crossover is performed between different topologies, while speciation maintains the balance between the fitness of the evolved solutions and their diversity [14,15,16].

The present work compares two techniques for obtaining the best architecture of an ANN to solve the disease detection problem in diagnosis support systems for tuberculosis and appendicitis. The first technique is a traditional empirical approach to finding the number of nodes in the hidden layer of a multilayer perceptron (MLP), which consists of varying the number of nodes and testing the network performance. The second method uses NEAT to find both the nodes and the connections needed to solve these problems.

2 Methodology

Two databases were used for the comparison because of their similarities in number of examples, proportion of positive cases and number of variables. Both databases are described in this section, followed by the implementation of the ANN for disease detection.

2.1 Databases

The first database was obtained from the TB Program at Hospital Santa Clara (HSC) in Bogotá D.C., Colombia. Information from people with suspected pulmonary tuberculosis between January 2008 and March 2011 was considered. The Ethics and Research Committee of the HSC approved this study. Informed consent was not needed because all data were obtained retrospectively and anonymously. Only subjects with a confirmed diagnosis were considered (by culture, and individuals who finished anti-TB therapy). In the end, information from 105 subjects was used: 83 subjects (79%) with confirmed TB and 22 subjects (21%) determined to be free of the disease by diagnosis of exclusion. TB cases were confirmed with a culture test. For TB-negative cases, the culture test was not positive, another disease was found during treatment and, as mentioned, a diagnosis of exclusion was used.

Features were extracted from different sources of information. Medical personnel first examined signs and symptoms and determined a clinical suspicion diagnosis. This was represented by an input variable named “Clinical information”, which takes the value “1” when only the medical report was considered, and “0” when another test result or additional information led the subject to start treatment. The other variables were sex, age, homelessness, diabetes status, and HIV (human immunodeficiency virus)/AIDS (acquired immunodeficiency syndrome) status. HIV/AIDS status was determined by clinical suspicion and confirmed with exams, but without complementary information such as CD4 cell count or viral load. All variables were coded with zeros and ones according to absence or presence, respectively (Table 1). The age variable was kept numeric, with its original information, and normalized by its maximum value. This normalization was performed to avoid saturating the synaptic weights of the network and to avoid a wrong representation of the information during training.
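The coding described above can be sketched as follows. This is a minimal illustration, assuming hypothetical field names (`clinical_information`, `homeless`, `hiv_aids`, etc.) and made-up records; the actual variables and their coding are those listed in Table 1. Binary variables become 0/1 and age is divided by the maximum age in the dataset, so it falls in (0, 1].

```python
# Hypothetical field names; the study's real coding is given in Table 1.
BINARY_FIELDS = ["clinical_information", "sex", "homeless", "diabetes", "hiv_aids"]


def encode_tb_records(records):
    """Code binary variables as 0/1 and divide age by the maximum age in the dataset."""
    max_age = max(r["age"] for r in records)
    rows = []
    for r in records:
        flags = [1 if r[name] else 0 for name in BINARY_FIELDS]
        rows.append(flags + [r["age"] / max_age])  # age ends up in (0, 1]
    return rows


patients = [  # made-up example records, not study data
    {"clinical_information": True, "sex": 1, "homeless": False,
     "diabetes": False, "hiv_aids": True, "age": 40},
    {"clinical_information": False, "sex": 0, "homeless": True,
     "diabetes": True, "hiv_aids": False, "age": 80},
]
print(encode_tb_records(patients))  # [[1, 1, 0, 0, 1, 0.5], [0, 0, 1, 1, 0, 1.0]]
```

Dividing by the maximum keeps the numeric input on the same scale as the binary ones, which is the saturation concern mentioned in the text.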

Table 1. Variables used in the Tuberculosis diagnostic problem.

The second database contains seven medical measures from laboratory tests used to confirm the diagnosis of appendicitis [17]. A total of 106 patients admitted to the emergency room during a three-month period in 1980 and a six-month period in 1981–1982 were included, 85 of whom had the diagnosis confirmed by histologic examination of the removed appendix. Ages ranged from two to 81 years, with a mean age of 25. Fifty-five patients were male and 51 were female [18].

The variables were extracted from the temperature and an admission blood sample: total white count, manual differential count, cytochemical differential, and C-reactive protein level. All values were real and were normalized to the interval [0, 1], following the previous works reported in [17, 18]. Information about these variables can be seen in Table 2, where ‘Yes’ denotes subjects with confirmed disease.
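The text only states that the values were normalized to [0, 1] following [17, 18]. A minimal sketch, assuming linear min-max scaling per variable (an assumption, not necessarily the procedure of those works), with hypothetical temperature and C-reactive protein values:

```python
def min_max_normalize(columns):
    """Rescale each column of real values linearly to [0, 1] (min-max scaling)."""
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0 to avoid dividing by zero.
        normalized.append([(v - lo) / span if span else 0.0 for v in col])
    return normalized


temperature = [36.5, 38.0, 39.5]  # hypothetical values
crp = [2.0, 10.0, 42.0]           # hypothetical values
print(min_max_normalize([temperature, crp]))
```

In a cross-validation setting, the minimum and maximum would normally be taken from the training folds only and reused for the left-out fold.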

Table 2. Variables used in the appendicitis diagnostic problem.

2.2 Models Determination

To estimate the statistical error and generalization of the models on these datasets, cross-validation was employed [19]. Each dataset was divided into three sets, ensuring that data from people with and without the disease were equitably distributed. This makes it possible to assess the generalization of the trained model on a portion of data that was not used in training. Table 3 shows these sets and the number of their members.
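A stratified three-fold split of this kind can be sketched as below. It assumes that the shuffled indices of each class are dealt round-robin across the folds; this is one simple way of keeping class proportions equitable, and the actual fold sizes used in the study are those in Table 3. The labels correspond to the tuberculosis database (83 positive, 22 negative subjects).

```python
import random


def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal this class's shuffled indices round-robin across the k folds.
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds


def train_validation_splits(folds):
    """Yield (train, validation) index lists, leaving one fold out each time."""
    for v, held_out in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != v for i in fold]
        yield train, held_out


labels = [1] * 83 + [-1] * 22  # TB database: 83 positive, 22 negative subjects
folds = stratified_folds(labels)
print([len(f) for f in folds])  # [36, 35, 34]
```

Each call of `train_validation_splits` pairs two folds for training with the remaining fold for validation, matching the two-sets-for-training scheme described in Section 2.2.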

Table 3. Dataset division for training and validation.

Commonly, one hidden layer and one output layer are enough to solve classification problems, regardless of the type of input variable [20]. The ANN used in this work had an input layer of seven units, one for each variable, and an output layer of a single neuron. Values of +1 and −1 were used to indicate whether the input data corresponded to a patient with TB or not, respectively. The number of hidden neurons was determined experimentally, testing from two to ten neurons. All neurons used the hyperbolic tangent as activation function.
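The architecture described above (seven inputs, one tanh hidden layer, one tanh output, targets of +1 and −1) can be sketched in plain Python. This is an illustration only: the uniform weight initialization in [−0.5, 0.5] is an assumption (the MATLAB toolbox uses its own initialization), and training is not shown. One untrained candidate is created for each hidden size from two to ten.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Random weights for an n_in -> n_hidden -> 1 network (assumed uniform init)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = rng.uniform(-0.5, 0.5)
    return w1, b1, w2, b2


def forward(params, x):
    """Hyperbolic tangent in both layers, so the output lies in (-1, 1)."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return math.tanh(sum(w * h for w, h in zip(w2, hidden)) + b2)


def classify(params, x):
    # +1 means disease, -1 means no disease (sign of the output neuron).
    return 1 if forward(params, x) >= 0 else -1


# The empirical search trains and validates one network per hidden size 2..10.
candidates = {h: init_mlp(7, h) for h in range(2, 11)}
print(classify(candidates[5], [0.5] * 7))
```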

Among the different algorithms for training the ANN, resilient backpropagation was chosen because of its speed and low computational cost [21]. Cross-validation was also used during training: in each case, training was performed with two sets (see Table 3) and validation results were computed on the left-out set, maximizing the classification rate between TB and no disease. To avoid overfitting, early stopping was implemented. The performance of the resulting models was evaluated using sensitivity and specificity. Training was carried out in MATLAB 2017a (The MathWorks, Inc., Natick, MA) with its Neural Network Toolbox [22].
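Resilient backpropagation uses only the sign of each partial derivative and adapts a separate step size per weight. A minimal sketch of the iRprop− variant (Igel and Hüsken's modification of the original Rprop), shown on a toy quadratic instead of a network error; the step-size constants are common defaults and are assumptions, and MATLAB's implementation may differ in details. Early stopping would wrap such a loop by monitoring the validation error, which is not shown.

```python
def irprop_minus(grad_fn, weights, steps=200, delta0=0.07, eta_plus=1.2,
                 eta_minus=0.5, delta_max=50.0, delta_min=1e-6):
    """Minimise a differentiable function with iRprop- (sign-based adaptive steps).

    Each weight keeps its own step size, which grows while the sign of its
    derivative is stable and shrinks when the sign flips.
    """
    w = list(weights)
    delta = [delta0] * len(w)
    prev_g = [0.0] * len(w)
    for _ in range(steps):
        g = list(grad_fn(w))
        for i in range(len(w)):
            change = prev_g[i] * g[i]
            if change > 0:
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif change < 0:
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0  # iRprop-: skip this weight's update after a sign flip
            if g[i] > 0:
                w[i] -= delta[i]
            elif g[i] < 0:
                w[i] += delta[i]
            prev_g[i] = g[i]
    return w


target = [3.0, -2.0]  # toy quadratic bowl with minimum at (3, -2)


def grad(w):
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


print(irprop_minus(grad, [0.0, 0.0]))
```

Because only signs are used, the update is insensitive to the magnitude of the gradient, which is one reason the method is fast and cheap per iteration.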

In the NEAT approach, evolution starts from networks similar to a feed-forward MLP but with only input and output neurons. As evolution progresses through discrete steps, the complexity of the networks grows: new nodes are inserted into existing input-output connections, and new connections are created between existing nodes. To allow this, each network is encoded as a genome (the genotype, in genetic algorithm terms) that explicitly represents every connection and neuron. In this way, NEAT simultaneously learns the weights and an appropriate topology for the MLP.
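The direct genome encoding and the add-node mutation described above can be sketched as follows. This is a simplified single-genome illustration, not the NEAT tool used in the experiments: the `InnovationTracker` that reuses numbers for repeated (in, out) links and the per-genome node ids are simplifications, and crossover, add-connection mutation and speciation are omitted. Following the original NEAT description, the split connection is disabled, the link into the new node gets weight 1 and the link out of it keeps the old weight.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int


@dataclass
class Genome:
    node_types: dict  # node id -> "input", "bias", "hidden" or "output"
    connections: list = field(default_factory=list)


class InnovationTracker:
    """Historical markings; a repeated (in, out) link reuses its number (simplified)."""

    def __init__(self):
        self.numbers = {}

    def get(self, in_node, out_node):
        return self.numbers.setdefault((in_node, out_node), len(self.numbers))


def minimal_genome(n_inputs, tracker, rng):
    """Inputs plus a bias node, each wired directly to one output: no hidden nodes."""
    node_types = {i: "input" for i in range(n_inputs)}
    bias, out = n_inputs, n_inputs + 1
    node_types[bias], node_types[out] = "bias", "output"
    conns = [ConnectionGene(src, out, rng.uniform(-1, 1), True, tracker.get(src, out))
             for src in range(n_inputs + 1)]
    return Genome(node_types, conns)


def mutate_add_node(genome, tracker, rng):
    """Split an enabled connection A->B into A->new->B and disable the original gene."""
    old = rng.choice([c for c in genome.connections if c.enabled])
    old.enabled = False
    new_id = max(genome.node_types) + 1
    genome.node_types[new_id] = "hidden"
    genome.connections.append(ConnectionGene(old.in_node, new_id, 1.0, True,
                                             tracker.get(old.in_node, new_id)))
    genome.connections.append(ConnectionGene(new_id, old.out_node, old.weight, True,
                                             tracker.get(new_id, old.out_node)))


def summary(genome):
    """(number of hidden nodes, number of disabled connection genes)."""
    hidden = sum(1 for t in genome.node_types.values() if t == "hidden")
    disabled = sum(1 for c in genome.connections if not c.enabled)
    return hidden, disabled


rng = random.Random(1)
tracker = InnovationTracker()
genome = minimal_genome(7, tracker, rng)  # 7 inputs + bias -> 1 output: 8 genes
for _ in range(3):
    mutate_add_node(genome, tracker, rng)
print(summary(genome))  # (3, 3)
```

The two quantities returned by `summary` are the ones tracked in the results (hidden nodes and disabled connections).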

Here, cross-validation was again used, and 100 initializations were run to compute statistical measures of the results and to check their consistency. Fitness was computed from the error, given by the distance of the output from the correct answer summed over all patterns in the training set. The resulting number was squared to give proportionally more fitness the closer a network was to a solution [14]. The first generation consisted of networks without hidden nodes, with one output and a number of inputs equal to the number of variables in each database. A bias node with a constant value of one was also included. The genes of each genome were the connections between these initial nodes, each with a random weight.
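A fitness of this kind can be sketched as below. The text only says that the summed distance is squared; for higher fitness to correspond to a closer network, this sketch assumes, as in the XOR experiment of the original NEAT work, that the summed error is first subtracted from its maximum possible value. Outputs and targets are assumed to lie in [−1, 1], so each pattern contributes at most 2 to the error; with outputs in [0, 1] the per-pattern maximum would be 1 instead.

```python
def neat_fitness(outputs, targets, per_pattern_max=2.0):
    """Squared closeness score (assumed form, following the NEAT XOR experiment).

    per_pattern_max is the largest possible |output - target| for one pattern;
    2.0 assumes both lie in [-1, 1].
    """
    total_error = sum(abs(o - t) for o, t in zip(outputs, targets))
    max_error = per_pattern_max * len(targets)
    return (max_error - total_error) ** 2


targets = [1, -1, 1, -1]
print(neat_fitness([1, -1, 1, -1], targets))  # 64.0 for a perfect network
print(neat_fitness([0, 0, 0, 0], targets))    # 16.0 for an undecided network
print(neat_fitness([-1, 1, -1, 1], targets))  # 0.0 when every answer is wrong
```

Squaring makes the gap between a perfect network (64) and an undecided one (16) larger than the linear gap (8 versus 4), which is the "proportionally more fitness" effect described in the text.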

One hundred runs were performed for each fold used for training. Each run consisted of a generational loop of at most 200 generations with a population size of 150. The population with the best results was saved as the most representative of each run. Finally, the results were collected to study the generalization of the networks on new patterns from the left-out fold. The NEAT tool for Matlab © was used to develop the experiments [16].

3 Results

First, the results of applying the NEAT method to both databases are compared. Then, the results for each disease are presented in terms of sensitivity and specificity. Figures 1 and 2 show the results for the three folds: the average number of nodes in the hidden layer and the average number of disabled connections found by the NEAT method, respectively. For tuberculosis detection, architectures with between two and six hidden nodes were obtained, whereas the appendicitis problem required between ten and twelve nodes on average.

Fig. 1. Average number of hidden nodes found by the NEAT method in both databases.

Fig. 2. Average number of disabled connections found by the NEAT method in both databases.

Another comparison can be made on the number of disabled connections (Fig. 2), where the appendicitis problem again seems more difficult to classify, as indicated by its higher numbers of nodes and connections compared with the tuberculosis case.

The neuroevolution results were compared with the empirical method, which searches for the number of hidden nodes of an MLP between two and ten, as mentioned before. The best value in each fold was taken, and the mean and standard deviation of these values were computed; Table 4 summarizes the results. In the empirical case, the appendicitis model had three hidden nodes in all folds, while the tuberculosis models had two, five and nine hidden nodes in the three folds, respectively.
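The sensitivity, specificity and per-fold summary behind Table 4 can be sketched as follows, with labels +1 (disease) and −1 (no disease). The example predictions and per-fold values are made up, and the use of the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption.

```python
import statistics


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fn = sum(1 for p, a in pairs if p == -1 and a == 1)
    tn = sum(1 for p, a in pairs if p == -1 and a == -1)
    fp = sum(1 for p, a in pairs if p == 1 and a == -1)
    return tp / (tp + fn), tn / (tn + fp)


def summarize(per_fold_values):
    """Mean and sample standard deviation of the best value found in each fold."""
    return statistics.mean(per_fold_values), statistics.stdev(per_fold_values)


sens, spec = sensitivity_specificity([1, 1, -1, -1, 1], [1, 1, 1, -1, -1])
print(sens, spec)                   # 2 of 3 positives and 1 of 2 negatives correct
print(summarize([0.8, 0.9, 1.0]))   # hypothetical per-fold sensitivities
```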

Table 4. Comparison results for both employed techniques.

Figures 3 and 4 show the ROC curves for the folds used. The best results were reached with the empirical method (Fig. 4); nevertheless, the NEAT method reached comparable results for detection (Fig. 3). Similarly, Figs. 5 and 6 show the results for appendicitis detection, with similar behavior for both methods but better results for the empirical one (Table 4).
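An ROC curve such as those in Figs. 3 to 6 can be obtained by sweeping a decision threshold over the network outputs. A minimal sketch, assuming raw outputs in (−1, 1) as scores, labels +1/−1, and that a case is called positive when its score is at or above the threshold; the scores below are made up.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs for every distinct threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == -1)
        points.append((fp / neg, tp / pos))
    return points


scores = [0.9, 0.6, 0.3, -0.2, -0.8]  # hypothetical network outputs
labels = [1, 1, -1, 1, -1]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The threshold of 0 used for classification corresponds to one point on this curve; the curve shows the sensitivity/specificity trade-off at all other thresholds.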

Fig. 3. ROC curves for the three folds in the tuberculosis problem using the NEAT method.

Fig. 4. ROC curves for the three folds in the tuberculosis problem using the empirical method.

Fig. 5. ROC curves for the three folds in the appendicitis problem using the NEAT method.

Fig. 6. ROC curves for the three folds in the appendicitis problem using the empirical method.

4 Discussion

The first observation concerns the complexity of the models for both diseases. The tuberculosis diagnosis problem appears easier than the appendicitis problem, judging by the number of hidden nodes and connections in the networks obtained with NEAT (Figs. 1 and 2). The tuberculosis problem can be solved with between three and six nodes, while the appendicitis problem needs between ten and twelve nodes on average to detect the disease.

Compared with the empirical technique, the results for tuberculosis detection are comparable (Table 4), and NEAT performs better on this particular problem, achieving higher sensitivity and specificity. At the same time, a smaller architecture is found: NEAT produced architectures with between three and six nodes, compared with three, five or nine nodes for the empirical method. In terms of the number of connections this is more efficient, because NEAT networks have between 15 and 25 connections on average, whereas a model with six hidden nodes in the empirical method has around 50 connections.
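The "around 50 connections" figure follows from simple arithmetic for a fully connected MLP with seven inputs and one output: n_in × h + h × n_out weights, plus h + n_out biases if those are counted. A short sketch of that count (the function name is illustrative); for six hidden nodes it gives 48 weights, or 55 including biases. The 15 to 25 connections of the NEAT networks come from the evolved genomes and are not given by this formula.

```python
def mlp_connections(n_inputs, n_hidden, n_outputs=1, count_biases=False):
    """Number of trainable links in a fully connected single-hidden-layer MLP."""
    links = n_inputs * n_hidden + n_hidden * n_outputs
    if count_biases:
        links += n_hidden + n_outputs  # one bias per hidden and output neuron
    return links


for h in (2, 3, 5, 6, 9):
    print(h, mlp_connections(7, h), mlp_connections(7, h, count_biases=True))
```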

For appendicitis detection, the sensitivity and specificity values were less satisfactory. The specificity obtained with the empirical method is notable, but the sensitivity did not reach the rates of 89% [18] and 97% [9] reported in previous works. However, the results of the two methods are comparable, and the scope of this work was to compare two different methods for obtaining ANN architectures. In this case, the NEAT method offered comparable sensitivity and specificity, although with a larger architecture in terms of nodes and connections.

The differences in the results may be explained by the type of variables used. In the tuberculosis case, most variables are binary inputs indicating the presence or absence of a condition, whereas all variables in the appendicitis problem were real-valued, making classification more difficult. Although ANN can deal with any type of input variable, in the present case this could be a problem, which remains as future work in light of these results.

5 Conclusions

Two methods for obtaining network architectures were compared for the disease detection problem: neuroevolution of augmenting topologies (NEAT) and the traditional empirical approach were used to detect diseases from different input variables. The results show that for tuberculosis the NEAT method offered better results, while for appendicitis the results are comparable in terms of sensitivity and specificity, although NEAT produced a larger network.