1 Introduction

Artificial neural networks (ANNs) (Fausett 1994; Gurney 1997) are sophisticated nonlinear modeling techniques inspired by biological nervous systems. ANNs are powerful tools effectively employed in various fields (Anderson 2006) such as time series prediction, classification, pattern recognition, system identification and control, function approximation, and signal processing. The performance of an ANN depends on selecting a proper architecture and learning algorithm.

During recent decades, there have been many studies on solving the problems of ANN learning and architecture optimization. The back-propagation (BP) algorithm (Rumelhart and McClelland 1986) was introduced and widely applied to train multilayer networks. However, algorithms based on gradient-descent methods such as BP are easily trapped in local minima (Brent 1991; Gori and Tesi 1992). Traditional learning algorithms also suffer from shortcomings such as a slow convergence rate and long learning time. Hence, many researchers have employed meta-heuristic algorithms for ANN learning to overcome these limitations, both in training and in obtaining an appropriate ANN architecture. The main advantages of these algorithms over traditional ones are their ability to escape from local minima and to adapt to their environment.

Genetic algorithm (GA) (Tang et al. 1996), artificial immune system (AIS) (Farmer et al. 1986), ant colony optimization (ACO) (Dorigo et al. 1996), particle swarm optimization (PSO) (Kennedy and Eberhart 1995; Shi and Eberhart 1998), artificial bee colony (ABC) (Karaboga 2005), imperialist competitive algorithm (ICA) (Atashpaz-Gargari and Lucas 2007) and gravitational search algorithm (GSA) (Rashedi et al. 2009) belong to the class of meta-heuristic algorithms. The study by Yao (1999) offers a general framework for applying such algorithms to ANN learning.

Samanta et al. (2003) compared the performance of bearing fault detection using two different classifiers, ANN and support vector machine (SVM), where the classifier parameters were optimized by GA. In that method, the time-domain vibration signals of a rotating machine with normal and defective bearings were processed for feature extraction. The features extracted from the original and preprocessed signals were applied as inputs to the classifiers for two-class (normal or fault) recognition. In another study, GA and ANN were developed for loading pattern optimization of advanced gas-cooled reactors (AGR) (Ziver et al. 2004). Oreski et al. (2012) proposed a hybrid system of GA and ANN for retail credit risk assessment; the method provided a feature selection technique for finding an optimal feature subset to improve the accuracy of ANN classification.

Yao et al. (2000) applied the hybrid learning of PSO and a feed-forward neural network (FFNN) to classify two problems in the medical domain. The results showed that the method classified data more accurately than other algorithms (Bennett and Mangasarian 1992; Setiono and Hui 1995; Prechelt 1995). Also, an adapted PSO model was proposed to train an MLP for predicting the outcome of construction claims in Hong Kong (Chau 2007); the successful prediction rate was up to 80 % and the method produced more accurate results than a benchmark BP counterpart. In another study, Huang et al. (2010) proposed an ANN classifier with entropy-based feature selection for breast cancer diagnosis, in which Levenberg–Marquardt (LM) was used for training and the PSO algorithm was applied to obtain proper ANN weights.

Further, a hybrid approach based on ACO and ANN (Sivagaminathan and Ramakrishnan 2007) was presented for feature selection on several medical datasets; the proposed method performed better than BP and GA for ANN. In another study, a hybrid of continuous ACO with classical gradient techniques was applied to train an ANN for medical pattern classification (Socha and Blum 2007).

Moreover, a comparative study was conducted on the diagnosis of chronic obstructive pulmonary disease and pneumonia using hybrids of AIS and different ANN classification models (Er et al. 2009). The results were compared with those of previous similar studies on the diagnosis of these diseases (Ashizawa et al. 1999; Heckerling et al. 2004; Coppini et al. 2007), and it was concluded that AIS and ANN could be successfully used to diagnose chronic obstructive pulmonary disease and pneumonia. A hybrid learning of MLP and AIS was also presented to predict the output of a grid-connected photovoltaic system (Sulaiman et al. 2012); in that method, AIS was selected as the optimizer for the MLP training process to optimize the ANN architecture.

Besides, Ozkan et al. (2011) designed an ANN for modeling daily reference evapotranspiration (ET\(_{0}\)) using the ABC algorithm. The accuracy of the proposed model was compared with those of ANN models trained with LM and standard BP algorithms and with some existing empirical models (Kisi and Ozturk 2007). The proposed model performed better in estimating ET\(_{0}\).

Ahmadi et al. (2013) offered a model based on MLP to predict reservoir permeability and used ICA to determine the initial weights of the network. The performance and generalization capability of the proposed method were compared with those of models developed with the common BP technique, and the results showed that the method outperformed the gradient descent-based neural network. Also, Tayefeh-Mahmoudi et al. (2013) proposed a method to evolve ANN architecture using grammar encoding and ICA. The hybrid method was evaluated on four known regression problems and compared against state-of-the-art optimization methods: GA, resilient BP, Minimum Finder (MinFinder), Broyden–Fletcher–Goldfarb–Shanno (BFGS), and neural network construction (NNC) (Tsoulos et al. 2008). The experimental results showed that the proposed grammar encoding method using ICA outperformed the other methods.

Although many meta-heuristics have been extensively applied to ANN learning, none of them has shown good performance across all applications. The existing meta-heuristics also have several disadvantages (Leung et al. 1997; Hrstka and Kučerová 2004; Liang et al. 2006; Gao et al. 2012; Moslemipour et al. 2012) such as slow convergence speed, trapping in local minima, long computational time, the need to tune many parameters and difficult encoding schemes. Hence, enhancing the performance of existing meta-heuristics or proposing new ones seems necessary to increase the efficiency of ANN learning in different domains.

In this paper, the CAPSO algorithm (Beheshti and Shamsuddin 2013a) is used to improve ANN learning. Several medical datasets (Kuncheva 2005; Bache and Lichman 2013) are selected in order to evaluate the proposed method. The performance metrics of mean square error (MSE), correct classification rate or accuracy (ACC), sensitivity, specificity and area under the receiver operating characteristic curve (AUC) are employed to compare the results of the proposed algorithm with PSO, GSA and ICA on MLP learning.

The structure of this paper is organized as follows. Section 2 briefly describes the background of MLP and the meta-heuristic algorithms used in this study for MLP learning. The details of the proposed approach are elucidated in Sect. 3. Section 4 presents the experimental results of neural network classification on medical datasets using the algorithms. Finally, a conclusion is given in Sect. 5.

2 Background materials

The basic background materials required for a deep understanding of the proposed method are elucidated in this section. First, MLP as a FFNN is described; after that, the PSO, GSA and ICA algorithms are explained, since their results are compared with those of the proposed algorithm for MLP learning on classification problems. Finally, the classification problem is briefly explained.

2.1 MLP network

The FFNN is one of the most common ANNs and maintains a high level of research interest because of its ability to map any function to an arbitrary degree of accuracy. This kind of network includes both the MLP network (Hornik et al. 1989) and the radial basis function (RBF) network (Park and Sandberg 1991). Due to their similarity in functional mapping ability, MLP and RBF networks share the same problem domains.

The multi-layer perceptron has been utilized in a broad range of sciences such as finance, medicine, engineering, geology, physics and biology. In medicine, it has been extensively employed in medical diagnosis, detection and evaluation of medical conditions, and treatment cost estimation (Handels et al. 1999; Folland et al. 2004; Karabatak and Ince 2009; Ari and Saha 2009; Jerez et al. 2010; Abbod et al. 2011). It has also been used to replace conventional pattern recognition methods in disease diagnosis systems (Kayaer and Yildirim 2003; Delen et al. 2005; Temurtas 2007; Elveren and Yumuşak 2011). As Fig. 1 demonstrates, an MLP contains one input layer which receives external inputs, one or more hidden layers, and one output layer which shows the results. All layers except the input layer are made of processing nodes with activation functions. Data are presented at the input layer, and the network nodes carry out calculations through the successive layers until an output value is obtained at each output node.

Fig. 1
figure 1

Architecture of MLP network

Neurons are the processing units of MLP. An artificial neuron is illustrated in Fig. 2. As shown, each neuron or node computes the weighted sum of its inputs plus a bias, and passes this sum through an activation function (such as the sigmoid function) to obtain its output. This process is expressed in Eq. (1).

$$\begin{aligned} U_j&= \sum \limits _{i=1}^n {w_{ji} x_i } +\theta _j , \nonumber \\ O_j&= f_j ( {U_j }) \end{aligned}$$
(1)

where \(w_{ji}\) is the weight connecting node \(i\) to node \(j\), \(U_{j}\) is the linear combination of inputs, \(O_{j}\) and \(\theta _j\) are the output and bias of node \(j\) respectively, and \(f_j(.)\) is an activation function.

Fig. 2
figure 2

An artificial neuron of MLP

\(f_{j}(.)\) is usually taken to be a sigmoid function, and \(O_{j}\) is then given by:

$$\begin{aligned} O_j =f_j (U_j )=\frac{1}{(1+e^{-U_j })}. \end{aligned}$$
(2)
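For illustration, the computation of Eqs. (1) and (2) for a single neuron can be sketched as follows (a minimal Python sketch; the variable names and numeric values are illustrative only and are not taken from the paper):

```python
import numpy as np

def neuron_output(x, w, theta):
    """Output of a single MLP neuron, Eqs. (1) and (2)."""
    u = np.dot(w, x) + theta          # weighted sum of inputs plus bias, Eq. (1)
    return 1.0 / (1.0 + np.exp(-u))   # sigmoid activation, Eq. (2)

# Hypothetical example with three inputs
x = np.array([0.2, 0.5, 0.1])
w = np.array([0.4, -0.3, 0.8])
print(neuron_output(x, w, theta=0.1))
```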

Proper design of an MLP architecture is complex because many elements affect its performance, such as the number of nodes in the input, hidden and output layers, the distribution of layers, the interconnections between neurons and layers (weights), the error function and the activation function.

Selecting a proper learning algorithm is another challenge in using MLP. In the following sections, some meta-heuristic algorithms applied to MLP learning are explained, and their performances are later compared.

2.2 Particle swarm optimization

Particle swarm optimization is a population-based meta-heuristic algorithm inspired by the social behavior of bird flocking or fish schooling. In this algorithm, each solution is called a particle, and a particle is represented by a group of vectors (\(\vec {X}_i ,\vec {V}_i ,\vec {P}_i\)) in a \(d\)-dimensional search space, where \(\vec {X}_i \), \(\vec {V}_i \) and \(\vec {P}_i \) are the position, velocity and personal best position of the \(i\mathrm{th}\) particle respectively. The vectors are defined as follows:

$$\begin{aligned} \vec {X}_i&= (x_{i1} ,x_{i2} ,\ldots ,x_{id} ) \textit{for}\,\, i=1,2,\ldots ,N,\end{aligned}$$
(3)
$$\begin{aligned} \vec {V}_i&= (v_{i1} ,v_{i2} ,\ldots ,v_{id} ) \textit{for}\,\, i=1,2,\ldots ,N,\end{aligned}$$
(4)
$$\begin{aligned} \vec {P}_i&= (p_{i1} ,p_{i2} ,\ldots ,p_{id} ) \textit{for}\,\, i=1,2,\ldots ,N, \end{aligned}$$
(5)

where \(N\) is the number of particles.

The velocity and position of particles are randomly initialized and the next velocity and position of every particle are computed as Eqs. (6) and (7):

$$\begin{aligned}&v_{id} ({t\!+\!1})=w( t)\!\times v_{id} ( t)\!+\!C_1 \times \textit{rand}\times ( {p_{id} ( t)\!-\!x_{id} ( t)})\nonumber \\&\quad +C_2 \times \textit{rand}\times ( {p_{gd} ( t)-x_{id} ( t)}),\end{aligned}$$
(6)
$$\begin{aligned}&x_{id} ( {t+1})=x_{id} ( t)+\mathop v\nolimits _{id} ( {t+1}) \end{aligned}$$
(7)

where \(v_{id}(t)\) and \(x_{id}(t)\) are the current velocity and position of the \(i\mathrm{th}\) particle respectively, \(w(t)\) is the inertia weight, and \(C_{1}\) and \(C_{2}\) are acceleration coefficients. \(\textit{rand}\) is a uniformly distributed random number in the interval [0, 1]. In addition, \(v_{id} ( t)\in [ {-v_{\max } ,v_{\max } }]\), where \(v_{\max}\) is a constant set by the user based on the bounds of the solution space. In Eq. (6), the second and third terms are called the cognitive and social terms respectively. \(p_{gd}\) is the best position achieved by the entire swarm so far in the \(d\mathrm{th}\) dimension. For the population, this position is defined as follows:

$$\begin{aligned} \mathop {Pg}\limits ^\rightarrow =( {p_{g1} ,p_{g2} ,\ldots ,p_{gd} }). \end{aligned}$$
(8)
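To make the update concrete, one PSO iteration over the whole swarm can be sketched as follows (a Python sketch; the array shapes, the per-dimension random numbers and the velocity clipping are implementation assumptions rather than part of Eqs. (6) and (7)):

```python
import numpy as np

def pso_step(x, v, p_best, p_g, w, c1, c2, v_max):
    """One PSO velocity/position update, Eqs. (6) and (7).

    x, v   : current positions and velocities, shape (N, d)
    p_best : personal best positions, shape (N, d)
    p_g    : global best position, shape (d,)
    """
    n, d = x.shape
    r1, r2 = np.random.rand(n, d), np.random.rand(n, d)
    v_new = (w * v
             + c1 * r1 * (p_best - x)        # cognitive term
             + c2 * r2 * (p_g - x))          # social term
    v_new = np.clip(v_new, -v_max, v_max)    # keep velocities in [-v_max, v_max]
    x_new = x + v_new
    return x_new, v_new
```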

Although PSO is easy to implement, it has several disadvantages owing to its poor exploration, such as a slow convergence rate, the need to select parameters, and easily getting trapped in a local optimum (Leung et al. 1997; Hrstka and Kučerová 2004; Liang et al. 2006; Gao et al. 2012; Moslemipour et al. 2012).

2.3 Gravitational search algorithm

GSA is a meta-heuristic algorithm inspired by the Newtonian laws of gravity and motion. According to the law of gravity (Schutz 2003), all particles in the universe attract each other by a gravitational force. This force is directly proportional to the product of their masses and inversely proportional to the square of the distance between them. Accordingly, GSA defines a system of \(N\) agents (masses) in a \(d\)-dimensional search space, and at the beginning the agent positions are initialized randomly. The position of the \(i\mathrm{th}\) agent is given as follows:

$$\begin{aligned} \vec {X}_i =(x_{i1} ,x_{i2} ,\ldots ,x_{id} )\quad \textit{for}\,\, i=1,2,\ldots ,N, \end{aligned}$$
(9)

where \(x_{id}\) is the position of the \(i\mathrm{th}\) agent in the \(d\mathrm{th}\) dimension.

Each mass represents a solution, and the heaviest mass corresponds to the optimum solution in the search space. Over time, masses are attracted toward the heaviest mass. The gravitational force of mass \(j\) on mass \(i\) at a specific time \(t\) is computed as follows:

$$\begin{aligned} F_{ij,d} ( t)=G( t)\frac{M_i ( t)\times M_j ( t)}{R_{ij} ( t)+\varepsilon }( { x_{jd} ( t)- x_{id} ( t)}), \end{aligned}$$
(10)

where \(M_{i}\) and \(M_{j}\) are the masses of agents \(i\) and \(j\) respectively, \(\varepsilon\) is a small constant, and \(G(t)\) is the gravitational constant at time \(t\), initialized at the beginning and decreased over time to control the search accuracy. \(R_{ij}(t)\) is the Euclidean distance between agents \(i\) and \(j\) at time \(t\):

$$\begin{aligned} R_{ij} ( t)= {\left\| {x_i ( t),x_j ( t)} \right\| }_2 . \end{aligned}$$
(11)

\(M_{i}\) in Eq. (10) is calculated as follows:

$$\begin{aligned} m_i ( t)&= \frac{ {\textit{fit}}_i ( t)-\textit{worst}( t)}{\textit{best}( t)-\textit{worst}( t)},\end{aligned}$$
(12)
$$\begin{aligned} M_i ( t)&= \frac{ m_i ( t)}{\sum \nolimits _{j=1}^N { m_j ( t)} }, \end{aligned}$$
(13)

where \(\textit{fit}_{i}(t)\) is the fitness value of agent \(i\), and \(\textit{best}(t)\) and \(\textit{worst}(t)\) are the best and worst fitness values at time \(t\).

Also, the total force acting on agent \(i \) at time \(t \) in the \(d\)th dimension is computed by Eq. (14):

$$\begin{aligned} F_{id} ( t)=\sum \limits _{j\in \textit{kbest},j\ne i}^N { {\textit{rand}_j \times F}_{ij,d} ( t)}, \end{aligned}$$
(14)

where \(\textit{rand}_{j}\) is a random number in the range [0, 1]. \(\textit{Kbest}\) is a function of time, with initial value \(K_{0}\), which is linearly decreased over the iterations. In fact, \(\textit{Kbest}\) is the set of the first \(K\) agents with the best fitness values and the biggest masses.

The force accelerates agent \(i\) according to Eq. (15), and the agent moves from its current position to a new one:

$$\begin{aligned}&a_{id} ( t)=\frac{ F_{id} (t)}{ M_i ( t)},\end{aligned}$$
(15)
$$\begin{aligned}&v_{id} ( {t+1})= v_{id} ( t)+ {\textit{rand}}_i {\times \,a}_{id} ( t),\end{aligned}$$
(16)
$$\begin{aligned}&x_{id} ( {t+1})= x_{id} ( t)+ v_{id} ( {t+1}), \end{aligned}$$
(17)

where \(a_{id}(t)\) represents the acceleration of the \(i\mathrm{th}\) agent, and \(v_{id}(t)\) and \(x_{id}(t)\) are its current velocity and position. Also, \(v_{id}(t+1)\) and \(x_{id}(t+1)\) denote the next velocity and position of agent \(i\) in dimension \(d\) respectively.
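The above equations can be summarized in the following sketch of one GSA iteration for a minimization problem (the double loop over agent pairs and the guards against division by zero are implementation choices of this sketch, not part of the original formulation):

```python
import numpy as np

def gsa_step(x, v, fitness, g, kbest, eps=1e-10):
    """One GSA iteration, Eqs. (10)-(17), for minimization (sketch)."""
    n, d = x.shape
    best, worst = fitness.min(), fitness.max()
    denom = (best - worst) if best != worst else eps
    m = (fitness - worst) / denom                      # Eq. (12)
    M = m / (m.sum() + eps)                            # Eq. (13)

    kbest_idx = np.argsort(fitness)[:kbest]            # the Kbest agents
    F = np.zeros((n, d))
    for i in range(n):
        for j in kbest_idx:
            if j == i:
                continue
            r = np.linalg.norm(x[i] - x[j])            # R_ij, Eq. (11)
            F[i] += (np.random.rand()                  # rand_j, Eq. (14)
                     * g * M[i] * M[j] / (r + eps)
                     * (x[j] - x[i]))                  # Eq. (10)

    a = F / (M[:, None] + eps)                         # Eq. (15)
    v_new = np.random.rand(n, 1) * v + a               # Eq. (16)
    x_new = x + v_new                                  # Eq. (17)
    return x_new, v_new
```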

Similar to other meta-heuristic algorithms, GSA also has some weaknesses, such as complex operators and a long running time (Beheshti and Shamsuddin 2013a, b).

2.4 Imperialist competitive algorithm

ICA is a global search optimization algorithm inspired by imperialistic competition. The algorithm is population-based, and each individual of the population is called a country. Some of the countries with the best costs are chosen as imperialist states, and the others, named colonies, are divided among the imperialists according to their costs. An empire is formed by an imperialist state and its colonies. After the empires are formed, imperialistic competition starts among them, causing weak empires to collapse until the most powerful empire remains. To divide the colonies among the imperialists, the cost and power of each imperialist are normalized and the initial colonies are allocated to the empires according to Eqs. (18)–(20):

$$\begin{aligned}&C_n = c_n - \mathop {\max }\limits _i \left\{ { c_i } \right\} \!,\end{aligned}$$
(18)
$$\begin{aligned}&p_n =\left| {\frac{ C_n }{\sum \nolimits _{i=1}^{ N_{imp} } { C_i } }} \right| ,\end{aligned}$$
(19)
$$\begin{aligned}&{N.C}_n =\textit{round}\left\{ {p_n \cdot N_{col} } \right\} \!, \end{aligned}$$
(20)

where \(c_{n}\) is the cost of the \(n\mathrm{th}\) imperialist and \(C_{n}\) and \(p_{n}\) are the normalized cost and power of the imperialist respectively. Also, \(N.C_{n}\) is the initial number of colonies related to the \(n\mathrm{th}\) empire and \(N_\textit{col}\) is the total number of initial colonies.

To form the \(n\mathrm{th}\) empire, \(N.C_{n}\) of the colonies are randomly selected and allocated to the \(n\mathrm{th}\) imperialist. In the real world, imperialist states try to make their colonies a part of themselves. This process, called assimilation, is modelled by moving all of the colonies toward their imperialist, as illustrated in Fig. 3. In this figure, \(\theta\) and \(x\) are a uniformly distributed random angle and number respectively, and \(d\) is the distance between the imperialist and the colony.

$$\begin{aligned} x&\sim U( {0,\beta \times d}),\end{aligned}$$
(21)
$$\begin{aligned} \theta&\sim U( {-\gamma ,\gamma }), \end{aligned}$$
(22)

where \(\beta\) and \(\gamma\) are parameters that cause the colonies to move toward their relevant imperialist in a randomly deviated direction.
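A simplified sketch of this assimilation move is given below; the angular deviation \(\theta\) of Eq. (22) is approximated here by a small uniform perturbation of the direction vector, which is an assumption of the sketch rather than the exact geometric construction of Fig. 3:

```python
import numpy as np

def assimilate(colony, imperialist, beta=2.0, gamma=0.5):
    """Move a colony toward its imperialist, Eqs. (21) and (22) (sketch)."""
    direction = imperialist - colony
    dist = np.linalg.norm(direction)
    step = np.random.uniform(0.0, beta * dist)                   # x ~ U(0, beta*d), Eq. (21)
    unit = direction / (dist + 1e-12)
    deviation = np.random.uniform(-gamma, gamma, colony.shape)   # stands in for theta, Eq. (22)
    return colony + step * (unit + deviation)
```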

Fig. 3
figure 3

Movement of colonies toward their relevant imperialist

While a colony moves toward an imperialist, it might reach a better position than the imperialist; in this case, the colony and the imperialist exchange positions. Also, a revolution may take place among the colonies, which changes the power or organizational structure of the empire. This revolution is governed by a revolution rate, which specifies the percentage of colonies that randomly change their positions.

Finally, the empires compete in order to possess and control the colonies of other empires, as shown in Fig. 4. A colony of the weakest empire is selected, and the possession probability of each empire, \(p_{p}\), is computed by Eq. (24). The normalized total cost of an empire, \(N.T.C_{n}\), is obtained from Eq. (23) and used to compute the possession probability.

$$\begin{aligned}&N.T. C_n =T. C_n - \mathop {\max }\limits _i \left\{ { {T.C}_i } \right\} .\end{aligned}$$
(23)
$$\begin{aligned}&p_{ p_n } =\left| {\frac{ {N.T.C}_n }{\sum \nolimits _{i=1}^{ N_{imp} } { {N.T.C}_i } }} \right| . \end{aligned}$$
(24)
Fig. 4
figure 4

Imperialistic competition

Vector \(P\) is formed in order to divide the mentioned colonies among empires:

$$\begin{aligned} P=\left[ { p_{ p_1 } , p_{ p_2 } , p_{ p_3 } ,\ldots , p_{ p_{ N_\textit{imp} } } } \right] . \end{aligned}$$
(25)

Vector \(R\) with the same size as \(P\) whose elements are uniformly distributed random numbers is created:

$$\begin{aligned} \begin{aligned}&R=\left[ {r_1, r_2, r_3,\ldots , r_{ N_\textit{imp} } } \right] \\&r_1 , r_2 , r_3 ,\ldots , r_{ N_\textit{imp} } \in U( {0,1}). \end{aligned} \end{aligned}$$
(26)

Vector \(D\) is formed so that the mentioned colony (or colonies) is given to the empire whose corresponding element in \(D\) is the largest.

$$\begin{aligned} D=P-R=\left[ { D_1 , D_2 , D_3 ,\ldots , D_{ N_\textit{imp} } } \right] . \end{aligned}$$
(27)

According to vector \(D\), the process of choosing an empire is similar to the roulette wheel in GA; however, this method is faster because the selection is based directly on probability values.
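A compact sketch of this selection mechanism is given below; total_costs is assumed to hold the current \(T.C_n\) values of all empires, and the small constant in the denominator is only a guard against division by zero:

```python
import numpy as np

def select_winning_empire(total_costs):
    """Choose the empire that takes the weakest empire's colony, Eqs. (23)-(27)."""
    ntc = total_costs - total_costs.max()       # normalized total cost, Eq. (23)
    p = np.abs(ntc / (ntc.sum() + 1e-12))       # possession probabilities, Eq. (24)
    r = np.random.uniform(size=p.shape)         # vector R, Eq. (26)
    d = p - r                                   # vector D, Eq. (27)
    return int(np.argmax(d))                    # empire with the largest element of D wins
```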

The competition gradually changes the power of the empires, making each weaker or stronger. Eventually, all colonies of a weak empire are taken over by more powerful empires and the weak empire is eliminated. The total power (cost) of an empire is modelled as the power (cost) of the imperialist country plus a percentage of the mean power (cost) of its colonies:

$$\begin{aligned} T.C_n&= \textit{Cost}( {\textit{imperialist}_n })\nonumber \\&+\,\xi \,\,\textit{mean}\left\{ {\textit{Cost}( {\textit{colonies}\,\,\textit{of}\,\textit{empire}_n })} \right\} , \end{aligned}$$
(28)

where \(T.C_{n}\) is the total cost of the \(n\mathrm{th}\) empire and \(\xi \) is a positive small number.

The competition continues until only one empire remains or a predetermined maximum number of iterations is reached. ICA also has some drawbacks, such as complex operators, several parameters to tune and a long running time (Beheshti and Shamsuddin 2013a, b). These limitations prevent ICA from being easily applied to different problems.

The described algorithms are applied to MLP learning to classify several medical datasets. In the next section, the classification concept used in medical disease diagnosis is briefly explained.

2.5 Classification

Classification is used to understand existing data and to predict how new instances will behave. In other words, the objective of data classification is to assign data to different classes. Classification techniques include neural networks (Bishop 1995), decision tree induction (Breiman et al. 1984; Quinlan 1993) and support vector machines (Cristianini and Shawe-Taylor 2000). Researchers have employed these methods on a variety of classification problems (Chen et al. 2007; Mitra and Wang 2008; Reynolds and Iglesia 2009; Saxena and Saad 2007; Qasem and Shamsuddin 2011; Awad and Motai 2008; García et al. 2009; Fana et al. 2011; Ekici 2012).

Correct classification is measured by the number of true positives (TP) and true negatives (TN), whereas misclassification is measured by the number of false positives (FP) and false negatives (FN). The accuracy of predicting events is measured by sensitivity, computed as TP divided by the total number of actual positives, while specificity measures the accuracy of predicting nonevents as TN divided by the total number of actual negatives, evaluated over a range of cutoffs. The receiver operating characteristic (ROC) curve is another tool to analyze the performance of algorithms in classification problems: it is a graphical plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity) that illustrates the quality of a classifier. AUC is the area under the ROC curve; since the AUC is a portion of the area of the unit square, its value lies between 0.0 and 1.0. Sensitivity and specificity are computed by the following equations (Fawcett 2006):

$$\begin{aligned} \textit{Sensitivity} = \frac{\textit{TP}}{\textit{TP}+\textit{FN}}(\% ),\end{aligned}$$
(29)
$$\begin{aligned} \textit{Specificity} = \frac{\textit{TN}}{\textit{TN}+\textit{FP}}(\% ), \end{aligned}$$
(30)

where TP \(+\) FN is the number of positive patterns and TN \(+\) FP is the number of negative patterns in a dataset.

Accordingly, accuracy can be defined as follows:

$$\begin{aligned} \textit{Accuracy} = \frac{\textit{TN}+\textit{TP}}{\textit{TN}+\textit{TP}+\textit{FN}+\textit{FP}}(\% ) \end{aligned}$$
(31)
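As a simple illustration, these three measures can be computed directly from the four counts (a minimal sketch; the counts used in the example are hypothetical and not taken from the experiments):

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy in percent, Eqs. (29)-(31)."""
    sensitivity = 100.0 * tp / (tp + fn)                  # Eq. (29)
    specificity = 100.0 * tn / (tn + fp)                  # Eq. (30)
    accuracy = 100.0 * (tn + tp) / (tn + tp + fn + fp)    # Eq. (31)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN
print(classification_metrics(40, 45, 5, 10))   # (80.0, 90.0, 85.0)
```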

In the next section, a new meta-heuristic algorithm is introduced to enhance ANN learning and is used to train the MLP for classification problems.

3 Methods

This section introduces CAPSO as a new global optimization algorithm to improve ANN accuracy, followed by the proposed hybrid learning of CAPSO and MLP (CAPSO-MLP).

3.1 CAPSO algorithm

The CAPSO algorithm is an efficient optimization algorithm that combines the laws of motion from Newtonian mechanics with the PSO algorithm. According to these laws, if an object with initial velocity \(v_{1}\) moves from position \(x_{1}\) to position \(x_{2}\) and its velocity changes to \(v_{2}\) during the time step \(\Delta t\), the object is accelerated. The new velocity and position are obtained as follows:

$$\begin{aligned} v_2&= v_1 +a\times \Delta t,\end{aligned}$$
(32)
$$\begin{aligned} x_2&= x_1 +\textstyle {1 \over 2}\times a\times \Delta t^2+v_1 \times \Delta t, \end{aligned}$$
(33)

where \(a\) is the acceleration of the object.

Based on this concept, each solution candidate in the CAPSO algorithm, called a particle, has four attributes: position, velocity, acceleration and centripetal acceleration. Considering a \(d\)-dimensional search space with \(N\) particles, the position and velocity of the \(i\mathrm{th}\) particle are as follows:

$$\begin{aligned}&\mathop {X_i }\limits ^\rightarrow =( { x_{i1} , x_{i2} ,\ldots , x_{id} }) \quad {\textit{for}}\,\, {i=1,2,\ldots ,N.}\end{aligned}$$
(34)
$$\begin{aligned}&{\mathop {V_i }\limits ^\rightarrow =( { v_{i1} , v_{i2} ,\ldots , v_{id} })} \quad \textit{for}\,\, i=1,2,\ldots ,N. \end{aligned}$$
(35)

Also, the personal best position of the \(i\mathrm{th}\) particle, \(\mathop {P_i }\limits ^\rightarrow \), and the best position explored by the population so far, \(\mathop {Pg}\limits ^\rightarrow \), are given by Eqs. (36) and (37):

$$\begin{aligned}&\mathop {P_i }\limits ^\rightarrow =( { p_{i1} , p_{i2} ,\ldots , p_{id} })\quad \textit{for} \quad i=1,2,\ldots ,N.\end{aligned}$$
(36)
$$\begin{aligned}&\mathop {Pg}\limits ^\rightarrow =( { p_{g1} , p_{g2} ,\ldots , p_{gd} }). \end{aligned}$$
(37)

Every particle updates its velocity based on current velocity, acceleration and centripetal acceleration:

$$\begin{aligned} v_{id} ( {t+1})= v_{id} ( t)+ a_{id} ( t)+ A_{id} ( t), \end{aligned}$$
(38)

where \(v_{id}(t+1)\) and \(v_{id}(t)\) are the next and current velocities respectively.

Additionally, \(a_{id}(t)\) and \(A_{id}(t)\) represent the acceleration and centripetal acceleration as follows:

$$\begin{aligned} a_{id} ( t)&= \textit{rand}\times ( { p_{\textit{id}} ( t)\!-\! x_{\textit{id}} ( t)})\!+\!\textit{rand}\nonumber \\&\times ( { p_{gd} ( t)\!- \!x_{\textit{id}} ( t)}), \end{aligned}$$
(39)
$$\begin{aligned} A_{\textit{id}} ( t)&= E_i ( t)\times \textit{rand}\times ( { p_{\textit{id}} ( t)- p_{\textit{md}} ( t)- x_{\textit{id}} ( t)}),\nonumber \\ \end{aligned}$$
(40)

where \(\textit{rand}\) is a uniformly distributed random number in the interval [0, 1], \(x_{id}(t)\) is the current position, \(p_{id}(t)\) is the personal best position of the \(i\mathrm{th}\) particle, and \(p_{gd}(t)\) is the global best position explored in the \(d\mathrm{th}\) dimension. Also, \(p_{md}(t)\) is the current median position of the swarm in dimension \(d\), and \(E_{i}(t)\) is an acceleration coefficient computed by Eq. (42). Moreover, following Eq. (33), the next position of the \(i\mathrm{th}\) particle is obtained by Eq. (43):

$$\begin{aligned}&e_i ( t)= {\textit{fit}}_i ( t)-\textit{GWfit}( t),\end{aligned}$$
(41)
$$\begin{aligned}&E_i ( t)=\frac{ e_i ( t)}{\sum \nolimits _{j=1}^N { e_j ( t)} },\end{aligned}$$
(42)
$$\begin{aligned}&x_{\textit{id}} ( {t+1})= x_{\textit{id}} ( t)+\textstyle {1 \over 2} {\times a}_{\textit{id}} ( t)+ v_{\textit{id}} ( {t+1}) \end{aligned}$$
(43)

where \(\textit{fit}_i(t)\) represents the fitness value of particle \(i\) and \(\textit{GWfit}(t)\) is the worst fitness value explored so far by the swarm.
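Putting Eqs. (38)-(43) together, one CAPSO iteration can be sketched as follows; interpreting \(p_{md}(t)\) as the per-dimension median of the current particle positions, assuming a minimization problem and adding a small constant to the denominator of Eq. (42) are assumptions of this sketch:

```python
import numpy as np

def capso_step(x, v, p_best, p_g, fitness, gw_fit, eps=1e-12):
    """One CAPSO velocity/position update, Eqs. (38)-(43) (sketch).

    x, v    : current positions and velocities, shape (N, d)
    p_best  : personal best positions, shape (N, d)
    p_g     : global best position, shape (d,)
    fitness : current fitness values of the N particles
    gw_fit  : worst fitness value explored so far by the swarm
    """
    n, d = x.shape
    p_m = np.median(x, axis=0)                              # median position of the swarm

    e = fitness - gw_fit                                    # Eq. (41)
    E = e / (e.sum() + eps)                                 # Eq. (42)

    a = (np.random.rand(n, d) * (p_best - x)
         + np.random.rand(n, d) * (p_g - x))                # acceleration, Eq. (39)
    A = E[:, None] * np.random.rand(n, d) * (p_best - p_m - x)   # centripetal acc., Eq. (40)

    v_new = v + a + A                                       # Eq. (38)
    x_new = x + 0.5 * a + v_new                             # Eq. (43)
    return x_new, v_new
```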

3.2 Analysis and design of the proposed algorithm

In the CAPSO algorithm, the two terms of acceleration, \(\vec {a}_i\), and centripetal acceleration, \(\vec {A}_i\), play a key role in increasing the convergence rate. As shown in Fig. 5, if a particle is far from the solutions obtained so far by the swarm (\(\vec {P}_g, \vec {P}_i, \vec {P}_m\)), these terms help to move the particle toward those solutions. Also, \(\vec {A}_i\) helps the algorithm escape from local optima. Figure 5a illustrates the term \(\vec {A}_i\) computed by Eq. (40), and Fig. 5b gives the graphical representation of \(\vec {a}_i\) according to Eq. (39).

Fig. 5
figure 5

Graphical representation of a \(\vec {A}_i\) and b \(\vec {a}_i\)

Figure 6 shows the flowchart of the proposed method, and the CAPSO pseudo code is given in Fig. 7. As shown in these figures, the velocities and positions of the particles are randomly initialized and each particle is evaluated based on its fitness value. \(\mathop {P_i }\limits ^\rightarrow \) is computed for each particle and the best position explored by the swarm is chosen as \(\mathop {Pg}\limits ^\rightarrow \). The next velocity and position of each particle are obtained from Eqs. (38)–(43). These steps continue until a stopping criterion is met, and the best solution is returned by the algorithm.

Fig. 6
figure 6

Flowchart of CAPSO

Fig. 7
figure 7

CAPSO pseudo code

3.3 Hybrid learning of CAPSO and MLP network

In this section, CAPSO is used for MLP learning. The approach is the hybrid learning of CAPSO and the MLP network (CAPSO-MLP), designed to enhance the network's accuracy. The algorithm simultaneously determines the set of weights and its corresponding accuracy by training the network. The MLP network can be represented as a vector of dimension \(D\) containing the network weights and biases, where \(D\) is defined by Eq. (44). To optimize the MLP weights using the CAPSO algorithm, the dimension of each particle is set to \(D\):

$$\begin{aligned} D&= ( {\textit{Input}\times \textit{Hidden}})+( {\textit{Hidden}\times \textit{Output}})\nonumber \\&+\,\textit{Hidden}_{\textit{bias}} +\textit{Output}_{\textit{bias}}, \end{aligned}$$
(44)

where \(\textit{Input}\), \(\textit{Hidden}\) and \(\textit{Output}\) refer to the number of input, hidden and output neurons of the MLP network respectively. Also, \(\textit{Hidden}_{\textit{bias}}\) and \(\textit{Output}_{\textit{bias}}\) are the numbers of biases in the hidden and output layers.

Figure 8 shows the flowchart of the hybrid learning of CAPSO and MLP (CAPSO-MLP). CAPSO-MLP starts by collecting, normalizing and reading a dataset. This is followed by setting the desired number of input, output and hidden neurons to determine the dimension of the particles according to Eq. (44). The population is initialized and, after MLP training, the training error is calculated as the objective function. According to the training error, every particle updates its velocity and position. The new positions are the new weights of the MLP network, which should minimize the objective function. The objective function is the mean square error (MSE), computed as Eq. (45). These steps continue until the stopping conditions are met.

$$\begin{aligned} \textit{MSE}=\frac{1}{P}\sum \limits _{j=1}^P { {( { t_j - O_j })}^2 }, \end{aligned}$$
(45)

where \(t_{j}\) and \(O_{j}\) are the desired output and the network output respectively, and \(P\) is the number of patterns.
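To make the particle encoding and the objective function concrete, the following sketch decodes a particle of dimension \(D\) (Eq. 44) into the weights and biases of a single-hidden-layer MLP and evaluates the MSE of Eq. (45); the ordering of the weight and bias segments inside the particle and the sigmoid output units are assumptions of this sketch:

```python
import numpy as np

def particle_dimension(n_in, n_hid, n_out):
    """Dimension D of each particle, Eq. (44)."""
    return n_in * n_hid + n_hid * n_out + n_hid + n_out

def mlp_forward(x, particle, n_in, n_hid, n_out):
    """Decode a particle into MLP weights/biases and compute the network output."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    w1 = particle[:n_in * n_hid].reshape(n_hid, n_in)              # input-to-hidden weights
    b1 = particle[n_in * n_hid:n_in * n_hid + n_hid]               # hidden biases
    off = n_in * n_hid + n_hid
    w2 = particle[off:off + n_hid * n_out].reshape(n_out, n_hid)   # hidden-to-output weights
    b2 = particle[off + n_hid * n_out:]                            # output biases
    h = sig(w1 @ x + b1)
    return sig(w2 @ h + b2)

def mse(particle, X, T, n_in, n_hid, n_out):
    """Objective function of CAPSO-MLP, Eq. (45); X: input patterns, T: desired outputs."""
    outputs = np.array([mlp_forward(x, particle, n_in, n_hid, n_out) for x in X])
    return np.mean(np.sum((T - outputs) ** 2, axis=1))
```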

Fig. 8
figure 8

Flowchart of hybrid learning of CAPSO and MLP network

4 Experimental studies

To evaluate the performance of CAPSO-MLP, nine medical datasets (Kuncheva 2005; Bache and Lichman 2013) are selected, as detailed in Table 1. As shown in this table, the selected datasets (Hepatitis, Heart Disease, Pima Indian Diabetes, Wisconsin Prognostic Breast Cancer, Parkinson's disease, Echocardiogram (Heart attack), Liver Disorders, Laryngeal 1 and Acute Inflammations) are all binary classification problems. Each dataset is represented as input patterns and outputs in the hybrid learning of the MLP network; therefore, the effectiveness and performance of MLP network learning are verified on different datasets. The CAPSO, PSO, GSA and ICA algorithms are used for MLP learning on these datasets to assess the ability of the proposed approach. In this context, the hybrids of the latter algorithms and the MLP network are called PSO-MLP, GSA-MLP and ICA-MLP respectively.

Table 1 Description of datasets

4.1 Datasets design

The Hepatitis dataset is complex and noisy and has a large number of missing values (167 in the whole dataset). The classification task is to predict whether a patient with hepatitis will live or die. There are 19 attributes/inputs, and the output class distribution is live: 123 and die: 32.

The absence or presence of heart disease is predicted with the Statlog (Heart Disease) dataset. This dataset contains 13 attributes/inputs, and the output class distribution is absence: 150 and presence: 120.

The Pima Indian Diabetes dataset is used to diagnose, based on personal data and medical examinations, whether a person has diabetes or not. There are 8 attributes/inputs, and the output class distribution is diabetes positive: 268 and diabetes negative: 500.

The aim of the Wisconsin Prognostic Breast Cancer (WPBC) dataset is to predict whether a tumor is recurrent or non-recurrent. Each record represents follow-up data for one breast cancer case. There are 33 attributes/inputs, and the output class distribution is recurrence: 47 and non-recurrence: 151.

The objective of the Parkinson's disease dataset is to discriminate healthy people from those with Parkinson's disease. This dataset is composed of a range of biomedical voice measurements with 22 attributes/inputs, and the output class distribution is health status: 147 and Parkinson's disease: 48.

In the Echocardiogram dataset, all patients suffered a heart attack at some point in the past; some of them are still alive and some are not. The dataset has 132 samples and 12 attributes/inputs. The number of missing values in the whole dataset is 132, and the distribution of the output attribute is alive: 43, dead: 88 and missing: 1.

The main aim of the Liver Disorders dataset is to diagnose a patient as healthy (normal) or abnormal. It has six attributes/inputs and two output classes (abnormal or normal). The first five variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. This dataset contains 345 examples, of which 200 are negative (normal) and 145 are positive (abnormal). Each line in the Liver Disorders dataset constitutes the record of a single male individual.

The Laryngeal 1 dataset is used to classify voices as normal or pathological for the diagnosis of laryngeal pathology. The number of attributes/inputs is 16 (all continuous-valued). The total number of instances is 213, and the distribution of the output attribute is 81 normal and 132 pathological voices.

The Acute Inflammations dataset was created to diagnose two diseases of the urinary system: inflammation of the urinary bladder and nephritis of renal pelvis origin; it was originally designed for rule detection based on rough sets theory. The dataset has 120 samples and 6 attributes/inputs. In this paper, the algorithms are evaluated on classifying the inflammation of urinary bladder output. The distribution of the output attribute is positive: 59 and negative: 61.

The datasets are preprocessed for better generalization of error and accuracy. Real-world data are typically noisy because they are incomplete and inconsistent. The datasets are normalized, and missing values are replaced by the mean of the corresponding attribute.

4.2 Experimental setup

Each dataset is partitioned into a training set and a testing set. The training set is used to train the network in order to achieve the optimal weights. The testing set is applied to unseen data to test the generalization performance of the meta-heuristic algorithms on the MLP network. For each dataset, 80 % of the data are employed for the training set and the remaining 20 % for the testing set.

The number of input and output neurons in the MLP network is problem-dependent, and the number of hidden nodes is computed from the number of input neurons according to the Kolmogorov theorem (Hecht-Nielsen 1987):

$$\begin{aligned} \textit{Hidden}=2\times \textit{Input}+1. \end{aligned}$$
(46)

The number of iterations is set to 500 for all datasets. The population size is set to 40 in CAPSO, PSO and GSA; in ICA, the numbers of countries and imperialists are 160 and 8 respectively. In PSO, \(w\) is set to 0.7 and linearly decreased to 0.4, and \(V_{max}\) is set to 0.5. The acceleration coefficients \(C_{1}\) and \(C_{2}\) are both equal to 2. In ICA, the revolution rate \(= 0.5\), \(\beta =2\), \(\gamma =0.5\) and \(\xi = 0.02\) (Tayefeh-Mahmoudi et al. 2013). In addition, \(G\) in GSA is computed as Eq. (47):

$$\begin{aligned} G(t)= G_0 e^{\frac{-\alpha t}{T}} \end{aligned}$$
(47)

where \(G_{0}\) and \(\alpha\) are set to 100 and 20 respectively, and \(T\) is the maximum number of iterations. At the beginning, \(\textit{Kbest}\) in Eq. (14) is initialized to 40 (the population size) and then linearly decreased to 1.

4.3 Results and discussion

This section presents the results of CAPSO, PSO, GSA and ICA adapted to the MLP network. The experiments are conducted on the nine medical datasets, each of which is partitioned into a training set and a testing set as described in Sect. 4.2: the training set is used to obtain the optimal weights, and the testing set evaluates the generalization performance of the meta-heuristic algorithms on unseen data.

Each algorithm is independently run 10 times, and the results are shown in Tables 2, 3, 4, 5 and 6. In these tables, Mean, Min and Max indicate the average, minimum and maximum values respectively. The algorithms are also ranked based on the mean solution in each row of a table, and the overall rank of each algorithm is computed over each column.

Table 2 MSE of CAPSO-MLP, PSO-MLP, GSA-MLP and ICA-MLP for all datasets
Table 3 Comparison of datasets in terms of accuracy (%)
Table 4 Comparison of datasets in terms of sensitivity (%)
Table 5 Comparison of datasets in terms of specificity (%)
Table 6 Comparison of datasets in terms of AUC

Table 2 shows the training and testing errors (MSE) of the MLP network trained with the mentioned algorithms on the Hepatitis, Heart Disease, Diabetes, Breast Cancer, Parkinson's disease, Echocardiogram, Liver Disorders, Laryngeal 1 and Acute Inflammations datasets. In this table, the training and testing ranks indicate that CAPSO-MLP has a better convergence rate than the other algorithms, whereas GSA-MLP shows the worst convergence rate. Figure 9 illustrates the training error (MSE) of CAPSO-MLP, PSO-MLP, GSA-MLP and ICA-MLP on the Hepatitis, Heart Disease and Breast Cancer datasets. As observed, CAPSO-MLP provides the lowest training error for all datasets, while GSA-MLP shows the worst MSE among the algorithms.

Fig. 9
figure 9

Training errors (MSE) of CAPSO-MLP, PSO-MLP, GSA-MLP and ICA-MLP for a Hepatitis, b Heart Disease and c Breast Cancer

Table 3 presents the classification accuracy of CAPSO-MLP, PSO-MLP, GSA-MLP and ICA-MLP over ten independent runs on the datasets. The accuracy refers to the percentage of correct classifications on the training and testing data respectively. As depicted in this table, CAPSO-MLP provides the best performance on unseen (testing) data: Hepatitis with 71.29 %, Heart Disease with 81.85 %, Diabetes with 72.99 %, Wisconsin Prognostic Breast Cancer with 80.25 %, Parkinson's disease with 92.56 %, Echocardiogram (Heart attack) with 88.46 %, Liver Disorders with 72.32 %, Laryngeal 1 with 81.63 % and Acute Inflammations with 100 %. It is also noticeable that, compared to the other algorithms, the proposed method has the highest classification accuracy on the Hepatitis and Echocardiogram datasets, which contain missing data, whereas GSA-MLP shows the lowest accuracy among the algorithms.

In addition, to evaluate the classification abilities of CAPSO-MLP, PSO-MLP, GSA-MLP and ICA-MLP, a comparison of the algorithms in terms of sensitivity, specificity and AUC is presented in Tables 4, 5 and 6.

In terms of sensitivity and specificity, the proposed method offers better results than the other algorithms; in particular, CAPSO-MLP yields better sensitivity and specificity on the testing (unseen) data.

As shown in Table 6, the proposed method provides the best AUC on both the training and testing sets, except for the Liver Disorders dataset. Among the algorithms, GSA-MLP has the lowest AUC on both training and testing data. The better results of CAPSO-MLP are also observable in Fig. 10: as the ROC curves for the training and testing Heart Disease data illustrate, CAPSO-MLP yields the best AUC on this dataset.

Fig. 10
figure 10

ROC curve of a training data for Heart Disease and b testing data for Heart Disease

Further, in order to determine whether the results achieved by the algorithms are statistically different from each other, the \(t\)-test (Box et al. 2005) and Wilcoxon's signed-ranks test (Wilcoxon 1945) are conducted between the results obtained by the proposed CAPSO-MLP and those of PSO-MLP, GSA-MLP and ICA-MLP. The tests are performed on the training and testing accuracies for each dataset. In these tests, a \(p\)-value is computed and, as shown in Table 7, if the \(p\)-value is greater than the significance level \(\alpha = 0.05\), there is no significant difference between the algorithms; otherwise, the difference is significant. In the table, the symbols '\(>\)' and '\(=\)' mean that the algorithm on the left side is, respectively, significantly better and not significantly better than the algorithm on the right side. As illustrated, there are significant differences between CAPSO-MLP and the other algorithms.
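As an illustration of how these tests can be computed, the following sketch applies SciPy to hypothetical accuracy values from 10 runs of two algorithms (the numbers are illustrative only and are not taken from Table 7; the use of paired tests is likewise an assumption of the sketch):

```python
import numpy as np
from scipy import stats

# Hypothetical testing accuracies over 10 independent runs (illustrative values only)
capso_acc = np.array([81.9, 82.4, 80.7, 81.5, 82.0, 81.1, 82.6, 81.8, 80.9, 82.2])
pso_acc   = np.array([79.5, 80.1, 78.9, 79.8, 80.4, 79.2, 80.0, 79.6, 78.7, 79.9])

t_stat, t_p = stats.ttest_rel(capso_acc, pso_acc)   # paired t-test
w_stat, w_p = stats.wilcoxon(capso_acc, pso_acc)    # Wilcoxon signed-ranks test

alpha = 0.05
print("t-test p-value:", t_p, "significant:", t_p < alpha)
print("Wilcoxon p-value:", w_p, "significant:", w_p < alpha)
```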

Table 7 Results of \(t\)-test and Wilcoxon’s signed-ranks test taking into account accuracy rate

5 Conclusions

In this paper, an improved scheme combining the particle swarm optimization (PSO) algorithm and Newton's laws of motion, called centripetal accelerated particle swarm optimization (CAPSO), has been proposed to accelerate the learning and convergence of classifiers. CAPSO has been adapted to the MLP network to optimize the weights and accuracy of the network. The results on nine medical disease diagnosis benchmarks indicate that the proposed method outperforms PSO-MLP, GSA-MLP and ICA-MLP in terms of convergence rate and classification accuracy. The superiority of CAPSO-MLP over the other methods is considerable, especially on unseen (testing) data and on datasets with many missing values.

The main advantages of CAPSO for neural network learning are its simple concept, ease of implementation and the fact that no algorithm-specific parameters need to be tuned. The algorithm requires only common control parameters such as the number of generations and the population size. Therefore, it can be concluded that the proposed method is a viable tool for evolving neural network learning and accuracy.