1 Introduction

Artificial Neural Networks (ANNs) are among the most useful computational modeling tools in machine learning. Consequently, ANNs have attracted attention in many disciplines, including engineering, medicine, agriculture, technology, business, and the arts. To date, they have been applied as a reliable method to challenging tasks such as classification, image processing, speech recognition, and natural language processing. An ANN provides a parallel computing system composed of many simple processors, inspired by biological neural networks and by organizational principles observed in the human brain. In an ANN, the connection weights that store information are combined in parallel and sequential arrangements, which form the network architecture. In many real-world applications, feed-forward neural networks (FNNs) are the most popular type of neural network.

The concept of learning refers to the process of finding optimal weight values in this architecture. The success of ANNs in problem-solving depends largely on the training of the network and the performance of the learning algorithm used in the training phase [1, 8, 13, 23, 24]. Several learning algorithms have been used in the literature. The most popular training methods are based on mathematical error-reduction techniques such as back-propagation (BP) [6], Gradient Descent (GD) [29], Conjugate Gradient [36], Newton's Method [5], and Levenberg–Marquardt (LM) [12]. However, owing to factors such as nonlinearity and/or high dimensionality of the problems, derivative-based learning algorithms may not always be sufficient for training neural networks.

Many metaheuristic optimization algorithms have been widely used for solving NP-hard problems such as the training of FNNs, since these algorithms establish a balance between exploration and exploitation and can thereby provide optimal solutions in the search space [17]. Some of the most-cited algorithms developed to date are Artificial Bee Colony (ABC) [16], Differential Evolution Algorithms (DEA) [32], Genetic Algorithms (GA) [14], Simulated Annealing (SA) [19], Particle Swarm Optimization (PSO) [18], Grey Wolf Optimizer [21], Firefly Algorithm (FA) [40], Butterfly Optimization Algorithm [3], Biogeography-Based Optimization Algorithm [30], and Gravitational Search Algorithm [28]. Apart from these, more than 250 metaheuristic algorithms have been presented in the literature for various purposes, motivated by the No Free Lunch (NFL) theorem [37].

This paper focuses on the Vortex Search (VS) optimization algorithm [9], which is adapted here as a learning method for the training of FNNs. The VS algorithm was recently proposed for numerical function optimization and uses a multivariate Gaussian distribution to generate candidate solutions. The search space is narrowed over the iterations by means of the inverse incomplete gamma function; thus, exploitation is intensified progressively while exploration dominates the early iterations.

To the best of our knowledge, this is the first study in which the efficiency of VS is investigated for the training of FNNs; accordingly, a VS-based learning method for FNNs (VS-FNN) is proposed in this paper. In order to adapt the VS algorithm to the training phase, the training process is treated as an optimization problem, and all weights and bias values in the FNN architecture are systematically stored in a matrix that is generated and optimized as a candidate solution in the search space. In other words, each weight value in this matrix represents a component of the vortex center point in the algorithm.

1.1 The motivation and contribution

As reported in [9], VS was tested on 50 benchmark mathematical functions and the results were compared with both single-solution-based (SA and Pattern Search) and population-based (PSO and ABC) algorithms. It was reported that VS outperformed, or at least competed with, the other algorithms, which indicates that VS maintains a strong balance between exploration and exploitation. Our motivation for this work is that VS had not previously been investigated for the training of FNNs, although it has been reported to be very competitive against state-of-the-art algorithms.

The efficiency of the VS algorithm is examined in the training of FNNs by comparing it with the ABC, PSO, SA, GA, and SGD algorithms. However, finding the optimal network architecture that yields the lowest possible training error is not within the scope of this study. The aim is to demonstrate that the VS algorithm is as competitive as other metaheuristics in training neural networks and to analyze its performance. To crosscheck the accuracies of the trained FNNs, the algorithms are run on six multi-class datasets: 3-bit XOR, Iris Classification, Wine-Recognition, Wisconsin-Breast-Cancer, Thyroid-Disease, and Pima-Indians-Diabetes. In summary, the contributions of the VS-based learning algorithm can be summarized as follows:

  • To make the first attempt to investigate the VS algorithm for the training of FNNs.

  • To demonstrate the effectiveness of its parameter-free nature, which requires no user-tuned parameters for adjusting the weights of an FNN.

  • To show that it can achieve an accuracy at least as high as that of other successful algorithms while consuming less computation time.

The rest of the paper is organized as follows: related previous works are introduced in Sect. 2. Then, the FNN is briefly defined in Sect. 3. The Vortex Search optimization algorithm is described in Sect. 4. After that, the VS-based learning method is presented in Sect. 5. The experimental results are reported and evaluated in Sect. 6. Finally, the conclusions are given in Sect. 7.

2 Related works

The focus here is on adjusting the optimal weights and biases in an FNN structure. In this respect, training neural networks can be regarded as an important nonlinear optimization problem. In the literature, besides gradient-based methods, a large number of metaheuristics have been presented to train neural networks. Most of the commonly used algorithms are based on evolutionary techniques and swarm intelligence.

One of the first studies in this field was presented by [25], who used a GA-trained FNN to classify data from a sonar image dataset. They reported that GA could achieve better results than BP, particularly on domain-specific problems. In 1998, [35] presented a training method combining gradient descent with SA and claimed that it maintains quick convergence without becoming trapped in local optima during the training process.

In the literature, many papers have been presented on training FNNs with PSO. For example, (Gudise and Venayagamoorthy 2003) made a comparative study between BP and PSO for training neural networks and, considering the computational requirements, concluded that PSO converges faster than BP. Then, [41] presented a hybrid learning algorithm combining PSO with BP for the same purpose, exploiting the global search ability of PSO at the beginning of the iterations and later using BP for local search. They used three benchmark problems (3-bit parity, function approximation, and Iris classification) and reported that their proposed method reached better convergence than PSO and BP alone. In another study, [31] adapted ant colony optimization to continuous optimization and applied it to the training of feed-forward neural networks; the obtained results were evaluated by the authors as being as successful as GD. [15] used BP and PSO for training neural networks to solve medical diagnostic problems, evaluating the performance of global versus local techniques on three medical diagnostic problems from the Proben1 dataset; BP was found to be more efficient than PSO. [39] trained neural networks with a combined method involving opposition-based PSO and BP with momentum, examined the hybrid method on eight well-known datasets, and reported training times and classification accuracies that showed the superiority of the method.

ABC is another metaheuristic proposed for FNN training in the literature. [26] designed an ABC-based algorithm to solve the XOR, 3-bit parity, and encoder–decoder test problems as well as classification problems. The performance of the ABC algorithm for training neural networks was compared with the BP, LM, GA, and PSO algorithms, and the ABC-based training method was reported to reach the most successful results by obtaining the highest classification accuracy. (Brajevic and Tuba 2013) used FA to train neural networks with two different transfer functions, sigmoid and sine, and compared the performance of FA with the results of ABC and GA on the XOR, 3-bit parity, and 4-bit parity benchmarks. According to the experimental results, they concluded that the order of superiority among the three algorithms was ABC, FA, and GA, respectively. A new study on FNN training with ABC was recently proposed by [38], who modified ABC to accelerate convergence, added a new selection method to improve performance, and compared the training results with those of ABC variants.

Some of the various metaheuristic optimization algorithms recently proposed as learning methods for FNNs are mentioned below. For example, Mirjalili et al. [22] focused on the Gravitational Search Algorithm (GSA) for training FNNs and proposed a hybrid learning method (PSOGSA) combining GSA and PSO. They compared the hybrid method with the original GSA and PSO on three benchmark problems; PSOGSA achieved better results in terms of convergence speed and accuracy.

In another recent study, the bird mating optimizer (BMO), inspired by the mating behavior of birds, was used by [4] for the training of neural networks. They reported that the BMO-based training method achieved promising results on three benchmark datasets (Iris, Wisconsin-Breast-Cancer, and Pima-Indians-Diabetes) and on a fuel cell system.

Piotrowski [27] investigated the performance of previous DEA-based training methods and argued that stagnation is the main reason why DEA fails compared with other methods. To overcome this problem, he reported achieving satisfactory results by combining global and local neighborhood-based mutation operators with the trigonometric mutation operator.

Tang et al. [34] adapted the Dynamic Group Optimization (DGO) algorithm for training FNNs. They used the DGO algorithm to find the optimal weights and bias values as well as to determine the structure of the FNN. Based on the reported experiments, they claimed that DGO is a suitable method for FNN training.

Swain et al. [33] proposed a hybrid metaheuristic combining the Gravitational Search algorithm and PSO for training FNN to diagnose the faults in wireless sensor networks.

3 Feed-forward neural-network

A Feed-Forward Neural Network (FNN) is the most popular type of neural network; in an FNN, data flow only in the forward direction, and FNNs can solve classification and regression problems effectively. The basic processing unit of an FNN is the neuron, which is inspired by biological nerve cells. Figure 1 shows the structure of an artificial neuron. Each neuron generates one output signal for an n-dimensional input vector. To accomplish a particular task, neural networks are trained by updating the values of the connections between neurons. These connections are usually called weights, and they do not form a cycle.

Fig. 1
figure 1

The structure of a neuron

In an FNN, the neurons are grouped into three types of layers: (i) an input layer containing as many neurons as problem inputs, (ii) one or more hidden layers, each containing at least one neuron, which are required for solving non-linear NP-hard problems, and (iii) an output layer containing as many neurons as problem outputs. The numbers of neurons in the input and output layers depend on the problem, while the number of neurons in each hidden layer and the number of hidden layers are defined by the decision-makers who set the architecture to be applied [2]. The schematic of a general three-layer FNN is shown in Fig. 2.

Fig. 2
figure 2

A general three-layer FNN

First, the weighted sum of the inputs is calculated by Eq. (1), and then the output (\(y\)) is calculated by the sigmoid function given in Eq. (2).

$$net=\sum_{i=1}^{n}{w}_{ij}{x}_{i}+{w}_{bias},$$
(1)
$$y= f\left(net\right)=\frac{1}{1+{e}^{-net}},$$
(2)

where \({x}_{i}\) indicates the i-th input with \(i\in \left\{1,2,\dots ,n\right\}\), and \({w}_{ij}\) is the weight between the i-th input and the j-th neuron. The red input with a constant value of one in Fig. 1 is the bias, a weight of the neuron that is independent of the inputs; it is applied to the hidden and output layers in order to balance the weight values during the training phase. The output of each neuron is calculated by a transfer function such as a linear, radial, sigmoid, or hyperbolic tangent function.
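For concreteness, the computation in Eqs. (1)–(2) can be sketched in a few lines of Python; the weights and inputs below are arbitrary illustrative values, not values used in the experiments.

```python
import numpy as np

def neuron_output(x, w, w_bias):
    """Single-neuron forward pass: weighted sum (Eq. 1) + sigmoid transfer (Eq. 2)."""
    net = np.dot(w, x) + w_bias            # net = sum_i w_i * x_i + w_bias
    return 1.0 / (1.0 + np.exp(-net))      # y = 1 / (1 + exp(-net))

# Illustrative example with three inputs and arbitrary weights.
x = np.array([0.5, 0.1, 0.9])
w = np.array([0.4, -0.7, 0.2])
print(neuron_output(x, w, w_bias=0.1))     # a value in (0, 1)
```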

The aim of the learning procedure is to minimize the error caused by the difference between the calculated and expected output values. The ANN is trained with the inputs in the training set until the output of the network satisfies the expected output; output values for unseen inputs are then estimated using the trained network. To obtain acceptable results for complex, nonlinear, multivariable problems, an ANN should be trained with sufficient data using an appropriate learning algorithm.

4 Vortex search optimization algorithm

Vortex Search (VS) is a single-solution-based algorithm for numerical function optimization, recently proposed by [9]. The VS algorithm is inspired by the vortex pattern created by the rotational flow of stirred fluids. The algorithm narrows the search space over the iterations by using the inverse incomplete gamma function, so that it reaches the global optimum faster. In addition, VS does not require any user-tuned parameters.

Assuming a two-dimensional optimization problem, the schema of a vortex is formed by a series of nested circles. In the first iteration, the center of the largest circle \(\left({\mu }_{0}\right)\) is calculated by Eq. (3) for all dimensions.

$${\mu }_{0}=\frac{UpLimit+LowLimit}{2},$$
(3)

where \(UpLimit\) and \(LowLimit\) are the upper and lower boundaries of decision variables, respectively. Also, the initial radius \(\left({r}_{0}\right)\) is determined by Eq. (4). Initially, a large value for \({r}_{0}\) is selected since the search area must be fully covered by the outer circle.

$${r}_{0}={\sigma }_{0}=\frac{max\left(UpLimit\right)-min(LowLimit)}{2}.$$
(4)

Then the candidate solutions are randomly generated by using Gaussian distribution within the specified circle in d-dimensional search space. The formulation of Gaussian distribution is given in Eq. (5).

$$\begin{gathered} p\left( {x\,|\,\mu ,\Sigma } \right) = \frac{1}{{\sqrt {\left( {2\pi } \right)^{d} \left| \Sigma \right|} }}\exp \left\{ { - \frac{1}{2}\left( {x - \mu } \right)^{T} \Sigma^{ - 1} \left( {x - \mu } \right)} \right\} \hfill \\ \Sigma = \sigma^{2} \cdot \left[ I \right]_{d \times d} \hfill \\ \end{gathered}$$
(5)

where \(x\) is the \(d\times 1\) vector of a random variable, \(\Sigma\) is the covariance matrix, \({\sigma }^{2}\) is the variance of the distribution, and \(I\) denotes the \(d\times d\) identity matrix. After the candidate solutions are generated, the components of solutions that lie outside the boundaries are drawn back into the boundaries. This operation is given in Eq. (6).

$${s}_{m}^{j}=\begin{cases} \mathrm{rnd}\cdot \left({UpLimit}^{j}-{LowLimit}^{j}\right)+{LowLimit}^{j}, & {s}_{m}^{j}<{LowLimit}^{j}\\ {s}_{m}^{j}, & {LowLimit}^{j}\le {s}_{m}^{j}\le {UpLimit}^{j}\\ \mathrm{rnd}\cdot \left({UpLimit}^{j}-{LowLimit}^{j}\right)+{LowLimit}^{j}, & {s}_{m}^{j}>{UpLimit}^{j} \end{cases},$$
(6)

where \(m=\mathrm{1,2},\ldots,n\) and \(j=\mathrm{1,2},\ldots,d\) and \(\mathrm{rnd}\) is a random value in the range [0, 1].
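The initialization and sampling steps of Eqs. (3)–(6) can be summarized with the NumPy sketch below. This is not the authors' implementation; the function names and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_center_and_radius(low, up):
    """Eq. (3): mu0 is the midpoint of the bounds; Eq. (4): r0 = sigma0."""
    mu0 = (up + low) / 2.0
    r0 = (np.max(up) - np.min(low)) / 2.0
    return mu0, r0

def generate_candidates(mu, sigma, n, low, up):
    """Eq. (5): draw n candidates from N(mu, sigma^2 * I); Eq. (6): components
    falling outside the bounds are re-drawn uniformly inside the bounds."""
    s = rng.normal(loc=mu, scale=sigma, size=(n, mu.size))
    lo_b = np.broadcast_to(low, s.shape)
    up_b = np.broadcast_to(up, s.shape)
    out = (s < lo_b) | (s > up_b)
    s[out] = rng.random(out.sum()) * (up_b[out] - lo_b[out]) + lo_b[out]
    return s

# Illustrative run for a 2-dimensional problem bounded by [-50, 50].
low, up = np.full(2, -50.0), np.full(2, 50.0)
mu0, r0 = initial_center_and_radius(low, up)
candidates = generate_candidates(mu0, r0, n=10, low=low, up=up)
```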

The next operation is the selection of the best solution from the candidate solution set; the best solution is the one with the best fitness value among all current solutions. A greedy selection is then applied between this solution and the best solution found so far: the better one is stored in memory and replaces the current circle center \(\mu\). In this phase, the circle radius is decreased so that exploitation increases in the search space. The radius decreases slowly in the early iterations, since exploration is prioritized, and the reduction is accelerated during the second half of the iterations to ensure better exploitation. In the VS algorithm, this adaptive radius reduction is realized with the inverse incomplete gamma function. The gamma function \(\Gamma \left(a\right)\) is given in Eq. (7).

$$\begin{gathered} \Gamma \left( a \right) = \gamma \left( {x,a} \right) + \Gamma \left( {x,a} \right) \hfill \\ \gamma \left( {x,a} \right) = \int_{0}^{x} {e}^{-t}\,{t}^{a-1}\,dt, \quad a > 0 \hfill \\ \Gamma \left( {x,a} \right) = \int_{x}^{\infty } {e}^{-t}\,{t}^{a-1}\,dt, \quad a > 0 \hfill \\ \end{gathered}$$
(7)

where \(\gamma \left(x,a\right)\) and \(\Gamma \left(x,a\right)\) are the incomplete gamma function and its complement, respectively. The parameter \(a\) defines the inflexibility of the search, and \(x\) is a random variable. In each iteration, the radius is updated by Eq. (8).

$$\begin{gathered} r_{t} = \sigma_{0} \cdot \frac{1}{x} \cdot \Gamma \left( {x,a_{t} } \right) \hfill \\ a_{t} = a_{0} - t/MaxItr. \hfill \\ \end{gathered}$$
(8)

Here, in the first iteration \({a}_{0}\) equals 1 in order to cover the entire search space. \(t\) refers to the iteration number and \(MaxItr\) represents the maximum iteration number. This loop is repeated until a termination condition is satisfied.
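As a sketch of Eqs. (7)–(8), the radius schedule can be computed with SciPy's regularized inverse incomplete gamma function. Fixing \(x\) to 0.1 here is an illustrative assumption (the text above only states that \(x\) is a random variable), and \(\sigma_0\) corresponds to the initial radius of Eq. (4).

```python
import numpy as np
from scipy.special import gammaincinv   # inverse of the regularized lower incomplete gamma

def radius_schedule(sigma0, max_itr, x=0.1):
    """Radius values r_t following Eqs. (7)-(8): a_t decreases linearly from
    a0 = 1, and the radius shrinks through the inverse incomplete gamma function.
    x = 0.1 is an assumed illustrative value."""
    a0 = 1.0
    radii = []
    for t in range(max_itr):
        a_t = a0 - t / max_itr                       # stays in (0, 1]
        radii.append(sigma0 * (1.0 / x) * gammaincinv(a_t, x))
    return np.array(radii)

r = radius_schedule(sigma0=50.0, max_itr=100)
print(r[0], r[50], r[99])   # large at the start, approaching zero at the end
```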

5 VS-based feed-forward neural network

Several improvements can be made to increase the performance of FNNs, and the major ones fall into three groups. The first is the architecture of the FNN: the number of hidden layers and the number of neurons in each hidden layer are crucial parameters that influence learning performance. The second is the choice of transfer function and of the user-defined values of the training method (such as epoch size and learning rate) for the problem being addressed. The last is the determination of the best connection weights.

In this study, we focus on the third group: the VS algorithm is used to determine the best weights and biases by minimizing the error of an FNN. It is executed as a training method that tunes all the weights of an FNN with a specified architecture. For this purpose, the weights and biases are the decision variables of the search space in the optimization process; they are arranged in a matrix, and candidate solutions are generated according to that matrix.

Figure 3 indicates the distribution of the weights and biases for an FNN with a 2–2–1 structure. In this study, the matrix encoding strategy shown below is used for training FNNs. An encoded candidate solution matrix (CS) consists of four weight vectors: W1, B1, W2, and B2. These are represented as follows:

$$W1=\left[\begin{array}{cc}{W}_{13}& {W}_{23}\\ {W}_{14}& {W}_{24}\end{array}\right]$$
$$B1=\left[\begin{array}{c}{W}_{B1}\\ {W}_{B2}\end{array}\right]$$
$$W2^{\prime}=\left[\begin{array}{c}{W}_{36}\\ {W}_{46}\end{array}\right]$$
$$B2=\left[{W}_{B3}\right]$$
$$CS=\left\{W1,\;B1,\;W2^{\prime},\;B2\right\}=\left\{{W}_{13}, {W}_{14}, {W}_{23}, {W}_{24}, {W}_{B1}, {W}_{B2}, {W}_{36}, {W}_{46}, {W}_{B3}\right\}$$

where W1 is the hidden-layer weight matrix, B1 is the hidden-layer bias matrix, W2 is the output-layer weight matrix, W2′ is the transpose of W2, and B2 is the output-layer bias matrix.
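The encoding can be illustrated with a short helper that flattens the four blocks of the 2–2–1 network in Fig. 3 into one vector and restores them. The block ordering used here (W1, B1, W2, B2) and the row-major flattening are assumptions for illustration; any consistent ordering serves the same purpose.

```python
import numpy as np

# Block shapes for the 2-2-1 FNN of Fig. 3: W1 (2x2), B1 (2), W2 (2x1), B2 (1).
SHAPES = {"W1": (2, 2), "B1": (2,), "W2": (2, 1), "B2": (1,)}

def encode(params):
    """Flatten W1, B1, W2, B2 into a single candidate-solution vector CS."""
    return np.concatenate([params[k].ravel() for k in SHAPES])

def decode(cs):
    """Rebuild the weight and bias blocks from a candidate-solution vector CS."""
    params, i = {}, 0
    for k, shape in SHAPES.items():
        size = int(np.prod(shape))
        params[k] = cs[i:i + size].reshape(shape)
        i += size
    return params

cs = np.arange(9, dtype=float)       # 4 + 2 + 2 + 1 = 9 decision variables
assert np.allclose(encode(decode(cs)), cs)
```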

Fig. 3
figure 3

A candidate solution for a 2–2–1 FNN

The training of an FNN with VS, i.e., the VS-FNN method, starts after a suitable FNN architecture is defined for the selected dataset. To find the best values of the weights and biases, the search space is constructed according to the CS matrix, which means that the dimension of the problem equals the total number of weights and biases. Therefore, as is done for large-scale optimization problems, the number of candidate solutions generated in one iteration can be increased for complex FNN architectures compared with lower-dimensional FNNs. The next step of the algorithm is to calculate the first center point and radius using Eqs. (3) and (4), respectively. Then, the candidate solutions are randomly generated within the determined radius around this center. The solution with the lowest training error is selected as the new center point for the next iteration, and the search space is narrowed by decreasing the radius. Thus, the center point evolves over the iterations.

Since the training of an FNN is treated as an optimization problem, an objective function has to be defined to evaluate the fitness of the weight values generated during the optimization process. In the proposed approach, the mean training error is used as the objective function. For each sample in the dataset, the feed-forward calculation is applied first; then the errors, i.e., the differences between the calculated and expected values, are obtained. Finally, the mean squared error (MSE) is calculated by Eq. (9).

$$\mathrm{MSE}\left(\vec{\omega}\left(t\right)\right)=\frac{1}{N}\sum_{j=1}^{N}\sum_{k=1}^{P}{\left({d}_{k}-{o}_{k}\right)}^{2},$$
(9)

where \(\vec{\omega}\left(t\right)\) denotes the connection weights at iteration \(t\), \({d}_{k}\) represents the desired output value and \({o}_{k}\) the produced output value of sample \(j\), \(P\) is the number of output neurons, and \(N\) is the number of samples.
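A hedged sketch of the objective function is given below: it decodes a candidate-solution vector for a generic I–H–O network, performs the feed-forward pass of Sect. 3, and evaluates Eq. (9). The packing order follows the CS encoding sketched above and is an assumption, as is the illustrative 3–4–1 example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_objective(cs, X, D, n_in, n_hid, n_out):
    """Mean squared training error of Eq. (9) for an I-H-O FNN whose weights
    and biases are packed in cs as (W1, B1, W2, B2)."""
    i = 0
    W1 = cs[i:i + n_in * n_hid].reshape(n_in, n_hid);   i += n_in * n_hid
    B1 = cs[i:i + n_hid];                               i += n_hid
    W2 = cs[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    B2 = cs[i:i + n_out]
    H = sigmoid(X @ W1 + B1)                  # hidden-layer outputs
    O = sigmoid(H @ W2 + B2)                  # network outputs o_k
    return np.sum((D - O) ** 2) / X.shape[0]  # (1/N) * sum_j sum_k (d_k - o_k)^2

# Illustrative evaluation on the 3-bit parity inputs with a random 3-4-1 network.
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], dtype=float)
D = (X.sum(axis=1) % 2).reshape(-1, 1)        # parity (XOR) targets
cs = np.random.uniform(-50, 50, size=3 * 4 + 4 + 4 * 1 + 1)
print(mse_objective(cs, X, D, n_in=3, n_hid=4, n_out=1))
```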

VS-FNN continues until the maximum number of cycles is reached. The flowchart of VS-FNN can be seen in Fig. 4.

Fig. 4
figure 4

Flowchart of VS-based learning method for FNN

6 Experimental results

In this study, the proposed VS-based learning method (VS-FNN) is compared with the PSO, ABC, GA, SA, and SGD algorithms, all applied to an FNN with the same structure. These methods are referred to as PSO-FNN, ABC-FNN, GA-FNN, SA-FNN, and SGD, respectively. To evaluate the performance of the proposed VS-FNN method, all methods are applied to six benchmark problems. The first is the three-bit parity (3-bit XOR) problem, whose inputs and expected outputs are shown in Table 1. The other problems are five popular datasets available in the UCI Machine Learning Repository of the University of California, Irvine [10]: Iris classification, Wine-Recognition, Wisconsin-Breast-Cancer, Pima-Indians-Diabetes, and Thyroid-Disease. In this paper, for simplicity, the dataset names are abbreviated in some places as iris, wine, WBC, PID, and thyroid, respectively.

Table 1 Three bits parity problem

These datasets are well-known benchmark problems frequently used in the literature to evaluate the classification performance of algorithms. The numbers of attributes, classes, and samples in each dataset are given in Table 2. In addition, the attributes of the datasets are listed in Fig. 5.

Table 2 Benchmark datasets
Fig. 5
figure 5

The list of attributes for each dataset

To compare the results consistently, the same values are assigned to the common parameters of all algorithms; these are listed in Table 3. For these benchmark problems, a single hidden layer is used and every weight value is randomly initialized in the range of [− 50, 50]. The maximum number of iterations is set to 100 for all problems, except for the three-bit parity problem, for which it is set to 500.

Table 3 Common parameters for all algorithms

In addition, there are individual user-tuned parameters for each algorithm. For the PSO algorithm, a fully connected topology is used. The cognitive constant (C1) and the social constant (C2) are both set to 2. The parameters \({r}_{1}\) and \({r}_{2}\) are randomly generated in the range [0, 1], and \(w\) decreases linearly from 0.9 to 0.1. The initial velocities of the particles are set to one-tenth of their initial \(pbest\) values.

For the ABC algorithm, the limit value is set to the product of the problem dimension and food number. For the GA, a real-coded algorithm is run over the problems. Binary tournament selection, arithmetic crossover (probability = 0.7), and uniform mutation (probability = 0.01) operators are used. For the SA algorithm, the initial temperature is set to 1000 and the final temperature is 1. For the SGD algorithm, the learning rate is set to 0.2 and the number of epochs is 2000.

The outcomes of the methods are compared based on the average, standard deviation (Std dev), median, interquartile range (IQR), best, and worst of the Mean Squared Error (MSE) values over 30 independent runs. In addition, the elapsed time for each run on every test problem is recorded, and the average time in seconds is calculated for each algorithm.

6.1 Selected FNN structures

In this study, the attributes of the datasets are normalized to the range [0, 1]. Then, all algorithms are applied to the datasets with the same architecture. In the selected FNN structures, two bias units are included in the network, one for the hidden layer and one for the output layer, and the sigmoid function is used as the activation function for both layers.

For each dataset, the FNNs with structure I–H–O are used to solve the classification problems, where I is the number of attributes, O is the number of classes, and H is the number of hidden nodes. The performances of FNNs are compared with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30. For these 12 different structures, each test problem is trained by the algorithms for 30 independent runs. The training phase continues until the maximum number of iterations has been completed.
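For reference, the dimension of the search space implied by this setup, i.e., the total number of weights and biases optimized for an I–H–O structure with one bias unit per layer, can be computed as in the following sketch; the Iris example with H = 10 is only illustrative.

```python
def n_decision_variables(i_nodes, h_nodes, o_nodes):
    """Total weights and biases for an I-H-O FNN with one bias unit feeding the
    hidden layer and one feeding the output layer (W1 + B1 + W2 + B2)."""
    return i_nodes * h_nodes + h_nodes + h_nodes * o_nodes + o_nodes

# Example: the Iris structure 4-10-3 gives 4*10 + 10 + 10*3 + 3 = 83 variables.
print(n_decision_variables(4, 10, 3))
```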

6.2 Results and discussion

The statistical results and the rank-based comparisons of the algorithms for each test problem are provided in Tables 4–21. For each dataset, the best results are indicated in bold type in the tables. Further, the convergence curves are depicted in Figs. 6–11; parts (a)–(l) of each figure are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively.

Table 4 The statistical results of MSE over 30 independent runs for the 3-bit XOR problem
Table 5 The comparison of the ranks of the algorithms for the results of 3-bit XOR problem with respect to average MSE
Table 6 The comparison of the ranks of the algorithms for the results of 3-bit XOR problem with respect to median MSE
Table 7 The statistical results of MSE over 30 independent runs for the Iris benchmark problem
Table 8 The comparison of the ranks of the algorithms for the results of Iris benchmark problem with respect to average MSE
Table 9 The comparison of the ranks of the algorithms for the results of Iris benchmark problem with respect to median MSE
Table 10 The statistical results of MSE over 30 independent runs for the Wine benchmark problem
Table 11 The comparison of the ranks of the algorithms for the results of Wine benchmark problem with respect to average MSE
Table 12 The comparison of the ranks of the algorithms for the results of Wine benchmark problem with respect to median MSE
Table 13 The statistical results of MSE over 30 independent runs for the WBC benchmark problem
Table 14 The comparison of the ranks of the algorithms for the results of WBC benchmark problem with respect to average MSE
Table 15 The comparison of the ranks of the algorithms for the results of WBC benchmark problem with respect to median MSE
Table 16 The statistical results of MSE over 30 independent runs for the PID benchmark problem
Table 17 The comparison of the ranks of the algorithms for the results of PID benchmark problem with respect to average MSE
Table 18 The comparison of the ranks of the algorithms for the results of PID benchmark problem with respect to median MSE
Table 19 The statistical results of MSE over 30 independent runs for the Thyroid benchmark problem
Table 20 The comparison of the ranks of the algorithms for the results of Thyroid problem with respect to average MSE
Table 21 The comparison of the ranks of the algorithms for the results of Thyroid problem with respect to median MSE
Fig. 6
figure 6

Convergence curves of VS-FNN, PSO-FNN, ABC-FNN, GA-FNN, and SA-FNN based on averages of MSE values over 30 independent runs in the 3-bit XOR problem. (a)–(l) are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively

Fig. 7
figure 7

Convergence curves of VS-FNN, PSO-FNN, ABC-FNN, GA-FNN, and SA-FNN based on averages of MSE values over 30 independent runs in the Iris benchmark problem. (a)–(l) are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively

Fig. 8
figure 8

Convergence curves of VS-FNN, PSO-FNN, ABC-FNN, GA-FNN, and SA-FNN based on averages of MSE values over 30 independent runs in the Wine benchmark problem. (a)–(l) are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively

Fig. 9
figure 9

Convergence curves of VS-FNN, PSO-FNN, ABC-FNN, GA-FNN, and SA-FNN based on averages of MSE values over 30 independent runs in the WBC benchmark problem. (a)–(l) are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively

Fig. 10
figure 10

Convergence curves of VS-FNN, PSO-FNN, ABC-FNN, GA-FNN, and SA-FNN based on averages of MSE values over 30 independent runs in the PID benchmark problem. (a)–(l) are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively

Fig. 11
figure 11

Convergence curves of VS-FNN, PSO-FNN, ABC-FNN, and GA-FNN based on averages of MSE values over 30 independent runs in the Thyroid benchmark problem. (a)–(l) are the convergence curves for FNNs with H = 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 20, and 30, respectively

Considering the experimental results, the convergence graphs, and the statistical results given in the tables, the proposed VS-FNN method achieves more successful results than all the compared algorithms in most cases on all datasets. The main reason why VS-FNN can be superior to the other algorithms is its more efficient exploitation process, obtained by iteratively narrowing the search space around the global optimum.

As is well known, the operation of metaheuristics depends on two processes, exploration and exploitation, and the performance of an algorithm depends on how well it balances the two. In general, it is expected that as the iterations progress the algorithm converges towards the global optimum according to its exploration strategy; then, for exploitation to be carried out effectively, the new candidate solutions produced through neighborhood relationships begin to lie close to each other in the search space.

At this point, VS employs strong locality in the exploitation process, unlike many other algorithms: it allows more efficient exploitation by systematically and rapidly narrowing the search space, especially after half of the maximum number of iterations. In this paper, it is shown that VS-FNN converges to the global optimum more effectively when it is applied with suitable control parameter values for a selected dataset. This can be seen from the convergence graphs given below; especially after the 50th iteration, VS-FNN provides faster convergence than the other algorithms.

Another advantage of VS-FNN that can be deduced from the experimental results is that its runtime is often lower than that of the other algorithms, since the VS algorithm has a very simple structure in terms of computational complexity. Instead of generating new candidate solutions through mating or complex neighborhood relationships, VS simply samples from a Gaussian distribution within the narrowed search space. This simple strategy makes the algorithm run fast.

6.2.1 The N bit parity problem

The N-bit parity problem has frequently been used to demonstrate the effectiveness of training algorithms, since it is considered a difficult task for neural networks. The target output is the parity of a binary string, i.e., the XOR of its bits. FNNs with structures 3–H–1 are trained by the algorithms. Table 4 shows the statistical results for the 3-bit parity problem; values below \({10}^{-20}\) are considered zero. For all numbers of hidden nodes, VS-FNN and PSO-FNN achieve better results, especially for the median, best, mean, standard deviation, and IQR of the MSE. The rank-based comparisons of the algorithms for the 3-bit parity problem are provided in Tables 5 and 6 according to the average and median of the MSE, respectively. As shown in these tables, for all numbers of hidden nodes, the VS-FNN and PSO-FNN methods outperform all other methods in the majority of cases. For FNNs with H = 4, 5, 6, and 7, the ABC-FNN method has better results for the average MSE. These results indicate that VS-FNN has a better or at least competitive ability to avoid local minima on this test problem. Figure 6 shows the convergence curves of all algorithms based on the average of the MSE values and confirms that VS-FNN reaches the most accurate results with a steady convergence rate.
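For clarity, the 3-bit parity task of Table 1 can be enumerated in a few lines; this snippet only restates the problem definition and is not part of the experimental setup.

```python
from itertools import product

# Each 3-bit input maps to the XOR (parity) of its bits, as in Table 1.
for bits in product((0, 1), repeat=3):
    print(bits, "->", bits[0] ^ bits[1] ^ bits[2])
```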

6.2.2 The iris classification problem

The iris classification problem involves one of the best-known datasets in the field of machine learning. It contains samples for classifying iris flowers into three species (Setosa, Versicolor, and Virginica) based on measurements of the length and width of the sepals and petals. FNNs with structures 4–H–3 are trained by the algorithms. The statistical results for the Iris benchmark problem are presented in Table 7. In terms of average MSE, VS-FNN has the best values in 7 of the 12 FNNs (H = 7, 8, 9, 10, 15, 20, and 30). Moreover, the proposed method ranks second in 4 of the 12 FNNs (H = 4, 6, 11, and 13) and third only for the FNN with 5 hidden nodes among the six algorithms applied to this problem. Similarly, for the median MSE, the VS-FNN method has the best values in 7 of the 12 FNNs and the second-best values for the remaining structures. The rank-based comparisons are given in Tables 8 and 9 with respect to the mean and median of the MSE, respectively. As seen in these tables, the winning method is VS-FNN, closely followed by PSO-FNN. Figure 7 shows the convergence curves of all algorithms for the Iris problem based on the average of the MSE values; especially after half of the maximum number of iterations, VS provides faster convergence thanks to its strong locality feature.

6.2.3 The wine recognition problem

The wine recognition problem is a well-known classification dataset containing the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. In this section, FNNs with structures 13–H–3 are trained by the algorithms. The comparative training results for this problem are shown in Table 10. It can be seen that the VS-FNN method has the best average MSE values for only two FNNs (H = 6 and 9). Although VS-FNN has the second-best values for the remaining FNN structures, it trails PSO-FNN on this problem. Nevertheless, VS-FNN outperforms the ABC-FNN, GA-FNN, SA-FNN, and SGD methods and obtains results competitive with PSO-FNN; the rank-based comparisons provided in Tables 11 and 12 confirm this. On the other hand, although the GA-FNN method also achieves good results for this problem, it lags behind PSO, VS, and ABC in terms of runtime. Figure 8 shows the convergence curves of all algorithms based on the average of the MSE values and confirms that VS-FNN has a remarkable performance with a competitive convergence rate for all FNNs.

6.2.4 The WBC problem

The Wisconsin-Breast-Cancer (Original) dataset is another well-studied benchmark problem containing samples of the most common malignancy among women, to be classified as benign or malignant [20]. FNNs with structures 10–H–2 are trained by the algorithms. The comparative results for this problem are presented in Table 13. For all numbers of hidden nodes, VS-FNN has the best values for the median and IQR. With regard to the average and standard deviation, VS-FNN and PSO-FNN have very close values and jointly rank best, as can be seen from Tables 14 and 15: the number of first ranks for average MSE is equal for VS-FNN and PSO-FNN, while the number of first ranks for median MSE is 12 for VS-FNN. These results indicate that the proposed VS-based learning method can reach the most accurate results and avoid local minima for this problem. Figure 9 shows the convergence curves of all algorithms for the WBC problem and confirms that VS-FNN has the best convergence rate for almost all values of hidden nodes.

6.2.5 The PID problem

The Pima-Indians-Diabetes dataset consists of diagnostic measurements of female patients of Pima Indian heritage who are at least 21 years old. With this dataset, a classification is made to predict whether a patient has diabetes. The PID dataset has frequently been used as a benchmark in the field of machine learning. In this study, FNNs with structures 8–H–2 are trained by the algorithms. Table 16 shows the statistical results of the MSE values over 30 independent runs for the PID problem. Tables 17 and 18 provide the rank-based comparisons for the PID problem with respect to the average and median MSE, respectively. From these tables, it can be seen that VS-FNN performs better than the other algorithms: it is superior in both average and median MSE for 10 of the 12 FNNs, and in almost all structures it achieves more accurate values than the others. Figure 10 shows the convergence curves of all algorithms based on the average of the MSE values for the PID problem. These curves reflect the ability of VS-FNN to balance exploration and exploitation: it has weak locality until the 50th iteration, and after half of the maximum number of iterations the radius decreases significantly, so that the algorithm has strong locality.

6.2.6 The thyroid-disease problem

The Thyroid-Disease dataset is one of the most commonly used datasets for classification systems in the machine learning literature. It contains samples for classifying whether a given patient is (i) normal, or suffers from (ii) hyperthyroidism or (iii) hypothyroidism. For this purpose, FNNs with structures 21–H–3 are trained by the algorithms. The statistical results for the thyroid-disease benchmark problem are presented in Table 19. It can be seen that VS-FNN performs a competitive learning process together with the GA-FNN and PSO-FNN methods, and it has better results in many of the 12 different FNN structures. The rank-based comparisons for the thyroid-disease problem are provided in Tables 20 and 21 according to the average and median of the MSE, respectively. As can be inferred from these tables, the proposed method achieves one of the best rankings among the state-of-the-art algorithms. Figure 11 shows the convergence curves of all algorithms, except SA, based on the average of the MSE values; the results of SA are not included so that the curves of the other algorithms can be seen in more detail. From these curves, all algorithms except ABC present convergence rates close to each other.

7 Conclusions

Neural networks have always been one of the most important topics in machine learning, and the FNN is the most popular type of neural network in many applications. The success of FNNs used for classification and prediction on many real-world problems depends largely on the correct training of the network and on the performance of the training method. Classical derivative-based learning algorithms can be insufficient for training FNNs due to factors such as the non-linearity of the problems and very large dimensions. In the literature, many metaheuristic optimization algorithms have been proposed to determine the optimal weight values for such problems.

The main contribution of this paper is that it is the first attempt to investigate the performance of the Vortex Search optimization algorithm in the FNN training process. The VS algorithm merits this attention thanks to its noteworthy performance and its features, such as not requiring any user-tuned parameters, low computational complexity, fast convergence, and a simple but effective search method based on the Gaussian distribution.

In this study, the VS algorithm is adapted to the training process of FNNs as a new training method called VS-FNN. The performance of the proposed method was compared with that of FNNs trained with five other algorithms: SGD, ABC, PSO, SA, and GA. Under the same running conditions, all algorithms were applied to six benchmark problems: the three-bit parity (3-bit XOR) problem, Iris classification, Wine-Recognition, Wisconsin-Breast-Cancer, Pima-Indians-Diabetes, and Thyroid-Disease. To compare the effectiveness of the proposed algorithm in more detail, all datasets were run with neural network structures containing twelve different numbers of hidden nodes. The experimental results obtained over 30 independent runs show that the proposed training method delivers better or competitive performance in terms of accuracy, convergence rate, and computation time for all benchmark problems. Therefore, it can be concluded that the VS-FNN method is a suitable algorithm for the training of FNNs.