1 Introduction

Radial basis function (RBF) networks are among the most popular and widely applied types of neural networks. RBF networks are universal approximators and can be considered a special form of multilayer feedforward neural network that contains only one hidden layer with Gaussian-based activation functions. RBF networks were first introduced by Broomhead and Lowe [8] with a strong foundation in conventional approximation theory [13, 16].

The advantages of RBF networks compared to other neural networks include their high generalization capability, simple and compact structure (i.e., only three layers), easier parameter adjustment, very good noise tolerance and high learning speed [26, 60]. Due to these advantages, RBF networks have become a common alternative to MLP networks [16]. Moreover, RBF networks have been successfully applied in many areas such as system identification [2, 36], process fault classification [31], nonlinear control [30] and time series forecasting [11, 18].

Like other neural networks, RBF networks have two major components: the structure and the training method. The training method has a significant influence on the performance of the network. In the literature, researchers have proposed and investigated a wide variety of learning schemes for RBF networks.

Castaño et al. [9] classified the training methods of RBF networks into two categories: quick learning and full learning. The quick learning methods are more popular; in them, the learning process is performed in two independent stages. In the first stage, the structure of the network (i.e., the centers and widths) is identified, usually by an unsupervised learning algorithm such as K-Means, while in the second stage the connection weights between the hidden and output layers are tuned using least mean squares (LMS), gradient-based methods or variations of the backpropagation algorithm [42]. The drawback of using an unsupervised technique to locate the centers is that it depends only on the input features and does not consider the distribution of the class labels [50]. Moreover, using the common K-Means algorithm does not necessarily guarantee that the centers are best located [33]. On the other hand, the main issue with gradient methods is that the search process is highly likely to be trapped in a local minimum. Moreover, Vakil-Baghmisheh and Pavešić reported in [48] that applying a customized version of the backpropagation algorithm to RBF networks can suffer from drawbacks such as slow convergence and over-training, which consequently affects the generalization ability of the model. Alternatively, the full learning methods optimize all RBF parameters simultaneously as a supervised task.

Nature-inspired metaheuristic algorithms have been widely investigated for evolving and training RBF networks. These algorithms are stochastic search techniques inspired by natural systems and phenomena. Most nature-inspired metaheuristics are population based and rely on randomness as an essential principle of their process. The advantages of these search methods are their flexibility, self-adaptation, conceptual simplicity and ability to search for a global optimum rather than a local one [17]. Nature-inspired metaheuristics have been deployed in different ways for training RBF networks. Some were applied to find one parameter of the network, such as the centers [28], others optimized all the parameters [43], while others investigated optimizing all the parameters along with the structure of the network. Algorithms applied to RBF network training include: Genetic Algorithms (GA) [6, 15, 21], particle swarm optimization (PSO) [25, 37, 40, 47], Ant Colony Optimization (ACO) [12, 45], Differential Evolution (DE) [4, 14, 58], the Firefly algorithm (FFA) [24], Cuckoo search (CS) [3, 10], Honey Bee Mating Optimization (HBMO) [23], Artificial Bee Colony (ABC) [29] and the BAT algorithm [46].

According to the No Free Lunch (NFL) theorem, no heuristic algorithm is guaranteed to perform better than all other algorithms on all optimization problems [7, 22, 49]. Motivated by this, in this work we propose a novel RBF training algorithm based on the recent biogeography-based optimizer (BBO) for optimizing the parameters (centers, widths and weights) of the RBF network simultaneously. BBO is an evolutionary algorithm developed by Simon [44]. It was inspired by studies of the geographical distribution of biological organisms in terms of time and space. Recently, BBO has been applied to training neuro-fuzzy networks [38] and feedforward MLPs [35] and has shown high modeling capability. However, to the best of our knowledge, there is no previous work investigating the efficiency of the well-regarded BBO algorithm in training any type of RBF network.

In order to evaluate the efficiency and effectiveness of the new BBO trainer, the proposed trainer is applied to twelve popular benchmark datasets selected from the UCI machine learning repository. The results of the proposed BBO trainer are compared with those obtained by eleven other algorithms. Six of the eleven are classical evolutionary algorithms: GA, PSO, DE, Evolutionary Strategy (ES), ACO and population-based incremental learning (PBIL). Four are recent nature-inspired algorithms: FFA, CS, ABC and the BAT algorithm. The eleventh algorithm is a hybrid two-stage training algorithm based on K-Means and gradient descent optimization.

This paper is organized as follows: in Sect. 2 a description of the RBF network and its classical two-phase learning approach is given. In Sect. 3 the BBO algorithm is explained. Section 4 describes in detail the developed BBO-based approach for training RBF networks. Experimental results are outlined in Sect. 5. Finally, the findings and remarks of this work, as well as future works, are summarized in Sect. 6.

2 Radial basis function neural networks

An RBF neural network is a special type of fully connected feedforward network that consists of only three layers: the input, hidden and output layers. The number of neurons in the input layer depends on the number of dimensions of the input vector, whereas the number of neurons in the output layer depends on the number of class labels in the data. The number of neurons in the hidden layer determines the topology of the network, which in turn determines the decision boundary between data clusters. Each hidden neuron has an RBF activation function that calculates the similarity between the input and a prototype stored in that neuron. Having more prototypes results in a more complex decision boundary, which can yield higher accuracy; however, it requires more computation to evaluate the network.

Figure 1 shows the structure of an RBF network in comparison with a multilayer perceptron (MLP) network. Inspecting this figure, it can be seen that the arrows between the input layer and the hidden neurons in the RBF network represent the Euclidean distance between the input vector and the prototypes stored in the hidden neurons. In contrast, in the MLP the arrows between layers represent weights. Moreover, in RBF networks the activation functions in the hidden nodes are Gaussian basis functions, while in MLPs sigmoidal functions are typically used.

Fig. 1 Representation of ANN networks. a RBF networks. b MLP networks

The RBF network operates as follows. First, the input data enter the network through the input layer. Then, each neuron in the RBF (hidden) layer calculates the similarity between the input data and the prototype stored inside it using the nonlinear Gaussian function shown in Eq. 1.

$$\begin{aligned} \phi (\left\| x-c_{j}\right\| )=\exp \left( -\frac{\left\| x-c_{j}\right\| ^{2}}{2\sigma _{j}^{2}}\right) \end{aligned}$$
(1)

where \(\left\| x-c_{j}\right\| \) is the Euclidean norm, \(c_{j}\) is the prototype (center) of hidden neuron j and \(\sigma _{j}\) is its width.

The output of the RBF network is calculated as a weighted sum of the hidden-layer activations, as given by the following equation:

$$\begin{aligned} y_{i}=\sum _{j=1}^{n}\omega _{ji}\phi _{j}(x) \end{aligned}$$
(2)

where \(\omega _{ji}\) represents the weight connecting hidden node j to output unit i, and n represents the number of hidden nodes.

The output of an RBF neuron is closer to 1 when the similarity between the input and the prototype is high, and close to zero otherwise. Each output-layer neuron takes the weighted sum of all RBF neuron outputs in order to decide the class label, which means that every RBF neuron contributes to the labeling decision, with higher similarity giving a larger contribution.
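To make the forward computation of Eqs. 1 and 2 concrete, the following minimal NumPy sketch (our own illustration; the variable names centers, widths and weights are assumptions, not part of the original formulation) evaluates an RBF network for a single input vector:

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Forward pass of an RBF network for one input vector x.

    x       : (I,)   input features
    centers : (n, I) prototype c_j stored in each hidden neuron
    widths  : (n,)   width sigma_j of each hidden neuron
    weights : (n, m) weights w_ji between hidden and output layers
    returns : (m,)   network outputs y_i (Eq. 2)
    """
    # Eq. 1: Gaussian activation of each hidden neuron
    dist_sq = np.sum((centers - x) ** 2, axis=1)      # ||x - c_j||^2
    phi = np.exp(-dist_sq / (2.0 * widths ** 2))      # phi_j(x)
    # Eq. 2: weighted sum of the hidden activations for every output unit
    return phi @ weights
```

An input lying close to a prototype drives the corresponding \(\phi _{j}\) toward 1, so that neuron dominates the weighted sum, in line with the interpretation given above.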

2.1 Classical radial basis function network training

Classical RBF training depends mainly on three elements: the prototypes inside the RBF neurons and how they are chosen, the beta value in the similarity equation, and the weights between the hidden and output layers (which determine the final decision). Choosing the prototypes can be done in several ways, such as selecting random data points from the data or using the K-Means clustering algorithm and taking the cluster centers as prototypes, among other approaches. Using K-Means clustering is the most common approach in the literature, as it helps to choose a small number of RBF neurons (K neurons) in an informed way, where each neuron represents a cluster in the data. Moreover, using only K neurons keeps the complexity of the RBF network low without degrading the accuracy of the classification decision.

The beta coefficient in the RBF activation function controls the width of the bell curve and should be determined so as to optimize the fit between the activation function and the data. When K-Means is used to choose the RBF neuron prototypes, beta can be set using the following equation:

$$\begin{aligned} \mathrm{Beta}=\frac{1}{2 \times \sigma ^{2}} \end{aligned}$$
(3)

where \(\sigma \) is the average distance between the points in the cluster and the cluster center.

The third important element to set for the RBF network to work well is the output weights. Training these weights can be done using gradient descent, an optimization technique that takes the outputs of the RBF neurons as input and optimizes the weights accordingly. Gradient descent must be run separately for each output node. The following subsections give more details about the K-Means and gradient descent methods, which are selected in this work as the classical approach for optimizing the connection weights.

2.1.1 K-Means

K-Means is considered one of the most efficient clustering algorithms and is used in many applications in the literature. K-Means clustering has several advantages, such as simplicity of implementation and good performance on large datasets. K-Means is a partitioning clustering algorithm whose objective is to maximize the similarity between the members of each cluster and minimize the similarity between members of different clusters. The main idea of K-Means clustering is to define k centers, one for each cluster. The data points are assigned to the proper cluster based on the minimum distance to all cluster centers. After that, the cluster centers are updated iteratively by calculating the mean of each cluster's members in order to minimize the squared error function. This process continues until the centers no longer change.
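As a concrete illustration of this first training stage, the sketch below (assuming scikit-learn is available; the helper name init_rbf_layer is ours) clusters the training inputs with K-Means, uses the cluster centers as prototypes and sets each width \(\sigma \) to the average distance between the cluster members and their center, from which beta follows via Eq. 3:

```python
import numpy as np
from sklearn.cluster import KMeans

def init_rbf_layer(X, k):
    """First training stage: choose k RBF prototypes and widths from the data X."""
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    centers = km.cluster_centers_                     # one prototype per hidden neuron

    sigmas = np.empty(k)
    for j in range(k):
        members = X[km.labels_ == j]
        d = np.linalg.norm(members - centers[j], axis=1)
        # sigma_j = average distance of the cluster members to their center
        sigmas[j] = d.mean() if d.size and d.mean() > 0 else 1.0  # guard for degenerate clusters

    betas = 1.0 / (2.0 * sigmas ** 2)                 # Eq. 3
    return centers, sigmas, betas
```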

2.1.2 Gradient descent

Gradient descent (GD) is an optimization algorithm that uses first-order derivative information to find a local minimum of a function. The algorithm applies consecutive steps along the negative gradient of the objective function at the current point. The output of an RBF network can be represented as shown in Eqs. 4 and 5, while the error function E is given in Eq. 6, where \(\hat{y}_{i,k}\) is the response value of the ith output unit and \(y_{i,k}\) is the actual response. The gradient descent algorithm can be used to find the solution matrix W using the update rule in Eq. 7, where \(\eta \) is a small decreasing value called the learning rate.

$$\begin{aligned} \hat{y}=(y_{1},y_{2},\ldots ,y_{m})=\left[ \begin{array}{cccc} \omega _{11} & \omega _{12} & \cdots & \omega _{1m}\\ \omega _{21} & \omega _{22} & \cdots & \omega _{2m}\\ \vdots & \vdots & \ddots & \vdots \\ \omega _{I1} & \omega _{I2} & \cdots & \omega _{Im} \end{array}\right] \left[ \begin{array}{c} \phi _{1}(x)\\ \phi _{2}(x)\\ \vdots \\ \phi _{m}(x) \end{array}\right] \end{aligned}$$
(4)
$$\begin{aligned} O = W \cdot H \end{aligned}$$
(5)
$$\begin{aligned} E=\frac{1}{2}\sum _{k=1}^{M}\sum _{i=1}^{L}(y_{i,k}-\hat{y}_{i,k})^{2} \end{aligned}$$
(6)
$$\begin{aligned} \omega _{ij}=\omega _{ij}-\eta \frac{\partial E}{\partial \omega _{ij}} \end{aligned}$$
(7)

In this work, the conjugate gradient (CG) method is used to optimize the weights in the standard RBF network. CG is a special type of gradient descent with regularization that is used to compute search directions. The CG method uses a line search with quadratic and cubic polynomial approximations. The stopping criterion used in CG is the Wolfe–Powell condition, and CG estimates the initial step sizes using the slope ratio method.
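For illustration, the sketch below implements the plain batch gradient descent update of Eq. 7 for the output weights (this work actually uses the conjugate gradient variant described above; the function name and the averaging over samples are our assumptions):

```python
import numpy as np

def train_output_weights(Phi, Y, eta=0.01, epochs=1000):
    """Second training stage: fit the hidden-to-output weights by gradient descent (Eqs. 6-7).

    Phi : (M, n) hidden-layer activations phi_j(x_k) for the M training samples
    Y   : (M, m) desired outputs
    """
    n, m = Phi.shape[1], Y.shape[1]
    W = np.zeros((n, m))                              # weights to be learned

    for _ in range(epochs):
        Y_hat = Phi @ W                               # network outputs (Eq. 5)
        grad = Phi.T @ (Y_hat - Y) / len(Phi)         # gradient of the squared error (Eq. 6), averaged
        W -= eta * grad                               # update rule of Eq. 7
    return W
```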

3 Biogeography-based optimization (BBO)

Evolutionary Algorithms (EAs) belong to the class of stochastic population-based algorithms. As the name implies, such techniques approximate the global optimum of optimization problems using stochastic operators. The optimization process starts with a set of random solutions as candidate solutions for a given problem. This set is then evolved using mechanisms defined by the algorithm to find a better approximation of the global optimum. This framework is common to all EAs, although they differ in the mechanisms used to evolve the solutions.

One of the most recent and well-regarded EAs proposed in the literature is the biogeography-based optimization (BBO) algorithm [44]. This algorithm mimics evolutionary phenomena studied in the field of biogeography to solve optimization problems. The main inspiration of BBO is the way nature balances predator and prey populations through migration within and between habitats.

In BBO, each solution represents a habitat and each variable in the solution represents a habitant (prey or predator). The objective function is called the Habitat Suitability Index (HSI), which indicates how suitable a habitat is. The following rules are considered in order to simulate the evolution of habitats and habitants in nature:

  1. Habitants in any habitat face mutation regardless of their HSI.

  2. Habitants in a habitat with better HSI are more likely to migrate to habitats with worse HSI.

  3. Habitants in a habitat with worse HSI are more likely not to migrate.

  4. Migration is always from a better habitat to a worse habitat.

  5. Each habitat has an immigration rate and an emigration rate, which define the rate of migration to or from other habitats.

Migration between habitats is simulated by exchanging the variables of solutions. In the BBO algorithm, each habitat has different emigration and immigration rates to simulate habitats with different characteristics in nature. With constant migration rates between habitats, BBO would not be able to balance exploration and exploitation. Therefore, the algorithm is equipped with the following adaptive immigration and emigration rates:

$$\begin{aligned} \mu _k=\frac{E \times n}{N} \end{aligned}$$
(8)
$$\begin{aligned} \lambda _k=I\left( 1-\frac{n}{N}\right) \end{aligned}$$
(9)

where n is the number of habitants, N is the maximum number of habitants allowed, E is the maximum emigration rate, and I is the maximum immigration rate. The mutation rate is also required to change adaptively as follows:

$$\begin{aligned} m_n = \textit{M}\left( 1-\frac{p_n}{p_{\mathrm{max}}} \right) \end{aligned}$$
(10)

where M is the initial mutation value, \(p_n\) is the probability of solution n, and \(p_{\mathrm{max}}\) is the maximum of these probabilities.
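The following simplified Python sketch shows how these rules translate into code. It uses the standard rank-based species counts to instantiate Eqs. 8 and 9 and a constant mutation rate in place of the adaptive rate of Eq. 10; all names and simplifications are ours, not the reference implementation:

```python
import numpy as np

def bbo_minimize(fitness, dim, n_hab=50, iters=250, lb=-1.0, ub=1.0,
                 E=1.0, I=1.0, m_rate=0.05):
    """Minimal BBO loop: migration with adaptive rates (Eqs. 8-9) plus mutation."""
    rng = np.random.default_rng()
    pop = rng.uniform(lb, ub, size=(n_hab, dim))      # random habitats
    cost = np.array([fitness(h) for h in pop])

    for _ in range(iters):
        order = np.argsort(cost)                      # best habitat first
        pop, cost = pop[order], cost[order]

        species = n_hab - np.arange(n_hab)            # rank-based species counts (best habitat has most)
        mu = E * species / n_hab                      # emigration rate, high for good habitats (Eq. 8)
        lam = I * (1.0 - species / n_hab)             # immigration rate, high for poor habitats (Eq. 9)

        new_pop = pop.copy()
        for i in range(n_hab):
            for d in range(dim):
                if rng.random() < lam[i]:             # habitat i accepts an immigrant variable
                    j = rng.choice(n_hab, p=mu / mu.sum())   # emigrating habitat, roulette on mu
                    new_pop[i, d] = pop[j, d]
                if rng.random() < m_rate:             # mutation (Eq. 10 would adapt this rate)
                    new_pop[i, d] = rng.uniform(lb, ub)

        new_pop[0] = pop[0]                           # elitism: keep the best habitat unchanged
        pop = new_pop
        cost = np.array([fitness(h) for h in pop])

    best = int(np.argmin(cost))
    return pop[best], cost[best]
```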

A significant number of works in the literature show that the BBO algorithm is able to solve optimization problems effectively. This is due to the high exploration of this algorithm, which originates from the migration mechanism between the habitats. The migration mechanism abruptly changes the solutions, which helps BBO to avoid local solutions and to find an accurate approximation of the global optimum for challenging problems. This motivated our attempt to propose a trainer based on BBO to train RBF networks for the first time in the literature.

4 Biogeography-based optimization for training radial basis function networks

In contrast to the classical approach, where RBF networks are trained in two independent phases, our proposed BBO-based approach searches for all RBF network parameters simultaneously. The parameters are the centers, widths and connection weights, including the bias terms. In the proposed training algorithm, each habitat is encoded to represent these parameters as shown in Fig. 2, where \(C_{i}\) is the center of hidden neuron i, \(\sigma _{i}\) is the width of that neuron and \(\omega _{ij}\) is the weight connecting neuron i to output unit j. Habitats are implemented as real vectors of length D, which is calculated as follows: suppose that n is the number of hidden neurons, I is the number of features in the dataset and m is the number of output units; then D is given by Eq. 11.

$$\begin{aligned} D = (n \times I)+ n + (n \times m) + m \end{aligned}$$
(11)

In order to evaluate the fitness value (HSI) of the habitats (candidate RBF networks), the mean squared error (MSE) is calculated over all training samples for each habitat. The MSE is given in Eq. 12, where \(y_{i}\) is the actual output and \(\hat{y}_{i}\) is the estimated output for the ith training sample, and k is the total number of instances in the training dataset.

$$\begin{aligned} \mathrm{MSE} = \frac{1}{k} \sum _{i=1}^{k}(y_{i} - \hat{y}_{i})^{2} \end{aligned}$$
(12)
Fig. 2 Representation of the BBO individual (habitat) structure
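To illustrate the encoding and the fitness evaluation, the sketch below (our own illustration; the helper names are assumptions) decodes a habitat vector of length D from Eq. 11 into the RBF parameters and scores it with the MSE of Eq. 12:

```python
import numpy as np

def habitat_length(n, I, m):
    """D = (n x I) + n + (n x m) + m: centers, widths, weights and output biases (Eq. 11)."""
    return n * I + n + n * m + m

def decode_habitat(h, n, I, m):
    """Split a flat habitat vector into centers (n, I), widths (n,), weights (n, m) and biases (m,)."""
    i = 0
    centers = h[i:i + n * I].reshape(n, I); i += n * I
    widths  = h[i:i + n];                   i += n
    weights = h[i:i + n * m].reshape(n, m); i += n * m
    biases  = h[i:i + m]
    return centers, widths, weights, biases

def mse_fitness(h, X, Y, n, I, m):
    """HSI of a habitat: MSE of the encoded RBF network over the training set (Eq. 12)."""
    centers, widths, weights, biases = decode_habitat(h, n, I, m)
    err = 0.0
    for x, y in zip(X, Y):
        phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * widths ** 2 + 1e-12))
        err += np.sum((y - (phi @ weights + biases)) ** 2)
    return err / len(X)
```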

Based on the encoding scheme and the fitness evaluation described above, the BBO algorithm is designed to train RBF networks as described in the flowchart in Fig. 3. The figure shows that BBO first creates a set of random candidate solutions, i.e., RBF networks with random centers, widths, connection weights and biases. The algorithm then repeatedly calculates the MSE of all the RBF networks when classifying the training data. The MSE indicates which “random” RBF is better. Based on the rules discussed above, the BBO algorithm creates a new set of RBF networks from the best RBF networks found so far. The process of calculating MSEs and improving the RBFs continues until the end criterion is satisfied, which could be an error threshold or a maximum number of iterations. It should be noted that, for each RBF network, the MSE is calculated by classifying all training samples in the dataset. Therefore, the computational complexity is O(ntd), where n is the number of candidate RBF networks, t is the maximum number of iterations, and d is the number of training samples in the dataset.

Fig. 3 RBF networks using BBO trainer flowchart
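Putting the pieces together, the training loop of Fig. 3 amounts to running a BBO optimizer over habitat vectors with the MSE fitness above. The snippet below only wires up the illustrative helpers sketched earlier (bbo_minimize, habitat_length, mse_fitness, decode_habitat); X_train and Y_train stand for hypothetical training arrays:

```python
# Illustrative wiring of the sketches above; all names are our assumptions.
n = 6                                                 # hidden neurons
I, m = X_train.shape[1], Y_train.shape[1]             # features and output units
D = habitat_length(n, I, m)                           # Eq. 11

best_habitat, best_mse = bbo_minimize(
    fitness=lambda h: mse_fitness(h, X_train, Y_train, n, I, m),
    dim=D, n_hab=50, iters=250)                       # population and iterations as in Sect. 5.1

centers, widths, weights, biases = decode_habitat(best_habitat, n, I, m)
```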

5 Experiments and results

In this section, the BBO training algorithm is evaluated on twelve datasets to verify the power of BBO for RBF neural network training. Furthermore, a comprehensive comparison of BBO with ten other well-known metaheuristic algorithms is conducted. The metaheuristic algorithms used in this experiment are GA, PSO, ACO, ES, PBIL, DE, Firefly, Cuckoo search, ABC and the BAT algorithm, which are the most common metaheuristic-based trainers for RBF networks in the literature. In addition, the BBO trainer is compared with the RBFclassic (gradient-based) technique, which is considered the standard method for training RBF neural networks.

5.1 Experimental setup

MATLAB R2010b is used to implement the proposed BBO trainer and the other algorithms. All datasets are divided into 66 % for training and 34 % for testing. Ten independent runs are executed for each experiment, with 250 iterations per run. Moreover, the population size is fixed to 50 individuals for all algorithms. The parameter settings for each algorithm are shown in Table 1.

In CS, besides the population size, the discovery rate \(p_{\alpha }\) is the only parameter that needs to be tuned. \(p_{\alpha }\) is set to 0.25, since it was stated in [55] that this value is sufficient for most optimization problems. For Firefly, beta is set to 1, as the parametric studies reported in [56] suggest that this value can be used for most applications, while gamma can be set to \(1/\sqrt{L}\), where L is a scaling factor; if the scaling variations are not significant, gamma can be set to O(1). Alpha is roughly tuned and set to 0.2. The same values were used in previous studies such as [52, 53].

For PSO, the acceleration constants are typically set to \(\approx\)2 [1, 57]. We also use a linearly decreasing strategy to update the inertia weight in the interval [0.9, 0.6]. It has been found experimentally in the literature that this strategy improves the efficiency and performance of PSO, achieving excellent results [5, 51].

For GA, the crossover probability is usually set to a high rate, while the mutation probability is set to a low rate [19]. With rough tuning, the crossover and mutation probabilities are set to 0.9 and 0.1, respectively. For DE, the DE/rand/1/bin variant is applied with the crossover probability and differential weight equal to 0.9 and 0.5, as applied and recommended in [34, 59].

For ACO, ES and PBIL, all parameters are set as used in [35, 44], and for ABC and BAT, the default parameters are used [27, 54].

For BBO, we used the same parameters as in [35, 44]: the habitat modification probability is set to 1, the immigration probability bounds per gene are [0, 1], the step size is set to 1, the maximum immigration and emigration rates for each island are set to 1, and the mutation probability is set to 0.05 as in [35].

However, it is worth mentioning that finding the best parameters of these algorithms is itself another optimization problem, known as meta-optimization. Therefore, fine-tuning the optimization algorithms is out of the scope of this work [39]. All datasets are normalized to the interval [0, 1].
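For completeness, normalizing each feature to [0, 1] can be done with a standard min-max transform; the snippet below is a generic sketch rather than the authors' code:

```python
import numpy as np

def minmax_normalize(X):
    """Scale every column of X to the interval [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0                   # avoid division by zero for constant features
    return (X - col_min) / col_range
```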

Table 1 The metaheuristic algorithms with initial parameters

An RBF network with a large number of neurons in the hidden layer may achieve good results on the training data; however, this can lead to poor generalization [41]. In our experiments, we assess the performance of the proposed training algorithm with different numbers of neurons in the hidden layer: 4, 6, 8 and 10.

In our experiments, we use five measurements to evaluate the developed RBF network models: accuracy, specificity, sensitivity, complexity and MSE. MSE was given previously in Eq. 12, while the others are calculated using Eqs. 13, 14, 15 and 16, respectively. Accuracy, specificity, sensitivity and MSE assess the prediction quality, while the complexity measure reflects the network structure complexity based on the number of neurons.

$$\begin{aligned} \mathrm{Accuracy}=\frac{\text {Number of correctly classified instances}}{\text {Total number of instances}} \end{aligned}$$
(13)
$$\begin{aligned} \mathrm{Specificity}=\frac{\text {Number of correctly predicted negative-class instances}}{\text {Number of actual negative instances}} \end{aligned}$$
(14)
$$\begin{aligned} \mathrm{Sensitivity}=\frac{\text {Number of correctly predicted positive-class instances}}{\text {Number of actual positive instances}} \end{aligned}$$
(15)
$$\begin{aligned} \mathrm{Complexity}=\frac{1}{2} \sum _{i=1}^{|w|}(w_i)^{2} \end{aligned}$$
(16)

where \(|w| = 2(n+1)\) and n is the number of neurons.
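For binary labels, the measures of Eqs. 13-15 can be computed from the predictions as in the generic sketch below (variable names are our own):

```python
import numpy as np

def classification_measures(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels in {0, 1} (Eqs. 13-15)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))        # correctly predicted positives
    tn = np.sum((y_pred == 0) & (y_true == 0))        # correctly predicted negatives
    accuracy = np.mean(y_pred == y_true)              # Eq. 13
    sensitivity = tp / max(np.sum(y_true == 1), 1)    # Eq. 15
    specificity = tn / max(np.sum(y_true == 0), 1)    # Eq. 14
    return accuracy, sensitivity, specificity
```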

5.2 Datasets description

The proposed BBO trainer is evaluated using twelve well-known real datasets selected from the UCI repository [32]. All datasets contain two classes. Table 2 describes these datasets in terms of the number of features, training samples, and testing samples, as well as the accuracy of the baseline classifier for each dataset. The baseline classifier is the Zero Rule classifier (ZeroR for short), the simplest classifier, which relies only on the output class by simply predicting the majority class.

Table 2 Summary of the classification datasets

5.3 Results

The proposed BBO trainer is evaluated by comparing its results with the standard RBFclassic and ten other metaheuristic trainers (GA, PSO, ACO, ES, PBIL, DE, FF, Cuckoo, ABC and BAT) using the accuracy, sensitivity, specificity, MSE and complexity evaluation measures.

Table 3 shows the results in terms of the average accuracy (AVE) and standard deviation (STD), as well as the best accuracy, of the proposed BBO and the other algorithms on the Blood dataset. The table reports the results for different numbers of neurons in the hidden layer. The best accuracy results are highlighted in bold. According to the AVE, STD, and best results using 4 neurons, BBO is able to classify 77.1 % of the test samples, which is higher than the PSO, ACO, ES, PBIL, DE and ABC results and only slightly different from GA, Firefly, Cuckoo, BAT and RBFclassic. Furthermore, BBO outperforms all other methods using 6 and 8 neurons, with accuracy rates of 77.29 and 77.45 %, respectively. In addition, the accuracy results of RBFclassic, BAT and BBO using 10 neurons are very close for this dataset, and these three algorithms outperform the other methods.

Table 3 The average accuracy, and standard deviation results of the Blood dataset using different algorithms

The accuracy results of BBO and the other optimizers for the Breast cancer dataset are presented in Table 4. According to the AVE, STD, and best results, BBO outperforms all other methods using 4, 6 and 10 neurons. Moreover, BBO is able to classify 96.55, 97.61, 96.97, and 97.86 % of the test samples using 4, 6, 8, and 10 neurons, respectively.

Table 4 The average accuracy and standard deviation results of the Breast dataset using different algorithms

Table 5 presents the accuracy results for the Hepatitis dataset. The results of BBO are significantly better than those of all the other algorithms using 6 neurons. Moreover, BBO obtains better results than most of the algorithms using 4, 8, and 10 neurons as well.

Table 5 The average accuracy and standard deviation results of the Hepatitis dataset using different algorithms

The accuracy results of BBO and the other training algorithms on the Diabetes and Vertebral datasets are presented in Tables 6 and 7, respectively. According to the AVE, STD, and best results using 4, 6, 8, and 10 neurons, BBO outperforms most of the other methods, with the exception of RBFclassic, which has better accuracy. These results support the merits of the proposed BBO algorithm in training RBF networks.

Table 6 The average accuracy and standard deviation results of the Diabetes dataset using different algorithms
Table 7 The average accuracy and standard deviation results of the Vertebral dataset using different algorithms

The accuracy results of BBO and the other algorithms on the Diagnosis I and Diagnosis II datasets are presented in Tables 8 and 9, respectively. According to the AVE, STD, and best results, the results of BBO on Diagnosis I are significantly better than those of the other algorithms using 4, 6, and 8 neurons. Furthermore, the BBO results on Diagnosis II are very comparable with those of the other optimizers for different numbers of neurons.

Table 8 The average accuracy and standard deviation results of the Diagnosis I dataset using different algorithms
Table 9 The average accuracy and standard deviation results of the Diagnosis II dataset using different algorithms

Tables 10 and 11 show the accuracy results of BBO and the other algorithms on the Parkinsons and Liver datasets, respectively. The accuracy results of BBO on these two datasets outperform those of all other algorithms for all numbers of neurons.

Table 10 The average accuracy and standard deviation results of the Parkinson dataset using different algorithms
Table 11 The average accuracy and standard deviation results of the Liver dataset using different algorithms

The accuracy results of BBO and the other training algorithms on the Sonar and German datasets are presented in Tables 12 and 13, respectively. According to both sets of results using 4, 6, 8, and 10 neurons, BBO comes second after the RBFclassic method and outperforms all other metaheuristic methods.

Table 12 The average accuracy and standard deviation results of the Sonar dataset using different algorithms
Table 13 The average accuracy and standard deviation results of the German dataset using different algorithms

The accuracy results of BBO and the other optimizers for the Australian dataset are presented in Table 14. According to the results using 4, 6, 8, and 10 neurons, BBO has superior classification accuracy compared with the other methods. Moreover, BBO is able to classify 85.32, 84.98, 84.51, and 85.11 % of the test samples using 4, 6, 8, and 10 neurons, respectively.

Table 14 The average accuracy and standard deviation results of the Australian dataset using different algorithms
Table 15 The average sensitivity and specificity results of all datasets using different algorithms with (4 Neurons)
Table 16 The average sensitivity and specificity results of all datasets using different algorithms with (6 Neurons)
Table 17 The average sensitivity and specificity results of all datasets using different algorithms with (8 Neurons)
Table 18 The average sensitivity and specificity results of all datasets using different algorithms with (10 Neurons)

To give better insight into the classification performance for each class label, the specificity and sensitivity are measured and listed in Tables 15, 16, 17 and 18 for RBF networks with 4, 6, 8 and 10 neurons in the hidden layer, respectively. According to these tables, it can be noticed that the RBF networks optimized by BBO have higher and more balanced specificity and sensitivity than most of the other optimizers on the following datasets: Breast cancer, Hepatitis, Vertebral, Diagnosis I, Diagnosis II, Parkinsons, Liver, Sonar and Australian.

To summarize the above results, we note that BBO outperforms most of the other algorithms, which supports the merits of the proposed BBO algorithm in training RBF networks. To support this summary, the Friedman statistical test is applied to check the significance of the accuracy results. The Friedman test is performed by ranking the different trainers (BBO, GA, PSO, ACO, ES, PBIL, DE, Firefly, Cuckoo, ABC, BAT and RBFclassic) based on the average accuracy values for each dataset using different numbers of neurons. Table 19 shows the average rank of each technique according to the Friedman test for 4, 6, 8, and 10 neurons. The Friedman test in Table 19 shows that a significant difference exists between the 12 trainers (a lower rank is better). In terms of the Friedman ranking, BBO outperforms the other trainers for all numbers of neurons used.
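The average ranks of Table 19 can be reproduced by ranking the trainers within each dataset and averaging over datasets, for example with SciPy; the sketch below is our own illustration (accuracies are negated so that rank 1 is best):

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_ranks(acc):
    """acc: (datasets x algorithms) matrix of average accuracies.
    Returns the mean rank of each algorithm (1 = best) and the Friedman test p-value."""
    acc = np.asarray(acc, dtype=float)
    ranks = np.array([rankdata(-row) for row in acc])  # per-dataset ranks; higher accuracy -> rank 1
    mean_ranks = ranks.mean(axis=0)                    # average rank of each trainer (Table 19)
    _, p_value = friedmanchisquare(*acc.T)             # one score vector per algorithm
    return mean_ranks, p_value
```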

Table 19 The Average ranking results obtained by each algorithm in the Friedman test using all datasets

Figure 4 shows the complexity of the trained RBF networks and the corresponding MSE on the different datasets for each number of neurons in the hidden layer. As shown in all sub-figures, for all datasets BBO achieves relatively the smallest MSE compared with all other algorithms except RBFclassic. BBO outperforms all algorithms in terms of complexity, having the smallest complexity values. Moreover, RBFclassic has the smallest MSE values, but its complexity values are the largest, which yields complex RBF network structures with very low smoothness. The complexity results show the merits of the proposed BBO algorithm in producing very smooth RBF networks.

Fig. 4 The MSE versus complexity of the classification of different datasets (with 4, 6, 8, and 10 neurons). Results for a Blood, b Breast, c Diabetes, d Hepatitis, e Vertebral, f Diagnosis I, g Diagnosis II, h Parkinson, i Liver, j Sonar, k German, and l Australian datasets, respectively

Convergence graphs for all algorithms are shown in Figs. 5, 6, 7, and 8 for 4, 6, 8, and 10 neurons, respectively. The convergence curves show the MSE averaged over 10 independent runs across 250 iterations. All sub-figures show that BBO is the fastest algorithm to converge on all datasets. Furthermore, most of the other algorithms, such as DE, ACO, ES, and PBIL, suffer from drawbacks such as becoming trapped in local minima and slow convergence rates. Based on the convergence results, BBO has a superior ability to avoid local optima.

Fig. 5 MSE convergence curves with 4 neurons. Convergence curves for a Blood, b Breast, c Diabetes, d Hepatitis, e Vertebral, f Diagnosis I, g Diagnosis II, h Parkinson, i Liver, j Sonar, k German, and l Australian datasets, respectively

Fig. 6 MSE convergence curves with 6 neurons. Convergence curves for a Blood, b Breast, c Diabetes, d Hepatitis, e Vertebral, f Diagnosis I, g Diagnosis II, h Parkinson, i Liver, j Sonar, k German, and l Australian datasets, respectively

Fig. 7 MSE convergence curves with 8 neurons. Convergence curves for a Blood, b Breast, c Diabetes, d Hepatitis, e Vertebral, f Diagnosis I, g Diagnosis II, h Parkinson, i Liver, j Sonar, k German, and l Australian datasets, respectively

Fig. 8 MSE convergence curves with 10 neurons. Convergence curves for a Blood, b Breast, c Diabetes, d Hepatitis, e Vertebral, f Diagnosis I, g Diagnosis II, h Parkinson, i Liver, j Sonar, k German, and l Australian datasets, respectively

In summary, the algorithms employed in this work can be classified into four groups: random, evolutionary, swarm-based, and gradient-based.

The results show that evolutionary trainers (including BBO) outperform the other groups. This is due to the superior local optima avoidance of these algorithms. Evolutionary algorithms mostly have crossover operators that combine the search agents to create new population(s). Such operators abruptly change the individuals in the population, which emphasizes exploration of the search space and local optima avoidance. The gradient-based technique has the least capability to avoid local optima, which resulted in the worst performance on the test cases. The swarm-based algorithms perform better than the gradient-based algorithm because of their higher local optima avoidance, which mostly originates from their population-based nature. However, such algorithms have less intrinsic exploration ability than evolutionary algorithms because there are fewer sudden changes in the search agents.

The results also show that the evolutionary algorithms employed in this work achieve better accuracy and a faster convergence rate on average. This indicates that the large random changes in the search agents of such algorithms do not negatively impact accuracy or convergence. This originates from the fact that evolutionary algorithms reduce randomness and favor gradual changes through a mechanism called mutation. The mutation operator causes small perturbations and consequently local search around the individuals in the population. In other words, the effects of mutation on the overall population are much smaller than those of crossover operators. This operator assists evolutionary algorithms in improving the accuracy of solutions over the course of iterations. The convergence toward the global optimum is also accelerated by the mutation operator.

Among the swarm-based techniques employed in this work, the BAT algorithm outperforms Cuckoo, Firefly, PSO, ABC and ACO. The BAT algorithm is equipped with a frequency-tuning principle that produces solutions close to the ideal ones. Furthermore, the Cuckoo algorithm gives good results that are very close to those of the BAT algorithm. The Cuckoo algorithm is equipped with a Lévy flight, which abruptly changes its search agents; similarly to evolutionary algorithms, this causes extensive exploration of the search space and significant local optima avoidance. However, the other swarm-based algorithms have fewer operators that promote sudden changes. The ACO algorithm utilizes a pheromone matrix, which mostly boosts exploitation and makes this algorithm more suitable for combinatorial problems. The performance of the PSO and ABC algorithms also depends largely on the distribution of the initial population; these algorithms can easily be trapped in local solutions if the initial population is not well distributed. The Firefly algorithm does not employ a random walk or Lévy flight, which gives it a tendency toward local solutions and makes it less able to avoid them.

In contrast, most of the evolutionary algorithms performed well on the test cases and surpassed the swarm-based techniques. Among them, ES and DE showed the poorest performance. In ES, the selection of individuals is deterministic, which reduces the level of randomness and the exploration ability of this algorithm. The main operators in this algorithm are mutations, which favor exploitation and convergence. These are the main factors that contributed to the failure of this algorithm on the datasets. The same statements can be made for DE, but this algorithm has stochastic selection and more crossover operators, which help it show a better performance than ES. The performance of PBIL is better than that of ES and DE but worse than that of GA and BBO. This is because PBIL performs crossover on the entire population combined in a vector, which provides better exploration and local optima avoidance than ES and DE. However, each individual faces fewer sudden random changes than in GA and BBO.

BBO outperformed GA because the random changes in the individuals are much larger in this algorithm. The GA algorithm assigns a similar reproduction rate to all individuals in the population, which leads to the same crossover rate over the course of generations. In contrast, the BBO algorithm assigns each individual unique emigration and immigration rates. This results in different reproduction rates for each individual and consequently promotes exploration and local optima avoidance. Needless to say, this is the main reason for the significant superiority of the BBO-based trainer over all trainers employed on all datasets in this work.

The results and discussion of this section show that the BBO algorithm is able to effectively alleviate the drawbacks of the current algorithms when training RBF networks in terms of local optima entrapment, result accuracy, and convergence rate.

5.4 Comparisons with traditional classifiers in the literature

In this section, we compare the results of the RBF network optimized by BBO with those of five other popular classifiers from the literature: Naive Bayes (NB), the C4.5 decision tree algorithm (J48), Random Forests (RF), Support Vector Machines (SVM), and the Zero Rule classifier (ZeroR), which is the baseline classifier. We used the Java-based open-source data mining framework Weka for their implementation [20].

Table 20 shows the average accuracy results of NB, J48, RF, SVM, ZeroR, and BBO. It can be seen that the optimized RBF network achieves very competitive results and performs reasonably well. The BBO results for the Breast, Liver and Australian datasets are higher and significantly better than those of all other classifiers. Moreover, the results of BBO for the Parkinson and Blood datasets are very close to those of the other classifiers. Examining these results, we notice that the BBO-RBF classifier achieves better results than the baseline classifier on all datasets, and better results than NB and J48 on 7 and 6 datasets, respectively. Moreover, comparing the BBO-RBF network with very powerful classifiers such as RF and SVM, we can see that it stays competitive, with better accuracy results on 5 datasets.

In summary, the obtained results support the merits of the proposed BBO algorithm in training RBF networks and solving data classification problems.

Table 20 The average accuracy results of BBO-RBF network compared to popular classifiers in the literature

6 Conclusion

This paper proposed the use of the well-regarded BBO algorithm for training RBF networks in order to alleviate the drawbacks of conventional and recent training algorithms: local optima entrapment, low result accuracy, and slow convergence speed. After proposing the BBO-based training method, it was employed to solve 12 well-known datasets and compared to 11 training methods from the literature, including gradient-based, evolutionary, and swarm-based algorithms. The algorithms were compared using a statistical test on RBF networks with different numbers of neurons to confidently confirm the performance of the proposed trainer. The results demonstrated that the BBO algorithm is able to substantially outperform the current techniques on the majority of datasets. According to the results, findings, analysis, and discussion of this paper, the following conclusions can be drawn:

  1. BBO shows a fast convergence speed and high result accuracy.

  2. BBO can avoid local optima in the search space of the RBF network training problem.

  3. BBO is able to train RBF networks effectively to classify different datasets with diverse numbers of features and training samples.

  4. BBO is able to train RBF networks with different numbers of neurons.

For future work, this research is planned to be extended along two main lines. First, the proposed BBO-RBF network could be investigated for other data mining tasks such as regression and time series prediction. Second, we plan to study the efficiency of optimizing the structure of the RBF network along with its widths, centers and weights simultaneously. Since the complexity of the search space is expected to increase, the evaluation will consider complexity and execution time in addition to the prediction accuracy of the optimized models.