1 Introduction

As the literature shows, improving the learning of artificial neural networks (ANNs) is a challenging task. The problem is to find optimal values for the structural parameters of an ANN (mostly weights and biases) so as to minimize classification, prediction, or approximation errors. In fact, learning is the key process in any type of ANN. Due to the high dimensionality of this problem and the fact that the search space changes with the given data set, enhancing ANN learning remains challenging.

Generally speaking, there are three types of learning in ANNs: supervised [1, 2], unsupervised [3, 4], and reinforcement [5, 6]. As the name implies, in a supervised learning process a supervisor provides feedback about the performance of the ANN on the given training samples, and the ANN is allowed to adjust its structural parameters accordingly. In contrast, there is no feedback from an external supervisor in unsupervised learning; in this case, the ANN has to assess its performance on its own. Finally, in reinforcement learning, an ANN is punished or rewarded for incorrect or correct actions, so the feedback is limited to two types: right or wrong. In this case, the ANN has to adapt itself to the training samples based on the provided feedback. This type of learning is very similar to the way trained animals learn.

Regardless of the differences between the three types of learning in ANNs, they share the same objective: the ultimate goal of a learning process is to find the structural parameters of the NN that achieve the highest performance. In the feedforward neural network (FNN) [7], which is the focus of this study, the most important structural parameters are the weights of the connections between neurons in different layers and the biases of the neurons themselves. Other parameters are of interest as well, such as the number of hidden neurons and the number of hidden layers. Since the structure of an FNN is defined before the learning process, however, the weights and biases are usually considered the main variables in learning enhancement of FNNs.

The conventional learning algorithm for FNNs is the back-propagation (BP) algorithm [8], a gradient-based method that is by far the most popular algorithm in this field. In this algorithm, the training samples are fed to the FNN and the difference between the desired and obtained outputs (the error) is propagated backward to adjust the weights and biases. This iterative process continues until an end criterion, usually an error threshold, is satisfied.

Although the BP algorithm can be very effective on simple and mostly linear data sets, it has one main drawback: premature convergence. The BP algorithm utilizes gradient descent, so the initial random solution is guided towards the steepest valley in the search space. Consequently, the quality of the obtained solution highly depends on the initial solution. In addition, there is a high risk of local optima stagnation: quite often the error stays constant for a long time during the learning process with no further improvement, which shows that the algorithm is trapped in a local optimum. The literature shows that heuristic algorithms are promising alternatives for alleviating this main drawback when training FNNs [9].

Heuristic optimization algorithms mostly start the optimization process by creating one or a set of random solutions [10-14]. They then improve the initial random solution(s), mostly using nature-inspired concepts, until an end condition is met [15, 16]. Since heuristic methods randomly create solutions and improve them, they have a high ability to avoid local optima [17-21]. Some of the most popular algorithms in this field are: ant lion optimizer (ALO) [22], genetic algorithm (GA) [23], particle swarm optimization (PSO) [24, 25], ant colony optimization (ACO) [26], differential evolution (DE) [27], evolution strategy (ES) [28], and PSOGSA [29]. The majority of these algorithms have been employed to improve the learning of FNNs.

Montana and Davis were the first to utilize GA to improve the learning of NNs [30]. In 1990, the GA algorithm was again applied to FNNs [31], with the weights and biases of the FNN as the variables. Belew et al. showed that this algorithm is able to effectively enhance the learning of FNNs. There are also other studies that integrate GA with NNs to improve learning using different methods [32-35]. The PSO algorithm was also employed as a trainer for FNNs in many studies [36-41]. Although ACO is suited to combinatorial problems, it was shown in [42-45] that this algorithm is able to provide very promising results when applied to NNs as well. Some other heuristic-based learning algorithms in the literature are the following: DE-based trainer [46, 47], ABC-based trainer [48, 49], gravitational search algorithm (GSA)-based trainer [50, 51], Tabu search (TS)-based trainer [52, 53], biogeography-based optimization (BBO)-based trainer [54], ES-based trainer [55], magnetic optimization algorithm (MOA)-based trainer [56], and grey wolf optimizer (GWO)-based trainer [57].

Despite the merits of the above-mentioned works, the problem of local optima entrapment still persists. In addition, there is a theorem in the field of heuristics called No Free Lunch [58], which states that there is no single optimization algorithm for solving all problems. Since FNNs are trained on different data sets, it is possible for one algorithm to perform well on one data set but poorly on another. These reasons motivate researchers to investigate the efficiency of new algorithms in enhancing the learning of FNNs. This is also the contribution of this study, in which the recently proposed social spider optimization (SSO) algorithm [59, 60] is chosen as the trainer for FNNs. The only similar work in the literature is that of Pereira et al. [61], in which the SSO algorithm was employed to train neural networks and classify the Ionosphere, Satimage, Diadol, Mea, and Spiral data sets. They only optimized the weights of the connections, whereas this work optimizes the weights and biases of MLPs simultaneously; therefore, the learning problem considered here is more difficult. We also solve five different standard classification data sets. Another contribution of this work is the comparison of different heuristic algorithms on classification data sets. The rest of the paper is organized as follows.

Section 2 presents the rudimentary concepts of FNNs. The general concepts and mechanism of the SSO algorithm are provided in Sect. 3. The new SSO-based learning process for FNNs is proposed in Sect. 4. Section 5 includes the results, discussion, and analysis. Eventually, Sect. 6 concludes the study and suggests some directions for future studies.

2 Feedforward neural network

As the name implies, in an FNN data flow in one direction between the neurons of the network [7]. The neurons are arranged in parallel layers, and each neuron in the tth layer receives data from the neurons in the (t − 1)th layer and delivers data to the neurons in the (t + 1)th layer. The structure of an FNN with one input, one hidden, and one output layer is illustrated in Fig. 1.

Fig. 1
figure 1

Three-layer FNN

As shown in Fig. 1, the input layer receives the inputs. The inputs are then multiplied by the weights and delivered as inputs to the hidden layer. The same happens for the data transition between the hidden and output layers. The actual output of each neuron is calculated by a transfer function. Interested readers are referred to [7] for more details. Although the structure and workflow of an FNN are very simple, it has been proven that a three-layer FNN is able to approximate any given function [62].

When inputs are provided to an FNN, the weighted sums of the inputs are first calculated as follows:

$$s_j = \sum_{i=1}^{n} \left( W_{ij} \cdot X_i \right) - \theta_j, \quad j = 1, 2, \ldots, h$$
(2.1)

where n is the number of input nodes, W_ij is the connection weight from the ith node in the input layer to the jth node in the hidden layer, θ_j is the bias (threshold) of the jth hidden node, and X_i is the ith input.

Then, a sigmoid function defines the final output of each hidden node as follows:

$$S_j = \mathrm{sigmoid}(s_j) = \frac{1}{1 + \exp(-s_j)}, \quad j = 1, 2, \ldots, h$$
(2.2)

Finally, the same two steps define the output of the network as follows:

$$o_k = \sum_{j=1}^{h} \left( w_{jk} \cdot S_j \right) - \theta'_k, \quad k = 1, 2, \ldots, m$$
(2.3)
$$O_k = \mathrm{sigmoid}(o_k) = \frac{1}{1 + \exp(-o_k)}, \quad k = 1, 2, \ldots, m$$
(2.4)

where w_jk is the connection weight from the jth hidden node to the kth output node, and θ′_k is the bias (threshold) of the kth output node.
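To make Eqs. (2.1)–(2.4) concrete, a minimal sketch of the forward pass is given below. The matrix shapes and the names W1, theta1, W2, theta2 are illustrative assumptions rather than notation from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W1, theta1, W2, theta2):
    """Forward pass of a three-layer FNN.

    X      : (n,)   inputs X_i
    W1     : (n, h) input-to-hidden weights W_ij
    theta1 : (h,)   hidden biases theta_j
    W2     : (h, m) hidden-to-output weights w_jk
    theta2 : (m,)   output biases theta'_k
    """
    s = X @ W1 - theta1      # Eq. (2.1): weighted sums of the hidden nodes
    S = sigmoid(s)           # Eq. (2.2): hidden outputs
    o = S @ W2 - theta2      # Eq. (2.3): weighted sums of the output nodes
    return sigmoid(o)        # Eq. (2.4): final outputs O_k
```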

As discussed in Sect. 1, the most important structural parameters of FNN are weights and biases. Equations (2.1) and (2.3) show how they define the final output of FNN. We propose a new learning algorithm in the next section to find the optimal values for weights and biases.

3 Social spider optimization algorithm

The social spider optimization (SSO) algorithm was proposed by Cuevas et al. in [60]. It is a swarm-based algorithm that mimics the social intelligence of spiders living in a colony. In fact, the social communication of spiders through vibrations of the web was the main inspiration of this algorithm. In this algorithm, the search space of the optimization problem is considered as the web and the search agents as the spiders of the colony. The search agents are divided into two types: males (M) and females (F). The weight of each spider is defined by its fitness. The operators that modify the spiders (candidate solutions) are defined based on gender. In the SSO algorithm, similar to a real spider colony, the number of females is higher than the number of males and is initially set to 65–90 % of the population.

During optimization, spiders communicate by vibrating the strings of the web. The vibration that a spider receives depends on the weight of the sender spider and the distance between them. The mathematical model proposed by Cuevas et al. is as follows [60]:

$$Vibs_i = w_j \, e^{-d_{i,j}^{2}}$$

where w_j indicates the weight of the jth spider, and d_{i,j} is the Euclidean distance between the ith and jth spiders.
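As a small illustration of this vibration model (using the decaying exponential form above), the perceived vibration can be computed as follows; the function and argument names are assumptions made for illustration.

```python
import numpy as np

def vibration(w_j, x_i, x_j):
    """Vibration felt by spider i from spider j: w_j * exp(-d_ij^2)."""
    d2 = np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2)   # squared Euclidean distance
    return w_j * np.exp(-d2)
```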

It is assumed in SSO that every spider only perceives three vibrations from other spiders:

  • The nearest spider with a higher fitness (Vibc_i).

  • The best spider in the swarm (Vibb_i).

  • The nearest female, which applies only to males (Vibf_i).

As mentioned above, the operator for position updating of search agents is defined based on their gender. A female updates its position as follows [59]:

$$X_i(t + 1) = X_i(t) + \alpha \cdot Vibc_i \cdot \left( S_c - X_i(t) \right) + \beta \cdot Vibb_i \cdot \left( S_b - X_i(t) \right) + \delta \cdot \left( r - 0.5 \right)$$
(3.1)

where α, β, δ, and r are random values in [0, 1], S_c indicates the closest better neighbour, and S_b the fittest spider in the swarm.

This formula is applicable when a female is attracted towards the source of vibrations. In SSO, it is assumed that females randomly decide whether to move towards or away from the source. If they decide to move away from the source, the following formula is utilized [59]:

$$X_i(t + 1) = X_i(t) - \left[ \alpha \cdot Vibc_i \cdot \left( S_c - X_i(t) \right) + \beta \cdot Vibb_i \cdot \left( S_b - X_i(t) \right) + \delta \cdot \left( r - 0.5 \right) \right]$$
(3.2)

where the parameters are defined as in Eq. (3.1).
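A hedged sketch of the female movement in Eqs. (3.1) and (3.2) follows; the attraction probability `pf` and the function signature are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def update_female(x_i, s_c, w_c, s_b, w_b, pf=0.7, rng=np.random):
    """Move a female towards (Eq. 3.1) or away from (Eq. 3.2) the vibration sources.

    x_i : current position; s_c, w_c : nearest better spider and its weight;
    s_b, w_b : best spider in the swarm and its weight.
    """
    alpha, beta, delta, r = rng.rand(4)
    vib_c = w_c * np.exp(-np.sum((x_i - s_c) ** 2))   # Vibc_i
    vib_b = w_b * np.exp(-np.sum((x_i - s_b) ** 2))   # Vibb_i
    step = alpha * vib_c * (s_c - x_i) + beta * vib_b * (s_b - x_i) + delta * (r - 0.5)
    return x_i + step if rng.rand() < pf else x_i - step   # attraction vs. repulsion
```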

Male spiders follow a different position-updating procedure. First, they are divided into two groups: dominated (D) and non-dominated (ND). ND spiders are attracted towards females, whereas D spiders move towards the centre of the male population. This behaviour is inspired by the tendency of a D spider to move to more densely populated male areas, feed on the leftovers of other males, and eventually grow to become an ND spider [59]. In order to determine which spiders are dominated, the median of the fitness (weight) of all male spiders is calculated in every iteration. The spiders with fitness below this median are considered D-type, and the rest are ND-type spiders. The mathematical model proposed to mimic the movement of males is as follows [59]:

For ND-type spiders:

$$X_i(t + 1) = X_i(t) + \alpha \cdot Vibf_i \cdot \left( S_f - X_i(t) \right) + \delta \cdot \left( r - 0.5 \right)$$
(3.3)

For D-type spiders:

$$X_i(t + 1) = X_i(t) + \alpha \cdot \left( \frac{\sum_{j=1}^{N_m} X_j^m \cdot w_{N_f + j}}{\sum_{j=1}^{N_m} w_{N_f + j}} - X_i(t) \right)$$
(3.4)

where S_f is the female closest to the ith male, X_j^m and w_{N_f+j} denote the position and weight of the jth male spider, and α, δ, and r are random values in [0, 1].

It may be seen in these equations that ND-type spiders only move towards females; unlike the female movement methods, there is no backward movement.
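A corresponding sketch for the male movement in Eqs. (3.3) and (3.4) is given below, with the median split computed explicitly; again, all names and the signature are illustrative assumptions.

```python
import numpy as np

def update_male(x_i, w_i, s_f, w_f, males_x, males_w, rng=np.random):
    """ND males move towards the nearest female (Eq. 3.3);
    D males move towards the weighted mean of the male population (Eq. 3.4)."""
    alpha, delta, r = rng.rand(3)
    if w_i > np.median(males_w):                                   # non-dominated (ND)
        vib_f = w_f * np.exp(-np.sum((x_i - s_f) ** 2))            # Vibf_i
        return x_i + alpha * vib_f * (s_f - x_i) + delta * (r - 0.5)
    mean_m = (males_x * males_w[:, None]).sum(axis=0) / males_w.sum()
    return x_i + alpha * (mean_m - x_i)                            # dominated (D)
```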

The last mechanism of the SSO algorithm for modifying the search agents is the mating operator. ND males mate with females located within a certain radius, called the mating radius. There may be more than one male and female within the mating radius; therefore, a roulette-wheel mechanism randomly chooses parents with probability proportional to their fitness values. A new spider is then constructed by combining genes (variables) of the selected males and females. After new spiders are produced, their fitness is calculated and compared to that of the worst spiders in the population. If a new spider is fitter than one of the existing spiders, it is added to the population and the less fit spider is eliminated.
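A small sketch of a roulette-wheel mating step is given below; mixing genes one at a time, each drawn from a parent chosen proportionally to its weight, is one reasonable reading of the operator and is only an illustrative assumption.

```python
import numpy as np

def mate(parents_x, parents_w, rng=np.random):
    """Construct a new spider: every gene is copied from a parent chosen with
    probability proportional to its weight (roulette-wheel selection)."""
    probs = parents_w / parents_w.sum()
    dim = parents_x.shape[1]
    donors = rng.choice(len(parents_w), size=dim, p=probs)   # one donor per gene
    return parents_x[donors, np.arange(dim)]
```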

The SSO algorithm follows these steps to solve optimization problems (a minimal loop sketch is given after the list):

  (a) Begin the optimization process by generating spiders at random positions in the search space.

  (b) Assign a gender to each spider (65–90 % of the population female, the rest male).

  (c) Calculate the fitness of the spiders using the objective function.

  (d) For each spider, determine the best spider in the swarm, the nearest spider with a higher fitness, and the nearest female.

  (e) Update the position of each female spider using (3.1) or (3.2).

  (f) Update the position of each male spider using (3.3) or (3.4).

  (g) Let the males and females located within the mating radius mate to create new spiders.

  (h) Replace the worst spiders with the newly produced spiders if the latter have better fitness values.

  (i) Repeat steps (c)–(h) until an end criterion is satisfied.
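Putting these steps together, a minimal self-contained loop sketch is shown below. It is a simplification under several stated assumptions: the female share is fixed at 70 %, the mating operator of steps (g)–(h) is replaced by re-sampling the worst spider, and positions are updated asynchronously. It is meant only to make the flow concrete, not to reproduce the original implementation of [60].

```python
import numpy as np

def sso_minimize(objective, dim, n=40, iters=300, lb=-1.0, ub=1.0, seed=0):
    """Simplified SSO loop following steps (a)-(i)."""
    rng = np.random.RandomState(seed)
    n_f = int(0.7 * n)                                    # (b) ~70 % females (assumption)
    pos = rng.uniform(lb, ub, (n, dim))                   # (a) random spiders
    for _ in range(iters):
        fit = np.array([objective(p) for p in pos])       # (c) fitness of all spiders
        w = (fit.max() - fit) / (fit.max() - fit.min() + 1e-12)   # weights, best spider -> 1
        best = pos[fit.argmin()].copy()                   # (d) best spider in the swarm
        for i in range(n):
            d2 = ((pos - pos[i]) ** 2).sum(axis=1)
            better = np.where(w > w[i])[0]
            c = better[d2[better].argmin()] if len(better) else i   # nearest better spider
            a, b, dl, r = rng.rand(4)
            vib_c = w[c] * np.exp(-d2[c])                 # Vibc_i
            vib_b = w.max() * np.exp(-d2[fit.argmin()])   # Vibb_i
            if i < n_f:                                   # (e) female, Eqs. (3.1)/(3.2)
                step = a * vib_c * (pos[c] - pos[i]) + b * vib_b * (best - pos[i]) + dl * (r - 0.5)
                pos[i] = pos[i] + step if rng.rand() < 0.7 else pos[i] - step
            elif w[i] > np.median(w[n_f:]):               # (f) non-dominated male, Eq. (3.3)
                f = d2[:n_f].argmin()                     # nearest female
                pos[i] += a * w[f] * np.exp(-d2[f]) * (pos[f] - pos[i]) + dl * (r - 0.5)
            else:                                         # (f) dominated male, Eq. (3.4)
                mean_m = (pos[n_f:] * w[n_f:, None]).sum(0) / (w[n_f:].sum() + 1e-12)
                pos[i] += a * (mean_m - pos[i])
        pos[fit.argmax()] = rng.uniform(lb, ub, dim)      # crude stand-in for mating (g)-(h)
        pos = np.clip(pos, lb, ub)
    fit = np.array([objective(p) for p in pos])
    return pos[fit.argmin()]                              # (i) best solution found
```

Such a routine can be handed the FNN objective function constructed in Sect. 4.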

Cuevas et al. [60] showed that the SSO algorithm provides very competitive results compared with well-known algorithms such as PSO and ABC. They concluded that dividing the spiders by gender maintains diversity and balances exploration and exploitation, since each group follows its own movement operators. In addition, the vibration mechanism and the movement methods promote exploration.

These observations motivated us to propose an SSO-based FNN trainer and investigate the performance of this algorithm in training NNs.

4 SSO-based feedforward neural network

As discussed in Sect. 1, learning in FNNs mostly refers to the process of finding the best values of the weights and biases. When using heuristic algorithms, however, the learning of FNNs can be improved in three ways [31]:

  1. Defining the weights and biases.

  2. Defining the structure of the FNN.

  3. Tuning the parameters of other learning methods (e.g. learning rate and momentum in BP).

In the first method, which is the most common, the weights and biases are optimized by a meta-heuristic. The second method deals with optimizing the structure of the FNN, such as the connections between neurons, the number of neurons, the number of hidden nodes, and the number of hidden layers. The third method employs a heuristic algorithm as an auxiliary method for another learner; in this case, the meta-heuristic plays the role of a parameter tuner. In this study, we concentrate on the first method.

In order to formulate the problem of learning enhancement of FNN for a meta-heuristic, two steps should be taken:

  1. Representing the weights and biases in a suitable format.

  2. Defining the objective function.

The first step determines how the variables of the problem are represented for a given meta-heuristic. There are three main approaches in the literature: vector, binary, and matrix. Since the SSO algorithm employed in this work represents solutions as vectors, we choose the vector representation [31, 39, 50]. An example of this approach is shown in Fig. 2.

Fig. 2
figure 2

Vector representation of weights and biases for heuristic algorithms

Figure 2 shows that the vector representation is a very simple method, in which the weights and biases are simply concatenated into a vector that is passed to the heuristic algorithm. After defining the vector of variables, an objective function should define its fitness. Generally speaking, the performance of an FNN is measured by comparing the desired output with the actual output of the network. The most common performance metric in the literature is the mean squared error (MSE), calculated as follows:

$$\text{MSE} = \sum_{i=1}^{m} \left( o_i^k - d_i^k \right)^2$$
(4.1)

where m is the number of outputs, d_i^k is the desired output of the ith output unit when the kth training sample is used, and o_i^k is the actual output of the ith output unit for the same sample.

The key point here is that there is always more than one training sample in a data set. Therefore, a given FNN should be assessed based on its performance on all training samples. In this case, the average of MSE over all training samples is used as the objective function.
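Combining the vector representation of Fig. 2 with Eq. (4.1), a hedged sketch of the resulting objective function is given below. The ordering of weights and biases inside the vector is an assumption made for illustration, since the paper specifies the representation only at the level of Fig. 2.

```python
import numpy as np

def make_objective(X_train, D_train, n, h, m):
    """Return an objective that maps a flat vector of weights and biases to the
    average MSE over all training samples (Eq. 4.1 averaged over the data)."""
    def objective(v):
        # assumed layout: [W1 (n*h) | theta1 (h) | W2 (h*m) | theta2 (m)]
        i = 0
        W1 = v[i:i + n * h].reshape(n, h); i += n * h
        t1 = v[i:i + h]; i += h
        W2 = v[i:i + h * m].reshape(h, m); i += h * m
        t2 = v[i:i + m]
        S = 1.0 / (1.0 + np.exp(-(X_train @ W1 - t1)))        # hidden layer, Eqs. (2.1)-(2.2)
        O = 1.0 / (1.0 + np.exp(-(S @ W2 - t2)))              # output layer, Eqs. (2.3)-(2.4)
        return np.mean(np.sum((O - D_train) ** 2, axis=1))    # average MSE
    return objective
```

The returned function maps a candidate vector of length n·h + h + h·m + m to a single scalar, so it can be handed directly to any of the heuristic trainers compared in this work.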

With problem representation and the objective function, the FNN is ready to be trained by the SSO algorithm. The flow chart of the proposed training algorithm is illustrated in Fig. 3.

Fig. 3
figure 3

General steps of the proposed SSO-based trainer

5 Results and discussion

In this section, we employ five data sets, presented in Table 1, to benchmark the performance of the proposed method. We select five standard classification data sets from the University of California at Irvine (UCI) Machine Learning Repository [63]: XOR, balloon, Iris, breast cancer, and heart. Note that the number of attributes increases from the first to the last data set. We deliberately chose data sets with different numbers of attributes to challenge the proposed algorithm and observe its performance. Obviously, the numbers of hidden nodes, weights, and biases increase in proportion to the number of attributes. To define the structure of the FNN for each data set, we chose 2 × N + 1 hidden nodes, where N is the number of attributes, as recommended in [64]. Other details of the data sets are available in Table 1.

Table 1 Data sets
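As a quick check of the problem sizes induced by the 2 × N + 1 rule, the number of variables handed to the trainer can be computed as below; the Iris case (4 attributes, 3 classes) yields 75 variables, consistent with the dimensionality quoted later in this section.

```python
def n_variables(n_attributes, n_outputs):
    """Total number of weights and biases of an N-(2N+1)-m FNN."""
    h = 2 * n_attributes + 1
    return n_attributes * h + h + h * n_outputs + n_outputs

# e.g. Iris: 4 attributes, 3 classes -> 4*9 + 9 + 9*3 + 3 = 75 variables
print(n_variables(4, 3))   # 75
```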

For verification, we compare the results with PSO and ACO, as the most well-known algorithms in the family of swarm-based optimization techniques. In addition, GA, ES, and PBIL are chosen as representatives of evolutionary algorithms. The initial values of the main parameters of SSO and these algorithms are provided in Table 2.

Table 2 Initial parameters of algorithms

To generate the results, each algorithm is run 10 times, and the average (AVE) and standard deviation (STD) of MSE are chosen as comparison metrics. The average of MSE indicates the ability of an algorithm to avoid local solutions, while the standard deviation shows the variation of the results and thus the stability of the algorithm. Another comparison metric reported is the classification rate: we choose the best structure over the 10 runs and calculate the classification rate on the test samples. This metric shows how accurately each algorithm classifies unseen data. The results are reported in Tables 3, 4, 5, 6 and 7. Please note that we name the proposed learning method FNNSSO and highlight the best results in boldface in the tables of results.
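The reporting protocol can be summarised in a short sketch; `train_once` and `classification_rate` are hypothetical helpers (one complete training run returning the final MSE and trained parameters, and a test-set evaluation, respectively), named here only for illustration.

```python
import numpy as np

def benchmark(train_once, classification_rate, runs=10):
    """Repeat training `runs` times, then report AVE/STD of MSE and the
    classification rate of the best structure on the test samples."""
    results = [train_once() for _ in range(runs)]        # each run returns (mse, weights)
    mses = np.array([mse for mse, _ in results])
    best_weights = min(results, key=lambda r: r[0])[1]   # best structure over the runs
    return mses.mean(), mses.std(), classification_rate(best_weights)
```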

Table 3 Statistical results for XOR data set over 10 independent runs
Table 4 Statistical results for balloon data set over 10 independent runs
Table 5 Statistical results for iris data set over 10 independent runs
Table 6 Statistical results for breast cancer data set over 10 independent runs
Table 7 Statistical results for heart data set over 10 independent runs

The results of the algorithms on the XOR data set are provided in Table 3. This table shows that the minimum average MSE is provided by the proposed FNNSSO algorithm. The standard deviations show that this algorithm performs very stably on this data set. Its results are significantly better than those of FNNPSO and FNNACO, which demonstrates the merits of this algorithm in avoiding local solutions compared with the other swarm-based optimization techniques. The FNNGA algorithm shows very competitive results compared with FNNPSO in terms of local optima avoidance and accuracy. It can also be seen in Table 3 that the classification rates of FNNGA and FNNSSO are both 100 %.

The results of the algorithms on the balloon data set are very similar to those on the XOR data set. Table 4 shows that the best result belongs to FNNGA, closely followed by FNNSSO. Although the classification accuracies are similar for all algorithms, the averages and standard deviations of MSE differ. Again, the results show that the local optima avoidance of the majority of the evolutionary algorithms is higher than that of FNNPSO and FNNACO. As a swarm-based algorithm, however, FNNSSO provides very competitive results compared to the evolutionary-based FNN trainers.

The results for the Iris data set are shown in Table 5. They are highly consistent with those of Table 3: FNNSSO shows the best results in terms of AVE, STD, and classification rate. The significantly improved MSE and classification accuracy of the FNNSSO algorithm are noticeable. The results of the evolutionary algorithms (especially the classification accuracy) are again better than those of FNNPSO and FNNACO.

The most difficult data sets in terms of the dimensionality of the learning problem are the breast cancer and heart data sets. It may be observed in Tables 6 and 7 that the average and standard deviation of MSE are much better for the FNNSSO algorithm. This strongly suggests that the SSO algorithm is suitable for training FNNs, given the high-dimensional nature of this problem. The high exploration of this algorithm resulted in such a performance on these two challenging data sets. The classification accuracy of this algorithm is also good on these two data sets.

To sum up, FNNSSO shows superior results compared with FNNGA on the majority of the data sets, owing to the high exploration ability of this algorithm. Another observed behaviour is that the evolutionary algorithms outperform the swarm-based optimization techniques employed in this work, with the exception of SSO. This is because the recombination operators of evolutionary algorithms strongly promote exploration. The results of this work show that the SSO algorithm, although swarm-based, also has very high exploration.

Another behaviour observed in the results is the improving performance of the FNNSSO algorithm as the difficulty of the data sets increases. The results show that FNNSSO provides very good results on the Iris, cancer, and heart data sets, for which the numbers of variables to be optimized are 75, 209, and 1081, respectively. This indicates that the SSO algorithm is very good at optimizing high-dimensional problems, which is again due to its high exploration. However, the classification accuracies are not as good as the local optima avoidance, which reflects the exploitation of the SSO algorithm. Therefore, it seems that the exploration of SSO is better than its exploitation, which assists this algorithm in showing very high local optima avoidance and reasonable classification accuracy.

6 Conclusion

This work proposed a new training algorithm for FNNs based on the recently proposed SSO algorithm. The vector representation method was chosen to provide SSO with the weights and biases for optimization. The objective function was to minimize the average MSE over all training samples. The performance of the proposed FNN trainer was benchmarked on five standard classification problems: XOR, balloon, Iris, breast cancer, and heart. For verification, the results of the proposed FNNSSO algorithm were compared to five other algorithms in the literature: PSO, ACO, GA, ES, and PBIL. The results showed that the proposed method is very efficient in training FNNs, owing to the high exploration and local optima avoidance of this algorithm. We also observed that the results of FNNSSO were better on the majority of the data sets in terms of classification accuracy. Another finding was the relationship between the performance of the FNNSSO algorithm and the difficulty of the data set in terms of the number of features: the proposed algorithm outperformed the other algorithms on the Iris, breast cancer, and heart data sets, which is again due to its high exploration. The paper also compared the other algorithms employed in this work and found that the evolutionary algorithms outperformed the swarm-based algorithms; the reason was discussed in terms of the intrinsic promotion of exploration by the recombination operators of evolutionary algorithms.

For future work, it is recommended to investigate the effectiveness of the proposed FNNSSO in training other types of NNs. In addition, methods for improving the exploitation of the SSO algorithm are worth studying.