1 Introduction

Artificial neural networks are developed by analogy with the human brain's processes of decision making and learning. Recently, a new model has been under consideration, inspired by the emotional process of the limbic system in the mammalian brain. The brain emotional learning model has the advantages of low computational complexity, high convergence speed, and stability. Moren and Balkenius proposed one of the best-known computational models of brain emotional learning (BEL) [1, 2]. This model, or modified versions of it, has been used in different engineering applications [3,4,5,6,7,8,9,10]. Fakhrmoosavy et al. proposed an intelligent method for generating artificial earthquake records based on a hybrid PSO-parallel brain emotional learning inspired model [10]. In spite of the high ability of ANNs to simulate complex nonlinear problems, the appropriate selection of learning parameters and initial weights plays an important role in the training convergence of the model. Suitable selection of these parameters can lead to more accurate results and a more reliable model.

Different modified versions of BEL have been proposed by researchers and used in real applications. Lotfi and Akbarzadeh proposed the brain emotional learning-based pattern recognizer by extending the computational model of the human brain limbic system. The authors used the proposed model for solving classification and chaotic time-series prediction problems; it offers more accuracy, less complexity, and faster training in comparison with a standard MLP [11]. Parsapoor combined an emotionally inspired structure with a neuro-fuzzy inference system to propose a new model for solar activity prediction. The author found that the predictor converges faster than ANFIS and is a reliable model for predicting solar activity and similar prediction problems [12]. Lotfi used the model of the emotional process in the brain to propose an image classifier named the brain emotional learning-based picture classifier, applying an activation function in the brain emotional model to improve its efficiency. Simulation results show the high training speed of the proposed model in comparison with the multilayer perceptron neural network [13]. Parsapoor merged the brain emotional learning-based fuzzy inference system with a radial basis function network and tested it on complex systems [14]. Lotfi and Keshavarz introduced a fuzzy mathematical model of the brain limbic system and used it for predicting the chaotic activity of the earth's magnetosphere. The authors fuzzified the connections in the limbic system model and implemented the inhibitory task of the orbitofrontal cortex as a fuzzy decision-making layer. Simulations showed that the results of the fuzzy model correlate better with observations than those of non-fuzzy models [15]. Lucas and Moghimi used a modified version of the brain emotional learning model to design an intelligent controller for the automatic landing system of aircraft. Whereas existing automatic landing systems are activated only under well-specified wind-speed conditions, the proposed controller was more robust and performed better under strong wind gusts [16]. Parsapoor and Bilstrup proposed a new brain emotional learning-based prediction model by assigning adaptive networks to the different parts of the original brain emotional learning model; they used it for predicting geomagnetic storms using the disturbance storm time index [12].

In this study, we combine optimization algorithms and learning automata with the original BEL model to propose modified models with increased accuracy and performance. This article is organized as follows: the concepts of the brain emotional learning inspired model are presented in Sect. 2. Learning automata are described in Sect. 3. The optimization methods used in this research are presented in Sect. 4. The power spectral density function, as the frequency feature of an earthquake record, is presented in Sect. 5. The adaptive neuro-fuzzy inference system is explained in Sect. 6. The deep belief network for size reduction of features is described in Sect. 7. Two methods for improving the efficiency of the original BEL model are proposed in Sect. 8. Numerical examples are illustrated in Sect. 9. Finally, concluding remarks are presented in Sect. 10.

2 Brain emotional learning inspired model (BELIM)

The emotional part of the mammalian brain plays an important role in making rapid responses to environmental stimuli. This part, called the limbic system, includes different sections such as the thalamus, sensory cortex, amygdala, and orbitofrontal cortex. Each section has a role in analyzing the input stimuli and producing an emotional response. The thalamus receives input information from the environment; pre-processing of signals is done in this section. The amygdala receives initial signals from the thalamus that have not yet entered the sensory cortex, as well as information from the other parts of the limbic system. The sensory cortex acts as a transition area through which signals from the rest of the cerebral cortex are transmitted to the limbic system and vice versa; it therefore works as a communication area of the brain for controlling behaviour. The orbitofrontal cortex (OFC) receives information from the sensory cortex. It plays an important role in decision-making through its cognitive processes. In this area, learning is done by processing the stimuli and applying positive and negative reinforcement in the sense of reward and penalty. The other duty of the OFC is controlling the amygdala's irrelevant responses.

Many researchers have recently studied and used the brain emotional learning inspired model to solve complex and nonlinear mapping and decision-making problems in engineering science and industrial applications [4,5,6,7,8,9,10, 14]. Moren and Balkenius proposed one of the most important computational models in the field of emotional learning. This model describes the relationships between the different parts of the limbic system, as shown in Fig. 1, with emphasis on the interaction between the amygdala and the orbitofrontal cortex [2].

Fig. 1 Different parts of the limbic system in the BEL model [2]

The sensory inputs enter the thalamus, which is a pathway for sending the input stimuli to the sensory cortex and amygdala; noise reduction and pre-processing are done here. In the case of \(n\) inputs, the thalamus sends \(n\) normalized stimuli to the sensory cortex and one additional stimulus to the amygdala, computed as follows [2]:

$${S_{n+1}}=\hbox{max} \left( {{S_i}} \right);~~~i=1,~2,~ \ldots ,~n.$$
(2.1)

The sensory cortex is a transition area which transmits the sensory inputs to both the amygdala and the orbitofrontal cortex. The amygdala, which has the role of final decision-making, contains \(n+1\) A-nodes. For each A-node, there is a connection weight \({V_i}\). The output of each node is obtained by multiplying the input \({S_i}\) by the weight \({V_i}\):

$${A_i}={S_i}{V_i};~~i=1,2,3, \ldots ,n+1.$$
(2.2)

The output of amygdala is obtained by the summation of weighted inputs [2]:

$${E_a}=\mathop \sum \limits_{{i=1}}^{{n+1}} {A_i}.$$
(2.3)

The connection weights \({V}_{i}\) are modified monotonically based on the difference between the reinforcer \(R\) and the summation of the \(A\)-nodes:

$$\Delta {V_i}=~\alpha {S_i}{\left[ {R - \mathop \sum \limits_{{j=1}}^{{n+1}} {A_j}} \right]^+},$$
(2.4)

where \(\alpha\) is the learning rate in the interval \(\left[0, 1\right]\). Once an emotional reaction is learned by the amygdala, it should be permanent; thus, the modification of the weights \({V}_{i}\) in Eq. (2.4) is monotonic. Another important part of the limbic system, the orbitofrontal cortex, controls the amygdala's irrelevant responses. There are \(n\) O-nodes in the orbitofrontal cortex, each obtained by multiplying the sensory input \({S}_{i}\) by the connection weight \({W}_{i}\) [2]:

$${O_i}={S_i}{W_i}.$$
(2.5)

The summation of the \(O_i\) gives the output of the orbitofrontal cortex:

$${E_o}=\mathop \sum \limits_{{i=1}}^{n} {O_i}.$$
(2.6)

The connection weights \({W}_{i}\) are updated as a function of the input and an internal reinforcer for the orbitofrontal cortex:

$$\Delta {W_i}=\beta {S_i}{R_0},$$
(2.7)

where β is a learning rate parameter and the internal reinforcer \({R_0}\) is calculated by the following equation:

$${R_0}=\left\{ {\begin{array}{*{20}{c}} {~{{\left[ {\mathop \sum \limits_{i} {A_i} - R} \right]}^+} - \mathop \sum \limits_{i} {O_i}~~~~~~~~~~~~~{\text{if}}~~R \ne 0} \\ {~{{\left[ {\mathop \sum \limits_{i} {A_i} - \mathop \sum \limits_{i} {O_i}} \right]}^+}~~~~~~~~~~~~~~~~{\text{otherwise}}} \end{array}} \right..$$
(2.8)

The final output of limbic system will be calculated by the following equation [2]:

$$E={E_a} - {E_o}.$$
(2.9)
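
Taken together, Eqs. (2.1)–(2.9) define a simple forward pass and weight update. The following is a minimal sketch of one training step in Python, assuming stimuli and the reinforcer are scaled to [0, 1]; the learning rates are illustrative, not values prescribed by the original model.

```python
import numpy as np

def bel_step(S, R, V, W, alpha=0.1, beta=0.1):
    """One training step of the BEL model (Eqs. 2.1-2.9).

    S : (n,) sensory input vector, R : scalar reinforcer (target),
    V : (n+1,) amygdala weights, W : (n,) orbitofrontal weights.
    """
    S_full = np.append(S, S.max())        # Eq. (2.1): thalamic stimulus S_{n+1} = max(S_i)
    A = S_full * V                        # Eq. (2.2): A-node outputs
    E_a = A.sum()                         # Eq. (2.3): amygdala output
    O = S * W                             # Eq. (2.5): O-node outputs
    E_o = O.sum()                         # Eq. (2.6): orbitofrontal output

    V = V + alpha * S_full * max(R - E_a, 0.0)   # Eq. (2.4), [.]^+ = max(., 0)
    if R != 0:                                   # Eq. (2.8): internal reinforcer
        R0 = max(E_a - R, 0.0) - E_o
    else:
        R0 = max(E_a - E_o, 0.0)
    W = W + beta * S * R0                        # Eq. (2.7)

    return E_a - E_o, V, W                       # Eq. (2.9): model output E
```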

3 Learning automata

A stochastic learning automaton is a useful tool for making decisions under uncertain conditions and has been used in many engineering problems of a nondeterministic nature. Learning automata follow an iterative process of selecting an action randomly based on a probability distribution and applying that action to the environment. The environment responds to the action, and this response changes the probability vector used to select the next action. The process is repeated to find the action that yields an optimal or goal response from the environment. Learning automata are divided into two main categories, finite action-set learning automata and continuous action-set learning automata, which are described in the following [17,18,19].

3.1 Finite action-set learning automata (FALA)

The action set in FALA is always considered to be finite and predefined. Let \(A=\left\{ {{\alpha _1}, \ldots ,{\alpha _r}} \right\},~~r<\infty\) be the set of actions available at each instant \(n\); the automaton selects an action \(\alpha \left( n \right)\) randomly based on its probability distribution, \(p\left( n \right)=\left\{ {{p_1}\left( n \right), \ldots ,{p_r}\left( n \right)} \right\}.\) The selected action is applied to the environment, which then responds with a stochastic reinforcement signal \(\beta \left( n \right)\), as shown in Fig. 2. Afterward, the LA updates the probability distribution \(p\left( n \right)\) based on the selected action and the reinforcement signal. The updating is done using different learning algorithms. FALA come in two kinds: fixed structure and variable structure.

Fig. 2 Structure of learning automata

Fig. 3 Flowchart of GA for finding the weights or learning parameters of the model

Examples of the fixed-structure LA type are the Tsetlin, Krinsky, and Krylov automata. Linear reward-inaction \({L_{R - I}}\), linear reward-ε-penalty \({L_{R - \varepsilon P}}\), linear reward-penalty \({L_{R - P}}\), and the pursuit algorithm are examples of variable-structure FALA. Variable-structure FALA are used in this research. The algorithms used in this study are described briefly below [18, 19].

3.1.1 Linear reward-inaction algorithm, \(L_{R-I}\)

The \({L_{R - I}}\) algorithm updates the action probabilities as described below. Let \(\alpha \left( n \right)={\alpha _i}\), then the action probability vector \(p\left( n \right)\) is updated as follows:

$$\begin{gathered} {p_i}\left( {n+1} \right)={p_i}\left( n \right)+~\lambda ~\beta \left( n \right)\left( {1 - {p_i}\left( n \right)} \right) \hfill \\ {p_j}\left( {n+1} \right)={p_j}\left( n \right) - ~\lambda ~\beta \left( n \right){p_j}\left( n \right);~~j \ne i, \hfill \\ \end{gathered}$$
(3.1)

where \(\lambda\) is the learning (step-size) parameter satisfying \(0 < \lambda < 1\) [18].
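
As an illustration, one \(L_{R-I}\) update step can be written compactly as below; this is a sketch, with the step size \(\lambda\) illustrative.

```python
import numpy as np

def lri_update(p, i, beta, lam=0.1):
    """Eq. (3.1): p is the action probability vector, i the chosen action,
    beta the reinforcement signal in [0, 1]."""
    p = p.copy()
    p_i = p[i]
    p -= lam * beta * p                  # p_j(n+1) = p_j(n) - lam*beta*p_j(n), j != i
    p[i] = p_i + lam * beta * (1 - p_i)  # rewarded action gains probability mass
    return p
```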

3.1.2 Linear reward-penalty algorithm, \(L_{R-P}\)

If \(\alpha \left( n \right)={\alpha _i},\) then the probability vector is updated as follows:

$$\begin{gathered} {p_i}\left( {n+1} \right)={p_i}\left( n \right)+~{\lambda _1}~\beta \left( n \right)\left( {1 - {p_i}\left( n \right)} \right) - {\lambda _2}~\left( {1 - \beta \left( n \right)} \right){p_i}\left( n \right) \hfill \\ {p_j}\left( {n+1} \right)={p_j}\left( n \right) - ~{\lambda _1}~\beta \left( n \right)~{p_j}\left( n \right)+{\lambda _2}~\left( {1 - \beta \left( n \right)} \right)\left( {\frac{1}{{r - 1}} - {p_j}\left( n \right)} \right);\;j \ne i, \hfill \\ \end{gathered}$$
(3.2)

where \(\lambda_1\) and \(\lambda_2\) are learning parameters, usually with \(\lambda_1 = \lambda_2\) [18].

3.1.3 Pursuit algorithm

The reward probabilities of actions are estimated in this algorithm by considering the history of selected actions and obtained reinforcement signal.

Let \(\alpha \left( n \right)={\alpha _i}\). The number of times action \({\alpha _i}\) has been chosen up to instant \(n\) and the total reinforcement obtained in response to action \({\alpha _i}\) are stored in the vectors \({\left( {{\eta _1}\left( n \right),~ \ldots ,{\eta _r}\left( n \right)} \right)^T}\) and \({\left( {{Z_1}\left( n \right),~ \ldots ,{Z_r}\left( n \right)} \right)^T},\) respectively. These vectors are updated as follows:

$$\begin{gathered} Z_{i} \left( n \right) = Z_{i} \left( {n - 1} \right) + \beta \left( n \right) \hfill \\ Z_{j} \left( n \right) = Z_{j} \left( {n - 1} \right);\quad ~\forall j \ne i \hfill \\ \eta _{i} \left( n \right) = \eta _{i} \left( {n - 1} \right) + 1 \hfill \\ \eta _{j} \left( n \right) = \eta _{j} \left( {n - 1} \right);\quad ~\forall j \ne i \hfill \\ \hat{d}_{i} \left( n \right) = \frac{{Z_{i} \left( n \right)}}{{\eta _{i} \left( n \right)}};\quad i = 1, \ldots ,r, \hfill \\ \end{gathered}$$
(3.3)

where \(\hat {d}\) is the estimator vector used for updating the action probabilities. Let \({\hat {d}_{M(n)}}\) be the highest estimated reward probability at instant \(n\). If the estimates are true, the value of \(p_{M(n)}(n)\) should be one and the rest of the action probabilities should be zero; in other words, \(p\left( n \right)={e_{M\left( n \right)}},\) where \({e_{M\left( n \right)}}\) is the unit vector whose \(M(n)\)th element is one and whose other elements are zero. This algorithm updates the action probabilities by moving \(p(n)\) towards \({e_{M\left( n \right)}}\) by a small amount determined by a learning parameter as follows:

$$p\left( {n+1} \right)=p\left( n \right)+\lambda \left( {{e_{M\left( n \right)}} - p\left( n \right)} \right),$$
(3.4)

where \(0 < \lambda \leqslant 1\) is the learning parameter and the index \(M\left( n \right)\) is determined by the following [18, 19]:

$${\hat {d}_{M\left( n \right)}}=\mathop {\hbox{max} }\limits_{i} {\hat {d}_i}\left( n \right).$$
(3.5)
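
A minimal sketch of one pursuit iteration, combining Eqs. (3.3)–(3.5); the learning parameter is illustrative.

```python
import numpy as np

def pursuit_update(p, Z, eta, i, beta, lam=0.05):
    """p: action probabilities; Z: cumulative reinforcement per action;
    eta: selection counts; i: chosen action; beta: reinforcement in [0, 1]."""
    Z, eta, p = Z.copy(), eta.copy(), p.copy()
    Z[i] += beta                          # Eq. (3.3): accumulate reinforcement
    eta[i] += 1
    d_hat = Z / np.maximum(eta, 1)        # estimated reward probabilities
    e_M = np.zeros_like(p)
    e_M[np.argmax(d_hat)] = 1.0           # Eq. (3.5): unit vector at the best estimate
    p += lam * (e_M - p)                  # Eq. (3.4): move p towards e_M
    return p, Z, eta
```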

3.2 Continuous action-set learning automata (CALA)

In FALA, the actions form a finite set with predefined values. Such a finite set may not be suitable for finding the optimal values of parameters that maximize a performance index; in that case, a continuous action set is needed. An automaton that uses a continuous action set is called a CALA. In this algorithm, the probability distribution of actions at instant \(n\) is \(N\left( {\mu \left( n \right),\sigma \left( n \right)} \right),\) the normal distribution with mean \(\mu \left( n \right)\) and standard deviation \(\sigma (n)\). By updating \(\mu(n)\) and \(\sigma (n)\) at each instant, the CALA updates the probability distribution of actions. Let \(\alpha \left( n \right) \in {\mathbb{R}}\) be the chosen action and let \(\beta (n)\) be the reinforcement signal at instant \(n\). Instead of reward probabilities, a reward function \(f:~{\mathbb{R}} \to {\mathbb{R}}\) is defined by \(f\left( x \right)={\text{{\rm E}}}\left[ {\beta \left( n \right)|\alpha \left( n \right)=x} \right].\) With the reinforcement in response to action \(x\) denoted by \({\beta _x}\):

$$f\left( x \right)={\text{{\rm E}}}{\beta _x}.$$
(3.6)

The role of the CALA is to find the value of \(x\) that maximizes \(f\left( x \right).\) In this case, \(N\left( {\mu \left( n \right),\sigma \left( n \right)} \right)\) converges to \(N\left( {{x_0},0} \right),\) where \({x_0}\) is the point at which the reward function attains its maximum. To avoid getting stuck at a non-optimal point, the CALA lets \(\sigma (n)\) converge to a very small value \({\sigma _l}\) instead of zero. The CALA interacts with the environment by choosing two actions, \(x\left( n \right)\) and \(\mu \left( n \right)\), at each instant. The value of \(x\left( n \right)\) is generated randomly from the probability distribution \(N\left( {\mu \left( n \right),\phi \left( {\sigma \left( n \right)} \right)} \right).\) Both actions \(x\left( n \right)\) and \(\mu \left( n \right)\) are applied to the environment, and the CALA updates the probability distribution by updating the mean and standard deviation of the actions as follows [19]:

$$\begin{gathered} \mu \left( {n+1} \right)=\mu \left( n \right)+\lambda \frac{{\left( {{\beta _x} - {\beta _\mu }} \right)}}{{\phi \left( {\sigma \left( n \right)} \right)}}\frac{{\left( {x\left( n \right) - \mu \left( n \right)} \right)}}{{\phi \left( {\sigma \left( n \right)} \right)}} \hfill \\ \sigma \left( {n+1} \right)=\sigma \left( n \right)+\lambda \frac{{\left( {{\beta _x} - {\beta _\mu }} \right)}}{{\phi \left( {\sigma \left( n \right)} \right)}}\left[ {{{\left( {\frac{{\left( {x\left( n \right) - \mu \left( n \right)} \right)}}{{\phi \left( {\sigma \left( n \right)} \right)}}} \right)}^2} - 1} \right]+\lambda C\left[ {{\sigma _l} - \sigma \left( n \right)} \right], \hfill \\ \end{gathered}$$
(3.7)

where

$$\phi \left( \sigma \right)=\left\{ {\begin{array}{*{20}{c}} {{\sigma _l};~\sigma \leqslant {\sigma _l}} \\ {\sigma;~\sigma>{\sigma _l}} \end{array}} \right.$$

where \(\lambda\) is the learning parameter \(\left( {0<\lambda \leqslant 1} \right)\) and \(C\) is a large positive constant.
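
The sketch below implements one CALA iteration of Eq. (3.7), assuming a user-supplied reward function that returns the reinforcement for an action; \(\lambda\), \(C\), and \(\sigma_l\) are illustrative.

```python
import numpy as np

def cala_step(mu, sigma, reward, lam=0.05, C=5.0, sigma_l=1e-3):
    """One CALA iteration; reward(x) returns the reinforcement for action x."""
    phi = max(sigma, sigma_l)                 # phi(sigma)
    x = np.random.normal(mu, phi)             # action x(n) ~ N(mu, phi(sigma))
    beta_x, beta_mu = reward(x), reward(mu)   # responses to both actions
    u = (x - mu) / phi
    mu_new = mu + lam * (beta_x - beta_mu) / phi * u
    sigma_new = (sigma + lam * (beta_x - beta_mu) / phi * (u ** 2 - 1)
                 + lam * C * (sigma_l - sigma))
    return mu_new, max(sigma_new, sigma_l), x
```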

4 Optimization methods used

4.1 Genetic algorithm

The genetic algorithm is an evolutionary computing method for finding the optimum of functions ranging from simple to highly complex and nonlinear. The algorithm was inspired by biological rules in nature and has been used in many optimization problems, especially in engineering fields. The GA consists of three main operators: selection, crossover, and mutation [20,21,22]. Figure 3 shows the steps of the GA and its application in this research. The GA is used twice in this study: first for finding the optimum values of the BEL weights in one optimization problem, and second for finding the best values of the learning rate parameters.

4.2 Particle swarm optimization (PSO)

The PSO algorithm was inspired by the social behaviour of animals, such as bird flocking and fish schooling. Each individual in this algorithm is called a particle. Each particle moves to a new position based on its past position and current velocity, and updates its velocity based on the information it gathers on its own and from other particles in the swarm [23, 24]. This algorithm has been used in many engineering fields [25,26,27,28,29,30,31,32]. PSO is used in this research for finding the optimum values of the BEL weights or learning rate parameters, as sketched below.

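A minimal generic PSO loop of this kind might look as follows; the swarm size, inertia and acceleration coefficients, bounds, and the fitness function are illustrative assumptions rather than the exact settings of this study.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0)):
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))   # positions (candidate solutions)
    v = np.zeros_like(x)                                # velocities
    pbest = x.copy()                                    # personal best positions
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                  # global best position
    for _ in range(iters):
        r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Hypothetical usage: tune the BEL weight vector by minimizing a mean-squared
# error between model outputs and targets (bel_output and mse are placeholders):
# best_w, best_err = pso_minimize(lambda w: mse(bel_output(w, X), y), dim=2 * n + 1)
```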

4.3 Artificial bee colony (ABC)

ABC was inspired by the behaviour of honeybees searching for food sources in nature. In this algorithm, there are three groups of honeybees: employed, onlooker, and scout bees. Scout bees find food sources, each representing a candidate solution, at random; the employed bees then visit the food sources and evaluate the amount of nectar at each source (the fitness function). Afterwards, the onlooker bees select the best food sources after evaluating the information obtained from the employed bees [33]. The ABC algorithm is used in this research for finding the optimum values of the BEL weights or learning rate parameters, as sketched below.

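A minimal sketch of the ABC loop with employed, onlooker, and scout phases; the colony size, abandonment limit, and bounds are illustrative assumptions.

```python
import numpy as np

def abc_minimize(fitness, dim, n_sources=20, iters=200, limit=30, bounds=(-1.0, 1.0)):
    lo, hi = bounds
    X = np.random.uniform(lo, hi, (n_sources, dim))   # food sources (candidate solutions)
    f = np.array([fitness(x) for x in X])
    trials = np.zeros(n_sources, dtype=int)

    def neighbour(i):
        k = np.random.choice([j for j in range(n_sources) if j != i])
        d = np.random.randint(dim)
        v = X[i].copy()
        v[d] += np.random.uniform(-1, 1) * (X[i, d] - X[k, d])
        return np.clip(v, lo, hi)

    def try_improve(i):
        v = neighbour(i)
        fv = fitness(v)
        if fv < f[i]:
            X[i], f[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_sources):        # employed bees: local search per source
            try_improve(i)
        prob = f.max() - f + 1e-12        # onlooker bees: prefer better sources
        prob /= prob.sum()
        for _ in range(n_sources):
            try_improve(np.random.choice(n_sources, p=prob))
        worn = trials > limit             # scout bees: abandon exhausted sources
        X[worn] = np.random.uniform(lo, hi, (int(worn.sum()), dim))
        f[worn] = [fitness(x) for x in X[worn]]
        trials[worn] = 0
    best = int(f.argmin())
    return X[best], f[best]
```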

5 Power spectral density function (PSDF)

If \(x\left( t \right)\) is a random process, its time-history is, in general, not periodic; thus, it cannot be represented by a discrete Fourier series. In addition, for a stationary process \(x\left( t \right)\), the condition below may not be satisfied:

$$\int\limits_{{ - \infty }}^{\infty } {\left| {x(t)} \right|{\text{ d}}t < \infty } .$$
(5.1)

Therefore, the Fourier transform of \(x\left( t \right)\) cannot be calculated directly. This problem can be overcome by using the autocorrelation function \({R_x}\left( \tau \right)\) instead of \(x\left( t \right)\). The autocorrelation function indirectly contains the frequency content of the process and is defined as the mean value of the product \(x\left( t \right)x\left( {t+\tau } \right)\), where \(\tau\) is the time lag between samples. If \(x\left( t \right)\) is a stationary process, the value of \(E\left[ {x\left( t \right)x\left( {t+\tau } \right)} \right]\) is independent of time:

$$E\left[ {x\left( t \right)x\left( {t+\tau } \right)} \right]=f\left( \tau \right)={R_x}\left( \tau \right),$$
(5.2)

where \(E\left[ . \right]\) is the expected value.

The power spectral density function, \({S_x}\left( \omega \right)\), is the Fourier transform of the autocorrelation function of \(x\) [34]:

$${S_x}\left( \omega \right)=\frac{1}{{2\pi }}\int\limits_{{ - \infty }}^{\infty } {{R_x}\left( \tau \right){e^{ - i\omega \tau }}} d\tau .$$
(5.3)
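
As a rough numerical illustration of Eqs. (5.2) and (5.3), the PSDF of a sampled record can be estimated from a biased autocorrelation estimate and a discrete Fourier transform; the normalization details here are a simplifying assumption.

```python
import numpy as np

def psdf(x, dt):
    """Estimate S_x(w) of a sampled record x with time step dt."""
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimate R_x(tau), tau = 0 .. (n-1)*dt  (Eq. 5.2)
    R = np.correlate(x, x, mode="full")[n - 1:] / n
    # Discrete analogue of S_x(w) = (1/2pi) * int R_x(tau) e^{-iw tau} dtau  (Eq. 5.3)
    S = np.abs(np.fft.rfft(R)) * dt / (2.0 * np.pi)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)   # angular frequencies (rad/s)
    return omega, S
```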

6 Adaptive neuro-fuzzy inference system (ANFIS)

A fuzzy inference system with fuzzy if–then rules can model the qualitative aspects of human knowledge and reasoning processes without quantitative analysis; ANFIS works in this way. Suppose that there is a system with two inputs \(x\), \(y\) and one output \(z\). The rule base contains two fuzzy if–then rules of Takagi and Sugeno type [35]:

$${\text{Rule 1:}}\, {\text{if}}~x~{\text{is}}~{A_1}~{\text{and~}}y~{\text{is}}~{B_1}~{\text{then}}~{f_1}={p_1}x+{q_1}y+{r_1},$$
$${\text{Rule 2:}}\, {\text{if}}~x~{\text{is}}~{A_2}~{\text{and}}~y~{\text{is}}~{B_2}~{\text{then}}~{f_2}={p_2}x+{q_2}y+{r_2}.$$
\(A_i\) and \(B_i\) are fuzzy sets, and \(p_i\), \(q_i\), and \(r_i\) are parameters obtained through the learning process. Figure 4 shows the type-3 ANFIS.

Layer 1 (fuzzification): in this layer, the membership grades are generated by Eqs. (6.1), (6.2), and (6.3) [35]:

Fig. 4 Structure of the ANFIS with two inputs x and y

$$O_{i}^{1}=\mu {A_i}\left( x \right),$$
(6.1)
where \(O_{i}^{1}\) is the membership grade of \(A_i\). \(\mu {A_i}\left( x \right)\) is bell-shaped, with a maximum of 1 and a minimum of 0, such as

$$\mu {A_i}\left( x \right)=\frac{1}{{1+{{\left[ {{{\left( {\frac{{x - {c_i}}}{{{a_i}}}} \right)}^2}} \right]}^{{b_i}}}}}$$
(6.2)
or

$$\mu {A_i}\left( x \right)=exp\left\{ { - {{\left( {\frac{{x - {c_i}}}{{{a_i}}}} \right)}^2}} \right\},$$
(6.3)
where \(\left\{ {{a_i},{b_i},{c_i}} \right\}\) is the parameter set.

Layer 2 (production): each node in this layer is a circle node and multiplies the incoming signals. Each node output represents the firing strength of a rule [35]:

$${w_i}=\mu {A_i}\left( x \right) \times \mu {B_i}\left( y \right)~;~~i=1,2.$$
(6.4)
Layer 3 (normalization): the \(i\)th node calculates the ratio of the \(i\)th rule's firing strength to the sum of all rules' firing strengths [35]:

$${\bar {w}_i}=\frac{{{w_i}}}{{{w_1}+{w_2}}},~~i=1,2.$$
(6.5)
Layer 4 (defuzzification): every node \(i\) in this layer is a square node with the function [35]:

$$O_{i}^{4} = {\bar {w}_i} f_{i} = {\bar {w}_i} \left( {p_{i} x + q_{i} y + r_{i} } \right),$$
(6.6)
where \({\bar {w}_i}\) is the output of layer 3 and \(\left\{ {{p_i},{q_i},{r_i}} \right\}\) is the parameter set.

Layer 5 (output): the summation of all incoming signals is computed by the following [35]:

$$O_{1}^{5}={\text{overall~output}}=\mathop \sum \limits_{i} {\bar {w}_i}{f_i}=\frac{{\mathop \sum \nolimits_{i} {w_i}{f_i}}}{{\mathop \sum \nolimits_{i} {w_i}}}.$$
(6.7)
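
The five layers amount to a short forward computation. Below is a minimal sketch for the two-rule system above, assuming generalized bell membership functions (Eq. 6.2); the premise and consequent parameter values are illustrative.

```python
def anfis_forward(x, y, premise, consequent):
    """premise: (a, b, c) for A1, A2, B1, B2; consequent: (p, q, r) per rule."""
    def bell(v, a, b, c):
        # Eq. (6.2): generalized bell membership function
        return 1.0 / (1.0 + (((v - c) / a) ** 2) ** b)

    muA = [bell(x, *premise[0]), bell(x, *premise[1])]   # Layer 1: fuzzification
    muB = [bell(y, *premise[2]), bell(y, *premise[3])]
    w = [muA[i] * muB[i] for i in range(2)]              # Layer 2: firing strengths
    wbar = [wi / sum(w) for wi in w]                     # Layer 3: normalization
    f = [p * x + q * y + r for (p, q, r) in consequent]  # rule consequents
    return sum(wb * fi for wb, fi in zip(wbar, f))       # Layers 4-5: weighted sum

# Illustrative parameters for the two rules:
premise = [(2.0, 2.0, -1.0), (2.0, 2.0, 1.0),   # A1, A2
           (2.0, 2.0, -1.0), (2.0, 2.0, 1.0)]   # B1, B2
consequent = [(1.0, 1.0, 0.0), (-1.0, 2.0, 0.5)]
z = anfis_forward(0.3, -0.2, premise, consequent)
```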

7 Deep belief network (DBN)

The artificial neural network is one of the most important tools in the field of artificial intelligence, with many applications such as object recognition, mapping, and signal analysis. For theoretical and biological reasons, it is recommended to use deep architectures with many nonlinear processing layers in the structure of ANNs. Such deep models have many hidden layers and parameters that must be trained, which increases the computational effort, decreases the training speed significantly, and may cause the training to get stuck in a local minimum. Using a DBN can overcome these problems [36, 37]. The layers of a DBN consist of restricted Boltzmann machines (RBMs), which are probabilistic models with one hidden layer. The DBN tries to reconstruct the inputs at the output layer; for this purpose, the hidden layer must describe the input-layer data well. A DBN is usable not only for classification but also for feature extraction: it is able to extract the most important features of the training data [38] and can be used to reduce the dimensionality of data [38].

After extracting the features of the earthquake records (PSDF) in the thalamus, the DBN is used in the sensory cortex to reduce the size of the features, so that the most effective features are found. For this purpose, the number of neurons in the hidden layer should be less than the number of neurons in the input layer. If the reduced-dimension data can explain the input data, it can be used as a representative of the whole feature set. In this research, 101 features are obtained for each earthquake record and are considered as the inputs and outputs of the DBN, as shown in Fig. 5. The number of hidden neurons is chosen as 70, less than the number of input and output neurons. The network is then trained, and the features in the hidden layer are used as the inputs of the amygdala and orbitofrontal cortex. A minimal sketch of this reduction step follows.
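
A minimal numpy sketch of one RBM layer trained with one-step contrastive divergence (CD-1) for this 101-to-70 reduction; the single-layer setup, learning rate, and epoch count are illustrative assumptions, with inputs rescaled to [0, 1].

```python
import numpy as np

def train_rbm(data, n_hidden=70, lr=0.05, epochs=50, seed=0):
    """data: (n_samples, n_visible) array scaled to [0, 1], e.g. 101 PSDF features."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    bv, bh = np.zeros(n_visible), np.zeros(n_hidden)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        for v0 in data:
            h0 = sigmoid(v0 @ W + bh)                     # hidden probabilities
            h_s = (rng.random(n_hidden) < h0).astype(float)
            v1 = sigmoid(h_s @ W.T + bv)                  # reconstruction
            h1 = sigmoid(v1 @ W + bh)
            W += lr * (np.outer(v0, h0) - np.outer(v1, h1))   # CD-1 update
            bv += lr * (v0 - v1)
            bh += lr * (h0 - h1)
    return W, bv, bh

# The 70 hidden activations sigmoid(v @ W + bh) then act as the reduced
# feature vector passed on to the amygdala and orbitofrontal cortex.
```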

Fig. 5 Deep belief network for size reduction of features

Fig. 6 Structure of the first proposed model

Fig. 7 Structure of the second proposed model

Fig. 8 PSDF of four used samples as the input of the models

Fig. 9 Results of training the first proposed model with different optimization algorithms

Fig. 10 Training the second proposed model with different learning algorithms of LA

8 Proposed method

Two methods are proposed in this research for predicting the level of fear induced by an earthquake. Both methods use a modified BEL model. The input signal of the model is an earthquake record, which gives the ground motion accelerations. As the ground accelerations are recorded at short time steps (for example, 0.02 s), each earthquake record contains many data points, which increases the computational cost of the model. In addition, two earthquakes that are similar in magnitude and intensity can have completely different acceleration–time diagrams. Therefore, feature extraction is performed in the thalamus to provide a more relevant stimulus for the sensory cortex. Frequency content is the most important feature of earthquake signals; in this research, the power spectral density function is used as the frequency content of the earthquake records sent to the sensory cortex. In this way, the computations are reduced significantly and a more meaningful stimulus is used for predicting earthquake-induced fear in the brain emotional model. In the sensory cortex, a deep belief network is used to decrease the size of the PSDF before the reduced features are sent to the orbitofrontal cortex and amygdala. It should be mentioned that the output of the model is the level of fear. The level of fear is a qualitative value that cannot be predicted directly using the brain emotional learning (BEL) model; to relate the values calculated at the amygdala to the level of fear, an adaptive neuro-fuzzy inference system is used in the amygdala to build fuzzy rules. These fuzzy rules are then used in the amygdala to find the level of fear caused by the earthquake.

8.1 First proposed method

In the first proposed method, the structure of the BEL model is used without the original BEL learning algorithm. Instead, the weights are found and modified using different optimization algorithms such as GA, PSO, ABC, and CALA. This is in fact an optimization problem whose design variables are the weights of the modified BEL model, with the goal of minimizing the error of the model in predicting fear. Different optimization algorithms are considered in the first proposed method to find the most efficient algorithm for the kind of input–output data at hand. Figure 6 shows the structure of the first proposed method.

8.2 Second proposed method

In the second proposed method, learning automata are used for finding the learning parameters of the modified BEL model (MBEL); the MBEL model serves as the environment. The difference between the output of the amygdala and the target value gives the absolute error of the model; from it, the relative error is calculated, and the response of the environment is defined as unity minus the relative error. In the BEL model, the output and the target are scaled to the interval [0, 1], so the relative error also lies in [0, 1]. If the action is selected properly, in the ideal case the relative error will be zero and the response of the environment will be one, which represents a reward. Otherwise, if an improper action is selected, the relative error in the worst case reaches its maximum value of one and the response of the environment will be zero, which represents a penalty. The response is fed back to the LA, which decides on the best action, i.e., the learning parameters of the network; these parameters are used in the weight-correction algorithm of the MBEL model. Three learning algorithms, \({L_{R - I}}\), \({L_{R - P}}\), and \({L_{P - A}}\), are used in this research to find a suitable algorithm for the data used in this study. These algorithms are described in Sect. 3.
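
For concreteness, the environment response could be computed as in the sketch below; the exact form of the relative error is an assumption.

```python
def reinforcement(output, target, eps=1e-12):
    """output, target: amygdala output and target value, both scaled to [0, 1]."""
    rel_err = abs(output - target) / max(abs(target), eps)  # relative error
    return 1.0 - min(rel_err, 1.0)   # 1.0 = full reward, 0.0 = full penalty
```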

Selection of actions is done in three different ways:

  • selecting actions randomly (\(S_1\))

  • selecting actions randomly with an equal number of selections for all actions (\(S_2\))

  • selecting actions based on their probability (\(S_3\))

Figure 7 shows the second proposed method in detail.

9 Numerical examples

9.1 Data set

A data set consisting of 100 real earthquake records is provided by the Pacific Earthquake Engineering Research Center (PEER) website [39]. 70% of the data set is used for training the proposed models and the rest is used for testing the performance of the trained models. The earthquakes differ in time duration, frequency content, magnitude, and peak ground acceleration. These records, which give the acceleration time-histories of the earthquakes, are used as the inputs of the proposed models.

9.2 Example for the first proposed model

Earthquakes are selected randomly from the data set and applied to the model as inputs. Feature extraction is done in the thalamus, and the PSDF is obtained for each earthquake record. The PSDF gives the frequency content of the earthquake, which is one of the most important features of each earthquake record. Figure 8 shows four samples of the PSDFs used in this research.

The features then enter the sensory cortex, where a DBN is used to reduce their size. The model is trained using all of the specified training data. Figure 9 compares the original BEL with the first proposed method using different optimization algorithms for finding the appropriate weights. According to Fig. 9, BEL-CALA is the best algorithm for this example, performing even better than the original BEL.

To illustrate the ability of the trained model, 30% of the provided data were used as test data. The average relative error for BEL, BEL-GA, BEL-ABC, BEL-PSO, and BEL-CALA is 2.97, 3.82, 7.04, 1.92, and 1.21%, respectively. Table 1 gives the results of ten samples of test data for the different optimization algorithms and the original BEL used in the first proposed method.

Table 1 Results of the first proposed model for ten samples of test data

As expected from the training results, BEL-CALA, which uses the CALA algorithm for training, is the best model. The level of fear is related to the magnitude of the earthquake; Table 2 gives the relationship between earthquake magnitude and the level of fear considered in this research.

Table 2 Relationship between earthquake magnitude and assigned fear level (USGS)

Although there are small differences between the real magnitudes and those predicted using the PSO and CALA optimization algorithms in the first proposed model, the level of fear is predicted successfully by these algorithms, as shown in Table 3.

Table 3 Comparison between assigned and predicted levels of fear
Table 4 Results of the second proposed model on predicting the magnitude of ten samples from test data
Table 5 Results of the second model on predicting the level of fear for ten samples from test data

Table 3 shows that the BEL-PSO and BEL-CALA models predict the level of fear exactly and are reliable models for this purpose.

9.3 Example for the second proposed model

In this example, the second proposed method is used for training the model: the weights are corrected using the BEL formula, and learning automata are used for finding the best values of the learning parameters. Different learning algorithms are used for this purpose, and their results are shown in Fig. 10.

Three selection methods are used with the \({L_{R - I}}\) and \({L_{R - P}}\) learning algorithms, while \({L_{P - A}}\) works with the \({S}_{3}\) method only. According to Fig. 10, the third selection method is the best choice for all learning algorithms.

Figure 11 compares the different learning algorithms. Although Fig. 11 shows that \({L_{R - P}}\) trains better in the early epochs, the results show that \({L_{R - I}}\) has the best training overall.

Fig. 11 Comparison between different learning algorithms of LA on training the second proposed model

Fig. 12 Performance of different algorithms on training the second proposed model

For comparison, other algorithms such as CALA, GA, PSO, and ABC are also used for finding the learning parameters. Figure 12 shows the performance of the different algorithms in training the second proposed model. The results show that BEL-CALA and BEL-PSO have the minimum error, BEL, BEL-GA, and BEL-\({L_{R-I}}\) train with a moderate amount of error, and BEL-ABC has the maximum error.

Test data are used to show the ability of the trained model to predict the magnitude of earthquakes. The mean relative error of BEL, BEL-GA, BEL-ABC, BEL-PSO, BEL-\({L}_{R-I}\), and BEL-CALA is 2.97, 2.43, 5.13, 1.18, 2.63, and 0.83%, respectively. Table 4 gives the results for ten samples of test data. According to Table 4, the BEL-CALA model has the best performance among the different algorithms; in addition, BEL-PSO predicts the magnitude with an acceptable amount of error.

Table 5 gives the results of the different algorithms used in the second proposed model for predicting the level of fear for ten samples of test data. In spite of the small errors in predicting the magnitude, Table 5 shows the high accuracy of the second model in predicting the level of fear.

10 Conclusions

In this research, two modified BEL models were proposed to improve the efficiency and accuracy of the original BEL model. In both proposed models, ANFIS was used in the amygdala to build fuzzy rules, and a deep belief network was used in the sensory cortex for size reduction of the earthquake record features. GA, PSO, ABC, and learning automata were used in the first proposed model for finding the weights, and they were applied in the second proposed model to obtain the learning parameters. The proposed models were used for predicting the magnitude of earthquakes and the fear they induce, to illustrate their performance. The following conclusions are drawn from the results of the numerical examples:

  • Both proposed methods are more accurate than the original BEL model.

  • Except for ABC, the algorithms used in the first proposed model improve the efficiency of the original BEL model.

  • All algorithms used in the second proposed model improve the accuracy of the model in comparison with the original BEL model.

  • The CALA algorithm shows the best accuracy among the algorithms used, in both proposed models.