
1 Introduction

Artificial feedforward neural networks have been studied by many scientists, see e.g. [2, 12, 14, 27, 28, 31, 43, 45]. Gradient methods are among the most frequently used methods for training feedforward neural networks, see e.g. [18, 29, 44]. Most simulations of neural network learning algorithms, like other learning algorithms [19, 20, 30, 33, 34, 36, 40, 41], run on a serial computer. The computational complexity of many learning algorithms is very high, which makes serial implementations time consuming and slow. The Levenberg-Marquardt (LM) algorithm [21, 26] is one of the most effective learning algorithms; unfortunately, it requires a large number of calculations, and for very large networks its computational load makes it impractical. A suitable solution to this problem is the use of high-performance dedicated parallel structures, see e.g. [3, 5,6,7,8,9,10,11,12,13, 38, 39, 48]. This paper presents a new parallel computational approach to the LM algorithm based on vector instructions. The results of the study of this new parallel approach are presented in the last part of the paper.

A sample structure of the feedforward neural network is shown in Fig. 1. This sample network has \(L\) layers, \(N_l\) neurons in the \(l\)-th layer, and \(N_L\) outputs. The input vector contains \(N_0\) input values.

Equation (1) describes the recall (forward) phase of the network

$$\begin{aligned} s_i^{\left( l \right) }\left( t \right) = \sum \limits _{j = 0}^{{N_{l - 1}}} {w_{ij}^{\left( l \right) }\left( t \right) x_j^{\left( l \right) }\left( t \right) }, \qquad y_i^{\left( l \right) } (t) = f\left( s_i^{\left( l \right) } (t)\right) . \end{aligned}$$
(1)
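For readers who prefer code, the recall phase (1) can be sketched as below. This is a minimal illustration; the list of layer weight matrices `W`, the activation `f`, and the treatment of the bias as input index \(j = 0\) are assumptions of the sketch rather than details given in the paper.

```python
import numpy as np

def recall(x, W, f=np.tanh):
    """Recall (forward) phase of Eq. (1) for one input vector x.

    W is a list of layer weight matrices; W[l] has shape (N_l, N_{l-1} + 1),
    where column 0 holds the bias weights (input index j = 0 in Eq. (1)).
    Returns the list of layer outputs y^(1), ..., y^(L).
    """
    outputs = []
    y = np.asarray(x, dtype=float)
    for Wl in W:
        x_ext = np.concatenate(([1.0], y))   # prepend the constant input x_0 = 1
        s = Wl @ x_ext                       # s_i^(l) = sum_j w_ij^(l) x_j^(l)
        y = f(s)                             # y_i^(l) = f(s_i^(l))
        outputs.append(y)
    return outputs

# Example (illustrative): a 1-5-1 MLP with weights drawn from [-0.5, 0.5]
W = [np.random.uniform(-0.5, 0.5, (5, 2)), np.random.uniform(-0.5, 0.5, (1, 6))]
y_out = recall([0.3], W)[-1]
```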
Fig. 1. Sample feedforward neural network.

The Levenberg-Marquardt method [21, 26] is used to train the feedforward neural network. The following loss function is minimized

$$\begin{aligned} E\left( {\mathbf{{w}}\left( n \right) } \right) = \frac{1}{2}\sum \nolimits _{t = 1}^Q {\sum \nolimits _{r = 1}^{{N_L}} {\left( \varepsilon _r^{\left( L \right) }\left( t \right) \right) ^2} } = \frac{1}{2}\sum \nolimits _{t = 1}^Q {\sum \nolimits _{r = 1}^{{N_L}} {{{\left( {y_r^{\left( L \right) }\left( t \right) - d_r^{\left( L \right) }\left( t \right) } \right) }^2}} } \end{aligned}$$
(2)

where \(\varepsilon _r^{\left( L \right) } \) is defined as

$$\begin{aligned} \varepsilon _r^{\left( L \right) }(t) = \varepsilon _r^{\left( {Lr} \right) }(t) = y_r^{\left( L \right) }(t) - d_r^{\left( L \right) }(t) \end{aligned}$$
(3)

and \(d_r^{\left( L \right) }(t)\) is the \(r\)-th desired output for the \(t\)-th training sample.
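The loss (2) with the error definition (3) can be accumulated over the training set as in the sketch below, which reuses the `recall` function from the previous snippet; the names `X` and `D` (lists of input and desired-output vectors) are illustrative.

```python
def loss(X, D, W, f=np.tanh):
    """Loss E(w) of Eq. (2): half the sum of squared output errors
    eps_r^(L)(t) = y_r^(L)(t) - d_r^(L)(t) over all samples t and outputs r."""
    E = 0.0
    for x, d in zip(X, D):                    # t = 1, ..., Q
        y_L = recall(x, W, f)[-1]             # network output y^(L)(t)
        eps = y_L - np.asarray(d, float)      # Eq. (3)
        E += 0.5 * float(eps @ eps)
    return E
```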

The LM algorithm is a modification of the Newton method and is based on the first three terms of the Taylor series expansion of the loss function. The change of weights is given by

$$\begin{aligned} \Delta \left( {\mathbf{{w}}(n)} \right) = - {\left[ {{\nabla ^\mathbf{{2}}}{} \mathbf{{E}}\left( {\mathbf{{w}}(n)} \right) } \right] ^{ - 1}}\nabla \mathbf{{E}}\left( {\mathbf{{w}}(n)} \right) \end{aligned}$$
(4)

this requires knowledge of the gradient vector

$$\begin{aligned} \nabla \mathbf{{E}}\left( {\mathbf{{w}}(n)} \right) = {\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{\varepsilon }}\left( {\mathbf{{w}}(n)} \right) \end{aligned}$$
(5)

and the Hessian matrix

$$\begin{aligned} {\nabla ^\mathbf{{2}}}{} \mathbf{{E}}\left( {\mathbf{{w}}(n)} \right) = {\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{J}}\left( {\mathbf{{w}}(n)} \right) + \mathbf{{S}}\left( {\mathbf{{w}}(n)} \right) \end{aligned}$$
(6)

where \(\textbf{J}\left( {\mathbf{{w}}(n)}\right) \) in (5) and (6) is the Jacobian matrix

$$\begin{aligned} \mathbf{{J}}(\mathbf{{w}}\left( n \right) ) = \left[ {\begin{array}{*{20}{c}} {\frac{{\partial \varepsilon _1^{\left( L \right) }(1)}}{{\partial w_{10}^{(1)}}}}&{} \cdots &{}{\frac{{\partial \varepsilon _1^{\left( L \right) }(1)}}{{\partial w_{ij}^{(k)}}}}&{} \cdots &{}{\frac{{\partial \varepsilon _1^{\left( L \right) }(1)}}{{\partial w_{{N_L}{N_{L - 1}}}^{(L)}}}}\\ \vdots &{} \cdots &{} \vdots &{} \cdots &{} \vdots \\ {\frac{{\partial \varepsilon _{{N_L}}^{\left( L \right) }(1)}}{{\partial w_{10}^{(1)}}}}&{} \cdots &{}{\frac{{\partial \varepsilon _{{N_L}}^{\left( L \right) }(1)}}{{\partial w_{ij}^{(k)}}}}&{} \cdots &{}{\frac{{\partial \varepsilon _{{N_L}}^{\left( L \right) }(1)}}{{\partial w_{{N_L}{N_{L - 1}}}^{(L)}}}}\\ \vdots &{} \cdots &{} \vdots &{} \cdots &{} \vdots \\ {\frac{{\partial \varepsilon _{{N_L}}^{\left( L \right) }(Q)}}{{\partial w_{10}^{(1)}}}}&{} \cdots &{}{\frac{{\partial \varepsilon _{{N_L}}^{\left( L \right) }(Q)}}{{\partial w_{ij}^{(k)}}}}&{} \cdots &{}{\frac{{\partial \varepsilon _{{N_L}}^{\left( L \right) }(Q)}}{{\partial w_{{N_L}{N_{L - 1}}}^{(L)}}}} \end{array}} \right] . \end{aligned}$$
(7)

In the hidden layers the errors \(\varepsilon _i^{\left( lr \right) } \) are calculated as follows

$$\begin{aligned} \varepsilon _i^{\left( {lr} \right) }\left( t \right) \buildrel \wedge \over = \sum \limits _{m = 1}^{{N_{l + 1}}} {\delta _m^{\left( {l + 1,r} \right) }\left( t \right) w_{mi}^{\left( {l + 1} \right) }}, \end{aligned}$$
(8)
$$\begin{aligned} \delta _i^{\left( {lr} \right) }\left( t \right) = \varepsilon _i^{\left( {lr} \right) }\left( t \right) f'\left( {s_i^{\left( l \right) }\left( t \right) } \right) . \end{aligned}$$
(9)

Based on this, the elements of the Jacobian matrix for each weight can be computed

$$\begin{aligned} \frac{{\partial \varepsilon _r^{\left( L \right) }\left( t \right) }}{{\partial w_{ij}^{\left( l \right) }}} = \delta _i^{\left( {lr} \right) }\left( t \right) x_j^{\left( l \right) }\left( t \right) . \end{aligned}$$
(10)

It should be noted that the derivatives (10) are computed similarly to the classical backpropagation method, except that each time only a single error is placed at the output. In this algorithm, the weights of the entire network are treated as a single vector and their derivatives form the Jacobian matrix \(\textbf{J}\).
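A sketch of this per-output backpropagation is given below: for every sample and every output \(r\), a single unit error is placed at output \(r\) and propagated back with (8)-(9), and the products (10) form one row of \(\textbf{J}\). The layer layout matches the `recall` sketch above; the tanh derivative is an assumption of the example, not a choice stated in the paper.

```python
def jacobian(X, W, f=np.tanh, fprime=lambda s: 1.0 - np.tanh(s) ** 2):
    """Jacobian J of Eq. (7): one row per (sample t, output r) pair and one
    column per weight, built by backpropagating one output error at a time."""
    rows = []
    for x in X:
        # forward pass, storing the extended inputs x^(l) and net sums s^(l)
        xs, ss = [], []
        y = np.asarray(x, dtype=float)
        for Wl in W:
            x_ext = np.concatenate(([1.0], y))
            xs.append(x_ext)
            s = Wl @ x_ext
            ss.append(s)
            y = f(s)
        N_L = W[-1].shape[0]
        for r in range(N_L):
            eps = np.zeros(N_L)
            eps[r] = 1.0                                       # single error at output r
            row_parts = [None] * len(W)
            for l in range(len(W) - 1, -1, -1):
                delta = eps * fprime(ss[l])                    # Eq. (9)
                row_parts[l] = np.outer(delta, xs[l]).ravel()  # Eq. (10)
                if l > 0:
                    eps = W[l][:, 1:].T @ delta                # Eq. (8), biases excluded
            rows.append(np.concatenate(row_parts))
    return np.vstack(rows)
```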

The component \(\mathbf{{S}}\left( {\mathbf{{w}}(n)} \right) \) in (6) is given by the formula

$$\begin{aligned} \mathbf{{S}}\left( {\mathbf{{w}}(n)} \right) = \sum \nolimits _{t = 1}^Q {\sum \nolimits _{r = 1}^{{N_L}} {\varepsilon _r^{\left( L \right) }\left( t \right) \, {\nabla ^2}\varepsilon _r^{\left( L \right) }\left( t \right) } } . \end{aligned}$$
(11)

In the Gauss-Newton method it is assumed that \(\mathbf{{S}}\left( {\mathbf{{w}}(n)} \right) \approx 0\), and Eq. (4) takes the form

$$\begin{aligned} \Delta \left( {\mathbf{{w}}(n)} \right) = - {\left[ {{\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{J}}\left( {\mathbf{{w}}(n)} \right) } \right] ^{ - 1}}{\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{\varepsilon }}\left( {\mathbf{{w}}(n)} \right) . \end{aligned}$$
(12)

In the Levenberg-Marquardt method it is assumed that \(\mathbf{{S}}\left( {\mathbf{{w}}(n)} \right) = \mu \textbf{I}\), and Eq. (4) takes the form

$$\begin{aligned} \Delta \left( {\mathbf{{w}}(n)} \right) = - {\left[ {{\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{J}}\left( {\mathbf{{w}}(n)} \right) + \mu \mathbf{{I}}} \right] ^{ - 1}}{\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{\varepsilon }}\left( {\mathbf{{w}}(n)} \right) . \end{aligned}$$
(13)
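As a reference point, a direct dense-solver sketch of the update (13) is shown below; the QR-based route actually used in this paper is described next. `J` and `eps` are assumed to come from the previous sketches, with `eps` stacked in the same row order as `J`.

```python
def lm_step(J, eps, mu):
    """Weight change of Eq. (13): solve (J^T J + mu I) delta_w = -J^T eps."""
    n_w = J.shape[1]
    A = J.T @ J + mu * np.eye(n_w)
    h = J.T @ eps
    return np.linalg.solve(A, -h)
```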

By defining

$$\begin{aligned} \begin{array}{c} \mathbf{{A}}\left( n \right) = - \left[ {{\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{J}}\left( {\mathbf{{w}}(n)} \right) + \mu \mathbf{{I}}} \right] \\ \mathbf{{h}}\left( n \right) = {\mathbf{{J}}^T}\left( {\mathbf{{w}}(n)} \right) \mathbf{{\varepsilon }}\left( {\mathbf{{w}}(n)} \right) \end{array} \end{aligned}$$
(14)

Eq. (13) can be written as

$$\begin{aligned} \Delta \left( {\mathbf{{w}}(n)} \right) = \mathbf{{A}}{\left( n \right) ^{ - 1}}{} \mathbf{{h}}\left( n \right) . \end{aligned}$$
(15)

Equation (15) can be solved using QR factorization

$$\begin{aligned} {\mathbf{{Q}}^T}\left( n \right) \mathbf{{A}}\left( n \right) \Delta \left( {\mathbf{{w}}(n)} \right) = {\mathbf{{Q}}^T}\left( n \right) \mathbf{{h}}\left( n \right) , \end{aligned}$$
(16)
$$\begin{aligned} \mathbf{{R}}\left( n \right) \Delta \left( {\mathbf{{w}}(n)} \right) = {\mathbf{{Q}}^T}\left( n \right) \mathbf{{h}}\left( n \right) . \end{aligned}$$
(17)
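Equations (16)-(17) triangularize the system with an orthogonal factor and then back-substitute. Since, as noted below, Givens rotations are used in this paper for the QR factorization, a generic textbook sketch of a Givens-based solver is given here; it is not the authors' exact implementation.

```python
def solve_qr_givens(A, b):
    """Solve A x = b as in (16)-(17): Givens rotations triangularize A into
    R = Q^T A while the same rotations are applied to b, then R x = Q^T b
    is solved by back substitution."""
    R = np.array(A, dtype=float)
    y = np.array(b, dtype=float)
    n = R.shape[0]
    for j in range(n):                         # zero the sub-diagonal of column j
        for i in range(j + 1, n):
            if R[i, j] != 0.0:
                rho = np.hypot(R[j, j], R[i, j])
                c, s = R[j, j] / rho, R[i, j] / rho
                G = np.array([[c, s], [-s, c]])
                R[[j, i], j:] = G @ R[[j, i], j:]   # rotate rows j and i
                y[[j, i]] = G @ y[[j, i]]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):             # back substitution
        x[k] = (y[k] - R[k, k + 1:] @ x[k + 1:]) / R[k, k]
    return x
```

With the definitions (14), `solve_qr_givens(A, h)` yields the same weight change as the `lm_step` sketch above.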

In this paper, Givens rotations are used for the QR factorization. The operation of the LM algorithm is described below in five steps (a sketch of the resulting training loop is given after the list):

  1. The calculation of the network outputs for all input data, the errors, and the loss function.

  2. The calculation of the Jacobian matrix, using the backpropagation method for each error individually.

  3. The calculation of the weight changes \(\Delta \left( {\mathbf{{w}}(n)} \right) \) using the QR factorization.

  4. The recalculation of the loss function (2) for the new weights \({\mathbf{{w}}(n)} +\Delta \left( {\mathbf{{w}}(n)} \right) \). If the loss function is smaller than the one calculated earlier in step 1, then \(\mu \) is reduced \(\beta \) times, the new weight vector is saved, and the algorithm returns to step 1. Otherwise, the \(\mu \) value is increased \(\beta \) times and the algorithm repeats step 3.

  5. The algorithm stops when the loss function or the gradient falls below a preset value.
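Combining the five steps, a minimal training-loop sketch is given below. It reuses the `recall`, `loss`, `jacobian`, and `lm_step` (or `solve_qr_givens`) sketches above; the initial \(\mu \), \(\beta \), the simplified stopping test, and the flatten/unflatten helpers are illustrative assumptions, not the authors' settings.

```python
def flatten(W):
    """Stack all layer weight matrices into one weight vector (illustrative helper)."""
    return np.concatenate([Wl.ravel() for Wl in W])

def unflatten(v, W):
    """Reshape a weight vector back into the layer matrices of W (illustrative helper)."""
    out, k = [], 0
    for Wl in W:
        out.append(v[k:k + Wl.size].reshape(Wl.shape))
        k += Wl.size
    return out

def train_lm(X, D, W, mu=0.01, beta=10.0, target_loss=1e-3, max_epochs=1000):
    """Steps 1-5 of the LM algorithm, using the helper sketches above."""
    E = loss(X, D, W)                                          # step 1
    for _ in range(max_epochs):
        eps = np.concatenate([recall(x, W)[-1] - np.asarray(d, float)
                              for x, d in zip(X, D)])
        J = jacobian(X, W)                                     # step 2
        while True:
            delta = lm_step(J, eps, mu)                        # step 3
            W_new = unflatten(flatten(W) + delta, W)
            E_new = loss(X, D, W_new)                          # step 4
            if E_new < E:
                W, E, mu = W_new, E_new, mu / beta             # accept: decrease mu
                break
            mu *= beta                                         # reject: increase mu, retry
        if E < target_loss:                                    # step 5 (simplified)
            break
    return W
```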

Fig. 2. Sample illustration of the computational steps in the LM algorithm.

2 Vector Solution for Levenberg-Marquardt Algorithm

The Levenberg-Marquardt algorithm requires high computing power. Each epoch starts with steps 1 and 2, after which steps 3 and 4 can be repeated several times. Figure 2 shows a single epoch of the LM algorithm with the first two steps followed by repetitions of steps 3 and 4. It is worth noting that the successive pairs of steps 3 and 4 are independent of each other and can be performed at the same time: they differ only in the value of the \(\mu \) parameter and share the same starting point. Thus, they could be run in parallel on separate processor cores. However, the solution proposed in this article uses processor vector instructions. Vector instructions allow 4, 8, or even 16 operations to be performed in parallel. This approach enables the simultaneous determination of 4, 8, or 16 new points in the weight space using only one processor core, see Fig. 3. Figure 3a shows an epoch of the LM algorithm with the use of four-element vectors. After completing the first two steps, the algorithm computes steps 3 and 4 for the next four values of \(\mu \) at once. Thus, the three subsequent repetitions of steps 3 and 4 have already been performed and therefore take no additional computation time. The rectangles with a line in the middle symbolize the repetitions of steps 3 and 4 that are needed in the standard calculation method but are omitted in the calculations using vector instructions. Figure 3b shows the version with eight-element vectors.
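To make the idea concrete, the sketch below evaluates step 3 for several candidate values of \(\mu \) from the same starting point in one batched operation. It is a conceptual stand-in written with NumPy batching rather than explicit SIMD intrinsics, and the helper name, the batch width, and the choice of candidates \(\mu , \mu \beta , \mu \beta ^2, \ldots \) are assumptions made for illustration.

```python
def lm_step_batched(J, eps, mu0, beta=10.0, width=4):
    """Step 3 for `width` candidate values of mu, computed from one starting
    point at once, mimicking the 4/8/16-lane data parallelism of vector
    instructions (expressed here with NumPy batching)."""
    mus = mu0 * beta ** np.arange(width)        # mu, mu*beta, mu*beta^2, ...
    n_w = J.shape[1]
    JtJ = J.T @ J
    h = J.T @ eps
    # A[k] = J^T J + mus[k] * I, stacked into shape (width, n_w, n_w)
    A = JtJ[None, :, :] + mus[:, None, None] * np.eye(n_w)[None, :, :]
    deltas = np.linalg.solve(A, -h[None, :, None])[..., 0]
    return mus, deltas                          # deltas[k] belongs to mus[k]
```

Step 4 then recomputes the loss for each candidate weight vector \(\mathbf{{w}}(n)\,+\) `deltas[k]` and accepts the first one that decreases it, exactly as in the scalar loop above.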

Fig. 3. Sample illustration of the calculation method with vector instructions: a) four-element vectors, b) eight-element vectors.

Fig. 4. Sample illustration of the training process with vector instructions.

Figure 4 shows an example of a learning process using the LM algorithm. In successive epochs, the number of repetitions of steps 3 and 4 varies: some epochs have no repetition, while others have many. In the latter case, vector instructions can be used, which makes it possible to compute up to four pairs of steps 3 and 4 at the same time and consequently shortens the learning time. Of course, eight- or sixteen-element vectors can be used instead of four-element vectors, which further increases the parallelism and speed of the proposed calculation method.

3 Experimental Results

The proposed solution was tested against the classical variant of the Levenberg-Marquardt learning algorithm on several test problems. Two types of feedforward artificial neural networks were used in the experiments: MLP (multilayer perceptron) and FCMLP (fully connected multilayer perceptron). The performance of the presented calculation method was measured as the average training time in milliseconds. The presented results correspond to the best combination of training parameters. In all cases, the initial weights were randomly selected from the range [-0.5, 0.5]. The number of epochs was limited to 1,000. Each training session was repeated 100 times.

3.1 Logistic Function Approximation

The logistic function is a function of a single argument given by the formula

$$\begin{aligned} y=f\left( x\right) =4x\left( 1-x\right) \end{aligned}$$
(18)

The training sequence contains 11 samples with \(x \in \left[ 0, 1\right] \). The accepted average error threshold was set to 0.001. Table 1 shows the simulation results for the two kinds of neural networks, MLP and FCMLP. Both networks have five neurons in the hidden layer. The symbols LM, LMP 4, LMP 8, and LMP 16 denote the average network training time using the LM algorithm and its vector versions with 4-, 8-, and 16-element vectors, respectively. The speedup factor (SF) indicates by how many percent the vector version is faster than the classical one and is given by the formula

$$\begin{aligned} SF=\left( 1-\frac{LMPx}{LM}\right) \cdot 100\% \end{aligned}$$
(19)
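For illustration only (these are not measured values), if the classical algorithm needed \(LM = 200\) ms and the four-element vector version needed \(LMP4 = 120\) ms, formula (19) would give

$$\begin{aligned} SF=\left( 1-\frac{120}{200}\right) \cdot 100\% = 40\%. \end{aligned}$$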
Table 1. Training results for the LOG function.

3.2 Hang Function Approximation

The Hang function is a nonlinear function of two arguments \( x_1 \) and \( x_2 \) given by the formula

$$\begin{aligned} y=f\left( x_1, x_2\right) = {\left( 1 + x_1^{-2} + \sqrt{x_2^{-3}} \right) }^2 \end{aligned}$$
(20)

The training sequence for the Hang function contains 50 samples covering arguments in the range \(x_1, x_2 \in \left[ 1, 5\right] \). The target error threshold was set to 0.001 as the epoch average. The results of the simulations for the Hang function are shown in Table 2. Both tested networks have 15 neurons in the hidden layer.

Table 2. Training results for the HANG function.

3.3 IRIS Function Classification

The Iris dataset contains 150 instances describing three species of iris flowers. Each flower is described by 4 numerical attributes: the lengths and widths of its sepals and petals. The target error was set to 0.05. Table 3 shows the simulation results.

Table 3. Training results for the IRIS function.

3.4 The Two Spirals Classification

Two spirals is a well-known classification problem in which a neural network has to assign two-dimensional coordinates to one of two spirals. The training set for this problem contains 96 samples. The target error was set to 0.05. Table 4 shows the simulation results.

Table 4. Training results for the TS function.

4 Conclusion

In this paper, a new computational approach to the Levenberg-Marquardt learning algorithm for feedforward neural networks is proposed. Two types of feedforward neural networks were used in the experiments: the multilayer perceptron and the fully connected multilayer perceptron. The networks were trained on different training sets: the logistic function, the Hang function, Iris, and Two Spirals. The computational performance of the proposed solution, based on vector instructions, was compared with the classical implementation of the Levenberg-Marquardt learning algorithm. The conducted experiments showed a significant reduction of the real learning time: for all training sets, calculation times were reduced by an average of 50%. The performance of the proposed solution is therefore promising.

The vector approach can also be applied to other advanced learning algorithms for feedforward neural networks, see e.g. [2, 8]. In future research, we plan to design parallel realizations of learning for other structures, including probabilistic neural networks [32] and various fuzzy [1, 15, 20, 22, 24, 37, 42, 46, 47] and neuro-fuzzy structures [16, 17, 23, 25, 35].