Abstract
The utility of the back-propagation method in establishing suitable weights in a distributed adaptive network has been demonstrated repeatedly. Unfortunately, in many applications, the number of iterations required before convergence can be large. Modifications to the back-propagation algorithm described by Rumelhart et al. (1986) can greatly accelerate convergence. The modifications consist of three changes: (1) instead of updating the network weights after each pattern is presented to the network, the network is updated only after the entire repertoire of patterns to be learned has been presented, at which time the algebraic sums of all the weight changes are applied; (2) instead of keeping η, the “learning rate” (i.e., the multiplier on the step size), constant, it is varied dynamically so that the algorithm utilizes a near-optimum η, as determined by the local optimization topography; and (3) the momentum factor α is set to zero when, as signified by a failure of a step to reduce the total error, the information inherent in prior steps is more likely to be misleading than beneficial. Only after the network takes a useful step, i.e., one that reduces the total error, does α again assume a non-zero value. Considering the selection of weights in neural nets as a problem in classical nonlinear optimization theory, the rationale for algorithms seeking only those weights that produce the globally minimum error is reviewed and rejected.
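To make the three modifications concrete, here is a minimal Python sketch of the epoch-based update rule: weight changes are summed over the whole pattern set before being applied, η grows after a useful step and shrinks after a failed one, and α is zeroed whenever a step fails to reduce the total error. The function name train, the growth and shrink factors, the default parameter values, and the outright rejection of failed steps are illustrative assumptions, not values or policies taken from the paper itself.

import numpy as np

def train(weights, loss, grad, patterns,
          eta=0.1, alpha=0.9, grow=1.05, shrink=0.7, epochs=100):
    """Batch back-propagation with an adaptive learning rate.

    loss(w, patterns) -> total error over the whole pattern set
    grad(w, patterns) -> gradient of that total error w.r.t. w
    """
    prev_step = np.zeros_like(weights)
    prev_error = loss(weights, patterns)
    momentum_on = True                       # alpha is active initially
    for _ in range(epochs):
        # (1) Accumulate the weight change over the entire repertoire
        #     of patterns before applying it (batch, not per-pattern).
        g = grad(weights, patterns)
        step = -eta * g + (alpha if momentum_on else 0.0) * prev_step
        trial = weights + step
        error = loss(trial, patterns)
        if error < prev_error:
            # (2) Useful step: accept it, enlarge eta, restore alpha.
            weights, prev_error, prev_step = trial, error, step
            eta *= grow
            momentum_on = True
        else:
            # (3) Failed step: reject it (an assumed policy), shrink
            #     eta, and zero alpha, since the information in prior
            #     steps is now more likely misleading than beneficial.
            eta *= shrink
            momentum_on = False
            prev_step = np.zeros_like(weights)
    return weights

# Toy usage: least-squares fit, with patterns packed as (inputs, targets).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
w = train(np.zeros(2),
          loss=lambda w, p: float(np.sum((p[0] @ w - p[1]) ** 2)),
          grad=lambda w, p: 2.0 * p[0].T @ (p[0] @ w - p[1]),
          patterns=(X, t))

In this sketch, rejecting a failed step keeps the total error non-increasing from epoch to epoch, which is what makes it safe to grow η after every success.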
References
Akaike H (1959) On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Ann Inst Statist Math 11:1–17
Alkon DL (1983) Learning in a marine snail. Scientific American 249:70–84
Alkon DL (1984) Calcium-mediated reduction in ionic currents: a biophysical memory trace. Science 226:1037–1045
Alkon DL (1985) Conditioning-induced changes of Hermissenda channels: relevance to mammalian brain function. In: Weinberger NM, McGaugh JL, Lynch G (eds) Memory systems of the brain. The Guilford Press, New York
Levy AV, Gomez S (1985) The tunneling method applied to global optimization. In: Boggs PT, Byrd RH, Schnabel RB (eds) Numerical optimization 1984. SIAM, Philadelphia, pp 213–244
Luenberger DG (1984) Linear and nonlinear programming, 2nd ed. Addison-Wesley, Reading, Mass
Lundy M, Mees A (1986) Convergence of an annealing algorithm. Math Prog 34:111–124
Muneer T (1988) Comparison of optimization methods for nonlinear least squares minimization. Int J Math Educ Sci Tech 19:192–197
Pardalos PM, Rosen JB (1986) Methods for global concave minimization: a bibliographic survey. SIAM Rev 28:367–379
Parker DB (1987) Optimal algorithms for adaptive networks: second order back propagation, second order direct propagation, and second order Hebbian learning. In: Caudill M, Butler C (eds) Proceedings of the 1st International Conference on Neural Networks, San Diego, Calif., June 1987. IEEE Cat. #87TH0191-7, pp II-593-II-600
Pegis RJ, Grey DS, Vogl TP, Rigler AK (1966) The generalized orthonormal optimization program and its applications. In: Lavi A, Vogl TP (eds) Recent advances in optimization techniques. Wiley, New York, pp 47–60
Pineda FJ (1987) Generalization of back propagation to recurrent and higher order neural networks. In: Proceedings of the IEEE Conference on Neural Information Processing Systems, Denver, Colo., 1987 (to be published)
Rinnooy-Kan AHG, Timmer GT (1985) A stochastic approach to global optimization. In: Boggs PT, Byrd RH, Schnabel RB (eds) Numerical optimization 1984. SIAM, Philadelphia, pp 245–262
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL and the PDP Research Group (eds) Parallel distributed processing, vol 1, chap 8. MIT Press, Cambridge, Mass
Walster GW, Hansen ER, Sengupta S (1985) Test results for a global optimization algorithm. In: Boggs PT, Byrd RH, Schnabel RB (eds) Numerical optimization 1984. SIAM, Philadelphia, pp 272–287
Watson LT (1986) Numerical linear algebra aspects of globally convergent homotopy methods. SIAM Rev 28:529–545
Whitson GM (1988) An introduction to the parallel distributed processing model of cognition and some examples of how it is changing the teaching of artificial intelligence. In: Dreshem HL (ed) Proceedings of the 19th Annual Technical Symposium on Comp Sci Education. ACM, New York, pp 59–62
Whitson GM, Kulkarni A (1988) A testbed for sensory PDP models. Proceedings of the 16th Annual Comp Sci Conf. ACM, New York, pp 467–468
Cite this article
Vogl, T.P., Mangis, J.K., Rigler, A.K. et al. Accelerating the convergence of the back-propagation method. Biol. Cybern. 59, 257–263 (1988). https://doi.org/10.1007/BF00332914