Abstract
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [15]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network’s generalization ability.
Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
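Although the full chapter text is not reproduced here, the abstract's recipe is concrete enough to sketch. Below is a minimal NumPy illustration of batch-wise centering of all four gradient factors (inputs, hidden unit activities, error signals, and activation slopes) in a single-hidden-layer network with shortcut connections. The toy data, layer sizes, and variable names are assumptions for illustration, not the chapter's experimental setup; biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression problem: 200 samples, 4 inputs, 1 output.
X = rng.standard_normal((200, 4))
y = np.sin(X @ rng.standard_normal(4))[:, None]

n_in, n_hid, n_out = 4, 8, 1
W1 = 0.1 * rng.standard_normal((n_in, n_hid))   # input -> hidden
W2 = 0.1 * rng.standard_normal((n_hid, n_out))  # hidden -> output
Ws = 0.1 * rng.standard_normal((n_in, n_out))   # shortcut: input -> output
lr = 0.05

for _ in range(100):
    # Forward pass with activity centering.
    Xc = X - X.mean(axis=0)          # center inputs about zero
    h = np.tanh(Xc @ W1)
    hc = h - h.mean(axis=0)          # center hidden unit activities
    out = hc @ W2 + Xc @ Ws          # shortcuts carry the linear mapping

    # Backward pass with error and slope centering.
    e = out - y                      # output error (squared-error loss)
    ec = e - e.mean(axis=0)          # center error signals; in a complete
                                     # implementation an output bias would
                                     # absorb the removed mean error
    slope = 1.0 - h**2               # tanh'(net), the activation slope
    sc = slope - slope.mean(axis=0)  # slope centering: strips the linear
                                     # component of backpropagated error
    dh = (ec @ W2.T) * sc            # centered hidden-layer error

    # Plain gradient descent on all three weight matrices.
    W2 -= lr * hc.T @ ec / len(X)
    Ws -= lr * Xc.T @ ec / len(X)
    W1 -= lr * Xc.T @ dh / len(X)
```

The linear component that slope centering strips from `dh` is precisely what the shortcut weights `Ws` can learn directly, which is why the abstract ties slope centering to improved credit assignment in networks with shortcut connections. In an online setting, exponential running averages would replace the exact batch means used above.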
References
[1] Anderson, J., Rosenfeld, E. (eds.): Neurocomputing: Foundations of Research. MIT Press, Cambridge (1988)
[2] Battiti, R.: Accelerated back-propagation learning: Two optimization methods. Complex Systems 3, 331–342 (1989)
[3] Battiti, R.: First- and second-order methods for learning: Between steepest descent and Newton’s method. Neural Computation 4(2), 141–166 (1992)
[4] Bienenstock, E., Cooper, L., Munro, P.: Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2 (1982); Reprinted in [1]
[5] Deterding, D.H.: Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge (1989)
[6] Finke, M., Müller, K.-R.: Estimating a-posteriori probabilities using stochastic network models. In: Mozer, M.C., Smolensky, P., Touretzky, D.S., Elman, J.L., Weigend, A.S. (eds.) Proceedings of the 1993 Connectionist Models Summer School, Boulder, CO. Lawrence Erlbaum Associates, Hillsdale (1994)
[7] Hastie, T.J., Tibshirani, R.J.: Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(6), 607–616 (1996)
[8] Herrmann, M.: On the merits of topography in neural maps. In: Kohonen, T. (ed.) Proceedings of the Workshop on Self-Organizing Maps, pp. 112–117. Helsinki University of Technology (1997)
[9] Hochreiter, S., Schmidhuber, J.: Feature extraction through lococode. Neural Computation (1998) (to appear)
[10] Intrator, N.: Feature extraction using an unsupervised neural network. Neural Computation 4(1), 98–107 (1992)
[11] Lapedes, A., Farber, R.: A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D 22, 247–259 (1986)
[12] LeCun, Y., Kanter, I., Solla, S.A.: Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters 66(18), 2396–2399 (1991)
[13] Robinson, A.J.: Dynamic Error Propagation Networks. PhD thesis, University of Cambridge (1989)
[14] Schraudolph, N.N., Sejnowski, T.J.: Unsupervised discrimination of clustered data via optimization of binary information gain. In: Hanson, S.J., Cowan, J.D., Giles, C.L. (eds.) Advances in Neural Information Processing Systems, vol. 5, pp. 499–506. Morgan Kaufmann, San Mateo (1993)
[15] Schraudolph, N.N., Sejnowski, T.J.: Tempering backpropagation networks: Not all weights are created equal. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems, vol. 8, pp. 563–569. MIT Press, Cambridge (1996)
[16] Sejnowski, T.J.: Storing covariance with nonlinearly interacting neurons. Journal of Mathematical Biology 4, 303–321 (1977)
[17] Shah, S., Palmieri, F., Datum, M.: Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks 5, 779–787 (1992)
[18] Tenenbaum, J.B., Freeman, W.T.: Separating style and content. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9, pp. 662–668. MIT Press, Cambridge (1997)
[19] Turney, P.D.: Exploiting context when learning to classify. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 402–407. Springer, Heidelberg (1993)
[20] Turney, P.D.: Robust classification with context-sensitive features. In: Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pp. 268–276 (1993)
[21] Vogl, T.P., Mangis, J.K., Rigler, A.K., Zink, W.T., Alkon, D.L.: Accelerating the convergence of the back-propagation method. Biological Cybernetics 59, 257–263 (1988)
[22] Widrow, B., McCool, J.M., Larimore, M.G., Johnson Jr., C.R.: Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE 64(8), 1151–1162 (1976)
[23] Zimmermann, H.G.: Neuronale Netze als Entscheidungskalkül [Neural networks as a decision calculus]. In: Rehkugler, H., Zimmermann, H.G. (eds.) Neuronale Netze in der Ökonomie: Grundlagen und finanzwirtschaftliche Anwendungen [Neural networks in economics: foundations and financial applications], pp. 1–87. Vahlen Verlag, Munich (1994)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Schraudolph, N.N. (2012). Centering Neural Network Gradient Factors. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_14
DOI: https://doi.org/10.1007/978-3-642-35289-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8