Abstract
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [15]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network’s generalization ability.
Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
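Although the full chapter text is not reproduced here, the abstract's recipe is concrete enough to sketch. Below is a minimal NumPy illustration of batch-wise centering of all four gradient factors (inputs, hidden unit activities, error signals, and activation slopes) in a single-hidden-layer network with shortcut connections. The toy data, layer sizes, and variable names are assumptions for illustration, not the chapter's experimental setup; biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression problem: 200 samples, 4 inputs, 1 output.
X = rng.standard_normal((200, 4))
y = np.sin(X @ rng.standard_normal(4))[:, None]

n_in, n_hid, n_out = 4, 8, 1
W1 = 0.1 * rng.standard_normal((n_in, n_hid))   # input -> hidden
W2 = 0.1 * rng.standard_normal((n_hid, n_out))  # hidden -> output
Ws = 0.1 * rng.standard_normal((n_in, n_out))   # shortcut: input -> output
lr = 0.05

for _ in range(100):
    # Forward pass with activity centering.
    Xc = X - X.mean(axis=0)          # center inputs about zero
    h = np.tanh(Xc @ W1)
    hc = h - h.mean(axis=0)          # center hidden unit activities
    out = hc @ W2 + Xc @ Ws          # shortcuts carry the linear mapping

    # Backward pass with error and slope centering.
    e = out - y                      # output error (squared-error loss)
    ec = e - e.mean(axis=0)          # center error signals; in a complete
                                     # implementation an output bias would
                                     # absorb the removed mean error
    slope = 1.0 - h**2               # tanh'(net), the activation slope
    sc = slope - slope.mean(axis=0)  # slope centering: strips the linear
                                     # component of backpropagated error
    dh = (ec @ W2.T) * sc            # centered hidden-layer error

    # Plain gradient descent on all three weight matrices.
    W2 -= lr * hc.T @ ec / len(X)
    Ws -= lr * Xc.T @ ec / len(X)
    W1 -= lr * Xc.T @ dh / len(X)
```

The linear component that slope centering strips from `dh` is precisely what the shortcut weights `Ws` can learn directly, which is why the abstract ties slope centering to improved credit assignment in networks with shortcut connections. In an online setting, exponential running averages would replace the exact batch means used above.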
References
[1] Anderson, J., Rosenfeld, E. (eds.): Neurocomputing: Foundations of Research. MIT Press, Cambridge (1988)
[2] Battiti, R.: Accelerated back-propagation learning: Two optimization methods. Complex Systems 3, 331–342 (1989)
[3] Battiti, R.: First- and second-order methods for learning: Between steepest descent and Newton’s method. Neural Computation 4(2), 141–166 (1992)
[4] Bienenstock, E., Cooper, L., Munro, P.: Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2 (1982); Reprinted in [1]
[5] Deterding, D.H.: Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge (1989)
[6] Finke, M., Müller, K.-R.: Estimating a-posteriori probabilities using stochastic network models. In: Mozer, M.C., Smolensky, P., Touretzky, D.S., Elman, J.L., Weigend, A.S. (eds.) Proceedings of the 1993 Connectionist Models Summer School, Boulder, CO. Lawrence Erlbaum Associates, Hillsdale (1994)
[7] Hastie, T.J., Tibshirani, R.J.: Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(6), 607–616 (1996)
[8] Herrmann, M.: On the merits of topography in neural maps. In: Kohonen, T. (ed.) Proceedings of the Workshop on Self-Organizing Maps, pp. 112–117. Helsinki University of Technology (1997)
[9] Hochreiter, S., Schmidhuber, J.: Feature extraction through lococode. Neural Computation (1998) (to appear)
[10] Intrator, N.: Feature extraction using an unsupervised neural network. Neural Computation 4(1), 98–107 (1992)
[11] Lapedes, A., Farber, R.: A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D 22, 247–259 (1986)
[12] LeCun, Y., Kanter, I., Solla, S.A.: Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters 66(18), 2396–2399 (1991)
[13] Robinson, A.J.: Dynamic Error Propagation Networks. PhD thesis, University of Cambridge (1989)
[14] Schraudolph, N.N., Sejnowski, T.J.: Unsupervised discrimination of clustered data via optimization of binary information gain. In: Hanson, S.J., Cowan, J.D., Giles, C.L. (eds.) Advances in Neural Information Processing Systems, vol. 5, pp. 499–506. Morgan Kaufmann, San Mateo (1993)
[15] Schraudolph, N.N., Sejnowski, T.J.: Tempering backpropagation networks: Not all weights are created equal. In: Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing Systems, vol. 8, pp. 563–569. MIT Press, Cambridge (1996)
[16] Sejnowski, T.J.: Storing covariance with nonlinearly interacting neurons. Journal of Mathematical Biology 4, 303–321 (1977)
[17] Shah, S., Palmieri, F., Datum, M.: Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks 5, 779–787 (1992)
[18] Tenenbaum, J.B., Freeman, W.T.: Separating style and content. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9, pp. 662–668. MIT Press, Cambridge (1997)
[19] Turney, P.D.: Exploiting context when learning to classify. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 402–407. Springer, Heidelberg (1993)
[20] Turney, P.D.: Robust classification with context-sensitive features. In: Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pp. 268–276 (1993)
[21] Vogl, T.P., Mangis, J.K., Rigler, A.K., Zink, W.T., Alkon, D.L.: Accelerating the convergence of the back-propagation method. Biological Cybernetics 59, 257–263 (1988)
[22] Widrow, B., McCool, J.M., Larimore, M.G., Johnson Jr., C.R.: Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE 64(8), 1151–1162 (1976)
[23] Zimmermann, H.G.: Neuronale Netze als Entscheidungskalkül [Neural networks as a decision calculus]. In: Rehkugler, H., Zimmermann, H.G. (eds.) Neuronale Netze in der Ökonomie: Grundlagen und finanzwirtschaftliche Anwendungen [Neural networks in economics: foundations and financial applications], pp. 1–87. Vahlen Verlag, Munich (1994)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Schraudolph, N.N. (2012). Centering Neural Network Gradient Factors. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_14
DOI: https://doi.org/10.1007/978-3-642-35289-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8