Abstract
Averaging over many predictors reduces the variance portion of the error. We present a method for evaluating the mean squared error of an infinite ensemble of predictors from finite (small-size) ensemble information. We demonstrate it on ensembles of networks with different initial choices of synaptic weights. We find that the optimal stopping criterion for large ensembles occurs later in training time than for single networks. We test our method on the sunspots data set and obtain excellent results.
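The idea of estimating the infinite-ensemble error from small ensembles can be illustrated with a minimal sketch. It is not necessarily the chapter's exact procedure. It relies on one standard fact: if predictors are drawn independently from the same distribution, the expected mean squared error of an average of Q predictors is linear in 1/Q, so a straight-line fit of the error against 1/Q has the infinite-ensemble error as its intercept. The function names, the synthetic sine target, the bias of 0.1 and the noise level of 0.3 are all hypothetical choices made for illustration; real networks that differ only in their initial weights need not satisfy the independence assumption exactly.

```python
import numpy as np


def ensemble_mse(preds, target, q, n_draws, rng):
    """Average MSE of ensembles of size q drawn (without replacement) from the pool."""
    errs = []
    for _ in range(n_draws):
        idx = rng.choice(preds.shape[0], size=q, replace=False)
        avg = preds[idx].mean(axis=0)          # ensemble-averaged prediction
        errs.append(np.mean((avg - target) ** 2))
    return float(np.mean(errs))


def extrapolate_infinite_ensemble(preds, target, q_values, n_draws=200, seed=0):
    """Fit MSE(Q) = intercept + slope / Q; the intercept estimates the Q -> infinity error."""
    rng = np.random.default_rng(seed)
    mses = np.array([ensemble_mse(preds, target, q, n_draws, rng) for q in q_values])
    inv_q = 1.0 / np.asarray(q_values, dtype=float)
    slope, intercept = np.polyfit(inv_q, mses, 1)  # coefficients: highest degree first
    return intercept, slope, mses


def demo(seed=0):
    """Synthetic stand-in for a pool of trained networks (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, 200)
    target = np.sin(x)
    # 20 hypothetical predictors: shared bias 0.1, independent noise with std 0.3,
    # so the true infinite-ensemble MSE is 0.1 ** 2 = 0.01.
    preds = target + 0.1 + 0.3 * rng.standard_normal((20, x.size))
    return extrapolate_infinite_ensemble(preds, target, [1, 2, 3, 4, 5, 6, 8, 10])


intercept, slope, mses = demo()
print("estimated infinite-ensemble MSE:", intercept)
print("variance-like slope:", slope)
```

In this sketch the slope plays the role of the variance portion of the error that averaging removes, and the intercept is what remains for an arbitrarily large ensemble. Tracking the intercept over training time, rather than the single-network error, is one way to look for the later stopping point mentioned above.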
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Horn, D., Naftaly, U., Intrator, N. (1998). Large Ensemble Averaging. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_7
DOI: https://doi.org/10.1007/3-540-49430-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0
eBook Packages: Springer Book Archive