Abstract
We are concerned with feed-forward non-linear networks (multi-layer perceptrons, or MLPs) with multiple outputs. We wish to treat the outputs of the network as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs. We look for appropriate output non-linearities and for appropriate criteria for adaptation of the parameters of the network (e.g. weights). We explain two modifications: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity. The two modifications together result in quite simple arithmetic, and hardware implementation is not difficult either. The use of radial units (squared distance instead of dot product) immediately before the softmax output stage produces a network which computes posterior distributions over class labels based on an assumption of Gaussian within-class distributions. However, the training, which uses cross-class information, can result in better performance at class discrimination than the usual within-class training method, unless the within-class distribution assumptions are actually correct.
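As a concrete illustration (not code from the paper), the three ingredients named in the abstract can be sketched in a few lines of NumPy. The function names are mine, and the radial-unit stage assumes uniform class priors and identity within-class covariances, the simplest case of the Gaussian assumption described above:

```python
import numpy as np

def softmax(a):
    """Normalised exponential: maps real-valued scores to a
    probability distribution over classes."""
    e = np.exp(a - a.max())        # subtract max for numerical stability
    return e / e.sum()

def probability_score(p, target):
    """'Probability scoring': negative log of the probability assigned
    to the correct class, minimised in place of squared error."""
    return -np.log(p[target])

def radial_softmax_outputs(x, centres):
    """Radial units (negative half squared distance to each class
    centre) feeding softmax. Under identity-covariance Gaussian
    within-class distributions and uniform priors (assumptions of
    this sketch), the outputs are the class posteriors."""
    scores = -0.5 * np.array([np.sum((x - c) ** 2) for c in centres])
    return softmax(scores)
```

For example, an input lying nearer one class centre than another receives the higher posterior from `radial_softmax_outputs`, and `probability_score` penalises the network in proportion to how little probability it assigned to the true class.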
© 1990 Springer-Verlag Berlin Heidelberg
Bridle, J.S. (1990). Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In: Soulié, F.F., Hérault, J. (eds) Neurocomputing. NATO ASI Series, vol 68. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76153-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76155-3
Online ISBN: 978-3-642-76153-9