Automatic Relevance Determination (ARD)

Husmeier, Dirk

doi:10.1007/978-1-4471-0847-4_15

Part of the book series: Perspectives in Neural Computing ((PERSPECT.NEURAL))

Abstract

A scheme for the systematic adaptation of the random-parameter distribution widths is introduced. Weights exiting the same input node are combined into a weight group, and the distribution widths of the weight groups are adjusted during training by a method similar to Manhattan updating. A practical algorithm is derived, and an empirical demonstration shows that irrelevant inputs are detected and effectively switched off. The whole scheme was inspired by and is akin to Neal’s and MacKay’s automatic relevance determination. It will therefore be referred to by the same name.

Reference

The term hyperparameter will be borrowed since the σ^g_rand are parameters of a prior probability distribution, and are closely related to the hyperparameters α_g in MacKay’s work [39].
By defining σ as the exponential of ρ, its positivity is always ensured. The other reason for introducing ρ is that σ is a scale parameter. Since a non-informative prior for a scale parameter is uniform on a logarithmic scale (as discussed in Section 11.2), ρ is the natural parameter for any adaptation scheme.
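
The log-parameterization can be illustrated with a minimal sketch. It assumes the sign-based, fixed-step rule usually meant by "Manhattan updating" (the abstract only says the adjustment is similar to it); the function names, the step size and the gradient values are hypothetical and not taken from the chapter.

```python
import math

def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def manhattan_step(rho, grad, step=0.1):
    """Move each rho_g by a fixed amount against the sign of dE/drho_g.

    Only the sign of the gradient is used, not its magnitude
    (assumed form of a Manhattan-style update).
    """
    return [r - step * sign(g) for r, g in zip(rho, grad)]

def widths(rho):
    """sigma_g = exp(rho_g) is positive for any real rho_g."""
    return [math.exp(r) for r in rho]

rho = [0.0, 0.0, 0.0]        # one log-width per weight group
grad = [2.5, -0.3, 0.0]      # hypothetical gradients dE/drho_g
rho = manhattan_step(rho, grad)
sigma = widths(rho)
```

Because the step is taken in ρ, each update multiplies or divides σ by the same factor exp(step), which is what working on a logarithmic scale amounts to.
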
The nature of the inconsistency of scheme ARD2 (vide infra) becomes clearer when the update rule for the ρ_g is analysed. As will be shown shortly in (15.15), the gradient of E with respect to ρ depends on all the weights exiting the input units, that is, both the weights feeding into the S-layer and those feeding into the g-layer. However, as illustrated above and discussed in a more general way in [7], pp. 340–342, these weights scale differently when the training data are subjected to a linear transformation. Consequently, the sign of the gradient in (15.15), and hence the network’s ‘assumption’ about the significance of the different inputs, can change as the result of such a linear transformation. This is a striking inconsistency, since linear transformations of the data should lead to equivalent networks which differ only by the linear transformation of the weights.
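
The closing remark about equivalent networks can be checked on a toy case. The sketch below uses a generic network with a single tanh hidden unit, not the S-layer/g-layer architecture of the chapter, and all numerical values are hypothetical. It only shows that rescaling an input by a factor c is compensated exactly by dividing the weight on that input by c.

```python
import math

def net(x, w_in, b, w_out):
    """Toy network: one input, one tanh hidden unit, linear output."""
    return w_out * math.tanh(w_in * x + b)

c = 10.0                       # rescaling applied to the input data
x = 0.7
w_in, b, w_out = 1.5, -0.2, 0.8

y_original = net(x, w_in, b, w_out)
# Same function on the rescaled input: only the input weight changes.
y_rescaled = net(c * x, w_in / c, b, w_out)
```

The two outputs agree up to rounding, while the magnitude of the input weight has changed by a factor of c. Any relevance measure that depends on raw weight magnitudes is therefore sensitive to the units in which the inputs are expressed.
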
The method of simple weight decay, with α_k = 0.01 for all weight groups, was applied for regularization; see Section 12.1.2 for details.
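
As a rough illustration, the sketch below uses the common quadratic form of weight decay, in which each group k adds (α_k / 2) times the sum of its squared weights to the error. This form is an assumption; the chapter's exact definition is in Section 12.1.2, which is not reproduced here. The weight values are hypothetical.

```python
def weight_decay_penalty(groups, alphas):
    """Sum over groups of (alpha_k / 2) * sum of squared weights (assumed form)."""
    return sum(0.5 * a * sum(w * w for w in ws)
               for ws, a in zip(groups, alphas))

groups = [[1.0, -2.0], [0.5]]          # hypothetical weight groups
alphas = [0.01] * len(groups)          # alpha_k = 0.01 for every group
penalty = weight_decay_penalty(groups, alphas)
```
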

Author information

Authors and Affiliations

Neural Systems Group, Department of Electrical & Electronic Engineering, Imperial College, Exhibition Road, London, SW7, UK
Dirk Husmeier PhD


About this chapter

Cite this chapter

Husmeier, D. (1999). Automatic Relevance Determination (ARD). In: Neural Networks for Conditional Probability Estimation. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0847-4_15

DOI: https://doi.org/10.1007/978-1-4471-0847-4_15
Publisher Name: Springer, London
Print ISBN: 978-1-85233-095-8
Online ISBN: 978-1-4471-0847-4
eBook Packages: Springer Book Archive