Abstract
The most widely used activation functions in current deep feed-forward neural networks are rectified linear units (ReLU), and many alternatives have been applied successfully as well. However, none of the alternatives has managed to consistently outperform the rest, and there is no unified theory connecting properties of the task and network with the properties of activation functions that make training most efficient. A possible solution is to have the network learn its preferred activation functions. In this work, we introduce Adaptive Blending Units (ABUs), a trainable linear combination of a set of activation functions. Since ABUs learn the shape as well as the overall scaling of the activation function, we also analyze the effects of adaptive scaling in common activation functions. We experimentally demonstrate advantages of both adaptive scaling and ABUs over common activation functions across a set of systematically varied network specifications. We further show that adaptive scaling works by mitigating covariate shifts during training, and that the observed performance advantages of ABUs likewise rely largely on the activation function's ability to adapt over the course of training.
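As a rough illustration of the idea, an ABU can be sketched as a weighted sum over a fixed set of candidate activation functions, with blending weights trained alongside the network's other parameters. The candidate set and the equal initial weights below are assumptions for illustration, not necessarily the paper's exact configuration:

```python
import numpy as np

# Candidate activation functions f_i to blend. This particular set
# (tanh, ReLU, ELU, identity) is an assumed example; the paper's set
# may differ.
CANDIDATES = [
    np.tanh,                                     # tanh
    lambda x: np.maximum(x, 0.0),                # ReLU
    lambda x: np.where(x > 0, x, np.expm1(x)),   # ELU (alpha = 1)
    lambda x: x,                                 # identity
]

def abu(x, alphas):
    """ABU(x) = sum_i alpha_i * f_i(x).

    In a real network, alphas would be trainable parameters updated
    by backpropagation together with the weights; here they are just
    a NumPy array.
    """
    return sum(a * f(x) for a, f in zip(alphas, CANDIDATES))

# Equal blending weights as a plausible initialization (an assumption,
# not necessarily the paper's choice). Note the weights are not
# constrained to sum to one, so the overall scaling is learned too.
alphas = np.full(len(CANDIDATES), 1.0 / len(CANDIDATES))
y = abu(np.array([-2.0, 0.0, 2.0]), alphas)
```

Because the blending weights are unconstrained, the network can both reshape the nonlinearity (relative weights) and rescale it (overall magnitude) during training.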
Notes
1. With respect to the constraints, the affine() units in [20] are equivalent to ABU\(_\mathrm{nrm}\), and their convex() units are equivalent to ABU\(_\mathrm{pos}\). Unfortunately, the authors did not provide implementation details, so we cannot say whether the implementations are equivalent.
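To make the constraint discussion concrete, here is a minimal sketch of the two weight constraints. It assumes that ABU\(_\mathrm{nrm}\) rescales the blending weights to sum to one (an affine combination of the candidates) and that ABU\(_\mathrm{pos}\) clips them to be nonnegative; both are assumptions, since, as noted above, the exact implementations are not specified:

```python
import numpy as np

def constrain_nrm(alphas):
    """Rescale blending weights so they sum to one (affine combination).

    Assumed reading of the ABU_nrm constraint; applied after each
    gradient update in this sketch.
    """
    return alphas / np.sum(alphas)

def constrain_pos(alphas):
    """Clip blending weights to be nonnegative.

    Assumed reading of the ABU_pos constraint.
    """
    return np.maximum(alphas, 0.0)
```

Under these readings, applying both constraints in sequence would yield a convex combination of the candidate activations.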
References
Agostinelli, F., Hoffman, M., Sadowski, P., Baldi, P.: Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830 (2014)
Alcaide, E.: E-swish: adjusting activations to different network depths. arXiv preprint arXiv:1801.07145 (2018)
Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015)
Dushkoff, M., Ptucha, R.: Adaptive activation functions for deep networks. Electron. Imaging 2016(19), 1–5 (2016)
Eisenach, C., Wang, Z., Liu, H.: Nonparametrically learning activation functions in deep neural nets (2016). https://openreview.net/pdf?id=H1wgawqxl
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
Godfrey, L.B., Gashler, M.S.: A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. In: 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), vol. 1, pp. 481–486. IEEE (2015)
Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. arXiv preprint arXiv:1302.4389 (2013)
Hahnloser, R.H.R., Sarpeshkar, R., Mahowald, M.A., Douglas, R.J., Seung, H.S.: Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789), 947 (2000)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural networks. In: Advances in Neural Information Processing Systems, pp. 972–981 (2017)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech report (2009). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.9220&rep=rep1&type=pdf
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Leugering, J., Pipa, G.: A unifying framework of synaptic and intrinsic plasticity in neural populations. Neural Comput. 30(4), 945–986 (2018)
Li, X., Chen, S., Hu, X., Yang, J.: Understanding the disharmony between dropout and batch normalization by variance shift. arXiv preprint arXiv:1801.05134 (2018)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML, vol. 30, p. 3 (2013)
Manessi, F., Rozza, A.: Learning combinations of activation functions. arXiv preprint arXiv:1801.09403 (2018)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 807–814 (2010)
Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014). http://jmlr.org/papers/v15/srivastava14a.html
Turrigiano, G.G., Nelson, S.B.: Homeostatic plasticity in the developing nervous system. Nat. Rev. Neurosci. 5(2), 97 (2004)
Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
© 2020 Springer Nature Switzerland AG
Cite this paper
Sütfeld, L.R., Brieger, F., Finger, H., Füllhase, S., Pipa, G. (2020). Adaptive Blending Units: Trainable Activation Functions for Deep Neural Networks. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Computing. SAI 2020. Advances in Intelligent Systems and Computing, vol 1230. Springer, Cham. https://doi.org/10.1007/978-3-030-52243-8_4
Print ISBN: 978-3-030-52242-1
Online ISBN: 978-3-030-52243-8
eBook Packages: Intelligent Technologies and Robotics (R0)