
1 Introduction

In recent years, the performance of machine learning algorithms has improved rapidly. Many machine learning techniques have been proposed, such as the support vector machine [24], the neural network [4], the convolutional neural network [6], and so on. Since these models can approximate arbitrary non-linear functions, they are effective for classification [11, 13, 20, 21], person recognition [7, 10], object detection [25], and so on.

To approximate arbitrary non-linear functions, almost all deep learning models use non-linear activation functions. The rectified linear unit (ReLU) is the most commonly used non-linear activation function for the hidden layers of deep learning models, while the sigmoid or softmax function is often used as the non-linear activation function in the output layer.

The softplus function is a continuous version of the ReLU function and is obtained by integrating the sigmoid function. The sigmoid and softmax functions are defined using the exponential function and have a close relation to the Gaussian distribution; in other words, the input of the sigmoid or softmax function is implicitly assumed to follow a Gaussian distribution. The exponential linear unit (ELU) [19], the sigmoid-weighted linear unit (SiLU) [9], swish [18], and mish [16] have been proposed as extensions of the ReLU function. Such activation functions are derived from the ReLU function or the sigmoid function.

In machine learning and statistics, most techniques assume a Gaussian distribution for the prior or conditional distribution because the Gaussian distribution is mathematically easy to handle. For example, the exponential family is often assumed in information geometry, which connects various branches of mathematical science dealing with uncertainty and information through unifying geometric concepts. In information geometry, it is well known that the exponential family is flat under the e-connection. The Gaussian distribution is a member of the exponential family.

However, some well-known probability distributions, such as the t-distribution, do not belong to the exponential family. As an extension of information geometry, the q-space has been defined [22]. In the q-space, the q-multiplication, q-division, q-exponential, and q-logarithm are defined with a hyperparameter q as natural extensions of the corresponding operations in the standard space. In the q-space, the q-Gaussian distribution is derived by maximizing the Tsallis entropy under appropriate constraints. The q-Gaussian distribution includes the Gaussian distribution, obtained by setting the hyperparameter to \(q=1\), and the t-distribution, obtained by setting \(q=2.0\). Since the q-Gaussian distribution can be written with a single scalar parameter, we can handle a family of probability distributions as flat in the q-space.

The authors previously proposed to use the q-Gaussian distribution for dimensionality reduction. The t-distributed stochastic neighbor embedding (t-SNE) [15] and the parametric t-SNE [14] were extended by using the q-Gaussian distribution instead of the t-distribution as the probability distribution in the low-dimensional space; the resulting methods are called q-SNE [1] and parametric q-SNE [17].

In this paper, we propose to define activation functions and loss functions by using the q-exponential and q-logarithm of the q-space. In particular, we define the q-softplus function as an extension of the softplus function. This extension introduces a hyperparameter q that controls the shape of the function; for example, we can recover the standard softplus function or a shifted ReLU function by changing the hyperparameter q of the q-softplus function. To make the origin of the proposed q-softplus function coincide with that of the ReLU function, we also define the shifted q-softplus function.

To show the effectiveness of the proposed shifted q-softplus function, we have performed experiments in which the shifted q-softplus function is used as the activation function of a convolutional neural network instead of the standard ReLU function. We have also performed experiments in which the q-softplus function is used in the loss functions of metric learning with Siamese [5, 8, 10] and Triplet [12, 23] networks instead of the max function. Through these experiments, the proposed q-softplus function shows better results on the CIFAR10, CIFAR100, STL10, and Tiny ImageNet datasets.

2 Related Work

2.1 q-Space

Information geometry is an interdisciplinary field that applies the techniques of differential geometry to probability theory and statistics [3]. It studies statistical manifolds, which are Riemannian manifolds whose points correspond to probability distributions. Tanaka [22] extended the information geometry developed for the exponential family to the q-Gaussian distribution.

To do so, the standard multiplication, division, exponential, and logarithm are extended to the q-multiplication, q-division, q-exponential, and q-logarithm in [22]. We can then consider a space in which these q-arithmetic operations are defined; in this paper, we call this space the q-space.

In q-space, the q-multiplication and q-division of two functions f and g are respectively defined as

$$\begin{aligned} f\otimes _{q}g = \left( f^{1-q}+g^{1-q}-1\right) ^{\frac{1}{1-q}}, \end{aligned}$$
(1)

and

$$\begin{aligned} f\oslash _{q}g = \left( f^{1-q}-g^{1-q}+1\right) ^{\frac{1}{1-q}}, \end{aligned}$$
(2)

where q is a hyperparameter.

Similarly the q-exponential and q-logarithm are defined as

$$\begin{aligned} exp_q(x) = \left( 1+\left( 1-q\right) x\right) ^{\frac{1}{1-q}}, \end{aligned}$$
(3)

and

$$\begin{aligned} log_q(x) = \frac{1}{1-q}\left( x^{1-q}-1\right) . \end{aligned}$$
(4)

These q-arithmetic operations converge to the corresponding standard operations when \(q \rightarrow 1\). In the q-space, the q-Gaussian distribution is derived by maximizing the Tsallis entropy under appropriate constraints. The q-Gaussian distribution includes the Gaussian distribution and the t-distribution as special cases. Since the q-Gaussian distribution can be written with a single scalar parameter q, we can handle a family of probability distributions as flat in the q-space.
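For concreteness, a small NumPy sketch of the q-arithmetic of Eqs. (1)-(4) is given below. The function names and test values are ours, not from the paper, and the q-exponential is clamped at zero for negative bases, as in the q-softplus definition of Sect. 3.

```python
import numpy as np

def q_exp(x, q):
    """q-exponential of Eq. (3), clamped at 0 when 1 + (1 - q) x < 0."""
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

def q_log(x, q):
    """q-logarithm of Eq. (4), for x > 0."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_mul(f, g, q):
    """q-multiplication of Eq. (1)."""
    return (f ** (1.0 - q) + g ** (1.0 - q) - 1.0) ** (1.0 / (1.0 - q))

def q_div(f, g, q):
    """q-division of Eq. (2)."""
    return (f ** (1.0 - q) - g ** (1.0 - q) + 1.0) ** (1.0 / (1.0 - q))

# as q -> 1 the q-operations approach their standard counterparts
x = np.linspace(0.5, 2.0, 4)
print(np.allclose(q_exp(x, 0.999), np.exp(x), atol=5e-2))   # True
print(np.allclose(q_log(x, 0.999), np.log(x), atol=1e-3))   # True
print(np.allclose(q_mul(x, x, 0.999), x * x, atol=1e-2))    # True
```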

2.2 Activation Function

In a neural network, a non-linear activation function is used so that the network can approximate non-linear functions. The ReLU function is the most widely used activation function in deep neural networks and is defined as

$$\begin{aligned} ReLU(x) = max(0,x). \end{aligned}$$
(5)

The main reason why the ReLU function is used in deep neural networks is that it helps prevent the vanishing gradient problem. The ReLU function is very simple and works well in deep neural networks. This function is also called the plus function.

The softplus function is a continuous version of the ReLU function and is defined as

$$\begin{aligned} Softplus(x) = \log {(1+\exp {x})}. \end{aligned}$$
(6)

The first derivative of this function is continuous at 0, while that of the ReLU function is not. The softplus function can also be derived as the integral of the sigmoid function.
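As a quick numerical illustration (ours, not from the paper), the derivative of the softplus is the sigmoid function, which can be checked by finite differences:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))        # Eq. (6)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)
numeric = (softplus(x + 1e-6) - softplus(x - 1e-6)) / 2e-6
print(np.allclose(numeric, sigmoid(x), atol=1e-6))   # True: d/dx softplus(x) = sigmoid(x)
```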

Recently, many activation functions have been proposed for deep neural networks [9, 16, 18, 19]. Almost all of them are defined based on the ReLU function, the sigmoid function, or a combination of the two.

These functions are also used to define loss functions. For example, the max (ReLU) function or the softplus function is used in the contrastive loss and the triplet loss of metric learning.

2.3 Metric Learning

The Siamese network and the Triplet network have been proposed and are often used for metric learning.

The Siamese network consists of two networks with shared weights and learns a metric between the two outputs. During training, two samples are fed to the two networks, and the shared weights are modified so that the two outputs become closer together when the two samples belong to the same class and move farther apart when they belong to different classes.

Let \(\{(\boldsymbol{x}_i, y_i)|i=1\ldots N\}\) be a set of training samples, where \(\boldsymbol{x}_i\) is an image and \(y_i\) is the class label of the i-th sample. The loss function of the Siamese network is defined as

$$\begin{aligned} L_{siamese}&=\frac{1}{2}t_{ij}d_{ij}^2 + \frac{1}{2}(1-t_{ij})max(m-d_{ij}, 0)^2,\end{aligned}$$
(7)
$$\begin{aligned} d_{ij}&=\Vert f(\boldsymbol{x}_i;\theta ) - f(\boldsymbol{x}_j;\theta )\Vert ^2 \end{aligned}$$
(8)

where \(t_{ij}\) is a binary indicator that shows whether the i-th and j-th samples belong to the same class, m is a margin, f is the function computed by the network, and \(\theta \) is the set of shared weights of the network, which is learned by minimizing the loss \(L_{siamese}\). The Siamese loss is called the contrastive loss. Note that the max (ReLU) function is used in this loss; the softplus function can be used instead of the max function.
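A minimal PyTorch sketch of a contrastive loss in the form of Eq. (7) is given below; it is our illustration, not the authors' code, and the Euclidean distance between the two embeddings is used for \(d_{ij}\).

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_i, z_j, t, margin=1.0):
    """Eq. (7): z_i, z_j are (batch, dim) embeddings; t is a float tensor, 1 for same-class pairs, else 0."""
    d = torch.norm(z_i - z_j, p=2, dim=1)
    same = 0.5 * t * d ** 2
    diff = 0.5 * (1 - t) * F.relu(margin - d) ** 2    # max(m - d, 0)^2
    return (same + diff).mean()
```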

The Triplet network consists of three networks with shared weights and learns a metric among the three outputs. During training, three samples are fed to the networks. One sample is called the anchor; a sample of the same class as the anchor is called a positive sample, and a sample of a different class is called a negative sample. The networks are trained so that the outputs of the anchor and the positive sample become closer together, while the outputs of the anchor and the negative sample move farther apart.

Let \(x_a\), \(x_p\), and \(x_n\) be the anchor, the positive, and the negative sample respectively. The loss function of the Triplet network is defined as

$$\begin{aligned} L_{triplet}&=max(d_{ap} - d_{an} + m, 0), \end{aligned}$$
(9)

where m is a margin and \(d_{ij}\) is the same distance as in the contrastive loss. Note that the max (ReLU) function is also used in this loss, and the softplus function can be used instead of the max function. Since the max and softplus functions are linear for \(x \gg 0\), they are effective in moving samples farther apart, which is very important for metric learning.
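Similarly, a minimal PyTorch sketch of the triplet loss of Eq. (9) (ours, with the Euclidean distance) is:

```python
import torch
import torch.nn.functional as F

def triplet_loss(z_a, z_p, z_n, margin=1.0):
    """Eq. (9): anchor, positive, and negative embeddings of shape (batch, dim)."""
    d_ap = torch.norm(z_a - z_p, p=2, dim=1)
    d_an = torch.norm(z_a - z_n, p=2, dim=1)
    return F.relu(d_ap - d_an + margin).mean()        # max(d_ap - d_an + m, 0)
```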

Fig. 1.

This figure shows the graphs of the activation functions. (A) shows the max (ReLU) function, the softplus function, and the q-softplus function with different values of the hyperparameter q. When \(q=0.999\) (q close to 1), the q-softplus function overlaps the softplus function. (B) shows the max (ReLU) function and the shifted q-softplus function with different values of the hyperparameter q. When \(q=0.0\), the shifted q-softplus function overlaps the max function.

Fig. 2.

This figure shows the network architecture in which the q-softplus or shifted q-softplus function is used. As an activation function, the shifted q-softplus function replaces the ReLU function. In the triplet loss, the q-softplus function replaces the max function.

3 q-Softplus Function and Shifted q-Softplus Function

The q-space is defined to extend the information geometry developed for the exponential family; by using the q-space, we can work in a naturally extended setting. In this paper, we propose extensions of standard activation functions and loss functions based on the q-space. Since the q-exponential and q-logarithm can express various shapes of a graph depending on the hyperparameter q, we can control the shape of the activation function or the loss function by selecting a better parameter q in the q-space. In particular, we propose the q-softplus function as an extension of the softplus function.

3.1 q-Softplus Function

The q-softplus function is defined as

$$\begin{aligned} qsoftplus(x)&= log_q(1 + exp_q(x))\nonumber \\&= \frac{1}{1-q}\left( \left( 1+max\left( 1+\left( 1-q\right) x,0\right) ^{\frac{1}{1-q}}\right) ^{1-q}-1\right) . \end{aligned}$$
(10)

When \(q\rightarrow 1\), the q-softplus function approaches the original softplus function. Figure 1 (A) shows the shape of the q-softplus function compared with the max (ReLU) function and the softplus function. When \(q=0.999\) (q close to 1), the q-softplus function overlaps the softplus function. Moreover, when \(q=0.0\), the q-softplus function becomes a shifted max function. From Fig. 1 (A), it can be seen that the q-softplus function can represent various shapes including the max (ReLU) function and the softplus function. When \(1+\left( 1-q\right) x>0\), the first derivative with respect to x is

$$\begin{aligned} \frac{dqsoftplus(x)}{dx}&= \left( 1+\left( 1+(1-q)x\right) ^{\frac{1}{1-q}}\right) ^{-q}\left( 1+(1-q)x\right) ^\frac{q}{1-q} \nonumber \\&= \left( 1+exp_q(x)\right) ^{-q}\left( exp_q(x)\right) ^q, \end{aligned}$$
(11)

and it is 0 otherwise. When \(q\rightarrow 1\), Eq. 11 approaches the first derivative of the softplus function.
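As a concrete illustration, the following NumPy sketch (ours, assuming \(q \ne 1\)) implements the q-softplus of Eq. (10), checks its limiting behavior, and verifies the derivative of Eq. (11) by finite differences.

```python
import numpy as np

def q_softplus(x, q):
    """q-softplus of Eq. (10); assumes q != 1 (the limit q -> 1 is the softplus)."""
    exp_q = np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))
    return ((1.0 + exp_q) ** (1.0 - q) - 1.0) / (1.0 - q)

x = np.linspace(-3.0, 3.0, 13)
# q close to 1 recovers the softplus, q = 0 recovers the shifted max function
print(np.allclose(q_softplus(x, 0.999), np.log1p(np.exp(x)), atol=1e-2))  # True
print(np.allclose(q_softplus(x, 0.0), np.maximum(x + 1.0, 0.0)))          # True

# finite-difference check of the derivative in Eq. (11) where 1 + (1 - q) x > 0
q = 0.5
x = np.linspace(0.5, 2.0, 4)
exp_q = (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))
analytic = (1.0 + exp_q) ** (-q) * exp_q ** q
numeric = (q_softplus(x + 1e-6, q) - q_softplus(x - 1e-6, q)) / 2e-6
print(np.allclose(analytic, numeric, atol=1e-4))                          # True
```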

3.2 Shifted q-Softplus Function

The q-softplus function becomes a shifted max function when \(q=0.0\). To make the q-softplus function with \(q=0.0\) identical to the max function, we propose to shift the q-softplus function by introducing a shift term. We call this function the shifted q-softplus function, which is defined as

$$\begin{aligned} sqsoftplus(x)&= log_q(1 + exp_q(x-\frac{1}{1-q}))\nonumber \\&= \frac{1}{1-q}\left( \left( 1+max\left( 1+\left( 1-q\right) (x-\frac{1}{1-q}),0\right) ^{\frac{1}{1-q}}\right) ^{1-q}-1\right) . \end{aligned}$$
(12)

When \(q=0.0\), the shifted q-softplus function becomes identical to the max function. Figure 1 (B) shows the shapes of the shifted q-softplus function. From this figure, it can be seen that the shifted q-softplus function can represent various shapes including the max function.
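The effect of the shift can be checked numerically; the following short NumPy sketch (ours, assuming \(q \ne 1\)) verifies that the shifted q-softplus of Eq. (12) with \(q=0\) coincides with the max (ReLU) function.

```python
import numpy as np

def shifted_q_softplus(x, q):
    """Shifted q-softplus of Eq. (12); assumes q != 1."""
    z = x - 1.0 / (1.0 - q)                                            # shift term
    exp_q = np.maximum(1.0 + (1.0 - q) * z, 0.0) ** (1.0 / (1.0 - q))
    return ((1.0 + exp_q) ** (1.0 - q) - 1.0) / (1.0 - q)

x = np.linspace(-2.0, 2.0, 9)
print(np.allclose(shifted_q_softplus(x, 0.0), np.maximum(x, 0.0)))     # True: recovers max(0, x)
```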

3.3 Loss Function for Metric Learning

In the loss functions of the Siamese network and the Triplet network, the max or softplus function plays an important role in moving samples farther apart because it is linear for \(x \gg 0\). We propose new loss functions, called the q-contrastive loss and the q-triplet loss, by using the q-softplus function. The q-contrastive loss is defined as

$$\begin{aligned} L_{qsiamese}&=\frac{1}{2}t_{ij}d_{ij}^2 + \frac{1}{2}(1-t_{ij})qsoftplus(m-d_{ij})^2. \end{aligned}$$
(13)

Similarly, the q-triplet loss is defined as

$$\begin{aligned} L_{qtriplet}&=qsoftplus(d_{ap} - d_{an} + m). \end{aligned}$$
(14)
Table 1. This table shows the classification accuracy on CIFAR10, CIFAR100, STL10, and Tiny ImageNet. The same hyperparameter q is used for all activation functions in VGG11. The accuracy is given in percent for the training and test samples, respectively.
Table 2. This table shows the test classification accuracy on CIFAR10, CIFAR100, STL10, and Tiny ImageNet when the hyperparameters q are tuned with Optuna. The hyperparameters q of the shifted q-softplus functions found by Optuna are shown in Table 3. The accuracy is given in percent.
Table 3. This table shows the hyperparameter q found for each shifted q-softplus function in VGG11 by using Optuna. VGG11 has 10 shifted q-softplus activation functions; qk denotes the k-th shifted q-softplus function counted from the first layer.

By using the q-softplus function, we can control the effect of moving samples farther apart. Since the first derivative of the q-softplus function is continuous at 0, it can move samples farther apart than the given margin. We can also use the shifted q-softplus function in the loss function; since the shifted q-softplus function has a distorted linear shape, it allows us to control the effect of the loss.
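A minimal PyTorch sketch of the q-contrastive loss of Eq. (13) and the q-triplet loss of Eq. (14) is given below. It is our illustration, not the authors' implementation; the Euclidean distance between embeddings is used for \(d_{ij}\), and the margin and q are placeholder arguments to be tuned.

```python
import torch

def q_softplus(x, q):
    """q-softplus of Eq. (10), element-wise; assumes q != 1."""
    exp_q = torch.clamp(1.0 + (1.0 - q) * x, min=0.0) ** (1.0 / (1.0 - q))
    return ((1.0 + exp_q) ** (1.0 - q) - 1.0) / (1.0 - q)

def q_contrastive_loss(z_i, z_j, t, margin, q):
    """Eq. (13): z_i, z_j are (batch, dim) embeddings; t is a float tensor, 1 for same-class pairs."""
    d = torch.norm(z_i - z_j, p=2, dim=1)
    return (0.5 * t * d ** 2 + 0.5 * (1 - t) * q_softplus(margin - d, q) ** 2).mean()

def q_triplet_loss(z_a, z_p, z_n, margin, q):
    """Eq. (14): anchor, positive, and negative embeddings of shape (batch, dim)."""
    d_ap = torch.norm(z_a - z_p, p=2, dim=1)
    d_an = torch.norm(z_a - z_n, p=2, dim=1)
    return q_softplus(d_ap - d_an + margin, q).mean()
```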

Figure 2 shows an example of the network architecture in which the q-softplus function or the shifted q-softplus function is used. In this figure, an example with the triplet loss is shown.

4 Experiments

4.1 Experimental Dataset

To confirm the effectiveness of the proposed q-softplus based activation function and loss function, we have performed experiments using MNIST, FashionMNIST, CIFAR10, CIFAR100, STL10, and Tiny ImageNet datasets.

Table 4. This table shows the classification accuracy of the test samples obtained by the Siamese network on MNIST, FashionMNIST, and CIFAR10. The accuracy is given in percent for the training and test samples, respectively, measured by k-nn.

The MNIST dataset consists of grey-scale images of hand-written digits from 10 classes. The size of each image is 28 \(\times \) 28 pixels, and there are 60,000 training samples and 10,000 test samples. The FashionMNIST dataset consists of grey-scale images of 10 classes of fashion items; each image is 28 \(\times \) 28 pixels, with 60,000 training samples and 10,000 test samples. The CIFAR10 dataset consists of color images of 10 object classes; each image is 32 \(\times \) 32 pixels, with 50,000 training samples and 10,000 test samples. The CIFAR100 dataset consists of color images of 100 object classes; each image is 32 \(\times \) 32 pixels, with 50,000 training samples and 10,000 test samples. The STL10 dataset consists of color images of 10 object classes; each image is 96 \(\times \) 96 pixels, with 500 training samples and 800 test samples per class. The Tiny ImageNet dataset consists of color images of 200 object classes; each image is 64 \(\times \) 64 pixels, with 100,000 training samples and 10,000 test samples.

Table 5. This table shows the classification accuracy of the test samples obtained by the Triplet network on MNIST, FashionMNIST, and CIFAR10. The accuracy is given in percent for the training and test samples, respectively, measured by k-nn.

4.2 Shifted q-Softplus as an Activation Function

To confirm the effectiveness of the shifted q-softplus function as an activation function, we have performed experiments in which the shifted q-softplus function is used in a CNN instead of the ReLU function. The classification accuracy is measured on the CIFAR10, CIFAR100, STL10, and Tiny ImageNet datasets. VGG11 [20] is used as the CNN model, and the effect of batch normalization (BN) is also investigated. Stochastic gradient descent (SGD) with a momentum of 0.9 is used for optimization. The learning rate is initially set to 0.01 and is multiplied by 0.1 at 20 and 40 epochs. The weight decay parameter is set to 0.0001. The batch size is set to 100 training samples, and training is run for 100 epochs.
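As an illustration of this setup, the sketch below builds a VGG11 in PyTorch, replaces its ReLU activations with a shifted q-softplus module, and configures the optimizer and learning-rate schedule described above. The module and helper are ours, and torchvision's vgg11_bn is used only as a stand-in for the VGG11 model of [20].

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11_bn

class ShiftedQSoftplus(nn.Module):
    """Shifted q-softplus of Eq. (12) as a drop-in activation (assumes q != 1)."""
    def __init__(self, q=0.2):
        super().__init__()
        self.q = q

    def forward(self, x):
        q = self.q
        z = x - 1.0 / (1.0 - q)
        exp_q = torch.clamp(1.0 + (1.0 - q) * z, min=0.0) ** (1.0 / (1.0 - q))
        return ((1.0 + exp_q) ** (1.0 - q) - 1.0) / (1.0 - q)

def replace_relu(module, q=0.2):
    """Recursively swap every nn.ReLU for the shifted q-softplus."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, ShiftedQSoftplus(q))
        else:
            replace_relu(child, q)

model = vgg11_bn(num_classes=10)   # stand-in for the VGG11 with BN used in the paper
replace_relu(model, q=0.2)

# SGD with momentum 0.9, lr 0.01 decayed by 0.1 at epochs 20 and 40, weight decay 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40], gamma=0.1)
```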

Table 1 shows the classification accuracy for different values of q. Each score is the average of 5 trials with different random seeds. From this table, the shifted q-softplus function gives better classification accuracy than the ReLU function, and the best hyperparameter q is around 0.2. When the hyperparameter q is positive, namely \(q>0.0\), the shape of the shifted q-softplus function lies below that of the ReLU function. This means that better classification accuracy is obtained when the outputs of each layer are smaller than the outputs of the ReLU function.

We have also performed experiments to find the best hyperparameter q of the shifted q-softplus function for each dataset by using Optuna [2]. Optuna is a Python library for finding the best hyperparameters of machine learning models. The objective function for finding the best hyperparameter q is the validation loss, and we used 0.1% of the training dataset as validation samples. The number of search trials is set to 30.
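A hedged sketch of this search with Optuna is shown below; `build_vgg11_with_qs` and `train_and_validate` are hypothetical helpers standing in for the actual training routine, and the search range for q is our assumption.

```python
import optuna

def objective(trial):
    # one q per shifted q-softplus in VGG11 (10 in total, cf. Table 3); range is an assumption
    qs = [trial.suggest_float(f"q{k + 1}", -0.5, 0.9) for k in range(10)]
    model = build_vgg11_with_qs(qs)      # hypothetical helper building the network
    return train_and_validate(model)     # hypothetical helper returning the validation loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)   # 30 trials, as in the experiments
print(study.best_params)
```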

The test accuracies for each dataset are shown in Table 2. Again, the values in the table are averages over 5 trials with different random seeds. The best hyperparameters q of the shifted q-softplus functions for each dataset are shown in Table 3. It can be seen that the best hyperparameter q is larger than 0.0 and smaller than 0.2 in almost all cases.

4.3 q-Softplus as a Loss Function of Metric Learning

To confirm the effectiveness of the q-softplus function as a loss function, we have performed experiments in which the q-softplus function is used to define the loss functions of the Siamese network and the Triplet network instead of the max function. We call these loss functions the q-contrastive loss and the q-triplet loss. The MNIST, FashionMNIST, and CIFAR10 datasets are used in the experiments. A simple CNN with 2 convolutional layers and 3 fully connected layers is used for the MNIST and FashionMNIST datasets, with the ReLU function as the activation function in the hidden layers. For the CIFAR10 dataset, VGG11 with batch normalization is used. The dimension of the final output is 10 for all datasets. Stochastic gradient descent (SGD) with a momentum of 0.9 is used for optimization. The learning rate is initially set to 0.01 and is multiplied by 0.1 at 20 and 40 epochs. The weight decay parameter is set to 0.0001. The batch size is set to 100 samples, and training is run for 100 epochs. The margin in the loss function is determined by preliminary experiments.
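For reference, a hedged sketch of an embedding network of the described size (2 convolutional and 3 fully connected layers with a 10-dimensional output) for 28 \(\times \) 28 grey-scale inputs is shown below; the channel widths and kernel sizes are our assumptions, as the paper does not specify them.

```python
import torch.nn as nn

# assumed channel widths and kernel sizes; only the layer counts and the
# 10-dimensional output follow the text
embedding_net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),                 # 10-dimensional embedding
)
```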

The quality of the feature vectors obtained by the trained network is evaluated by the classification accuracy of a k-nearest-neighbor (k-nn) classifier in the 10-dimensional feature space. In the following experiments, k is set to 5. Since the q-softplus function becomes a shifted max function when \(q=0.0\), we also include experiments with margin \(m-1\).
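The evaluation can be sketched with scikit-learn as below; `embed` is a hypothetical helper that maps a dataset to its 10-dimensional feature vectors and labels with the trained network.

```python
from sklearn.neighbors import KNeighborsClassifier

train_feats, train_labels = embed(trained_network, train_set)   # hypothetical helper
test_feats, test_labels = embed(trained_network, test_set)      # hypothetical helper

knn = KNeighborsClassifier(n_neighbors=5)                        # k = 5, as in the experiments
knn.fit(train_feats, train_labels)
print(100.0 * knn.score(test_feats, test_labels))                # accuracy in percent
```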

Table 4 shows the classification accuracy obtained by the Siamese network, and Table 5 shows the classification accuracy obtained by the Triplet network. Each score is the average of 5 trials with different random seeds.

It can be seen that the q-softplus function gives better classification accuracy than the max function. The best hyperparameter q is around −0.5. Since the shape of the q-softplus function lies above that of the max function when \(q<0.0\), making the outputs larger is probably better for moving samples farther apart.

5 Conclusion

In this paper, we proposed the q-softplus function and the shifted q-softplus function as extensions of the softplus function. Through classification experiments, we confirmed that a network using the shifted q-softplus function as the activation function in the hidden layers gives better classification accuracy than a network using the ReLU function, and we found that the best q for the shifted q-softplus function is around 0.2. This result suggests that better classification accuracy is obtained when the outputs of each layer are smaller than the outputs of the ReLU function. Through the metric learning experiments, we confirmed that the q-softplus function can improve the contrastive loss of the Siamese network and the triplet loss of the Triplet network; for metric learning, the best q is around −0.5. This result suggests that better features are obtained when the outputs are larger than the outputs of the max function.