1 Introduction

Deep Neural Networks (DNNs) have achieved remarkable performance in many application domains, including speech recognition [1, 2] and computer vision, notably object recognition [3, 4, 6, 23] and object detection [7, 8, 10]. A particular type of network, the Convolutional Neural Network (CNN), is being deployed in real-world applications on smart phones and other embedded devices. However, it is difficult to deploy these computationally and memory-intensive CNNs on embedded devices, whose computational resources and storage are both limited.

1.1 Binary Weight Networks and Model Compression

To address these storage and computational issues [5, 21], methods that binarize the weights or activations of DNN models have been proposed. BinaryConnect [11] binarizes the weights to {+1, −1} with a single sign function. Binary Weight Networks [12] improve model capacity by adding an extra scaling factor on top of the previous method. BinaryNet [12] and XNOR-Net [13] extend these methods by binarizing not only the weights but also the activations. These models eliminate most of the multiplication operations in the forward and backward propagations [16] and achieve a compression rate of up to 32\(\times \), but they also suffer considerable accuracy loss.
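For illustration, the following minimal sketch shows how a weight tensor could be binarized with a single sign function and a layer-wise scaling factor in the spirit of [12, 13]; the function name and the use of NumPy are our own choices, not the original implementations.

```python
import numpy as np

def binarize_with_scale(w):
    """Sketch of binary weight quantization in the spirit of [12, 13]:
    b = sign(w) with values in {+1, -1}, scaled by alpha = mean(|w|)."""
    alpha = np.mean(np.abs(w))          # single layer-wise scaling factor
    b = np.where(w >= 0, 1.0, -1.0)     # sign function, mapping 0 to +1
    return alpha * b
```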

1.2 Ternary Weight Networks and Model Compression

Recently, increasing attention has been paid to 2-bit quantization of neural networks, especially ternary weight quantization. Ternary weight networks (TWNs) [14] constrain the weights to {\(-1, 0, +1\)} to maximize model compression while minimizing the loss of model precision as far as possible. Compared with the earlier binary quantization networks, the accuracy loss is clearly reduced because of the increased weight precision. Model capacity can be improved further by using different scaling factors for positive and negative weights.

We optimize the previous methods [14, 20] by proposing Segmented Asymmetric Ternary and Binary Weights Networks (SATB-Nets) to explore higher model capacity and compression rate. For each layer, we segment the weight vector space into several disjoint subspaces. In each subspace, we constrain the weights to three values {\(+W^{pt}_{ls}, 0, -W^{nt}_{ls}\)} for convolutional (CONV) layers and two values {\(+W^{pb}_{ls}, -W^{nb}_{ls}\)} for fully-connected (FC) layers, which can be encoded with two bits and a single bit, respectively. Compared with the TWN [14] and BWN [11] quantization methods, SATB-Nets exploit the local redundancy structure better and gain stronger expressive ability, leading to better performance. In addition, the fixed scaling factors {\(+W^{p*}_{ls}, 0, -W^{n*}_{ls}\)} open up more possibilities for accelerating computation.

2 Segmented Asymmetric Ternary and Binary Weights Networks

In this section we introduce in detail how to obtain Segmented Asymmetric Ternary and Binary Weights Networks (SATB-Nets) and how to train them efficiently.

2.1 Segmentation

Product quantization (PQ) [18] partitions the vector space into many disjoint subspaces to exploit redundancy in the structure of the vector space. The authors of [9] propose segmenting the weight matrix and then performing quantization in each subspace. Similarly, we partition the weight matrix into several submatrices to improve the expressive ability of the quantized networks:

$$\begin{aligned} W= [W^{1}, W^{2}, ... , W^{k}], \end{aligned}$$
(1)

where \(W\in R^{m*n}\) and \(W^{i}\in R^{m*(n/k)}\), assuming n is divisible by k. We can then quantize each submatrix \(W^{i}\) with ternary or binary values. More segments lead to higher model capacity but aggressively increase the codebook size. Therefore, using the same trick as described in [9], we fix the number of segments k to 8 to keep a satisfactory balance between the compression rate and the precision loss of the networks.
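As a minimal sketch (our own illustration, not the authors' code), the column-wise segmentation of (1) could be written as:

```python
import numpy as np

def segment_weights(W, k=8):
    """Split an (m x n) weight matrix column-wise into k submatrices
    W = [W^1, ..., W^k], each of shape (m, n/k), as in Eq. (1)."""
    m, n = W.shape
    assert n % k == 0, "n must be divisible by k"
    return np.split(W, k, axis=1)
```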

2.2 Asymmetric Binary Weights for Fully-Connected (FC) Layers

We constrain the full precision weights \(W_{lsi}\) (lth layer, sth segment and ith parameter) to binary weights with values belonging to {\(+W^{pb}_{ls}, -W^{nb}_{ls}\)}. The quantization function is shown in (2).

$$\begin{aligned} w^{b}_{lsi} = f_{b}(w_{lsi}) = \left\{ \begin{array}{ll} +W^{pb}_{ls} & \qquad w_{lsi} \ge 0\\ -W^{nb}_{ls} & \qquad w_{lsi}<0 \end{array} \right. \end{aligned}$$
(2)

Here 0 is the threshold and {\(W^{pb}_{ls}, W^{nb}_{ls}\)} are the scaling factors. To obtain performance as good as possible, we minimize the Euclidean distance between the floating-point weights \(W_{ls}\) and the binary weights \(W^{b}_{ls}\), so the optimization problem is transformed into (3):

$$\begin{aligned} \left\{ \begin{array}{ll} W^{*}_{ls} = arg\min J( W^{*}_{ls}) = arg\min { \left\| { W }_{ ls }-{ W }_{ ls }^{ b } \right\| }_{ 2 }^{ 2 }\\ s.t.\ W^{*}_{ls}>0;\ w^{b}_{lsi}\in \{+W^{pb}_{ls},-W^{nb}_{ls}\};\ i=1,2,...,n;\ s=1,2,...,k \end{array} \right. \end{aligned}$$
(3)

Substituting the binary function (2) into (3), we obtain the expression in (4):

$$\begin{aligned} J( { W }_{ ls }^{ * }) ={ \left\| { W }_{ ls }-{ W }_{ ls }^{ b } \right\| }_{ 2 }^{ 2 }=\sum _{ i\in { I }_{ * } }^{ { n }_{ s } }{ { \left| \left| { w }_{ lsi } \right| -{ W }_{ ls }^{ * } \right| }^{ 2 } } \end{aligned}$$
(4)

where \({ I }_{ * } = \){\({ I }_{ p },{ I }_{ n }\)}, \({ I }_{ p } = \{i|{ w }_{ lsi } \ge 0\}\) and \({ I }_{ n } = \{i|{ w }_{ lsi }<0\}\). According to (4), it is not complicated to obtain the optimal scaling factors from the floating-point weights, as in (5):

$$\begin{aligned} \begin{aligned} { W }_{ ls }^{ * }=\frac{ 1 }{ \left| { I }_{ * } \right| } \sum _{ i\in { I }_{ * } }^{ { n }_{ s } }{ \left| { w }_{ lsi } \right| } \end{aligned} \end{aligned}$$
(5)

where \(|{ I }_{ * }|\) denotes the number of elements in \({ I }_{ * }\) in each segment.
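Putting (2) and (5) together, a minimal sketch of the asymmetric binary quantization of one FC-layer segment might look as follows (a NumPy illustration under our own naming, not the authors' implementation):

```python
import numpy as np

def binarize_segment(w):
    """Asymmetric binary quantization of one FC-layer segment, Eqs. (2) and (5):
    positive weights map to +W^pb, negative weights to -W^nb, where each
    scaling factor is the mean absolute value over its own index set."""
    pos = w >= 0
    W_pb = np.abs(w[pos]).mean() if pos.any() else 0.0        # Eq. (5) over I_p
    W_nb = np.abs(w[~pos]).mean() if (~pos).any() else 0.0    # Eq. (5) over I_n
    w_b = np.where(pos, W_pb, -W_nb)                          # Eq. (2)
    return w_b, (W_pb, W_nb)
```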

2.3 Asymmetric Ternary Weights for Convolutional (CONV) Layers

Similarly, we constrain the floating-point weights \(W_{lsi}\) (lth layer, sth segment and ith parameter) to ternary weights with values belonging to {\(+W^{pt}_{ls}, 0, -W^{nt}_{ls}\)}. The quantization function is shown in (6).

$$\begin{aligned} w^{t}_{lsi} = f_{t}(w_{lsi}|\varDelta ^{p}_{ls}, \varDelta ^{n}_{ls}) = \left\{ \begin{array}{ll} +W^{pt}_{ls} & \qquad {w_{lsi}>\varDelta ^{p}_{ls}}\\ 0 & \qquad {-\varDelta ^{n}_{ls} \le w_{lsi} \le \varDelta ^{p}_{ls}}\\ -W^{nt}_{ls} & \qquad {w_{lsi}<-\varDelta ^{n}_{ls}} \end{array} \right. \end{aligned}$$
(6)

Here {\(\varDelta ^{p}_{ls}, \varDelta ^{n}_{ls}\)} are the thresholds and {\(W^{pt}_{ls}, W^{nt}_{ls}\)} are the scaling factors. The optimization problem is formulated as (7):

$$\begin{aligned} \left\{ \begin{array}{ll} W^{*}_{ls}= arg\min J( W^{*}_{ls}) = arg\min { \left\| { W }_{ ls }-{ W }_{ ls }^{ t } \right\| }_{ 2 }^{ 2 }\\ s.t.\ W^{*}_{ls}>0;\ w^{t}_{lsi}\in \{+W^{pt}_{ls},0,-W^{nt}_{ls}\};\ i=1,2,...,n;\ s=1,2,...,k \end{array} \right. \end{aligned}$$
(7)

Substituting the ternary function (6) into (7), we obtain the expression in (8):

$$\begin{aligned} \begin{aligned} J( { W }_{ ls }^{ * })&={ \left\| { W }_{ ls }-{ W }_{ ls }^{ t } \right\| }_{ 2 }^{ 2 } = \left| { I }_{ { \varDelta }_{ ls }^{ * } } \right| * { ({ W }_{ ls }^{ * }) }^{ 2 } - 2*(\sum _{ i|i\in { I }_{ { \varDelta }_{ ls }^{ * } } }^{ { n }_{ s } }{ \left| { w }_{ lsi } \right| } )*({ W }_{ ls }^{ * }) + C\\ \end{aligned} \end{aligned}$$
(8)

where \( { I }_{ { \varDelta }_{ ls }^{ * } } = \){\({ I }_{ { \varDelta }_{ ls }^{ p } }, { I }_{ { \varDelta }_{ ls }^{ n } }\)}, \({ I }_{ { \varDelta }_{ ls }^{ p } } = \{i|{ w }_{ lsi }>{ \varDelta }_{ ls }^{ p }\}\), \({ I }_{ { \varDelta }_{ ls }^{ n } } = \{i|{ w }_{ lsi }<-{ \varDelta }_{ ls }^{ n }\}\) and \(|{ I }_{ { \varDelta }_{ ls }^{ * } }|\) denotes the number of elements of \({ I }_{ { \varDelta }_{ ls }^{ * } }\) in each segment. \({ \varDelta }_{ ls }^{ p }\) and \({ \varDelta }_{ ls }^{ n }\) are independent of each other. \(C = \sum _{ i }^{ { n }_{ s } }{ { \left| { w }_{ lsi } \right| }^{ 2 } } \) is a constant independent of {\(W^{pt}_{ls}, W^{nt}_{ls}\)}. Therefore, the problem for the scaling factors {\(W^{pt}_{ls}, W^{nt}_{ls}\)} can be simplified to:

$$\begin{aligned} \begin{aligned} { W }_{ ls }^{ * } = arg \min J( { W }_{ ls }^{ * }) = arg \min (\left| { I }_{ { \varDelta }_{ ls }^{ * } } \right| *{ ({ W }_{ ls }^{ * }) }^{ 2 }-2*(\sum _{ i|i\in { I }_{ { \varDelta }_{ ls }^{ * } } }^{ { n }_{ s } }{ \left| { w }_{ lsi } \right| } )*({ W }_{ ls }^{ * })) \end{aligned} \end{aligned}$$
(9)

According to (9), it is not complicated to obtain the ternary scaling factors from the floating-point weights, as in (10):

$$\begin{aligned} \begin{aligned}&{ W }_{ ls }^{ * } =\frac{ 1 }{ \left| { I }_{ { \varDelta }_{ ls }^{ * } } \right| } \sum _{ i|i \in { I }_{ { \varDelta }_{ ls }^{ * } } }^{ { n }_{ s } }{ \left| { w }_{ lsi } \right| } \\ \end{aligned} \end{aligned}$$
(10)

Here {\({ \varDelta }_{ ls }^{ p }, { \varDelta }_{ ls }^{ n }\)} are both positive values. As in [17], there is no straightforward closed-form solution for \({ \varDelta }_{ ls }^{ p } \) and \( { \varDelta }_{ ls }^{ n }\). However, since the weights are empirically drawn from a uniform or normal distribution, we adopt the method of [14] and set the thresholds as follows:

$$\begin{aligned} \begin{aligned}&{ \varDelta }_{ ls }^{ * } \approx 0.7 * \frac{ 1 }{ \left| { I }^{ * } \right| } \sum _{ i|i \in { I }^{ * } }^{ { n }_{ s } }{ \left| { w }_{ lsi } \right| } \end{aligned} \end{aligned}$$
(11)

where \({ I }^{ * } = \) {\({ I }^{ p },{ I }^{ n }\)}, \({ I }^{ p } = \{i \mid {w}_{lsi} \ge 0,\ i = 1,2,...,n_s\}\) and \({ I }^{ n } = \{i \mid {w}_{lsi} < 0,\ i = 1,2,...,n_s\}\). Finally, by substituting (10) and (11) into (6), the ternary weights can easily be obtained from the floating-point weights.
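Combining (6), (10) and (11), the asymmetric ternary quantization of one CONV-layer segment can be sketched as follows (again a NumPy illustration under assumed names, not the authors' code):

```python
import numpy as np

def ternarize_segment(w):
    """Asymmetric ternary quantization of one CONV-layer segment,
    Eqs. (6), (10) and (11)."""
    pos = w >= 0
    # Eq. (11): thresholds from the mean absolute value of each index set
    delta_p = 0.7 * np.abs(w[pos]).mean() if pos.any() else 0.0
    delta_n = 0.7 * np.abs(w[~pos]).mean() if (~pos).any() else 0.0
    above, below = w > delta_p, w < -delta_n
    # Eq. (10): scaling factors from the weights beyond each threshold
    W_pt = np.abs(w[above]).mean() if above.any() else 0.0
    W_nt = np.abs(w[below]).mean() if below.any() else 0.0
    # Eq. (6): assign {+W^pt, 0, -W^nt}
    w_t = np.zeros_like(w)
    w_t[above], w_t[below] = W_pt, -W_nt
    return w_t, (W_pt, W_nt, delta_p, delta_n)
```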

2.4 Heterogeneous Quantized Weights Structure

To achieve a good balance between compression rate and accuracy, we train CNNs with ternary weights for the convolutional layers and binary weights for the fully-connected layers. On the one hand, the dense and highly redundant fully-connected layers contain most of the parameters, so binarizing them removes more redundancy and yields a higher compression ratio. On the other hand, [21] shows that convolutional layers require more bits of precision than fully-connected layers, so ternary weights for the convolutional layers improve the expressive capacity. In addition, the zero quantization value in the convolutional layers removes multiplications and thus accelerates the networks.
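Building on the segmentation and the two quantizers sketched above, the heterogeneous scheme could be dispatched per layer roughly as follows; the layer naming convention and the reshaping of CONV weights to an (m x n) matrix are our assumptions:

```python
import numpy as np

def quantize_network(weights, k=8):
    """Quantize each layer segment-wise: ternary for CONV layers, binary for
    FC layers. `weights` maps layer names to 2-D (m x n) weight matrices."""
    quantized = {}
    for name, W in weights.items():
        quantize = ternarize_segment if name.startswith("conv") else binarize_segment
        segments = segment_weights(W, k)
        quantized[name] = np.concatenate([quantize(seg)[0] for seg in segments], axis=1)
    return quantized
```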

2.5 Train the SATB-Nets with Stochastic Gradient Descent (SGD) Method

The Stochastic Gradient Descent (SGD) algorithm is used to train SATB-Nets; more detail is shown in Algorithm 1.

[Algorithm 1: training SATB-Nets with SGD]

The whole training process is almost the same as the normal training method, except that segmented asymmetric ternary weights for the convolutional (CONV) layers and binary weights for the fully-connected (FC) layers are used in the forward propagation (step 1) and backward derivation (step 2), similar to the training method of BinaryConnect [11]. To overcome the difficulty of convergence when training with quantized weights, we keep the full precision floating-point weights for the weight update so that the tiny changes of each iteration are accumulated (step 3).

In addition, two useful tricks, Batch Normalization (BN) [24] and learning rate scaling, are adopted. We also use momentum for acceleration.
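A minimal sketch of one training iteration (steps 1-3 above) follows; the `model.forward`/`model.backward` interface, the function names and the hyper-parameters are placeholders of ours, not the authors' Caffe implementation:

```python
def sgd_step(model, fp_weights, velocity, batch, lr, momentum=0.9, k=8):
    """One SATB-Nets iteration (sketch). Steps 1-2: quantize the stored
    full precision weights and run forward/backward with the quantized copies.
    Step 3: update the full precision weights with the resulting gradients,
    so the tiny per-iteration changes accumulate (as in BinaryConnect [11])."""
    model.set_weights(quantize_network(fp_weights, k))    # steps 1-2
    loss = model.forward(batch)
    grads = model.backward(loss)                          # grads w.r.t. quantized weights
    for name in fp_weights:                               # step 3: momentum SGD update
        velocity[name] = momentum * velocity[name] - lr * grads[name]
        fp_weights[name] += velocity[name]
    return loss
```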

3 Experiments

In this section, we benchmark SATB-Nets against full precision weight networks (FPWNs), binary weight networks (BWNs) and ternary weight networks (TWNs) on a small-scale dataset (CIFAR-10) and a large-scale dataset (ImageNet). We adopt the VGG [6] networks on CIFAR-10 and AlexNet [3] on ImageNet. For a fair comparison, the following settings are identical: network architecture, learning rate scaling procedure (multi-step), optimization method (SGD with momentum) and regularization method (L2 weight decay). We conjecture that SATB-Nets have sufficient expressiveness for deep networks, and we adopt data augmentation and sparse weights, which act like dropout [15], to prevent over-fitting. All the neural networks are implemented in the Caffe framework [25]. Detailed configurations are given in Table 1.

Table 1. Networks and some hyper-parameters of them on datasets

3.1 VGGNets on CIFAR-10

CIFAR-10 is a benchmark image classification dataset consisting of 60K \(32 \times 32\) color images; five sixths of them form the training set and the rest form the test set. To prevent over-fitting while training the VGG [6] networks, data augmentation is used following [4]: each image is padded with 4 pixels on each side and a random \(32 \times 32\) crop is taken from the padded image. The cropped images are used for training and the original images for testing. We first adopt the VGG16 [6] architecture for the experiment. Besides, to ease the difficulty of training such a deep neural network, we initialize the networks with a fully-trained full precision model.
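The padding-and-crop augmentation described above can be sketched as follows (a NumPy illustration with our own function name):

```python
import numpy as np

def random_crop_with_padding(img, pad=4, size=32):
    """Pad a (H, W, C) image with `pad` zero pixels on each side and take a
    random `size` x `size` crop, as in the augmentation of [4]."""
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    y = np.random.randint(0, 2 * pad + 1)
    x = np.random.randint(0, 2 * pad + 1)
    return padded[y:y + size, x:x + size, :]
```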

Fig. 1. Validation accuracy curves of VGG16 on CIFAR-10

We compare SATB-Nets with the FPWNs, BWNs and TWNs. The results (Fig. 1 and Table 2) show that SATB-Nets built from VGG16 outperform BWNs, TWNs and FPWNs by 2.52%, 1.09% and 0.65% respectively. Moreover, SATB-Nets built from VGG10, VGG13 and VGG16 consistently outperform BWNs and TWNs.

To our surprise, the SATB-Nets constrained from VGG13 and VGG16 outperform the full precision weight networks. We conjecture that our SATB-Nets have adequate expressive capacity and that the sparse weights prevent over-fitting, much like dropout [15].

Table 2. Validation accuracy of VGGNet on CIFAR-10 (%)

For more thorough experimental verification, we extend the experiment to VGG13, obtained by removing the last 3 convolutional layers of VGG16 [6], and VGG10, obtained by removing the last 6 convolutional layers. The results, listed in Table 2, meet our expectations. In addition, Table 3 shows the compression ratio of VGG-16.

Table 3. Compression ratio for VGG-16 (Byte)

3.2 AlexNet on ImageNet

We further examine the performance of SATB-Nets on the ImageNet ILSVRC-2012 dataset, which has over 1.2M training examples and 50K validation examples. We use the AlexNet Caffe model [26] as the reference model. Besides, to ease the difficulty of training such a deep neural network, we initialize the networks with a fully-trained full precision model.

Fig. 2. Validation accuracy curves of AlexNet on ImageNet

The training curves are shown in Fig. 2. The complete results (Fig. 2 and Table 4) show that SATB-Nets reach a top-1 validation accuracy of 56.57%, only 0.15% below the full precision counterpart.

Table 4. Validation accuracy of AlexNet on ImageNet (%)

Tables 3 and 5 show the compression ratios of VGG-16 and AlexNet. SATB-Nets achieve up to \(29\times \) and \(31\times \) model compression respectively, close to binary weight compression, without impacting accuracy.

Table 5. Compression statistics for AlexNet (Byte)
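As a rough sanity check of why the overall ratio approaches the \(32\times \) of pure binarization, assume AlexNet has roughly 2.3M convolutional and 58.6M fully-connected parameters (approximate, commonly cited figures, not taken from Table 5) and ignore the small overhead of the per-segment scaling factors. Then

$$\begin{aligned} \frac{32 \times (2.3 + 58.6)\,\mathrm{M}}{2 \times 2.3\,\mathrm{M} + 1 \times 58.6\,\mathrm{M}} \approx \frac{1948.8}{63.2} \approx 31\times , \end{aligned}$$

which is consistent with the ratio reported above: the fully-connected layers dominate the parameter count, so the 1-bit FC weights pull the overall ratio close to \(32\times \).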

4 Conclusion

In this paper, we formulate the ternary and binary weight network optimization problems and propose SATB-Nets, which nearly reach the compression ratio of binary networks. Experiments on the benchmarks demonstrate the superior performance of the proposed method. As the next step, we will apply the method to more datasets and models to further explore the relationship between network capacity and the quantized values.