
1 Introduction

With the huge accumulation of digital images and videos in society, the demand for searching such data is also increasing [8]. Traditional multimedia search engines usually use surrounding metadata, such as titles, tags, or manually annotated keywords, as the index to retrieve multimedia data; this is known as keyword-based retrieval [18]. The main drawback of this kind of retrieval technology is the inconsistency between the textual information and the visual content of the multimedia data. Content-based multimedia retrieval was therefore proposed and has made great progress over the past decades [13]. In content-based multimedia retrieval, the semantic gap is a challenging problem; it refers to the difference between the low-level representations of images and the higher-level concepts used by human beings to describe them. To narrow this gap, extensive efforts have been made by both the academic and industrial communities [9, 10, 17].

Over the past few years, deep learning has emerged as one of the most promising technologies in computer vision, owing to its outstanding performance in a series of vision-related tasks such as image classification [19], face recognition [29], and image segmentation [27]. Since the successful application of AlexNet [12] in computer vision, the intuitive way to obtain better feature representations has been to use deeper convolutional neural networks (DCNNs). For example, AlexNet contains 8 learned layers, VGGNet [21] has 19 learned layers, and GoogLeNet [22] consists of 22 learned layers. However, going deeper means that training such networks becomes more difficult. Thus a deeper but easier-to-train DCNN, ResNet [6], was proposed. Inspired by these successful DCNNs, deep learning has already been introduced into content-based image retrieval (CBIR) [2]. However, it is natural to ask: does using a very deep DCNN improve the performance of image retrieval, especially on large-scale image databases?

In order to answer the above question, we use a very deep DCNN, ResNet, for image retrieval. However, the features extracted by ResNet are high-dimensional and therefore not suitable for large-scale image databases if used directly for retrieval. The proposed method is thus a framework for converting raw images into binary codes for effective large-scale image retrieval. To do so, the raw images are first fed into ResNet to obtain deep features, and these deep features are then converted into binary codes by a DCNN-based hashing network. In summary, the contributions of this work are twofold:

(1) We investigate a new framework that converts raw images into binary codes for large-scale image retrieval. This framework integrates ResNet and a DCNN-based hashing network.

(2) Extensive experiments are conducted for a comprehensive evaluation of the proposed framework, especially with different depths of the DCNN.

The remainder of the manuscript is structured as follows: Sect. 2 describes the proposed framework, followed by the experimental results in Sect. 3. Finally, conclusions and perspectives are given in Sect. 4.

2 Proposal

The framework of our method is shown in Fig. 1. The inputs are the pixels of the raw image and the corresponding label information of the image (for training only), and the outputs are the binary codes of the image. Such codes can be used for image retrieval by comparing the Hamming distances between the query's codes and the codes of the gallery images. The proposed framework includes two kinds of DCNN: a Deep Residual Network (ResNet) [6] is used to convert the raw images into deep features, and a hashing neural network (HNN) is used to convert the deep features into binary codes. We name our proposal ResHNN. Details are explained in this section.
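To make the retrieval stage concrete, the following is a minimal sketch, assuming the two networks are already trained; `resnet_features` and `hashing_net` are hypothetical stand-ins for the ResNet feature extractor and the hashing network described in Sects. 2.1 and 2.2.

```python
# Sketch only: `resnet_features` and `hashing_net` are hypothetical callables that
# return deep features and real-valued hash outputs, respectively.
import numpy as np

def encode(images, resnet_features, hashing_net):
    """Raw images -> ResNet deep features -> binary codes in {-1, +1}."""
    features = resnet_features(images)            # deep features from ResNet
    outputs = hashing_net(features)               # real-valued outputs of the HNN
    return np.where(outputs >= 0, 1, -1)          # binarize, treating 0 as +1

def retrieve(query_code, gallery_codes, top_k=10):
    """Rank gallery images by Hamming distance to the query's binary code."""
    r = gallery_codes.shape[1]
    # For codes in {-1, +1}: Hamming distance = (r - inner product) / 2
    hamming = (r - gallery_codes @ query_code) // 2
    return np.argsort(hamming)[:top_k]            # indices of the closest gallery images
```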

2.1 ResNet

ResNet won first place in the ILSVRC 2015 classification task, has proved to be easy to optimize, and gains accuracy from increased depth [6]. ResNet is composed of many stacked "Residual Units", each of which can be expressed in the following form:

$$\begin{aligned} y_i&=h(x_i)+F(x_i,W_i) \\ x_{i+1}&=f(y_i) \end{aligned}$$
(1)

where \(x_i\) and \(x_{i+1}\) are the input and output of the i-th unit, F is the residual function, \(h(x_i)\) is an identity mapping, and f is a ReLU function [7]. The core idea of ResNet is to learn the additive residual function F with respect to \(h(x_i)\), with the key choice of using an identity mapping \(h(x_i)=x_i\). This is realized by attaching a shortcut connection that performs identity mapping; its output is added to the output of the stacked layers. The residual unit we used is shown in Fig. 1. More details can be found in [6, 7].
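For illustration, the following is a minimal sketch of a bottleneck residual unit of the kind used in the deeper ResNet variants (ResNet-50/101/152), written with PyTorch; the layer widths are placeholders rather than the exact configuration of [6].

```python
# A minimal sketch of a bottleneck residual unit, assuming PyTorch.
import torch.nn as nn

class BottleneckUnit(nn.Module):
    def __init__(self, channels, bottleneck):
        super().__init__()
        # F(x, W): 1x1 reduce -> 3x3 -> 1x1 restore, each followed by batch norm
        self.residual = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # y_i = h(x_i) + F(x_i, W_i) with h the identity shortcut; x_{i+1} = ReLU(y_i)
        return self.relu(x + self.residual(x))
```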

Fig. 1. The framework of transferring images into a compact feature representation

2.2 HNN

The hashing layer is based on Nonlinear Discrete Hashing [3], which uses a multi-layer neural network to obtain compact binary codes through nonlinear transformations. Let \(X=[x_1,x_2,\dots ,x_n] \in R^{n \times d}\) denote the training set with n samples, where each sample \(x_i \in R^d (1 \le i \le n)\) is a d-dimensional data point. Assuming the m-th layer consists of \(u^{(m)}\) units, the output of each layer is computed as:

$$\begin{aligned} h^{(1)}(x_i) = s(x_iW^{(1)}+c^{(1)}), i=1,\cdots ,n \end{aligned}$$
(2)
$$\begin{aligned} h^{(m)}(x_i) = s(h^{(m-1)}(x_i)W^{(m)}+c^{(m)}), i=1,\cdots ,n \end{aligned}$$
(3)

where \(s(\cdot )\) is a nonlinear activation function such as the tanh function, and the projection matrix \(W^{(m)}\) and the bias vector \(c^{(m)}\) are the parameters to be learned for the m-th layer of the network.

For an I-layer network, the output of the top layer can be written as:

$$\begin{aligned} F(x)=h^{(I)}(x) \in R^{u^{(I)}} \end{aligned}$$
(4)

where the mapping \(F:R^d \rightarrow R^{u^{(I)}}\) is a parametric nonlinear function determined by \(\{W^{(m)},c^{(m)}\}^I_{m=1}\). We treat the sign of the network output as the binary code of each sample and stack the binary codes of all n samples together as:

$$\begin{aligned} B=sgn(F(X)) \in \{-1,+1\}^{(n \times r)} \end{aligned}$$
(5)

where \(r=u^{(I)}\) is the length of the binary code. Specifically, 0 is treated as \(+1\):

$$\begin{aligned} sgn(b)=\left\{ \begin{array}{rl} +1, &amp; b \ge 0\\ -1, &amp; b < 0 \end{array} \right. \end{aligned}$$
(6)
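As a concrete illustration of Eqs. (2)–(6), the following NumPy sketch computes the multi-layer nonlinear mapping and its binarization; `W` and `c` stand for the learned per-layer parameters \(\{W^{(m)}, c^{(m)}\}\).

```python
# A minimal NumPy sketch of the HNN forward pass and binarization.
import numpy as np

def hnn_forward(X, W, c):
    """Multi-layer nonlinear mapping F(X): h^(m) = tanh(h^(m-1) W^(m) + c^(m))."""
    h = X
    for W_m, c_m in zip(W, c):
        h = np.tanh(h @ W_m + c_m)
    return h                                   # F(X), shape (n, r) for an r-unit top layer

def binarize(F_X):
    """Eqs. (5)-(6): B = sgn(F(X)), with 0 mapped to +1."""
    return np.where(F_X >= 0, 1, -1)
```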

The goal is to find a binary code matrix that minimizes the loss function:

$$\begin{aligned} arg\min \limits _{B} Q=Q(L,B)+Q(B,X) \end{aligned}$$
(7)

where Q(L,B) measures the difference between the labels predicted from the hash code matrix B and the ground-truth labels of all samples, and Q(B,X) measures the information loss caused by transforming the data into binary codes. Denoting the classifier weight matrix as C, the first term Q(L,B) can be formulated as:

$$\begin{aligned} Q_C(L,B)=||L-CB^T||_F^2 \end{aligned}$$
(8)

where L is the ground-truth label matrix of the samples and \(||\cdot ||_F\) denotes the Frobenius norm.

Q(B,X) measures the discrepancy between the binary codes and the data samples, and consists of a quantization loss term and a similarity-preserving term:

$$\begin{aligned} Q^{(I)}_{F}(B,X)=\Vert B-F^{(I)}\Vert ^2_F + \alpha \sum ^n_{i=1}\sum ^n_{j=1}S_{ij}\Vert F^{(I)}(i,:)-F^{(I)}(j,:)\Vert ^2_F \\ \text {s.t.}\ B\in \{-1,1\}^{n\times r},\ B^{T}B=nI_{r} \end{aligned}$$
(9)

where S is the similarity matrix. The constraint \(B^{T}B=nI_{r}\) is added to reduce the redundancy of the codes. However, this constraint makes the optimization difficult, so a real-valued matrix \(Y\in \varOmega =\{Y\in \mathbb {R}^{n\times r} \mid Y^TY=nI_r\}\) is introduced to approximate B, and Eq. 10 is used to substitute for this constraint.

$$\begin{aligned} Q_I(B)=\Vert B-Y\Vert ^2_F \end{aligned}$$
(10)

We notice that the loss function in Eq. 7 only considers the output of the top layer of the network, while the hidden layers are not included. So a companion loss function is introduced as follows:

$$\begin{aligned} Q_F(B,X)=Q_F^{(I)}(B,X)+\sum ^{I-1}_{m=1}\alpha ^{(m)}h(Q_F^{(m)}-\tau ^{(m)}) \end{aligned}$$
(11)

where \(h(x)=\max (x,0)\) and \(Q_F^{(m)}=\sum _{i=1}^n\sum _{j=1}^{n}S_{ij}\Vert F^{(m)}(i,:)-F^{(m)}(j,:)\Vert ^2_F\), \(m=1,2,\ldots , I-1\).

Taking all of the above into consideration, the overall objective function is defined as follows:

$$\begin{aligned} \arg \min _{B,\,C,\,\{W^{(m)},c^{(m)}\}^{I}_{m=1},\,Y} Q=Q_C+\lambda _1Q_I+\lambda _2Q_F+\lambda _3Q_R \\ \text {s.t.}\ B\in \{-1,1\}^{n\times r} \end{aligned}$$
(12)

where \(Q_R=\Vert C\Vert ^2_F+\sum \limits _{m=1}^I\Vert W^{(m)}\Vert ^2_F+\sum \limits ^I_{m=1}\Vert c^{(m)}\Vert ^2_F\) is the regularizer that controls the scales of the parameters.
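For clarity, the sketch below evaluates the overall objective of Eq. (12) with NumPy, keeping only the top-layer term of \(Q_F\) from Eq. (9); the assumed shapes (label matrix L of size \(c\times n\), codes B of size \(n\times r\)) follow Eq. (8), and the all-pairs distance computation is only meant for small n.

```python
# A minimal sketch of the overall objective; shapes are assumptions stated above.
import numpy as np

def overall_loss(B, Y, F_top, L, C, W, c, S, alpha, lam1, lam2, lam3):
    fro2 = lambda M: np.sum(M ** 2)                 # squared Frobenius norm
    # Q_C: classification loss of Eq. (8)
    Q_C = fro2(L - C @ B.T)
    # Q_I: gap between the binary codes and the orthogonal relaxation Y, Eq. (10)
    Q_I = fro2(B - Y)
    # Q_F (top layer only): quantization loss + similarity-preserving term, Eq. (9)
    pair_dists = np.sum((F_top[:, None, :] - F_top[None, :, :]) ** 2, axis=-1)
    Q_F = fro2(B - F_top) + alpha * np.sum(S * pair_dists)
    # Q_R: regularizer on all parameters
    Q_R = fro2(C) + sum(fro2(W_m) for W_m in W) + sum(fro2(c_m) for c_m in c)
    return Q_C + lam1 * Q_I + lam2 * Q_F + lam3 * Q_R
```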

The above joint optimization problem is non-convex and difficult to solve directly. It is therefore split into sub-problems, each with respect to one variable while keeping the others fixed, and the variables are updated one by one. These sub-problems can be solved with the help of singular value decomposition (SVD) and the Gram-Schmidt process. More details can be found in [3].
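As one illustrative sub-step (a sketch of a standard solution, not necessarily the exact procedure of [3]): with the other variables fixed, updating the auxiliary matrix Y amounts to \(\min _Y \Vert B-Y\Vert ^2_F\) s.t. \(Y^TY=nI_r\), which has a closed-form solution from the thin SVD of B when B has full column rank; the Gram-Schmidt step used in [3] to handle rank deficiency is omitted here.

```python
# Hedged sketch of the Y sub-problem: Y = sqrt(n) * U @ Vt maximizes tr(Y^T B)
# under Y^T Y = n I_r, assuming B has full column rank.
import numpy as np

def update_Y(B):
    n = B.shape[0]
    U, _, Vt = np.linalg.svd(B, full_matrices=False)   # thin SVD of B
    return np.sqrt(n) * U @ Vt
```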

3 Experiments

In this section, we conduct experiments on three datasets, MNIST [4], CIFAR10 [11], and SUN397 [26], to evaluate the performance of the proposal and to answer the question posed in the introduction.

Table 1. Retrieval performance on MNIST with binary codes of 16, 32, and 64 bits

3.1 Databases

MNIST: A handwritten digit dataset consisting of 70000 images of size \(28\times 28\). Each image is associated with a digit from 0 to 9 and is represented as a 784-dimensional gray-scale feature vector by concatenating all pixels [3]. As it is a simple dataset, we extract a 256-dimensional feature vector with ResNet for each image. Following the same setting as in [24], 1000 images, with 100 images per class, are randomly selected from the original test set to form the query set, and the remaining 69000 images are used as the gallery database.

CIFAR10: A set of 60000 manually labeled color images from 10 classes, with 6000 images per class. Each image is of size \(32\times 32\). ResNet is used to extract a 1024-dimensional feature vector for each image. Similar to MNIST, we use 1000 images, consisting of 100 images per class from the original test set, as the query set and construct the gallery database from the remaining images.

SUN397: This dataset contains 108754 images classified into 397 categories. It is larger and more complex than the two databases mentioned above, so retrieving semantic neighbors is challenging. Each image is represented by a 2048-dimensional feature vector extracted by ResNet. Following the same protocol as the referred methods, 8000 images are randomly sampled as query images and the remaining images form the gallery database.

3.2 Evaluation Metric

All experiments are repeated 10 times and the averaged values are taken as the final result. Two metrics are used to measure the performance of the different methods: precision at N samples and mean Average Precision (mAP). Given the top N returned samples, precision at N is calculated as the percentage of relevant retrieved images:

$$\begin{aligned} Precision@N=\frac{\sum _{k=1}^{N}rel(k)}{N} \end{aligned}$$
(13)

where \(rel(k)=1\) if k-th image is a relevant retrieved image, otherwise, \(rel(k)=0\). The mean Average Precision (mAP) presents an overall measurement of the retrieval performance by computing the area under the precision-recall curve, which delivers good discrimination and stability. It is calculated as follows:

$$\begin{aligned} AveP = \frac{\sum _{k=1}^N (P(k) \times rel(k))}{\text {number of relevant images}}, \nonumber \\ mAP=\frac{\sum _{q=1}^Q AveP(q)}{Q} \end{aligned}$$
(14)

where k is the rank in the sequence of retrieved documents, N is the number of retrieved images, P(k) is the precision at cut-off k in the list, and rel(k) is equal to 1 if the item at rank k is a relevant image, otherwise, it is equal to 0 [23]. Q is the number of the queries.
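The two metrics can be computed as in the short NumPy sketch below, where `rel` is assumed to be a 0/1 relevance vector over the ranked list returned for a single query.

```python
# A minimal sketch of Eqs. (13)-(14).
import numpy as np

def precision_at_n(rel, N):
    """Fraction of relevant images among the top N returned samples."""
    return np.sum(rel[:N]) / N

def average_precision(rel, num_relevant):
    """AveP over a ranked relevance vector for one query."""
    precisions = np.cumsum(rel) / np.arange(1, len(rel) + 1)   # P(k) at each rank k
    return np.sum(precisions * rel) / num_relevant

def mean_average_precision(rel_lists, num_relevant_list):
    """mAP: mean of AveP over all Q queries."""
    return np.mean([average_precision(r, m)
                    for r, m in zip(rel_lists, num_relevant_list)])
```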

Fig. 2. Top 14 retrieved images from the SUN397 dataset with different numbers of ResNet layers and 128-bit binary codes. The results of ResHNN-50 are shown in the first three rows, the results of ResHNN-101 in the middle three rows, and the results of ResHNN-152 in the last three rows. The irrelevant images are marked by red circles. (Color figure online)

3.3 Results and Analysis

Result on MNIST: The training set used for the hashing network contains 5000 images, obtained by selecting 500 images from each class. The ResNet and the HNN are trained separately, and the depth of the ResNet used on this database is 50. For the HNN, we take the tanh function as the nonlinear activation function and initialize the biases \(c^{(m)}\) to 0. Each element of \(W^{(m)}\) is uniformly sampled from the range \(\left[ -\sqrt{\frac{6}{row(m)+col(m)}}, \sqrt{\frac{6}{row(m)+col(m)}}\ \right] \), where row(m) and col(m) are the numbers of rows and columns of \(W^{(m)}\), respectively. The numbers R and L are set to 5 and 3. We set \(\alpha ^{(1)}\) and \(\alpha ^{(2)}\) to 20, \(\alpha ^{(3)}\) to 100, \(\tau ^{(1)}\) and \(\tau ^{(2)}\) to 1000, \(\lambda _1\) to 1e−3, \(\lambda _2\) and \(\lambda _3\) to 1e−5, and the learning rate \(\eta \) to 1e−3. The same settings are adopted for all the other datasets. The experimental results are shown in Table 1, comparing hash codes with lengths of 16, 32, and 64 bits. As MNIST is a simple handwritten digit dataset, many methods achieve good performance on it, and so does ResHNN. Since the performance achieved on this database is already quite high, the impact of the depth of the DCNN is difficult to estimate, so we did not conduct further experiments on it.
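As a small illustration of the initialization described above, the sketch below draws each \(W^{(m)}\) uniformly from \(\left[ -\sqrt{6/(row(m)+col(m))}, \sqrt{6/(row(m)+col(m))}\right] \) (Xavier/Glorot-style) and sets the biases to zero; `layer_sizes` is a hypothetical list of layer widths.

```python
# A minimal sketch of the HNN parameter initialization; `layer_sizes` is hypothetical.
import numpy as np

def init_hnn(layer_sizes, rng=np.random.default_rng(0)):
    W, c = [], []
    for rows, cols in zip(layer_sizes[:-1], layer_sizes[1:]):
        bound = np.sqrt(6.0 / (rows + cols))                    # sqrt(6/(row(m)+col(m)))
        W.append(rng.uniform(-bound, bound, size=(rows, cols)))
        c.append(np.zeros(cols))                                # biases initialized to 0
    return W, c
```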

Result on CIFAR10: Similar to the setting of the experiments on the MNIST dataset, the training set is constructed from 5000 images, with 500 images per category. We compare the results with ResNet depths of 50, 101, and 152 to evaluate their impact on feature extraction and hashing. Results are shown in Table 2. Our method ResHNN clearly outperforms the referred methods at all bit lengths, in terms of both mAP and Precision@500, and ResHNN-152 is the best. As the ResNet goes deeper, the retrieval performance improves slightly. We believe the reason is that deeper networks extract features from images more effectively, and this advantage is preserved by our hashing layers.

Table 2. Retrieval performance on CIFAR-10 with binary codes of 16, 32, and 64 bits
Table 3. Retrieval performance on SUN397 with binary codes of 48, 64, and 128 bits

Result on SUN397: In order to verify whether the proposed ResHNN works well under large and complex conditions, further experiments were conducted on the SUN397 database. As this database is a larger collection, we again evaluate the impact of the number of ResNet layers, namely 50, 101, and 152. Results are shown in Table 3. The proposed ResHNN achieves performance comparable to the referred methods and outperforms them at longer bit lengths. Furthermore, increasing the depth of the ResNet brings clear improvements when longer binary codes are used. This can be explained by the fact that deeper DCNN layers extract more information about the visual content of the image, and longer binary codes preserve this information better. Examples of retrieval results with different numbers of ResNet layers are shown in Fig. 2, where the wrongly returned images are marked by red circles.

4 Conclusion and Perspective

In this paper, we propose a ResNet-based compact feature representation for image retrieval, namely ResHNN, which integrates a residual network and a hashing neural network to generate binary codes for CBIR. Extensive experimental results on three widely used public databases demonstrate the superiority of the proposed ResHNN. Furthermore, we explore the impact of the ResNet's depth on the retrieval performance. The impacts of different deep features and different hashing methods will be explored in future work.