1 Introduction

Nearest neighbor (NN) search is a fundamental technique for many computer vision tasks, such as classification [3], recognition [21, 27], matching [5, 26] and retrieval [8, 10, 15, 23, 28]. Traditional NN search must compute and store Euclidean distances between high-dimensional vectors, so it requires a large amount of disk space and computation on large-scale data. To achieve higher efficiency, compact codes are widely used to speed up NN search, for several reasons. First, representing high-dimensional vectors with a small number of bits takes much less space while preserving almost the same information (e.g., [27] shows that compressing images into binary codes still preserves the information required for recognition). Thus, the compact codes of a large dataset can be stored in memory, which is much faster than hard drives. Second, compact codes allow fast queries because computing distances on the compressed data is much more efficient.

Recent work on learning compact codes falls into two categories. The first is hashing, which compresses vectors into binary codes in two stages [5, 9, 10, 16, 21, 26-29]: the first stage finds a low-dimensional embedding and the second binarizes the low-dimensional vectors. The other category quantizes the input data with learned codebooks [1, 2, 4, 6, 8, 12, 15, 20, 30]. Both categories can be used for approximate nearest neighbor (ANN) search, in which the Euclidean distance is replaced by the Hamming distance or by a table look-up scheme such as the asymmetric distance computation strategy [15]. However, there are several differences between hashing and quantization for ANN search. The most important one is the way the approximate distance is computed. To approximate the Euclidean distance between a query and the database, hashing methods compute the Hamming distance between their binary codes, while quantization methods store the Euclidean distances between codewords and look them up by the codes. The range of distances produced by quantization is therefore much larger than that of hashing. Moreover, since the codewords and the query lie in the same Euclidean space, quantization methods can be extended to the asymmetric distance search strategy, which is shown to be better than symmetric distances [15] such as the Hamming distance. Since the proposed method follows the second category, we review recent work on quantization below.

Vector quantization (VQ) [11] uses Lloyd's method [17] to train a codebook in which the nearest codeword is selected to represent an input vector. Product quantization (PQ) [15] splits an input vector into several subvectors, and each subvector is quantized independently by its own vector quantizer. The compact code of PQ is the concatenation of the indices produced by all quantizers. This technique is efficient for high-dimensional vectors because the size of each codebook can be considerably reduced compared to standard VQ. However, PQ treats the subvectors as independent and does not consider the relationships between them. To remedy this problem, transform coding (TC) [4] compresses each subvector with a set of scalar quantizers, employing principal component analysis (PCA) and adaptive bit allocation to reduce the information redundancy in each dimension. Optimized product quantization (OPQ) [8] learns an orthogonal projection matrix for the input data to minimize the quantization distortion of the space decomposition. Another quantization framework reconstructs the input vectors with an addition of codewords instead of a Cartesian product, and hence makes no independence assumption on the codebooks. Residual vector quantization (RVQ) recursively applies VQ to the residual of the previous quantization level [2, 6]. Additive quantization (AQ) [1] and composite quantization (CQ) [30] train a set of codebooks simultaneously, where exactly one codeword from each codebook is selected to reconstruct the input.
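To make the product quantization idea concrete, the following is a minimal sketch of PQ encoding and decoding (our own illustration, not the reference implementation of [15]), assuming NumPy and pre-trained sub-codebooks of equal sub-dimension:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode a vector by quantizing each subvector with its own codebook.

    x         : (D,) input vector
    codebooks : list of M arrays, each of shape (K, D/M) -- one per subvector
    returns   : (M,) array of codeword indices
    """
    M = len(codebooks)
    subvectors = np.split(x, M)                       # cut x into M equal parts
    codes = np.empty(M, dtype=np.int64)
    for m, (sub, C) in enumerate(zip(subvectors, codebooks)):
        # nearest codeword of the m-th sub-codebook
        codes[m] = np.argmin(np.sum((C - sub) ** 2, axis=1))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct the vector as the concatenation of the selected codewords."""
    return np.concatenate([C[b] for b, C in zip(codes, codebooks)])
```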

AQ represents each input vector as a sum of codewords, one from each codebook. Since the independence assumption of PQ and OPQ is removed, AQ achieves a considerable improvement over these two state-of-the-art methods. However, AQ relies on a heuristic search strategy called beam search [19, 25] to encode the input vectors. Beam search must exhaustively search over all available codebooks during vector encoding, which causes a high computational complexity that is ill-suited to large-scale datasets. In practical applications, the efficiency of a quantization method affects two sides of the system. On the server side, ANN search methods must compress the data off-line and store the codes before a new query arrives; since a large amount of data is generated every day, a fast encoding strategy saves significant time and allows the servers to compress new data in time. On the client side, the query must be compressed into a compact code for ANN search, so a fast encoding strategy also reduces the retrieval time. Therefore, an efficient encoding algorithm is of significant importance.

The rest of the paper is organized as follows. In Section 2, additive quantization is reviewed. In Section 3, the beam search problem is formulated and the proposed solution is introduced. In Section 4, we show the experimental results. The paper is concluded in Section 5.

2 Additive quantization

We first introduce additive quantization, which is the basis of our work. Figure 1 shows its overall framework, which contains a training phase and a compressing phase. In the training phase, the codebooks and the codes of the training data are updated iteratively in the codebook learning stage and the encoding stage. The learned codebooks are then used to compress the large-scale database in the compressing phase.

Fig. 1 The framework of Additive Quantization

The training phase can be described as follows. Given N input vectors \( \mathcal{X}=\{\boldsymbol{x}_n\}_{n=1}^{N} \), additive quantization represents a vector x as

$$ \boldsymbol{x}\approx \sum_{m=1}^{M} c_m\left(b_m\right),\qquad b_m\in \{1,\cdots,K\} $$
(1)

where \( c_m(b_m) \) denotes the \( b_m \)-th codeword in codebook \( C_m \), K is the number of codewords in each codebook, and M is the number of codebooks. We further denote the set of M codebooks by \( \boldsymbol{C}^{(M)}=\bigcup_{m=1}^{M} C_m \) and the M-tuple code by \( \boldsymbol{b}=[b_1,\cdots,b_M] \). The training phase aims to find codebooks and codewords that represent the training data with minimum approximation error, i.e.,

$$ \underset{\boldsymbol{C}^{(M)},\,\boldsymbol{b}}{\min}\ \sum_{\boldsymbol{x}\in \mathcal{X}} \left\| \boldsymbol{x}-\sum_{m=1}^{M} c_m\left(b_m\right) \right\|^2 $$
(2)

Equation (2) is usually solved by optimizing \( \boldsymbol{b} \) and \( \boldsymbol{C}^{(M)} \) iteratively. In the codebook learning stage, with the codes \( \boldsymbol{b}\in \mathcal{B} \) given (where \( \mathcal{B} \) is the set of codes for \( \mathcal{X} \)), the optimization of \( \boldsymbol{C}^{(M)} \) can be solved as an unconstrained nonlinear optimization problem [1]. In the encoding stage, \( \boldsymbol{C}^{(M)} \) is fixed and the best code \( \boldsymbol{b} \) for each x is updated. However, the best combination of codewords from the codebooks is difficult to find within a finite number of search loops, and the encoding stage usually occupies most of the time of the training phase. The key issue of additive quantization is therefore to find an efficient search strategy for vector encoding. The approximation error of x is defined using the Euclidean distance

$$ E\left(\boldsymbol{b}\right)\triangleq \left\| \boldsymbol{x}-\sum_{m=1}^{M} c_m\left(b_m\right) \right\|^2 $$
(3)

With the input x, the minimization of Eq. (3) is essentially a high-order Markov random field problem in which each \( b_m \) in \( \boldsymbol{b} \) is a discrete random variable, and the state of node \( b_m \) can be updated iteratively to find the minimal error. Many methods can be used to solve this problem, such as alternating optimization [30] and loopy belief propagation [24].
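To make the objective concrete, the sketch below (our own illustration rather than code from [1] or [30]) reconstructs a vector from its M-tuple code, evaluates the error of Eq. (3), and performs one pass of the alternating coordinate-wise update mentioned above, assuming NumPy codebooks of shape (K, D):

```python
import numpy as np

def aq_decode(codes, codebooks):
    """x_hat = sum_m c_m(b_m); codebooks[m] has shape (K, D)."""
    return sum(C[b] for C, b in zip(codebooks, codes))

def approximation_error(x, codes, codebooks):
    """E(b) = ||x - sum_m c_m(b_m)||^2, i.e., Eq. (3)."""
    return float(np.sum((x - aq_decode(codes, codebooks)) ** 2))

def alternating_sweep(x, codes, codebooks):
    """One alternating-optimization pass: update each b_m with the others fixed."""
    codes = list(codes)
    for m, C in enumerate(codebooks):
        # target for codebook m is x minus the contribution of all other codebooks
        partial = x - sum(Cj[bj] for j, (Cj, bj)
                          in enumerate(zip(codebooks, codes)) if j != m)
        codes[m] = int(np.argmin(np.sum((C - partial) ** 2, axis=1)))
    return codes
```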

As discussed in [1], these optimization methods may take a very large amount of time to find the best combination, even for small codebooks. AQ therefore adopts beam search [25], which constructs the output codewords successively: it takes M steps to choose the M codewords instead of updating them alternately. Given an input vector x, the available set of codewords is initialized as \( \boldsymbol{C}^{(M)} \). In the first step, the top T nearest neighbors (T-NN) of x in \( \boldsymbol{C}^{(M)} \) are taken as T candidates. In the next step, for each candidate the available set of codewords is updated as \( \boldsymbol{C}^{(M-1)}=\boldsymbol{C}^{(M)}\setminus C_m \), and the T nearest neighbors of \( \boldsymbol{x}-c_m(b_m) \) are found in \( \boldsymbol{C}^{(M-1)} \), which yields T² tuples of codewords from the previous T candidates. The T unique tuples with the smallest approximation errors (i.e., the squared distance between x and the sum of the selected codewords) are kept as the candidates for the next step. After M steps, T unique tuples of codewords have been selected, and the tuple with the smallest approximation error is returned as the code of x. The procedure is summarized in Algorithm 1 for further discussion. Beam search has to examine all available codebooks, i.e., it searches the remaining M − m + 1 codebooks \( \boldsymbol{C}^{(M-m+1)} \) in the m-th step, so M + (M − 1) + ⋯ + 1 = M(M + 1)/2 codebooks are searched in total and the computational complexity with respect to M is O(M²).
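The following is a simplified sketch of the beam-search encoder described above with width T; it omits the de-duplication of candidate tuples used in Algorithm 1 and is our own illustration, assuming NumPy codebooks of shape (K, D):

```python
import numpy as np

def beam_search_encode(x, codebooks, T=16):
    """Beam search over M codebooks; returns the best M-tuple of codeword indices."""
    M = len(codebooks)
    # each candidate: (partial assignment {codebook index: codeword index}, residual)
    candidates = [({}, x)]
    for _ in range(M):
        expanded = []
        for assign, r in candidates:
            for m, C in enumerate(codebooks):
                if m in assign:                       # codebook already used by this candidate
                    continue
                d = np.sum((C - r) ** 2, axis=1)      # distance of residual to every codeword
                for b in np.argsort(d)[:T]:           # keep the T nearest codewords
                    expanded.append(({**assign, m: int(b)}, r - C[b]))
        # keep the T tuples with the smallest current approximation error
        expanded.sort(key=lambda c: float(np.sum(c[1] ** 2)))
        candidates = expanded[:T]
    best_assign, _ = candidates[0]
    return [best_assign[m] for m in range(M)]
```

Note that every step scans all remaining codebooks for every candidate, which is exactly the O(M²) behavior discussed above.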

After the training phase, beam search is used in the compressing phase to compress the database with the learned codebooks. However, the time consumption is still unacceptable in practical applications, where the database is usually very large. Two shortcomings of AQ can be addressed to improve the efficiency of the compressing phase. First, the codes of the training data are discarded and never used in the compressing phase. Second, no prior knowledge of the input data is exploited to encode the database after the training phase, so the compressing phase of AQ is identical to the encoding stage of the training phase and still has to search the whole set \( \boldsymbol{C}^{(M-m+1)} \) in the m-th step.

In this paper, we propose two modifications to reduce the search range. First, we take advantage of the codes of the training data to compute a codebook selection order, so that in each step we search the nearest neighbors in one specific codebook \( C_m \in \boldsymbol{C}^{(M)} \) instead of all possible codebooks \( \boldsymbol{C}^{(M-m+1)} \), which reduces the computational complexity from O(M²) to O(M). Second, we re-train the codebooks so that the codewords in \( C_m \) tend to be selected in the m-th step, which allows the search width to be narrowed. With such hierarchical codebooks, the performance can be similar to or even better than the beam search in AQ.

3 Fast additive quantization

In this section, we first formulate an optimization problem to analyze beam search, which is the key method in the encoding phase of AQ. Based on this formulation, an optimal codebook selection order is computed. We then learn codebooks with a hierarchical structure, which allows the search width to be reduced in the compressing phase.

3.1 Formulation of beam search

We first formulate beam search as an optimization problem over the codebook selection order, which is called the path in the following discussion. Without loss of generality, we set the beam search width T to 1 to obtain a concise expression. Following Algorithm 1, we define the path \( \boldsymbol{m} \) as the M-tuple \( [m^{(1)},\cdots,m^{(M)}] \), where the i-th component \( m^{(i)}\in \{1,\cdots,M\} \) denotes the codebook selected in the i-th step. We also denote by \( \boldsymbol{r}^{(i)} \) the residual of the vector x after subtracting the i codewords assigned from codebooks \( C_{m^{(1)}},\dots,C_{m^{(i)}} \). In the i-th iteration, all codewords in \( \boldsymbol{C}^{(M-i+1)} \) are searched with the residual \( \boldsymbol{r}^{(i-1)} \) (where \( \boldsymbol{r}^{(0)}=\boldsymbol{x} \)), and the nearest codeword \( b_{m^{(i)}} \) together with the codebook index \( m^{(i)} \) is returned, which can be formulated as

$$ \begin{array}{c} \left[ m^{(i)}, b_{m^{(i)}} \right] = \arg \underset{m}{\min}\left\{ \underset{b}{\min}\, \left\| \boldsymbol{r}^{(i-1)} - c_m(b) \right\|^2 \right\} \\ \text{s.t.}\quad i=1,\cdots,M,\quad b=1,\cdots,K,\quad m\notin \left\{ m^{(j)} \right\}_{j=1}^{i-1} \end{array} $$
(4)

According to our formulation, the beam search in the i-th step can be decomposed into two minimization problems: the minimization with respect to m, which we call the path optimization because it seeks a proper order of codebook selection, and the minimization with respect to the codeword b, which we call the inner optimization because it searches the nearest neighbor of \( \boldsymbol{r}^{(i-1)} \) within a single codebook.

In the i-th step we obtain the i-th codeword \( c_{m^{(i)}}\left(b_{m^{(i)}}\right) \). Then \( \boldsymbol{r}^{(i)} \) is updated with all the codewords obtained so far, \( c_{m^{(1)}}\left(b_{m^{(1)}}\right),\dots,c_{m^{(i)}}\left(b_{m^{(i)}}\right) \), as follows

$$ \boldsymbol{r}^{(i)}=\boldsymbol{x}-\sum_{j=1}^{i} c_{m^{(j)}}\left(b_{m^{(j)}}\right) $$
(5)

After M steps, every component of the best code \( \boldsymbol{b} \) has been obtained. Equation (4) shows that beam search solves the two-variable optimization problem in each step by enumeration, which is very time-consuming. We instead decompose Eq. (4) into two individual optimization problems to approximate the exhaustive search. Specifically, the set of best codes \( \mathcal{B} \) for \( \mathcal{X} \) is available after the training phase, so we can fix the codes and solve the order optimization problem to compute the best path \( \boldsymbol{m} \). For a large-scale database we then only need to solve the inner optimization problem along the best path in the compressing phase, which has a lower computational complexity than the original compressing phase of AQ.

3.2 Optimal order of codebook selecting

After beam search, the best assignments \( \mathcal{B} \) of the training data are obtained together with the current codebooks C. We then compute the best path as prior information using these assignments. With \( \boldsymbol{b}=[b_1,\cdots,b_M] \) fixed, the inner optimization of Eq. (4) disappears, and the problem simplifies to

$$ \begin{array}{c} m^{(i)}=\underset{m}{\mathrm{argmin}}\, \left\| \boldsymbol{r}^{(i-1)}-c_m\left(b_m\right) \right\|^2 \\ \text{s.t.}\quad i=1,\cdots,M,\quad m\notin \left\{ m^{(j)} \right\}_{j=1}^{i-1} \end{array} $$
(6)

This optimization problem can be easily solved by exhaustively searching all codebooks, which takes little time compared to beam search. Equation (6) shows that the best path attains the minimal residual value in each step compared with other paths; when we search the nearest neighbors in the codebooks in this order, the approximation error is exactly the same as the result of beam search. Clearly, a single fixed path is not optimal for all input data, so we treat the residual value as a random variable and define the optimal path as the one with the minimal expected residual value in each step. Specifically, each component \( m_o^{(i)} \) of the optimal path \( \boldsymbol{m}_o \) is defined as

$$ \begin{array}{c} m_o^{(i)}\triangleq \underset{m}{\mathrm{argmin}}\, \mathbb{E}\left( \left\| \boldsymbol{r}^{(i-1)}-c_m\left(b_m\right) \right\|^2 \right) \\ \text{s.t.}\quad i=1,\cdots,M,\quad m\notin \left\{ m_o^{(j)} \right\}_{j=1}^{i-1} \end{array} $$
(7)

The residual \( \boldsymbol{r}^{(i)} \) for each input x is then calculated as

$$ \boldsymbol{r}^{(i)}=\boldsymbol{x}-\sum_{j=1}^{i} c_{m_o^{(j)}}\left(b_{m_o^{(j)}}\right),\qquad i=1,\cdots,M $$
(8)

We compute \( \boldsymbol{m}_o \) by alternately evaluating Eq. (7) and Eq. (8) after the training phase. Compared with any other path, the optimal path has the smallest expected residual value in each step, which is defined as

$$ \bar{R}^{(i)}=\mathbb{E}\left( \left\| \boldsymbol{x}-\sum_{j=1}^{i} c_{m_o^{(j)}}\left(b_{m_o^{(j)}}\right) \right\|^2 \right)=\mathbb{E}\left( \left\| \boldsymbol{r}^{(i)} \right\|^2 \right) $$
(9)

If the vectors in the database follow the same distribution as the training set, \( \boldsymbol{m}_o \) is also the optimal path for the database because of their similar statistical characteristics. The optimal path \( \boldsymbol{m}_o \), which tends to achieve a smaller mean residual value in each step than any other fixed path, also tends to lead to a smaller final approximation error than other paths; this is verified by the experimental results.
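A sketch of the greedy path computation of Eqs. (7) and (8) on the training set is given below; the function name and the array layout of the codes are our own assumptions:

```python
import numpy as np

def optimal_path(X, codes, codebooks):
    """Greedily pick, at each step, the unused codebook whose assigned codewords
    give the smallest mean residual energy (Eqs. 7-8).

    X         : (N, D) training vectors
    codes     : (N, M) integer codes b learned in the training phase
    codebooks : list of M arrays of shape (K, D)
    """
    N, M = codes.shape
    residual = X.copy()
    remaining = set(range(M))
    path = []
    while remaining:
        best_m, best_energy = None, np.inf
        for m in remaining:
            selected = codebooks[m][codes[:, m]]             # (N, D) assigned codewords
            energy = np.mean(np.sum((residual - selected) ** 2, axis=1))
            if energy < best_energy:
                best_m, best_energy = m, energy
        path.append(best_m)                                  # component m_o^(i), Eq. (7)
        residual -= codebooks[best_m][codes[:, best_m]]      # update residuals, Eq. (8)
        remaining.remove(best_m)
    return path
```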

3.3 Hierarchical codebook updating

Given the optimal path \( \boldsymbol{m}_o \), we retrain the codebooks so that they form a hierarchical structure for the input. Specifically, \( C_{m_o^{(i)}} \) is updated to minimize the quantization distortion of \( \boldsymbol{r}^{(i-1)} \). A two-step process is proposed to adjust the codewords in the codebooks to the residuals of the M steps.

3.3.1 Codewords exchanging

Since beam search is used to find the combination of codewords from the codebooks, the path for each input x can be recorded at the same time. To let the codewords in \( C_{m_o^{(i)}} \) reduce the quantization distortion of \( \boldsymbol{r}^{(i-1)} \) as much as possible, we exchange the top K codewords that are selected most frequently in the i-th step into the codebook \( C_{m_o^{(i)}} \). The path for each input then tends to coincide with \( \boldsymbol{m}_o \).
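One possible reading of this exchange step is sketched below: the pooled codewords are re-partitioned so that the codebook used at step i receives the K codewords chosen most frequently at that step. The per-step selection statistics (step_choices) and the handling of codewords that are frequent at several steps are our own assumptions, since the paper does not specify them:

```python
import numpy as np

def exchange_codewords(codebooks, step_choices, path):
    """Re-partition the pooled codewords by per-step selection frequency.

    codebooks    : list of M arrays of shape (K, D)
    step_choices : (N, M) array; step_choices[n, i] is the pooled codeword id
                   (0..M*K-1) chosen for vector n at beam-search step i
    path         : optimal order m_o; step i fills codebook path[i]
    """
    pool = np.vstack(codebooks)                      # (M*K, D) all codewords together
    M, K = len(codebooks), codebooks[0].shape[0]
    taken = set()
    new_codebooks = [None] * M
    for i in range(M):
        # pooled codeword ids sorted by how often they were picked at step i
        ids, counts = np.unique(step_choices[:, i], return_counts=True)
        order = ids[np.argsort(-counts)]
        chosen = [int(c) for c in order if int(c) not in taken][:K]
        # pad with untouched codewords if fewer than K were ever selected here
        leftovers = [c for c in range(len(pool)) if c not in taken and c not in chosen]
        chosen += leftovers[:K - len(chosen)]
        taken.update(chosen)
        new_codebooks[path[i]] = pool[np.array(chosen)]
    return new_codebooks
```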

3.3.2 Codewords updating

After the frequently selected codewords have been exchanged into the proper codebooks, we can quickly re-quantize the input data using the residual quantization strategy [2] along the optimal path \( \boldsymbol{m}_o \). With the code \( \boldsymbol{b} \) computed by the residual quantizer, the codebook \( C_{m_o^{(i)}} \) is optimized as

$$ C^{*}=\underset{C}{\mathrm{argmin}}\, \mathbb{E}\left( \left\| \boldsymbol{r}^{(i-1)}-C\left(b_{m_o^{(i)}}\right) \right\|^2 \right) $$
(10)

Equation (10) is a least-squares problem, the same as the codebook learning step in AQ [1]. Since the code \( b_{m_o^{(i)}} \) may change if the (i − 1)-th codebook \( C_{m_o^{(i-1)}} \) is updated first, we update the codebooks in the inverted order [\( m_o^{(M)} \), \( m_o^{(M-1)} \), …, \( m_o^{(1)} \)].
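For a single codebook with the codes fixed, the least-squares solution of Eq. (10) is simply the mean of the residuals assigned to each codeword; a minimal sketch under that reading is given below (the joint solver of [1] updates all codebooks at once and is more involved):

```python
import numpy as np

def update_codebook(residuals, assignments, K):
    """Least-squares update of one codebook with the codes fixed (Eq. 10).

    residuals   : (N, D) residual vectors r^(i-1)
    assignments : (N,) codeword index b for each residual
    K           : number of codewords in the codebook
    """
    D = residuals.shape[1]
    C = np.zeros((K, D))
    for k in range(K):
        mask = assignments == k
        if np.any(mask):
            C[k] = residuals[mask].mean(axis=0)   # closed-form minimizer per codeword
    return C
```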

3.3.3 Adaptive codebook learning

We propose an adaptive learning strategy to construct the hierarchical codebooks using this two-step process. In the first iteration, 1-width beam search (T = 1) is run over the M codebooks to compute the code and path for each input, and \( \boldsymbol{m}_o \) is computed by Eqs. (7) and (8). Second, the most frequently indexed codewords of the first step are moved into \( C_{m_o^{(1)}} \), and the input x is quantized by the new codebook \( C_{m_o^{(1)}}^{*} \), where \( b_{m_o^{(1)}} \) is the index of the nearest codeword. In each following iteration m = 2, …, M, we repeat the above steps on the residual \( \boldsymbol{r}^{(m-1)} \) with the remaining M − m + 1 codebooks. After the M iterations, the codewords have been exchanged into the proper codebooks and the new codes have been computed. We then solve Eq. (10) to update each codebook. Algorithm 2 summarizes the codebook learning process.

We run the codebook learning step for several loops to train the hierarchical codebooks. The computational complexity of each loop is O(M³KDT), while that of AQ is O(M²KDT). However, since we iteratively train the ordered codebooks, each iteration only needs to find a locally optimal combination of codewords, so we set the beam search width T = 1, which significantly reduces the cost of beam search in the training phase (AQ uses T = 16 for codebook learning). In practice, the number of codebooks M is not large (typically 4 in AQ), so the total cost of codebook learning is much lower in our method than in AQ.

3.4 Fast AQ compression

After we obtain the best path \( \boldsymbol{m}_o \) and train the hierarchical codebooks, we can quantize the input with the fixed order of codebooks and a small search width T. The complexity of encoding a vector is O(MKDT) instead of the O(M²KDT) of the AQ encoding phase. Moreover, since the most frequently indexed codewords have been exchanged into the proper codebooks, the search width T does not need to be very large to find the globally minimal combination of codewords. In our experiments, the performance with T = 4 is similar to that of AQ; compared with AQ, which uses T = 64 in the compressing phase, our method is dozens of times faster. Figure 2 illustrates the framework of our approach, where the highlighted parts show the differences from AQ.
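A sketch of this fast compression step with a fixed codebook order and a small width T is given below (a simplification under our own naming, not the authors' implementation):

```python
import numpy as np

def fast_aq_encode(x, codebooks, path, T=4):
    """Encode x by visiting the codebooks in the fixed optimal order `path`,
    keeping only T candidate residuals per step (complexity O(M K D T))."""
    # each candidate: (partial code {codebook index: codeword index}, residual)
    candidates = [({}, x)]
    for m in path:                                   # one specific codebook per step
        C = codebooks[m]
        expanded = []
        for code, r in candidates:
            d = np.sum((C - r) ** 2, axis=1)
            for b in np.argsort(d)[:T]:
                expanded.append(({**code, m: int(b)}, r - C[b]))
        expanded.sort(key=lambda c: float(np.sum(c[1] ** 2)))
        candidates = expanded[:T]
    best_code, _ = candidates[0]
    return [best_code[m] for m in range(len(codebooks))]
```

Unlike the beam search sketch above, each step touches a single codebook, which is where the O(M²) to O(M) reduction comes from.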

Fig. 2 The framework of Fast Additive Quantization

4 Experimental results

In this section, we present two experiments that validate the effectiveness of FAQ. The first compares the approximation error of the proposed approach with the baseline methods. The second compares nearest neighbor search performance. Since our main purpose is to reduce the computational complexity of AQ, we also evaluate the time consumption of the compressing phases of FAQ and AQ to show the computational improvement when encoding a large testing dataset.

The experimental settings and standard datasets are introduced in Section 4.1 while experimental results are shown in Section 4.2 and Section 4.3 respectively.

4.1 Dataset and baseline

SIFT-1M [7]: this dataset is a collection of 128-dimensional SIFT descriptors [18]. It contains one million base vectors, 100,000 training vectors and 10,000 queries with known true Euclidean nearest neighbors. GIST-1M [7]: GIST descriptors are 960-dimensional global features extracted from images [22]. This dataset consists of one million base vectors, 500,000 training vectors and 1000 queries.

Several methods achieving state-of-the-art performance are used as baselines. The first category is based on the additive quantization framework and includes RVQ [2] and AQ [1]. For RVQ, the k-means method is used to train the codebooks with 100 iterations. For AQ, the width T is set to 16 in the encoding stage of the training phase and 64 in the compressing phase.

The second category is the product quantization framework, which includes PQ [15], TC [4] and OPQ [8]. For PQ, we use the code written by Hervé Jégou [14] and set the number of iterations to 100. For TC, which is a special case of PQ, we perform the adaptive bit allocation and the matrix transform, and then use [14] to train a scalar quantizer for each dimension. For OPQ, we use the code published by Kaiming He [13]. The codebooks are initialized by 100 iterations of PQ, and the non-parametric OPQ is run for 50 iterations to train the final codebooks and rotation matrix.

Since our approach is a modification of AQ, we initialize the codebooks with 20 iterations of RVQ. We then apply fast AQ to encode the training data with the search width set to 1. We find that 10 iterations of the training phase are enough to train the hierarchical codebooks. In the compressing phase, the performance is similar to AQ when the search width T = 4.

We also compare our method with hashing methods. Since most existing quantization methods are unsupervised, we choose two well-known unsupervised hashing methods, LSH [9] and ITQ [10]. Specifically, we use the training data to learn the hashing functions, and then find the nearest neighbors of the queries among the base vectors. Moreover, unlike quantization methods, hashing methods compress data only for search without data reconstruction, so we compare with the hashing methods only on search performance and not on quantization distortion.

All measurements are taken on a machine with an Intel i7-4790 CPU (3.60 GHz, 4 cores).

4.2 Data representation

The main measurement for vector quantization is the approximation error, which reflects the loss of information. Figure 3 shows the performance and the time consumption of our approach with different codebook sizes on the SIFT-1M dataset. Since the codebook size of TC is adaptive, we only compare FAQ with the other four baselines in this experiment. Figure 3a shows that the error of FAQ is nearly the same as that of AQ, and much smaller than PQ and OPQ. In detail, the errors of PQ and OPQ range from 7 × 10⁴ and 6 × 10⁴ down to 3.9 × 10⁴ and 3.6 × 10⁴ respectively, which is about 40 % and 23 % larger than the error of AQ. The performance of RVQ and FAQ is better than the product methods; however, the average performance gap between RVQ and AQ is about 6 %, which cannot be ignored in many cases, whereas the error of FAQ is only about 1 % larger than AQ, which is small enough to be negligible. The comparison of time consumption is shown in Fig. 3b. Since PQ and RVQ can compress batches of vectors at the same time, their speed is much faster than AQ. As the proposed method aims to improve the efficiency of AQ, FAQ is about 30 times faster than AQ at compressing vectors. Therefore FAQ can replace AQ in applications, given the small performance loss.

Fig. 3 Comparisons of approaches on the SIFT-1M dataset with various codebook sizes (M = 4): a approximation error; b time consumption

For further discussion, two other paths are used to show the benefit of the hierarchical structure of the codebooks. The first is a random order, which selects each codebook randomly when searching the nearest neighbor. The second is the largest path, which attains the maximum residual in each step. Figure 4 presents the approximation errors of the three paths, with AQ as the baseline to reveal the performance gaps. The results show that the gap between AQ and FAQ using the optimal path is smaller than 2 %, while the gaps of the other two paths are quite large: the distortion of the largest order is about 20 % to 150 % larger than AQ, and that of the random order 8 % to 65 % larger. This comparison verifies that the codebooks are hierarchical with respect to the different levels of residual, i.e., the best combination tends to coincide with the solution of residual quantization along the optimal path. Moreover, we find that the approximation error is positively correlated with the residual value in each step; the theoretical analysis of their relationship is left as future work.

Fig. 4 Comparisons of different paths on 10,000 SIFT descriptors with various codebook sizes. "FAQ-best" stands for the result under the optimal path, while "FAQ-rand" and "FAQ-largest" correspond to the random path and the largest path respectively

We then discuss the influence of different code lengths (L = M log₂ K), where all four baselines are compared. For the 64- and 128-bit code lengths, Fast Additive Product Quantization (FAPQ), which combines FAQ and OPQ, is compared with Additive Product Quantization (APQ). In more detail, the vector is first decomposed into M₁ = 8 (16) orthogonal components as in OPQ, and each component is encoded by FAQ with M₂ = 4 codebooks. Figure 5 shows that FAQ achieves similar performance and has an advantage in time consumption at the different code lengths. Figure 5a shows that the performance gap between AQ and FAQ is quite small compared to the other baselines. When compressing millions of vectors such as SIFT descriptors, the time consumption of FAQ is several minutes while AQ requires hours. Thus our fast approach is advantageous when the representational power and time consumption are considered together.

Fig. 5 Comparisons of approaches on the SIFT-1M dataset with different code lengths: a approximation error; b time consumption. AQ is used for 32 bits while the hybrid algorithm (APQ) is used for 64 (128) bits, and likewise for our approach (FAPQ)

4.3 Nearest neighbor search

Approximate nearest neighbor (ANN) search is widely used in image processing and computer vision. In this experiment, the datasets are encoded by the compared methods for ANN search, and the true Euclidean nearest neighbors given by the datasets are regarded as the ground truth.

The asymmetric distance computation (ADC) strategy [15] is used by PQ and OPQ to approximate the Euclidean distance. Specifically, the database is encoded off-line first; then the Euclidean distances between the query and each codeword are computed and stored in a table; finally, the approximate distances between the query and the database vectors are computed by table look-up, where the codes index the Euclidean distances of the respective codewords. Different from the ADC strategy, our approach approximates the Euclidean distance between the query and a database vector using the norm of the query together with scalar products, where the scalar products between codewords can be precomputed and stored in tables. The details are given in [1].
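For reference, a minimal sketch of table-based asymmetric distance computation in the PQ setting is given below (the AQ variant additionally uses the precomputed scalar products between codewords, as described in [1]); the function names are our own:

```python
import numpy as np

def adc_tables(query, codebooks):
    """Precompute, for each sub-codebook, the squared distance from the query
    subvector to every codeword."""
    M = len(codebooks)
    subqueries = np.split(query, M)
    return [np.sum((C - q) ** 2, axis=1) for q, C in zip(subqueries, codebooks)]

def adc_distance(codes, tables):
    """Approximate the squared distance to one database vector by table look-up."""
    return float(sum(t[b] for t, b in zip(tables, codes)))
```

In practice the tables are built once per query and reused for every database code, which is what makes the look-up strategy fast.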

Figure 6 shows the Recall@R measure [15] as the search accuracy, a commonly used measure that computes the proportion of queries whose true nearest neighbor is ranked within the top R returned samples. On the SIFT dataset, AQ and FAQ have similar Recall@R and are better than PQ, TC, OPQ and RVQ. On the GIST dataset, the performance of FAQ is similar to or even better than AQ, OPQ and RVQ, which are all much better than PQ and TC. Considering the structure of the two datasets, we attribute the different behavior to the following reasons. SIFT descriptors have a symmetric structure, containing 8 orientation bins within 128 dimensions, and the subvectors tend to be independent when product quantization is applied directly; therefore the performance of PQ and OPQ is similar. For GIST descriptors, which have an asymmetric structure, the independence assumption of PQ causes more problems, so AQ, RVQ, FAQ and OPQ show a large improvement over PQ. Moreover, [15] mentioned that when the input vector is divided into shorter subvectors, the quantization distortion at the same bit budget becomes larger; therefore TC, which splits the vector into scalars, achieves the worst performance. In fact, the performance of TC improves significantly when the code length increases. Finally, the results show that the search performance of quantization is much better than that of the hashing methods, which agrees with the discussion in the introduction.

Fig. 6 Comparison of approaches by recall at the R top-ranked samples on different datasets: a SIFT-1M dataset; b GIST-1M dataset

5 Conclusion

In this paper we presented the fast additive quantization (FAQ) approach, which trains hierarchical codebooks via the optimal path to reduce the search range. To find the best search order, we first formulate beam search as an order optimization problem over codebook selection, which contains a path optimization and an inner optimization problem. We then update the codebooks to minimize the quantization distortion of the residual at each quantization level. Once the codebooks have a hierarchical structure, the search width can be significantly reduced. In the compressing phase, we search the nearest codewords in the optimal order of codebooks with a small width. The experimental results show that our method matches the vector quantization performance of AQ while the time consumption of the vector encoding phase is considerably reduced.

In future work, we will theoretically analyze the correlation between the approximation error and the expectation of the residual values in each step.