Approximated Scale Space for Efficient and Accurate SIFT Key-Point Detection

Wang, Ying; Liu, Yiguang; Xu, Zhenyu; Zheng, Yunan; Hong, Weijie

doi:10.1007/978-3-030-30825-4_3

Ying Wang³⁸,
Yiguang Liu³⁸,
Zhenyu Xu³⁸,
Yunan Zheng³⁸ &
…
Weijie Hong³⁸

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 606))

Included in the following conference series:

International Conference on Sensing and Imaging

269 Accesses

Abstract

The SIFT (scale invariant feature transform) key-point serves as an indispensable role in many computer vision applications. This paper presents an approximation of the SIFT scale space for key-point detection with high efficiency while preserving the accuracy. We build the scale space by repeated averaging filters to approximate the Gaussian filters used in SIFT algorithm. The accuracy of the proposed method is guaranteed by that an image undergoes repeated smoothing with an averaging filter is approximately equivalent to the smoothing with a specified Gaussian filter, which can be proved by the center limit theorem. The efficiency is improved by using integral image to fast compute the averaging filtering. In addition, we also present a method to filter out unstable key-points on the edges. Experimental results demonstrate the proposed method can generate high repeatable key-points quite close to the SIFT with only about one tenth of computational complexity of SIFT, and concurrently the proposed method does outperform many other methods.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Comparison of SIFT, Bi-SIFT, and Tri-SIFT and their frequency spectrum analysis

Article 14 August 2017

An Analysis of the Factors Affecting Keypoint Stability in Scale-Space

Article 30 April 2016

A Scale Invariant Keypoint Detector Based on Visual and Geometrical Cues

Keywords

1 Introduction

The SIFT [1] key-point plays an important role in computer vision and pattern recognition tasks such as structure from motion, object matching, object recognition, and texture analysis [2,3,4,5,6]. Though many other key-points have been proposed [7,8,9,10,11], the SIFT key-point is still favored by many applications especially for those with high accuracy needed [12]. However, the SIFT key-point has also been criticized for the drawback of its heavy computational burden, thus many variational methods have been proposed to accelerate the computational speed, but at the cost of insufficient accuracy or degraded repeatability of the key-points. To efficiently compute the SIFT key-point while preserving the accuracy and repeatability is the goal of this paper. The SIFT key-point is obtained by the DOG blob detector which is an approximation of LOG detector. There are also some other blob detectors such as [13,14,15], but their performance is inferior to SIFT in terms of accuracy and repeatability, as the scale space theory[16, 17] proved that the Gaussian kernel is the only smoothing kernel that could be used to create the image scale space. Based on the center limit theorem, repeated averaging filtering can be used to approximate Gaussian filtering. We propose a bank of averaging filters that accurately approximate the Gaussian filter in the SIFT algorithm, and the averaging filters are implemented via integral images. Because integral image is computed in our method, a combined method for eliminating key-points on the edge is also presented by reusing the integral images, which is more efficient than the original one in SIFT algorithm.

2 A Brief Review of SIFT Scale Space Construction and Key-Point Detection

The SIFT scale space consists of two pyramids, the Gaussian pyramid and the DOG(difference of Gaussian) pyramid. The Gaussian pyramid consists of M octaves and each octave consists of N layers. Suppose L(a, b) denotes the ath layer in octave b, it is produced by the convolution of a Gaussian function with the input image I(x, y)

$$\displaystyle \begin{aligned} L(a,b)=G(x,y,\sigma)*I(x,y) \end{aligned} $$

(1)

where ∗ stands for the convolution operation and

$$\displaystyle \begin{aligned} G(x,y,\sigma)=\frac{1}{2\pi\sigma^2}e^{-(x^2+y^2)/2\sigma^2} \end{aligned} $$

(2)

where σ is determined by a, b and the scale of the first layer σ ₀

$$\displaystyle \begin{aligned} \sigma=2^{a+\frac{b}{N-3}}\cdot\sigma_{0} \end{aligned} $$

(3)

Since the computation complexity of Eq. (1) is proportion to σ ², in practical implementation, smaller σ can be used due to the property of the Gaussian convolution that the convolution of two Gaussian functions is also a Gaussian with variance being the sum of the original variances. That is,

$$\displaystyle \begin{aligned} I(x,y)*G(x,y,\sigma_{1})*G(x,y,\sigma_{2})=I(x,y)*G(x,y,\sigma_{3}) \end{aligned} $$

(4)

where $\sigma _{1}^{2}+\sigma _{2}^{2}=\sigma _{3}^2$. So, if b ≠ 0, L(a, b) is computed as

$$\displaystyle \begin{aligned} L(a,b)=G(x,y,\sigma_{\text{gap}})*L(a,b-1) \end{aligned} $$

(5)

where

$$\displaystyle \begin{aligned} \sigma_{\text{gap}}=\sqrt{\sigma(0,b)^{2}-\sigma(0,b-1)^{2}} \end{aligned} $$

(6)

It can be seen from Eq. (6) that each octave shares the same group of σ to generate a successive layer, this is because when b = 0 and a ≠ 0, the L(a, b) is obtained by down sampling the layer L(a − 1, b + N − 2). Once the Gaussian pyramid is computed, the difference of Gaussian pyramid is obtained via the subtraction of two successive layers in the Gaussian pyramid. Figure 1 shows the SIFT scale space construction processes. The SIFT key-point is detected by finding the extrema in the DOG pyramid, after that interpolation is performed to get sub-pixel precision if the key-point is not on the edge.

3 Approximation of Gaussian Smoothing Using Repeated Averaging Filter

In this section, we introduce the quantitative relationship between repeated averaging filtering and Gaussian filtering for a given image. The coefficients of an averaging filter can be viewed as the probability density of a random variable X which is uniformly distributed in a square area centered at the origin. Given an image I(x, y) and an averaging filter A(x, y, w), where w is the window width of the averaging filter, repeated averaging filtering of the image can be mathematically expressed as:

$$\displaystyle \begin{aligned} I(x,y,n)=I(x,y)*\underbrace{A(x,y,w)*A(x,y,w)\ldots A(x,y,w)}_{n} \end{aligned} $$

(7)

Based on the center limit theorem, when n approaches to infinity,

$$\displaystyle \begin{aligned} \begin{aligned} \lim\limits_{n \to \infty }{I(x,y,n)} &=I(x,y)*\lim\limits_{n \to \infty }\underbrace{A(x,y,w)*A(x,y,w)\ldots A(x,y,w)}_{n} \\&=I(x,y)*G(x,y,\sigma_{\text{gau}}) \end{aligned} \end{aligned} $$

(8)

where G(x, y, σ _gau) is the Gaussian function with the variance $\sigma _{\text{gau}}^2$. Suppose the window width of an averaging filter is w, the variance of its corresponding discrete random variable is

$$\displaystyle \begin{aligned} \sigma_{\text{av}}^2=\frac{w^2-1}{12} \end{aligned} $$

(9)

the relation of σ _av, σ _gau, w and n can be deduced:

$$\displaystyle \begin{aligned} \sigma_{\text{gau}}=\sqrt{n\sigma_{\text{av}}^2}=\sqrt{\frac{nw^2-n}{12}} \end{aligned} $$

(10)

For discrete averaging filter and discrete Gaussian filter, the two sides of the first equal sign of Eq. (10) are very close to each other when n ≮ 3, which indicates no less than 3 times repeated averaging filtering can well approximate a specified Gaussian filter. Figure 2 shows the two curves are close to each other.

4 Scale Space Construction via Repeated Averaging Filtering

To create the scale space introduced in Sect. 1, the key problem is to seek the initial scale σ ₀, a group of Gaussian filters used to construct next layer from current layer and a group of averaging filters to approximate these Gaussian filters. Given a Gaussian kernel, it is not hard to find an optimal averaging filter A(x, y, w) and the times n needed to approximate the Gaussian kernel based on Eq. (10). However, there are more difficulties to get the optimal averaging filter and filtering times in real applications. Here we mainly focus on finding optimal averaging filters to construct the scale space introduced in Sect. 1.

The constraint of Eq. (10) is that n is a positive integer not smaller than 3 and not bigger than 10, because if n < 3 it is not enough to approximate the Gaussian kernel and if n > 10 the computational complexity will have no advantage over the original Gaussian filtering. The variable w should be a positive odd number not smaller than 3, in order to satisfy that there could always be a center pixel the filtering result can be assigned to. Suppose the initial scale of the Gaussian pyramid is σ ₀, as most key-points lie in the first several layer of the scale space, thus σ ₀ should not be large. According to the original SIFT algorithm where σ ₀ = 1.6, here we set σ ₀ < 2.0. The number of layers N in each octave depends on the sampling frequency F, and F = N − 3. In [1] the sampling frequency is set to 2, 3, or 4, which gains an acceptable balance between key-point repeatability and computational time. Constraints to σ _i under SIFT scale space is that $\sigma _{i} = \sigma _{0}\sqrt {2^{\frac {2(i+1)}{F}}-2^{\frac {2i}{F}}}$. Besides, $\sigma _{i} = \sqrt {\frac {n_{i}w_{i}^2-n_{i}}{12}}$ is needed for discrete averaging filters and discrete Gaussian filters. These constraints can be formulated as:

$$\displaystyle \begin{aligned} \sigma_{i} = \sigma_{0}\sqrt{2^{\frac{2(i+1)}{F}}-2^{\frac{2i}{F}}} = \sqrt{\frac{n_{i}w_{i}^2-n_{i}}{12}} \end{aligned} $$

(11)

where i = 1, 2, 3, …, F + 2, σ ₀ < 2, n _i ∈{3, 4, 5, …, 10}, w _i ∈{3, 5, 7, …}. It can be verified that the only solution to Eq. (11) is

$$\displaystyle \begin{aligned} \left\{ \begin{aligned} \sigma_{0} & = \sqrt{2}, \sigma_{1}=2, \sigma_{2}=2\sqrt{2}, \sigma_{3}=4, \sigma_{4}=4\sqrt{2} \\ n_{1}&=3, n_{2}=6, n_{3}=4, n_{4}=4\\ w_{1}&=3, w_{2}=3, w_{3}=5, w_{4}=7\\ F&=2 \end{aligned} \right. \end{aligned} $$

(12)

The approximated SIFT scale space is constructed in the same procedure as it is in the original SIFT algorithm, the Gaussian filters are replaced by averaging filters, and their quantitative relation is given in Eq. (12). Note that the key-point detector in SIFT is actually the DOG detector obtained from subtraction of two Gaussian kernels, Fig. 3 shows the approximated DOG and DOG curves in one octave of the DOG pyramid.

5 Key-Point Detection

The key-points are detected by searching the local extrema in a 3 × 3 region of the approximated DOG pyramid. However, the DOG detector is sensitive to image edges, and the key-point on the edge should be removed since it is unstable. As integral image is computed in our method (depicted in Fig. 4a), we propose to use fast Hessian measure to reject key-points on the edge. The principal curvature is used to determine whether a point is on the edge, which is computed by the eigenvalues of scale adapted Hessian matrix:

$$\displaystyle \begin{aligned} H(X,\sigma)=\left[\begin{array}{cc} L_{xx}(X,\sigma), &\;\; L_{xy}(X,\sigma) \\ L_{xy}(X,\sigma), &\;\; L_{yy}(X,\sigma) \end{array}\right], \end{aligned} $$

(13)

The partial derivatives in Eq. (13) are computed by integral image and a group of box filters as illustrated in Fig. 4. The images of before and after filtering out the key-point on the edge are shown in Fig. 5. After filtering out the key-points on edge, the scale space quadratic interpolation is performed for every key-point to get a sub-pixel accurate location (Fig. 5).

6 Experimental Results

The accuracy of our method is validated by the performance comparison of the approximated SIFT detector, SIFT detector, SURF detector, and FAST detector. The SIFT code implemented by Rob Hess [18] is used in our experiment. For the SURF algorithm, we use the original implementation released by its authors. The FAST detector in our experiment is implemented in OpenCV 3.2.0. The datasets proposed by Mikolajczyk and Schmid [19] are used for the evaluation. There are totally 8 datasets, and each dataset has six images with increasing amount of deformation from a reference image. The deformations, covering zoom and rotation (Boat and Bark sequences), view-point change (Wall and Graffiti sequences), brightness changes (Leuven sequence), blur (Trees and Bikes sequence) as well as JPEG compression (UBC sequence), are provided with known ground truth which can be used to identify the correspondence. The repeatability score introduced in [19] is used to measure the reliability of the detectors on detecting the same feature point under different deformations of the same scene. The repeatability score is defined as the ratio between the number of corresponding features and the smaller number of features in one image in the image pair. To give a fair comparison, the thresholds of these detectors are adjusted to give approximately equal number of feature points as SURF detector detected. The results in Fig. 6 show that the SIFT detector has the best performance in most cases, and the proposed method is close to SIFT and better than SURF and FAST detector.

The efficiency of the proposed method is also compared to SIFT, SURF, and FAST. The times of detection of the key-points in the first image of the wall sequence is shown in Table 1, where we can see that the FAST detector is most efficient among all methods, but it merely detects key-points in one scale, while others detect them in more than 10 scales. The proposed method is about 3 times faster than SURF and 10 times faster than SIFT. Real-world image matching using the approximated SIFT detector with SURF descriptor is shown in Fig. 7.

Table 1 Timing results of SIFT, SURF, FAST and our method for key-point detection on the first image of the Wall sequence (size: 1000 × 700 pixels)

Full size table

7 Conclusion

In this paper we have proposed a framework to efficiently and accurately approximate the SIFT scale space for key-point detection. The quantitative relation between repeated averaging filtering and Gaussian filtering has been analyzed, and a group of averaging filters has been found to accurately approximate the DOG detector. Experimental results demonstrate that the proposed method is about 10 times faster than SIFT detector, while preserving the high accuracy and high repeatability of the SIFT detector.

References

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Article Google Scholar
St-Charles, P. L., Bilodeau, G. A., & Bergevin, R. (2016). Fast image gradients using binary feature convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1–9).
Google Scholar
Loncomilla, P., Ruiz-del-Solar, J., & Martnez, L. (2016). Object recognition using local invariant features for robotic applications: A survey. Pattern Recognition, 60, 499–514.
Article Google Scholar
Ma, W., Wen, Z., Wu, Y., Jiao, L., Gong, M., Zheng, Y., et al. (2016). Remote sensing image registration with modified SIFT and enhanced feature matching. IEEE Geoscience and Remote Sensing Letters, 14(1), 3–7.
Article ADS Google Scholar
Yang, C., Wanyu, L., Yanli, Z., & Hong, L. (2016). The research of video tracking based on improved SIFT algorithm. In 2016 IEEE International Conference on Mechatronics and Automation (ICMA) (pp. 1703–1707). Piscataway, NJ: IEEE.
Chapter Google Scholar
Stcharles, P. L., Bilodeau, G. A., & Bergevin, R. (2016). Fast image gradients using binary feature convolutions. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 1074–1082). Piscataway, NJ: IEEE.
Google Scholar
Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.
Article Google Scholar
Calonder, M., Lepetit, V., Strecha, C., & Fua, P. (2010). Brief: Binary robust independent elementary features. In European Conference on Computer Vision (pp. 778–792). Berlin: Springer.
Google Scholar
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. R. (2011). ORB: An efficient alternative to SIFT or SURF. In 2011 IEEE International Conference on Computer Vision (ICCV) (pp. 2564–2571). Piscataway, NJ: IEEE.
Chapter Google Scholar
Leutenegger S, Chli M, Siegwart R. Y. (2011). BRISK: Binary robust invariant scalable keypoints. In 2011 IEEE International Conference on Computer Vision (ICCV) (pp. 2548–2555). Piscataway, NJ: IEEE.
Chapter Google Scholar
Alahi, A., Ortiz, R., & Vandergheynst, P. (2012). Freak: Fast retina keypoint. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 510–517). Piscataway, NJ: IEEE.
Chapter Google Scholar
Mitsugami, I. (2013). Bundler: Structure from motion for unordered image collections. Journal of the Institute of Image Information & Television Engineers, 65(4), 479–482.
Article Google Scholar
Agrawal, M., Konolige, K., & Blas, M. R. (2008). CenSurE: Center surround extremas for realtime feature detection and matching. In European Conference on Computer Vision (ECCV) (pp. 102–115). Berlin: Springer.
Google Scholar
Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10), 761–767.
Article Google Scholar
Grabner, M., Grabner, H., & Bischof, H. (2006). Fast approximated SIFT. In Asian Conference on Computer Vision (pp. 918–927). Berlin: Springer.
Google Scholar
Crowley, J. L., & Parker, A. C. (1984). A representation for shape based on peaks and ridges in the difference of low-pass transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(2), 156–170.
Article Google Scholar
Lindeberg, T. (1994). Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21(2), 225–270, 21(2):224–270.
Google Scholar
Hess, R. (2010). An open-source SIFTLibrary. In Proceedings of the Inter National Conference on Multimedia (pp. 1493–1496). New York, NY: ACM.
Chapter Google Scholar
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., et al. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1–2), 43–72.
Article Google Scholar

Download references

Acknowledgements

This work is supported by NSFC under Grant 61860206007 and 61571313, by National Key Research and Development Program under Grant 2016YFB0800600 and 2016YFB0801100 and by funding from Sichuan Province under Grant 18GJHZ0138, and by joint funding from Sichuan University and Lu-Zhou city under 2016CDLZ-G02-SCU.

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, China
Ying Wang, Yiguang Liu, Zhenyu Xu, Yunan Zheng & Weijie Hong

Authors

Ying Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yiguang Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zhenyu Xu
View author publications
You can also search for this author in PubMed Google Scholar
Yunan Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Weijie Hong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yiguang Liu .

Editor information

Editors and Affiliations

Department of Mathematics, Tufts University, Medford, MA, USA
Eric Todd Quinto
Department of Electrical & Computer Engineering, The University of Akron, Akron, OH, USA
Nathan Ida
Department of Information Science, School of Mathematical Sciences, Peking University, Beijing, China
Ming Jiang
Mathematik, Univ. des Saarlandes, Saarbrücken, Saarland, Germany
Alfred K. Louis

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, Y., Liu, Y., Xu, Z., Zheng, Y., Hong, W. (2019). Approximated Scale Space for Efficient and Accurate SIFT Key-Point Detection. In: Quinto, E., Ida, N., Jiang, M., Louis, A. (eds) The Proceedings of the International Conference on Sensing and Imaging, 2018. ICSI 2018. Lecture Notes in Electrical Engineering, vol 606. Springer, Cham. https://doi.org/10.1007/978-3-030-30825-4_3

Download citation

DOI: https://doi.org/10.1007/978-3-030-30825-4_3
Published: 18 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30824-7
Online ISBN: 978-3-030-30825-4
eBook Packages: Physics and AstronomyPhysics and Astronomy (R0)

Publish with us

Policies and ethics

Approximated Scale Space for Efficient and Accurate SIFT Key-Point Detection

Abstract

Similar content being viewed by others

Comparison of SIFT, Bi-SIFT, and Tri-SIFT and their frequency spectrum analysis

An Analysis of the Factors Affecting Keypoint Stability in Scale-Space

A Scale Invariant Keypoint Detector Based on Visual and Geometrical Cues

Keywords

1 Introduction

2 A Brief Review of SIFT Scale Space Construction and Key-Point Detection

3 Approximation of Gaussian Smoothing Using Repeated Averaging Filter

4 Scale Space Construction via Repeated Averaging Filtering

5 Key-Point Detection

6 Experimental Results

7 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Approximated Scale Space for Efficient and Accurate SIFT Key-Point Detection

Abstract

Similar content being viewed by others

Comparison of SIFT, Bi-SIFT, and Tri-SIFT and their frequency spectrum analysis

An Analysis of the Factors Affecting Keypoint Stability in Scale-Space

A Scale Invariant Keypoint Detector Based on Visual and Geometrical Cues

Keywords

1 Introduction

2 A Brief Review of SIFT Scale Space Construction and Key-Point Detection

3 Approximation of Gaussian Smoothing Using Repeated Averaging Filter

4 Scale Space Construction via Repeated Averaging Filtering

5 Key-Point Detection

6 Experimental Results

7 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation