1 Introduction

Corners have been shown to be well suited for a variety of image processing and computer vision tasks such as object tracking, stereo matching, and 3D reconstruction. Various corner detection methods have been reported in the literature. The existing corner detection methods can be broadly classified into three categories: contour-based methods (Rattarangsi and Chin 1990; Teh and Chin 1989; Mokhtarian and Suomela 1998; Zhong and Liao 2007; Zhang et al. 2014; Zhang and Shui 2015; Olson 2000; Zhang et al. 2015, 2019), template-based methods (Deriche and Giraudon 1993; Smith and Brady 1997; Rosten et al. 2010; Shui and Zhang 2013; Xia et al. 2014), and intensity-based methods (Moravec 1979; Harris and Stephens 1988; Noble 1988; Gårding and Lindeberg 1996; Lindeberg 1998; Schmid et al. 2000; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Laptev 2005; Lowe 2004; Bay et al. 2006; Marimon et al. 2010; Maver 2010; Su et al. 2012; Verdie et al. 2015; Yi et al. 2016; Lenc and Vedaldi 2016; Zhang et al. 2017). Contour-based methods detect corners by analyzing the shape changes of the edge contours extracted from an input image by an edge detector. These methods therefore depend on a preceding edge detection step, which limits their applicability.

Template-based methods find corners by fitting small image patches to predefined corner templates. Deriche and Giraudon (1993) analyzed the behaviors of wedge and Y-type corners by using the Gaussian filter. In Smith and Brady (1997), every pixel inside a circular mask is compared with the center pixel and the intensity difference is recorded; corners are defined as the smallest univalue segment assimilating nucleus (SUSAN) points. In Ruzon and Tomasi (2001), junctions are defined as points in an image where two or more piecewise constant wedges meet at a central point. Shui and Zhang (2013) applied the anisotropic Gaussian directional derivative filters (Shui and Zhang 2012) to derive the representations of L-type, Y-type, X-type, and star-type corners and detect corners from edge pixels. Xia et al. (2014) presented a junction detector based on the intensity variations of edge pixels. Pham et al. (2014) presented a junction detection method in which junctions are obtained by searching for optimal meeting points of median lines in line-drawing images. In recent years, machine learning algorithms have been used in template-based corner detection methods. Trujillo and Olague (2006) used a genetic programming based learning approach to extract corners from input images. Rosten et al. (2010) extended the SUSAN detector (Smith and Brady 1997) and presented the features from accelerated segment test (FAST) detector.

Intensity-based methods detect corners directly from an input image by analyzing the information on local intensity variations. Following Moravec's observation (Moravec 1979) that the intensity variations of corners are large in all directions, Harris and Stephens (1988) developed the famous Harris detector. The isotropic Gaussian filter is used to smooth the input image, and the first-order image derivatives along the horizontal and vertical directions are used to construct a \(2\times 2\) structure tensor for corner detection. The aim of the Harris detector is to find corners which have significant changes of image intensities in both directions. The Harris detector is one of the most successful detectors and has been widely used. However, it is a single-scale detector which may miss significant corners or detect false corners (Lee et al. 1995), because most objects contain features over a wide range of scales. Meanwhile, it is indicated in Bay et al. (2006) that the most valuable property of a corner detector is its repeatability under affine image transformations. A large number of detectors (Gårding and Lindeberg 1996; Lindeberg 1998; Schmid et al. 2000; Mikolajczyk and Schmid 2004; Laptev 2005; Lowe 2004; Bay et al. 2006; Marimon et al. 2010) have been presented to enhance the repeatability performance of corner detectors in a scale-space representation (Witkin 1984; Koenderink 1984).

Lindeberg (1998) presented a corner detection method with automatic scale selection. Mikolajczyk and Schmid (2004) presented the scale-invariant Harris–Laplace detector, where corners are detected by the Harris detector at multiple scales and the Laplace operator is used to select the corners' characteristic scales. Lowe (2004) approximated the normalized Laplacian of Gaussian filter by a difference of Gaussian (DoG) filter and presented the scale-invariant feature transform (SIFT) detector. Bay et al. (2006) proposed the speeded up robust features (SURF) detector which uses box filters to approximate the determinant of a Hessian matrix and extracts feature points. Brox et al. (2006) applied anisotropic nonlinear diffusion to construct a nonlinear structure tensor for detecting corners. Lepetit and Fua (2006) used the Laplacian of Gaussian filter with multiple scales to smooth the input image and the decision tree technique (Quinlan 1986) to extract corners. Alcantarilla et al. (2012) presented the KAZE operator which detects interest points in a nonlinear scale space by using additive operator splitting techniques (Weickert et al. 1998) to approximate the Perona and Malik diffusion equation (Perona and Malik 1990). Miao and Jiang (2013) employed the rank order Laplacian of Gaussian to smooth the input image and construct a \(2\times 2\) Hessian matrix for detecting corners. Duval-Poo et al. (2015) replaced the Log-Gabor wavelet smoothing (Gao et al. 2007) with multi-scale shearlet filters and constructed a nonlinear \(2\times 2\) structure tensor for detecting corners. Verdie et al. (2015) presented a temporally invariant learned detector (TILDE) which is learned from images of the same scene under drastic illumination changes. In Yi et al. (2016), the SIFT method (Lowe 2004) was used to extract interest points from the input images and train an interest point detector. In Lenc and Vedaldi (2016), a local covariant constraint was used to train a feature detector. The approach of Lenc and Vedaldi (2016) was also extended by using TILDE (Verdie et al. 2015) as guidance (Zhang et al. 2017). DeTone et al. (2018) presented a self-supervised framework for training an interest point detector.

Our research indicates that the intensity variations of a corner are not significant in all directions. Our research also shows that no one has explained why the intensity variation based methods (Harris and Stephens 1988; Gårding and Lindeberg 1996; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Gao et al. 2007; Duval-Poo et al. 2015) which used the first-order derivatives along the horizontal and vertical directions to construct the \(2\times 2\) structure tensor cannot detect corners well. Up to now, a large number of filters [e.g., Zhang et al. (2014), log-Gabor filters (Field 1987), shearlet filters (Duval-Poo et al. 2015), anisotropic nonlinear diffusion filters (Brox et al. 2006), and anisotropic Gaussian filters (Shui and Zhang 2012)] have been used to smooth the input image and extract intensity variations. However, within the scope of our investigations, no one has presented methods on how to accurately extract the local intensity variations to depict the differences between edges and corners.

In this paper, the properties of the isotropic and anisotropic Gaussian directional derivative representations (Shui and Zhang 2013) of a step edge and several general corners (such as L-type, Y- or T-type, X-type, and star-type corners) are investigated to explain why the existing \(2\times 2\) structure tensor based algorithms (Noble 1988; Gårding and Lindeberg 1996; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Gao et al. 2007; Duval-Poo et al. 2015) cannot detect corners well. The properties indicate that the first-order derivatives along the horizontal and vertical directions cannot depict the differences between edges and corners well. In fact, the intensity variation around a corner is not large in all directions. All the existing \(2\times 2\) structure tensor based algorithms (Noble 1988; Gårding and Lindeberg 1996; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Gao et al. 2007; Duval-Poo et al. 2015) are based on Moravec's theory (Moravec 1979) that the intensity variation around a corner is large in all directions, which results in false corner detections: some corners may be detected as edges, while some edge pixels may be judged as corners. Furthermore, a corner may be detected using the two orthogonal directional derivatives; however, if the image is rotated by a certain angle, the horizontal and vertical directional derivatives at the corner may become small and the corner may then be missed.

We present a new technique to obtain the local intensity variations from the input image. We prove that the new intensity variation extraction technique can accurately depict the intensity variation differences between edges and corners in the continuous domain. The properties of the intensity variations of step edges and corners and the new intensity variation extraction technique enable us to derive a new multi-directional structure tensor with multiple scales, which can depict the differences between edges and corners well in the discrete domain. The eigenvalues of the multi-directional structure tensor with multiple scales are used in our new corner detection method. The proposed corner detector is compared with ten state-of-the-art feature detectors (Harris (Harris and Stephens 1988), Harris–Laplace (Mikolajczyk and Schmid 2004), FAST (Rosten et al. 2010), DoG (Lowe 2004), SURF (Bay et al. 2006), KAZE (Alcantarilla et al. 2012), ANDD (Shui and Zhang 2013), ACJ (Xia et al. 2014), LIFT (Yi et al. 2016), and Superpoint (DeTone et al. 2018)). Thirty images with various scenes and without ground truth are used to evaluate the detectors' average repeatability under affine transformation, JPEG compression, and noise degradation. The Oxford dataset is used to assess the performance of the detectors on region repeatability (Mikolajczyk et al. 2005). The DTU-Robots dataset (Aanæs et al. 2012) is used to assess the performance of the detectors on the repeatability metric. Two test images with ground truths are used to assess the detection accuracy and localization accuracy of these methods. The experimental results show that the proposed method outperforms the other tested detectors (Harris and Stephens 1988; Mikolajczyk and Schmid 2004; Rosten et al. 2010; Lowe 2004; Bay et al. 2006; Alcantarilla et al. 2012; Shui and Zhang 2013; Xia et al. 2014; Yi et al. 2016; DeTone et al. 2018).

The rest of the paper is organized as follows. In Sect. 2, the Harris detector and the representations of a step edge, L-type corner, Y- or T-type corner, X-type corner, and star-type corner are introduced. In Sect. 3, the weakness of the existing structure tensor based corner detection techniques is identified, several properties of edges and corners are summarized, and a new corner detection algorithm based on a multi-directional structure tensor with multiple scales together with a new intensity variation extraction technique is presented. Extensive experimental results are presented in Sect. 4, and conclusions are given in Sect. 5.

2 Related Work

In this section, the standard Harris detection algorithm is introduced first. Then, the isotropic and anisotropic Gaussian directional derivative representations of a step edge and several general corner models are presented.

2.1 Harris Corner Detector

The Harris corner detector employs a \(2\times 2\) structure tensor to measure the local intensity variations of the input image along the horizontal and vertical directions. For a given 2D input image \(I(x,y)\), the weighted sum of squared differences \(\mathfrak {I}(m_{x},m_{y})\) is defined as

$$\begin{aligned} \begin{aligned} \mathfrak {I}(m_{x},m_{y})&=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }h_{\sigma }(x,y)\\&\quad \bigg (I(x+m_{x},y+m_{y})-I(x,y)\bigg )^{2}\text {d}x\text {d}y, \end{aligned} \end{aligned}$$
(1)

where \(h_{\sigma }(x,y)\) is an isotropic Gaussian filter, \(\sigma \) is the scale factor (\(\sigma >0\)), \((x,y)\) is a point location in the image, and \((m_{x},m_{y})\) is a local shift. The shifted image patch \(I(x+m_{x},y+m_{y})\) is approximated by a Taylor expansion truncated to the first order terms

$$\begin{aligned} I(x{+}m_{x},y+m_{y})\approx I(x,y){+}m_{x}I_{x}(x,y){+}m_{y}I_{y}(x,y), \end{aligned}$$
(2)

where \(I_{x}(x,y)\) and \(I_{y}(x,y)\) denote the partial derivatives of the input image \(I \) with respect to the horizontal and vertical directions. Substituting the approximation in Eq. (2) into Eq. (1) yields

$$\begin{aligned} \begin{aligned} \mathfrak {I}(m_{x},m_{y})\approx&\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }h_{\sigma }(x,y)\\&\bigg (m_{x}I_{x}(x,y)+m_{y}I_{y}(x,y)\bigg )^{2}\text {d}x\text {d}y\\ =&\,(m_{x}~m_{y})A{ m_{x} \atopwithdelims ()m_{y}}, \end{aligned} \end{aligned}$$
(3)

where A is the structure tensor

$$\begin{aligned} \begin{aligned} A&= \int _{-\infty }^{\infty }\int _{-\infty }^{\infty }h_{\sigma }(x,y)\\&\quad \left[ \begin{array}{cc} I_{x}^2(x,y)&{}I_{x}(x,y)I_{y}(x,y)\\ I_{x}(x,y)I_{y}(x,y)&{}I_{y}^2(x,y)\\ \end{array} \right] \text {d}x\text {d}y.\\ \end{aligned} \end{aligned}$$
(4)

Typically, a corner is characterized by a large variation of \(\mathfrak {I}\) in all directions at \((x,y)\). Let \(\lambda _1\) and \(\lambda _2\) (\(\lambda _1< \lambda _2\)) be the eigenvalues of structure tensor A. There are three cases to be considered. (1) If both \(\lambda _1\) and \(\lambda _2\) are small, then there is no feature at pixel \((x,y)\). (2) If \(\lambda _1\approx 0\) and \(\lambda _2\) is a large positive value, then an edge is found. (3) If both \(\lambda _1\) and \(\lambda _2\) are large positive values, then a corner is found.
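The following minimal sketch illustrates this eigenvalue analysis; it is not the authors' implementation, and the Sobel derivative operator, the Gaussian scale, and the use of scipy.ndimage are our own illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_eigenvalues(image, sigma=1.5):
    """Per-pixel eigenvalues (lambda1 <= lambda2) of the 2x2 structure tensor A of Eq. (4)."""
    I = image.astype(float)
    Ix = sobel(I, axis=1)                      # derivative along the horizontal direction
    Iy = sobel(I, axis=0)                      # derivative along the vertical direction
    Axx = gaussian_filter(Ix * Ix, sigma)      # Gaussian-weighted second-moment entries
    Axy = gaussian_filter(Ix * Iy, sigma)
    Ayy = gaussian_filter(Iy * Iy, sigma)
    trace = Axx + Ayy
    root = np.sqrt(0.25 * (Axx - Ayy) ** 2 + Axy ** 2)
    return 0.5 * trace - root, 0.5 * trace + root

# Classification at a pixel: both eigenvalues small -> no feature,
# lambda1 ~ 0 and lambda2 large -> edge, both large -> corner.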

2.2 Isotropic and Anisotropic Gaussian Directional Derivative Representations

In the spatial domain, the anisotropic Gaussian kernel (AGK) \(g_{\sigma ,\rho ,\theta }(x,y)\) can be represented as (Zhang and Shui 2015; Shui and Zhang 2013, 2012; Zhang et al. 2017)

$$\begin{aligned} \begin{aligned}&g_{\sigma ,\rho ,\theta }({x,y})=\\&\frac{1}{2\pi \sigma ^2}\exp \left( -\frac{1}{2\sigma ^2}{[x,y]}\mathbf {R}_{-\theta }\left[ \begin{array}{cc} \rho ^{-2}~~~0\\ 0~~~\rho ^2\\ \end{array} \right] \mathbf {R}_{\theta }{[x,y]}^{\top }\right) , \end{aligned}\end{aligned}$$
(5)

with

$$\begin{aligned} \mathbf {R}_\theta =\left[ \begin{array}{cc} \cos \theta &{}\sin \theta \\ -\sin \theta &{}\cos \theta \end{array} \right] , \end{aligned}$$

where \(\top \) represents the matrix transpose, \(\rho \) is the anisotropic factor (\(\rho >1\)), and \(\mathbf {R}_{\theta }\) is the rotation matrix with angle \(\theta \). From Eq. (5), the anisotropic Gaussian directional derivative (AGDD) filter \(\phi _{\sigma ,\rho ,\theta }(x,y)\) at orientation \(\theta +\pi /2\) is derived as

$$\begin{aligned} \begin{aligned} \phi _{\sigma ,\rho ,\theta }(x,y)=\frac{\partial g_{\sigma ,\rho }}{\partial y}(\mathbf {R}_{\theta }{[x,y]}^{\top }). \end{aligned} \end{aligned}$$
(6)

It is worth noting that the directional derivative obtained by Eq. (6) is shifted by \(\pi /2\) relative to the directional derivative obtained by taking the partial derivative along orientation \(\theta \). If the anisotropic factor \(\rho \) is 1, \(g_{\sigma ,\rho ,\theta }(x,y)\) and \(\phi _{\sigma ,\rho ,\theta }(x,y)\) in Eqs. (5) and (6) represent the isotropic Gaussian kernel and the isotropic Gaussian directional derivative (IGDD) filter, respectively.

The anisotropic Gaussian directional derivative of the input image I(xy) along direction \(\theta +{\pi }/{2}\) is computed by the convolution operator

$$\begin{aligned} \begin{aligned} \nabla _{\sigma ,\rho ,\theta }I(x,y)&= \frac{\partial }{\partial (\theta +\pi /2)}(I(x,y)\otimes g_{\sigma ,\rho ,\theta }(x,y))\\&= I(x,y)\otimes \phi _{\sigma ,\rho ,\theta }(x,y), \end{aligned} \end{aligned}$$
(7)

where \(\otimes \) represents a convolution operation. The AGDD reflects the gray-scale intensity variation of the input image along direction \(\theta +{\pi }/{2}\). It is easy to verify that

$$\begin{aligned} \nabla _{\sigma ,\rho ,\theta }I(x,y)=-\nabla _{\sigma ,\rho ,\theta +\pi }I(x,y). \end{aligned}$$
(8)

This means that AGDDs over the interval \([0,\pi )\) are sufficient to describe the intensity variations of the input image.

In the polar coordinate system, a point function in a wedge-shaped region can be defined as (Shui and Zhang 2013)

$$\begin{aligned} \begin{aligned}&\zeta _{\beta _{1},\beta _{2}}(r,\beta ) \\&\quad =\left\{ \begin{array}{ll} T,~\text {if}~0\le r<+\infty ,~\beta _1\le \beta \le \beta _2,~\beta _2-\beta _1\ne \pi \\ 0,~\text {otherwise}\\ \end{array} \right. \\ \end{aligned} \end{aligned}$$
(9)

where r is the radius, \(\beta \) is the polar angle, T is the gray value, and \(\beta _1\) and \(\beta _2\) are the lower and upper bounds of angle \(\beta \) as shown in Fig. 1. It can be easily found that a corner point is located at the tip o of the wedge-shaped region. In this paper, the point function in a wedge-shaped region is named as a basic corner model. A similar corner model is also presented in Deriche and Giraudon (1993).

Fig. 1 Examples of a basic corner model in the polar coordinate system

A general corner model (e.g., an L-type corner, Y- or T-type corner, X-type corner, or star-type corner) can be constructed as a sum of several basic corner models as follows

$$\begin{aligned} \begin{aligned} \hbar _{(T_{i},\beta _{i})}(r,\beta ) = \sum _{i=1}^{s} T_{i}\zeta _{\beta _{i},\beta _{i+1}}(r,\beta ), \end{aligned} \end{aligned}$$
(10)

where \(T_{i}\) represents the gray value of the i-th wedge-shaped region, s is the number of wedge-shaped regions, and \(\beta _{s+1}=\beta _{1}\). If \(s=2\) and \(\beta _2-\beta _1=\pi \), Eq. (10) represents a step edge. If \(s=2\) and \(\beta _2-\beta _1\ne \pi \), Eq. (10) corresponds to an L-type corner. If \(s=3\), Eq. (10) represents a Y- or T-type corner. If \(s=4\), Eq. (10) represents an X-type corner. If \(s=5\), Eq. (10) corresponds to a star-type corner.
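To make the corner models concrete, the sketch below rasterizes Eq. (10) on a small pixel grid; the helper name general_corner_model, the image size, and the chosen angles and gray values are illustrative assumptions rather than part of the original formulation.

import numpy as np

def general_corner_model(size, betas, grays):
    """Rasterize the general corner model of Eq. (10): each pixel takes the gray value
    T_i of the wedge [beta_i, beta_{i+1}) that contains its polar angle."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    angle = np.mod(np.arctan2(y, x), 2 * np.pi)          # polar angle beta in [0, 2*pi)
    image = np.zeros_like(angle)
    for i, T in enumerate(grays):
        b1 = betas[i] % (2 * np.pi)
        b2 = betas[(i + 1) % len(betas)] % (2 * np.pi)   # beta_{s+1} = beta_1
        if b1 < b2:
            mask = (angle >= b1) & (angle < b2)
        else:                                            # wedge wrapping around 2*pi
            mask = (angle >= b1) | (angle < b2)
        image[mask] = T
    return image

# L-type corner (s = 2, beta_2 - beta_1 != pi) with its tip at the image centre
L_corner = general_corner_model(65, betas=[11 * np.pi / 6, np.pi / 6], grays=[50, 100])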

The AGDD representation of the basic corner model is (Shui and Zhang 2013)

$$\begin{aligned} \xi _{\sigma ,\rho }(\theta )&=\iint _{\mathbb {R}^2}\zeta _{\beta _{1},\beta _{2}}(r,\beta )\phi _{\sigma ,\rho ,\theta }(-r,-\beta )rdrd\beta \nonumber \\&=\frac{T\rho }{2\sqrt{2\pi }\sigma }\Bigg ( \frac{\text {cos}(\beta _1-\theta )}{(\rho ^4\text {sin}^2(\beta _1-\theta )+\text {cos}^2(\beta _1-\theta ))^{\frac{1}{2}}}\nonumber \\&\quad -\frac{\text {cos}(\beta _2-\theta )}{(\rho ^4\text {sin}^2(\beta _2-\theta )+\text {cos}^2(\beta _2-\theta ))^{\frac{1}{2}}}\Bigg ), \end{aligned}$$
(11)

where \(\mathbb {R}^2\) represents the 2D real space and \(\phi _{\sigma ,\rho ,\theta }(r,\beta )\) represents the AGDD filter in the polar coordinate system. Then, the AGDD representation of the general corner model is

$$\begin{aligned}&\Lambda _{\sigma ,\rho }(\theta )\nonumber \\&\quad =\frac{\rho }{2\sqrt{2\pi }\sigma }\sum _{i=1}^{s}T_{i}\Bigg ( \frac{\text {cos}(\beta _{i}-\theta )}{(\rho ^4\text {sin}^2(\beta _{i}-\theta )+\text {cos}^2(\beta _{i}-\theta ))^{\frac{1}{2}}}\nonumber \\&\qquad {-}\frac{\text {cos}(\beta _{i{+}1}{-}\theta )}{(\rho ^4\text {sin}^2(\beta _{i{+}1}{-}\theta ){+}\text {cos}^2(\beta _{i{+}1}{-}\theta ))^{\frac{1}{2}}}\Bigg ). \end{aligned}$$
(12)

With \(\rho =1\), Eq. (12) reduces to the IGDD representation of the general corner model

$$\begin{aligned} \begin{aligned} \kappa _{\sigma ,\rho }(\theta )&=\frac{1}{2\sqrt{2\pi }\sigma }\sum _{i=1}^{s}T_{i}\bigg (\text {cos}(\beta _{i+1}-\theta )-\text {cos}(\beta _{i}-\theta )\bigg )\\&=\frac{1}{\sqrt{2\pi }\sigma }\sum _{i=1}^{s}T_{i}\text {sin}\bigg (\theta -\frac{\beta _i+\beta _{i+1}}{2}\bigg )\text {sin}\bigg (\frac{\beta _{i+1}-\beta _i}{2}\bigg ), \end{aligned} \end{aligned}$$
(13)

which means that all the IGDD representations of a step edge and general corners are sine functions.
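As a quick numerical illustration (not part of the original paper), the sketch below evaluates Eq. (12) for the L-type corner used in Fig. 2 and, by setting \(\rho =1\), Eq. (13). The anisotropic response exhibits two well-separated maxima over \([0,2\pi )\), whereas the isotropic response is a single sinusoid; the function name and the sampling of \(\theta \) are our own illustrative choices.

import numpy as np

def agdd_representation(theta, betas, grays, sigma2=8.0, rho2=8.0):
    """Evaluate Eq. (12); with rho2 = 1 it reduces to the IGDD representation of Eq. (13)."""
    sigma, rho = np.sqrt(sigma2), np.sqrt(rho2)

    def f(b):   # one term of the sum in Eq. (12)
        return np.cos(b - theta) / np.sqrt(
            rho ** 4 * np.sin(b - theta) ** 2 + np.cos(b - theta) ** 2)

    total = np.zeros_like(theta)
    for i, T in enumerate(grays):
        total += T * (f(betas[i]) - f(betas[(i + 1) % len(betas)]))
    return rho / (2.0 * np.sqrt(2.0 * np.pi) * sigma) * total

theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
betas, grays = [11 * np.pi / 6, np.pi / 6], [50.0, 100.0]   # L-type corner of Fig. 2b
aniso = agdd_representation(theta, betas, grays, rho2=8.0)  # two maxima and two minima over [0, 2*pi)
iso = agdd_representation(theta, betas, grays, rho2=1.0)    # rho = 1: a single sinusoid (Eq. 13)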

Examples of the AGDD and IGDD representations of the step edge and general corners are shown in Fig. 2. The step edge, L-type, Y- or T-type, X-type, and star-type corner models are illustrated in Fig. 2a–e respectively. Their corresponding intensity variations of the AGDD representations are shown in the second column. Their corresponding intensity variations of the IGDD representations are shown in the third column.

3 Proposed Method

In this section, the problems of the existing \(2\times 2\) structure tensor based corner detection methods are demonstrated and several corner intensity variation properties are summarized. Then, a new multi-directional structure tensor with multiple scales is derived for corner detection. Finally, a new image intensity variation extraction technique is presented.

3.1 Corner Properties

As shown in the second and third columns of Fig. 2b–e, for the L-type, Y- or T-type, X-type, and star-type corner models, the directional derivatives are large in most directions at a corner, yet the directional derivatives of the AGDD or IGDD representations are very small or even near zero along the horizontal (0) or vertical (\(\pi /2\)) direction. Consequently, these corners may not be correctly detected by the Harris detector. This behaviour contradicts the definition of a corner (Moravec 1979) as a point whose directional derivatives are large in all directions. Furthermore, the directional derivatives along the horizontal and vertical directions cannot accurately depict the intensity variation differences between edges and corners. Take the step edge, the L-type corner, and the X-type corner shown in the first column of Fig. 2 as examples: their directional derivatives are all zero in the horizontal direction, whereas in the vertical direction the absolute magnitude of the directional derivative of the edge is larger than those of the L-type and X-type corners. Hence, edges may be detected as corners by Eq. (4) while real corners may be marked as edges. The reason is that the existing intensity variation based methods (Harris and Stephens 1988; Noble 1988; Gårding and Lindeberg 1996; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Gao et al. 2007; Duval-Poo et al. 2015) do not fully take into account the differences in the directional derivatives between edges and corners across filtering orientations.

Fig. 2 A step edge, L-type, Y-type, X-type, and star-type corner models are shown in a–e in the first column (gray values \(T_{1}=50\), \(T_{2}=100\), \(T_{3}=150\), \(T_{4}=200\), and \(T_{5}=120\)). Their corresponding directional derivatives of the AGDD representations (\(\rho ^2=8\), \(\sigma ^2=8\)) and IGDD representations (\(\rho ^2=1\), \(\sigma ^2=8\)) are shown in the second and third columns, respectively

Fig. 3 Examples of directional derivative changes under image rotation or affine image transformation with \(\rho ^2=16\) and \(\sigma ^2=4\). a An L-type corner with gray values \(T_{1}=50\) and \(T_{2}=100\), b the L-type corner rotated by \(\pi /4\) clockwise, c the L-type corner after an affine image transformation

We found from Eq. (13) that the isotropic Gaussian representations cannot depict the intensity variation differences between a step edge and general corners as shown in the third column of Fig. 2. However, we found from Eq. (12) that the anisotropic Gaussian representations have the ability to accurately depict the intensity variation differences between a step edge and general corners. As shown in the second column of Fig. 2, a step edge with \(\beta _{1}=\pi /2\), \(\beta _{2}=3\pi /2\), \(T_{1}=50\), and \(T_{2}=100\) has only one local maximum for a directional derivative at \(\theta =3\pi /2\) and one local minimum for a directional derivative at \(\theta =\pi /2\). For an L-type corner with \(\beta _{1}=11\pi /6\), \(\beta _{2}=\pi /6\), \(T_{1}=50\), and \(T_{2}=100\), it has two local maxima for directional derivatives at \(\theta =\pi /6\) and \(\theta =5\pi /6\) and two local minima for directional derivatives at \(\theta =7\pi /6\) and \(\theta =11\pi /6\). For a Y- or T-type corner with \(\beta _{1}=0\), \(\beta _{2}=2\pi /3\), \(\beta _{3}=4\pi /3\), \(T_{1}=50\), \(T_{2}=100\), and \(T_{3}=150\), it has three local maxima for directional derivatives at \(\theta =2\pi /3\), \(\theta =\pi \), and \(\theta =4\pi /3\) and three local minima for directional derivatives at \(\theta =0\), \(\theta =\pi /3\), and \(\theta =5\pi /3\). For an X-type corner, it has four local maxima and four local minima for directional derivatives. For a star-type corner, it has five local maxima and five local minima for directional derivatives.

Furthermore, our research indicates that two orthogonal directional derivatives along the horizontal and vertical directions cannot accurately detect corners on an affine transformed image. Take the L-type corner shown in Fig. 3a as an example: from Eq. (12), its two orthogonal directional derivatives are large, as shown in the second column of Fig. 3, and according to the criteria of Harris corner detection it can be detected as a corner. After the L-type corner is rotated by \(\pi /4\) clockwise as shown in Fig. 3b, its two orthogonal directional derivatives become small, as shown in the second column of Fig. 3, and the corner may not be detected under such or similar image rotation transformations. The reason is that the two orthogonal directional derivatives do not contain enough local structure information. The existing multi-scale filtering techniques (Gårding and Lindeberg 1996; Lindeberg 1998; Schmid et al. 2000; Mikolajczyk and Schmid 2004; Laptev 2005; Lowe 2004; Bay et al. 2006; Brox et al. 2006; Gao et al. 2007; Marimon et al. 2010; Alcantarilla et al. 2012; Miao and Jiang 2013; Duval-Poo et al. 2015; Perona and Malik 1990; Wang 1999; Widynski and Mignotte 2014) cannot solve this problem because multi-scale filtering only enhances the local intensity variation extraction along the horizontal and vertical directions. Another example arises when the image is rotated and squeezed, which changes the shape of the corner. If the L-type corner undergoes an affine image transformation as shown in Fig. 3c, its two orthogonal directional derivatives are also small, as shown in the second column of Fig. 3, and the corner may not be detected under such or similar affine image transformations. The existing multi-scale filtering techniques cannot solve this problem either.

Based on the above analysis, several properties of corners are summarized as follows:

Property 1

The intensity variation of a corner is large in most directions, not necessarily in all directions.

Property 2

The first-order derivatives along the horizontal and vertical directions cannot depict the intensity variations of step edges and corners well.

Property 3

The isotropic Gaussian filter cannot depict the intensity variation differences between step edges and corners accurately.

Property 4

The anisotropic Gaussian filters have the ability to depict the intensity variation differences between step edges and corners.

Property 5

The existing \(2\times 2\) structure tensor based techniques may not depict the differences between step edges and corners accurately.

The above properties will help us propose a new corner measure, a new corner detection algorithm, and a new image intensity variation extraction technique which will be presented in the following section.

3.2 Multi-directional Structure Tensor with Multiple Scales

Based on the aforementioned analysis, it can be concluded that the intensity variation based methods (Harris and Stephens 1988; Gårding and Lindeberg 1996; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Miao and Jiang 2013; Gao et al. 2007) which used the first-order derivatives along the horizontal and vertical directions to construct the \(2\times 2\) structure tensor cannot detect corners well. In this section, the multi-scale and multi-directional anisotropic Gaussian filters are used as an example to explain how to detect corners using multi-scale and multi-directional intensity variation information.

Images are 2D discrete signals on the integer lattice \(\mathbb {Z}^2\), and the continuous AGKs and AGDD filters in Eqs. (5) and (6) need to be discretized in \(\mathbb {Z}^2\). Given multiple scales \(\sigma _s\) (e.g., \(s=1,2,3\)), an anisotropic factor \(\rho \), and K oriented angles \(\theta _{k}=(k-1)\pi /K~(k=1,2,\dots ,K)\), the discrete versions of the multi-directional AGKs \(g_{\sigma _s,\rho ,k}(x,y)\) and AGDD filters \(\phi _{\sigma _s,\rho ,k}(x,y)\) with multiple scales are given below

$$\begin{aligned} g_{\sigma _s,\rho ,k}(\mathbf {n})&=\frac{1}{2\pi \sigma _s^2}\exp \left( -\frac{1}{2\sigma _s^2}\mathbf {n}^{\top }\mathbf {R}_{-k}\left[ \begin{array}{ll} \rho ^{-2}~~~0\\ 0~~~\rho ^{2}\\ \end{array} \right] \mathbf {R}_{k}\mathbf {n}\right) ,\nonumber \\ \phi _{\sigma _s,\rho ,k}(\mathbf {n})&=\frac{-\rho ^2[-\text {sin}\theta _k~\text {cos}\theta _k]\mathbf {n}}{\sigma _s^2}g_{\sigma _s,\rho ,k}(\mathbf {n}), \end{aligned}$$
(14)

with

$$\begin{aligned}&\mathbf {R}_{k}=\left[ \begin{array}{ll} \cos \theta _k&{}\sin \theta _k\\ -\sin \theta _k&{}\cos \theta _k \end{array} \right] ,~\mathbf {n}=\left[ \begin{array}{ll} n_x\\ n_y \end{array}\right] \in \mathbb {Z}^2,\\ \end{aligned}$$

where (\(n_x,n_y\)) represents the pixel coordinate in the integer lattice \(\mathbb {Z}^2\).
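A sketch of how the discrete filters of Eq. (14) could be generated in practice is given below; the function name agdd_filter_bank and the kernel truncation radius are illustrative assumptions, not part of the authors' implementation.

import numpy as np

def agdd_filter_bank(sigma2, rho2, K, radius=None):
    """Discrete AGKs g and AGDD filters phi of Eq. (14) for orientations theta_k = (k-1)*pi/K."""
    sigma2 = float(sigma2)
    if radius is None:
        radius = int(np.ceil(3.0 * np.sqrt(sigma2 * rho2)))   # truncation radius (assumption)
    ny, nx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernels, derivatives = [], []
    for k in range(K):
        theta = k * np.pi / K
        c, s = np.cos(theta), np.sin(theta)
        u = c * nx + s * ny                                   # first component of R_k * n
        v = -s * nx + c * ny                                  # second component of R_k * n
        g = np.exp(-(u ** 2 / rho2 + v ** 2 * rho2) / (2.0 * sigma2)) / (2.0 * np.pi * sigma2)
        phi = (-rho2 * v / sigma2) * g                        # AGDD filter along theta + pi/2
        kernels.append(g)
        derivatives.append(phi)
    return kernels, derivatives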

Given the multi-directional anisotropic Gaussian filters \(g_{\sigma _s,\rho ,k}(n_x,n_y)\) with multiple scales \(\sigma _s\), the discrete weighted sum of squared differences \(\mathfrak {I}_s(n_x,n_y)\) of point \((n_x,n_y)\) is redefined as Eq. (15), where \((n_x+i,n_y+j)\) is a point in an image patch of width \(u+1\) and height \(v+1\) centered on \((n_x,n_y)\), \(\triangle t\) is a shift at point \(I(n_x+i,n_y+j)\), and \(\theta _k\) is the angle between the horizontal axis and the k-th oriented vector. In this paper, the size \((u+1)\times (v+1)\) is set to \(7\times 7\).

$$\begin{aligned} \mathfrak {I}_s(n_x,n_y)&=\frac{\pi }{K(u+1)(v+1)}\sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}}\sum _{k=1}^{K}\nonumber \\&\qquad g_{\sigma _s,\rho ,k}(n_x+i,n_y+j)\otimes \nonumber \\&\qquad \Big (I(n_x+i+\triangle t\text {cos}\theta _k,n_y+j+\triangle t\text {sin}\theta _k)\nonumber \\&\quad -I(n_x+i,n_y+j)\Big )^{2}, \end{aligned}$$
(15)

\(I(n_x+i+\triangle t\text {cos}\theta _k,n_y+j+\triangle t\text {sin}\theta _k)\) can be approximated by a Taylor expansion as

$$\begin{aligned} \begin{aligned}&I(n_x+i+\triangle t\text {cos}\theta _k,n_y+j+\triangle t\text {sin}\theta _k)\\&\quad \approx I(n_x+i,n_y+j)+\triangle t I_{k}(n_x+i,n_y+j), \end{aligned}\end{aligned}$$
(16)

where \(I_{k}(n_x+i,n_y+j)\) is the directional derivative of \(I(n_x+i,n_y+j)\) in the direction of \(\theta _k\). Substituting approximation Eq. (16) into Eq. (15) yields Eq. (17).

$$\begin{aligned} \begin{aligned}&\mathfrak {I}_s(n_x,n_y)\approx \frac{\pi }{K(u+1)(v+1)}\sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}}\sum _{k=1}^{K}\\&\qquad g_{\sigma _s,\rho ,k}(n_x+i,n_y+j)\\&\qquad \otimes (\triangle t I_{k}(n_x+i,n_y+j))^{2}. \end{aligned} \end{aligned}$$
(17)

It is worth noting that

$$\begin{aligned} \begin{aligned}&\nabla _{\sigma _s,\rho ,k}I(n_x+i,n_y+j)\\&\quad =g_{\sigma _s,\rho ,k}(n_x+i,n_y+j)\otimes I_{k}(n_x+i,n_y+j). \end{aligned}\end{aligned}$$
(18)

As a result, Eq. (17) can be rewritten as Eq. (19), where M is a multi-directional structure tensor at multiple scales which is a symmetric \(K\times K\) matrix as given in Eq. (20). From Eq. (20), it can be easily concluded that the eigenvalues of matrix M are determined by scale \(\sigma _s\), the anisotropic factor \(\rho \), and the number of orientations K of the anisotropic Gaussian filters.

$$\begin{aligned}&\mathfrak {I}_s(n_x,n_y)\approx \frac{\pi }{K(u+1)(v+1)}\sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}}\nonumber \\&\qquad \bigg ([\nabla _{\sigma _s,\rho ,1}I(n_x{+}i,n_y{+}j), \nabla _{\sigma _s,\rho ,2}I(n_x{{+}}i,n_y{+}j),\dots ,\nonumber \\&\qquad \nabla _{\sigma _s,\rho ,K}I(n_x+i,n_y+j)][\triangle t, \triangle t,\dots , \triangle t]^{\top }\bigg )^{2}\nonumber \\&\quad =\frac{\pi }{K(u+1)(v+1)}(\triangle t~\triangle t~\dots ~\triangle t)M \left( \begin{array}{cccc} \triangle t\\ \triangle t\\ \vdots \\ \triangle t \\ \end{array} \right) , \end{aligned}$$
(19)
$$\begin{aligned}&M=\left[ \begin{matrix} \displaystyle \sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}} \nabla _{\sigma _s,\rho ,1}^{2}I(n_{x}+i,n_{y}+j) &{} \cdots &{} \displaystyle \sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}}\nabla _{\sigma _s,\rho ,1}I(n_{x}+i,n_{y}+j)\nabla _{\sigma _s,\rho ,K}I(n_{x}+i,n_{y}+j) \\ \vdots &{} \ddots &{} \vdots \\ \displaystyle \sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}}\nabla _{\sigma _s,\rho ,K}I(n_{x}+i,n_{y}+j)\nabla _{\sigma _s,\rho ,1}I(n_{x}+i,n_{y}+j) &{} \cdots &{} \displaystyle \sum _{i=-\frac{u}{2}}^{\frac{u}{2}}\sum _{j=-\frac{v}{2}}^{\frac{v}{2}} \nabla _{\sigma _s,\rho ,K}^{2}I(n_{x}+i,n_{y}+j) \\ \end{matrix} \right] \end{aligned}$$
(20)

3.3 Corner Measure and Corner Detection Algorithm

In this section, a new corner measure and a new corner detection algorithm are presented as follows.

In this paper, K eigenvalues \(\{\lambda _1,\lambda _2,\ldots ,\lambda _{K}\}\) of the \(K\times K\) multi-directional structure tensor at each scale are used to form a new corner measure to distinguish corners from other points in the input image. The new corner measure is defined as

$$\begin{aligned} \wp _{s}(n_x,n_y)=\frac{\prod \limits _{k=1}^{K} \lambda _k}{\sum \limits _{k=1}^{K}\lambda _k+\tau }, \end{aligned}$$
(21)

where \(\tau \) is a small constant (\(\tau =2.22 \times 10^{-16}\)) which is used to avoid a singular denominator in the case of a rank zero structure tensor. For each image pixel \((n_x,n_y)\), it is marked as a corner if its corresponding \(\wp _{s}(n_x,n_y)\) is a local maximum within a 7\(\times \)7 window and is larger than a threshold \(T_{h}\) at each scale \(\sigma _s\) (\(s=1,2,3\)).

In general, the new corner measure has the following advantages over the existing \(2\times 2\) structure tensor based methods (Noble 1988; Gårding and Lindeberg 1996; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Gao et al. 2007; Duval-Poo et al. 2015): it detects corners more accurately, and it is more robust to affine image transformations, which is not the case for the existing corner detectors (Deriche and Giraudon 1993; Smith and Brady 1997; Rosten et al. 2010; Shui and Zhang 2013; Xia et al. 2014; Moravec 1979; Harris and Stephens 1988; Noble 1988; Gårding and Lindeberg 1996; Lindeberg 1998; Schmid et al. 2000; Kenney et al. 2003; Mikolajczyk and Schmid 2004; Laptev 2005; Lowe 2004; Bay et al. 2006; Marimon et al. 2010; Maver 2010; Su et al. 2012).

The proposed corner detection method is described as follows:

  1. Use the multi-directional anisotropic Gaussian filters at multiple scales to smooth the input image, and derive the multi-directional derivatives at multiple scales as in Eq. (7).

  2. For each image pixel, construct the multi-directional structure tensor at multiple scales as in Eq. (20).

  3. Compute the eigenvalues of the structure tensor at each scale and form the corner measure as in Eq. (21).

  4. Mark the pixel as a candidate corner if its corner measure is a local maximum within a 7\(\times \)7 window and is larger than the threshold \(T_h\) at the lowest scale.

  5. Mark the candidate corner as a corner if its corner measure is larger than the threshold \(T_h\) at all scales.
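A compact sketch of these five steps is given below, assuming the agdd_filter_bank helper from the sketch in Sect. 3.2; scipy.ndimage is used for the convolutions and the non-maximum suppression window, and the default constants mirror the parameter values reported in Sect. 4. This is an illustrative implementation, not the authors' released code.

import numpy as np
from scipy.ndimage import convolve, maximum_filter

def corner_measure(image, sigma2, rho2=1.5, K=8, patch=7, tau=2.22e-16):
    """Corner measure of Eq. (21) at one scale sigma2 for every pixel (steps 1-3)."""
    _, agdd_filters = agdd_filter_bank(sigma2, rho2, K)       # sketch from Sect. 3.2
    # directional derivatives of the smoothed image, Eq. (7)
    D = np.stack([convolve(image.astype(float), f) for f in agdd_filters])
    half = patch // 2
    H, W = image.shape
    measure = np.zeros((H, W))
    for y in range(half, H - half):
        for x in range(half, W - half):
            d = D[:, y - half:y + half + 1, x - half:x + half + 1].reshape(K, -1)
            M = d @ d.T                                       # K x K structure tensor, Eq. (20)
            eig = np.clip(np.linalg.eigvalsh(M), 0.0, None)
            measure[y, x] = np.prod(eig) / (np.sum(eig) + tau)   # Eq. (21)
    return measure

def detect_corners(image, scales=(1.5, 3.0, 4.5), threshold=1.0e7, patch=7):
    """Steps 4-5: local maxima above the threshold at the lowest scale that
    stay above the threshold at every scale."""
    measures = [corner_measure(image, s2) for s2 in scales]
    m0 = measures[0]
    corners = (m0 == maximum_filter(m0, size=patch)) & (m0 > threshold)
    for m in measures[1:]:
        corners &= (m > threshold)
    return np.argwhere(corners)                               # (row, col) corner coordinates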

3.4 A New Image Intensity Variation Extraction Technique

In this subsection, our aim is to present a new image intensity variation extraction technique to accurately depict the intensity variation differences between edges and corners.

From Eq. (10), the step edge, the L-type, the Y- or T-type, the X-type, and the star-type corners can be represented by the sum of several basic corner models as described by Eq. (9). From Eq. (12), we derived that a step edge, an L-type, a Y- or T-type, an X-type, and a star-type corner have one, two, three, four, and five local maxima of the first-order anisotropic Gaussian directional derivative, respectively. Note that each local maximum of the first-order derivatives has a corresponding local minimum (cf. Eq. (8)). Consequently, if the extracted image intensity variations can describe the number of maximum points of the first-order anisotropic Gaussian directional derivatives, they can depict the characteristics of edges and corners. Then, for each AGDD representation of the basic corner model as given in Eq. (11), if its two local maxima on the directional derivatives can be identified, the extracted local intensity variation information can depict the intensity variation differences between edges and corners. In what follows, we discuss how to design the anisotropic Gaussian filters to exactly identify the two local maxima on the directional derivatives of the AGDD representation of the basic corner model.

The AGDD representation of the basic corner model is shown in Eq. (11). Without loss of generality, let \(\beta _{2}-\beta _{1}\in (0,\pi )\). A basic corner with \(\beta _{1}=-\pi /6\), \(\beta _{2}=\pi /3\), and \(T=50\) is selected as an example, and the directional derivatives of this basic corner model are shown in Fig. 4. For the basic corner model, it is easy to verify that the two local maxima on the directional derivative curve are located at \(\beta _2+\pi \) and \(\beta _1+2\pi \), as shown in Fig. 4, and the angle difference between them is \(\pi -(\beta _2-\beta _1)\). The two local maxima can therefore be distinguished only if a local minimum exists between them, i.e., at \(\theta =(\beta _1+\beta _2+3\pi )/2\).

Fig. 4 Examples of directional derivatives of the basic corner model with \(\beta _{1}=-\pi /6\), \(\beta _{2}=\pi /3\), \(T=50\), \(\rho ^2=4\), and \(\sigma ^2=4\)

The first-order derivative of the AGDD representation of the basic corner model with respect to \(\theta \) is

$$\begin{aligned} \xi _{\sigma ,\rho }^{\prime }(\theta )&=\frac{T}{2\sqrt{2\pi }\sigma }\Bigg ( \frac{\rho ^2\text {sin}(\beta _{1}-\theta )}{(\rho ^2\text {sin}^2(\beta _{1}{-}\theta ){+}\rho ^{-2}\text {cos}^2(\beta _{1}{-}\theta ))^{\frac{3}{2}}}\nonumber \\&\qquad -\frac{\rho ^2\text {sin}(\beta _{2}-\theta )}{(\rho ^2\text {sin}^2(\beta _{2}-\theta )+\rho ^{-2}\text {cos}^2(\beta _{2}-\theta ))^{\frac{3}{2}}}\Bigg ). \end{aligned}$$
(22)

The second-order derivative of the AGDD representation of the basic corner model is

$$\begin{aligned} \xi _{\sigma ,\rho }^{\prime \prime }(\theta )&=\frac{T}{2\sqrt{2\pi }\sigma }\Bigg ( \frac{\big (2(\rho ^4-1)\text {sin}^{2}(\beta _{1}-\theta )-1\big )\text {cos}(\beta _1-\theta )}{(\rho ^2\text {sin}^2(\beta _{1}-\theta )+\rho ^{-2}\text {cos}^2(\beta _{1}-\theta ))^{\frac{5}{2}}} \nonumber \\&\quad -\frac{(2(\rho ^4-1)\text {sin}^{2}(\beta _{2}-\theta )-1)\text {cos}(\beta _2-\theta )}{(\rho ^2\text {sin}^2(\beta _{2}-\theta )+\rho ^{-2}\text {cos}^2(\beta _{2}-\theta ))^{\frac{5}{2}}}\Bigg ). \end{aligned}$$
(23)

If \(\xi _{\sigma ,\rho }(\frac{\beta _1+\beta _2+3\pi }{2})\) is a local minimum on the directional derivatives, its corresponding first-order and second-order derivatives should satisfy

$$\begin{aligned} \begin{aligned} \xi _{\sigma ,\rho }^{\prime }\left( \frac{\beta _1+\beta _2+3\pi }{2}\right)&=0,\\ \xi _{\sigma ,\rho }^{\prime \prime }\left( \frac{\beta _1+\beta _2+3\pi }{2}\right)&>0. \end{aligned}\end{aligned}$$
(24)

When \(\theta \) equals \(\frac{\beta _1+\beta _2+3\pi }{2}\), we can conclude from Eq. (22) that \(\xi _{\sigma ,\rho }^{\prime }(\theta )\) is 0, and its corresponding second-order derivative \(\xi _{\sigma ,\rho }^{\prime \prime }(\theta )\) is

$$\begin{aligned} \begin{aligned}&\xi _{\sigma ,\rho }^{\prime \prime }\left( \frac{\beta _1+\beta _2+3\pi }{2}\right) \\&\quad =\frac{T}{\sqrt{2\pi }\sigma } \frac{\left( 2(\rho ^4-1)\text {cos}^{2}(\frac{\beta _2-\beta _1}{2})-1\right) \text {sin}(\frac{\beta _2-\beta _1}{2})}{\left( \rho ^2\text {cos}^2(\frac{\beta _2-\beta _1}{2})+\rho ^{-2}\text {sin}^2(\frac{\beta _2-\beta _1}{2})\right) ^{\frac{5}{2}}}. \end{aligned}\end{aligned}$$
(25)

From Eq. (25), it can be derived that \(\xi _{\sigma ,\rho }(\frac{\beta _1+\beta _2+3\pi }{2})\) is the local minimum on the directional derivatives if it satisfies

$$\begin{aligned} \begin{aligned} 2(\rho ^4-1)\text {cos}^{2}\left( \frac{\beta _2-\beta _1}{2}\right) -1>0. \end{aligned}\end{aligned}$$
(26)

Inequality (26) holds if the following is satisfied

$$\begin{aligned} \begin{aligned} \rho ^4>1+\frac{1}{2\text {cos}^2\left( \frac{\beta _2-\beta _1}{2}\right) }. \end{aligned}\end{aligned}$$
(27)
Fig. 5 Test images

When \(\beta _2-\beta _1=0\), the right-hand side of inequality (27) gives the minimum \(\frac{3}{2}\). For a given anisotropic factor \(\rho ^2 > \frac{\sqrt{6}}{2}\), the two local maxima on the directional derivatives can be resolved only when the angle \(\beta _2-\beta _1\) satisfies

$$\begin{aligned} \begin{aligned} 0<\beta _2-\beta _1<2\text {arccos}\left( \frac{1}{\sqrt{2(\rho ^4-1)}}\right) . \end{aligned}\end{aligned}$$
(28)

Inequality (28) can be further written as inequality (29)

$$\begin{aligned} \begin{aligned} \pi -2\text {arccos}\left( \frac{1}{\sqrt{2(\rho ^4-1)}}\right)<\pi -(\beta _2-\beta _1)<\pi . \end{aligned}\end{aligned}$$
(29)

It is worth noting that \(\beta _2-\beta _1\) is the angular extent of the basic corner model (9) and \(\pi -(\beta _2-\beta _1)\) is the angle difference between the two local maxima on the directional derivatives of the basic corner. From inequalities (28) and (29), it can be concluded that the larger the anisotropic factor, the more local intensity variation information can be extracted by the anisotropic Gaussian filters and the stronger their ability to distinguish adjacent local maxima on the directional derivatives. We note that the L-type, Y- or T-type, X-type, and star-type corners can be represented by the sum of several basic corner models. Then, if all the angles of the basic corner models satisfy inequality (28), the obtained intensity variation information can describe the number of maximum points of the first-order anisotropic Gaussian directional derivatives, which means that the extracted local intensity variation information can accurately depict the intensity variation differences between edges and corners.
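As an illustrative numerical check (using the anisotropic factor \(\rho ^2=1.5\) adopted in Sect. 4, which satisfies \(\rho ^2>\frac{\sqrt{6}}{2}\)), inequality (28) becomes

$$\begin{aligned} 0<\beta _2-\beta _1<2\text {arccos}\left( \frac{1}{\sqrt{2(1.5^2-1)}}\right) =2\text {arccos}\left( \frac{1}{\sqrt{2.5}}\right) \approx 1.77~\text {rad}\approx 101.5^{\circ }, \end{aligned}$$

so any wedge narrower than about \(101.5^{\circ }\) produces two resolvable local maxima on its directional derivative curve.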

4 Experimental Results and Performance Evaluation

The proposed corner detector is compared with ten state-of-the-art detectors [Harris (Harris and Stephens 1988), Harris–Laplace (Mikolajczyk and Schmid 2004), FAST (Rosten et al. 2010), DoG (Lowe 2004), SURF (Bay et al. 2006), KAZE (Alcantarilla et al. 2012), ANDD (Shui and Zhang 2013), ACJ (Xia et al. 2014), LIFT (Yi et al. 2016), and Superpoint (DeTone et al. 2018)]. Thirty images (Bowyer et al. 1999) are used to evaluate the average repeatabilities (Awrangjeb and Lu 2008) of these detectors. The Oxford dataset is used to assess the region repeatability (Mikolajczyk et al. 2005) of these detectors. The DTU-Robots dataset (Aanæs et al. 2012), which contains 3D objects under changing viewpoints, is used to evaluate the repeatability metric of these detectors. Furthermore, two test images with ground truths are used to assess the detection capability and localization accuracy of these methods. Execution time, memory usage, and 3D reconstruction on a large-scale structure-from-motion dataset are also investigated.

The original codes for seven of these detectors in Rosten et al. (2010), Bay et al. (2006), Alcantarilla et al. (2012), Shui and Zhang (2013), Xia et al. (2014), Yi et al. (2016), DeTone et al. (2018) are from the authors. The codes for the Harris–Laplace (Mikolajczyk and Schmid 2004) and DoG (Lowe 2004) detectors are from http://www.robots.ox.ac.uk/vgg/affine/. The code for the Harris detector (Harris and Stephens 1988) is from http://peterkovesi.com/matlabfns/. The parameter settings for the proposed detector are: \(\rho ^2=1.5\), \(\sigma _1^2=1.5\), \(\sigma _2^2=3\), \(\sigma _3^2=4.5\), \(K=8\), \((u+1) \times (v+1)=7 \times 7\), and \(T_h=1.0\times 10^7\). The program or web demos of the proposed method can be accessed at http://vision-cdc.csiro.au/corner1st/. The selection of the parameters for the proposed method will be discussed in Sect. 4.1.

4.1 Repeatability Under Affine Transformation

In Awrangjeb and Lu (2008), the average repeatability \(R_{\text {avg}}\) measures the average number of the repeated corners between the original and affine transformed images. It is defined as

$$\begin{aligned} R_{\text {avg}}=\frac{N_{r}}{2}\left( \frac{1}{N_{o}}+\frac{1}{N_{t}}\right) , \end{aligned}$$
(30)

where \(N_{o}\) and \(N_{t}\) are the numbers of detected corners from the original and transformed images by a detector, and \(N_{r}\) is the number of repeated corners between them. If a corner is detected in a geometrically transformed image, and it is in the neighbourhood of the ground truth location (say within 4 pixels), then a repeated corner is detected. A higher average repeatability means a better performance. Thirty images (Bowyer et al. 1999) with different scenes as shown in Fig. 5 are used for measuring the average repeatability for the detectors.
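A sketch of how Eq. (30) could be computed for one pair of corner sets is given below; it assumes both sets have already been expressed in a common coordinate frame (i.e., the detections in the transformed image have been mapped back to the original image), and the 4-pixel tolerance follows the text.

import numpy as np

def average_repeatability(corners_orig, corners_trans, tol=4.0):
    """Average repeatability of Eq. (30); both corner sets are (N, 2) arrays
    expressed in the same coordinate frame."""
    n_o, n_t = len(corners_orig), len(corners_trans)
    if n_o == 0 or n_t == 0:
        return 0.0
    d = np.linalg.norm(corners_orig[:, None, :] - corners_trans[None, :, :], axis=2)
    n_r = int(np.sum(d.min(axis=1) <= tol))   # repeated corners within the tolerance
    return 0.5 * n_r * (1.0 / n_o + 1.0 / n_t)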

We followed the standard criteria (Awrangjeb and Lu 2008): a total of 6,510 transformed test images were obtained by applying the following six types of transformations to each original image (a code sketch of this generation protocol is given after the list):

  • Rotation: The original image was rotated at \(10^{\circ }\) apart within \([-\pi /2,\pi /2]\), excluding \(0^{\circ }\).

  • Uniform scaling: The scale factors \(s_x=s_y\) are in [0.5, 2] with 0.1 apart, excluding 1.

  • Non-uniform scaling: The scale \(s_x\) is in [0.7, 1.5] and \(s_y\) is in [0.5, 1.8] with 0.1 apart, excluding the case when \(s_x=s_y\).

  • Shear transformations: The shear factor c was chosen by sampling the range \([-1,1]\) with a 0.1 interval, excluding 0, with the following formula

    $$\begin{aligned}\begin{aligned} \left[ \begin{array}{c} x'\\ y'\end{array} \right] = \left[ \begin{array}{cc} 1&{}c\\ 0&{}1 \end{array} \right] \left[ \begin{array}{c} x\\ y\end{array} \right] .\end{aligned}\end{aligned}$$
  • Lossy JPEG compression: A compression factor is in [5, 100] at 5 apart.

  • Gaussian noise: Zero-mean white Gaussian noise was added to the original image with standard deviations in [1, 15] at intervals of 1.
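As referenced above, the sketch below generates a subset of these transformed copies, assuming OpenCV (cv2) is available; the function name and the border handling are illustrative choices, and the full 6,510-image protocol additionally includes the uniform and non-uniform scalings.

import cv2
import numpy as np

def make_transformed_copies(image):
    """Generate some of the transformed test images described above
    (rotations, shears, JPEG compression, and Gaussian noise only)."""
    h, w = image.shape[:2]
    out = []
    for angle in range(-90, 91, 10):                          # rotations, excluding 0 degrees
        if angle == 0:
            continue
        R = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
        out.append(cv2.warpAffine(image, R, (w, h)))
    for c in np.round(np.arange(-1.0, 1.05, 0.1), 2):         # shear factors, excluding 0
        if c == 0.0:
            continue
        S = np.float32([[1.0, c, 0.0], [0.0, 1.0, 0.0]])
        out.append(cv2.warpAffine(image, S, (w, h)))
    for q in range(5, 101, 5):                                # JPEG quality factors
        ok, buf = cv2.imencode('.jpg', image, [int(cv2.IMWRITE_JPEG_QUALITY), q])
        out.append(cv2.imdecode(buf, cv2.IMREAD_UNCHANGED))
    for std in range(1, 16):                                  # zero-mean Gaussian noise
        noisy = image.astype(float) + np.random.normal(0.0, std, image.shape)
        out.append(np.clip(noisy, 0, 255).astype(np.uint8))
    return out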

Table 1 Average repeatability of the proposed method
Fig. 6 Average repeatability of the proposed method under rotation, lossy JPEG compression, and additive white Gaussian noises

From inequality (28) and inequality (29), it is concluded that the larger the anisotropic factor, the higher the potential to extract the intensity variation information to depict the intensity variation differences between step edges and corners. Meanwhile, it is proved in Shui and Zhang (2012) that the variance \(\varepsilon _w^2\) of the image Gaussian noise smoothed by the AGDD filters is \(\varepsilon _w^2=\frac{\rho ^2\epsilon ^2}{8\pi \sigma ^4}\). It means that the noise response of an AGDD filter is proportional to the noise variance and to the square of the anisotropic factor and inversely proportional to the power of four of the scale factor. Considering the use of the extracted intensity variation information to depict the intensity variation differences between step edges and corners and the capability on noise suppression, the scale factors with \(\sigma _1^2=1.5\), \(\sigma _2^2=3\), and \(\sigma _3^2=4.5\) are used in the proposed detector. The next step is to discuss the selection of the number of directions and the anisotropic factor.

Under this evaluation criterion, we first fix the anisotropic factor at \(\rho ^2 = 1.5\) and check the average repeatability of the proposed method with different numbers of directions. It can be observed from Table 1 that the proposed method achieves the best performance when K is 8. Secondly, we fix the number of directions at \(K = 8\) and check the average repeatability of the proposed method with different anisotropic factors. It can be observed from Table 1 that with \(K = 8\), the proposed method achieves the best performance when \(\rho ^2\) is 1.5. From this experiment, we found that the number of directions has a great influence on the performance under image rotation transformations, as shown in Fig. 6a. With \(K = 2\), the performance of the proposed method drops dramatically in the case of image rotation transformation. The reason is that the AGDD filters with two directions cannot extract enough intensity variation information and cannot accurately detect corners under image rotation transformations. Meanwhile, we also found that the anisotropic factor has a great influence on the performance of the proposed method under lossy JPEG compression and additive white Gaussian noise, as shown in Fig. 6b, c. With the anisotropic factor \(\rho ^2 = 2.5\), the performance of the proposed method drops dramatically in the cases of lossy JPEG compression and additive white Gaussian noise. The reason is that a large anisotropic factor reduces the ability of the AGDD filters to suppress Gaussian noise. Based on the aforementioned analysis, the number of directions \(K = 8\) and the anisotropic factor \(\rho ^2 = 1.5\) are used in the proposed detector.

Then, the proposed approach with the fixed parameter setting has been compared with the ten other detectors (Harris and Stephens 1988; Mikolajczyk and Schmid 2004; Rosten et al. 2010; Lowe 2004; Bay et al. 2006; Alcantarilla et al. 2012; Shui and Zhang 2013; Xia et al. 2014; Yi et al. 2016; DeTone et al. 2018). The results with different rotations, uniform scalings, non-uniform scalings, shear transformations, lossy JPEG compression, and Gaussian noises are shown in Fig. 7. It can be observed that the proposed detector achieves the best performance under this evaluation criterion.

Fig. 7 Average repeatability of the eleven detectors under rotation, uniform scaling, non-uniform scaling, shear transforms, lossy JPEG compression, and additive white Gaussian noises

4.2 Repeatability Score Under Region Repeatability Evaluation

In http://www.robots.ox.ac.uk/vgg/affine/, each image sequence used in the evaluation contains six images of naturally textured scenes with increasing geometric and photometric transformations. The images in a sequence are related by a homography which is provided with the image data (http://www.robots.ox.ac.uk/vgg/affine/). The repeatability score for a given pair of images is computed as the ratio between the number of region-to-region correspondences and the minimum number of regions in one of the images. Two regions are deemed to correspond if the overlap error \(\epsilon \) is sufficiently small. For the region repeatability evaluation (Mikolajczyk et al. 2005), the overlap error is defined as one minus the ratio between the intersection of the regions, \(A\cap H^{\top }BH\), and the union of the regions, \(A\cup H^{\top }BH\),

$$\begin{aligned} \begin{aligned} \epsilon =1-\frac{A\cap H^{\top }BH}{A\cup H^{\top }BH}, \end{aligned}\end{aligned}$$
(31)

where A represents a region in the original image, B represents the corresponding region in the transformed image, and H is the corresponding homography between the original and the transformed image. When the overlap error between two regions is less than \(40\%\), a correspondence is detected. The repeatability score is defined as

$$\begin{aligned} \begin{aligned} RS_i = \frac{CR_{1i}}{\text {min}(C_1,C_i)}, \end{aligned}\end{aligned}$$
(32)

where \(CR_{1i}\) is the number of correspondences between the original image and the \(i \)-th transformed image (\(i=1,\ldots ,6\)), \(C_1\) is the number of the detected corners from the original image, and \(C_i\) is the number of the detected corners from the \(i \)-th transformed image.
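For illustration only, the sketch below computes a simplified version of Eq. (32) in which reference corners are mapped by the homography H and matched to the nearest detection within a pixel tolerance; the official protocol instead uses the elliptical region-overlap error of Eq. (31), and the tolerance value here is an arbitrary assumption.

import numpy as np

def repeatability_score(corners_ref, corners_i, H, tol=1.5):
    """Simplified Eq. (32): reference corners (x, y) are mapped into the i-th image by the
    3x3 homography H and matched to the nearest detection within tol pixels."""
    pts = np.hstack([corners_ref, np.ones((len(corners_ref), 1))])
    proj = (H @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]                       # homogeneous normalization
    d = np.linalg.norm(proj[:, None, :] - corners_i[None, :, :], axis=2)
    correspondences = int(np.sum(d.min(axis=1) <= tol))
    return correspondences / min(len(corners_ref), len(corners_i))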

In this experiment, six image sequences from http://www.robots.ox.ac.uk/vgg/affine/ are selected for performance evaluation and two image sequences (large zooming and rotations) are discarded. The reason is that an appropriate descriptor is usually needed to handle large image zooming and rotations (Duval-Poo et al. 2015). The threshold for each method is tuned to extract about 1,000 corners from each input image. The repeatability scores for the six image sequences are illustrated in Fig. 8. Compared with the other ten methods, the proposed method achieves the best performance for the ‘Trees’, ‘Bikes’, ‘Ubc’, and ‘Leuven’ images. For the ‘Wall’ and ‘Graffiti’ images, the proposed method obtains a moderate performance. It is worth noting that the performance of the other methods varies greatly across the different image sequences. The main reason is that the issue of how to effectively obtain the intensity variation information from the input images has not been considered in the other ten methods. In conclusion, the proposed method achieves the best overall detection performance on the region repeatability evaluation.

Fig. 8 Comparison of different corner detectors on six image sequences, a Trees (image blur), b Bikes (image blur), c Ubc (image compression), d Leuven (light change), e Wall (viewpoint change), and f Graffiti (viewpoint change)

4.3 Repeatability Metric Under the DTU-Robots Dataset

In the DTU-Robots dataset (Aanæs et al. 2012), the performances of the detectors are evaluated under viewpoint, scale, and light changes using a large database of images with repeatability metric as a performance measure. The camera is placed at 119 positions in three horizontal paths (Arc 1, Arc 2, and Arc 3) and along a linear path (Linear path) in front of 60 scenes. For each scene, 119 images of 1,200\(\times \)1,600 pixels are acquired from the 119 camera positions. The center image which is the closest to the scene is chosen as the reference image. In the first evaluation setting, all feature points found in each image are compared with the points extracted from the reference image. Meanwhile, to simulate natural scenes, light varies from being diffuse on an overcast day to highly directional in sunshine and the scene is illuminated by 18 individually controlled light emitting diodes, which can be combined to provide a highly controlled and flexible light setting. In the second evaluation setting, the scene relighting has been carried out both from right to left and from back to front to investigate the sensitivity of the feature detectors to changes of lightings. At a camera position, ten different illumination settings are configured by changing the lighting directions. Then, ten different images are obtained from ten different illumination settings. All feature points found in each image are compared with the points extracted from the reference image (the tenth image is chosen as the reference image in this evaluation setting).

In this experiment, the repeatability metric for one pair of images is used as a performance measure which is defined as

$$\begin{aligned} \begin{aligned} {{R_{\text {metric}}}} = \frac{{M_{\text {corresp}}}}{{M_{\text {total}}}}, \end{aligned}\end{aligned}$$
(33)

where \(M_{\text {corresp}}\) is the number of correspondences between the reference image and each image, and \(M_{\text {total}}\) is the number of the detected corners from the reference image. A point in the reference image is marked as a correspondence point if it meets the following three criteria.

  • Epipolar Geometry: Consistency with epipolar geometry is used as the first evaluation criterion. The camera positions provide the basis for the relationship between points in one image and associated epipolar lines in another. Points are eliminated if they are more than 2.5 pixels away from the epipolar line.

  • Surface Geometry: 3D reconstruction is used as the second evaluation criterion. Two points are considered a positive match if their 3D positions lie within a window with a radius of 10 pixels of the scene surface obtained from the structured-light reconstruction. Conversely, points falling within a window with a radius of 10 pixels that has no reconstruction are removed.

  • Absolute Scale: Scale consistency is used as the third evaluation criterion. The output scale of the point and the output scale of the corresponding point in the other test image should be within a factor of 2 of each other.
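To make the correspondence test concrete, the sketch below checks the epipolar and scale criteria for one image pair and evaluates Eq. (33); the surface-geometry check is omitted because it requires the structured-light reconstruction of the scene. The function name and array layout are assumptions for illustration.

```python
import numpy as np

def repeatability_metric(pts_ref, scales_ref, pts_test, scales_test, F,
                         epi_tol=2.5, scale_ratio=2.0):
    """Sketch of Eq. (33): fraction of reference corners with a correspondence.

    pts_ref, pts_test       : (N, 2) / (M, 2) arrays of (x, y) corner positions.
    scales_ref, scales_test : detection scales of the corners.
    F                       : 3x3 fundamental matrix from reference to test image.
    The surface-geometry criterion (structured-light reconstruction) is omitted.
    """
    n_corresp = 0
    for (x, y), s in zip(pts_ref, scales_ref):
        # Epipolar line in the test image induced by the reference corner (x, y).
        a, b, c = F @ np.array([x, y, 1.0])
        # Point-to-line distances for all corners detected in the test image.
        d = np.abs(a * pts_test[:, 0] + b * pts_test[:, 1] + c) / np.hypot(a, b)
        # Scale consistency: detection scales must agree within a factor of 2.
        ratio = np.maximum(scales_test / s, s / scales_test)
        if np.any((d <= epi_tol) & (ratio <= scale_ratio)):
            n_corresp += 1
    return n_corresp / len(pts_ref)
```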

Fig. 9

Average repeatability metric for different scene settings, a Arc 1, b Arc 2, c Arc 3, and d Linear path

Fig. 10

Average repeatability metric for the change in light direction for four camera positions (1, 20, 64, and 65). The first row is the average repeatability metric for the change in light direction from right to left (R/L). The second row is the average repeatability metric for the change in light direction from back to front (B/F)

Fifty-four sets of images (a total of 122,094 test images) from the original sixty sets of images (Aanæs et al. 2012) are obtained for our evaluation (the 31st–36th sets cannot be downloaded from Aanæs et al. (2012)). In this experiment, the threshold for each detector is adjusted so that each detector extracts about 2,000 corners from each input image. Figure 9 shows the average match percentage for the 119 camera positions. The average repeatability metrics for the changes in light direction from right to left for four camera positions (1, 20, 64, and 65) are shown in the first row of Fig. 10, and those for the changes in light direction from back to front for the same four camera positions are shown in the second row of Fig. 10. It is worth noting that, following the statement in Aanæs et al. (2012), we left out the FAST corner detector (Rosten et al. 2010) in the light change experiments because of its missing scale information and its generally unreliable performance. The average repeatability metric for each detector is summarized in Table 2. It can be observed that the proposed method outperforms all the other methods by a large margin. The reason is that the proposed method accurately extracts local intensity variation information, which depicts the differences between step edges and corners and allows corners to be extracted accurately from the input images.

Table 2 Average match percentage

4.4 Evaluation of Detection Performance Based on Ground Truth Images

Let \(DC = \{(\hat{x}_{i},\hat{y}_{i}), ~i = 1,2,\ldots ,M_{1}\}\) and \(GT = \{(x_{j},y_{j}),~j=1,2,\ldots ,M_{2}\}\) be the sets of corners detected by a corner detector and of true corners in the ground truth images, respectively. For a corner \((x_j,y_j)\) in set GT, the corner in set DC with the minimal distance to it is found. If this minimal distance is not more than a predefined threshold \(\delta \) (here \(\delta =4\)), corner \((\hat{x}_{i},\hat{y}_{i})\) is treated as correctly detected, and corner \((x_j,y_j)\) in set GT and the detected corner in set DC form a matched pair. Otherwise, corner \((x_j,y_j)\) is counted as a missed corner. Similarly, for a corner \((\hat{x}_{i},\hat{y}_{i})\) in set DC, the corner in set GT with the minimal distance to it is found. If this minimal distance is larger than threshold \(\delta \), then corner \((\hat{x}_{i},\hat{y}_{i})\) is labelled as a false corner. The localization error is computed over all the matched corner pairs. Let \(\{(\hat{x}_{l},\hat{y}_{l}),(x_l,y_l) : ~l=1,2,\ldots ,N_m\}\) be the matched pairs in sets GT and DC. The average localization error is calculated by

$$\begin{aligned} {L_{e}=\sqrt{\frac{1}{N_m}\sum _{l=1}^{N_m}((\hat{x}_{l}-x_l)^{2}+(\hat{y}_{l}-y_l)^{2})}. } \end{aligned}$$
(34)
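Assuming a simple nearest-neighbour assignment (the matching procedure is not constrained to be one-to-one here), the counts of missed and false corners and the localization error of Eq. (34) can be computed as sketched below; the function name is illustrative.

```python
import numpy as np

def evaluate_against_ground_truth(dc, gt, delta=4.0):
    """Missed corners, false corners, and localization error per Eq. (34).

    dc : (M1, 2) array of detected corners.
    gt : (M2, 2) array of ground-truth corners.
    """
    # Pairwise distances between ground-truth and detected corners.
    d = np.linalg.norm(gt[:, None, :] - dc[None, :, :], axis=2)

    # A ground-truth corner is matched if its nearest detection is within delta.
    nearest_dc = d.argmin(axis=1)
    nearest_dist = d[np.arange(len(gt)), nearest_dc]
    matched = nearest_dist <= delta
    missed = int((~matched).sum())

    # A detected corner is false if its nearest ground-truth corner is beyond delta.
    false = int((d.min(axis=0) > delta).sum())

    # Localization error over the matched pairs (root mean squared distance).
    loc_error = np.sqrt(np.mean(nearest_dist[matched] ** 2)) if matched.any() else np.nan
    return missed, false, loc_error
```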

The two commonly used images ‘Geometric’ and ‘Lab’ (Shui and Zhang 2013; Xia et al. 2014) are used for accuracy evaluation. The ground truths for the two test images are shown in Fig. 11. The image ‘Geometric’ contains 84 corners and the image ‘Lab’ contains 249 corners.

Fig. 11

Test images a ‘Geometric’ and b ‘Lab’ and their ground truth corner positions

Fig. 12

Detection results on the test image ‘Geometric’, a Harris (Harris and Stephens 1988), b Harris–Laplace (Mikolajczyk and Schmid 2004), c FAST (Rosten et al. 2010), d ANDD (Shui and Zhang 2013), e ACJ (Xia et al. 2014), and f Proposed detectors

Fig. 13

Detection results on the test image ‘Lab’, a Harris (Harris and Stephens 1988), b Harris–Laplace (Mikolajczyk and Schmid 2004), c FAST (Rosten et al. 2010), d ANDD (Shui and Zhang 2013), e ACJ (Xia et al. 2014), and f Proposed detectors

In this experiment, the proposed method is compared with five detectors (Harris (Harris and Stephens 1988), Harris–Laplace (Mikolajczyk and Schmid 2004), FAST (Rosten et al. 2010), ANDD (Shui and Zhang 2013), and ACJ (Xia et al. 2014)). The detection results of the six detectors are shown in Figs. 12 and 13. The number of missed corners, the number of false corners, and the localization error for each detector are listed in Table 3. For the two test images, the detectors show different detection characteristics. Assuming that missing a corner point and marking a false corner point incur the same loss in detection performance, the total number of missed and false corner points is used to assess the detection performance of a corner detector: the fewer missed and false corner points, the better the detection performance. For the ‘Geometric’ image, the total numbers of missed and false corner points for the Harris, Harris–Laplace, FAST, ANDD, ACJ, and proposed detectors are 144, 132, 112, 29, 28, and 28, respectively. For the ‘Lab’ image, the corresponding totals are 215, 361, 255, 144, 225, and 169, respectively. It can be observed that the proposed detector and the ANDD detector attain the best detection performance.

In addition, corner localization accuracy is another important measure for evaluating corner detectors. For the ‘Geometric’ image, the proposed method attains the smallest localization error, and the ANDD detector attains the second smallest. For the ‘Lab’ image, the ANDD detector attains the smallest localization error, and the proposed detector attains the second smallest. In conclusion, the proposed detector and the ANDD detector attain the best detection performance.

It is worth noting that our research also indicates that the performance of the proposed method can be affected by threshold selection and changes in illumination. Taking a house image as an example, the corner detection results of the proposed method under different illuminations are shown in Fig. 14. It can be seen that some obvious corner points in the window area (marked by ‘’) cannot be detected under different illuminations, as shown in Fig. 14c, d. The reason is that the directional derivatives \(\xi _{\sigma ,\rho }(\theta )\) in the window area are very small.

4.5 Execution Time and Memory Usage

The proposed corner detector has been implemented in MATLAB (R2017b) on a 2.81 GHz CPU with 16 GB of memory. For different images (http://www.robots.ox.ac.uk/vgg/affine/), the thresholds for the Harris, DoG, KAZE, ANDD, ACJ, and the proposed methods are tuned to detect around 2,000 features, and each detector is executed 100 times. The corresponding execution times and memory usages are shown in Table 4. The codes for the Harris, DoG, KAZE, ANDD, and ACJ methods are also written in MATLAB. It can be found that the memory usage of the proposed method is in the middle range among all the compared methods. Meanwhile, it can be observed that the proposed method cannot meet the needs of real-time applications; it can be implemented on a GPU (Cornelis and Van Gool 2008) or an FPGA (Huang et al. 2012) to improve its speed.

Table 3 Performance comparison for the six detectors on two ground truth test images
Fig. 14

Examples of the corner detection results of the proposed method under different illuminations

4.6 Application for 3D Reconstruction

In order to verify the performance of the proposed corner detector in real tasks, 3D reconstruction based on the proposed corner detection method is carried out. Our 3D reconstruction process is based on the structure from motion technique in Hartley and Zisserman (2004) and Snavely et al. (2006), which aims to recover camera parameters, pose estimates, and a sparse 3D reconstruction from image sequences. In this experiment, two datasets (Aanæs et al. 2012; Wilson and Snavely 2014) are used for 3D reconstruction; they represent two typical image collection situations for 3D reconstruction applications. The first dataset (Aanæs et al. 2012) contains high resolution images captured from 49 fixed viewpoints and is widely used in applications that reconstruct a specific scene or object. The second dataset (Wilson and Snavely 2014) contains unordered images, many of which are distorted, and is widely used in applications that reconstruct large-scale places such as landmarks or cities.

We combined the proposed corner detector with the SURF descriptor (Bay et al. 2006) for sparse 3D reconstruction and compared it with the SURF method (Bay et al. 2006). The threshold for each detector is adjusted so that each detector extracts about 1,500 corners from each input image, and the SURF descriptor (Bay et al. 2006) uses its default scales. For each scene, forty images are selected for 3D reconstruction. The results of the sparse 3D reconstruction are shown in Fig. 15. In this experiment, the number of reconstructed 3D points is used as the performance indicator for the two methods. For the ‘Rabit’ images, the SURF method and the proposed method used 7,426 and 8,562 points for 3D reconstruction, respectively. For the ‘Alamo’ images, the SURF method and the proposed method used 21,322 and 25,680 points for 3D reconstruction, respectively. It can be observed that the sparse 3D reconstruction from the proposed method contains more scene structure information. The reason is that the proposed method has the ability to accurately extract corners from the input images.
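As an illustration of how a point detector can be combined with the SURF descriptor in such a pipeline, the sketch below describes detector output with SURF and applies ratio-test matching. It assumes an OpenCV build with the non-free contrib SURF module, uses a hypothetical `detect_corners` stand-in for the proposed detector, and leaves out the full structure-from-motion stage (Snavely et al. 2006).

```python
import cv2
import numpy as np

def match_with_surf(img1, img2, detect_corners, max_corners=1500):
    """Describe detected corners with SURF and match them between two images.

    detect_corners : any corner detector returning (x, y) positions; here it is
                     a hypothetical stand-in for the proposed detector.
    Assumes an OpenCV build with the non-free contrib SURF module enabled.
    """
    surf = cv2.xfeatures2d.SURF_create()  # default SURF parameters
    keypoints, descriptors = [], []
    for img in (img1, img2):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
        pts = detect_corners(gray)[:max_corners]
        kp = [cv2.KeyPoint(float(x), float(y), 6.0) for x, y in pts]
        kp, desc = surf.compute(gray, kp)  # SURF descriptors at the corner locations
        keypoints.append(kp)
        descriptors.append(desc)

    # Ratio-test matching of the SURF descriptors (Lowe's criterion).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    raw = matcher.knnMatch(descriptors[0], descriptors[1], k=2)
    good = [m for m, n in raw if m.distance < 0.75 * n.distance]
    return keypoints, good
```

The matched corner pairs produced this way would then feed an incremental structure-from-motion system to recover camera poses and the sparse point cloud.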

5 Conclusion

The contributions of the paper include six aspects. First, we proved for the first time that the existing intensity variation based corner detectors using first-order derivatives along only the horizontal and vertical directions cannot effectively detect corners; it is necessary to extract image local intensity variation information using the AGDD filters along multiple directions. Second, the properties of the anisotropic and isotropic Gaussian directional derivative representations of step edges, L-type corners, and other types of corners are investigated. Third, a new intensity variation extraction technique is presented which has the ability to accurately depict the intensity variation differences between step edges and corners. Fourth, a multi-directional structure tensor with multiple scales is derived for corner detection. Fifth, a new corner measure and a new corner detection algorithm are presented. Sixth, the proposed detector outperforms ten state-of-the-art corner detectors in terms of average repeatability (under affine image transformations, JPEG compression, and noise degradation), region repeatability, repeatability metric, detection accuracy, and localization error. In our approach, the AGDD filters can also be replaced by other filters, such as shearlet, Gabor, or anisotropic diffusion filters, for corner detection. The proposed corner detector also has great potential to be applied in object tracking and many other fields. The program and demo for our corner detection can be accessed at http://vision-cdc.csiro.au/corner1st/.

Table 4 Execution time and memory usage comparisons (image sizes in pixels, execution times in seconds, memory usage in MB)
Fig. 15

Test image ‘Rabit’ (Aanæs et al. 2012) and test image ‘Alamo’ (Wilson and Snavely 2014) are shown in (a) and (d) in the first column. Their corresponding sparse 3D reconstruction results of the SURF method (Bay et al. 2006) and the proposed method are shown in the second and the third columns respectively