1 Introduction

Matching is the process of determining correspondences between images of the same scene acquired under different imaging conditions [1]. This process is an essential step in many applications such as image registration [2], change detection [3], image fusion [4], image mosaicking [5], and 3D reconstruction [6]. Generally, matching techniques can be categorized into two types: area-based and feature-based approaches [7]. Area-based methods use the distribution of grey-levels, directly comparing image windows of the same dimensions. To this end, using some similarity (or difference) intensity-based criterion, two images are statistically compared in order to locate the positions of maximum similarity, which represent corresponding regions [8]. Although area-based methods can reach very high matching accuracy, their performance degrades in areas with uniform texture [9]. Area-based methods are appropriate almost exclusively for images with low distortions (i.e., small differences in scale, rotation, and viewing angle between images). Noise, illumination changes, and images taken by different sensors cause variations in image intensity; thus, area-based methods may not work properly for these types of images [7]. In feature-based methods, features (including points, lines, and regions) are first extracted from the images [10,11,12,13]. Then, using the attributes of each feature, such as gradient magnitude and orientation, a descriptor is generated as a feature vector according to which the matching process is performed under some similarity criteria [14, 15]. ‘Features from accelerated segment test’ (FAST) [16], ‘oriented FAST and rotated BRIEF’ (ORB) [17], SIFT [18], Zernike moments [19, 20], ‘gradient location-orientation histogram’ (GLOH) [21], and ‘affine mirror invariant feature transform’ (AMIFT) [22] are some examples of detectors and descriptors that have been used in feature-based image matching. Compared to area-based methods, feature-based ones are more reliable and perform better when dealing with geometric and radiometric distortions.

The SIFT algorithm is an important feature-based matching algorithm presented by David Lowe in 2004. Features extracted by SIFT are invariant to scale and rotation changes and robust to changes in illumination, affine distortion, and noise [18]. SIFT descriptors are also very distinctive and outperform other descriptors under geometric distortions [23]. Due to its high performance, SIFT is extensively applied for matching natural and satellite images [21, 24,25,26,27,28,29,30]. Despite the advantages mentioned, the algorithm has some limitations: high computation time, low controllability of the number of features, insignificance of some detected keypoints, and the destruction of edges and essential details in the image by the Gaussian scale space (GSS) used in the SIFT algorithm [1, 31, 32]. These drawbacks deteriorate the computed features, cause mismatches, and reduce the robustness of the algorithm to affine distortions. In addition, redundant keypoints may be detected, which decrease the matching accuracy while increasing the computational load.

To overcome the above-mentioned problems, several methods have been presented in the literature [1, 30, 32,33,34,35,36,37,38,39,40,41]. In [1], the uniform robust SIFT (UR-SIFT) algorithm was introduced, where keypoints are distributed uniformly in both the spatial and scale directions. In [30], belief propagation (BP)-SIFT was presented, which uses the BP algorithm to increase the number of matches. In [32], the adapted anisotropic Gaussian SIFT (AAG-SIFT) method was proposed, in which an anisotropic Gaussian filter on the SIFT scale space and dominant orientation consistency (DOC) are used to preserve edges and eliminate mismatches, respectively. In [33], the speeded-up robust features (SURF) algorithm was presented, which is faster than SIFT. In [34], the affine SIFT (ASIFT) algorithm was presented, which is fully invariant to affine distortions. In [35], nonlinear diffusion in the SIFT scale space and phase congruency were used to preserve image details and to eliminate mismatches, respectively. In [36], bilateral filter SIFT (BFSIFT) was presented, in which an anisotropic scale space (ASS) replaces the GSS of SIFT to preserve details in synthetic aperture radar (SAR) images. In [37], the SAR-SIFT algorithm was introduced; for matching SAR images, this method uses a SIFT-like scheme in which a SAR-Harris scale space is created to detect keypoints, and a circular window around each keypoint is used for orientation assignment and descriptor extraction. In [38], the locality preserving matching (LPM) method was proposed and applied to improve the matching accuracy of the SIFT algorithm. In [39], the multi-dimensional scaling SIFT (MDS-SIFT) method was presented, which uses the MDS algorithm [42] to reduce the dimensions of the SIFT descriptor vector and local binary pattern (LBP) features to eliminate mismatches. In [40], to reduce the number of keypoints and to increase the running speed and matching accuracy, an adaptive-threshold Canny operator was applied before the SIFT algorithm. In [41], the registration of SAR images was enhanced by improving the SIFT descriptor and presenting an optimized matching method.

Another important problem of SIFT is the existence of redundant keypoints, which lead to similar descriptors and consequently possible interference in the matching process. Recently, a method was proposed in [31] that aimed to eliminate redundant keypoints using a redundancy index. The SIFT algorithm based on the ‘redundant keypoint elimination method’ (RKEM-SIFT) could eliminate redundant keypoints well and achieved very good results in image registration. In this algorithm, once the classic SIFT algorithm identified the keypoints, the distances between keypoints in each image were computed. Afterwards, for any pair of keypoints with a distance less than a pre-determined threshold value, one keypoint was deleted and the other kept for the matching process. The RKEM–SIFT algorithm has been used effectively in image registration [31], retinal image matching [43], image mosaicking [44], and copy-move forgery detection [45]. Despite the benefits reported for the RKEM–SIFT algorithm, one major shortcoming was that the mentioned threshold value had to be determined experimentally. When choosing the threshold value in RKEM–SIFT, the image type (e.g., natural images, remote sensing images, etc.) was not considered as an operational factor. Accordingly, some redundant keypoints could not be eliminated, or some important ones might be deleted unwantedly. Furthermore, the same threshold value was used for both the reference and sensed images, and the distortion between the images was not considered. This insensitive selection increased the mismatch rate.

In this paper, an adaptive RKEM is presented to eliminate redundant keypoints of the SIFT algorithm. The proposed method is based on adaptive calculation of the threshold in RKEM–SIFT. The threshold value is determined from the dispersion (variance) of the keypoint distances. Hence, the type of the images and the distortions between them are considered, which improves the efficiency of RKEM–SIFT in eliminating redundant keypoints and consequently enhances the image matching performance. To do this, the distances between all keypoints in the image are first calculated and their histogram is found. To determine the number and width of the histogram bins, we incorporate the number of keypoints and the minimum and maximum of the distance distribution. Then, following a novel approach, we compute a maximum distance threshold value beyond which we do not search for redundant keypoints. This value truncates the histogram of the distances, so far-apart distances are ignored when searching for redundant keypoints. A collection of integer values smaller than this bound is considered. Afterwards, for each value, the set of distances greater than that value is generated, and its variance is calculated. The set with the least variance is chosen, and its corresponding value is called the adaptive threshold. With respect to this distance threshold, the keypoints are assessed as redundant or not. As another improvement over RKEM–SIFT, the threshold value of each image is found separately, again leading to a more successful image matching process. Finally, the matching process is performed on the keypoints remaining in the reference and sensed images. The proposed method is applied to four datasets containing images with different types of distortions. Comparing the results of this and some other related methods demonstrates the superiority of our approach in terms of well-known performance criteria.

The rest of the paper is organized as follows. The RKEM–SIFT method is reviewed in Sect. 2. Section 3 is dedicated to the description of the proposed adaptive RKEM method in the matching process. Section 4 presents the simulation results. Finally, the paper is concluded in Sect. 5, where some ideas for future work are also presented.

2 RKEM–SIFT

This section describes the stages of the RKEM method, which reduces the redundant keypoints of the SIFT algorithm [31]. RKEM-SIFT consists of four steps: (1) extracting the scale space extrema, (2) improving the localization accuracy and removing unstable extrema, (3) assigning an orientation to each feature, and (4) eliminating redundant keypoints. Details of these steps are as follows.

2.1 Extraction of scale space extrema

The first step in the SIFT algorithm is the detection of those points whose locations are independent of changes in the image scale. For this purpose, SIFT finds the stable features of the image on different scales using the ‘scale space’. Figure 1 shows how the scale space is created.

Fig. 1  Scale space creation in the SIFT method [45]

At first, the input image is doubled in size and taken as the first level of the first octave. Then, this image is iteratively convolved with a Gaussian kernel to create the Gaussian scale space images of each octave, as follows:

$$ L(x, y, \sigma) = G(x, y, \sigma) \otimes I(x, y) $$
(1)

where \(I(x, y)\) is the image, \(G(x, y, \sigma)\) is the Gaussian kernel defined in Eq. (2), and \(\otimes\) denotes the convolution operator.

$$ G(x, y, \sigma) = \frac{1}{2\pi \sigma^{2}}\exp \left( -\frac{x^{2} + y^{2}}{2\sigma^{2}} \right) $$
(2)

The parameter \(\sigma\) indicates the scale of each image, with initial value \(\sigma_{0} = 1.6\). Once the scale space is generated, difference-of-Gaussians (DOG) images are computed by taking the difference between each adjacent pair of Gaussian images in each octave, as follows:

$$ D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) $$
(3)

where \(L(x, y, \sigma)\) and \(L(x, y, k\sigma)\) are the Gaussian images at scales \(\sigma\) and \(k\sigma\), respectively. To find the stable features in the DOG images of each octave, each pixel is compared with its eight neighbouring pixels at its own scale, in addition to its nine upper-scale and nine lower-scale neighbours in the DOG images. If it is an extremum (maximum or minimum), it is stored as a candidate keypoint. Finally, unstable extrema are removed as described in the next subsection.
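To make this construction concrete, the following Python sketch builds a single-octave Gaussian scale space (Eq. 1), forms the DOG images (Eq. 3), and flags 26-neighbourhood extrema. It is a minimal illustration under our own choices, not the implementation evaluated in this paper; the function name, the use of scipy's gaussian_filter, and the explicit pixel loops (kept for clarity rather than speed) are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, num_scales=4, sigma0=1.6, k=2 ** 0.5):
    """Single-octave sketch: Gaussian scale space, DOG images,
    and candidate extrema over 26 neighbours."""
    # Gaussian scale space (Eq. 1); two extra images so that every
    # interior DOG layer has an upper and a lower scale neighbour
    gauss = [gaussian_filter(image.astype(float), sigma0 * k ** i)
             for i in range(num_scales + 2)]
    # DOG images (Eq. 3): differences of adjacent Gaussian images
    dog = [g2 - g1 for g1, g2 in zip(gauss[:-1], gauss[1:])]
    candidates = []
    for s in range(1, len(dog) - 1):                 # interior scales only
        for y in range(1, image.shape[0] - 1):
            for x in range(1, image.shape[1] - 1):
                cube = np.stack([d[y - 1:y + 2, x - 1:x + 2]
                                 for d in dog[s - 1:s + 2]])
                v = dog[s][y, x]
                # extremum among the 8 + 9 + 9 = 26 neighbours
                if v == cube.max() or v == cube.min():
                    candidates.append((x, y, s))
    return candidates
```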

2.2 Keypoint localization improvement and unstable extrema removal

In this step, the low-contrast features and those located on edges are removed. The algorithm finds the precise locations of the keypoints by interpolating neighbouring data points for each keypoint. In accordance with Lowe's proposal [14], features with contrast values less than a pre-determined threshold \((T_{c} = 0.03)\) are considered unstable and are therefore removed. Afterwards, the edge-located features are removed, since they are noise-sensitive [14].
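A minimal sketch of these two stability tests on a single DOG image follows. The contrast test uses \(T_{c} = 0.03\) as stated above; the edge test uses the standard Hessian-ratio criterion with \(r = 10\), which is Lowe's conventional value and an assumption here, since the text does not state it. Sub-pixel interpolation is omitted for brevity.

```python
import numpy as np

def is_stable(dog, x, y, contrast_thr=0.03, edge_ratio=10.0):
    """Reject low-contrast and edge-located candidates in a DOG image."""
    if abs(dog[y, x]) < contrast_thr:        # low-contrast test (T_c = 0.03)
        return False
    # 2x2 Hessian from finite differences
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2 * dog[y, x]
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2 * dog[y, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:                             # principal curvatures of opposite sign
        return False
    # edge test: trace^2 / det must stay below (r + 1)^2 / r
    return tr ** 2 / det < (edge_ratio + 1) ** 2 / edge_ratio
```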

2.3 Assigning orientation to each feature

Next, an orientation is determined for each keypoint. To do this, a circular mask (with radius \(6\sigma\)) is considered around each feature. Then, the orientation histogram of the pixels inside that mask is computed. Lastly, based on this histogram, one or possibly multiple main orientations are selected for each keypoint.
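The sketch below illustrates this stage under common SIFT conventions that the text leaves implicit and that we assume here: a 36-bin histogram and acceptance of every peak above 80% of the highest bin (this is how a keypoint can receive multiple orientations). The usual Gaussian weighting of the magnitudes is omitted.

```python
import numpy as np

def dominant_orientations(L, x, y, sigma, num_bins=36, peak_ratio=0.8):
    """Orientation histogram over a circular mask of radius 6*sigma
    around (x, y) in the Gaussian image L."""
    r = int(round(6 * sigma))
    hist = np.zeros(num_bins)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            inside = 0 < yy < L.shape[0] - 1 and 0 < xx < L.shape[1] - 1
            if dx * dx + dy * dy > r * r or not inside:
                continue
            gx = L[yy, xx + 1] - L[yy, xx - 1]        # horizontal gradient
            gy = L[yy + 1, xx] - L[yy - 1, xx]        # vertical gradient
            mag = np.hypot(gx, gy)
            ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0
            hist[int(ang / (360.0 / num_bins)) % num_bins] += mag
    if hist.max() == 0:
        return []
    # every bin within peak_ratio of the highest bin yields an orientation
    return [b * (360.0 / num_bins) for b in range(num_bins)
            if hist[b] >= peak_ratio * hist.max()]
```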

2.4 Redundant keypoint elimination

The RKEM–SIFT was proposed to reduce the redundant keypoints. The method is applied to the reference and sensed images separately, as follows (a code sketch of the whole procedure appears at the end of this subsection).

  • Stage 1: From the image, keypoints are extracted.

  • Stage 2: The Manhattan distance between any pair of keypoints (say \(p_{i}\) and \(p_{j}\)) is calculated as:

    $$ d(p_{i}, p_{j}) = \sum\nolimits_{k = 1}^{l} \left| p_{i}(k) - p_{j}(k) \right| $$
    (4)

where \(p_{i}(k)\) and \(p_{j}(k)\) are the \(k\)th coordinates of the keypoints \(p_{i}\) and \(p_{j}\), respectively, and \(l\) is the length of the keypoint coordinate vector. Then, the distances between keypoint \(p_{i}\) and all other keypoints are summed up:

$$ SD(p_{i}) = \sum\nolimits_{s = 1}^{N} d(p_{i}, p_{s}) $$
(5)

In Eq. (5), \(N\) is the number of keypoints in the image, and \(d(p_{i}, p_{s})\) is the distance between keypoints \(p_{i}\) and \(p_{s}\) found using Eq. (4). The redundancy index (RI) of keypoint \(p_{i}\) is defined as:

$$ RI(p_{i}) = \frac{1}{SD(p_{i})} $$
(6)
  • Stage 3: A pre-defined threshold \(T_{d}\) is set, with respect to which the unnecessary keypoints are removed. In the RKEM method, this threshold is found experimentally, and the image content is not taken into account.

  • Stage 4: For any two distinct keypoints \(p_{i}\) and \(p_{j}\), if their distance is less than the threshold value \(T_{d}\), one of them is considered redundant according to Eq. (7) and is deleted. In fact, the redundant keypoint is the one with the larger redundancy index, i.e., the smaller sum of distances.

    $$ d(p_{i}, p_{j}) < T_{d} \; \Rightarrow \; \text{remove} \begin{cases} p_{i} & \text{if } RI(p_{i}) > RI(p_{j}) \\ p_{j} & \text{otherwise} \end{cases} $$
    (7)

This method has a major problem: the threshold value is fixed empirically for all images. With respect to this single value, the redundancy of keypoints in both the reference and sensed images is evaluated. If the threshold is not determined properly, redundant keypoints are not removed or important ones are removed, since the distribution of the keypoints in an image is not homogeneous. Thus, the matching process may be disturbed.
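The following Python sketch summarizes stages 1–4 under the stated formulas; the fixed threshold \(T_{d}\) must still be supplied by the user, which is exactly the weakness the next section addresses. Variable names are ours, and keypoints are taken as (x, y) coordinate pairs, so \(l = 2\) in Eq. (4).

```python
import numpy as np

def rkem(keypoints, threshold):
    """RKEM (Sect. 2.4): Manhattan distances (Eq. 4), distance sums
    (Eq. 5), redundancy indices (Eq. 6), and removal rule (Eq. 7)."""
    pts = np.asarray(keypoints, dtype=float)                   # N x 2 array
    d = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2)  # Eq. (4)
    sd = d.sum(axis=1)                                         # Eq. (5): SD(p_i)
    ri = 1.0 / sd                                              # Eq. (6): RI(p_i)
    keep = np.ones(len(pts), dtype=bool)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if keep[i] and keep[j] and d[i, j] < threshold:
                # Eq. (7): the keypoint with the larger RI is redundant
                keep[i if ri[i] > ri[j] else j] = False
    return pts[keep]
```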

3 The proposed method

The image matching process consists of three steps (see Fig. 2). First, the initial features of both images are detected using the classical SIFT algorithm; then, the final features are selected based on the proposed adaptive RKEM method. Finally, the matching process is performed.

Fig. 2  Flowchart of the proposed image matching approach

3.1 Initial features detection

In the first step, the scale-space extrema are detected, the keypoints are accurately localized, and an orientation is assigned to each keypoint based on the classical SIFT algorithm [18], as shown in the left column of Fig. 2.

3.2 The proposed adaptive RKEM method

In this step, the proposed adaptive RKEM method is used for extracting and selecting the final features. An integer upper bound on the candidate threshold values (\(n_{\max}\), computed in the 2nd stage below) is determined first. Then, the following stages are performed for the reference and sensed images (a consolidated code sketch follows at the end of this subsection):

  • 1st stage: The Manhattan distance between every pair of keypoints (\(p_{i}\) and \(p_{j}\)) in the image is calculated according to Eq. (4).

  • 2nd stage: The histogram of the keypoints' distances is computed, in which the number and the width of the bins are determined according to the following formulas:

    $$ nb = \max \left( \left\lfloor N/10 \right\rfloor, 10 \right) $$
    (8)
    $$ bw = \frac{Mxd - Mnd}{nb} $$
    (9)

In Eq. (8), \(nb\) is the number of bins and \(\left\lfloor \cdot \right\rfloor\) is the floor operation. In Eq. (9), \(bw\) is the bin width, and \(Mxd\) and \(Mnd\) are the maximum and minimum keypoint distances, respectively. Then, the ratio of the frequencies of adjacent bins, from the lower bin toward the higher one, is computed as in Eq. (10). The first bin whose frequency ratio to its next bin is less than 0.5 is selected, and the floor of the upper edge of that bin is taken as \(n_{\max}\). In Eq. (10), the \(k\)th bin (\(\text{bin}_{k}\)) covers the distances in the range \([Mnd + (k-1)\,bw, \; Mnd + k\,bw]\), and \(m_{k}\) is its frequency in the histogram.

$$ \begin{gathered} \hat{k} = \min \left\{ k \;\middle|\; m_{k}/m_{k+1} < 0.5 \right\}, \quad k = 1, \ldots, nb \\ n_{\max} = \left\lfloor \max \, \text{bin}_{\hat{k}} \right\rfloor \end{gathered} $$
(10)
  • 3rd stage: For each threshold value \(n\), \(n = 1, \ldots, n_{\max}\), the following operation is carried out. If the distance between keypoints \(p_{i}\) and \(p_{j}\) is greater than \(n\), that distance is stored in the set \(R_{n}\); i.e., \(R_{n}\) collects all pairwise keypoint distances greater than \(n\):

    $$ R_{n} = \left\{ d(p_{i}, p_{j}) \;\middle|\; d(p_{i}, p_{j}) > n; \;\; i, j = 1, \ldots, N; \; i \ne j \right\} $$
    (11)
  • 4th stage: The variance of the set \(R_{n}\) is calculated as:

    $$ \sigma_{n}^{2} = \mathrm{var}(R_{n}), \quad n = 1, 2, \ldots, n_{\max} $$
    (12)
  • 5th stage: Since the aim of the proposed algorithm is to find the optimal threshold for removing redundant keypoints, the candidate \(n\) whose set \(R_{n}\) has the minimum variance of keypoint distances is selected according to Eq. (13). A smaller variance means a smaller dispersion of the remaining distances, indicating fewer redundant keypoints.

    $$ n_{opt} = \arg \min_{n} \left( \sigma_{n}^{2} \right) $$
    (13)
  • 6th stage: If the distance between any two distinct keypoints is less than the optimal threshold \(n_{opt}\), the unnecessary keypoint is removed; that is, the keypoint with the higher RI is considered redundant and is discarded.

The presented adaptive RKEM method automatically finds the threshold value and is able to obtain independent threshold values for the reference and sensed images. This method leads to accurate removal of redundant keypoints and ultimately increases the matching accuracy.
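A consolidated sketch of the 1st–5th stages is given below, following Eqs. (4) and (8)–(13) as stated; the fallback used when no bin satisfies Eq. (10) is our own safeguard, not part of the paper. The returned \(n_{opt}\) can then be passed, per image, to a RKEM-style elimination such as the rkem sketch in Sect. 2.4, e.g., rkem(kps, adaptive_threshold(kps)).

```python
import numpy as np

def adaptive_threshold(keypoints):
    """Adaptive threshold search (Sect. 3.2), run separately
    on the reference and sensed images."""
    pts = np.asarray(keypoints, dtype=float)
    N = len(pts)
    d = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2)   # Eq. (4)
    dist = d[np.triu_indices(N, k=1)]          # all pairwise distances, i != j
    nb = max(N // 10, 10)                      # Eq. (8): number of bins
    counts, edges = np.histogram(dist, bins=nb)   # bin width bw as in Eq. (9)
    # Eq. (10): first bin whose frequency ratio to the next bin is < 0.5;
    # n_max is the floor of that bin's upper edge
    n_max = None
    for k in range(nb - 1):
        if counts[k + 1] > 0 and counts[k] / counts[k + 1] < 0.5:
            n_max = int(np.floor(edges[k + 1]))
            break
    if n_max is None or n_max < 1:
        n_max = int(np.floor(dist.max()))      # safeguard (our assumption)
    # Eqs. (11)-(12): variance of the distances greater than each candidate n
    variances = {n: np.var(dist[dist > n])
                 for n in range(1, n_max + 1) if np.any(dist > n)}
    return min(variances, key=variances.get)   # Eq. (13): n_opt
```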

3.3 Matching process

Matching is the process of determining the correspondence between two or more images of the same scene acquired at different times, from different angles, or by different sensors. To carry out image matching, there are different descriptors, which can generally be categorized into three groups: distribution-based descriptors, spatial-frequency-based descriptors, and differential descriptors [15]. This paper uses SIFT, which is a distribution-based descriptor. To create a descriptor in the SIFT algorithm, a 4 × 4 grid is first placed around each feature in its associated Gaussian image. The dimensions of this area are selected according to the scale of the feature, so that each cell is a square with an edge equal to three times the feature's scale. Then, the grid coordinates are rotated by the main orientation of the feature. The gradient magnitudes and orientations of the pixels inside the rotated area are calculated, and the gradient orientations are likewise rotated relative to the feature's main orientation. Finally, the SIFT descriptor is created as a vector with 128 components (4 × 4 cells × 8 orientation bins). After creating one descriptor per feature, matching is done by calculating the Euclidean distances between the descriptors of the two images. To perform nearest neighbor distance ratio (NNDR) matching, for each keypoint descriptor in the reference image, its Euclidean distances to the first and second nearest neighbors in the sensed image are computed and their ratio is found. If this value is less than a threshold, the pair is considered a match. In this method, the descriptor \(RD_{k}\) in the reference image and the descriptor \(SD_{i}\) in the sensed image are matched if Eq. (14) holds [18, 21].

$$ \frac{\left\| RD_{k} - SD_{i} \right\|}{\left\| RD_{k} - SD_{j} \right\|} < T $$
(14)

where \(SD_{i}\) and \(SD_{j}\) are the first and second nearest descriptors to the descriptor \(RD_{k}\), respectively.
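A minimal NNDR sketch under Eq. (14), using \(T = 0.8\) as in the experiments below; it assumes at least two descriptors in the sensed image and uses brute-force search, since the paper does not specify a search structure.

```python
import numpy as np

def nndr_match(ref_desc, sens_desc, T=0.8):
    """Match 128-D descriptors by the nearest-neighbour distance ratio."""
    ref_desc = np.asarray(ref_desc, dtype=float)
    sens_desc = np.asarray(sens_desc, dtype=float)
    matches = []
    for k, rd in enumerate(ref_desc):
        dist = np.linalg.norm(sens_desc - rd, axis=1)  # to all sensed descriptors
        i, j = np.argsort(dist)[:2]                    # first and second nearest
        if dist[i] / dist[j] < T:                      # Eq. (14)
            matches.append((k, int(i)))
    return matches
```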

4 Simulation and experimental results

In order to evaluate the proposed method, four databases were examined. The first database contained different types of images with different distortions (affine distortions, scale changes, illumination changes, and blur), with 10 images per distortion type. The image sizes differ among the distortion types: for scale changes the images are \(800 \times 640\), and for illumination changes, blur, and rotation changes they are \(800 \times 640\), \(921 \times 614\), and \(1000 \times 700\), respectively. These images were taken from (https://lear.inrialpes.fr/people/mikolajczyk/Database/index.html) [46]. The second database contained ten pairs of natural images with different distortions (scale changes and rotation changes), taken from [25]. The third database contained 6 pairs of multi-modal remote sensing images with different contexts, such as urban areas and natural landscapes, taken from [1]. The fourth database contained one pair of images with perspective distortions, taken from (https://github.com/jagracar/OpenCV-python-tests/blob/master/OpenCV-tutorials/data/box.png) [47]. In these experiments, in order to prevent the selection of threshold values from affecting the image matching results, the algorithm parameters were kept fixed as follows. The scale space in the SIFT algorithm was created with five octaves, each with four scale layers. The threshold \(T_{C}\) was set to 0.03 in order to extract a suitable number of features, according to Lowe's suggestion [14]. The threshold \(T\) in Eq. (14) was set to 0.8 in order to remove false matches (mismatches), as Lowe suggested [18, 21]. The threshold value of the RKEM algorithm in Eq. (7) was set to 3 in order to remove redundant keypoints, according to [25]. In the A2SIFT algorithm, the threshold value \(n\) was set between 0.01 and 0.05 according to [39]. Two sets of experiments were carried out. In the first set, different threshold values in the RKEM–SIFT were studied. In the second set, we evaluated the performance of the proposed method in the matching process. To investigate the effectiveness of the proposed method, it was also compared with some common algorithms, including the standard SIFT [14], A2SIFT [39], and RKEM-SIFT [25]. All the experiments were run on a personal computer with a 2.27 GHz Intel Core i5 and 4 GB of RAM, using MATLAB R2016a.

4.1 Evaluation criteria

Four common criteria, i.e., matching accuracy (Eq. 15), recall (Eq. 16), repeatability (Eq. 17), and root mean square error (RMSE, Eq. 18), are used to evaluate the proposed method.

$$ \text{Accuracy} = TP/M $$
(15)
$$ \text{Recall} = TP/P $$
(16)
$$ F_{r} = TP/\min \left( N_{\text{ref}}, N_{\text{sens}} \right) $$
(17)
$$ \text{RMSE} = \sqrt{ \frac{1}{M} \sum\limits_{i = 1}^{M} \left[ \left( x_{i} - x^{\prime}_{i} \right)^{2} + \left( y_{i} - y^{\prime}_{i} \right)^{2} \right] } $$
(18)

In Eqs. (15)–(18), \(TP\) is the number of correct matches, \(M\) is the total number of matches, and \(P\) is the number of correspondences. \(N_{\text{ref}}\) and \(N_{\text{sens}}\) denote the total numbers of detected features in the reference and sensed images, respectively. Also, in Eq. (18), the pairs \((x_{i}, y_{i})\) and \((x^{\prime}_{i}, y^{\prime}_{i})\) represent the coordinates of the reference and transformed \(i\)th matched keypoints, respectively. If the matching accuracy, recall, and repeatability are high and the RMSE is low, the performance of the system is acceptable.
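These criteria reduce to a few lines of code; a minimal sketch follows (variable names are ours), assuming the matched reference and transformed coordinates are supplied as M × 2 arrays.

```python
import numpy as np

def matching_metrics(tp, m, p, n_ref, n_sens, ref_pts, tr_pts):
    """Eqs. (15)-(18): accuracy, recall, repeatability, RMSE."""
    accuracy = tp / m                              # Eq. (15)
    recall = tp / p                                # Eq. (16)
    repeatability = tp / min(n_ref, n_sens)        # Eq. (17)
    err = np.asarray(ref_pts, float) - np.asarray(tr_pts, float)
    rmse = np.sqrt(np.sum(err ** 2) / m)           # Eq. (18)
    return accuracy, recall, repeatability, rmse
```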

4.2 Effects of the threshold value in the RKEM–SIFT on blur images

In this experiment, we investigated how to find the optimal threshold value of the RKEM-SIFT algorithm, using pairs of images with the same distortion type (blur). First, a pair of structural images with blur changes was used (Fig. 3); the matching accuracy for different thresholds is shown in Fig. 4. As seen in this figure, the appropriate threshold value in Eq. (7) for these images is 4, as it yields the highest matching accuracy. Next, a pair of textural images with blur changes was used (Fig. 5); the corresponding matching accuracy for different thresholds is shown in Fig. 6. For this pair of images, the optimal threshold value was 6.

Fig. 3  A pair of structural images with blur distortion: a reference image; b sensed image

Fig. 4  Matching accuracy versus threshold value in the RKEM–SIFT method on a pair of structural images with blur distortion

Fig. 5  A pair of textural images with blur distortion: a reference image; b sensed image

Fig. 6  Matching accuracy versus threshold value in the RKEM–SIFT method on a pair of textural images with blur distortion

As seen in Figs. 4 and 6, the optimal threshold values of these two image pairs are different even though the distortion is the same. Figure 3 shows a pair of structural images while Fig. 5 shows a pair of textural images; hence, it can be concluded that the nature of the images affects the choice of the optimal threshold value. In general, besides the nature of the images, the image type and the distortion between the images also play roles in determining the optimal threshold value. Using an adaptive threshold that accounts for the distortion between the images as well as their type and nature, the RKEM-SIFT achieves more accurate matching results.

4.3 Simulation results of the matching process

To evaluate the effectiveness of the proposed Adaptive RKEM method, a series of experiments (Sects. 4.3.1–4.3.8) with various affine distortions, different scales and illuminations, different rotations, JPEG compression, and perspective distortions is designed, and the results are discussed in terms of the matching accuracy, recall, repeatability, and RMSE evaluation criteria.

4.3.1 Matching of images with affine distortions

In this experiment, a set of real-world images with affine distortions (i.e., images differing in scale, translation, rotation, and shearing), taken from [46], is matched to demonstrate the validity of the proposed method. The results are shown in Figs. 7 and 8 and Table 1. As shown in these figures, the number of incorrect matches in the proposed adaptive RKEM is smaller than in the RKEM, A2SIFT, and SIFT methods. Table 1 reports the performance measures of the different matching techniques. The proposed method clearly achieves high rates in terms of matching accuracy and RMSE, reflecting the appropriate operation of the proposed Adaptive RKEM method. The feature extraction time of the proposed method increases because, in addition to calculating the Manhattan distances to find redundant keypoints, the appropriate threshold value for removing them must also be computed. Conversely, the feature matching time of the Adaptive RKEM is reduced because redundant keypoints are removed. As concluded from Table 1, the increase in total time (i.e., the sum of the feature extraction and feature matching times) of the proposed method is negligible compared with its considerable gain in matching accuracy and reduction in RMSE.

Fig. 7  Matching process results of images with different affine distortions. a, b SIFT method [18]; c, d A2SIFT method [48]. In this and the subsequent figures, the left column shows the images with descriptors and the right column shows the matches

Fig. 8  Matching results of images with different affine distortions. a, b RKEM-SIFT [31]; c, d the adaptive RKEM

Table 1 Matching process criteria for different methods on images with different affine distortions

4.3.2 Matching of images with changes in scale

In this test, a couple of images with different scales, taken from [46], is used, and the matching process is evaluated for this case. The squares in Figs. 9 and 10 mark the false matches caused by redundant SIFT features. The figures also show that the unnecessary keypoints are deleted by RKEM–SIFT and that the Adaptive RKEM improves the matching accuracy.

Fig. 9  Matching process results of images with different scales. a, b SIFT method; c, d A2SIFT method

Fig. 10  Matching process results of images with different scales. a, b RKEM–SIFT; c, d the adaptive RKEM

Quantitative results obtained by the different methods are given in Table 2, which shows that the proposed Adaptive RKEM method outperforms the others in terms of accuracy, repeatability, recall, RMSE, and matching run time. This demonstrates that the proposed method is more robust to scale changes than the other methods.

Table 2 Matching process criteria for different methods on ten couples of images with different scales

4.3.3 Matching of images with illumination changes

To demonstrate the performance of the proposed method, a couple of images with different illuminations, taken from [46], is employed in this experiment (Figs. 11, 12). Table 3 indicates that the proposed method obtains the highest alignment accuracy, recall, and repeatability compared with SIFT [18], A2SIFT [48], and RKEM–SIFT. In this experiment, the total run time of the Adaptive RKEM is almost twice that of the classic SIFT method, because most of the extra time is spent finding and removing redundant keypoints.

Fig. 11  Matching process results of images with different illuminations. a, b SIFT method; c, d A2SIFT method

Fig. 12  Matching process results of images with different illuminations. a, b RKEM–SIFT; c, d the adaptive RKEM

Table 3 Matching process criteria for different methods on a couple of images with different illuminations

4.3.4 Matching of images with rotation changes

In this test, ten couples of real-world images with rotation changes (mobile images), taken from [31], are used. A comparison of the results is shown in Fig. 13. According to the RMSE values in this figure, the Adaptive RKEM method significantly outperforms the SIFT, A2SIFT [48], and RKEM–SIFT methods in alignment accuracy for images with rotation changes. A closer comparison also shows that the proposed Adaptive RKEM method obtains a lower RMSE than the others.

Fig. 13  The RMSE results of the 5 matching methods applied to 10 pairs of images with various rotation angles

4.3.5 Matching of images with JPEG compression

In this experiment, 5 pairs of JPEG-compressed images, taken from [46], are used; the RMSE values are reported in Table 4. As shown in this table, the proposed method performs better than the others. The behavior of the matching methods on one pair of images is shown in Fig. 14, in which the mismatches are highlighted with squares. In terms of matching accuracy and number of mismatches, the SIFT algorithm performs the worst, while RKEM–SIFT and Adaptive RKEM obtain acceptable matching indices.

Table 4 Average RMSE results of matching processes in images with JPEG compression
Fig. 14  Matching process results by RKEM–SIFT [31] and the proposed method on images with JPEG compression. a The RKEM–SIFT method; b the proposed Adaptive RKEM–SIFT

4.3.6 Matching of multi-modal remote sensing images

In this test, multi-modal remote sensing images from [1] are used, and the matching process is evaluated for this case. The squares in Fig. 15 mark the false matches caused by redundant SIFT features. As shown in Fig. 15, many false matches appear in the SIFT results that do not occur with the proposed Adaptive RKEM method. It is therefore concluded that removing redundant keypoints improves the matching process.

Fig. 15  Matching process results of multi-modal remote sensing images. a SIFT; b the proposed adaptive RKEM–SIFT

4.3.7 Matching of Images with perspective distortions

To demonstrate the performance of the proposed adaptive RKEM method on images with perspective distortions, some samples from [47] are employed in this experiment. A comparison of the results is shown in Fig. 16.

Fig. 16  Matching process results by RKEM–SIFT [31] and the proposed method on images with perspective distortions. a The RKEM–SIFT method; b the proposed Adaptive RKEM–SIFT

Table 5 indicates that the proposed Adaptive RKEM method obtains the highest matching accuracy compared with SIFT and RKEM-SIFT.

Table 5 Matching process criteria for different methods on images with perspective distortions

4.3.8 Investigation of the proposed adaptive RKEM method on other detectors

To demonstrate the performance of the proposed adaptive RKEM method with other detectors, such as SURF and KAZE, a couple of images (mobile images) with scale changes from the database of [25] is employed in this experiment. Table 6 indicates that the proposed Adaptive RKEM variants (Adaptive RKEM-SURF and Adaptive RKEM-KAZE) obtain better matching accuracies than SURF and KAZE.

Table 6 Matching process criteria for different methods on images with different scales

5 Conclusions

In this paper, an adaptive algorithm for eliminating redundant keypoints in the SIFT method was presented. The proposed algorithm, built on the RKEM–SIFT method, enhanced the matching performance relative to its origin in terms of different indices. In this technique, the threshold value was selected based on the distribution of the keypoints in the image under study. The aim of the 2nd and 3rd stages of the proposed method was to find a maximum distance threshold value (\(n_{\max}\)) beyond which redundant keypoints are not sought. This value truncates the histogram of the distances and hence facilitates finding the optimal threshold value. One of the main characteristics of the proposed adaptive RKEM method was that, unlike in the RKEM, the threshold value could be chosen differently for the sensed and reference images. This capability was very important, since these images have different characteristics, and using the same threshold value for both leads to lower image matching performance. Experimental results showed that the proposed method is applicable to various types of images, such as natural and remote sensing images, and to different distortions, such as angle, perspective, affine, and scale changes. Using the proposed Adaptive RKEM method in applications other than image matching, such as image registration, change detection, and image mosaicking, could be the subject of future work.