1 Introduction

Matching is the process of determining correspondences between images of the same scene acquired under different imaging conditions [1]. This process is an essential step in many applications such as image registration [2], change detection [3], image fusion [4], image mosaicking [5], and 3D reconstruction [6]. Generally, matching techniques can be categorized into two types: area-based and feature-based approaches [7]. Area-based methods use the distribution of grey-levels, directly comparing image windows of the same dimensions. To this end, using some similarity (or difference) intensity-based criterion, two images are statistically compared in order to locate the positions of maximum similarity, which represent corresponding regions [8]. Although area-based methods can reach very high matching accuracy, their performance degrades in areas with uniform texture [9]. Area-based methods are appropriate almost exclusively for images with low distortions (i.e., small differences in scale, rotation, and viewing angle between images). Noise, illumination changes, and images taken by different sensors cause variations in image intensity; thus, area-based methods may not work properly for these types of images [7]. In feature-based methods, features (including points, lines, and regions) are first extracted from the images [10,11,12,13]. Then, using the attributes of each feature, such as gradient magnitude and orientation, a descriptor is generated as a feature vector according to which the matching process is performed under some similarity criteria [14, 15]. ‘Features from accelerated segment test’ (FAST) [16], ‘oriented FAST and rotated BRIEF’ (ORB) [17], SIFT [18], Zernike moments [19, 20], ‘gradient location-orientation histogram’ (GLOH) [21], and ‘affine mirror invariant feature transform’ (AMIFT) [22] are some examples of detectors and descriptors that have been used in feature-based image matching. Compared to area-based methods, feature-based ones are more reliable and perform better when dealing with geometric and radiometric distortions.

The SIFT algorithm is an important feature-based matching algorithm presented by David Lowe in 2004. Features extracted by SIFT are invariant to scale and rotation changes and robust to changes in illumination, affine distortion, and noise [18]. SIFT descriptors are also very distinctive and outperform other descriptors under geometric distortions [23]. Due to its high performance, SIFT is extensively applied for matching natural and satellite images [21, 24,25,26,27,28,29,30]. Despite the advantages mentioned, the algorithm has some limitations: high computation time, low controllability of the number of features, insignificance of some detected keypoints, and the destruction of edges and essential details in the image by the Gaussian scale space (GSS) used in the SIFT algorithm [1, 31, 32]. These drawbacks deteriorate the computed features, cause mismatches, and reduce the robustness of the algorithm to affine distortions. In addition, redundant keypoints may be detected, which decrease the matching accuracy while increasing the computational load.

To overcome the above-mentioned problems, several methods have been presented in the literature [1, 30, 32,33,34,35,36,37,38,39,40,41]. In [1], the uniform robust SIFT (UR-SIFT) algorithm was introduced, where keypoints are distributed uniformly in both the spatial and scale directions. In [30], belief propagation (BP)-SIFT was presented, which uses the BP algorithm to increase the number of matches. In [32], the adapted anisotropic Gaussian SIFT (AAG-SIFT) method was proposed, in which an anisotropic Gaussian filter on the SIFT scale space and dominant orientation consistency (DOC) are used to preserve edges and eliminate mismatches, respectively. In [33], the speeded-up robust features (SURF) algorithm was presented, which is faster than SIFT. In [34], the affine SIFT (ASIFT) algorithm was presented, which is fully invariant to affine distortions. In [35], nonlinear diffusion in the SIFT scale space and phase congruency were used to preserve image details and to eliminate mismatches, respectively. In [36], bilateral filter SIFT (BFSIFT) was presented, in which an anisotropic scale space (ASS) replaces the GSS of SIFT to preserve details in synthetic aperture radar (SAR) images. In [37], the SAR-SIFT algorithm was introduced; for matching SAR images, this method uses a SIFT-like scheme in which a SAR-Harris scale space is created to detect keypoints, and a circular window around each keypoint is used for orientation assignment and descriptor extraction. In [38], the locality preserving matching (LPM) method was proposed and applied to improve the matching accuracy of the SIFT algorithm. In [39], the multi-dimensional scaling SIFT (MDS-SIFT) method was presented, which uses the MDS algorithm [42] to reduce the dimensions of the SIFT descriptor vector and local binary pattern (LBP) features to eliminate mismatches. In [40], to reduce the number of keypoints and to increase the running speed and matching accuracy, an adaptive-threshold Canny operator was applied before the SIFT algorithm. In [41], the registration of SAR images was enhanced by improving the SIFT descriptor and presenting an optimized matching method.

Another important problem of SIFT is the existence of redundant keypoints, which lead to similar descriptors and consequently possible interference in the matching process. Recently, a method was proposed in [31] that aimed to eliminate redundant keypoints using a redundancy index. The SIFT algorithm based on the ‘redundant keypoint elimination method’ (RKEM-SIFT) could eliminate redundant keypoints well and achieved very good results in image registration. In this algorithm, once the classic SIFT algorithm identified the keypoints, the distances between keypoints in each image were computed. Afterwards, for any pair of keypoints with a distance less than a pre-determined threshold value, one keypoint was deleted and the other kept for the matching process. The RKEM–SIFT algorithm has been used effectively in image registration [31], retinal image matching [43], image mosaicking [44], and copy-move forgery detection [45]. Despite the benefits reported for the RKEM–SIFT algorithm, one major shortcoming was that the mentioned threshold value had to be determined experimentally. When choosing the threshold value in RKEM–SIFT, the image type (e.g., natural images, remote sensing images, etc.) was not considered as an operational factor. Accordingly, some redundant keypoints could not be eliminated, or some important ones might be deleted unwantedly. Furthermore, the same threshold value was used for both the reference and sensed images, and the distortion between the images was not considered. This insensitive selection increased the mismatch rate.

In this paper, an adaptive RKEM is presented to eliminate redundant keypoints of the SIFT algorithm. The proposed method is based on adaptive calculation of the threshold in RKEM–SIFT. The threshold value is determined from the dispersion (variance) of the keypoint distances. Hence, the type of the images and the distortions between them are considered, which improves the efficiency of RKEM–SIFT in eliminating redundant keypoints and consequently enhances the image matching performance. To do this, the distances between all keypoints in the image are first calculated and their histogram is found. To determine the number and width of the histogram bins, we incorporate the number of keypoints and the minimum and maximum of the distance distribution. Then, following a novel approach, we compute a maximum distance threshold value beyond which we do not search for redundant keypoints. This value truncates the histogram of the distances, so far-apart distances are ignored when searching for redundant keypoints. A collection of integer values smaller than this bound is considered. Afterwards, for each value, the set of distances greater than that value is generated, and its variance is calculated. The set with the least variance is chosen, and its corresponding value is called the adaptive threshold. With respect to this distance threshold, the keypoints are assessed as redundant or not. As another improvement over RKEM–SIFT, the threshold value of each image is found separately, again leading to a more successful image matching process. Finally, the matching process is performed on the keypoints remaining in the reference and sensed images. The proposed method is applied to four datasets containing images with different types of distortions. Comparing the results of this and some other related methods demonstrates the superiority of our approach in terms of well-known performance criteria.

The rest of the paper is organized as follows. The RKEM–SIFT method is reviewed in Sect. 2. Section 3 is dedicated to the description of the proposed adaptive RKEM method in the matching process. Section 4 presents the simulation results. Finally, the paper is concluded in Sect. 5, where some ideas for future work are also presented.

2 RKEM–SIFT

This section describes the stages of the RKEM method, which reduces the redundant keypoints of the SIFT algorithm [31]. RKEM-SIFT consists of four steps: (1) extracting the scale space extrema, (2) improving the localization accuracy and removing unstable extrema, (3) assigning an orientation to each feature, and (4) eliminating redundant keypoints. Details of these steps are as follows.

2.1 Extraction of scale space extrema

The first step in the SIFT algorithm is the detection of those points whose locations are independent of changes in the image scale. For this purpose, SIFT finds the stable features of the image on different scales using the ‘scale space’. Figure 1 shows how the scale space is created.

Fig. 1  Scale space creation in the SIFT method [45]

At first, the input image is doubled in size and taken as the first level of the first octave. Then, this image is iteratively convolved with a Gaussian kernel to create the Gaussian scale space images of each octave, as follows:

$$ L(x, y, \sigma) = G(x, y, \sigma) \otimes I(x, y) $$
(1)

where \(I(x, y)\) is the image, \(G(x, y, \sigma)\) is the Gaussian kernel defined in Eq. (2), and \(\otimes\) denotes the convolution operator.

$$ G(x, y, \sigma) = \frac{1}{2\pi \sigma^{2}}\exp \left( -\frac{x^{2} + y^{2}}{2\sigma^{2}} \right) $$
(2)

The parameter \(\sigma\) indicates the scale of each image, with initial value \(\sigma_{0} = 1.6\). Once the scale space is generated, difference-of-Gaussians (DOG) images are computed by taking the difference between each adjacent pair of Gaussian images in each octave, as follows:

$$ D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) $$
(3)

where \(L(x, y, \sigma)\) and \(L(x, y, k\sigma)\) are the Gaussian images at scales \(\sigma\) and \(k\sigma\), respectively. To find the stable features in the DOG images of each octave, each pixel is compared with its eight neighbouring pixels at its own scale, in addition to its nine upper-scale and nine lower-scale neighbours in the DOG images. If it is an extremum (maximum or minimum), it is stored as a candidate keypoint. Finally, unstable extrema are removed as described in the next subsection.
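To make this construction concrete, the following Python sketch builds a single-octave Gaussian scale space (Eq. 1), forms the DOG images (Eq. 3), and flags 26-neighbourhood extrema. It is a minimal illustration under our own choices, not the implementation evaluated in this paper; the function name, the use of scipy's gaussian_filter, and the explicit pixel loops (kept for clarity rather than speed) are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, num_scales=4, sigma0=1.6, k=2 ** 0.5):
    """Single-octave sketch: Gaussian scale space, DOG images,
    and candidate extrema over 26 neighbours."""
    # Gaussian scale space (Eq. 1); two extra images so that every
    # interior DOG layer has an upper and a lower scale neighbour
    gauss = [gaussian_filter(image.astype(float), sigma0 * k ** i)
             for i in range(num_scales + 2)]
    # DOG images (Eq. 3): differences of adjacent Gaussian images
    dog = [g2 - g1 for g1, g2 in zip(gauss[:-1], gauss[1:])]
    candidates = []
    for s in range(1, len(dog) - 1):                 # interior scales only
        for y in range(1, image.shape[0] - 1):
            for x in range(1, image.shape[1] - 1):
                cube = np.stack([d[y - 1:y + 2, x - 1:x + 2]
                                 for d in dog[s - 1:s + 2]])
                v = dog[s][y, x]
                # extremum among the 8 + 9 + 9 = 26 neighbours
                if v == cube.max() or v == cube.min():
                    candidates.append((x, y, s))
    return candidates
```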

2.2 Keypoint localization improvement and unstable extrema removal

In this step, the low-contrast features and those located on edges are removed. The algorithm finds the precise locations of the keypoints by interpolating neighbouring data points for each keypoint. In accordance with Lowe's proposal [14], features with contrast values less than a pre-determined threshold \((T_{c} = 0.03)\) are considered unstable and are therefore removed. Afterwards, the edge-located features are removed, since they are noise-sensitive [14].
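A minimal sketch of these two stability tests on a single DOG image follows. The contrast test uses \(T_{c} = 0.03\) as stated above; the edge test uses the standard Hessian-ratio criterion with \(r = 10\), which is Lowe's conventional value and an assumption here, since the text does not state it. Sub-pixel interpolation is omitted for brevity.

```python
import numpy as np

def is_stable(dog, x, y, contrast_thr=0.03, edge_ratio=10.0):
    """Reject low-contrast and edge-located candidates in a DOG image."""
    if abs(dog[y, x]) < contrast_thr:        # low-contrast test (T_c = 0.03)
        return False
    # 2x2 Hessian from finite differences
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2 * dog[y, x]
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2 * dog[y, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:                             # principal curvatures of opposite sign
        return False
    # edge test: trace^2 / det must stay below (r + 1)^2 / r
    return tr ** 2 / det < (edge_ratio + 1) ** 2 / edge_ratio
```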

2.3 Assigning orientation to each feature

Next, an orientation is determined for each keypoint. To do this, a circular mask (with radius \(6\sigma\)) is considered around each feature. Then, the orientation histogram of the pixels inside that mask is computed. Lastly, based on this histogram, one or possibly multiple main orientations are selected for each keypoint.
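The sketch below illustrates this stage under common SIFT conventions that the text leaves implicit and that we assume here: a 36-bin histogram and acceptance of every peak above 80% of the highest bin (this is how a keypoint can receive multiple orientations). The usual Gaussian weighting of the magnitudes is omitted.

```python
import numpy as np

def dominant_orientations(L, x, y, sigma, num_bins=36, peak_ratio=0.8):
    """Orientation histogram over a circular mask of radius 6*sigma
    around (x, y) in the Gaussian image L."""
    r = int(round(6 * sigma))
    hist = np.zeros(num_bins)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            inside = 0 < yy < L.shape[0] - 1 and 0 < xx < L.shape[1] - 1
            if dx * dx + dy * dy > r * r or not inside:
                continue
            gx = L[yy, xx + 1] - L[yy, xx - 1]        # horizontal gradient
            gy = L[yy + 1, xx] - L[yy - 1, xx]        # vertical gradient
            mag = np.hypot(gx, gy)
            ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0
            hist[int(ang / (360.0 / num_bins)) % num_bins] += mag
    if hist.max() == 0:
        return []
    # every bin within peak_ratio of the highest bin yields an orientation
    return [b * (360.0 / num_bins) for b in range(num_bins)
            if hist[b] >= peak_ratio * hist.max()]
```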

2.4 Redundant keypoint elimination

The RKEM–SIFT was proposed to reduce the redundant keypoints. The method is applied to the reference and sensed images separately, as follows (a code sketch of the whole procedure appears at the end of this subsection).

  • Stage 1: From the image, keypoints are extracted.

  • Stage 2: The Manhattan distance between any pair of keypoints (say \(p_{i}\) and \(p_{j}\)) is calculated as:

    $$ d(p_{i}, p_{j}) = \sum\nolimits_{k = 1}^{l} \left| p_{i}(k) - p_{j}(k) \right| $$
    (4)

where \(p_{i}(k)\) and \(p_{j}(k)\) are the \(k\)th coordinates of the keypoints \(p_{i}\) and \(p_{j}\), respectively, and \(l\) is the length of the keypoint coordinate vector. Then, the distances between keypoint \(p_{i}\) and all other keypoints are summed up:

$$ SD(p_{i}) = \sum\nolimits_{s = 1}^{N} d(p_{i}, p_{s}) $$
(5)

In Eq. (5), \(N\) is the number of keypoints in the image, and \(d(p_{i}, p_{s})\) is the distance between keypoints \(p_{i}\) and \(p_{s}\) found using Eq. (4). The redundancy index (RI) of keypoint \(p_{i}\) is defined as:

$$ RI(p_{i}) = \frac{1}{SD(p_{i})} $$
(6)
  • Stage 3: A pre-defined threshold \(T_{d}\) is set, with respect to which the unnecessary keypoints are removed. In the RKEM method, this threshold is found experimentally, and the image content is not taken into account.

  • Stage 4: For any two distinct keypoints \(p_{i}\) and \(p_{j}\), if their distance is less than the threshold value \(T_{d}\), one of them is considered redundant according to Eq. (7) and is deleted. In fact, the redundant keypoint is the one with the larger redundancy index, i.e., the smaller sum of distances.

    $$ d(p_{i}, p_{j}) < T_{d} \; \Rightarrow \; \text{remove} \begin{cases} p_{i} & \text{if } RI(p_{i}) > RI(p_{j}) \\ p_{j} & \text{otherwise} \end{cases} $$
    (7)

This method has a major problem: the threshold value is fixed empirically for all images. With respect to this single value, the redundancy of keypoints in both the reference and sensed images is evaluated. If the threshold is not determined properly, redundant keypoints are not removed or important ones are removed, since the distribution of the keypoints in an image is not homogeneous. Thus, the matching process may be disturbed.
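The following Python sketch summarizes stages 1–4 under the stated formulas; the fixed threshold \(T_{d}\) must still be supplied by the user, which is exactly the weakness the next section addresses. Variable names are ours, and keypoints are taken as (x, y) coordinate pairs, so \(l = 2\) in Eq. (4).

```python
import numpy as np

def rkem(keypoints, threshold):
    """RKEM (Sect. 2.4): Manhattan distances (Eq. 4), distance sums
    (Eq. 5), redundancy indices (Eq. 6), and removal rule (Eq. 7)."""
    pts = np.asarray(keypoints, dtype=float)                   # N x 2 array
    d = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2)  # Eq. (4)
    sd = d.sum(axis=1)                                         # Eq. (5): SD(p_i)
    ri = 1.0 / sd                                              # Eq. (6): RI(p_i)
    keep = np.ones(len(pts), dtype=bool)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if keep[i] and keep[j] and d[i, j] < threshold:
                # Eq. (7): the keypoint with the larger RI is redundant
                keep[i if ri[i] > ri[j] else j] = False
    return pts[keep]
```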

3 The proposed method

The image matching process consists of three steps (see Fig. 2). First, the initial features of both images are detected using the classical SIFT algorithm; then, the final features are selected based on the proposed adaptive RKEM method. Finally, the matching process is performed.

Fig. 2  Flowchart of the proposed image matching approach

3.1 Initial features detection

In the first step, the scale-space extrema are detected, the keypoints are accurately localized, and an orientation is assigned to each keypoint based on the classical SIFT algorithm [18], as shown in the left column of Fig. 2.

3.2 The proposed adaptive RKEM method

In this step, the proposed adaptive RKEM method is used for extracting and selecting the final features. An integer upper bound on the candidate threshold values (\(n_{\max}\), computed in the 2nd stage below) is determined first. Then, the following stages are performed for the reference and sensed images (a consolidated code sketch follows at the end of this subsection):

  • 1st stage: The Manhattan distance between every pair of keypoints (\(p_{i}\) and \(p_{j}\)) in the image is calculated according to Eq. (4).

  • 2nd stage: The histogram of the keypoints' distances is computed, in which the number and the width of the bins are determined according to the following formulas:

    $$ nb = \max \left( \left\lfloor N/10 \right\rfloor, 10 \right) $$
    (8)
    $$ bw = \frac{Mxd - Mnd}{nb} $$
    (9)

In Eq. (8), \(nb\) is the number of bins and \(\left\lfloor \cdot \right\rfloor\) is the floor operation. In Eq. (9), \(bw\) is the bin width, and \(Mxd\) and \(Mnd\) are the maximum and minimum keypoint distances, respectively. Then, the ratio of the frequencies of adjacent bins, from the lower bin toward the higher one, is computed as in Eq. (10). The first bin whose frequency ratio to its next bin is less than 0.5 is selected, and the floor of the upper edge of that bin is taken as \(n_{\max}\). In Eq. (10), the \(k\)th bin (\(\text{bin}_{k}\)) covers the distances in the range \([Mnd + (k-1)\,bw, \; Mnd + k\,bw]\), and \(m_{k}\) is its frequency in the histogram.

$$ \begin{gathered} \hat{k} = \min \left\{ k \;\middle|\; m_{k}/m_{k+1} < 0.5 \right\}, \quad k = 1, \ldots, nb \\ n_{\max} = \left\lfloor \max \, \text{bin}_{\hat{k}} \right\rfloor \end{gathered} $$
(10)
  • 3rd stage: For each threshold value \(n\), \(n = 1, \ldots, n_{\max}\), the following operation is carried out. If the distance between keypoints \(p_{i}\) and \(p_{j}\) is greater than \(n\), that distance is stored in the set \(R_{n}\); i.e., \(R_{n}\) collects all pairwise keypoint distances greater than \(n\):

    $$ R_{n} = \left\{ d(p_{i}, p_{j}) \;\middle|\; d(p_{i}, p_{j}) > n; \;\; i, j = 1, \ldots, N; \; i \ne j \right\} $$
    (11)
  • 4th stage: The variance of the set \(R_{n}\) is calculated as:

    $$ \sigma_{n}^{2} = \mathrm{var}(R_{n}), \quad n = 1, 2, \ldots, n_{\max} $$
    (12)
  • 5th stage: Since the aim of the proposed algorithm is to find the optimal threshold for removing redundant keypoints, the candidate \(n\) whose set \(R_{n}\) has the minimum variance of keypoint distances is selected according to Eq. (13). A smaller variance means a smaller dispersion of the remaining distances, indicating fewer redundant keypoints.

    $$ n_{opt} = \arg \min_{n} \left( \sigma_{n}^{2} \right) $$
    (13)
  • 6th stage: If the distance between any two distinct keypoints is less than the optimal threshold \(n_{opt}\), the unnecessary keypoint is removed; that is, the keypoint with the higher RI is considered redundant and is discarded.

The presented adaptive RKEM method automatically finds the threshold value and is able to obtain independent threshold values for the reference and sensed images. This method leads to accurate removal of redundant keypoints and ultimately increases the matching accuracy.
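A consolidated sketch of the 1st–5th stages is given below, following Eqs. (4) and (8)–(13) as stated; the fallback used when no bin satisfies Eq. (10) is our own safeguard, not part of the paper. The returned \(n_{opt}\) can then be passed, per image, to a RKEM-style elimination such as the rkem sketch in Sect. 2.4, e.g., rkem(kps, adaptive_threshold(kps)).

```python
import numpy as np

def adaptive_threshold(keypoints):
    """Adaptive threshold search (Sect. 3.2), run separately
    on the reference and sensed images."""
    pts = np.asarray(keypoints, dtype=float)
    N = len(pts)
    d = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2)   # Eq. (4)
    dist = d[np.triu_indices(N, k=1)]          # all pairwise distances, i != j
    nb = max(N // 10, 10)                      # Eq. (8): number of bins
    counts, edges = np.histogram(dist, bins=nb)   # bin width bw as in Eq. (9)
    # Eq. (10): first bin whose frequency ratio to the next bin is < 0.5;
    # n_max is the floor of that bin's upper edge
    n_max = None
    for k in range(nb - 1):
        if counts[k + 1] > 0 and counts[k] / counts[k + 1] < 0.5:
            n_max = int(np.floor(edges[k + 1]))
            break
    if n_max is None or n_max < 1:
        n_max = int(np.floor(dist.max()))      # safeguard (our assumption)
    # Eqs. (11)-(12): variance of the distances greater than each candidate n
    variances = {n: np.var(dist[dist > n])
                 for n in range(1, n_max + 1) if np.any(dist > n)}
    return min(variances, key=variances.get)   # Eq. (13): n_opt
```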

3.3 Matching process

Matching is the process of determining the correspondence between two or more images of the same scene acquired at different times, from different angles, or by different sensors. To carry out image matching, there are different descriptors, which can generally be categorized into three groups: distribution-based descriptors, spatial-frequency-based descriptors, and differential descriptors [15]. This paper uses SIFT, which is a distribution-based descriptor. To create a descriptor in the SIFT algorithm, a 4 × 4 grid is first placed around each feature in its associated Gaussian image. The dimensions of this area are selected according to the scale of the feature, so that each cell is a square with an edge equal to three times the feature's scale. Then, the grid coordinates are rotated by the main orientation of the feature. The gradient magnitudes and orientations of the pixels inside the rotated area are calculated, and the gradient orientations are likewise rotated relative to the feature's main orientation. Finally, the SIFT descriptor is created as a vector with 128 components (4 × 4 cells × 8 orientation bins). After creating one descriptor per feature, matching is done by calculating the Euclidean distances between the descriptors of the two images. To perform nearest neighbor distance ratio (NNDR) matching, for each keypoint descriptor in the reference image, its Euclidean distances to the first and second nearest neighbors in the sensed image are computed and their ratio is found. If this value is less than a threshold, the pair is considered a match. In this method, the descriptor \(RD_{k}\) in the reference image and the descriptor \(SD_{i}\) in the sensed image are matched if Eq. (14) holds [18, 21].

$$ \frac{\left\| RD_{k} - SD_{i} \right\|}{\left\| RD_{k} - SD_{j} \right\|} < T $$
(14)

where \(SD_{i}\) and \(SD_{j}\) are the first and second nearest descriptors to the descriptor \(RD_{k}\), respectively.
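A minimal NNDR sketch under Eq. (14), using \(T = 0.8\) as in the experiments below; it assumes at least two descriptors in the sensed image and uses brute-force search, since the paper does not specify a search structure.

```python
import numpy as np

def nndr_match(ref_desc, sens_desc, T=0.8):
    """Match 128-D descriptors by the nearest-neighbour distance ratio."""
    ref_desc = np.asarray(ref_desc, dtype=float)
    sens_desc = np.asarray(sens_desc, dtype=float)
    matches = []
    for k, rd in enumerate(ref_desc):
        dist = np.linalg.norm(sens_desc - rd, axis=1)  # to all sensed descriptors
        i, j = np.argsort(dist)[:2]                    # first and second nearest
        if dist[i] / dist[j] < T:                      # Eq. (14)
            matches.append((k, int(i)))
    return matches
```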

4 Simulation and experimental results

In order to evaluate the proposed method, four databases were examined. The first database contained different types of images with different distortions (affine distortions, scale changes, illumination changes, and blur), with 10 images per distortion type. The image sizes differ among the distortion types: for scale changes the images are \(800 \times 640\), and for illumination changes, blur, and rotation changes they are \(800 \times 640\), \(921 \times 614\), and \(1000 \times 700\), respectively. These images were taken from (https://lear.inrialpes.fr/people/mikolajczyk/Database/index.html) [46]. The second database contained ten pairs of natural images with different distortions (scale changes and rotation changes), taken from [25]. The third database contained 6 pairs of multi-modal remote sensing images with different contexts, such as urban areas and natural landscapes, taken from [1]. The fourth database contained one pair of images with perspective distortions, taken from (https://github.com/jagracar/OpenCV-python-tests/blob/master/OpenCV-tutorials/data/box.png) [47]. In these experiments, in order to prevent the selection of threshold values from affecting the image matching results, the algorithm parameters were kept fixed as follows. The scale space in the SIFT algorithm was created with five octaves, each with four scale layers. The threshold \(T_{C}\) was set to 0.03 in order to extract a suitable number of features, according to Lowe's suggestion [14]. The threshold \(T\) in Eq. (14) was set to 0.8 in order to remove false matches (mismatches), as Lowe suggested [18, 21]. The threshold value of the RKEM algorithm in Eq. (7) was set to 3 in order to remove redundant keypoints, according to [25]. In the A2SIFT algorithm, the threshold value \(n\) was set between 0.01 and 0.05 according to [39]. Two sets of experiments were carried out. In the first set, different threshold values in the RKEM–SIFT were studied. In the second set, we evaluated the performance of the proposed method in the matching process. To investigate the effectiveness of the proposed method, it was also compared with some common algorithms, including the standard SIFT [14], A2SIFT [39], and RKEM-SIFT [25]. All the experiments were run on a personal computer with a 2.27 GHz Intel Core i5 and 4 GB of RAM, using MATLAB R2016a.

4.1 Evaluation criteria

Four common criteria, i.e., matching accuracy (Eq. 15), recall (Eq. 16), repeatability (Eq. 17), and root mean square error (RMSE, Eq. 18), are used to evaluate the proposed method.

$$ \text{Accuracy} = TP/M $$
(15)
$$ \text{Recall} = TP/P $$
(16)
$$ F_{r} = TP/\min \left( N_{\text{ref}}, N_{\text{sens}} \right) $$
(17)
$$ \text{RMSE} = \sqrt{ \frac{1}{M} \sum\limits_{i = 1}^{M} \left[ \left( x_{i} - x^{\prime}_{i} \right)^{2} + \left( y_{i} - y^{\prime}_{i} \right)^{2} \right] } $$
(18)

In Eqs. (15)–(18), \(TP\) is the number of correct matches, \(M\) is the total number of matches, and \(P\) is the number of correspondences. \(N_{\text{ref}}\) and \(N_{\text{sens}}\) denote the total numbers of detected features in the reference and sensed images, respectively. Also, in Eq. (18), the pairs \((x_{i}, y_{i})\) and \((x^{\prime}_{i}, y^{\prime}_{i})\) represent the coordinates of the reference and transformed \(i\)th matched keypoints, respectively. If the matching accuracy, recall, and repeatability are high and the RMSE is low, the performance of the system is acceptable.
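These criteria reduce to a few lines of code; a minimal sketch follows (variable names are ours), assuming the matched reference and transformed coordinates are supplied as M × 2 arrays.

```python
import numpy as np

def matching_metrics(tp, m, p, n_ref, n_sens, ref_pts, tr_pts):
    """Eqs. (15)-(18): accuracy, recall, repeatability, RMSE."""
    accuracy = tp / m                              # Eq. (15)
    recall = tp / p                                # Eq. (16)
    repeatability = tp / min(n_ref, n_sens)        # Eq. (17)
    err = np.asarray(ref_pts, float) - np.asarray(tr_pts, float)
    rmse = np.sqrt(np.sum(err ** 2) / m)           # Eq. (18)
    return accuracy, recall, repeatability, rmse
```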

4.2 Effects of the threshold value in the RKEM–SIFT on blur images

In this experiment, we investigated how to find the optimal threshold value of the RKEM-SIFT algorithm, using pairs of images with the same distortion type (blur). First, a pair of structural images with blur changes was used (Fig. 3); the matching accuracy for different thresholds is shown in Fig. 4. As seen in this figure, the appropriate threshold value in Eq. (7) for these images is 4, as it yields the highest matching accuracy. Next, a pair of textural images with blur changes was used (Fig. 5); the corresponding matching accuracy for different thresholds is shown in Fig. 6. For this pair of images, the optimal threshold value was 6.

Fig. 3  A pair of structural images with blur distortion: a reference image; b sensed image

Fig. 4  Matching accuracy versus threshold value in the RKEM–SIFT method on a pair of structural images with blur distortion

Fig. 5  A pair of textural images with blur distortion: a reference image; b sensed image

Fig. 6  Matching accuracy versus threshold value in the RKEM–SIFT method on a pair of textural images with blur distortion

As seen in Figs. 4 and 6, the optimal threshold values of these two image pairs are different even though the distortion is the same. Figure 3 shows a pair of structural images while Fig. 5 shows a pair of textural images; hence, it can be concluded that the nature of the images affects the choice of the optimal threshold value. In general, besides the nature of the images, the image type and the distortion between the images also play roles in determining the optimal threshold value. Using an adaptive threshold that accounts for the distortion between the images as well as their type and nature, the RKEM-SIFT achieves more accurate matching results.

4.3 Simulation results of the matching process

To evaluate the effectiveness of the proposed Adaptive RKEM method, a series of experiments (Sects. 4.3.1–4.3.8) with various affine distortions, different scales and illuminations, different rotations, JPEG compression, and perspective distortions is designed, and the results are discussed in terms of the matching accuracy, recall, repeatability, and RMSE evaluation criteria.

4.3.1 Matching of images with affine distortions

In this experiment, a set of real-world images with affine distortions (i.e., images differing in scale, translation, rotation, and shearing), taken from [46], is matched to demonstrate the validity of the proposed method. The results are shown in Figs. 7 and 8 and Table 1. As shown in these figures, the number of incorrect matches in the proposed adaptive RKEM is smaller than in the RKEM, A2SIFT, and SIFT methods. Table 1 reports the performance measures of the different matching techniques. The proposed method clearly achieves high rates in terms of matching accuracy and RMSE, reflecting the appropriate operation of the proposed Adaptive RKEM method. The feature extraction time of the proposed method increases because, in addition to calculating the Manhattan distances to find redundant keypoints, the appropriate threshold value for removing them must also be computed. Conversely, the feature matching time of the Adaptive RKEM is reduced because redundant keypoints are removed. As concluded from Table 1, the increase in total time (i.e., the sum of the feature extraction and feature matching times) of the proposed method is negligible compared with its considerable gain in matching accuracy and reduction in RMSE.

Fig. 7  Matching process results of images with different affine distortions. a, b SIFT method [18]; c, d A2SIFT method [48]. In this and the subsequent figures, the left column shows the images with descriptors and the right column shows the matches

Fig. 8  Matching results of images with different affine distortions. a, b RKEM-SIFT [31]; c, d the adaptive RKEM

Table 1 Matching process criteria for different methods on images with different affine distortions

4.3.2 Matching of images with changes in scale

In this test, a couple of images with different scales, taken from [46], is used, and the matching process is evaluated for this case. The squares in Figs. 9 and 10 mark the false matches caused by redundant SIFT features. The figures also show that the unnecessary keypoints are deleted by RKEM–SIFT and that the Adaptive RKEM improves the matching accuracy.

Fig. 9  Matching process results of images with different scales. a, b SIFT method; c, d A2SIFT method

Fig. 10  Matching process results of images with different scales. a, b RKEM–SIFT; c, d the adaptive RKEM

Quantitative results obtained by the different methods are given in Table 2, which shows that the proposed Adaptive RKEM method outperforms the others in terms of accuracy, repeatability, recall, RMSE, and matching run time. This demonstrates that the proposed method is more robust to scale changes than the other methods.

Table 2 Matching process criteria for different methods on ten couples of images with different scales

4.3.3 Matching of images with illumination changes

To demonstrate the performance of the proposed method, a couple of images with different illuminations, taken from [46], is employed in this experiment (Figs. 11, 12). Table 3 indicates that the proposed method obtains the highest alignment accuracy, recall, and repeatability compared with SIFT [18], A2SIFT [48], and RKEM–SIFT. In this experiment, the total run time of the Adaptive RKEM is almost twice that of the classic SIFT method, because most of the extra time is spent finding and removing redundant keypoints.

Fig. 11  Matching process results of images with different illuminations. a, b SIFT method; c, d A2SIFT method

Fig. 12  Matching process results of images with different illuminations. a, b RKEM–SIFT; c, d the adaptive RKEM

Table 3 Matching process criteria for different methods on a couple of images with different illuminations

4.3.4 Matching of images with rotation changes

In this test, ten couples of real-world images with rotation changes (mobile images), taken from [31], are used. A comparison of the results is shown in Fig. 13. According to the RMSE values in this figure, the Adaptive RKEM method significantly outperforms the SIFT, A2SIFT [48], and RKEM–SIFT methods in alignment accuracy for images with rotation changes. A closer comparison also shows that the proposed Adaptive RKEM method obtains a lower RMSE than the others.

Fig. 13  The RMSE results of the 5 matching methods applied to 10 pairs of images with various rotation angles

4.3.5 Matching of images with JPEG compression

In this experiment, 5 pairs of JPEG-compressed images, taken from [46], are used; the RMSE values are reported in Table 4. As shown in this table, the proposed method performs better than the others. The behavior of the matching methods on one pair of images is shown in Fig. 14, in which the mismatches are highlighted with squares. In terms of matching accuracy and number of mismatches, the SIFT algorithm performs the worst, while RKEM–SIFT and Adaptive RKEM obtain acceptable matching indices.

Table 4 Average RMSE results of matching processes in images with JPEG compression
Fig. 14  Matching process results by RKEM–SIFT [31] and the proposed method on images with JPEG compression. a The RKEM–SIFT method; b the proposed Adaptive RKEM–SIFT

4.3.6 Matching of multi-modal remote sensing images

In this test, multi-modal remote sensing images from [1] are used, and the matching process is evaluated for this case. The squares in Fig. 15 mark the false matches caused by redundant SIFT features. As shown in Fig. 15, many false matches appear in the SIFT results that do not occur with the proposed Adaptive RKEM method. It is therefore concluded that removing redundant keypoints improves the matching process.

Fig. 15  Matching process results of multi-modal remote sensing images. a SIFT; b the proposed adaptive RKEM–SIFT

4.3.7 Matching of Images with perspective distortions

To demonstrate the performance of the proposed adaptive RKEM method on images with perspective distortions, some samples from [47] are employed in this experiment. A comparison of the results is shown in Fig. 16.

Fig. 16  Matching process results by RKEM–SIFT [31] and the proposed method on images with perspective distortions. a The RKEM–SIFT method; b the proposed Adaptive RKEM–SIFT

Table 5 indicates that the proposed Adaptive RKEM method obtains the highest matching accuracy compared with SIFT and RKEM-SIFT.

Table 5 Matching process criteria for different methods on images with perspective distortions

4.3.8 Investigation of the proposed adaptive RKEM method on other detectors

To demonstrate the performance of the proposed adaptive RKEM method with other detectors, such as SURF and KAZE, a couple of images (mobile images) with scale changes from the database of [25] is employed in this experiment. Table 6 indicates that the proposed Adaptive RKEM variants (Adaptive RKEM-SURF and Adaptive RKEM-KAZE) obtain better matching accuracies than SURF and KAZE.

Table 6 Matching process criteria for different methods on images with different scales

5 Conclusions

In this paper, an adaptive algorithm for eliminating redundant keypoints in the SIFT method was presented. The proposed algorithm, built on the RKEM–SIFT method, enhanced the matching performance relative to its origin in terms of different indices. In this technique, the threshold value was selected based on the distribution of the keypoints in the image under study. The aim of the 2nd and 3rd stages of the proposed method was to find a maximum distance threshold value (\(n_{\max}\)) beyond which redundant keypoints are not sought. This value truncates the histogram of the distances and hence facilitates finding the optimal threshold value. One of the main characteristics of the proposed adaptive RKEM method was that, unlike in the RKEM, the threshold value could be chosen differently for the sensed and reference images. This capability was very important, since these images have different characteristics, and using the same threshold value for both leads to lower image matching performance. Experimental results showed that the proposed method is applicable to various types of images, such as natural and remote sensing images, and to different distortions, such as angle, perspective, affine, and scale changes. Using the proposed Adaptive RKEM method in applications other than image matching, such as image registration, change detection, and image mosaicking, could be the subject of future work.