1 Introduction

Image mosaicing (image stitching) is the process of stitching two or more images together to create a larger, consistent, and seamless composite image [1,2,3,4]. Image mosaicing is one of the most important and commonly used techniques in the fields of image processing, machine vision, and computer graphics, and it has recently been the focus of many researchers [5,6,7,8,9]. The mosaicing process has many applications in image compression [10], image stabilization [11, 12], medical image analysis [13], video indexing [14], panoramic photography [15, 16], and mapping [17]. Although there are various methods for mosaicing images, the majority of them are composed of three steps: feature extraction, image registration, and image warping and blending [3, 18, 19]. In general, image mosaicing methods are divided into two categories: pixel-based (direct) methods [20,21,22] and feature-based methods [23,24,25,26]. In the direct methods, the information of all pixels of the image is used to build the mosaic image. Direct methods are good for mosaicing images with large overlap, translations, and small rotations [23, 27]. In the feature-based methods, image features are identified, and then the feature points are matched and the homography between them is estimated. Feature-based methods are suitable for mosaicing images with small overlapping areas [27, 28]. In general, feature-based methods are more precise than direct methods and are used more often in practical applications [29].

In feature-based methods, features are extracted from images in a variety of ways. In some methods, these features are detected by deep neural networks [30, 31], and in others, features are extracted by common feature detectors [32, 33]. Deep neural networks are composed of three main types of layers: convolution, pooling, and fully connected [34]. The features are identified by the convolution layers and converted into a feature vector by the pooling layers. Deep neural networks are widely used in image processing [35,36,37,38], but these networks have weaknesses in feature extraction, including the lack of a physical interpretation of the image features, long running time, and computational complexity [39]. These disadvantages have led to continued attention to classical detectors in feature extraction. Feature-based methods for image mosaicing are divided into two categories [40,41,42,43]: those based on high-level features and those based on low-level features. In high-level feature algorithms, a part of the image (such as a contour) is identified as a feature. In low-level feature algorithms, keypoints or corners are identified as features [44, 45]. Typically, low-level feature algorithms are used more often for image mosaicing than high-level feature algorithms [2, 23, 29]. Low-level feature algorithms consist of two stages: identifying features and creating descriptors [46, 47]. In the feature identification stage, distinct positions in the image that are stable under different distortions (rotation, scale, affine, illumination) are extracted. In the descriptor creation stage, a descriptor is created for each identified feature from the area around it, using information such as gradient magnitude, image information, texture, and edges. The most important and widely used algorithm in low-level feature extraction is the Scale Invariant Feature Transform (SIFT) algorithm [48, 49].

SIFT is a computer vision algorithm used to extract distinctive features from images. It was proposed by David Lowe of the University of British Columbia in 2004 and patented in the United States [32]. The features obtained from this algorithm are invariant to image scale and rotation as well as to changes in viewpoint, and they are partially invariant to changes in intensity [50]. Despite these advantages, the algorithm has limitations that can reduce the quality of the image mosaicing process [51, 52]. Many studies have been conducted to solve the problems of this algorithm [53,54,55,56,57,58,59,60,61,62,63,64]. The PCA-SIFT algorithm was introduced to reduce the size of the SIFT descriptor vector [53]. In this approach, Principal Component Analysis turns the 128-dimensional SIFT descriptors into 40-dimensional vectors, which increases the speed of the algorithm and enhances its performance when the images do not undergo much affine transformation. The Speeded-Up Robust Features (SURF) algorithm is a robust local feature detector that was proposed to enhance the speed of SIFT and is also invariant to rotation and scale [54]. SURF uses a fast Hessian-matrix measure for the detector and a distribution-based descriptor. SURF performs similarly to SIFT while operating faster. The N-Dimensional Scale Invariant Feature Transform (NSIFT) method was presented to extend SIFT to the matching of N-dimensional images [55]. This procedure is effective for matching three- and four-dimensional medical images. The Scale Restriction Scale Invariant Feature Transform (SRSIFT) was introduced to reduce mismatches in SIFT [56]. This method eliminates some of the mismatches by modifying the gradient direction of the keypoints and applying a scale restriction criterion, which improves SIFT on multi-modal remote sensing images.
Auto-Adaptive SIFT (A2SIFT) has been proposed to improve SIFT with respect to the spatial distribution of features [57]. Affine SIFT (ASIFT) is an improved version of SIFT that is invariant to affine distortions and much more robust to changes in viewing angle [58]. ASIFT extracts an impressive number of matching points in images, but a large number of mismatches often occur among them. The Iterative SIFT algorithm has been presented to reduce the number of SIFT features [59]. Although this method can reduce the number of features and control the feature extraction step of SIFT, the distribution of keypoints it produces is not uniform. The Uniform Robust SIFT (UR-SIFT) algorithm has been introduced for more control over SIFT feature extraction [60]. RKEM-SIFT has been introduced to delete unnecessary keypoints in SIFT, and its effectiveness has been confirmed in image registration [61], retinal image matching [63], and copy-move forgery detection [65]. In this algorithm, after the keypoints are identified in each image according to the classic SIFT, the distances between the keypoints are calculated first. Afterward, when the distance between two different keypoints is less than a certain threshold value, the redundant keypoint is removed and the other one is kept for the matching process. One of the major disadvantages of RKEM-SIFT is that it uses a single threshold value for the removal of redundant keypoints in all parts of the image, regardless of the details of the image. In an image, one part may have many details and another part may have few details; a fixed threshold value for the whole image cannot lead to good results in the removal of redundant keypoints.

In this paper, to solve the mentioned problem of RKEM-SIFT, a new method is proposed to find appropriate threshold values based on a clustering approach. In the proposed method, after the features are detected by SIFT, the keypoints are grouped into clusters using the Fuzzy C-Means clustering algorithm, which is one of the most widely applicable methods in data clustering. Afterward, to remove redundant keypoints, an appropriate threshold value is chosen according to the details of each region of the image. The proposed Clustered Redundant Keypoint Elimination Method-SIFT (CRKEM-SIFT) can increase the matching precision and ultimately improve the performance of image mosaicing. Another innovation of the paper is a new method for image blending. One of the major problems with classical blending methods is the appearance of artifacts and boundary lines in the blended image. In the proposed method, a Gaussian weight function is used whose mean and variance are set to the mean and variance of the overlapped region, respectively.

The rest of the paper is organized as follows. In Sect. 2, classical algorithms and their problems in image mosaicing are described. In Sect. 3, the proposed approach is explained. In Sect. 4, we present the results of experiments in different applications such as image matching, image registration, image mosaicing, and change detection. Finally, the paper is concluded in Sect. 5.

2 Classical algorithms and their problems in image mosaicing

The image mosaicing process consists of three steps, as shown in Fig. 1. First, the features of both input images are detected, and then registration is performed. Image registration refers to the geometric alignment of a set of images. Finally, image blending is carried out, which modifies the image gray levels in the vicinity of a boundary to remove seams and obtain a smooth transition between the images. In this section, RKEM-SIFT and common blending methods such as the weighted average method [66] and ORB-improved blending [67] are described briefly, and then the disadvantages of each of these algorithms in image mosaicing are investigated.

Fig. 1 Flowchart of the image mosaicing process

2.1 RKEM-SIFT algorithm

The RKEM-SIFT algorithm was introduced by Hossein-Nejad et al. in 2017 [61]. It improves the feature extraction step of the SIFT algorithm. In this method, the distances between the features identified by SIFT are calculated, and a redundancy index is defined for each feature based on these distances. Then, for each pair of features whose distance is less than a predetermined threshold value, the feature with the higher redundancy index is omitted.

2.2 RKEM-SIFT problems

In an image, there are areas with many details, such as trees, and areas with few details, such as the sky or the sea. This is where one of the problems of the RKEM-SIFT method exists: when deleting redundant keypoints, it does not take image details into account and therefore acts in the same way in all regions of the image. As shown in Fig. 2, there are more details at the bottom of the image than at the top, so fewer keypoints are identified at the top of the image than at the bottom. Consequently, the keypoints at the bottom of the image are closer to each other, and in some areas they even overlap. To remove overlapping features, attention should be paid to the image details, because in areas with many details some features that are close to each other are not redundant. This problem of the RKEM-SIFT algorithm may interfere with subsequent processes and lead to problems such as reduced precision in image mosaicing. In the next section, we examine the blending methods.

Fig. 2 Detected keypoints in a typical image with different types of structure

2.3 Blending algorithms

After the registration stage, the image mosaicing process requires creating an integrated image without visible stitching borders. Image blending creates one image from two or more images so that their most important information is preserved and the boundaries between the images are not visible. The three common methods in image blending are the Weighted Average [21, 66], Wavelet Transform [68, 69], and Spline Multi-resolution [70] methods. The Weighted Average method is used most often [67, 71]. Two images are blended according to Eq. (1), and the various methods differ in how they calculate the weight \(\beta \left( x \right)\).

$$ I\left( {x,y} \right) = \beta \left( x \right)I_{{{\text{Ref}}}} \left( {x,y} \right) + \left( {1 - \beta \left( x \right)} \right)I_{{{\text{Tgt}}}} \left( {x,y} \right) $$
(1)

In this equation, \(I_{{{\text{Ref}}}} \left( {x,y} \right)\) and \(I_{{{\text{Tgt}}}} \left( {x,y} \right)\) are the pixels of the reference image and the target image at the same location in the overlapping region.

In [66], the Weighted Average method is used for image blending, where \(\beta \left( x \right)\) is the cross-fade weighting function; its value is calculated according to Eq. (2).

$$ \beta \left( x \right) = 1 - \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }} $$
(2)

In this equation, \(x_{\min }\) and \(x_{\max }\) represent the left and right edges of a given row in the overlapping region, respectively. The disadvantage of the weighted average method is its use of the cross-fade function, as this function creates artificial edges and blurring [67]. In [67], to solve this problem, a Gaussian model is used instead of the cross-fade function according to Eq. (3). The advantage of this method is that the blended images contain fewer artificial edges. One of its disadvantages is the presence of boundary lines in the image.

$$ \beta \left( x \right) = 2e^{{ - \frac{{\left( {x - u} \right)^{2} }}{{2\sigma^{2} }}}} - \frac{1}{2} $$
(3)

In this equation, u and \(2\sigma^{2}\) are obtained according to Eqs. (4) and (5).

$$ u = x_{\max } $$
(4)
$$ 2\sigma^{2} = \left( {x_{\max } - x_{\min } } \right)^{2} $$
(5)

In these equations, \(x_{\min }\) and \(x_{\max }\) represent the left and right edges of a given row in the overlapping region, respectively.
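The weights in Eqs. (1) to (5) can be illustrated with a minimal Python sketch. The function names, the use of NumPy, and the toy one-row images are assumptions made for illustration only; the sketch evaluates the formulas as written and is not the implementation used in [66] or [67].

```python
import numpy as np

def crossfade_weight(x, x_min, x_max):
    """Eq. (2): linear cross-fade, 1 at the left edge and 0 at the right edge."""
    return 1.0 - (x - x_min) / (x_max - x_min)

def gaussian_weight(x, x_min, x_max):
    """Eq. (3) with u = x_max (Eq. 4) and 2*sigma^2 = (x_max - x_min)^2 (Eq. 5)."""
    u = x_max
    two_sigma_sq = (x_max - x_min) ** 2
    return 2.0 * np.exp(-((x - u) ** 2) / two_sigma_sq) - 0.5

def blend_row(ref_row, tgt_row, beta):
    """Eq. (1): weighted combination of one reference row and one target row."""
    return beta * ref_row + (1.0 - beta) * tgt_row

# Toy overlap row of five columns with constant intensities (hypothetical data).
x = np.arange(5, dtype=float)
ref = np.full(5, 100.0)
tgt = np.full(5, 200.0)

beta_linear = crossfade_weight(x, x.min(), x.max())
blended = blend_row(ref, tgt, beta_linear)   # ramps from 100 to 200
beta_gauss = gaussian_weight(x, x.min(), x.max())
```

With the cross-fade weight, the blended row moves linearly from the reference intensity at the left edge to the target intensity at the right edge. Evaluated exactly as written, Eq. (3) gives 1.5 at \(x = x_{\max }\), so the sketch applies no clipping to the Gaussian weight.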

2.4 Problems of blending methods

Each of the image blending methods has some problems, which are discussed below.

Most of the blending methods described briefly in the previous section are used to blend two images. However, more than two images are normally used to create panoramic images. Geometric adjustments, radiometric adjustments, changes in imaging position, and changes in image texture and color cause problems in image blending. The most common problems are blurring of the image in common areas (Fig. 3a), artifacts (Fig. 3b), a boundary line in the blending area (Fig. 3c), and artificial edges (Fig. 3d).

Fig. 3 Problems of blending methods. a Blurring in common areas, b artifacts, c boundary line in the blending area, d artificial edges

3 Method

In this section, CRKEM-SIFT is proposed at the feature extraction stage and Gaussian-weighted blending function is proposed at the blending stage to improve image mosaic processes. The details of each method are as follows.

3.1 Proposed CRKEM-SIFT method in feature extraction

The details of the proposed method are as follows.

Stage 0 A set of different images to be mosaicked is considered as the input to the algorithm.

Stage 1 The keypoints are identified in each image using SIFT.

Stage 2 For each keypoint of the image, one orientation is considered according to the classic SIFT [12].

Stage 3 The keypoints in each image are placed in m clusters using the Fuzzy C-Means clustering algorithm [46].

Stage 4 In each cluster, using Eq. (6), the ratio of the number of keypoints in the cluster (\(N_{{{\text{ci}}}}\)) to the total number of keypoints detected in the image (\(N_{t}\)) is calculated. This value is considered as the threshold for that cluster. This threshold value is appropriate because the number of keypoints required for each cluster is identified by considering the image content in that cluster relative to the overall content of the image.

$$ {\text{threshold}} = \frac{{N_{{{\text{ci}}}} }}{{N_{t} }}\quad i = 1, \ldots , m $$
(6)

Stage 5 In each cluster, the Manhattan distance between each keypoint and other keypoints is calculated, respectively, according to Eq. (7).

$$ d\left( {p_{a} , p_{b} } \right) = \mathop \sum \limits_{i = 1}^{k} \left| {p_{a} \left( i \right) - p_{b} \left( i \right)} \right| $$
(7)

In this equation, \(p_{a} \left( i \right)\) is the ith coordinate of the keypoint \(p_{a}\), \(p_{b} \left( i \right)\) is the ith coordinate of the keypoint \(p_{b}\), and k is the dimension of the keypoints. Then the total distance between each keypoint and the other keypoints in its cluster is calculated according to Eq. (8).

$$ {\text{SD}}\left( {p_{a} } \right) = \mathop \sum \limits_{j = 1}^{N} d\left( {p_{a} , p_{j} } \right) $$
(8)

In this equation, N is the number of keypoints in the cluster and \(d\left( {p_{a} , p_{j} } \right)\) is the distance between the keypoints \(p_{a}\) and \(p_{j}\), calculated according to Eq. (7).

Stage 6 If, within a cluster, the Manhattan distance between two distinct keypoints is less than the threshold value obtained according to Eq. (6), one of the two keypoints should be eliminated. Of these two, the keypoint with the higher redundancy index according to Eq. (9) is removed. The greater this index is for a keypoint, the closer that keypoint is to the other keypoints in the cluster.

$$ {\text{RI}}\left( {p_{a} } \right) = \frac{1}{{{\text{SD}}\left( {p_{a} } \right)}} $$
(9)

Stage 7 For each remaining keypoint of the image, a descriptor vector is computed according to the classic SIFT [12].
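Stages 3 to 6 can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the function names are hypothetical, the Fuzzy C-Means routine is a bare-bones version that assigns each keypoint to its highest-membership cluster, keypoints are represented only by their coordinates, the sums of distances in Eq. (8) are computed once per cluster before any removal, and the order in which close pairs are examined is arbitrary. Because the threshold of Eq. (6) is a ratio between 0 and 1, the toy coordinates are chosen so that a few pairwise distances fall below it.

```python
import numpy as np

def fuzzy_c_means(points, m_clusters, fuzzifier=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-Means; returns a hard label (argmax membership) per point."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(points), m_clusters))
    u /= u.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        w = u ** fuzzifier
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)           # avoid division by zero
        inv = dist ** (-2.0 / (fuzzifier - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True) # standard membership update
    return u.argmax(axis=1)

def crkem_filter(points, labels):
    """Return a boolean mask of the keypoints kept after per-cluster elimination."""
    n_total = len(points)
    keep = np.ones(n_total, dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        threshold = len(idx) / n_total                              # Eq. (6)
        pts = points[idx]
        d = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2)   # Eq. (7)
        sd = d.sum(axis=1)                                          # Eq. (8)
        ri = 1.0 / np.maximum(sd, 1e-12)                            # Eq. (9)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                if keep[idx[a]] and keep[idx[b]] and d[a, b] < threshold:
                    drop = a if ri[a] >= ri[b] else b   # remove the more redundant one
                    keep[idx[drop]] = False
    return keep

# Hypothetical keypoint coordinates: two well-separated groups, each with one close pair.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0],
                [50.0, 50.0], [50.05, 50.0], [53.0, 53.0]])
labels = fuzzy_c_means(pts, m_clusters=2)
keep = crkem_filter(pts, labels)
```

In this toy example each cluster holds half of the keypoints, so both thresholds equal 0.5, and one keypoint of each close pair is removed. The descriptor computation of Stage 7 would then run only on the keypoints selected by `keep`.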

3.2 Proposed Gaussian-weighted method in blending

Image blending is the final step of mosaicing, in which the pixel intensities in the overlapped region are blended to avoid seams. Image blending is performed according to Eq. (10).

$$ I\left( {x,y} \right) = \beta \left( {x,y} \right)I_{{{\text{Ref}} }} \left( {x,y} \right) + \left( {1 - \beta \left( {x,y} \right)} \right)I_{Tgt} \left( {x,y} \right) $$
(10)

In this equation, \(I_{{{\text{Ref}} }} \left( {x,y} \right)\) and \(I_{Tgt} \left( {x,y} \right)\) are the pixels of the reference and target images in the overlapping areas. \(\beta \left( {x,y} \right)\) is the Gaussian weight function, which assigns each pixel a weight according to its distance from the boundary line. It is defined according to Eq. (11).

$$ \begin{gathered} \beta \left( {x,y} \right) = 2e^{{ - \frac{{\left( {x - u} \right)^{2} }}{{2\sigma^{2} }}}} - \frac{1}{2} \hfill \\ u = {\text{mean}}\left( {\left\{ {x_{i} \left| {i = 1,...,n} \right.} \right\}} \right) \hfill \\ \sigma^{2} = {\text{var}} \left( {\left\{ {x_{i} \left| {i = 1,...,n} \right.} \right\}} \right) \hfill \\ \end{gathered} $$
(11)

In these equations, \(x_{i}\) are the data in the overlapping areas and n is the number of pixels in the overlapping areas.
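As a rough illustration of Eq. (11), the following Python sketch computes the weight from a set of samples \(x_{i}\). The text does not state whether \(x_{i}\) denotes pixel positions or pixel values, so the sketch assumes, purely for illustration, that they are the column coordinates of the overlap pixels. The function name and the use of the population variance (NumPy's default) are also assumptions.

```python
import numpy as np

def gaussian_weight_from_samples(x, samples):
    """Eq. (11): Gaussian weight with u and sigma^2 estimated from the samples x_i."""
    samples = np.asarray(samples, dtype=float)
    u = samples.mean()
    sigma_sq = samples.var()   # population variance (ddof=0), an assumption
    beta = 2.0 * np.exp(-((x - u) ** 2) / (2.0 * sigma_sq)) - 0.5
    return beta, u, sigma_sq

# Hypothetical overlap that spans columns 0..4.
cols = np.arange(5, dtype=float)
beta, u, sigma_sq = gaussian_weight_from_samples(cols, cols)
```

For these five columns the mean is 2 and the variance is 2, so the weight peaks at the middle column. The weight is returned exactly as Eq. (11) defines it, without any rescaling or clipping.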

4 Simulation results

In this section, a set of experiments in different applications is considered to assess the performance of the proposed CRKEM-SIFT method compared with SIFT [32], SURF [54], MSER-SIFT [72], RKEM-SIFT [61], ARKEM-SIFT [73], ORB-Weighted Mean method [67], AKAZE-MAC [3], SIFT-Voronoi Diagram [51], SURF-Multiband Blending [74], SIFT-weighted Average [75], SURF-LM [76], and SIFT-Improved Weighted Average [77]. In the first set of experiments, the effects of different threshold values on the preservation of image details in RKEM-SIFT and CRKEM-SIFT are investigated. In the second set, the performance of the proposed CRKEM-SIFT approach in the matching process is investigated. In the third and fourth sets, the performance of the proposed method in the registration process and the image mosaicing process is evaluated, respectively. In these experiments, to prevent the choice of parameter values from affecting the results, the algorithm parameters are kept constant. In the proposed CRKEM-SIFT algorithm, the number of clusters m is set to 3; this value was chosen experimentally. The parameter Tc is set to 0.01 in order to extract a suitable number of features, following Lowe’s suggestion [32].

4.1 Image database

To assess the proposed method, the images must be chosen so that they include all distortions (different scales, different viewpoints, different times, and changes in rotation) between the images. To this end, the collection of images includes the tree dataset, wall images downloaded from (www.robots.ox.ac.uk), bough images downloaded from (www.robots.ox.ac.uk), graffiti images downloaded from (http://lear.inrialpes.fr/people/mikolajczyk/Database), building images downloaded from (https://www.mathworks.com), and Waterfall images.

4.2 Effects of different threshold values on preservation of image details in RKEM and CRKEM-SIFT

In this experiment, an image was used to examine the removal of redundant keypoints. The results of removing redundant keypoints by RKEM-SIFT with different thresholds and by the proposed CRKEM-SIFT are shown in Fig. 4.

Fig. 4 Keypoint detection by RKEM-SIFT and the proposed CRKEM-SIFT. a Original image, b RKEM-SIFT with a threshold value of three, c RKEM-SIFT with a threshold value of four, d RKEM-SIFT with a threshold value of five, e RKEM-SIFT with a threshold value of six, f RKEM-SIFT with a threshold value of seven, g proposed CRKEM-SIFT

As shown in Fig. 4b–f, increasing the threshold value in the RKEM-SIFT method causes more keypoints to be considered redundant and eliminated. RKEM-SIFT does not take the details of the image into account when removing redundant keypoints: in a part of the image with many details, keypoints are removed in the same proportion as in a part with few details. As shown in Fig. 4g, the proposed CRKEM-SIFT takes the details of the image into account when removing redundant keypoints, so that more keypoints are preserved in the parts of the image with many details. In general, RKEM-SIFT ignores the content and texture of the image when removing redundant points and uniformly removes keypoints that are closer than the predetermined threshold throughout the image. The proposed CRKEM-SIFT, by contrast, considers the content and texture of the image and adaptively removes fewer keypoints in the parts of the image that have more content.

4.3 Experiment in the matching process

Matching is the process of determining correspondence between two or more images of the same scene received at different times, with different angles, or by different sensors. To investigate the effectiveness of the proposed CRKEM-SIFT, a series of experiments with varied scales and viewpoints is performed, and the resulting matching process is evaluated by the matching precision according to Eq. (12), feature repeatability rates according to Eq. (13), and the total number of matches.

$$ {\text{Precision}} = \frac{{{\text{TP}}}}{m} $$
(12)
$$ F_{r} = \frac{{{\text{TP}}}}{{\min \left( {N_{{{\text{ref}}}} , N_{{{\text{sens}}}} } \right)}} $$
(13)

In these equations, TP is the number of true matches and m is the total number of matches. \(N_{{{\text{ref}}}}\) and \(N_{{{\text{sens}}}}\) denote the total number of features detected in the reference image and the target image, respectively. A method with higher matching precision and feature repeatability performs better in the matching process.
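Equations (12) and (13) reduce to two ratios, as the short Python sketch below shows. The function name and the match counts are hypothetical values chosen for illustration, not results from the experiments.

```python
def matching_metrics(true_matches, total_matches, n_ref, n_sens):
    """Return (precision, repeatability) following Eqs. (12) and (13)."""
    precision = true_matches / total_matches            # Eq. (12): TP / m
    repeatability = true_matches / min(n_ref, n_sens)   # Eq. (13)
    return precision, repeatability

# Hypothetical counts: 80 true matches out of 100, with 400 and 320 detected features.
precision, repeatability = matching_metrics(
    true_matches=80, total_matches=100, n_ref=400, n_sens=320
)
```

Here the precision is 0.8 and the repeatability is 0.25, since the smaller of the two feature counts (320) is used as the denominator in Eq. (13).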

4.3.1 Matching process on images with varied scales

In this test, the "tree" dataset including images with different scales is used, and the image matching performance is evaluated, which is shown in Fig. 5.

Fig. 5 Matching process results by SIFT, SURF, RKEM-SIFT, and CRKEM-SIFT on tree images with scale changes. a SIFT [32], b SURF [54], c MSER-SIFT [72], d RKEM-SIFT [61], e ARKEM-SIFT [73], f proposed CRKEM-SIFT

In Fig. 5, false matches are marked by red squares. These false matches stem from unnecessary SIFT keypoints, which shows the benefit of removing redundant points in RKEM-SIFT and ARKEM-SIFT, in addition to the improvement in matching precision achieved by the proposed CRKEM-SIFT. The total number of matches is one of the important criteria. As seen in Fig. 5, the total number of matches in the ARKEM-SIFT method is much lower than in the proposed CRKEM-SIFT method. ARKEM-SIFT does not consider the details of the image when removing redundant keypoints, which reduces the total number of matches; it is therefore important that the proposed CRKEM-SIFT method takes the details of the image into account. Quantitative results obtained by the different methods are given in Table 1, which shows that the proposed CRKEM-SIFT performs best among SIFT, SURF, MSER-SIFT, RKEM-SIFT, and ARKEM-SIFT in terms of feature repeatability rate. The matching precision of ARKEM-SIFT is higher than that of the proposed CRKEM-SIFT, but this does not mean that ARKEM-SIFT performs better, because its total number of matches is significantly lower. Overall, the proposed CRKEM-SIFT is more robust to scale changes than the other methods.

Table 1 Matching process results of different methods on tree images with different scales by classic criteria

4.3.2 Matching process on images with viewpoint changes

In this test, the "wall" dataset, which contains images taken from different angles, is used, and the matching performance of the proposed CRKEM-SIFT is evaluated; the results are shown in Table 2.

Table 2 Matching process results of different methods on wall images with different viewpoints by classic criteria

In terms of the precision and repeatability metrics, RKEM-SIFT [61], ARKEM-SIFT [73], and especially the proposed CRKEM-SIFT perform better than SIFT. The SURF method performs the worst of all methods in both sets of experiments, with lower repeatability and lower matching precision (Table 2).

4.4 Experiment on the image registration process

Image registration is the process of aligning two images from the same scene taken under various imaging situations [78]. To investigate the effectiveness of the proposed CRKEM-SIFT, a series of tests on images with various rotations, scales, and viewpoints is performed, and the functionality of the proposed CRKEM-SIFT in the registration process is evaluated visually and by the RMSE criterion according to Eq. (14).

$$ {\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left[ {\left( {x_{i} - x_{i}^{\prime } } \right)^{2} + \left( {y_{i} - y_{i}^{\prime } } \right)^{2} } \right]} $$
(14)

In this equation, \(\left( {x_{i} ,y_{i} } \right)\) and \(\left( {x_{i}^{\prime } ,y_{i}^{\prime } } \right)\) are the coordinates of the ith matching keypoint pair and n is the number of matched pairs. A lower RMSE indicates better performance in image registration.
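The RMSE criterion can be sketched in Python as follows. The sketch uses the standard root-mean-square form (the square root of the mean squared distance over the matched pairs), which is an assumption about the intended definition. The function name is hypothetical, and the second point set is assumed to hold the target keypoints after they have been mapped into the reference frame.

```python
import numpy as np

def registration_rmse(ref_pts, mapped_tgt_pts):
    """Root-mean-square distance between matched keypoint pairs."""
    ref_pts = np.asarray(ref_pts, dtype=float)
    mapped_tgt_pts = np.asarray(mapped_tgt_pts, dtype=float)
    # Squared Euclidean distance of each pair: (x - x')^2 + (y - y')^2.
    sq_dist = ((ref_pts - mapped_tgt_pts) ** 2).sum(axis=1)
    return float(np.sqrt(sq_dist.mean()))

# Two hypothetical pairs: one is off by (3, 4) pixels, the other matches exactly.
rmse = registration_rmse([(0, 0), (1, 1)], [(3, 4), (1, 1)])
```

In this example the squared distances are 25 and 0, so the RMSE is the square root of 12.5, about 3.54 pixels.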

4.4.1 Registration process on images with different rotations and scales

In this experiment, seven pairs from the “bough” dataset, which consists of images with changes in rotation and scale, are used; the results of the registration process are presented in Fig. 6.

Fig. 6 Experimental results of the three matching methods on seven pairs of bough images with different rotations and scales

As shown in Fig. 6, the proposed CRKEM-SIFT method has better performance than others. Performance of the registration process on a pair of images is shown in Fig. 7. As shown in this figure, CRKEM-SIFT has improved the registration process.

Fig. 7 Registration process results by SIFT, SURF, RKEM-SIFT, and CRKEM-SIFT on bough images with scale and rotation changes. a SIFT [32], b SURF [54], c MSER-SIFT [72], d RKEM-SIFT [61], e ARKEM-SIFT [73], f proposed CRKEM-SIFT

4.4.2 Registration process on images with viewpoint changes

In this experiment, the "graffiti" dataset which consists of images with different viewpoints is used, and the functionality of the proposed CRKEM-SIFT in the registration process is evaluated, shown in Fig. 8.

Fig. 8 Registration process results on graffiti images with viewpoint changes. a SIFT [32], b SURF [54], c RKEM-SIFT [61], d CRKEM-SIFT

Figure 8 and Table 3 show that the alignment accuracy of the proposed CRKEM-SIFT is higher than that of the other methods.

Table 3 Registration process results of different methods on graffiti images with different viewpoints by classic criterion

4.5 Experiment on the image mosaicing process

In general, the image mosaicing process includes two important steps: feature extraction and image blending. In this section, to assess these two steps, first the proposed CRKEM-SIFT method alone and then the complete proposed method (CRKEM-SIFT together with the proposed blending method) are evaluated for mosaicing two images and several images. Image mosaicing evaluation methods are divided into two categories: visual methods and objective methods. To evaluate the performance of image mosaicing algorithms objectively, several articles have used measures of the relationship between the mosaic and reference images, including the Median error (MEE), Maximum error (MAE) [79], and RMSE. In addition, the overlapping areas are very important in the image mosaicing process. The Structural SIMilarity (SSIM) index [80], Peak Signal-to-Noise Ratio (PSNR) [3], Feature SIMilarity (FSIM) index [81], and Visual Saliency-Induced Index (VSI) [82] are criteria computed on the overlapping areas. The mechanism for calculating these criteria is shown in Fig. 9.

Fig. 9 Flowchart of calculating the evaluation criteria
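Of the overlap criteria listed above, PSNR is the simplest to compute. The Python sketch below uses the standard PSNR definition for two equally sized image crops. It does not reproduce the full procedure of Fig. 9, and the function name, the 8-bit peak value of 255, and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def psnr(reference, mosaic, max_value=255.0):
    """Standard PSNR in dB between two equally sized image regions."""
    reference = np.asarray(reference, dtype=float)
    mosaic = np.asarray(mosaic, dtype=float)
    mse = np.mean((reference - mosaic) ** 2)
    if mse == 0:
        return float("inf")   # identical regions
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Hypothetical 2x2 overlap crops that differ by 5 gray levels at every pixel.
ref_crop = np.zeros((2, 2))
mosaic_crop = np.full((2, 2), 5.0)
value = psnr(ref_crop, mosaic_crop)
```

A higher PSNR means that the overlapping area of the mosaic is closer to the corresponding area of the reference image.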

4.5.1 Studying proposed CRKEM-SIFT method in mosaicing process

To demonstrate the performance of the proposed CRKEM-SIFT in the mosaicing process, a “Margoon waterfall” dataset, which consists of images with different viewpoints, is employed in this experiment (Fig. 10).

Fig. 10 Mosaicing process results on Waterfall images with viewpoint changes. a SIFT [32], b SURF [54], c MSER-SIFT [72], d RKEM-SIFT [61], e ARKEM-SIFT [73], f proposed CRKEM-SIFT

As shown in Table 4, the proposed method obtains the highest alignment precision compared with SIFT, RKEM-SIFT, and ARKEM-SIFT, which indicates the good performance of the proposed CRKEM-SIFT in image mosaicing. The mosaicing process does not perform well with SURF, MSER-SIFT, and ARKEM-SIFT. Examples of improper mosaicing are marked with red squares in Fig. 10, showing that these methods were not able to mosaic the specified parts properly.

Table 4 Mosaicing process results of different methods on waterfall images with different viewpoints by classic criterion

4.5.2 Studying the proposed method for two-image mosaicing

To demonstrate the performance of the mosaicing process with the proposed method (proposed CRKEM-SIFT and proposed blending method), a dataset of the “underground city of Kariz”, which consists of images with different viewpoints, is employed in this experiment (Fig. 11).

Fig. 11 Mosaicing process results by different methods. a Reference image, b target image, c AKAZE-MAC [3], d SIFT-Voronoi Diagram [51], e ORB-Weighted Mean [67], f SIFT-Improved Weighted Average [77], g proposed method

As shown in Fig. 11, boundary lines can be seen in the results of ORB-Weighted Mean [67], AKAZE-MAC [3], SIFT-Voronoi Diagram [51], and SIFT-Improved Weighted Average [77]. In addition, parts of the image are not well stitched, indicating poor performance. With the proposed blending method, the images are blended well, without any boundary lines.

As seen in Table 5, the proposed method has the highest values of the PSNR, SSIM, FSIM, and VSI evaluation criteria and the lowest values of the MEE, RMSE, and MAE evaluation criteria, which indicates its successful performance in image mosaicing. Based on the PSNR, SSIM, FSIM, and VSI criteria, the proposed method is followed by AKAZE-MAC [3], SIFT-Improved Weighted Average [77], SIFT-Voronoi Diagram [51], and finally ORB-Weighted Mean [67], in that order.

Table 5 Mosaicing process results of blending methods

4.5.3 Studying the proposed method for several images mosaicing process

In this experiment, a “building” dataset, which consists of images with angle changes, is used; the results of the mosaicing process are shown in Fig. 12. As can be seen in this figure, the mosaics produced by the methods in [74,75,76,77] have a large number of boundary lines (red squares in Fig. 12), whereas only one boundary line appears in the result of the proposed method, indicating its effective performance. In the SIFT-Weighted Average method [75], in addition to the boundary line, a false edge (blue square in Fig. 12) is also observed.

Fig. 12 Mosaicing process results by the methods in [74,75,76,77] and the proposed method. a SURF-Multiband blending [74], b SIFT-Weighted Average [75], c SURF-LM [76], d SIFT-Improved Weighted Average [77], e proposed method

5 Conclusion and future works

In this paper, a new approach for the natural image mosaicing process was introduced, combining two proposed methods: CRKEM-SIFT and a Gaussian-weighted blending method. First, the features were extracted from the images using the proposed CRKEM-SIFT algorithm; evaluated on its own, CRKEM-SIFT increased the matching precision and ultimately improved the performance of image mosaicing. Then, the mean and variance of the pixels in the common areas were used as the mean and variance of the Gaussian weight function, which enhanced the image blending. Therefore, the proposed method appears to improve the accuracy of natural image mosaicing. Application of the proposed CRKEM-SIFT method in other research fields, such as image tracking, might reasonably be the topic of future work.