1 Introduction

The technology of image mosaic is a recent axis of research within the field of computer vision. This new technology is about getting a panoramic view from a sequence of images. Mosaic images can be used in a number of computer vision applications and computer graphics, such as the construction of large aerial and satellite photographs using image collections [29], motion detection [3, 5], video indexing [9, 30], virtual environment [31], medical image and radiography system [16, 34].

Image registration encounters many obstacles that arise in the variety of images. Each method must take into account the type of degradation, geometric deformations between images and the required accuracy. In this case, it is still inconceivable to achieve an appropriate universal formula for any case of image mosaic. However, most methods follow the following four steps: 1) detection of primitive, 2) primitive matching, 3) estimation of transformation model [14], 4) projecting the transformed image on the reference image. At the stage of estimation of transformation model, the automatic selection of the RANSAC method [10, 11, 25, 26] does not prevent the selection of outliers [13]; the number of iterations required by the RANSAC method affects in a direct way the execution time and also the results that are not stable for the same sample images.

In this paper, a new calculation method for the transformation model is proposed based on a geometric principle by splitting images to stitch, in a subsequent stage, into cells with VORONOI diagram [20, 33]. This work aims to enhance the contribution presented in [19] by improving both the calculation time of the geometric transformation and the projection time of images on a flat scene, in addition to proposing a new matching method which eliminates outliers. Our approach consists of three stages: the first identifies a set of interest points between the two images with the SIFT detector [22], the second is to use matched points (also called seeds) by the VORONOI diagram to subdivide images into cells. These will be matched by the ZNCC method [21, 28] to keep only highly matched ones. In the third step, we proceed to the calculation of the geometric transformation model that generally requires matching pairs in the existing methods (in the case of the homographic projection, minimum 4 matching pair points).

The random selection of points with the non-deterministic method RANSAC may, sometimes, lead to the selection of outliers, which may affect the results obtained. The robustness of our approach lies in the choice of pairs of points from the best VORNOI regions previously matched. This choice, initially, allowed us to eliminate the random selection criterion used by RANSAC and to reduce significantly the complexity and computation time as well. The final step is that of deformation of the target image against the reference image. This mosaicking process is repeated each time a new image is inserted.

The paper is organized as follows: after this introduction, the related work is reviewed in Section 2. Details of the proposed approach are presented in Section 3. Experiments and results are discussed in Section 4, and the conclusion is in Section 5.

2 Related works

Image registration [36] is a fundamental field of research in creating mosaic images. This article addresses the category of low-level feature-based mosaicing methods [12] based on two major steps. The first extracts the image characteristics such as edge, corner, pixel, color, histogram etc. Algorithms that perform this operation are numerous, and among the most popular we find FAST [27, 32], HARRIS [24], SURF [4] and SIFT [22]. The second step is intended to calculate the optimal geometric transformation between images by using a learning technique to estimate parameters of a model by random sampling of observed data such as RANSAC [10, 11, 25, 26].

Today, there are several recent studies conducted in this category. Brown M. and D.G. Lowe in their reference article [7] were among the first authors to propose a method of automatic stitching insensitive to the order of images, orientation, scale and illumination. In [17] the authors propose an improved automatic method of image mosaic based on SURF to extract the features and to match in the overlapping portions. An algorithm [18] deals with the panorama in real time for mobile systems cameras. The proposed system generates panoramic images and displays the intermediate results simultaneously on the mobile screen. The authors in [23] propose an algorithm which combines the idea of simulating affine transformation (ASIFT) [35] with the speed of the FAST detector [27, 32] and FREAK [1] providing a high quality fast image mosaic.

RANSAC [10, 11, 25, 26] is a robust algorithm capable of computing the parameters of the examined model, i.e. it can estimate the parameters accurately despite the presence of outliers among the data. However, the downside of RANSAC is that it does not have a time limit when it calculates these parameters. By limiting the maximum number of iterations, the result may not be optimal and may not be stable during a number of subsequent tests using the same data. Another disadvantage of RANSAC takes place during the estimation of specific thresholds in the model under examination. In this paper we propose a new method of mosaic which renounces the use of RANSAC by exploiting the performance of VORONOI diagram [20, 33].

3 Proposed work

In this article, a new approach to mosaicking image based on VORONOI regions is presented (Fig. 1). First interest points are extracted from the processed images then only the corresponding points are kept by the use of SIFT descriptors. Next, each image is divided into several regions by the VORONOI diagram based on the points already matched by SIFT. Then comes the stage of matching each pair of regions by ZNCC [21, 28] to determine those that have the best correlation scores. By using the seeds of these regions, we can calculate the geometric transformation between the two images. Finally, the images are deformed within a single frame to produce the image of the mosaic.

Fig. 1
figure 1

Procedure of the image mosaic of our approach

3.1 Mosaicking tools

3.1.1 Detection of interest points by SIFT

We will start with the detection of interest points and measure the matching between the two images by SIFT descriptors (Fig. 2). It is to identify distinctive points (or keys), which lead to identify an object. The detection of these points leads to setting up feature vectors whose components are specific to the item. SIFT contributed to the progress of information extraction methods in an image by providing a robust and satisfactory algorithm properties required by systems, in our case, artificial vision. We obtain a set of pair points, also called control points.

Fig. 2
figure 2

Matched pair points in image 1 and 2

3.1.2 VORONOI diagram

The VORONOI diagram is a method of dividing an image into a number of regions. In mathematics, let us place ourselves in a 2D plan called E. Let S be a set of n points in the plane, called (seeds, sites, or generators) n ≥ 3. It is to subdivide the space into regions around each point p of S. In this case, S is the set of control points from the SIFT descriptors, such as all the points in the region containing p are closer to p than any other point of S. We are therefore interested in the mediators neighboring points S. Let p and q be two points of the set S such as the mediator of these two points dividing the plane into two half-planes by the following formula:

$$ D\left(p,q\right)=\left\{M\in E,d\left(p,M\right)<d\left(q,M\right)\right\} $$
(1)

With D(p, q) the half-plane containing the point p, M represents all the points in the plane E, and D the Euclidean distance between p and the points belonging to M.

Based on control points previously detected (Fig. 2), we will apply the VORONOI diagram to divide the two images into multiple regions [20] (Fig. 3).

Fig. 3
figure 3

a Image divided by VORONOI diagram. b Identification of each pair of regions by a distinct color

The regions detected by VORONOI in an image i are written as R k i with 1 ≤ k ≤ n and n is the number of control points (Fig. 3a).

With regard to the geometrical form and the pixel count of the pairs of regions in the two images (Fig. 3b). We can already see that there is a similarity between the majority of these pairs of regions.

Fig. 4
figure 4

Result of matching two regions between two images

Fig. 5
figure 5

Overlap area between the two images with a better correlation score

Fig. 6
figure 6

Selected regions that validate the measurement correlation condition

Fig. 7
figure 7

Selected control points for the calculation of the transformation

Fig. 8
figure 8

Stitching of two images

Fig. 9
figure 9

a) Example of outliers of the two images after using SIFT. b) Elimination of outliers after using the matching based on the VORONOI diagram

Fig. 10
figure 10

Five sets of sample images

Fig. 11
figure 11

VORONOI region having the best ZNCC score

Fig. 12
figure 12

Results of the mosaic obtained by the two methods

Fig. 13
figure 13

Estimation time model of the geometric transformation by the two methods

Fig. 14
figure 14

Overlap zone (Yellow) between I1 and I2 excluded from the projection procedure

Fig. 15
figure 15

Projection time using Huang method and the present approach: a) - 0.8 threshold; b) - 0.6 Threshold

Fig. 16
figure 16figure 16

Nine real image sequences

Fig. 17
figure 17figure 17

VORONOI regions having the best ZNCC score

Fig. 18
figure 18figure 18

Image mosaic using both methods

Fig. 19
figure 19

Estimation time of the geometric transformation model by the two methods

3.1.3 Matching VORONOI regions

In this stage, we match each pair of VORONOI regions (Fig. 4) by the method (ZNCC, Zero mean Normalized Cross Correlation) [21, 28], formulated as follows:

$$ ZNCC\left({R}_i^k,\ {R}_j^k\right)=\frac{\left({R}_i^k-\overline{R_i^k}\right).\left({R}_j^k-\overline{R_j^k}\right)}{\left\Vert {R}_i^k-\overline{R_i^k}\left\Vert .\right\Vert {R}_j^k-\overline{R_j^k}\right\Vert } $$
(2)

With R k i a region of the left image and R k j a region of the right image with the same size. \( \overline{R_i^k\ } \) and \( \overline{R_j^k} \) are respectively the average of the regions R k i and R k j .

We keep in this stage only the pairs that offer a better correlation score strictly greater than 0.8.

After the matching process between the test images we obtain Fig. 5 below gives us an idea about the overlap area between the two images.

The candidate points to be used to calculate the transformation will be the seeds (generators) of VORONOI regions having the best ZNCC correlation score.

The table below illustrates the correlation scores for the first 20 pairs of regions between the two test images in descending order.

In the process of calculating the projective transformation, we need at least four pairs. In our approach, we selected the pairs of seeds belonging to the first five matched regions (those in red color) in Table 1. The fifth pair of seed couples will be used for validation of the projective transformation found by the other seed couples. Figure 6 shows the regions selected to the calculation of the transformation. Again, we observed that the regions of the left image are almost identical with those of the right image.

Table 1 Correlation score for each pair of region

3.1.4 Calculation of the geometric transformation

The next step is to calculate the geometric transformation based on control points (Seeds) of VORONOI regions (Fig. 7).

At this point, both images are related by a geometrical transformation T ij , called homography, which is a matrix of size 3 × 3 with eight degrees of freedom. Two matching points of the left and right image are related by the matrix T ij with the following formula:

$$ {m}_{ki} \sim {T}_{ij}{m}_{kj} $$
(3)

With m ki the point k of image i, m kj the point k of image j and T ij represent the geometric transformation of size 3 × 3 which connects image i and j.

In order to test our approach, we used the projective transformation model. At this stage, there is no need for RANSAC method as we already have the pairs of control points resulted from the best VORONOI regions shown in Fig. 7 to calculate the projective transformation. This will significantly reduce the time and complexity of computation (this part will be detailed in the experimental section). Eventually, the image2 is aligned with the reference image so that the pixels representing the same underlying structures are overlaid (Fig. 8).

3.2 Robustness of the proposed approach

3.2.1 Matching

The highlight of matching used in our approach, is to detect the entire overlap region (Fig. 5) by the combination of SIFT detector [22] and VORONOI diagram. This approach provides a robust and reliable matching, in the examples used (database). Similar regions in the two images representing the overlap area (regions with the highest correlation score) can reach, according to the test results obtained, approximately 46 % of the total size of the image.

3.2.2 Rate of outliers

We can see in the two images in Fig. 9a, that among the pairs points matched by SIFT, that there are two points connected by a red line constituting an outlier that can influence the results of the mosaic. In our approach, shifting the matching process from points to regions, provides a robust correspondence and eliminates potential outliers. In fact, the outlier detected by SIFT (Fig. 9a) does not belong to the overlap zone detected by our approach (Fig. 9b). In addition, the region’s correlation score in which occurs the false matching provides a poor match thus does not fall within the regions of the overlap area of the two images.

3.2.3 Calculation time

The calculation of the transformation is not based on RANSAC method, which is, as mentioned above, a non-deterministic iterative method and does not guarantee getting the real homography, nor the stability of results at each program execution. In addition, with the RANSAC method, computational complexity increases according to the image size, the number of matching points detected and the desired accuracy. However, this approach helps to overcome these problems by eliminating the phase of random selection of points to be used in the calculation of transformation (See Fig. 13 of the experiment).

3.2.4 Stability of results

We notice that with each program execution, the RANSAC algorithm is used for the same pair of images. The points selected as (best inliers), which will make up the basis of the calculation of the transformation matrix, are not the same (because of the random selection), and each time we would mostly get different results (see Table 2 above). However, in our approach, the points selected to calculate the transformation remain the same for the all four tests, thanks to the stability of the correlation scores (which remain constant) between the VORONOI regions to which the selected seeds belong.

Table 2 Points used in the calculation of transformation by RANSAC and the present approach

4 Experimentation

In this section, we have implemented our approach and Huang approach [15] (based on RANSAC) using a computer system equipped with: Intel (R) i3, 2 .2 GHz with 6 GB RAM. In addition, we compare the present approach to the one proposed by Huang [15] in terms of the mosaic quality, the execution time and the projection time optimization of two images on a flat surface. Both approaches are tested on synthetic data (Adobe System Database [6]) and real data (Fig. 16).

4.1 Simulations

To demonstrate the performance and robustness of our approach, we perform a simulation by choosing a pair of sequences from the ten categories of Adobe System Database [6] (five pairs of these are shown in Fig. 10); each sequence offers two gray-scale images of arbitrary scene. Firstly, we detected interest points by the SIFT algorithm [22] which aims to subdivide the image into VORONOI regions. Then, the matching of these regions is carried out by the correlation function ZNCC [21, 28] in order to obtain pairs of regions (having the best correlation score). By using the seeds of these regions, we calculate the geometric transformation. Finally, the projection of the two images in the same plane allows constructing a mosaic image.

After detecting the control points with the SIFT algorithm and the subdivision of images in VORONOI regions, the regions with the best ZNCC correlation score are shown in Fig. 11.

Figure 12 shows the results of the mosaic obtained by the two methods (our approach and Huang’s method).

According to Fig. 12, the results obtained by both approaches (our approach and Huang’s) are satisfactory in terms of mosaic construction. However, the difference between the two methods is the time of calculation of the geometric transformation and the projection time of the two images on a flat surface.

According to Fig. 13, the calculation of the geometric transformation by the proposed method is faster in terms of calculation time. This is valid for the entire database used. This result can be explained by the fact that our approach significantly reduces the computational complexity and solves the major problem of RANSAC method which lies in the random selection of points that will make up the basis of the estimation of geometrical transformation. In other words, the calculation of the transformation is performed only once, unlike RANSAC method, which depends on the number of iterations varying according to the required accuracy.

The projection step involves creating a single image combining the different parts of the captured scene (I1 and I2) by the projection of two images on a flat surface. Each point of the panorama is a projection of a point of the set of original images taking into account that it may be projected onto several images simultaneously by using the previously calculated transformation. The elapsed time during this step is still significant compared to the overall execution time. For this reason, in our approach we will use the results of the matching of regions obtained by VORONOI to reduce the projection time. In other words, the projection of the second image I2 on the source image I1 is performed only with the points that are not in the overlap area between the two images (Fig. 14).

According to Fig. 15a, b, it is clear that the elapsed time for the projection of two images on a flat surface, using our approach, is considerably reduced for all tests with a percentage following used images in a range of 14 to 46 % if we consider a strict threshold matching 0.8 (Fig. 15a), and from 33 to 55 % for a permissive threshold of 0.6 (Fig. 15b).

The analysis of Fig. 15a, b shows that the present approach provides satisfactory results with respect to Huang method in terms of projection time of the two images on a flat surface. After applying the geometric transformation to the second image, our approach performs only the projection of the area not included in the first image. However, Huang method projects fully the image on a plane surface. Therefore, the present approach is considerable if we want to use in real time.

4.2 Real data

The proposed method is tested on real data as well. We have taken, using a Canon EOS 450D camera, nine image sequences of arbitrary three-dimensional scene (Fig. 16). The choice of test images comes to verify the different variations that the sequences may undergo at the time of acquisition such as rotation, translation, scaling and moving objects.

After detecting the control points with SIFT algorithm and the subdivision of images in VORONOI regions, the regions having the best ZNCC correlation score are shown in Fig. 17.

After applying the geometric transformation (based on VORONOI regions) on the second image of each image pair (Fig. 17), the construction of the mosaic image is performed by our approach. The result of our method and Huang’s is presented in Fig. 18.

According to Fig. 18, the results obtained by both approaches (our approach and Huang’s) on real data are satisfactory at the mosaic construction. These results confirm the ones obtained on synthetic images.

Similarly, the estimation time of the geometrical transformation by the two methods on real data (Fig. 19) remains consistent with the results obtained from simulation data.

Figure 20 shows the elapsed time at the stage of projection of the two images on a flat surface.

Fig. 20
figure 20

Projection time using Huang method and our approach: a) - 0.8 threshold b) - 0.6 threshold

Figure 20a, b confirms the results obtained in the simulation which remain in favor of the proposed method. The reduction of projection time varies between 1.18 and 30.79 % for a threshold of 0.8 (Fig. 20a) and 2.1 to 51.59 % for a threshold of 0, 6 (Fig. 20b) according to the sequence of real images used. These values can be simply explained by the following changes: brightness, type of transformation, zoom, angle, etc. which influence the captured images. In addition, these factors act in a direct manner on the overlap area and, therefore, it influences the projection time.

4.3 Image blending

Because of the variations experienced by images at the time of acquisition (such as color intensity due to lighting conditions, scaling and many others), it is possible to have transitions between the images that form the mosaic image. In this case it is possible to use post-processing methods such as multi-band blending [2, 7, 8], which proved to be particularly effective for the mosaic image. Figure 21 represents a set of 4 pictures after applying the blending method, and the obtained results are satisfactory in terms of eliminating color discontinuities and phantom objects.

Fig. 21
figure 21

Results of multi-band blending technique

5 Conclusion

In this paper, we developed a new method of image mosaic based on VORONOI regions. This method involves extracting the control points from two images by SIFT algorithm. Then these points are used to subdivide the images into VORONOI regions. The VORONOI regions matching step is performed through the correlation measurement ZNCC. We use the seeds of the regions with the highest correlation score to calculate the geometric transformation between the two images. In fact, this choice allows the optimization of the geometrical transformation estimation in terms of two elements: time and stability of results. Finally, the technique used to project images on a plane surface allows us to reduce the projection time considerably. The robustness of the proposed method in terms of simplicity, stability and quality of the mosaic image was proved by the results of experiments on synthetic data (images of Adobe System Database) and on real data.