Abstract
In this article, we propose a new method of image stitching that computes, in a robust manner, the transformation model applied to creating a panorama that is close to reality. The random selection of matching points used in existing methods, using Random Sample Consensus (RANSAC) or the threshold of the execution process (iteration number) cannot generally provide sufficient precision. Our approach, in this regard, comes to solve this problem. The calculation of the transformation model is based on the VORONOI diagram that divides images into regions to be used in the matching instead of control points. In this case, the transformation estimation will be based on the regions seeds that provide the best correlation score. Among the advantages of our method is solving problems related to outliers that can, in existing methods, affect the reliability of the mosaic. The results obtained are satisfactory in terms of stability, quality, execution time and reduction of the computational complexity.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
The technology of image mosaic is a recent axis of research within the field of computer vision. This new technology is about getting a panoramic view from a sequence of images. Mosaic images can be used in a number of computer vision applications and computer graphics, such as the construction of large aerial and satellite photographs using image collections [29], motion detection [3, 5], video indexing [9, 30], virtual environment [31], medical image and radiography system [16, 34].
Image registration encounters many obstacles that arise in the variety of images. Each method must take into account the type of degradation, geometric deformations between images and the required accuracy. In this case, it is still inconceivable to achieve an appropriate universal formula for any case of image mosaic. However, most methods follow the following four steps: 1) detection of primitive, 2) primitive matching, 3) estimation of transformation model [14], 4) projecting the transformed image on the reference image. At the stage of estimation of transformation model, the automatic selection of the RANSAC method [10, 11, 25, 26] does not prevent the selection of outliers [13]; the number of iterations required by the RANSAC method affects in a direct way the execution time and also the results that are not stable for the same sample images.
In this paper, a new calculation method for the transformation model is proposed based on a geometric principle by splitting images to stitch, in a subsequent stage, into cells with VORONOI diagram [20, 33]. This work aims to enhance the contribution presented in [19] by improving both the calculation time of the geometric transformation and the projection time of images on a flat scene, in addition to proposing a new matching method which eliminates outliers. Our approach consists of three stages: the first identifies a set of interest points between the two images with the SIFT detector [22], the second is to use matched points (also called seeds) by the VORONOI diagram to subdivide images into cells. These will be matched by the ZNCC method [21, 28] to keep only highly matched ones. In the third step, we proceed to the calculation of the geometric transformation model that generally requires matching pairs in the existing methods (in the case of the homographic projection, minimum 4 matching pair points).
The random selection of points with the non-deterministic method RANSAC may, sometimes, lead to the selection of outliers, which may affect the results obtained. The robustness of our approach lies in the choice of pairs of points from the best VORNOI regions previously matched. This choice, initially, allowed us to eliminate the random selection criterion used by RANSAC and to reduce significantly the complexity and computation time as well. The final step is that of deformation of the target image against the reference image. This mosaicking process is repeated each time a new image is inserted.
The paper is organized as follows: after this introduction, the related work is reviewed in Section 2. Details of the proposed approach are presented in Section 3. Experiments and results are discussed in Section 4, and the conclusion is in Section 5.
2 Related works
Image registration [36] is a fundamental field of research in creating mosaic images. This article addresses the category of low-level feature-based mosaicing methods [12] based on two major steps. The first extracts the image characteristics such as edge, corner, pixel, color, histogram etc. Algorithms that perform this operation are numerous, and among the most popular we find FAST [27, 32], HARRIS [24], SURF [4] and SIFT [22]. The second step is intended to calculate the optimal geometric transformation between images by using a learning technique to estimate parameters of a model by random sampling of observed data such as RANSAC [10, 11, 25, 26].
Today, there are several recent studies conducted in this category. Brown M. and D.G. Lowe in their reference article [7] were among the first authors to propose a method of automatic stitching insensitive to the order of images, orientation, scale and illumination. In [17] the authors propose an improved automatic method of image mosaic based on SURF to extract the features and to match in the overlapping portions. An algorithm [18] deals with the panorama in real time for mobile systems cameras. The proposed system generates panoramic images and displays the intermediate results simultaneously on the mobile screen. The authors in [23] propose an algorithm which combines the idea of simulating affine transformation (ASIFT) [35] with the speed of the FAST detector [27, 32] and FREAK [1] providing a high quality fast image mosaic.
RANSAC [10, 11, 25, 26] is a robust algorithm capable of computing the parameters of the examined model, i.e. it can estimate the parameters accurately despite the presence of outliers among the data. However, the downside of RANSAC is that it does not have a time limit when it calculates these parameters. By limiting the maximum number of iterations, the result may not be optimal and may not be stable during a number of subsequent tests using the same data. Another disadvantage of RANSAC takes place during the estimation of specific thresholds in the model under examination. In this paper we propose a new method of mosaic which renounces the use of RANSAC by exploiting the performance of VORONOI diagram [20, 33].
3 Proposed work
In this article, a new approach to mosaicking image based on VORONOI regions is presented (Fig. 1). First interest points are extracted from the processed images then only the corresponding points are kept by the use of SIFT descriptors. Next, each image is divided into several regions by the VORONOI diagram based on the points already matched by SIFT. Then comes the stage of matching each pair of regions by ZNCC [21, 28] to determine those that have the best correlation scores. By using the seeds of these regions, we can calculate the geometric transformation between the two images. Finally, the images are deformed within a single frame to produce the image of the mosaic.
3.1 Mosaicking tools
3.1.1 Detection of interest points by SIFT
We will start with the detection of interest points and measure the matching between the two images by SIFT descriptors (Fig. 2). It is to identify distinctive points (or keys), which lead to identify an object. The detection of these points leads to setting up feature vectors whose components are specific to the item. SIFT contributed to the progress of information extraction methods in an image by providing a robust and satisfactory algorithm properties required by systems, in our case, artificial vision. We obtain a set of pair points, also called control points.
3.1.2 VORONOI diagram
The VORONOI diagram is a method of dividing an image into a number of regions. In mathematics, let us place ourselves in a 2D plan called E. Let S be a set of n points in the plane, called (seeds, sites, or generators) n ≥ 3. It is to subdivide the space into regions around each point p of S. In this case, S is the set of control points from the SIFT descriptors, such as all the points in the region containing p are closer to p than any other point of S. We are therefore interested in the mediators neighboring points S. Let p and q be two points of the set S such as the mediator of these two points dividing the plane into two half-planes by the following formula:
With D(p, q) the half-plane containing the point p, M represents all the points in the plane E, and D the Euclidean distance between p and the points belonging to M.
Based on control points previously detected (Fig. 2), we will apply the VORONOI diagram to divide the two images into multiple regions [20] (Fig. 3).
The regions detected by VORONOI in an image i are written as R k i with 1 ≤ k ≤ n and n is the number of control points (Fig. 3a).
With regard to the geometrical form and the pixel count of the pairs of regions in the two images (Fig. 3b). We can already see that there is a similarity between the majority of these pairs of regions.
3.1.3 Matching VORONOI regions
In this stage, we match each pair of VORONOI regions (Fig. 4) by the method (ZNCC, Zero mean Normalized Cross Correlation) [21, 28], formulated as follows:
With R k i a region of the left image and R k j a region of the right image with the same size. \( \overline{R_i^k\ } \) and \( \overline{R_j^k} \) are respectively the average of the regions R k i and R k j .
We keep in this stage only the pairs that offer a better correlation score strictly greater than 0.8.
After the matching process between the test images we obtain Fig. 5 below gives us an idea about the overlap area between the two images.
The candidate points to be used to calculate the transformation will be the seeds (generators) of VORONOI regions having the best ZNCC correlation score.
The table below illustrates the correlation scores for the first 20 pairs of regions between the two test images in descending order.
In the process of calculating the projective transformation, we need at least four pairs. In our approach, we selected the pairs of seeds belonging to the first five matched regions (those in red color) in Table 1. The fifth pair of seed couples will be used for validation of the projective transformation found by the other seed couples. Figure 6 shows the regions selected to the calculation of the transformation. Again, we observed that the regions of the left image are almost identical with those of the right image.
3.1.4 Calculation of the geometric transformation
The next step is to calculate the geometric transformation based on control points (Seeds) of VORONOI regions (Fig. 7).
At this point, both images are related by a geometrical transformation T ij , called homography, which is a matrix of size 3 × 3 with eight degrees of freedom. Two matching points of the left and right image are related by the matrix T ij with the following formula:
With m ki the point k of image i, m kj the point k of image j and T ij represent the geometric transformation of size 3 × 3 which connects image i and j.
In order to test our approach, we used the projective transformation model. At this stage, there is no need for RANSAC method as we already have the pairs of control points resulted from the best VORONOI regions shown in Fig. 7 to calculate the projective transformation. This will significantly reduce the time and complexity of computation (this part will be detailed in the experimental section). Eventually, the image2 is aligned with the reference image so that the pixels representing the same underlying structures are overlaid (Fig. 8).
3.2 Robustness of the proposed approach
3.2.1 Matching
The highlight of matching used in our approach, is to detect the entire overlap region (Fig. 5) by the combination of SIFT detector [22] and VORONOI diagram. This approach provides a robust and reliable matching, in the examples used (database). Similar regions in the two images representing the overlap area (regions with the highest correlation score) can reach, according to the test results obtained, approximately 46 % of the total size of the image.
3.2.2 Rate of outliers
We can see in the two images in Fig. 9a, that among the pairs points matched by SIFT, that there are two points connected by a red line constituting an outlier that can influence the results of the mosaic. In our approach, shifting the matching process from points to regions, provides a robust correspondence and eliminates potential outliers. In fact, the outlier detected by SIFT (Fig. 9a) does not belong to the overlap zone detected by our approach (Fig. 9b). In addition, the region’s correlation score in which occurs the false matching provides a poor match thus does not fall within the regions of the overlap area of the two images.
3.2.3 Calculation time
The calculation of the transformation is not based on RANSAC method, which is, as mentioned above, a non-deterministic iterative method and does not guarantee getting the real homography, nor the stability of results at each program execution. In addition, with the RANSAC method, computational complexity increases according to the image size, the number of matching points detected and the desired accuracy. However, this approach helps to overcome these problems by eliminating the phase of random selection of points to be used in the calculation of transformation (See Fig. 13 of the experiment).
3.2.4 Stability of results
We notice that with each program execution, the RANSAC algorithm is used for the same pair of images. The points selected as (best inliers), which will make up the basis of the calculation of the transformation matrix, are not the same (because of the random selection), and each time we would mostly get different results (see Table 2 above). However, in our approach, the points selected to calculate the transformation remain the same for the all four tests, thanks to the stability of the correlation scores (which remain constant) between the VORONOI regions to which the selected seeds belong.
4 Experimentation
In this section, we have implemented our approach and Huang approach [15] (based on RANSAC) using a computer system equipped with: Intel (R) i3, 2 .2 GHz with 6 GB RAM. In addition, we compare the present approach to the one proposed by Huang [15] in terms of the mosaic quality, the execution time and the projection time optimization of two images on a flat surface. Both approaches are tested on synthetic data (Adobe System Database [6]) and real data (Fig. 16).
4.1 Simulations
To demonstrate the performance and robustness of our approach, we perform a simulation by choosing a pair of sequences from the ten categories of Adobe System Database [6] (five pairs of these are shown in Fig. 10); each sequence offers two gray-scale images of arbitrary scene. Firstly, we detected interest points by the SIFT algorithm [22] which aims to subdivide the image into VORONOI regions. Then, the matching of these regions is carried out by the correlation function ZNCC [21, 28] in order to obtain pairs of regions (having the best correlation score). By using the seeds of these regions, we calculate the geometric transformation. Finally, the projection of the two images in the same plane allows constructing a mosaic image.
After detecting the control points with the SIFT algorithm and the subdivision of images in VORONOI regions, the regions with the best ZNCC correlation score are shown in Fig. 11.
Figure 12 shows the results of the mosaic obtained by the two methods (our approach and Huang’s method).
According to Fig. 12, the results obtained by both approaches (our approach and Huang’s) are satisfactory in terms of mosaic construction. However, the difference between the two methods is the time of calculation of the geometric transformation and the projection time of the two images on a flat surface.
According to Fig. 13, the calculation of the geometric transformation by the proposed method is faster in terms of calculation time. This is valid for the entire database used. This result can be explained by the fact that our approach significantly reduces the computational complexity and solves the major problem of RANSAC method which lies in the random selection of points that will make up the basis of the estimation of geometrical transformation. In other words, the calculation of the transformation is performed only once, unlike RANSAC method, which depends on the number of iterations varying according to the required accuracy.
The projection step involves creating a single image combining the different parts of the captured scene (I1 and I2) by the projection of two images on a flat surface. Each point of the panorama is a projection of a point of the set of original images taking into account that it may be projected onto several images simultaneously by using the previously calculated transformation. The elapsed time during this step is still significant compared to the overall execution time. For this reason, in our approach we will use the results of the matching of regions obtained by VORONOI to reduce the projection time. In other words, the projection of the second image I2 on the source image I1 is performed only with the points that are not in the overlap area between the two images (Fig. 14).
According to Fig. 15a, b, it is clear that the elapsed time for the projection of two images on a flat surface, using our approach, is considerably reduced for all tests with a percentage following used images in a range of 14 to 46 % if we consider a strict threshold matching 0.8 (Fig. 15a), and from 33 to 55 % for a permissive threshold of 0.6 (Fig. 15b).
The analysis of Fig. 15a, b shows that the present approach provides satisfactory results with respect to Huang method in terms of projection time of the two images on a flat surface. After applying the geometric transformation to the second image, our approach performs only the projection of the area not included in the first image. However, Huang method projects fully the image on a plane surface. Therefore, the present approach is considerable if we want to use in real time.
4.2 Real data
The proposed method is tested on real data as well. We have taken, using a Canon EOS 450D camera, nine image sequences of arbitrary three-dimensional scene (Fig. 16). The choice of test images comes to verify the different variations that the sequences may undergo at the time of acquisition such as rotation, translation, scaling and moving objects.
After detecting the control points with SIFT algorithm and the subdivision of images in VORONOI regions, the regions having the best ZNCC correlation score are shown in Fig. 17.
After applying the geometric transformation (based on VORONOI regions) on the second image of each image pair (Fig. 17), the construction of the mosaic image is performed by our approach. The result of our method and Huang’s is presented in Fig. 18.
According to Fig. 18, the results obtained by both approaches (our approach and Huang’s) on real data are satisfactory at the mosaic construction. These results confirm the ones obtained on synthetic images.
Similarly, the estimation time of the geometrical transformation by the two methods on real data (Fig. 19) remains consistent with the results obtained from simulation data.
Figure 20 shows the elapsed time at the stage of projection of the two images on a flat surface.
Figure 20a, b confirms the results obtained in the simulation which remain in favor of the proposed method. The reduction of projection time varies between 1.18 and 30.79 % for a threshold of 0.8 (Fig. 20a) and 2.1 to 51.59 % for a threshold of 0, 6 (Fig. 20b) according to the sequence of real images used. These values can be simply explained by the following changes: brightness, type of transformation, zoom, angle, etc. which influence the captured images. In addition, these factors act in a direct manner on the overlap area and, therefore, it influences the projection time.
4.3 Image blending
Because of the variations experienced by images at the time of acquisition (such as color intensity due to lighting conditions, scaling and many others), it is possible to have transitions between the images that form the mosaic image. In this case it is possible to use post-processing methods such as multi-band blending [2, 7, 8], which proved to be particularly effective for the mosaic image. Figure 21 represents a set of 4 pictures after applying the blending method, and the obtained results are satisfactory in terms of eliminating color discontinuities and phantom objects.
5 Conclusion
In this paper, we developed a new method of image mosaic based on VORONOI regions. This method involves extracting the control points from two images by SIFT algorithm. Then these points are used to subdivide the images into VORONOI regions. The VORONOI regions matching step is performed through the correlation measurement ZNCC. We use the seeds of the regions with the highest correlation score to calculate the geometric transformation between the two images. In fact, this choice allows the optimization of the geometrical transformation estimation in terms of two elements: time and stability of results. Finally, the technique used to project images on a plane surface allows us to reduce the projection time considerably. The robustness of the proposed method in terms of simplicity, stability and quality of the mosaic image was proved by the results of experiments on synthetic data (images of Adobe System Database) and on real data.
References
Alahi A, Ortiz R, Vandergheynst P (2012) Freak: fast retina keypoint. In Computer Vision and Pattern Recognition (CVPR), 2012 I.E. Conference on, 510–517. Ieee. doi:10.1109/CVPR.2012.6247715
Allène C, Pons JP, Keriven R (2008) Seamless image-based texture atlases using multi-band blending. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, 1–4. IEEE. doi:10.1109/ICPR.2008.4761913
Azzari P, Stefano LD, Bevilacqua A (2005) An effective real-time mosaicing algorithm apt to detect motion through background subtraction using a PTZ camera. Adv Video Signal Based Surveillance. AVSS 2005. IEEE Conf, 511–516. IEEE. doi:10.1109/AVSS.2005.1577321
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110(3):346–359. doi:10.1016/j.cviu.2007.09.014
Bevilacqua A, Azzari P (2007) A fast and reliable image mosaicing technique with application to wide area motion detection. Image Analysis Recognition. Springer Berlin Heidelberg. 501–512. doi:10.1007/978-3-540-74260-9_45
Brandt J (2010) Transform coding for fast approximate nearest neighbor search in high dimensions. IEEE Conf. Computer Vision Pattern Recognition. [Data file]. Retrieved from Adobe System: http://www.adobe.com/go/datasets
Brown M, Lowe DG (2007) Automatic panoramic image stitching using invariant features. Int J Comput Vis 74(1):59–73. doi:10.1007/s11263-006-0002-3
Burt PJ, Adelson EH (1983) A multiresolution spline with application to image mosaics. ACM Transactions on Graphics (TOG) 2(4):217–236. doi:10.1145/245.247
Choi YH, Seong YK, Choi TS (2002) Image mosaicing with automatic scene segmentation for video indexing. Int Conf Consumer Electronics, 74–75. doi:10.1109/ICCE.2002.1013933
Fang X, Zhu J, Luo B (2012) Image mosaic with relaxed motion. SIViP 6(4):647–667. doi:10.1007/s11760-010-0194-4
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24:381–395. doi:10.1145/358669.358692
Ghosh D, Kaabouch N (2016) A survey on image mosaicing techniques. J Vis Commun Image Represent 34:1–11. doi:10.1016/j.jvcir.2015.10.014
Hodge VJ, Austin JA (2004) Survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126. doi:10.1007/s10462-004-4304-y
Hu J, Deng W, Guo J (2011) 2D projective transformation based active shape model for facial feature location. Eighth Int Conf Fuzzy Syst Knowl Discov 4:2442–2446. doi:10.1109/FSKD.2011.6019993
Huang W, Han X (2013) An improved RANSAC algorithm of color image stitching. Proc Chinese Intell Automation. 21–28. doi:10.1007/978-3-642-38466-0_3
Jalink A, McAdoo J, Halama G, Liu H (1996) CCD mosaic technique for large-field digital mammography. Med Imaging, IEEE Trans 15(3):260–267. doi:10.1109/42.500135
Jing N, Fan Y, Lingyi S (2013) Improved method of automatic image stitching based on SURF. Future Info Commun Technol Ubiquitous HealthCare (Ubi-HealthTech), 2013 First Int Symp, 1–5. IEEE. doi:10.1109/Ubi-HealthTech.2013.6708059
Kim BS, Lee SH, Cho NI (2011) Real-time panorama canvas of natural images. Consumer Electronics, IEEE Trans 57(4):1961–1968. doi:10.1109/TCE.2011.6131177
Laraqui A, Saaidi A, Jarrar A, Satori K (2014) Image stitching based on the geometric solution. Info Sci Tech (CIST), Third IEEE Int Colloquium IEEE, 340–344. doi:10.1109/CIST.2014.7016643
Laraqui M, Saaidi A, Mouhib A, Abarkan M (2015) Images matching using voronoï regions propagation. 3D Res 6(3):1–16. doi:10.1007/s13319-015-0056-5
Lhuillier M, Quan L (2002) Quasi-dense reconstruction from image sequence. In Computer Vision—ECCV 2002, 125–139. Springer Berlin Heidelberg. doi:10.1007/3-540-47967-8_9
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60:91–110. doi:10.1023/B:VISI.0000029664.99615.94
Ma X, Liu D, Zhang J, Xin J (2015) A fast affine-invariant features for image stitching under large viewpoint changes. Neurocomputing 151:1430–1438. doi:10.1016/j.neucom.2014.10.045
Mikolajczyk K, Schmid C (2002) An affine invariant interest point detector. In Computer Vision—ECCV 2002, 128–142. Springer Berlin Heidelberg. doi:10.1007/3-540-47969-4_9
Montijano E, Martinez S, Sagues C (2015) Distributed robust consensus using RANSAC and dynamic opinions. Control Syst Technol, IEEE Trans 23(1):150–163. doi:10.1109/TCST.2014.2317771
Raguram R, Frahm JM, Pollefeys M (2008) A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. vol. 5303 LNCS, no. PART 2, 500–513. doi: 10.1007/978-3-540-88688-4_37
Rosten E, Porter R, Drummond T (2010) Faster and better: a machine learning approach to corner detection. IEEE Trans Pattern Anal Mach Intell 32(1):105–119. doi:10.1109/TPAMI.2008.275
Saaidi A, Tairi H, Satori K (2006) Fast stereo matching using rectification and correlation techniques. ISCCSP, Second Int Symp Commun, Control Signal Proc, 1–4
Sali E, Wolfson H (1992) Texture classification in aerial photographs and satellite data. Int J Remote Sensing, 3395–3408. doi:10.1080/01431169208904130
Sooknanan K, Kokaram A, Corrigan D, Baugh G, Harte N, Wilson J (2012) Indexing and selection of well-lit details in underwater video mosaics using vignetting estimation. In OCEANS, 2012-Yeosu, 1–7. IEEE. doi:10.1109/OCEANS-Yeosu.2012.6263541
Szeliski R (2002) Video mosaics for virtual environments, IEEE Comput Graphics Appl, 22–30. doi:10.1109/38.486677
Trajković M, Hedley M (1998) Fast corner detection. Image Vis Comput 16(2):75–87. doi:10.1016/S0262-8856(97)00056-5
Wang X, Ying X, Liu Y-J, Xin S-Q, Wang W, Gu X, Mueller-Wittig W, He Y (2015) Intrinsic computation of centroidal Voronoi tessellation (CVT) on meshes. Comput Des, 51–61. doi:10.1016/j.cad.2014.08.023
Wei GQ, Qian J, Schramm HF, Novak CL (2003) Method for intensity correction in CR mosaic by combined nonlinear and linear transformations. Med Imaging 2003. Int Soc Optics Photonics. 979–985. doi:10.1117/12.480830
Yu G, Morel JM (2011) Asift: an algorithm for fully affine invariant comparison. Image Proc On Line, 2011. doi:10.5201/ipol.2011.my-asift
Zitova B, Flusser J (2003) Image registration methods: a survey. Image Vis Comput 21(11):977–1000. doi:10.1016/S0262-8856(03)00137-9
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Laraqui, A., Baataoui, A., Saaidi, A. et al. Image mosaicing using voronoi diagram. Multimed Tools Appl 76, 8803–8829 (2017). https://doi.org/10.1007/s11042-016-3478-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-016-3478-z