1 Introduction

Natively capturing panoramic images requires wide-angle lenses and other specialized equipment that is expensive, and even then it can be difficult to obtain panoramas such as full 360° panoramic images. Fortunately, it is possible to generate a panorama from a collection of images with overlapping fields of view by means of image mosaic techniques, which present information more effectively by enhancing image resolution and reducing redundant information. Nowadays, image mosaic is an extensive research field and many commercial applications have been developed [4, 14]. It plays an increasingly important role and has been widely applied in computer vision [18], robot navigation [5], and virtual reality [15].

Much research has been devoted to image mosaic, and most algorithms are based on local features because of their efficiency and robustness. One of the most classic methods was proposed by Matthew Brown and David G. Lowe [2]. In their work, they formulated stitching as a multi-image matching problem. Due to the use of invariant local features, the method is insensitive to the ordering, orientation, scale, and illumination of the input images. Hui Zhou presented a graph-based method in [25], introducing a new way of thinking about image mosaic, and other graph-based image stitching methods have also been proposed. Minimum spanning trees provide a good way to reduce image registration error: Gong [24] proposed a method that constructs the minimum routing cost spanning tree of the mosaicking graph to calculate the globally optimal position of every image and create the panoramic image. Since optimal image mosaicking has high computational complexity, Nikos Nikolaidis and Ioannis Pitas [13] proposed two methods that require less computation time by mosaicking pairs of sub-images at a time, without significant reconstruction losses.

In this paper, we describe an automatic image mosaic method based on a graph model, which can also be used to generate 360-degree panoramic images. This algorithm has several advantages over the graph-based image mosaic method proposed by Zhou [25]. Firstly, it is a completely automatic method that does not require a user-specified order of images. Secondly, we use a heap-optimized Dijkstra algorithm to minimize the global error, which requires far less time than the above-mentioned algorithm and can stitch hundreds of images in limited time. Thirdly, it raises the mosaic success rate by introducing the registration graph, which is constructed from the Matching Mean Square Error. Lastly, we extend our work to generate 360-degree panoramic images, and the results are generally better than those of autostitch [2].

The rest of this paper is organized as follows: Section 2 introduces the workflow of image mosaic and briefly analyzes the problems. Section 3 elaborates the proposed algorithm in detail, and experimental results are shown in Section 4. The paper ends with conclusions in Section 5.

2 Overview of local feature based algorithm

2.1 Workflow of image stitching

Nowadays, the most frequently used image mosaic algorithms are based on local features, which are more accurate and efficient [16]. This type of algorithm consists of four basic steps: feature extraction, feature matching and false-match discarding, image alignment, and image blending.

  1. Feature Extraction

    The first step is to detect interest regions and compute their local descriptors. Many local descriptors have been invented. For example, a scale-invariant blob detector was developed by Lindeberg [10], where a blob is a circle centered at a maximum point of the normalized Laplacian measure, with radius proportional to the scale. By approximating the Laplacian with Difference-of-Gaussian (DoG) filters, D. Lowe [11] detected blobs in scale space efficiently. Later, another local descriptor called SURF was proposed by Bay et al. [1], based on the Hessian matrix with a basic approximation of the Gaussian second-order filter; it relies on integral images to reduce computation time. There are also other local feature descriptors, such as GLOH [12], LOIP [21], BRIEF [3], etc. Mikolajczyk and Schmid [12] compared the performance of different kinds of local descriptors computed for interest regions detected by many types of detectors, and their experimental results show that SIFT-based descriptors perform best. Because of this advantage, many image stitching algorithms are based on such local features [9, 22, 23].

  2. Feature Matching and False-Match Discarding

    The second step is to match the descriptors from two images and discard false matches. After extracting local features from the two images, we need to match the features between them in order to obtain their transformation relationship. By calculating Euclidean distances, the closest features can be found with a k-d tree, BBF [11], or other data structures. Perhaps the most challenging problem is how to discard the mismatches, which is also an active research field. Many methods have been proposed, among which RANSAC is one of the most widely used for discarding mismatches (a minimal sketch of this pairwise pipeline is given after this list).

  3. Image Alignment

    In the third step, all the images are transformed into the same coordinate system using the homography matrices derived from corresponding image pairs. This is the step our work focuses on.

    There are two ways to align all the images: frame-to-mosaic and frame-to-frame [25]. The first method works well only when there are sufficient overlapping areas among the images. With respect to the second method, the main problem is how to minimize the global error: since registration is done pairwise and locally, small registration errors between images are accumulated and amplified when computing the homography matrix between one image and another. Many solutions to this problem have been proposed. A widely used optimization technique is bundle adjustment, first proposed by Triggs [19], in which all the parameters can be estimated by the Levenberg-Marquardt algorithm. Hui Zhou also proposed an algorithm for optimizing the registration of a set of images with multiple rows and columns [25].

  4. Image Blending

    After all the images are aligned, the result is in most cases still not a perfect panorama: some seams may remain at the border of each image. The images therefore need to be blended. Fade-in-fade-out is the most frequently used blending algorithm because of its convenience [6].

    As shown in Fig. 1, for matching points I₁(x, y) and I₂(x, y) in the overlapping area, this algorithm assigns a weight coefficient to each point, related to the position of the point in the overlapping area.

    Fig. 1 Fade-in-fade-out scheme
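To make steps 1–3 above concrete, the following is a minimal sketch of the pairwise pipeline using OpenCV's SIFT implementation, a FLANN (k-d tree) matcher with Lowe's ratio test, and RANSAC-based homography estimation. The function name and the ratio and RANSAC thresholds are illustrative choices, not values prescribed by this paper.

```python
# A minimal sketch of the pairwise pipeline (steps 1-3), assuming OpenCV.
# The ratio and RANSAC thresholds are illustrative, not from the paper.
import cv2
import numpy as np

def pairwise_homography(img1, img2, ratio=0.75, ransac_thresh=4.0):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)   # step 1: feature extraction
    kp2, des2 = sift.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return None

    # Step 2: match descriptors (k-d tree via FLANN) and discard false matches
    # with Lowe's ratio test; remaining outliers are handled by RANSAC below.
    matcher = cv2.FlannBasedMatcher()
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in knn
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < 4:
        return None  # at least 4 correspondences are needed for a homography

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # Step 3: estimate the 3x3 homography from img1 to img2 with RANSAC.
    H, _inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    return H
```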

2.2 General problems

2.2.1 Cylindrical problem

To align the images, we need to transform one image into the coordinate system of another by means of correctly matched local features. When building a cylindrical panorama, however, the image must be warped because of the change of viewing angle. To construct a plausible 360-degree view, Szeliski and Shum provided a method to stitch cylindrical panoramas, which maps pixels onto a virtual cylindrical coordinate system. Many non-planar transformation models have been developed in the literature [8, 17], such as cylindrical projection, spherical projection, and cubic projection. As shown in Fig. 2, from this correspondence we can compute the cylindrical coordinates from the planar coordinates. A point on the cylinder can be represented in polar coordinates as (h, θ), where h is the height of the point and θ is the angle from the x axis. Considering a point P(x, y, z) in space projected onto a unit-radius cylinder, the projection point on the surface of the cylinder is specified by (h, θ). By similar triangles, in three-dimensional coordinates we have (sin θ, h, cos θ) ∝ (x, y, f), where θ and h are the polar-coordinate parameters mentioned above and f is the focal length of the camera.

Fig. 2 Projection from 3D space to cylindrical coordinates

According to this relationship, the formula that maps a point to its projection on the cylindrical surface can be derived as

$$ x^{+} = s\,\tan^{-1}\!\frac{x}{f} = s\theta, \qquad y^{+} = s\,\frac{y}{\sqrt{x^{2}+f^{2}}} = sh $$
(1)

where s is an arbitrary scaling factor, also called the radius of the cylinder, and x⁺ and y⁺ are the projected coordinates on the cylinder. The scaling factor can be set to s = f (the focal length of the camera) to minimize the distortion of the image near its center. If the focal length or field of view of the camera is known, each image can be warped into cylindrical coordinates using Eq. (1).
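Assuming the focal length f is known in pixels and taking s = f as suggested above, a minimal sketch of the cylindrical warp of Eq. (1) could look as follows; the mapping is applied inversely with cv2.remap so every output pixel is filled.

```python
# Cylindrical warping sketch based on Eq. (1), with s = f (an assumption).
import cv2
import numpy as np

def warp_to_cylinder(image, f):
    h, w = image.shape[:2]
    s = f
    cx, cy = w / 2.0, h / 2.0

    # For every output pixel (x+, y+), invert Eq. (1) to find the source (x, y):
    # x = f * tan(x+/s),  y = (y+/s) * sqrt(x^2 + f^2)
    xp, yp = np.meshgrid(np.arange(w), np.arange(h))
    theta = (xp - cx) / s
    hcyl = (yp - cy) / s
    x_src = f * np.tan(theta)
    y_src = hcyl * np.sqrt(x_src ** 2 + f ** 2)

    return cv2.remap(image,
                     (x_src + cx).astype(np.float32),
                     (y_src + cy).astype(np.float32),
                     cv2.INTER_LINEAR)
```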

2.2.2 Looping path problem

In multi-image registration, a minor error between images can be accumulated and amplified. Even if the error between two adjacent images is very small and good alignment is achieved between successive images, the error accumulates, causing poor alignment when the image path follows a loop [7]. As a result, the same area of the panorama can be misaligned across different images.

Because of the differences between images, errors between two transformed images are inevitable. By multiplying the homography matrices along the path from each image to the reference image, we can obtain the direct homography matrix between any image to be registered and the reference image. Suppose that images I₁ and I₂ are adjacent and that I₂ and I₃ are adjacent; we can obtain the transform matrices M₁,₂ and M₂,₃ by estimating the parameters of these images, where Mᵢ,ⱼ is the transformation matrix from Iᵢ to Iⱼ. The transform matrix M₁,₃ can then be calculated as M₁,₃ = M₂,₃ · M₁,₂, but M₁,₂ and M₂,₃ may not be accurate. Suppose each transform matrix carries a small error:

$$ {M}_{1,2}=\overline{M_{1,2}}+\varDelta {M}_{1,2} $$
(2)
$$ {M}_{2,3}=\overline{M_{2,3}}+\varDelta {M}_{2,3} $$
(3)
$$ {M}_{1,3}=\overline{M_{1,3}}+\varDelta {M}_{1,3} $$
(4)

where M₁,₂ represents the calculated transform matrix, \( \overline{M_{1,2}} \) represents the true transform matrix, and ΔM₁,₂ represents the error of the matrix.

Since M₁,₃ = M₂,₃ · M₁,₂, we obtain:

$$ \left(\overline{M_{1,3}}+\varDelta {M}_{1,3}\right)=\left(\overline{M_{2,3}}+\varDelta {M}_{2,3}\right)\ast \left(\overline{M_{1,2}}+\varDelta {M}_{1,2}\right) $$
(5)

Since \( \overline{M_{1,3}}=\overline{M_{2,3}}\ast \overline{M_{1,2}} \), substituting gives

$$ \varDelta {M}_{1,3}=\varDelta {M}_{2,3}\ast \overline{M_{1,2}}+\overline{M_{2,3}}\ast \varDelta {M}_{1,2}+\varDelta {M}_{2,3}\ast \varDelta {M}_{1,2} $$
(6)

This formula shows that, along the path from a registration image to the reference image, the error of each transform matrix, even if small, accumulates and is amplified.
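A small numerical illustration of this effect (our own example, not from the paper): perturbing two exact transforms slightly and composing them shows that the composed error follows Eq. (6) and is amplified, because each perturbation is multiplied by the other, much larger, transform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "true" pairwise transforms with a large translation component.
M12 = np.array([[1.0, 0.01, 100.0],
                [-0.01, 1.0, 5.0],
                [0.0, 0.0, 1.0]])
M23 = np.array([[1.0, 0.02, 95.0],
                [-0.02, 1.0, -3.0],
                [0.0, 0.0, 1.0]])

# Small estimation errors on each pairwise transform.
d12 = rng.normal(scale=1e-3, size=(3, 3))
d23 = rng.normal(scale=1e-3, size=(3, 3))

M13_true = M23 @ M12
M13_est = (M23 + d23) @ (M12 + d12)

# dM13 = d23 @ M12 + M23 @ d12 + d23 @ d12, as in Eq. (6); its norm is much
# larger than the norms of the individual errors d12 and d23.
print(np.linalg.norm(M13_est - M13_true),
      np.linalg.norm(d12), np.linalg.norm(d23))
```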

Hui Zhou proposed an algorithm for optimizing the registration of a set of images with multiple rows and columns [25]. In his method, a registration graph is introduced and the registration error is reduced globally. However, the approach has some shortcomings. Firstly, it assumes that the layout of the image set is defined by 4-neighbour connectivity and that no isolated images exist; in practice, the images to be stitched are usually not 4-neighboured and noise images may be present. Secondly, it finds the shortest path from each registration image to the reference image, but does not consider that a large error may occur on that shortest path. Thirdly, the order of the images must be given before alignment, which in most cases is difficult and inconvenient for users.

3 Proposed algorithm

3.1 Feature extraction and image matching

In the panorama recognition algorithm, SIFT features [15] are extracted and matched for all pairs of images. We use the SIFT method because it shows the best performance [12]. Consider two images I₁ and I₂ whose SIFT features have been extracted; the next step is to judge whether they have an overlapping area by matching the features. RANSAC is used to maximize the number of inliers, and by picking 4 of these matches we can estimate the homography H between the images. It is not reliable to judge whether two images match only by the number of matches. In this paper, we judge it by the Matching Mean Square Error (MMSE), which measures the mismatch between the two images in their overlapping area. It is defined as follows:

$$ MMSE = \frac{\sum_{n \in W(d)} \left| I_1(n) - I_2(n) \right|}{\left| W(d) \right|} $$
(7)

where W(d) denotes the overlapping region of the two images, |W(d)| its area in pixels, and I₁(n), I₂(n) the gray-scale values of the images at pixel n. We then set a threshold ε: if the MMSE between two images is smaller than ε, the two images are considered matched. Based on our experience, we set ε to 130.
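A minimal sketch of this test, assuming one image has already been warped into the other's coordinate frame and a binary mask marks the overlapping region W(d); the default threshold is the ε = 130 used above.

```python
import numpy as np

def mmse(img1_gray, img2_warped_gray, overlap_mask, eps=130.0):
    overlap = overlap_mask.astype(bool)
    if not overlap.any():
        return np.inf, False  # no common area, so the pair cannot match
    # Eq. (7): mean absolute gray-level difference over the overlap region.
    diff = np.abs(img1_gray[overlap].astype(np.float64)
                  - img2_warped_gray[overlap].astype(np.float64))
    value = diff.sum() / overlap.sum()
    return value, value < eps
```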

3.2 Registration graph construction

As mentioned in 3.1, if we obtain the homography H₁ from I₁ to I₂ and H₂ from I₂ to I₃, we can obtain the homography H₃ from I₁ to I₃ by multiplying H₂ and H₁. More generally, we can transform from Iₙ to Iₘ if there is a transform chain between Iₙ and Iₘ. For a panorama, if all the images can be transformed into the same coordinate system, they can be aligned. For panoramas with multiple rows and columns, a graph model is introduced.

The images to be aligned are regarded as the vertices of the graph, and the homography relationships between images as its edges. The edges are directed because the transform matrix from image A to B differs from that from B to A; moreover, because of estimation errors, the transform matrix from A to B is not exactly the inverse of that from B to A. If there are n images to be aligned, n·(n−1) ordered pairs of images are checked to generate edges. For each pair, we calculate the MMSE and the homography matrix from one image to the other. If the MMSE is smaller than ε (the pair is matched), a directed edge is added, carrying the MMSE and the homography matrix as its information. In this way the registration graph is constructed automatically. Figures 3 and 4 below show the registration graph: the vertices denote the images and the edges denote the relationships between images.

Fig. 3 Illustration of the registration graph

Fig. 4 a Unordered images. b The constructed graph

After the graph is constructed, we can solve the problems with graph theory. Existing noise images can be eliminated by computing the strongly connected components. For example, in Fig. 4 there are three strongly connected components; we can process them separately and output three panoramas, so noise images are excluded from each panorama.
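A sketch of the construction (ours, not the authors' code): every ordered image pair is tested with the pairwise_homography and mmse functions from the earlier sketches, each directed edge stores the homography and its MMSE, and strongly connected components are extracted with networkx so that each component becomes a separate panorama. Here warp_with_mask is a hypothetical helper that warps one image into the other's frame and returns the warped image plus its validity mask.

```python
import itertools
import networkx as nx

def build_registration_graph(grays, eps=130.0):
    g = nx.DiGraph()
    g.add_nodes_from(range(len(grays)))
    for i, j in itertools.permutations(range(len(grays)), 2):  # n*(n-1) pairs
        H = pairwise_homography(grays[i], grays[j])
        if H is None:
            continue
        # warp_with_mask is a hypothetical helper (see lead-in paragraph).
        warped, mask = warp_with_mask(grays[i], H, grays[j].shape)
        err, matched = mmse(grays[j], warped, mask, eps)
        if matched:
            g.add_edge(i, j, H=H, mmse=err)  # edge i -> j carries H and MMSE
    return g

# Each strongly connected component is stitched into its own panorama:
# components = list(nx.strongly_connected_components(g))
```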

3.3 Image alignment

Based on the previously built graph, the next step is to transform all the images into the same coordinate system with the smallest global error. Zhou [25] assumes that a shorter transform path produces smaller errors: he chooses the reference image so that the sum of path lengths from every image to the reference is minimal (the Shortest Path Algorithm) and uses the Floyd algorithm to find the shortest paths. However, he does not consider that a large error on the shortest path causes great errors for all the images along that path.

Our algorithm is based on the MMSE, which is available for each pair of related images from the image matching step. The goal is to find a set of transformation matrices D_opt, where each matrix in D_opt represents the transformation from an image to the reference image, such that:

$$ D_{opt} = \arg\min_{D} \sum_{\mathrm{image}\ i} MMSE_i $$
(8)

In the above formula, MMSE_i stands for the matching mean square error of projecting image i onto its reference image.

Consider the graph G(V, E) built in 3.2, where each edge carries a homography matrix and an MMSE. Minimizing the sum of MMSE amounts to finding the root of a spanning tree such that the sum of MMSE from every vertex to the root is minimal. This can be reduced to finding a vertex of the graph such that the total shortest-path cost from every other vertex to it is minimal, which is a single-source shortest path problem solvable with Dijkstra's algorithm.

To solve this problem, we enumerate the vertices of the graph and, for each enumerated vertex, use Dijkstra's algorithm to find the shortest paths from every other vertex to it. The enumerated vertex with the minimum total MMSE is chosen as the result. In addition, we record the path and the transform matrices along it, so that all images can be transformed into the coordinate system of the root image. A heap-optimized Dijkstra's algorithm runs in O(|E| log |V|), so the time complexity of the whole algorithm is O(|V||E| log |V|), which is better than the Floyd algorithm with time complexity O(|V|³).
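A minimal sketch of this selection step, assuming the networkx registration graph built above: a heap-based Dijkstra is run from every candidate root, traversing incoming edges weighted by MMSE so that paths lead from each image toward the root, and the root with the smallest total cost is kept together with the parent links that define each image's transform chain.

```python
import heapq

def choose_reference(g):
    best_root, best_cost, best_parent = None, float("inf"), None
    for root in g.nodes:
        dist = {v: float("inf") for v in g.nodes}
        parent = {v: None for v in g.nodes}
        dist[root] = 0.0
        heap = [(0.0, root)]
        while heap:  # heap-optimized Dijkstra: O(|E| log |V|) per root
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v in g.predecessors(u):          # incoming edge v -> u
                nd = d + g.edges[v, u]["mmse"]   # cost of aligning v via u
                if nd < dist[v]:
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        total = sum(c for c in dist.values() if c < float("inf"))
        if total < best_cost:
            best_root, best_cost, best_parent = root, total, parent
    # parent[v] gives the next image on v's path to best_root, i.e. the
    # chain of homographies to multiply when computing D_opt.
    return best_root, best_parent
```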

Based on the recorded path from every vertex to the root and the transform matrices along each path, we can calculate the optimal set of matrices D_opt. We then transform all the images with their corresponding homography matrices in D_opt and compose them into the same image, obtaining an aligned panorama.

3.4 Image blending

So far we have obtained the aligned panorama, but some stitching seams remain. In order to remove these seams and combine information from multiple images, a simple method is applied in this paper. The gray-scale value is assigned by a weight function that depends on the distance to the border of the image; we perform a weighted sum of the image intensities along each ray using this weight function:

$$ C(x) = \frac{\sum_k w_k(x)\, I_k(x)}{\sum_k w_k(x)} $$
(9)

where C(x) represents the gray-scale value in the panorama, Iₖ(x) the gray-scale value of the k-th warped image at x in the overlapping area, and wₖ(x) the weight, which depends on the distance to the border of the image and is normalized so that \( \sum_k w_k(x)=1 \).
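A minimal sketch of this feathered blending for gray-scale images, assuming each warped image comes with a binary validity mask in panorama coordinates; the weight of a pixel is taken as its distance to the image border, computed with a distance transform.

```python
import cv2
import numpy as np

def feather_blend(warped_images, masks):
    acc = np.zeros(warped_images[0].shape[:2], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, mask in zip(warped_images, masks):
        # Weight grows with the distance to the image border (0 outside it).
        w = cv2.distanceTransform(mask.astype(np.uint8), cv2.DIST_L2, 3)
        acc += w * img.astype(np.float64)
        wsum += w
    wsum[wsum == 0] = 1.0  # avoid division by zero outside the panorama
    return (acc / wsum).astype(np.uint8)  # Eq. (9) with normalized weights
```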

4 Experimental results

Three algorithms, namely the proposed Weighted Shortest Path Algorithm, the traditional Zigzag Transform Algorithm, and the Shortest Path Algorithm [25], were tested on two image sets, Flowertest and Citytest, obtained from the Internet. Each set consists of 56 images with a resolution of about 550×460. We ran the experiments on a PC with a 2.4 GHz CPU and 8 GB of memory. The total time to finish the mosaic of Citytest and Flowertest with the proposed algorithm is 70.33 s and 71.9 s, respectively. The MMSE defined in 3.1 is used to measure the experimental results.

Figures 5 and 6 compare the output panoramas on the Citytest set. Due to the high resolution of the images, it is hard to discriminate the three outputs, so we picked three details and enlarged them to show the differences between the three algorithms; see Fig. 6. Figures 7 and 8 show the output panoramas of Flowertest and their enlarged view details.

Fig. 5 Mosaic result of citytest. a Weighted shortest path algorithm. b Shortest path algorithm. c Zigzag transform algorithm

Fig. 6 Enlarged view details in Fig. 5

Fig. 7 Mosaic result of flowertest. a Weighted shortest path algorithm. b Shortest path algorithm. c Zigzag transform algorithm

Fig. 8 Enlarged view details in Fig. 7

The above figures show the advantage of the Weighted Shortest Path Algorithm over the other algorithms. Accumulated error causes misalignment of the images in the other two algorithms, while few misalignments can be seen in the output of our algorithm (Table 1). Since the comparisons in Figs. 5 and 7 cannot clearly show the differences in the full panoramas at page size, we also measure the quality of the outputs quantitatively, using MMSE as the measurement. Figure 9 plots the MMSE for different numbers of input images on the two datasets.

Table 1 The error of the three algorithms
Fig. 9 The MMSEs of the three algorithms tested on two image sets with different numbers of input images

The results show that, compared with the other two algorithms, the proposed algorithm achieves the lowest Matching Mean Square Error and is more stable. In Fig. 9, the advantage of our algorithm becomes more obvious as the number of input images increases, which demonstrates its robustness. Moreover, our algorithm has both the lowest MMSE and the smallest variance of MMSE; in other words, it is the most stable of the three.

We also extend our work by applying the proposed algorithm to the generation of 360° panoramas, which again obtains good results. We use the datasets from [20], a website that provides many 360° cylindrical panorama image sets. Figures 10, 11, 12 and 13 show the image mosaic results produced by the proposed method.

Fig. 10 Chester Riverside

Fig. 11 The parking lot

Fig. 12 The office

Fig. 13 The Eiling hall of the University of California, Santa Barbara

Moreover, our algorithm works well not only on image sets with rich texture information but also on smooth images. Two datasets with little texture information were chosen; Figs. 14 and 15 show the results of the Weighted Shortest Path Algorithm.

Fig. 14 Grand Teton National Park 1

Fig. 15 Aerial

Furthermore, we conducted experiments on other datasets to compare mosaic quality with autostitch [2]. In some cases, our algorithm performs better than autostitch; Figs. 16 and 17 show the comparison.

Fig. 16 Image mosaic result by the proposed algorithm

Fig. 17 Image mosaic result of autostitch

5 Conclusion

In this paper, an automatic panoramic image stitching method based on a graph model is proposed. Graph-theoretic techniques, namely heap-optimized Dijkstra path finding, are used to minimize the global error, which speeds up the image alignment process. Compared with other graph-based methods, the results of the proposed algorithm are considerably better, and in some cases they are better than those of autostitch. Moreover, the proposed method is automatic, so there is no need to specify the order of the input images.

From the experimental results, we can see that the error still increases as the number of images to be aligned increases, and the accumulated error is not fundamentally eliminated. Image mosaic is a broad research field, and further effort should be devoted to reducing the global mosaic error.