1 Introduction

Modern methods for the solution of geometric reconstruction problems in computer vision may be viewed as starting with the paper of Longuet-Higgins (1981). However, after nearly three decades of research, the problem of finding guaranteed optimal least-squares (\(L_2\)) solutions to such problems remains unsolved. In fact, papers such as Nistér et al. (2007a, 2007b) suggest that such algorithms do not exist.

Even for the simplest of geometric vision problems, the triangulation problem, no method has been given that guarantees an optimal least-squares solution. This paper takes a different approach, giving a procedure for verifying whether a solution obtained through standard methods such as bundle adjustment actually is the global optimum. Usually, in fact, it is. Though not applicable to all geometric vision problems, the method given in this paper covers a substantial number of the common reconstruction problems, as will be shown theoretically and by experiment in this paper.

Perhaps the simplest of all 3D reconstruction problems is the triangulation problem, which nevertheless shares many of the difficulties of more complex tasks. We begin our description of our approach by considering the triangulation problem, before generalizing later to a wider class of problems. The condition we provide for the verification is a sufficient but not necessary condition for the solution to be a global optimum. However, for triangulation it works in almost all cases. In the rare cases where the condition fails it is usually because the point has large noise, in which case in a large-scale reconstruction problem, the best option is just to remove the point from consideration. Alternatively it may be possible to apply one of the recent (considerably more time-consuming) optimal algorithms (Lu and Hartley 2007; Kahl et al. 2008) for solving the problem; see also the survey (Hartley and Kahl 2007).

How Hard is the Triangulation Problem, Really?

It was shown in Hartley and Sturm (1997) that the least-squares two-view triangulation problem is solvable in closed form. However, up to three local minima may exist. Much more recently, it has been shown (Stewenius et al. 2005) that the solution for three views involves the solution of a polynomial of degree 47, and higher degree polynomials are required with more views. The degree of the polynomial involved in the solution translates into numbers of possible local minima. It is certainly possible to find explicit examples with multiple local minima (Hartley and Sturm 1997). This suggests that the problem is difficult.

On the other hand, it was shown in Hartley and Schaffalitzky (2004) that a single local (and hence global) minimum occurs if an \(L_\infty\) (minimax) solution is sought, instead of the least-squares solution. However, the least-squares problem remains the problem of primary interest. The \(L_\infty\) solution is useful as an initialization for the least-squares problem, but it does not guarantee an optimal \(L_2\) solution. On real data, common algorithms seem to get the right answer most of the time. So perhaps the problem is not so hard after all, if only one knows whether one has the right solution. That problem is effectively solved in this paper.

Working algorithms with practical accuracy have been reported for geometric vision problems since the eight-point algorithm for two-view reconstruction (Hartley and Zisserman 2004). Although these known methods do not guarantee a globally optimal solution, simple methods based on initialization, followed by iterative refinement, usually work very well. They depend on the initial solution being within the basin of attraction of the optimal solution. However, until now there has been no way of verifying this requirement. In this paper, we address this problem and give a fast and very satisfactory method of verifying that a given local minimum is the global minimum. In experiments on a very large set of triangulation problems, the test is seen to have a 99.9 % success rate and averages 0.5 ms per triangulated point on a modern 2.66 GHz laptop computer; it runs substantially faster for small problems.

The new technique for proving convexity, and thus global optimality, is applicable not only to triangulation but also to other geometric vision problems. We show that the same analysis used for the triangulation problem applies to any optimization problem with quasi-convex residual functions (Ke and Kanade 2007; Kahl and Hartley 2008), since the location of the global minimum can be constrained to a convex set.

The class of problems that our framework covers is the same as the class of quasiconvex problems in the \(L_\infty\)-norm (Ke and Kanade 2007; Kahl and Hartley 2008), which includes n-view triangulation, homography estimation and camera resectioning, to name a few.

Our main contributions may be listed as follows:

  • An easy and fast method to verify the global optimality of a solution under the \(L_2\) norm for a whole class of multiview geometry problems.

  • We perform an extensive set of experiments on several data sets to explore the limitations of the method. Results are given for triangulation, homography estimation, camera resectioning and structure and motion with known rotations. Our main conclusion is that in most cases the basic verification test succeeds, especially for small scale problems.

2 Triangulation—Simple Convexity Test

We will derive conditions for optimality of the solutions for a class of cost functions encountered in the solution of geometric vision problems. These are cost functions composed of sums of squares of measurement errors, where each error term is a quasi-convex function of Second Order Cone Programming (SOCP) type (Kahl and Hartley 2008). This type of cost function is encountered in many different problems, many of which are listed in the survey (Hartley and Kahl 2007). However, to allow a simple introduction to the techniques introduced in this paper, we first apply them to the triangulation problem, which is representative of the more general problems considered later.

2.1 Problem Description

We consider a calibrated camera model and think of image points as being represented by points on a sphere, that is unit vectors, rather than points in an image plane. The triangulation problem is characterized by a set of unit measurement vectors w i based at projection centres C i , all pointing to a common point X in space, which is to be determined. Thus, nominally X=C i +d i w i , where d i is the depth of the point X with respect to the i-th projection.

With error in the measurements, the rays from centre C i in direction w i do not exactly intersect, so the required point X is the one that minimizes the sum-of-squares error. The image error is normally the angular difference between a measured direction w i and the direction vector from the camera centre to the point X. For simplicity, our cost function will be the sum of squares of the tangent of the angular error, rather than the squared angle itself. Since errors are usually quite small, the difference is insignificant, and the analysis is simpler. Eventually, we will extend the result to minimization of squared angular error (see Appendix A).

Considered more generally, the desired solution is a point X opt that represents a global minimum of the cost function \(C(\mathbf{X}) = \sum_{i=1}^{N} f^{2}_{i}(\mathbf{X})\), where \(f^{2}_{i}\) represents a squared residual error measurement in the i-th of N images. If we use the tangent of the angle to measure the residual error, then the total cost function is given by

$$ C(\mathbf{X}) = \sum_{i=1}^N \tan^2 \phi_i = \sum_{i=1}^N \frac{\| (\mathbf{X} - \mathbf{C}_i) \times \mathbf{w}_i \|^2}{\bigl({\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i)\bigr)^2} , $$
(1)

where ϕ i is the angle between the measured direction w i and the ray from C i to X.

Bounding the Solution

Now, suppose that X 0 is a proposed solution to this triangulation problem with residual given by \(C(\mathbf{X}_{0}) = \sum_{i=1}^{N} f^{2}_{i}(\mathbf{X}_{0}) = \epsilon^{2}\). This observation gives a constraint on the position of the optimal solution, which must satisfy

$$ C( \mathbf{X}_{\mathrm {opt}}) = \sum_{i=1}^N f^2_i( \mathbf{X}_{\mathrm {opt}}) \le\epsilon^2 . $$
(2)

Since the sum of the terms \(f^{2}_{i}( \mathbf{X}_{\mathrm {opt}})\) is no greater than ϵ², so is each individual term. Thus, for all i, \(f^{2}_{i}( \mathbf{X}_{\mathrm {opt}}) \le\epsilon^{2}\). For a given i, the set of points X in ℝ3 satisfying this condition consists of those points that map into a circular neighbourhood of the image point. This constitutes a cone in ℝ3, as observed in Hartley and Schaffalitzky (2004). Since this condition must be satisfied for each of the projections, it follows that X opt must lie in the intersection of all the cones. This is a convex region of ℝ3, since each cone is a convex set.

Let us define the convex set \(\mathcal {D}\) of points for the optimal solution as

$$ \mathcal {D}= \bigl\{\mathbf{X} \, | \, f^2_i( \mathbf{X}) \le\epsilon^2; \, {\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i) \ge0,\ i=1,\ldots,N\bigr\}, $$
(3)

where the second condition \({\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i) \ge 0\) restricts the domain to points in front of the camera.

Our overall strategy of finding X opt (or verifying X 0=X opt) is to provide tests that allow us to prove that the cost function must be convex on this convex region. If this is so, then finding the optimal solution X opt may be carried out using standard convex optimization methods. More significantly, if X 0 already is a local minimum of the cost function, found by any geometric optimization technique, then it must be a global minimum.

The Hessian of the Cost Function

A twice-differentiable function is convex on a convex region if and only if its Hessian is positive semi-definite there. Hence, we are led in this section to consider the Hessian of the cost function. The total cost function is a sum of several different terms, one for each measurement. We will first consider a single term in the cost function, corresponding to a single image measurement.

To begin with, we consider the Hessian expressed in a suitable camera-centred coordinate frame. Consider a vector w pointing from the origin in the direction of the z-axis, and let X=(x,y,z) be a point lying close to the positive z-axis, such that the vector from the origin to X makes an angle ϕ with the vector w. Consider the error function given by

$$ f^2(x, y, z) = \tan^2 \phi= \bigl(x^2 + y^2\bigr) / z^2 . $$
(4)

This represents the squared projection error of the point X with respect to the “measured direction”, w.

The Hessian matrix of f 2 with respect to x, y and z is easily computed to be

$$ \mathtt{H} = \frac{2}{z^2} \left [ \arraycolsep=5pt \begin{array}{@{}ccc@{}} 1 & 0 & -2x/z \\ 0 & 1 & -2y/z \\ -2x/z & -2y/z & 3\bigl(x^2+y^2\bigr)/z^2 \end{array} \right ] $$
(5)
$$ = \frac{2}{d^2} \left [ \arraycolsep=5pt \begin{array}{@{}ccc@{}} 1 & 0 & -2\tau_x \\ 0 & 1 & -2\tau_y \\ -2\tau_x & -2\tau_y & 3\tau^2 \end{array} \right ], $$
(6)

where we introduce the notation τ x , τ y , τ and d defined by the terms in corresponding position. Note that \(\tau^{2} = \tau_{x}^{2} + \tau_{y}^{2}\), and d represents the depth of the point in the viewing direction. Neglecting the constant factor 2/d 2, the eigenvalues of this matrix are easily computed to be 1 and \(a \pm\sqrt{a^{2} + \tau^{2}}\), where a=(3τ 2+1)/2. There are two positive and one negative eigenvalue.
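As a quick numerical sanity check, one can compare a finite-difference Hessian of the error function (4) against this closed form and its stated eigenvalues. This is an illustrative sketch (the sample point and step size are arbitrary), not part of the method itself:

```python
import numpy as np

def f2(p):
    # squared tangent error (4): (x^2 + y^2) / z^2
    x, y, z = p
    return (x**2 + y**2) / z**2

def hessian_closed_form(p):
    # closed-form Hessian of f2 in the camera-centred frame
    x, y, z = p
    tx, ty, d = x / z, y / z, z
    return (2.0 / d**2) * np.array([
        [1.0,   0.0,   -2*tx],
        [0.0,   1.0,   -2*ty],
        [-2*tx, -2*ty, 3*(tx**2 + ty**2)],
    ])

def hessian_numeric(f, p, h=1e-5):
    # central finite-difference Hessian
    p = np.asarray(p, float)
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e = np.zeros(n); e[i] = h
            g = np.zeros(n); g[j] = h
            H[i, j] = (f(p+e+g) - f(p+e-g) - f(p-e+g) + f(p-e-g)) / (4*h*h)
    return H

p = np.array([0.1, 0.05, 2.0])       # a point near the positive z axis
H = hessian_closed_form(p)
assert np.allclose(H, hessian_numeric(f2, p), atol=1e-4)

# up to the factor 2/d^2, eigenvalues should be 1 and a +/- sqrt(a^2 + tau^2)
tx, ty = p[0]/p[2], p[1]/p[2]
tau2 = tx**2 + ty**2
a = (3*tau2 + 1) / 2
expected = sorted([1.0, a + np.sqrt(a**2 + tau2), a - np.sqrt(a**2 + tau2)])
eigs = sorted(np.linalg.eigvalsh(H * p[2]**2 / 2))
assert np.allclose(eigs, expected)
```

The last assertion also exhibits the sign pattern: one eigenvalue is negative, so a single term of the cost is not convex by itself.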

A Bound on the Hessian

As the point X=(x,y,z) moves over the region of interest \(\mathcal {D}\), the Hessian changes. It is our purpose to find a lower bound for the Hessian matrix for all points X in \(\mathcal {D}\).

Consider the matrix given by

$$ \mathtt{H}' = \frac{2}{d^2} \left [ \arraycolsep=5pt \begin{array}{@{}ccc@{}} 1/3 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & -3 \tau^2 \end{array} \right ]. $$
(7)

The reasoning that led to the choice of this matrix H′ will be explained later, but its properties are simple enough. The key property of this matrix is the observation that

$$\mathtt{H} - \mathtt{H}' = \frac{4}{3d^2} \left [ \arraycolsep=5pt \begin{array}{@{}ccc@{}} 1 & 0 & -3\tau_{x} \\ 0 & 1 & -3\tau_{y} \\ -3\tau_{x} & -3\tau_{y} & 9 \tau^2 \end{array} \right ] $$

is positive semi-definite. Indeed, it is easily seen that H−H′ has eigenvalues (4/3d 2) times the following:

0, 1, and 1+9τ 2.

All eigenvalues are non-negative. We write H−H′⪰0, or H⪰H′. The matrix H′ is thus a lower bound for the Hessian.
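This lower-bound property is easy to confirm numerically; a small sketch (the values of τ x and τ y are arbitrary) that checks the eigenvalues of H−H′:

```python
import numpy as np

def check_lower_bound(tx, ty, d=1.0):
    """Verify H - H' is PSD with eigenvalues (4/3d^2) * {0, 1, 1 + 9 tau^2}."""
    tau2 = tx**2 + ty**2
    H = (2/d**2) * np.array([[1, 0, -2*tx],
                             [0, 1, -2*ty],
                             [-2*tx, -2*ty, 3*tau2]], float)
    Hp = (2/d**2) * np.diag([1/3, 1/3, -3*tau2])
    diff = H - Hp
    expected = (4/(3*d**2)) * np.array([0.0, 1.0, 1.0 + 9*tau2])
    assert np.allclose(sorted(np.linalg.eigvalsh(diff)), sorted(expected))
    return True

# try a few arbitrary offsets from the cone axis
for tx, ty in [(0.0, 0.0), (0.3, -0.1), (1.0, 2.0)]:
    assert check_lower_bound(tx, ty)
```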

The eigenvalues of H′ are obviously the diagonal entries. We see that H′ has two positive and one negative eigenvalue. The eigenvector corresponding to the negative eigenvalue is directed along the z axis, and the other two eigenvectors are in the plane perpendicular to the z axis. However, since the two corresponding eigenvalues are equal, the eigenvectors may be taken as any two orthogonal vectors in the xy plane.

The matrix H′ is more easily handled than H, since it is dependent on the point X only through the values of d and τ, both of which can be bounded, as we shall see. We may write H′ as

$$\mathtt{H}' = \bigl(2/3d^2\bigr) \operatorname {diag}\bigl(1,1,- 9 \tau^2\bigr) . $$

We assume that the point X lies in a cone with half-angle arctan τ max about the axis w. Thus, τ ≤ τ max, and we see that

$$ \mathtt{H} \succeq \mathtt{H}' \succeq \frac{2}{3d^2} \operatorname {diag}\bigl(1, 1, -9\tau _{\mathrm {max}}^2\bigr) . $$

Note that the matrix on the right is a lower bound (in the semi-definite partial ordering) for the Hessian of f 2 at any point X lying in the cone. It depends on X only through the depth d of the point from the vertex of the cone.

Generally Placed Cone

The above computation was carried out for the case where the vector w points directly along the z-axis. We now consider a cone with axis represented by an arbitrary unit vector w and with vertex at a point C. This corresponds to rotating the camera by a rotation R aligning the z direction with w, followed by a translation of the origin. The depth d(X) of a point X relative to the vertex is given by \({\mathbf{w}}^{\top}(\mathbf{X} - \mathbf{C})\). The Hessian of the function (4) transforms easily to this rotated coordinate frame, according to

$$ \mathtt{H} \mapsto \mathtt{R}\mathtt{H} \mathtt{R}^{\top }\succeq \frac{2}{3 d(\mathbf{X})^2} \bigl( \mathbf{u} \mathbf{u}^{\top}+ \mathbf{v}\mathbf{v}^{\top}- 9 \tau _{\mathrm {max}}^2\mathbf{w}\mathbf{w}^{\top} \bigr) , $$
(8)

where u, v and w are the three columns of the rotation matrix. This gives a lower bound for the Hessian, depending only on the point X through its depth d(X) from the vertex.

Bounded Depths

We may remove this final dependency of the Hessian on the point X if we assume that the point X lies in a bounded domain. Thus, suppose the depth d(X) is bounded on the feasible domain \(\mathcal {D}\), and that we can find bounds L and U such that for all \(\mathbf{X} \in \mathcal {D}\),

$$ L \leq1/d(\mathbf{X}) \leq U . $$
(9)

In this case, one can find a bound on the Hessian that holds over the complete domain \(\mathcal {D}\), and yet is independent of X, namely (8) and (9) together give

$$ \mathtt{H} \succeq (2/3) \bigl(L^2 \bigl(\mathbf{u} \mathbf{u}^{\top}+ \mathbf{v}\mathbf{v}^{\top}\bigr) -9 \,U^2\, \tau _{\mathrm {max}}^2 \mathbf{w}\mathbf{w}^{\top} \bigr), $$
(10)

where the inequality holds for the Hessian of f 2 computed at any point in \(\mathcal {D}\). Thus, the matrix on the right may be thought of as a lower bound for the Hessian evaluated in this region.

Summing the Hessians

Now, we consider a point X, subject to several measurements, represented by vectors w i . We do not care where the vertex of the cone (corresponding to the camera centre) is located, but only that the depth of the point X in the i-th cone is d i . We suppose that the point X is situated in the intersection of cones with half-angle arctan τ max. Let \(f^{2}_{i}(\mathbf{X})\) be \(\tau_{i}^{2}\), where arctan τ i is the angle of X from the axis of the i-th cone. The \(L_2\) error associated with the point X is given by \(f^{2}(\mathbf{X}) = \sum_{i} f^{2}_{i} (\mathbf{X})\), and the Hessian of f 2 is H=∑ i H i . Now applying the inequality (8) to each H i , we get

$$ \mathtt{H} \succeq \sum_i \bigl(2/3d_i(\mathbf{X})^2\bigr) \bigl( \mathbf{u}_i {\mathbf{u}_i}^{\top}+ {\mathbf{v}_i \mathbf{v}_i}^{\top} - 9\tau _{\mathrm {max}}^2\mathbf{w}_i {\mathbf{w}_i}^{\top} \bigr). $$
(11)

Although no individual summand in this expression can be positive semi-definite, we hope that by adding up the contributions for different i, the positive eigenvalues will dominate the negative eigenvalues, resulting in a positive definite or semi-definite matrix.

Summarizing the previous discussion, we may state our basic convexity theorem for the 3D triangulation problem.

Theorem 1

Let the convex domain \(\mathcal {D}\) be contained in the intersection of a set of N cones with vertices C i , axes represented by unit vectors w i , and half-angle bounded by arctan τ max. Let u i and v i be unit vectors orthogonal to each other and to w i . For a point \(\mathbf{X} \in \mathcal {D}\), let \(d_i(\mathbf{X}) = {\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i)\) represent its depth from the vertex of the i-th cone in the direction w i . Let U i and L i be upper and lower bounds for the value of 1/d i (X) on \(\mathcal {D}\). Then

$$ \sum_{i=1}^N \bigl( L_i^2 \bigl(\mathbf{u}_i { \mathbf{u}_i}^{\top}+ \mathbf{v}_i{ \mathbf{v}_i}^{\top}\bigr)-9U_i^2\, \tau _{\mathrm {max}}^2 \mathbf{w}_i {\mathbf{w}_i}^{\top} \bigr) \succeq \mathtt{0} $$
(12)

is a sufficient condition for the least-squares error function (1) to be convex on  \(\mathcal {D}\).
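A minimal sketch of the test in Theorem 1, assuming the cone axes w i and the inverse-depth bounds L i , U i are already available (here they are simply made up; in practice they come from the depth-bound computation of Sect. 3.1). It uses the identity u u ⊤ + v v ⊤ = I − w w ⊤, which holds for any orthonormal completion (u, v) of w and avoids constructing u i and v i explicitly:

```python
import numpy as np

def theorem1_matrix(ws, Ls, Us, tau_max):
    """Form the matrix of condition (12) from cone axes w_i and
    inverse-depth bounds L_i <= 1/d_i(X) <= U_i on the domain D."""
    M = np.zeros((3, 3))
    for w, L, U in zip(ws, Ls, Us):
        w = np.asarray(w, float)
        w = w / np.linalg.norm(w)
        # u u^T + v v^T = I - w w^T for any orthonormal completion (u, v)
        M += L**2 * (np.eye(3) - np.outer(w, w))
        M -= 9 * U**2 * tau_max**2 * np.outer(w, w)
    return M

def convexity_test(ws, Ls, Us, tau_max):
    # condition (12): smallest eigenvalue non-negative => cost convex on D
    return np.linalg.eigvalsh(theorem1_matrix(ws, Ls, Us, tau_max)).min() >= 0

# two well-separated views with a small residual cone angle: test passes
ws = [np.array([1.0, 0.0, 0.2]), np.array([0.0, 1.0, 0.2])]
Ls, Us = [0.9, 0.9], [1.1, 1.1]
assert convexity_test(ws, Ls, Us, tau_max=0.01)

# a very large residual cone angle lets the negative terms dominate: test fails
assert not convexity_test(ws, Ls, Us, tau_max=1.0)
```

Note that a failed test is inconclusive: condition (12) is sufficient, not necessary, for convexity.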

3 Weighted Depths

It is possible to obtain a better condition than that given in Theorem 1, as will be shown next. Observe that the summand in (12) consists of two parts, which are multiplied by \(L_{i}^{2}\) and \(U_{i}^{2}\) respectively. Whereas u i u i +v i v i is positive semi-definite, \(-9 \tau_{\mathrm{max}}^{2} \mathbf{w}_{i} {\mathbf{w}_{i}}^{\top}\) is negative semi-definite. In order for the total sum to be positive semi-definite, it is advantageous if the factor \(U_{i}^{2}\) is kept as small as possible. Though U i can be no smaller than L i , it is best if the bounds L i and U i are as close to each other (tight) as possible.

Looked at another way, the condition (12) resulted from obtaining a lower bound for the Hessian over the whole region \(\mathcal {D}\). Since the Hessian of each individual term \(f_{i}^{2}\) scales as 1/d 2 along any ray through C i , any constant lower bound over the region \(\mathcal {D}\) will not be a good bound if the depth of the region varies greatly, so that L i and U i are widely different. An alternative is to introduce some positive function α(X) and find a lower bound for the scaled Hessian α(X)H over the region \(\mathcal {D}\). If this lower bound is positive definite, then so is H everywhere in \(\mathcal {D}\).

Consider a positive function α(X) and suppose that there exist constants \(L^{\alpha}_{i}\) and \(U^{\alpha}_{i}\) such that the following condition holds for all i and all \(\mathbf{X} \in \mathcal {D}\):

$$ L^{\alpha}_i \le \frac{\alpha(\mathbf{X})}{d_i(\mathbf{X})} \le U^{\alpha}_i . $$
(13)

Then, referring to (11) and multiplying both sides by α(X)2, we find that

$$ \alpha(\mathbf{X})^2\, \mathtt{H} \succeq \frac{2}{3} \sum_i \bigl( \bigl({L^{\alpha}_i}\bigr)^2 \bigl( \mathbf{u}_i {\mathbf{u}_i}^{\top}+ \mathbf{v}_i {\mathbf{v}_i}^{\top}\bigr) - 9 \bigl({U^{\alpha}_i}\bigr)^2 \tau _{\mathrm {max}}^2 \mathbf{w}_i {\mathbf{w}_i}^{\top} \bigr) . $$
(14)

So to establish convexity it is sufficient to verify that

$$ \sum_i \bigl( \bigl({L^{\alpha}_i} \bigr)^2 \bigl( \mathbf{u}_i {\mathbf{u}_i}^{\top}+ \mathbf{v}_i{\mathbf{v}_i}^{\top}\bigr) - 9 \bigl({U^{\alpha}_i} \bigr)^2 \tau _{\mathrm {max}}^2 \mathbf{w}_i{\mathbf{w}_i}^{\top} \bigr) \succeq \mathtt{0}. $$
(15)

Observe that this condition is identical with (12) except that bounds L i and U i are replaced by \(L^{\alpha}_{i}\) and \(U^{\alpha}_{i}\). If these new bounds are tighter (that is more nearly equal) than (L i ,U i ), then we can expect a better result; we are more likely to be able to establish convexity using condition (15) than by using (12).

If \(L^{\alpha}_{i}\) and \(U^{\alpha}_{i}\) are not greatly different, this condition specifies that all the depths d i (X) are approximately proportional to some “average” depth value α(X).

A useful choice for the function α(X) in the above discussion is the average depth of point X with respect to all views, that is

$$\alpha(\mathbf{X}) = (1/N) \sum_{i=1}^N d_i(\mathbf{X}) . $$

If the depth d i (X) of a point is reasonably approximated by the average depth over all views, then α(X)/d i (X) does not vary greatly over the domain \(\mathcal {D}\), so the bounds \(L^{\alpha}_{i}\) and \(U^{\alpha}_{i}\) are close together. Even for domains \(\mathcal {D}\) reaching to infinity, the ratio α(X)/d i (X) will remain within reasonable maximum and minimum bounds, since the average depth α(X) increases to infinity at the same rate as any specific depth d i . This is illustrated in Fig. 1.

Fig. 1

When the baseline is small compared to depth, the region \(\mathcal {D}\) is elongated, and the ratio U i /L i can be large. However, over the region of interest, the depth of a point X is approximately the same in all views, so α(X)/d i (X) is close to constant when α(X) is the average depth over all views
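The boundedness of α(X)/d i (X) along rays to infinity can be illustrated numerically; in this sketch the two camera centres and viewing directions are invented, with a baseline small compared to the depth, as in Fig. 1:

```python
import numpy as np

# two cameras whose baseline is small compared to the scene depth
C = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])]
w = [np.array([0.0, 0.0, 1.0]),
     np.array([-0.1, 0.0, 1.0]) / np.linalg.norm([-0.1, 0.0, 1.0])]

def depths(X):
    # d_i(X) = w_i^T (X - C_i)
    return np.array([wi @ (X - Ci) for wi, Ci in zip(w, C)])

ratios = []
for t in [5.0, 50.0, 500.0, 5000.0]:   # push X to infinity along a ray
    X = np.array([0.4, 0.1, 1.0]) * t
    d = depths(X)
    alpha = d.mean()                   # average depth over all views
    ratios.extend(alpha / d)

# alpha(X)/d_i(X) stays within a narrow band even as the depths diverge
assert max(ratios) / min(ratios) < 1.1
```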

Average depth is a good choice for the function α(X) in the case where all the points have approximately the same depth in all views. In the case when the depths are substantially different, then it may be better to choose a weighted depth average. For instance, let μ i =d i (X 0), where \(\mathbf{X}_{0}\in \mathcal {D}\) is a proposed solution to the problem. Then, a good choice for the function α(X) is

$$\alpha(\mathbf{X}) = \frac{1}{N} \sum_{i=1}^N \frac{d_i(\mathbf{X})}{\mu_i} . $$

We may now summarize the previous discussion by stating the general convexity theorem.

Theorem 2

With the notation as in Theorem 1, let α(X) be any positive real valued function defined on \(\mathcal {D}\), and let \(U^{\alpha}_{i}\) and \(L^{\alpha}_{i}\) be upper and lower bounds for the value of α(X)/d i (X) on \(\mathcal {D}\). Then

$$ \sum_{i=1}^N \bigl( \bigl({L^{\alpha}_i} \bigr)^2 \bigl( \mathbf{u}_i {\mathbf{u}_i}^{\top}+ \mathbf{v}_i {\mathbf{v}_i}^{\top} \bigr) - 9 \bigl({U^{\alpha}_i} \bigr)^2\,\tau _{\mathrm {max}}^2 \mathbf{w}_i {\mathbf{w}_i}^{\top} \bigr) \succeq \mathtt{0} $$
(16)

is a sufficient condition for the least-squares error function (1) to be convex on  \(\mathcal {D}\).

3.1 Computing the Bounds

For conditions of this type to be useful, it is necessary that the bounds \(L^{\alpha}_{i}\) and \(U^{\alpha}_{i}\) can be calculated. We can find \(U^{\alpha}_{i}\) by maximizing α(X)/d i (X) for \(\mathbf{X} \in \mathcal {D}\); any value no smaller than this maximum is a suitable bound. Similarly, \(L^{\alpha}_{i}\) is found by minimizing the same ratio.

The region \(\mathcal {D}\) of ℝ3 over which we need to minimize and maximize α(X)/d i (X) is an intersection of second-order cones. Working directly with second-order cones leads to Second Order Cone Programming (SOCP), which we wish to avoid for reasons of efficiency (and the non-availability of public-domain C++ code). It is easier to let \(\mathcal {D}\) be a polyhedral region (if necessary by enlarging it slightly). This polyhedral region can be expressed in terms of linear constraints, so that α(X)/d i (X) can be optimized by Linear Programming. More details are given in the algorithm description in Sect. 3.2.

In the case α(X)=1, the problem reduces to finding the maximum and minimum values of depth on \(\mathcal {D}\). Thus, let

$$B = \min_{\mathbf{X} \in \mathcal {D}} d_i(\mathbf{X}) = \min_{\mathbf{X} \in \mathcal {D}} {\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i) . $$

Then \(U^{\alpha}_{i}\) can be chosen as any value no smaller than 1/B. If the region \(\mathcal {D}\) is polyhedral, the value of B may be found by linear programming. If \(\mathcal {D}\) is an intersection of cones, then we may use SOCP to find the minimum.
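For a polyhedral \(\mathcal {D}\) given by linear constraints AX ≤ b, the minimum and maximum depths are a pair of LPs. A sketch using scipy.optimize.linprog, with a made-up box domain:

```python
import numpy as np
from scipy.optimize import linprog

def depth_bounds(A, b, w, Ci):
    """Min/max of d_i(X) = w^T (X - Ci) over the polyhedron A X <= b."""
    lo = linprog(c=w, A_ub=A, b_ub=b, bounds=[(None, None)] * 3)
    hi = linprog(c=-w, A_ub=A, b_ub=b, bounds=[(None, None)] * 3)
    assert lo.success and hi.success
    return lo.fun - w @ Ci, -hi.fun - w @ Ci

# a toy polyhedral domain: the box [1,2] x [1,2] x [4,6]
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.array([2.0, 2.0, 6.0, -1.0, -1.0, -4.0])

w = np.array([0.0, 0.0, 1.0])     # viewing direction
Ci = np.array([0.0, 0.0, 0.0])    # camera centre

B_min, B_max = depth_bounds(A, b, w, Ci)
# U_i may be taken as any value >= 1/B_min, and L_i as any value <= 1/B_max
assert np.isclose(B_min, 4.0) and np.isclose(B_max, 6.0)
```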

If α(X) is a linear function of the different depths, say \(\alpha(\mathbf{X}) = \alpha_{0} + \sum_{j=1}^{N} \alpha_{j} d_{j}(\mathbf{X})\), then it is also possible to find the bounds \(U^{\alpha}_{i}\) and \(L^{\alpha}_{i}\). The required optimization problem to find \(U^{\alpha}_{i}\) is

$$ \begin{array}{ll} \mbox{maximize} & B \\ \mbox{subject to} & \alpha(\mathbf{X}) \ge B \, d_i(\mathbf{X}) , \end{array} $$

where the maximization takes place over all values of B and \(\mathbf{X} \in \mathcal {D}\), and \(U^{\alpha}_{i}\) is subsequently chosen as any value no smaller than B. In this case, however, writing \(d_i(\mathbf{X}) = {\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i)\) results in a constraint involving the product of B and the point X; this is not a linear constraint. Nevertheless, for a fixed value of B, the constraint is linear, and instead of maximizing B, we may solve a feasibility problem to determine whether there exists an X satisfying the constraints. The problem can be rewritten, essentially equivalently, as

$$ \begin{array}{ll} \mbox{minimize} & B \\ \mbox{such that} & \alpha(\mathbf{X}) \ge B \, d_i(\mathbf{X}) ,\ \mathbf{X} \in \mathcal {D}\ \mbox{is infeasible.} \end{array} $$

By a process of binary search over the range of B one may solve this optimization problem to determine the optimal B. This requires repeated solution of the feasibility problem. We write the problem in this way to emphasize that the required value of B is one for which the constraint problem is infeasible. It is important not to stop the bisection process at a value of B where the constraints are satisfied.

A good starting point is the point \(\mathbf{X}_{0}\in \mathcal {D}\), which can be used to specify a value of B for which the problem is feasible. Since a tight upper bound B is not required, the bisection need not continue to high accuracy; one may set \(U^{\alpha}_{i}\) to any value of B for which the constraint problem is infeasible.
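A sketch of this bisection, with the feasibility subproblem posed as a zero-objective LP; the domain, cameras, and the choice α(X) = (1/N)∑ j d j (X) are all illustrative. In this toy setup both depths coincide, so the supremum of α(X)/d i (X) is exactly 1, which the bisection should recover:

```python
import numpy as np
from scipy.optimize import linprog

# illustrative polyhedral domain: the box [1,2] x [1,2] x [4,6]
A_dom = np.vstack([np.eye(3), -np.eye(3)])
b_dom = np.array([2.0, 2.0, 6.0, -1.0, -1.0, -4.0])

# two depth functions d_j(X) = w_j^T (X - C_j); alpha(X) is their average
C = [np.zeros(3), np.array([1.0, 0.0, 0.0])]
w = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])]

def feasible(B, i):
    # is there X in the domain with alpha(X) >= B * d_i(X)?
    # for fixed B this constraint is linear:  g^T X + k >= 0
    g = sum(w) / len(w) - B * w[i]
    k = -sum(wj @ Cj for wj, Cj in zip(w, C)) / len(w) + B * (w[i] @ C[i])
    res = linprog(c=np.zeros(3),                     # zero objective: feasibility only
                  A_ub=np.vstack([A_dom, -g]),
                  b_ub=np.append(b_dom, k),
                  bounds=[(None, None)] * 3)
    return res.success

def upper_bound(i, lo=0.0, hi=16.0, iters=30):
    # bisect; any infeasible B serves as the bound U_i^alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if feasible(mid, i) else (lo, mid)
    return hi

assert abs(upper_bound(0) - 1.0) < 1e-4
```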

3.2 Algorithm

We give a summary of the algorithm for proving convexity of the cost function.

Consider a set of cameras with centres C i , and let w i be a direction vector representing the measured direction of an observed point X from C i . Let u i and v i be two unit vectors orthogonal to w i and to each other, constituting (along with w i ) an orthonormal coordinate frame.

  1.

    Bounding the region. Let X be a 3D point constituting a potential solution to the triangulation problem. The cost of this point is the value of the cost function

    $$ C(\mathbf{X}) = \sum_{i=1}^N \frac{\bigl({\mathbf{u}_i}^{\top}(\mathbf{X} - \mathbf{C}_i)\bigr)^2 + \bigl({\mathbf{v}_i}^{\top}(\mathbf{X} - \mathbf{C}_i)\bigr)^2}{\bigl({\mathbf{w}_i}^{\top}(\mathbf{X} - \mathbf{C}_i)\bigr)^2} . $$
    (17)

    Let the value of this cost for the given point X be ϵ2.

    We then define a region of space in which the optimal point X opt lies according to the inequalities

    $$ -\epsilon\le\frac{{\mathbf{u}_i}^{\top}( \mathbf{X}_{\mathrm {opt}}- \mathbf{C}_i)}{{\mathbf{w}_i}^{\top}( \mathbf{X}_{\mathrm {opt}}- \mathbf{C}_i)} \le \epsilon $$
    (18)

    and similar inequalities involving v i instead of u i . Since \({\mathbf{w}_i}^{\top}( \mathbf{X}_{\mathrm {opt}}- \mathbf{C}_i) > 0\) (the cheirality constraint that the point must lie in the direction it is observed), we can multiply through by \({\mathbf{w}_i}^{\top}( \mathbf{X}_{\mathrm {opt}}- \mathbf{C}_i)\) to obtain four linear inequalities per image in the position of the point X opt, constraining it to a polyhedral region of space, \(\mathcal {D}\).

  2.

    Finding depth bounds. The next step is to find minimum and maximum values of d i on the region \(\mathcal {D}\). Since d i is defined to be w i (XC i ), determining its minimum and maximum over the polyhedral region \(\mathcal {D}\) is simply a pair of linear programming problems.

  3.

    Performing the test. We can now form the matrix in (12) and test its smallest eigenvalue to determine whether the matrix is positive semi-definite. If it is, then the cost function is convex on the region \(\mathcal {D}\), and if the initial estimate X is a local minimum, then it is also the global minimum.

  4.

    α-test. If the test in the previous step fails, then repeat steps 2 and 3, this time computing the bounds (13) and testing condition (16) for convexity.

4 General SOCP Cost Functions

We now consider the same problem in the more general context of the type of cost function that arises in many geometric vision problems, namely the type arising in SOCP. Thus, let f:ℝm→ℝ be a function of the form

$$ f^2(\mathbf{X}) = \frac{\| \mathtt{A}^{\top }\mathbf{X} + \mathbf{b}\|^2}{(\mathbf{c}^{\top} \mathbf{X} + k)^2} , $$
(19)

where A is an m×n matrix. Suppose that the value of the function is bounded on \(\mathcal {D}\), so that \(f^{2}(\mathbf{X}) < \tau_{\mathrm{max}}^{2}\) for all \(\mathbf{X} \in \mathcal {D}\).

Define the vector \(\mathbf{v} = (\mathtt{A}^{\top}\mathbf{X} + \mathbf{b})/(\mathbf{c}^{\top}\mathbf{X} + k)\), and observe that \(\|\mathbf{v}\|^{2} = f^{2}(\mathbf{X}) < \tau_{\mathrm{max}}^{2}\). The Hessian of f 2(X) can now be computed to equal

$$ \mathtt{H} = \frac{2}{(\mathbf{c}^{\top} \mathbf{X} + k)^2} [ \arraycolsep=5pt \begin{array}{@{}cc@{}} \mathtt{A} & \mathbf{c} \end{array} ] \left [ \arraycolsep=5pt \begin{array}{@{}cc@{}} \mathtt{I}_{n\times n} & -2 \mathbf{v} \\ -2 \mathbf{v}^{\top} & 3 \| \mathbf{v}\|^2 \end{array} \right ] \left [ \begin{array}{c} \mathtt{A}^{\top}\\ \mathbf{c}^{\top} \end{array} \right ]. $$
(20)

Apart from the multiplication by [A c] and its transpose, this is the same matrix as was obtained previously in (5). As before, we see that

$$ \left [ \arraycolsep=5pt \begin{array}{@{}cc@{}} \mathtt{I}_{n\times n} & -2 \mathbf{v} \\ -2 \mathbf{v}^{\top} & 3 \| \mathbf{v}\|^2 \end{array} \right ] \succeq \frac{1}{3} \left [ \arraycolsep=5pt \begin{array}{@{}cc@{}} \mathtt{I}_{n\times n} & \mathbf{0} \\ \mathbf{0}^{\top} & -9 \|\mathbf{v}\|^2 \end{array} \right ] . $$

Indeed, it is straightforward to verify that the difference between these two matrices,

$$\frac{2}{3} \left [ \arraycolsep=5pt \begin{array}{@{}cc@{}} \mathtt{I}_{n\times n} & -3 \mathbf{v} \\ -3 \mathbf{v}^{\top} & 9 \|\mathbf{v}\|^2 \end{array} \right ] $$

has eigenvalues 0, 2/3 (repeated n−1 times) and (2/3)(1+9∥v∥2), all non-negative. It results from this that

$$ \mathtt{H} \succeq \frac{2}{3 d(\mathbf{X})^2} \bigl(\mathtt{A} \mathtt{A}^{\top}- 9\tau_{\mathrm{max}}^2 \mathbf{c} \mathbf{c}^{\top}\bigr), $$
(21)

where \(d(\mathbf{X}) = \mathbf{c}^{\top}\mathbf{X} + k\). If we can bound 1/d(X)2 so that L 2≤1/d(X)2≤U 2, then we obtain a bound that does not depend at all on the point X, namely

$$ \mathtt{H} \succeq (2/3) \bigl( L^2 \mathtt{A} \mathtt{A}^{\top}- 9 U^2\,\tau_{\mathrm{max}}^2 \mathbf{c} \mathbf{c}^{\top} \bigr) . $$
(22)
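Both the Hessian formula (20) and the resulting lower bound (21) can be checked numerically on random data; a sketch (matrix sizes, the value of k, and the random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)
c = rng.standard_normal(m)
k = 2.0

def f2(X):
    # SOCP-type squared residual (19)
    return np.sum((A.T @ X + b)**2) / (c @ X + k)**2

def hessian_formula(X):
    # closed form (20)
    d = c @ X + k
    v = (A.T @ X + b) / d
    M = np.block([[np.eye(n), -2*v[:, None]],
                  [-2*v[None, :], np.array([[3*(v @ v)]])]])
    J = np.hstack([A, c[:, None]])            # the matrix [A  c]
    return (2.0 / d**2) * (J @ M @ J.T)

def hessian_numeric(f, X, h=1e-5):
    # central finite-difference Hessian
    X = np.asarray(X, float)
    m = len(X)
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            e = np.zeros(m); e[i] = h
            g = np.zeros(m); g[j] = h
            H[i, j] = (f(X+e+g) - f(X+e-g) - f(X-e+g) + f(X-e-g)) / (4*h*h)
    return H

X = rng.standard_normal(m) * 0.1
assert np.allclose(hessian_formula(X), hessian_numeric(f2, X), atol=1e-4)

# bound (21): H >= (2/(3 d^2)) (A A^T - 9 tau^2 c c^T), here with tau = ||v||
d = c @ X + k
v = (A.T @ X + b) / d
bound = (2/(3*d**2)) * (A @ A.T - 9*(v @ v)*np.outer(c, c))
assert np.linalg.eigvalsh(hessian_formula(X) - bound).min() > -1e-8
```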

In addition, as with the triangulation problem, we may define a function α(X) on \(\mathcal {D}\) to obtain a general condition.

Theorem 3

Let \(\mathcal {D}\) be a domain in ℝm, and for i=1,…,N let \(f^{2}_{i} : \mathcal {D}\rightarrow \mathbb{R}\) be a function of the form

$$f^2_i(\mathbf{X}) = \frac{\| {\mathtt{A}_i}^{\top} \mathbf{X} + \mathbf{b}_i \| ^2}{({\mathbf{c}_i}^{\top} \mathbf{X} + k_i)^2} $$

such that \(0 \le f^{2}_{i}(\mathbf{X}) \le\tau_{i}^{2}\) on \(\mathcal {D}\). Let α be a positive real valued function defined on \(\mathcal {D}\) and \((L^{\alpha}_{i}, U^{\alpha}_{i})\) be lower and upper bounds for α(X)/(c i X+k i ) on \(\mathcal {D}\). Then

$$ \sum_{i=1}^N \bigl( \bigl({L^{\alpha}_i}\bigr)^2 \mathtt{A}_i {\mathtt{A}_i}^{\top} - 9 \bigl({U^{\alpha}_i}\bigr)^2 \tau_i^2\, \mathbf{c}_i {\mathbf{c}_i}^{\top} \bigr) \succeq \mathtt{0} $$
(23)

is a sufficient condition for the function \(\sum_{i=1}^{N} f^{2}_{i}(\mathbf{X})\) to be convex on \(\mathcal {D}\).

An important special case is when α is identically equal to unity, and (L i ,U i ) are bounds for the inverse “depth”, 1/(c i ⊤ X+k i ). Also note that if τ max=max i τ i , then we can replace τ i by τ max in (23) to get a slightly weaker (but simpler) sufficient condition.

4.1 Application to Other Problems

We now consider the way many other problems in multiple-view geometry may be expressed in terms of cost functions of the form given in (19), and hence are susceptible to the methods developed in this paper. Problems that can be addressed in this way are essentially the same as those that can be solved in the \(L_\infty\) norm using SOCP or related methods. A survey of such problems was given in Hartley and Kahl (2007). We consider the most important such problems here. In all cases we consider, it is a simple exercise to write the cost functions as a sum of terms of the form (19).

4.2 Multiple View Reconstruction

The cost function (17) involves only one point X. Suppose we have several points X j , j=1,…,M, and also several camera centres C i , i=1,…,N, and that a direction vector w ij , nominally pointing from C i to X j , is known for some subset \(\mathcal{S}\) of pairs (i,j). One may write the cost function

$$ C\bigl(\{\mathbf{X}_j\}, \{\mathbf{C}_i\}\bigr) = \sum_{(i,j) \in \mathcal{S}} \frac{\| (\mathbf{X}_j - \mathbf{C}_i) \times \mathbf{w}_{ij} \|^2}{\bigl({\mathbf{w}_{ij}}^{\top}(\mathbf{X}_j - \mathbf{C}_i)\bigr)^2} , $$
(24)

which is to be minimized over all choices of the X j and C i . To remove the gauge freedoms from this problem, it is best to set one of the C i to be at the origin, and to choose one point X j to satisfy \({\mathbf{w}_{ij}}^{\top}(\mathbf{X}_j - \mathbf{C}_i) = 1\).

Now, if we define a vector X to be the concatenation of all the X j and C i , then each of the terms in (24) is easily expressed in the form (19). The matrix A ij and the vector c ij that appear in place of A and c in (19) will have 3(M+N) rows (matching the dimension of X), but will be quite sparse, having non-zero entries only in the rows corresponding to the points X j and C i involved in the term.

Defined in this way, the problem satisfies the conditions of Theorem 3, and the bound (23) may be used to check convexity of the cost function.

For large problems, the number of measurements is large and the total cost \(\sum f_{ij}^{2} (\mathbf{X}_{0}) = \epsilon^{2}\) will be high. This value determines the value of τ that must be used to verify optimality. For this reason, the region \(\mathcal {D}\) can be large, and the test can fail. An alternative is to set a smaller value of τ. If the convexity test succeeds for this smaller value, then we cannot rule out the existence of an optimum X opt outside the region \(\mathcal {D}\); however, if such an optimum exists, then the residual \(f_{ij}^{2}( \mathbf{X}_{\mathrm {opt}})\) must be greater than τ 2 for at least one measurement. If τ is reasonably large, then this may be considered an unlikely occurrence, particularly if the measurement set is outlier-free.

4.3 Reconstruction Using Image Measurements

Consider a projective camera with camera matrix \(\mathtt{P}_i\) acting on a point \(\mathbf{X}_j\), and let the corresponding measured image point be \(\mathbf{x}_{ij}\), defined for \((i, j) \in\mathcal{S}\), that is, for some subset of all point-image pairs. If we decompose the camera matrix as

$$\mathtt{P}_i = \left [ \arraycolsep=5pt \begin{array}{@{}cc@{}} \mathtt{A}_i & \mathbf{b}_i \\ {\mathbf{a}_i}^{\top} & b_i \end{array} \right ] $$

where \(\mathtt{A}_i\) is a 2×3 matrix, \(\mathbf{a}_i\) and \(\mathbf{b}_i\) are vectors and \(b_i\) is a scalar, then the projection of a point \(\mathbf{X}_j\) is given by \((\mathtt{A}_i \mathbf{X}_j + \mathbf{b}_i)/({\mathbf{a}_i}^{\top} \mathbf{X}_j + b_i)\). The squared image error is then given by

$$f_{ij}^2(\mathbf{X}_j, \mathbf{b}_i, b_i) = \biggl \Vert \frac{\mathtt{A}_i \mathbf{X}_j + \mathbf {b}_i}{{\mathbf{a}_i}^{\top} \mathbf{X}_j + b_i} - \mathbf{x}_{ij} \biggr \Vert ^2 $$

which is easily put into the form (19).
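As a concrete illustration, the squared image error can be evaluated directly from the decomposition above. The camera, point and measurement below are arbitrary example values, not data from the paper.

```python
import numpy as np

def squared_image_error(P, X, x):
    """Squared reprojection error for a 3x4 camera P, 3D point X
    and measured image point x, using the decomposition
    P = [A b; a^T b] from the text."""
    A, b_vec = P[:2, :3], P[:2, 3]        # A_i (2x3) and b_i (2-vector)
    a, b_scal = P[2, :3], P[2, 3]         # a_i (3-vector) and b_i (scalar)
    proj = (A @ X + b_vec) / (a @ X + b_scal)
    return float(np.sum((proj - x) ** 2))

P = np.hstack([np.eye(3), np.array([[0.1], [0.0], [2.0]])])  # example camera
X = np.array([1.0, 1.0, 2.0])             # example 3D point
x = np.array([0.3, 0.3])                  # example measured image point
print(squared_image_error(P, X, x))       # approximately 0.003125
```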

If the camera matrices are completely known, and there is only one point X j that needs to be found, then this is simply the triangulation problem, expressed in terms of image coordinates.

With several points and cameras, we suppose that the first three columns of \(\mathtt{P}_i\), consisting of the matrix \(\mathtt{A}_i\) and the vector \(\mathbf{a}_i\), are known, whereas \(\mathbf{X}_j\), \(\mathbf{b}_i\) and \(b_i\) are unknown. Then, \(\mathtt{A}_i \mathbf{X}_j + \mathbf{b}_i\) and \({\mathbf{a}_i}^{\top} \mathbf{X}_j + b_i\) are linear in the unknowns \(\mathbf{X}_j\), \(\mathbf{b}_i\) and \(b_i\). As in Sect. 4.2 we define a vector X by concatenating all the \(\mathbf{X}_j\), \(\mathbf{b}_i\) and \(b_i\). Each term \(f^{2}_{ij} (\mathbf{X})\) of the cost function is then of the required form (19), and (23) may be used to prove convexity of the cost function.

Together, A i and a i make up the left-hand 3×3 block of the camera matrix. There are two important situations in which this left-hand block of P i may be known. The first is when the cameras are calibrated, and the block constitutes the rotation matrix associated with the camera. This rotation matrix may have been obtained from a prior reconstruction step in which rotations are computed. This is the same problem as was considered in Sect. 4.2, expressed now in image coordinates.

The second situation is that considered by Rother and Carlsson (2002), in which a plane in the scene allows us to compute inter-image homographies induced by that plane. These homographies are represented by the left-hand block of the camera matrices. Note that the solution given in Rother and Carlsson (2002) was a linear solution to the reconstruction problem, not the sort of least-squares solution that we wish to verify here.

4.4 Homographies and Camera Resectioning

In the camera resectioning problem, we are given 3-dimensional points X i and measured corresponding points x i related by an unknown 3×4 projection matrix P, which is to be estimated. This is also known as the projective camera pose estimation problem. For no apparent reason, this has also recently become known as the PnP problem, a neologism which we abjure. The homography estimation problem is essentially the same, except that the points X i are 2-dimensional points, and the matrix P is a 3×3 matrix. Since these two problems are essentially the same, we will concentrate on the pose-estimation problem.

If we denote the k-th row of P by p k , then the squared error associated with the measurement of point x i is given by

$$f^2(\mathtt{P}) = \biggl( \frac{{\mathbf{p}_1}^{\top}\tilde{\mathbf{X}}_i}{{\mathbf{p}_3}^{\top}\tilde{\mathbf{X}}_i} - x_i \biggr)^2 + \biggl( \frac{{\mathbf{p}_2}^{\top}\tilde{\mathbf{X}}_i}{{\mathbf{p}_3}^{\top}\tilde{\mathbf{X}}_i} - y_i \biggr)^2 $$

where \(\tilde{\mathbf{X}}_{i}\) represents the homogeneous vector (x,y,z,1). Here, the situation is slightly different from the previous cases of triangulation and multiple view reconstruction, in that the entries of the matrix P are the unknowns, and not the coordinates of the points X i . Nevertheless, it is easily observed that the squared error term may be written in the usual form (19).
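To see concretely that this residual has the form (19), note that both the numerators and the denominator are linear in the stacked rows of P. The following sketch (with made-up example values) checks that writing the error over the stacked vector p agrees with the direct evaluation:

```python
import numpy as np

P = np.array([[1.0, 0.0, 0.0,  0.5],       # example camera (not from the paper)
              [0.0, 1.0, 0.0, -0.2],
              [0.0, 0.0, 1.0,  3.0]])
Xt = np.array([0.5, -1.0, 2.0, 1.0])       # homogeneous point (x, y, z, 1)
x, y = 0.2, -0.3                           # example measured image point

p = P.reshape(-1)                          # stacked rows p1, p2, p3 (12-vector)
# Numerators (p1 - x*p3).Xt and (p2 - y*p3).Xt, denominator p3.Xt,
# all linear in p -- exactly the structure of (19):
A = np.zeros((2, 12))
A[0, 0:4], A[0, 8:12] = Xt, -x * Xt
A[1, 4:8], A[1, 8:12] = Xt, -y * Xt
c = np.zeros(12); c[8:12] = Xt             # denominator coefficients

f2_direct = ((P[0] @ Xt) / (P[2] @ Xt) - x) ** 2 \
          + ((P[1] @ Xt) / (P[2] @ Xt) - y) ** 2
f2_form19 = float(np.sum((A @ p) ** 2) / (c @ p) ** 2)
print(f2_direct, f2_form19)                # both approximately 0.0036
```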

5 Projective Coordinate Change

If the convexity test described previously should fail to give a positive answer, then there are further things that can be tried to attempt to prove convexity. Our first method of attack is to try a projective coordinate change.

By changing projective coordinate systems, we will reparametrize the domain of the error function f 2(X). This may turn a non-convex function into a convex function. It is easy to see that affine coordinate changes do not affect convexity, so it is enough to restrict our attention to transformations that change the plane at infinity. Cheirality conditions also apply, that is, all 3D-points must remain in front of the cameras. See Hartley and Zisserman (2004) for more details on projective geometry.

We consider the use of the inequalities given previously to demonstrate that the function is positive semi-definite on a region. The matrix H′ defined in (7) has one small negative eigenvalue in the direction along the principal ray of the camera, and two larger positive eigenvalues oriented in directions orthogonal to the principal ray. It gives a uniform lower bound on the Hessian in a region of space. The general idea is that if a point is seen from two different directions, then the Hessian of the combined error function is simply the sum of the individual Hessians. If the two principal rays are not aligned, then the idea is that the negative eigenvalue in the principal direction for one camera will be outweighed by the contribution of the positive eigenvalues for the other camera.

This will be true if the view directions for the different cameras are sufficiently different. If on the other hand, the view directions for the different views are the same, then no cancelling will take place, and we will not be able to conclude that the Hessian is positive definite in any region around the point X of interest.

In this case, the situation can be saved by the application of a projective transformation. If the view directions for several cameras are similar, this implies that the point X is near to infinity. We apply a projective transformation that maps this point to a point closer to the cameras, so that the viewing rays are no longer near parallel. Then, consider the Hessian of the error function in this new coordinate system. The general idea is best illustrated by an example.

Consider cameras with matrices \(\mathtt{P}_a = [\mathtt{I} \mid (a,0,0)^{\top}]\), where a is a variable. This camera is placed at the position x=−a on the x-axis, and the principal rays for all these cameras point in the direction of the z-axis. Suppose we want to find the Hessian of the error function on the intersection of error-cones, defined by τ, around the principal rays of the cameras. This situation will occur when the correct triangulation point lies near infinity. Now, according to (7), the Hessian at a point with depth d may be bounded as follows:

$$\mathtt{H}_a \succeq \frac{2}{3d^2} \operatorname {diag}\bigl(1, 1, -9\tau^2 \bigr) $$

and this bound is the same for all a. Adding any number of such Hessians H a for different cameras P a will not result in a positive-definite matrix. Hence, we cannot easily conclude that the Hessian is positive-definite.

It is instructive to note here in passing that since the region of interest \(\mathcal {D}\) stretches to infinity, the bound L a in condition (12) will be zero. Hence the matrix in (12) cannot be positive definite. This problem will be partly alleviated by using the α-test (16) with α(X) equalling the average depth of a point over the different images. Since the depth is the same in all images, the ratio α(X)/d(X)=1, so \(L^{\alpha}_{i} = U^{\alpha}_{i} = 1\), and the matrix to test is \(\sum_{i=1}^{N} \frac{2}{3d^{2}} \operatorname {diag}(1, 1, -9\tau^{2})\), which as remarked above is still not positive-definite. The problem is that the axes of the cones are parallel, which gives no scope for the positive eigenvalues of the individual Hessians to cancel the negative ones.
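Numerically, the failure is easy to observe: any number of copies of the identical bound sum to a matrix whose third eigenvalue remains negative. The values of d, τ and N below are arbitrary examples.

```python
import numpy as np

d, tau, N = 10.0, 0.01, 50                 # example depth, threshold, #cameras
H_bound = (2 / (3 * d**2)) * np.diag([1.0, 1.0, -9 * tau**2])
H_sum = N * H_bound                        # identical bounds: parallel rays
print(np.linalg.eigvalsh(H_sum).min())     # negative, for any N
```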

However, we may apply a projective transformation represented by the matrix

$$\mathtt{T} = \left [ \arraycolsep=5pt \begin{array}{@{}cccc@{}} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{array} \right ] $$

which takes the point (0,0,1,0), the point at infinity in the z direction, to (0,0,1,1), a finite point. The axes of the different cones pointing to this point will no longer be parallel.

This action transforms the camera matrix P a to

$$\mathtt{P}_a \mathtt{T}^{-1} = \left [ \arraycolsep=5pt \begin{array}{@{}cccc@{}} 1 & 0 & -a & a \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array} \right ]. $$

Note that the point (0,0,1,1) still maps to the origin in all images. Now, according to (21) the Hessian for a point at depth d will be bounded by

$$\mathtt{H}_a \succeq \frac{2}{3d^2} \left [ \arraycolsep=5pt \begin{array}{@{}ccc@{}} 1 & 0 & a \\ 0 & 1 & 0 \\ a & 0 & a^2 - 9 \tau^2 \end{array} \right ]. $$

Now, summing two such Hessians for a=1 and a=−1, we find that

$$\mathtt{H}_1 + \mathtt{H}_{-1} = \frac{4}{3d^2} \left [ \arraycolsep=5pt \begin{array}{@{}ccc@{}} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & a^2 - 9 \tau^2 \end{array} \right ], $$

where we use the fact that (as before) the depth d is the same for all images. This matrix is obviously positive definite even for moderately large values of τ. This proves that the Hessian is positive definite on the intersection of τ-cones, and shows that the region over which it is certain that there is a single minimum of the cost function is relatively large.
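This calculation is easy to check numerically. The sketch below verifies that the transformed cameras \(\mathtt{P}_a \mathtt{T}^{-1}\) still map (0,0,1,1) to the image origin, and that the summed bound is positive definite; d and τ are arbitrary example values.

```python
import numpy as np

T = np.array([[1.0, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1]])

def P(a):
    """Camera P_a = [I | (a,0,0)^T]."""
    return np.hstack([np.eye(3), np.array([[a], [0.0], [0.0]])])

# The transformed cameras still map (0,0,1,1) to the image origin:
for a in (1.0, -1.0):
    img = P(a) @ np.linalg.inv(T) @ np.array([0.0, 0, 1, 1])
    print(img / img[2])                    # (0, 0, 1) for both cameras

# Summed Hessian bound after the transformation (example d and tau):
d, tau, a = 10.0, 0.1, 1.0
H_sum = (4 / (3 * d**2)) * np.diag([1.0, 1.0, a**2 - 9 * tau**2])
print(np.linalg.eigvalsh(H_sum).min() > 0)  # True: positive definite
```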

Algorithm

Even if there exists a transformation \(T: \mathbb{P}^{m} \to \mathbb{P}^{m}\) that makes the error function \(f^2(\mathbf{X})\) convex over the transformed domain \(T(\mathcal {D})\) (where m is the dimension of the unknown), it may be hard to find it automatically. Note that the maximum and minimum depths change when the plane at infinity is transformed, so the bounding depths must be recomputed as well.

In addition, there is an essential subtlety to observe in selecting candidates for the plane at infinity \(\pi_{\infty}\). We limit the discussion to the triangulation problem, but similar considerations apply to other problems. First, it is not advisable to select \(\pi_{\infty}\) to be a plane cutting the region of interest \(\mathcal {D}\), for in this case, points in \(\mathcal {D}\) will be moved to infinity, which defeats our purpose of selecting non-parallel viewing directions. More important is a second consideration: if the selected plane at infinity separates a camera centre from the region of interest \(\mathcal {D}\), then the cheirality of points in \(\mathcal {D}\) with respect to that camera is reversed (Hartley 1998). This causes complications in defining depth with the right sign.

In the experiments, we have found that good verification rates are obtained by choosing the candidate plane at infinity to be a plane parallel to one of the cameras’ image planes, with an offset such that all 3D points and camera centres are just in front of this plane. This choice corresponds to picking the principal plane of one of the cameras and moving it slightly backwards. Recall that the coordinates of the principal plane are given by the last row of the camera matrix.

Suppose the candidate plane at infinity is parametrized by \(\mathbf{v}^{\top}\mathbf{X} + 1 = 0\) where \(\mathbf{v}\) is an m-vector (for triangulation m=3). Then, the matrices \(\mathtt{A}_i\) and vectors \(\mathbf{c}_i\) of Theorem 3 should be updated according to \(\mathtt{A}_i \mapsto \mathtt{A}_i - \mathbf{b}_i \mathbf{v}^{\top}\) and \(\mathbf{c}_i \mapsto \mathbf{c}_i - k_i \mathbf{v}\). The next step is to recompute the bounding depths, and finally to test whether condition (16) holds.

6 Experiments

Test Procedure

In all experiments described below, the following test procedure is run until a definitive answer is returned (unless otherwise stated) in order to determine convexity and hence to verify global optimality.

  1. Compute a local minimizer \(\mathbf{X}_{\mathrm{local}}\) with bundle adjustment.

  2. Compute lower and upper bounds, L i and U i , respectively, according to Theorem 1.

  3. Test if the convexity condition in (12) holds (referred to as the primary test).

  4. Compute lower and upper bounds, \(L^{\alpha}_{i}\) and \(U^{\alpha}_{i}\), respectively, according to Theorem 2.

  5. Test if the stronger convexity condition in (16) holds (referred to as the α-test).

  6. Apply a projective transformation and test convexity (see Sect. 5). This step is only done for triangulation.
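The steps above can be organized as a short driver that runs the successively more expensive tests until one succeeds. The test functions here are placeholders (hypothetical names) standing in for conditions (12), (16) and the projective-change step of Sect. 5:

```python
def verify_optimality(problem, is_triangulation,
                      primary_test, alpha_test, projective_test):
    """Run the verification pipeline: cheapest test first, stop at
    the first success.  The three test arguments are callables,
    placeholders for conditions (12), (16) and the Sect. 5 step."""
    if primary_test(problem):
        return "verified (primary test)"
    if alpha_test(problem):
        return "verified (alpha test)"
    if is_triangulation and projective_test(problem):
        return "verified (projective change)"
    return "not verified"

# Example with stub tests: the primary test fails, the alpha test succeeds.
result = verify_optimality({}, True,
                           primary_test=lambda p: False,
                           alpha_test=lambda p: True,
                           projective_test=lambda p: False)
print(result)  # verified (alpha test)
```

The step computing \(L_i\) and \(U_i\) (and the local minimizer itself) is assumed to happen inside the respective callables; only the control flow is shown here.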

Notre Dame

This data set was originally created in Snavely et al. (2006). It consists of 595 images with 277,887 reconstructed 3D-points. Each point is visible in at least 2 and in up to as many as 216 images, according to the following distribution: 2 images 58 %, 3 images 19 %, 4–10 images 18 % and more than 10 images 5 %.

Out of 277,887 triangulated 3D-points, 265 instances were unsuccessful with our primary test in (12), and after applying the stronger α-test derived in Sect. 3, 158 cases remained. By transforming the plane at infinity, another 156 cases could be proven to be optimal. Hence, only 2 cases could not be verified to be globally optimal.

See Table 1 for a summary of the results. Each problem instance takes on average less than 0.5 milliseconds to verify on a 2.66 GHz Pentium, with the implementation written in C++.

Table 1 Results for Notre Dame

Dinosaur

This turn-table sequence consists of 36 images with 328 given point features. The complete 3D reconstruction was computed with standard structure from motion routines (including refinements by bundle adjustment).

The multiview geometry problems tested were triangulation, camera resectioning and structure and motion with known rotations. See Table 2 for a summary of results. Both camera resectioning and triangulation work very well for this type of scene and camera motion. It was not possible to prove optimality for the whole sequence (assuming known rotations), only for the first 22 images. Note that this configuration is still a large structure and motion problem: 22 cameras and several hundred 3D points. When going beyond 22 images, the primary test was not sufficient. We did not apply the α-test due to the sheer size of the configuration.

Table 2 Results for Dinosaur

The large number of image measurements in this data set makes the total cost C(X 0) relatively large. Since this value determines the value of τ for the test, it is not surprising that the convexity test fails. One may ask: down to what pixel threshold is it possible to verify optimality? We ran a series of tests with different values of τ using bisection and found that it is possible to verify the whole configuration for thresholds up to 2.59 pixels. The existence of an optimum \(\mathbf{X}_{\mathrm{opt}}\) with lower total cost \(C(\mathbf{X}_{\mathrm{opt}})\) cannot be ruled out, but such a solution must have at least one residual with more than 2.59 pixels of error. Given that the data set is outlier-free and typical residual errors are much lower, this seems an unlikely occurrence. Hence, we may conclude that with high probability we have verified the globally optimal solution. Similar ideas have recently been developed for one-dimensional vision problems in Olsson et al. (2009).

We also tried verifying 100 random pairs of images using the feature points common to each pair, and similarly 100 random triples of images with all their common feature points, with a 100 % success rate. Only pairs and triples with at least 10 feature points were selected.

The Christ Statue

The images were collected from various tourist photographs of this well-known statue and reconstructed with standard structure and motion routines. Similar experiments as for the Dinosaur sequences were performed with similar success rates, see Table 3.

Table 3 Results for the Christ Statue

Corridor

The forward camera motion for this 11-image sequence is quite different from the other sequences. Verifying global optimality for triangulation and camera resectioning turned out to be no problem, but for the other multiview problems it was more difficult, see Table 4. We did not apply the α-test to the known rotation cases due to the large size of the configurations.

Table 4 Results for Corridor

There are several 3D planes in the scene and we took all the point features on the left frontal wall and computed all pairwise 3×3 homographies for this plane using bundle adjustment. Out of the \(\binom{11}{2}=55\) image pairs, 27 were successfully verified with the primary test, and another 3 with the α-test.

The known rotation case turned out to be more difficult, which is consistent with the findings in Vedaldi et al. (2007). Not a single test was successful for the primary test. We did not apply the other methods due to the size of the setup. The whole sequence has 11 cameras and more than 4000 image points, so just computing all the depth tests takes time. (The size of the LP constraint set is approximately 4×4000=16,000, since each image coordinate gives rise to two inequality constraints.)

Arnold

A final experiment was performed with 6 images of a poster, see Fig. 2 and Table 5. In this scene, all 3D feature points lie on a plane, and hence uncalibrated camera resectioning is not possible. Triangulation of all 3D points (in total, 1639 points) as well as all pairwise homographies (15 in total) induced by the plane were successfully verified by the primary test. The results with known rotation were as follows: 9 out of 15 image pairs were successfully verified, and 17 out of 20 image triples were verified. Again, no other steps in the test procedure were applied due to the size of these configurations.

Fig. 2
figure 2

One image of the Arnold sequence with feature points

Table 5 Results for Arnold

Discussion

The method works very successfully (with almost 100 % success) on the triangulation and resection problems. However, for large problems such as full 3D reconstruction, the method is less reliable. The main reason for this is the approximation made after (2), where it is deduced that each \(f_i^2(\mathbf{X}_{\mathrm{opt}}) \le \epsilon^2\). This is the worst-case hypothesis, that all the error is incurred by one measurement. The effect is to set an unrealistically high bound \(\tau_{\max}\) in the convexity conditions, which causes the tests to fail. For this reason, the tests derived here do not work well for very large problems; it is only possible to verify subproblems for optimality.

The other main failure case for the primary and α-tests is when points are far away compared to the baseline, that is, close to infinity. This is because the two bounds L and U in (10), or \(L_{i}^{\alpha}\) and \(U_{i}^{\alpha }\) in (15), are far apart: the intersection volume is too elongated. Indeed, it is easily seen that if U or \(U^{\alpha}_{i}\) is too large then the matrix in (10) or (15) cannot be positive definite. However, in this case, as shown, the projective change method succeeds in all but two cases.

In practice, the three tests, the primary test, the α-test and projective change are run sequentially until one of them returns a positive answer. Since the tests are successively more expensive in terms of time, this is the most efficient approach. The total time for triangulation verification of all the points in the Notre Dame sequence was 2 m 12 s (less than 0.5 ms per point) on a 2.66 GHz laptop, single threaded, programmed in C++.

Comparing the primary test and the α-test, it was observed (on the Notre Dame set) that the α-test is more accurate: there were only two cases where the α-test failed but the primary test succeeded. However, the α-test takes about 5 times as long to run as the primary test (9 m 40 s). Since in practice it is only run when the primary test fails (about one test in 1000), this longer run time is not important.

As a tool to verify the optimality of the triangulation and resection results for a 3D reconstruction, this method requires insignificant time in the context of the complete reconstruction task. Even for a large reconstruction of \(10^6\) points, verifying that all points and cameras are triangulated or resectioned correctly will take no more than 500 seconds.

To summarize, reconstruction problems with small dimensions can be handled well with a small computational effort (the primary test generally works) while for larger problems the picture is mixed. The success rate depends on the geometry of the scene as well as the motion of the camera. Images taken with a wide baseline seem to be more easily verifiable compared to other camera motions. As the number of residuals increases, it becomes harder to verify optimality—just as expected.

7 Conclusion

The tests described here are extremely effective at verifying convexity, and hence global optimality of a local minimum, for various multiple view geometry problems. The convexity test for the primary bound seems to do almost as well as the α-bound. As the second bound is harder to compute, it is often advantageous to try the simple bound first. Experimental evaluations on several data sets showed the practical efficacy of the tests.

Failure to verify optimality is quite rare. In the case of triangulation, the success rate is almost 100 % and the verification time is about 0.5 ms per point. The main reason for failure is that the viewing rays are nearly parallel and the domain of interest becomes elongated, as illustrated in Fig. 1. For larger problems with many residuals, the success rate decreases, and it is only possible to verify subproblems. We have been able to verify optimality for reconstructions with more than 1500 image points and 20 cameras.

This work has shown how to deal with a large class of multiple view geometry problems under the L 2 cost function. On the other hand, many geometric problems involve optimizations including rotations, for example, the estimation of two-view relative motion. Such problems cannot be handled with the presented techniques and are left as a challenge for future work.