
1 Introduction

Estimating the pose of a camera from a set of 2D/3D point and line correspondences has many applications in computer vision and robotics, such as autonomous robot navigation, Augmented Reality (AR) [35], SLAM [23, 29, 30] and VO [13, 20]. Recent studies [25, 26, 34] show that jointly using point and line features for pose estimation gives improved results. As false matches may exist in real scenarios, the RANSAC algorithm [5] is generally used to identify these outliers. The solution of the minimal problem is an essential part of the RANSAC algorithm. This paper focuses on solving the minimal configurations of the 2D/3D point and line correspondences.

There are four minimal configurations for the 2D/3D point and line correspondences. Existing works generally focus on finding a specific solution for each of these minimal configurations. The similarity between the 2D/3D line correspondence and the 2D/3D point correspondence has been used in the literature. The authors of [31] apply this similarity to solve the least-squares problem of 2D/3D line and point correspondences. Kuang et al. [16] propose a minimal solution to estimate the pose of a camera with unknown focal length from points, directions and lines. A direct algebraic solution is generally adopted for pose estimation when the camera intrinsic parameters are unknown, because it is hard to get the 3D information of points and lines in the image plane without the intrinsic parameters. However, it is not clear whether a direct algebraic solution using the basic constraints is comparable to the methods based on well-designed geometric transformations when the intrinsic parameters are known. A specific geometric transformation can eliminate unknowns or even yield a lower-order equation. Such simplification is thought to result in a more stable algorithm. This paper shows that directly solving the basic constraints can give more stable or comparable results. We significantly improve the stability of the efficient three quadrics solver, E3Q3 [17], by selecting a proper unknown elimination order. This can benefit the vision tasks that resort to a three quadrics solver. We compare our algorithm with the previous algorithms by simulations. The results show that our algebraic algorithm is comparable to the state-of-the-art P3P algorithm [14], and is superior to the state-of-the-art algorithms of the other three cases in terms of stability. In addition, our algorithm is efficient and can be applied in real-time applications.

2 Related Work

The four minimal problems mentioned above have been solved case by case in the literature.

P3P Problem. Calculating the camera pose from three 2D/3D point correspondences is known as the P3P problem, which has been extensively studied in the literature. The first solution for the P3P problem dates back to 1841 and was presented by Grunert [7]. This algorithm applies the law of cosines to generate three quadratic equations in the distances between the three 3D points and the camera origin. This is a specific quadratic polynomial system without first order monomials, which can be reduced to a quartic equation with a closed-form solution. Several works [4, 18, 22] follow this formulation and differ in the specific variable elimination approach used to get the quartic equation. Haralick et al. [9] present a detailed comparison of these algorithms. More general approaches are also used to explore this specific quadric system. Quan et al. [27] apply the Sylvester resultant [19] to solve the quadric system. Gao et al. [6] employ Wu-Ritt’s zero decomposition algorithm [32] to systematically study this equation system, and provide a complete analytical solution. They also give criteria to determine the number of solutions and the number of real physical solutions. The drawback of this series of algorithms is that they need to solve a 3D/3D point alignment problem [1] to get the pose. This increases the computational time. Additionally, due to the finite precision of floating-point numbers, the numerical error accumulated in the extra step may degrade the accuracy. Kneip et al. [15] and Masselli et al. [21] address this problem by introducing an intermediate coordinate frame to eliminate variables. Most recently, Ke et al. [14] give an algebraic solution to directly compute the camera pose. By avoiding extra transformations, this algorithm is efficient and accurate. These approaches make use of specific properties of the P3P problem. Therefore, they cannot be generalized to the other three minimal problems.

Two Point and One Line, and One Point and Two Line Correspondences. These two cases have not been studied thoroughly in the literature. Ramalingam et al. [28] give a solution to both problems. They apply the collinearity of the 2D/3D point correspondence, and the coplanarity of the 2D/3D line correspondence to construct constraints on the camera pose. They design two intermediate coordinate systems for each problem to eliminate the variables. Their transformations involve the tangent function, which may cause numerical problems. Our algorithm also uses the collinearity and coplanarity constraints, but it does not require any transformation. This avoids numerical error propagation and thus can increase accuracy. Besides, their algorithm needs to calculate the Singular-Value Decomposition (SVD) of a relatively large matrix. This reduces the speed of the algorithm.

P3L Problem. Determining the camera pose from three line correspondences is known as the Perspective-3-Line (P3L) problem. Several solutions [2, 3, 33] have been proposed for this problem. Chen [2] analyzes the degenerate condition of the P3L problem. Xu et al. [33] study the number of potential solutions of the P3L problem. These methods adopt a similar methodology. They introduce intermediate coordinate systems to make one of the constraints on the rotation matrix automatically satisfied. Two transformations are required by [3], and one transformation is needed by [2, 33]. The simplified problem can then be solved by using elementary linear algebra and trigonometric identities. Our algorithm does not need such transformations, and thus reduces the numerical error accumulation.

3 Notation and Geometrical Constraints

In this paper, we use italic, boldfaced lowercase and boldfaced uppercase letters to represent scalars, vectors and matrices, respectively. The aim of this paper is to calculate the rotation R and translation t between a world frame \({O^w}{X^w}{Y^w}{Z^w}\) and a camera frame \({O^c}{X^c}{Y^c}{Z^c}\) from the minimal configurations of 2D/3D point and line correspondences, including three point correspondences, two point and one line correspondences, one point and two line correspondences, and three line correspondences. As mentioned above, determining the camera pose from three 2D/3D point correspondences and from three 2D/3D line correspondences are known as the P3P and P3L problems, respectively. To simplify the notation, we refer to determining the camera pose from two point and one line correspondences as the Perspective-2-Point-and-1-Line (P2P1L) problem, and determining the pose from one point and two line correspondences as the Perspective-1-Point-and-2-Line (P1P2L) problem. This section describes the notation and geometrical constraints yielded by one 2D/3D point correspondence and one 2D/3D line correspondence, illustrated in Fig. 1.

Fig. 1. Geometric constraints from one 2D/3D point correspondence and one 2D/3D line correspondence.

3.1 2D/3D Point Correspondence

In this paper, we use a quaternion \(\mathbf {q}={{\left[ w,x,y,z\right] }^{T}}\) [12] to represent the rotation matrix R as:

$$\begin{aligned} \mathbf {R} = \left[ \begin{matrix} {{w}^{2}}+{{x}^{2}}-{{y}^{2}}-{{z}^{2}} &{} 2xy-2wz &{} 2wy+2xz \\ 2wz+2xy &{} {{w}^{2}}-{{x}^{2}}+{{y}^{2}}-{{z}^{2}} &{} 2yz-2wx \\ 2xz-2wy &{} 2wx+2yz &{} {{w}^{2}}-{{x}^{2}}-{{y}^{2}}+{{z}^{2}} \\ \end{matrix} \right] \end{aligned}$$
(1)
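
As a quick numerical illustration of (1), the following Python sketch evaluates the matrix for a given quaternion. The function name `quat_to_rot` is a hypothetical helper, not part of the original implementation, and the quaternion is assumed to have unit norm.

```python
def quat_to_rot(w, x, y, z):
    """Rotation matrix of Eq. (1) for a unit quaternion q = [w, x, y, z]."""
    return [
        [w*w + x*x - y*y - z*z, 2*x*y - 2*w*z,         2*w*y + 2*x*z],
        [2*w*z + 2*x*y,         w*w - x*x + y*y - z*z, 2*y*z - 2*w*x],
        [2*x*z - 2*w*y,         2*w*x + 2*y*z,         w*w - x*x - y*y + z*z],
    ]
```

Because every entry is a degree 2 monomial in the quaternion elements, `quat_to_rot(w, x, y, z)` and `quat_to_rot(-w, -x, -y, -z)` return the same matrix.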

Let \({{\mathbf {P}}^{{{P}_{i}}}}\) denote a 3D point in the world frame and \({{\mathbf {p}}_{i}}\) the back-projection ray of its image. To avoid extra transformations, we do not adopt the law of cosines widely used in the P3P problem [9]. As \({{\mathbf {p}}_{i}}\) is collinear with \({{\mathbf {P}}^{{{P}_{i}}}}\) expressed in the camera frame, we have:

$$\begin{aligned} {{\mathbf {p}}_{i}}\times \left( \mathbf {R}{{\mathbf {P}}^{{{P}_{i}}}}+\mathbf {t} \right) =\mathbf {0} \end{aligned}$$
(2)

where \(\times \) represents the cross product, which can be written in matrix form as:

$$\begin{aligned} {{\left[ {{\mathbf {p}}_{i}} \right] }_{\times }}\left( \mathbf {R}{{\mathbf {P}}^{{{P}_{i}}}}+\mathbf {t} \right) =\mathbf {0} \end{aligned}$$
(3)

where \({{\left[ {{\mathbf {p}}_{i}} \right] }_{\times }}\) is a skew-symmetric matrix having the form:

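$$\begin{aligned} {{\left[ {{\mathbf {p}}_{i}} \right] }_{\times }}=\left[ \begin{matrix} 0 &{} -{p_{i3}} &{} {p_{i2}} \\ {p_{i3}} &{} 0 &{} -{p_{i1}} \\ -{p_{i2}} &{} {p_{i1}} &{} 0 \\ \end{matrix} \right] ,\quad {{\mathbf {p}}_{i}}={{\left[ {p_{i1}},{p_{i2}},{p_{i3}} \right] }^{T}} \end{aligned}$$
(4)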

Substituting (1) and (4) into (3), we have the following three quadric equations:

$$\begin{aligned} \begin{aligned}&c_{j,1}^{{p_i}}{x^2} + c_{j,2}^{{p_i}}{y^2} + c_{j,3}^{{p_i}}{z^2} + c_{j,4}^{{p_i}}{w^2} + c_{j,5}^{{p_i}}xy + c_{j,6}^{{p_i}}xz \\&+ c_{j,7}^{{p_i}}xw + c_{j,8}^{{p_i}}yz + c_{j,9}^{{p_i}}yw + c_{j,10}^{{p_i}}zw - {p_{i3}}{t_2} + {p_{i2}}{t_3} = 0, j=1,2,3 \\ \end{aligned} \end{aligned}$$
(5)

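where \({t_k}\) \((k=1,2,3)\) are the three components of t. The translation terms written out in (5) are those of \(j=1\); for \(j=2\) and \(j=3\) they are \({p_{i3}}{t_1} - {p_{i1}}{t_3}\) and \(-{p_{i2}}{t_1} + {p_{i1}}{t_2}\), respectively, i.e. the j-th row of \({\left[ {{\mathbf{{p}}_i}} \right] _ \times }\) times t. Define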

$$\begin{aligned} {\mathbf {r}} = {\left[ {{x^2}},{{y^2}},{{z^2}},{{w^2}},{xy},{xz},{xw},{yz},{yw},{zw} \right] ^T} \end{aligned}$$
(6)

We can simplify the j-th equation of the i-th point correspondence in (5) as

$$\begin{aligned} \mathbf{{c}}_j^{{p_i}} \cdot \mathbf{{r}} + \mathbf{{n}}_j^{{p_i}} \cdot \mathbf{{t}} = 0,\quad j = 1,2,3 \end{aligned}$$
(7)

where \( \cdot \) represents the dot product, \(\mathbf{{c}}_j^{{p_i}}\) is a \(10\,\times \,1\) vector and \(\mathbf{{n}}_j^{{p_i}}\) is a \(3\,\times \,1\) vector. As \({\left[ {{\mathbf{{p}}_i}} \right] _ \times }\) is a rank-2 matrix, this equation system only provides two linearly independent constraints.
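
The coefficient vectors in (7) can be assembled numerically, since every entry of R in (1) is linear in the monomial vector r of (6). The Python sketch below is one possible way to do this; the names `R_COEFFS`, `rot_times_point_coeffs` and `point_constraints` are hypothetical helpers introduced for illustration and are not taken from the original implementation.

```python
# Monomial order of Eq. (6): r = [x^2, y^2, z^2, w^2, xy, xz, xw, yz, yw, zw]
def monomials(w, x, y, z):
    return [x*x, y*y, z*z, w*w, x*y, x*z, x*w, y*z, y*w, z*w]

# Each entry R[k][j] of Eq. (1) written as a coefficient vector over r.
R_COEFFS = [
    [[1, -1, -1, 1, 0, 0, 0, 0, 0, 0],    # R11 = x^2 - y^2 - z^2 + w^2
     [0, 0, 0, 0, 2, 0, 0, 0, 0, -2],     # R12 = 2xy - 2zw
     [0, 0, 0, 0, 0, 2, 0, 0, 2, 0]],     # R13 = 2xz + 2yw
    [[0, 0, 0, 0, 2, 0, 0, 0, 0, 2],      # R21 = 2xy + 2zw
     [-1, 1, -1, 1, 0, 0, 0, 0, 0, 0],    # R22 = -x^2 + y^2 - z^2 + w^2
     [0, 0, 0, 0, 0, 0, -2, 2, 0, 0]],    # R23 = 2yz - 2xw
    [[0, 0, 0, 0, 0, 2, 0, 0, -2, 0],     # R31 = 2xz - 2yw
     [0, 0, 0, 0, 0, 0, 2, 2, 0, 0],      # R32 = 2xw + 2yz
     [-1, -1, 1, 1, 0, 0, 0, 0, 0, 0]],   # R33 = -x^2 - y^2 + z^2 + w^2
]

def rot_times_point_coeffs(P):
    """3x10 matrix M with R P = M r for every quaternion."""
    return [[sum(R_COEFFS[k][j][m] * P[j] for j in range(3)) for m in range(10)]
            for k in range(3)]

def skew(p):
    """Skew-symmetric matrix [p]_x of a 3-vector p."""
    return [[0, -p[2], p[1]], [p[2], 0, -p[0]], [-p[1], p[0], 0]]

def point_constraints(p, P):
    """Rows c_j (length 10) and n_j (length 3), j = 1..3, of Eq. (7)."""
    S, M = skew(p), rot_times_point_coeffs(P)
    C = [[sum(S[j][k] * M[k][m] for k in range(3)) for m in range(10)]
         for j in range(3)]
    return C, S  # n_j is the j-th row of [p]_x
```

For a ray `p` generated from a known pose, each row satisfies `c_j . r + n_j . t = 0` up to round-off, and the three rows are linearly dependent because `p` is in the left null space of `skew(p)`.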

3.2 2D/3D Line Correspondence

Let \({{L}_{i}}\) and \({{l}_{i}}\) represent a 3D line and its corresponding 2D line. Denote the direction of \({{L}_{i}}\) as \({{\mathbf {v}}^{{{L}_{i}}}}\) and a 3D point on \({{L}_{i}}\) as \({{\mathbf {P}}^{{{L}_{i}}}}\). The back-projection of \({{l}_{i}}\) is a plane \({{\pi }_{i}}\) that passes through the origin of the camera frame. Denote the normal of \({{\pi }_{i}}\) as \({{\mathbf {n}}^{{{l}_{i}}}}\). Since \({{L}_{i}}\) should be on \({{\pi }_{i}}\), we get the following constraints:

$$\begin{aligned} \begin{array}{l} {\mathbf{{n}}^{{l_i}}} \cdot \mathbf{{R}}{\mathbf{{v}}^{{L_i}}} = 0, \\ {\mathbf{{n}}^{{l_i}}} \cdot \left( {\mathbf{{R}}{\mathbf{{P}}^{{L_i}}} + \mathbf{{t}}} \right) = 0 \end{array} \end{aligned}$$
(8)

Substituting (1) into (8) and using the definition of (6), we obtain two quadrics:

$$\begin{aligned} \begin{array}{l} \mathbf{{c}}_1^{{l_i}} \cdot \mathbf{{r}} = 0, \\ \mathbf{{c}}_2^{{l_i}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_i}}} \cdot \mathbf{{t}} = 0 \end{array} \end{aligned}$$
(9)
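
The two constraints in (8) can be checked numerically on a synthetic line. The following Python sketch is illustrative only; the helper names are hypothetical, and the plane normal is assumed to be computed from two back-projection rays of points on the 2D line.

```python
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(R, v):
    return [dot(row, v) for row in R]

def back_projection_normal(ray_a, ray_b):
    """Normal of the plane through the camera origin and two rays on the 2D line."""
    return cross(ray_a, ray_b)

def line_residuals(n, R, t, v, P):
    """Left-hand sides of Eq. (8): the direction term and the point-on-plane term."""
    RP_t = [a + b for a, b in zip(matvec(R, P), t)]
    return dot(n, matvec(R, v)), dot(n, RP_t)
```

Both residuals vanish at the true pose and are generally nonzero for a wrong rotation, which is what makes them usable as constraints on R and t.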

4 Minimal Solution

4.1 P3P

We give a new approach for the P3P problem. As we seek to give a generic framework for all the minimal configurations of 2D/3D point and line correspondences, we avoid exploiting the specific properties of the P3P problem used by the previous works [9, 14]. As mentioned above, each 2D/3D correspondence provides two constraints. Without loss of generality, we pick the first two equations of (3) from the first two correspondences, and the first and the last equations from the third correspondence. According to (7), we have the following quadratic equation system:

$$\begin{aligned} \begin{array}{@{} l} \mathbf{{c}}_1^{{p_1}} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_1}} \cdot \mathbf{{t}} = \mathbf{{c}}_1^{{p_2}} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_2}} \cdot \mathbf{{t}} = \mathbf{{c}}_1^{{p_3}} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_3}} \cdot \mathbf{{t}} = 0 \\ \mathbf{{c}}_2^{{p_1}} \cdot \mathbf{{r}} + \mathbf{{n}}_2^{{p_1}} \cdot \mathbf{{t}} = \mathbf{{c}}_2^{{p_2}} \cdot \mathbf{{r}} + \mathbf{{n}}_2^{{p_2}} \cdot \mathbf{{t}} = \mathbf{{c}}_3^{{p_3}} \cdot \mathbf{{r}} + \mathbf{{n}}_3^{{p_3}} \cdot \mathbf{{t}} = 0 \end{array} \end{aligned}$$
(10)

Divide this equation system into two parts, so that the first part contains the first 3 equations and the second part contains the remaining ones. Then we have:

$$\begin{aligned} \begin{array}{@{} l} {\mathbf{{C}}_1}{} \mathbf{{r}} + {\mathbf{{N}}_1}{} \mathbf{{t}} = \mathbf{{0}},\quad {\mathbf{{C}}_2}{} \mathbf{{r}} + {\mathbf{{N}}_2}{} \mathbf{{t}} = \mathbf{{0}} \\ {\mathbf{{C}}_1} = \left[ \mathbf{{c}}_1^{{p_1}}, \mathbf{{c}}_1^{{p_2}}, \mathbf{{c}}_1^{{p_3}} \right] ^{T},\;{\mathbf{{N}}_1} = \left[ \mathbf{{n}}_1^{{p_1}}, \mathbf{{n}}_1^{{p_2}}, \mathbf{{n}}_1^{{p_3}} \right] ^{T}\;\\ {\mathbf{{C}}_2} = \left[ \mathbf{{c}}_2^{{p_1}}, \mathbf{{c}}_2^{{p_2}}, \mathbf{{c}}_3^{{p_3}} \right] ^{T},\;{\mathbf{{N}}_2} = \left[ \mathbf{{n}}_2^{{p_1}}, \mathbf{{n}}_2^{{p_2}}, \mathbf{{n}}_3^{{p_3}} \right] ^{T} \end{array} \end{aligned}$$
(11)

where \({{\mathbf {C}}_{1}}\) and \({{\mathbf {C}}_{2}}\) are 3 \(\times \) 10 matrices, \({{\mathbf {N}}_{1}}\) and \({{\mathbf {N}}_{2}}\) are 3 \(\times \) 3 matrices. Using \({\mathbf{{C}}_2}{} \mathbf{{r}} + {\mathbf{{N}}_2}{} \mathbf{{t}} = \mathbf{{0}}\) in (11), we get a closed-form solution for \(\mathbf {t}\) as

$$\begin{aligned} \mathbf{{t}} = - {\left( {{\mathbf{{N}}_2}} \right) ^{ - 1}}{\mathbf{{C}}_2}{} \mathbf{{r}} \end{aligned}$$
(12)

Other choices are also valid, provided the coefficient matrix of t is invertible. Substitute (12) for \({\mathbf {t}}\) in \({\mathbf{{C}}_1}{} \mathbf{{r}} + {\mathbf{{N}}_1}{} \mathbf{{t}} = \mathbf{{0}}\) in (11). Together with the unit-norm constraint on \({\mathbf {q}}\), we get four quadratic equations in the four elements of \({\mathbf {q}}\):

$$\begin{aligned} \begin{array}{l} \mathbf{{Ar}} = 0\\ {w^2} + {x^2} + {y^2} + {z^2} = 1 \end{array} \end{aligned}$$
(13)

where

$$\begin{aligned} \mathbf{{A}} = {{\mathbf{{C}}_1} - {\mathbf{{N}}_1}{{\left( {{\mathbf{{N}}_2}} \right) }^{ - 1}}{\mathbf{{C}}_2}} \end{aligned}$$
(14)
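
The elimination of t in (12) and (14) amounts to a few small matrix products. The Python sketch below illustrates it with plain lists; `inv3`, `matmul` and `eliminate_translation` are hypothetical names, and the adjugate-based inverse is chosen only to keep the example free of external libraries.

```python
def inv3(N):
    """Inverse of a 3x3 matrix via the adjugate; raises if N is singular."""
    (a, b, c), (d, e, f), (g, h, i) = N
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    if det == 0:
        raise ValueError("N2 is singular; choose a different set of equations")
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[v / det for v in row] for row in adj]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def eliminate_translation(C1, N1, C2, N2):
    """Return (A, T) with A = C1 - N1 N2^{-1} C2 (Eq. 14) and t = T r (Eq. 12)."""
    T = [[-v for v in row] for row in matmul(inv3(N2), C2)]   # T = -N2^{-1} C2
    N1T = matmul(N1, T)
    A = [[c + d for c, d in zip(rc, rd)] for rc, rd in zip(C1, N1T)]
    return A, T
```

The same routine applies when \({\mathbf{{N}}_1}\) has only two rows, as in the P2P1L case, because only \({\mathbf{{N}}_2}\) needs to be square and invertible.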

We will show that the other three minimal cases also have the same quadric forms of q. Therefore, we will give the solution to it at the end of this section.

4.2 P2P1L

For the two 2D/3D point correspondences, we choose the first two equations of (7). Together with the constraints in (9) from the line correspondence, we obtain the following equation system:

$$\begin{aligned} \begin{array}{@{} l} \mathbf{{c}}_1^{{l_1}} \cdot \mathbf{{r}} = \mathbf{{c}}_1^{{p_1}} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_1}} \cdot \mathbf{{t}} = \mathbf{{c}}_1^{{p_2}} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_2}} \cdot \mathbf{{t}} = 0\\ \mathbf{{c}}_2^{{l_1}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_1}}} \cdot \mathbf{{t}} = \mathbf{{c}}_2^{{p_1}} \cdot \mathbf{{r}} + \mathbf{{n}}_2^{{p_1}} \cdot \mathbf{{t}} = \mathbf{{c}}_2^{{p_2}} \cdot \mathbf{{r}} + \mathbf{{n}}_2^{{p_2}} \cdot \mathbf{{t}} = 0 \end{array} \end{aligned}$$
(15)

There are 5 equations in t. Without loss of generality, we choose one equation involving t from each correspondence to solve t. To simplify the notation, we use the same notation as (11). Rearranging (15), we have

$$\begin{aligned} \begin{array}{@{} l} \mathbf{{c}}_1^{{l_1}} \cdot \mathbf{{r}} = 0,\quad {\mathbf{{C}}_1}{} \mathbf{{r}} + {\mathbf{{N}}_1}{} \mathbf{{t}} = 0, \quad {\mathbf{{C}}_2}{} \mathbf{{r}} + {\mathbf{{N}}_2}{} \mathbf{{t}} = 0\\ {\mathbf{{C}}_1} = \left[ \mathbf{{c}}_1^{{p_1}}, \mathbf{{c}}_1^{{p_2}} \right] ^{T},\;{\mathbf{{N}}_1} = \left[ \mathbf{{n}}_1^{{p_1}}, \mathbf{{n}}_1^{{p_2}} \right] ^{T}\;\\ {\mathbf{{C}}_2} = \left[ \mathbf{{c}}_2^{{l_1}}, \mathbf{{c}}_2^{{p_1}}, \mathbf{{c}}_2^{{p_2}} \right] ^{T},\;{\mathbf{{N}}_2} = \left[ {\mathbf{{n}}^{{l_1}}}, \mathbf{{n}}_2^{{p_1}}, \mathbf{{n}}_2^{{p_2}} \right] ^{T} \end{array} \end{aligned}$$
(16)

where \({{\mathbf {C}}_{1}}\) is a 2 \(\times \) 10 matrix, and \({{\mathbf {N}}_{1}}\) is a 2 \(\times \) 3 matrix. Using \({{\mathbf {C}}_{2}}\mathbf {r}+{{\mathbf {N}}_{2}}\mathbf {t}=\mathbf {0}\), we can obtain a closed-form solution for t as (12). Substituting the expression of t into \({{\mathbf {C}}_{1}}\mathbf {r}+{{\mathbf {N}}_{1}}\mathbf {t}=\mathbf {0}\), we get a quadric equation system as (13), with

$$\begin{aligned} \mathbf{{A}} = \left[ \begin{array}{l} \mathbf{{c}}_1^{{l_1}}, \left( {\mathbf{{C}}_1} - {\mathbf{{N}}_1}{\left( {{\mathbf{{N}}_2}} \right) ^{ - 1}}{\mathbf{{C}}_2}\right) ^{T} \end{array} \right] ^{T} \end{aligned}$$
(17)

4.3 P1P2L

Given one 2D/3D point and two 2D/3D line correspondences, according to (7) and (9), we can have the following equations:

$$\begin{aligned} \begin{array}{@{} l} \mathbf{{c}}_1^{{l_1}} \cdot \mathbf{{r}} = \mathbf{{c}}_1^{{l_2}} \cdot \mathbf{{r}} = \mathbf{{c}}_1^{p_1} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_1}} \cdot \mathbf{{t}} = 0 \\ \mathbf{{c}}_2^{{l_1}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_1}}} \cdot \mathbf{{t}} = \mathbf{{c}}_2^{{l_2}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_2}}} \cdot \mathbf{{t}} = \mathbf{{c}}_2^{p_1} \cdot \mathbf{{r}} + \mathbf{{n}}_2^{{p_1}} \cdot \mathbf{{t}} = 0 \end{array} \end{aligned}$$
(18)

Here we use the first two equations of (7). Other choices are also valid. Each line correspondence provides one constraint on t. Together with another constraint from the point correspondence, we can obtain three linear equations with respect to t. Rearranging the equations, we can have:

$$\begin{aligned} \begin{array}{l} \mathbf{{c}}_1^{{l_1}} \cdot \mathbf{{r}}= \mathbf{{c}}_1^{{l_2}} \cdot \mathbf{{r}} = \mathbf{{c}}_1^{p_1} \cdot \mathbf{{r}} + \mathbf{{n}}_1^{{p_1}} \cdot \mathbf{{t}} = 0\\ {\mathbf{{C}}_2}{} \mathbf{{r}} + {\mathbf{{N}}_2}{} \mathbf{{t}} = 0\\ {\mathbf{{C}}_2} = \left[ \mathbf{{c}}_2^{{l_1}}, \mathbf{{c}}_2^{l_2}, \mathbf{{c}}_2^{p_1}\right] ^{T},\; {\mathbf{{N}}_2} = \left[ {\mathbf{{n}}^{{l_1}}}, {\mathbf{{n}}^{{l_2}}}, \mathbf{{n}}_2^{{p_1}} \right] ^{T} \end{array} \end{aligned}$$
(19)

Using (12), we can get t. Substituting (12) into \(\mathbf {c}_{1}^{p_1}\cdot \mathbf {r}+\mathbf {n}_{1}^{{{p}_{1}}}\cdot \mathbf {t}=0\), we can get a quadratic equation system the same as (13) with

$$\begin{aligned} \mathbf{{A}} = \left[ \begin{array}{l} \mathbf{{c}}_1^{{l_1}}, \mathbf{{c}}_1^{{l_2}}, \mathbf{{c}}_1^{{p_1}} - \left( {\left( {{\mathbf{{N}}_2}} \right) ^{ - 1}}{\mathbf{{C}}_2}\right) ^{T}{} \mathbf{{n}}_1^{{p_1}} \end{array} \right] ^{T} \end{aligned}$$
(20)

4.4 P3L

Given three line correspondences, we can have the following quadratic equation system according to (9):

$$\begin{aligned} \begin{array}{@{}l} \mathbf{{c}}_1^{{l_1}} \cdot \mathbf{{r}} = \mathbf{{c}}_1^{{l_2}} \cdot \mathbf{{r}} = \mathbf{{c}}_1^{{l_3}} \cdot \mathbf{{r}} = 0 \\ \mathbf{{c}}_2^{{l_1}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_1}}} \cdot \mathbf{{t}}= \mathbf{{c}}_2^{{l_2}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_2}}} \cdot \mathbf{{t}}= \mathbf{{c}}_2^{{l_3}} \cdot \mathbf{{r}} + {\mathbf{{n}}^{{l_3}}} \cdot \mathbf{{t}} = 0 \end{array} \end{aligned}$$
(21)

It is clear that the first three quadrics involve only the quaternion q. Combining them with the unit-norm constraint on q, we have a system of the same form as (13) with

$$\begin{aligned} \mathbf{{A}} = \left[ \begin{array}{l} \mathbf{{c}}_1^{{l_1}}, \mathbf{{c}}_1^{{l_2}}, \mathbf{{c}}_1^{{l_3}} \end{array} \right] ^{T} \end{aligned}$$
(22)

Besides, t can be computed from the last three equations of (21) using (12) with

$$\begin{aligned} {\mathbf{{C}}_2} = \left[ \begin{array}{l} \mathbf{{c}}_2^{{l_1}},\mathbf{{c}}_2^{l_2},\mathbf{{c}}_2^{l_3} \end{array} \right] ^{T},\;{\mathbf{{N}}_2} = \left[ \begin{array}{l} {\mathbf{{n}}^{{l_1}}}, {\mathbf{{n}}^{{l_2}}}, {\mathbf{{n}}^{{l_3}}} \end{array} \right] ^{T} \end{aligned}$$
(23)

4.5 Solve the Rotation Matrix

As mentioned above, in all of the four minimal configurations, \(\mathbf {R}\) can be obtained by solving a quadratic equation system with the form (13). According to Bézout’s theorem [19], this system of four quadrics has at most 16 solutions. However, as (1) only includes degree 2 monomials, \(\mathbf {q}\) and \(-\mathbf {q}\) yield the same \(\mathbf {R}\). Thus, there are at most 8 solutions for \(\mathbf {R}\). In this section, we show how to solve this quadratic equation system. Assume w is not 0. Let us define

$$\begin{aligned} x = aw,\quad y = bw,\quad z = cw \end{aligned}$$
(24)

Divide both sides of \(\mathbf{{Ar}} = \mathbf{{0}}\) in (13) by \({w^2}\). We have

$$\begin{aligned} {\mathbf{{a}}_i} \left[ {{a^2},{b^2},{c^2},ab,ac,bc,a,b,c,1} \right] ^{T} = 0,\;\;i = 1,2,3 \end{aligned}$$
(25)

where \({{\mathbf {a}}_{i}}\) is the i-th row of \(\mathbf {A}\), with its elements reordered to match the monomial order in (25). The solution \([a,b,c]^{T}\) lies at the intersection of three quadrics, which can be found by the E3Q3 algorithm [17].
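
Note that the monomial order in (25) differs from that of r in (6), so moving from \(\mathbf{{Ar}} = \mathbf{{0}}\) to (25) involves a column reordering. The Python sketch below shows one way to do this; `R_TO_ABC`, `reorder_row` and `abc_monomials` are hypothetical names used only for illustration.

```python
# Index in r (Eq. 6) of each monomial of Eq. (25) after dividing by w^2:
# [a^2, b^2, c^2, ab, ac, bc, a, b, c, 1]
#   <- [x^2, y^2, z^2, xy, xz, yz, xw, yw, zw, w^2]
R_TO_ABC = [0, 1, 2, 4, 5, 7, 6, 8, 9, 3]

def reorder_row(a_row):
    """Reorder one row of A so that it multiplies the monomial vector of Eq. (25)."""
    return [a_row[k] for k in R_TO_ABC]

def abc_monomials(a, b, c):
    """Monomial vector of Eq. (25)."""
    return [a*a, b*b, c*c, a*b, a*c, b*c, a, b, c, 1.0]
```

With this reordering, a row of A applied to r and divided by \({w^2}\) equals the reordered row applied to `abc_monomials(x / w, y / w, z / w)`.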

For completeness, we briefly introduce the E3Q3 algorithm. By regarding a as a constant, we get three equations about b and c. Dividing the six monomials of b and c into two parts, i.e., \(\left\{ {{b}^{2}},{{c}^{2}},bc \right\} \) and \(\left\{ b,c,1 \right\} \), we can obtain:

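$$\begin{aligned} \mathbf{{H}}{\left[ {{b^2},{c^2},bc} \right] ^T} = \mathbf{{G}}\left( a \right) {\left[ {b,c,1} \right] ^T} \end{aligned}$$
(26)

where, with \({a_{i,k}}\) denoting the k-th element of \({{\mathbf {a}}_{i}}\) in (25), the i-th row of the constant \(3\,\times \,3\) matrix \(\mathbf {H}\) is \(\left[ {a_{i,2}},{a_{i,3}},{a_{i,6}} \right] \), and the i-th row of the \(3\,\times \,3\) matrix \(\mathbf {G}(a)\) is \(-\left[ {a_{i,4}}a+{a_{i,8}},\;{a_{i,5}}a+{a_{i,9}},\;{a_{i,1}}{a^2}+{a_{i,7}}a+{a_{i,10}} \right] \).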

Assume H is invertible. Multiplying both sides of (26) by \({{\mathbf {H}}^{-1}}\), we get the relationship between \(\left\{ {{b}^{2}},{{c}^{2}},bc \right\} \) and \(\left\{ b,c,1 \right\} \). Using this relationship and the identities \(({{b}^{2}})c=(bc)b\), \((bc)c=({{c}^{2}})b\), and \((bc)(bc)=({{b}^{2}})({{c}^{2}})\), we get a homogeneous linear system whose coefficient matrix \(\mathbf {M}(a)\) has entries that are polynomials in a. From linear algebra, the homogeneous linear system has a non-trivial solution if and only if the determinant of \(\mathbf {M}(a)\) is zero. This results in a degree 8 polynomial in a. We solve this polynomial for a, then back-substitute a into the linear system to get b and c.
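
The first step of this procedure, separating the second order monomials in b and c from the remaining terms, can be sketched in Python as follows. This covers only the split and not the full E3Q3 solver; the function name `split_for_a`, the name `G` for the a-dependent matrix, and its sign convention are assumptions made for this illustration.

```python
def split_for_a(rows):
    """Split the three quadrics of Eq. (25), treating a as a constant.

    Each row holds coefficients of [a^2, b^2, c^2, ab, ac, bc, a, b, c, 1].
    Returns H (3x3 constants) and a function G(a) (3x3) such that
    H [b^2, c^2, bc]^T = G(a) [b, c, 1]^T at every solution (a, b, c).
    """
    H = [[r[1], r[2], r[5]] for r in rows]

    def G(a):
        return [[-(r[3]*a + r[7]),             # coefficient of b
                 -(r[4]*a + r[8]),             # coefficient of c
                 -(r[0]*a*a + r[6]*a + r[9])]  # terms free of b and c
                for r in rows]

    return H, G
```

The first two columns of `G(a)` are linear in a and the last is quadratic, which is what eventually leads to a degree 8 determinant polynomial.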

Given a, b and c, \({{w}^{2}}\) can be obtained from the unit-norm constraint of the quaternion by \( {w^2} = 1/\left( {a^2} + {b^2} + {c^2} + 1\right) . \) Substituting (24) into (1) and using \({{w}^{2}}\), we can easily obtain R. Two assumptions are made in computing R. The first is that H is invertible, and the second is that w is not zero. Singularities occur when these assumptions are not satisfied. We address both singularities in the following two paragraphs.
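
Since every entry of (1) is quadratic in the quaternion elements, R depends on the solution only through \({{w}^{2}}\) and the ratios a, b, c, so the sign of w never needs to be resolved. A minimal Python sketch follows; the function name `rotation_from_abc` is hypothetical.

```python
def rotation_from_abc(a, b, c):
    """R from a = x/w, b = y/w, c = z/w, using w^2 = 1 / (a^2 + b^2 + c^2 + 1)."""
    w2 = 1.0 / (a*a + b*b + c*c + 1.0)
    # Eq. (1) with x = a w, y = b w, z = c w factors as w^2 * R(1, a, b, c).
    return [
        [w2*(1 + a*a - b*b - c*c), w2*(2*a*b - 2*c),         w2*(2*b + 2*a*c)],
        [w2*(2*c + 2*a*b),         w2*(1 - a*a + b*b - c*c), w2*(2*b*c - 2*a)],
        [w2*(2*a*c - 2*b),         w2*(2*a + 2*b*c),         w2*(1 - a*a - b*b + c*c)],
    ]
```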

Robust E3Q3 (RE3Q3). Kukelova et al. [17] find that there are 8 degenerate configurations when H is rank deficient, and they give the solution for each of them. However, this method has difficulty handling the situation where H is close to singular, which significantly degrades the performance of the algorithm, as shown in Fig. 2a.

As we can treat any of a, b and c as a constant, and the other two as unknowns in (25), there actually exist three choices for H. Let \({{\mathbf {H}}_{a}}\), \({{\mathbf {H}}_{b}}\) and \({{\mathbf {H}}_{c}}\) represent the coefficient matrices obtained by choosing a, b and c as the constant, respectively. If the coefficient matrix of the second order monomials in (25) is nondegenerate, it is likely that when \({{\mathbf {H}}_{a}}\) is ill-conditioned, \({{\mathbf {H}}_{b}}\) or \({{\mathbf {H}}_{c}}\) is still well-conditioned. We try to find the one with the best condition. The condition number of a matrix describes how close the matrix is to being singular: the larger the condition number, the closer the matrix is to singular. We calculate the condition numbers of \({{\mathbf {H}}_{a}}\), \({{\mathbf {H}}_{b}}\) and \({{\mathbf {H}}_{c}}\), and choose the one with the minimal condition number to replace \({\mathbf {H}}\) in (26). This only requires interchanging the coefficients of (25), and does not require implementing a different algorithm for each choice. Thus, it is efficient. We call this approach Robust E3Q3 (RE3Q3). Figure 2a shows that RE3Q3 is much more stable than the original E3Q3 algorithm [17] in the degenerate configuration. Besides, in the general situation, RE3Q3 still improves the stability of E3Q3, as shown in Fig. 2b.
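
The selection step can be sketched in Python as follows. The text does not state which matrix norm defines the condition number, so the 1-norm condition number used here is an assumption made to keep the example dependency-free, and the names `H_COLUMNS`, `cond1` and `best_constant_variable` are hypothetical.

```python
# Columns of the second order monomials of the two remaining unknowns, in the
# ordering [a^2, b^2, c^2, ab, ac, bc, a, b, c, 1] of Eq. (25).
H_COLUMNS = {"a": (1, 2, 5),   # unknowns b, c: b^2, c^2, bc
             "b": (0, 2, 4),   # unknowns a, c: a^2, c^2, ac
             "c": (0, 1, 3)}   # unknowns a, b: a^2, b^2, ab

def norm1(M):
    """Matrix 1-norm: the maximum absolute column sum."""
    return max(sum(abs(M[r][c]) for r in range(3)) for c in range(3))

def cond1(M):
    """1-norm condition number of a 3x3 matrix (infinite if singular)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    if det == 0:
        return float("inf")
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return norm1(M) * norm1(adj) / abs(det)   # ||M|| * ||M^{-1}||

def best_constant_variable(rows):
    """Pick which of a, b, c to hold constant: the one whose H is best conditioned."""
    conds = {name: cond1([[r[k] for k in cols] for r in rows])
             for name, cols in H_COLUMNS.items()}
    return min(conds, key=conds.get), conds
```

Once the variable is chosen, relabeling the unknowns only permutes the coefficients of (25), so the same solver code can be reused for all three choices.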

Fig. 2. Compare RE3Q3 with E3Q3 in degenerate situation (a) and in general situation (b). We randomly generate the coefficients of (25) except for the constants. Then we randomly generate a solution, and substitute it into (25) to calculate the constants. For the degenerate cases, we get H in (26) and randomly set the smallest singular value within \(\left( 0,{{10}^{-6}} \right) \). We run the algorithm 50,000 times.

Reference Rotation. When w in \(\mathbf {q}\) is small, according to (24), a, b and c are likely to be greater than 1 in magnitude. Thus, they may amplify the estimation error of w when we compute x, y and z. This effect increases as w gets smaller. The performance of our algorithm degrades if w is very small, as shown in Fig. 3, where \(w\in (0,{{10}^{-6}})\). If we have a reference rotation represented as a quaternion \({{\mathbf {q}}_{ref}}\), which gives a rough estimate of the rotation, we can easily solve this problem. Given \({{\mathbf {q}}_{ref}}\), we exchange w with the element that has the maximum absolute value in \({{\mathbf {q}}_{ref}}\) to get \(\mathbf {{q}'}\). This makes the magnitudes of a, b and c no larger than 1, provided the largest element of \({{\mathbf {q}}_{ref}}\) is also the largest element of \(\mathbf {q}\). Exchanging the elements of \(\mathbf {q}\) is equivalent to permuting the coefficients in (25), and the computational cost is negligible. Once \(\mathbf {{q}'}\) is calculated, we can get the original \(\mathbf {{q}}\) by applying the exchange again.
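
The exchange and the corresponding coefficient permutation can be sketched in Python as follows. All names (`pivot_index`, `swap_with_w`, `permute_columns`) are hypothetical, and the permutation is expressed on a row of A in the monomial order of (6); this is an illustration of the idea rather than the original implementation.

```python
NAMES = ["w", "x", "y", "z"]
# Monomial order of Eq. (6), written as pairs of quaternion element names.
R_PAIRS = [("x", "x"), ("y", "y"), ("z", "z"), ("w", "w"), ("x", "y"),
           ("x", "z"), ("x", "w"), ("y", "z"), ("y", "w"), ("z", "w")]

def monomials(q):
    """r of Eq. (6) for q = [w, x, y, z]."""
    val = dict(zip(NAMES, q))
    return [val[u] * val[v] for u, v in R_PAIRS]

def pivot_index(q_ref):
    """Index (0=w, 1=x, 2=y, 3=z) of the largest-magnitude element of q_ref."""
    return max(range(4), key=lambda k: abs(q_ref[k]))

def swap_with_w(q, k):
    """Exchange w with element k; applying it twice restores q."""
    q = list(q)
    q[0], q[k] = q[k], q[0]
    return q

def permute_columns(row, k):
    """Reorder a row of A so that row . r(q) == new_row . r(swap_with_w(q, k))."""
    rename = {NAMES[0]: NAMES[k], NAMES[k]: NAMES[0]}
    index = {frozenset(p): m for m, p in enumerate(R_PAIRS)}
    new_row = [0.0] * 10
    for m, (u, v) in enumerate(R_PAIRS):
        target = frozenset((rename.get(u, u), rename.get(v, v)))
        new_row[index[target]] = row[m]
    return new_row
```

Only the index of the largest element of the reference is used, which is why a rough reference is sufficient.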

Fig. 3. Rotation error for P3L and P3P when \(w\in (0,{{10}^{-6}})\), which is degenerate for the baseline algorithm. We generate 3 extra points to find the most accurate rotation \({{\mathbf {R}}_{self}}\). Then we use \({{\mathbf {R}}_{self}}\) as a reference rotation to calculate the rotation again (labeled as Self Reference R). We run the algorithm 50,000 times. This method gives almost the same result as using the ground truth as the reference matrix. It is clear that our algorithm can provide valid reference rotation even in the degenerate situation.

In practice, a \({{\mathbf {q}}_{ref}}\) is generally available. For example, in a SLAM system, the camera pose is estimated sequentially. Therefore, the last rotation can be used as the reference rotation. In addition, the minimal solution is generally used in the RANSAC algorithm [5], where the current best rotation estimate can serve as a reference rotation. One question is whether our algorithm can generate a valid reference matrix in the degenerate configuration. To verify this, we run our P3L and P3P algorithms on 50,000 randomly generated degenerate configurations where \(w\in (0,{{10}^{-6}})\). Three additional points are generated to select the most accurate rotation, denoted as \({{\mathbf {R}}_{self}}\). \({{\mathbf {R}}_{self}}\) is then used as the reference rotation. We also use the ground truth rotation \({{\mathbf {R}}_{gt}}\) as the reference rotation. The experimental results in Fig. 3 show that \({{\mathbf {R}}_{self}}\) and \({{\mathbf {R}}_{gt}}\) give almost the same results. As we only use the relative order of the magnitudes of the elements of \({{\mathbf {q}}_{ref}}\), \({{\mathbf {q}}_{ref}}\) can be rather rough. This makes our algorithm stable even in the degenerate case.

4.6 Algorithm Summary

As mentioned above, the rotation matrix R of all the four minimal configurations can be obtained by solving a quadric equation system having the form of (13). Given R, the translation t can be calculated from a linear system (12). One 2D/3D point correspondence gives three equations in (3). We use one of them for R estimation. Given R, we use the remaining 2 equations for the estimation of t. Our algorithm is summarized in Algorithm 1.

Algorithm 1.

5 Simulation Results

As in the previous works [2, 3, 6, 9, 14, 15, 28], we compare our algorithm with the state-of-the-art algorithms by simulations. Simulation allows us to evaluate the algorithms on a large number of configurations. As the same input generates the same result, the simulation results reflect the performance of the different algorithms in real applications.

Given the estimate \((\mathbf {\hat{R}},\mathbf {\hat{t}} )\) and the ground truth \(\left( {{\mathbf {R}}_{gt}},{{\mathbf {t}}_{gt}} \right) \), the estimation error of \(\mathbf {\hat{R}}\) is measured by the absolute angle of the axis-angle representation of \(\mathbf {\hat{R}R}_{gt}^{-1}\) as in [17], and the estimation error of \(\mathbf {\hat{t}}\) is measured by \({{{\left\| \mathbf {\hat{t}}-{{\mathbf {t}}_{gt}} \right\| }_{2}}}/{{{\left\| {{\mathbf {t}}_{gt}} \right\| }_{2}}}\;\). We randomly generate the rotation matrix from Euler angles. The position of the camera is within a cube \({{\left[ -5,5 \right] }^{3}}\). The camera has resolution \(640\,\times \,480\) and focal length 800. A line is generated by two random points. The depth of the 3D point is within \(\left[ 2,8 \right] \). We also study the behavior of our algorithm with or without a reference rotation. As shown in Fig. 3, the rotation matrix calculated by our algorithm is as valid a reference as the ground truth. Thus, we use the ground truth rotation matrix as the reference. Our algorithm without a reference rotation is labeled as the baseline. The following results are obtained from 50,000 trials. Table 1 lists the mean, standard deviation, median, and maximal estimation errors. It shows that our baseline algorithm is comparable to the state-of-the-art P3P algorithm [14], and outperforms the previous algorithms of the other three problems. Besides, a reference rotation can further increase the performance.
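
The two error measures can be computed as in the following Python sketch. The function names are hypothetical; the rotation angle is recovered from the trace of \(\mathbf {\hat{R}R}_{gt}^{-1}\), using the fact that the inverse of a rotation matrix is its transpose, and the clamp before `acos` is a safeguard against round-off added for this example.

```python
import math

def rotation_error(R_est, R_gt):
    """Angle (rad) of the axis-angle form of R_est R_gt^{-1}, with R_gt^{-1} = R_gt^T."""
    # trace(R_est R_gt^T) equals the sum of elementwise products of the two matrices.
    trace = sum(R_est[i][k] * R_gt[i][k] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.acos(cos_angle)

def translation_error(t_est, t_gt):
    """Relative error ||t_est - t_gt||_2 / ||t_gt||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))
    return num / math.sqrt(sum(b * b for b in t_gt))
```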

5.1 Results of P3P Problem

We compare our algorithm with Ke’s algorithm [14], Kneip’s algorithm [15] and Gao’s algorithm [6]. For fairness, we do not apply root polishing for Ke’s algorithm. As all of these algorithms have publicly available C++ implementations, we also implement our algorithm in C++. We use Hartley’s Sturm sequences [10] implementation to solve the eighth degree polynomial equation. The relative error is set to \({10^{- 14}}\) as in [17].

The histograms of rotation and translation errors are shown in Fig. 4. Table 1 lists the quantitative results of the different algorithms. It is clear that the reference rotation can increase the stability of our algorithm. Ke’s algorithm is better than our algorithm in rotation. Our algorithm gives better results in translation, as we use more equations for the translation estimation and solve for it in the least-squares manner. Our algorithm outperforms the other P3P algorithms. This is because both Ke’s algorithm and our algorithm avoid unnecessary intermediate transformations, and therefore reduce the numerical error accumulation.

Fig. 4. Histograms of rotation \(\mathbf {R}\) (left) and translation \(\mathbf {t}\) (right) errors for P3P algorithms.

5.2 Results of P2P1L and P1P2L Problem

We compare our algorithm with Ramalingam’s algorithm [28] for the P2P1L and P1P2L problems. The error histograms are shown in Figs. 5 and 6. Table 1 gives the statistics of the estimation error. Our algorithm clearly outperforms Ramalingam’s algorithm in terms of accuracy. Ramalingam’s algorithm requires two intermediate transformations. Numerical errors accumulated in these transformations potentially decrease the accuracy. Besides, their transformations involve the tangent function, which may cause numerical issues.

Fig. 5. Histograms of \(\mathbf {R}\) (left) and \(\mathbf {t}\) (right) errors for different P2P1L algorithms.

Fig. 6. Histograms of \(\mathbf {R}\) (left) and \(\mathbf {t}\) (right) errors for P1P2L algorithms.

5.3 Results of P3L Problem

As mentioned in Sect. 2, the P3L algorithms [2, 3, 33] are similar. We compare our algorithm with the latest P3L algorithm [33]. Figure 7 shows the results of the different algorithms. In the region of very small rotation error (the first two bins in Fig. 7a), Xu’s algorithm has a higher probability than our algorithm. However, as shown by the sub-windows in Fig. 7, Xu’s algorithm has a very long tail. Besides, the statistics listed in Table 1 also verify that our algorithm is more stable than Xu’s algorithm. The maximal rotation and translation errors of Xu’s algorithm are approximately 0.1 rad and 0.8, respectively. Our maximal rotation and translation errors are much smaller than theirs.

Fig. 7. Histograms of \(\mathbf {R}\) (left) and \(\mathbf {t}\) (right) errors for P3L algorithms.

Table 1. Mean, standard deviation (STD), median, and max of the pose errors. The best result is highlighted in boldface.
Table 2. Computational time comparison. The RE3Q3, E3Q3, and P3P algorithms are tested in C++. The others are tested in Matlab.

5.4 Computational Time

Our algorithm is implemented in C++ using the Eigen linear algebra library [8] for the P3P problem. We use OpenCV’s [24] implementation of Ke’s algorithm [14]. For the other minimal problems, we compare the time using the Matlab implementations. Here, we only list our running time with a reference rotation, as the running time of our baseline algorithm is very similar to that of our algorithm with a reference rotation. As all the four minimal problems are solved in a uniform framework, the computational time of the other three cases in C++ should be similar to the time of the P3P problem. In practice, we can use the reference rotation to reduce the computational time of the translation. For fairness, we compute all the eight solutions of the translation here. All the results are obtained from 50,000 trials on a laptop with a 2.9 GHz Intel Core i7 CPU.

Table 2 gives the results. Compared to E3Q3, RE3Q3 slightly increases the running time. It is not surprising that our algorithm is slower than Ke’s algorithm [14], as we need to solve an eighth degree equation for the rotation and eight linear equation systems for the translation, whereas they only need to solve a quartic equation for the rotation and four linear systems for the translation. For the P3L problem, Xu’s algorithm [33] is slightly faster than ours. This is because they use a transformation to directly eliminate one of the rotation variables. Ramalingam’s algorithm [28] is slower than our algorithm. This is because their algorithm needs to compute the SVD of a 6 \(\times \) 8 matrix for the P2P1L problem, and of a 6 \(\times \) 9 matrix for the P1P2L problem. The SVD is time-consuming. Although our algorithm is slower than Ke’s algorithm, it is still efficient enough for real-time applications. The minimal solution is generally used in the RANSAC algorithm [5]. Suppose the outlier ratio is 0.5. To ensure with a given probability, such as 0.99, that at least one of the random minimal samples contains no outliers, we need at least 35 trials [11]. This will be finished within 0.8 ms.
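
The figure of 35 trials follows from the standard RANSAC sample-count formula \(N = \log (1-p)/\log (1-{(1-\epsilon )^s})\), where p is the required confidence, \(\epsilon \) is the outlier ratio and s is the minimal sample size (here, three correspondences). A small Python check, with a hypothetical function name:

```python
import math

def ransac_trials(outlier_ratio, sample_size, confidence):
    """Smallest N such that 1 - (1 - (1 - e)^s)^N >= confidence."""
    p_clean = (1.0 - outlier_ratio) ** sample_size   # probability of an all-inlier sample
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))

print(ransac_trials(0.5, 3, 0.99))  # 35
```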

6 Conclusion

In this paper, we propose an algebraic algorithm for the four minimal configurations of 2D/3D point and line correspondences. This is useful for many robotics and computer vision applications. Our algorithm directly uses the collinearity and coplanarity constraints to construct the equation system, and does not need any intermediate transformation. This avoids numerical error accumulation. We present the RE3Q3 algorithm, which significantly increases the stability of the E3Q3 algorithm [17]. The simulation results show that our baseline algorithm is comparable to the state-of-the-art P3P algorithm [14], and outperforms the state-of-the-art algorithms of the other three minimal cases. A reference rotation, which is generally available in real applications such as SLAM or VO systems, can further increase the numerical stability of our algorithm. Additionally, our algorithm is efficient enough for real-time applications.