1 Introduction

Estimation of a single homography matrix from image measurements is an important step in 3D reconstruction, mosaicing, camera calibration, metric rectification, and other tasks [24]. For some applications, like non-rigid motion detection [25, 46] or enhanced image warping [20], a whole array of homography matrices is required. The matrices have to be intrinsically interconnected to satisfy consistency constraints representing the rigidity of the motion and the scene. Moreover, the matrices have to be collectively multi-homogeneous—rescaling any individual matrix should not affect the projective information contained in the whole matrix set. A key problem in estimating multiple homography matrices is to enforce the underlying consistency constraints while accounting for the arbitrariness of the individual scales of the matrices. The need to cope with scale indeterminacy has not been particularly emphasised previously, but our study will make apparent that it is an important aspect of the problem.

As a rule, the consistency constraints are available only in implicit form. The conventional approach to cope with such constraints is to evolve a derivative family of explicit constraints. The new constraints are typically more relaxed than the original ones. Adhering to this methodology, Shashua and Avidan [40] have found that homography matrices induced by four or more planes in the 3D scene between two views span a four-dimensional linear subspace. Chen and Suter [4] have derived a set of strengthened constraints for the case of three or more homographies in two views. Zelnik-Manor and Irani [46] have shown that another rank-four constraint applies to a set of so-called relative homographies generated by two planes between four or more views. These latter authors have also derived constraints for larger sets of homographies and views.

Once isolated, the explicit constraints can be put to use in a procedure whereby first individual homography matrices are estimated from image data, and next these matrices are upgraded to matrices satisfying the constraints. Following this pattern, Shashua and Avidan as well as Zelnik-Manor and Irani used low-rank approximation under the Frobenius norm to enforce the rank-four constraint. Chen and Suter also enforced their set of constraints via low-rank approximation, but employed the Mahalanobis norm with the covariances of the input homographies. All these estimation procedures involve input matrices coming with specific scale factors. The underlying error measures are such that a change of scale factors may a priori result in a different set of estimates. Furthermore, the output matrices satisfy only the derivative constraints, so their perfect consistency is not guaranteed. Another limitation of the existing methods is that each requires a certain minimum number of input homography matrices and none can work with only two such matrices.

This paper presents an alternative approach to estimating interdependent homography matrices which ensures that all implicit constraints are enforced and that the final estimates are unaffected by any specific choice of individual scale factors. A statistically motivated cost function is proposed for upgrading, via optimisation, the input set of homography matrices to a set satisfying all possible constraints. The function is scale change insensitive. To achieve high estimation accuracy, it incorporates the covariances of the input matrices, and this yields, upon optimisation, a statistically sound approximated maximum likelihood fit to the data. The utility of the function is demonstrated in a specific application, namely the problem of estimating a set of homography matrices induced by multiple planes in a 3D scene between two views. A variant of the Levenberg–Marquardt algorithm for that problem is developed that is specifically tailored to a parametrisation of the homography matrices via natural underlying latent variables. The use of the parametrisation ensures that all consistency constraints are satisfied. A notable contribution of this work is the development of a procedure for determining initial values of the latent variables, so that the optimisation process seeded with these values converges to a useful local minimum. Importantly, the procedure works already for two input homography matrices, enabling the overall estimation scheme to operate on two or more input matrices. The initialisation procedure is in fact of wider interest, as it is suitable for initialising other methods that operate on the same latent variables, including the canonical bundle adjustment technique for maximum likelihood estimation.
The results contained in the experimental section of the paper validate the approach and show that the proposed estimation method outperforms existing schemes, achieving accuracy on par with the ‘gold-standard’ bundle adjustment technique. While the newly introduced method, like all other aforementioned methods, is not truly robust to outliers, it can—as it turns out—be made robust via incorporation into a bigger robust fitting scheme. A particular scheme of this kind forms a final contribution of the paper, and one of practical utility. It is a modification of the well-known random sample consensus (RANSAC) technique specialised to facilitate robust fitting of fully consistent homographies to data with outliers. The proposed estimation method enters the modified RANSAC as a computationally efficient tool for generating fully consistent homographies.

Earlier results informing this work appeared in [15]. The present findings are part of a broader, ongoing study, one of whose recent contributions is [42].

2 Multi-projective parameter estimation and latent variables

We first formulate the problem of estimating a set of interdependent homographies as a problem in multi-projective parameter estimation [14]. A general multi-projective parameter estimation problem involves a collection \(\mathbf {X}_1, \dots , \mathbf {X}_I\) of \(k \times l\) matrices envisaged as data points, and a collection \({\varvec{\mathsf {\Theta }}}_1, \dots , {\varvec{\mathsf {\Theta }}}_I\) of \(k \times l\) matrices treated as parameters. Each \(\mathbf {X}_i\) is assumed to be known only up to an individual multiplicative non-zero factor. The \({\varvec{\mathsf {\Theta }}}_i\)’s are subject to constraints and are meant to represent improved versions of the \(\mathbf {X}_i\)’s. With the \(k \times Il\) matrix \({\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I]\) denoting the composite datum and the \(k \times Il\) matrix \({\varvec{\mathsf {\Theta }}} = [{\varvec{\mathsf {\Theta }}}_1, \dots , {\varvec{\mathsf {\Theta }}}_I]\) denoting the composite parameter, the problem under consideration is to fit \({\varvec{\mathsf {\Theta }}}\) to \({\varvec{\mathsf {X}}}\) so that the constraints on \({\varvec{\mathsf {\Theta }}}\) are met. Exemplifying this general problem is the following specific problem of interest:

Problem 1

Fit a set of \(3 \times 3\) matrices, representing planar homographies engendered by various planes in a 3D scene under common projections on two images, to a given set of \(3 \times 3\) matrices.

To see how the multi-projective framework applies here, consider a pair of fixed cameras with camera matrices \(\mathbf {K}_{1}\mathbf {R}_{1}[\mathbf {I}_{3}, -\mathbf {t}_{1}]\) and \(\mathbf {K}_{2}\mathbf {R}_{2}[\mathbf {I}_{3}, -\mathbf {t}_{2} ]\). Here, the length-3 translation vector \(\mathbf {t}_{n}\) and the \(3 \times 3\) rotation matrix \(\mathbf {R}_{n}\) represent the Euclidean transformation between the \(n\)th (\(n = 1,2\)) camera and the world coordinate system, \(\mathbf {K}_{n}\) is a \(3 \times 3\) upper triangular calibration matrix encoding the internal parameters of the \(n\)th camera, and, for each \(m = 1, 2, \dots \), \(\mathbf {I}_m\) denotes the \(m \times m\) identity matrix. Suppose, moreover, that a set of \(I\) planes in a 3D scene have been selected. Given \(i = 1, \dots , I\), let the \(i\)th plane from the collection have a unit outward normal \(\mathbf {n}_i\) and be situated at a distance \(d_i\) from the origin of the world coordinate system. Then, for each \(i = 1, \dots , I\), the \(i\)th plane gives rise to a planar homography between the first and second views described by the \(3 \times 3\) matrix

$$\begin{aligned} \mathbf {H}_i = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }, \end{aligned}$$
(1)

where

$$\begin{aligned} \mathbf {A}&=\mathbf {K}_{2}\mathbf {R}_{2}\mathbf {R}_{1}^{-1}\mathbf {K}_{1}^{-1}, \quad w_{i} = \mathbf {n}_{i}^{\top }\mathbf {t}_{1} - d_{i}, \nonumber \\ \mathbf {b}&=\mathbf {K}_{2}\mathbf {R}_{2}(\mathbf {t}_{1} - \mathbf {t}_{2}), \quad \mathbf {v}_{i} =\mathbf {K}^{-\top }_{1} \mathbf {R}_{1} \mathbf {n}_{i}. \end{aligned}$$
(2)

We note that in the case of calibrated cameras when one may assume that \(\mathbf {K}_1 = \mathbf {K}_2 = \mathbf {I}_3\), \(\mathbf {t}_1 = \mathbf {0}\), \(\mathbf {R}_1 = \mathbf {I}_3\), \(\mathbf {R}_2 = \mathbf {R}\), system (2) reduces to

$$\begin{aligned} \mathbf {A}&=\mathbf {R}, \quad w_{i} = - d_{i}, \nonumber \\ \mathbf {b}&= \mathbf {t}, \quad \mathbf {v}_{i} = \mathbf {n}_{i}, \end{aligned}$$
(3)

with \(\mathbf {t} = - \mathbf {R}\mathbf {t}_2\), and equality (1) becomes the familiar direct nRt representation

$$\begin{aligned} \mathbf {H}_i = -d_i \mathbf {R} + \mathbf {t} \mathbf {n}_i^\top \end{aligned}$$

(cf. [2, 33]). We stress that all of our subsequent analysis concerns the general case of possibly uncalibrated cameras, with \(\mathbf {A}\), \(\mathbf {b}\), \(w_i\)’s and \(\mathbf {v}_i\)’s to be interpreted according to (2) rather than (3).
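As a quick numerical sanity check of (1) and (2), the following sketch (Python with NumPy) verifies that \(\mathbf {H}_i\) maps view-1 projections of plane points to their view-2 counterparts. All camera parameters are illustrative; for simplicity the world origin is placed at the first camera centre (\(\mathbf {t}_1 = \mathbf {0}\)), and plane points are taken to satisfy \(\mathbf {n}^\top \mathbf {X} + d = 0\), the sign convention implied by the calibrated reduction (3).

```python
import numpy as np

# Illustrative camera set-up; none of these values come from the paper.
K1 = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
K2 = np.array([[750., 0., 300.], [0., 750., 260.], [0., 0., 1.]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1., 0., 0.], [0., c, -s], [0., s, c]])

R1, R2 = rot_x(0.05), rot_x(0.12)
t1, t2 = np.zeros(3), np.array([0.4, -0.1, 0.2])

n, d = np.array([0., 0., 1.]), 5.0       # plane: n.X + d = 0

# Quantities (2).
A = K2 @ R2 @ np.linalg.inv(R1) @ np.linalg.inv(K1)
w = n @ t1 - d
b = K2 @ R2 @ (t1 - t2)
v = np.linalg.inv(K1).T @ R1 @ n

H = w * A + np.outer(b, v)               # homography (1)

# A point X on the plane, projected into both views (homogeneous).
X = np.array([1.0, -2.0, -5.0])          # satisfies n.X = -d
m1 = K1 @ R1 @ (X - t1)
m2 = K2 @ R2 @ (X - t2)

# H maps the view-1 projection to the view-2 projection up to scale;
# with this set-up the proportionality factor is exactly w.
residual = np.linalg.norm(H @ m1 - w * m2)
```

The check is exact up to floating-point error, since \(\mathbf {H}\mathbf {m}_1 = w\,\mathbf {m}_2\) holds identically for plane points under the stated conventions.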

Let \( {\varvec{\mathsf {H}}} = [\mathbf {H}_1, \dots , \mathbf {H}_I] \) be the composite of all the homography matrices in question. Then, with \(\mathbf {a} = {{\mathrm{vec}}}(\mathbf {A})\), where \({{\mathrm{vec}}}\) denotes column-wise vectorisation [32], \( \varvec{\eta }= [\mathbf {a}^\top , \mathbf {b}^\top , \mathbf {v}_1^\top , \dots , \mathbf {v}_I^\top , w_1, \dots , w_I]^\top , \) and

$$\begin{aligned} \varvec{\Pi }_i(\varvec{\eta }) = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top } \quad (i = 1, \dots , I), \end{aligned}$$
(4)

\({\varvec{\mathsf {H}}}\) can be represented as

$$\begin{aligned} {\varvec{\mathsf {H}}} = {\varvec{\mathsf {\Pi }}}(\varvec{\eta }) = [\varvec{\Pi }_1(\varvec{\eta }), \dots , \varvec{\Pi }_I(\varvec{\eta })]. \end{aligned}$$
(5)

The components of \(\varvec{\eta }\) constitute latent variables that link all the matrices \(\varvec{\Pi }_i(\varvec{\eta })\) together and provide a natural parametrisation of the set of all \({\varvec{\mathsf {H}}}\)’s. Since \(\varvec{\eta }\) has a total of \(4I + 12\) entries, the set of all matrices of the form \({\varvec{\mathsf {\Pi }}}(\varvec{\eta })\) has dimension no greater than \(4I + 12\). A more refined argument shows that the set of all \({\varvec{\mathsf {H}}}\)’s has in fact dimension equal to \(4I + 7\) [12, 13]. Since \(4I + 7 < 9I\) whenever \(I \ge 2\), it follows that \({\varvec{\mathsf {H}}}\) resides in a proper subset of all \(3 \times 3I\) matrices for \(I \ge 2\). Thus, the requirement that \({\varvec{\mathsf {H}}}\) take the form \({\varvec{\mathsf {\Pi }}}(\varvec{\eta })\) as per (5) whenever \(I \ge 2\) can be seen as an implicit constraint on \({\varvec{\mathsf {H}}}\), with the consequence that the \(\mathbf {H}_i\)’s are all interdependent. Suppose that an estimate \( {\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I] \) of \({\varvec{\mathsf {H}}}\) has been generated in some way. For example, for each \(i\), \(\mathbf {X}_i\) might be an estimate of \(\mathbf {H}_i\) individually obtained from image data. The estimation problem at hand is to upgrade \({\varvec{\mathsf {X}}}\) to \({\varvec{\mathsf {\Theta }}}\) so that \({\varvec{\mathsf {\Theta }}} = {\varvec{\mathsf {\Pi }}}(\varvec{\eta })\) holds for some \(\varvec{\eta }\) and \({\varvec{\mathsf {\Theta }}}\) is close to \({\varvec{\mathsf {X}}}\) in a meaningful sense. The essence here is to find a criterion and effective means for selecting an appropriate \(\varvec{\eta }\).
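The parametrisation just described is easy to exercise numerically. The sketch below (NumPy; all names and values are illustrative) assembles a random \(\varvec{\eta }\), checks that it has \(4I + 12\) entries, and reconstitutes the matrices \(w_i \mathbf {A} + \mathbf {b}\mathbf {v}_i^\top \) from it.

```python
import numpy as np

rng = np.random.default_rng(1)
I = 4                                     # number of planes (illustrative)

A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
vs = [rng.standard_normal(3) for _ in range(I)]
ws = rng.standard_normal(I)

# eta = [a^T, b^T, v_1^T, ..., v_I^T, w_1, ..., w_I]^T, a = vec(A).
eta = np.concatenate([A.flatten(order='F'), b, *vs, ws])
assert eta.size == 4 * I + 12             # 9 + 3 + 3I + I entries

def Pi(eta, i, I):
    """Recover the i-th homography matrix w_i A + b v_i^T from eta."""
    A = eta[:9].reshape(3, 3, order='F')  # undo column-wise vec
    b = eta[9:12]
    v = eta[12 + 3 * i: 15 + 3 * i]
    w = eta[12 + 3 * I + i]
    return w * A + np.outer(b, v)

# Composite parameter: the 3 x 3I matrix [Pi_1(eta), ..., Pi_I(eta)].
H_comp = np.hstack([Pi(eta, i, I) for i in range(I)])
assert H_comp.shape == (3, 3 * I)
assert np.allclose(Pi(eta, 0, I), ws[0] * A + np.outer(b, vs[0]))
```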

3 Approximate maximum likelihood cost function and scale invariance

The general problem of fitting \({\varvec{\mathsf {\Theta }}}\) to \({\varvec{\mathsf {X}}}\) with constraints imposed on \({\varvec{\mathsf {\Theta }}}\) is best considered as an optimisation problem. Since the input matrices are known only up to individual scales, the output matrices should also be determined only to within individual scales. This can be achieved through the use of multi-homogeneous cost functions. A function \(J\) is multi-homogeneous if

$$\begin{aligned} J(\times \varvec{\lambda }{\varvec{\mathsf {\Theta }}}) = J({\varvec{\mathsf {\Theta }}}) \end{aligned}$$

for each length-\(I\) vector \(\varvec{\lambda }= [\lambda _1, \dots , \lambda _I]^\top \) with non-zero entries, where \(\times \varvec{\lambda }{\varvec{\mathsf {\Theta }}} = [\lambda _1 {\varvec{\mathsf {\Theta }}}_1, \dots , \lambda _I {\varvec{\mathsf {\Theta }}}_I]\). It is clear that if a function is multi-homogeneous, then it is minimised not only at a single \({\varvec{\mathsf {\Theta }}}\), but also at all composite multiples \(\times \varvec{\lambda }{\varvec{\mathsf {\Theta }}}\).

To describe a multi-homogeneous cost function relevant to our problem, for each \(i = 1, \dots , I\), let

$$\begin{aligned} \varvec{\theta }_i = {{\mathrm{vec}}}({\varvec{\mathsf {\Theta }}}_i), \quad \mathbf {x}_i = {{\mathrm{vec}}}(\mathbf {X}_i), \end{aligned}$$

with each vector having length \(kl\). Referring to the \(\mathbf {X}_i\)’s via their vectorisations, suppose that associated with each \(\mathbf {x}_i\) is a \(kl \times kl\) covariance matrix \(\varvec{\Lambda }_{\mathbf {x}_i}\). Our cost function will incorporate the \(\varvec{\Lambda }_{\mathbf {x}_i}\)’s, assuming that they all take a specific form that we elaborate on next.

If the parameters of a model are represented by a vector to within a scale factor, then the covariance matrix of any particular estimate of the parameter vector is not uniquely determined. To see why, recall first that covariances are averages of squared perturbations of specific instantiations of a model. If the model is over-parametrised, involving some redundant parameters like an indeterminate scale, then the parameter vector is not identified by the model and perturbations in the model space do not translate unequivocally into perturbations in the parameter space. A way of dealing with this problem is to restrict the parameter space—that is, to identify the model—by imposing equality constraints. Such an imposition is known as gauge fixing, with the term “gauge” referring to any particular set of constraints. The covariances evolved with the aid of gauge-fixing rules are gauge dependent—they can look very different for different gauges [27, 43]. In what follows, we eliminate the scale indeterminacy by imposing the normalisation constraint \(\Vert \mathbf {x} \Vert = 1\) (see Fig. 1). Under this condition, the covariance matrix of an estimate \(\mathbf {x}\) is such that \( \varvec{\Lambda }_{\mathbf{x}} = \varvec{\Lambda }_{\pm \Vert \mathbf {x} \Vert ^{-1}\mathbf {x}} \) and, moreover, \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{} = \mathbf {P}_{\mathbf {x}}^\perp \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{0} \mathbf {P}_{\mathbf {x}}^\perp , \) where \(\mathbf {P}_{\mathbf {x}}^\perp \) is the \(kl \times kl\) symmetric projection matrix given by \( \mathbf {P}_{\mathbf {x}}^\perp = \mathbf {I}_{kl} - \Vert \mathbf {x} \Vert ^{-2} \mathbf {x} \mathbf {x}^{\top } \) and \(\varvec{\Lambda }^{0}_{\mathbf{x}}\) is a \(kl \times kl\) symmetric matrix that we shall refer to as a pre-covariance matrix. 
An argument leading to the above assertion, together with explicit expressions for \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{0}\) in two specific cases, can be found in Appendices A and B; see also [3, 28]. As \( \mathbf {P}_{\mathbf {x}}^\perp \mathbf {x} = \mathbf {0} \) and \( \mathbf {x}^{\top } \mathbf {P}_{\mathbf {x}}^\perp = \mathbf {0}^{\top }, \) the matrix \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\) satisfies \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{} \mathbf {x} = \mathbf {0} \quad \text {and} \quad \mathbf {x}^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{} = \mathbf {0}^{\top }, \) and, in particular, is singular. In line with this, the companion information matrix associated with \(\mathbf {x}\), given by the Moore–Penrose pseudo-inverse \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{+}\) of \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\), also satisfies \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{+} \mathbf {x} = \mathbf {0} \quad \text {and} \quad \mathbf {x}^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{+} = \mathbf {0}^{\top }, \) and is singular.
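The two annihilation identities are easy to confirm numerically. In the sketch below (NumPy, with a random symmetric matrix standing in for the pre-covariance \(\varvec{\Lambda }^{0}_{\mathbf {x}}\)), both the gauge-fixed covariance and its pseudo-inverse annihilate \(\mathbf {x}\).

```python
import numpy as np

rng = np.random.default_rng(2)
kl = 9

x = rng.standard_normal(kl)
x /= np.linalg.norm(x)                    # gauge: ||x|| = 1

M = rng.standard_normal((kl, kl))
Lambda0 = M @ M.T                         # symmetric stand-in pre-covariance

# P_x_perp = I - x x^T (using ||x|| = 1), the orthogonal projection
# onto the tangent space of the unit sphere at x.
P = np.eye(kl) - np.outer(x, x)
Lam = P @ Lambda0 @ P                     # gauge-fixed covariance

assert np.allclose(Lam @ x, 0)            # Lambda_x x = 0
Lam_pinv = np.linalg.pinv(Lam)            # companion information matrix
assert np.allclose(Lam_pinv @ x, 0, atol=1e-6)   # Lambda_x^+ x = 0
```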

Fig. 1

Resolving scale and sign indeterminacy in the derivation of homography covariances by imposing the normalisation constraint \(\Vert \mathbf {x} \Vert = 1\). The antipodal points \(\mathbf {x}\) and \(\mathbf {-x}\) encode one and the same homography. The true perturbations \(\Delta \mathbf {x}\) and \(-\Delta \mathbf {x}\) of the homography are approximated to the first order by their respective images \(\Delta \mathbf {x}'\) and \(-\Delta \mathbf {x}'\) via the orthogonal projections onto the tangent space to the unit sphere at \(\mathbf {x}\) and \(-\mathbf {x}\). Information about the spread of varying \(\Delta \mathbf {x}\) is carried by the restriction of the covariance matrix \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\) to the tangent space at \(\mathbf {x}\). The restriction of \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\) to the orthogonal complement of the tangent space at \(\mathbf {x}\), which is the space spanned by \(\mathbf {x}\), vanishes and carries no information about the spread of the \(\Delta \mathbf {x}\)’s. Likewise for \(\varvec{\Lambda }_{-\mathbf{x}}\). Since the spread of the \(\Delta \mathbf {x}\)’s is the same as the spread of the \(-\Delta \mathbf {x}\)’s, it follows that \(\varvec{\Lambda }_\mathbf{x} = \varvec{\Lambda }_{-\mathbf{x}}\)

We take for our approximate maximum likelihood (AML) cost function the squared Mahalanobis distance between any aggregate \(\{\epsilon _i \Vert \varvec{\theta }_i \Vert ^{-1} \varvec{\theta }_i\}_{i=1}^I\), \(\epsilon _i = \pm 1\), of normalised, arbitrarily signed variants of the \(\varvec{\theta }_i\)’s and any aggregate \(\{\epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i\}_{i=1}^I\), \(\epsilon _i^{\prime } = \pm 1\), of similar variants of the \(\mathbf {x}_i\)’s, with the matrices \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+}\) serving as weights:

$$\begin{aligned} J_{\mathrm {AML}}({\varvec{\mathsf {\Theta }}})= & {} \sum _{i=1}^I \left( \epsilon _i \Vert \varvec{\theta }_i \Vert ^{-1} \varvec{\theta }_i - \epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i\right) ^\top \\&\times \varvec{\mathbf {\Lambda }}_{\epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i}^{+} \left( \epsilon _i \Vert \varvec{\theta }_i \Vert ^{-1} \varvec{\theta }_i - \epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i\right) . \end{aligned}$$

On account of \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \mathbf {x}_i = \mathbf {0}, \) \( \mathbf {x}_i^\top \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} = \mathbf {0}^\top , \) and \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{} = \varvec{\mathbf {\Lambda }}_{\pm \Vert \mathbf {x}_i \Vert ^{-1}\mathbf {x}_i}^{}, \) the expression for the function does not depend on any particular choice of the signs \(\epsilon _i\) and \(\epsilon _i^{\prime }\) and reduces to

$$\begin{aligned} J_{\mathrm {AML}}({\varvec{\mathsf {\Theta }}}) = \sum _{i=1}^I \Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \varvec{\theta }_i. \end{aligned}$$

Note that, for each \(i = 1, \dots , I\), multiplying \(\varvec{\theta }_i\) by a non-zero scalar \(\lambda _i\) results in \(\varvec{\theta }_i^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \varvec{\theta }_i\) being multiplied by \(\lambda _i^2\) and \(\Vert \varvec{\theta }_i \Vert ^{-2}\) being multiplied by \(\lambda _i^{-2}\), so that \(\Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \varvec{\theta }_i\) remains intact. As a result, the AML function is multi-homogeneous.
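Multi-homogeneity of \(J_{\mathrm {AML}}\) can likewise be checked directly. The sketch below (NumPy; the data vectors and pre-covariances are synthetic stand-ins) evaluates the cost before and after rescaling each \(\varvec{\theta }_i\) by an arbitrary non-zero factor.

```python
import numpy as np

rng = np.random.default_rng(3)
I, kl = 3, 9

# Build weights Lambda_xi^+ from random unit data vectors, following
# the gauge-fixed covariance construction of Sect. 3.
Lams_pinv = []
for _ in range(I):
    x = rng.standard_normal(kl)
    x /= np.linalg.norm(x)
    M = rng.standard_normal((kl, kl))
    P = np.eye(kl) - np.outer(x, x)
    Lams_pinv.append(np.linalg.pinv(P @ (M @ M.T) @ P))

def J_AML(thetas):
    # sum_i ||theta_i||^-2 theta_i^T Lambda_xi^+ theta_i
    return sum((t @ Lp @ t) / (t @ t)
               for t, Lp in zip(thetas, Lams_pinv))

thetas = [rng.standard_normal(kl) for _ in range(I)]
lams = [2.0, -0.5, 7.3]                   # arbitrary non-zero scales
J1 = J_AML(thetas)
J2 = J_AML([l * t for l, t in zip(lams, thetas)])
assert np.isclose(J1, J2)                 # cost is multi-homogeneous
```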

The label “approximate” in the name of \(J_{\mathrm {AML}}\) refers to the fact that \(J_{\mathrm {AML}}\) is an approximation—to within an additive constant—of another maximum likelihood-based cost function, namely the exact maximum likelihood (EML) cost function that underpins the bundle adjustment method for estimating the \(\varvec{\theta }_i\)’s directly from image data. The EML cost function, encompassing the so-called reprojection error [24, Sect. 4.2.3], has for its composite argument the principal parameters \(\varvec{\theta }_i\), \(i = 1, \dots , I\), alongside some additional nuisance parameters. The approximation leading from the EML cost function to the AML cost function involves, among other things, the elimination of the nuisance parameters (cf. [8, 10, 11, 26, 31, 35]).

With the significance of the AML cost function elucidated, when one now takes into consideration the constraints on \({\varvec{\mathsf {\Theta }}}\), the corresponding constrained minimiser of \(J_{\mathrm {AML}}\) can be viewed as a statistically well-founded estimate of \({\varvec{\mathsf {H}}}\).

4 Rank-four constraint enforcement

This section will depart from the primary topic of the paper, which is the upgrading of multiple homographies using the AML cost function, to shed light on an earlier multi-homography updating technique commonly known as rank-four constraint enforcement. The material here is fairly technical and can be skipped without detriment to the understanding of the rest of the paper. The purpose of the departure is to reconcile an apparent contradiction: while we stress the importance of explicitly modelling the estimation process within a multi-projective framework, rank-four constraint enforcement—which is known to work—appears at first glance to be outside of this framework. The principle underpinning rank-four constraint enforcement is that a collection of five or more interrelated homography matrices resides in an at most four-dimensional subspace. In its standard form, the technique enforces the rank-four constraint linearly via a singular value decomposition (SVD) based projection on a linear space of lower dimension. The standard form can naturally be extended to a whole family of weighted variants. It is not immediately clear how all of these, including the standard form, can fit into the category of multi-projective techniques. Settling this question seems necessary, given the multi-projective standpoint advocated in this paper. Below, we reveal that all versions of the method, despite their linear allure, are in fact fully compatible with the multi-projectiveness paradigm.

Given a composite of homography matrices \( {\varvec{\mathsf {H}}} = [\mathbf {H}_1, \dots , \mathbf {H}_I] \) satisfying (5), let \(\mathbf {H}\) be the \(9 \times I\) matrix given by

$$\begin{aligned} \mathbf {H} = [\mathbf {h}_1, \dots , \mathbf {h}_I], \quad \mathbf {h}_i = {{\mathrm{vec}}}(\mathbf {H}_i). \end{aligned}$$
(6)

Since, for each \(i = 1, \dots , I\),

$$\begin{aligned} \mathbf {h}_i = w_i {{\mathrm{vec}}}(\mathbf {A}) + {{\mathrm{vec}}}(\mathbf {b} \mathbf {v}_i^{\top }) = w_i \mathbf {a} + (\mathbf {I}_3 \otimes \mathbf {b})\mathbf {v}_i, \end{aligned}$$
(7)

where \(\otimes \) denotes Kronecker product [32], it follows that

$$\begin{aligned} \mathbf {H} = \mathbf {S} \mathbf {T}, \end{aligned}$$
(8)

where \(\mathbf {S}\) is the \(9 \times 4\) matrix given by

$$\begin{aligned} \mathbf {S} = [\mathbf {I}_3 \otimes \mathbf {b}, \mathbf {a}] \end{aligned}$$

and \(\mathbf {T}\) is the \(4 \times I\) matrix given by

$$\begin{aligned} \mathbf {T} = \begin{bmatrix} \mathbf {v}_1&\dots&\mathbf {v}_I \\ w_1&\dots&w_I \end{bmatrix}. \end{aligned}$$

An immediate consequence of (8) is that \(\mathbf {H}\) has rank at most four. The requirement that \(\mathbf {H}\) should have rank no greater than four places a genuine constraint on \(\mathbf {H}\), and hence also on \({\varvec{\mathsf {H}}}\), whenever \(I \ge 5\). This constraint is exactly the rank-four constraint of Shashua and Avidan mentioned in the Introduction. In accordance with what has already been pointed out, when \(I \ge 5\), the rank-four constraint can be enforced linearly, by employing SVD, in an infinite number of ways. Specifically, with every length-\(I\) vector \(\mathbf {p} = [p_1, \dots , p_I]^\top \) with non-zero entries, each having the meaning of an importance weight, there is associated a specific enforcement procedure. It needs to be immediately added that the procedures associated with different \(\mathbf {p}\) are not necessarily different. If \(\mathbf {p}\) and \(\mathbf {p}'\) are proportional up to individual signs of the entries of \(\mathbf {p}\) and \(\mathbf {p}'\), that is, if \(p_i = \lambda \epsilon _i p_i'\) with \(\epsilon _i = \pm 1\) for some \(\lambda \ne 0\) and all \(i = 1, \dots , I\), then the corresponding procedures are equivalent in the sense that the sets of respective output homographies (but not necessarily the sets of output homography matrices) are equal. If \(\mathbf {p}\) and \(\mathbf {p}'\) are not proportional up to individual signs of the entries of \(\mathbf {p}\) and \(\mathbf {p}'\), then the respective procedures are, as a rule, different.
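The factorisation (8) and the resulting rank bound can be illustrated as follows (NumPy; all latent quantities are drawn at random).

```python
import numpy as np

rng = np.random.default_rng(4)
I = 6                         # I >= 5, so the rank bound is a genuine constraint

A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
a = A.flatten(order='F')                       # a = vec(A), column-wise

# S = [I_3 kron b, a] is 9 x 4; T stacks the v_i's over the w_i's (4 x I).
S = np.hstack([np.kron(np.eye(3), b.reshape(3, 1)), a.reshape(9, 1)])
T = np.vstack([rng.standard_normal((3, I)),    # rows 1-3: v_1, ..., v_I
               rng.standard_normal((1, I))])   # row 4:    w_1, ..., w_I

H = S @ T                                      # equation (8), 9 x I

# Column i equals vec(w_i A + b v_i^T), cf. (7), and rank(H) <= 4.
i = 2
vi, wi = T[:3, i], T[3, i]
assert np.allclose(H[:, i], (wi * A + np.outer(b, vi)).flatten(order='F'))
assert np.linalg.matrix_rank(H) <= 4
```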

Before describing the enforcement procedures in detail, we shall need to identify the inverse mapping \(\varvec{r}\) to the mapping \({\varvec{\mathsf {H}}} \mapsto \mathbf {H}\) defined in (6). It is readily verified that \({\varvec{\mathsf {H}}}\) can be expressed in terms of \(\mathbf {H}\) as

$$\begin{aligned} {\varvec{\mathsf {H}}} = \varvec{r}(\mathbf {H}) = ({{\mathrm{vec}}}(\mathbf {H}))^{(3)} \end{aligned}$$

Here, given an \(n \times m\) matrix \(\mathbf {S}\) and a positive integer \(r\) that divides \(n\), \(\mathbf {S}^{(r)}\) denotes the \(r\)-wise vector transposition of \(\mathbf {S}\), that is, the \(r \times (nm/r)\) matrix obtained by performing a block transposition on \(\mathbf {S}\), with blocks comprising length-\(r\) column vectors [19]. In MATLAB parlance,

$$\begin{aligned} {\varvec{\mathsf {H}}} = \varvec{r}(\mathbf {H}) = \mathtt {reshape}(\mathbf {H}, 3, 3I). \end{aligned}$$
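In NumPy, MATLAB's column-major `reshape` corresponds to `order='F'`, so the mapping \(\varvec{r}\) and its inverse can be sketched as follows (illustrative values).

```python
import numpy as np

rng = np.random.default_rng(5)
I = 4

Hs = [rng.standard_normal((3, 3)) for _ in range(I)]   # H_1, ..., H_I
comp = np.hstack(Hs)                                   # 3 x 3I composite

# Forward mapping (6): columns are the column-wise vectorisations h_i.
H = np.column_stack([Hi.flatten(order='F') for Hi in Hs])   # 9 x I

# Inverse mapping r(H) = reshape(H, 3, 3I), MATLAB-style (column-major).
comp_back = H.reshape(3, 3 * I, order='F')
assert np.allclose(comp_back, comp)
```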

With these preparations in place, given \( {\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I] \) with \(I \ge 5\), let \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\) be the \(9 \times I\) matrix given by

$$\begin{aligned} \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) = [p_1 \Vert \mathbf {x}_1 \Vert ^{-1} \mathbf {x}_1, \dots , p_I \Vert \mathbf {x}_I \Vert ^{-1} \mathbf {x}_I]. \end{aligned}$$

Let \( \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) = \mathbf {U} \mathbf {D} \mathbf {V}^\top \) be the SVD of \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\), with \(\mathbf {D}\) being a \(9 \times I\) diagonal matrix with main diagonal entries \(d_{11}, \dots , d_{qq}\), \(q = \min (9,I)\), and, correspondingly, let \( \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})_4 = \mathbf {U} \mathbf {D}_4 \mathbf {V}^\top \) be the \(4\)-truncated SVD of \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\), with \(\mathbf {D}_4\) resulting from \(\mathbf {D}\) by replacing the entries \(d_{55}, \dots , d_{qq}\) by zero. Define the rank-four correction of \({\varvec{\mathsf {X}}}\) by

$$\begin{aligned} {\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}} = \varvec{r}\left( \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})_4\right) . \end{aligned}$$

The procedure whereby \({\varvec{\mathsf {X}}}\) gets upgraded to \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) constitutes the SVD-based method for enforcing the rank-four constraint associated with \(\mathbf {p}\). The default version of the method corresponds to \(\mathbf {p}\) having all entries equal to \(1\).
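A compact sketch of the whole enforcement procedure might read as follows (NumPy; the input homography matrices are synthetic noisy perturbations of a consistent set, and all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
I = 6

def rank4_correction(Xs, p):
    """Xs: list of I 3x3 matrices; p: length-I weights (non-zero)."""
    # F_p(X): normalise each vectorised matrix and weight it by p_i.
    F = np.column_stack([
        pi * Xi.flatten(order='F') / np.linalg.norm(Xi)
        for pi, Xi in zip(p, Xs)
    ])                                        # 9 x I
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    s[4:] = 0.0                               # 4-truncated SVD
    F4 = U @ np.diag(s) @ Vt
    return F4.reshape(3, 3 * I, order='F')    # r(F_p(X)_4)

# Consistent homographies w_i A + b v_i^T, plus noise, as input.
A, b = rng.standard_normal((3, 3)), rng.standard_normal(3)
Xs = [rng.standard_normal() * A + np.outer(b, rng.standard_normal(3))
      + 0.01 * rng.standard_normal((3, 3)) for _ in range(I)]

Theta = rank4_correction(Xs, np.ones(I))      # default p = (1, ..., 1)

# The stacked output satisfies the rank-four constraint.
stack = Theta.reshape(9, I, order='F')
assert np.linalg.matrix_rank(stack, tol=1e-10) <= 4
```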

We claim that, for any \(\mathbf {p}\), the mapping that sends \({\varvec{\mathsf {X}}}\) to \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) has the following property: If \({\varvec{\mathsf {X}}}\) is replaced by \(\times \varvec{\lambda }{\varvec{\mathsf {X}}}\) with \(\varvec{\lambda }\) being a length-\(I\) vector with non-zero entries, then \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) is replaced by \(\times \varvec{\sigma }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) for some length-\(I\) vector \(\varvec{\sigma }\) that depends only on \(\varvec{\lambda }\). This property implies that the homographies described by the \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p},i}\)’s are well defined as functions of the homographies described by the \(\mathbf {X}_i\)’s, providing thereby a desired reconciliation with the multi-projectiveness paradigm.

It is clear that \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\) is multi-positively invariant: if \(\varvec{\lambda }= [\lambda _1, \dots , \lambda _I]^\top \) has all entries positive, then \( \mathcal {F}_{\mathbf {p}}(\times \varvec{\lambda }{\varvec{\mathsf {X}}}) = \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}). \) Consequently, \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) is also multi-positively invariant. We shall next show that \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) is multi-sign equivariant: if \(\varvec{\epsilon }= [\epsilon _1, \dots , \epsilon _I]^\top \) is such that \(\epsilon _i = \pm 1\) for each \(i = 1, \dots , I\), then

$$\begin{aligned} {\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}(\times \varvec{\epsilon }{\varvec{\mathsf {X}}}) = \times \varvec{\epsilon }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}}). \end{aligned}$$
(9)

To this end, we introduce \( \mathbf {E} = {{\mathrm{diag}}}(\epsilon _1, \dots , \epsilon _I). \) Note that \(\mathbf {E}^\top = \mathbf {E}\) and \(\mathbf {E}^2 = \mathbf {I}_I\), so in particular \(\mathbf {E}\) is orthogonal: \(\mathbf {E}\mathbf {E}^\top = \mathbf {I}_I\). Note also that

$$\begin{aligned} \mathcal {F}_{\mathbf {p}}(\times \varvec{\epsilon }{\varvec{\mathsf {X}}}) = \left[ p_1 \epsilon _1\Vert \mathbf {x}_1 \Vert ^{-1} \mathbf {x}_1, \dots , p_I \epsilon _I\Vert \mathbf {x}_I \Vert ^{-1} \mathbf {x}_I\right] = \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E}. \end{aligned}$$

Clearly,

$$\begin{aligned} \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E} = \mathbf {U} \mathbf {D} \mathbf {V}^\top \mathbf {E} = \mathbf {U} \mathbf {D} \mathbf {V}^\top \mathbf {E}^\top = \mathbf {U} \mathbf {D} (\mathbf {E} \mathbf {V})^\top . \end{aligned}$$

As \(\mathbf {V}\) and \(\mathbf {E}\) are both orthogonal, \(\mathbf {E} \mathbf {V}\) is orthogonal too, and so \( \mathbf {U} \mathbf {D} (\mathbf {E} \mathbf {V})^\top \) is the SVD of \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E}\). Consequently,

$$\begin{aligned} (\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E})_4 = \mathbf {U} \mathbf {D}_4 (\mathbf {E} \mathbf {V})^{\top }. \end{aligned}$$

Taking into account that \( \mathbf {U} \mathbf {D}_4 (\mathbf {E} \mathbf {V})^\top = \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top } \mathbf {E}^\top = \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top } \mathbf {E}, \) we conclude that

$$\begin{aligned} \varvec{r}\left( \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top } \mathbf {E}\right) = \times \varvec{\epsilon }\varvec{r}\left( \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top }\right) , \end{aligned}$$

which establishes (9). Being multi-positively invariant and multi-sign equivariant, the mapping \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) has the desired property—indeed, a moment’s reflection shows that \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}(\times \varvec{\lambda }{\varvec{\mathsf {X}}}) = \times \varvec{\sigma }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}})\), where \( \varvec{\sigma }= [{{\mathrm{sgn}}}(\lambda _1), \dots , {{\mathrm{sgn}}}(\lambda _I)]^\top . \) The claim is established.
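The multi-sign equivariance just established can also be confirmed numerically. The self-contained sketch below (NumPy; synthetic inputs) implements the \(\mathbf {p}\)-weighted correction directly and compares the \(3 \times 3\) blocks of the outputs for sign-flipped inputs.

```python
import numpy as np

rng = np.random.default_rng(7)
I = 6

def rank4_correction(Xs, p):
    # Weighted, normalised stack F_p(X), then 4-truncated SVD, then r.
    F = np.column_stack([pi * Xi.flatten(order='F') / np.linalg.norm(Xi)
                         for pi, Xi in zip(p, Xs)])
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    s[4:] = 0.0
    return (U @ np.diag(s) @ Vt).reshape(3, 3 * I, order='F')

Xs = [rng.standard_normal((3, 3)) for _ in range(I)]
p = np.ones(I)
eps = np.array([1, -1, 1, -1, -1, 1])      # individual sign flips

T0 = rank4_correction(Xs, p)
T1 = rank4_correction([e * X for e, X in zip(eps, Xs)], p)

# Block i of T1 equals eps_i times block i of T0 -- equation (9).
for i in range(I):
    assert np.allclose(T1[:, 3*i:3*i+3], eps[i] * T0[:, 3*i:3*i+3])
```

The check relies on the rank-four truncation being unique, which holds as long as the fourth and fifth singular values are distinct, as is generically the case.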

We remark that the argument used above can be readily employed to verify that if \(\mathbf {p}\) and \(\mathbf {p}'\) are such that \(p_i' = \lambda \epsilon _i p_i\), where \(\lambda \ne 0\) and \(\epsilon _i = \pm 1\) for each \(i = 1, \dots , I\), then the procedures \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) and \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}'}\) are equivalent. Indeed, it is immediate that \( \mathcal {F}_{\mathbf {p}'}({\varvec{\mathsf {X}}}) = \mathcal {F}_{\mathbf {p}}(\times \varvec{\xi }{\varvec{\mathsf {X}}}), \) where \( \varvec{\xi }= [{{\mathrm{sgn}}}(\lambda )\epsilon _1, \dots , {{\mathrm{sgn}}}(\lambda )\epsilon _I]^\top . \) Hence \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}'}({\varvec{\mathsf {X}}}) = {\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}(\times \varvec{\xi }{\varvec{\mathsf {X}}}) = \times \varvec{\xi }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}})\), and this in turn implies that the sets of homographies defined by \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}})\) and \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}'}({\varvec{\mathsf {X}}})\) coincide. One consequence of the assertion just established is that to parametrise the enforcement procedures \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\), it suffices to consider weighting vectors \(\mathbf {p}\) with all positive entries summing up to \(1\).

We finally point out that it is a separate question as to which \(\mathbf {p}\) should be chosen to obtain a statistically most accurate enforcement procedure. A particular value of \(\mathbf {p}\), depending on \({\varvec{\mathsf {X}}}\), can be inferred from considerations contained in [46] (reflecting the fact that interrelated homography matrices \(\mathbf {X}_i\) can always be brought to the form \(\mathbf {X}_i = \lambda _i(\mathbf {A} + \mathbf {b} \mathbf {v}_i^\top )\); see below), but it is conceivable that alternative choices leading to better results exist.

5 Cost function optimisation

After a detour into rank-four constraint enforcement, we now turn our attention to the question of optimisation of functions for which the AML cost function is a prototype—all this, of course, with a view to optimising the AML cost function itself.

Let \(J\) be a cost function for fitting to \({\varvec{\mathsf {X}}}\) of the form

$$\begin{aligned} J({\varvec{\mathsf {\Theta }}}) = \sum _{i=1}^I \Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i^\top \mathbf {A}_i \varvec{\theta }_i, \end{aligned}$$

where, for each \(i=1, \dots , I\), \(\mathbf {A}_i\) is a \(kl \times kl\) non-negative definite matrix. Clearly, the AML cost function conforms to this profile. Suppose that the constraints on \({\varvec{\mathsf {\Theta }}}\) take the form

$$\begin{aligned} \varvec{\theta }_i = \varvec{\pi }_i(\varvec{\eta }) \quad (i = 1, \dots , I), \end{aligned}$$

where \(\varvec{\eta }\) is a length-\(d\) vector (we have \(d=4I + 12\) in the case of the constraints given in (4)). Upon introducing the function

$$\begin{aligned} J'(\varvec{\eta }) = J(\varvec{\pi }_1(\varvec{\eta }), \dots , \varvec{\pi }_I(\varvec{\eta })), \end{aligned}$$

the constrained optimisation problem in question reduces to that of optimising \(J'\), which is an unconstrained optimisation problem.

One way of optimising \(J'\) is to use the Levenberg–Marquardt (LM) method. The starting point is to re-express \(J'\) as

$$\begin{aligned} J'(\varvec{\eta }) = \sum _{i=1}^I \left\| \mathbf {f}_i'(\varvec{\eta })\right\| ^2, \end{aligned}$$

where, for each \(i = 1, \dots , I\),

$$\begin{aligned} \mathbf {f}_i'(\varvec{\eta })= & {} \mathbf {f}_i(\varvec{\pi }_i(\varvec{\eta })), \\ \mathbf {f}_i(\varvec{\theta }_i)= & {} \Vert \varvec{\theta }_i \Vert ^{-1} \mathbf {B}_i \varvec{\theta }_i, \quad \varvec{\pi }_i(\varvec{\eta }) = {{\mathrm{vec}}}(\varvec{\Pi }_i(\varvec{\eta })), \end{aligned}$$

with \(\mathbf {B}_i\) a \(kl \times kl\) matrix such that \(\mathbf {B}_i^\top \mathbf {B}_i = \mathbf {A}_i\); in particular, \(\mathbf {B}_i\) may be taken equal to the unique non-negative definite square root of \(\mathbf {A}_i\). Let \(\mathbf {f}'(\varvec{\eta }) = [\mathbf {f}_1'^\top (\varvec{\eta }), \dots , \mathbf {f}_I'^\top (\varvec{\eta })]^\top \). The LM technique makes use of the \(Ikl \times d\) Jacobian matrix \(\partial _{\varvec{\eta }}{\mathbf {f}'}\) represented as \( \partial _{\varvec{\eta }}{\mathbf {f}'} = [{\partial _{\varvec{\eta }}{\mathbf {f}_1^{'}}}^\top , \dots , {\partial _{\varvec{\eta }}{\mathbf {f}_I^{'}}}^\top ]^\top . \) For each \(i = 1, \dots , I\),

$$\begin{aligned} \partial _{\varvec{\eta }}{\mathbf {f}_i^{'}}(\varvec{\eta }) = \partial _{\varvec{\theta }_i}{\mathbf {f}_i}(\varvec{\pi }_i(\varvec{\eta })) \partial _{\varvec{\eta }}{\varvec{\pi }_i}(\varvec{\eta }) \end{aligned}$$

with \( \partial _{\varvec{\theta }_i}{\mathbf {f}_i}(\varvec{\theta }_i) = \Vert \varvec{\theta }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\theta }_i}^\perp \) and \( \mathbf {P}_{\varvec{\theta }_i}^\perp = \mathbf {I}_{kl} - \Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i \varvec{\theta }_i^{\top }. \) The algorithm iteratively improves on an initial approximation \(\varvec{\eta }_0\) to the minimiser of \(J'\) by constructing new approximations with the aid of the update rule

$$\begin{aligned} \varvec{\eta }_{n+1} = \varvec{\eta }_n - [\mathbf {H}(\varvec{\eta }_n) + \lambda _n \mathbf {I}_{d}]^{-1} [\partial _{\varvec{\eta }}{\mathbf {f}'}(\varvec{\eta }_n)]^\top \mathbf {f}'(\varvec{\eta }_n), \end{aligned}$$

where \( \mathbf {H} = (\partial _{\varvec{\eta }}{\mathbf {f}'})^\top \partial _{\varvec{\eta }}{\mathbf {f}'} \) and \(\lambda _n\) is a non-negative scalar that dynamically changes from step to step. Details concerning the choice of \(\lambda _n\) can be found in [38].
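The update rule is straightforward to implement. The following self-contained sketch applies it to a toy two-residual problem; the function names and the simple multiplicative damping schedule are illustrative and are not the schedule of [38]:

```python
import numpy as np

def levenberg_marquardt(f, jac, eta0, max_iter=50, lam0=1e-3):
    # Iterates eta_{n+1} = eta_n - (H + lam*I)^{-1} J^T f(eta_n),
    # with H = J^T J and lam adapted multiplicatively.
    eta, lam = eta0.astype(float), lam0
    cost = np.sum(f(eta) ** 2)
    for _ in range(max_iter):
        J = jac(eta)
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.eye(len(eta)), J.T @ f(eta))
        trial = eta - step
        trial_cost = np.sum(f(trial) ** 2)
        if trial_cost < cost:        # accept the step: decrease damping
            eta, cost, lam = trial, trial_cost, lam / 10
        else:                        # reject the step: increase damping
            lam *= 10
    return eta

# Toy residuals with a root at (1, 2): f = [x^2 + y - 3, x + y^2 - 5]
f = lambda e: np.array([e[0] ** 2 + e[1] - 3.0, e[0] + e[1] ** 2 - 5.0])
jac = lambda e: np.array([[2 * e[0], 1.0], [1.0, 2 * e[1]]])
sol = levenberg_marquardt(f, jac, np.array([1.0, 1.0]))
```

For the actual AML cost, the residual vector \(\mathbf {f}'\) and its Jacobian are assembled as described in Sect. 6.1.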

6 Multiple homography estimation

We now proceed to consider the LM-based estimation of multiple homography matrices. We first describe the specific details of the relevant iterative scheme and then develop a suitable initialisation procedure. This procedure will be applicable not only to the LM scheme, but to any other iterative method operating on the latent variable vector \(\varvec{\eta }\) (in particular, to a bundle adjustment technique—see Sect. 7.2), and as such is of interest in its own right.

6.1 LM scheme

In the setup described by (4) and (5), we have, in accordance with (7),

$$\begin{aligned} \varvec{\pi }_i(\varvec{\eta }) = {{\mathrm{vec}}}\left( w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }\right) = w_i \mathbf {a} + \mathbf {v}_i \otimes \mathbf {b} \end{aligned}$$

for each \(i = 1, \dots , I\). Taking into account that \( \mathbf {v}_i \otimes \mathbf {b} = (\mathbf {I}_3 \otimes \mathbf {b}) \mathbf {v}_i = (\mathbf {v}_i \otimes \mathbf {I}_3) \mathbf {b}, \) one readily verifies that

$$\begin{aligned} \begin{array}{llll} \partial _{\mathbf {a}}{\varvec{\pi }_i} &{} = w_i\mathbf {I}_9, &{}\quad \quad \partial _{\mathbf {b}}{\varvec{\pi }_i} &{}= \mathbf {v}_i \otimes \mathbf {I}_3, \\ \partial _{\mathbf {v}_i}{\varvec{\pi }_i} &{} = \mathbf {I}_3 \otimes \mathbf {b}, &{}\quad \quad \partial _{\mathbf {v}_j}{\varvec{\pi }_i} &{}= \mathbf {0} \quad (j \ne i), \\ \partial _{w_i}{\varvec{\pi }_i} &{} = \mathbf {a},&{}\quad \quad \partial _{w_j}{\varvec{\pi }_i} &{}= \mathbf {0} \quad (j \ne i). \end{array} \end{aligned}$$
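The Kronecker-product identities invoked here are easy to confirm numerically; a minimal check with random vectors and column-major vectorisation:

```python
import numpy as np

rng = np.random.default_rng(0)
b, v = rng.standard_normal(3), rng.standard_normal(3)

# vec(b v^T) equals v (x) b, and the product factors through either argument
lhs = np.kron(v, b)
assert np.allclose(lhs, np.kron(np.eye(3), b.reshape(3, 1)) @ v)   # (I_3 (x) b) v
assert np.allclose(lhs, np.kron(v.reshape(3, 1), np.eye(3)) @ b)   # (v (x) I_3) b

# and vec(w A + b v^T) = w vec(A) + v (x) b (column-major vec)
A, w = rng.standard_normal((3, 3)), 0.7
assert np.allclose((w * A + np.outer(b, v)).flatten(order='F'),
                   w * A.flatten(order='F') + np.kron(v, b))
```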

Representing, for each \(i = 1, \dots , I\), \(\partial _{\varvec{\eta }}{\mathbf {f}_i^{'}}\) as

$$\begin{aligned} \partial _{\varvec{\eta }}{\mathbf {f}_i^{'}} = \left[ \partial _{\mathbf {a}}{\mathbf {f}_i^{'}}, \partial _{\mathbf {b}}{\mathbf {f}_i^{'}}, \partial _{\mathbf {v}_1}{\mathbf {f}_i^{'}}, \dots , \partial _{\mathbf {v}_I}{\mathbf {f}_i^{'}}, \partial _{w_1}{\mathbf {f}_i^{'}}, \dots , \partial _{w_I}{\mathbf {f}_i^{'}}\right] , \end{aligned}$$

one finds furthermore that

$$\begin{aligned} \partial _{\mathbf {a}}{\mathbf {f}_i'}&= w_i \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp , \\ \partial _{\mathbf {b}}{\mathbf {f}_i'}&= \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp (\mathbf {v}_i \otimes \mathbf {I}_3),\\ \partial _{\mathbf {v}_i}{\mathbf {f}_i'}&= \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp (\mathbf {I}_3 \otimes \mathbf {b}), \quad \partial _{\mathbf {v}_j}{\mathbf {f}_i'} = \mathbf {0} \quad (j \ne i), \\ \partial _{w_i}{\mathbf {f}_i'}&= \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp \mathbf {a},\qquad \qquad \partial _{w_j}{\mathbf {f}_i'} = \mathbf {0} \quad (j \ne i). \end{aligned}$$

With \(\partial _{\varvec{\eta }}{\mathbf {f}'}\) thus determined, all that is now needed is a means for determining a suitable initial value of \(\varvec{\eta }\).
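As a sanity check on these formulas, the \(\mathbf {a}\)-derivative, say, can be compared against central finite differences; the values of \(\mathbf {B}_i\), \(\mathbf {b}\), \(\mathbf {v}_i\) and \(w_i\) below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((9, 9))
a, b, v, w = rng.standard_normal(9), rng.standard_normal(3), rng.standard_normal(3), 0.9

def f(a_):
    pi = w * a_ + np.kron(v, b)          # pi_i(eta) = w_i a + v_i (x) b
    return B @ pi / np.linalg.norm(pi)   # f_i = ||pi||^{-1} B_i pi

pi = w * a + np.kron(v, b)
P_perp = np.eye(9) - np.outer(pi, pi) / (pi @ pi)
J_analytic = w * B @ P_perp / np.linalg.norm(pi)   # w_i ||pi||^{-1} B_i P_perp

# central finite differences over the entries of a
eps = 1e-6
J_fd = np.column_stack([(f(a + eps * e) - f(a - eps * e)) / (2 * eps)
                        for e in np.eye(9)])
assert np.allclose(J_analytic, J_fd, atol=1e-5)
```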

6.2 Initialisation procedure

To devise an initialisation procedure, we first consider the following problem:

Problem 2

Given \({\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I]\) satisfying

$$\begin{aligned} \mathbf {X}_i = \lambda _i \mathbf {H}_i \end{aligned}$$
(10)

for each \(i = 1, \dots , I\), where \(\lambda _i\) is a non-zero scalar and \( \mathbf {H}_i = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }, \) solve for \(\mathbf {A}\), \(\mathbf {b}\), \(\mathbf {v}_i\) and \(w_i\) in terms of \({\varvec{\mathsf {X}}}\).

The essence here is to recover the structure of a set of matrices which are known a priori to satisfy (1), but which are identified only up to (unknown) scale factors.

Observe first that the solution of the problem cannot be unique. Indeed, if \( \mathbf {H}_i = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^\top \) for each \(i\), then also \( \mathbf {H}_i = w_i' \mathbf {A}' + \mathbf {b}' \mathbf {v}_i'^\top \) for each \(i\), where \(\mathbf {A}' = \beta \mathbf {A} + \mathbf {b} \mathbf {c}^\top \), \(\mathbf {b}' = \alpha \mathbf {b}\), \(\mathbf {v}_i' = \alpha ^{-1} (\mathbf {v}_i - \beta ^{-1} w_i \mathbf {c})\), and \(w_i' = \beta ^{-1} w_i\), with \(\alpha \) and \(\beta \) non-zero numbers and \(\mathbf {c}\) a length-\(3\) vector. Therefore, we shall content ourselves with a single specific solution.

Note that \( \mathbf {X}_i = \lambda _i (w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }) \) implies \( \mathbf {X}_i = \lambda _i' (\mathbf {A} + \mathbf {b} \mathbf {v}_i'^{\top }) \) with \(\lambda _i' = w_i \lambda _i\) and \(\mathbf {v}_i' = w_i^{-1} \mathbf {v}_i\). Thus, without loss of generality, we may assume that \(w_i = 1\) for each \(i\). Now, system (10) becomes

$$\begin{aligned} \mathbf {X}_i = \lambda _i(\mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }). \end{aligned}$$
(11)

Select arbitrarily \(i\) and \(j\) with \(i \ne j\). Taking into account that \( \mathbf {v}_i^\top (\mathbf {v}_i \times \mathbf {v}_j) = \mathbf {v}_j^\top (\mathbf {v}_i \times \mathbf {v}_j) = 0, \) we see that

$$\begin{aligned} \lambda _i^{-1} \mathbf {X}_i (\mathbf {v}_i \times \mathbf {v}_j) = \mathbf {A} (\mathbf {v}_i \times \mathbf {v}_j) = \lambda _j^{-1} \mathbf {X}_j (\mathbf {v}_i \times \mathbf {v}_j) \end{aligned}$$

and further

$$\begin{aligned} \mathbf {X}_i^{-1} \mathbf {X}_j (\mathbf {v}_i \times \mathbf {v}_j) = \lambda _i^{-1}\lambda _j (\mathbf {v}_i \times \mathbf {v}_j). \end{aligned}$$
(12)

Consequently, \(\lambda _i^{-1}\lambda _j\) is an eigenvalue of \(\mathbf {X}_i^{-1} \mathbf {X}_j\). Note that if \(\mathbf {c}\) is any length-3 vector, then (11) holds with \(\mathbf {A}\) replaced by \(\mathbf {A} + \mathbf {b} \mathbf {c}^\top \) and with \(\mathbf {b} \mathbf {v}_i^\top \) replaced by \(\mathbf {b} (\mathbf {v}_i - \mathbf {c})^{\top }\). Accordingly, (12) holds with \((\mathbf {v}_i - \mathbf {c}) \times (\mathbf {v}_j - \mathbf {c})\) substituted for \(\mathbf {v}_i \times \mathbf {v}_j\). It is easily seen that as \(\mathbf {c}\) varies, the vectors

$$\begin{aligned} (\mathbf {v}_i - \mathbf {c}) \times (\mathbf {v}_j - \mathbf {c}) = (\mathbf {c} - \mathbf {v}_j) \times (\mathbf {v}_i - \mathbf {v}_j) \end{aligned}$$

fill out a two-dimensional linear space, namely the space of all length-\(3\) vectors orthogonal to \(\mathbf {v}_i - \mathbf {v}_j\). Thus \(\lambda _i^{-1}\lambda _j\) is in fact a double eigenvalue of \(\mathbf {X}_i^{-1} \mathbf {X}_j\), which, generically, is unique. Denoting this eigenvalue by \(\mu _{ij}\) and observing that

$$\begin{aligned} \lambda _i^{-1}\lambda _j \mathbf {X}_i - \mathbf {X}_j = \lambda _j(\lambda _i^{-1}\mathbf {X}_i - \lambda _j^{-1}\mathbf {X}_j) = \lambda _j \mathbf {b}(\mathbf {v}_i - \mathbf {v}_j)^\top , \end{aligned}$$

we next see that \(\mathbf {b}\) can be identified, up to a scalar factor, as the unique left singular vector of \( \mu _{ij} \mathbf {X}_i - \mathbf {X}_j \) corresponding to a non-zero singular value.
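Both facts, the double eigenvalue and the rank-one difference, are easy to confirm on synthetically generated matrices; a sketch with illustrative random values:

```python
import numpy as np

rng = np.random.default_rng(2)
A, b = rng.standard_normal((3, 3)), rng.standard_normal(3)
v1, v2 = rng.standard_normal(3), rng.standard_normal(3)
lam1, lam2 = 1.3, -0.8
X1 = lam1 * (A + np.outer(b, v1))
X2 = lam2 * (A + np.outer(b, v2))

# lam1^{-1} lam2 is a double eigenvalue of X1^{-1} X2
eigs = np.linalg.eigvals(np.linalg.inv(X1) @ X2)
mu = lam2 / lam1
assert np.sum(np.isclose(eigs, mu, atol=1e-8)) == 2

# b (up to scale) is the left singular vector of mu*X1 - X2
# for the single non-zero singular value
U, s, Vt = np.linalg.svd(mu * X1 - X2)
assert s[1] < 1e-8 * s[0]                          # rank one
b_hat = U[:, 0]
assert np.isclose(abs(b_hat @ b) / np.linalg.norm(b), 1.0, atol=1e-8)
```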

Having made these observations, we now fix \(i_0\) arbitrarily. For each \(i \ne i_0\), let \(\mu _{ii_0}\) be the double eigenvalue of \(\mathbf {X}_i^{-1} \mathbf {X}_{i_0}\). Then, for each \(i \ne i_0\),

$$\begin{aligned} \mu _{ii_0} \mathbf {X}_i = \lambda _i^{-1}\lambda _{i_0} \mathbf {X}_i = \lambda _{i_0} \left( \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }\right) . \end{aligned}$$

Relabelling \(\lambda _{i_0} \mathbf {A}\) as \(\mathbf {A}\) and \(\lambda _{i_0} \mathbf {b}\) as \(\mathbf {b}\), the last relation simplifies to

$$\begin{aligned} \mu _{ii_0} \mathbf {X}_i = \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top } \end{aligned}$$
(13)

and the \(i_0\)th defining equation becomes

$$\begin{aligned} \mathbf {X}_{i_0} = \mathbf {A} + \mathbf {b} \mathbf {v}_{i_0}^{\top }. \end{aligned}$$
(14)

Take for \(\mathbf {b}\) the left singular vector of the \(3 \times 3(I-1)\) matrix obtained by juxtaposing (horizontally concatenating) the matrices \(\mu _{ii_0} \mathbf {X}_i - \mathbf {X}_{i_0}\), \(i \ne i_0\), corresponding to the single non-zero singular value. Now, by subtracting (14) from (13), we find, for each \(i \ne i_0\),

$$\begin{aligned} \mu _{ii_0} \mathbf {X}_i - \mathbf {X}_{i_0} = \mathbf {b}(\mathbf {v}_i - \mathbf {v}_{i_0})^\top \end{aligned}$$

whence

$$\begin{aligned} \mathbf {v}_i = \mathbf {v}_{i_0} + \Vert \mathbf {b} \Vert ^{-2} (\mu _{ii_0} \mathbf {X}_i - \mathbf {X}_{i_0})^\top \mathbf {b}. \end{aligned}$$

To fully solve our problem, it remains to determine \(\mathbf {v}_{i_0}\) and \(\mathbf {A}\). The only constraint that we now have to take into account is (14). One simple solution is

$$\begin{aligned} \mathbf {A} = \mathbf {X}_{i_0}, \quad \mathbf {v}_{i_0} = \mathbf {0}. \end{aligned}$$

With our problem successfully solved, we are now in a position to furnish the desired initialisation procedure. Based on a noisy data set \({\varvec{\mathsf {X}}}\), a seed \( \bar{\varvec{\eta }} = [\bar{\mathbf {a}}^\top , \bar{\mathbf {b}}^\top , \bar{\mathbf {v}}_1^\top , \dots , \bar{\mathbf {v}}_I^\top , \bar{w}_1, \dots , \bar{w}_I]^\top \) for any iterative method operating on \(\varvec{\eta }\) is obtained by modifying the specific solution given above. The modification reflects the fact that \({\varvec{\mathsf {X}}}\) admits only an approximate representation as in (10). The steps of the initialisation procedure are detailed in Algorithm 1.

Algorithm 1

It is noteworthy that for the above procedure to work, just two different input homographies suffice. This is in contrast with the initialisation proposed by Chen and Suter [4], which requires at least three different homographies.
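In the noiseless case, the recovery described above can be sketched in a few lines, with the reduction \(w_i = 1\) in force and \(i_0\) taken to be the first index; this is only the exact-data skeleton, whereas Algorithm 1 proper additionally has to cope with noisy input. As just remarked, two input homographies suffice:

```python
import numpy as np

rng = np.random.default_rng(3)
A_true, b_true = rng.standard_normal((3, 3)), rng.standard_normal(3)
vs = rng.standard_normal((2, 3))
lams = np.array([0.6, -1.4])
X = [lam * (A_true + np.outer(b_true, v)) for lam, v in zip(lams, vs)]

# with i0 = 0, mu_{1,0} = lambda_1^{-1} lambda_0 is the double
# eigenvalue of X_1^{-1} X_0; pick the closest eigenvalue pair
eigs = np.linalg.eigvals(np.linalg.inv(X[1]) @ X[0])
pairs = [(abs(eigs[i] - eigs[j]), (eigs[i] + eigs[j]) / 2)
         for i in range(3) for j in range(i + 1, 3)]
mu = min(pairs, key=lambda t: t[0])[1].real

# b (up to scale) from the rank-one matrix mu*X_1 - X_0, then v_1;
# finally take A = X_0 and v_0 = 0 as in the text
D = mu * X[1] - X[0]
U, s, Vt = np.linalg.svd(D)
b = U[:, 0]
v0 = np.zeros(3)
v1 = v0 + D.T @ b / (b @ b)
A = X[0]

# the recovered parameters reproduce the input up to the known scales
assert np.allclose(X[0], A + np.outer(b, v0), atol=1e-9)
assert np.allclose(mu * X[1], A + np.outer(b, v1), atol=1e-6)
```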

7 Experimental verification

The method was tested on both synthetic and real data. Synthetic data were used to quantify the effect of noise on the method. Real data were used to evaluate the performance of the method in a real-world situation.

Fig. 2: Synthetic data generation procedure

7.1 Synthetic data

Synthetic data were created by generating true corresponding points for some stereo configuration and adding random Gaussian noise to these points. Many configurations were investigated. Any specific instantiation of true image points was developed as follows. First, we chose a realistic geometric configuration for two cameras. Next, we applied a random rotation and translation to a plane that is parallel to the first camera’s image plane (see Fig. 2). Repeating this last step several times, we generated several planes in the 3D scene. Finally, between 25 and 50 points in each plane were randomly selected in the field of view of both cameras, and these were projected onto two \(640 \times 480\) pixel images to provide true image points.
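The plane-induced homographies underlying such data are simple to generate. In a calibrated toy setting with cameras \(\mathbf {P} = [\mathbf {I} \mid \mathbf {0}]\) and \(\mathbf {P}' = [\mathbf {R} \mid \mathbf {t}]\), a plane \(\mathbf {n}^\top \mathbf {X} = d\) induces \(\mathbf {H} = \mathbf {R} + \mathbf {t}\mathbf {n}^\top /d\), which has exactly the form \(\mathbf {A} + \mathbf {b}\mathbf {v}^\top \) with \(\mathbf {A} = \mathbf {R}\), \(\mathbf {b} = \mathbf {t}\) and \(\mathbf {v} = \mathbf {n}/d\). The following is a sketch of this relationship, not our exact data generator:

```python
import numpy as np

rng = np.random.default_rng(4)

# cameras P = [I | 0] and P' = [R | t]; the plane n.X = d induces the
# homography H = R + t n^T / d, i.e. A + b v^T with A = R, b = t, v = n/d
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.2])
n, d = np.array([0.1, -0.2, 1.0]), 5.0
H = R + np.outer(t, n) / d

def induced_ok(H, R, t, n, d, xy):
    # place a point on the plane, project into both views, and check
    # that H maps the first projection onto the second
    z = (d - n[:2] @ xy) / n[2]
    Xw = np.array([xy[0], xy[1], z])
    m = Xw / Xw[2]                  # view-1 projection (homogeneous)
    m2 = R @ Xw + t
    m2 = m2 / m2[2]                 # view-2 projection
    Hm = H @ m
    return np.allclose(Hm / Hm[2], m2, atol=1e-10)

assert all(induced_ok(H, R, t, n, d, rng.uniform(-1, 1, 2)) for _ in range(5))
```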

7.2 Synthetic simulation procedure

Each synthetic true image point was perturbed by independent homogeneous Gaussian noise at a preset level. For different series of experiments, different noise levels were applied. This resulted in \(I\) groups of noise-contaminated pairs of corresponding points \(\{\mathbf {m}_{n,i},\mathbf {m}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\), corresponding to \(I\) different planes in the 3D scene, with \(I\in \{2, 4, 5, 6, 7, 8\}\) for different experiments; here, \(N_i\) is the number of feature points in the \(i\)th plane.

The estimation methods considered were:

\(\bullet \) DLT: direct linear transform,

\(\bullet \) FNS: fundamental numerical scheme,

\(\bullet \) WALS: weighted alternating least squares,

\(\bullet \) AML: approximate maximum likelihood,

\(\bullet \) BA: bundle adjustment.

DLT [24] is a linear method for estimating a single homography and FNS [8, 39] is an iterative method for the same purpose, the two methods optimising two different, yet related cost functions. For the small noise levels utilised in our study, FNS produces the same results as single homography bundle adjustment, and to all intents and purposes may be regarded as the gold standard single homography estimation method. Both DLT and FNS were run on suitably normalised data. For each \(i = 1, \dots , I\), two data-dependent normalisation matrices \(\mathbf {T}_i\) and \(\mathbf {T}_i^{'}\) were applied to \(\{\mathbf {m}_{n,i},\mathbf {m}_{n,i}^{'}\}_{n=1}^{N_i}\) using the rule

$$\begin{aligned} \tilde{\mathbf {m}}_{n,i} = \mathbf {T}_i \mathbf {m}_{n,i} \quad \text {and} \quad \tilde{\mathbf {m}}_{n,i}^{'} = \mathbf {T}_i^{'} \mathbf {m}_{n,i}^{'} \end{aligned}$$

to produce individually normalised corresponding points \(\{\tilde{\mathbf {m}}_{n,i},\tilde{\mathbf {m}}_{n,i}^{'}\}_{n=1}^{N_i}\) à la Hartley [7, 9, 23]. These normalised groups were used as input to DLT and FNS to produce two sets of \(I\) homography matrices \(\{\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {DLT},i}\}_{i=1}^{I} \) and \( \{\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {FNS},i}\}_{i=1}^{I}. \) The final DLT and FNS estimates were obtained by applying, for each \(i = 1, \dots , I\), the back-transformation \({\varvec{{\Theta }}}_i \mapsto \mathbf {T}_i'^{-1} {\varvec{{\Theta }}}_i \mathbf {T}_i\) to \(\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {DLT},i}\) and \(\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {FNS},i}\), respectively.

WALS is Chen and Suter’s method of weighted alternating least squares with weights derived from the covariances of input estimates. Both WALS and our proposed AML method use as input, in one version, the estimates produced by FNS and, in another version, the estimates produced by DLT. For reasons of numerical stability, both WALS and AML required data normalisation as a pre-processing step. A common data normalisation procedure was employed for both methods: we combined all the points together to produce two global normalisation matrices \(\mathbf {T}\) and \(\mathbf {T}'\). These matrices were used to produce globally normalised corresponding points \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i},\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\), defined by

$$\begin{aligned} \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i} = \mathbf {T} \mathbf {m}_{n,i} \quad \text {and} \quad \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'} = \mathbf {T}'\mathbf {m}_{n,i}^{'}. \end{aligned}$$
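The normalisation matrices can be computed in the standard Hartley fashion, translating the centroid of the points to the origin and scaling so that the mean distance from the origin is \(\sqrt{2}\). A minimal sketch (the function name is illustrative):

```python
import numpy as np

def normalisation_matrix(pts):
    # pts: N x 2 inhomogeneous points; T translates the centroid to the
    # origin and scales the mean distance from it to sqrt(2)
    c = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - c, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    return np.array([[s, 0.0, -s * c[0]],
                     [0.0, s, -s * c[1]],
                     [0.0, 0.0, 1.0]])

pts = np.array([[0.0, 0.0], [640.0, 0.0], [640.0, 480.0], [0.0, 480.0]])
T = normalisation_matrix(pts)
tilde = (T @ np.c_[pts, np.ones(4)].T).T
assert np.allclose(tilde[:, :2].mean(axis=0), 0.0, atol=1e-12)
assert np.isclose(np.linalg.norm(tilde[:, :2], axis=1).mean(), np.sqrt(2.0))
```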

The input of the FNS-initialised versions of WALS and AML, WALS-FNS and AML-FNS, took the form of the FNS estimates based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i},\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\). Likewise, the input of the DLT-initialised versions of WALS and AML, WALS-DLT and AML-DLT, took the form of the DLT estimates based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i},\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\). For each initialisation, a specific prescription, described below, was used to generate the pre-covariance matrix \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{0}\) for the vectorisation \(\mathbf {x}_i\) of \(\mathbf {X}_i\) for each \(i = 1, \dots , I\). The matrices \(\mathbf {X}_i\) and pre-covariances \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{0}\) were next used as input to the WALS and AML methods to produce estimates which, upon applying the back-transformation \(\varvec{\mathbf {\Theta }}\mapsto \mathbf {T}'^{-1} \varvec{\mathbf {\Theta }}\mathbf {T}\), were taken to be the final WALS and AML estimates, respectively.

In the case of the FNS-initialised methods, the pre-covariance matrix for the vectorisation \(\mathbf {x}_i\) of \(\mathbf {X}_i\), based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), was taken to be

$$\begin{aligned} \varvec{\Lambda }^0_{\mathbf {x}_i}= & {} (\mathbf {M}_{\mathbf {x}_i})^{+}_8, \end{aligned}$$
(15)
$$\begin{aligned} \mathbf {M}_{\mathbf {x}_i}= & {} \Vert \mathbf {x}_i \Vert ^2 \sum _{n = 1}^{N_i} \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}) ({\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i}))^+_2 \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i})^\top . \end{aligned}$$
(16)

Here, \(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i} = [\mathop {{u}}\limits ^{\circ }\!{}_{n,i},\mathop {{v}}\limits ^{\circ }\!{}_{n,i},\mathop {{u}}\limits ^{\circ }\!{}_{n,i}^{'},\mathop {{v}}\limits ^{\circ }\!{}_{n,i}^{'}]^\top \) is the result of combining \(\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i} = [\mathop {{u}}\limits ^{\circ }\!{}_{n,i},\mathop {{v}}\limits ^{\circ }\!{}_{n,i},1]^\top \) and \(\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}{=} [\mathop {{u}}\limits ^{\circ }\!{}_{n,i}^{'},\mathop {{v}}\limits ^{\circ }\!{}_{n,i}^{'},1]^\top \) into a single vector; \(\mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i})\) and \({\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i})\) are defined by

$$\begin{aligned} \mathbf {U}(\mathbf {z})&= - \mathbf {m} \otimes [\mathbf {m}']_{\times }, \\ \mathbf {B}(\mathbf {z})&= \partial _{\mathbf {z}}{{{\mathrm{vec}}}(\mathbf {U}(\mathbf {z}))} \varvec{\mathbf {\Lambda }}_{\mathbf {z}}^{} \big [\partial _{\mathbf {z}}{{{\mathrm{vec}}}(\mathbf {U}(\mathbf {z}))}\big ]^\top , \\ {\varvec{\mathsf {\Sigma }}}(\mathbf {z},\mathbf {x})&= (\mathbf {I}_3 \otimes \mathbf {x}^\top ) \mathbf {B}(\mathbf {z}) (\mathbf {I}_3 \otimes \mathbf {x}) \end{aligned}$$

with \(\mathbf {z} = [u,v,u',v']^\top \) derived from \(\mathbf {m} = [u, v, 1]^\top \) and \(\mathbf {m}' = [u', v', 1]^\top \), and with \(\mathbf {x}\) a length-9 vector; for a length-3 vector \(\mathbf {a}\), \([\mathbf {a}]_{\times }\) is the \(3 \times 3\) anti-symmetric matrix given by

$$\begin{aligned}{}[\mathbf {a}]_{\times } = \begin{bmatrix} 0&-a_3&a_2 \\ a_3&0&-a_1 \\ -a_2&a_1&0 \end{bmatrix}; \end{aligned}$$

and \(\mathbf {A}^+_r\) denotes the rank-\(r\) truncated pseudo-inverse of the matrix \(\mathbf {A}\) (see [26] or Appendix A for the definition). The image data covariance matrices \(\varvec{\mathbf {\Lambda }}_{\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}}^{}\), incorporated in the matrices \({\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i})\), were all chosen to be in their default form \({{\mathrm{diag}}}(1,1,1,1)\), corresponding to isotropic homogeneous noise in image point measurement. The details of formula (15) are given in Appendix A.
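Assuming the definition in Appendix A is the usual SVD-based one, the rank-\(r\) truncated pseudo-inverse can be sketched as follows (illustrative function name):

```python
import numpy as np

def truncated_pinv(A, r):
    # rank-r truncated pseudo-inverse: invert only the r largest
    # singular values and zero out the rest
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.zeros_like(s)
    s_inv[:r] = 1.0 / s[:r]
    return Vt.T @ np.diag(s_inv) @ U.T

rng = np.random.default_rng(5)
# a rank-2 non-negative definite 4x4 matrix
G = rng.standard_normal((4, 2))
M = G @ G.T
P = truncated_pinv(M, 2)
# on its column space, M P acts as the identity, and P agrees with the
# Moore-Penrose pseudo-inverse since M has rank exactly 2
assert np.allclose(M @ P @ M, M, atol=1e-10)
assert np.allclose(P, np.linalg.pinv(M), atol=1e-8)
```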

The DLT-initialised methods employed the raw covariance matrix for the vectorisation \(\mathbf {x}_i\) of \(\mathbf {X}_i\), based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), in the form

$$\begin{aligned} \varvec{\Lambda }^0_{\mathbf {x}_i} = (\mathbf {M}_{\mathbf {x}_i})_8^+ \mathbf {D}_{\mathbf {x}_i} (\mathbf {M}_{\mathbf {x}_i})_8^+, \end{aligned}$$
(17)

with \(\mathbf {M}_{\mathbf {x}_i}\) given in (16) and

$$\begin{aligned} \mathbf {D}_{\mathbf {x}_i} = \Vert \mathbf {x}_i \Vert ^{-2} \sum _{n=1}^{N_i} \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}) {\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i}) \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i})^\top . \end{aligned}$$

Formula (17) relies on the material presented in Appendix B.

BA is a method for calculating maximum likelihood estimates and is regarded as the gold standard for performing optimal parameter estimation. It is the same as what is called joint bundle adjustment in [42]. The qualifier “joint” serves to emphasise the fact that this variant of bundle adjustment, using the latent variables, enforces full homography consistency constraints. In contrast, naive separate bundle adjustment minimises the gold standard geometric error, but does not enforce consistency constraints. In this paper we do not explicitly consider separate bundle adjustment, and therefore we label joint bundle adjustment simply as BA. However, separate bundle adjustment enters implicitly into our considerations through FNS, as, in accordance with our earlier remark, FNS applied to estimate each individual homography separately yields results indistinguishable from those produced by separate bundle adjustment for the levels of noise employed in our experiments.

The BA method was used in two variants, BA-FNS and BA-DLT, which differed in the way the method’s iterative process was initialised. In both variants, a BA estimate was generated directly from all the image data \( \{\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\}_{i=1}^{I} \) by minimising the reprojection error

$$\begin{aligned} \sum _{i=1}^{I} \sum _{n=1}^{N_i} \left( d\left( \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \underline{\mathbf {m}}_{n,i}\right) ^2 + d\left( \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}, \varvec{\Pi }_i(\varvec{\eta })\underline{\mathbf {m}}_{n,i}\right) ^2 \right) \end{aligned}$$

over all choices of parameter vectors \(\varvec{\eta }\) and 2D points \( \{\{\underline{\mathbf {m}}_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I}, \) with the minimum attained at the composite of \(\widehat{\varvec{\eta }}\) and \( \{\{\underline{\widehat{\mathbf {m}}}_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I} \) resulting in \(\widehat{\varvec{{\Theta }}}_{\mathrm {BA},i} = \varvec{\Pi }_i(\widehat{\varvec{\eta }})\). Here, \(d(\mathbf {m},\mathbf {n})\) denotes the Euclidean distance between the points \(\mathbf {m}\) and \(\mathbf {n}\) dehomogenised at the last, third entry. The initial value of each \(\underline{\mathbf {m}}_{n,i}\) was taken to be \(\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}\) in both variants, and the initial value of \(\varvec{\eta }\) was obtained from the result of AML-FNS for BA-FNS, and from the result of AML-DLT for BA-DLT. Upon initialisation, \(\varvec{\eta }\) and the \(\underline{\mathbf {m}}_{n,i}\)’s were recomputed iteratively by an LM scheme adapted to the task of minimising the error given above. With \(\widehat{\mathbf {A}}_{\mathrm {BA}}\), \(\widehat{\mathbf {b}}_{\mathrm {BA}}\), \(\widehat{\mathbf {v}}_{\mathrm {BA},i}\)’s, \(\widehat{w}_{\mathrm {BA},i}\)’s derived from the terminal value of \(\varvec{\eta }\), the estimates \(\widehat{\varvec{{\Theta }}}_{\mathrm {BA},i}\) were finally obtained by applying the back-transformation \({\varvec{{\Theta }}} \mapsto \mathbf {T}'^{-1} {\varvec{{\Theta }}} \mathbf {T}\) to \( \widehat{w}_{\mathrm {BA},i} \widehat{\mathbf {A}}_{\mathrm {BA}} + \widehat{\mathbf {b}}_{\mathrm {BA}} \widehat{\mathbf {v}}_{\mathrm {BA},i}^{\top }. \)
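For a single correspondence, the inner minimisation over the preimage point \(\underline{\mathbf {m}}_{n,i}\) can be illustrated with a crude numeric gradient descent (a stand-in for the LM scheme actually used; the function names are illustrative):

```python
import numpy as np

def dehom(x):
    # dehomogenise at the last, third entry
    return x[:2] / x[2]

def reproj_cost(u, m, m2, H):
    # d(m, u)^2 + d(m', H u)^2 with u a 2D preimage point
    proj = dehom(H @ np.array([u[0], u[1], 1.0]))
    return np.sum((m - u) ** 2) + np.sum((m2 - proj) ** 2)

def min_reproj(m, m2, H, iters=100):
    # crude descent with a numeric gradient, initialised at the
    # measured point m; only improving steps are accepted
    u, step, eps = m.copy(), 0.1, 1e-6
    for _ in range(iters):
        g = np.zeros(2)
        for k in range(2):
            e = np.zeros(2)
            e[k] = eps
            g[k] = (reproj_cost(u + e, m, m2, H)
                    - reproj_cost(u - e, m, m2, H)) / (2 * eps)
        new = u - step * g
        if reproj_cost(new, m, m2, H) < reproj_cost(u, m, m2, H):
            u = new
        else:
            step /= 2
    return reproj_cost(u, m, m2, H)
```

For an exact correspondence the minimum is zero, attained at the measured point itself; for perturbed data the minimised cost never exceeds the plain transfer error.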

7.3 Real data

To investigate the performance of the proposed method on real data, we utilised the Model House sequence from the Oxford dataset, the Lady Symon and Old Classics Wing scenes from the AdelaideRMF dataset [45], and additionally two traffic scenes from the Traffic Signs dataset [30].

We generated corresponding points for the Oxford and Traffic Signs datasets using correlation-based matching on Harris corner points. Because the matching of corner points was done without manual intervention, the resulting set of corresponding points included pure outliers (incorrect matches). Corresponding points for the AdelaideRMF dataset were generated in a different manner, but also included pure outliers. The curator of the AdelaideRMF dataset detected key points using SIFT and generated true correspondences by manually matching only a subset of key points. The remaining key points were randomly matched and labelled as pure outliers.

To make the experiments on real data as realistic as possible, we did not manually group data points into different planar regions. Instead, we took advantage of recent progress in robust hypothesis generation for multi-structure data [6] and sampled a thousand candidate homographies based on four-element sets of matched correspondences. For each pair of corresponding points, we computed the symmetric transfer error [24, Sect. 4.2.2] for all 1,000 homographies and ranked the homographies according to this error. This means that each correspondence had an associated preference list that ranked the homographies. Making use of the ordered residual kernel method [5], which defines a distance between rankings, together with an agglomerative clustering scheme, we grouped the pairs of corresponding points into several clusters. The clustering scheme that we applied does not guarantee that all final clusters belong to different planes in the scene. In fact, correspondences are frequently fragmented into two different clusters even though visually they should belong to the same plane. This means that effectively the same planar region may have more than one homography associated with it. Such is the case with the Model House sequence where clustering led to the appearance of six homographies (corresponding to six clusters), even though there are only four actual planes in the scene. The fragmentation effect is not unique to our clustering scheme and was also observed by Fouhey et al. [18], who used a different algorithm (J-linkage) to group planar data points together. The empirical fact that fragmentation occurs so frequently serves as an affirmation that it is important to exploit consistency constraints as a means for enhancing coherency between homographies derived from logically linked clusters.

7.4 Real data experiment procedure

After automatically grouping the data points into several clusters, we estimated homographies using DLT for each group. To compare and contrast the stability and accuracy of the consistency enforcing estimation methods, we used the same initialisation procedure (Algorithm 1) on the DLT estimates of each group to seed AML-DLT, WALS-DLT, and BA-DLT.

7.5 Quantitative comparison of methods

On synthetic data, for the estimates generated by the DLT, FNS, WALS, AML, and BA methods, the common measure used to quantify data–model discrepancies was the mean root-mean-square (RMS) reprojection error from truth

$$\begin{aligned} \frac{1}{I} \sum _{i=1}^{I} \sqrt{ \frac{1}{4N_iK} \sum _{k=1}^K \min _{\underline{\mathbf {m}}_{n,i}^{(k)}} \sum _{n=1}^{N_i} \left( d(\overline{\mathbf {m}}^{(k)}_{n,i}, \underline{\mathbf {m}}^{(k)}_{n,i})^2 + d(\overline{\mathbf {m}}'^{(k)}_{n,i}, \widehat{\varvec{\mathbf {\Theta }}}_i \underline{\mathbf {m}}^{(k)}_{n,i})^2\right) }, \end{aligned}$$

where \(K\) is the number of experiments, and, for each \(k = 1, \dots , K\), \( \{\{\overline{\mathbf {m}}^{(k)}_{n,i},\overline{\mathbf {m}}'^{(k)}_{n,i}\} _{n=1}^{N_i}\}_{i=1}^{I} \) are noiseless data and \( \{\{\underline{\mathbf {m}}^{(k)}_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I} \) are arbitrary 2D points over which the minimum is taken in the \(k\)th experiment. A comparison of the methods on synthetic data is shown in Figs. 3 and 4. The results indicate that the proposed algorithm outperforms DLT, FNS, WALS-DLT, and WALS-FNS, and is practically indistinguishable in performance from the BA method for small noise levels. For larger noise levels all algorithms that enforce consistency constraints become susceptible to converging to a poor local minimum, resulting in homography estimates that are worse than separately estimated homographies. Figure 4a shows comparatively how often the algorithms that enforce consistency constraints managed to improve upon the separately FNS-estimated homographies. While the likelihood of converging to a superior solution decreased as the noise level increased, our method still fared better than WALS. With \(\sigma = 2\) pixels our algorithm converged to a superior solution in more than \(90\,\%\) of the trials. The deterioration of the algorithm at higher noise levels can be attributed to Algorithm 1, which is used to find initial values for the latent variables. For high noise levels, Algorithm 1 yields latent variables that are associated with high cost function values, which means that the optimisation methods are initialised far from the optimal solution and have a greater chance of stopping at a poor local minimum. In Fig. 4b, we display the percentage reduction in reprojection error that enforcing consistency constraints can produce when compared with separate homography estimation.
The results show that one can expect approximately \(23\,\%\) reduction for four homographies, and up to approximately \(30\,\%\) reduction for eight homographies. Even when there were only two planes in the scene, we still observed an approximately \(10\,\%\) reduction in reprojection error. These findings show that upgrading a set of inconsistent homographies to a fully consistent set can yield considerable practical benefits, and they constitute the raison d’être of our algorithm. In Fig. 4c, we present the median running time of BA, WALS, and our method. Our algorithm is orders of magnitude faster than all other options, requiring on average only four iterations to converge. Thanks to this computational efficiency, our algorithm can be incorporated into a random sampling and consensus framework to filter out sets of putative homographies that are mutually incompatible (see Sect. 7.6).

Fig. 3

Comparison of homography estimation methods on synthetic data. The results are based on averaging the reprojection error over all planes in the scene. The DLT and FNS estimators do not enforce consistency constraints, while WALS, AML, and our variant of BA do. The results show that enforcing consistency constraints can result in considerable denoising. The typical performance of AML and BA is virtually indistinguishable. The poor performance of WALS in (b) is due to the instability of the method. On some occasions it converges to a very poor solution with high reprojection error. This adversely affects its mean RMS score.

Fig. 4

Evaluating homography estimation methods on synthetic data by: (a) comparing how often the consistency constraint enforcing methods yield lower reprojection errors than separate FNS homography estimation; (b) quantifying the expected percentage reduction in reprojection error; and (c) measuring their running times

On real data, we evaluate the performance of AML-DLT, WALS-DLT, and BA-DLT against each other by comparing the quality of each estimated homography separately. The comparisons are made by plotting bar graphs of the RMS reprojection error computed from the data,

$$\begin{aligned} \sqrt{ \frac{1}{4N_i} \min _{\underline{\mathbf {m}}_{n,i}} \sum _{n=1}^{N_i} \left( d\left( \mathbf {m}_{n,i}, \underline{\mathbf {m}}_{n,i}\right) ^2 + d\left( \mathbf {m}'_{n,i}, \widehat{\boldsymbol{\Theta }}_i \underline{\mathbf {m}}_{n,i}\right) ^2 \right) }, \end{aligned}$$

where \( \{\{\mathbf {m}_{n,i}, \mathbf {m}'_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I} \) are the original corresponding points and \( \{\{\underline{\mathbf {m}}_{n,i}\} _{n=1}^{N_i}\}_{i=1}^{I} \) are arbitrary 2D points over which the minimum is taken. By plotting the error for each homography separately, it is possible to gain deeper insight into the performance of each method. A comparison on real data is given in Figs. 5, 6, and 7. The results show that WALS is inferior to both AML and BA, frequently displaying much higher errors in at least one of the homographies. On the other hand, our proposed scheme produces results which are often a very good approximation to those of BA.
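To make this error measure concrete, the following Python sketch evaluates the RMS reprojection error numerically for a single homography. The function names and the use of `scipy.optimize.least_squares` for the inner minimisation over the latent 2D points are our own choices for illustration, not part of the implementation described in this paper.

```python
import numpy as np
from scipy.optimize import least_squares

def project(H, u):
    """Apply a homography H to an inhomogeneous 2D point u."""
    q = H @ np.array([u[0], u[1], 1.0])
    return q[:2] / q[2]

def rms_reprojection_error(H, pts, pts_prime):
    """RMS reprojection error for one homography: for each correspondence
    (m, m'), minimise d(m, u)^2 + d(m', H u)^2 over a latent 2D point u,
    sum the optimal values, then normalise and take the square root."""
    total = 0.0
    for m, mp in zip(np.asarray(pts, float), np.asarray(pts_prime, float)):
        def residuals(u, m=m, mp=mp):
            return np.concatenate([m - u, mp - project(H, u)])
        # Initialise the latent point at the measured point m.
        sol = least_squares(residuals, x0=m)
        total += np.sum(sol.fun ** 2)
    return np.sqrt(total / (4.0 * len(pts)))
```

For noiseless correspondences generated by the homography itself, the latent point coincides with the measured point and the error vanishes, which provides a quick sanity check of an implementation.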

Fig. 5

Comparison of consistency enforcing estimation methods on the Model House sequence. The first two columns show the feature points (inliers) associated with various homographies on the model house views. The third column compares the reprojection error of AML-DLT, WALS-DLT, and BA-DLT for each homography. Note that in some figures the errors associated with particular homographies are so close to zero that the bar graphs are barely visible

Fig. 6

Comparison of consistency enforcing estimation methods on the AdelaideRMF sequence. The first two columns show the feature points (inliers) associated with various homographies on the Lady Symon and Old Classics Wing buildings. The third column compares the reprojection error of AML-DLT, WALS-DLT, and BA-DLT for each homography

Fig. 7

Comparison of consistency enforcing estimation methods on the Traffic Signs sequence. The first two columns show the feature points (inliers) associated with various homographies on two traffic scenes. The third column compares the reprojection error of AML-DLT, WALS-DLT, and BA-DLT for each homography. Note that in some cases, the errors associated with particular homographies are so close to zero that the bar graphs are barely visible

We defer a deeper discussion of the results to Sect. 8.


7.6 Application: robust consistent homography estimation

Our experiments have shown that the scope of consistency enforcing algorithms is not limited to synthetic data: when the algorithms are applied to homographies estimated on real-world scenes, full consistency can still be achieved. However, neither AML, nor WALS, nor BA is truly robust to outliers. If consistency constraints are imposed on a collection of homographies in which one of the homographies is an outlier, all of the methods may fail to produce accurate estimates. That is because each method employs a non-robust error measure and relies on an initialisation which, in its current form, is not robust. One possible way to circumvent these robustness issues is to incorporate the consistency constraints directly into the well-known RANSAC estimation loop [24, Sect. 4.7.1], so that the homographies returned by a RANSAC procedure are by construction fully consistent. This can be achieved by a simple modification of the canonical RANSAC procedure, and the pseudo-code for the modification is given in Algorithm 2.

Algorithm 2 makes use of several functions, namely \(f\), \(g\), \(h\), and \(\rho \). The function \(f\) can be any standard method for estimating a single homography, such as DLT or FNS. The function \(g\) serves to determine a set of inliers consistent with a given homography estimate. It utilises a cost function \(\rho \) that computes an error between a point correspondence and a homography, and declares a particular correspondence to be a member of the set of inliers for the given homography estimate if the value of \(\rho \) for that correspondence falls below a user-specified threshold \(t\). Typically, the cost function \(\rho \) is chosen to be the symmetric transfer error. The function \(h\) represents a procedure that takes as input a set of homography matrices together with covariances and produces a new set of fully consistent homographies. In our implementation, we take for \(h\) the AML algorithm proposed in this paper.
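As an illustration of a typical choice for \(\rho \) and of the role of \(g\), here is a minimal Python sketch of the symmetric transfer error and the induced inlier test. The function names are our own, and the sketch assumes the homography matrix is invertible.

```python
import numpy as np

def apply_homography(H, p):
    """Map an inhomogeneous 2D point p through the homography H."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

def symmetric_transfer_error(H, m, m_prime):
    """rho(m, m'; H) = d(m', H m)^2 + d(m, H^{-1} m')^2."""
    H_inv = np.linalg.inv(H)
    e_fwd = np.asarray(m_prime) - apply_homography(H, m)
    e_bwd = np.asarray(m) - apply_homography(H_inv, m_prime)
    return float(e_fwd @ e_fwd + e_bwd @ e_bwd)

def select_inliers(H, correspondences, t):
    """The role of g: keep correspondences whose rho falls below the
    user-specified threshold t."""
    return [(m, mp) for (m, mp) in correspondences
            if symmetric_transfer_error(H, m, mp) < t]
```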

When faced with multiple planar structures in a set of correspondences, researchers and practitioners using standard RANSAC frequently resort to the following strategy: (1) generate candidate homographies by random sampling; (2) from all considered homographies, choose the homography with the largest number of inliers, remove its inliers from the set of correspondences, and repeat the whole process to find the next plane [29, 44]. The essence of Algorithm 2 is the same, except that we modify step (2) so that, from all considered homographies, we choose the homography that has the largest number of inliers and at the same time is fully consistent with any planar structures found so far. We then remove its inliers from the set of correspondences and repeat the whole process to find the next plane.
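The control flow just described can be sketched in Python as follows. This is only a skeleton under our own naming, not the paper's actual Algorithm 2: the callables `f`, `g`, and `h` are supplied by the caller (in the paper, `h` would be the AML upgrade procedure; any stand-in works for this sketch), and the sample size and stopping rules are illustrative placeholders.

```python
import numpy as np

def consistent_sequential_ransac(corrs, f, g, h, n_iters=50,
                                 min_inliers=8, rng=None):
    """Sequentially extract planar structures so that every retained
    homography is consistent with those found before it.
    f: homography estimator for a minimal sample,
    g: inlier selector for a homography,
    h: upgrade of a list of homographies to a fully consistent set."""
    rng = rng or np.random.default_rng(0)
    remaining = list(corrs)
    found, inlier_sets = [], []
    while len(remaining) >= min_inliers:
        best = None
        for _ in range(n_iters):
            idx = rng.choice(len(remaining), size=4, replace=False)
            H = f([remaining[i] for i in idx])
            if H is None:
                continue
            # Upgrade jointly with the structures found so far, so the
            # candidate we score is by construction fully consistent.
            H_c = h(found + [H])[-1]
            inliers = g(H_c, remaining)
            if best is None or len(inliers) > len(best[1]):
                best = (H_c, inliers)
        if best is None or len(best[1]) < min_inliers:
            break
        found.append(best[0])
        inlier_sets.append(best[1])
        kept = {id(c) for c in best[1]}
        remaining = [c for c in remaining if id(c) not in kept]
    return found, inlier_sets
```

In a real implementation, `f` might be DLT or FNS and `g` the symmetric-transfer-error test; the skeleton only fixes the order of operations.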

Since computational efficiency is crucial for algorithms like RANSAC that are intended to be practical, our modification of RANSAC is an excellent example of where one would prefer to impose consistency constraints using our AML algorithm, which is relatively fast, rather than BA, which is slower. As a proof of concept, we show in Fig. 8 the inliers associated with fully consistent homographies on various real data taken from the Oxford dataset, produced by running our modified RANSAC algorithm. The results demonstrate that finding numerous visually convincing inliers associated with fully consistent homographies is feasible in practice.

Fig. 8
figure 8

Inliers associated with fully consistent homographies found using Algorithm 2. The results on real data indicate that finding fully consistent homographies using our AML cost function is tractable

8 Discussion

We have identified several important factors that may help explain the success of our method. Firstly, the error covariance matrices associated with the individual homography estimates are absolutely crucial: without covariance information we are unable to produce meaningful results. Secondly, our scale-invariant cost function captures the projective nature of the parameters that are being estimated, and ensures that the optimisation method always takes a step in a meaningful direction. The scale invariance is not properly accounted for in the method of Chen and Suter, and this partially explains why their method tends to produce worse results. Furthermore, the Levenberg–Marquardt optimisation method is better suited to solving non-linear optimisation problems than, for example, the bilinear approach followed by Chen and Suter.

Our modification of RANSAC serves to emphasise the practical utility of our research outcome. There are, of course, many other modifications to the original RANSAC algorithm, each claiming to be the best. Usually, the modifications are based on improvements to the sampling scheme. Although they may be faster than the original RANSAC, none can produce fully consistent homographies. We leave it to future work to decide which modern variant of RANSAC to couple with our full consistency constraints to achieve maximum computational efficiency.

We also wish to point out that our algorithm need not be incorporated into a sampling procedure such as RANSAC to be of practical use. One could, for example, utilise the recently proposed generalised projection based M-estimator [36] to estimate a set of inconsistent homographies, and then upgrade them to a fully consistent set using our method. Since enforcing consistency constraints leads to a reduction in reprojection error and hence an improvement in the quality of the homographies, it always makes sense to do so.

9 Conclusion

Our principal contribution within this paper has been to formulate the problem of estimating multiple interdependent homographies in a manner that enforces full consistency between all constituent homographies. A key feature of this formulation is the exploitation of latent variables that link together different homographies. The resulting solution strategy has a natural translation into a full-blown BA procedure and into the computationally more efficient AML estimation procedure. Since both BA and AML are non-linear optimisation techniques, they both require a suitable initialisation to produce meaningful results.

In this connection, our second important contribution has been the derivation of a novel compact initialisation procedure which can be applied when there are two or more planes in the scene. In contrast, the initialisation procedure of Chen and Suter is more involved and requires at least three planes before it can be utilised.

The third notable contribution of our work has been to show that our scale-invariant AML cost function frequently produces estimates that are an accurate approximation of BA. Compared to the WALS algorithm of Chen and Suter, our method has the advantage that it consistently converges to a better minimum. This is evidenced by experimental results on both real and synthetic data. Furthermore, we have furnished a derivation of suitable homography covariance matrices for both DLT and FNS estimates, to be found in Appendices A and B, and our final formulas are tailored to facilitate easy implementation.

Drawing on the experimentally confirmed validity of our approach, we have proposed a modification to the celebrated RANSAC algorithm to robustly estimate fully consistent homographies. Since in RANSAC computational efficiency is vital, the modified algorithm serves as a noteworthy example of where our AML scheme is preferable to BA.

Finally, we conclude by pointing out that once recovered, the latent variables encoding the fully consistent homographies can be immediately utilised to provide a projective reconstruction of the scene.

The source code for our experiments can be found at http://sites.google.com/site/szpakz/.