1 Introduction

Estimation of a single homography matrix from image measurements is an important step in 3D reconstruction, mosaicing, camera calibration, metric rectification, and other tasks [24]. For some applications, like non-rigid motion detection [25, 46] or enhanced image warping [20], a whole array of homography matrices is required. The matrices have to be intrinsically interconnected to satisfy consistency constraints representing the rigidity of the motion and the scene. Moreover, the matrices have to be collectively multi-homogeneous—rescaling any individual matrix should not affect the projective information contained in the whole matrix set. A key problem in estimating multiple homography matrices is to enforce the underlying consistency constraints while accounting for the arbitrariness of the individual scales of the matrices. The need to cope with scale indeterminacy has not been particularly emphasised previously, but our study will make apparent that it is an important aspect of the problem.

As a rule, the consistency constraints are available only in implicit form. The conventional approach to cope with such constraints is to evolve a derivative family of explicit constraints. The new constraints are typically more relaxed than the original ones. Adhering to this methodology, Shashua and Avidan [40] have found that homography matrices induced by four or more planes in the 3D scene between two views span a four-dimensional linear subspace. Chen and Suter [4] have derived a set of strengthened constraints for the case of three or more homographies in two views. Zelnik-Manor and Irani [46] have shown that another rank-four constraint applies to a set of so-called relative homographies generated by two planes between four or more views. These latter authors have also derived constraints for larger sets of homographies and views.

Once isolated, the explicit constraints can be put to use in a procedure whereby first individual homography matrices are estimated from image data, and next these matrices are upgraded to matrices satisfying the constraints. Following this pattern, Shashua and Avidan as well as Zelnik-Manor and Irani used low-rank approximation under the Frobenius norm to enforce the rank-four constraint. Chen and Suter also enforced their set of constraints via low-rank approximation, but employed the Mahalanobis norm with the covariances of the input homographies. All these estimation procedures involve input matrices coming with specific scale factors. The underlying error measures are such that a change of scale factors may a priori result in a different set of estimates. Furthermore, the output matrices satisfy only the derivative constraints, so their perfect consistency is not guaranteed. Another limitation of the existing methods is that each requires a certain minimum number of input homography matrices and none can work with only two such matrices.

This paper presents an alternative approach to estimating interdependent homography matrices which ensures that all implicit constraints are enforced and that the final estimates are unaffected by any specific choice of individual scale factors. A statistically motivated cost function is proposed for upgrading, via optimisation, the input set of homography matrices to a set satisfying all possible constraints. The function is scale change insensitive. To achieve high estimation accuracy, it incorporates the covariances of the input matrices, and this yields, upon optimisation, a statistically sound approximated maximum likelihood fit to the data. The utility of the function is demonstrated in a specific application, namely the problem of estimating a set of homography matrices induced by multiple planes in a 3D scene between two views. A variant of the Levenberg–Marquardt algorithm for that problem is developed that is specifically tailored to a parametrisation of the homography matrices via natural underlying latent variables. The use of the parametrisation ensures that all consistency constraints are satisfied. A notable contribution of this work is the development of a procedure for determining initial values of the latent variables, so that the optimisation process seeded with these values converges to a useful local minimum. Importantly, the procedure works already for two input homography matrices, enabling the overall estimation scheme to operate on two or more input matrices. The initialisation procedure is in fact of wider interest, as it is suitable for initialising other methods that operate on the same latent variables, including the canonical bundle adjustment technique for maximum likelihood estimation.
The results contained in the experimental section of the paper validate the approach and show that the proposed estimation method outperforms existing schemes, achieving accuracy on par with the ‘gold-standard’ bundle adjustment technique. While the newly introduced method, like all other aforementioned methods, is not truly robust to outliers, it can—as it turns out—be made robust via incorporation into a bigger robust fitting scheme. A particular scheme of this kind forms a final contribution of the paper, and one of practical utility. It is a modification of the well-known random sample consensus (RANSAC) technique specialised to facilitate robust fitting of fully consistent homographies to data with outliers. The proposed estimation method enters the modified RANSAC as a computationally efficient tool for generating fully consistent homographies.

Earlier results informing this work appeared in [15]. The present findings are part of a broader, ongoing study, one of whose recent contributions is [42].

2 Multi-projective parameter estimation and latent variables

We first formulate the problem of estimating a set of interdependent homographies as a problem in multi-projective parameter estimation [14]. A general multi-projective parameter estimation problem involves a collection \(\mathbf {X}_1, \dots , \mathbf {X}_I\) of \(k \times l\) matrices envisaged as data points, and a collection \({\varvec{\mathsf {\Theta }}}_1, \dots , {\varvec{\mathsf {\Theta }}}_I\) of \(k \times l\) matrices treated as parameters. Each \(\mathbf {X}_i\) is assumed to be known only up to an individual multiplicative non-zero factor. The \({\varvec{\mathsf {\Theta }}}_i\)’s are subject to constraints and are meant to represent improved versions of the \(\mathbf {X}_i\)’s. With the \(k \times Il\) matrix \({\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I]\) denoting the composite datum and the \(k \times Il\) matrix \({\varvec{\mathsf {\Theta }}} = [{\varvec{\mathsf {\Theta }}}_1, \dots , {\varvec{\mathsf {\Theta }}}_I]\) denoting the composite parameter, the problem under consideration is to fit \({\varvec{\mathsf {\Theta }}}\) to \({\varvec{\mathsf {X}}}\) so that the constraints on \({\varvec{\mathsf {\Theta }}}\) are met. Exemplifying this general problem is the following specific problem of interest:

Problem 1

Fit a set of \(3 \times 3\) matrices, representing planar homographies engendered by various planes in a 3D scene under common projections on two images, to a given set of \(3 \times 3\) matrices.

To see how the multi-projective framework applies here, consider a pair of fixed cameras with camera matrices \(\mathbf {K}_{1}\mathbf {R}_{1}[\mathbf {I}_{3}, -\mathbf {t}_{1}]\) and \(\mathbf {K}_{2}\mathbf {R}_{2}[\mathbf {I}_{3}, -\mathbf {t}_{2} ]\). Here, the length-3 translation vector \(\mathbf {t}_{n}\) and the \(3 \times 3\) rotation matrix \(\mathbf {R}_{n}\) represent the Euclidean transformation between the \(n\)th (\(n = 1,2\)) camera and the world coordinate system, \(\mathbf {K}_{n}\) is a \(3 \times 3\) upper triangular calibration matrix encoding the internal parameters of the \(n\)th camera, and, for each \(m = 1, 2, \dots \), \(\mathbf {I}_m\) denotes the \(m \times m\) identity matrix. Suppose, moreover, that a set of \(I\) planes in a 3D scene have been selected. Given \(i = 1, \dots , I\), let the \(i\)th plane from the collection have a unit outward normal \(\mathbf {n}_i\) and be situated at a distance \(d_i\) from the origin of the world coordinate system. Then, for each \(i = 1, \dots , I\), the \(i\)th plane gives rise to a planar homography between the first and second views described by the \(3 \times 3\) matrix

$$\begin{aligned} \mathbf {H}_i = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }, \end{aligned}$$
(1)

where

$$\begin{aligned} \mathbf {A}&=\mathbf {K}_{2}\mathbf {R}_{2}\mathbf {R}_{1}^{-1}\mathbf {K}_{1}^{-1}, \quad w_{i} = \mathbf {n}_{i}^{\top }\mathbf {t}_{1} - d_{i}, \nonumber \\ \mathbf {b}&=\mathbf {K}_{2}\mathbf {R}_{2}(\mathbf {t}_{1} - \mathbf {t}_{2}), \quad \mathbf {v}_{i} =\mathbf {K}^{-\top }_{1} \mathbf {R}_{1} \mathbf {n}_{i}. \end{aligned}$$
(2)

We note that in the case of calibrated cameras when one may assume that \(\mathbf {K}_1 = \mathbf {K}_2 = \mathbf {I}_3\), \(\mathbf {t}_1 = \mathbf {0}\), \(\mathbf {R}_1 = \mathbf {I}_3\), \(\mathbf {R}_2 = \mathbf {R}\), system (2) reduces to

$$\begin{aligned} \mathbf {A}&=\mathbf {R}, \quad w_{i} = - d_{i}, \nonumber \\ \mathbf {b}&= \mathbf {t}, \quad \mathbf {v}_{i} = \mathbf {n}_{i}, \end{aligned}$$
(3)

with \(\mathbf {t} = - \mathbf {R}\mathbf {t}_2\), and equality (1) becomes the familiar direct nRt representation

$$\begin{aligned} \mathbf {H}_i = -d_i \mathbf {R} + \mathbf {t} \mathbf {n}_i^\top \end{aligned}$$

(cf. [2, 33]). We stress that all of our subsequent analysis concerns the general case of possibly uncalibrated cameras, with \(\mathbf {A}\), \(\mathbf {b}\), \(w_i\)’s and \(\mathbf {v}_i\)’s to be interpreted according to (2) rather than (3).
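As a quick numerical sanity check of (1) and (2), the following sketch (Python with NumPy) verifies that \(\mathbf {H}_i\) maps view-1 projections of plane points to their view-2 counterparts. All camera parameters are illustrative; for simplicity the world origin is placed at the first camera centre (\(\mathbf {t}_1 = \mathbf {0}\)), and plane points are taken to satisfy \(\mathbf {n}^\top \mathbf {X} + d = 0\), the sign convention implied by the calibrated reduction (3).

```python
import numpy as np

# Illustrative camera set-up; none of these values come from the paper.
K1 = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
K2 = np.array([[750., 0., 300.], [0., 750., 260.], [0., 0., 1.]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1., 0., 0.], [0., c, -s], [0., s, c]])

R1, R2 = rot_x(0.05), rot_x(0.12)
t1, t2 = np.zeros(3), np.array([0.4, -0.1, 0.2])

n, d = np.array([0., 0., 1.]), 5.0       # plane: n.X + d = 0

# Quantities (2).
A = K2 @ R2 @ np.linalg.inv(R1) @ np.linalg.inv(K1)
w = n @ t1 - d
b = K2 @ R2 @ (t1 - t2)
v = np.linalg.inv(K1).T @ R1 @ n

H = w * A + np.outer(b, v)               # homography (1)

# A point X on the plane, projected into both views (homogeneous).
X = np.array([1.0, -2.0, -5.0])          # satisfies n.X = -d
m1 = K1 @ R1 @ (X - t1)
m2 = K2 @ R2 @ (X - t2)

# H maps the view-1 projection to the view-2 projection up to scale;
# with this set-up the proportionality factor is exactly w.
residual = np.linalg.norm(H @ m1 - w * m2)
```

The check is exact up to floating-point error, since \(\mathbf {H}\mathbf {m}_1 = w\,\mathbf {m}_2\) holds identically for plane points under the stated conventions.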

Let \( {\varvec{\mathsf {H}}} = [\mathbf {H}_1, \dots , \mathbf {H}_I] \) be the composite of all the homography matrices in question. Then, with \(\mathbf {a} = {{\mathrm{vec}}}(\mathbf {A})\), where \({{\mathrm{vec}}}\) denotes column-wise vectorisation [32], \( \varvec{\eta }= [\mathbf {a}^\top , \mathbf {b}^\top , \mathbf {v}_1^\top , \dots , \mathbf {v}_I^\top , w_1, \dots , w_I]^\top , \) and

$$\begin{aligned} \varvec{\Pi }_i(\varvec{\eta }) = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top } \quad (i = 1, \dots , I), \end{aligned}$$
(4)

\({\varvec{\mathsf {H}}}\) can be represented as

$$\begin{aligned} {\varvec{\mathsf {H}}} = {\varvec{\mathsf {\Pi }}}(\varvec{\eta }) = [\varvec{\Pi }_1(\varvec{\eta }), \dots , \varvec{\Pi }_I(\varvec{\eta })]. \end{aligned}$$
(5)

The components of \(\varvec{\eta }\) constitute latent variables that link all the matrices \(\varvec{\Pi }_i(\varvec{\eta })\) together and provide a natural parametrisation of the set of all \({\varvec{\mathsf {H}}}\)’s. Since \(\varvec{\eta }\) has a total of \(4I + 12\) entries, the set of all matrices of the form \({\varvec{\mathsf {\Pi }}}(\varvec{\eta })\) has dimension no greater than \(4I + 12\). A more refined argument shows that the set of all \({\varvec{\mathsf {H}}}\)’s has in fact dimension equal to \(4I + 7\) [12, 13]. Since \(4I + 7 < 9I\) whenever \(I \ge 2\), it follows that \({\varvec{\mathsf {H}}}\) resides in a proper subset of all \(3 \times 3I\) matrices for \(I \ge 2\). Thus, the requirement that \({\varvec{\mathsf {H}}}\) take the form \({\varvec{\mathsf {\Pi }}}(\varvec{\eta })\) as per (5) whenever \(I \ge 2\) can be seen as an implicit constraint on \({\varvec{\mathsf {H}}}\), with the consequence that the \(\mathbf {H}_i\)’s are all interdependent. Suppose that an estimate \( {\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I] \) of \({\varvec{\mathsf {H}}}\) has been generated in some way. For example, for each \(i\), \(\mathbf {X}_i\) might be an estimate of \(\mathbf {H}_i\) individually obtained from image data. The estimation problem at hand is to upgrade \({\varvec{\mathsf {X}}}\) to \({\varvec{\mathsf {\Theta }}}\) so that \({\varvec{\mathsf {\Theta }}} = {\varvec{\mathsf {\Pi }}}(\varvec{\eta })\) holds for some \(\varvec{\eta }\) and \({\varvec{\mathsf {\Theta }}}\) is close to \({\varvec{\mathsf {X}}}\) in a meaningful sense. The essence here is to find a criterion and effective means for selecting an appropriate \(\varvec{\eta }\).
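The parametrisation just described is easy to exercise numerically. The sketch below (NumPy; all names and values are illustrative) assembles a random \(\varvec{\eta }\), checks that it has \(4I + 12\) entries, and reconstitutes the matrices \(w_i \mathbf {A} + \mathbf {b}\mathbf {v}_i^\top \) from it.

```python
import numpy as np

rng = np.random.default_rng(1)
I = 4                                     # number of planes (illustrative)

A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
vs = [rng.standard_normal(3) for _ in range(I)]
ws = rng.standard_normal(I)

# eta = [a^T, b^T, v_1^T, ..., v_I^T, w_1, ..., w_I]^T, a = vec(A).
eta = np.concatenate([A.flatten(order='F'), b, *vs, ws])
assert eta.size == 4 * I + 12             # 9 + 3 + 3I + I entries

def Pi(eta, i, I):
    """Recover the i-th homography matrix w_i A + b v_i^T from eta."""
    A = eta[:9].reshape(3, 3, order='F')  # undo column-wise vec
    b = eta[9:12]
    v = eta[12 + 3 * i: 15 + 3 * i]
    w = eta[12 + 3 * I + i]
    return w * A + np.outer(b, v)

# Composite parameter: the 3 x 3I matrix [Pi_1(eta), ..., Pi_I(eta)].
H_comp = np.hstack([Pi(eta, i, I) for i in range(I)])
assert H_comp.shape == (3, 3 * I)
assert np.allclose(Pi(eta, 0, I), ws[0] * A + np.outer(b, vs[0]))
```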

3 Approximate maximum likelihood cost function and scale invariance

The general problem of fitting \({\varvec{\mathsf {\Theta }}}\) to \({\varvec{\mathsf {X}}}\) with constraints imposed on \({\varvec{\mathsf {\Theta }}}\) is best considered as an optimisation problem. Since the input matrices are known only up to individual scales, the output matrices should also be determined only to within individual scales. This can be achieved through the use of multi-homogeneous cost functions. A function \(J\) is multi-homogeneous if

$$\begin{aligned} J(\times \varvec{\lambda }{\varvec{\mathsf {\Theta }}}) = J({\varvec{\mathsf {\Theta }}}) \end{aligned}$$

for each length-\(I\) vector \(\varvec{\lambda }= [\lambda _1, \dots , \lambda _I]^\top \) with non-zero entries, where \(\times \varvec{\lambda }{\varvec{\mathsf {\Theta }}} = [\lambda _1 {\varvec{\mathsf {\Theta }}}_1, \dots , \lambda _I {\varvec{\mathsf {\Theta }}}_I]\). It is clear that if a function is multi-homogeneous, then it is minimised not only at a single \({\varvec{\mathsf {\Theta }}}\), but also at all composite multiples \(\times \varvec{\lambda }{\varvec{\mathsf {\Theta }}}\).

To describe a multi-homogeneous cost function relevant to our problem, for each \(i = 1, \dots , I\), let

$$\begin{aligned} \varvec{\theta }_i = {{\mathrm{vec}}}({\varvec{\mathsf {\Theta }}}_i), \quad \mathbf {x}_i = {{\mathrm{vec}}}(\mathbf {X}_i), \end{aligned}$$

with each vector having length \(kl\). Referring to the \(\mathbf {X}_i\)’s via their vectorisations, suppose that associated with each \(\mathbf {x}_i\) is a \(kl \times kl\) covariance matrix \(\varvec{\Lambda }_{\mathbf {x}_i}\). Our cost function will incorporate the \(\varvec{\Lambda }_{\mathbf {x}_i}\)’s, assuming that they all take a specific form that we elaborate on next.

If the parameters of a model are represented by a vector to within a scale factor, then the covariance matrix of any particular estimate of the parameter vector is not uniquely determined. To see why, recall first that covariances are averages of squared perturbations of specific instantiations of a model. If the model is over-parametrised, involving some redundant parameters like an indeterminate scale, then the parameter vector is not identified by the model and perturbations in the model space do not translate unequivocally into perturbations in the parameter space. A way of dealing with this problem is to restrict the parameter space—that is, to identify the model—by imposing equality constraints. Such an imposition is known as gauge fixing, with the term “gauge” referring to any particular set of constraints. The covariances evolved with the aid of gauge-fixing rules are gauge dependent—they can look very different for different gauges [27, 43]. In what follows, we eliminate the scale indeterminacy by imposing the normalisation constraint \(\Vert \mathbf {x} \Vert = 1\) (see Fig. 1). Under this condition, the covariance matrix of an estimate \(\mathbf {x}\) is such that \( \varvec{\Lambda }_{\mathbf{x}} = \varvec{\Lambda }_{\pm \Vert \mathbf {x} \Vert ^{-1}\mathbf {x}} \) and, moreover, \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{} = \mathbf {P}_{\mathbf {x}}^\perp \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{0} \mathbf {P}_{\mathbf {x}}^\perp , \) where \(\mathbf {P}_{\mathbf {x}}^\perp \) is the \(kl \times kl\) symmetric projection matrix given by \( \mathbf {P}_{\mathbf {x}}^\perp = \mathbf {I}_{kl} - \Vert \mathbf {x} \Vert ^{-2} \mathbf {x} \mathbf {x}^{\top } \) and \(\varvec{\Lambda }^{0}_{\mathbf{x}}\) is a \(kl \times kl\) symmetric matrix that we shall refer to as a pre-covariance matrix. 
An argument leading to the above assertion, together with explicit expressions for \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{0}\) in two specific cases, can be found in Appendices A and B; see also [3, 28]. As \( \mathbf {P}_{\mathbf {x}}^\perp \mathbf {x} = \mathbf {0} \) and \( \mathbf {x}^{\top } \mathbf {P}_{\mathbf {x}}^\perp = \mathbf {0}^{\top }, \) the matrix \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\) satisfies \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{} \mathbf {x} = \mathbf {0} \quad \text {and} \quad \mathbf {x}^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{} = \mathbf {0}^{\top }, \) and, in particular, is singular. In line with this, the companion information matrix associated with \(\mathbf {x}\), given by the Moore–Penrose pseudo-inverse \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{+}\) of \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\), also satisfies \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{+} \mathbf {x} = \mathbf {0} \quad \text {and} \quad \mathbf {x}^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{+} = \mathbf {0}^{\top }, \) and is singular.
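The two annihilation identities are easy to confirm numerically. In the sketch below (NumPy, with a random symmetric matrix standing in for the pre-covariance \(\varvec{\Lambda }^{0}_{\mathbf {x}}\)), both the gauge-fixed covariance and its pseudo-inverse annihilate \(\mathbf {x}\).

```python
import numpy as np

rng = np.random.default_rng(2)
kl = 9

x = rng.standard_normal(kl)
x /= np.linalg.norm(x)                    # gauge: ||x|| = 1

M = rng.standard_normal((kl, kl))
Lambda0 = M @ M.T                         # symmetric stand-in pre-covariance

# P_x_perp = I - x x^T (using ||x|| = 1), the orthogonal projection
# onto the tangent space of the unit sphere at x.
P = np.eye(kl) - np.outer(x, x)
Lam = P @ Lambda0 @ P                     # gauge-fixed covariance

assert np.allclose(Lam @ x, 0)            # Lambda_x x = 0
Lam_pinv = np.linalg.pinv(Lam)            # companion information matrix
assert np.allclose(Lam_pinv @ x, 0, atol=1e-6)   # Lambda_x^+ x = 0
```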

Fig. 1

Resolving scale and sign indeterminacy in the derivation of homography covariances by imposing the normalisation constraint \(\Vert \mathbf {x} \Vert = 1\). The antipodal points \(\mathbf {x}\) and \(\mathbf {-x}\) encode one and the same homography. The true perturbations \(\Delta \mathbf {x}\) and \(-\Delta \mathbf {x}\) of the homography are approximated to the first order by their respective images \(\Delta \mathbf {x}'\) and \(-\Delta \mathbf {x}'\) via the orthogonal projections onto the tangent space to the unit sphere at \(\mathbf {x}\) and \(-\mathbf {x}\). Information about the spread of varying \(\Delta \mathbf {x}\) is carried by the restriction of the covariance matrix \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\) to the tangent space at \(\mathbf {x}\). The restriction of \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}}^{}\) to the orthogonal complement of the tangent space at \(\mathbf {x}\), which is the space spanned by \(\mathbf {x}\), vanishes and carries no information about the spread of the \(\Delta \mathbf {x}\)’s. Likewise for \(\varvec{\Lambda }_{-\mathbf{x}}\). Since the spread of the \(\Delta \mathbf {x}\)’s is the same as the spread of the \(-\Delta \mathbf {x}\)’s, it follows that \(\varvec{\Lambda }_\mathbf{x} = \varvec{\Lambda }_{-\mathbf{x}}\)

We take for our approximate maximum likelihood (AML) cost function the squared Mahalanobis distance between any aggregate \(\{\epsilon _i \Vert \varvec{\theta }_i \Vert ^{-1} \varvec{\theta }_i\}_{i=1}^I\), \(\epsilon _i = \pm 1\), of normalised, arbitrarily signed variants of the \(\varvec{\theta }_i\)’s and any aggregate \(\{\epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i\}_{i=1}^I\), \(\epsilon _i^{\prime } = \pm 1\), of similar variants of the \(\mathbf {x}_i\)’s, with the matrices \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+}\) serving as weights:

$$\begin{aligned} J_{\mathrm {AML}}({\varvec{\mathsf {\Theta }}})= & {} \sum _{i=1}^I \left( \epsilon _i \Vert \varvec{\theta }_i \Vert ^{-1} \varvec{\theta }_i - \epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i\right) ^\top \\&\times \varvec{\mathbf {\Lambda }}_{\epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i}^{+} \left( \epsilon _i \Vert \varvec{\theta }_i \Vert ^{-1} \varvec{\theta }_i - \epsilon _i^{\prime } \Vert \mathbf {x}_i \Vert ^{-1} \mathbf {x}_i\right) . \end{aligned}$$

On account of \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \mathbf {x}_i = \mathbf {0}, \) \( \mathbf {x}_i^\top \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} = \mathbf {0}^\top , \) and \( \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{} = \varvec{\mathbf {\Lambda }}_{\pm \Vert \mathbf {x}_i \Vert ^{-1}\mathbf {x}_i}^{}, \) the expression for the function does not depend on any particular choice of the signs \(\epsilon _i\) and \(\epsilon _i^{\prime }\) and reduces to

$$\begin{aligned} J_{\mathrm {AML}}({\varvec{\mathsf {\Theta }}}) = \sum _{i=1}^I \Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \varvec{\theta }_i. \end{aligned}$$

Note that, for each \(i = 1, \dots , I\), multiplying \(\varvec{\theta }_i\) by a non-zero scalar \(\lambda _i\) results in \(\varvec{\theta }_i^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \varvec{\theta }_i\) being multiplied by \(\lambda _i^2\) and \(\Vert \varvec{\theta }_i \Vert ^{-2}\) being multiplied by \(\lambda _i^{-2}\), so that \(\Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i^{\top } \varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{+} \varvec{\theta }_i\) remains intact. As a result, the AML function is multi-homogeneous.
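Multi-homogeneity of \(J_{\mathrm {AML}}\) can likewise be checked directly. The sketch below (NumPy; the data vectors and pre-covariances are synthetic stand-ins) evaluates the cost before and after rescaling each \(\varvec{\theta }_i\) by an arbitrary non-zero factor.

```python
import numpy as np

rng = np.random.default_rng(3)
I, kl = 3, 9

# Build weights Lambda_xi^+ from random unit data vectors, following
# the gauge-fixed covariance construction of Sect. 3.
Lams_pinv = []
for _ in range(I):
    x = rng.standard_normal(kl)
    x /= np.linalg.norm(x)
    M = rng.standard_normal((kl, kl))
    P = np.eye(kl) - np.outer(x, x)
    Lams_pinv.append(np.linalg.pinv(P @ (M @ M.T) @ P))

def J_AML(thetas):
    # sum_i ||theta_i||^-2 theta_i^T Lambda_xi^+ theta_i
    return sum((t @ Lp @ t) / (t @ t)
               for t, Lp in zip(thetas, Lams_pinv))

thetas = [rng.standard_normal(kl) for _ in range(I)]
lams = [2.0, -0.5, 7.3]                   # arbitrary non-zero scales
J1 = J_AML(thetas)
J2 = J_AML([l * t for l, t in zip(lams, thetas)])
assert np.isclose(J1, J2)                 # cost is multi-homogeneous
```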

The label “approximate” in the name of \(J_{\mathrm {AML}}\) refers to the fact that \(J_{\mathrm {AML}}\) is an approximation—to within an additive constant—of another maximum likelihood-based cost function, namely the exact maximum likelihood (EML) cost function that underpins the bundle adjustment method for estimating the \(\varvec{\theta }_i\)’s directly from image data. The EML cost function, encompassing the so-called reprojection error [24, Sect. 4.2.3], has for its composite argument the principal parameters \(\varvec{\theta }_i\), \(i = 1, \dots , I\), alongside some additional nuisance parameters. The approximation leading from the EML cost function to the AML cost function involves, among other things, the elimination of the nuisance parameters (cf. [8, 10, 11, 26, 31, 35]).

With the significance of the AML cost function elucidated, when one now takes into consideration the constraints on \({\varvec{\mathsf {\Theta }}}\), the corresponding constrained minimiser of \(J_{\mathrm {AML}}\) can be viewed as a statistically well-founded estimate of \({\varvec{\mathsf {H}}}\).

4 Rank-four constraint enforcement

This section will depart from the primary topic of the paper, which is the upgrading of multiple homographies using the AML cost function, to shed light on an earlier multi-homography updating technique commonly known as rank-four constraint enforcement. The material here is fairly technical and can be skipped without detriment to the understanding of the rest of the paper. The purpose of the departure is to reconcile an apparent contradiction: while we stress the importance of explicitly modelling the estimation process within a multi-projective framework, rank-four constraint enforcement—which is known to work—appears at first glance to be outside of this framework. The principle underpinning rank-four constraint enforcement is that a collection of five or more interrelated homography matrices resides in an at most four-dimensional subspace. In its standard form, the technique enforces the rank-four constraint linearly via a singular value decomposition (SVD) based projection on a linear space of lower dimension. The standard form can naturally be extended to a whole family of weighted variants. It is not immediately clear how all of these, including the standard form, can fit into the category of multi-projective techniques. Settling this question seems necessary, given the multi-projective standpoint advocated in this paper. Below, we reveal that all versions of the method, despite their linear allure, are in fact fully compatible with the multi-projectiveness paradigm.

Given a composite of homography matrices \( {\varvec{\mathsf {H}}} = [\mathbf {H}_1, \dots , \mathbf {H}_I] \) satisfying (5), let \(\mathbf {H}\) be the \(9 \times I\) matrix given by

$$\begin{aligned} \mathbf {H} = [\mathbf {h}_1, \dots , \mathbf {h}_I], \quad \mathbf {h}_i = {{\mathrm{vec}}}(\mathbf {H}_i). \end{aligned}$$
(6)

Since, for each \(i = 1, \dots , I\),

$$\begin{aligned} \mathbf {h}_i = w_i {{\mathrm{vec}}}(\mathbf {A}) + {{\mathrm{vec}}}(\mathbf {b} \mathbf {v}_i^{\top }) = w_i \mathbf {a} + (\mathbf {I}_3 \otimes \mathbf {b})\mathbf {v}_i, \end{aligned}$$
(7)

where \(\otimes \) denotes Kronecker product [32], it follows that

$$\begin{aligned} \mathbf {H} = \mathbf {S} \mathbf {T}, \end{aligned}$$
(8)

where \(\mathbf {S}\) is the \(9 \times 4\) matrix given by

$$\begin{aligned} \mathbf {S} = [\mathbf {I}_3 \otimes \mathbf {b}, \mathbf {a}] \end{aligned}$$

and \(\mathbf {T}\) is the \(4 \times I\) matrix given by

$$\begin{aligned} \mathbf {T} = \begin{bmatrix} \mathbf {v}_1&\dots&\mathbf {v}_I \\ w_1&\dots&w_I \end{bmatrix}. \end{aligned}$$

An immediate consequence of (8) is that \(\mathbf {H}\) has rank at most four. The requirement that \(\mathbf {H}\) should have rank no greater than four places a genuine constraint on \(\mathbf {H}\), and hence also on \({\varvec{\mathsf {H}}}\), whenever \(I \ge 5\). This constraint is exactly the rank-four constraint of Shashua and Avidan mentioned in the Introduction. In accordance with what has already been pointed out, when \(I \ge 5\), the rank-four constraint can be enforced linearly, by employing SVD, in an infinite number of ways. Specifically, with every length-\(I\) vector \(\mathbf {p} = [p_1, \dots , p_I]^\top \) with non-zero entries, each having the meaning of an importance weight, there is associated a specific enforcement procedure. It needs to be immediately added that the procedures associated with different \(\mathbf {p}\) are not necessarily different. If \(\mathbf {p}\) and \(\mathbf {p}'\) are proportional up to individual signs of the entries of \(\mathbf {p}\) and \(\mathbf {p}'\), that is, if \(p_i = \lambda \epsilon _i p_i'\) with \(\epsilon _i = \pm 1\) for some \(\lambda \ne 0\) and all \(i = 1, \dots , I\), then the corresponding procedures are equivalent in the sense that the sets of respective output homographies (but not necessarily the sets of output homography matrices) are equal. If \(\mathbf {p}\) and \(\mathbf {p}'\) are not proportional up to individual signs of the entries of \(\mathbf {p}\) and \(\mathbf {p}'\), then the respective procedures are, as a rule, different.
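The factorisation (8) and the resulting rank bound can be illustrated as follows (NumPy; all latent quantities are drawn at random).

```python
import numpy as np

rng = np.random.default_rng(4)
I = 6                         # I >= 5, so the rank bound is a genuine constraint

A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
a = A.flatten(order='F')                       # a = vec(A), column-wise

# S = [I_3 kron b, a] is 9 x 4; T stacks the v_i's over the w_i's (4 x I).
S = np.hstack([np.kron(np.eye(3), b.reshape(3, 1)), a.reshape(9, 1)])
T = np.vstack([rng.standard_normal((3, I)),    # rows 1-3: v_1, ..., v_I
               rng.standard_normal((1, I))])   # row 4:    w_1, ..., w_I

H = S @ T                                      # equation (8), 9 x I

# Column i equals vec(w_i A + b v_i^T), cf. (7), and rank(H) <= 4.
i = 2
vi, wi = T[:3, i], T[3, i]
assert np.allclose(H[:, i], (wi * A + np.outer(b, vi)).flatten(order='F'))
assert np.linalg.matrix_rank(H) <= 4
```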

Before describing the enforcement procedures in detail, we shall need to identify the inverse mapping \(\varvec{r}\) to the mapping \({\varvec{\mathsf {H}}} \mapsto \mathbf {H}\) defined in (6). It is readily verified that \({\varvec{\mathsf {H}}}\) can be expressed in terms of \(\mathbf {H}\) as

$$\begin{aligned} {\varvec{\mathsf {H}}} = \varvec{r}(\mathbf {H}) = ({{\mathrm{vec}}}(\mathbf {H}))^{(3)} \end{aligned}$$

Here, given an \(n \times m\) matrix \(\mathbf {S}\) and a positive integer \(r\) that divides \(n\), \(\mathbf {S}^{(r)}\) denotes the \(r\)-wise vector transposition of \(\mathbf {S}\), that is, the \(r \times (nm/r)\) matrix obtained by performing a block transposition on \(\mathbf {S}\), with blocks comprising length-\(r\) column vectors [19]. In MATLAB parlance,

$$\begin{aligned} {\varvec{\mathsf {H}}} = \varvec{r}(\mathbf {H}) = \mathtt {reshape}(\mathbf {H}, 3, 3I). \end{aligned}$$
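In NumPy, MATLAB's column-major `reshape` corresponds to `order='F'`, so the mapping \(\varvec{r}\) and its inverse can be sketched as follows (illustrative values).

```python
import numpy as np

rng = np.random.default_rng(5)
I = 4

Hs = [rng.standard_normal((3, 3)) for _ in range(I)]   # H_1, ..., H_I
comp = np.hstack(Hs)                                   # 3 x 3I composite

# Forward mapping (6): columns are the column-wise vectorisations h_i.
H = np.column_stack([Hi.flatten(order='F') for Hi in Hs])   # 9 x I

# Inverse mapping r(H) = reshape(H, 3, 3I), MATLAB-style (column-major).
comp_back = H.reshape(3, 3 * I, order='F')
assert np.allclose(comp_back, comp)
```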

With these preparations in place, given \( {\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I] \) with \(I \ge 5\), let \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\) be the \(9 \times I\) matrix given by

$$\begin{aligned} \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) = [p_1 \Vert \mathbf {x}_1 \Vert ^{-1} \mathbf {x}_1, \dots , p_I \Vert \mathbf {x}_I \Vert ^{-1} \mathbf {x}_I]. \end{aligned}$$

Let \( \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) = \mathbf {U} \mathbf {D} \mathbf {V}^\top \) be the SVD of \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\), with \(\mathbf {D}\) being a \(9 \times I\) diagonal matrix with main diagonal entries \(d_{11}, \dots , d_{qq}\), \(q = \min (9,I)\), and, correspondingly, let \( \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})_4 = \mathbf {U} \mathbf {D}_4 \mathbf {V}^\top \) be the \(4\)-truncated SVD of \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\), with \(\mathbf {D}_4\) resulting from \(\mathbf {D}\) by replacing the entries \(d_{55}, \dots , d_{qq}\) by zero. Define the rank-four correction of \({\varvec{\mathsf {X}}}\) by

$$\begin{aligned} {\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}} = \varvec{r}\left( \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})_4\right) . \end{aligned}$$

The procedure whereby \({\varvec{\mathsf {X}}}\) gets upgraded to \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) constitutes the SVD-based method for enforcing the rank-four constraint associated with \(\mathbf {p}\). The default version of the method corresponds to \(\mathbf {p}\) having all entries equal to \(1\).
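A compact sketch of the whole enforcement procedure might read as follows (NumPy; the input homography matrices are synthetic noisy perturbations of a consistent set, and all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
I = 6

def rank4_correction(Xs, p):
    """Xs: list of I 3x3 matrices; p: length-I weights (non-zero)."""
    # F_p(X): normalise each vectorised matrix and weight it by p_i.
    F = np.column_stack([
        pi * Xi.flatten(order='F') / np.linalg.norm(Xi)
        for pi, Xi in zip(p, Xs)
    ])                                        # 9 x I
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    s[4:] = 0.0                               # 4-truncated SVD
    F4 = U @ np.diag(s) @ Vt
    return F4.reshape(3, 3 * I, order='F')    # r(F_p(X)_4)

# Consistent homographies w_i A + b v_i^T, plus noise, as input.
A, b = rng.standard_normal((3, 3)), rng.standard_normal(3)
Xs = [rng.standard_normal() * A + np.outer(b, rng.standard_normal(3))
      + 0.01 * rng.standard_normal((3, 3)) for _ in range(I)]

Theta = rank4_correction(Xs, np.ones(I))      # default p = (1, ..., 1)

# The stacked output satisfies the rank-four constraint.
stack = Theta.reshape(9, I, order='F')
assert np.linalg.matrix_rank(stack, tol=1e-10) <= 4
```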

We claim that, for any \(\mathbf {p}\), the mapping that sends \({\varvec{\mathsf {X}}}\) to \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) has the following property: If \({\varvec{\mathsf {X}}}\) is replaced by \(\times \varvec{\lambda }{\varvec{\mathsf {X}}}\) with \(\varvec{\lambda }\) being a length-\(I\) vector with non-zero entries, then \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) is replaced by \(\times \varvec{\sigma }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) for some length-\(I\) vector \(\varvec{\sigma }\) that depends only on \(\varvec{\lambda }\). This property implies that the homographies described by the \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p},i}\)’s are well defined as functions of the homographies described by the \(\mathbf {X}_i\)’s, providing thereby a desired reconciliation with the multi-projectiveness paradigm.

It is clear that \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}})\) is multi-positively invariant: if \(\varvec{\lambda }= [\lambda _1, \dots , \lambda _I]^\top \) has all entries positive, then \( \mathcal {F}_{\mathbf {p}}(\times \varvec{\lambda }{\varvec{\mathsf {X}}}) = \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}). \) Consequently, \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) is also multi-positively invariant. We shall next show that \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) is multi-sign equivariant: if \(\varvec{\epsilon }= [\epsilon _1, \dots , \epsilon _I]^\top \) is such that \(\epsilon _i = \pm 1\) for each \(i = 1, \dots , I\), then

$$\begin{aligned} {\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}(\times \varvec{\epsilon }{\varvec{\mathsf {X}}}) = \times \varvec{\epsilon }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}}). \end{aligned}$$
(9)

To this end, we introduce \( \mathbf {E} = {{\mathrm{diag}}}(\epsilon _1, \dots , \epsilon _I). \) Note that \(\mathbf {E}^\top = \mathbf {E}\) and \(\mathbf {E}^2 = \mathbf {I}_I\), so in particular \(\mathbf {E}\) is orthogonal: \(\mathbf {E}\mathbf {E}^\top = \mathbf {I}_I\). Note also that

$$\begin{aligned} \mathcal {F}_{\mathbf {p}}(\times \varvec{\epsilon }{\varvec{\mathsf {X}}}) = \left[ p_1 \epsilon _1\Vert \mathbf {x}_1 \Vert ^{-1} \mathbf {x}_1, \dots , p_I \epsilon _I\Vert \mathbf {x}_I \Vert ^{-1} \mathbf {x}_I\right] = \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E}. \end{aligned}$$

Clearly,

$$\begin{aligned} \mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E} = \mathbf {U} \mathbf {D} \mathbf {V}^\top \mathbf {E} = \mathbf {U} \mathbf {D} \mathbf {V}^\top \mathbf {E}^\top = \mathbf {U} \mathbf {D} (\mathbf {E} \mathbf {V})^\top . \end{aligned}$$

As \(\mathbf {V}\) and \(\mathbf {E}\) are both orthogonal, \(\mathbf {E} \mathbf {V}\) is orthogonal too, and so \( \mathbf {U} \mathbf {D} (\mathbf {E} \mathbf {V})^\top \) is the SVD of \(\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E}\). Consequently,

$$\begin{aligned} (\mathcal {F}_{\mathbf {p}}({\varvec{\mathsf {X}}}) \mathbf {E})_4 = \mathbf {U} \mathbf {D}_4 (\mathbf {E} \mathbf {V})^{\top }. \end{aligned}$$

Taking into account that \( \mathbf {U} \mathbf {D}_4 (\mathbf {E} \mathbf {V})^\top = \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top } \mathbf {E}^\top = \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top } \mathbf {E}, \) we conclude that

$$\begin{aligned} \varvec{r}\left( \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top } \mathbf {E}\right) = \times \varvec{\epsilon }\varvec{r}\left( \mathbf {U} \mathbf {D}_4 \mathbf {V}^{\top }\right) , \end{aligned}$$

which establishes (9). Being multi-positively invariant and multi-sign equivariant, the mapping \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) has the desired property—indeed, a moment’s reflection shows that \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}(\times \varvec{\lambda }{\varvec{\mathsf {X}}}) = \times \varvec{\sigma }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}})\), where \( \varvec{\sigma }= [{{\mathrm{sgn}}}(\lambda _1), \dots , {{\mathrm{sgn}}}(\lambda _I)]^\top . \) The claim is established.
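The multi-sign equivariance just established can also be confirmed numerically. The self-contained sketch below (NumPy; synthetic inputs) implements the \(\mathbf {p}\)-weighted correction directly and compares the \(3 \times 3\) blocks of the outputs for sign-flipped inputs.

```python
import numpy as np

rng = np.random.default_rng(7)
I = 6

def rank4_correction(Xs, p):
    # Weighted, normalised stack F_p(X), then 4-truncated SVD, then r.
    F = np.column_stack([pi * Xi.flatten(order='F') / np.linalg.norm(Xi)
                         for pi, Xi in zip(p, Xs)])
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    s[4:] = 0.0
    return (U @ np.diag(s) @ Vt).reshape(3, 3 * I, order='F')

Xs = [rng.standard_normal((3, 3)) for _ in range(I)]
p = np.ones(I)
eps = np.array([1, -1, 1, -1, -1, 1])      # individual sign flips

T0 = rank4_correction(Xs, p)
T1 = rank4_correction([e * X for e, X in zip(eps, Xs)], p)

# Block i of T1 equals eps_i times block i of T0 -- equation (9).
for i in range(I):
    assert np.allclose(T1[:, 3*i:3*i+3], eps[i] * T0[:, 3*i:3*i+3])
```

The check relies on the rank-four truncation being unique, which holds as long as the fourth and fifth singular values are distinct, as is generically the case.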

We remark that the argument used above can be readily employed to verify that if \(\mathbf {p}\) and \(\mathbf {p}'\) are such that \(p_i' = \lambda \epsilon _i p_i\), where \(\lambda \ne 0\) and \(\epsilon _i = \pm 1\) for each \(i = 1, \dots , I\), then the procedures \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\) and \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}'}\) are equivalent. Indeed, it is immediate that \( \mathcal {F}_{\mathbf {p}'}({\varvec{\mathsf {X}}}) = \mathcal {F}_{\mathbf {p}}(\times \varvec{\xi }{\varvec{\mathsf {X}}}), \) where \( \varvec{\xi }= [{{\mathrm{sgn}}}(\lambda )\epsilon _1, \dots , {{\mathrm{sgn}}}(\lambda )\epsilon _I]^\top . \) Hence \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}'}({\varvec{\mathsf {X}}}) = {\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}(\times \varvec{\xi }{\varvec{\mathsf {X}}}) = \times \varvec{\xi }{\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}})\), and this in turn implies that the sets of homographies defined by \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}({\varvec{\mathsf {X}}})\) and \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}'}({\varvec{\mathsf {X}}})\) coincide. One consequence of the assertion just established is that to parametrise the enforcement procedures \({\varvec{\mathsf {\widehat{\Theta }}}}_{\mathrm {rank4},\mathbf {p}}\), it suffices to consider weighting vectors \(\mathbf {p}\) with all positive entries summing up to \(1\).

We finally point out that it is a separate question as to which \(\mathbf {p}\) should be chosen to obtain a statistically most accurate enforcement procedure. A particular value of \(\mathbf {p}\), depending on \({\varvec{\mathsf {X}}}\), can be inferred from considerations contained in [46] (reflecting the fact that interrelated homography matrices \(\mathbf {X}_i\) can always be brought to the form \(\mathbf {X}_i = \lambda _i(\mathbf {A} + \mathbf {b} \mathbf {v}_i^\top )\); see below), but it is conceivable that alternative choices leading to better results exist.

5 Cost function optimisation

After a detour into rank-four constraint enforcement, we now turn our attention to the question of optimisation of functions for which the AML cost function is a prototype—all this, of course, with a view to optimising the AML cost function itself.

Let \(J\) be a cost function for fitting to \({\varvec{\mathsf {X}}}\) of the form

$$\begin{aligned} J({\varvec{\mathsf {\Theta }}}) = \sum _{i=1}^I \Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i^\top \mathbf {A}_i \varvec{\theta }_i, \end{aligned}$$

where, for each \(i=1, \dots , I\), \(\mathbf {A}_i\) is a \(kl \times kl\) non-negative definite matrix. Clearly, the AML cost function conforms to this profile. Suppose that the constraints on \({\varvec{\mathsf {\Theta }}}\) take the form

$$\begin{aligned} \varvec{\theta }_i = \varvec{\pi }_i(\varvec{\eta }) \quad (i = 1, \dots , I), \end{aligned}$$

where \(\varvec{\eta }\) is a length-\(d\) vector (we have \(d=4I + 12\) in the case of the constraints given in (4)). Upon introducing the function

$$\begin{aligned} J'(\varvec{\eta }) = J(\varvec{\pi }_1(\varvec{\eta }), \dots , \varvec{\pi }_I(\varvec{\eta })), \end{aligned}$$

the constrained optimisation problem in question reduces to that of optimising \(J'\), which is an unconstrained optimisation problem.

One way of optimising \(J'\) is to use the Levenberg–Marquardt (LM) method. The starting point is to re-express \(J'\) as

$$\begin{aligned} J'(\varvec{\eta }) = \sum _{i=1}^I \left\| \mathbf {f}_i'(\varvec{\eta })\right\| ^2, \end{aligned}$$

where, for each \(i = 1, \dots , I\),

$$\begin{aligned} \mathbf {f}_i'(\varvec{\eta })= & {} \mathbf {f}_i(\varvec{\pi }_i(\varvec{\eta })), \\ \mathbf {f}_i(\varvec{\theta }_i)= & {} \Vert \varvec{\theta }_i \Vert ^{-1} \mathbf {B}_i \varvec{\theta }_i, \quad \varvec{\pi }_i(\varvec{\eta }) = {{\mathrm{vec}}}(\varvec{\Pi }_i(\varvec{\eta })), \end{aligned}$$

with \(\mathbf {B}_i\) a \(kl \times kl\) matrix such that \(\mathbf {B}_i^\top \mathbf {B}_i = \mathbf {A}_i\); in particular, \(\mathbf {B}_i\) may be taken equal to the unique non-negative definite square root of \(\mathbf {A}_i\). Let \(\mathbf {f}'(\varvec{\eta }) = [\mathbf {f}_1'^\top (\varvec{\eta }), \dots , \mathbf {f}_I'^\top (\varvec{\eta })]^\top \). The LM technique makes use of the \(Ikl \times d\) Jacobian matrix \(\partial _{\varvec{\eta }}{\mathbf {f}'}\) represented as \( \partial _{\varvec{\eta }}{\mathbf {f}'} = [{\partial _{\varvec{\eta }}{\mathbf {f}_1^{'}}}^\top , \dots , {\partial _{\varvec{\eta }}{\mathbf {f}_I^{'}}}^\top ]^\top . \) For each \(i = 1, \dots , I\),

$$\begin{aligned} \partial _{\varvec{\eta }}{\mathbf {f}_i^{'}}(\varvec{\eta }) = \partial _{\varvec{\theta }_i}{\mathbf {f}_i}(\varvec{\pi }_i(\varvec{\eta })) \partial _{\varvec{\eta }}{\varvec{\pi }_i}(\varvec{\eta }) \end{aligned}$$

with \( \partial _{\varvec{\theta }_i}{\mathbf {f}_i}(\varvec{\theta }_i) = \Vert \varvec{\theta }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\theta }_i}^\perp \) and \( \mathbf {P}_{\varvec{\theta }_i}^\perp = \mathbf {I}_{kl} - \Vert \varvec{\theta }_i \Vert ^{-2} \varvec{\theta }_i \varvec{\theta }_i^{\top }. \) The algorithm iteratively improves on an initial approximation \(\varvec{\eta }_0\) to the minimiser of \(J'\) by constructing new approximations with the aid of the update rule

$$\begin{aligned} \varvec{\eta }_{n+1} = \varvec{\eta }_n - [\mathbf {H}(\varvec{\eta }_n) + \lambda _n \mathbf {I}_{d}]^{-1} [\partial _{\varvec{\eta }}{\mathbf {f}'}(\varvec{\eta }_n)]^\top \mathbf {f}'(\varvec{\eta }_n), \end{aligned}$$

where \( \mathbf {H} = (\partial _{\varvec{\eta }}{\mathbf {f}'})^\top \partial _{\varvec{\eta }}{\mathbf {f}'} \) and \(\lambda _n\) is a non-negative scalar that dynamically changes from step to step. Details concerning the choice of \(\lambda _n\) can be found in [38].
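The update rule is straightforward to implement. The following self-contained sketch applies it to a toy two-residual problem; the function names and the simple multiplicative damping schedule are illustrative and are not the schedule of [38]:

```python
import numpy as np

def levenberg_marquardt(f, jac, eta0, max_iter=50, lam0=1e-3):
    # Iterates eta_{n+1} = eta_n - (H + lam*I)^{-1} J^T f(eta_n),
    # with H = J^T J and lam adapted multiplicatively.
    eta, lam = eta0.astype(float), lam0
    cost = np.sum(f(eta) ** 2)
    for _ in range(max_iter):
        J = jac(eta)
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.eye(len(eta)), J.T @ f(eta))
        trial = eta - step
        trial_cost = np.sum(f(trial) ** 2)
        if trial_cost < cost:        # accept the step: decrease damping
            eta, cost, lam = trial, trial_cost, lam / 10
        else:                        # reject the step: increase damping
            lam *= 10
    return eta

# Toy residuals with a root at (1, 2): f = [x^2 + y - 3, x + y^2 - 5]
f = lambda e: np.array([e[0] ** 2 + e[1] - 3.0, e[0] + e[1] ** 2 - 5.0])
jac = lambda e: np.array([[2 * e[0], 1.0], [1.0, 2 * e[1]]])
sol = levenberg_marquardt(f, jac, np.array([1.0, 1.0]))
```

For the actual AML cost, the residual vector \(\mathbf {f}'\) and its Jacobian are assembled as described in Sect. 6.1.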

6 Multiple homography estimation

We now proceed to consider the LM-based estimation of multiple homography matrices. We first describe the specific details of the relevant iterative scheme and then develop a suitable initialisation procedure. This procedure will be applicable not only to the LM scheme, but to any other iterative method operating on the latent variable vector \(\varvec{\eta }\) (in particular, to a bundle adjustment technique—see Sect. 7.2), and as such is of interest in its own right.

6.1 LM scheme

In the setup described by (4) and (5), we have, in accordance with (7),

$$\begin{aligned} \varvec{\pi }_i(\varvec{\eta }) = {{\mathrm{vec}}}\left( w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }\right) = w_i \mathbf {a} + \mathbf {v}_i \otimes \mathbf {b} \end{aligned}$$

for each \(i = 1, \dots , I\). Taking into account that \( \mathbf {v}_i \otimes \mathbf {b} = (\mathbf {I}_3 \otimes \mathbf {b}) \mathbf {v}_i = (\mathbf {v}_i \otimes \mathbf {I}_3) \mathbf {b}, \) one readily verifies that

$$\begin{aligned} \begin{array}{llll} \partial _{\mathbf {a}}{\varvec{\pi }_i} &{} = w_i\mathbf {I}_9, &{}\quad \quad \partial _{\mathbf {b}}{\varvec{\pi }_i} &{}= \mathbf {v}_i \otimes \mathbf {I}_3, \\ \partial _{\mathbf {v}_i}{\varvec{\pi }_i} &{} = \mathbf {I}_3 \otimes \mathbf {b}, &{}\quad \quad \partial _{\mathbf {v}_j}{\varvec{\pi }_i} &{}= \mathbf {0} \quad (j \ne i), \\ \partial _{w_i}{\varvec{\pi }_i} &{} = \mathbf {a},&{}\quad \quad \partial _{w_j}{\varvec{\pi }_i} &{}= \mathbf {0} \quad (j \ne i). \end{array} \end{aligned}$$
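The Kronecker-product identities invoked here are easy to confirm numerically; a minimal check with random vectors and column-major vectorisation:

```python
import numpy as np

rng = np.random.default_rng(0)
b, v = rng.standard_normal(3), rng.standard_normal(3)

# vec(b v^T) equals v (x) b, and the product factors through either argument
lhs = np.kron(v, b)
assert np.allclose(lhs, np.kron(np.eye(3), b.reshape(3, 1)) @ v)   # (I_3 (x) b) v
assert np.allclose(lhs, np.kron(v.reshape(3, 1), np.eye(3)) @ b)   # (v (x) I_3) b

# and vec(w A + b v^T) = w vec(A) + v (x) b (column-major vec)
A, w = rng.standard_normal((3, 3)), 0.7
assert np.allclose((w * A + np.outer(b, v)).flatten(order='F'),
                   w * A.flatten(order='F') + np.kron(v, b))
```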

Representing, for each \(i = 1, \dots , I\), \(\partial _{\varvec{\eta }}{\mathbf {f}_i^{'}}\) as

$$\begin{aligned} \partial _{\varvec{\eta }}{\mathbf {f}_i^{'}} = \left[ \partial _{\mathbf {a}}{\mathbf {f}_i^{'}}, \partial _{\mathbf {b}}{\mathbf {f}_i^{'}}, \partial _{\mathbf {v}_1}{\mathbf {f}_i^{'}}, \dots , \partial _{\mathbf {v}_I}{\mathbf {f}_i^{'}}, \partial _{w_1}{\mathbf {f}_i^{'}}, \dots , \partial _{w_I}{\mathbf {f}_i^{'}}\right] , \end{aligned}$$

one finds furthermore that

$$\begin{aligned} \partial _{\mathbf {a}}{\mathbf {f}_i'}&= w_i \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp , \\ \partial _{\mathbf {b}}{\mathbf {f}_i'}&= \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp (\mathbf {v}_i \otimes \mathbf {I}_3),\\ \partial _{\mathbf {v}_i}{\mathbf {f}_i'}&= \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp (\mathbf {I}_3 \otimes \mathbf {b}), \quad \partial _{\mathbf {v}_j}{\mathbf {f}_i'} = \mathbf {0} \quad (j \ne i), \\ \partial _{w_i}{\mathbf {f}_i'}&= \Vert \varvec{\pi }_i \Vert ^{-1} \mathbf {B}_i \mathbf {P}_{\varvec{\pi }_i}^\perp \mathbf {a},\qquad \qquad \partial _{w_j}{\mathbf {f}_i'} = \mathbf {0} \quad (j \ne i). \end{aligned}$$

With \(\partial _{\varvec{\eta }}{\mathbf {f}'}\) thus determined, all that is now needed is a means for determining a suitable initial value of \(\varvec{\eta }\).
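As a sanity check on these formulas, the \(\mathbf {a}\)-derivative, say, can be compared against central finite differences; the values of \(\mathbf {B}_i\), \(\mathbf {b}\), \(\mathbf {v}_i\) and \(w_i\) below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((9, 9))
a, b, v, w = rng.standard_normal(9), rng.standard_normal(3), rng.standard_normal(3), 0.9

def f(a_):
    pi = w * a_ + np.kron(v, b)          # pi_i(eta) = w_i a + v_i (x) b
    return B @ pi / np.linalg.norm(pi)   # f_i = ||pi||^{-1} B_i pi

pi = w * a + np.kron(v, b)
P_perp = np.eye(9) - np.outer(pi, pi) / (pi @ pi)
J_analytic = w * B @ P_perp / np.linalg.norm(pi)   # w_i ||pi||^{-1} B_i P_perp

# central finite differences over the entries of a
eps = 1e-6
J_fd = np.column_stack([(f(a + eps * e) - f(a - eps * e)) / (2 * eps)
                        for e in np.eye(9)])
assert np.allclose(J_analytic, J_fd, atol=1e-5)
```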

6.2 Initialisation procedure

To devise an initialisation procedure, we first consider the following problem:

Problem 2

Given \({\varvec{\mathsf {X}}} = [\mathbf {X}_1, \dots , \mathbf {X}_I]\) satisfying

$$\begin{aligned} \mathbf {X}_i = \lambda _i \mathbf {H}_i \end{aligned}$$
(10)

for each \(i = 1, \dots , I\), where \(\lambda _i\) is a non-zero scalar and \( \mathbf {H}_i = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }, \) solve for \(\mathbf {A}\), \(\mathbf {b}\), \(\mathbf {v}_i\) and \(w_i\) in terms of \({\varvec{\mathsf {X}}}\).

The essence here is to recover the structure of a set of matrices which are known a priori to satisfy (1), but which are identified only up to (unknown) scale factors.

Observe first that the solution of the problem cannot be unique. Indeed, if \( \mathbf {H}_i = w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^\top \) for each \(i\), then also \( \mathbf {H}_i = w_i' \mathbf {A}' + \mathbf {b}' \mathbf {v}_i'^\top \) for each \(i\), where \(\mathbf {A}' = \beta \mathbf {A} + \mathbf {b} \mathbf {c}^\top \), \(\mathbf {b}' = \alpha \mathbf {b}\), \(\mathbf {v}_i' = \alpha ^{-1} (\mathbf {v}_i - \beta ^{-1} w_i \mathbf {c})\), and \(w_i' = \beta ^{-1} w_i\), with \(\alpha \) and \(\beta \) non-zero numbers and \(\mathbf {c}\) a length-\(3\) vector. Therefore, we shall content ourselves with a single specific solution.

Note that \( \mathbf {X}_i = \lambda _i (w_i \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }) \) implies \( \mathbf {X}_i = \lambda _i' (\mathbf {A} + \mathbf {b} \mathbf {v}_i'^{\top }) \) with \(\lambda _i' = w_i \lambda _i\) and \(\mathbf {v}_i' = w_i^{-1} \mathbf {v}_i\). Thus, without loss of generality, we may assume that \(w_i = 1\) for each \(i\). Now, system (10) becomes

$$\begin{aligned} \mathbf {X}_i = \lambda _i(\mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }). \end{aligned}$$
(11)

Select arbitrarily \(i\) and \(j\) with \(i \ne j\). Taking into account that \( \mathbf {v}_i^\top (\mathbf {v}_i \times \mathbf {v}_j) = \mathbf {v}_j^\top (\mathbf {v}_i \times \mathbf {v}_j) = 0, \) we see that

$$\begin{aligned} \lambda _i^{-1} \mathbf {X}_i (\mathbf {v}_i \times \mathbf {v}_j) = \mathbf {A} (\mathbf {v}_i \times \mathbf {v}_j) = \lambda _j^{-1} \mathbf {X}_j (\mathbf {v}_i \times \mathbf {v}_j) \end{aligned}$$

and further

$$\begin{aligned} \mathbf {X}_i^{-1} \mathbf {X}_j (\mathbf {v}_i \times \mathbf {v}_j) = \lambda _i^{-1}\lambda _j (\mathbf {v}_i \times \mathbf {v}_j). \end{aligned}$$
(12)

Consequently, \(\lambda _i^{-1}\lambda _j\) is an eigenvalue of \(\mathbf {X}_i^{-1} \mathbf {X}_j\). Note that if \(\mathbf {c}\) is any length-3 vector, then (11) holds with \(\mathbf {A}\) replaced by \(\mathbf {A} + \mathbf {b} \mathbf {c}^\top \) and with \(\mathbf {b} \mathbf {v}_i^\top \) replaced by \(\mathbf {b} (\mathbf {v}_i - \mathbf {c})^{\top }\). Accordingly, (12) holds with \((\mathbf {v}_i - \mathbf {c}) \times (\mathbf {v}_j - \mathbf {c})\) substituted for \(\mathbf {v}_i \times \mathbf {v}_j\). It is easily seen that as \(\mathbf {c}\) varies, the vectors

$$\begin{aligned} (\mathbf {v}_i - \mathbf {c}) \times (\mathbf {v}_j - \mathbf {c}) = (\mathbf {c} - \mathbf {v}_j) \times (\mathbf {v}_i - \mathbf {v}_j) \end{aligned}$$

fill out a two-dimensional linear space, namely the space of all length-\(3\) vectors orthogonal to \(\mathbf {v}_i - \mathbf {v}_j\). Thus \(\lambda _i^{-1}\lambda _j\) is in fact a double eigenvalue of \(\mathbf {X}_i^{-1} \mathbf {X}_j\), which, generically, is unique. Denoting this eigenvalue by \(\mu _{ij}\) and observing that

$$\begin{aligned} \lambda _i^{-1}\lambda _j \mathbf {X}_i - \mathbf {X}_j = \lambda _j(\lambda _i^{-1}\mathbf {X}_i - \lambda _j^{-1}\mathbf {X}_j) = \lambda _j \mathbf {b}(\mathbf {v}_i - \mathbf {v}_j)^\top , \end{aligned}$$

we next see that \(\mathbf {b}\) can be identified, up to a scalar factor, as the unique left singular vector of \( \mu _{ij} \mathbf {X}_i - \mathbf {X}_j \) corresponding to a non-zero singular value.
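Both facts, the double eigenvalue and the rank-one difference, are easy to confirm on synthetically generated matrices; a sketch with illustrative random values:

```python
import numpy as np

rng = np.random.default_rng(2)
A, b = rng.standard_normal((3, 3)), rng.standard_normal(3)
v1, v2 = rng.standard_normal(3), rng.standard_normal(3)
lam1, lam2 = 1.3, -0.8
X1 = lam1 * (A + np.outer(b, v1))
X2 = lam2 * (A + np.outer(b, v2))

# lam1^{-1} lam2 is a double eigenvalue of X1^{-1} X2
eigs = np.linalg.eigvals(np.linalg.inv(X1) @ X2)
mu = lam2 / lam1
assert np.sum(np.isclose(eigs, mu, atol=1e-8)) == 2

# b (up to scale) is the left singular vector of mu*X1 - X2
# for the single non-zero singular value
U, s, Vt = np.linalg.svd(mu * X1 - X2)
assert s[1] < 1e-8 * s[0]                          # rank one
b_hat = U[:, 0]
assert np.isclose(abs(b_hat @ b) / np.linalg.norm(b), 1.0, atol=1e-8)
```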

Having made these observations, we now fix \(i_0\) arbitrarily. For each \(i \ne i_0\), let \(\mu _{ii_0}\) be the double eigenvalue of \(\mathbf {X}_i^{-1} \mathbf {X}_{i_0}\). Then, for each \(i \ne i_0\),

$$\begin{aligned} \mu _{ii_0} \mathbf {X}_i = \lambda _i^{-1}\lambda _{i_0} \mathbf {X}_i = \lambda _{i_0} \left( \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top }\right) . \end{aligned}$$

Relabelling \(\lambda _{i_0} \mathbf {A}\) as \(\mathbf {A}\) and \(\lambda _{i_0} \mathbf {b}\) as \(\mathbf {b}\), the last relation simplifies to

$$\begin{aligned} \mu _{ii_0} \mathbf {X}_i = \mathbf {A} + \mathbf {b} \mathbf {v}_i^{\top } \end{aligned}$$
(13)

and the \(i_0\)th defining equation becomes

$$\begin{aligned} \mathbf {X}_{i_0} = \mathbf {A} + \mathbf {b} \mathbf {v}_{i_0}^{\top }. \end{aligned}$$
(14)

Take for \(\mathbf {b}\) the left singular vector of the \(3 \times 3(I-1)\) matrix obtained by juxtaposing (horizontally concatenating) the matrices \(\mu _{ii_0} \mathbf {X}_i - \mathbf {X}_{i_0}\), \(i \ne i_0\), corresponding to the single non-zero singular value. Now, by subtracting (14) from (13), we find, for each \(i \ne i_0\),

$$\begin{aligned} \mu _{ii_0} \mathbf {X}_i - \mathbf {X}_{i_0} = \mathbf {b}(\mathbf {v}_i - \mathbf {v}_{i_0})^\top \end{aligned}$$

whence

$$\begin{aligned} \mathbf {v}_i = \mathbf {v}_{i_0} + \Vert \mathbf {b} \Vert ^{-2} (\mu _{ii_0} \mathbf {X}_i - \mathbf {X}_{i_0})^\top \mathbf {b}. \end{aligned}$$

To fully solve our problem, it remains to determine \(\mathbf {v}_{i_0}\) and \(\mathbf {A}\). The only constraint that we now have to take into account is (14). One simple solution is

$$\begin{aligned} \mathbf {A} = \mathbf {X}_{i_0}, \quad \mathbf {v}_{i_0} = \mathbf {0}. \end{aligned}$$

With our problem successfully solved, we are now in a position to furnish the desired initialisation procedure. Based on a noisy data set \({\varvec{\mathsf {X}}}\), a seed \( \bar{\varvec{\eta }} = [\bar{\mathbf {a}}^\top , \bar{\mathbf {b}}^\top , \bar{\mathbf {v}}_1^\top , \dots , \bar{\mathbf {v}}_I^\top , \bar{w}_1, \dots , \bar{w}_I]^\top \) for any iterative method operating on \(\varvec{\eta }\) is obtained by modifying the specific solution given above. The modification reflects the fact that \({\varvec{\mathsf {X}}}\) admits only an approximate representation as in (10). The steps of the initialisation procedure are detailed in Algorithm 1.

Algorithm 1

It is noteworthy that for the above procedure to work, just two different input homographies suffice. This is in contrast with the initialisation proposed by Chen and Suter [4], which requires at least three different homographies.
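In the noiseless case, the recovery described above can be sketched in a few lines, with the reduction \(w_i = 1\) in force and \(i_0\) taken to be the first index; this is only the exact-data skeleton, whereas Algorithm 1 proper additionally has to cope with noisy input. As just remarked, two input homographies suffice:

```python
import numpy as np

rng = np.random.default_rng(3)
A_true, b_true = rng.standard_normal((3, 3)), rng.standard_normal(3)
vs = rng.standard_normal((2, 3))
lams = np.array([0.6, -1.4])
X = [lam * (A_true + np.outer(b_true, v)) for lam, v in zip(lams, vs)]

# with i0 = 0, mu_{1,0} = lambda_1^{-1} lambda_0 is the double
# eigenvalue of X_1^{-1} X_0; pick the closest eigenvalue pair
eigs = np.linalg.eigvals(np.linalg.inv(X[1]) @ X[0])
pairs = [(abs(eigs[i] - eigs[j]), (eigs[i] + eigs[j]) / 2)
         for i in range(3) for j in range(i + 1, 3)]
mu = min(pairs, key=lambda t: t[0])[1].real

# b (up to scale) from the rank-one matrix mu*X_1 - X_0, then v_1;
# finally take A = X_0 and v_0 = 0 as in the text
D = mu * X[1] - X[0]
U, s, Vt = np.linalg.svd(D)
b = U[:, 0]
v0 = np.zeros(3)
v1 = v0 + D.T @ b / (b @ b)
A = X[0]

# the recovered parameters reproduce the input up to the known scales
assert np.allclose(X[0], A + np.outer(b, v0), atol=1e-9)
assert np.allclose(mu * X[1], A + np.outer(b, v1), atol=1e-6)
```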

7 Experimental verification

The method was tested on both synthetic and real data. Synthetic data were used to quantify the effect of noise on the method. Real data were used to evaluate the performance of the method in a real-world situation.

Fig. 2: Synthetic data generation procedure

7.1 Synthetic data

Synthetic data were created by generating true corresponding points for some stereo configuration and adding random Gaussian noise to these points. Many configurations were investigated. Any specific instantiation of true image points was developed as follows. First, we chose a realistic geometric configuration for two cameras. Next, we applied a random rotation and translation to a plane that is parallel to the first camera’s image plane (see Fig. 2). Repeating this last step several times, we generated several planes in the 3D scene. Finally, between 25 and 50 points in each plane were randomly selected in the field of view of both cameras, and these were projected onto two \(640 \times 480\) pixel images to provide true image points.
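The plane-induced homographies underlying such data are simple to generate. In a calibrated toy setting with cameras \(\mathbf {P} = [\mathbf {I} \mid \mathbf {0}]\) and \(\mathbf {P}' = [\mathbf {R} \mid \mathbf {t}]\), a plane \(\mathbf {n}^\top \mathbf {X} = d\) induces \(\mathbf {H} = \mathbf {R} + \mathbf {t}\mathbf {n}^\top /d\), which has exactly the form \(\mathbf {A} + \mathbf {b}\mathbf {v}^\top \) with \(\mathbf {A} = \mathbf {R}\), \(\mathbf {b} = \mathbf {t}\) and \(\mathbf {v} = \mathbf {n}/d\). The following is a sketch of this relationship, not our exact data generator:

```python
import numpy as np

rng = np.random.default_rng(4)

# cameras P = [I | 0] and P' = [R | t]; the plane n.X = d induces the
# homography H = R + t n^T / d, i.e. A + b v^T with A = R, b = t, v = n/d
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.2])
n, d = np.array([0.1, -0.2, 1.0]), 5.0
H = R + np.outer(t, n) / d

def induced_ok(H, R, t, n, d, xy):
    # place a point on the plane, project into both views, and check
    # that H maps the first projection onto the second
    z = (d - n[:2] @ xy) / n[2]
    Xw = np.array([xy[0], xy[1], z])
    m = Xw / Xw[2]                  # view-1 projection (homogeneous)
    m2 = R @ Xw + t
    m2 = m2 / m2[2]                 # view-2 projection
    Hm = H @ m
    return np.allclose(Hm / Hm[2], m2, atol=1e-10)

assert all(induced_ok(H, R, t, n, d, rng.uniform(-1, 1, 2)) for _ in range(5))
```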

7.2 Synthetic simulation procedure

Each synthetic true image point was perturbed by independent homogeneous Gaussian noise at a preset level. For different series of experiments, different noise levels were applied. This resulted in \(I\) groups of noise-contaminated pairs of corresponding points \(\{\mathbf {m}_{n,i},\mathbf {m}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\), corresponding to \(I\) different planes in the 3D scene, with \(I\in \{2, 4, 5, 6, 7, 8\}\) for different experiments; here, \(N_i\) is the number of feature points in the \(i\)th plane.

The estimation methods considered were:

\(\bullet \) DLT: direct linear transform,

\(\bullet \) FNS: fundamental numerical scheme,

\(\bullet \) WALS: weighted alternating least squares,

\(\bullet \) AML: approximate maximum likelihood,

\(\bullet \) BA: bundle adjustment.

DLT [24] is a linear method for estimating a single homography and FNS [8, 39] is an iterative method for the same purpose, the two methods optimising two different, yet related cost functions. For the small noise levels utilised in our study, FNS produces the same results as single homography bundle adjustment, and to all intents and purposes may be regarded as the gold standard single homography estimation method. Both DLT and FNS were run on suitably normalised data. For each \(i = 1, \dots , I\), two data-dependent normalisation matrices \(\mathbf {T}_i\) and \(\mathbf {T}_i^{'}\) were applied to \(\{\mathbf {m}_{n,i},\mathbf {m}_{n,i}^{'}\}_{n=1}^{N_i}\) using the rule

$$\begin{aligned} \tilde{\mathbf {m}}_{n,i} = \mathbf {T}_i \mathbf {m}_{n,i} \quad \text {and} \quad \tilde{\mathbf {m}}_{n,i}^{'} = \mathbf {T}_i^{'} \mathbf {m}_{n,i}^{'} \end{aligned}$$

to produce individually normalised corresponding points \(\{\tilde{\mathbf {m}}_{n,i},\tilde{\mathbf {m}}_{n,i}^{'}\}_{n=1}^{N_i}\) à la Hartley [7, 9, 23]. These normalised groups were used as input to DLT and FNS to produce two sets of \(I\) homography matrices \(\{\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {DLT},i}\}_{i=1}^{I} \) and \( \{\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {FNS},i}\}_{i=1}^{I}. \) The final DLT and FNS estimates were obtained by applying, for each \(i = 1, \dots , I\), the back-transformation \({\varvec{{\Theta }}}_i \mapsto \mathbf {T}_i'^{-1} {\varvec{{\Theta }}}_i \mathbf {T}_i\) to \(\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {DLT},i}\) and \(\widehat{{\tilde{{\varvec{\Theta }}}}}_{\mathrm {FNS},i}\), respectively.

WALS is Chen and Suter’s method of weighted alternating least squares with weights derived from the covariances of input estimates. Both WALS and our proposed AML method use as input, in one version, the estimates produced by FNS and, in another version, the estimates produced by DLT. For reasons of numerical stability, both WALS and AML required data normalisation as a pre-processing step. A common data normalisation procedure was employed for both methods: we combined all the points together to produce two global normalisation matrices \(\mathbf {T}\) and \(\mathbf {T}'\). These matrices were used to produce globally normalised corresponding points \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i},\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\), defined by

$$\begin{aligned} \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i} = \mathbf {T} \mathbf {m}_{n,i} \quad \text {and} \quad \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'} = \mathbf {T}'\mathbf {m}_{n,i}^{'}. \end{aligned}$$
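The normalisation matrices can be computed in the standard Hartley fashion, translating the centroid of the points to the origin and scaling so that the mean distance from the origin is \(\sqrt{2}\). A minimal sketch (the function name is illustrative):

```python
import numpy as np

def normalisation_matrix(pts):
    # pts: N x 2 inhomogeneous points; T translates the centroid to the
    # origin and scales the mean distance from it to sqrt(2)
    c = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - c, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    return np.array([[s, 0.0, -s * c[0]],
                     [0.0, s, -s * c[1]],
                     [0.0, 0.0, 1.0]])

pts = np.array([[0.0, 0.0], [640.0, 0.0], [640.0, 480.0], [0.0, 480.0]])
T = normalisation_matrix(pts)
tilde = (T @ np.c_[pts, np.ones(4)].T).T
assert np.allclose(tilde[:, :2].mean(axis=0), 0.0, atol=1e-12)
assert np.isclose(np.linalg.norm(tilde[:, :2], axis=1).mean(), np.sqrt(2.0))
```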

The input of the FNS-initialised versions of WALS and AML, WALS-FNS and AML-FNS, took the form of the FNS estimates based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i},\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\). Likewise, the input of the DLT-initialised versions of WALS and AML, WALS-DLT and AML-DLT, took the form of the DLT estimates based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i},\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), \(i = 1, \dots , I\). For each initialisation, a specific prescription, described below, was used to generate the pre-covariance matrix \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{0}\) for the vectorisation \(\mathbf {x}_i\) of \(\mathbf {X}_i\) for each \(i = 1, \dots , I\). The matrices \(\mathbf {X}_i\) and pre-covariances \(\varvec{\mathbf {\Lambda }}_{\mathbf {x}_i}^{0}\) were next used as input to the WALS and AML methods to produce estimates which, upon applying the back-transformation \(\varvec{\mathbf {\Theta }}\mapsto \mathbf {T}'^{-1} \varvec{\mathbf {\Theta }}\mathbf {T}\), were taken to be the final WALS and AML estimates, respectively.

In the case of the FNS-initialised methods, the pre-covariance matrix for the vectorisation \(\mathbf {x}_i\) of \(\mathbf {X}_i\), based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), was taken to be

$$\begin{aligned} \varvec{\Lambda }^0_{\mathbf {x}_i}= & {} (\mathbf {M}_{\mathbf {x}_i})^{+}_8, \end{aligned}$$
(15)
$$\begin{aligned} \mathbf {M}_{\mathbf {x}_i}= & {} \Vert \mathbf {x}_i \Vert ^2 \sum _{n = 1}^{N_i} \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}) ({\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i}))^+_2 \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i})^\top . \end{aligned}$$
(16)

Here, \(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i} = [\mathop {{u}}\limits ^{\circ }\!{}_{n,i},\mathop {{v}}\limits ^{\circ }\!{}_{n,i},\mathop {{u}}\limits ^{\circ }\!{}_{n,i}^{'},\mathop {{v}}\limits ^{\circ }\!{}_{n,i}^{'}]^\top \) is the result of combining \(\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i} = [\mathop {{u}}\limits ^{\circ }\!{}_{n,i},\mathop {{v}}\limits ^{\circ }\!{}_{n,i},1]^\top \) and \(\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}{=} [\mathop {{u}}\limits ^{\circ }\!{}_{n,i}^{'},\mathop {{v}}\limits ^{\circ }\!{}_{n,i}^{'},1]^\top \) into a single vector; \(\mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i})\) and \({\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i})\) are defined by

$$\begin{aligned} \mathbf {U}(\mathbf {z})&= - \mathbf {m} \otimes [\mathbf {m}']_{\times }, \\ \mathbf {B}(\mathbf {z})&= \partial _{\mathbf {z}}{{{\mathrm{vec}}}(\mathbf {U}(\mathbf {z}))} \varvec{\mathbf {\Lambda }}_{\mathbf {z}}^{} \big [\partial _{\mathbf {z}}{{{\mathrm{vec}}}(\mathbf {U}(\mathbf {z}))}\big ]^\top , \\ {\varvec{\mathsf {\Sigma }}}(\mathbf {z},\mathbf {x})&= (\mathbf {I}_3 \otimes \mathbf {x}^\top ) \mathbf {B}(\mathbf {z}) (\mathbf {I}_3 \otimes \mathbf {x}) \end{aligned}$$

with \(\mathbf {z} = [u,v,u',v']^\top \) derived from \(\mathbf {m} = [u, v, 1]^\top \) and \(\mathbf {m}' = [u', v', 1]^\top \), and with \(\mathbf {x}\) a length-9 vector; for a length-3 vector \(\mathbf {a}\), \([\mathbf {a}]_{\times }\) is the \(3 \times 3\) anti-symmetric matrix given by

$$\begin{aligned}{}[\mathbf {a}]_{\times } = \begin{bmatrix} 0&-a_3&a_2 \\ a_3&0&-a_1 \\ -a_2&a_1&0 \end{bmatrix}; \end{aligned}$$

and \(\mathbf {A}^+_r\) denotes the rank-\(r\) truncated pseudo-inverse of the matrix \(\mathbf {A}\) (see [26] or Appendix A for the definition). The image data covariance matrices \(\varvec{\mathbf {\Lambda }}_{\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}}^{}\), incorporated in the matrices \({\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i})\), were all chosen to be in their default form \({{\mathrm{diag}}}(1,1,1,1)\), corresponding to isotropic homogeneous noise in image point measurement. The details of formula (15) are given in Appendix A.
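Assuming the definition in Appendix A is the usual SVD-based one, the rank-\(r\) truncated pseudo-inverse can be sketched as follows (illustrative function name):

```python
import numpy as np

def truncated_pinv(A, r):
    # rank-r truncated pseudo-inverse: invert only the r largest
    # singular values and zero out the rest
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.zeros_like(s)
    s_inv[:r] = 1.0 / s[:r]
    return Vt.T @ np.diag(s_inv) @ U.T

rng = np.random.default_rng(5)
# a rank-2 non-negative definite 4x4 matrix
G = rng.standard_normal((4, 2))
M = G @ G.T
P = truncated_pinv(M, 2)
# on its column space, M P acts as the identity, and P agrees with the
# Moore-Penrose pseudo-inverse since M has rank exactly 2
assert np.allclose(M @ P @ M, M, atol=1e-10)
assert np.allclose(P, np.linalg.pinv(M), atol=1e-8)
```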

The DLT-initialised methods employed the raw covariance matrix for the vectorisation \(\mathbf {x}_i\) of \(\mathbf {X}_i\), based on \(\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\), in the form

$$\begin{aligned} \varvec{\Lambda }^0_{\mathbf {x}_i} = (\mathbf {M}_{\mathbf {x}_i})_8^+ \mathbf {D}_{\mathbf {x}_i} (\mathbf {M}_{\mathbf {x}_i})_8^+, \end{aligned}$$
(17)

with \(\mathbf {M}_{\mathbf {x}_i}\) given in (16) and

$$\begin{aligned} \mathbf {D}_{\mathbf {x}_i} = \Vert \mathbf {x}_i \Vert ^{-2} \sum _{n=1}^{N_i} \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}) {\varvec{\mathsf {\Sigma }}}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i}, \mathbf {x}_{i}) \mathbf {U}(\mathop {\mathbf {z}}\limits ^{\circ }\!{}_{n,i})^\top . \end{aligned}$$

Formula (17) relies on the material presented in Appendix B.

BA is a method for calculating maximum likelihood estimates and is regarded as the gold standard for performing optimal parameter estimation. It is the same as what is called joint bundle adjustment in [42]. The qualifier “joint” serves to emphasise the fact that this variant of bundle adjustment, using the latent variables, enforces full homography consistency constraints. In contrast, naive separate bundle adjustment minimises the gold standard geometric error, but does not enforce consistency constraints. In this paper we do not explicitly consider separate bundle adjustment, and therefore we label joint bundle adjustment simply as BA. However, separate bundle adjustment enters implicitly into our considerations through FNS, as, in accordance with our earlier remark, FNS applied to estimate each individual homography separately yields results indistinguishable from those produced by separate bundle adjustment for the levels of noise employed in our experiments.

The BA method was used in two variants, BA-FNS and BA-DLT, which differed in the way the method’s iterative process was initialised. In both variants, a BA estimate was generated directly from all the image data \( \{\{\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}\}_{n=1}^{N_i}\}_{i=1}^{I} \) by minimising the reprojection error

$$\begin{aligned} \sum _{i=1}^{I} \sum _{n=1}^{N_i} \left( d\left( \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}, \underline{\mathbf {m}}_{n,i}\right) ^2 + d\left( \mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}^{'}, \varvec{\Pi }_i(\varvec{\eta })\underline{\mathbf {m}}_{n,i}\right) ^2 \right) \end{aligned}$$

over all choices of parameter vectors \(\varvec{\eta }\) and 2D points \( \{\{\underline{\mathbf {m}}_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I}, \) with the minimum attained at the composite of \(\widehat{\varvec{\eta }}\) and \( \{\{\underline{\widehat{\mathbf {m}}}_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I} \) resulting in \(\widehat{\varvec{{\Theta }}}_{\mathrm {BA},i} = \varvec{\Pi }_i(\widehat{\varvec{\eta }})\). Here, \(d(\mathbf {m},\mathbf {n})\) denotes the Euclidean distance between the points \(\mathbf {m}\) and \(\mathbf {n}\) dehomogenised at the last, third entry. The initial value of each \(\underline{\mathbf {m}}_{n,i}\) was taken to be \(\mathop {\mathbf {m}}\limits ^{\circ }\!{}_{n,i}\) in both variants, and the initial value of \(\varvec{\eta }\) was obtained from the result of AML-FNS for BA-FNS, and from the result of AML-DLT for BA-DLT. Upon initialisation, \(\varvec{\eta }\) and the \(\underline{\mathbf {m}}_{n,i}\)’s were recomputed iteratively by an LM scheme adapted to the task of minimising the error given above. With \(\widehat{\mathbf {A}}_{\mathrm {BA}}\), \(\widehat{\mathbf {b}}_{\mathrm {BA}}\), \(\widehat{\mathbf {v}}_{\mathrm {BA},i}\)’s, \(\widehat{w}_{\mathrm {BA},i}\)’s derived from the terminal value of \(\varvec{\eta }\), the estimates \(\widehat{\varvec{{\Theta }}}_{\mathrm {BA},i}\) were finally obtained by applying the back-transformation \({\varvec{{\Theta }}} \mapsto \mathbf {T}'^{-1} {\varvec{{\Theta }}} \mathbf {T}\) to \( \widehat{w}_{\mathrm {BA},i} \widehat{\mathbf {A}}_{\mathrm {BA}} + \widehat{\mathbf {b}}_{\mathrm {BA}} \widehat{\mathbf {v}}_{\mathrm {BA},i}^{\top }. \)
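For a single correspondence, the inner minimisation over the preimage point \(\underline{\mathbf {m}}_{n,i}\) can be illustrated with a crude numeric gradient descent (a stand-in for the LM scheme actually used; the function names are illustrative):

```python
import numpy as np

def dehom(x):
    # dehomogenise at the last, third entry
    return x[:2] / x[2]

def reproj_cost(u, m, m2, H):
    # d(m, u)^2 + d(m', H u)^2 with u a 2D preimage point
    proj = dehom(H @ np.array([u[0], u[1], 1.0]))
    return np.sum((m - u) ** 2) + np.sum((m2 - proj) ** 2)

def min_reproj(m, m2, H, iters=100):
    # crude descent with a numeric gradient, initialised at the
    # measured point m; only improving steps are accepted
    u, step, eps = m.copy(), 0.1, 1e-6
    for _ in range(iters):
        g = np.zeros(2)
        for k in range(2):
            e = np.zeros(2)
            e[k] = eps
            g[k] = (reproj_cost(u + e, m, m2, H)
                    - reproj_cost(u - e, m, m2, H)) / (2 * eps)
        new = u - step * g
        if reproj_cost(new, m, m2, H) < reproj_cost(u, m, m2, H):
            u = new
        else:
            step /= 2
    return reproj_cost(u, m, m2, H)
```

For an exact correspondence the minimum is zero, attained at the measured point itself; for perturbed data the minimised cost never exceeds the plain transfer error.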

7.3 Real data

To investigate the performance of the proposed method on real data, we utilised the Model House sequence from the Oxford dataset, the Lady Symon and Old Classics Wing scenes from the AdelaideRMF dataset [45], and additionally two traffic scenes from the Traffic Signs dataset [30].

We generated corresponding points for the Oxford and Traffic Signs datasets using correlation-based matching on Harris corner points. Because the matching of corner points was done without manual intervention, the resulting set of corresponding points included pure outliers (incorrect matches). Corresponding points for the AdelaideRMF dataset were generated in a different manner, but also included pure outliers. The curator of the AdelaideRMF dataset detected key points using SIFT and generated true correspondences by manually matching only a subset of key points. The remaining key points were randomly matched and labelled as pure outliers.

To make the experiments on real data as realistic as possible, we did not manually group data points into different planar regions. Instead, we took advantage of recent progress in robust hypothesis generation for multi-structure data [6] and sampled a thousand candidate homographies based on four-element sets of matched correspondences. For each pair of corresponding points, we computed the symmetric transfer error [24, Sect. 4.2.2] for all 1,000 homographies and ranked the homographies according to this error. This means that each correspondence had an associated preference list that ranked the homographies. Making use of the ordered residual kernel method [5], which defines a distance between rankings, together with an agglomerative clustering scheme, we grouped the pairs of corresponding points into several clusters. The clustering scheme that we applied does not guarantee that all final clusters belong to different planes in the scene. In fact, correspondences are frequently fragmented into two different clusters even though visually they should belong to the same plane. This means that effectively the same planar region may have more than one homography associated with it. Such is the case with the Model House sequence where clustering led to the appearance of six homographies (corresponding to six clusters), even though there are only four actual planes in the scene. The fragmentation effect is not unique to our clustering scheme and was also observed by Fouhey et al. [18], who used a different algorithm (J-linkage) to group planar data points together. The empirical fact that fragmentation occurs so frequently serves as an affirmation that it is important to exploit consistency constraints as a means for enhancing coherency between homographies derived from logically linked clusters.

7.4 Real data experiment procedure

After automatically grouping the data points into several clusters, we estimated homographies using DLT for each group. To compare and contrast the stability and accuracy of the consistency enforcing estimation methods, we used the same initialisation procedure (Algorithm 1) on the DLT estimates of each group to seed AML-DLT, WALS-DLT, and BA-DLT.

7.5 Quantitative comparison of methods

On synthetic data, for the estimates generated by the DLT, FNS, WALS, AML, and BA methods, the common measure used to quantify data–model discrepancies was the mean root-mean-square (RMS) reprojection error from truth

$$\begin{aligned} \frac{1}{I} \sum _{i=1}^{I} \sqrt{ \frac{1}{4N_iK} \sum _{k=1}^K \min _{\underline{\mathbf {m}}_{n,i}^{(k)}} \sum _{n=1}^{N_i} \left( d(\overline{\mathbf {m}}^{(k)}_{n,i}, \underline{\mathbf {m}}^{(k)}_{n,i})^2 + d(\overline{\mathbf {m}}'^{(k)}_{n,i}, \widehat{\varvec{\mathbf {\Theta }}}_i \underline{\mathbf {m}}^{(k)}_{n,i})^2\right) }, \end{aligned}$$

where \(K\) is the number of experiments, and, for each \(k = 1, \dots , K\), \( \{\{\overline{\mathbf {m}}^{(k)}_{n,i},\overline{\mathbf {m}}'^{(k)}_{n,i}\} _{n=1}^{N_i}\}_{i=1}^{I} \) are noiseless data and \( \{\{\underline{\mathbf {m}}^{(k)}_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I} \) are arbitrary 2D points over which the minimum is taken in the \(k\)th experiment. A comparison of the methods on synthetic data is shown in Figs. 3 and 4. The results indicate that the proposed algorithm outperforms DLT, FNS, WALS-DLT, and WALS-FNS, and is practically indistinguishable in performance from the BA method for small noise levels. For larger noise levels all algorithms that enforce consistency constraints become susceptible to converging to a poor local minimum, resulting in homography estimates that are worse than separately estimated homographies. Figure 4a shows comparatively how often the algorithms that enforce consistency constraints managed to improve upon the separately FNS-estimated homographies. While the likelihood of converging to a superior solution decreased as the noise level increased, our method still fared better than WALS. With \(\sigma = 2\) pixels our algorithm converged to a superior solution in more than \(90\,\%\) of the trials. The deterioration of the algorithm at higher noise levels can be attributed to Algorithm 1, which is used to find initial values for the latent variables. For high noise levels, Algorithm 1 yields latent variables that are associated with high cost function values, which means that the optimisation methods are initialised far from the optimal solution and have a greater chance of stopping at a poor local minimum. In Fig. 4b, we display the percentage reduction in reprojection error that enforcing consistency constraints can produce when compared with separate homography estimation.
The results show that one can expect approximately \(23\,\%\) reduction for four homographies, and up to approximately \(30\,\%\) reduction for eight homographies. Even when there were only two planes in the scene, we still observed an approximately \(10\,\%\) reduction in reprojection error. These findings show that upgrading a set of inconsistent homographies to a fully consistent set can yield considerable practical benefits, and they constitute the raison d’être of our algorithm. In Fig. 4c, we present the median running time of BA, WALS, and our method. Our algorithm is orders of magnitude faster than all other options, requiring on average only four iterations to converge. Thanks to this computational efficiency, our algorithm can be incorporated into a random sampling and consensus framework to filter out sets of putative homographies that are mutually incompatible (see Sect. 7.6).

Fig. 3

Comparison of homography estimation methods on synthetic data. The results are based on averaging the reprojection error over all planes in the scene. The DLT and FNS estimators do not enforce consistency constraints, while WALS, AML, and our variant of BA do. The results show that enforcing consistency constraints can result in considerable denoising. The typical performance of AML and BA is virtually indistinguishable. The poor performance of WALS in (b) is due to the instability of the method. On some occasions it converges to a very poor solution with high reprojection error. This adversely affects its mean RMS score.

Fig. 4

Evaluating homography estimation methods on synthetic data by: (a) comparing how often the consistency constraint enforcing methods yield lower reprojection errors than separate FNS homography estimation; (b) quantifying the expected percentage reduction in reprojection error; and (c) measuring their running times

On real data, we evaluate the performance of AML-DLT, WALS-DLT, and BA-DLT against each other by comparing the quality of each estimated homography separately. The comparisons are made by plotting bar graphs of the RMS reprojection error computed from the data,

$$\begin{aligned} \sqrt{ \frac{1}{4N_i} \min _{\underline{\mathbf {m}}_{n,i}} \sum _{n=1}^{N_i} \left( d\left( \mathbf {m}_{n,i}, \underline{\mathbf {m}}_{n,i}\right) ^2 + d\left( \mathbf {m}'_{n,i}, \widehat{\boldsymbol{\Theta }}_i \underline{\mathbf {m}}_{n,i}\right) ^2 \right) }, \end{aligned}$$

where \( \{\{\mathbf {m}_{n,i}, \mathbf {m}'_{n,i}\}_{n=1}^{N_i}\}_{i=1}^{I} \) are the original corresponding points and \( \{\{\underline{\mathbf {m}}_{n,i}\} _{n=1}^{N_i}\}_{i=1}^{I} \) are arbitrary 2D points over which the minimum is taken. By plotting the error for each homography separately, it is possible to gain deeper insight into the performance of each method. A comparison on real data is given in Figs. 5, 6, and 7. The results show that WALS is inferior to both AML and BA, frequently displaying much higher errors in at least one of the homographies. On the other hand, our proposed scheme produces results which are often a very good approximation to those of BA.
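To make this error measure concrete, the following Python sketch evaluates the RMS reprojection error numerically for a single homography. The function names and the use of `scipy.optimize.least_squares` for the inner minimisation over the latent 2D points are our own choices for illustration, not part of the implementation described in this paper.

```python
import numpy as np
from scipy.optimize import least_squares

def project(H, u):
    """Apply a homography H to an inhomogeneous 2D point u."""
    q = H @ np.array([u[0], u[1], 1.0])
    return q[:2] / q[2]

def rms_reprojection_error(H, pts, pts_prime):
    """RMS reprojection error for one homography: for each correspondence
    (m, m'), minimise d(m, u)^2 + d(m', H u)^2 over a latent 2D point u,
    sum the optimal values, then normalise and take the square root."""
    total = 0.0
    for m, mp in zip(np.asarray(pts, float), np.asarray(pts_prime, float)):
        def residuals(u, m=m, mp=mp):
            return np.concatenate([m - u, mp - project(H, u)])
        # Initialise the latent point at the measured point m.
        sol = least_squares(residuals, x0=m)
        total += np.sum(sol.fun ** 2)
    return np.sqrt(total / (4.0 * len(pts)))
```

For noiseless correspondences generated by the homography itself, the latent point coincides with the measured point and the error vanishes, which provides a quick sanity check of an implementation.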

Fig. 5

Comparison of consistency enforcing estimation methods on the Model House sequence. The first two columns show the feature points (inliers) associated with various homographies on the model house views. The third column compares the reprojection error of AML-DLT, WALS-DLT, and BA-DLT for each homography. Note that in some figures the errors associated with particular homographies are so close to zero that the bar graphs are barely visible

Fig. 6

Comparison of consistency enforcing estimation methods on the AdelaideRMF sequence. The first two columns show the feature points (inliers) associated with various homographies on the Lady Symon and Old Classics Wing buildings. The third column compares the reprojection error of AML-DLT, WALS-DLT, and BA-DLT for each homography

Fig. 7

Comparison of consistency enforcing estimation methods on the Traffic Signs sequence. The first two columns show the feature points (inliers) associated with various homographies on two traffic scenes. The third column compares the reprojection error of AML-DLT, WALS-DLT, and BA-DLT for each homography. Note that in some cases, the errors associated with particular homographies are so close to zero that the bar graphs are barely visible

We defer a deeper discussion of the results to Sect. 8.


7.6 Application: robust consistent homography estimation

Our experiments have shown that the scope of consistency enforcing algorithms is not limited to synthetic data: when the algorithms are applied to homographies estimated on real-world scenes, full consistency can still be achieved. However, neither AML, nor WALS, nor BA is truly robust to outliers. If consistency constraints are imposed on a collection of homographies in which one of the homographies is an outlier, all of the methods may fail to produce accurate estimates. That is because each method employs a non-robust error measure and relies on an initialisation which, in its current form, is not robust. One possible way to circumvent these robustness issues is to incorporate the consistency constraints directly into the well-known RANSAC estimation loop [24, Sect. 4.7.1], so that the homographies returned by a RANSAC procedure are by construction fully consistent. This can be achieved by a simple modification of the canonical RANSAC procedure, and the pseudo-code for the modification is given in Algorithm 2.

Algorithm 2 makes use of several functions, namely \(f\), \(g\), \(h\), and \(\rho \). The function \(f\) can be any standard method for estimating a single homography, such as DLT or FNS. The function \(g\) serves to determine a set of inliers consistent with a given homography estimate. It utilises a cost function \(\rho \) that computes an error between a point correspondence and a homography, and declares a particular correspondence to be a member of the set of inliers for the given homography estimate if the value of \(\rho \) for that correspondence falls below a user-specified threshold \(t\). Typically, the cost function \(\rho \) is chosen to be the symmetric transfer error. The function \(h\) represents a procedure that takes as input a set of homography matrices together with covariances and produces a new set of fully consistent homographies. In our implementation, we take for \(h\) the AML algorithm proposed in this paper.
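As an illustration of a typical choice for \(\rho \) and of the role of \(g\), here is a minimal Python sketch of the symmetric transfer error and the induced inlier test. The function names are our own, and the sketch assumes the homography matrix is invertible.

```python
import numpy as np

def apply_homography(H, p):
    """Map an inhomogeneous 2D point p through the homography H."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

def symmetric_transfer_error(H, m, m_prime):
    """rho(m, m'; H) = d(m', H m)^2 + d(m, H^{-1} m')^2."""
    H_inv = np.linalg.inv(H)
    e_fwd = np.asarray(m_prime) - apply_homography(H, m)
    e_bwd = np.asarray(m) - apply_homography(H_inv, m_prime)
    return float(e_fwd @ e_fwd + e_bwd @ e_bwd)

def select_inliers(H, correspondences, t):
    """The role of g: keep correspondences whose rho falls below the
    user-specified threshold t."""
    return [(m, mp) for (m, mp) in correspondences
            if symmetric_transfer_error(H, m, mp) < t]
```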

When faced with multiple planar structures in a set of correspondences, researchers and practitioners using standard RANSAC frequently resort to the following strategy: (1) generate candidate homographies by random sampling; (2) from all considered homographies, choose the homography with the largest number of inliers, remove its inliers from the set of correspondences, and repeat the whole process to find the next plane [29, 44]. The essence of Algorithm 2 is the same, except that we modify step (2) so that, from all considered homographies, we choose the homography that has the largest number of inliers and at the same time is fully consistent with any planar structures found so far. We then remove its inliers from the set of correspondences and repeat the whole process to find the next plane.
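The control flow just described can be sketched in Python as follows. This is only a skeleton under our own naming, not the paper's actual Algorithm 2: the callables `f`, `g`, and `h` are supplied by the caller (in the paper, `h` would be the AML upgrade procedure; any stand-in works for this sketch), and the sample size and stopping rules are illustrative placeholders.

```python
import numpy as np

def consistent_sequential_ransac(corrs, f, g, h, n_iters=50,
                                 min_inliers=8, rng=None):
    """Sequentially extract planar structures so that every retained
    homography is consistent with those found before it.
    f: homography estimator for a minimal sample,
    g: inlier selector for a homography,
    h: upgrade of a list of homographies to a fully consistent set."""
    rng = rng or np.random.default_rng(0)
    remaining = list(corrs)
    found, inlier_sets = [], []
    while len(remaining) >= min_inliers:
        best = None
        for _ in range(n_iters):
            idx = rng.choice(len(remaining), size=4, replace=False)
            H = f([remaining[i] for i in idx])
            if H is None:
                continue
            # Upgrade jointly with the structures found so far, so the
            # candidate we score is by construction fully consistent.
            H_c = h(found + [H])[-1]
            inliers = g(H_c, remaining)
            if best is None or len(inliers) > len(best[1]):
                best = (H_c, inliers)
        if best is None or len(best[1]) < min_inliers:
            break
        found.append(best[0])
        inlier_sets.append(best[1])
        kept = {id(c) for c in best[1]}
        remaining = [c for c in remaining if id(c) not in kept]
    return found, inlier_sets
```

In a real implementation, `f` might be DLT or FNS and `g` the symmetric-transfer-error test; the skeleton only fixes the order of operations.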

Since computational efficiency is crucial for algorithms like RANSAC that are intended to be practical, our modification of RANSAC is an excellent example of where one would prefer to impose consistency constraints using our AML algorithm, which is relatively fast, rather than BA, which is slower. As a proof of concept, we show in Fig. 8 the inliers associated with fully consistent homographies on various real data taken from the Oxford dataset, produced by running our modified RANSAC algorithm. The results demonstrate that finding numerous visually convincing inliers associated with fully consistent homographies is feasible in practice.

Fig. 8
figure 8

Inliers associated with fully consistent homographies found using Algorithm 2. The results on real data indicate that finding fully consistent homographies using our AML cost function is tractable

8 Discussion

We have identified several important factors that may help explain the success of our method. Firstly, the error covariance matrices associated with the individual homography estimates are absolutely crucial: without covariance information we are unable to produce meaningful results. Secondly, our scale-invariant cost function captures the projective nature of the parameters that are being estimated, and ensures that the optimisation method always takes a step in a meaningful direction. The scale invariance is not properly accounted for in the method of Chen and Suter, and this partially explains why their method tends to produce worse results. Furthermore, the Levenberg–Marquardt optimisation method is better suited to solving non-linear optimisation problems than, for example, the bilinear approach followed by Chen and Suter.

Our modification of RANSAC serves to emphasise the practical utility of our research outcome. There are, of course, many other modifications to the original RANSAC algorithm, each claiming to be the best. Usually, the modifications are based on improvements to the sampling scheme. Although they may be faster than the original RANSAC, none can produce fully consistent homographies. We leave it to future work to decide which modern variant of RANSAC to couple with our full consistency constraints to achieve maximum computational efficiency.

We also wish to point out that our algorithm need not be incorporated into a sampling procedure such as RANSAC to be of practical use. One could, for example, utilise the recently proposed generalised projection based M-estimator [36] to estimate a set of inconsistent homographies, and then upgrade them to a fully consistent set using our method. Since enforcing consistency constraints leads to a reduction in reprojection error and hence an improvement in the quality of the homographies, it always makes sense to do so.

9 Conclusion

Our principal contribution within this paper has been to formulate the problem of estimating multiple interdependent homographies in a manner that enforces full consistency between all constituent homographies. A key feature of this formulation is the exploitation of latent variables that link together different homographies. The resulting solution strategy has a natural translation into a full-blown BA procedure and into the computationally more efficient AML estimation procedure. Since both BA and AML are non-linear optimisation techniques, they both require a suitable initialisation to produce meaningful results.

In this connection, our second important contribution has been the derivation of a novel compact initialisation procedure which can be applied when there are two or more planes in the scene. In contrast, the initialisation procedure of Chen and Suter is more involved and requires at least three planes before it can be utilised.

The third notable contribution of our work has been to show that our scale-invariant AML cost function frequently produces estimates that are an accurate approximation of BA. Compared to the WALS algorithm of Chen and Suter, our method has the advantage that it consistently converges to a better minimum. This is evidenced by experimental results on both real and synthetic data. Furthermore, we have furnished a derivation of suitable homography covariance matrices for both DLT and FNS estimates, to be found in Appendices A and B, and our final formulas are tailored to facilitate easy implementation.

Drawing on the experimentally confirmed validity of our approach, we have proposed a modification to the celebrated RANSAC algorithm to robustly estimate fully consistent homographies. Since in RANSAC computational efficiency is vital, the modified algorithm serves as a noteworthy example of where our AML scheme is preferable to BA.

Finally, we conclude by pointing out that once recovered, the latent variables encoding the fully consistent homographies can be immediately utilised to provide a projective reconstruction of the scene.

The source code for our experiments can be found at http://sites.google.com/site/szpakz/.