1 Introduction

The task of fitting an ellipse to data is frequently encountered in numerous disciplines. This is partially explained by the fact that even though very few shapes and trajectories are perfectly elliptical, non-elliptical shapes and trajectories can often be modelled using one or more ellipses. Recently, Wong et al. [60] compiled a comprehensive list of ellipse fitting applications with items such as human gait analysis, grain sorting, steel coil quality assurance, and cell segmentation—to name a few. All of these applications require an ellipse fitting method that is (1) simple to implement, (2) computationally efficient, (3) accurate, (4) capable of delivering a measure of uncertainty associated with any estimate generated, and (5) guaranteed to always produce a fit in the form of an ellipse. Unfortunately, there is currently no ellipse fitting method available that simultaneously satisfies all of these requirements.

For example, the maximum likelihood ellipse estimation method in its classic form seeks to minimise the sum of the orthogonal distances between data points and a candidate ellipse. Minimising the orthogonal distance, however, is a complicated process in practice [1, 8, 10, 19, 32, 53, 61]. It necessitates projecting points onto conics—a task for which there is no closed-form solution, and one that has been described by numerous authors as time-consuming and numerically unstable [2, 11]. So even though the classic maximum likelihood estimation method, also called the orthogonal distance regression method, yields accurate results, it is inadequate for many practical scenarios.

There are also algebraic ellipse fitting techniques based on error measures that determine how well a collection of data points satisfies the ellipse equation in a certain least squares sense [3, 25, 33, 48]. Least squares error measures are different from the orthogonal distance, but are much easier to compute and much easier to minimise. Therefore, algebraic ellipse fitting methods are typically fast and simple, and some can even always generate ellipses, not just conics, as estimates [17, 21, 42, 59]. However, in comparison to the orthogonal distance regression method, algebraic ellipse fitting techniques often produce estimates of inferior accuracy.

Another class of ellipse fitting methods comprises techniques tagged with labels such as gradient-weighted, approximate maximum likelihood (AML), and hyper-accurate [28, 29, 55]. The methods in this class seek a balance between the accuracy of the orthogonal distance regression method and the simplicity of algebraic fitting methods. What differentiates these techniques from pure algebraic schemes is that they have credible statistical foundations [9, 13] and are capable of accommodating the uncertainty associated with a data point. One particular estimation method that has stood the test of time was proposed independently by Turner [58] in 1974 and by Sampson [50] in 1982. The cost function underlying this method falls into the category of AML cost functions and is often called the Sampson distance [24, Sect. 4.2.6]. For moderate noise levels, the Sampson distance is an excellent approximation of the orthogonal distance. Whilst ellipse fitting methods based on the Sampson distance are typically simple, fast, and accurate at moderate noise levels, most do not guarantee generation of an estimate in the form of an ellipse.

An ellipse fitting method that is fast, simple, utilises the Sampson distance, and guarantees a genuine ellipse fit was recently proposed by Szpak et al. [54]. To enforce the ellipticity constraint, the method uses a barrier term added to the AML cost function. In this paper, we show that an explicit barrier term may be dispensed with in favour of a parametrisation of the set of all ellipses that leads to the implicit enforcement of the ellipticity constraint. The estimator that we develop based on this parametrisation is conceptually and practically simpler than its immediate predecessor involving a barrier term. As a further extension of the barrier method, we develop a measure of uncertainty of a Sampson distance based ellipse estimate in two closely related forms. One of these concerns the uncertainty in the algebraic parameters of the estimate and the other pertains to the uncertainty in the geometrically meaningful parameters of the estimate such as the centre, axes, and major axis orientation. We also provide a means for visualising the uncertainty of an estimate in the form of planar confidence regions. By providing a measure of uncertainty of an ellipse estimate, we facilitate principled statistical decision making for users of our algorithm. Thus, although the algorithm uses a parametrisation of the ellipses that has no immediate geometric significance, practitioners can apply our algorithm and still have a measure of reliability of the geometrically meaningful parameters of an ellipse estimate.

2 Background

A conic section, or simply a conic, is the locus of solutions \(\mathbf {x} = [m_1, m_2]^\mathsf {T}\) in the Euclidean plane \(\mathbb {R}^2\) of a quadratic equation

$$\begin{aligned} a m_1^2 + b m_1 m_2 + c m_2^2 + d m_1 + e m_2 + f = 0, \end{aligned}$$
(2.1)

where \(a\), \(b\), \(c\), \(d\), \(e\), \(f\) are real numbers such that \(a^2 + b^2 + c^2 > 0\). With \(\varvec{\theta }=[a, b, c, d, e, f]^\mathsf {T}\) and \(\mathbf {u}(\mathbf {x}) = [m_1^2, m_1 m_2, m_2^2, m_1, m_2, 1]^\mathsf {T},\) Eq. (2.1) can equivalently be written as

$$\begin{aligned} \varvec{\theta }^\mathsf {T}\mathbf {u}(\mathbf {x}) = 0. \end{aligned}$$
(2.2)

Any multiple of \(\varvec{\theta }\) by a non-zero number corresponds to the same conic. A conic is non-degenerate if the determinant of the conic

$$\begin{aligned} D =\begin{vmatrix} a&b/2&d/2 \\ b/2&c&e/2 \\ d/2&e/2&f \end{vmatrix} \end{aligned}$$

is non-zero. When \(D = 0\), the conic is degenerate. A non-degenerate conic is either an ellipse (possibly with no graph, that is, an ellipse reduced to an empty set), a parabola, or a hyperbola depending on whether the discriminant \(\Delta =\) \( b^2 - 4ac\) is negative, zero, or positive, respectively. A degenerate conic can be either: a single point (\(\Delta < 0\)); an empty set, a straight line, or two parallel lines (\(\Delta = 0\)); or a pair of intersecting lines (\(\Delta > 0\)). By convention, degenerate conics with \(\Delta < 0\), \(\Delta = 0\), and \(\Delta > 0\) are referred to as degenerate ellipses, degenerate parabolas, and degenerate hyperbolas, respectively.

The condition \(\Delta < 0\) characterising the ellipses (non-degenerate or otherwise, including the ellipses with no graph or reduced to a point) can alternatively be written as

$$\begin{aligned} \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }> 0, \end{aligned}$$
(2.3)

where

$$\begin{aligned} \mathbf {F} =\begin{bmatrix} 1&0 \\ 0&0 \end{bmatrix}\otimes \begin{bmatrix} 0&0&2 \\ 0&-1&0 \\ 2&0&0 \end{bmatrix} \end{aligned}$$

and \(\otimes \) denotes the Kronecker product [35]. Hereafter, of all conics, we shall specifically be concerned with ellipses.
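As a concrete illustration, the matrix \(\mathbf {F}\) and the ellipticity test (2.3) take only a few lines of code. The following is a minimal numpy sketch; the names used in it are ours and not part of the method.

```python
import numpy as np

# The 6x6 matrix F of Eq. (2.3), built with the Kronecker product as above.
F = np.kron(np.array([[1.0, 0.0],
                      [0.0, 0.0]]),
            np.array([[0.0,  0.0, 2.0],
                      [0.0, -1.0, 0.0],
                      [2.0,  0.0, 0.0]]))

def is_ellipse(theta):
    """True iff theta = [a, b, c, d, e, f] satisfies theta^T F theta > 0,
    that is, iff the discriminant b^2 - 4ac is negative."""
    theta = np.asarray(theta, dtype=float)
    return theta @ F @ theta > 0.0

print(is_ellipse([1.0, 0.0, 1.0, 0.0, 0.0, -1.0]))   # unit circle: True
print(is_ellipse([1.0, 0.0, -1.0, 0.0, 0.0, -1.0]))  # hyperbola: False
```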

The task of fitting an ellipse to a set of points \(\mathbf {x}_1, \dots , \mathbf {x}_N\) requires a meaningful cost function that characterises the extent to which any particular \(\varvec{\theta }\) fails to satisfy the system of \(N\) copies of Eq. (2.2) associated with \(\mathbf {x} = \mathbf {x}_n\), \(n = 1, \dots , N\). Once a cost function is selected, the corresponding ellipse fit is generated by minimising the cost function subject to the constraint (2.3).

Effectively, though not explicitly, Varah [59] and Fitzgibbon et al. [17] proposed to use for ellipse fitting the direct ellipse fitting cost function defined by

$$\begin{aligned} J_{\mathrm {DIR}}(\varvec{\theta }) = {\left\{ \begin{array}{ll} \frac{\varvec{\theta }^\mathsf {T}\mathbf {A} \varvec{\theta }}{\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }} &{} \text {if} \quad \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }> 0, \\ \infty &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(\mathbf {A} = \sum _{n=1}^N \mathbf {u}(\mathbf {x}_n) \mathbf {u}(\mathbf {x}_n)^\mathsf {T}.\) The minimiser \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) of \(J_{\mathrm {DIR}}\) is the same as the solution of the problem

$$\begin{aligned} \begin{array}{ll} \text {minimise} &{} \varvec{\theta }^\mathsf {T}\mathbf {A} \varvec{\theta }\\ \text {subject to} &{} \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }= 1, \end{array} \end{aligned}$$
(2.4)

and it is in this form that \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) was originally introduced. The representation of \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) as a solution of the problem (2.4) makes it clear that \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) is always an ellipse. Extending the work of Varah and Fitzgibbon et al., Halíř and Flusser [21] introduced a numerically stable algorithm for calculating \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\).
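For concreteness, the following sketch solves (2.4) by computing generalised eigenpairs of \((\mathbf {A}, \mathbf {F})\) and retaining the eigenvector lying on the elliptic side of the constraint. It is the plain construction rather than the numerically stabilised Halíř–Flusser algorithm, it reuses the matrix F from the previous sketch, and all function names are ours.

```python
import numpy as np
from scipy.linalg import eig

def u_of_x(x):
    """Carrier vector u(x) = [m1^2, m1 m2, m2^2, m1, m2, 1]^T of Eq. (2.2)."""
    m1, m2 = x
    return np.array([m1**2, m1*m2, m2**2, m1, m2, 1.0])

def direct_ellipse_fit(points):
    """Direct ellipse fit (2.4): minimise theta^T A theta subject to theta^T F theta = 1."""
    A = sum(np.outer(u_of_x(x), u_of_x(x)) for x in points)
    # Generalised eigenpairs of (A, F); among the eigenvectors with
    # theta^T F theta > 0 keep the one of smallest cost, then rescale it
    # so that the constraint theta^T F theta = 1 holds exactly.
    _, vecs = eig(A, F)
    best, best_cost = None, np.inf
    for v in np.real(vecs).T:
        q = v @ F @ v
        if q > 0.0:
            cost = (v @ A @ v) / q
            if cost < best_cost:
                best, best_cost = v / np.sqrt(q), cost
    return best   # theta_DIR = [a, b, c, d, e, f], always an ellipse
```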

Another cost function for ellipse fitting, and in fact more generally for conic fitting, was first proposed independently by Turner [58] and Sampson [50] and then popularised, in a broader context, by Taubin [56] and Kanatani [27]. It is associated with many names such as the Sampson, gradient-weighted, and approximate maximum likelihood (AML) distance or cost function, and takes the form

$$\begin{aligned} J_{\mathrm {AML}}(\varvec{\theta }) = \sum _{n=1}^N \frac{\varvec{\theta }^\mathsf {T}\mathbf {A}_n \varvec{\theta }}{\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta }} \end{aligned}$$

with

$$\begin{aligned} \mathbf {A}_n = \mathbf {u}(\mathbf {x}_n) \mathbf {u}(\mathbf {x}_n)^\mathsf {T}\quad \text {and} \quad \mathbf {B}_n = \partial _{\mathbf {x}}\mathbf {u}(\mathbf {x}_n) \varvec{\varLambda }_{\mathbf {x}_n}^{} \partial _{\mathbf {x}}\mathbf {u}(\mathbf {x}_n)^\mathsf {T}\end{aligned}$$
(2.5)

for each \(n = 1, \dots , N\). Here, for any length-\(2\) vector \(\mathbf {y}\), \(\partial _{\mathbf {x}}\mathbf {u}(\mathbf {y})\) denotes the \(6 \times 2\) matrix of the partial derivatives of the function \(\mathbf {x} \mapsto \mathbf {u}(\mathbf {x})\) evaluated at \(\mathbf {y}\), and, for each \(n = 1, \dots , N\), \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\) is a \(2 \times 2\) symmetric covariance matrix describing the uncertainty of the data point \(\mathbf {x}_n\) [6, 13, 27]. The function \(J_{\mathrm {AML}}\) is a first-order approximation of a genuine maximum likelihood cost function \(J_{\mathrm {ML}}\) which can be evolved based on the Gaussian model of errors in conjunction with the principle of maximum likelihood. In the case when identical independent homogeneous Gaussian noise corrupts the data points, \(J_{\mathrm {ML}}\) reduces—up to a numeric constant depending on the noise level—to the sum of orthogonal distances between the data points and an ellipse. For the specific task of ellipse fitting, the search space for minimising \(J_{\mathrm {AML}}\) is reduced from the set of all length-\(6\) vectors to the set \(E = \{ \varvec{\theta }\mid \Delta < 0 \}\) of all ellipses, each member of the latter set being conventionally referred to as a feasible point. The approximate maximum likelihood estimate \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is, by definition, the minimiser of \(J_{\mathrm {AML}}\) selected from among all feasible points.
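To make the ingredients of (2.5) concrete, the following numpy sketch assembles \(\mathbf {A}_n\), \(\mathbf {B}_n\), and \(J_{\mathrm {AML}}\); u_of_x is as in the previous sketch, and the default covariances \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\) correspond to the homogeneous isotropic noise case.

```python
import numpy as np

def du_dx(x):
    """6 x 2 Jacobian of the map x -> u(x), evaluated at x = [m1, m2]."""
    m1, m2 = x
    return np.array([[2*m1, 0.0],
                     [m2,   m1],
                     [0.0,  2*m2],
                     [1.0,  0.0],
                     [0.0,  1.0],
                     [0.0,  0.0]])

def data_matrices(points, covs=None):
    """Per-point matrices A_n = u(x_n) u(x_n)^T and B_n = du(x_n) Lambda_n du(x_n)^T of (2.5)."""
    if covs is None:
        covs = [np.eye(2)] * len(points)        # homogeneous isotropic noise
    A = [np.outer(u_of_x(x), u_of_x(x)) for x in points]
    B = [du_dx(x) @ L @ du_dx(x).T for x, L in zip(points, covs)]
    return A, B

def J_AML(theta, A, B):
    """Approximate maximum likelihood (Sampson) cost function."""
    theta = np.asarray(theta, dtype=float)
    return sum((theta @ An @ theta) / (theta @ Bn @ theta) for An, Bn in zip(A, B))
```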

The existence of a minimiser of \(J_{\mathrm {AML}}\) in the set of all ellipses is not always guaranteed. In contrast, there always exists a minimiser of \(J_{\mathrm {AML}}\) in the joint set \(EP = \{ \varvec{\theta }\mid \Delta \le 0\}\) of ellipses and parabolas. Indeed, \(J_{\mathrm {AML}}\) is homogeneous in \(\varvec{\theta }\) (meaning that \(J_{\mathrm {AML}}(\lambda \varvec{\theta }) = J_{\mathrm {AML}}(\varvec{\theta })\) for any non-zero scalar factor \(\lambda \)) and this implies that the range of \(J_{\mathrm {AML}}\)—the set of values that \(J_{\mathrm {AML}}\) can take—is the same as the range of the restriction of \(J_{\mathrm {AML}}\) to the five-dimensional unit sphere \(\mathbb {S}_5= \{ \varvec{\theta }\mid \Vert \varvec{\theta }\Vert = 1\}\), where \(\Vert \cdot \Vert \) denotes the Euclidean norm. Consequently, the search for a minimiser of \(J_{\mathrm {AML}}\) within \(EP\) may be restricted to the search within the intersection of \(EP\) and \(\mathbb {S}_5\), \(EP \cap \mathbb {S}_5\), characterised by the conjunction of the conditions \(\Delta \le 0\) and \(\Vert \varvec{\theta }\Vert = 1\). This intersection is a closed subset of \(\mathbb {S}_5\) and as such is compact in a topological sense. Now, the function \(J_{\mathrm {AML}}\) is continuous, and, given that any continuous function attains a minimum on a compact set, the restriction of \(J_{\mathrm {AML}}\) to \(EP \cap \mathbb {S}_5\) attains its minimum.

Typically, when the data set adheres sufficiently closely to a generative model producing data points based on a bona fide ellipse, the minimiser of \(J_{\mathrm {AML}}\) within \(EP\) will represent a genuine ellipse. In some scenarios, however, the minimiser will be a member of \(P = \{ \varvec{\theta }\mid \Delta = 0 \}\), the “parabolic” boundary of the set \(E\) of all ellipses. For example, when all data points lie on a line [7], the minimiser interpreted geometrically coincides with the line on which the points lie, this line being an instance of a degenerate parabola. In other words, the problem of minimising \(J_{\mathrm {AML}}\) within the set of ellipses is ill posed, and solving it requires some form of regularisation. We shall regularise the minimisation problem by adaptively restricting the search domain for a minimiser to a subset of \(E\). The restriction procedure will take the form of a stopping criterion for an optimisation algorithm, preventing iteratively generated estimates from going too close to the set of parabolas \(P\). While this approach involves a good deal of arbitrariness, one has to immediately point out that such arbitrariness is unavoidable: no ill-posed problem admits a unique, or canonical, regularisation.

3 Optimisation Using a Barrier Term

In our previous work [54] we developed a technique for near-optimising \(J_{\mathrm {AML}}\) subject to the ellipticity constraint. The method uses the merit function

$$\begin{aligned} P(\varvec{\theta }, \alpha ) = J_{\mathrm {AML}}(\varvec{\theta }) + \alpha g(\varvec{\theta }), \end{aligned}$$

where \(g\) is a barrier function of the form

$$\begin{aligned} g(\varvec{\theta }) = \frac{\Vert \varvec{\theta }\Vert ^4}{(\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta })^2} = \Vert \varvec{\theta }\Vert ^4 \Delta ^{-2} \end{aligned}$$

and \(\alpha \) is a positive number. A solution representing an ellipse is sought in the form of a local minimiser of \(P\). The critical property of the barrier function is that it tends to infinity as \(\varvec{\theta }\) approaches the parabolic boundary \(P = \{ \varvec{\theta }\mid \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }= 0 \}\) between the feasible elliptic region \(E = \{ \varvec{\theta }\mid \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }> 0 \}\) and the infeasible hyperbolic region \(H = \{ \varvec{\theta }\mid \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }< 0 \}\) of the search space. This property ensures that if the merit function \(P\) is optimised in sufficiently short steps starting from a feasible point, then any local minimiser reached on the way is feasible; and if the value of \(\alpha \) is small enough, this local minimiser is a good approximation of a local minimiser of \(J_{\mathrm {AML}}\). By design, the parameter \(\alpha \) is set to a very small number (of the order of \(10^{-15}\)). The search for an optimal solution is carried out by a fast-converging iterative procedure that takes the direct ellipse fit \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) for a seed. The technique turns out to be statistically accurate and computationally efficient. However, an issue that might be of concern with regard to the method is that it produces ellipse estimates that are not exact minimisers of \(J_{\mathrm {AML}}\).
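In terms of the quantities introduced earlier, the merit function itself is only a few lines of code. The sketch below reuses J_AML and F from the previous snippets; the default value of alpha mirrors the order of magnitude quoted above.

```python
import numpy as np

def merit(theta, A, B, F, alpha=1e-15):
    """Barrier merit function P(theta, alpha) = J_AML(theta) + alpha * g(theta),
    with g(theta) = ||theta||^4 / (theta^T F theta)^2."""
    theta = np.asarray(theta, dtype=float)
    q = theta @ F @ theta     # equals -Delta; positive on the elliptic region
    return J_AML(theta, A, B) + alpha * np.linalg.norm(theta)**4 / q**2
```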

4 Optimisation Using a Parametrisation

Below we present a method capable of isolating ellipse estimates that are exact minimisers of \(J_{\mathrm {AML}}\) within the feasible part of the search space, provided that \(J_{\mathrm {AML}}\) admits a local minimum on that part. When \(J_{\mathrm {AML}}\) has no local minimum amongst the ellipses, the method delivers an ellipse estimate which is an approximation of a (possibly generalised) parabola constituting the minimiser of \(J_{\mathrm {AML}}\) within the joint set of ellipses and parabolas. The barrier term from our previous method plays no role here—this term is simply discarded. The method searches for a solution by employing a parametrisation of the set of all ellipses. The search is conducted with the aid of a specially crafted Levenberg–Marquardt (LM) algorithm, which ensures a fast convergence towards the optimal solution starting from an initial guess based on the direct ellipse fit. The specific design of our proposed algorithm is a substantial contribution of the present paper.

4.1 A Parametrisation of the Set of All Ellipses

The set of all ellipses admits a natural parametrisation. To reveal it, consider an ellipse described by \(\varvec{\theta }\). Then \(a \ne 0\) and \(c \ne 0\), for otherwise—should \(a = 0\) or \(c = 0\) hold—the discriminant \(\Delta = b^2 - 4ac = b^2\) would be non-negative, and this would contradict the fact that for ellipses the discriminant is always negative. Given this and the fact that the scale of \(\varvec{\theta }\) is irrelevant as far as the determination of the underlying ellipse is concerned, it is safe to assume that \(a = 1\). Under this assumption the condition \(\Delta < 0\) becomes \(4c > b^2\), and letting \(b = 2p\), we see that \(c > p^2\), or, equivalently, \(c = p^2 + q^{-2}\) for some \(q\). It is now clear that the set of ellipses can be parametrised as

$$\begin{aligned} \varvec{\theta }= \varvec{\kappa }(\varvec{\eta }), \quad \varvec{\kappa }(\varvec{\eta }) = [1, 2p, p^2+q^{-2}, r, s, t]^\mathsf {T}, \end{aligned}$$
(4.1)

with \(\varvec{\eta }= [p,q,r,s,t]^\mathsf {T}\) running over all possible length-\(5\) vectors. A variant of the above parametrisation that will be particularly useful in what follows is given by

$$\begin{aligned} \varvec{\theta }= \varvec{\pi }(\varvec{\eta }), \quad \varvec{\pi }(\varvec{\eta }) = \Vert \varvec{\kappa }(\varvec{\eta }) \Vert ^{-1} \varvec{\kappa }(\varvec{\eta }). \end{aligned}$$
(4.2)

The significance of the unit-normalisation step will become apparent when we proceed to develop our customised version of the LM algorithm.

4.2 Optimisation Algorithm

With the parametrisation (4.2) in place, the effective function to optimise is

$$\begin{aligned} J_{\mathrm {AML}}'(\varvec{\eta }) = J_{\mathrm {AML}}(\varvec{\pi }(\varvec{\eta })). \end{aligned}$$

We shall next evolve a modified version of the LM algorithm for efficient optimisation of this function. It will be convenient to precede the description of our version of the LM scheme with a presentation of the standard form of the method.

4.2.1 Initialisation

Because the LM algorithm is an iterative technique, both its standard and modified form will require a proper initialisation. An initial value of \(\varvec{\eta }\), \(\varvec{\eta }_0\), can be extracted from any estimate \(\widehat{\varvec{\theta }}\) representing a genuine ellipse by setting \(\varvec{\eta }_0 = \mathbf {c}(\widehat{\varvec{\theta }})\), where \(\mathbf {c}\) is the composition of the mapping \(\varvec{\theta }\mapsto \varvec{\theta }/\theta _1 = [1, \theta _2/\theta _1, \dots , \theta _6/\theta _1]^\mathsf {T}\) and the inverse of the mapping \(\varvec{\eta }\mapsto \varvec{\kappa }(\varvec{\eta })\). Specifically, \(\mathbf {c}(\varvec{\theta }) = [p, q, r, s, t]^\mathsf {T}\) is given by

$$\begin{aligned} p&= \frac{\theta _2}{2\theta _1}, \quad q = \left( \frac{\theta _3}{\theta _1}- \left( \frac{\theta _2}{2\theta _1} \right) ^2\right) ^{-1/2},\nonumber \\ r&= \frac{\theta _4}{\theta _1}, \quad s = \frac{\theta _5}{\theta _1}, \quad t = \frac{\theta _6}{\theta _1}. \end{aligned}$$
(4.3)

As the direct ellipse fit always represents a genuine ellipse, a natural choice for \(\varvec{\eta }_0\) is \(\mathbf {c}(\widehat{\varvec{\theta }}_{\mathrm {DIR}})\).
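In code, the parametrisation (4.1)–(4.2) and the conversion rule (4.3) are direct transcriptions. The sketch below uses our own function names kappa, pi_map, and c_of_theta.

```python
import numpy as np

def kappa(eta):
    """Parametrisation (4.1) of the set of all ellipses: eta = [p, q, r, s, t] -> theta."""
    p, q, r, s, t = eta
    return np.array([1.0, 2.0 * p, p**2 + q**(-2), r, s, t])

def pi_map(eta):
    """Unit-normalised parametrisation (4.2)."""
    k = kappa(eta)
    return k / np.linalg.norm(k)

def c_of_theta(theta):
    """Conversion (4.3): eta from any theta representing a genuine ellipse."""
    a, b, c, d, e, f = np.asarray(theta, dtype=float) / theta[0]
    p = b / 2.0
    q = (c - p**2) ** (-0.5)   # well defined, since c > p^2 for an ellipse
    return np.array([p, q, d, e, f])

# A natural seed for the iterations that follow:
# eta0 = c_of_theta(direct_ellipse_fit(points))
```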

4.2.2 Standard Form of LM

Both the standard and our version of the LM scheme will rely on a least squares expression for \(J_{\mathrm {AML}}'\). Let \(\mathbf {r}(\varvec{\theta }) = [r_1(\varvec{\theta }), \dots , r_N(\varvec{\theta })]^\mathsf {T}\), where

$$\begin{aligned} r_n(\varvec{\theta }) = \left( \frac{\varvec{\theta }^\mathsf {T}\mathbf {A}_n \varvec{\theta }}{\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta }} \right) ^{1/2} \end{aligned}$$

for each \(n = 1, \dots , N\). Then the function \(J_{\mathrm {AML}}\) can be written in least squares form as

$$\begin{aligned} J_{\mathrm {AML}}(\varvec{\theta }) = \mathbf {r}(\varvec{\theta })^\mathsf {T}\mathbf {r}(\varvec{\theta }) = \Vert \mathbf {r}(\varvec{\theta }) \Vert ^2. \end{aligned}$$

Similarly, the companion function \(J_{\mathrm {AML}}'\) can be expressed as

$$\begin{aligned} J_{\mathrm {AML}}'(\varvec{\eta }) = \Vert \mathbf {r}'(\varvec{\eta }) \Vert ^2, \end{aligned}$$

where \(\mathbf {r}'(\varvec{\eta }) = [r'_1(\varvec{\eta }), \dots , r'_N(\varvec{\eta })]^\mathsf {T}\) with \(r'_n(\varvec{\eta }) = r_n(\varvec{\pi }(\varvec{\eta }))\) for each \(n = 1, \dots , N\).

A full description of any variant of the LM scheme requires an explicit form of the Jacobian matrix of \(\mathbf {r}'(\varvec{\eta })\). A straightforward computation shows that the Jacobian matrix of \(\mathbf {r}(\varvec{\theta })\),

$$\begin{aligned} \partial _{\varvec{\theta }} \mathbf {r}(\varvec{\theta }) = [ (\partial _{\varvec{\theta }}r_1(\varvec{\theta }))^\mathsf {T}, \dots , (\partial _{\varvec{\theta }}r_N(\varvec{\theta }))^\mathsf {T}]^\mathsf {T}, \end{aligned}$$

is fully determined by the relations

$$\begin{aligned} (\partial _{\varvec{\theta }} r_n(\varvec{\theta }))^\mathsf {T}&= r_n^{-1}(\varvec{\theta }) \mathbf {X}_n(\varvec{\theta }) \varvec{\theta }, \\ \mathbf {X}_n(\varvec{\theta })&= \frac{\mathbf {A}_n}{\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta }} - \frac{\varvec{\theta }^\mathsf {T}\mathbf {A}_n \varvec{\theta }}{(\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta })^2} \mathbf {B}_n \end{aligned}$$

for all \(n = 1, \dots , N\). For each \(m = 1, 2, \dots \), denote by \(\mathbf {I}_m\) the \(m \times m\) identity matrix. Let \(\mathbf {P}_{\varvec{\theta }}^\perp \) be the symmetric matrix representing the projection along \(\varvec{\theta }\) onto the orthogonal complement of \(\varvec{\theta }\), given by

$$\begin{aligned} \mathbf {P}_{\varvec{\theta }}^\perp = \mathbf {I}_6 - \Vert \varvec{\theta }\Vert ^{-2} \varvec{\theta }\varvec{\theta }^\mathsf {T}. \end{aligned}$$
(4.4)

Then the Jacobian matrix of \(\mathbf {r}'(\varvec{\eta })\) can be expressed as

$$\begin{aligned} \partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }) = \partial _{\varvec{\theta }}{\mathbf {r}(\varvec{\pi }(\varvec{\eta }))} \, \partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }), \end{aligned}$$

where

$$\begin{aligned} \partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }) = \Vert \varvec{\kappa }(\varvec{\eta }) \Vert ^{-1} \mathbf {P}_{\varvec{\kappa }(\varvec{\eta })}^\perp \partial _{\varvec{\eta }}{\varvec{\kappa }(\varvec{\eta })} \end{aligned}$$

and

$$\begin{aligned} \partial _{\varvec{\eta }}{\varvec{\kappa }}(\varvec{\eta }) = \begin{bmatrix} 0&0&0&0&0 \\ 2&0&0&0&0 \\ 2p&-2q^{-3}&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}. \end{aligned}$$
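Collecting the above formulae, a numpy sketch of \(\mathbf {r}'\) and its Jacobian reads as follows; the per-point matrices \(\mathbf {A}_n\), \(\mathbf {B}_n\) and the maps kappa and pi_map are those of the earlier sketches.

```python
import numpy as np

def residuals(theta, A, B):
    """r(theta) with entries r_n = (theta^T A_n theta / theta^T B_n theta)^(1/2)."""
    return np.array([np.sqrt((theta @ An @ theta) / (theta @ Bn @ theta))
                     for An, Bn in zip(A, B)])

def dr_dtheta(theta, A, B):
    """N x 6 Jacobian of r(theta); the n-th row is r_n^{-1} (X_n(theta) theta)^T."""
    rows = []
    for An, Bn in zip(A, B):
        num, den = theta @ An @ theta, theta @ Bn @ theta
        Xn = An / den - (num / den**2) * Bn
        rows.append((Xn @ theta) / np.sqrt(num / den))
    return np.array(rows)

def dkappa_deta(eta):
    """6 x 5 Jacobian of kappa."""
    p, q = eta[0], eta[1]
    J = np.zeros((6, 5))
    J[1, 0] = 2.0
    J[2, 0] = 2.0 * p
    J[2, 1] = -2.0 * q**(-3)
    J[3:, 2:] = np.eye(3)
    return J

def dpi_deta(eta):
    """6 x 5 Jacobian of pi: ||kappa||^{-1} P_kappa^perp dkappa/deta, cf. (4.4)."""
    k = kappa(eta)
    P = np.eye(6) - np.outer(k, k) / (k @ k)
    return P @ dkappa_deta(eta) / np.linalg.norm(k)

def drprime_deta(eta, A, B):
    """N x 5 Jacobian of r'(eta) = r(pi(eta)) by the chain rule."""
    return dr_dtheta(pi_map(eta), A, B) @ dpi_deta(eta)
```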

We can now summarise the standard LM algorithm for optimising \(J_{\mathrm {AML}}'\) as follows. Starting with an initial estimate \(\varvec{\eta }_0\), the scheme iteratively replaces a current estimate \(\varvec{\eta }_k\) with a new estimate \(\varvec{\eta }_{k+1}\) based on the update rule

$$\begin{aligned}&\varvec{\eta }_{k+1} = \varvec{\eta }_k + \delta \varvec{\eta }_k,\end{aligned}$$
(4.5)
$$\begin{aligned}&\delta \varvec{\eta }_k = - \big ((\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k) + \lambda _k \mathbf {I}_5\big )^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\mathbf {r}'(\varvec{\eta }_k). \end{aligned}$$
(4.6)

Here \(\lambda _k\) is a non-negative scalar that dynamically changes from step to step. Details concerning the choice of \(\lambda _k\) can be found in [47]; see also Sect. 9.
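For concreteness, one possible rendering of the iteration (4.5)–(4.6) is sketched below. The simple multiply-or-divide-by-ten schedule for \(\lambda _k\) is a placeholder of ours, not the particular strategy described in [47], and the helper functions are those of the preceding sketches.

```python
import numpy as np

def fit_ellipse_lm(eta0, A, B, max_iter=200, lam=1e-3, tol=1e-10):
    """Standard Levenberg-Marquardt on J'_AML(eta) = ||r'(eta)||^2, Eqs. (4.5)-(4.6)."""
    eta = np.asarray(eta0, dtype=float)
    cost = np.sum(residuals(pi_map(eta), A, B) ** 2)
    for _ in range(max_iter):
        r = residuals(pi_map(eta), A, B)
        J = drprime_deta(eta, A, B)
        step = -np.linalg.solve(J.T @ J + lam * np.eye(5), J.T @ r)   # (4.6)
        eta_new = eta + step                                          # (4.5)
        cost_new = np.sum(residuals(pi_map(eta_new), A, B) ** 2)
        if cost_new < cost:       # accept the step and relax the damping
            eta, cost, lam = eta_new, cost_new, lam / 10.0
            if np.linalg.norm(step) < tol:
                break
        else:                     # reject the step and increase the damping
            lam *= 10.0
    return pi_map(eta)            # the returned estimate is an ellipse by construction
```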

4.2.3 Modified Form of LM

Experiments show that the standard form of the LM scheme applied to \(J_{\mathrm {AML}}'\) can sometimes take relatively many iterations before reaching an optimal solution. To achieve a faster convergence, we shall modify the LM algorithm, replacing the standard weight matrix \(\lambda \mathbf {I}_5\) in (4.6) by a non-standard, step-dependent, positive-definite weight matrix \(\lambda \mathbf {W}_{\varvec{\eta }}\). The customised scheme retains the logic of the standard scheme in that for large values of the scale factor \(\lambda \), the LM step will align with a descent direction for \(J_{\mathrm {AML}}'\), which is the direction of \(-(\lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\mathbf {r}'(\varvec{\eta }_k).\) This direction is no longer the direction of the steepest descent in the ordinary Euclidean norm on the parameter space, but rather in the norm \( \vert \vert \vert \varvec{\zeta } \vert \vert \vert = (\varvec{\zeta }^\mathsf {T}\mathbf {W}_{\varvec{\eta }} \varvec{\zeta })^{1/2}\) [43, Sect. 8.2.2]. For precedents on the use of weight matrices in LM different from multiples of the identity matrix, see [5, 44]. Passing to the specifics of our scheme, we shall adopt

$$\begin{aligned} \mathbf {W}_{\varvec{\eta }} = (\partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }))^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }), \end{aligned}$$
(4.7)

with the main ingredient of the update rule then becoming

$$\begin{aligned} \delta \varvec{\eta }_k = - \big ((\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k) + \lambda _k \mathbf {W}_{\varvec{\eta }_k}\big )^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\mathbf {r}'(\varvec{\eta }_k).\nonumber \\ \end{aligned}$$
(4.8)
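In code, the modification amounts to a one-line change in the damped normal equations of the previous sketch (dpi_deta as before):

```python
import numpy as np

def lm_step_modified(eta, r, J, lam):
    """One damped step with lambda * I_5 replaced by lambda * W_eta, Eqs. (4.7)-(4.8)."""
    D = dpi_deta(eta)         # 6 x 5 Jacobian of pi at eta
    W = D.T @ D               # weight matrix (4.7)
    return -np.linalg.solve(J.T @ J + lam * W, J.T @ r)
```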

The motivation behind this choice of the weight matrix has to do with the experimentally observed fact that the LM algorithm for optimising the non-parametrised function \(J_{\mathrm {AML}}\) (which produces conics, but not necessarily ellipses, as optimal solutions), based on the rule

$$\begin{aligned} \varvec{\theta }_{k+1}&= \varvec{\theta }_k + \delta \varvec{\theta }_k,\end{aligned}$$
(4.9)
$$\begin{aligned} \delta \varvec{\theta }_k&= - \big ((\partial _{\varvec{\theta }}{\mathbf {r}}(\varvec{\theta }_k))^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}}(\varvec{\theta }_k) + \lambda _k \mathbf {I}_6\big )^{-1} (\partial _{\varvec{\theta }}{\mathbf {r}}(\varvec{\theta }_k))^\mathsf {T}\mathbf {r}(\varvec{\theta }_k), \end{aligned}$$
(4.10)

runs faster than the standard LM algorithm for optimising \(J_{\mathrm {AML}}'\) (as per (4.5) and (4.6)). One might hope then that if \(\mathbf {W}_{\varvec{\eta }}\) is chosen in such a way that the increment

$$\begin{aligned} \delta \varvec{\eta }= - \big ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }} \big )^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\mathbf {r}' \end{aligned}$$

transformed into an increment of \(\varvec{\theta }\) by means of the relation

$$\begin{aligned} \delta \varvec{\theta }= \partial _{\varvec{\eta }}{\varvec{\pi }} \, \delta \varvec{\eta }\end{aligned}$$

becomes

$$\begin{aligned} \delta \varvec{\theta }= - \big ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} + \lambda \mathbf {I}_6\big )^{-1} (\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\mathbf {r} \end{aligned}$$

and in so doing mimics the rule given in (4.10), then the LM algorithm employing such \(\delta \varvec{\eta }\) will run faster than the standard version of LM. Of course, the above is just a guiding principle whose merit has to be scrutinised experimentally. Fortunately, relevant results confirm the efficacy of the proposed approach—see Sect. 10.

We now discuss how the above condition on \(\mathbf {W}_{\varvec{\eta }}\) leads to formula (4.7). The defining property of \(\mathbf {W}_{\varvec{\eta }}\) is

$$\begin{aligned}&\!\!\!\partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} +\lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\mathbf {r}'\\&\quad =\big ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} + \lambda \mathbf {I}_6\big )^{-1} (\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\mathbf {r}. \end{aligned}$$

Here, of course, \(\mathbf {r}'\) and \(\partial _{\varvec{\eta }}{\mathbf {r}'}\) are evaluated at \(\varvec{\eta }\), and \(\mathbf {r}\) and \(\partial _{\varvec{\theta }}{\mathbf {r}}\) are evaluated at \(\varvec{\pi }(\varvec{\eta })\); in particular, \(\mathbf {r}' = \mathbf {r}'(\varvec{\eta }) = \mathbf {r}(\varvec{\pi }(\varvec{\eta })) = \mathbf {r}\). Taking into account that

$$\begin{aligned}&\!\!\!\partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\mathbf {r}'\\&\quad =\partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}(\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\mathbf {r}', \end{aligned}$$

we see that \(\mathbf {W}_{\varvec{\eta }}\) has to satisfy

$$\begin{aligned} \partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}= ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6)^{-1}. \end{aligned}$$
(4.11)

For a given matrix \(\mathbf {A}\), denote by \(\mathbf {A}^+\) the Moore–Penrose pseudo-inverse of \(\mathbf {A}\) [35]. It is easily seen that

$$\begin{aligned}&(\partial _{\varvec{\eta }}{\varvec{\pi }})^+ = ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }})^{-1}(\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}, \end{aligned}$$
(4.12a)
$$\begin{aligned}&((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+ = \partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }})^{-1}. \end{aligned}$$
(4.12b)

Hence, in particular,

$$\begin{aligned}&(\partial _{\varvec{\eta }}{\varvec{\pi }})^+ \partial _{\varvec{\eta }}{\varvec{\pi }} = \mathbf {I}_5, \end{aligned}$$
(4.13a)
$$\begin{aligned}&(\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+ = \mathbf {I}_5. \end{aligned}$$
(4.13b)

Pre-multiplying and post-multiplying both sides of (4.11) by \((\partial _{\varvec{\eta }}{\varvec{\pi }})^+\) and \(((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+,\) respectively, and invoking (4.13a) and (4.13b), we get

$$\begin{aligned} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1}&= (\partial _{\varvec{\eta }}{\varvec{\pi }})^+ ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}}+\lambda \mathbf {I}_6)^{-1} \\&\times ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+. \end{aligned}$$

Now the critical ingredient of our argument is the equality

$$\begin{aligned}&\!\!\!(\partial _{\varvec{\eta }}{\varvec{\pi }})^+ ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6)^{-1} ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+\nonumber \\&\quad =\big ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6) \partial _{\varvec{\eta }}{\varvec{\pi }}\big )^{-1}. \end{aligned}$$
(4.14)

We defer the rather lengthy proof of this equality to Appendix 1. Assuming (4.14) for now, we find that

$$\begin{aligned} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} = \big ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6) \partial _{\varvec{\eta }}{\varvec{\pi }}\big )^{-1} \end{aligned}$$

and further

$$\begin{aligned} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }}&= (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6) \partial _{\varvec{\eta }}{\varvec{\pi }}\\&= (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }}. \end{aligned}$$

Hence, immediately, \(\mathbf {W}_{\varvec{\eta }} = (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }},\) as desired.

5 Two Forms of Estimate Covariances

To be truly useful, an estimate of an ellipse must be accompanied by a reliable measure of its uncertainty characterising the dispersion of values that the estimate could reasonably attain. A fundamental advantage of the AML cost function over the ML cost function is that the former allows one to obtain, in a relatively straightforward way, a quantitative measure of uncertainty for any estimate related to that function. The measure takes the form of a covariance matrix. Here we present two closely related expressions for the covariance matrix of an AML estimate of an ellipse. The first of these concerns the case that the estimate is expressed in terms of the familiar algebraic parameters \(\varvec{\theta }\). The second concerns the case that the estimate is represented via the natural geometric parameters of an ellipse, namely centre co-ordinates, semi-major and semi-minor axes, and orientation.

5.1 Covariance of the Algebraically Parametrised Estimate

Any effort to assess the uncertainty of an estimate like \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) has to involve assumptions as to the nature of noise in the data. The assumption behind the formula for the covariance matrix of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\), \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\), is the same as that underlying the development of \(J_{\mathrm {AML}}\), namely that the observed data points \(\mathbf {x}_1, \dots , \mathbf {x}_N\) are noisy versions of noise-free data points, with the noise in each \(\mathbf {x}_n\) independent and Gaussian, of zero mean and covariance matrix \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\).

Given an \(m \times m\) matrix \(\mathbf {A}\) and a positive integer \(r\) no greater than \(m\), we denote by \(\mathbf {A}_r\) the \(r\)-truncated SVD of \(\mathbf {A}\) and by \(\mathbf {A}_r^+\) the \(r\)-truncated pseudo-inverse of \(\mathbf {A}\). These are defined as follows. If \(\mathbf {A} = \mathbf {U} \mathbf {D} \mathbf {V}^\mathsf {T}\) is the singular value decomposition (SVD) of \(\mathbf {A}\), with \(\mathbf {D} = {{\mathrm{\mathrm {diag}}}}(d_1, \dots , d_m)\), then \(\mathbf {A}_r = \mathbf {U} \mathbf {D}_r \mathbf {V}^\mathsf {T}\) with \(\mathbf {D}_r = {{\mathrm{\mathrm {diag}}}}(d_1, \dots , d_r, 0, \dots , 0)\), and \(\mathbf {A}_r^+ = \mathbf {V} \mathbf {D}_r^+ \mathbf {U}^\mathsf {T}\) with \(\mathbf {D}_r^+ = {{\mathrm{\mathrm {diag}}}}(d_1^+, \dots , d_r^+, 0, \dots , 0)\), where \(d_i^+ = d_i^{-1}\) when \(d_i \ne 0\) and \(d_i^+ = 0\) otherwise. Continuing with the preparations, we introduce the matrix

$$\begin{aligned} \mathbf {M}_{\varvec{\theta }} = \sum _{n=1}^N \frac{\mathbf {A}_n}{\varvec{\theta }^{\mathsf {T}} \mathbf {B}_n \varvec{\theta }}. \end{aligned}$$
(5.1)

We are now ready to present our first covariance matrix formula. Under the assumption that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is normalised, \(\Vert \widehat{\varvec{\theta }}_{\mathrm {AML}}\Vert = 1\), the covariance matrix of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is given by

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} = \mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp (\mathbf {M}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}})^+_5 \mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp . \end{aligned}$$
(5.2)

We remark that the presence of \(\mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp \) here makes the matrix \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) singular, with \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) in the null space of \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\). This reflects the particular way in which the scale ambiguity of the estimates has been eliminated, namely by scaling the estimates to unit length. The details of the derivation of (5.2) are somewhat technical and are deferred to Appendix 2. For additional results related to the content of Appendix 2, the interested reader is referred to [18] and [31].
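A numpy sketch of (5.1)–(5.2) follows; truncated_pinv implements the rank-truncated pseudo-inverse defined above, and the per-point matrices \(\mathbf {A}_n\), \(\mathbf {B}_n\) are those of the sketch in Sect. 2.

```python
import numpy as np

def truncated_pinv(M, rank):
    """Rank-truncated Moore-Penrose pseudo-inverse computed from the SVD."""
    U, d, Vt = np.linalg.svd(M)
    d_plus = np.array([1.0 / di if (i < rank and di > 0.0) else 0.0
                       for i, di in enumerate(d)])
    return Vt.T @ np.diag(d_plus) @ U.T

def covariance_theta(theta_hat, A, B):
    """Covariance (5.2) of the unit-norm algebraic estimate theta_hat."""
    t = np.asarray(theta_hat, dtype=float)
    t = t / np.linalg.norm(t)
    M = sum(An / (t @ Bn @ t) for An, Bn in zip(A, B))   # M_theta of (5.1)
    P = np.eye(6) - np.outer(t, t)                       # projection P_theta^perp
    return P @ truncated_pinv(M, rank=5) @ P
```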

5.2 Covariance of the Geometrically Parametrised Estimate

Alongside a \(\varvec{\theta }\)-vector of algebraic parameters, an ellipse can be represented by a \(\varvec{\xi }\)-vector of geometric parameters, \(\varvec{\xi }= [A, B, H, K, \tau ]^\mathsf {T}.\) Here \(A\) and \(B\) denote the semi-major axis and the semi-minor axis of the ellipse, \(H\) and \(K\) denote the \(x\) and \(y\) coordinates of the centre of the ellipse, and \(\tau \) is the angle formed by the major axis with the positive \(x\)-axis. Recalling that \(\Delta \) denotes the discriminant (\(\Delta = b^2 - 4ac\)), we let

$$\begin{aligned} \lambda _{\pm }&= \frac{1}{2}\left( a + c \mp \left( b^2+(a-c)^2\right) ^{1/2}\right) \!,\\ \psi&= bde - ae^2 - b^2f + c(4af - d^2),\\ V^{\pm }&= \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{1/2}, \end{aligned}$$

where \(\pm \) and \(\mp \) are shorthand for \(+\) or \(-\) that allow presentation of two expressions in one formula, with the upper \(-\) of \(\mp \) associated with the \(+\) of \(\pm \). It is a matter of a straightforward but tedious analysis to establish the following conversion rules for passing from the \(\varvec{\theta }\)-based to the \(\varvec{\xi }\)-based description of the ellipse (see [62, Sect. 4.10.2] for the starting point of the derivation of the formulae):

$$\begin{aligned}&\!\!\!A = \max (V^+,V^-), \quad B = \min (V^+,V^-),\end{aligned}$$
(5.3)
$$\begin{aligned}&\!\!\!H = \frac{2cd- be}{\Delta }, \quad K = \frac{2ae - bd}{\Delta },\end{aligned}$$
(5.4)
$$\begin{aligned}&\!\!\!\tau = {\left\{ \begin{array}{ll} \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) &{} \text {if } b < 0, a < c\hbox { and }V^+\ge V^-,\\ \frac{\pi }{4} &{} \text {if }b < 0, a = c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b < 0, a > c\hbox { and }V^+\ge V^-,\\ 0 &{} \text {if }b = 0, a < c\hbox { and }V^+\ge V^-,\\ \frac{\pi }{2} &{} \text {if }b = 0, a \ge c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \pi &{} \text {if }b > 0, a < c\hbox { and }V^+\ge V^-,\\ \frac{3\pi }{4} &{} \text {if }b > 0, a = c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b > 0, a > c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b < 0, a < c\hbox { and }V^+< V^-,\\ \frac{3\pi }{4} &{} \text {if }b < 0, a = c\hbox { and }V^+< V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \pi &{} \text {if }b < 0, a > c\hbox { and }V^+< V^-,\\ \frac{\pi }{2} &{} \text {if }b = 0, a < c\hbox { and }V^+< V^-,\\ 0 &{} \text {if }b = 0, a \ge c\hbox { and }V^+< V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b > 0, a < c\hbox { and }V^+< V^-,\\ \frac{\pi }{4}&{} \text {if }b > 0, a = c\hbox { and }V^+< V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) &{} \text {if }b > 0, a > c\hbox { and }V^+< V^-. \end{array}\right. } \end{aligned}$$
(5.5)

We remark that the formula for \(\tau \) is valid only under the assumption that the ellipse is not a circle; that is, provided the inequality \((a-c)^2 + b^2 > 0\) holds.
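For reference, a direct transcription of the conversion rules (5.3)–(5.5) into code may take the following form. It is a numpy sketch that assumes a real, non-circular ellipse, in line with the remark above.

```python
import numpy as np

def theta_to_xi(theta):
    """Geometric parameters xi = [A, B, H, K, tau] from theta = [a, b, c, d, e, f],
    transcribing (5.3)-(5.5)."""
    a, b, c, d, e, f = np.asarray(theta, dtype=float)
    Delta = b**2 - 4*a*c
    root = np.sqrt(b**2 + (a - c)**2)
    lam_plus, lam_minus = 0.5*(a + c - root), 0.5*(a + c + root)
    psi = b*d*e - a*e**2 - b**2*f + c*(4*a*f - d**2)
    Vp, Vm = np.sqrt(psi/(lam_plus*Delta)), np.sqrt(psi/(lam_minus*Delta))
    A_ax, B_ax = max(Vp, Vm), min(Vp, Vm)                      # (5.3)
    H, K = (2*c*d - b*e)/Delta, (2*a*e - b*d)/Delta            # (5.4)
    # Orientation tau, case analysis (5.5); arccot(x) = arctan(1/x) here.
    acot = lambda x: np.arctan(1.0 / x)
    big = Vp >= Vm
    if b == 0:
        tau = (0.0 if a < c else np.pi/2) if big else (np.pi/2 if a < c else 0.0)
    elif a == c:
        tau = (np.pi/4 if b < 0 else 3*np.pi/4) if big else (3*np.pi/4 if b < 0 else np.pi/4)
    else:
        if b < 0:
            shift = (0.0 if a < c else np.pi/2) if big else (np.pi/2 if a < c else np.pi)
        else:
            shift = (np.pi if a < c else np.pi/2) if big else (np.pi/2 if a < c else 0.0)
        tau = 0.5*acot((a - c)/b) + shift
    return np.array([A_ax, B_ax, H, K, tau])
```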

With the aid of (5.3)–(5.5), the geometric content of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) can be revealed as \(\varvec{\xi }(\widehat{\varvec{\theta }}_{\mathrm {AML}})\). We term \(\varvec{\xi }(\widehat{\varvec{\theta }}_{\mathrm {AML}})\) the AML estimate of the geometric parameters of an ellipse and denote it by \(\widehat{\varvec{\xi }}_{\mathrm {AML}}\). For consistency, we shall refer to \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) as the AML estimate of the algebraic parameters of an ellipse. The rule of covariance propagation readily permits finding the covariance matrix of the AML estimate of the geometric parameters of an ellipse from the covariance matrix of the AML estimate of the algebraic parameters. Assuming that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is normed to unity, we have

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{} = [\partial _{\varvec{\theta }}{\varvec{\xi }}]_{\varvec{\theta }= \widehat{\varvec{\theta }}_{\mathrm {AML}}}\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} [\partial _{\varvec{\theta }}{\varvec{\xi }}]_{\varvec{\theta }= \widehat{\varvec{\theta }}_{\mathrm {AML}}}^\mathsf {T}. \end{aligned}$$
(5.6)

The \(5 \times 6\) Jacobian matrix \(\partial _{\varvec{\theta }}{\varvec{\xi }} = [\partial _{\varvec{\theta }}{A}^\mathsf {T}, \partial _{\varvec{\theta }}{B}^\mathsf {T}, \partial _{\varvec{\theta }}{H}^\mathsf {T}, \partial _{\varvec{\theta }}{K}^\mathsf {T}, \partial _{\varvec{\theta }}{\tau }^\mathsf {T}]^\mathsf {T}\) can be specified explicitly as follows. The first two rows are given by

$$\begin{aligned} \partial _{\varvec{\theta }}{A}&= {\left\{ \begin{array}{ll} \partial _{\varvec{\theta }}{V^+} &{} \text {if } A = V^+, \\ \partial _{\varvec{\theta }}{V^-} &{} \text {if } A = V^-, \end{array}\right. } \end{aligned}$$
(5.7)
$$\begin{aligned} \partial _{\varvec{\theta }}{B}&= {\left\{ \begin{array}{ll} \partial _{\varvec{\theta }}{V^+} &{} \text {if } B = V^+, \\ \partial _{\varvec{\theta }}{V^-} &{} \text {if } B = V^-, \end{array}\right. } \end{aligned}$$
(5.8)

where \(\partial _{\varvec{\theta }}{V^{\pm }} = [\partial _{a}{V^{\pm }}, \partial _{b}{V^{\pm }}, \partial _{c}{V^{\pm }}, \partial _{d}{V^{\pm }}, \partial _{e}{V^{\pm }}, \partial _{f}{V^{\pm }}]\) is given by

$$\begin{aligned} \partial _{a}{V^{\pm }}&= \frac{1}{2\lambda _{\pm }\Delta } \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2} \left[ 4cf - e^2 + 4\Delta ^{-1} c \psi \right. \nonumber \\&\left. \quad - \frac{\psi }{2\lambda _{\pm }} \left( 1 \pm \frac{c - a}{\left( (a - c)^2+ b^2\right) ^{1/2}} \right) \right] ,\end{aligned}$$
(5.9)
$$\begin{aligned} \partial _{b}{V^{\pm }}&= \frac{1}{2\lambda _{\pm }\Delta } \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2} \left[ de - 2bf - 2\Delta ^{-1} b \psi \right. \nonumber \\&\left. \quad \pm \frac{b \psi }{2\lambda _{\pm }\left( (a - c)^2 + b^2\right) ^{1/2}} \right] , \end{aligned}$$
(5.10)
$$\begin{aligned} \partial _{c}{V^{\pm }}&= \frac{1}{2\lambda _{\pm }\Delta } \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2} \left[ 4af - d^2 + 4\Delta ^{-1} a \psi \right. \nonumber \\&\left. \quad - \frac{\psi }{2\lambda _{\pm }} \left( 1 \pm \frac{a - c}{\left( (a - c)^2+ b^2\right) ^{1/2}} \right) \right] ,\end{aligned}$$
(5.11)
$$\begin{aligned} \partial _{d}{V^{\pm }}&= \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{1/2} \frac{be - 2cd}{2\psi }, \end{aligned}$$
(5.12)
$$\begin{aligned} \partial _{e}{V^{\pm }}&= \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{1/2} \frac{bd - 2ae}{2\psi },\end{aligned}$$
(5.13)
$$\begin{aligned} \partial _{f}{V^{\pm }}&= - \frac{1}{2\lambda _{\pm }} \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2}. \end{aligned}$$
(5.14)

The remaining three rows are given by

$$\begin{aligned} \partial _{\varvec{\theta }}{H}&= \left[ \frac{4c(2cd-be)}{\Delta ^2}, \frac{b^2e + 4ace - 4bcd}{\Delta ^2}, \right. \nonumber \\&\quad \left. \frac{2b(bd-2ae)}{\Delta ^2}, \frac{2c}{\Delta }, - \frac{b}{\Delta }, 0 \right] ,\end{aligned}$$
(5.15)
$$\begin{aligned} \partial _{\varvec{\theta }}{K}&= \left[ \frac{2b(be - 2cd)}{\Delta ^2}, \frac{b^2d+4acd-4abe}{\Delta ^2}, \right. \nonumber \\&\quad \left. \frac{4a(2ae - bd)}{\Delta ^2}, - \frac{b}{\Delta }, \frac{2a}{\Delta }, 0 \right] ,\end{aligned}$$
(5.16)
$$\begin{aligned} \partial _{\varvec{\theta }}{\tau }&= \left[ -\frac{b}{2\left( b^2+(a-c)^2\right) }, \frac{a-c}{2\left( b^2+(a-c)^2\right) }, \right. \nonumber \\&\quad \left. \frac{b}{2\left( b^2+(a-c)^2\right) }, 0 ,0 ,0 \right] . \end{aligned}$$
(5.17)
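The propagation rule (5.6) is then straightforward to apply. For illustration, the sketch below approximates the Jacobian \(\partial _{\varvec{\theta }}{\varvec{\xi }}\) by central differences of theta_to_xi in place of the closed-form rows (5.7)–(5.17); away from the branch points of the case analysis (5.5) the two coincide up to discretisation error.

```python
import numpy as np

def covariance_xi(theta_hat, cov_theta, h=1e-7):
    """Covariance (5.6) of the geometric estimate, with the Jacobian of
    theta -> xi approximated by central differences."""
    t = np.asarray(theta_hat, dtype=float)
    J = np.zeros((5, 6))
    for j in range(6):
        dt = np.zeros(6)
        dt[j] = h
        J[:, j] = (theta_to_xi(t + dt) - theta_to_xi(t - dt)) / (2.0 * h)
    return J @ cov_theta @ J.T
```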

6 Confidence Regions

The reliability of an AML estimate of an ellipse can be suggestively expressed using the concept of a confidence region. It is standard to build a confidence region of a parameter vector estimate as a portion of the parameter space that contains the true parameter vector with a given high probability. The parameter space for all ellipses is five dimensional and as such is difficult to grasp. Here we propose a more visually appealing form of ellipse-specific confidence region, namely a confidence region in the plane. Such a region is meant to cover the in-plane locus of the true ellipse with a given high probability.

The probability statement associated with a confidence region requires some explanation if it is not to be misinterpreted. The point is that the statement is not meant to imply that every confidence region contains the true parameters with a specified high probability. Rather, the probability expresses how frequently a confidence region can be expected to contain the true parameters in the long-run process of taking random data points and obtaining regions.

The first to consider planar confidence regions for ellipse fits was Porrill [46]. Our approach is inspired by Scheffé’s \(S\)-method for constructing simultaneous confidence bands for linear regression [51], [34, Sect. 9.4–5]. The starting point for the construction is the observation that when \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is viewed as a multivariate normally distributed random vector,

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}\sim N(\varvec{\theta }_*, \varvec{\varLambda }_{\varvec{\theta }_*}^{}), \end{aligned}$$

where \(\varvec{\theta }_*\) is the parameter vector of the true ellipse and \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\) is a covariance matrix, the scalar random variable \(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x})\) is normally distributed with variance \(\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x})\) for every point \(\mathbf {x}\) on the locus \(\mathcal {E}_{\varvec{\theta }_*} = \{\mathbf {x} \in \mathbb {R}^2 \mid \varvec{\theta }_*^T \mathbf {u}(\mathbf {x}) = 0\}\) of the true ellipse. The observation is based on the fact that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x}) = (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^T \mathbf {u}(\mathbf {x})\) whenever \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\) and the fact that by the rule of covariance propagation \((\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^T \mathbf {u}(\mathbf {x})\) has variance \(\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x})\). Consequently, under the assumption that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is an unbiased estimate of \(\varvec{\theta }_*\),

$$\begin{aligned} z_{\mathbf {x}} = \frac{(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x}))^2}{\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x})} \end{aligned}$$

is a squared normal random variable for every \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\). Each \(z_{\mathbf {x}}\), insofar as \(\mathbf {x}\) belongs to \(\mathcal {E}_{\varvec{\theta }_*}\), attains large values with less probability than small values, with the probability of any particular set of values regarded as large or small being independent of \(\mathbf {x}\). This suggests using the \(z_{\mathbf {x}}\) as building blocks in the construction of a confidence region in the plane. Since the covariance \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\) is unknown, the \(z_{\mathbf {x}}\) do not have observable realisations and, for the sake of construction, have to be replaced with these variables’ observable variants

$$\begin{aligned} \hat{z}_{\mathbf {x}} = \frac{(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x}))^2}{\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} \mathbf {u}(\mathbf {x})}, \end{aligned}$$
(6.1)

where the covariance estimate \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) serves as a natural replacement for \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\). Again, large observed values of \(\hat{z}_{\mathbf {x}}\) are less plausible than small observed values as long as \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\). It is thus natural to consider confidence regions for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) in the form

$$\begin{aligned} \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \hat{z}_{\mathbf {x}} \le c \right\} , \end{aligned}$$

where \(c\) is a positive constant. Ideally, for a confidence region at (confidence) level \(1 - \alpha \), we should choose \(c\) such that

$$\begin{aligned} \mathsf {P}\left( z_{\mathbf {x}} \le c \ \text {for all} \ \mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*} \right) = \mathsf {P}\left( \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}} \le c \right) = 1 - \alpha , \end{aligned}$$

where \(\mathsf {P}(A)\) denotes the probability of the event \(A\). But the distribution of \(\sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}}\) is not easy to determine, so as a second-best choice we shall replace \(\sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}}\) by a random upper bound whose distribution can be readily calculated. Proceeding to the specifics, we may assume, in line with the fact that the parameter space of all ellipses is five-dimensional, that \((\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^\mathsf {T}\varvec{\theta }_*= 0,\) or equivalently, \(\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*= \mathbf {P}_{\varvec{\theta }_*}^\perp (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*).\) This then leads to

$$\begin{aligned} \mathbf {P}_{\varvec{\theta }_*}^\perp \varvec{\varLambda }_{\varvec{\theta }_*}^{} = \varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {P}_{\varvec{\theta }_*}^\perp = \varvec{\varLambda }_{\varvec{\theta }_*}^{}. \end{aligned}$$
(6.2)

Now, if \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\), then

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\mathbf {x}) = \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\mathbf {u}(\mathbf {x}) = (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^\mathsf {T}\mathbf {P}_{\varvec{\theta }_*}^\perp \mathbf {u}(\mathbf {x}). \end{aligned}$$

But it follows from (6.2) that \((\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} = \mathbf {P}_{\varvec{\theta }_*}^\perp ,\) so

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\mathbf {x})&= \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}(\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x})\\&= \left( \left( \varvec{\varLambda }_{\varvec{\theta }_*}^{+}\right) ^{1/2} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) \right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x}). \end{aligned}$$

By the Cauchy–Bunyakovsky–Schwarz inequality,

$$\begin{aligned} (\widehat{\varvec{\theta }}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\mathbf {x}))^2 \le \Vert (\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*) \Vert ^2 \Vert \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x}) \Vert ^2. \end{aligned}$$

Also

$$\begin{aligned}&\left\| (\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*) \right\| ^2 \\&\quad = \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) \end{aligned}$$

and

$$\begin{aligned} \left\| \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x}) \right\| ^2 = \mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x}). \end{aligned}$$

Hence

$$\begin{aligned} z_{\mathbf {x}} \le \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) . \end{aligned}$$

Since \(\mathbf {x}\) is an arbitrary member of \(\mathcal {E}_{\varvec{\theta }_*}\), we have

$$\begin{aligned} \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}} \le \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) . \end{aligned}$$
(6.3)

Now the random variable \((\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)\) has approximately a chi-squared distribution with \(5\) degrees of freedom. Let \(\chi ^2_{5, \alpha }\) denote the \(100(1 - \alpha )\)th percentile of the \(\chi ^2\) distribution with \(5\) degrees of freedom, characterised by the relation \(\mathsf {P}\left( \chi ^2 \le \chi ^2_{5,\alpha }\right) = 1 - \alpha .\) Inequality (6.3) guarantees that

$$\begin{aligned} \mathsf {P}\left( \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}} \le \chi ^2_{5,\alpha } \right) \ge 1 - \alpha . \end{aligned}$$

Substituting \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) for \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\), we also approximately have

$$\begin{aligned} \mathsf {P}\left( \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} \hat{z}_{\mathbf {x}} \le \chi ^2_{5,\alpha } \right) \ge 1 - \alpha . \end{aligned}$$

This allows an approximate confidence region at level \(1 - \alpha \) for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) to be taken as

$$\begin{aligned} \Omega _{\alpha } = \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \hat{z}_{\mathbf {x}} \le \chi ^2_{5,\alpha }\right\} . \end{aligned}$$
(6.4)

We finally point out that if \(\alpha \) is set to the conventional value of \(0.05\), then \(\chi ^2_{5, \alpha } = 11.07\).
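To make the construction of \(\Omega _{\alpha }\) concrete, the following minimal Python sketch evaluates \(\hat{z}_{\mathbf {x}}\) on a grid of planar points and thresholds it at \(\chi ^2_{5,0.05} = 11.07\). The monomial ordering \(\mathbf {u}(\mathbf {x}) = [x^2, xy, y^2, x, y, 1]^\mathsf {T}\) and the function names are our illustrative assumptions rather than part of the original algorithm specification.

```python
import numpy as np

CHI2_5_005 = 11.07  # 95th percentile of the chi-squared distribution with 5 degrees of freedom

def u(x, y):
    """Vector of conic monomials u(x) = [x^2, xy, y^2, x, y, 1]^T (assumed ordering)."""
    return np.array([x * x, x * y, y * y, x, y, 1.0])

def z_hat(x, y, theta_hat, cov_theta):
    """Statistic z_x for a single planar point (x, y); theta_hat is the unit-norm
    AML estimate and cov_theta its 6 x 6 covariance matrix."""
    ux = u(x, y)
    return float(theta_hat @ ux) ** 2 / float(ux @ cov_theta @ ux)

def confidence_region_mask(xs, ys, theta_hat, cov_theta, threshold=CHI2_5_005):
    """Boolean grid mask: True where a point satisfies z_x <= chi^2_{5, alpha},
    i.e. belongs to the confidence region Omega_alpha of (6.4)."""
    return np.array([[z_hat(x, y, theta_hat, cov_theta) <= threshold for x in xs]
                     for y in ys])
```

Plotting the boundary of the resulting mask visualises the planar confidence region associated with the estimate.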

7 Data Normalisation

For reasons of numerical stability, it is important to incorporate data normalisation, or pre-conditioning, in the process of calculating AML estimates and their covariances. One popular form of data normalisation is due to Hartley [12, 14, 23] and can be formulated with the aid of a data-dependent \(3 \times 3\) matrix \(\mathbf {T}\) given by

$$\begin{aligned} \mathbf {T} = \begin{bmatrix} s^{-1}&0&-s^{-1} \overline{m}_1 \\ 0&s^{-1}&-s^{-1} \overline{m}_2 \\ 0&0&1 \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} \overline{m}_i = \frac{1}{N} \sum _{n=1}^N m_{n,i} \quad (i = 1,2) \end{aligned}$$

and

$$\begin{aligned} s = \left( \frac{1}{2N} \sum _{n=1}^N \left[ (m_{n,1} - \overline{m}_1)^2 + (m_{n,2} - \overline{m}_2)^2 \right] \right) ^{1/2}. \end{aligned}$$
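A minimal Python sketch of this normalisation step follows; the array layout (an \(N \times 2\) array of data points) and the function names are illustrative assumptions.

```python
import numpy as np

def hartley_normalisation_matrix(points):
    """Build the 3 x 3 matrix T from an N x 2 array of data points (m_{n,1}, m_{n,2}),
    using the centroid (m_bar_1, m_bar_2) and the scale s defined above."""
    m_bar = points.mean(axis=0)
    s = np.sqrt(((points - m_bar) ** 2).sum() / (2.0 * len(points)))
    return np.array([[1.0 / s, 0.0, -m_bar[0] / s],
                     [0.0, 1.0 / s, -m_bar[1] / s],
                     [0.0, 0.0, 1.0]])

def normalise_points(points):
    """Map every data point m = [m_1, m_2, 1]^T to T m, returning the normalised
    2-D coordinates together with T."""
    T = hartley_normalisation_matrix(points)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :2], T
```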

If \(\mathbf {m} = [m_1, m_2, 1]^\mathsf {T}\) represents a data point in the original, un-normalised coordinates, then \(\tilde{\mathbf {m}} = [\tilde{m}_1, \tilde{m}_2, 1]^\mathsf {T}\) defined by \(\tilde{\mathbf {m}} = \mathbf {T} \mathbf {m}\) represents the same point in the normalised coordinates. The form of \(\mathbf {T}\) reflects the fact that the transformation \(\mathbf {m} \mapsto \tilde{\mathbf {m}}\) scales down the data set to a unit box with centre at the origin. In the normalised coordinates, Eq. (2.2) can be equivalently written as \(\tilde{\varvec{\theta }}^\mathsf {T}\mathbf {u}(\tilde{\mathbf {x}}) = 0\) with \(\tilde{\mathbf {x}} = [\tilde{m}_1, \tilde{m}_2]^\mathsf {T}\) and \(\tilde{\varvec{\theta }}\) related to \(\varvec{\theta }\) via the relation

$$\begin{aligned} \tilde{\varvec{\theta }}= \mathbf {E}^{-1} \mathbf {P}_{(34)} \mathbf {D}_3^+ (\mathbf {T} \otimes \mathbf {T})^{-\mathsf {T}} \mathbf {D}_3 \mathbf {P}_{(34)} \mathbf {E}\varvec{\theta }. \end{aligned}$$
(7.1)

Here, \(\mathbf {D}_3\) is the \(9 \times 6\) duplication matrix given by

$$\begin{aligned} \mathbf {D}_3 = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&1&0&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \end{bmatrix}, \end{aligned}$$

\(\mathbf {P}_{(34)}\) is the permutation matrix for interchanging the 3rd and 4th entries of length-\(6\) vectors, given by

$$\begin{aligned} \mathbf {P}_{(34)} = {{\mathrm{\mathrm {diag}}}}(0,1,0) \otimes \begin{bmatrix} 0&1 \\ 1&0 \end{bmatrix} + {{\mathrm{\mathrm {diag}}}}(1,0,1) \otimes \begin{bmatrix} 1&0 \\ 0&1 \end{bmatrix}, \end{aligned}$$

and \(\mathbf {E} = {{\mathrm{\mathrm {diag}}}}(1, 2^{-1}, 1, 2^{-1}, 2^{-1}, 1).\) The computation of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) involving data normalisation proceeds in two steps. First, an AML estimate of \(\tilde{\varvec{\theta }}\) relative to the normalised coordinates, \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\), is extracted by minimising the AML cost function

$$\begin{aligned} \tilde{J}_{\mathrm {AML}}(\tilde{\varvec{\theta }}) = \sum _{n=1}^N \frac{\tilde{\varvec{\theta }}^\mathsf {T}\tilde{\mathbf {A}}_n \tilde{\varvec{\theta }}}{\tilde{\varvec{\theta }}^\mathsf {T}\tilde{\mathbf {B}}_n \tilde{\varvec{\theta }}} \end{aligned}$$

where, for each \(n = 1, \dots , N\), \(\tilde{\mathbf {A}}_n = \mathbf {u}(\tilde{\mathbf {x}}_n) \mathbf {u}(\tilde{\mathbf {x}}_n)^\mathsf {T},\) \(\tilde{\mathbf {B}}_n = \partial _{\tilde{\mathbf {x}}}\mathbf {u}(\tilde{\mathbf {x}}_n) \varvec{\varLambda }_{\tilde{\mathbf {x}}_n}^{} \partial _{\tilde{\mathbf {x}}}\mathbf {u}(\tilde{\mathbf {x}}_n)^\mathsf {T},\) and \(\varvec{\varLambda }_{\tilde{\mathbf {x}}_n}^{}\) is the result of propagating \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\) to the normalised coordinates, namely

$$\begin{aligned} \begin{bmatrix} \varvec{\varLambda }_{\tilde{\mathbf {x}}_n}^{}&\mathbf {0} \\ \mathbf {0}^{\mathsf {T}}&0 \end{bmatrix} = \mathbf {T} \begin{bmatrix} \varvec{\varLambda }_{\mathbf {x}_n}^{}&\mathbf {0} \\ \mathbf {0}^{\mathsf {T}}&0 \end{bmatrix} \mathbf {T}^{\mathsf {T}}. \end{aligned}$$

Next, \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is obtained by the un-normalisation procedure captured in the formula

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}= \mathbf {E}^{-1} \mathbf {P}_{(34)} \mathbf {D}_3^+ (\mathbf {T} \otimes \mathbf {T})^\mathsf {T}\mathbf {D}_3 \mathbf {P}_{(34)} \mathbf {E} \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}. \end{aligned}$$
(7.2)

This formula is an instance of the inverse equation to (7.1) and is an immediate consequence of the identity \(J_{\mathrm {AML}}(\varvec{\theta }) = \tilde{J}_{\mathrm {AML}}(\tilde{\varvec{\theta }}).\) Notice that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) given in (7.2) does not necessarily have unit length.
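For concreteness, a Python sketch of the un-normalisation step (7.2) is given below. The construction of \(\mathbf {D}_3\), \(\mathbf {P}_{(34)}\), and \(\mathbf {E}\) follows the definitions above; the vech ordering implied by \(\mathbf {D}_3\) and the function names are assumptions made for illustration only.

```python
import numpy as np

# Duplication matrix D_3 (9 x 6): row r selects the vech entry that equals the
# r-th entry of vec(A) for a symmetric 3 x 3 matrix A.
D3 = np.zeros((9, 6))
for r, c in enumerate([0, 1, 2, 1, 3, 4, 2, 4, 5]):
    D3[r, c] = 1.0
D3_PINV = np.linalg.pinv(D3)                 # the Moore-Penrose pseudo-inverse D_3^+

# Permutation matrix P_(34): interchanges the 3rd and 4th entries of length-6 vectors.
P34 = np.eye(6)[[0, 1, 3, 2, 4, 5]]

E = np.diag([1.0, 0.5, 1.0, 0.5, 0.5, 1.0])

def unnormalisation_matrix(T):
    """The 6 x 6 matrix F = E^{-1} P_(34) D_3^+ (T kron T)^T D_3 P_(34) E of (7.2)."""
    return np.linalg.inv(E) @ P34 @ D3_PINV @ np.kron(T, T).T @ D3 @ P34 @ E

def unnormalise_theta(theta_tilde_hat, T):
    """Un-normalisation step (7.2): map the AML estimate obtained in normalised
    coordinates back to the original coordinates (the result need not have unit length)."""
    return unnormalisation_matrix(T) @ theta_tilde_hat
```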

The computation of \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) involving data normalisation starts with the computation of \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) which is nothing else but an application of (5.2) with \(\mathbf {P}_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^\perp \) substituted for \(\mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp \) and \(\mathbf {M}_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}\) substituted for \(\mathbf {M}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}\), where, of course,

$$\begin{aligned} \mathbf {M}_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}} = \sum _{n=1}^N \frac{\tilde{\mathbf {A}}_n}{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}^{\mathsf {T}} \tilde{\mathbf {B}}_n \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}. \end{aligned}$$

Here \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) is assumed to be of unit length. Once \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) is found, \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) is given by

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} = \Vert \widehat{\varvec{\theta }}_{\mathrm {AML}}\Vert ^{-2} \mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp \mathbf {F} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} \mathbf {F}^\mathsf {T}\mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp , \end{aligned}$$
(7.3)

where \(\mathbf {F}\) is short for \(\mathbf {E}^{-1} \mathbf {P}_{(34)} \mathbf {D}_3^+ (\mathbf {T} \otimes \mathbf {T})^\mathsf {T}\mathbf {D}_3 \mathbf {P}_{(34)} \mathbf {E}\) and \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) on the right-hand side is given by (7.2)—that is, \(\widehat{\varvec{\theta }}_{\mathrm {AML}} = \mathbf {F} \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\). The above expression for \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) is the result of an application of the rule of covariance propagation to \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) and the mapping \(\tilde{\varvec{\theta }}\mapsto \mathbf {F} \tilde{\varvec{\theta }}/ \Vert \mathbf {F} \tilde{\varvec{\theta }}\Vert \).
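A short sketch of (7.3) is given next; it assumes \(\mathbf {F}\) has been formed as in the previous listing and that \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) is the \(6 \times 6\) covariance of the unit-norm normalised estimate.

```python
import numpy as np

def unnormalise_theta_covariance(theta_tilde_hat, cov_theta_tilde, F):
    """Covariance un-normalisation (7.3): Lambda_theta = ||theta||^{-2} P_perp F
    Lambda_theta_tilde F^T P_perp, with theta = F theta_tilde as in (7.2)."""
    theta_hat = F @ theta_tilde_hat
    norm_sq = float(theta_hat @ theta_hat)
    P_perp = np.eye(6) - np.outer(theta_hat, theta_hat) / norm_sq
    return P_perp @ F @ cov_theta_tilde @ F.T @ P_perp / norm_sq
```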

Similarly, the data normalisation based computation of \(\varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{}\) starts with the computation of \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\). The matrix \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\xi }}}_{\mathrm {AML}}}^{}\) is next retrieved by applying the formula

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\xi }}}_{\mathrm {AML}}}^{} = [\partial _{\tilde{\varvec{\theta }}}{\tilde{\varvec{\xi }}}]_{\tilde{\varvec{\theta }}= \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} [(\partial _{\tilde{\varvec{\theta }}}{\tilde{\varvec{\xi }}})]_{\tilde{\varvec{\theta }}= \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^\mathsf {T}, \end{aligned}$$

which is a variant of (5.6) relative to the normalised coordinates. Finally, \(\varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{}\) is given by the expression

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{} = \mathbf {S}\varvec{\varLambda }_{\widehat{\tilde{\varvec{\xi }}}_{\mathrm {AML}}}^{}\mathbf {S}^\mathsf {T}, \quad \mathbf {S} = {{\mathrm{\mathrm {diag}}}}(s,s,s,s,1), \end{aligned}$$

which encodes the covariance propagation for geometric parameters from the normalised to the un-normalised coordinates.

The procedures described above involve an arbitrary set of covariance matrices \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\). A common assumption in practice is that the noise in the data set is homogeneous isotropic Gaussian, so that \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \sigma ^2 \mathbf {I}_2\) for each \(n = 1, \dots , N\), where \(\sigma \) is a common standard deviation. The value of \(\sigma \) is generally not known a priori. This value can, however, be learned from the data. The key is the fact that two variants of \(J_{\mathrm {AML}}\), one based on the covariances \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \sigma ^2 \mathbf {I}_2\) and one based on the covariances \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\), differ only by a multiplicative constant and, as a result, lead to a common value of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\). In other words, \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) can be obtained without knowing the actual value of \(\sigma \), by minimising \(J_{\mathrm {AML}}\) under the default assumption that \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\). Once this is done, one can take

$$\begin{aligned} \widehat{\sigma } = \sqrt{\frac{J_{\mathrm {AML}}(\widehat{\varvec{\theta }}_{\mathrm {AML}})}{N - 5}} \end{aligned}$$

for an estimate of \(\sigma \) (cf. [27, Sect. 7.1.4]). For all practical purposes, \(\sigma \) may safely be identified with \(\widehat{\sigma }\). With this in effect, the \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\) may be assumed to be fully known.
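A minimal Python sketch of this noise-level estimate follows. It evaluates \(J_{\mathrm {AML}}\) under the default assumption \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\) and assumes the monomial ordering \(\mathbf {u}(\mathbf {x}) = [x^2, xy, y^2, x, y, 1]^\mathsf {T}\); the function name is hypothetical.

```python
import numpy as np

def sigma_hat(points, theta_hat):
    """Noise-level estimate sigma_hat = sqrt(J_AML(theta_hat) / (N - 5)), with J_AML
    evaluated under Lambda_{x_n} = I_2; `points` is an N x 2 array and theta_hat the
    AML estimate expressed in the same coordinates as the points."""
    j_aml = 0.0
    for x, y in points:
        ux = np.array([x * x, x * y, y * y, x, y, 1.0])      # u(x_n)
        du = np.array([[2 * x, 0.0],                         # partial of u w.r.t. (x, y), 6 x 2
                       [y, x],
                       [0.0, 2 * y],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 0.0]])
        numerator = float(theta_hat @ ux) ** 2               # theta^T A_n theta
        denominator = float(theta_hat @ du @ du.T @ theta_hat)  # theta^T B_n theta with Lambda = I_2
        j_aml += numerator / denominator
    return np.sqrt(j_aml / (len(points) - 5))
```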

8 Normalised Confidence Regions

Data normalisation can also be used in forming confidence regions. To generate a confidence region based on normalised data, we first define

$$\begin{aligned} \hat{\tilde{z}}_{\tilde{\mathbf {x}}} = \frac{(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\tilde{\mathbf {x}}))^2}{\mathbf {u}(\tilde{\mathbf {x}})^\mathsf {T}\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} \mathbf {u}(\tilde{\mathbf {x}})}, \end{aligned}$$

where \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) is assumed to have unit norm. We then let

$$\begin{aligned} \tilde{\Delta }_{\alpha } = \left\{ \tilde{\mathbf {x}} \in \mathbb {R}^2 \mid \hat{\tilde{z}}_{\tilde{\mathbf {x}}} \le \chi ^2_{5,\alpha }\right\} \end{aligned}$$

and finally take

$$\begin{aligned} \Delta _{\alpha } = \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \tilde{\mathbf {x}} \in \tilde{\Delta }_{\alpha } \right\} \end{aligned}$$

for an approximate confidence region at level \(1 - \alpha \) for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\). Equivalently, \(\Delta _{\alpha }\) can be defined as

$$\begin{aligned} \Delta _{\alpha } = \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \hat{z}_{\mathbf {x}} \le \chi ^2_{5,\alpha } \right\} \end{aligned}$$

provided that \(\hat{z}_{\mathbf {x}}\) is taken with \(\widehat{\varvec{\theta }}_{\mathrm {AML}}= \mathbf {F} \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) and \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} = \mathbf {F} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} \mathbf {F}^\mathsf {T},\) with \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) remaining normalised; this follows from the fact that with \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) and \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) as above, we have \(\hat{z}_{\mathbf {x}} = \hat{\tilde{z}}_{\tilde{\mathbf {x}}}.\) Now, as a rule, the set \(\Delta _{\alpha }\) will be different from the set \(\Omega _{\alpha }\), specified in (6.4), representing an approximate confidence region at level \(1 - \alpha \) for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) based on raw data. While typically the difference between confidence regions constructed using normalised versus raw data is rather small, normalised confidence regions tend to be more visually pleasing.

9 Algorithms

(Algorithms 1–4: pseudocode listings)

Our overall optimisation scheme is summarised in Algorithms 1 and 2. Algorithms 3 and 4 give standalone procedures for computing the covariances of the algebraic and geometric parameters of an ellipse estimate obtained by applying our scheme. It should be noted that the termination conditions form an integral part of Algorithm 1. We elaborate on these conditions in Sect. 10.2.3 while discussing details of our implementation of Algorithm 1.

10 Performance Analysis

In order to characterise the accuracy and computational efficiency of our method, we conducted numerous simulations under diverse experimental conditions. We varied the eccentricity of the ellipses, the number of data points sampled from the ellipses, the level of noise applied to the data points, and the length of the ellipse segments from which data points were generated. Taking the segment length into account allowed us to determine how an ellipse estimate is influenced by the ratio of the length of the sampled segment to the perimeter of the entire underlying ellipse. Based on experiments in which this ratio was varied, we learned to distinguish between ill-posed and well-posed ellipse fitting problems: we regard an ellipse fitting problem as ill-posed when the sampled segment is shorter than half of the ellipse's perimeter, and as well-posed when the segment is longer than half of the perimeter.

In evaluating the quality of ellipse estimates, we distinguished between training data and testing data. Training data consists of noise-perturbed points sampled from an ellipse, possibly restricted to a small fragment. In contrast, testing data consists of noiseless data points uniformly sampled from an entire ellipse. Any particular ellipse in our experiments was estimated using training data, and the resulting estimates were evaluated using the corresponding testing data. By distinguishing between training data and testing data, we explicitly interpreted ellipse fitting as a prediction problem. This means that we were not interested in finding the best ellipse fitting the training data per se, but expected the estimated ellipse to be representative of the true ellipse in all regions, not just in the region from which the training data was sampled.

10.1 Procedure for Generating Synthetic Data and Evaluating Ellipse Estimates

We now summarise the procedure for generating training data and testing data, and explain in more detail how the quality of ellipse estimates was characterised.

10.1.1 Data Generation

The first step of the data generation procedure is to determine an ellipse in standard form \(x^2/a^2 + y^2/b^2 = 1\) by first choosing a random value of \(a\) in the interval \((101, 200)\) and then choosing a random value of \(b\) in the interval \((100, a)\). Such a choice ensures that the inequality \(a > b\) holds and, consequently, that \(a\) and \(b\) are the lengths of the semi-major and semi-minor axes, respectively, of the ellipse in question.

Once an initial ellipse has been selected, training and testing data-sets are constructed using the following steps (a code sketch of the procedure appears after the list):

1. Sample \(K\) points equidistantly along a segment of the ellipse with a specified length. The length of the segment is a fraction of the ellipse perimeter, and is measured clockwise starting from the positive \(y\)-axis. For example, the segment length can be half of the ellipse perimeter (see Fig. 1a). The \(K\) points will serve as a basis for the training data.

2. Sample \(N\) points equidistantly along the entire perimeter of the ellipse. These \(N\) points will serve as a basis for the testing data.

3. Rotate and translate the ellipse together with the points generated in the first two steps by a random angle and random offset (see Fig. 1d).

4. Add zero-mean Gaussian noise with the desired standard deviation to the rotated \(K\) points (see Fig. 1e). Take these noise-perturbed data points as the training data (serving as input to the ellipse estimation methods).

5. Take the rotated set of \(N\) points as the testing data.
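The sketch below implements the five steps in Python. For simplicity, the segment is measured from the positive \(x\)-axis rather than clockwise from the positive \(y\)-axis, and the range of the random offset is arbitrary; these details, like the function names, are illustrative assumptions.

```python
import numpy as np

def sample_ellipse_arc(a, b, fraction, k):
    """Sample k points approximately equidistantly (by arc length) along a segment
    covering `fraction` of the perimeter of x^2/a^2 + y^2/b^2 = 1."""
    t_dense = np.linspace(0.0, 2.0 * np.pi, 20000)
    dense = np.column_stack([a * np.cos(t_dense), b * np.sin(t_dense)])
    arc = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(dense, axis=0), axis=1))]
    t = np.interp(np.linspace(0.0, fraction * arc[-1], k), arc, t_dense)
    return np.column_stack([a * np.cos(t), b * np.sin(t)])

def make_data(a, b, fraction, k, n, sigma, rng):
    """Steps 1-5 above: training points from a segment, testing points from the whole
    ellipse, a common random rotation and translation, and noise on the training set."""
    train = sample_ellipse_arc(a, b, fraction, k)
    test = sample_ellipse_arc(a, b, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    offset = rng.uniform(-100.0, 100.0, size=2)   # arbitrary offset range (an assumption)
    train, test = train @ R.T + offset, test @ R.T + offset
    train = train + rng.normal(scale=sigma, size=train.shape)
    return train, test

# Example: 25 training points sampled from half of an ellipse, with sigma = 5 pixels.
# rng = np.random.default_rng(0)
# train, test = make_data(a=150.0, b=120.0, fraction=0.5, k=25, n=100, sigma=5.0, rng=rng)
```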

Fig. 1

Summary of the simulation procedure. (a) Training data points are sampled from a segment of an ellipse. In this example, the length of the segment is half of the ellipse perimeter. (b) Testing data points are sampled equidistantly from the entire ellipse. (c) Training data is randomly rotated and translated. (d) The same rotation and translation is applied to testing data. (e) Noise perturbed training data serves as input to ellipse estimation methods. (f) The quality of an estimated ellipse is characterised in terms of the root-mean-square orthogonal distance between the testing data set and the estimated ellipse. The orthogonal distance is depicted by straight lines joining testing data points and the corresponding closest points on the ellipse

10.1.2 Estimates Evaluation

After an ellipse has been estimated on the training data with any particular estimation method, the quality of the respective estimate is evaluated on the testing data using the root-mean-square (RMS) orthogonal distance

$$\begin{aligned} \sqrt{\frac{1}{2 N}\sum _{n=1}^{N} d_{n}^2}, \end{aligned}$$

where \(d_n\) denotes the orthogonal distance between the \(n\)th data point and the ellipse constituting the estimate (see Fig. 1f). The RMS orthogonal distance measures the geometric error of the estimate with respect to the testing data points. The process of computing the orthogonal distances \(d_{n}\) is rather involved—detailed formulae can be found in [11] and [61].
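Since the orthogonal projection of a point onto an ellipse has no closed form, the following sketch approximates each \(d_n\) by brute force, replacing it with the distance to the nearest of a dense set of points placed on the estimated ellipse. This is only an approximation for illustration; the exact projection formulae are the ones referenced in [11] and [61].

```python
import numpy as np

def rms_orthogonal_distance(test_points, centre, axes, angle, samples=200000):
    """Approximate the RMS orthogonal distance by replacing each d_n with the distance
    to the nearest of `samples` points sampled densely on the estimated ellipse."""
    a, b = axes
    t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    c, s = np.cos(angle), np.sin(angle)
    boundary = np.column_stack([centre[0] + a * np.cos(t) * c - b * np.sin(t) * s,
                                centre[1] + a * np.cos(t) * s + b * np.sin(t) * c])
    d_sq = np.array([np.min(np.sum((boundary - p) ** 2, axis=1)) for p in test_points])
    return np.sqrt(d_sq.sum() / (2.0 * len(test_points)))
```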

10.2 Results

We compared our estimation technique with the orthogonal distance regression method and the direct ellipse fitting method, which represent the gold standard and the baseline technique for ellipse fitting, respectively. Both the orthogonal distance regression method and our proposed method were initialised with the result of the direct ellipse fitting technique. All estimation schemes operated on Hartley-normalised data points.

10.2.1 Synthetic Data

In the first set of simulations we held the noise level fixed at \(\sigma = 5\) pixels, and varied the number of training data points and the length of the ellipse segments from which the data points were sampled. For each combination of the number of data points and the segment length, we conducted 250 simulation trials and recorded the mean root-mean-square error. The mean root-mean-square errors are displayed using two-dimensional contour plots in Fig. 2.

Fig. 2

Comparison of mean root-mean-square orthogonal distance error for a fixed noise level of \(\sigma = 5\) pixels, as both the number of data points and the portion from which the data points are sampled are varied. The colour intensity of the plot is correlated with the mean root-mean-square orthogonal distance error—the lighter the colour, the larger the error. Contour lines represent level sets with constant error. When the fraction of the ellipse from which data points are sampled is less than half of the ellipse perimeter, the error of the direct ellipse estimation (DIR) method fails to improve as the number of data points is increased. In contrast, both our fast guaranteed ellipse estimation method (FGEE) and the gold standard orthogonal distance estimation (ODE) method exhibit a consistent reduction in error as the number of data points is increased. (d) may be interpreted as comparing a sample of values on the "number of points" axis in (a)–(c), while holding the corresponding "fraction of ellipse perimeter" axis fixed at 0.55. (a) Direct ellipse estimation. (b) Fast guaranteed ellipse estimation. (c) Orthogonal distance estimation. (d) Comparing all three methods (Color figure online)

Two important conclusions can be drawn from the results of the first experiment:

1. When the length of an ellipse segment is less than half of the length of the entire ellipse, the direct ellipse estimate does not improve as the number of data points is increased.

2. The Sampson distance based ellipse fitting method and the orthogonal distance regression method yield almost indistinguishable results.

In the second set of simulations we utilised only 25 data points, but still varied the noise level and the length of the ellipse segments from which data points were sampled. The results of the second set of simulations are summarised using two-dimensional contour plots in Fig. 3. The second experiment confirms that our Sampson distance based ellipse fitting method produces results similar to those of the gold standard orthogonal distance regression method for a variety of noise levels. It also shows that our method continues to track the orthogonal distance regression method even when both the noise level and the ellipse segment length are varied. The direct ellipse fitting method does not share this desirable property, and instead yields inferior results.

Fig. 3

Comparison of mean root-mean-square orthogonal distance error for 25 data points, as both the portion from which the data points are sampled and the standard deviation of the noise level (measured in pixels) are varied. The colour intensity of the plot is correlated with the mean root-mean-square orthogonal distance error—the lighter the colour, the larger the error. Contour lines represent level sets with constant error. When the fraction of the ellipse from which data points are sampled is less than half of the ellipse perimeter, the error of the direct ellipse estimation method is considerably higher than that of the other methods. In contrast, our fast guaranteed ellipse estimation method and the gold standard orthogonal distance estimation method exhibit similar error levels. (d) may be interpreted as comparing a sample of values on the noise-level axis in (a)–(c), while holding the corresponding "fraction of ellipse perimeter" axis fixed at 0.45. (a) Direct ellipse estimation. (b) Fast guaranteed ellipse estimation. (c) Orthogonal distance estimation. (d) Comparing all three methods (Color figure online)

10.2.2 Stability and Efficiency

For every experiment, we verified that our algorithm was indeed producing an ellipse fit by confirming that the discriminant of the estimated parameters was less than zero. We also monitored the average running time of a MATLAB implementation of our algorithm, while varying the number of data points as well as the fraction of the ellipse perimeter from which data points were sampled. A summary of our findings, together with a comparison against the running time of the orthogonal distance regression method, is presented using two-dimensional density plots in Fig. 4.

Fig. 4

Comparison of mean running time for a fixed noise level of \(\sigma = 5\) pixels, as both the portion from which the data points are sampled and the number of data points are varied. The running time is measured in seconds. The colour intensity of the plot is correlated with the average running time—the lighter the colour, the greater the running time. In (a) and (b) the same colour scaling function was used to map running time to colour, thereby ensuring that the two plots are directly comparable. According to (a), the average running time of the fast guaranteed ellipse estimation method increases smoothly as the number of points is increased and as the fraction of the ellipse perimeter from which data points are sampled is decreased. Whilst the running time of the orthogonal distance estimation method in (b) follows a similar trend, the trade-off is not as smooth, indicating that the running time of the orthogonal distance estimation method for a sample of data points is much more unpredictable. In (c) the irregularity of the running time of the orthogonal distance estimation method is made more prominent by a different choice of the colour scaling function. (d) may be interpreted as comparing a sample of values on the "fraction of ellipse perimeter" axis in (a) and (b), while holding the corresponding "number of points" axis fixed at 350. (a) Fast guaranteed ellipse estimation. (b) Orthogonal distance estimation. (c) Orthogonal distance estimation. (d) Fast guaranteed ellipse estimation vs orthogonal distance estimation (Color figure online)

Based on the outcome of the experiment, we make the following two observations:

1. The running time of our fast guaranteed ellipse fitting algorithm increases gradually and smoothly as the number of data points is increased, and as the problem becomes more ill-posed.

2. The running time of the orthogonal distance regression method follows a similar trend, but does not increase gradually or smoothly.

Our experiments indicate that the running time of the orthogonal distance regression method is much more unpredictable than the running time of our technique.

In Sect. 4.2 we mentioned that utilising the LM method in standard form to optimise our cost function may occasionally take many iterations to converge, and proposed a modification to the LM scheme. A comparison of the running time of our fast guaranteed ellipse fitting algorithm against the basic guaranteed ellipse fitting algorithm that uses the standard LM method is presented in Fig. 5. The results show that for ill-posed ellipse fitting problems, our fast guaranteed ellipse fitting implementation is, on average, twice as fast as the basic guaranteed ellipse fitting algorithm.

Fig. 5

Comparison of smoothed histograms of the number of seconds that elapsed before our algorithm that uses a modified damping matrix in the LM optimisation scheme converged for (a) well-posed and (b) ill-posed problems, and of similar histograms for the basic guaranteed ellipse fitting method that uses an identity damping matrix. The results indicate that the use of the modified damping matrix in the LM scheme can considerably decrease the running time for ill-posed problems. On average, the modified damping matrix reduces the running time by half for ill-posed problems. For well-posed problems, there is no notable difference in running time as both algorithms converge within a handful of iterations. (a) 25 data points sampled from the entire ellipse. (b) 25 data points sampled from one third of the ellipse

10.2.3 Attributes of the Estimate

While our ellipse-specific parametrisation explicitly excludes hyperbola fits, it does permit degenerate ellipses or ellipses that approach parabolas in the limit. As discussed in Sect. 2, degenerate ellipses are those for which the corresponding determinant \(D\) is zero. In turn, ellipses close to parabolas are those for which the corresponding discriminant \(\Delta \) is close to zero. To avoid producing degenerate ellipses, our algorithm terminates when the absolute value of the determinant falls below a user-specified threshold. To prevent the solution from coming too close to a parabola, our algorithm stops when the discriminant approaches zero.

In our experiments the threshold for the determinant was set to \(10^{-5}\), and the discriminant was kept sufficiently large in modulus by forcing the algorithm to terminate whenever \(\log (\Vert \varvec{\theta }\Vert ^2/\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta })\) rose above a threshold of \(15.5\).
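A sketch of these two stopping tests is given below. It assumes that \(\mathbf {F}\) is the \(6 \times 6\) matrix for which \(\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }\) is the discriminant-related quadratic form of Sect. 2, and that \(D\) is the determinant of the \(3 \times 3\) symmetric matrix associated with \(\varvec{\theta }\); both assumptions, like the function name, are ours and made only for illustration.

```python
import numpy as np

def should_terminate(theta, F, det_tol=1e-5, log_tol=15.5):
    """Stopping tests described above (a sketch): stop when the conic determinant is
    close to zero or when log(||theta||^2 / (theta^T F theta)) exceeds log_tol."""
    theta = np.asarray(theta, dtype=float)
    a, b, c, d, e, f = theta
    conic = np.array([[a, b / 2.0, d / 2.0],
                      [b / 2.0, c, e / 2.0],
                      [d / 2.0, e / 2.0, f]])
    near_degenerate = abs(np.linalg.det(conic)) < det_tol
    near_parabolic = np.log(float(theta @ theta) / float(theta @ F @ theta)) > log_tol
    return near_degenerate or near_parabolic
```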

We label ellipses that are close to degenerate and ellipses that approximate parabolas as depreciated ellipses. In contrast to bona fide ellipses, depreciated ellipses are characterised by the property that only a subset of the geometric ellipse parameters can be estimated with reasonable certainty. Although not as informative as genuine ellipses, depreciated ellipses still provide useful knowledge. For example, an ellipse that approximates a parabola will have tremendous uncertainty associated with its semi-major axis, but its semi-minor axis and the orientation of its axes may still be estimated with great precision, and this will be reflected in the covariance matrix.

In order to determine how frequently depreciated ellipses occur in typical scenarios, we conducted numerous simulations which are summarised in Table 1.

Table 1 Prevalence of ellipses, depreciated ellipses, and hyperbolas when minimising AML using the conic parametrisation (\(\varvec{\theta }\)) or the ellipse-specific parametrisation (\(\varvec{\eta }\))

The results of our simulations are based on 10,000 trials. For each trial an ellipse was generated by randomly selecting lengths for the semi-major and semi-minor axes, while keeping the axes aligned with the Cartesian axes. Points were then sampled from the upper half, right half, and upper right quarter of the ellipse. Our experiments revealed that with small noise levels and with at least ten data points, depreciated ellipses occur only on rare occasions. The most challenging situation arises when data points are sampled from the upper right quarter of an ellipse and the noise level is large. This constitutes a very ill-posed problem—neither orthogonal distance regression nor AML estimation is capable of producing an ellipse that is close to the true ellipse without recourse to some kind of regularisation. We believe that ill-posed estimation problems are best solved within a Bayesian setting, where regularisation is achieved through a suitable choice of a prior distribution over the parameter space. The design of an appropriate prior over the space of ellipses that might serve to guide the estimate away from degenerate ellipses and parabolas is a challenging problem which we intend to pursue in future work.

10.2.4 Real Data

To validate the conclusions drawn from the synthetic experiments, we also compared the ellipse fitting methods on real images. In Fig. 6, we utilised two images of a Martian moon eclipse captured by the Opportunity rover, and in Fig. 7, we used images of two of Saturn's moons captured by the Cassini spacecraft. Our experiments confirm that the estimates produced by our fast guaranteed ellipse fitting method agree with the estimates obtained via the orthogonal distance regression method.

Fig. 6

Comparison of ellipse fitting methods on two images of a Martian moon eclipse. The two images capture the Martian moon Phobos as it occludes the sun. A Canny edge detector was applied to the two images, and the maximum edge response corresponding to the common border of the moon and the sun was used as input to the ellipse fitting methods. In both cases the smallest ellipse corresponds to the direct ellipse fit, whilst our fast guaranteed ellipse estimation method and the orthogonal distance estimation method yield indistinguishable results (the ellipse contours overlap). Image Credit: NASA/JPL/Cornell. (a) Martian moon eclipse stage 1. (b) Martian moon eclipse stage 2

Fig. 7

Comparison of ellipse fitting methods on Saturn's crescent moons. A Canny edge detector was applied to the two images, and the maximum edge response corresponding to the outer border of the crescents was used as input to the ellipse fitting methods. In both cases the smallest ellipse corresponds to the direct ellipse fit, whilst our fast guaranteed ellipse estimation method and the orthogonal distance estimation method yield indistinguishable results (the ellipse contours overlap). Image Credit: NASA/JPL/Space Science Institute. (a) Rhea. (b) Mimas

10.2.5 Accuracy of Geometric Parameter Covariance Estimation

We performed additional simulations to test the validity of our geometric parameter covariance formulae. Our validation was based on comparing the covariance matrix given by the propagation formula (5.6) to a covariance matrix resulting from a Monte Carlo simulation. In particular, we sampled \(250\) points equidistantly between 0\(^\circ \) and 225\(^\circ \) on the ellipse parametrised by \(\varvec{\xi }= [100, 50, 250, 250, 0.7854]^\mathsf {T}\). We then added zero-mean Gaussian noise at a pre-set noise level to the data points, and produced 10,000 simulation trials. The Monte Carlo covariance matrix was computed with the aid of the formula

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {MONTE}}}^{} = \frac{1}{9995} \sum _{n = 1}^{10\,000}\left( \widehat{\varvec{\xi }}_{\mathrm {AML}, n}-\bar{\varvec{\xi }}_{\mathrm {AML}}\right) \left( \widehat{\varvec{\xi }}_{\mathrm {AML}, n}-\bar{\varvec{\xi }}_{\mathrm {AML}}\right) ^\mathsf {T}, \end{aligned}$$

where \(\widehat{\varvec{\xi }}_{\mathrm {AML}, {n}}\) is the geometric parameter estimate corresponding to the \(n\)th trial and \(\bar{\varvec{\xi }}_{\mathrm {AML}}\) represents the corresponding mean geometric parameter vector. We then compared the relative error between our propagated covariance matrix and the Monte Carlo covariance matrix for each simulation trial. The relative error is defined as the Frobenius norm of the difference between our propagated covariance matrix and the Monte Carlo covariance matrix, divided by the Frobenius norm of the Monte Carlo covariance matrix. The relative error is sensitive to even very small differences. We also computed the angular error, defined as the angle between vectorised and unit-normalised variants of the propagated covariance and Monte Carlo covariance matrices. The angular error is invariant to differences in scale between the two covariance matrices. The median relative and median angular errors are presented in Table 2 for various noise levels. The results show that our covariance matrix estimates are very accurate, achieving a relative error of less than \(15\%\), provided that the noise level is less than \(\sigma = 3\) pixels.

Table 2 Median errors between our propagated geometric parameter covariance matrix and a Monte Carlo estimate of the geometric parameter covariance matrix
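The two error measures reported in Table 2 can be computed as in the following sketch; the \(1/9995\) normalisation reproduces the displayed Monte Carlo formula, and the function names are hypothetical.

```python
import numpy as np

def monte_carlo_covariance(xi_estimates):
    """Sample covariance of the geometric parameter estimates across trials, using the
    1/(trials - 5) normalisation of the displayed formula; xi_estimates is a
    (trials x 5) array of estimates."""
    centred = xi_estimates - xi_estimates.mean(axis=0)
    return centred.T @ centred / (len(xi_estimates) - 5)

def covariance_errors(cov_propagated, cov_monte_carlo):
    """Relative (Frobenius) and angular errors between two covariance matrices."""
    relative = (np.linalg.norm(cov_propagated - cov_monte_carlo, ord='fro')
                / np.linalg.norm(cov_monte_carlo, ord='fro'))
    v1, v2 = cov_propagated.ravel(), cov_monte_carlo.ravel()
    cosine = np.clip(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)), -1.0, 1.0)
    angular = np.arccos(cosine)   # in radians; invariant to scale differences
    return relative, angular
```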

To further illustrate the practical insight that can be gleaned from the covariance matrices, we plot a 95% confidence region and report the estimated geometric parameters and corresponding parameter standard errors on four different data-sets (see Fig. 8). In accordance with theoretical expectations, the true ellipses happened to fall inside the confidence regions and the confidence regions became narrower as the number of data points was increased. While the results presented in Fig. 8 are representative of the kinds of confidence regions one would typically observe in many practical scenarios, they may not be representative of confidence regions arising when data points are sampled from a very short fragment of an ellipse (an ill-posed problem)—in that case the assumptions underlying the generation of confidence regions may not be satisfied and the confidence regions may thus be too narrow.

Fig. 8

Fast guaranteed ellipse estimation with corresponding 95 % confidence band (shaded region). (a) The estimated ellipse is centred at \((254.218 \pm 3.176, 254.866 \pm 3.580)\), with semi-major axis length of \(199.347\pm 4.965\) pixels, semi-minor axis length of \(29.447 \pm 1.683\) pixels, and orientation of \(45.612 \pm 0.549^\circ \). (b) The estimated ellipse is centred at \((245.330\pm 2.612, 250.868 \pm 2.403)\), with semi-major axis length of \(200.105 \pm 3.613\) pixels, semi-minor axis length of \(23.596 \pm 1.327\) pixels, and orientation of \(44.035 \pm 0.419^\circ \). (c) The estimated ellipse is centred at \((245.316\pm 5.478, 236.282 \pm 7.851)\), with semi-major axis length of \(201.939 \pm 3.629\) pixels, semi-minor axis length of \(110.769 \pm 9.859\) pixels, and orientation of \(132.503 \pm 1.772^\circ \). (d) The estimated ellipse is centred at \((256.101 \pm 3.549, 256.386 \pm 3.490)\), with semi-major axis length of \(197.207 \pm 1.719\) pixels, semi-minor axis length of \(88.211 \pm 5.447\) pixels, and orientation of \(134.868 \pm 0.782^\circ \). (a) 10 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 25 pixels, and orientation of \(45^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels. (b) 40 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 25 pixels, and orientation of \(45^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels. (c) 10 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 100 pixels, and orientation of \(135^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels. (d) 40 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 100 pixels, and orientation of \(135^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels

11 Discussion

Our experiments show that the Sampson distance is an excellent approximation to the orthogonal distance for the purpose of fitting an ellipse to data. Our results are in agreement with the findings of Kanatani and Rangarajan who report that “[it] has also been observed that the solution that minimizes the Sampson error agrees with the ML [orthogonal distance] solution up to several significant digits.” [30, p. 2204].

The utility of the Sampson distance was also acknowledged by Rosin. This author conducted a comprehensive comparison of various ellipse fitting cost functions and concluded that “...for small amounts of noise \(EOF_2\) [the Sampson distance] is the overall best approximation to use ...” [49, p. 501].

Some researchers, however, including Chernov and Ma [10, p. 299], have cautioned against replacing the orthogonal distance with the Sampson distance. Their concerns are based on a list of disadvantages associated with algebraic fitting methods, compiled by Anh [1]. The list includes the following items:

1. Error definition does not comply with the measurement guidelines.

2. Conversion of the algebraic parameters to the physical parameters (shape, size, position, and rotation parameters) is highly difficult.

3. Fitting errors are weighted.

4. The estimated model parameters are biased.

5. It is very difficult to test the reliability of the estimated model parameters (particularly in terms of physical parameters).

6. The fitting procedure sometimes ends with an unintended model feature (e.g. a hyperbola instead of an ellipse).

7. The model parameters are not invariant to coordinate transformation (e.g. a parallel shift of the set of given points causes changes not only in position, but also in the form and rotation of the estimated model feature).

In the context of ellipse fitting, these disadvantages are not so severe. We offer the following rejoinder to some of the above criticisms (the bracketed numbers refer to the original item numbers):

[2] The formulae for the conversion of algebraic ellipse parameters to physical ellipse parameters are straightforward, albeit tedious, to derive; see Sect. 5.2.

[3] The fact that fitting errors are weighted is not necessarily a disadvantage. It is through the introduction of gradient-cum-covariance weights into the algebraic distance that the Sampson distance is able to produce accurate estimates.

[4] Bias in the estimated model parameters is not limited to algebraic fitting methods. The orthogonal distance estimate is also biased, even in the case of ellipse fitting [16, 38–41].

[5] Our fitting procedure includes a reliability measure for both algebraic and geometric ellipse parameters.

[6] Our fitting procedure always produces an ellipse estimate.

[7] If data points and data point covariance matrices are both jointly and appropriately modified in accordance with a change of coordinate system, then the Sampson distance is theoretically unchanged. However, certain coordinate systems (e.g. scaling all data points to lie within a unit box [14]) present favourable numerical advantages.

Our response above reduces the list of disadvantages to item (1). Now, passing to this remaining item, given that the Sampson distance yields accurate estimates, does it really matter that the error definition does not comply with measurement guidelines? In our view, it does not, but we concede that the answer may be domain dependent.

An astute reader will probably have noticed that the Sampson distance was devised more than forty years ago, and may now wonder whether the accuracy of Sampson distance based estimation has been surpassed by more recent techniques. To answer this question, a comprehensive evaluation of several new schemes was recently conducted [55]. These schemes are not based on the optimisation of any particular cost function, but rather exploit the idea of unbiasing existing estimators. The results of the comparison showed that whilst in some instances the newer techniques may lead to more accurate estimates, the improvements they yield are very small and typically become apparent only several decimal places into the result. Hence, in many situations, the improvement in accuracy offered by the more recently proposed techniques may not be of practical significance.

12 Conclusion and Future Work

We have presented a straightforward and efficient algorithm for fitting an ellipse to data. The method exploits a parametrisation of the set of all ellipses to implicitly enforce the ellipticity constraint. Computational efficiency is accomplished with a custom variant of the Levenberg–Marquardt algorithm. The technique yields estimates that are almost indistinguishable from the estimates produced by the gold-standard orthogonal distance regression method, even at moderate noise levels. The proposed method is easy to implement thanks to its reliance on the Sampson distance. Additionally, we have provided a means for plotting confidence regions and a measure of uncertainty for both the algebraic and geometric parameters of an ellipse estimate. This measure of uncertainty may prove useful in various industrial applications and may also help in low-level vision tasks, where deciding whether a group of pixels belongs to a line, circle, or ellipse is often a prerequisite for higher-level image analysis. In future work we plan to investigate robust variants of our cost function to neutralise the impact of outliers on the estimate, and to explore suitable priors over the space of ellipses to improve the quality of the estimate for ill-posed problems. We also intend to characterise the empirical and theoretical accuracy of our measure of uncertainty more comprehensively by extending the scope of the Monte Carlo simulations and experiments.