1 Introduction

The task of fitting an ellipse to data is frequently encountered in numerous disciplines. This is partially explained by the fact that even though very few shapes and trajectories are perfectly elliptical, non-elliptical shapes and trajectories can often be modelled using one or more ellipses. Recently, Wong et al. [60] compiled a comprehensive list of ellipse fitting applications with items such as human gait analysis, grain sorting, steel coil quality assurance, and cell segmentation—to name a few. All of these applications require an ellipse fitting method that is (1) simple to implement, (2) computationally efficient, (3) accurate, (4) capable of delivering a measure of uncertainty associated with any estimate generated, and (5) guaranteed to always produce a fit in the form of an ellipse. Unfortunately, there is currently no ellipse fitting method available that simultaneously satisfies all of these requirements.

For example, the maximum likelihood ellipse estimation method in its classic form seeks to minimise the sum of the orthogonal distances between data points and a candidate ellipse. Minimising the orthogonal distance, however, is a complicated process in practice [1, 8, 10, 19, 32, 53, 61]. It necessitates projecting points onto conics—a task for which there is no closed-form solution, and one that has been described by numerous authors as time-consuming and numerically unstable [2, 11]. So even though the classic maximum likelihood estimation method, also called the orthogonal distance regression method, yields accurate results, it is inadequate for many practical scenarios.

There are also algebraic ellipse fitting techniques based on error measures that determine how well a collection of data points satisfies the ellipse equation in a certain least squares sense [3, 25, 33, 48]. Least squares error measures are different from the orthogonal distance, but are much easier to compute and much easier to minimise. Therefore, algebraic ellipse fitting methods are typically fast and simple, and some can even always generate ellipses, not just conics, as estimates [17, 21, 42, 59]. However, in comparison to the orthogonal distance regression method, algebraic ellipse fitting techniques often produce estimates of inferior accuracy.

Another class of ellipse fitting methods comprises techniques tagged with labels such as gradient-weighted, approximate maximum likelihood (AML), and hyper-accurate [28, 29, 55]. The methods in this class seek a balance between the accuracy of the orthogonal distance regression method and the simplicity of algebraic fitting methods. What differentiates these techniques from pure algebraic schemes is that they have credible statistical foundations [9, 13] and are capable of accommodating the uncertainty associated with a data point. One particular estimation method that has stood the test of time was proposed independently by Turner [58] in 1974 and by Sampson [50] in 1982. The cost function underlying this method falls into the category of AML cost functions and is often called the Sampson distance [24, Sect. 4.2.6]. For moderate noise levels, the Sampson distance is an excellent approximation of the orthogonal distance. Whilst ellipse fitting methods based on the Sampson distance are typically simple, fast, and accurate at moderate noise levels, most do not guarantee generation of an estimate in the form of an ellipse.

An ellipse fitting method that is fast, simple, utilises the Sampson distance, and guarantees a genuine ellipse fit was recently proposed by Szpak et al. [54]. To enforce the ellipticity constraint, the method uses a barrier term added to the AML cost function. In this paper, we show that an explicit barrier term may be dispensed with in favour of a parametrisation of the set of all ellipses that leads to the implicit enforcement of the ellipticity constraint. The estimator that we develop based on this parametrisation is conceptually and practically simpler than its immediate predecessor involving a barrier term. As a further extension of the barrier method, we develop a measure of uncertainty of a Sampson distance based ellipse estimate in two closely related forms. One of these concerns the uncertainty in the algebraic parameters of the estimate and the other pertains to the uncertainty in the geometrically meaningful parameters of the estimate such as the centre, axes, and major axis orientation. We also provide a means for visualising the uncertainty of an estimate in the form of planar confidence regions. By providing a measure of uncertainty of an ellipse estimate, we facilitate principled statistical decision making for users of our algorithm. Thus, although the algorithm uses a parametrisation of the ellipses that has no immediate geometric significance, practitioners can apply our algorithm and still have a measure of reliability of the geometrically meaningful parameters of an ellipse estimate.

2 Background

A conic section, or simply a conic, is the locus of solutions \(\mathbf {x} = [m_1, m_2]^\mathsf {T}\) in the Euclidean plane \(\mathbb {R}^2\) of a quadratic equation

$$\begin{aligned} a m_1^2 + b m_1 m_2 + c m_2^2 + d m_1 + e m_2 + f = 0, \end{aligned}$$
(2.1)

where \(a\), \(b\), \(c\), \(d\), \(e\), \(f\) are real numbers such that \(a^2 + b^2 + c^2 > 0\). With \(\varvec{\theta }=[a, b, c, d, e, f]^\mathsf {T}\) and \(\mathbf {u}(\mathbf {x}) = [m_1^2, m_1 m_2, m_2^2, m_1, m_2, 1]^\mathsf {T},\) Eq. (2.1) can equivalently be written as

$$\begin{aligned} \varvec{\theta }^\mathsf {T}\mathbf {u}(\mathbf {x}) = 0. \end{aligned}$$
(2.2)

Any multiple of \(\varvec{\theta }\) by a non-zero number corresponds to the same conic. A conic is non-degenerate if the determinant of the conic

$$\begin{aligned} D =\begin{vmatrix} a&b/2&d/2 \\ b/2&c&e/2 \\ d/2&e/2&f \end{vmatrix} \end{aligned}$$

is non-zero. When \(D = 0\), the conic is degenerate. A non-degenerate conic is either an ellipse (possibly with no graph, that is, an ellipse reduced to an empty set), a parabola, or a hyperbola depending on whether the discriminant \(\Delta =\) \( b^2 - 4ac\) is negative, zero, or positive, respectively. A degenerate conic can be either: a single point (\(\Delta < 0\)); an empty set, a straight line, or two parallel lines (\(\Delta = 0\)); or a pair of intersecting lines (\(\Delta > 0\)). By convention, degenerate conics with \(\Delta < 0\), \(\Delta = 0\), and \(\Delta > 0\) are referred to as degenerate ellipses, degenerate parabolas, and degenerate hyperbolas, respectively.

The condition \(\Delta < 0\) characterising the ellipses (non-degenerate or otherwise, including the ellipses with no graph or reduced to a point) can alternatively be written as

$$\begin{aligned} \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }> 0, \end{aligned}$$
(2.3)

where

$$\begin{aligned} \mathbf {F} =\begin{bmatrix} 1&0 \\ 0&0 \end{bmatrix}\otimes \begin{bmatrix} 0&0&2 \\ 0&-1&0 \\ 2&0&0 \end{bmatrix} \end{aligned}$$

and \(\otimes \) denotes the Kronecker product [35]. Hereafter, of all conics, we shall specifically be concerned with ellipses.
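As a concrete illustration, the matrix \(\mathbf {F}\) and the ellipticity test (2.3) take only a few lines of code. The following is a minimal numpy sketch; the names used in it are ours and not part of the method.

```python
import numpy as np

# The 6x6 matrix F of Eq. (2.3), built with the Kronecker product as above.
F = np.kron(np.array([[1.0, 0.0],
                      [0.0, 0.0]]),
            np.array([[0.0,  0.0, 2.0],
                      [0.0, -1.0, 0.0],
                      [2.0,  0.0, 0.0]]))

def is_ellipse(theta):
    """True iff theta = [a, b, c, d, e, f] satisfies theta^T F theta > 0,
    that is, iff the discriminant b^2 - 4ac is negative."""
    theta = np.asarray(theta, dtype=float)
    return theta @ F @ theta > 0.0

print(is_ellipse([1.0, 0.0, 1.0, 0.0, 0.0, -1.0]))   # unit circle: True
print(is_ellipse([1.0, 0.0, -1.0, 0.0, 0.0, -1.0]))  # hyperbola: False
```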

The task of fitting an ellipse to a set of points \(\mathbf {x}_1, \dots , \mathbf {x}_N\) requires a meaningful cost function that characterises the extent to which any particular \(\varvec{\theta }\) fails to satisfy the system of \(N\) copies of Eq. (2.2) associated with \(\mathbf {x} = \mathbf {x}_n\), \(n = 1, \dots , N\). Once a cost function is selected, the corresponding ellipse fit is generated by minimising the cost function subject to the constraint (2.3).

Effectively, though not explicitly, Varah [59] and Fitzgibbon et al. [17] proposed to use for ellipse fitting the direct ellipse fitting cost function defined by

$$\begin{aligned} J_{\mathrm {DIR}}(\varvec{\theta }) = {\left\{ \begin{array}{ll} \frac{\varvec{\theta }^\mathsf {T}\mathbf {A} \varvec{\theta }}{\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }} &{} \text {if} \quad \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }> 0, \\ \infty &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(\mathbf {A} = \sum _{n=1}^N \mathbf {u}(\mathbf {x}_n) \mathbf {u}(\mathbf {x}_n)^\mathsf {T}.\) The minimiser \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) of \(J_{\mathrm {DIR}}\) is the same as the solution of the problem

$$\begin{aligned} \begin{array}{ll} \text {minimise} &{} \varvec{\theta }^\mathsf {T}\mathbf {A} \varvec{\theta }\\ \text {subject to} &{} \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }= 1, \end{array} \end{aligned}$$
(2.4)

and it is in this form that \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) was originally introduced. The representation of \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) as a solution of the problem (2.4) makes it clear that \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) is always an ellipse. Extending the work of Varah and Fitzgibbon et al., Halíř and Flusser [21] introduced a numerically stable algorithm for calculating \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\).
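For concreteness, the following sketch solves (2.4) by computing generalised eigenpairs of \((\mathbf {A}, \mathbf {F})\) and retaining the eigenvector lying on the elliptic side of the constraint. It is the plain construction rather than the numerically stabilised Halíř–Flusser algorithm, it reuses the matrix F from the previous sketch, and all function names are ours.

```python
import numpy as np
from scipy.linalg import eig

def u_of_x(x):
    """Carrier vector u(x) = [m1^2, m1 m2, m2^2, m1, m2, 1]^T of Eq. (2.2)."""
    m1, m2 = x
    return np.array([m1**2, m1*m2, m2**2, m1, m2, 1.0])

def direct_ellipse_fit(points):
    """Direct ellipse fit (2.4): minimise theta^T A theta subject to theta^T F theta = 1."""
    A = sum(np.outer(u_of_x(x), u_of_x(x)) for x in points)
    # Generalised eigenpairs of (A, F); among the eigenvectors with
    # theta^T F theta > 0 keep the one of smallest cost, then rescale it
    # so that the constraint theta^T F theta = 1 holds exactly.
    _, vecs = eig(A, F)
    best, best_cost = None, np.inf
    for v in np.real(vecs).T:
        q = v @ F @ v
        if q > 0.0:
            cost = (v @ A @ v) / q
            if cost < best_cost:
                best, best_cost = v / np.sqrt(q), cost
    return best   # theta_DIR = [a, b, c, d, e, f], always an ellipse
```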

Another cost function for ellipse fitting, and in fact more generally for conic fitting, was first proposed independently by Turner [58] and Sampson [50] and then popularised, in a broader context, by Taubin [56] and Kanatani [27]. It is associated with many names such as the Sampson, gradient-weighted, and approximate maximum likelihood (AML) distance or cost function, and takes the form

$$\begin{aligned} J_{\mathrm {AML}}(\varvec{\theta }) = \sum _{n=1}^N \frac{\varvec{\theta }^\mathsf {T}\mathbf {A}_n \varvec{\theta }}{\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta }} \end{aligned}$$

with

$$\begin{aligned} \mathbf {A}_n = \mathbf {u}(\mathbf {x}_n) \mathbf {u}(\mathbf {x}_n)^\mathsf {T}\quad \text {and} \quad \mathbf {B}_n = \partial _{\mathbf {x}}\mathbf {u}(\mathbf {x}_n) \varvec{\varLambda }_{\mathbf {x}_n}^{} \partial _{\mathbf {x}}\mathbf {u}(\mathbf {x}_n)^\mathsf {T}\end{aligned}$$
(2.5)

for each \(n = 1, \dots , N\). Here, for any length-\(2\) vector \(\mathbf {y}\), \(\partial _{\mathbf {x}}\mathbf {u}(\mathbf {y})\) denotes the \(6 \times 2\) matrix of the partial derivatives of the function \(\mathbf {x} \mapsto \mathbf {u}(\mathbf {x})\) evaluated at \(\mathbf {y}\), and, for each \(n = 1, \dots , N\), \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\) is a \(2 \times 2\) symmetric covariance matrix describing the uncertainty of the data point \(\mathbf {x}_n\) [6, 13, 27]. The function \(J_{\mathrm {AML}}\) is a first-order approximation of a genuine maximum likelihood cost function \(J_{\mathrm {ML}}\) which can be evolved based on the Gaussian model of errors in conjunction with the principle of maximum likelihood. In the case when identical independent homogeneous Gaussian noise corrupts the data points, \(J_{\mathrm {ML}}\) reduces—up to a numeric constant depending on the noise level—to the sum of orthogonal distances between the data points and an ellipse. For the specific task of ellipse fitting, the search space for minimising \(J_{\mathrm {AML}}\) is reduced from the set of all length-\(6\) vectors to the set \(E = \{ \varvec{\theta }\mid \Delta < 0 \}\) of all ellipses, each member of the latter set being conventionally referred to as a feasible point. The approximate maximum likelihood estimate \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is, by definition, the minimiser of \(J_{\mathrm {AML}}\) selected from among all feasible points.
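To make the ingredients of (2.5) concrete, the following numpy sketch assembles \(\mathbf {A}_n\), \(\mathbf {B}_n\), and \(J_{\mathrm {AML}}\); u_of_x is as in the previous sketch, and the default covariances \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\) correspond to the homogeneous isotropic noise case.

```python
import numpy as np

def du_dx(x):
    """6 x 2 Jacobian of the map x -> u(x), evaluated at x = [m1, m2]."""
    m1, m2 = x
    return np.array([[2*m1, 0.0],
                     [m2,   m1],
                     [0.0,  2*m2],
                     [1.0,  0.0],
                     [0.0,  1.0],
                     [0.0,  0.0]])

def data_matrices(points, covs=None):
    """Per-point matrices A_n = u(x_n) u(x_n)^T and B_n = du(x_n) Lambda_n du(x_n)^T of (2.5)."""
    if covs is None:
        covs = [np.eye(2)] * len(points)        # homogeneous isotropic noise
    A = [np.outer(u_of_x(x), u_of_x(x)) for x in points]
    B = [du_dx(x) @ L @ du_dx(x).T for x, L in zip(points, covs)]
    return A, B

def J_AML(theta, A, B):
    """Approximate maximum likelihood (Sampson) cost function."""
    theta = np.asarray(theta, dtype=float)
    return sum((theta @ An @ theta) / (theta @ Bn @ theta) for An, Bn in zip(A, B))
```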

The existence of a minimiser of \(J_{\mathrm {AML}}\) in the set of all ellipses is not always guaranteed. In contrast, there always exists a minimiser of \(J_{\mathrm {AML}}\) in the joint set \(EP = \{ \varvec{\theta }\mid \Delta \le 0\}\) of ellipses and parabolas. Indeed, \(J_{\mathrm {AML}}\) is homogeneous in \(\varvec{\theta }\) (meaning that \(J_{\mathrm {AML}}(\lambda \varvec{\theta }) = J_{\mathrm {AML}}(\varvec{\theta })\) for any non-zero scalar factor \(\lambda \)) and this implies that the range of \(J_{\mathrm {AML}}\)—the set of values that \(J_{\mathrm {AML}}\) can take—is the same as the range of the restriction of \(J_{\mathrm {AML}}\) to the five-dimensional unit sphere \(\mathbb {S}_5= \{ \varvec{\theta }\mid \Vert \varvec{\theta }\Vert = 1\}\), where \(\Vert \cdot \Vert \) denotes the Euclidean norm. Consequently, the search for a minimiser of \(J_{\mathrm {AML}}\) within \(EP\) may be restricted to the search within the intersection of \(EP\) and \(\mathbb {S}_5\), \(EP \cap \mathbb {S}_5\), characterised by the conjunction of the conditions \(\Delta \le 0\) and \(\Vert \varvec{\theta }\Vert = 1\). This intersection is a closed subset of \(\mathbb {S}_5\) and as such is compact in a topological sense. Now, the function \(J_{\mathrm {AML}}\) is continuous, and, given that any continuous function attains a minimum on a compact set, the restriction of \(J_{\mathrm {AML}}\) to \(EP \cap \mathbb {S}_5\) attains its minimum.

Typically, when the data set adheres sufficiently closely to a generative model producing data points based on a bona fide ellipse, the minimiser of \(J_{\mathrm {AML}}\) within \(EP\) will represent a genuine ellipse. In some scenarios, however, the minimiser will be a member of \(P = \{ \varvec{\theta }\mid \Delta = 0 \}\), the “parabolic” boundary of the set \(E\) of all ellipses. For example, when all data points lie on a line [7], the minimiser interpreted geometrically coincides with the line on which the points lie, this line being an instance of a degenerate parabola. In other words, the problem of minimising \(J_{\mathrm {AML}}\) within the set of ellipses is ill posed, and solving it requires some form of regularisation. We shall regularise the minimisation problem by adaptively restricting the search domain for a minimiser to a subset of \(E\). The restriction procedure will take the form of a stopping criterion for an optimisation algorithm, preventing iteratively generated estimates from going too close to the set of parabolas \(P\). While this approach involves a good deal of arbitrariness, one has to immediately point out that such arbitrariness is unavoidable: no ill-posed problem admits a unique, or canonical, regularisation.

3 Optimisation Using a Barrier Term

In our previous work [54] we developed a technique for near-optimising \(J_{\mathrm {AML}}\) subject to the ellipticity constraint. The method uses the merit function

$$\begin{aligned} P(\varvec{\theta }, \alpha ) = J_{\mathrm {AML}}(\varvec{\theta }) + \alpha g(\varvec{\theta }), \end{aligned}$$

where \(g\) is a barrier function of the form

$$\begin{aligned} g(\varvec{\theta }) = \frac{\Vert \varvec{\theta }\Vert ^4}{(\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta })^2} = \Vert \varvec{\theta }\Vert ^4 \Delta ^{-2} \end{aligned}$$

and \(\alpha \) is a positive number. A solution representing an ellipse is sought in the form of a local minimiser of \(P\). The critical property of the barrier function is that it tends to infinity as \(\varvec{\theta }\) approaches the parabolic boundary \(P = \{ \varvec{\theta }\mid \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }= 0 \}\) between the feasible elliptic region \(E = \{ \varvec{\theta }\mid \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }> 0 \}\) and the infeasible hyperbolic region \(H = \{ \varvec{\theta }\mid \varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }< 0 \}\) of the search space. This property ensures that if the merit function \(P\) is optimised in sufficiently short steps starting from a feasible point, then any local minimiser reached on the way is feasible; and if the value of \(\alpha \) is small enough, this local minimiser is a good approximation of a local minimiser of \(J_{\mathrm {AML}}\). By design, the parameter \(\alpha \) is set to a very small number (of the order of \(10^{-15}\)). The search for an optimal solution is carried out by a fast-converging iterative procedure that takes the direct ellipse fit \(\widehat{\varvec{\theta }}_{\mathrm {DIR}}\) for a seed. The technique turns out to be statistically accurate and computationally efficient. However, an issue that might be of concern with regard to the method is that it produces ellipse estimates that are not exact minimisers of \(J_{\mathrm {AML}}\).
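In terms of the quantities introduced earlier, the merit function itself is only a few lines of code. The sketch below reuses J_AML and F from the previous snippets; the default value of alpha mirrors the order of magnitude quoted above.

```python
import numpy as np

def merit(theta, A, B, F, alpha=1e-15):
    """Barrier merit function P(theta, alpha) = J_AML(theta) + alpha * g(theta),
    with g(theta) = ||theta||^4 / (theta^T F theta)^2."""
    theta = np.asarray(theta, dtype=float)
    q = theta @ F @ theta     # equals -Delta; positive on the elliptic region
    return J_AML(theta, A, B) + alpha * np.linalg.norm(theta)**4 / q**2
```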

4 Optimisation Using a Parametrisation

Below we present a method capable of isolating ellipse estimates that are exact minimisers of \(J_{\mathrm {AML}}\) within the feasible part of the search space, provided that \(J_{\mathrm {AML}}\) admits a local minimum on that part. When \(J_{\mathrm {AML}}\) has no local minimum amongst the ellipses, the method delivers an ellipse estimate which is an approximation of a (possibly generalised) parabola constituting the minimiser of \(J_{\mathrm {AML}}\) within the joint set of ellipses and parabolas. The barrier term from our previous method plays no role here—this term is simply discarded. The method searches for a solution by employing a parametrisation of the set of all ellipses. The search is conducted with the aid of a specially crafted Levenberg–Marquardt (LM) algorithm, which ensures a fast convergence towards the optimal solution starting from an initial guess based on the direct ellipse fit. The specific design of our proposed algorithm is a substantial contribution of the present paper.

4.1 A Parametrisation of the Set of All Ellipses

The set of all ellipses admits a natural parametrisation. To reveal it, consider an ellipse described by \(\varvec{\theta }\). Then \(a \ne 0\) and \(c \ne 0\), for otherwise—should \(a = 0\) or \(c = 0\) hold—the discriminant \(\Delta = b^2 - 4ac = b^2\) would be non-negative, and this would contradict the fact that for ellipses the discriminant is always negative. Given this and the fact that the scale of \(\varvec{\theta }\) is irrelevant as far as the determination of the underlying ellipse is concerned, it is safe to assume that \(a = 1\). Under this assumption the condition \(\Delta < 0\) becomes \(4c > b^2\), and letting \(b = 2p\), we see that \(c > p^2\), or, equivalently, \(c = p^2 + q^{-2}\) for some \(q\). It is now clear that the set of ellipses can be parametrised as

$$\begin{aligned} \varvec{\theta }= \varvec{\kappa }(\varvec{\eta }), \quad \varvec{\kappa }(\varvec{\eta }) = [1, 2p, p^2+q^{-2}, r, s, t]^\mathsf {T}, \end{aligned}$$
(4.1)

with \(\varvec{\eta }= [p,q,r,s,t]^\mathsf {T}\) running over all possible length-\(5\) vectors. A variant of the above parametrisation that will be particularly useful in what follows is given by

$$\begin{aligned} \varvec{\theta }= \varvec{\pi }(\varvec{\eta }), \quad \varvec{\pi }(\varvec{\eta }) = \Vert \varvec{\kappa }(\varvec{\eta }) \Vert ^{-1} \varvec{\kappa }(\varvec{\eta }). \end{aligned}$$
(4.2)

The significance of the unit-normalisation step will become apparent when we proceed to develop our customised version of the LM algorithm.

4.2 Optimisation Algorithm

With the parametrisation (4.2) in place, the effective function to optimise is

$$\begin{aligned} J_{\mathrm {AML}}'(\varvec{\eta }) = J_{\mathrm {AML}}(\varvec{\pi }(\varvec{\eta })). \end{aligned}$$

We shall next evolve a modified version of the LM algorithm for efficient optimisation of this function. It will be convenient to precede the description of our version of the LM scheme with a presentation of the standard form of the method.

4.2.1 Initialisation

Because the LM algorithm is an iterative technique, both its standard and modified form will require a proper initialisation. An initial value of \(\varvec{\eta }\), \(\varvec{\eta }_0\), can be extracted from any estimate \(\widehat{\varvec{\theta }}\) representing a genuine ellipse by setting \(\varvec{\eta }_0 = \mathbf {c}(\widehat{\varvec{\theta }})\), where \(\mathbf {c}\) is the composition of the mapping \(\varvec{\theta }\mapsto \varvec{\theta }/\theta _1 = [1, \theta _2/\theta _1, \dots , \theta _6/\theta _1]^\mathsf {T}\) and the inverse of the mapping \(\varvec{\eta }\mapsto \varvec{\kappa }(\varvec{\eta })\). Specifically, \(\mathbf {c}(\varvec{\theta }) = [p, q, r, s, t]^\mathsf {T}\) is given by

$$\begin{aligned} p&= \frac{\theta _2}{2\theta _1}, \quad q = \left( \frac{\theta _3}{\theta _1}- \left( \frac{\theta _2}{2\theta _1} \right) ^2\right) ^{-1/2},\nonumber \\ r&= \frac{\theta _4}{\theta _1}, \quad s = \frac{\theta _5}{\theta _1}, \quad t = \frac{\theta _6}{\theta _1}. \end{aligned}$$
(4.3)

As the direct ellipse fit always represents a genuine ellipse, a natural choice for \(\varvec{\eta }_0\) is \(\mathbf {c}(\widehat{\varvec{\theta }}_{\mathrm {DIR}})\).
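In code, the parametrisation (4.1)–(4.2) and the conversion rule (4.3) are direct transcriptions. The sketch below uses our own function names kappa, pi_map, and c_of_theta.

```python
import numpy as np

def kappa(eta):
    """Parametrisation (4.1) of the set of all ellipses: eta = [p, q, r, s, t] -> theta."""
    p, q, r, s, t = eta
    return np.array([1.0, 2.0 * p, p**2 + q**(-2), r, s, t])

def pi_map(eta):
    """Unit-normalised parametrisation (4.2)."""
    k = kappa(eta)
    return k / np.linalg.norm(k)

def c_of_theta(theta):
    """Conversion (4.3): eta from any theta representing a genuine ellipse."""
    a, b, c, d, e, f = np.asarray(theta, dtype=float) / theta[0]
    p = b / 2.0
    q = (c - p**2) ** (-0.5)   # well defined, since c > p^2 for an ellipse
    return np.array([p, q, d, e, f])

# A natural seed for the iterations that follow:
# eta0 = c_of_theta(direct_ellipse_fit(points))
```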

4.2.2 Standard Form of LM

Both the standard and our version of the LM scheme will rely on a least squares expression for \(J_{\mathrm {AML}}'\). Let \(\mathbf {r}(\varvec{\theta }) = [r_1(\varvec{\theta }), \dots , r_N(\varvec{\theta })]^\mathsf {T}\), where

$$\begin{aligned} r_n(\varvec{\theta }) = \left( \frac{\varvec{\theta }^\mathsf {T}\mathbf {A}_n \varvec{\theta }}{\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta }} \right) ^{1/2} \end{aligned}$$

for each \(n = 1, \dots , N\). Then the function \(J_{\mathrm {AML}}\) can be written in least squares form as

$$\begin{aligned} J_{\mathrm {AML}}(\varvec{\theta }) = \mathbf {r}(\varvec{\theta })^\mathsf {T}\mathbf {r}(\varvec{\theta }) = \Vert \mathbf {r}(\varvec{\theta }) \Vert ^2. \end{aligned}$$

Similarly, the companion function \(J_{\mathrm {AML}}'\) can be expressed as

$$\begin{aligned} J_{\mathrm {AML}}'(\varvec{\eta }) = \Vert \mathbf {r}'(\varvec{\eta }) \Vert ^2, \end{aligned}$$

where \(\mathbf {r}'(\varvec{\eta }) = [r'_1(\varvec{\eta }), \dots , r'_N(\varvec{\eta })]^\mathsf {T}\) with \(r'_n(\varvec{\eta }) = r_n(\varvec{\pi }(\varvec{\eta }))\) for each \(n = 1, \dots , N\).

A full description of any variant of the LM scheme requires an explicit form of the Jacobian matrix of \(\mathbf {r}'(\varvec{\eta })\). A straightforward computation shows that the Jacobian matrix of \(\mathbf {r}(\varvec{\theta })\),

$$\begin{aligned} \partial _{\varvec{\theta }} \mathbf {r}(\varvec{\theta }) = [ (\partial _{\varvec{\theta }}r_1(\varvec{\theta }))^\mathsf {T}, \dots , (\partial _{\varvec{\theta }}r_N(\varvec{\theta }))^\mathsf {T}]^\mathsf {T}, \end{aligned}$$

is fully determined by the relations

$$\begin{aligned} (\partial _{\varvec{\theta }} r_n(\varvec{\theta }))^\mathsf {T}&= r_n^{-1}(\varvec{\theta }) \mathbf {X}_n(\varvec{\theta }) \varvec{\theta }, \\ \mathbf {X}_n(\varvec{\theta })&= \frac{\mathbf {A}_n}{\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta }} - \frac{\varvec{\theta }^\mathsf {T}\mathbf {A}_n \varvec{\theta }}{(\varvec{\theta }^\mathsf {T}\mathbf {B}_n \varvec{\theta })^2} \mathbf {B}_n \end{aligned}$$

for all \(n = 1, \dots , N\). For each \(m = 1, 2, \dots \), denote by \(\mathbf {I}_m\) the \(m \times m\) identity matrix. Let \(\mathbf {P}_{\varvec{\theta }}^\perp \) be the symmetric matrix representing the projection along \(\varvec{\theta }\) onto the orthogonal complement of \(\varvec{\theta }\), given by

$$\begin{aligned} \mathbf {P}_{\varvec{\theta }}^\perp = \mathbf {I}_6 - \Vert \varvec{\theta }\Vert ^{-2} \varvec{\theta }\varvec{\theta }^\mathsf {T}. \end{aligned}$$
(4.4)

Then the Jacobian matrix of \(\mathbf {r}'(\varvec{\eta })\) can be expressed as

$$\begin{aligned} \partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }) = \partial _{\varvec{\theta }}{\mathbf {r}(\varvec{\pi }(\varvec{\eta }))} \, \partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }), \end{aligned}$$

where

$$\begin{aligned} \partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }) = \Vert \varvec{\kappa }(\varvec{\eta }) \Vert ^{-1} \mathbf {P}_{\varvec{\kappa }(\varvec{\eta })}^\perp \partial _{\varvec{\eta }}{\varvec{\kappa }(\varvec{\eta })} \end{aligned}$$

and

$$\begin{aligned} \partial _{\varvec{\eta }}{\varvec{\kappa }}(\varvec{\eta }) = \begin{bmatrix} 0&0&0&0&0 \\ 2&0&0&0&0 \\ 2p&-2q^{-3}&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}. \end{aligned}$$
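Collecting the above formulae, a numpy sketch of \(\mathbf {r}'\) and its Jacobian reads as follows; the per-point matrices \(\mathbf {A}_n\), \(\mathbf {B}_n\) and the maps kappa and pi_map are those of the earlier sketches.

```python
import numpy as np

def residuals(theta, A, B):
    """r(theta) with entries r_n = (theta^T A_n theta / theta^T B_n theta)^(1/2)."""
    return np.array([np.sqrt((theta @ An @ theta) / (theta @ Bn @ theta))
                     for An, Bn in zip(A, B)])

def dr_dtheta(theta, A, B):
    """N x 6 Jacobian of r(theta); the n-th row is r_n^{-1} (X_n(theta) theta)^T."""
    rows = []
    for An, Bn in zip(A, B):
        num, den = theta @ An @ theta, theta @ Bn @ theta
        Xn = An / den - (num / den**2) * Bn
        rows.append((Xn @ theta) / np.sqrt(num / den))
    return np.array(rows)

def dkappa_deta(eta):
    """6 x 5 Jacobian of kappa."""
    p, q = eta[0], eta[1]
    J = np.zeros((6, 5))
    J[1, 0] = 2.0
    J[2, 0] = 2.0 * p
    J[2, 1] = -2.0 * q**(-3)
    J[3:, 2:] = np.eye(3)
    return J

def dpi_deta(eta):
    """6 x 5 Jacobian of pi: ||kappa||^{-1} P_kappa^perp dkappa/deta, cf. (4.4)."""
    k = kappa(eta)
    P = np.eye(6) - np.outer(k, k) / (k @ k)
    return P @ dkappa_deta(eta) / np.linalg.norm(k)

def drprime_deta(eta, A, B):
    """N x 5 Jacobian of r'(eta) = r(pi(eta)) by the chain rule."""
    return dr_dtheta(pi_map(eta), A, B) @ dpi_deta(eta)
```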

We can now summarise the standard LM algorithm for optimising \(J_{\mathrm {AML}}'\) as follows. Starting with an initial estimate \(\varvec{\eta }_0\), the scheme iteratively replaces a current estimate \(\varvec{\eta }_k\) with a new estimate \(\varvec{\eta }_{k+1}\) based on the update rule

$$\begin{aligned}&\varvec{\eta }_{k+1} = \varvec{\eta }_k + \delta \varvec{\eta }_k,\end{aligned}$$
(4.5)
$$\begin{aligned}&\delta \varvec{\eta }_k = - \big ((\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k) + \lambda _k \mathbf {I}_5\big )^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\mathbf {r}'(\varvec{\eta }_k). \end{aligned}$$
(4.6)

Here \(\lambda _k\) is a non-negative scalar that dynamically changes from step to step. Details concerning the choice of \(\lambda _k\) can be found in [47]; see also Sect. 9.
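For concreteness, one possible rendering of the iteration (4.5)–(4.6) is sketched below. The simple multiply-or-divide-by-ten schedule for \(\lambda _k\) is a placeholder of ours, not the particular strategy described in [47], and the helper functions are those of the preceding sketches.

```python
import numpy as np

def fit_ellipse_lm(eta0, A, B, max_iter=200, lam=1e-3, tol=1e-10):
    """Standard Levenberg-Marquardt on J'_AML(eta) = ||r'(eta)||^2, Eqs. (4.5)-(4.6)."""
    eta = np.asarray(eta0, dtype=float)
    cost = np.sum(residuals(pi_map(eta), A, B) ** 2)
    for _ in range(max_iter):
        r = residuals(pi_map(eta), A, B)
        J = drprime_deta(eta, A, B)
        step = -np.linalg.solve(J.T @ J + lam * np.eye(5), J.T @ r)   # (4.6)
        eta_new = eta + step                                          # (4.5)
        cost_new = np.sum(residuals(pi_map(eta_new), A, B) ** 2)
        if cost_new < cost:       # accept the step and relax the damping
            eta, cost, lam = eta_new, cost_new, lam / 10.0
            if np.linalg.norm(step) < tol:
                break
        else:                     # reject the step and increase the damping
            lam *= 10.0
    return pi_map(eta)            # the returned estimate is an ellipse by construction
```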

4.2.3 Modified Form of LM

Experiments show that the standard form of the LM scheme applied to \(J_{\mathrm {AML}}'\) can sometimes take relatively many iterations before reaching an optimal solution. To achieve a faster convergence, we shall modify the LM algorithm, replacing the standard weight matrix \(\lambda \mathbf {I}_5\) in (4.6) by a non-standard, step-dependent, positive-definite weight matrix \(\lambda \mathbf {W}_{\varvec{\eta }}\). The customised scheme retains the logic of the standard scheme in that for large values of the scale factor \(\lambda \), the LM step will align with a descent direction for \(J_{\mathrm {AML}}'\), which is the direction of \(-(\lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\mathbf {r}'(\varvec{\eta }_k).\) This direction is no longer the direction of the steepest descent in the ordinary Euclidean norm on the parameter space, but rather in the norm \( \vert \vert \vert \varvec{\zeta } \vert \vert \vert = (\varvec{\zeta }^\mathsf {T}\mathbf {W}_{\varvec{\eta }} \varvec{\zeta })^{1/2}\) [43, Sect. 8.2.2]. For precedents on the use of weight matrices in LM different from multiples of the identity matrix, see [5, 44]. Passing to the specifics of our scheme, we shall adopt

$$\begin{aligned} \mathbf {W}_{\varvec{\eta }} = (\partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }))^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }}(\varvec{\eta }), \end{aligned}$$
(4.7)

with the main ingredient of the update rule then becoming

$$\begin{aligned} \delta \varvec{\eta }_k = - \big ((\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k) + \lambda _k \mathbf {W}_{\varvec{\eta }_k}\big )^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'}(\varvec{\eta }_k))^\mathsf {T}\mathbf {r}'(\varvec{\eta }_k).\nonumber \\ \end{aligned}$$
(4.8)
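In code, the modification amounts to a one-line change in the damped normal equations of the previous sketch (dpi_deta as before):

```python
import numpy as np

def lm_step_modified(eta, r, J, lam):
    """One damped step with lambda * I_5 replaced by lambda * W_eta, Eqs. (4.7)-(4.8)."""
    D = dpi_deta(eta)         # 6 x 5 Jacobian of pi at eta
    W = D.T @ D               # weight matrix (4.7)
    return -np.linalg.solve(J.T @ J + lam * W, J.T @ r)
```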

The motivation behind this choice of the weight matrix has to do with the experimentally observed fact that the LM algorithm for optimising the non-parametrised function \(J_{\mathrm {AML}}\) (which produces conics, but not necessarily ellipses, as optimal solutions), based on the rule

$$\begin{aligned} \varvec{\theta }_{k+1}&= \varvec{\theta }_k + \delta \varvec{\theta }_k,\end{aligned}$$
(4.9)
$$\begin{aligned} \delta \varvec{\theta }_k&= - \big ((\partial _{\varvec{\theta }}{\mathbf {r}}(\varvec{\theta }_k))^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}}(\varvec{\theta }_k) + \lambda _k \mathbf {I}_6\big )^{-1} (\partial _{\varvec{\theta }}{\mathbf {r}}(\varvec{\theta }_k))^\mathsf {T}\mathbf {r}(\varvec{\theta }_k), \end{aligned}$$
(4.10)

runs faster than the standard LM algorithm for optimising \(J_{\mathrm {AML}}'\) (as per (4.5) and (4.6)). One might hope then that if \(\mathbf {W}_{\varvec{\eta }}\) is chosen in such a way that the increment

$$\begin{aligned} \delta \varvec{\eta }= - \big ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }} \big )^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\mathbf {r}' \end{aligned}$$

transformed into an increment of \(\varvec{\theta }\) by means of the relation

$$\begin{aligned} \delta \varvec{\theta }= \partial _{\varvec{\eta }}{\varvec{\pi }} \, \delta \varvec{\eta }\end{aligned}$$

becomes

$$\begin{aligned} \delta \varvec{\theta }= - \big ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} + \lambda \mathbf {I}_6\big )^{-1} (\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\mathbf {r} \end{aligned}$$

and in so doing mimics the rule given in (4.10), then the LM algorithm employing such \(\delta \varvec{\eta }\) will run faster than the standard version of LM. Of course, the above is just a guiding principle whose merit has to be scrutinised experimentally. Fortunately, relevant results confirm the efficacy of the proposed approach—see Sect. 10.

We now discuss how the above condition on \(\mathbf {W}_{\varvec{\eta }}\) leads to formula (4.7). The defining property of \(\mathbf {W}_{\varvec{\eta }}\) is

$$\begin{aligned}&\!\!\!\partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} +\lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\mathbf {r}'\\&\quad =\big ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} + \lambda \mathbf {I}_6\big )^{-1} (\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\mathbf {r}. \end{aligned}$$

Here, of course, \(\mathbf {r}'\) and \(\partial _{\varvec{\eta }}{\mathbf {r}'}\) are evaluated at \(\varvec{\eta }\), and \(\mathbf {r}\) and \(\partial _{\varvec{\theta }}{\mathbf {r}}\) are evaluated at \(\varvec{\pi }(\varvec{\eta })\); in particular, \(\mathbf {r}' = \mathbf {r}'(\varvec{\eta }) = \mathbf {r}(\varvec{\pi }(\varvec{\eta })) = \mathbf {r}\). Taking into account that

$$\begin{aligned}&\!\!\!\partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\mathbf {r}'\\&\quad =\partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}(\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\mathbf {r}', \end{aligned}$$

we see that \(\mathbf {W}_{\varvec{\eta }}\) has to satisfy

$$\begin{aligned} \partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}= ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6)^{-1}. \end{aligned}$$
(4.11)

For a given matrix \(\mathbf {A}\), denote by \(\mathbf {A}^+\) the Moore–Penrose pseudo-inverse of \(\mathbf {A}\) [35]. It is easily seen that

$$\begin{aligned}&(\partial _{\varvec{\eta }}{\varvec{\pi }})^+ = ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }})^{-1}(\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}, \end{aligned}$$
(4.12a)
$$\begin{aligned}&((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+ = \partial _{\varvec{\eta }}{\varvec{\pi }} ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }})^{-1}. \end{aligned}$$
(4.12b)

Hence, in particular,

$$\begin{aligned}&(\partial _{\varvec{\eta }}{\varvec{\pi }})^+ \partial _{\varvec{\eta }}{\varvec{\pi }} = \mathbf {I}_5, \end{aligned}$$
(4.13a)
$$\begin{aligned}&(\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+ = \mathbf {I}_5. \end{aligned}$$
(4.13b)

Pre-multiplying and post-multiplying both sides of (4.11) by \((\partial _{\varvec{\eta }}{\varvec{\pi }})^+\) and \(((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+,\) respectively, and invoking (4.13a) and (4.13b), we get

$$\begin{aligned} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1}&= (\partial _{\varvec{\eta }}{\varvec{\pi }})^+ ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}}+\lambda \mathbf {I}_6)^{-1} \\&\times ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+. \end{aligned}$$

Now the critical ingredient of our argument is the equality

$$\begin{aligned}&\!\!\!(\partial _{\varvec{\eta }}{\varvec{\pi }})^+ ((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6)^{-1} ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T})^+\nonumber \\&\quad =\big ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6) \partial _{\varvec{\eta }}{\varvec{\pi }}\big )^{-1}. \end{aligned}$$
(4.14)

We defer the rather lengthy proof of this equality to Appendix 1. Assuming (4.14) for now, we find that

$$\begin{aligned} ((\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }})^{-1} = \big ((\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6) \partial _{\varvec{\eta }}{\varvec{\pi }}\big )^{-1} \end{aligned}$$

and further

$$\begin{aligned} (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda \mathbf {W}_{\varvec{\eta }}&= (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}((\partial _{\varvec{\theta }}{\mathbf {r}})^\mathsf {T}\partial _{\varvec{\theta }}{\mathbf {r}} +\lambda \mathbf {I}_6) \partial _{\varvec{\eta }}{\varvec{\pi }}\\&= (\partial _{\varvec{\eta }}{\mathbf {r}'})^\mathsf {T}\partial _{\varvec{\eta }}{\mathbf {r}'} + \lambda (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }}. \end{aligned}$$

Hence, immediately, \(\mathbf {W}_{\varvec{\eta }} = (\partial _{\varvec{\eta }}{\varvec{\pi }})^\mathsf {T}\partial _{\varvec{\eta }}{\varvec{\pi }},\) as desired.

5 Two Forms of Estimate Covariances

To be truly useful, an estimate of an ellipse must be accompanied by a reliable measure of its uncertainty characterising the dispersion of values that the estimate could reasonably attain. A fundamental advantage of the AML cost function over the ML cost function is that the former allows one to obtain, in a relatively straightforward way, a quantitative measure of uncertainty for any estimate related to that function. The measure takes the form of a covariance matrix. Here we present two closely related expressions for the covariance matrix of an AML estimate of an ellipse. The first of these concerns the case that the estimate is expressed in terms of the familiar algebraic parameters \(\varvec{\theta }\). The second concerns the case that the estimate is represented via the natural geometric parameters of an ellipse, namely centre co-ordinates, semi-major and semi-minor axes, and orientation.

5.1 Covariance of the Algebraically Parametrised Estimate

Any effort to assess the uncertainty of an estimate like \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) has to involve assumptions as to the nature of noise in the data. The assumption behind the formula for the covariance matrix of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\), \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\), is the same as that underlying the development of \(J_{\mathrm {AML}}\), namely that the observed data points \(\mathbf {x}_1, \dots , \mathbf {x}_N\) are noisy versions of noise-free data points, with the noise in each \(\mathbf {x}_n\) independent and Gaussian, of zero mean and covariance matrix \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\).

Given an \(m \times m\) matrix \(\mathbf {A}\) and a positive integer \(r\) no greater than \(m\), we denote by \(\mathbf {A}_r\) the \(r\)-truncated SVD of \(\mathbf {A}\) and by \(\mathbf {A}_r^+\) the \(r\)-truncated pseudo-inverse of \(\mathbf {A}\). These are defined as follows. If \(\mathbf {A} = \mathbf {U} \mathbf {D} \mathbf {V}^\mathsf {T}\) is the singular value decomposition (SVD) of \(\mathbf {A}\), with \(\mathbf {D} = {{\mathrm{\mathrm {diag}}}}(d_1, \dots , d_m)\), then \(\mathbf {A}_r = \mathbf {U} \mathbf {D}_r \mathbf {V}^\mathsf {T}\) with \(\mathbf {D}_r = {{\mathrm{\mathrm {diag}}}}(d_1, \dots , d_r, 0, \dots , 0)\), and \(\mathbf {A}_r^+ = \mathbf {V} \mathbf {D}_r^+ \mathbf {U}^\mathsf {T}\) with \(\mathbf {D}_r^+ = {{\mathrm{\mathrm {diag}}}}(d_1^+, \dots , d_r^+, 0, \dots , 0)\), where \(d_i^+ = d_i^{-1}\) when \(d_i \ne 0\) and \(d_i^+ = 0\) otherwise. Continuing with the preparations, we introduce the matrix

$$\begin{aligned} \mathbf {M}_{\varvec{\theta }} = \sum _{n=1}^N \frac{\mathbf {A}_n}{\varvec{\theta }^{\mathsf {T}} \mathbf {B}_n \varvec{\theta }}. \end{aligned}$$
(5.1)

We are now ready to present our first covariance matrix formula. Under the assumption that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is normalised, \(\Vert \widehat{\varvec{\theta }}_{\mathrm {AML}}\Vert = 1\), the covariance matrix of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is given by

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} = \mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp (\mathbf {M}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}})^+_5 \mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp . \end{aligned}$$
(5.2)

We remark that the presence of \(\mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp \) here makes the matrix \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) singular, with \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) in the null space of \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\). This reflects the particular way in which the scale ambiguity of the estimates has been eliminated, namely by scaling the estimates to unit length. The details of the derivation of (5.2) are somewhat technical and are deferred to Appendix 2. For additional results related to the content of Appendix 2, the interested reader is referred to [18] and [31].
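A numpy sketch of (5.1)–(5.2) follows; truncated_pinv implements the rank-truncated pseudo-inverse defined above, and the per-point matrices \(\mathbf {A}_n\), \(\mathbf {B}_n\) are those of the sketch in Sect. 2.

```python
import numpy as np

def truncated_pinv(M, rank):
    """Rank-truncated Moore-Penrose pseudo-inverse computed from the SVD."""
    U, d, Vt = np.linalg.svd(M)
    d_plus = np.array([1.0 / di if (i < rank and di > 0.0) else 0.0
                       for i, di in enumerate(d)])
    return Vt.T @ np.diag(d_plus) @ U.T

def covariance_theta(theta_hat, A, B):
    """Covariance (5.2) of the unit-norm algebraic estimate theta_hat."""
    t = np.asarray(theta_hat, dtype=float)
    t = t / np.linalg.norm(t)
    M = sum(An / (t @ Bn @ t) for An, Bn in zip(A, B))   # M_theta of (5.1)
    P = np.eye(6) - np.outer(t, t)                       # projection P_theta^perp
    return P @ truncated_pinv(M, rank=5) @ P
```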

5.2 Covariance of the Geometrically Parametrised Estimate

Alongside a \(\varvec{\theta }\)-vector of algebraic parameters, an ellipse can be represented by a \(\varvec{\xi }\)-vector of geometric parameters, \(\varvec{\xi }= [A, B, H, K, \tau ]^\mathsf {T}.\) Here \(A\) and \(B\) denote the semi-major axis and the semi-minor axis of the ellipse, \(H\) and \(K\) denote the \(x\) and \(y\) coordinates of the centre of the ellipse, and \(\tau \) is the angle formed by the major axis with the positive \(x\)-axis. Recalling that \(\Delta \) denotes the discriminant (\(\Delta = b^2 - 4ac\)), we let

$$\begin{aligned} \lambda _{\pm }&= \frac{1}{2}\left( a + c \mp \left( b^2+(a-c)^2\right) ^{1/2}\right) \!,\\ \psi&= bde - ae^2 - b^2f + c(4af - d^2),\\ V^{\pm }&= \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{1/2}, \end{aligned}$$

where \(\pm \) and \(\mp \) are shorthand for \(+\) or \(-\) that allow presentation of two expressions in one formula, with the upper \(-\) of \(\mp \) associated with the \(+\) of \(\pm \). It is a matter of a straightforward but tedious analysis to establish the following conversion rules for passing from the \(\varvec{\theta }\)-based to the \(\varvec{\xi }\)-based description of the ellipse (see [62, Sect. 4.10.2] for the starting point of the derivation of the formulae):

$$\begin{aligned}&\!\!\!A = \max (V^+,V^-), \quad B = \min (V^+,V^-),\end{aligned}$$
(5.3)
$$\begin{aligned}&\!\!\!H = \frac{2cd- be}{\Delta }, \quad K = \frac{2ae - bd}{\Delta },\end{aligned}$$
(5.4)
$$\begin{aligned}&\!\!\!\tau = {\left\{ \begin{array}{ll} \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) &{} \text {if } b < 0, a < c\hbox { and }V^+\ge V^-,\\ \frac{\pi }{4} &{} \text {if }b < 0, a = c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b < 0, a > c\hbox { and }V^+\ge V^-,\\ 0 &{} \text {if }b = 0, a < c\hbox { and }V^+\ge V^-,\\ \frac{\pi }{2} &{} \text {if }b = 0, a \ge c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \pi &{} \text {if }b > 0, a < c\hbox { and }V^+\ge V^-,\\ \frac{3\pi }{4} &{} \text {if }b > 0, a = c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b > 0, a > c\hbox { and }V^+\ge V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b < 0, a < c\hbox { and }V^+< V^-,\\ \frac{3\pi }{4} &{} \text {if }b < 0, a = c\hbox { and }V^+< V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \pi &{} \text {if }b < 0, a > c\hbox { and }V^+< V^-,\\ \frac{\pi }{2} &{} \text {if }b = 0, a < c\hbox { and }V^+< V^-,\\ 0 &{} \text {if }b = 0, a \ge c\hbox { and }V^+< V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) + \frac{\pi }{2} &{} \text {if }b > 0, a < c\hbox { and }V^+< V^-,\\ \frac{\pi }{4}&{} \text {if }b > 0, a = c\hbox { and }V^+< V^-,\\ \frac{1}{2}{{\mathrm{arccot}}}\left( \frac{a-c}{b}\right) &{} \text {if }b > 0, a > c\hbox { and }V^+< V^-. \end{array}\right. } \end{aligned}$$
(5.5)

We remark that the formula for \(\tau \) is valid only under the assumption that the ellipse is not a circle; that is, provided the inequality \((a-c)^2 + b^2 > 0\) holds.
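For reference, a direct transcription of the conversion rules (5.3)–(5.5) into code may take the following form. It is a numpy sketch that assumes a real, non-circular ellipse, in line with the remark above.

```python
import numpy as np

def theta_to_xi(theta):
    """Geometric parameters xi = [A, B, H, K, tau] from theta = [a, b, c, d, e, f],
    transcribing (5.3)-(5.5)."""
    a, b, c, d, e, f = np.asarray(theta, dtype=float)
    Delta = b**2 - 4*a*c
    root = np.sqrt(b**2 + (a - c)**2)
    lam_plus, lam_minus = 0.5*(a + c - root), 0.5*(a + c + root)
    psi = b*d*e - a*e**2 - b**2*f + c*(4*a*f - d**2)
    Vp, Vm = np.sqrt(psi/(lam_plus*Delta)), np.sqrt(psi/(lam_minus*Delta))
    A_ax, B_ax = max(Vp, Vm), min(Vp, Vm)                      # (5.3)
    H, K = (2*c*d - b*e)/Delta, (2*a*e - b*d)/Delta            # (5.4)
    # Orientation tau, case analysis (5.5); arccot(x) = arctan(1/x) here.
    acot = lambda x: np.arctan(1.0 / x)
    big = Vp >= Vm
    if b == 0:
        tau = (0.0 if a < c else np.pi/2) if big else (np.pi/2 if a < c else 0.0)
    elif a == c:
        tau = (np.pi/4 if b < 0 else 3*np.pi/4) if big else (3*np.pi/4 if b < 0 else np.pi/4)
    else:
        if b < 0:
            shift = (0.0 if a < c else np.pi/2) if big else (np.pi/2 if a < c else np.pi)
        else:
            shift = (np.pi if a < c else np.pi/2) if big else (np.pi/2 if a < c else 0.0)
        tau = 0.5*acot((a - c)/b) + shift
    return np.array([A_ax, B_ax, H, K, tau])
```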

With the aid of (5.3)–(5.5), the geometric content of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) can be revealed as \(\varvec{\xi }(\widehat{\varvec{\theta }}_{\mathrm {AML}})\). We term \(\varvec{\xi }(\widehat{\varvec{\theta }}_{\mathrm {AML}})\) the AML estimate of the geometric parameters of an ellipse and denote it by \(\widehat{\varvec{\xi }}_{\mathrm {AML}}\). For consistency, we shall refer to \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) as the AML estimate of the algebraic parameters of an ellipse. The rule of covariance propagation readily permits finding the covariance matrix of the AML estimate of the geometric parameters of an ellipse from the covariance matrix of the AML estimate of the algebraic parameters. Assuming that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is normed to unity, we have

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{} = [\partial _{\varvec{\theta }}{\varvec{\xi }}]_{\varvec{\theta }= \widehat{\varvec{\theta }}_{\mathrm {AML}}}\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} [\partial _{\varvec{\theta }}{\varvec{\xi }}]_{\varvec{\theta }= \widehat{\varvec{\theta }}_{\mathrm {AML}}}^\mathsf {T}. \end{aligned}$$
(5.6)

The \(5 \times 6\) Jacobian matrix \(\partial _{\varvec{\theta }}{\varvec{\xi }} = [\partial _{\varvec{\theta }}{A}^\mathsf {T}, \partial _{\varvec{\theta }}{B}^\mathsf {T}, \partial _{\varvec{\theta }}{H}^\mathsf {T}, \partial _{\varvec{\theta }}{K}^\mathsf {T}, \partial _{\varvec{\theta }}{\tau }^\mathsf {T}]^\mathsf {T}\) can be specified explicitly as follows. The first two rows are given by

$$\begin{aligned} \partial _{\varvec{\theta }}{A}&= {\left\{ \begin{array}{ll} \partial _{\varvec{\theta }}{V^+} &{} \text {if } A = V^+, \\ \partial _{\varvec{\theta }}{V^-} &{} \text {if } A = V^-, \end{array}\right. } \end{aligned}$$
(5.7)
$$\begin{aligned} \partial _{\varvec{\theta }}{B}&= {\left\{ \begin{array}{ll} \partial _{\varvec{\theta }}{V^+} &{} \text {if } B = V^+, \\ \partial _{\varvec{\theta }}{V^-} &{} \text {if } B = V^-, \end{array}\right. } \end{aligned}$$
(5.8)

where \(\partial _{\varvec{\theta }}{V^{\pm }} = [\partial _{a}{V^{\pm }}, \partial _{b}{V^{\pm }}, \partial _{c}{V^{\pm }}, \partial _{d}{V^{\pm }}, \partial _{e}{V^{\pm }}, \partial _{f}{V^{\pm }}]\) is given by

$$\begin{aligned} \partial _{a}{V^{\pm }}&= \frac{1}{2\lambda _{\pm }\Delta } \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2} \left[ 4cf - e^2 + 4\Delta ^{-1} c \psi \right. \nonumber \\&\left. \quad - \frac{\psi }{2\lambda _{\pm }} \left( 1 \pm \frac{c - a}{\left( (a - c)^2+ b^2\right) ^{1/2}} \right) \right] ,\end{aligned}$$
(5.9)
$$\begin{aligned} \partial _{b}{V^{\pm }}&= \frac{1}{2\lambda _{\pm }\Delta } \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2} \left[ de - 2bf - 2\Delta ^{-1} b \psi \right. \nonumber \\&\left. \quad \pm \frac{b \psi }{2\lambda _{\pm }\left( (a - c)^2 + b^2\right) ^{1/2}} \right] , \end{aligned}$$
(5.10)
$$\begin{aligned} \partial _{c}{V^{\pm }}&= \frac{1}{2\lambda _{\pm }\Delta } \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2} \left[ 4af - d^2 + 4\Delta ^{-1} a \psi \right. \nonumber \\&\left. \quad - \frac{\psi }{2\lambda _{\pm }} \left( 1 \pm \frac{a - c}{\left( (a - c)^2+ b^2\right) ^{1/2}} \right) \right] ,\end{aligned}$$
(5.11)
$$\begin{aligned} \partial _{d}{V^{\pm }}&= \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{1/2} \frac{be - 2cd}{2\psi }, \end{aligned}$$
(5.12)
$$\begin{aligned} \partial _{e}{V^{\pm }}&= \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{1/2} \frac{bd - 2ae}{2\psi },\end{aligned}$$
(5.13)
$$\begin{aligned} \partial _{f}{V^{\pm }}&= - \frac{1}{2\lambda _{\pm }} \left( \frac{\psi }{\lambda _{\pm }\Delta }\right) ^{-1/2}. \end{aligned}$$
(5.14)

The remaining three rows are given by

$$\begin{aligned} \partial _{\varvec{\theta }}{H}&= \left[ \frac{4c(2cd-be)}{\Delta ^2}, \frac{b^2e + 4ace - 4bcd}{\Delta ^2}, \right. \nonumber \\&\quad \left. \frac{2b(bd-2ae)}{\Delta ^2}, \frac{2c}{\Delta }, - \frac{b}{\Delta }, 0 \right] ,\end{aligned}$$
(5.15)
$$\begin{aligned} \partial _{\varvec{\theta }}{K}&= \left[ \frac{2b(be - 2cd)}{\Delta ^2}, \frac{b^2d+4acd-4abe}{\Delta ^2}, \right. \nonumber \\&\quad \left. \frac{4a(2ae - bd)}{\Delta ^2}, - \frac{b}{\Delta }, \frac{2a}{\Delta }, 0 \right] ,\end{aligned}$$
(5.16)
$$\begin{aligned} \partial _{\varvec{\theta }}{\tau }&= \left[ -\frac{b}{2\left( b^2+(a-c)^2\right) }, \frac{a-c}{2\left( b^2+(a-c)^2\right) }, \right. \nonumber \\&\quad \left. \frac{b}{2\left( b^2+(a-c)^2\right) }, 0 ,0 ,0 \right] . \end{aligned}$$
(5.17)
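The propagation rule (5.6) is then straightforward to apply. For illustration, the sketch below approximates the Jacobian \(\partial _{\varvec{\theta }}{\varvec{\xi }}\) by central differences of theta_to_xi in place of the closed-form rows (5.7)–(5.17); away from the branch points of the case analysis (5.5) the two coincide up to discretisation error.

```python
import numpy as np

def covariance_xi(theta_hat, cov_theta, h=1e-7):
    """Covariance (5.6) of the geometric estimate, with the Jacobian of
    theta -> xi approximated by central differences."""
    t = np.asarray(theta_hat, dtype=float)
    J = np.zeros((5, 6))
    for j in range(6):
        dt = np.zeros(6)
        dt[j] = h
        J[:, j] = (theta_to_xi(t + dt) - theta_to_xi(t - dt)) / (2.0 * h)
    return J @ cov_theta @ J.T
```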

6 Confidence Regions

The reliability of an AML estimate of an ellipse can be suggestively expressed using the concept of a confidence region. It is standard to build a confidence region of a parameter vector estimate as a portion of the parameter space that contains the true parameter vector with a given high probability. The parameter space for all ellipses is five dimensional and as such is difficult to grasp. Here we propose a more visually appealing form of ellipse-specific confidence region, namely a confidence region in the plane. Such a region is meant to cover the in-plane locus of the true ellipse with a given high probability.

The probability statement associated with a confidence region requires some explanation if it is not to be misinterpreted. The point is that the statement is not meant to imply that every confidence region contains the true parameters with a specified high probability. Rather, the probability expresses how frequently a confidence region can be expected to contain the true parameters in the long-run process of taking random data points and obtaining regions.

The first to consider planar confidence regions for ellipse fits was Porrill [46]. Our approach is inspired by Scheffé’s \(S\)-method for constructing simultaneous confidence bands for linear regression [51], [34, Sect. 9.4–5]. The starting point for the construction is the observation that when \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is viewed as a multivariate normally distributed random vector,

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}\sim N(\varvec{\theta }_*, \varvec{\varLambda }_{\varvec{\theta }_*}^{}), \end{aligned}$$

where \(\varvec{\theta }_*\) is the parameter vector of the true ellipse and \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\) is a covariance matrix, the scalar random variable \(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x})\) is normally distributed with variance \(\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x})\) for every point \(\mathbf {x}\) on the locus \(\mathcal {E}_{\varvec{\theta }_*} = \{\mathbf {x} \in \mathbb {R}^2 \mid \varvec{\theta }_*^T \mathbf {u}(\mathbf {x}) = 0\}\) of the true ellipse. The observation is based on the fact that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x}) = (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^T \mathbf {u}(\mathbf {x})\) whenever \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\) and the fact that by the rule of covariance propagation \((\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^T \mathbf {u}(\mathbf {x})\) has variance \(\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x})\). Consequently, under the assumption that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is an unbiased estimate of \(\varvec{\theta }_*\),

$$\begin{aligned} z_{\mathbf {x}} = \frac{(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x}))^2}{\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x})} \end{aligned}$$

is a squared normal random variable for every \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\). Each \(z_{\mathbf {x}}\), insofar as \(\mathbf {x}\) belongs to \(\mathcal {E}_{\varvec{\theta }_*}\), attains large values with less probability than small values, with the probability of any particular set of values regarded as large or small being independent of \(\mathbf {x}\). This suggests using the \(z_{\mathbf {x}}\) as building blocks in the construction of a confidence region in the plane. Since the covariance \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\) is unknown, the \(z_{\mathbf {x}}\) do not have observable realisations and, for the sake of construction, have to be replaced with these variables’ observable variants

$$\begin{aligned} \hat{z}_{\mathbf {x}} = \frac{(\widehat{\varvec{\theta }}_{\mathrm {AML}}^T \mathbf {u}(\mathbf {x}))^2}{\mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} \mathbf {u}(\mathbf {x})}, \end{aligned}$$
(6.1)

where the covariance estimate \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) serves as a natural replacement for \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\). Again, large observed values of \(\hat{z}_{\mathbf {x}}\) are less plausible than small observed values as long as \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\). It is thus natural to consider confidence regions for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) in the form

$$\begin{aligned} \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \hat{z}_{\mathbf {x}} \le c \right\} , \end{aligned}$$

where \(c\) is a positive constant. Ideally, for a confidence region at (confidence) level \(1 - \alpha \), we should choose \(c\) such that

$$\begin{aligned} \mathsf {P}\left( z_{\mathbf {x}} \le c \ \text {for all} \ \mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*} \right) = \mathsf {P}\left( \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}} \le c \right) = 1 - \alpha , \end{aligned}$$

where \(\mathsf {P}(A)\) denotes the probability of the event \(A\). But the distribution of \(\sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}}\) is not easy to determine, so as a second-best choice we shall replace \(\sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}}\) by a random upper bound whose distribution can be readily calculated. Proceeding to the specifics, we may assume, in line with the fact that the parameter space of all ellipses is five-dimensional, that \((\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^\mathsf {T}\varvec{\theta }_*= 0,\) or equivalently, \(\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*= \mathbf {P}_{\varvec{\theta }_*}^\perp (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*).\) This then leads to

$$\begin{aligned} \mathbf {P}_{\varvec{\theta }_*}^\perp \varvec{\varLambda }_{\varvec{\theta }_*}^{} = \varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {P}_{\varvec{\theta }_*}^\perp = \varvec{\varLambda }_{\varvec{\theta }_*}^{}. \end{aligned}$$
(6.2)

Now, if \(\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}\), then

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\mathbf {x}) = \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\mathbf {u}(\mathbf {x}) = (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^\mathsf {T}\mathbf {P}_{\varvec{\theta }_*}^\perp \mathbf {u}(\mathbf {x}). \end{aligned}$$

But it follows from (6.2) that \((\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} = \mathbf {P}_{\varvec{\theta }_*}^\perp ,\) so

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\mathbf {x})&= \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}(\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x})\\&= \left( \left( \varvec{\varLambda }_{\varvec{\theta }_*}^{+}\right) ^{1/2} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) \right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x}). \end{aligned}$$

By the Cauchy–Bunyakovsky–Schwarz inequality,

$$\begin{aligned} (\widehat{\varvec{\theta }}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\mathbf {x}))^2 \le \Vert (\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*) \Vert ^2 \Vert \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x}) \Vert ^2. \end{aligned}$$

Also

$$\begin{aligned}&\left\| (\varvec{\varLambda }_{\varvec{\theta }_*}^{+})^{1/2} (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*) \right\| ^2 \\&\quad = \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) \end{aligned}$$

and

$$\begin{aligned} \left\| \varvec{\varLambda }_{\varvec{\theta }_*}^{1/2} \mathbf {u}(\mathbf {x}) \right\| ^2 = \mathbf {u}(\mathbf {x})^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{} \mathbf {u}(\mathbf {x}). \end{aligned}$$

Hence

$$\begin{aligned} z_{\mathbf {x}} \le \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) . \end{aligned}$$

Since \(\mathbf {x}\) is an arbitrary member of \(\mathcal {E}_{\varvec{\theta }_*}\), we have

$$\begin{aligned} \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}} \le \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) ^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} \left( \widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*\right) . \end{aligned}$$
(6.3)

Now the random variable \((\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)^\mathsf {T}\varvec{\varLambda }_{\varvec{\theta }_*}^{+} (\widehat{\varvec{\theta }}_{\mathrm {AML}}- \varvec{\theta }_*)\) has approximately a chi-squared distribution with \(5\) degrees of freedom. Let \(\chi ^2_{5, \alpha }\) denote the \(100(1 - \alpha )\)th percentile of the \(\chi ^2\) distribution with \(5\) degrees of freedom, characterised by the relation \(\mathsf {P}\left( \chi ^2 \le \chi ^2_{5,\alpha }\right) = 1 - \alpha .\) Inequality (6.3) guarantees that

$$\begin{aligned} \mathsf {P}\left( \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} z_{\mathbf {x}} \le \chi ^2_{5,\alpha } \right) \ge 1 - \alpha . \end{aligned}$$

Substituting \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) for \(\varvec{\varLambda }_{\varvec{\theta }_*}^{}\), we also approximately have

$$\begin{aligned} \mathsf {P}\left( \sup _{\mathbf {x} \in \mathcal {E}_{\varvec{\theta }_*}} \hat{z}_{\mathbf {x}} \le \chi ^2_{5,\alpha } \right) \ge 1 - \alpha . \end{aligned}$$

This allows an approximate confidence region at level \(1 - \alpha \) for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) to be taken as

$$\begin{aligned} \Omega _{\alpha } = \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \hat{z}_{\mathbf {x}} \le \chi ^2_{5,\alpha }\right\} . \end{aligned}$$
(6.4)

We finally point out that if \(\alpha \) is set to the conventional value of \(0.05\), then \(\chi ^2_{5, \alpha } = 11.07\).
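To make the construction of \(\Omega _{\alpha }\) concrete, the following minimal Python sketch evaluates \(\hat{z}_{\mathbf {x}}\) on a grid of planar points and thresholds it at \(\chi ^2_{5,0.05} = 11.07\). The monomial ordering \(\mathbf {u}(\mathbf {x}) = [x^2, xy, y^2, x, y, 1]^\mathsf {T}\) and the function names are our illustrative assumptions rather than part of the original algorithm specification.

```python
import numpy as np

CHI2_5_005 = 11.07  # 95th percentile of the chi-squared distribution with 5 degrees of freedom

def u(x, y):
    """Vector of conic monomials u(x) = [x^2, xy, y^2, x, y, 1]^T (assumed ordering)."""
    return np.array([x * x, x * y, y * y, x, y, 1.0])

def z_hat(x, y, theta_hat, cov_theta):
    """Statistic z_x for a single planar point (x, y); theta_hat is the unit-norm
    AML estimate and cov_theta its 6 x 6 covariance matrix."""
    ux = u(x, y)
    return float(theta_hat @ ux) ** 2 / float(ux @ cov_theta @ ux)

def confidence_region_mask(xs, ys, theta_hat, cov_theta, threshold=CHI2_5_005):
    """Boolean grid mask: True where a point satisfies z_x <= chi^2_{5, alpha},
    i.e. belongs to the confidence region Omega_alpha of (6.4)."""
    return np.array([[z_hat(x, y, theta_hat, cov_theta) <= threshold for x in xs]
                     for y in ys])
```

Plotting the boundary of the resulting mask visualises the planar confidence region associated with the estimate.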

7 Data Normalisation

For reasons of numerical stability, it is important to incorporate data normalisation, or pre-conditioning, in the process of calculating AML estimates and their covariances. One popular form of data normalisation is due to Hartley [12, 14, 23] and can be formulated with the aid of a data-dependent \(3 \times 3\) matrix \(\mathbf {T}\) given by

$$\begin{aligned} \mathbf {T} = \begin{bmatrix} s^{-1}&0&-s^{-1} \overline{m}_1 \\ 0&s^{-1}&-s^{-1} \overline{m}_2 \\ 0&0&1 \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} \overline{m}_i = \frac{1}{N} \sum _{n=1}^N m_{n,i} \quad (i = 1,2) \end{aligned}$$

and

$$\begin{aligned} s = \left( \frac{1}{2N} \sum _{n=1}^N \left[ (m_{n,1} - \overline{m}_1)^2 + (m_{n,2} - \overline{m}_2)^2 \right] \right) ^{1/2}. \end{aligned}$$
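A minimal Python sketch of this normalisation step follows; the array layout (an \(N \times 2\) array of data points) and the function names are illustrative assumptions.

```python
import numpy as np

def hartley_normalisation_matrix(points):
    """Build the 3 x 3 matrix T from an N x 2 array of data points (m_{n,1}, m_{n,2}),
    using the centroid (m_bar_1, m_bar_2) and the scale s defined above."""
    m_bar = points.mean(axis=0)
    s = np.sqrt(((points - m_bar) ** 2).sum() / (2.0 * len(points)))
    return np.array([[1.0 / s, 0.0, -m_bar[0] / s],
                     [0.0, 1.0 / s, -m_bar[1] / s],
                     [0.0, 0.0, 1.0]])

def normalise_points(points):
    """Map every data point m = [m_1, m_2, 1]^T to T m, returning the normalised
    2-D coordinates together with T."""
    T = hartley_normalisation_matrix(points)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :2], T
```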

If \(\mathbf {m} = [m_1, m_2, 1]^\mathsf {T}\) represents a data point in the original, un-normalised coordinates, then \(\tilde{\mathbf {m}} = [\tilde{m}_1, \tilde{m}_2, 1]^\mathsf {T}\) defined by \(\tilde{\mathbf {m}} = \mathbf {T} \mathbf {m}\) represents the same point in the normalised coordinates. The form of \(\mathbf {T}\) reflects the fact that the transformation \(\mathbf {m} \mapsto \tilde{\mathbf {m}}\) scales down the data set to a unit box with centre at the origin. In the normalised coordinates, Eq. (2.2) can be equivalently written as \(\tilde{\varvec{\theta }}^\mathsf {T}\mathbf {u}(\tilde{\mathbf {x}}) = 0\) with \(\tilde{\mathbf {x}} = [\tilde{m}_1, \tilde{m}_2]^\mathsf {T}\) and \(\tilde{\varvec{\theta }}\) related to \(\varvec{\theta }\) via the relation

$$\begin{aligned} \tilde{\varvec{\theta }}= \mathbf {E}^{-1} \mathbf {P}_{(34)} \mathbf {D}_3^+ (\mathbf {T} \otimes \mathbf {T})^{-\mathsf {T}} \mathbf {D}_3 \mathbf {P}_{(34)} \mathbf {E}\varvec{\theta }. \end{aligned}$$
(7.1)

Here, \(\mathbf {D}_3\) is the \(9 \times 6\) duplication matrix given by

$$\begin{aligned} \mathbf {D}_3 = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&1&0&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \end{bmatrix}, \end{aligned}$$

\(\mathbf {P}_{(34)}\) is the permutation matrix for interchanging the 3rd and 4th entries of length-\(6\) vectors, given by

$$\begin{aligned} \mathbf {P}_{(34)} = {{\mathrm{\mathrm {diag}}}}(0,1,0) \otimes \begin{bmatrix} 0&1 \\ 1&0 \end{bmatrix} + {{\mathrm{\mathrm {diag}}}}(1,0,1) \otimes \begin{bmatrix} 1&0 \\ 0&1 \end{bmatrix}, \end{aligned}$$

and \(\mathbf {E} = {{\mathrm{\mathrm {diag}}}}(1, 2^{-1}, 1, 2^{-1}, 2^{-1}, 1).\) The computation of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) involving data normalisation proceeds in two steps. First, an AML estimate of \(\tilde{\varvec{\theta }}\) relative to the normalised coordinates, \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\), is extracted by minimising the AML cost function

$$\begin{aligned} \tilde{J}_{\mathrm {AML}}(\tilde{\varvec{\theta }}) = \sum _{n=1}^N \frac{\tilde{\varvec{\theta }}^\mathsf {T}\tilde{\mathbf {A}}_n \tilde{\varvec{\theta }}}{\tilde{\varvec{\theta }}^\mathsf {T}\tilde{\mathbf {B}}_n \tilde{\varvec{\theta }}} \end{aligned}$$

where, for each \(n = 1, \dots , N\), \(\tilde{\mathbf {A}}_n = \mathbf {u}(\tilde{\mathbf {x}}_n) \mathbf {u}(\tilde{\mathbf {x}}_n)^\mathsf {T},\) \(\tilde{\mathbf {B}}_n = \partial _{\tilde{\mathbf {x}}}\mathbf {u}(\tilde{\mathbf {x}}_n) \varvec{\varLambda }_{\tilde{\mathbf {x}}_n}^{} \partial _{\tilde{\mathbf {x}}}\mathbf {u}(\tilde{\mathbf {x}}_n)^\mathsf {T},\) and \(\varvec{\varLambda }_{\tilde{\mathbf {x}}_n}^{}\) is the result of propagating \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\) to the normalised coordinates, namely

$$\begin{aligned} \begin{bmatrix} \varvec{\varLambda }_{\tilde{\mathbf {x}}_n}^{}&\mathbf {0} \\ \mathbf {0}^{\mathsf {T}}&0 \end{bmatrix} = \mathbf {T} \begin{bmatrix} \varvec{\varLambda }_{\mathbf {x}_n}^{}&\mathbf {0} \\ \mathbf {0}^{\mathsf {T}}&0 \end{bmatrix} \mathbf {T}^{\mathsf {T}}. \end{aligned}$$

Next, \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) is obtained by the un-normalisation procedure captured in the formula

$$\begin{aligned} \widehat{\varvec{\theta }}_{\mathrm {AML}}= \mathbf {E}^{-1} \mathbf {P}_{(34)} \mathbf {D}_3^+ (\mathbf {T} \otimes \mathbf {T})^\mathsf {T}\mathbf {D}_3 \mathbf {P}_{(34)} \mathbf {E} \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}. \end{aligned}$$
(7.2)

This formula is an instance of the inverse equation to (7.1) and is an immediate consequence of the identity \(J_{\mathrm {AML}}(\varvec{\theta }) = \tilde{J}_{\mathrm {AML}}(\tilde{\varvec{\theta }}).\) Notice that \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) given in (7.2) does not necessarily have unit length.
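For concreteness, a Python sketch of the un-normalisation step (7.2) is given below. The construction of \(\mathbf {D}_3\), \(\mathbf {P}_{(34)}\), and \(\mathbf {E}\) follows the definitions above; the vech ordering implied by \(\mathbf {D}_3\) and the function names are assumptions made for illustration only.

```python
import numpy as np

# Duplication matrix D_3 (9 x 6): row r selects the vech entry that equals the
# r-th entry of vec(A) for a symmetric 3 x 3 matrix A.
D3 = np.zeros((9, 6))
for r, c in enumerate([0, 1, 2, 1, 3, 4, 2, 4, 5]):
    D3[r, c] = 1.0
D3_PINV = np.linalg.pinv(D3)                 # the Moore-Penrose pseudo-inverse D_3^+

# Permutation matrix P_(34): interchanges the 3rd and 4th entries of length-6 vectors.
P34 = np.eye(6)[[0, 1, 3, 2, 4, 5]]

E = np.diag([1.0, 0.5, 1.0, 0.5, 0.5, 1.0])

def unnormalisation_matrix(T):
    """The 6 x 6 matrix F = E^{-1} P_(34) D_3^+ (T kron T)^T D_3 P_(34) E of (7.2)."""
    return np.linalg.inv(E) @ P34 @ D3_PINV @ np.kron(T, T).T @ D3 @ P34 @ E

def unnormalise_theta(theta_tilde_hat, T):
    """Un-normalisation step (7.2): map the AML estimate obtained in normalised
    coordinates back to the original coordinates (the result need not have unit length)."""
    return unnormalisation_matrix(T) @ theta_tilde_hat
```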

The computation of \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) involving data normalisation starts with the computation of \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) which is nothing else but an application of (5.2) with \(\mathbf {P}_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^\perp \) substituted for \(\mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp \) and \(\mathbf {M}_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}\) substituted for \(\mathbf {M}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}\), where, of course,

$$\begin{aligned} \mathbf {M}_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}} = \sum _{n=1}^N \frac{\tilde{\mathbf {A}}_n}{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}^{\mathsf {T}} \tilde{\mathbf {B}}_n \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}. \end{aligned}$$

Here \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) is assumed to be of unit length. Once \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) is found, \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) is given by

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} = \Vert \widehat{\varvec{\theta }}_{\mathrm {AML}}\Vert ^{-2} \mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp \mathbf {F} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} \mathbf {F}^\mathsf {T}\mathbf {P}_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^\perp , \end{aligned}$$
(7.3)

where \(\mathbf {F}\) is short for \(\mathbf {E}^{-1} \mathbf {P}_{(34)} \mathbf {D}_3^+ (\mathbf {T} \otimes \mathbf {T})^\mathsf {T}\mathbf {D}_3 \mathbf {P}_{(34)} \mathbf {E}\) and \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) on the right-hand side is given by (7.2)—that is, \(\widehat{\varvec{\theta }}_{\mathrm {AML}} = \mathbf {F} \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\). The above expression for \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) is the result of an application of the rule of covariance propagation to \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) and the mapping \(\tilde{\varvec{\theta }}\mapsto \mathbf {F} \tilde{\varvec{\theta }}/ \Vert \mathbf {F} \tilde{\varvec{\theta }}\Vert \).
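A short sketch of (7.3) is given next; it assumes \(\mathbf {F}\) has been formed as in the previous listing and that \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\) is the \(6 \times 6\) covariance of the unit-norm normalised estimate.

```python
import numpy as np

def unnormalise_theta_covariance(theta_tilde_hat, cov_theta_tilde, F):
    """Covariance un-normalisation (7.3): Lambda_theta = ||theta||^{-2} P_perp F
    Lambda_theta_tilde F^T P_perp, with theta = F theta_tilde as in (7.2)."""
    theta_hat = F @ theta_tilde_hat
    norm_sq = float(theta_hat @ theta_hat)
    P_perp = np.eye(6) - np.outer(theta_hat, theta_hat) / norm_sq
    return P_perp @ F @ cov_theta_tilde @ F.T @ P_perp / norm_sq
```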

Similarly, the data normalisation based computation of \(\varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{}\) starts with the computation of \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{}\). The matrix \(\varvec{\varLambda }_{\widehat{\tilde{\varvec{\xi }}}_{\mathrm {AML}}}^{}\) is next retrieved by applying the formula

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\xi }}}_{\mathrm {AML}}}^{} = [\partial _{\tilde{\varvec{\theta }}}{\tilde{\varvec{\xi }}}]_{\tilde{\varvec{\theta }}= \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} [(\partial _{\tilde{\varvec{\theta }}}{\tilde{\varvec{\xi }}})]_{\tilde{\varvec{\theta }}= \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^\mathsf {T}, \end{aligned}$$

which is a variant of (5.6) relative to the normalised coordinates. Finally, \(\varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{}\) is given by the expression

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {AML}}}^{} = \mathbf {S}\varvec{\varLambda }_{\widehat{\tilde{\varvec{\xi }}}_{\mathrm {AML}}}^{}\mathbf {S}^\mathsf {T}, \quad \mathbf {S} = {{\mathrm{\mathrm {diag}}}}(s,s,s,s,1), \end{aligned}$$

which encodes the covariance propagation for geometric parameters from the normalised to the un-normalised coordinates.

The procedures described above involve an arbitrary set of covariance matrices \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\). A common assumption in practice is that the noise in the data set is homogeneous isotropic Gaussian, so that \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \sigma ^2 \mathbf {I}_2\) for each \(n = 1, \dots , N\), where \(\sigma \) is a common standard deviation. The value of \(\sigma \) is generally not known a priori. This value can, however, be learned from the data. The key is the fact that two variants of \(J_{\mathrm {AML}}\), one based on the covariances \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \sigma ^2 \mathbf {I}_2\) and one based on the covariances \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\), differ only by a multiplicative constant and, as a result, lead to a common value of \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\). In other words, \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) can be obtained without knowing the actual value of \(\sigma \), by minimising \(J_{\mathrm {AML}}\) under the default assumption that \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\). Once this is done, one can take

$$\begin{aligned} \widehat{\sigma } = \sqrt{\frac{J_{\mathrm {AML}}(\widehat{\varvec{\theta }}_{\mathrm {AML}})}{N - 5}} \end{aligned}$$

for an estimate of \(\sigma \) (cf. [27, Sect. 7.1.4]). For all practical purposes, \(\sigma \) may safely be identified with \(\widehat{\sigma }\). With this in effect, the \(\varvec{\varLambda }_{\mathbf {x}_n}^{}\) may be assumed to be fully known.
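A minimal Python sketch of this noise-level estimate follows. It evaluates \(J_{\mathrm {AML}}\) under the default assumption \(\varvec{\varLambda }_{\mathbf {x}_n}^{} = \mathbf {I}_2\) and assumes the monomial ordering \(\mathbf {u}(\mathbf {x}) = [x^2, xy, y^2, x, y, 1]^\mathsf {T}\); the function name is hypothetical.

```python
import numpy as np

def sigma_hat(points, theta_hat):
    """Noise-level estimate sigma_hat = sqrt(J_AML(theta_hat) / (N - 5)), with J_AML
    evaluated under Lambda_{x_n} = I_2; `points` is an N x 2 array and theta_hat the
    AML estimate expressed in the same coordinates as the points."""
    j_aml = 0.0
    for x, y in points:
        ux = np.array([x * x, x * y, y * y, x, y, 1.0])      # u(x_n)
        du = np.array([[2 * x, 0.0],                         # partial of u w.r.t. (x, y), 6 x 2
                       [y, x],
                       [0.0, 2 * y],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 0.0]])
        numerator = float(theta_hat @ ux) ** 2               # theta^T A_n theta
        denominator = float(theta_hat @ du @ du.T @ theta_hat)  # theta^T B_n theta with Lambda = I_2
        j_aml += numerator / denominator
    return np.sqrt(j_aml / (len(points) - 5))
```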

8 Normalised Confidence Regions

Data normalisation can also be used in forming confidence regions. To generate a confidence region based on normalised data, we first define

$$\begin{aligned} \hat{\tilde{z}}_{\tilde{\mathbf {x}}} = \frac{(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}^\mathsf {T}\mathbf {u}(\tilde{\mathbf {x}}))^2}{\mathbf {u}(\tilde{\mathbf {x}})^\mathsf {T}\varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} \mathbf {u}(\tilde{\mathbf {x}})}, \end{aligned}$$

where \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) is assumed to have unit norm. We then let

$$\begin{aligned} \tilde{\Delta }_{\alpha } = \left\{ \tilde{\mathbf {x}} \in \mathbb {R}^2 \mid \hat{\tilde{z}}_{\tilde{\mathbf {x}}} \le \chi ^2_{5,\alpha }\right\} \end{aligned}$$

and finally take

$$\begin{aligned} \Delta _{\alpha } = \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \tilde{\mathbf {x}} \in \tilde{\Delta }_{\alpha } \right\} \end{aligned}$$

for an approximate confidence region at level \(1 - \alpha \) for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\). Equivalently, \(\Delta _{\alpha }\) can be defined as

$$\begin{aligned} \Delta _{\alpha } = \left\{ \mathbf {x} \in \mathbb {R}^2 \mid \hat{z}_{\mathbf {x}} \le \chi ^2_{5,\alpha } \right\} \end{aligned}$$

provided that \(\hat{z}_{\mathbf {x}}\) is taken with \(\widehat{\varvec{\theta }}_{\mathrm {AML}}= \mathbf {F} \widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) and \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{} = \mathbf {F} \varvec{\varLambda }_{\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}}^{} \mathbf {F}^\mathsf {T},\) with \(\widehat{\tilde{\varvec{\theta }}}_{\mathrm {AML}}\) remaining normalised; this follows from the fact that with \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) and \(\varvec{\varLambda }_{\widehat{\varvec{\theta }}_{\mathrm {AML}}}^{}\) as above, we have \(\hat{z}_{\mathbf {x}} = \hat{\tilde{z}}_{\tilde{\mathbf {x}}}.\) Now, as a rule, the set \(\Delta _{\alpha }\) will be different from the set \(\Omega _{\alpha }\), specified in (6.4), representing an approximate confidence region at level \(1 - \alpha \) for \(\widehat{\varvec{\theta }}_{\mathrm {AML}}\) based on raw data. While typically the difference between confidence regions constructed using normalised versus raw data is rather small, normalised confidence regions tend to be more visually pleasing.

9 Algorithms

(Algorithms 1–4: pseudocode listings)

Our overall optimisation scheme is summarised in Algorithms 1 and 2. Algorithms 3 and 4 give standalone procedures for computing the covariances of the algebraic and geometric parameters of an ellipse estimate obtained by applying our scheme. It should be noted that the termination conditions form an integral part of Algorithm 1. We elaborate on these conditions in Sect. 10.2.3 while discussing details of our implementation of Algorithm 1.

10 Performance Analysis

In order to characterise the accuracy and computational efficiency of our method, we conducted numerous simulations under diverse experimental conditions. We varied the eccentricity of the ellipses, the number of data points sampled from the ellipses, the level of noise applied to the data points, and the length of the ellipse segments from which data points were generated. Taking the segment length into account allowed us to determine how an ellipse estimate is influenced by the ratio of the length of the sampled segment to the perimeter of the entire underlying ellipse. Based on experiments in which this ratio was varied, we learned to distinguish between ill-posed and well-posed ellipse fitting problems: we regard an ellipse fitting problem as ill-posed when the sampled segment is shorter than half of the ellipse's perimeter, and as well-posed when the segment is longer than half of the perimeter.

In evaluating the quality of ellipse estimates, we distinguished between training data and testing data. Training data consists of noise-perturbed points sampled from an ellipse, possibly restricted to a small fragment. In contrast, testing data consists of noiseless data points uniformly sampled from an entire ellipse. Any particular ellipse in our experiments was estimated using training data, and the resulting estimates were evaluated using the corresponding testing data. By distinguishing between training data and testing data, we explicitly interpreted ellipse fitting as a prediction problem. This means that we were not interested in finding the best ellipse fitting the training data per se, but expected the estimated ellipse to be representative of the true ellipse in all regions, not just in the region from which the training data was sampled.

10.1 Procedure for Generating Synthetic Data and Evaluating Ellipse Estimates

We now summarise the procedure for generating training data and testing data, and explain in more detail how the quality of ellipse estimates was characterised.

10.1.1 Data Generation

The first step of the data generation procedure is to determine an ellipse in standard form \(x^2/a^2 + y^2/b^2 = 1\) by first choosing a random value of \(a\) in the interval \((101, 200)\) and then choosing a random value of \(b\) in the interval \((100, a)\). Such a choice ensures that the inequality \(a > b\) holds and, consequently, that \(a\) and \(b\) are the lengths of the semi-major and semi-minor axes, respectively, of the ellipse in question.

Once an initial ellipse has been selected, training and testing data-sets are constructed using the following steps (a code sketch of the procedure appears after the list):

1. Sample \(K\) points equidistantly along a segment of the ellipse with a specified length. The length of the segment is a fraction of the ellipse perimeter, and is measured clockwise starting from the positive \(y\)-axis. For example, the segment length can be half of the ellipse perimeter (see Fig. 1a). The \(K\) points will serve as a basis for the training data.

2. Sample \(N\) points equidistantly along the entire perimeter of the ellipse. These \(N\) points will serve as a basis for the testing data.

3. Rotate and translate the ellipse together with the points generated in the first two steps by a random angle and random offset (see Fig. 1d).

4. Add zero-mean Gaussian noise with the desired standard deviation to the rotated \(K\) points (see Fig. 1e). Take these noise-perturbed data points as the training data (serving as input to the ellipse estimation methods).

5. Take the rotated set of \(N\) points as the testing data.
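The sketch below implements the five steps in Python. For simplicity, the segment is measured from the positive \(x\)-axis rather than clockwise from the positive \(y\)-axis, and the range of the random offset is arbitrary; these details, like the function names, are illustrative assumptions.

```python
import numpy as np

def sample_ellipse_arc(a, b, fraction, k):
    """Sample k points approximately equidistantly (by arc length) along a segment
    covering `fraction` of the perimeter of x^2/a^2 + y^2/b^2 = 1."""
    t_dense = np.linspace(0.0, 2.0 * np.pi, 20000)
    dense = np.column_stack([a * np.cos(t_dense), b * np.sin(t_dense)])
    arc = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(dense, axis=0), axis=1))]
    t = np.interp(np.linspace(0.0, fraction * arc[-1], k), arc, t_dense)
    return np.column_stack([a * np.cos(t), b * np.sin(t)])

def make_data(a, b, fraction, k, n, sigma, rng):
    """Steps 1-5 above: training points from a segment, testing points from the whole
    ellipse, a common random rotation and translation, and noise on the training set."""
    train = sample_ellipse_arc(a, b, fraction, k)
    test = sample_ellipse_arc(a, b, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    offset = rng.uniform(-100.0, 100.0, size=2)   # arbitrary offset range (an assumption)
    train, test = train @ R.T + offset, test @ R.T + offset
    train = train + rng.normal(scale=sigma, size=train.shape)
    return train, test

# Example: 25 training points sampled from half of an ellipse, with sigma = 5 pixels.
# rng = np.random.default_rng(0)
# train, test = make_data(a=150.0, b=120.0, fraction=0.5, k=25, n=100, sigma=5.0, rng=rng)
```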

Fig. 1

Summary of the simulation procedure. (a) Training data points are sampled from a segment of an ellipse. In this example, the length of the segment is half of the ellipse perimeter. (b) Testing data points are sampled equidistantly from the entire ellipse. (c) Training data is randomly rotated and translated. (d) The same rotation and translation is applied to testing data. (e) Noise perturbed training data serves as input to ellipse estimation methods. (f) The quality of an estimated ellipse is characterised in terms of the root-mean-square orthogonal distance between the testing data set and the estimated ellipse. The orthogonal distance is depicted by straight lines joining testing data points and the corresponding closest points on the ellipse

10.1.2 Estimates Evaluation

After an ellipse has been estimated on the training data with any particular estimation method, the quality of the respective estimate is evaluated on the testing data using the root-mean-square (RMS) orthogonal distance

$$\begin{aligned} \sqrt{\frac{1}{2 N}\sum _{n=1}^{N} d_{n}^2}, \end{aligned}$$

where \(d_n\) denotes the orthogonal distance between the \(n\)th data point and the ellipse constituting the estimate (see Fig. 1f). The RMS orthogonal distance measures the geometric error of the estimate with respect to the testing data points. The process of computing the orthogonal distances \(d_{n}\) is rather involved—detailed formulae can be found in [11] and [61].
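Since the orthogonal projection of a point onto an ellipse has no closed form, the following sketch approximates each \(d_n\) by brute force, replacing it with the distance to the nearest of a dense set of points placed on the estimated ellipse. This is only an approximation for illustration; the exact projection formulae are the ones referenced in [11] and [61].

```python
import numpy as np

def rms_orthogonal_distance(test_points, centre, axes, angle, samples=200000):
    """Approximate the RMS orthogonal distance by replacing each d_n with the distance
    to the nearest of `samples` points sampled densely on the estimated ellipse."""
    a, b = axes
    t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    c, s = np.cos(angle), np.sin(angle)
    boundary = np.column_stack([centre[0] + a * np.cos(t) * c - b * np.sin(t) * s,
                                centre[1] + a * np.cos(t) * s + b * np.sin(t) * c])
    d_sq = np.array([np.min(np.sum((boundary - p) ** 2, axis=1)) for p in test_points])
    return np.sqrt(d_sq.sum() / (2.0 * len(test_points)))
```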

10.2 Results

We compared our estimation technique with the orthogonal distance regression method and the direct ellipse fitting method, which represent the gold standard and the baseline technique for ellipse fitting, respectively. Both the orthogonal distance regression method and our proposed method were initialised with the result of the direct ellipse fitting technique. All estimation schemes operated on Hartley-normalised data points.

10.2.1 Synthetic Data

In the first set of simulations we held the noise level fixed at \(\sigma = 5\) pixels, and varied the number of training data points and the length of the ellipse segments from which the data points were sampled. For each combination of the number of data points and the segment length, we conducted 250 simulation trials and recorded the mean root-mean-square error. The mean root-mean-square errors are displayed using two-dimensional contour plots in Fig. 2.

Fig. 2

Comparison of mean root-mean-square orthogonal distance error for a fixed noise level of \(\sigma = 5\) pixels, as both the number of data points and the portion from which the data points are sampled are varied. The colour intensity of the plot is correlated with the mean root-mean-square orthogonal distance error—the lighter the colour, the larger the error. Contour lines represent level sets with constant error. When the fraction of the ellipse from which data points are sampled is less than half of the ellipse perimeter, the error of the direct ellipse estimation (DIR) method fails to improve as the number of data points is increased. In contrast, both our fast guaranteed ellipse estimation method (FGEE) and the gold standard orthogonal distance estimation (ODE) method exhibit a consistent reduction in error as the number of data points is increased. (d) may be interpreted as comparing a sample of values on the "number of points" axis in (a)–(c), while holding the corresponding "fraction of ellipse perimeter" axis fixed at 0.55. (a) Direct ellipse estimation. (b) Fast guaranteed ellipse estimation. (c) Orthogonal distance estimation. (d) Comparing all three methods (Color figure online)

Two important conclusions can be drawn from the results of the first experiment:

1. When the length of an ellipse segment is less than half of the length of the entire ellipse, the direct ellipse estimate does not improve as the number of data points is increased.

2. The Sampson distance based ellipse fitting method and the orthogonal distance regression method yield almost indistinguishable results.

In the second set of simulations we utilised only 25 data points, but still varied the noise level and the length of the ellipse segments from which data points were sampled. The results of the second set of simulations are summarised using two-dimensional contour plots in Fig. 3. The second experiment confirms that our Sampson distance based ellipse fitting method produces results similar to those of the gold standard orthogonal distance regression method for a variety of noise levels. It also shows that our method continues to track the orthogonal distance regression method even when both the noise level and the ellipse segment length are varied. The direct ellipse fitting method does not share this desirable property, and instead yields inferior results.

Fig. 3

Comparison of mean root-mean-square orthogonal distance error for 25 data points, as both the portion from which the data points are sampled and the standard deviation of the noise level (measured in pixels) are varied. The colour intensity of the plot is correlated with the mean root-mean-square orthogonal distance error—the lighter the colour, the larger the error. Contour lines represent level sets with constant error. When the fraction of the ellipse from which data points are sampled is less than half of the ellipse perimeter, the error of the direct ellipse estimation method is considerably higher than that of the other methods. In contrast, our fast guaranteed ellipse estimation method and the gold standard orthogonal distance estimation method exhibit similar error levels. (d) may be interpreted as comparing a sample of values on the noise-level axis in (a)–(c), while holding the corresponding "fraction of ellipse perimeter" axis fixed at 0.45. (a) Direct ellipse estimation. (b) Fast guaranteed ellipse estimation. (c) Orthogonal distance estimation. (d) Comparing all three methods (Color figure online)

10.2.2 Stability and Efficiency

For every experiment, we verified that our algorithm was indeed producing an ellipse fit by confirming that the discriminant of the estimated parameters was less than zero. We also monitored the average running time of a MATLAB implementation of our algorithm, while varying the number of data points as well as the fraction of the ellipse perimeter from which data points were sampled. A summary of our findings, together with a comparison against the running time of the orthogonal distance regression method, is presented using two-dimensional density plots in Fig. 4.

Fig. 4

Comparison of mean running time for a fixed noise level of \(\sigma = 5\) pixels, as both the portion from which the data points are sampled and the number of data points are varied. The running time is measured in seconds. The colour intensity of the plot is correlated with the average running time—the lighter the colour, the greater the running time. In (a) and (b) the same colour scaling function was used to map running time to colour, thereby ensuring that the two plots are directly comparable. According to (a), the average running time of the fast guaranteed ellipse estimation method increases smoothly as the number of points is increased and as the fraction of the ellipse perimeter from which data points are sampled is decreased. Whilst the running time of the orthogonal distance estimation method in (b) follows a similar trend, the trade-off is not as smooth, indicating that the running time of the orthogonal distance estimation method for a sample of data points is much more unpredictable. In (c) the irregularity of the running time of the orthogonal distance estimation method is made more prominent by a different choice of the colour scaling function. (d) may be interpreted as comparing a sample of values on the "fraction of ellipse perimeter" axis in (a) and (b), while holding the corresponding "number of points" axis fixed at 350. (a) Fast guaranteed ellipse estimation. (b) Orthogonal distance estimation. (c) Orthogonal distance estimation. (d) Fast guaranteed ellipse estimation vs orthogonal distance estimation (Color figure online)

Based on the outcome of the experiment, we make the following two observations:

1. The running time of our fast guaranteed ellipse fitting algorithm increases gradually and smoothly as the number of data points is increased, and as the problem becomes more ill-posed.

2. The running time of the orthogonal distance regression method follows a similar trend, but does not increase gradually or smoothly.

Our experiments indicate that the running time of the orthogonal distance regression method is much more unpredictable than the running time of our technique.

In Sect. 4.2 we mentioned that utilising the LM method in standard form to optimise our cost function may occasionally take many iterations to converge, and proposed a modification to the LM scheme. A comparison of the running time of our fast guaranteed ellipse fitting algorithm against the basic guaranteed ellipse fitting algorithm that uses the standard LM method is presented in Fig. 5. The results show that for ill-posed ellipse fitting problems, our fast guaranteed ellipse fitting implementation is, on average, twice as fast as the basic guaranteed ellipse fitting algorithm.

Fig. 5

Comparison of smoothed histograms of the number of seconds that elapsed before our algorithm that uses a modified damping matrix in the LM optimisation scheme converged for (a) well-posed and (b) ill-posed problems, and of similar histograms for the basic guaranteed ellipse fitting method that uses an identity damping matrix. The results indicate that the use of the modified damping matrix in the LM scheme can considerably decrease the running time for ill-posed problems. On average, the modified damping matrix reduces the running time by half for ill-posed problems. For well-posed problems, there is no notable difference in running time as both algorithms converge within a handful of iterations. (a) 25 data points sampled from the entire ellipse. (b) 25 data points sampled from one third of the ellipse

10.2.3 Attributes of the Estimate

While our ellipse-specific parametrisation explicitly excludes hyperbola fits, it does permit degenerate ellipses or ellipses that approach parabolas in the limit. As discussed in Sect. 2, degenerate ellipses are those for which the corresponding determinant \(D\) is zero. In turn, ellipses close to parabolas are those for which the corresponding discriminant \(\Delta \) is close to zero. To avoid producing degenerate ellipses, our algorithm terminates when the absolute value of the determinant falls below a user-specified threshold. To prevent the solution from coming too close to a parabola, our algorithm stops when the discriminant approaches zero.

In our experiments the threshold for the determinant was set to \(10^{-5}\), and the discriminant was kept sufficiently large in modulus by forcing the algorithm to terminate whenever \(\log (\Vert \varvec{\theta }\Vert ^2/\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta })\) rose above a threshold of \(15.5\).
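A sketch of these two stopping tests is given below. It assumes that \(\mathbf {F}\) is the \(6 \times 6\) matrix for which \(\varvec{\theta }^\mathsf {T}\mathbf {F} \varvec{\theta }\) is the discriminant-related quadratic form of Sect. 2, and that \(D\) is the determinant of the \(3 \times 3\) symmetric matrix associated with \(\varvec{\theta }\); both assumptions, like the function name, are ours and made only for illustration.

```python
import numpy as np

def should_terminate(theta, F, det_tol=1e-5, log_tol=15.5):
    """Stopping tests described above (a sketch): stop when the conic determinant is
    close to zero or when log(||theta||^2 / (theta^T F theta)) exceeds log_tol."""
    theta = np.asarray(theta, dtype=float)
    a, b, c, d, e, f = theta
    conic = np.array([[a, b / 2.0, d / 2.0],
                      [b / 2.0, c, e / 2.0],
                      [d / 2.0, e / 2.0, f]])
    near_degenerate = abs(np.linalg.det(conic)) < det_tol
    near_parabolic = np.log(float(theta @ theta) / float(theta @ F @ theta)) > log_tol
    return near_degenerate or near_parabolic
```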

We label ellipses that are close to degenerate and ellipses that approximate parabolas as depreciated ellipses. In contrast to bona fide ellipses, depreciated ellipses are characterised by the property that only a subset of the geometric ellipse parameters can be estimated with reasonable certainty. Although not as informative as genuine ellipses, depreciated ellipses still provide useful knowledge. For example, an ellipse that approximates a parabola will have tremendous uncertainty associated with its semi-major axis, but its semi-minor axis and the orientation of its axes may still be estimated with great precision, and this will be reflected in the covariance matrix.

In order to determine how frequently depreciated ellipses occur in typical scenarios, we conducted numerous simulations which are summarised in Table 1.

Table 1 Prevalence of ellipses, depreciated ellipses, and hyperbolas when minimising AML using the conic parametrisation (\(\varvec{\theta }\)) or the ellipse-specific parametrisation (\(\varvec{\eta }\))

The results of our simulations are based on 10,000 trials. For each trial an ellipse was generated by randomly selecting lengths for the semi-major and semi-minor axes, while keeping the axes aligned with the Cartesian axes. Points were then sampled from the upper half, right half, and upper right quarter of the ellipse. Our experiments revealed that with small noise levels and with at least ten data points, depreciated ellipses occur only on rare occasions. The most challenging situation arises when data points are sampled from the upper right quarter of an ellipse and the noise level is large. This constitutes a very ill-posed problem—neither orthogonal distance regression nor AML estimation is capable of producing an ellipse that is close to the true ellipse without recourse to some kind of regularisation. We believe that ill-posed estimation problems are best solved within a Bayesian setting, where regularisation is achieved through a suitable choice of a prior distribution over the parameter space. The design of an appropriate prior over the space of ellipses that might serve to guide the estimate away from degenerate ellipses and parabolas is a challenging problem which we intend to pursue in future work.

10.2.4 Real Data

To validate the conclusions drawn from the synthetic experiments, we also compared the ellipse fitting methods on real images. In Fig. 6, we utilised two images of a Martian moon eclipse captured by the Opportunity rover, and in Fig. 7, we used images of two of Saturn's moons captured by the Cassini spacecraft. Our experiments confirm that the estimates produced by our fast guaranteed ellipse fitting method agree with the estimates obtained via the orthogonal distance regression method.

Fig. 6

Comparison of ellipse fitting methods on two images of a Martian moon eclipse. The two images capture the Martian moon Phobos as it occludes the sun. A Canny edge detector was applied to the two images, and the maximum edge response corresponding to the common border of the moon and the sun was used as input to the ellipse fitting methods. In both cases the smallest ellipse corresponds to the direct ellipse fit, whilst our fast guaranteed ellipse estimation method and the orthogonal distance estimation method yield indistinguishable results (the ellipse contours overlap). Image Credit: NASA/JPL/Cornell. (a) Martian moon eclipse stage 1. (b) Martian moon eclipse stage 2

Fig. 7

Comparison of ellipse fitting methods on Saturn's crescent moons. A Canny edge detector was applied to the two images, and the maximum edge response corresponding to the outer border of the crescents was used as input to the ellipse fitting methods. In both cases the smallest ellipse corresponds to the direct ellipse fit, whilst our fast guaranteed ellipse estimation method and the orthogonal distance estimation method yield indistinguishable results (the ellipse contours overlap). Image Credit: NASA/JPL/Space Science Institute. (a) Rhea. (b) Mimas

10.2.5 Accuracy of Geometric Parameter Covariance Estimation

We performed additional simulations to test the validity of our geometric parameter covariance formulae. Our validation was based on comparing the covariance matrix given by the propagation formula (5.6) to a covariance matrix resulting from a Monte Carlo simulation. In particular, we sampled \(250\) points equidistantly between 0\(^\circ \) and 225\(^\circ \) on the ellipse parametrised by \(\varvec{\xi }= [100, 50, 250, 250, 0.7854]^\mathsf {T}\). We then added zero-mean Gaussian noise at a pre-set noise level to the data points, and produced 10,000 simulation trials. The Monte Carlo covariance matrix was computed with the aid of the formula

$$\begin{aligned} \varvec{\varLambda }_{\widehat{\varvec{\xi }}_{\mathrm {MONTE}}}^{} = \frac{1}{9995} \sum _{n = 1}^{10\,000}\left( \widehat{\varvec{\xi }}_{\mathrm {AML}, n}-\bar{\varvec{\xi }}_{\mathrm {AML}}\right) \left( \widehat{\varvec{\xi }}_{\mathrm {AML}, n}-\bar{\varvec{\xi }}_{\mathrm {AML}}\right) ^\mathsf {T}, \end{aligned}$$

where \(\widehat{\varvec{\xi }}_{\mathrm {AML}, {n}}\) is the geometric parameter estimate corresponding to the \(n\)th trial and \(\bar{\varvec{\xi }}_{\mathrm {AML}}\) represents the corresponding mean geometric parameter vector. We then compared the relative error between our propagated covariance matrix and the Monte Carlo covariance matrix for each simulation trial. The relative error is defined as the Frobenius norm of the difference between our propagated covariance matrix and the Monte Carlo covariance matrix, divided by the Frobenius norm of the Monte Carlo covariance matrix. The relative error is sensitive to even very small differences. We also computed the angular error, defined as the angle between vectorised and unit-normalised variants of the propagated covariance and Monte Carlo covariance matrices. The angular error is invariant to differences in scale between the two covariance matrices. The median relative and median angular errors are presented in Table 2 for various noise levels. The results show that our covariance matrix estimates are very accurate, achieving a relative error of less than \(15\%\), provided that the noise level is less than \(\sigma = 3\) pixels.

Table 2 Median errors between our propagated geometric parameter covariance matrix and a Monte Carlo estimate of the geometric parameter covariance matrix
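The two error measures reported in Table 2 can be computed as in the following sketch; the \(1/9995\) normalisation reproduces the displayed Monte Carlo formula, and the function names are hypothetical.

```python
import numpy as np

def monte_carlo_covariance(xi_estimates):
    """Sample covariance of the geometric parameter estimates across trials, using the
    1/(trials - 5) normalisation of the displayed formula; xi_estimates is a
    (trials x 5) array of estimates."""
    centred = xi_estimates - xi_estimates.mean(axis=0)
    return centred.T @ centred / (len(xi_estimates) - 5)

def covariance_errors(cov_propagated, cov_monte_carlo):
    """Relative (Frobenius) and angular errors between two covariance matrices."""
    relative = (np.linalg.norm(cov_propagated - cov_monte_carlo, ord='fro')
                / np.linalg.norm(cov_monte_carlo, ord='fro'))
    v1, v2 = cov_propagated.ravel(), cov_monte_carlo.ravel()
    cosine = np.clip(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)), -1.0, 1.0)
    angular = np.arccos(cosine)   # in radians; invariant to scale differences
    return relative, angular
```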

To further illustrate the practical insight that can be gleaned from the covariance matrices, we plot a 95% confidence region and report the estimated geometric parameters and corresponding parameter standard errors on four different data-sets (see Fig. 8). In accordance with theoretical expectations, the true ellipses happened to fall inside the confidence regions and the confidence regions became narrower as the number of data points was increased. While the results presented in Fig. 8 are representative of the kinds of confidence regions one would typically observe in many practical scenarios, they may not be representative of confidence regions arising when data points are sampled from a very short fragment of an ellipse (an ill-posed problem)—in that case the assumptions underlying the generation of confidence regions may not be satisfied and the confidence regions may thus be too narrow.

Fig. 8

Fast guaranteed ellipse estimation with corresponding 95 % confidence band (shaded region). (a) The estimated ellipse is centred at \((254.218 \pm 3.176, 254.866 \pm 3.580)\), with semi-major axis length of \(199.347\pm 4.965\) pixels, semi-minor axis length of \(29.447 \pm 1.683\) pixels, and orientation of \(45.612 \pm 0.549^\circ \). (b) The estimated ellipse is centred at \((245.330\pm 2.612, 250.868 \pm 2.403)\), with semi-major axis length of \(200.105 \pm 3.613\) pixels, semi-minor axis length of \(23.596 \pm 1.327\) pixels, and orientation of \(44.035 \pm 0.419^\circ \). (c) The estimated ellipse is centred at \((245.316\pm 5.478, 236.282 \pm 7.851)\), with semi-major axis length of \(201.939 \pm 3.629\) pixels, semi-minor axis length of \(110.769 \pm 9.859\) pixels, and orientation of \(132.503 \pm 1.772^\circ \). (d) The estimated ellipse is centred at \((256.101 \pm 3.549, 256.386 \pm 3.490)\), with semi-major axis length of \(197.207 \pm 1.719\) pixels, semi-minor axis length of \(88.211 \pm 5.447\) pixels, and orientation of \(134.868 \pm 0.782^\circ \). (a) 10 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 25 pixels, and orientation of \(45^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels. (b) 40 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 25 pixels, and orientation of \(45^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels. (c) 10 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 100 pixels, and orientation of \(135^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels. (d) 40 data points sampled from an ellipse centred at (250, 250), with semi-major axis length of 200 pixels, semi-minor axis length of 100 pixels, and orientation of \(135^\circ \). The data points were perturbed with zero-mean homogeneous Gaussian noise of \(\sigma = 5\) pixels

11 Discussion

Our experiments show that the Sampson distance is an excellent approximation to the orthogonal distance for the purpose of fitting an ellipse to data. Our results are in agreement with the findings of Kanatani and Rangarajan who report that “[it] has also been observed that the solution that minimizes the Sampson error agrees with the ML [orthogonal distance] solution up to several significant digits.” [30, p. 2204].

The utility of the Sampson distance was also acknowledged by Rosin. This author conducted a comprehensive comparison of various ellipse fitting cost functions and concluded that “...for small amounts of noise \(EOF_2\) [the Sampson distance] is the overall best approximation to use ...” [49, p. 501].

Some researchers, however, including Chernov and Ma [10, p. 299], have cautioned against replacing the orthogonal distance with the Sampson distance. Their concerns are based on a list of disadvantages associated with algebraic fitting methods, compiled by Anh [1]. The list includes the following items:

1. Error definition does not comply with the measurement guidelines.

2. Conversion of the algebraic parameters to the physical parameters (shape, size, position, and rotation parameters) is highly difficult.

3. Fitting errors are weighted.

4. The estimated model parameters are biased.

5. It is very difficult to test the reliability of the estimated model parameters (particularly in terms of physical parameters).

6. The fitting procedure sometimes ends with an unintended model feature (e.g. a hyperbola instead of an ellipse).

7. The model parameters are not invariant to coordinate transformation (e.g. a parallel shift of the set of given points causes changes not only in position, but also in the form and rotation of the estimated model feature).

In the context of ellipse fitting, these disadvantages are not so severe. We offer the following rejoinder to some of the above criticisms (the bracketed numbers refer to the original item numbers):

[2] The formulae for the conversion of algebraic ellipse parameters to physical ellipse parameters are straightforward, albeit tedious, to derive; see Sect. 5.2.

[3] The fact that fitting errors are weighted is not necessarily a disadvantage. It is through the introduction of gradient-cum-covariance weights into the algebraic distance that the Sampson distance is able to produce accurate estimates.

[4] Bias in the estimated model parameters is not limited to algebraic fitting methods. The orthogonal distance estimate is also biased, even in the case of ellipse fitting [16, 38–41].

[5] Our fitting procedure includes a reliability measure for both algebraic and geometric ellipse parameters.

[6] Our fitting procedure always produces an ellipse estimate.

[7] If data points and data point covariance matrices are both jointly and appropriately modified in accordance with a change of coordinate system, then the Sampson distance is theoretically unchanged. However, certain coordinate systems (e.g. scaling all data points to lie within a unit box [14]) present favourable numerical advantages.

Our response above reduces the list of disadvantages to item (1). Now, passing to this remaining item, given that the Sampson distance yields accurate estimates, does it really matter that the error definition does not comply with measurement guidelines? In our view, it does not, but we concede that the answer may be domain dependent.

An astute reader will probably have noticed that the Sampson distance was devised more than forty years ago, and may now wonder whether the accuracy of Sampson distance based estimation has been surpassed by more recent techniques. To answer this question, a comprehensive evaluation of several new schemes was recently conducted [55]. These schemes are not based on the optimisation of any particular cost function, but rather exploit the idea of unbiasing existing estimators. The results of the comparison showed that whilst in some instances the newer techniques may lead to more accurate estimates, the improvements they yield are very small and typically become apparent only several decimal places into the result. Hence, in many situations, the improvement in accuracy offered by the more recently proposed techniques may not be of practical significance.

12 Conclusion and Future Work

We have presented a straightforward and efficient algorithm for fitting an ellipse to data. The method exploits a parametrisation of the set of all ellipses to implicitly enforce the ellipticity constraint. Computational efficiency is accomplished with a custom variant of the Levenberg–Marquardt algorithm. The technique yields estimates that are almost indistinguishable from the estimates produced by the gold-standard orthogonal distance regression method, even at moderate noise levels. The proposed method is easy to implement thanks to its reliance on the Sampson distance. Additionally, we have provided a means for plotting confidence regions and a measure of uncertainty for both the algebraic and geometric parameters of an ellipse estimate. This measure of uncertainty may prove useful in various industrial applications and may also help in low-level vision tasks, where deciding whether a group of pixels belongs to a line, circle, or ellipse is often a prerequisite for higher-level image analysis. In future work we plan to investigate robust variants of our cost function to neutralise the impact of outliers on the estimate, and to explore suitable priors over the space of ellipses to improve the quality of the estimate for ill-posed problems. We also intend to characterise the empirical and theoretical accuracy of our measure of uncertainty more comprehensively by extending the scope of the Monte Carlo simulations and experiments.