1 Introduction

Scene-plane rectification is used in many classic computer vision tasks, including single-view 3D reconstruction, camera calibration, grouping coplanar symmetries, and image editing (Lukáč et al. 2017; Pritts et al. 2014; Wu et al. 2011). In particular, the affine rectification of an imaged scene plane transforms the camera’s principal plane so that it is parallel to the scene plane (see Fig. 1). This restores the affine invariants of the imaged scene plane, such as the parallelism of lines and ratios of areas (Hartley and Zisserman 2004). There is only an affine transformation between the affine-rectified imaged scene plane and its real-world counterpart. The removal of the effects of perspective imaging is helpful to understand the geometry of the scene plane.

Fig. 1
figure 1

Input (top left) is a distorted view of a scene plane, and the outputs (top right, bottom) are the undistorted and rectified scene plane. The method is fully automatic

This paper proposes minimal solvers that jointly estimate affine rectification and lens distortion from local features extracted from essentially arbitrarily repeating coplanar texture (see Fig. 2). Wide-angle lenses with significant radial lens distortion are common in consumer cameras like the GoPro series of cameras. In the case of Internet imagery, the camera and its metadata are often unavailable for use with off-line calibration techniques. The state of the art has several approaches for rectifying (or partially calibrating) a distorted image, but these methods make restrictive assumptions about scene content by assuming, e.g., the presence of sets of parallel scene lines (Antunes et al. 2017; Wildenauer and Micusík 2013) or translational symmetries (Pritts et al. 2018). The proposed solvers relax the need for specific assumptions about scene content to unknown repeated structures (see Table 1).

The proposed minimal solvers exploit the scale constraint: two instances of rigidly-transformed coplanar repeats occupy identical areas in the scene plane and in the affine rectified image of the scene plane (e.g., see the rectifications in Figs. 2, 3, 4). There are two groups of solvers introduced in this paper: the directly-encoded scale and change-of-scale solvers, which are differentiated by the way in which the scale constraint is used. The directly-encoded scale solvers, which we acronymize as the DES solvers for short, encode the unknown area of a rectified region as a dependent function of the measured region, vanishing line, and undistortion parameter (see Sect. 4). The change-of-scale solvers—CS solvers for short—linearize the undistorting and rectifying transformation and use its Jacobian determinant to induce constraints on the unknown undistortion and rectification parameters (see Sect. 5). The Jacobian determinant measures the local change-of-scale of the rectifying transformation (and, more generally, of any differentiable transformation).

The input to the solvers are intra-image correspondences of local features. Geometrically, the local features are represented by local affine frames, that is, by triplets of (semi-) locally measured image points. There are three different minimal configurations of corresponding features that provide a sufficient number of constraints to solve for the unknown undistortion and rectification parameters (see Sect. 4.3). The minimal configurations are shown in Fig. 4 and are the same for the DES and CS groups of solvers. We generate solvers for all input configurations for both groups of solvers to provide for flexible sampling during robust estimation.

Fig. 2
figure 2

A shortcut to affine rectification. The hierarchy of rectifications from distorted to metric space is traversed clockwise from the top left. The proposed method is a direct path to affine-rectified space using only rigidly-transformed coplanar repeats, in contrast to the state of the art, which requires scene lines or sampled undistortions. The scene plane’s vanishing line is shown in the original and undistorted image (\(\tilde{\mathbf {l}}\) and \(\mathbf {l} \), respectively). The affine-covariant regions are in the 222-configuration (see Sect. 4.2), where corresponded coplanar regions are the same color. All affine-rectified images are metrically upgraded with the method of Pritts et al. (2014) for presentation (see Sect. 6.3)

The solvers are fast and robust to noisy feature detections, so they work well in robust estimation frameworks like ransac (Fischler and Bolles 1981). The proposed work is applicable for several important computer vision tasks including symmetry detection (Funk et al. 2017), inpainting (Lukáč et al. 2017), and single-view 3D reconstruction (Wu et al. 2011).

Table 1 Scene assumptions

1.1 Previous Work

Several state-of-the-art methods can rectify from imaged coplanar repeated texture, but these methods assume the pinhole camera model (Ahmad and Cheong 2018; Aiger et al. 2012; Chum and Matas 2010; Criminisi and Zisserman 2000; Lukáč et al. 2017; Ohta et al. 1981; Zhang et al. 2012). A subset of these methods introduce solvers that use algebraic constraints induced by the equal-scale invariant of affine-rectified space (Chum and Matas 2010; Criminisi and Zisserman 2000; Ohta et al. 1981) in a similar formulation to the proposed solvers (see Fig. 2). These methods are members of the change-of-scale (CS) solver group (see Sect. 5) since they use the Jacobian determinant of the affine-rectifying transformation to induce constraints on the imaged scene plane’s vanishing line. To complete the family of affine-rectifying minimal solvers for pinhole cameras Chum and Matas (2010); Criminisi and Zisserman (2000); Ohta et al. (1981), we also construct and evaluate a novel DES solver that assumes the pinhole camera model in Sect. 4.5.

Pritts et al. (2014) rectify images of scene planes with lens-distortion using a two-step approach: a rectification that is estimated from a minimal sample using the pinhole assumption is refined by a nonlinear program that incorporates lens distortion. However, even with relaxed thresholds, a robust estimator like ransac (Fischler and Bolles 1981) discards measurements around the boundary of the image since this region is the most affected by radial distortion and cannot be accurately modeled with a pinhole camera. Neglecting lens distortion during the labeling of good and bad measurements, as done during the verification step of ransac, can give fits that are biased to barrel distortion (Kukelova et al. 2015), which degrades rectification accuracy.

Fig. 3
figure 3

Wide-angle results. Input (top left) is an image of a scene plane. Outputs include the undistorted image (top right) and rectified scene planes (bottom row). The method is automatic

Pritts et al. (2018) first proposed minimal solvers that jointly estimate affine rectification and lens distortion, but this method is restricted to scene content with translational symmetries (see Table 1). Furthermore, we show that the conjugate translation solvers of Pritts et al. (2018) are more noise sensitive than the proposed scale-based solvers (see Figs. 78).

There are two recent methods that affine-rectify lens-distorted images by enforcing the constraint that scene lines are imaged as circles with the division model (Antunes et al. 2017; Wildenauer and Micusík 2013). The input to these solvers are circles fitted to contours extracted from the image. Sets of circles whose preimages are coplanar parallel lines are used to induce constraints on the division model parameter and vanishing points. These methods require two distinct sets of imaged parallel lines [5 total lines for Wildenauer and Micusík (2013) and 7 for Antunes et al. (2017); see Table 1] to estimate rectification, which is a strong scene-content assumption. In addition, these methods must perform a multi-model estimation to label distinct vanishing points as pairwise consistent with a vanishing line. In contrast, the proposed solvers can undistort and rectify from as few as 4 coplanar repeated local features (see Table 1).

2 Preliminaries

In this section, we provide a brief review of the parameterizations, methods, and notations that are used in this paper. We use the one-parameter division model to parameterize the radial undistortion function. The strengths of this model were shown by Fitzgibbon (2001) for the joint estimation of two-view geometry and non-linear lens distortion. The division model is especially suited for minimal solvers since it is able to express a wide range of distortions with a single parameter (denoted \(\lambda \)), as well as yielding simpler equations compared to other distortion models (see Sect. 3.1).

The polynomial system of equations encoding the rectifying constraints is solved using an algebraic method based on Gröbner bases. Automated solver generators using the Gröbner basis method (Kukelova et al. 2008; Larsson et al. 2017a) have been used to generate solvers for several camera geometry estimation problems (Kukelova et al. 2008, 2015; Larsson et al. 2017a, b), Pritts et al. (2018). However, the straightforward application of automated solver generators to the proposed constraints resulted in unstable solvers (see Sect. 7). Recently, Larsson et al. (2018) sampled feasible monomial bases, which can be used in the action-matrix method. In Larsson et al. (2018) basis, sampling was used to minimize the size of the solver. We modified the objective of Larsson et al. (2018) to maximize for solver stability. Stability sampling generated significantly more numerically stable solvers (see Fig. 6).

Table 2 Common denotations

2.1 Notation

For most of the text imaged points are modeled with homogeneous coordinates and are denoted , where \(x_i,y_i\) are the image coordinates. The image of a scene plane’s vanishing line is denoted and the line at infinity is . Matrices are in typewriter font; e.g., an affinity is \(\mathtt {A} _{}\), and a homography is \(\mathtt {H} \). For the derivation of the solvers in Sect. 5, it is convenient to use Euclidean points, which are given by inhomogeneous coordinates and typeset as .

A covariant region detection (see Sect. 6.1) is a distorted function of some region from the pinhole image and is denoted \(\tilde{\mathcal {R}}_{}\). Likewise, a distorted point extracted from a region detection is denoted , and its Euclidean representation is . Under the division model, the distorted image of the vanishing line is a circle (Bukhari and Dailey 2013; Fitzgibbon 2001; Strand and Hayman 2005; Wang et al. 2009) and is denoted \(\tilde{\mathbf {l}}\).

The affine-rectified images of regions, homogeneous points and Euclidean points are denoted as , and , respectively (see Table 2).

Fig. 4
figure 4

Solver variants. (top-left image) The input to the method is a single image. (bottom-left triptych, contrast enhanced) The three configurations—222, 32, 4—of affine frames that are inputs to the proposed solvers variants. Corresponded frames have the same color. (top row, right) Undistorted outputs of the proposed solver variants. (bottom row, right) Cutouts of the dartboard rectified by the proposed solver variants. The affine frame configurations—222, 32, 4—are transformed to the undistorted and rectified images. The rectifications were estimated by the proposed directly-encoded scale (DES) solvers (see Sect. 4), but the input configurations are the same for the proposed change-of-scale (CS) solvers (see Sect. 5)

3 Problem Formulation

An affine-rectifying homography \(\mathtt {H} \) transforms the image of the scene plane’s vanishing line to the line at infinity (Hartley and Zisserman 2004). Thus any homography \(\mathtt {H} \) satisfying the constraint

(1)

and where \(\mathbf {l} \) is an imaged scene plane’s vanishing line, is an affine-rectifying homography. Constraint (1) implies that \(\mathbf {h} _3=\mathbf {l} \), and that the line at infinity is independent of rows \(\mathbf {h} ^{\top }_1\) and \(\mathbf {h} ^{\top }_2\) of \(\mathtt {H} \). Thus, assuming \(l_3 \ne 0\), the affine-rectification of image point \(\mathbf {x} _{}\) to the affine-rectified point can be defined as

(2)

3.1 Radial Lens Distortion

Rectification, as given in (2), is valid only if \(\mathbf {x} _{} \) is imaged by a pinhole camera. Cameras always have some lens distortion, and the distortion can be significant for wide-angle lenses. For a lens distorted point, denoted \(\tilde{\mathbf{x }}\), an undistortion function f is needed to transform \(\tilde{\mathbf{x }}\) to the pinhole point \(\mathbf {x} _{} \). A common parameterization for radial lens undistortion is the one-parameter division model (Fitzgibbon 2001), which has the form

(3)

where is a feature point with the distortion center subtracted, which is assumed to be fixed at the image center. Substituting (3) into (2) gives

(4)

The unknown division model parameter \(\lambda \) and vanishing line \(\mathbf {l} \) appear only in the third coordinate. This property simplifies the solvers derived in Sects. 4 and 5. We also generated a solver using the standard second-order Brown–Conrady model (Hartley and Zisserman 2004; Brown 1966; Conrady 1919); however, these constraints generated a very larger solver with 85 solutions because the radial distortion coefficients appear in the first two coordinates.

For most cameras, the center of distortion coincides with the principal point, which we assume to be the image center. While this assumption is not strictly satisfied in practice, we will see in the experiments in Sect. 7 that the method is robust to these small deviations. In particular, Fig. 14 (right pair) demonstrates a successful rectification of a fisheye image, where the principal point is clearly shifted from the image center.

4 The Directly-Encoded Scale (DES) Solvers

The proposed DES solvers use the invariant that rectified coplanar repeats have equal scales. In Sects. 4.1 and 4.2 the equal-scale invariant is used to formulate a system of polynomial constraint equations on rectified coplanar repeats with the vanishing line and radial undistortion parameter as unknowns. The radial lens undistortion function is parameterized with the one-parameter division model as defined in Sect. 3.1. Affine-covariant region detections are used to model repeats since they encode the necessary geometry for scale estimation (see Fig. 4, Sect. 6.1). The geometry of an affine-covariant region is uniquely given by an affine frame (see Sect. 4.1). The solvers require 3 points from each detected region to measure the region’s scale in the image space. The scale of the rectified coplanar repeat is defined as the area of the triangle defined by the 3 rectified points that represent a corresponding affine-covariant region.

Three minimal cases exist for the joint estimation of the vanishing line and division-model parameter (see Fig. 4, Sect. 4.2). These cases differ by the number of affine-covariant regions needed for each detected repetition. The method for generating the minimal solvers for the three variants is described in Sect. 4.4. Finally, in Sect. 4.5, we show that if the undistortion parameter is given, then the constraint equations simplify, which results in a small solver for estimating rectification under the pinhole camera assumption.

4.1 Equal Scales Constraint from Rectified Affine-Covariant Regions

The geometry of an oriented affine-covariant region is given by an affine frame with its origin at the midpoint of the affine-covariant region detection (Mikolajczyk and Schmid 2004; Vedaldi and Fulkerson 2008). The affine frame is typically given as the orientation-preserving homogeneous transformation \(\mathtt {A} _{}\) that maps from the right-handed orthonormal frame, which is the canonical frame used for descriptor extraction, to the image space as

$$\begin{aligned} \begin{bmatrix} \mathbf {y}&\mathbf {o}&\mathbf {x} \end{bmatrix} = \mathtt {A} _{} \begin{bmatrix} 0 &{}\quad 0 &{} \quad 1 \\ 1 &{}\quad 0 &{}\quad 0 \\ 1 &{}\quad 1 &{}\quad 1 \end{bmatrix}, \end{aligned}$$

where \(\mathbf {o}\) is the origin of the linear basis defined by \(\mathbf {x}\) and \(\mathbf {y}\) in the image coordinate system (Mikolajczyk and Schmid 2004; Vedaldi and Fulkerson 2008). Thus the matrix \(\begin{bmatrix}\mathbf {y}&\mathbf {o}&\mathbf {x} \end{bmatrix}\) is a parameterization of affine-covariant region , which we call its point-parameterization.

Let \(\begin{bmatrix} \tilde{\mathbf{x }}_{i,1}&\tilde{\mathbf{x }}_{i,2}&\tilde{\mathbf{x }}_{i,3} \end{bmatrix}\) be the point parameterization of an affine-covariant region \(\tilde{\mathcal {R}}_{i}\) detected in a radially-distorted image. Then, by (4), the point parameterization of an affine-rectified image of —namely —is

(5)

where \(\alpha _{i,j}=\mathbf {l} ^{\top }f(\tilde{\mathbf{x }}_{i,j},\lambda )\). Thus the scale of is given as an area of a triangle defined by points in (5) as

(6)

The numerators of the second and third expressions in (6) depend only on the undistortion parameter \(\lambda \) and \(l_3\) due to cancellations in the determinant. The sign of depends on the handedness of the detected affine-covariant region. See Sect. 4.7 for a method to use reflected affine-covariant regions with the proposed solvers.

4.2 Eliminating the Rectified Scales

The affine-rectified scale in (6) is a function of the unknown undistortion parameter \(\lambda \) and vanishing line . This encoding of the rectified scale is the motivation for calling this solver group the directly-encoded scale (DES) solvers. A unique solution to (6) can be defined by restricting the vanishing line to the affine subspace \(l_3=1\) or by fixing a rectified scale, e.g., . The inhomogeneous representation for the vanishing line is used since it results in degree 4 constraints in the unknowns \(\lambda , l_1,l_2\) and as opposed to fixing a rectified scale, which results in complicated equations of degree 7.

Let \(\tilde{\mathcal {R}}_{i}\) and \(\tilde{\mathcal {R}}_{j}\) be repeated affine-covariant region detections. Then the scales and of affine-rectified regions and are equal, namely . Thus the unknown rectified scales of a corresponded set of n affine-covariant repeated regions can be eliminated in pairs, which gives \(n-1\) algebraically independent constraints and \({n}\atopwithdelims (){2}\) polynomial equations that are obtained by cross multiplying the denominators of the rational equations . After eliminating the rectified scales, 3 unknowns remain, and \(\lambda \), so 3 constraints are needed.

Table 3 State of the art versus proposed solvers (rows 1–3, 8)

4.3 Solver Variants

There are 3 minimal configurations for which we derive 3 solver variants: (1) 3 affine-covariant region correspondences, which we denote as the 222-configuration; (2) 1 corresponded set of 3 affine-covariant regions and 1 affine-covariant region correspondence, denoted the 32-configuration; (3) and 1 corresponded set of 4 affine-covariant regions, denoted the 4-configuration.

The notational convention introduced for the input configurations—(222, 32, 4)—is extended to the change-of-scale solvers introduced in Sect. 5 and the bench of state-of-the-art solvers evaluated in the experiments (see Sect. 7) to make comparisons between the inputs of all the solvers easier. See Fig. 4 for examples of all input configurations and results from each corresponding solver variant, and see Table 3 for a summary of all the tested solvers.

The system of equations is of degree 4 regardless of the input configuration and has the form

$$\begin{aligned}&\alpha _{j,1}\alpha _{j,2}\alpha _{j,3} \sum _{k=1}^3(-1)^{k}M^{(i)}_{3,k}\alpha _{i,k}\nonumber \\&\quad =\alpha _{i,1}\alpha _{i,2}\alpha _{i,3}\sum _{k=1}^3(-1)^{k}M^{(j)}_{3,k}\alpha _{j,k}, \end{aligned}$$
(7)

where \(M^{(i)}_{3,k}\) is the (3, k)-minor of the rectified point-parameterization matrix defined by (5).

Note that the minors \(M^{(i)}_{3,\cdot }\) are constant coefficients [see (6)]. The 222-configuration results in a system of 3 polynomial equations of degree 4 in three unknowns \(l_1,l_2\) and \(\lambda \); the 32-configuration results in 4 equations of degree 4, and the 4-configuration gives 6 equations of degree 4. Only 3 constraints are needed, but we found that for the 32- and 4-configurations that all \({n}\atopwithdelims (){2}\) equations must be used to avoid spurious solutions that are introduced when the rectified scales are eliminated and the original rational equations are multiplied with their denominators. For example, if only the polynomial equations coming from the constraints , , are used for the 4-configuration

$$\begin{aligned}&\alpha _{i,1}\alpha _{i,2}\alpha _{i,3} \sum _{k=1}^3(-1)^kM^{(j)}_{3,k}\alpha _{1,k} \nonumber \\&\quad =\alpha _{1,1}\alpha _{1,2}\alpha _{1,3} \sum _{k=1}^3(-1)^kM^{(i)}_{3,k}\alpha _{i,k} \quad i=2,3,4, \end{aligned}$$
(8)

then \(\lambda \) can be chosen such that \(\sum _{k=1}^3(-1)^kM^{(i)}_{3,k}\alpha _{1,k} = 0\), and the remaining unknowns \(l_1\) or \(l_2\) chosen such that \(\alpha _{1,1}\alpha _{1,2}\alpha _{1,3} = 0\), which gives a 1-dimensional family of solutions. Thus, adding two additional equations removes all spurious solutions. Furthermore, including all equations simplified the elimination template construction.

In principle, a solver for the 222-configuration can be applied to the 32- and 4-configurations by duplicating the corresponding points in the input. Depending on how the points are duplicated, different results are obtained. In practice we observed that if, as above, we select the input such that , , , the solver breaks down. This is expected since the ideal is no longer zero-dimensional. However, other input configurations, e.g., , , , allow us to recover the same solutions as the 4-configuration solver in addition to a set of spurious solutions corresponding to some \(\sum _{k=1}^3(-1)^kM^{(i)}_{3,k}\alpha _{i,k}\) vanishing.

4.4 Creating the Solvers

We used the automatic generator from Larsson et al. (2017a) to make the polynomial solvers for the three input configurations: 222, 32, and 4. The directly-encoded scale solver corresponding to each input configuration is denoted \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{DES}}_{32}\mathbf {l} \lambda \), and \(\mathtt {H} ^{\mathrm{DES}}_{4}\mathbf {l} \lambda \), respectively. The resulting elimination templates were of sizes \(101\times 155\) (54 solutions), \(107\times 152\) (45 solutions), and \(115\times 151\) (36 solutions). The equations have coefficients of very different magnitude. E.g., the center-subtracted image coordinates have magnitude \(\tilde{x}_{i},\tilde{y}_{i} \approx 10^3\), and thus the distance to the image center \(\tilde{x}_{i} ^2+\tilde{y}_{i} ^2\) is \(\approx 10^6\). To improve numerical conditioning, we re-scaled both the image coordinates and the squared distances by their average magnitudes. Note that this corresponds to a simple re-scaling of the variables in \((\lambda ,l_1,l_2)\), which is inverted once the solutions are obtained.

Experiments on synthetic data (see Sect. 7.1.2) revealed that using the standard GRevLex bases in the generator of Larsson et al. (2017a) gave solvers with poor numerical stability. To generate stable solvers, we used the basis sampling technique proposed by Larsson et al. (2018). In Larsson et al. (2018) the authors propose a method for randomly sampling feasible monomial bases, which can be used to construct polynomial solvers. We generated (with Larsson et al. 2017a) 1000 solvers with different monomial bases for each of the three variants using the heuristic from Larsson et al. (2018). Following the method from Kuang and Åström (2012), the sampled solvers were evaluated on a test set of 1000 synthetic instances, and the solvers with the smallest median equation residual were kept. The resulting solvers have slightly larger elimination templates (\(133\times 187\)\(154\times 199\), and \(115\times 151\)); however, they are significantly more stable. See Sect. 7.1.2 for a comparison between the solvers using the sampled bases and the standard GRevLex bases (default in Larsson et al. 2017a).

4.5 The Fixed Lens Distortion Variant

Finally, we consider the case of known division-model parameter \(\lambda \). Fixing \(\lambda \) in (7) yields degree 3 constraints in only 2 unknowns \(l_1\) and \(l_2\). Thus only 2 correspondences of 2 repeated affine-covariant regions are needed. The generator of Larsson et al. (2017a) found a stable solver (denoted \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \)) with an elimination template of size \(12\times 21\), which has 9 solutions. Basis sampling was not required in this case. There is a second minimal problem for 3 repeated affine-covariant regions; however, unlike the case of unknown distortion, this minimal problem is equivalent to the \(\mathtt {H} _{22}\) variant. It also has 9 solutions and can be solved with the \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) solver by duplicating a region in the input. The proposed \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) solver contrasts to the solvers from Ohta et al. (1981), Criminisi and Zisserman (2000) and Chum and Matas (2010) in that it is generated from constraints directly induced by the rectifying homography rather than its linearization.

4.6 Degeneracies

We observed three important degeneracies for the DES solvers. First, if the vanishing line passes through the image origin, i.e., , then the radial term in the homogeneous coordinate of (4) is canceled. In this case, it is not possible to recover the radial distortion using the equations in (7). However, the degeneracy does not arise from the problem formulation. An affine transform can be applied to the undistorted image such that the vanishing line \(\mathbf {l} \) in the affine-transformed space has \(l_3 \ne 0\). As future work, we will investigate how to remove this degeneracy from the solvers.

Secondly, the problem degenerates if the scene plane is already fronto-parallel to the camera and the corresponding points from the affine-covariant regions fall on circles centered at the image center. Since the corresponding points have the same radii, they will undergo the same scaling due to radial distortion [see (3)]. In this case, the radial distortion parameter again becomes unobservable since it is impossible to disambiguate the scale of the features from the scaling of the lens distortion.

Third, suppose that (1) \(\mathtt {H} \) is a rectifying homography other than the identity matrix, (2) that the image has no radial distortion, (3) and that all corresponding points from repeated affine-covariant regions fall on a single circle centered at the image center. As in the second case, applying the division model (see Sect. 3.1) uniformly scales the points about the image center. Given \(\lambda \ne 0\), for a transformation by \(f(\cdot ,\lambda )\) defined in (3) of the points lying on the circle there is a scaling matrix \(\mathtt {S} (\lambda )={\text {diag}}(1/\lambda ,1/\lambda ,1)\) that maps the points back to their original positions. Thus there is a 1D family of rectifying homographies given by \(\mathtt {H} \mathtt {S} (\lambda )\) for the corresponding set of undistorted images given by \(f(\cdot ,\lambda )\).

4.7 Reflections

In the derivation of (7), the rectified scales were eliminated with the assumption that they had equal signs (see Sect. 4.4). Reflections will have oppositely signed rectified scales; however, reversing the orientation of left-handed affine frames in a simple pre-processing step that admits the use of reflections. Suppose that \(\det \left( \begin{bmatrix} \tilde{\mathbf{x }}_{i,1} \, \tilde{\mathbf{x }}_{i,2} \, \tilde{\mathbf{x }}_{i,3} \end{bmatrix}\right) < 0\), where \((\tilde{\mathbf{x }}_{i,1},\tilde{\mathbf{x }}_{i,2},\tilde{\mathbf{x }}_{i,3})\) is a distorted left-handed point parameterization of an affine-covariant region. Then reordering the point parameterization as \((\tilde{\mathbf{x }}_{i,3},\tilde{\mathbf{x }}_{i,2},\tilde{\mathbf{x }}_{i,1})\) results in a right-handed point-parameterization such that \(\det \left( \begin{bmatrix} \tilde{\mathbf{x }}_{i,3} \, \tilde{\mathbf{x }}_{i,2} \, \tilde{\mathbf{x }}_{i,1} \end{bmatrix}\right) > 0\), and the scales of corresponded rectified reflections will be equal.

5 The Change-of-Scale (CS) Solvers

The proposed change-of-scale (CS) solvers use the Jacobian determinant of the rectifying transformation to induce local constraints on the imaged vanishing line and the unknown parameter for the division model of radial lens distortion (see Sect. 3.1). In particular, the derivation uses the fact that the unknown division model parameter is encoded exclusively in the third coordinate [see (3)], which results in a formulation that is tractable for automatic solver generators.

In fact, there are several related works that linearize the homography and impose constraints on the Jacobian determinant (Barath and Hajder 2017; Chum and Matas 2010; Ohta et al. 1981; Köser et al. 2008; Köser and Koch 2008); however, the proposed CS solvers are the first solvers to incorporate lens distortion with this approach. The Jacobian determinant gives the change of scale of a function at a point, which motivates the name change of scale (CS) for the solvers proposed in this section. It is a surprising discovery that the combined effects of severe lens distortion and perspective imaging from oblique views can be linearized over regions with scales that are typical for covariant region detections (see Fig. 5), which measure the relative scale change between coplanar repeats due to imaging. In fact, the change-of-scale solvers are used to rectify near fisheye distortions effectively (see Fig. 5).

The CS solvers have the advantage over the DES solvers in that they admit strictly scale-covariant regions detections, whereas the DES solvers require affine-covariant region detections. As with the DES solvers in Sect. 4.4, the solvers restore the affine invariant that coplanar repeated regions have the same scale.

5.1 The Change-of-Scale Formulation

The Euclidean coordinates of the rectified point [refer to (4)], of any imaged point \(\tilde{\mathbf{x }}_i=(\tilde{x}_i,\tilde{y}_i,1)^{\top }\) on the scene plane is given by the vector-valued nonlinear function

The function , which returns the inhomogeneous coordinates of the undistorted and rectified point , can be linearized at \((\tilde{x},\tilde{y})\) with the first-order Taylor expansion,

The Jacobian determinant gives the approximate change of scale of the rectifying and undistorting function near the point \((\tilde{x},\tilde{y})^{\top }\). Let \(\tilde{s}_i\) be the scale of an image region \(\tilde{\mathcal {R}}_{i}\) with its centroid at , where the preimage of is on some scene plane \(\varPi \). Let be the rectified scale of . Then the unknown rectified scale can be expressed in terms of the distorted scale and the Jacobian determinant as

(9)
Fig. 5
figure 5

Change-of-scale solver results. The input images are on the first and third rows and show the distorted image of the vanishing line in orange and the dense change of scale (see Sect. 5.5) in the parula color map that is alpha blended on the scene plane. Purple corresponds to the smallest relative scale change due to the imaging of the scene plane and yellow to the largest with respect to a chosen reference point on the plane. The second and fourth rows contain the rectified results from the \(\mathtt {H} ^{\mathrm{CS}}_{222}\mathbf {l} \lambda \) change-of-scale solver (see Sect. 5)

5.2 Eliminating the Rectified Scale

The equation for the rectified scale given in (9) defines the unknown geometric quantities: (1) division-model parameter \(\lambda \), (2) scene-plane vanishing line , (3) and the rectified scale of the rectified image region . The distorted scale \(\tilde{s}_i\) of imaged region \(\tilde{\mathcal {R}}_{i}\) is measured by some scale-covariant region detector, e.g., the SIFT or Hessian Affine detector (Lowe 2004; Mikolajczyk and Schmid 2004). Let \(\tilde{\mathcal {R}}_{i}\) and \(\tilde{\mathcal {R}}_{j}\) be detected repeated coplanar regions. Then the scales of their rectified preimages and are equal, namely . A unique solution is defined by restricting the vanishing line to the affine subspace \(l_3=1\), which results in degree 4 constraints. The alternative of fixing the rectified scale is rejected since it results in higher degree constraints. Thus, the unknown rectified scales of a group of n co-planar repeats can be eliminated in pairs [see (10)], which gives \(n-1\) algebraically independent constraints and \({n}\atopwithdelims (){2}\) polynomial equations that are obtained by cross multiplying the denominators of the rational equations .

5.3 Creating the Solver

After eliminating the rectified scales 3 unknowns remain, namely and \(\lambda \), so 3 equations are needed. The minimal configurations are the same as the DES solvers and an analogous naming scheme is adopted for the CS solvers. The CS solvers can be obtained from 3 correspondences of 2 coplanar repeats, denoted \(\mathtt {H} ^{\mathrm{CS}}_{222}\mathbf {l} \lambda \), 1 corresponded set of 3 and 1 correspondence of 2 coplanar repeats, denoted \(\mathtt {H} ^{\mathrm{CS}}_{32}\mathbf {l} \lambda \), or 1 corresponded set of 4 coplanar repeats, denoted \(\mathtt {H} ^{\mathrm{CS}}_{4}\mathbf {l} \lambda \) (see the comparison in Table 3). The system of equations contains rational expressions of the form

(10)

After multiplying equations (10) by common denominators we obtain a system of three quartic polynomial equations in three unknowns, namely \(l_1,l_2\) and \(\lambda \). Again we used the automatic generator from Larsson et al. (2017a) to create the polynomial solvers for all of the minimal configurations. The structure of the change-of-scale solvers turned out to be similar to the DES solvers (i.e., same monomials and number of solutions, but the coefficients in equations are computed differently).

5.4 Degeneracies

The change-of-scale solvers suffer from the same degeneracies that are listed in Sect. 4.6 for the DES solvers. There are likely different degeneracies between the two families of solvers, but an exhaustive analysis is difficult.

5.5 Dense Change of Scale Due to Imaging

Up to a global scale ambiguity, the rectified scale of an imaged scene plane region can be approximated with (9). The projective and radial lens distortion components of the imaging transformation are linearized in (9), so the approximation of the rectified scale is more accurate for smaller regions.

The combined change-of-scale effects of lens distortion and perspective warping due to the imaging of a scene plane can be seen in Fig. 5. The reference point is the image of the centroid of the convex hull of rectified coplanar covariant regions. The dense relative change of scale is rendered by the alpha-blended parula colormap in the original images of Fig. 5. Regions with larger scale change due to imaging are orange; regions close to the scale change of the imaged reference point are blue, and regions with vanishing relative scale change are purple. The purple regions will be expanded in the rectified image and the yellow regions shrunk such that the affine rectification restores the affine invariant that coplanar regions whose preimages are of equal scale are the same scale in the rectified image.

For pinhole cameras, regions undergoing an equal change of scale from imaging are projected to isolines (Criminisi and Zisserman 2000). However, as seen in Fig. 5, for radially-distorted cameras parameterized by the division model (see Sect. 3.1), regions undergoing equal change of scale from imaging are constrained to circles. This is consistent with the fact that scene lines are imaged as circles under the division model of radial lens distortion (Bukhari and Dailey 2013; Fitzgibbon 2001; Strand and Hayman 2005; Wang et al. 2009). The distorted image of the vanishing line as a circle under the division model is shown in a synthetic scene of Fig. 2 and in real images in Figs. 5 and 12 (the orange circular segments).

The dense relative change of scale is useful for automatic rectification. E.g., in images where the image of the vanishing line intersects the image extents, regions approaching the vanishing line rectify to arbitrarily large scales. Thus a bound on the rectified scale is needed to prevent the rectified image from blowing up. Using (9), an image can be masked such that any masked point has a relative change of scale bounded by some user threshold, which can be used to generate reasonably sized rectifications. All images in this document were automatically generated with this method.

6 Robust Estimation

The solvers are used in a LO-RANSAC-based robust-estimation framework (Chum et al. 2004). Affine rectifications and undistortions are jointly hypothesized by one of the proposed solvers. A metric upgrade is attempted, and models with maximal consensus sets are locally optimized by an extension of the method introduced in Pritts et al. (2014). The metric-rectifications are presented in the results.

Fig. 6
figure 6

Stability study. The equation residuals (deviation from 0) for the particular polynomial system of equations solved by each of the DES and CS solvers is used to measure solver stability (see Sects. 4.2, 5.2, respectively). The minimal solution closest to the ground truth is evaluated and reported for 1000 noiseless synthetic scenes. The basis selection method of Larsson et al. (2018) is essential for stable solver generation

6.1 Local Features and Descriptors

Affine-covariant region detectors are highly repeatable on the same imaged scene texture with respect to significant changes of viewpoint and illumination (Mikolajczyk and Schmid 2005; Mishkin et al. 2018). Their proven robustness in the multi-view matching task makes them good candidates for representing the local geometry of repeated textures. In particular, we use the maximally-stable extremal region and Hessian-affine detectors (Matas et al. 2002; Mikolajczyk and Schmid 2004). The affine-covariant regions are given by an affine transform (see Sect. 4.1), equivalently 3 distinct points, which defines an affine frame in the image space (Obdržálek and Matas 2002). The image patch local to the affine frame is embedded into a descriptor vector by the RootSIFT transform (Arandjelović and Zisserman 2012; Lowe 2004).

Fig. 7
figure 7

Warp errors for fixed \(1-\sigma \) pixel noise. Reports the cumulative distributions of raw warp errors \(\varDelta ^{\mathrm{warp}}\) (see Sect. 7.1.1) for the bench of solvers on 1000 synthetic scenes with \(1-\sigma \) pixel of imaging white noise added. The proposed solvers (with undistortion estimation) give significantly better proposals than the state of the art

Fig. 8
figure 8

Sensitivity benchmark. Comparison of two error measures after 25 iterations of a simple ransac for different solvers with increasing levels of white noise added to the affine frame correspondences, where the normalized division model parameter is set to \(-\,4\) (see Sect. 3.1), which is similar to the distortion of a GoPro Hero 4. (top row) Shows results for translated coplanar repeats, and (bottom row) shows results for rigidly-transformed coplanar repeats. (left column) Reports the root mean square warp error \(\varDelta _{\mathrm{RMS}}^{\mathrm{warp}}\), and (right column) reports the relative error of the estimated division model parameter. The proposed solvers are significantly more robust for both types of repeats on both error measures

Fig. 9
figure 9

(left) Real solutions. The histograms of the number of real solutions returned by the proposed solvers. (right) Feasible solutions. Typically, only 1 solution is feasible. Feasibility is determined by checking that the division model parameter falls in a reasonable interval. The frequencies were calculated on results from 150,000 trials on different scenes with varying levels of imaged white-noise

6.2 Appearance Clustering and Sampling

Affine frames are tentatively labeled as repeated texture by their appearance. The appearance of an affine frame is given by the RootSIFT embedding of the image patch local to the affine frame (Arandjelović and Zisserman 2012). The RootSIFT descriptors are agglomeratively clustered, which establishes pair-wise tentative correspondences among connected components. Each appearance cluster has some proportion of its indices corresponding to affine frames that represent the same coplanar repeated scene content, which are the inliers of that appearance cluster. The remaining affine frames are the outliers.

Sample configurations for the proposed minimal solvers are illustrated in Fig. 4 and detailed in Sect. 4.3. To recap, the solver variants for the proposed undistorting and rectifying minimal solvers—either from the DES or CS family—are 3 correspondences of 2 covariant regions (the 222-solvers), a corresponded set of 3 covariant regions and a correspondence of 2 covariant regions (the 32-solvers), and a corresponded set of 4 covariant regions (the 4-solvers). For each RANSAC trial, appearance clusters are selected with the probability given by its relative size to the other appearance clusters, and the required number of correspondences or corresponded sets are drawn from the selected clusters.

6.3 Metric Upgrade and Local Optimization

The affine-covariant regions that are members of the minimal sample are affine rectified by each feasible model returned by the solver; typically there is only 1 (see Fig. 9). A metric upgrade is estimated from the affine-rectified minimal sample set using the linear solver introduced in Pritts et al. (2014). Then all affine-covariant regions are metrically-upgraded using the estimate. The consensus set is measured in the metric-rectified space by verifying the congruence of the basis vectors of the corresponded affine frames. Congruence is an invariant of metric-rectified space and is a stronger constraint than the equal-scale invariant of affine-rectified space that was used to derive the proposed solvers. The metric upgrade essentially comes for free by inputting the affine-covariant regions sampled for the proposed solvers to the linear metric-upgrade solver proposed in Pritts et al. (2014). By using the metric-upgrade, the verification step of ransac can enforce the congruence of corresponding affine-covariant region extents (equivalently, the lengths of the linear basis vectors) to estimate an accurate consensus set. Models with the maximal consensus set are locally optimized in a method similar to Pritts et al. (2014).

7 Experiments

The stabilities and noise sensitivities of the proposed solvers are evaluated on synthetic data. We compare the proposed solvers to a bench of 4 state-of-the-art solvers (see Table 3). We apply the denotations for the solvers introduced in Sect. 4.3 to all the solvers in the benchmark; e.g., a solver requiring 2 correspondences of 2 affine-covariant regions will be prefixed by \(\mathtt {H} _{22}\), while the proposed solver requiring 1 corresponded set of 4 affine-covariant regions is prefixed by \(\mathtt {H} _4\).

Included are two state-of-the-art single-view solvers for radially-distorted conjugate translations, denoted \(\mathtt {H} _2\mathbf {l} \mathbf {u} \lambda \) and \(\mathtt {H} _{22}\mathbf {l} \mathbf {u} \mathbf {v} \lambda \) (Pritts et al. 2018); a full-homography and radial distortion solver, denoted \(\mathtt {H} _{22}\lambda \) (Fitzgibbon 2001); and the change-of-scale solver for affine rectification of Chum and Matas (2010), denoted \(\mathtt {H} ^{\mathrm{CS}}_{22}\mathbf {l} \).

The sensitivity benchmarks measure the performance of rectification accuracy by the warp error (see Sect. 7.1.1) and the relative error of the division parameter estimate. Stability is measured by the equation residuals of the solution that is closest to ground truth. The \(\mathtt {H} _{22}\lambda \) solver is omitted from the warp error since the vanishing line is not estimated, and the \(\mathtt {H} ^{\mathrm{CS}}_{22}\mathbf {l} \) and \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) solvers are omitted from benchmarks involving lens distortion since the solvers assume a pinhole camera.

7.1 Synthetic Data

The performance of the proposed solvers on 1000 synthetic images of 3D scenes with known ground-truth parameters is evaluated. A camera with a random but realistic focal length is randomly placed with respect to a scene plane such that it is mostly in the camera’s field-of-view. The image resolution is set to \(1000\times 1000\) pixels. The noise sensitivity of the solvers are evaluated both on conjugately-translated and rigidly-transformed coplanar repeats (see Fig. 8). Scenes with conjugately-translated coplanar repeats are evaluated so that the proposed solvers can be compared to state-of-the-art solvers of Pritts et al. (2018). For either motion type, affine frames are generated on the scene plane such that their scale with respect to the scene plane is realistic. The modeling choice reflects the use of affine-covariant region detectors on real images (see Sect. 4.1).

The image is distorted according to the division model. For the sensitivity experiments, isotropic white noise is added to the distorted affine frames at increasing levels. Performance is characterized by the relative error of the estimated distortion parameter and by the warp error, which measures the accuracy of the affine-rectification.

7.1.1 Warp Error

Since the accuracy of scene-plane rectification is a primary concern, the warp error for rectifying homographies proposed by Pritts et al. (2016) is reported, which we extend to incorporate the division model for radial lens distortion (Fitzgibbon 2001). A scene plane is tessellated by a \(10\times 10\) square grid of points \(\{\,\mathbf {X} _{i} \,\} _{i=1}^{100} \) and imaged as \(\left\{ \,\tilde{\mathbf{x }}_{i}\,\right\} _{i=1}^{100}\) by the lens-distorted ground-truth camera. The tessellation ensures that error is uniformly measured over the scene plane. A round trip between the image space and rectified space is made by affine-rectifying \(\left\{ \,\tilde{\mathbf{x }}_{i}\,\right\} _{i=1}^{100}\) using the estimated division model parameter \(\hat{\lambda }\) and rectifying homography \(\hat{\mathtt {H}}\) and then imaging the rectified plane by the ground-truth camera \(\mathtt {P} \). Ideally, the ground-truth camera \(\mathtt {P} \) images the rectified points onto the distorted points \(\left\{ \,\tilde{\mathbf{x }}_{i}\,\right\} _{i=1}^{100}\). There is an affine ambiguity, denoted \(\mathtt {A} _{}\), between \(\hat{\mathtt {H}}\) and the ground-truth camera matrix \(\mathtt {P} \). The ambiguity is estimated during computation of the warp error,

$$\begin{aligned} \varDelta ^{\mathrm{warp}}=\min _{\mathtt {A} _{}} \sum _{i} d^2(\tilde{\mathbf{x }},f^d(\mathtt {P} \mathtt {A} _{} \hat{\mathtt {H}} f(\tilde{\mathbf{x }},\hat{\lambda })),\lambda ), \end{aligned}$$
(11)

where \(d(\cdot ,\cdot )\) is the Euclidean distance, \(f^d\) is the inverse of the division model [the inverse of (3)], and \(\left\{ \,\tilde{\mathbf{x }}_{i}\,\right\} _{i=1}^{100}\) are the imaged grid points of the scene-plane tessellation. The root mean square warp error for \(\left\{ \,\tilde{\mathbf{x }}_{i}\,\right\} _{i=1}^{100}\) is reported and denoted as \(\varDelta ^{\mathrm{warp}}_{\mathrm{RMS}}\). The vanishing line is not directly estimated by the solver \(\mathtt {H} _{22}\lambda \) of Fitzgibbon (2001), so it is not reported.

7.1.2 Numerical Stability

The stability study of Fig. 6 compares compares the solver variants generated using the standard GRevLex bases versus solvers generated using the basis selection method of Larsson et al. (2018) (also see Sect. 4.4). The generator of Larsson et al. (2017a) was used to generate both sets of solvers. Stability is measured as the equation residual (equivalently, deviation from 0) of the polynomial system of equations associated with each solver (see Sects. 4.2, 5.2) for the solution that is closest to ground truth for noiseless affine-frame correspondences across realistic synthetic scenes, which are generated as described in the introduction of Sect. 7.1.

The normalized ground-truth parameter of the division model \(\lambda \) is set to \(-\,4\), a value typical for wide field-of-view cameras like the GoPro, where the image is normalized by \({1}/({\text {width}+\text {height}})\). Figure 6 reports the histogram of \(\log _{10}\) equation residuals and shows that the basis selection method of Larsson et al. (2018) significantly improves the stability of the generated solvers. The basis-sampled solvers are used for the remainder of the experiments.

7.1.3 Noise Sensitivity

The proposed and state-of-the-art solvers are tested with increasing levels of white noise added to the point parameterizations (see Sect. 4.1) of the affine-covariant region correspondences that are either translated or rigidly-transformed on the scene plane (see Fig. 8). The amount of white noise is given by the standard deviation of a zero-mean isotropic Gaussian distribution, and the solvers are tested at noise levels of \(\sigma \in \{\, 0.1,0.5,1,2,5\, \}\). The ground-truth normalized division model parameter is set to \(\lambda =-4\), which is typical for GoPro-type imagery in normalized image coordinates.

Fig. 10
figure 10

Distortion study. Reports the root-mean-square warp error \(\varDelta ^{\mathrm{warp}}_{\mathrm{RMS}}\) (see Sect. 7.1.1) for 1000 synthetic scenes imaged by cameras with varying normalized division model parameter with \(1-\sigma \) pixel white noise. Solvers \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =-4\), \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =-2\), and \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =0\) rectify the pinhole image that is undistorted with the given fixed division model parameter. The \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) solver is competitive even for the case where the fixed division model parameter matches ground truth and gives stable performance across all distortion levels

The cumulative distributions of warp errors in Fig. 7 show that for 1-pixel white noise on conjugately-translated affine frames, the proposed solvers—\(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{DES}}_{32}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{DES}}_{4}\mathbf {l} \lambda \),\(\mathtt {H} ^{\mathrm{CS}}_{222}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{CS}}_{32}\mathbf {l} \lambda \) and \(\mathtt {H} ^{\mathrm{CS}}_{4}\mathbf {l} \lambda \)—give significantly more accurate estimates than the state-of-the-art conjugate translation solvers of Pritts et al. (2018). Interestingly, all of the proposed undistorting variants from both the DES and CS families of rectifying solvers have nearly identical performance.

If 5 pixel RMS warp error is fixed as a threshold for a good model proposal, then 30% of the models given by the proposed solvers are good versus roughly 10% by Pritts et al. (2018). The proposed \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) solver and the \(\mathtt {H} ^{\mathrm{CS}}_{22}\mathbf {l} \) of Chum and Matas (2010) each give biased proposals since they cannot estimate lens distortion.

Fig. 11
figure 11

Field-of-view study. The proposed solver \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) gives accurate rectifications across all fields-of-view: (left-to-right) Nikon D60, GoPro Hero 4 at the medium- and wide-FOV settings, and a Panasonic DMC-GM5 with a Samyang 7.5 mm fisheye lens. The outputs are the undistorted (middle row) and rectified images (bottom row)

Fig. 12
figure 12

Wide-angle and fisheye results. Input images (left) with the estimated distorted vanishing line (orange), undistorted (middle) and rectified (right). Results are produced with the proposed \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) solver

Fig. 13
figure 13

Solver comparison. The state-of-the art solvers \(\mathtt {H} _{22}\mathbf {l} \mathbf {u} \mathbf {v} \lambda \) (Pritts et al. 2018) and \(\mathtt {H} ^{\mathrm{CS}}_{22}\mathbf {l} \) (Chum and Matas 2010) are compared with the proposed solvers \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) and \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) on images containing either translated or rigidly-transformed coplanar repeated patterns with increasing amounts of lens distortion. (top) small distortion, rigidly-transformed; (middle) medium distortion, translated; (bottom) large distortion, rigidly-transformed. Accurate rectifications for all images are only given by the proposed \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \)

Fig. 14
figure 14

Straight lines don’t have to be straight. (left pair) It is difficult to disentangle the effects of radial lens distortion from the projections of curvilinear forms in the image. E.g., the waterfront, fence and compass tile mosaic are circles, which violate the plumb-line assumption and cannot be used for undistortion or rectification (Devernay and Faugeras 2001). However, the imaged rigidly-transformed coplanar repeats can be used to rectify this image with the solvers proposed in this paper. (right pair) Note that the distortion center is clearly decentered in the third image, but a good rectification is still achieved for the fisheye image

The solvers are wrapped by a basic ransac estimator that minimizes the RMS warp error \(\varDelta _{\mathrm{RMS}}^{\mathrm{warp}}\) over 25 minimal samples of affine frames for each of the conjugately-translated and rigidly-transformed coplanar repeat sensitivity studies in Fig. 8. The ransac estimates are summarized in boxplots for 1000 synthetic scenes. The interquartile range is contained within the extents of a box, and the median is the horizontal line dividing the box. As shown in Fig. 8, the proposed solvers—\(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{DES}}_{32}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{DES}}_{4}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{CS}}_{222}\mathbf {l} \lambda \), \(\mathtt {H} ^{\mathrm{CS}}_{32}\mathbf {l} \lambda \) and \(\mathtt {H} ^{\mathrm{CS}}_{4}\mathbf {l} \lambda \)—again give the most accurate lens distortion and rectification estimates. In fact, the proposed solvers are superior to the state of the art at all noise levels. The proposed distortion-estimating solvers give solutions with less than 5-pixel RMS warp error \(\varDelta _{\mathrm{RMS}}^{\mathrm{warp}}\) 75% of the time and estimate the correct division model parameter more than half the time at the 2-pixel noise level. The proposed fixed-lens distortion solver \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) and the \(\mathtt {H} ^{\mathrm{CS}}_{22}\mathbf {l} \) of Chum and Matas (2010) give biased solutions since they assume the pinhole camera model.

7.1.4 Feasible Solutions and Runtime

This study shows the number of real and feasible solutions given by the proposed solvers for 150,000 trials across 1000 scenes at varying noise levels with a fixed normalized division model parameter of \(\lambda =-4\). Figure 9 (left) shows the number of real solutions, and Fig. 9 (right) shows the subset of feasible solutions as defined by the estimated normalized division-model parameter solution falling in the interval \([-8,0.5]\). All solutions are considered feasible for the \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) solver. Figure 9 (right) shows that in \(97 \%\) of the scenes only 1 solution is feasible, which means that nearly all incorrect solutions can be quickly discarded.

The runtimes of the DES family of solvers are reported. The MATLAB implementation of the solvers on a standard desktop are 2 ms for \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \), 2.2 ms for \(\mathtt {H} ^{\mathrm{DES}}_{32}\mathbf {l} \lambda \), 1.7 ms for \(\mathtt {H} ^{\mathrm{DES}}_{4}\mathbf {l} \lambda \), and 0.2 ms for \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \). Due to the similar structure in the equations, the CS solvers have comparable performance.

7.2 Distortion Study

The distortion study evaluates the accuracy of rectifications as measured by the warp error (see Sect. 7.1.1) over a normalized ground truth division model parameter from \(\lambda \in \{\,-5,-4,-3,-2,-1,0\,\}\), which are values that are characteristic of near-fisheye to pinhole lenses (see Sect. 3.1). The images have fixed \(1\mathrm{px}-\sigma \) white noise added. The methodology of scene generation is the same as detailed in Sect. 7.1.

Since the sensitivity experiments of Sect. 7.1.3 show that the performance of the proposed solvers is essentially the same with respect to noise, we choose \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) as their representative. It is evaluated against 3 solvers—\(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =-4\), \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =-2\), and \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =0\)—each of which undistort at a different fixed normalized division model parameter, namely \(\lambda \in \{ \, -4,-2,0 \, \}\), respectively. The fixed distortion solvers estimate the affine rectification with the proposed \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) (see Sect. 4.5) using the undistorted minimal sample, which is computed with the given fixed division model parameter of the solver.

Figure 10 shows that even for the case where the fixed division model parameter of the solver is equivalent to the ground truth, the best solutions of the proposed \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) are equivalent to rectifying with known ground truth. Furthermore, the \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) is stable, giving the same performance at a fixed noise level across all ground truth division model parameters. As expected, the warp error quickly increases for the \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =-4\), \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =-2\), and \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \mid \lambda =0\) solvers as the ground truth division model parameter differs from the fixed division model parameter.

7.3 Real Images

The field-of-view experiment of Fig. 11 evaluates the proposed \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) solver on real images taken with narrow, medium, wide-angle, and fish-eye lenses. Images with diverse scene content were chosen. Figure 11 shows that the \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) gives accurate rectifications for all lens types.

Figure 12 compares the proposed \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) and \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) solvers to the state-of-the-art solvers on images with increasing levels of radial lens distortion (top to bottom) that contain either translated or rigidly-transformed coplanar repeated patterns. Only the proposed \(\mathtt {H} ^{\mathrm{DES}}_{222}\mathbf {l} \lambda \) accurately rectifies on both pattern types and at all levels of distortion. The results are after a local optimization and demonstrate that the method of Pritts et al. (2014) is unable to accurately rectify without a good initial guess at the lens distortion. The proposed fixed-distortion solver \(\mathtt {H} ^{\mathrm{DES}}_{22}\mathbf {l} \) gave a better rectification than the change-of-scale solver \(\mathtt {H} ^{\mathrm{CS}}_{22}\mathbf {l} \) of Chum and Matas (2010).

Figure 13 shows the rectifications of a deceiving picture of a landmark taken by wide-angle and fisheye lenses. From the wide-angle image, it is not obvious which lines are really straight in the scene making undistortion with the plumb-line constraint difficult.

Additional results for wide-angle and fisheye lenses are included in Fig. 14 near the end of this document.

8 Conclusion

This paper proposes two groups of solvers (DES and CS) that extend affine-rectification to radially-distorted images that contain essentially arbitrarily repeating coplanar patterns. Both solver groups use the invariant that imaged coplanar repeats have the same scale if rectified. Despite using the equal scale invariant of rectified coplanar repeats in different ways to impose constraints on the undistortion and rectification parameters, the generated solvers have identical structure and similar stability and robustness to imaging noise. This was a surprising finding since the CS solvers linearize the undistorting and rectifying transformation to generate the constraint equations. Given the results for the CS solvers on synthetic benchmarks and challenging images, it can be concluded that the first-order approximation of the rectifying transformation is sufficient to handle the effect of severe lens distortion of an obliquely imaged scene plane. Equivalently, the linearization is reasonable over a measurement region that is typical for an affine-covariant region detection.

Synthetic experiments show that both groups of proposed solvers are more robust to noise with respect to the state of the art, give stable estimates across a wide range of distortions, and are applicable to a broader set of image content. The paper also demonstrates that robust solvers can be generated with the basis selection method of Larsson et al. (2018) by maximizing for numerical stability. We expect basis selection to become a standard procedure for improving solver stability. Experiments on difficult images with large radial distortions confirm that the solvers give high-accuracy rectifications if used inside a robust estimator. By jointly estimating rectification and radial distortion, the proposed minimal solvers eliminate the need for sampling lens distortion parameters in ransac. The code is published at https://github.com/prittjam/repeats.

In future work, we will attempt to remove the degeneracies from the solvers unrelated to the problem formulation. Another future direction, similar to the recent work of Camposeco et al. (2018), is to generate a set of hybrid solvers by combining constraint equations from the DES and CS and the conjugate translation solvers of Pritts et al. (2018). The constraint equations for the DES and CS solvers may be sensitive to different properties of the inputted covariant regions, such as their size, shape and relative orientation. During sampling, the most robust solver given the properties of the minimal sample (as listed above) can be chosen to hypothesize the model.