1 Introduction

Typical computer vision techniques rely on point image features to perform, for example, visual SLAM [19], camera motion estimation [20], place recognition [3] or object recognition [8], among many others. However, higher-level geometric primitives, such as lines, have gained growing importance in recent years, as they provide a number of advantages, especially in terms of reliability. These include their higher robustness against illumination changes in the image, as well as their natural presence in human-made environments [25]. Moreover, unlike point features, lines can be found even in images from low-textured environments, which is known to be one of the main issues of geometric computer vision techniques. This problem can be partially mitigated by employing fish-eye or omnidirectional cameras, which provide a wider Field-of-View (FoV), hence covering larger areas of the environment and consequently reducing the probability of capturing low-texture areas, but it still remains an issue.

The benefits of line features have recently been exploited for the development of new systems employing them either as their main source of information or at least as a complement to point features [5, 6, 14, 15, 17]. Nevertheless, detecting and managing line features is more costly than using their point counterparts, especially in wide FoV cameras, where distortion is significantly larger than in standard perspective cameras and straight lines in the environment are no longer projected as straight 2D lines in the images. To avoid this, the images need to be rectified [11] in a process that presents two main issues: (i) it is computationally expensive, and (ii) the maximum effective FoV after rectification is reduced, in practice, to about 130 \({}^{\circ }\) [7] instead of the original \(\sim \)180 \({}^{\circ }\) FoV for fish-eye and up to 360 \({}^{\circ }\) for omnidirectional cameras. Our proposal provides a solution to both problems.

In this paper, we present D-LSD (Distorted Line Segment Detector), the first line segment detector that enables direct detection of distorted lines from non-rectified images. We formulate the distorted line detection problem under the mild assumption of a calibrated central projection system, resulting in a very efficient line detector capable of handling from pinhole to omnidirectional camera projection models. We evaluate our approach with publicly available real images and compare the execution time and detection rates against state-of-the-art straight line detection methods on rectified images.

2 Related Work

The most popular line segment detector is LSD [16], by Grompone von Gioi et al. The LSD algorithm works by grouping pixels with the same gradient direction and validating lines as rare events in the a contrario model, according to the Helmholtz principle [10]. This method gained popularity as one of the first line detectors capable of operating in real-time.

Alternatively, in [1], Akinlar and Topal propose to detect lines from a continuous, 1 px-wide chain of edge pixels by least squares line fitting. These high-quality edges are detected with their Edge Drawing [23] algorithm. This approach, termed EDLines, results in a very efficient line detection algorithm (about one order of magnitude faster than LSD). A similar approach, named CannyLines, is proposed in [18] by Lu et al., where lines are detected directly on the edge map extracted by their parameter-free Canny operator.

Following a different approach, Cho et al. propose in [9] a new line segment detector exploiting the properties of digitized lines. In this work, anchor points (i.e. peaks in the gradient map) are connected horizontally or vertically only, creating what the authors call linelets. These linelets are then grouped into line segments according to rules derived from the digitization properties and validated from a probabilistic perspective. Recently, Zhang et al. proposed AG3line [24], which detects line segments by actively grouping anchor points. Finally, the detected lines are validated according to the density of anchors and the alignment of the gradient magnitudes.

Unlike our proposal, all these methods operate only on rectified images and cannot find straight lines in distorted images. In this work, we rely on the efficient Edge Drawing [23] algorithm for continuous edge segmentation and exploit the fact that distorted lines are locally straight to fit 2D line segments, similarly to the EDLines [1] approach. Then, connected 2D line segments forming a smooth convex curve are grouped and validated as being the projection of a 3D line, according to the calibrated intrinsic parameters of the camera.

3 Background

The mapping of a 3D point \(\boldsymbol{x} \in \mathrm {I\!R}^3\) to image coordinates \(\boldsymbol{u} \in \Omega \subset \mathrm {I\!R}^2\) is characterized by the projection function \(\pi : \mathrm {I\!R}^3 \rightarrow \Omega \). In this work, we consider central projection cameras in which all light-rays pass through a single point in space: the projection (or optical) center of the camera. This includes both dioptric and catadioptric cameras.

In general, lines are projected as conic sections in any radially distorted image. Svoboda and Pajdla [22] showed that lines are imaged as conic sections in catadioptric systems. Similarly, Barreto [4] showed that this also holds for perspective cameras with radially symmetric distortion following the division model [13]. Here, we only assume that lines are imaged as smooth convex curves, i.e. their curvature is either always non-negative or always non-positive, up to a smoothness threshold \(\theta \in \mathrm {I\!R}^+\) (see Fig. 1). Note that this is a typical assumption used, for example, to find circles [2] or ellipses [21].

Fig. 1.
figure 1

Curves are approximated in a piecewise linear fashion, i.e. as polygonal chains. The polygonal chain starting at segment A breaks both the convexity and smoothness assumptions at B. Therefore, a new polygonal chain candidate starts at B.

Finally, under the central projection assumption, the two endpoints \(\boldsymbol{x_1}, \boldsymbol{x_2} \in \mathrm {I\!R}^3\) defining a three-dimensional line, along with the center of projection, define a plane with normal vector \(\boldsymbol{n} \in \mathrm {I\!R}^3\) (see Fig. 2). This fact, along with the inverse of the projection function \(\pi ^{-1}: \Omega \rightarrow \mathrm {I\!R}^3\), will be used to generate a model of the observed curve. Note that the depth of a 3D point is lost during the projection process, thus \(\pi ^{-1}\) can only recover its direction.

Fig. 2.
figure 2

In a central projection system, a 3D line and the center of projection form a plane. This plane intersects the unit sphere in the dotted circle. The line segment is thus an arc on the unit sphere.
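For concreteness, the projection function \(\pi \) and its inverse \(\pi ^{-1}\) can be abstracted behind a small interface. The following C++ sketch is illustrative only: CentralCamera and Pinhole are hypothetical names, and the undistorted pinhole instance merely shows the contract, while D-LSD admits any central model (fish-eye, catadioptric) behind the same interface.

```cpp
#include <Eigen/Dense>

// Hypothetical interface for a calibrated central projection camera:
// project() plays the role of pi, backproject() the role of pi^{-1}.
// Since depth is lost in projection, backproject() can only recover
// the unit ray direction.
struct CentralCamera {
  virtual Eigen::Vector2d project(const Eigen::Vector3d& x) const = 0;
  virtual Eigen::Vector3d backproject(const Eigen::Vector2d& u) const = 0;
  virtual ~CentralCamera() = default;
};

// Minimal undistorted pinhole instance, for illustration only.
struct Pinhole : CentralCamera {
  double fx, fy, cx, cy;  // calibrated intrinsics
  Pinhole(double fx_, double fy_, double cx_, double cy_)
      : fx(fx_), fy(fy_), cx(cx_), cy(cy_) {}
  Eigen::Vector2d project(const Eigen::Vector3d& x) const override {
    return Eigen::Vector2d(fx * x.x() / x.z() + cx, fy * x.y() / x.z() + cy);
  }
  Eigen::Vector3d backproject(const Eigen::Vector2d& u) const override {
    return Eigen::Vector3d((u.x() - cx) / fx, (u.y() - cy) / fy, 1.0).normalized();
  }
};
```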

4 Algorithm Description

D-LSD operates in four main steps as follows. First, a set of continuous edges is extracted from the intensity image. Subsequently, smooth convex curves are extracted as polygonal chains from connected edges. A model of a 3D line is then estimated by means of a least squares minimization from the pixels on the curve and, finally, the curve is validated as a distorted line according to its reprojection error in the image.

4.1 Edge Detection

We rely on the Edge Drawing (ED) algorithm by Topal and Akinlar [23] for edge detection, since it is a very efficient method to extract connected, 1 px-wide chains of edge pixels. In summary, this algorithm works through the following steps (a simplified sketch of the first two steps is shown after the list):

1. The image gradient magnitude and direction are computed at each pixel from a smoothed version of the input image.

2. Pixels where the gradient magnitude is a local maximum are selected. These pixels, known as anchors, correspond to edge elements with very high probability.

3. Finally, anchors are connected using the gradient magnitude and direction of neighboring pixels, following maximum gradient values.
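As a rough illustration of steps 1 and 2, the following C++ sketch (using OpenCV) smooths the image, computes gradients and selects anchor pixels. Note that extractAnchors and its gradient threshold are assumptions made for illustration, not the actual ED implementation, which additionally performs the connection step 3.

```cpp
#include <opencv2/imgproc.hpp>
#include <cmath>
#include <vector>

// Illustrative sketch of ED steps 1-2: smooth, compute gradients, and keep
// pixels whose gradient magnitude peaks across the local edge direction.
std::vector<cv::Point> extractAnchors(const cv::Mat& gray, float gradThresh = 36.0f) {
  cv::Mat smooth, gx, gy, mag;
  cv::GaussianBlur(gray, smooth, cv::Size(5, 5), 1.0);  // step 1: smoothing
  cv::Sobel(smooth, gx, CV_32F, 1, 0);                  // horizontal gradient
  cv::Sobel(smooth, gy, CV_32F, 0, 1);                  // vertical gradient
  cv::magnitude(gx, gy, mag);

  std::vector<cv::Point> anchors;  // step 2: local maxima of the gradient
  for (int y = 1; y < mag.rows - 1; ++y)
    for (int x = 1; x < mag.cols - 1; ++x) {
      float m = mag.at<float>(y, x);
      if (m < gradThresh) continue;
      // A mostly vertical edge has |gx| >= |gy|; compare across the edge.
      bool vertical = std::abs(gx.at<float>(y, x)) >= std::abs(gy.at<float>(y, x));
      float n1 = vertical ? mag.at<float>(y, x - 1) : mag.at<float>(y - 1, x);
      float n2 = vertical ? mag.at<float>(y, x + 1) : mag.at<float>(y + 1, x);
      if (m >= n1 && m >= n2) anchors.emplace_back(x, y);
    }
  return anchors;
}
```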

4.2 Polygonal Chain Extraction

Smooth convex curves are then extracted from continuous edge segments in two steps. First, straight lines are extracted from the edge segments, similarly to the line segment extraction method EDLines [1]. Essentially, we fit a straight line to an initial set of pixels in a least squares sense, and pixels from the connected edge segment are then added one by one until the fitting error exceeds a given threshold. When no more pixels can be added under that threshold, a new line is initialized and the process is repeated with the remaining pixels.

Finally, once straight lines have been extracted, they are grouped into polygonal chains according to the smoothness and convexity assumptions. Starting from a line segment, consecutive line segments are added to the group as long as the angle between them is below a certain threshold and the whole group shares the same turn direction (refer to Fig. 1). This builds a candidate curve.
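This grouping rule can be summarized with the following hedged C++ sketch, where Segment and growChain are hypothetical names; a full implementation would restart a new candidate chain at the breaking segment, as illustrated in Fig. 1.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <vector>

struct Segment { Eigen::Vector2d p, q; };  // 2D endpoints; q follows p

// Grow a candidate chain from consecutive fitted segments: stop when the
// bend angle exceeds theta (smoothness) or the turn direction flips
// (convexity).
std::vector<Segment> growChain(const std::vector<Segment>& segs, double theta) {
  std::vector<Segment> chain;
  double turnSign = 0.0;  // sign of the common turn direction, 0 = undecided
  for (const Segment& s : segs) {
    if (!chain.empty()) {
      Eigen::Vector2d a = chain.back().q - chain.back().p;
      Eigen::Vector2d b = s.q - s.p;
      double cross = a.x() * b.y() - a.y() * b.x();  // 2D cross product
      double angle = std::atan2(std::abs(cross), a.dot(b));
      if (angle > theta) break;                      // smoothness broken
      if (turnSign == 0.0 && cross != 0.0)
        turnSign = (cross > 0.0) ? 1.0 : -1.0;
      else if (cross * turnSign < 0.0) break;        // convexity broken
    }
    chain.push_back(s);
  }
  return chain;  // a new candidate restarts at the breaking segment (Fig. 1)
}
```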

4.3 Model Generation

Given a set of pixels from a candidate curve, in this step we want to extract a model of its corresponding 3D line in a least squares sense.

As stated in Section 3, the points from a 3D line, along with the camera’s center of projection, all lie in a plane. Therefore, in this step, we are interested in estimating the parameters of such a plane.

A plane is described by the unit normal vector \(\boldsymbol{n} \in \mathrm {I\!R}^3\) and its distance to the origin \(D \in \mathrm {I\!R}\). This way, the point-to-plane distance can be simplified to

$$\begin{aligned} d_{\boldsymbol{n}}(\boldsymbol{x}) = \boldsymbol{x} \cdot \boldsymbol{n} + D \end{aligned}$$
(1)

Setting the center of projection coincident with the origin makes \(D = 0\); the least squares solution for the plane parameters, in terms of the point-to-plane distance, is then given by:

$$\begin{aligned} \begin{aligned} \boldsymbol{n}^{\boldsymbol{*}} =&\mathop {\mathrm{arg\,min}}\limits _{\boldsymbol{n}}\ \sum _{\boldsymbol{u} \in \mathcal {C}} \left\Vert \pi ^{-1}(\boldsymbol{u}) \cdot \boldsymbol{n}\right\Vert ^2\\&\text {subject to}\; \left\Vert \boldsymbol{n}\right\Vert = 1 \end{aligned} \end{aligned}$$
(2)

where \(\pi ^{-1}\) is the inverse projection function and \(\mathcal {C} \subset \Omega \) is the set of pixels in the segmented curve.

Note that (2) is a quadratic problem with a quadratic constraint and, thus, it can be expressed compactly in matrix form as:

$$\begin{aligned} \begin{aligned} \boldsymbol{n}^{\boldsymbol{*}} =&\mathop {\mathrm{arg\,min}}\limits _{\boldsymbol{n}}\ \boldsymbol{n}^\top \mathbf {M} \boldsymbol{n}\\&\text {subject to}\; \boldsymbol{n}^\top \mathbf {I_{3 \times 3}} \boldsymbol{n} = 1 \end{aligned} \end{aligned}$$
(3)

where \(\mathbf {M}\) is the symmetric matrix

$$\begin{aligned} \mathbf {M} = \sum _{\boldsymbol{u} \in \mathcal {C}} \pi ^{-1}(\boldsymbol{u})\big (\pi ^{-1}(\boldsymbol{u})\big )^\top \end{aligned}$$
(4)

We solve this constrained optimization problem in closed form using the method of Lagrange multipliers as follows. Considering the Lagrangian:

$$\begin{aligned} \mathcal {L}(\boldsymbol{n}, \lambda ) = \boldsymbol{n}^\top \mathbf {M} \boldsymbol{n} + \lambda (\boldsymbol{n}^\top \mathbf {I} \boldsymbol{n} - 1) \end{aligned}$$
(5)

the necessary conditions for optimality are then:

$$\begin{aligned} \frac{\partial \mathcal {L}(\boldsymbol{n})}{\partial \boldsymbol{n}} = 2\boldsymbol{n}^\top (\mathbf {M} + \lambda \mathbf {I}) = \boldsymbol{0}^\top \end{aligned}$$
(6)

Disregarding the trivial solution \(\boldsymbol{n} = \boldsymbol{0}\), which violates the unit-norm constraint, the \(3 \times 3\) matrix \(\mathbf {M} + \lambda \mathbf {I}\) must be singular in order to satisfy (6). Thus, we solve for the values of \(\lambda \in \mathrm {I\!R}\) that make

$$\begin{aligned} \det \big (\mathbf {M} + \lambda \mathbf {I}\big ) = 0 \end{aligned}$$
(7)

This is a polynomial in \(\lambda \) of degree 3, for which a closed form solution exists. Up to the sign of \(\lambda \), it is the characteristic polynomial of \(\mathbf {M}\), so its roots correspond to the negated eigenvalues of \(\mathbf {M}\). Finally, the solution to the original problem can be extracted from the kernel of \(\mathbf {M} + \lambda \mathbf {I}\), i.e. the corresponding eigenvector.

Up to three solutions satisfy equation (6), so we choose as the optimal one the solution with the lowest cost, given by \(\boldsymbol{n}^\top \mathbf {M} \boldsymbol{n}\).
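Since the roots of (7) relate to the eigenvalues of \(\mathbf {M}\), in practice the whole fit reduces to a small symmetric eigendecomposition. A minimal C++ sketch with Eigen, assuming the curve pixels have already been back-projected to unit ray directions, could read:

```cpp
#include <Eigen/Dense>
#include <vector>

// Closed-form plane fit of eqs. (2)-(7): accumulate M from the back-projected
// rays of the curve pixels and take the eigenvector associated with the
// smallest eigenvalue, which minimizes n^T M n subject to ||n|| = 1.
Eigen::Vector3d fitPlaneNormal(const std::vector<Eigen::Vector3d>& rays) {
  Eigen::Matrix3d M = Eigen::Matrix3d::Zero();
  for (const Eigen::Vector3d& x : rays)
    M += x * x.transpose();                            // eq. (4)
  // Eigen sorts the eigenvalues of a self-adjoint matrix in increasing order.
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> eig(M);
  return eig.eigenvectors().col(0);                    // lowest-cost solution
}
```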

4.4 Geometric Validation

Finally, once we have a model of the 3D line, we geometrically verify whether it corresponds to the observed curve as follows. Given a pixel \(\boldsymbol{u}\) of the curve \(\mathcal {C}\), we obtain the direction vector of the corresponding light-ray and project it onto the plane model of the line estimated as explained above. The projected vector on the plane is finally re-projected to the image using the projection function \(\pi \). Thus, we say that a curve is the projection of a 3D line, within a certain threshold \(\rho \), if the re-projections of all segmented pixels of the curve fall within a \(\rho \)-radius circle around the source pixels, measured in image coordinates:

$$\begin{aligned} \left\Vert \boldsymbol{u} - \pi \big (\boldsymbol{x} - (\boldsymbol{x} \cdot \boldsymbol{n}) \boldsymbol{n}\big )\right\Vert \le \rho ,\quad \forall \boldsymbol{u} \in \mathcal {C} \end{aligned}$$
(8)

where \(\boldsymbol{x} = \pi ^{-1}(\boldsymbol{u})\) and \(\boldsymbol{n}\) is the normal vector of the fitted plane. It is worth noting that, for computational efficiency, only the endpoints of the 2D line segments forming the polygonal chain are verified.
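A possible implementation of this check, assuming the hypothetical CentralCamera interface sketched at the end of Sect. 3, is the following:

```cpp
#include <Eigen/Dense>
#include <vector>

// Sketch of the check in eq. (8): back-project each tested pixel, project the
// ray onto the fitted plane (normal n, through the origin), re-project it,
// and verify the image-space error against the threshold rho.
bool isDistortedLine(const std::vector<Eigen::Vector2d>& pixels,
                     const Eigen::Vector3d& n, const CentralCamera& cam,
                     double rho = 4.0) {
  for (const Eigen::Vector2d& u : pixels) {      // in practice, only the
    Eigen::Vector3d x = cam.backproject(u);      // segment endpoints are tested
    Eigen::Vector3d onPlane = x - x.dot(n) * n;  // orthogonal projection
    if ((u - cam.project(onPlane)).norm() > rho)
      return false;                              // violates the rho threshold
  }
  return true;
}
```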

5 Experimental Evaluation

We have evaluated D-LSD with real images from the TUM mono dataset [12], comparing it with the fastest state-of-the-art line segment detectors: LSD [16], EDLines [1] and AG3line [24]. For a fair comparison, we have used the original authors’ C++ implementations. All the experiments were conducted on an Intel Core i7-7700HQ CPU with 16 GB of RAM and on a Linux-based OS.

The TUM monocular dataset is tailored for the evaluation of visual odometry and SLAM systems. In this evaluation, we focus on the wide_whitePaper calibration sequence, which contains 800 images showing two sheets of white paper on top of a dark table. This sequence is particularly interesting since long, high-contrast line segments are observed along the whole sequence from different points of view. The sequence was recorded with a wide-angle camera with a 148 \({}^{\circ }\) \(\times \) 122 \({}^{\circ }\) non-rectified field of view. The camera parameters, as provided with the dataset, were calibrated using the pinhole projection with the FoV [11] distortion model. Finally, the camera has a global shutter CMOS sensor with 1280 px \(\times \) 1024 px resolution, recording frames at a fixed rate of 10 fps.

In order to test the other methods, we first rectify the raw images using the tools provided with the dataset in crop mode, which keeps as many pixels from the original image as possible without adding any black borders (see Fig. 3). We set the output image resolution to match the original 1280 px \(\times \) 1024 px for a fair comparison between the methods using rectified images and D-LSD. The other evaluated methods are then applied to the rectified images keeping their default parameters. In turn, for D-LSD, we have set the smoothness threshold to \(\theta = \frac{\pi }{16}\) and the reprojection error for line validation to \(\rho = {4\,\mathrm{px}}\). Additionally, polygonal chains consisting of a single straight line shorter than 25 px are discarded.

Fig. 3.
figure 3

Original (3a) and rectified (3b) images from the wide_whitePaper sequence of the TUM monocular dataset [12].

We have measured the D-LSD performance with the recall value, computed as the ratio between the number of line segments found by D-LSD in the distorted image and the number of segments detected by each state-of-the-art method in the rectified image. Thus, a maximum value of 100 % would indicate that all the line segments detected in the rectified image have been recovered by D-LSD. A correspondence between a rectified and a distorted line is established if at least 90 % of the rectified line falls within 4 px of a distorted line detected by D-LSD. We omit small line segments from the evaluation, since their detection is highly affected by the internal parameters of each algorithm. However, the length in pixels of the same line segment differs between rectified and non-rectified images. Thus, we set the minimum line length threshold in terms of the angle between the light-rays of the endpoints of a line, which does not depend on the actual projection function. In this way, we discard the same small segments for all the evaluated methods. The length threshold is expressed as a percentage of the vertical FoV.
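For reference, this projection-independent length measure reduces to the angle between the back-projected endpoint rays; a minimal sketch, again assuming the hypothetical CentralCamera interface, is:

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>

// Projection-independent segment length: the angle (in radians) between the
// light-rays of the two endpoints; clamping guards acos against numerical
// noise in the dot product of the unit rays.
double angularLength(const Eigen::Vector2d& u1, const Eigen::Vector2d& u2,
                     const CentralCamera& cam) {
  Eigen::Vector3d r1 = cam.backproject(u1);  // unit ray directions
  Eigen::Vector3d r2 = cam.backproject(u2);
  return std::acos(std::clamp(r1.dot(r2), -1.0, 1.0));
}
```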

The recall rates for the whole wide_whitePaper sequence and for different values of the minimum line length threshold are shown in Fig. 4. The results show that most of the long lines (about 21 \({}^{\circ }\) and above) are detected by D-LSD, while it struggles to detect smaller lines. This is, in part, due to some internal thresholds of D-LSD that are expressed in pixels and thus are not equivalent when using rectified images. The total number of line segments detected by each algorithm, along with the average execution time, is reported in Table 1. Only line segments with length above the 25 % minimum length threshold (i.e. 61 \({}^{\circ }\) and above) are taken into account. It is important to highlight that one of the advantages of D-LSD is that it detects more long lines than the rest of the methods, since it uses the original non-rectified images with a wider FoV. Moreover, our proposal is the fastest detector, even without taking into account the rectification step needed by the others (first column in Table 1).

Fig. 4.
figure 4

D-LSD recall ratio compared to state-of-the-art line segment detectors on rectified images from the wide_whitePaper sequence.

Table 1. Total number of line segments detected in the wide_whitePaper sequence at 25 % length threshold and average execution time (with standard deviation) measured in milliseconds.

6 Conclusions

In this work, we have presented a new line segment detector for calibrated images that works directly on the original, non-rectified image. An open source C++ implementation of our algorithm is available at https://github.com/dzunigan/line_detection.

Our approach, coined D-LSD, relies on the efficient Edge Drawing [23] algorithm for edge detection. Edge pixels are then grouped into polygonal chains, representing smooth convex curves. Finally, the segmented curves are validated as the projection of a 3D line by fitting a plane that includes the projection center and the light-rays passing through the curve pixels. This plane fitting problem is formulated as a quadratically constrained quadratic optimization problem, which is solved in closed form using the method of Lagrange multipliers. Our method is general enough to handle any central projection system: from wide-angle to omnidirectional cameras.

We have evaluated the proposed detector with images captured by a wide-angle camera from the TUM monocular dataset [12], comparing its detection capabilities against state-of-the-art line segment detectors on rectified images. In the experiments, D-LSD shows comparable performance for non-short segments while being the fastest approach, without requiring any previous image rectification. We believe D-LSD could be of much interest for demanding real-time applications that rely on line features for further image analysis.