
1 Introduction

Lane detection, the process of identifying lanes as approximated curves, is a fundamental step in developing advanced autonomous driving systems and plays a vital role in applications such as driving route planning, lane keeping, real-time positioning, and adaptive cruise control.

Fig. 1.

Challenging scenes (curve lanes, Y-shape lanes). In (a), the first row shows the ground truth and the second row shows our predictions. In (b), the first row shows results of segmentation-based methods, where the global shape of the lane is not well fitted, and the second row shows results of proposal-based methods, which cannot depict the local locations of Y-shape and curve lanes.

Early lane detection methods [3, 8,9,10,11, 14, 28, 34] usually extract hand-crafted features and cluster foreground points into lanes through post-processing. However, traditional methods cannot reliably detect diverse lanes across the many complicated scenes that arise in driving scenarios. Thanks to the development of deep learning, a wide variety of lane detection approaches based on convolutional neural networks (CNNs) have been proposed, including segmentation-based and proposal-based methods, reporting steady benchmark improvements over time.

Proposal-based methods initialize a fixed number of anchors and model global information, focusing on optimizing the regression of proposal coordinates. LaneATT [26] designs slender anchors to match the long and thin shape of lanes. However, line proposals fail to generalize to the local locations of all lane points for curve lanes or lanes with more complex topologies. Segmentation-based methods, in contrast, treat lane detection as a dense prediction task to capture the local location information of lanes. LaneAF [1] focuses on local geometry and integrates it into global results. However, this bottom-up manner cannot capture the global geometry of lanes directly. In some cases, such as occlusion or resolution reduction for points on the far side of a lane, model performance suffers from the loss of lane shape information. The visualization results in Fig. 1(b) illustrate these shortcomings. Lanes typically span half or almost all of the image; existing methods neglect this long and thin characteristic, which requires networks to attend to global shape and local location information simultaneously. In addition, complex lanes such as Y-shape and Fork-shape lanes are common in current autonomous driving scenarios, yet existing methods often fail in these challenging scenes, as shown in Fig. 1(a).

To address this important limitation of current algorithms, we propose a more accurate lane detection solution for unconstrained driving scenarios, called RCLane, inspired by the idea of a Relay Chain to attend to local location and global shape information of lanes at the same time. Each foreground point on a lane can be treated as a relay station for recovering the whole lane sequentially in a chain mode. Relay station construction is proposed to strengthen the model's ability to learn local information, which is fundamental for describing the flexible shapes of lanes. Specifically, we construct a transfer map representing the relative location from the current pixel to its two neighbors on the same lane. Furthermore, we apply a bilateral prediction strategy to improve generalization to lanes with complex topologies. Finally, we design a global shape message learning module: it predicts a distance map describing the distance from each foreground point to the two end points of the same lane. The contributions of this work are as follows:

  • We propose a novel relay chain representation for lanes to model the global geometric shape and local location information of lanes simultaneously.

  • We introduce a novel pair of lane encoding and decoding algorithms to facilitate the process of lane detection with relay chain representation.

  • Extensive experiments on four major lane detection benchmarks show that our approach beats the state-of-the-art alternatives, often by a clear margin, and achieves real-time performance.

2 Related Work

Existing methods for lane detection can be categorized into four groups: segmentation-based methods, proposal-based methods, row-wise methods, and polynomial regression methods.

Segmentation-Based Methods. Segmentation-based methods [7, 12, 13, 20, 21] typically make predictions based on pixel-wise classification. Each pixel is classified as either lane or background to generate a binary segmentation mask, and a post-processing step then decodes the mask into a set of lanes. However, it remains challenging to assign different points to their corresponding lane instances. A common solution is to predict an instance segmentation mask, but the number of lanes then has to be predefined and fixed, which is not robust for real driving scenarios.

Proposal-Based Methods. Proposal-based methods [4, 26, 32] take a top-down pipeline that directly regresses the relative coordinates of lane shapes. Nevertheless, they tend to struggle on lanes with complex topologies, such as curve lanes and Y-shape lanes: the fixed anchor shape is a major flaw when regressing variable lane shapes in hard scenes.

Row-Wise Methods. Based on a grid division of the input image, row-wise detection approaches [6, 15, 22, 23, 33] have achieved great progress in terms of accuracy and efficiency. Generally, row-wise methods directly predict the lane position for each row and construct the set of lanes through post-processing. However, detecting nearly horizontal lanes, which fall within a small range of rows, remains a major problem.

Polynomial Regression Methods. Polynomial regression methods [16, 27] directly output polynomials representing each lane. A deep network was first used in [27] to predict the lane curve equation, along with the domains of these polynomials and confidence scores for each lane. [16] uses a transformer [30] to learn richer structures and context, reframing the lane detection output as the parameters of a lane shape model. However, despite the fast speed polynomial regression methods achieve, they still fall some distance short of state-of-the-art results.

Fig. 2.

Schematic illustration of the proposed RCLane. A standard SegFormer [31] is used as the backbone. The output head consists of three branches: the segment head predicts the segmentation map (S), while the distance head and the transfer head predict the distance map (D) and transfer map (T) respectively. Both maps contain forward and backward parts. Point-NMS is then applied to sparsify the segmentation result. All predictions are fed into the lane decoder (Fig. 5) to obtain the final results.

3 Method

Given an input image \(I \in \mathbb {R}^{H \times W \times C}\), the goal of RCLane is to predict a collection of lanes \(L = \{l_1, l_2, \cdots , l_N\}\), where N is the total number of lanes. Generally, each lane \(l_k\) is represented as follows:

$$\begin{aligned} l_k = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \cdots , (x_{N_k}, y_{N_k})\}, \end{aligned}$$
(1)

The overall structure of our RCLane is shown in Fig. 2. This section first presents the concept of lane detection with a relay chain, then introduces the lane encoder for relay station construction, followed by a lane decoder that recovers curved lanes. Finally, the network architecture and the losses we adopt are detailed.

3.1 Lane Detection with Relay Chain

Focusing on combining local location and global shape information to detect lanes with complex topologies, we propose RCLane, a novel lane detection method built on the idea of a relay chain. A relay chain is a structure composed of relay stations connected in a chain mode. Each relay station is responsible for processing data and transmitting it to adjacent stations, while the chain organizes these stations from an overall perspective. Each station is associated with a corresponding lane point.

We design the relay chain structure to combine local location and global geometry information for lane detection. Specifically, each foreground point on a lane is treated as a relay station that can extend to its neighboring points iteratively, decoding the lane in a chain mode. All foreground points are supervised with the two kinds of information mentioned above. Moreover, the chain structure is flexible enough to fit lanes with complex topologies.

Next, we introduce relay station construction, bilateral predictions for complex topologies, and global shape message learning, explaining step by step how lanes are detected with the idea of a relay chain.

Relay Station Construction. Segmentation-based approaches normally predict all foreground points on lanes and cluster them via post-processing. [1] predicts horizontal and vertical affinity fields for clustering and associating pixels belonging to the same lane. [24] regresses a vector describing the local geometry of the curve that the current pixel belongs to and further refines the shape in the decoding algorithm. Nevertheless, both fix the vertical interval between adjacent points and decode lanes row by row from bottom to top. In effect, horizontal offsets refine the positions of current points while vertical offsets explore their vertical neighbors, and the fixed vertical offsets cannot adapt to the high degree of freedom of lanes; for example, they can only detect a fraction of nearly horizontal lanes. Thus, we propose a relay station construction module to establish relationships between neighboring points on a lane. Each relay station \(p=(p_x, p_y)\) predicts offsets to its neighboring points \(p^{next}=(p^{next}_x, p^{next}_y)\) on the same lane with a fixed step length d in two directions, as shown in Eqs. 2 and 3. By eliminating the vertical constraint, the deformation trend of lanes can be fitted considerably better. All relay stations are then connected to form a chain, which is exactly the lane.

$$\begin{aligned} (p^{next}_x, p^{next}_y) = (p_x, p_y) + (\varDelta x, \varDelta y), \end{aligned}$$
(2)
$$\begin{aligned} \varDelta x^2 + \varDelta y^2 = d^2. \end{aligned}$$
(3)
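As a minimal illustrative sketch (not the authors' code) of Eqs. 2 and 3, the step from a relay station to its neighbor can be written as below; the direction angle `theta` is a hypothetical input, since in RCLane the offsets are predicted by the transfer head.

```python
import math

def relay_step(p, theta, d=5.0):
    """Move from point p = (x, y) by the fixed step length d along direction theta."""
    dx, dy = d * math.cos(theta), d * math.sin(theta)
    # By construction dx**2 + dy**2 == d**2, i.e. Eq. (3) holds.
    return (p[0] + dx, p[1] + dy)

print(relay_step((100.0, 200.0), math.radians(30.0)))
```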

Bilateral Predictions for Complex Topologies. Current autonomous driving scenarios contain lanes with complex topologies, such as Y-shape and Fork-shape lanes, which can be regarded as two lanes merging into a shared stem. One-way prediction can only detect one of these lanes, because it can only extend along one limb when starting from the stem. We therefore adopt a two-way detection strategy that splits the next neighboring point \(p^{next}\) into a forward point \(p^f\) and a backward point \(p^b\). Points on different limbs can then recover the lanes they belong to and compose the final Y-shape or Fork-shape lanes, as illustrated in Fig. 3(b). Let F denote the output feature map from the backbone, whose resolution is 1/4 of the original image. We design a transfer output head that takes F as input: F goes through the convolution-based transfer head to produce the transfer map T, which consists of forward and backward components \(T_f, T_b \in \mathbb {R}^{H \times W \times 2}\). Each location in \(T_f\) is a 2D vector representing the offset between the forward neighboring point \(p^f\) and the current pixel p; \(T_b\) is defined analogously. Consequently, we can detect the forward and backward neighboring points \(p^f\), \(p^b\) of p guided by T.

$$\begin{aligned} p^f = p + T_f(p),\quad {p^b} = p + T_b(p). \end{aligned}$$
(4)
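The following hedged sketch shows how Eq. 4 reads the bilateral transfer map at a pixel p; shapes follow the text (\(T_f, T_b \in \mathbb{R}^{H \times W \times 2}\)), and the random arrays are placeholders standing in for trained network predictions.

```python
import numpy as np

H, W = 80, 200                    # feature resolution (placeholder)
T_f = np.random.randn(H, W, 2)    # forward (dx, dy) offsets
T_b = np.random.randn(H, W, 2)    # backward (dx, dy) offsets

def bilateral_neighbors(p, T_f, T_b):
    """Return the forward and backward neighbors of p = (x, y), per Eq. (4)."""
    x, y = p
    p_f = (x + T_f[y, x, 0], y + T_f[y, x, 1])
    p_b = (x + T_b[y, x, 0], y + T_b[y, x, 1])
    return p_f, p_b

p_f, p_b = bilateral_neighbors((50, 40), T_f, T_b)
```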
Fig. 3.

(a) illustrates the transfer vectors and distance scalars for \(p_i\): \(\overline{T}_{f,b}(p_i)\) are the forward and backward transfer vectors, and \(\overline{D}_{f,b}(p_i)\) are the forward and backward distance scalars. (b) shows that our bilateral predictions can not only decode Y-shape or Fork-shape lanes, but also fit simple structures such as straight and curved lanes.

With the guidance of local location information in transfer map T, the whole lane can be detected iteratively via bilateral strategy.

Global Shape Message Learning. Previous works predict the positions of lane end points to guide the decoding process. FastDraw [22] predicts end tokens to encode the global geometry, while CondLaneNet [15] recovers the row-wise shape through vertical range prediction. These methods, however, ignore the relation between the end points and the other points on the same lane. We make every relay station learn the global shape message transmitted along the chain by exploiting this relation. In detail, we design a distance head to predict the distance map D, which consists of forward and backward components \(D_f, D_b \in \mathbb {R}^{H \times W \times 1}\). Each location in \(D_f\) is a scalar representing the distance from the current pixel p to the forward end point \(p_{end}^{f}\) on the lane; \(D_b\) is defined analogously. With this global shape information, we know when to stop the lane decoding process: the number of decoding iterations for the forward branch of p is \(\frac{D_f}{d}\). With the combination of local location and global geometry information, our relay chain prediction strategy performs well even in complex scenarios. Next, we introduce the novel pair of lane encoding and decoding algorithms designed for lane detection.
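As a small illustration of this stopping rule (flooring to an integer is our assumption; the text writes \(\frac{D_f}{d}\) directly):

```python
def num_steps(D_p: float, d: float = 5.0) -> int:
    """Number of decoding iterations for one branch of p, i.e. D(p) / d, floored."""
    return int(D_p // d)

print(num_steps(42.0))  # with D_f(p) = 42 and d = 5 -> 8 forward extensions
```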

3.2 Lane Encoder for Relay Station Construction

The lane encoder creates the supervision for the transfer and distance maps used in training. Given an image \(I \in \mathbb {R}^{H \times W \times 3}\) and its segmentation mask \(\overline{S} \in \mathbb {R}^{H \times W \times 1}\), for any foreground point \(p_i = (x_i, y_i) \in \overline{S}\) we denote its corresponding lane as \(\gamma _L\). The forward and backward end points of \(\gamma _L\) are denoted as \(p_{end}^{f} = (x_{end}^{f}, y_{end}^{f})\) and \(p_{end}^{b} = (x_{end}^{b}, y_{end}^{b})\), which have the minimum and maximum y-coordinates respectively. The forward distance scalar \(\overline{D}_f(p_i)\) and backward distance scalar \(\overline{D}_b(p_i)\) of \(p_i\) are formulated as follows:

$$\begin{aligned} \overline{D}_f(p_i) = \sqrt{(x_i-x_{end}^{f})^2 + (y_i-y_{end}^{f})^2}, \end{aligned}$$
(5)
$$\begin{aligned} \overline{D}_b(p_i) = \sqrt{(x_i-x_{end}^{b})^2 + (y_i-y_{end}^{b})^2}. \end{aligned}$$
(6)
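A minimal sketch of the distance-label encoding in Eqs. 5 and 6: for each point on a ground-truth lane polyline, we record its Euclidean distance to the two end points. The polyline below is a made-up example, not dataset ground truth.

```python
import math

def distance_labels(lane):
    """lane: list of (x, y) points ordered from the forward end point (min y)
    to the backward end point (max y); returns per-point D_f and D_b labels."""
    xf, yf = lane[0]    # forward end point
    xb, yb = lane[-1]   # backward end point
    D_f = [math.hypot(x - xf, y - yf) for x, y in lane]  # Eq. (5)
    D_b = [math.hypot(x - xb, y - yb) for x, y in lane]  # Eq. (6)
    return D_f, D_b

D_f, D_b = distance_labels([(120, 10), (118, 60), (110, 110), (95, 160)])
```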
Fig. 4.

Lane encoder. All foreground points are matched with their nearest lanes. The arrows in each circle indicate the transfer vectors from a foreground point to its two neighbors on the lane. The distance scalars represent the distances between the current point and the two end points of the lane. All results are generated by point-wise traversal.

To generate the forward and backward transfer vectors for a pixel \(p_i\), we first find its two neighbors on \(\gamma _L\) at the fixed distance d, denoted \(p_i^f=(x_i^{f}, y_i^{f})\) and \(p_i^b=(x_i^{b}, y_i^{b})\) for the forward and backward neighbor respectively. The forward transfer vector \(\overline{T}_f(p_i)\) and the backward transfer vector \(\overline{T}_b(p_i)\) for pixel \(p_i\) are then defined as:

$$\begin{aligned} \overline{T}_f(p_i) = (x_i^{f} - x_i, y_i^{f} - y_i), \end{aligned}$$
(7)
$$\begin{aligned} \overline{T}_b(p_i) = (x_i^{b} - x_i, y_i^{b} - y_i), \end{aligned}$$
(8)
$$\begin{aligned} \vert \vert \overline{T}_f(p_i) \vert \vert _2 = \vert \vert \overline{T}_b(p_i) \vert \vert _2 = d. \end{aligned}$$
(9)

The details are shown in Fig. 3(a). In addition, consider the two branches of one Y-shape lane: \(l_1 = \{(x_1, y_1), \cdots , (x_m, y_m), (x^1_{m+1}, y^1_{m+1}), \cdots , (x^1_{n_1}, y^1_{n_1})\}\) and \(l_2 = \{(x_1, y_1), \cdots , (x_m, y_m), (x^2_{m+1}, y^2_{m+1}), \cdots , (x^2_{n_2}, y^2_{n_2})\}\), where \(\{(x_1, y_1), \cdots , (x_m, y_m)\}\) is the shared stem. We randomly choose one of \((x^1_{m+1}, y^1_{m+1})\) and \((x^2_{m+1}, y^2_{m+1})\) as the forward neighboring point of \((x_m, y_m)\), while \((x_m, y_m)\) is the common backward neighboring point of both. All foreground pixels in \(\overline{S}\) are processed following the same formulas to generate \(\overline{T}_{f,b}\) and \(\overline{D}_{f,b}\). The process is shown in Fig. 4, and a sketch of the encoding follows below.
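The following hedged sketch generates transfer labels for one point of a densely sampled polyline, per Eqs. 7–9. The nearest-match search for the neighbor at distance d is our assumption; the paper only requires \(\vert \vert \overline{T} \vert \vert _2 = d\).

```python
import math

def transfer_labels(lane, i, d=5.0):
    """lane: densely sampled (x, y) polyline from forward end to backward end;
    returns the forward/backward transfer vectors of point i (Eqs. 7-8)."""
    xi, yi = lane[i]

    def closest(indices):
        # Pick the polyline point whose distance to p_i is closest to d.
        if not indices:
            return None  # p_i is an end point on this side
        j = min(indices, key=lambda k: abs(math.hypot(lane[k][0] - xi,
                                                      lane[k][1] - yi) - d))
        return (lane[j][0] - xi, lane[j][1] - yi)

    T_f = closest(list(range(0, i)))              # forward side (towards min y)
    T_b = closest(list(range(i + 1, len(lane))))  # backward side (towards max y)
    return T_f, T_b
```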

3.3 Lane Decoder with Transfer and Distance Map

With the predictions of local location and global geometry, we propose a novel lane decoding algorithm to detect all curves in a given image.

Fig. 5.

Illustration of the lane decoder. The forward branch predicts the forward part of the lane via the forward transfer map \(T_f\) and forward distance map \(D_f\). The backward part is decoded similarly by the backward branch.

Given the predicted binary segmentation mask S, transfer map T and distance map D, we collect all foreground points of S and apply Point-NMS to obtain a sparse set of key points K. Every key point \(p \in K\) serves as a start point for recovering one global curve; a sketch of the Point-NMS step follows below.
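A hedged sketch of Point-NMS over the segmentation scores: the minimum distance \(\tau = 2\) between kept points is taken from Sect. 4.1, while greedy suppression by confidence is our guess at the exact rule.

```python
import numpy as np

def point_nms(score_map, thresh=0.5, tau=2.0):
    """Keep high-confidence foreground points that are at least tau pixels
    away from every already-kept point."""
    ys, xs = np.where(score_map > thresh)
    order = np.argsort(-score_map[ys, xs])  # most confident first
    kept = []
    for idx in order:
        p = np.array([xs[idx], ys[idx]], dtype=float)
        if all(np.linalg.norm(p - q) >= tau for q in kept):
            kept.append(p)
    return kept  # the sparse key-point set K
```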

Step 1: Find the forward transfer vector \(T_f(p)\) and forward distance scalar \(D_f(p)\) for p. The number of steps to extend along the forward branch is \(M^{f} = \frac{D_f(p)}{d}\); in other words, \(D_f(p)\) tells us where the forward end point of p lies on the lane.

Here d is the step length. The forward neighbor pixel \(p_{i+1}^{f}\) of \(p_i^{f}\) can then be calculated iteratively by:

$$\begin{aligned} p_{i+1}^{f} = p_i^{f} + T_f({p_i^{f}}),\ i \in \{0, 1, 2, \cdots , M^{f} - 1\},\ p_0^{f}=p. \end{aligned}$$
(10)

The forward branch of the curve can be recovered by connecting \(\{p, p_1^{f}, \cdots , p_{M^{f}}^{f}\}\) sequentially. The detail is shown at the top of Fig. 5.

Step 2: We calculate the point set \(\{p, p_1^{b}, p_2^{b}, \cdots , p_{M^{b}}^{b}\}\) following Eq. 10 via \(T_b\) and \(D_b\), and connect the points sequentially to recover the backward branch.

Step 3: We then merge the backward and forward branches to obtain the global curve:

$$\begin{aligned} \gamma _L=\{p_{M^{b}}^{b}, \cdots , p_2^{b}, p_1^{b}, p, p_1^{f}, p_2^{f}, \cdots , p_{M^{f}}^{f}\}. \end{aligned}$$
(11)

Finally, non-maximum suppression [19] is performed on all predicted curves to obtain the final results.
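Putting Steps 1–3 together, the following hedged Python sketch decodes one lane from a key point, assuming numpy maps indexed as [y, x]; the coordinate rounding and clipping are our implementation choices, not specified in the paper.

```python
import numpy as np

def walk(p, T, M, H, W):
    """Extend M steps from p = (x, y) following transfer map T (Eq. 10)."""
    pts, cur = [], np.asarray(p, dtype=float)
    for _ in range(M):
        x = int(round(np.clip(cur[0], 0, W - 1)))
        y = int(round(np.clip(cur[1], 0, H - 1)))
        cur = cur + T[y, x]
        pts.append(cur.copy())
    return pts

def decode_lane(p, T_f, T_b, D_f, D_b, d=5.0):
    """Steps 1-3: walk both branches from key point p and merge them (Eq. 11)."""
    H, W = D_f.shape
    x, y = p
    M_f = int(D_f[y, x] // d)  # forward step budget from the distance map
    M_b = int(D_b[y, x] // d)  # backward step budget
    fwd = walk(p, T_f, M_f, H, W)
    bwd = walk(p, T_b, M_b, H, W)
    return bwd[::-1] + [np.asarray(p, dtype=float)] + fwd
```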

3.4 Network Architecture

The overall framework is shown in Fig. 2. SegFormer [31] is utilized as our network backbone, aiming to extract global contextual information and learn the long and thin structure of lanes. SegFormer-B0, B1 and B2 are used as the small, medium and large backbones in our experiments respectively. Given an image \(I \in \mathbb{R}^{H\times W \times 3}\), the segmentation head predicts the binary segmentation mask \(S \in \mathbb{R}^{H \times W \times 1}\), the transfer head predicts the transfer map T consisting of the forward and backward parts \(T_f, T_b \in \mathbb {R}^{H \times W \times 2}\), and the distance head predicts the distance map D consisting of \(D_f, D_b \in \mathbb {R}^{H \times W \times 1}\).
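A shape-level sketch of the three output branches follows, written in PyTorch for familiarity even though the paper implements RCLane in MindSpore; the single 1x1-convolution heads and the channel count in_ch=256 are our assumptions, not the authors' exact layers.

```python
import torch
import torch.nn as nn

class RCLaneHeads(nn.Module):
    """Illustrative output heads matching the map shapes in Fig. 2."""
    def __init__(self, in_ch=256):
        super().__init__()
        self.seg_head = nn.Conv2d(in_ch, 1, 1)       # S: binary segmentation
        self.transfer_head = nn.Conv2d(in_ch, 4, 1)  # T = (T_f, T_b), 2 channels each
        self.distance_head = nn.Conv2d(in_ch, 2, 1)  # D = (D_f, D_b), 1 channel each

    def forward(self, feat):
        return (self.seg_head(feat),
                self.transfer_head(feat),
                self.distance_head(feat))

S, T, D = RCLaneHeads()(torch.randn(1, 256, 80, 200))
```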

3.5 Loss Function

To train our proposed model, we adopt different losses for the different predictions. For the binary segmentation mask, we adopt the OHEM loss [25] to counter the class imbalance caused by the sparsity of lane segmentation points. The OHEM loss is formulated as follows:

$$\begin{aligned} L_{seg} = -\frac{1}{N_{pos}+N_{neg}}\Big (\sum _{i \in S_{pos}}y_i \log (p_i)+\sum _{i \in S_{neg}}(1-y_i)\log (1-p_i)\Big ), \end{aligned}$$
(12)

where \(S_{pos}\) is the set of positive points and \(S_{neg}\) is the set of hard negative points, i.e. those most likely to be misclassified as positive. \(N_{pos}\) and \(N_{neg}\) denote the numbers of points in \(S_{pos}\) and \(S_{neg}\) respectively, and the ratio of \(N_{neg}\) to \(N_{pos}\) is a hyperparameter \(\mu \). For the per-pixel transfer and distance maps, we simply adopt the smooth \(L_1\) loss, denoted \(L_{T}\) and \(L_{D}\):

$$\begin{aligned} L_D = \frac{1}{N_{pos}}\sum _{i \in S_{pos}}L_{smooth_{L_1}}(D(p_i), \overline{D}(p_i)), \end{aligned}$$
(13)
$$\begin{aligned} L_T = \frac{1}{N_{pos}}\sum _{i \in S_{pos}}L_{smooth_{L_1}}(T(p_i), \overline{T}(p_i)). \end{aligned}$$
(14)

In the training phase, the total loss is defined as follows:

$$\begin{aligned} L_{total} = L_{seg} + L_{T} + L_{D}. \end{aligned}$$
(15)
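A hedged sketch of Eqs. 12–15: OHEM-style binary cross-entropy over positives plus the hardest negatives, and smooth-L1 on the transfer and distance maps over positive pixels only. The flattened tensor layout and the exact masking/reduction details are our assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(seg_logit, seg_gt, T_pred, T_gt, D_pred, D_gt, mu=15):
    """seg_*: (N,) flattened per-pixel float tensors; T_*/D_*: (N, C) per-pixel maps."""
    pos = seg_gt > 0.5
    bce = F.binary_cross_entropy_with_logits(seg_logit, seg_gt, reduction='none')
    k = min(mu * int(pos.sum()), int((~pos).sum()))
    hard_neg = bce[~pos].topk(k).values               # OHEM: keep hardest negatives
    L_seg = (bce[pos].sum() + hard_neg.sum()) / (int(pos.sum()) + k)  # Eq. (12)
    L_T = F.smooth_l1_loss(T_pred[pos], T_gt[pos])    # Eq. (14)
    L_D = F.smooth_l1_loss(D_pred[pos], D_gt[pos])    # Eq. (13)
    return L_seg + L_T + L_D                          # Eq. (15)
```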

4 Experiment

4.1 Experimental Setting

Dataset. We conduct experiments on four widely used lane detection benchmarks: CULane [21], TuSimple [29], LLAMAS [2] and CurveLanes [32]. CULane consists of 55 hours of video covering nine different scenarios: normal, crowded, dazzle light, shadow, no line, arrow, curve, cross and night. The TuSimple dataset is collected on highways under stable lighting conditions. LLAMAS is a large lane detection dataset captured in highway scenes, with annotations auto-generated from high-definition maps. CurveLanes is a recently proposed benchmark containing cases with complex topologies, such as Y-shape lanes and dense lanes. Details of the four datasets are given in Table 1.

Table 1. Lane detection datasets.
Table 2. State-of-the-art comparison on CULane. Even the small version of our RCLane achieves state-of-the-art performance with only 6.3M parameters.

Evaluation Metrics. For CULane, CurveLanes and LLAMAS, we use the F1-measure as the evaluation metric. For TuSimple, accuracy is the official indicator, and we additionally report the F1-measure. The calculation follows the same formulas as CondLaneNet [15].

Implementation Details. The small, medium and large versions of our RCLane are used on all four datasets. Unless explicitly indicated otherwise, the input resolution is set to \(320 \times 800\) during training and testing. For all training runs we use the AdamW optimizer [17] with a batch size of 32, training for 20 epochs on CULane, CurveLanes and LLAMAS and for 70 epochs on TuSimple. The learning rate is initialized to 6e-4 with a "poly" LR schedule. We set the line-IoU threshold \(\eta \) to 15, the ratio \(\mu \) of \(N_{neg}\) to \(N_{pos}\) to 15, and the minimum distance \(\tau \) between any two foreground pixels in Point-NMS to 2. We implement our method in MindSpore [18] on Ascend 910.
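For concreteness, one possible realization of this schedule (sketched in PyTorch for familiarity, though the paper uses MindSpore; the decay exponent 0.9 and the iteration count are placeholders we assume, not reported values):

```python
import torch

model = torch.nn.Conv2d(3, 64, 3)   # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=6e-4)
total_iters = 20 * 3000             # epochs x iterations per epoch (placeholder)
poly = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda it: (1.0 - it / total_iters) ** 0.9)  # "poly" decay
```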

4.2 Results

CULane. As shown in Table 2, RCLane achieves a new state-of-the-art result on the CULane testing set with an 80.50% F1-measure. Compared with CondLaneNet [15], the best previous model to our knowledge, our method improves the F1-measure by only 1.02%, since CULane is a relatively simple dataset with many straight lanes; however, it shows considerable improvements in the crowded and curve scenes, which demonstrates that the relay chain can strengthen local location connectivity through global shape learning under local occlusions and complex lane topologies.

Table 3. Performance of different methods on CurveLanes.

CurveLanes. CurveLanes [32] is a challenging benchmark with many hard scenarios. The evaluation results are shown in Table 3. Our largest model (with SegFormer-B2) surpasses CondLaneNet-L by 5.33% in F1-measure, a larger gain than on CULane. Since CurveLanes contains more complex Fork-shape, Y-shape and other curved lanes, the improvements in both recall and precision show that RCLane generalizes well to lanes with complex topologies.

TuSimple. The results on TuSimple are shown in Table 4. As TuSimple is a small dataset with relatively simple scenes and accurate annotations, the gap between methods is small. Nevertheless, our method achieves a new state-of-the-art F1 score of 97.64%.

Table 4. Performance of different methods on TuSimple.

LLAMAS. LLAMAS [2] is a new dataset with more than 100K images from highway scenarios. The results of our RCLane on LLAMAS are shown in Table 5. The best result of our method is a 96.13% F1 score with RCLane-L.

Table 5. Performance of different methods on LLAMAS.

4.3 Ablation Study

Different Modules. In this section, we perform an ablation study on CurveLanes to evaluate the impact of the proposed relay station construction, bilateral predictions and global shape message learning. The results are shown in Table 6. The first row shows the baseline result, which uses only binary segmentation plus DBSCAN [5] post-processing to detect lanes. In the second row, the lane is recovered gradually from bottom to top guided by the forward transfer map and forward distance map, while the third row detects lanes from top to bottom. In the fourth row, we use only the forward and backward transfer maps to predict the lane. The last row presents our full version of RCLane, which attains a new state-of-the-art result of 91.43% on CurveLanes.

Comparing the first two rows, we can see that the proposed relay station construction greatly improves performance. Adding global shape information learning with the distance map further improves performance from 88.19% to 91.43%. In the two one-way experiments of the second and third rows, lanes are detected by transfer and distance maps in a single direction, and a clear gap to the highest F1-score remains, which shows that our bilateral prediction generalizes better in depicting lane topologies. In addition, there is a gap between the forward and backward models. Since the near lanes (the bottom region of the image) are usually occluded by the ego car, the corresponding lane points receive low confidence scores from the segmentation results; the starting points therefore usually lie outside the occluded area, and the forward model never gets back to cover the lanes at the bottom of the image. In contrast, the backward model, decoding from the top with the help of the distance map, detects lanes more completely, including the occluded area.

Table 6. Comparison of different components on CurveLanes. The \(T_f\), \(T_b\), \(D_f\), \(D_b\) represent the forward transfer map, backward transfer map, forward distance map and backward distance map respectively.

Comparisons with Other Methods Using the Same Backbone. We additionally train CondLaneNet [15] and LaneAF [1] with SegFormer-B2 [31] as the backbone and report the results in Table 7. Without changing their model parameters, our model still outperforms LaneAF and CondLaneNet by a clear margin on the CULane [21] dataset, owing to its superior precision, which demonstrates the high quality of the lanes detected by RCLane. This further verifies, under a fair comparison, the superiority of our relay chain prediction method, which processes local location and global geometry information simultaneously to improve model capacity.

Table 7. Comparisons with other methods using the same backbone Segformer-B2.
Fig. 6.

Visualization of network outputs. A.(1, 3) are features of \(D_f\) and \(D_b\), while A.(2, 4) are features of \(T_f\) and \(T_b\). A.(5) is the segmentation result, which becomes the sparse map A.(6) via Point-NMS. B is a harder frame than A.

Local Location and Global Shape Message Modeling. The transfer map in Fig. 6 A.(2, 4) captures local location information, depicting the topology of the lane precisely, while the distance map in Fig. 6 A.(1, 3) models global shape information with a large receptive field. Furthermore, in some driving scenarios lane information is lost where lane markings fade or disappear, as shown in Fig. 6(B). Nevertheless, these lanes are still faintly captured in the transfer map thanks to global shape information learning. The results show the robustness of our RCLane with local location and global shape message modeling.

5 Conclusion

In this paper, we have proposed to solve the lane detection problem by learning a novel relay chain prediction model. Compared with existing lane detection methods, our model captures global geometry and local location information progressively through the novel relay station construction and global shape message learning, while bilateral predictions adapt to hard topologies such as Fork-shape and Y-shape lanes. Extensive experiments on four benchmarks, CULane, CurveLanes, TuSimple and LLAMAS, demonstrate the state-of-the-art performance and generalization ability of our RCLane.