
1 Introduction

Lane detection, the process of identifying lanes as approximated curves, is a fundamental step in developing advanced autonomous driving systems and plays a vital role in applications such as driving route planning, lane keeping, real-time positioning, and adaptive cruise control.

Fig. 1.

Challenging scenes (curve lanes, Y-shape lanes). In (a), the first row shows the ground truth and the second row shows our predictions. In (b), the first row shows results of segmentation-based methods, where the global shape of the lane is not well fitted, and the second row shows results of proposal-based methods, which cannot depict the local locations of Y-shape and curve lanes.

Early lane detection methods [3, 8,9,10,11, 14, 28, 34] usually extract hand-crafted features and cluster foreground points into lanes through post-processing. However, traditional methods cannot reliably detect diverse lanes across the many complicated scenes that arise in driving scenarios. Thanks to the development of deep learning, a wide variety of lane detection approaches based on convolutional neural networks (CNNs) have been proposed, including segmentation-based and proposal-based methods, reporting steady benchmark improvements over time.

Proposal-based methods initialize a fixed number of anchors and model global information, focusing on optimizing the regression of proposal coordinates. LaneATT [26] designs slender anchors to match the long and thin shape of lanes. However, line proposals fail to generalize to the local locations of all lane points for curve lanes or lanes with more complex topologies. Segmentation-based methods, in contrast, treat lane detection as a dense prediction task to capture the local location information of lanes. LaneAF [1] focuses on local geometry and integrates it into global results. However, this bottom-up manner cannot capture the global geometry of lanes directly. In some cases, such as occlusion or resolution reduction for points on the far side of a lane, model performance suffers from the loss of lane shape information. The visualization results in Fig. 1(b) illustrate these shortcomings. Lanes typically span half or almost all of the image; existing methods neglect this long and thin characteristic, which requires networks to attend to global shape and local location information simultaneously. In addition, complex lanes such as Y-shape and Fork-shape lanes are common in current autonomous driving scenarios, yet existing methods often fail in these challenging scenes, as shown in Fig. 1(a).

To address this important limitation of current algorithms, we propose a more accurate lane detection solution for unconstrained driving scenarios, called RCLane, inspired by the idea of a Relay Chain to attend to local location and global shape information of lanes at the same time. Each foreground point on a lane can be treated as a relay station for recovering the whole lane sequentially in a chain mode. Relay station construction is proposed to strengthen the model's ability to learn local information, which is fundamental for describing the flexible shapes of lanes. Specifically, we construct a transfer map representing the relative location from the current pixel to its two neighbors on the same lane. Furthermore, we apply a bilateral prediction strategy to improve generalization to lanes with complex topologies. Finally, we design a global shape message learning module: it predicts a distance map describing the distance from each foreground point to the two end points of the same lane. The contributions of this work are as follows:

  • We propose a novel relay chain representation for lanes to model the global geometric shape and local location information of lanes simultaneously.

  • We introduce a novel pair of lane encoding and decoding algorithms to facilitate the process of lane detection with relay chain representation.

  • Extensive experiments on four major lane detection benchmarks show that our approach beats the state-of-the-art alternatives, often by a clear margin, and achieves real-time performance.

2 Related Work

Existing methods for lane detection can be categorized into four groups: segmentation-based methods, proposal-based methods, row-wise methods, and polynomial regression methods.

Segmentation-Based Methods. Segmentation-based methods [7, 12, 13, 20, 21] typically make predictions based on pixel-wise classification. Each pixel is classified as either lane or background to generate a binary segmentation mask, and a post-processing step then decodes the mask into a set of lanes. However, it remains challenging to assign different points to their corresponding lane instances. A common solution is to predict an instance segmentation mask, but the number of lanes then has to be predefined and fixed, which is not robust for real driving scenarios.

Proposal-Based Methods. Proposal-based methods [4, 26, 32] take a top-down pipeline that directly regresses the relative coordinates of lane shapes. Nevertheless, they tend to struggle on lanes with complex topologies, such as curve lanes and Y-shape lanes: the fixed anchor shape is a major flaw when regressing variable lane shapes in hard scenes.

Row-Wise Methods. Based on a grid division of the input image, row-wise detection approaches [6, 15, 22, 23, 33] have achieved great progress in terms of accuracy and efficiency. Generally, row-wise methods directly predict the lane position for each row and construct the set of lanes through post-processing. However, detecting nearly horizontal lanes, which fall within a small range of rows, remains a major problem.

Polynomial Regression Methods. Polynomial regression methods [16, 27] directly output polynomials representing each lane. A deep network was first used in [27] to predict the lane curve equation, along with the domains of these polynomials and confidence scores for each lane. [16] uses a transformer [30] to learn richer structures and context, reframing the lane detection output as the parameters of a lane shape model. However, despite the fast speed polynomial regression methods achieve, they still fall some distance short of state-of-the-art results.

Fig. 2.

Schematic illustration of the proposed RCLane. A standard SegFormer [31] is used as the backbone. The output head consists of three branches: the segment head predicts the segmentation map (S), while the distance head and the transfer head predict the distance map (D) and transfer map (T) respectively. Both maps contain forward and backward parts. Point-NMS is then applied to sparsify the segmentation result. All predictions are fed into the lane decoder (Fig. 5) to obtain the final results.

3 Method

Given an input image \(I \in \mathbb {R}^{H \times W \times C}\), the goal of RCLane is to predict a collection of lanes \(L = \{l_1, l_2, \cdots , l_N\}\), where N is the total number of lanes. Generally, each lane \(l_k\) is represented as follows:

$$\begin{aligned} l_k = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \cdots , (x_{N_k}, y_{N_k})\}, \end{aligned}$$
(1)

The overall structure of our RCLane is shown in Fig. 2. This section first presents the concept of lane detection with a relay chain, then introduces the lane encoder for relay station construction, followed by a lane decoder that recovers curved lanes. Finally, the network architecture and the losses we adopt are detailed.

3.1 Lane Detection with Relay Chain

Focusing on combining local location and global shape information to detect lanes with complex topologies, we propose RCLane, a novel lane detection method built on the idea of a relay chain. A relay chain is a structure composed of relay stations connected in a chain mode. Each relay station is responsible for processing data and transmitting it to adjacent stations, while the chain organizes these stations from an overall perspective. Each station is associated with a corresponding lane point.

We design the relay chain structure to combine local location and global geometry information for lane detection. Specifically, each foreground point on a lane is treated as a relay station that can extend to its neighboring points iteratively, decoding the lane in a chain mode. All foreground points are supervised with the two kinds of information mentioned above. Moreover, the chain structure is flexible enough to fit lanes with complex topologies.

Next, we introduce relay station construction, bilateral predictions for complex topologies, and global shape message learning, explaining step by step how lanes are detected with the idea of a relay chain.

Relay Station Construction. Segmentation-based approaches normally predict all foreground points on lanes and cluster them via post-processing. [1] predicts horizontal and vertical affinity fields for clustering and associating pixels belonging to the same lane. [24] regresses a vector describing the local geometry of the curve that the current pixel belongs to and further refines the shape in the decoding algorithm. Nevertheless, both fix the vertical interval between adjacent points and decode lanes row by row from bottom to top. In effect, horizontal offsets refine the positions of current points while vertical offsets explore their vertical neighbors, and the fixed vertical offsets cannot adapt to the high degree of freedom of lanes; for example, they can only detect a fraction of nearly horizontal lanes. Thus, we propose a relay station construction module to establish relationships between neighboring points on a lane. Each relay station \(p=(p_x, p_y)\) predicts offsets to its neighboring points \(p^{next}=(p^{next}_x, p^{next}_y)\) on the same lane with a fixed step length d in two directions, as shown in Eqs. 2 and 3. By eliminating the vertical constraint, the deformation trend of lanes can be fitted considerably better. All relay stations are then connected to form a chain, which is exactly the lane.

$$\begin{aligned} (p^{next}_x, p^{next}_y) = (p_x, p_y) + (\varDelta x, \varDelta y), \end{aligned}$$
(2)
$$\begin{aligned} \varDelta x^2 + \varDelta y^2 = d^2. \end{aligned}$$
(3)
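As a minimal illustrative sketch (not the authors' code) of Eqs. 2 and 3, the step from a relay station to its neighbor can be written as below; the direction angle `theta` is a hypothetical input, since in RCLane the offsets are predicted by the transfer head.

```python
import math

def relay_step(p, theta, d=5.0):
    """Move from point p = (x, y) by the fixed step length d along direction theta."""
    dx, dy = d * math.cos(theta), d * math.sin(theta)
    # By construction dx**2 + dy**2 == d**2, i.e. Eq. (3) holds.
    return (p[0] + dx, p[1] + dy)

print(relay_step((100.0, 200.0), math.radians(30.0)))
```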

Bilateral Predictions for Complex Topologies. Current autonomous driving scenarios contain lanes with complex topologies, such as Y-shape and Fork-shape lanes, which can be regarded as two lanes merging into a shared stem. One-way prediction can only detect one of these lanes, because it can only extend along one limb when starting from the stem. We therefore adopt a two-way detection strategy that splits the next neighboring point \(p^{next}\) into a forward point \(p^f\) and a backward point \(p^b\). Points on different limbs can then recover the lanes they belong to and compose the final Y-shape or Fork-shape lanes, as illustrated in Fig. 3(b). Let F denote the output feature map from the backbone, whose resolution is 1/4 of the original image. We design a transfer output head that takes F as input: F goes through the convolution-based transfer head to produce the transfer map T, which consists of forward and backward components \(T_f, T_b \in \mathbb {R}^{H \times W \times 2}\). Each location in \(T_f\) is a 2D vector representing the offset between the forward neighboring point \(p^f\) and the current pixel p; \(T_b\) is defined analogously. Consequently, we can detect the forward and backward neighboring points \(p^f\), \(p^b\) of p guided by T.

$$\begin{aligned} p^f = p + T_f(p),\quad {p^b} = p + T_b(p). \end{aligned}$$
(4)
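The following hedged sketch shows how Eq. 4 reads the bilateral transfer map at a pixel p; shapes follow the text (\(T_f, T_b \in \mathbb{R}^{H \times W \times 2}\)), and the random arrays are placeholders standing in for trained network predictions.

```python
import numpy as np

H, W = 80, 200                    # feature resolution (placeholder)
T_f = np.random.randn(H, W, 2)    # forward (dx, dy) offsets
T_b = np.random.randn(H, W, 2)    # backward (dx, dy) offsets

def bilateral_neighbors(p, T_f, T_b):
    """Return the forward and backward neighbors of p = (x, y), per Eq. (4)."""
    x, y = p
    p_f = (x + T_f[y, x, 0], y + T_f[y, x, 1])
    p_b = (x + T_b[y, x, 0], y + T_b[y, x, 1])
    return p_f, p_b

p_f, p_b = bilateral_neighbors((50, 40), T_f, T_b)
```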
Fig. 3.

(a) illustrates the transfer vectors and distance scalars for \(p_i\): \(\overline{T}_{f,b}(p_i)\) are the forward and backward transfer vectors, and \(\overline{D}_{f,b}(p_i)\) are the forward and backward distance scalars. (b) shows that our bilateral predictions can not only decode Y-shape or Fork-shape lanes, but also fit simple structures such as straight and curved lanes.

With the guidance of local location information in transfer map T, the whole lane can be detected iteratively via bilateral strategy.

Global Shape Message Learning. Previous works predict the positions of lane end points to guide the decoding process. FastDraw [22] predicts end tokens to encode the global geometry, while CondLaneNet [15] recovers the row-wise shape through vertical range prediction. These methods, however, ignore the relation between the end points and the other points on the same lane. We make every relay station learn the global shape message transmitted along the chain by exploiting this relation. In detail, we design a distance head to predict the distance map D, which consists of forward and backward components \(D_f, D_b \in \mathbb {R}^{H \times W \times 1}\). Each location in \(D_f\) is a scalar representing the distance from the current pixel p to the forward end point \(p_{end}^{f}\) on the lane; \(D_b\) is defined analogously. With this global shape information, we know when to stop the lane decoding process: the number of decoding iterations for the forward branch of p is \(\frac{D_f}{d}\). With the combination of local location and global geometry information, our relay chain prediction strategy performs well even in complex scenarios. Next, we introduce the novel pair of lane encoding and decoding algorithms designed for lane detection.
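As a small illustration of this stopping rule (flooring to an integer is our assumption; the text writes \(\frac{D_f}{d}\) directly):

```python
def num_steps(D_p: float, d: float = 5.0) -> int:
    """Number of decoding iterations for one branch of p, i.e. D(p) / d, floored."""
    return int(D_p // d)

print(num_steps(42.0))  # with D_f(p) = 42 and d = 5 -> 8 forward extensions
```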

3.2 Lane Encoder for Relay Station Construction

The lane encoder creates the supervision for the transfer and distance maps used in training. Given an image \(I \in \mathbb {R}^{H \times W \times 3}\) and its segmentation mask \(\overline{S} \in \mathbb {R}^{H \times W \times 1}\), for any foreground point \(p_i = (x_i, y_i) \in \overline{S}\) we denote its corresponding lane as \(\gamma _L\). The forward and backward end points of \(\gamma _L\) are denoted as \(p_{end}^{f} = (x_{end}^{f}, y_{end}^{f})\) and \(p_{end}^{b} = (x_{end}^{b}, y_{end}^{b})\), which have the minimum and maximum y-coordinates respectively. The forward distance scalar \(\overline{D}_f(p_i)\) and backward distance scalar \(\overline{D}_b(p_i)\) of \(p_i\) are formulated as follows:

$$\begin{aligned} \overline{D}_f(p_i) = \sqrt{(x_i-x_{end}^{f})^2 + (y_i-y_{end}^{f})^2}, \end{aligned}$$
(5)
$$\begin{aligned} \overline{D}_b(p_i) = \sqrt{(x_i-x_{end}^{b})^2 + (y_i-y_{end}^{b})^2}. \end{aligned}$$
(6)
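A minimal sketch of the distance-label encoding in Eqs. 5 and 6: for each point on a ground-truth lane polyline, we record its Euclidean distance to the two end points. The polyline below is a made-up example, not dataset ground truth.

```python
import math

def distance_labels(lane):
    """lane: list of (x, y) points ordered from the forward end point (min y)
    to the backward end point (max y); returns per-point D_f and D_b labels."""
    xf, yf = lane[0]    # forward end point
    xb, yb = lane[-1]   # backward end point
    D_f = [math.hypot(x - xf, y - yf) for x, y in lane]  # Eq. (5)
    D_b = [math.hypot(x - xb, y - yb) for x, y in lane]  # Eq. (6)
    return D_f, D_b

D_f, D_b = distance_labels([(120, 10), (118, 60), (110, 110), (95, 160)])
```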
Fig. 4.

Lane encoder. All foreground points are matched with their nearest lanes. The arrows in each circle indicate the transfer vectors from a foreground point to its two neighbors on the lane. The distance scalars represent the distances between the current point and the two end points of the lane. All results are generated by point-wise traversal.

To generate the forward and backward transfer vectors for a pixel \(p_i\), we first find its two neighbors on \(\gamma _L\) at the fixed distance d, denoted \(p_i^f=(x_i^{f}, y_i^{f})\) and \(p_i^b=(x_i^{b}, y_i^{b})\) for the forward and backward neighbor respectively. The forward transfer vector \(\overline{T}_f(p_i)\) and the backward transfer vector \(\overline{T}_b(p_i)\) for pixel \(p_i\) are then defined as:

$$\begin{aligned} \overline{T}_f(p_i) = (x_i^{f} - x_i, y_i^{f} - y_i), \end{aligned}$$
(7)
$$\begin{aligned} \overline{T}_b(p_i) = (x_i^{b} - x_i, y_i^{b} - y_i), \end{aligned}$$
(8)
$$\begin{aligned} \vert \vert \overline{T}_f(p_i) \vert \vert _2 = \vert \vert \overline{T}_b(p_i) \vert \vert _2 = d. \end{aligned}$$
(9)

The details are shown in Fig. 3(a). In addition, consider the two branches of one Y-shape lane: \(l_1 = \{(x_1, y_1), \cdots , (x_m, y_m), (x^1_{m+1}, y^1_{m+1}), \cdots , (x^1_{n_1}, y^1_{n_1})\}\) and \(l_2 = \{(x_1, y_1), \cdots , (x_m, y_m), (x^2_{m+1}, y^2_{m+1}), \cdots , (x^2_{n_2}, y^2_{n_2})\}\), where \(\{(x_1, y_1), \cdots , (x_m, y_m)\}\) is the shared stem. We randomly choose one of \((x^1_{m+1}, y^1_{m+1})\) and \((x^2_{m+1}, y^2_{m+1})\) as the forward neighboring point of \((x_m, y_m)\), while \((x_m, y_m)\) is the common backward neighboring point of both. All foreground pixels in \(\overline{S}\) are processed following the same formulas to generate \(\overline{T}_{f,b}\) and \(\overline{D}_{f,b}\). The process is shown in Fig. 4, and a sketch of the encoding follows below.
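The following hedged sketch generates transfer labels for one point of a densely sampled polyline, per Eqs. 7–9. The nearest-match search for the neighbor at distance d is our assumption; the paper only requires \(\vert \vert \overline{T} \vert \vert _2 = d\).

```python
import math

def transfer_labels(lane, i, d=5.0):
    """lane: densely sampled (x, y) polyline from forward end to backward end;
    returns the forward/backward transfer vectors of point i (Eqs. 7-8)."""
    xi, yi = lane[i]

    def closest(indices):
        # Pick the polyline point whose distance to p_i is closest to d.
        if not indices:
            return None  # p_i is an end point on this side
        j = min(indices, key=lambda k: abs(math.hypot(lane[k][0] - xi,
                                                      lane[k][1] - yi) - d))
        return (lane[j][0] - xi, lane[j][1] - yi)

    T_f = closest(list(range(0, i)))              # forward side (towards min y)
    T_b = closest(list(range(i + 1, len(lane))))  # backward side (towards max y)
    return T_f, T_b
```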

3.3 Lane Decoder with Transfer and Distance Map

With the predictions of local location and global geometry, we propose a novel lane decoding algorithm to detect all curves in a given image.

Fig. 5.

Illustration of the lane decoder. The forward branch predicts the forward part of the lane via the forward transfer map \(T_f\) and forward distance map \(D_f\). The backward part is decoded similarly by the backward branch.

Given the predicted binary segmentation mask S, transfer map T and distance map D, we collect all foreground points of S and apply Point-NMS to obtain a sparse set of key points K. Every key point \(p \in K\) serves as a start point for recovering one global curve; a sketch of the Point-NMS step follows below.
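A hedged sketch of Point-NMS over the segmentation scores: the minimum distance \(\tau = 2\) between kept points is taken from Sect. 4.1, while greedy suppression by confidence is our guess at the exact rule.

```python
import numpy as np

def point_nms(score_map, thresh=0.5, tau=2.0):
    """Keep high-confidence foreground points that are at least tau pixels
    away from every already-kept point."""
    ys, xs = np.where(score_map > thresh)
    order = np.argsort(-score_map[ys, xs])  # most confident first
    kept = []
    for idx in order:
        p = np.array([xs[idx], ys[idx]], dtype=float)
        if all(np.linalg.norm(p - q) >= tau for q in kept):
            kept.append(p)
    return kept  # the sparse key-point set K
```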

Step 1: Find the forward transfer vector \(T_f(p)\) and forward distance scalar \(D_f(p)\) for p. The number of steps to extend along the forward branch is \(M^{f} = \frac{D_f(p)}{d}\); in other words, \(D_f(p)\) tells us where the forward end point of p lies on the lane.

Here d is the step length. The forward neighbor pixel \(p_{i+1}^{f}\) of \(p_i^{f}\) can then be calculated iteratively by:

$$\begin{aligned} p_{i+1}^{f} = p_i^{f} + T_f({p_i^{f}}),\ i \in \{0, 1, 2, \cdots , M^{f} - 1\},\ p_0^{f}=p. \end{aligned}$$
(10)

The forward branch of the curve can be recovered by connecting \(\{p, p_1^{f}, \cdots , p_{M^{f}}^{f}\}\) sequentially. The detail is shown at the top of Fig. 5.

Step 2: We calculate the point set \(\{p, p_1^{b}, p_2^{b}, \cdots , p_{M^{b}}^{b}\}\) following Eq. 10 via \(T_b\) and \(D_b\), and connect the points sequentially to recover the backward branch.

Step 3: We then merge the backward and forward branches to obtain the global curve:

$$\begin{aligned} \gamma _L=\{p_{M^{b}}^{b}, \cdots , p_2^{b}, p_1^{b}, p, p_1^{f}, p_2^{f}, \cdots , p_{M^{f}}^{f}\}. \end{aligned}$$
(11)

Finally, non-maximum suppression [19] is performed on all predicted curves to obtain the final results.
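Putting Steps 1–3 together, the following hedged Python sketch decodes one lane from a key point, assuming numpy maps indexed as [y, x]; the coordinate rounding and clipping are our implementation choices, not specified in the paper.

```python
import numpy as np

def walk(p, T, M, H, W):
    """Extend M steps from p = (x, y) following transfer map T (Eq. 10)."""
    pts, cur = [], np.asarray(p, dtype=float)
    for _ in range(M):
        x = int(round(np.clip(cur[0], 0, W - 1)))
        y = int(round(np.clip(cur[1], 0, H - 1)))
        cur = cur + T[y, x]
        pts.append(cur.copy())
    return pts

def decode_lane(p, T_f, T_b, D_f, D_b, d=5.0):
    """Steps 1-3: walk both branches from key point p and merge them (Eq. 11)."""
    H, W = D_f.shape
    x, y = p
    M_f = int(D_f[y, x] // d)  # forward step budget from the distance map
    M_b = int(D_b[y, x] // d)  # backward step budget
    fwd = walk(p, T_f, M_f, H, W)
    bwd = walk(p, T_b, M_b, H, W)
    return bwd[::-1] + [np.asarray(p, dtype=float)] + fwd
```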

3.4 Network Architecture

The overall framework is shown in Fig. 2. SegFormer [31] is utilized as our network backbone, aiming to extract global contextual information and learn the long and thin structure of lanes. SegFormer-B0, B1 and B2 are used as the small, medium and large backbones in our experiments respectively. Given an image \(I \in \mathbb{R}^{H\times W \times 3}\), the segmentation head predicts the binary segmentation mask \(S \in \mathbb{R}^{H \times W \times 1}\), the transfer head predicts the transfer map T consisting of the forward and backward parts \(T_f, T_b \in \mathbb {R}^{H \times W \times 2}\), and the distance head predicts the distance map D consisting of \(D_f, D_b \in \mathbb {R}^{H \times W \times 1}\).
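A shape-level sketch of the three output branches follows, written in PyTorch for familiarity even though the paper implements RCLane in MindSpore; the single 1x1-convolution heads and the channel count in_ch=256 are our assumptions, not the authors' exact layers.

```python
import torch
import torch.nn as nn

class RCLaneHeads(nn.Module):
    """Illustrative output heads matching the map shapes in Fig. 2."""
    def __init__(self, in_ch=256):
        super().__init__()
        self.seg_head = nn.Conv2d(in_ch, 1, 1)       # S: binary segmentation
        self.transfer_head = nn.Conv2d(in_ch, 4, 1)  # T = (T_f, T_b), 2 channels each
        self.distance_head = nn.Conv2d(in_ch, 2, 1)  # D = (D_f, D_b), 1 channel each

    def forward(self, feat):
        return (self.seg_head(feat),
                self.transfer_head(feat),
                self.distance_head(feat))

S, T, D = RCLaneHeads()(torch.randn(1, 256, 80, 200))
```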

3.5 Loss Function

To train our proposed model, we adopt different losses for the different predictions. For the binary segmentation mask, we adopt the OHEM loss [25] to counter the class imbalance caused by the sparsity of lane segmentation points. The OHEM loss is formulated as follows:

$$\begin{aligned} L_{seg} = -\frac{1}{N_{pos}+N_{neg}}\Big (\sum _{i \in S_{pos}}y_i \log (p_i)+\sum _{i \in S_{neg}}(1-y_i)\log (1-p_i)\Big ), \end{aligned}$$
(12)

where \(S_{pos}\) is the set of positive points and \(S_{neg}\) is the set of hard negative points, i.e. those most likely to be misclassified as positive. \(N_{pos}\) and \(N_{neg}\) denote the numbers of points in \(S_{pos}\) and \(S_{neg}\) respectively, and the ratio of \(N_{neg}\) to \(N_{pos}\) is a hyperparameter \(\mu \). For the per-pixel transfer and distance maps, we simply adopt the smooth \(L_1\) loss, denoted \(L_{T}\) and \(L_{D}\):

$$\begin{aligned} L_D = \frac{1}{N_{pos}}\sum _{i \in S_{pos}}L_{smooth_{L_1}}(D(p_i), \overline{D}(p_i)), \end{aligned}$$
(13)
$$\begin{aligned} L_T = \frac{1}{N_{pos}}\sum _{i \in S_{pos}}L_{smooth_{L_1}}(T(p_i), \overline{T}(p_i)). \end{aligned}$$
(14)

In the training phase, the total loss is defined as follows:

$$\begin{aligned} L_{total} = L_{seg} + L_{T} + L_{D}. \end{aligned}$$
(15)
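A hedged sketch of Eqs. 12–15: OHEM-style binary cross-entropy over positives plus the hardest negatives, and smooth-L1 on the transfer and distance maps over positive pixels only. The flattened tensor layout and the exact masking/reduction details are our assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(seg_logit, seg_gt, T_pred, T_gt, D_pred, D_gt, mu=15):
    """seg_*: (N,) flattened per-pixel float tensors; T_*/D_*: (N, C) per-pixel maps."""
    pos = seg_gt > 0.5
    bce = F.binary_cross_entropy_with_logits(seg_logit, seg_gt, reduction='none')
    k = min(mu * int(pos.sum()), int((~pos).sum()))
    hard_neg = bce[~pos].topk(k).values               # OHEM: keep hardest negatives
    L_seg = (bce[pos].sum() + hard_neg.sum()) / (int(pos.sum()) + k)  # Eq. (12)
    L_T = F.smooth_l1_loss(T_pred[pos], T_gt[pos])    # Eq. (14)
    L_D = F.smooth_l1_loss(D_pred[pos], D_gt[pos])    # Eq. (13)
    return L_seg + L_T + L_D                          # Eq. (15)
```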

4 Experiment

4.1 Experimental Setting

Dataset. We conduct experiments on four widely used lane detection benchmarks: CULane [21], TuSimple [29], LLAMAS [2] and CurveLanes [32]. CULane consists of 55 hours of video covering nine different scenarios: normal, crowded, dazzle light, shadow, no line, arrow, curve, cross and night. The TuSimple dataset is collected on highways under stable lighting conditions. LLAMAS is a large lane detection dataset captured in highway scenes, with annotations auto-generated from high-definition maps. CurveLanes is a recently proposed benchmark containing cases with complex topologies, such as Y-shape lanes and dense lanes. Details of the four datasets are given in Table 1.

Table 1. Lane detection datasets.
Table 2. State-of-the-art comparison on CULane. Even the small version of our RCLane achieves state-of-the-art performance with only 6.3M parameters.

Evaluation Metrics. For CULane, CurveLanes and LLAMAS, we use the F1-measure as the evaluation metric. For TuSimple, accuracy is the official indicator, and we additionally report the F1-measure. The calculation follows the same formulas as CondLaneNet [15].

Implementation Details. The small, medium and large versions of our RCLane are used on all four datasets. Unless explicitly indicated otherwise, the input resolution is set to \(320 \times 800\) during training and testing. For all training runs we use the AdamW optimizer [17] with a batch size of 32, training for 20 epochs on CULane, CurveLanes and LLAMAS and for 70 epochs on TuSimple. The learning rate is initialized to 6e-4 with a "poly" LR schedule. We set the line-IoU threshold \(\eta \) to 15, the ratio \(\mu \) of \(N_{neg}\) to \(N_{pos}\) to 15, and the minimum distance \(\tau \) between any two foreground pixels in Point-NMS to 2. We implement our method in MindSpore [18] on Ascend 910.
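For concreteness, one possible realization of this schedule (sketched in PyTorch for familiarity, though the paper uses MindSpore; the decay exponent 0.9 and the iteration count are placeholders we assume, not reported values):

```python
import torch

model = torch.nn.Conv2d(3, 64, 3)   # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=6e-4)
total_iters = 20 * 3000             # epochs x iterations per epoch (placeholder)
poly = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda it: (1.0 - it / total_iters) ** 0.9)  # "poly" decay
```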

4.2 Results

CULane. As shown in Table 2, RCLane achieves a new state-of-the-art result on the CULane testing set with an 80.50% F1-measure. Compared with CondLaneNet [15], the best previous model to our knowledge, our method improves the F1-measure by only 1.02%, since CULane is a relatively simple dataset with many straight lanes; however, it shows considerable improvements in the crowded and curve scenes, which demonstrates that the relay chain can strengthen local location connectivity through global shape learning under local occlusions and complex lane topologies.

Table 3. Performance of different methods on CurveLanes.

CurveLanes. CurveLanes [32] is a challenging benchmark with many hard scenarios. The evaluation results are shown in Table 3. Our largest model (with SegFormer-B2) surpasses CondLaneNet-L by 5.33% in F1-measure, a larger gain than on CULane. Since CurveLanes contains more complex Fork-shape, Y-shape and other curved lanes, the improvements in both recall and precision show that RCLane generalizes well to lanes with complex topologies.

TuSimple. The results on TuSimple are shown in Table 4. As TuSimple is a small dataset with relatively simple scenes and accurate annotations, the gap between methods is small. Nevertheless, our method achieves a new state-of-the-art F1 score of 97.64%.

Table 4. Performance of different methods on TuSimple.

LLAMAS. LLAMAS [2] is a new dataset with more than 100K images from highway scenarios. The results of our RCLane on LLAMAS are shown in Table 5. The best result of our method is a 96.13% F1 score with RCLane-L.

Table 5. Performance of different methods on LLAMAS.

4.3 Ablation Study

Different Modules. In this section, we perform an ablation study on CurveLanes to evaluate the impact of the proposed relay station construction, bilateral predictions and global shape message learning. The results are shown in Table 6. The first row shows the baseline result, which uses only binary segmentation plus DBSCAN [5] post-processing to detect lanes. In the second row, the lane is recovered gradually from bottom to top guided by the forward transfer map and forward distance map, while the third row detects lanes from top to bottom. In the fourth row, we use only the forward and backward transfer maps to predict the lane. The last row presents our full version of RCLane, which attains a new state-of-the-art result of 91.43% on CurveLanes.

Comparing the first two rows, we can see that the proposed relay station construction greatly improves performance. Adding global shape information learning with the distance map further improves performance from 88.19% to 91.43%. In the two one-way experiments of the second and third rows, lanes are detected by transfer and distance maps in a single direction, and a clear gap to the highest F1-score remains, which shows that our bilateral prediction generalizes better in depicting lane topologies. In addition, there is a gap between the forward and backward models. Since the near lanes (the bottom region of the image) are usually occluded by the ego car, the corresponding lane points receive low confidence scores from the segmentation results; the starting points therefore usually lie outside the occluded area, and the forward model never gets back to cover the lanes at the bottom of the image. In contrast, the backward model, decoding from the top with the help of the distance map, detects lanes more completely, including the occluded area.

Table 6. Comparison of different components on CurveLanes. The \(T_f\), \(T_b\), \(D_f\), \(D_b\) represent the forward transfer map, backward transfer map, forward distance map and backward distance map respectively.

Comparisons with Other Methods Using the Same Backbone. We additionally train CondLaneNet [15] and LaneAF [1] with SegFormer-B2 [31] as the backbone and report the results in Table 7. Without changing their model parameters, our model still outperforms LaneAF and CondLaneNet by a clear margin on the CULane [21] dataset, owing to its superior precision, which demonstrates the high quality of the lanes detected by RCLane. This further verifies, under a fair comparison, the superiority of our relay chain prediction method, which processes local location and global geometry information simultaneously to improve model capacity.

Table 7. Comparisons with other methods using the same backbone Segformer-B2.
Fig. 6.

Visualization of network outputs. A.(1, 3) are features of \(D_f\) and \(D_b\), while A.(2, 4) are features of \(T_f\) and \(T_b\). A.(5) is the segmentation result, which becomes the sparse map A.(6) via Point-NMS. B is a harder frame than A.

Local Location and Global Shape Message Modeling. The transfer map in Fig. 6 A.(2, 4) captures local location information, depicting the topology of the lane precisely, while the distance map in Fig. 6 A.(1, 3) models global shape information with a large receptive field. Furthermore, in some driving scenarios lane information is lost where lane markings fade or disappear, as shown in Fig. 6(B). Nevertheless, these lanes are still faintly captured in the transfer map thanks to global shape information learning. The results show the robustness of our RCLane with local location and global shape message modeling.

5 Conclusion

In this paper, we have proposed to solve the lane detection problem by learning a novel relay chain prediction model. Compared with existing lane detection methods, our model captures global geometry and local location information progressively through the novel relay station construction and global shape message learning, while bilateral predictions adapt to hard topologies such as Fork-shape and Y-shape lanes. Extensive experiments on four benchmarks, CULane, CurveLanes, TuSimple and LLAMAS, demonstrate the state-of-the-art performance and generalization ability of our RCLane.