SinLane: Siamese Visual Transformer following Pyramid Feature Integration for Lane Detection

1Shanghai Jiao Tong University, 2Zhejiang University, 3University of Notre Dame
ECAI 2024


Overall architecture of our proposed SinLane network. The backbone first extracts multi-scale features from the input image. PFI is then applied to fully integrate global semantic information with local finer-scale features. The Siamese Visual Transformer (encoder and decoder) subsequently generates lane sequences. Specifically, \( e_0 \) is the initial lane sequence, and \( e_1 \), \( e_2 \), and \( e_3 \) denote refined lane sequences optimized by feature maps of different scales from the PFI.
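
For illustration, below is a minimal PyTorch-style sketch of this data flow. The module and parameter names (ToyPFI, ToySinLane, num_queries, dim) are our own placeholders, and the lane-sequence decoding is simplified to one standard Transformer decoder layer per pyramid level; this is a sketch under those assumptions, not the official SinLane implementation.

```python
# Minimal sketch of the SinLane-style data flow described above.
# ToyPFI and ToySinLane are placeholder names; shapes, the query count, and the
# use of plain nn.TransformerDecoderLayer are simplifying assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class ToyPFI(nn.Module):
    """Placeholder pyramid feature integration: fuse multi-scale backbone features."""

    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.laterals = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

    def forward(self, feats):
        # Project every level to a common width, then fuse top-down so global
        # semantics from coarse maps reach the finer-scale maps.
        outs = [lat(f) for lat, f in zip(self.laterals, feats)]
        for i in range(len(outs) - 1, 0, -1):
            outs[i - 1] = outs[i - 1] + F.interpolate(
                outs[i], size=outs[i - 1].shape[-2:], mode="nearest")
        return outs  # finest to coarsest, all with out_channels channels


class ToySinLane(nn.Module):
    def __init__(self, num_queries=8, dim=64):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)
        self.stages = nn.ModuleList([backbone.layer2, backbone.layer3, backbone.layer4])
        self.pfi = ToyPFI(in_channels=[128, 256, 512], out_channels=dim)
        self.queries = nn.Embedding(num_queries, dim)  # initial lane sequence e_0
        # One decoder step per pyramid level refines the lane sequence: e_1, e_2, e_3.
        self.decoders = nn.ModuleList(
            [nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(3)])

    def forward(self, img):
        x, feats = self.stem(img), []
        for stage in self.stages:                       # multi-scale backbone features
            x = stage(x)
            feats.append(x)
        pyramid = self.pfi(feats)                       # integrated multi-scale features
        e = self.queries.weight.unsqueeze(0).expand(img.size(0), -1, -1)  # e_0
        for dec, feat in zip(self.decoders, reversed(pyramid)):           # coarse to fine
            memory = feat.flatten(2).transpose(1, 2)    # (B, H*W, dim)
            e = dec(e, memory)                          # e_1, e_2, e_3
        return e                                        # refined lane sequence


lanes = ToySinLane()(torch.randn(1, 3, 320, 800))
print(lanes.shape)  # torch.Size([1, 8, 64])
```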

Abstract

Lane detection is an important yet challenging task in autonomous driving systems. Building on the development of the Visual Transformer, early Transformer-based lane detection studies have achieved promising results in some scenarios. However, under complex road conditions such as uneven illumination and heavy traffic, the performance of these methods remains limited and can even fall behind that of contemporaneous CNN-based methods. In this paper, we propose a novel Transformer-based end-to-end network, called SinLane, which obtains attention weights that focus on sparse yet meaningful locations and improves the accuracy of lane detection in complex environments. SinLane is composed of a novel Siamese Visual Transformer structure and a novel Feature Pyramid Network (FPN) structure called Pyramid Feature Integration (PFI). We utilize the proposed PFI to better integrate global semantics with finer-scale features and to facilitate the optimization of the Transformer. Moreover, the designed Siamese Visual Transformer is combined with multiple levels of the PFI and is employed to refine the multi-scale lane line features output by the PFI. Extensive experiments on three lane detection benchmarks demonstrate that our SinLane achieves state-of-the-art results with high accuracy and efficiency. In particular, SinLane improves accuracy by over 3% compared with the best-performing Transformer-based lane detection method on CULane. Our code has been released.

Attention Maps


Attention map examples of LSTR and our proposed SinLane. Both models are trained for the same number of epochs. The attention weights of LSTR concentrate on the middle area of the lane lines. In contrast, the attention weights of our method are distributed evenly from top to bottom along each lane line on the road.
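
For readers who want to reproduce this kind of visualization, the following is a minimal sketch (not the authors' plotting code) of how per-query cross-attention weights can be read out of a PyTorch nn.MultiheadAttention module and reshaped into an H x W map; the dimensions and variable names are illustrative assumptions.

```python
# Minimal sketch of extracting decoder cross-attention maps for visualization.
# The dimensions (8 lane queries, a 10x25 feature map, 64 channels) are
# illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

queries = torch.randn(1, 8, 64)        # lane queries (e.g. one refined sequence e_i)
memory = torch.randn(1, 10 * 25, 64)   # flattened feature map of height 10, width 25

# need_weights=True returns the attention weights; averaging over heads gives
# one (num_queries, H*W) map per image.
_, weights = cross_attn(queries, memory, memory,
                        need_weights=True, average_attn_weights=True)

attn_maps = weights.reshape(1, 8, 10, 25)  # one H x W heat map per lane query
print(attn_maps.shape)                     # torch.Size([1, 8, 10, 25])
```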

Qualitative Results


Visualization results of the ground truth (GT), LSTR, CLRNet, and our SinLane on the CULane benchmark. All results are generated with the same ResNet-18 backbone.

Quantitative Results


Comparison of recent methods and our method on the CULane dataset. To compare computation speed in the same environment, we re-measure FPS on the same machine with an RTX 3090 GPU using the open-source code (where available).


Comparison results on the TuSimple dataset.


Comparison results on the LLAMAS dataset.

Video Results on the CULane, TuSimple, and LLAMAS datasets (TODO)

Poster (TODO)

Supplementary Material