HCLT-YOLO: A Hybrid CNN and Lightweight Transformer Architecture for Object Detection in Complex Traffic Scenes

Citations: 0

Authors:

Chen, Zhige ^{[1]}

Yang, Kai ^{[1]}

Wu, Yandong ^{[1]}

Yang, Hao ^{[2]}

Tang, Xiaolin ^{[1]}

Affiliations:

[1] Chongqing Univ, Coll Mech & Vehicle Engn, State Key Lab Mech Transmiss Adv Equipment, Chongqing 400030, Peoples R China

[2] Chongqing Univ Technol, Key Lab Adv Mfg Technol Automobile Parts, Minist Educ, Chongqing 400054, Peoples R China

Source:

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY | 2025 / Vol. 74 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Autonomous driving; deep learning; lightweight transformer; traffic sign detection; FRAMEWORK;

DOI:

10.1109/TVT.2024.3496513

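Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract: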

The swift and accurate detection of traffic signs in traffic scenes is a pivotal aspect of environmental perception technology in autonomous driving systems. Traffic signs provide essential road information and regulatory instructions, which are critical to ensuring road safety. This paper presents the HCLT-YOLO model to address the challenges of false alarms and missed detections in complex traffic environments. Specifically, we propose a novel hybrid CNN-transformer network architecture that efficiently integrates both local and global features, thereby improving traffic sign feature representation. To further enhance the model's sensitivity to small traffic signs, we optimize the structure by introducing a dedicated small-object detection layer through upsampling and by leveraging SIoU to improve detection accuracy and computational efficiency. However, the addition of the small-object detection layer and the Transformer module increases the overall computational complexity and parameter count, potentially affecting real-time performance. To address this issue, we introduce the DG-C2f module, which employs linear transformations for feature mapping, streamlining the convolution process and enhancing real-time feasibility. Experimental evaluations on the GTSDB and TT100K datasets demonstrate that the proposed model improves detection accuracy by 2.5% and 6.8%, respectively, compared to YOLOv8s models. Notably, the detection accuracy for small traffic signs improved significantly, by 6.9% and 11.7%, respectively. Additionally, processor-in-the-loop experiments on the NVIDIA Jetson AGX Orin show that the model achieves an inference speed of 46 FPS, meeting the real-time requirements for in-vehicle applications.
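The abstract does not describe the internals of the DG-C2f module. As a rough illustration of why replacing part of a convolution with cheap linear transformations can reduce parameter count, the sketch below assumes a Ghost-style design: a regular convolution produces a fraction of the output channels, and per-channel (depthwise) linear operations generate the rest. The function names, the `ratio` and `cheap_k` parameters, and the channel sizes are hypothetical and are not taken from the paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a plain k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_style_params(c_in: int, c_out: int, k: int,
                       ratio: int = 2, cheap_k: int = 3) -> int:
    """Weight count of an assumed Ghost-style replacement (bias ignored).

    A regular convolution produces c_out / ratio "intrinsic" channels;
    the remaining channels come from depthwise cheap_k x cheap_k filters
    applied to those intrinsic channels.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k                      # regular convolution
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k     # depthwise linear ops
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    full = standard_conv_params(64, 128, 3)
    ghost = ghost_style_params(64, 128, 3)
    print(f"standard: {full}, ghost-style: {ghost}, saving: {full / ghost:.2f}x")
```

Under these assumptions the layer needs roughly half as many weights, which is the kind of saving that could offset the cost of an extra detection layer and Transformer module; the figures actually reported for HCLT-YOLO may differ.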

Citation

Pages: 3681-3694

Page count: 14

