Pedestrian Crossing Intention Prediction with Multi-Modal Transformer-Based Model

Cited: 0
Authors
Wang, Ting Wei [1 ]
Lai, Shang-Hong [1 ,2 ]
Affiliations
[1] National Tsing Hua University, Institute of Information Systems and Applications, Hsinchu, Taiwan
[2] National Tsing Hua University, Department of Computer Science, Hsinchu, Taiwan
DOI: 10.1109/APSIPAASC58517.2023.10317161
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification: 081104; 0812; 0835; 1405
Abstract
The growing adoption of autonomous driving and advanced driver assistance systems can potentially prevent thousands of car accidents and casualties. In particular, pedestrian prediction and protection is an urgent development priority for such systems. Predicting pedestrians' intention to cross the road, or their future actions, helps such systems assess in advance the risk posed to pedestrians in front of the vehicle. In this paper, we propose a multi-modal pedestrian crossing intention prediction framework based on the Transformer model to provide a better solution. Our method takes advantage of the Transformer's excellent sequence-modeling capability, enabling the model to perform stably on this task. We also propose a novel representation of traffic environment information, allowing such information to be exploited efficiently. Moreover, we incorporate the lifted 3D human pose and 3D head orientation estimated from pedestrian images into the model's prediction, allowing the model to better understand pedestrian posture. Finally, our experimental results show that the proposed system achieves state-of-the-art accuracy on benchmark datasets.
Pages: 1349-1356 (8 pages)
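
The abstract describes the architecture only at a high level. The following is a minimal, hypothetical PyTorch sketch of how such a multi-modal Transformer might fuse per-frame bounding boxes, lifted 3D pose, 3D head orientation, and a traffic-environment descriptor into a crossing/not-crossing prediction. All module names, feature dimensions, and the additive per-frame fusion scheme are illustrative assumptions, not the authors' actual design.

# Hypothetical sketch of a multi-modal Transformer for crossing-intention
# prediction, loosely following the abstract. Dimensions and the fusion
# strategy are assumptions for illustration, not the paper's architecture.
import torch
import torch.nn as nn

class CrossingIntentionTransformer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=3, seq_len=16):
        super().__init__()
        # Per-modality linear embeddings into a shared token space.
        self.bbox_embed = nn.Linear(4, d_model)       # 2D bounding box (x, y, w, h)
        self.pose_embed = nn.Linear(17 * 3, d_model)  # lifted 3D pose, 17 joints (assumed)
        self.head_embed = nn.Linear(3, d_model)       # 3D head orientation (yaw, pitch, roll)
        self.env_embed = nn.Linear(8, d_model)        # traffic-environment descriptor (assumed size)
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, 2)  # crossing vs. not crossing

    def forward(self, bbox, pose, head_ori, env):
        # Each input: (batch, seq_len, feature_dim). Modalities are summed
        # into one token per time step, then a [CLS] token is prepended.
        tokens = (self.bbox_embed(bbox) + self.pose_embed(pose)
                  + self.head_embed(head_ori) + self.env_embed(env))
        tokens = tokens + self.pos_embed
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.classifier(out[:, 0])  # classify from the [CLS] token

# Usage with random tensors for a batch of 2 pedestrians over 16 frames.
model = CrossingIntentionTransformer()
logits = model(torch.randn(2, 16, 4), torch.randn(2, 16, 51),
               torch.randn(2, 16, 3), torch.randn(2, 16, 8))
print(logits.shape)  # torch.Size([2, 2])

Summing modality embeddings per time step is one simple fusion choice; cross-attention between modality streams would be a natural alternative under the same token layout.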