Pedestrian Crossing Intention Prediction with Multi-Modal Transformer-Based Model

Cited by: 0
Authors
Wang, Ting Wei [1 ]
Lai, Shang-Hong [1 ,2 ]
Affiliations
[1] Natl Tsing Hua Univ, Inst Informat Syst & Applicat, Hsinchu, Taiwan
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
DOI
10.1109/APSIPAASC58517.2023.10317161
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The growing adoption of autonomous driving and advanced driver assistance systems can potentially prevent thousands of car accidents and casualties. In particular, pedestrian prediction and protection is an urgent development priority for such systems. Predicting pedestrians' intentions to cross the road, or their future actions, helps such systems assess the risk to pedestrians in front of the vehicle in advance. In this paper, we propose a multi-modal pedestrian crossing intention prediction framework based on the Transformer model to provide a better solution. Our method exploits the excellent sequence modeling capability of the Transformer, enabling the model to perform stably on this task. We also propose a novel representation of traffic environment information so that such information can be exploited efficiently. Moreover, we include lifted 3D human pose and 3D head orientation information estimated from pedestrian images in the model's input, allowing the model to better understand pedestrian posture. Finally, our experimental results show that the proposed system achieves state-of-the-art accuracy on benchmark datasets.
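The abstract describes fusing several per-frame pedestrian cues (bounding box, lifted 3D pose, 3D head orientation, traffic environment information) and modeling the observation sequence with a Transformer to predict crossing intention. The sketch below is not the authors' implementation; it is a minimal illustration of that general architecture under assumed feature dimensions (17 pose joints, a 3-D head orientation vector, a 4-D box, a 32-D environment encoding) and a simple sum-based fusion, using standard PyTorch modules.

```python
# Minimal sketch (not the paper's code): a multi-modal Transformer encoder that
# fuses per-frame pedestrian cues and predicts a binary crossing intention.
# All dimensions, modality choices, and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn


class CrossingIntentTransformer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=3, seq_len=16,
                 pose_dim=17 * 3, head_dim=3, bbox_dim=4, env_dim=32):
        super().__init__()
        # One linear projection per modality, mapping raw features to d_model.
        self.proj_pose = nn.Linear(pose_dim, d_model)   # lifted 3D pose (17 joints x 3)
        self.proj_head = nn.Linear(head_dim, d_model)   # 3D head orientation
        self.proj_bbox = nn.Linear(bbox_dim, d_model)   # pedestrian bounding box
        self.proj_env = nn.Linear(env_dim, d_model)     # encoded traffic-environment vector
        # Learnable positional embedding over the observation window.
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cls_head = nn.Linear(d_model, 2)            # crossing / not crossing

    def forward(self, pose, head, bbox, env):
        # Each input: (batch, seq_len, feature_dim). Fuse modalities by summing
        # their projections, then model the sequence with self-attention.
        x = (self.proj_pose(pose) + self.proj_head(head)
             + self.proj_bbox(bbox) + self.proj_env(env)) + self.pos_emb
        x = self.encoder(x)
        # Mean-pool over time and classify the whole observation window.
        return self.cls_head(x.mean(dim=1))


if __name__ == "__main__":
    model = CrossingIntentTransformer()
    B, T = 2, 16
    logits = model(torch.randn(B, T, 51), torch.randn(B, T, 3),
                   torch.randn(B, T, 4), torch.randn(B, T, 32))
    print(logits.shape)  # torch.Size([2, 2])
```

In practice a cross-attention or token-per-modality fusion could replace the summation used here; the summation is chosen only to keep the sketch short.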
Pages: 1349-1356
Number of pages: 8
Related Papers
50 records in total
  • [31] MULTI-VIEW AND MULTI-MODAL EVENT DETECTION UTILIZING TRANSFORMER-BASED MULTI-SENSOR FUSION
    Yasuda, Masahiro
    Ohishi, Yasunori
    Saito, Shoichiro
    Harada, Noboru
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4638 - 4642
  • [32] Multi-modal pedestrian detection with misalignment based on modal-wise regression and multi-modal IoU
    Wanchaitanawong, Napat
    Tanaka, Masayuki
    Shibata, Takashi
    Okutomi, Masatoshi
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [33] Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction
    Bae, Inhwan
    Park, Jin-Hwi
    Jeon, Hae-Gon
    COMPUTER VISION, ECCV 2022, PT XXII, 2022, 13682 : 270 - 289
  • [34] Multi-modal long document classification based on Hierarchical Prompt and Multi-modal Transformer
    Liu, Tengfei
    Hu, Yongli
    Gao, Junbin
    Wang, Jiapu
    Sun, Yanfeng
    Yin, Baocai
    NEURAL NETWORKS, 2024, 176
  • [35] An intention-based multi-modal trajectory prediction framework for overtaking maneuver
    Zhang, Mingfang
    Liu, Ying
    Li, Huajian
    Wang, Li
    Wang, Pangwei
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 2850 - 2855
  • [36] FNR: a similarity and transformer-based approach to detect multi-modal fake news in social media
    Ghorbanpour, Faeze
    Ramezani, Maryam
    Fazli, Mohammad Amin
    Rabiee, Hamid R.
    SOCIAL NETWORK ANALYSIS AND MINING, 2023, 13 (01)
  • [38] SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings
    Nhat Truong Pham
    Duc Ngoc Minh Dang
    Bich Ngoc Hong Pham
    Sy Dzung Nguyen
    PROCEEDINGS OF 2023 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2023, 2023, : 234 - 238
  • [39] UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection
    Guo, Ruohao
    Ying, Xianghua
    Qi, Yanyu
    Qu, Liao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7622 - 7635
  • [40] Dual-attention transformer-based hybrid network for multi-modal medical image segmentation
    Zhang, Menghui
    Zhang, Yuchen
    Liu, Shuaibing
    Han, Yahui
    Cao, Honggang
    Qiao, Bingbing
    SCIENTIFIC REPORTS, 2024, 14 (01)