A multi-modal spatial-temporal model for accurate motion forecasting with visual fusion

Cited by: 7
Authors
Wang, Xiaoding [1 ,2 ]
Liu, Jianmin [1 ,2 ]
Lin, Hui [1 ,2 ]
Garg, Sahil [3 ]
Alrashoud, Mubarak [4 ]
Affiliations
[1] Fujian Normal Univ, Coll Comp & Cyber Secur, 8 Xuefu South Rd, Fuzhou 350117, Fujian, Peoples R China
[2] Fujian Prov Univ, Engn Res Ctr Cyber Secur & Educ Informatizat, 8 Xuefu South Rd, Fuzhou 350117, Fujian, Peoples R China
[3] Ecole Technol Super, Elect Engn Dept, Montreal, PQ H3C 1K3, Canada
[4] King Saud Univ, Coll Comp & Informat Sci CCIS, Dept Software Engn SWE, Riyadh 11543, Saudi Arabia
Keywords
Motion forecasting; Intelligent transportation; Spatial-temporal cross attention; Multi-source visual fusion;
DOI
10.1016/j.inffus.2023.102046
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-source visual information from ring cameras and stereo cameras provides a direct observation of the road, traffic conditions, and vehicle behavior. However, relying solely on visual information may not provide a complete understanding of the environment. It is therefore crucial for intelligent transportation systems to effectively exploit multi-source, multi-modal data to accurately predict the future motion trajectories of vehicles. This paper presents a new model for multi-modal trajectory prediction that integrates multi-source visual features. A spatial-temporal cross-attention fusion module is developed to capture the spatiotemporal interactions among vehicles while leveraging the road's geographic structure to improve prediction accuracy. Experimental results on the real-world Argoverse 2 dataset demonstrate that, compared with other methods, our model improves the metrics minADE (Minimum Average Displacement Error), minFDE (Minimum Final Displacement Error), and MR (Miss Rate) by 1.08%, 3.15%, and 2.14%, respectively, for unimodal prediction; for multimodal prediction, the improvements are 5.47%, 4.46%, and 6.50%. Our method effectively captures the temporal and spatial characteristics of vehicle trajectories, making it suitable for autonomous driving applications.
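The reported gains are measured with the standard Argoverse-style trajectory metrics. Below is a minimal sketch (not the authors' code) of how minADE, minFDE, and the per-scenario miss indicator behind MR are typically computed; the 2.0 m miss threshold follows the common Argoverse convention, and the function and variable names are illustrative assumptions.

```python
# Minimal sketch of minADE / minFDE / miss computation for one scenario.
# The 2.0 m threshold is the usual Argoverse convention, assumed here.
import numpy as np

def motion_forecasting_metrics(pred, gt, miss_threshold=2.0):
    """pred: (K, T, 2) candidate trajectories; gt: (T, 2) ground-truth future."""
    # Per-mode, per-timestep Euclidean displacement: shape (K, T)
    disp = np.linalg.norm(pred - gt[None, :, :], axis=-1)
    ade_per_mode = disp.mean(axis=1)   # average displacement of each mode
    fde_per_mode = disp[:, -1]         # endpoint displacement of each mode
    min_ade = ade_per_mode.min()       # best mode by average error
    min_fde = fde_per_mode.min()       # best mode by endpoint error
    miss = float(min_fde > miss_threshold)  # 1.0 if every mode misses the endpoint
    return min_ade, min_fde, miss

# Usage with random placeholder data (K=6 modes, 60 future steps, 2D positions)
rng = np.random.default_rng(0)
print(motion_forecasting_metrics(rng.normal(size=(6, 60, 2)), rng.normal(size=(60, 2))))
```

MR over a dataset is then the mean of the per-scenario miss indicators. As a second illustration, the sketch below shows one generic way a spatial-temporal cross-attention fusion layer could combine agent motion tokens with visual tokens from ring and stereo cameras; the module layout, tensor shapes, and names (SpatialTemporalCrossAttentionFusion, agent_tokens, visual_tokens) are assumptions for illustration, not the paper's actual architecture.

```python
# Generic spatial-temporal cross-attention fusion sketch (assumed layout, not the paper's).
import torch
import torch.nn as nn

class SpatialTemporalCrossAttentionFusion(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        # Temporal self-attention over an agent's own motion history.
        self.temporal_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention: motion tokens (queries) attend to visual tokens (keys/values).
        self.visual_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, agent_tokens, visual_tokens):
        # agent_tokens:  (B, T, d_model) embedded past trajectory of an agent
        # visual_tokens: (B, N, d_model) embedded multi-source camera features
        h, _ = self.temporal_attn(agent_tokens, agent_tokens, agent_tokens)
        h = self.norm1(agent_tokens + h)
        fused, _ = self.visual_cross_attn(h, visual_tokens, visual_tokens)
        return self.norm2(h + fused)
```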
Pages: 12
Related Papers
50 records in total
  • [1] Multi-modal Gait Recognition via Effective Spatial-Temporal Feature Fusion
    Cui, Yufeng
    Kang, Yimei
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 17949 - 17957
  • [2] Graph based Spatial-temporal Fusion for Multi-modal Person Re-identification
    Zhang, Yaobin
    Lv, Jianming
    Liu, Chen
    Cai, Hongmin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3736 - 3744
  • [3] Multi-modal sensor fusion for highly accurate vehicle motion state estimation
    Marco, Vicent Rodrigo
    Kalkkuhl, Jens
    Raisch, Joerg
    Scholte, Wouter J.
    Nijmeijer, Henk
    Seel, Thomas
    CONTROL ENGINEERING PRACTICE, 2020, 100
  • [4] DFMM-Precip: Deep Fusion of Multi-Modal Data for Accurate Precipitation Forecasting
    Li, Jinwen
    Wu, Li
    Liu, Jiarui
    Wang, Xiaoying
    Xue, Wei
    WATER (SWITZERLAND), 2024, 16 (24)
  • [5] A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting
    Li, Guanyao
    Zhong, Shuhan
    Deng, Xingdong
    Xiang, Letian
    Chan, S. -H. Gary
    Li, Ruiyuan
    Liu, Yang
    Zhang, Ming
    Hung, Chih-Chieh
    Peng, Wen-Chih
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 10967 - 10980
  • [6] Multi-Modal Pedestrian Trajectory Prediction for Edge Agents Based on Spatial-Temporal Graph
    Zou, Xiangyu
    Sun, Bin
    Zhao, Duan
    Zhu, Zongwei
    Zhao, Jinjin
    He, Yongxin
    IEEE ACCESS, 2020, 8 : 83321 - 83332
  • [7] Multi-Step Spatial-Temporal Fusion Network for Traffic Flow Forecasting
    Dong, Honghui
    Meng, Ziying
    Wang, Yiming
    Jia, Limin
    Qin, Yong
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 3412 - 3419
  • [8] MMSTN: A Multi-Modal Spatial-Temporal Network for Tropical Cyclone Short-term Prediction
    Huang, Cheng
    Bai, Cong
    Chan, Sixian
    Zhang, Jinglin
    GEOPHYSICAL RESEARCH LETTERS, 2022, 49 (04)
  • [9] Multi-modal transform-based fusion model for new product sales forecasting
    Li, Xiangzhen
    Shen, Jiaxing
    Wang, Dezhi
    Lu, Wu
    Chen, Yuanyi
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [10] Multi-modal Fusion
    Liu, Huaping
    Hussain, Amir
    Wang, Shuliang
    INFORMATION SCIENCES, 2018, 432 : 462 - 462