A multi-modal spatial-temporal model for accurate motion forecasting with visual fusion

Cited by: 7
Authors
Wang, Xiaoding [1 ,2 ]
Liu, Jianmin [1 ,2 ]
Lin, Hui [1 ,2 ]
Garg, Sahil [3 ]
Alrashoud, Mubarak [4 ]
Affiliations
[1] Fujian Normal Univ, Coll Comp & Cyber Secur, 8 Xuefu South Rd, Fuzhou 350117, Fujian, Peoples R China
[2] Fujian Prov Univ, Engn Res Ctr Cyber Secur & Educ Informatizat, 8 Xuefu South Rd, Fuzhou 350117, Fujian, Peoples R China
[3] Ecole Technol Super, Elect Engn Dept, Montreal, PQ H3C 1K3, Canada
[4] King Saud Univ, Coll Comp & Informat Sci CCIS, Dept Software Engn SWE, Riyadh 11543, Saudi Arabia
Keywords
Motion forecasting; Intelligent transportation; Spatial-temporal cross attention; Multi-source visual fusion
DOI
10.1016/j.inffus.2023.102046
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The multi-source visual information from ring cameras and stereo cameras provides a direct observation of the road, traffic conditions, and vehicle behavior. However, relying solely on visual information may not provide a complete understanding of the environment. It is crucial for intelligent transportation systems to effectively utilize multi-source, multi-modal data to accurately predict the future motion trajectories of vehicles. Therefore, this paper presents a new model for predicting multi-modal trajectories by integrating multi-source visual features. A spatial-temporal cross-attention fusion module is developed to capture the spatiotemporal interactions among vehicles while leveraging the road's geographic structure to improve prediction accuracy. Experimental results on the real-world Argoverse 2 dataset demonstrate that, compared with other methods, ours improves minADE (Minimum Average Displacement Error), minFDE (Minimum Final Displacement Error), and MR (Miss Rate) by 1.08%, 3.15%, and 2.14%, respectively, for unimodal prediction; for multimodal prediction, the improvements are 5.47%, 4.46%, and 6.50%. Our method effectively captures the temporal and spatial characteristics of vehicle movement trajectories, making it suitable for autonomous driving applications.
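Note on the reported metrics: the abstract does not define them, but minADE, minFDE, and MR are the standard metrics of the Argoverse 2 motion-forecasting benchmark. A model outputs K candidate future trajectories per scenario (K = 1 for unimodal and K = 6 for multimodal evaluation); minADE and minFDE take the best mode by average and final displacement, and a scenario counts as a miss when even the best final endpoint lands more than 2.0 m from the ground truth. The Python/NumPy sketch below shows the per-scenario computation; the function name and array shapes are illustrative assumptions, not taken from the paper.

    import numpy as np

    def forecasting_metrics(pred, gt, miss_threshold=2.0):
        """Per-scenario minADE, minFDE, and miss indicator (illustrative sketch).

        pred: (K, T, 2) array of K candidate future trajectories.
        gt:   (T, 2) array holding the ground-truth future trajectory.
        """
        # L2 displacement of every mode at every future timestep: shape (K, T)
        disp = np.linalg.norm(pred - gt[None], axis=-1)
        min_ade = disp.mean(axis=1).min()  # best mode by average displacement
        min_fde = disp[:, -1].min()        # best mode by final displacement
        missed = min_fde > miss_threshold  # counts toward MR if endpoint is off by > 2 m
        return min_ade, min_fde, missed

Benchmark scores are these values averaged over all scenarios in the evaluation split (MR being the fraction of missed scenarios).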
Pages: 12