Egocentric Human Trajectory Forecasting With a Wearable Camera and Multi-Modal Fusion

Cited by: 6
Authors
Qiu, Jianing [1 ]
Chen, Lipeng [2 ]
Gu, Xiao [1 ]
Lo, Frank P-W [1 ]
Tsai, Ya-Yen [1 ]
Sun, Jiankai [2 ,3 ]
Liu, Jiaqi [2 ,4 ]
Lo, Benny [1 ]
Affiliations
[1] Imperial Coll London, Hamlyn Ctr Robot Surg, London SW7 2AZ, England
[2] Tencent Robot X, Shenzhen 518057, Peoples R China
[3] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA
[4] Shanghai Jiao Tong Univ, Inst Med Robot, Shanghai 200240, Peoples R China
Keywords
Human trajectory forecasting; egocentric vision; multi-modal learning
DOI
10.1109/LRA.2022.3188101
CLC classification
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
In this letter, we address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces. The trajectory forecasting ability learned from the data of different camera wearers walking around in the real world can be transferred to assist visually impaired people in navigation, as well as to instill human navigation behaviours in mobile robots, enabling better human-robot interactions. To this end, a novel egocentric human trajectory forecasting dataset was constructed, containing real trajectories of people navigating in crowded spaces while wearing a camera, as well as extracted rich contextual data. We extract and utilize three different modalities to forecast the trajectory of the camera wearer, i.e., his/her past trajectory, the past trajectories of nearby people, and the environment, such as the scene semantics or the depth of the scene. A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism that fuses multiple modalities, has been designed to predict the future trajectory of the camera wearer. Extensive experiments have been conducted, with results showing that our model outperforms the state-of-the-art methods in egocentric human trajectory forecasting.
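The abstract describes fusing three modalities (the wearer's past trajectory, nearby people's trajectories, and scene context) through a cascaded cross-attention mechanism. The minimal NumPy sketch below illustrates only the general idea of cascading cross-attention over modality features; the array names, dimensions, and residual update are illustrative assumptions, not the paper's actual architecture, which embeds this mechanism inside a Transformer encoder-decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # Scaled dot-product attention: query tokens attend to context tokens
    # (keys and values are both taken from the context modality here).
    d_k = query.shape[-1]
    scores = query @ context.T / np.sqrt(d_k)
    return softmax(scores) @ context

rng = np.random.default_rng(0)
d = 16
ego_past  = rng.standard_normal((8, d))   # wearer's past-trajectory features
neighbors = rng.standard_normal((12, d))  # nearby people's trajectory features
scene     = rng.standard_normal((20, d))  # scene semantics / depth features

# Cascaded fusion: the ego representation is refined one modality at a
# time, each stage conditioning on the previous stage's output.
fused = ego_past
for context in (neighbors, scene):
    fused = fused + cross_attention(fused, context)  # residual update

assert fused.shape == ego_past.shape  # token count and width preserved
```

Because each stage queries with the output of the previous one, later modalities see a representation already conditioned on earlier ones, which is what distinguishes a cascade from fusing all modalities in parallel.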
Pages: 8799-8806
Page count: 8
Related Papers (50 items in total)
  • [21] Multi-modal data fusion: A description
    Coppock, S
    Mazlack, LJ
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS, 2004, 3214 : 1136 - 1142
  • [22] Multi-Modal Trajectory Prediction of NBA Players
    Hauri, Sandro
    Djuric, Nemanja
    Radosavljevic, Vladan
    Vucetic, Slobodan
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1639 - 1648
  • [23] Multi-modal Air Trajectory Traffic Management
    Henderson, Thomas C.
    Marston, Vista
    Sacharny, David
    INTELLIGENT AUTONOMOUS SYSTEMS 18, VOL 2, IAS18-2023, 2024, 794 : 247 - 255
  • [24] Multi-modal Fusion of LiDAR and Camera Sensors for Enhanced Perception in Intelligent Traffic Systems
    Wen, Nu
    Wang, Xiuli
    Guo, Jing
    Wang, Yankun
    Wang, Yang
    2024 INTERNATIONAL CONFERENCE ON ELECTRONIC ENGINEERING AND INFORMATION SYSTEMS, EEISS 2024, 2024, : 166 - 174
  • [25] Probabilistic multi-modal depth estimation based on camera-LiDAR sensor fusion
    Obando-Ceron, Johan S.
    Romero-Cano, Victor
    Monteiro, Sildomar
    MACHINE VISION AND APPLICATIONS, 2023, 34 (05)
  • [26] Multi-modal fusion architecture search for camera-based semantic scene completion
    Wang, Xuzhi
    Feng, Wei
    Wan, Liang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 243
  • [27] Multi-modal egocentric activity recognition using multi-kernel learning
    Arabaci, Mehmet Ali
    Ozkan, Fatih
    Surer, Elif
    Jancovic, Peter
    Temizel, Alptekin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 16299 - 16328
  • [29] Multi-modal learning for geospatial vegetation forecasting
    Benson, Vitus
    Robin, Claire
    Requena-Mesa, Christian
    Alonso, Lazaro
    Carvalhais, Nuno
    Cortes, Jose
    Gao, Zhihan
    Linscheid, Nora
    Weynants, Melanie
    Reichstein, Markus
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 27788 - 27799
  • [30] A multi-modal spatial-temporal model for accurate motion forecasting with visual fusion
    Wang, Xiaoding
    Liu, Jianmin
    Lin, Hui
    Garg, Sahil
    Alrashoud, Mubarak
    INFORMATION FUSION, 2024, 102