Deep Attention Models for Human Tracking Using RGBD

Cited by: 12
Authors
Rasoulidanesh, Maryamsadat [1 ]
Yadav, Srishti [1 ]
Herath, Sachini [2 ]
Vaghei, Yasaman [3 ]
Payandeh, Shahram [1 ]
Affiliations
[1] Simon Fraser Univ, Sch Engn Sci, Networked Robot & Sensing Lab, Burnaby, BC V5A 1S6, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Simon Fraser Univ, Sch Mechatron Syst Engn, Burnaby, BC V5A 1S6, Canada
Keywords
computer vision; visual tracking; attention model; RGBD; Kinect; deep network; convolutional neural network; Long Short-Term Memory; depth
DOI
10.3390/s19040750
CLC Number
O65 [Analytical Chemistry]
Discipline Codes
070302; 081704
Abstract
Visual tracking performance has long been limited by the lack of robust appearance models. These models fail either where the appearance changes rapidly, as in motion-based tracking, or where accurate information about the object may not be available, as in color camouflage (where background and foreground colors are similar). This paper proposes a robust, adaptive appearance model which works accurately in situations of color camouflage, even in the presence of complex natural objects. The proposed model includes depth as an additional feature in a hierarchical modular neural framework for online object tracking. The model adapts to the confusing appearance by identifying the stable property of depth between the target and the surrounding object(s). Depth complements the existing RGB features in scenarios where the RGB features fail to adapt and hence become unstable over long durations. The parameters of the model are learned efficiently by the deep network, which consists of three modules: (1) the spatial attention layer, which discards the majority of the background by selecting a region containing the object of interest; (2) the appearance attention layer, which extracts appearance and spatial information about the tracked object; and (3) the state estimation layer, which enables the framework to predict future object appearance and location. Three different models were trained and tested to analyze the effect of depth along with RGB information. In addition, a model is proposed that uses only depth as a standalone input for tracking. The proposed models were also evaluated in real time using a Kinect V2 sensor and showed very promising results. The results of our proposed network structures and their comparison with the state-of-the-art RGB tracking model demonstrate that adding depth significantly improves tracking accuracy in more challenging environments (i.e., cluttered and camouflaged scenes). Furthermore, the results of the depth-based models show that depth data can provide enough information for accurate tracking, even without RGB information.
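The abstract outlines a three-module architecture: a spatial attention layer that crops a region of interest, an appearance attention layer (a convolutional feature extractor over the RGBD glimpse), and an LSTM-based state estimation layer that predicts the target's location over time. The sketch below is only a rough illustration of that kind of pipeline under stated assumptions, not the authors' implementation: the PyTorch framing, the glimpse parameterization, all layer sizes, and the class names (SpatialAttention, AppearanceAttention, RGBDTracker) are illustrative choices.

```python
# Minimal sketch (assumptions throughout, not the authors' code) of a three-module
# RGBD tracker: spatial-attention glimpse -> CNN appearance features -> LSTM state.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Predicts a glimpse (center, scale) from the previous LSTM state and crops it."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.fc = nn.Linear(hidden_size, 3)  # -> (cx, cy, scale)

    def forward(self, frame, prev_state, out_size: int = 64):
        params = self.fc(prev_state)
        cx, cy = torch.tanh(params[:, 0]), torch.tanh(params[:, 1])  # centers in [-1, 1]
        s = torch.sigmoid(params[:, 2])                              # crop scale in (0, 1)
        theta = torch.zeros(frame.size(0), 2, 3, device=frame.device)
        theta[:, 0, 0] = s
        theta[:, 1, 1] = s
        theta[:, 0, 2] = cx
        theta[:, 1, 2] = cy
        grid = F.affine_grid(
            theta, [frame.size(0), frame.size(1), out_size, out_size], align_corners=False
        )
        return F.grid_sample(frame, grid, align_corners=False)  # cropped glimpse


class AppearanceAttention(nn.Module):
    """Convolutional feature extractor over the 4-channel (RGB + depth) glimpse."""

    def __init__(self, in_channels: int = 4, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, glimpse):
        return self.net(glimpse)


class RGBDTracker(nn.Module):
    """Chains the two attention modules with an LSTM that estimates the target box."""

    def __init__(self, feat_dim: int = 256, hidden_size: int = 256):
        super().__init__()
        self.hidden_size = hidden_size
        self.spatial = SpatialAttention(hidden_size)
        self.appearance = AppearanceAttention(in_channels=4, feat_dim=feat_dim)
        self.lstm = nn.LSTMCell(feat_dim, hidden_size)
        self.bbox_head = nn.Linear(hidden_size, 4)  # (x, y, w, h) per frame

    def forward(self, frames):
        # frames: (T, B, 4, H, W) sequence of RGBD frames
        T, B = frames.shape[:2]
        h = frames.new_zeros(B, self.hidden_size)
        c = frames.new_zeros(B, self.hidden_size)
        boxes = []
        for t in range(T):
            glimpse = self.spatial(frames[t], h)   # (1) discard most of the background
            feat = self.appearance(glimpse)        # (2) appearance/spatial features
            h, c = self.lstm(feat, (h, c))         # (3) state estimation over time
            boxes.append(self.bbox_head(h))
        return torch.stack(boxes)                  # (T, B, 4) predicted boxes
```

A depth-only variant of the kind mentioned in the abstract would, under the same assumptions, simply use in_channels=1 for the appearance module and feed depth frames alone.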
Pages: 14