Deep Attention Models for Human Tracking Using RGBD

Cited by: 12
Authors
Rasoulidanesh, Maryamsadat [1]
Yadav, Srishti [1]
Herath, Sachini [2]
Vaghei, Yasaman [3]
Payandeh, Shahram [1]
Affiliations
[1] Simon Fraser Univ, Sch Engn Sci, Networked Robot & Sensing Lab, Burnaby, BC V5A 1S6, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Simon Fraser Univ, Sch Mechatron Syst Engn, Burnaby, BC V5A 1S6, Canada
Keywords
computer vision; visual tracking; attention model; RGBD; Kinect; deep network; convolutional neural network; Long Short-Term Memory; depth
DOI
10.3390/s19040750
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Visual tracking performance has long been limited by the lack of good appearance models. Existing models fail either where appearance tends to change rapidly, as in motion-based tracking, or where accurate information about the object may not be available, as in color camouflage (where background and foreground colors are similar). This paper proposes a robust, adaptive appearance model that works accurately under color camouflage, even in the presence of complex natural objects. The proposed model includes depth as an additional feature in a hierarchical, modular neural framework for online object tracking. The model adapts to confusing appearance by identifying the stable property of depth between the target and the surrounding object(s). Depth complements the existing RGB features in scenarios where the RGB features fail to adapt and therefore become unstable over a long duration. The parameters of the model are learned efficiently in a deep network that consists of three modules: (1) the spatial attention layer, which discards the majority of the background by selecting a region containing the object of interest; (2) the appearance attention layer, which extracts appearance and spatial information about the tracked object; and (3) the state estimation layer, which enables the framework to predict future object appearance and location. Three different models were trained and tested to analyze the effect of combining depth with RGB information. In addition, a model is proposed that uses depth alone as input for tracking. The proposed models were also evaluated in real time using KinectV2 and showed very promising results. A comparison of the proposed network structures with a state-of-the-art RGB tracking model demonstrates that adding depth significantly improves tracking accuracy in more challenging (i.e., cluttered and camouflaged) environments. Furthermore, the results of the depth-based models show that depth data can provide enough information for accurate tracking, even without RGB information.
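The three-stage flow described in the abstract can be illustrated with a deliberately small toy. This is a sketch under stated assumptions, not the network from the paper: every function name here is hypothetical, the "appearance attention" step is replaced by a plain depth threshold, and the learned state estimation layer is replaced by a constant-velocity guess. It only shows why depth can separate a target from a background of identical color, where a color cue selects every pixel.

```python
# Toy sketch with hypothetical names; not the network from the paper.

def make_frame(target_cols, width=8, height=6):
    """Build rows of (r, g, b, depth_m) pixels. Target and background share
    one colour (camouflage) but sit at different depths (1.5 m vs 3.0 m)."""
    colour = (90, 120, 60)
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            on_target = x in target_cols and 1 <= y <= 4
            row.append(colour + ((1.5,) if on_target else (3.0,)))
        frame.append(row)
    return frame

def spatial_attention(frame, box):
    """Stage 1 (stand-in): keep only the glimpse inside box = (x, y, w, h)."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def colour_mask(glimpse, target_colour, tol=10):
    """RGB-only cue: True where every channel is within tol of the target colour."""
    return [[max(abs(px[c] - target_colour[c]) for c in range(3)) <= tol
             for px in row] for row in glimpse]

def depth_mask(glimpse, target_depth, tol=0.3):
    """Stage 2 (stand-in): True where depth is within tol metres of the target."""
    return [[abs(px[3] - target_depth) <= tol for px in row] for row in glimpse]

def centroid(mask, box):
    """Mean position, in frame coordinates, of the pixels kept by the mask."""
    x0, y0 = box[0], box[1]
    pts = [(x0 + i, y0 + j)
           for j, row in enumerate(mask) for i, keep in enumerate(row) if keep]
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def predict_next(prev_centre, centre):
    """Stage 3 (stand-in): constant-velocity extrapolation, not an LSTM."""
    return (2 * centre[0] - prev_centre[0], 2 * centre[1] - prev_centre[1])

box = (2, 1, 4, 4)                      # assumed attention window
centres = []
for cols in ((3, 4), (4, 5)):           # target shifts one pixel to the right
    glimpse = spatial_attention(make_frame(cols), box)
    centres.append(centroid(depth_mask(glimpse, target_depth=1.5), box))

rgb_hits = sum(map(sum, colour_mask(glimpse, (90, 120, 60))))
depth_hits = sum(map(sum, depth_mask(glimpse, 1.5)))
predicted = predict_next(*centres)
print(centres, rgb_hits, depth_hits, predicted)
```

In this toy the color mask keeps all 16 pixels of the glimpse, so it cannot localize the target, while the depth mask keeps only the 8 target pixels. The thresholds, frame size, and window are arbitrary illustrative values.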
Pages: 14