Deep Attention Models for Human Tracking Using RGBD

Cited by: 12
Authors
Rasoulidanesh, Maryamsadat [1 ]
Yadav, Srishti [1 ]
Herath, Sachini [2 ]
Vaghei, Yasaman [3 ]
Payandeh, Shahram [1 ]
Affiliations
[1] Simon Fraser Univ, Sch Engn Sci, Networked Robot & Sensing Lab, Burnaby, BC V5A 1S6, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Simon Fraser Univ, Sch Mechatron Syst Engn, Burnaby, BC V5A 1S6, Canada
Keywords
computer vision; visual tracking; attention model; RGBD; Kinect; deep network; convolutional neural network; Long Short-Term Memory; depth
DOI
10.3390/s19040750
CLC Number
O65 [Analytical Chemistry]
Discipline Codes
070302; 081704
Abstract
Visual tracking performance has long been limited by the lack of robust appearance models. These models fail either where the appearance changes rapidly, as in motion-based tracking, or where accurate information about the object may not be available, as in color camouflage (where background and foreground colors are similar). This paper proposes a robust, adaptive appearance model which works accurately in situations of color camouflage, even in the presence of complex natural objects. The proposed model includes depth as an additional feature in a hierarchical modular neural framework for online object tracking. The model adapts to the confusing appearance by identifying the stable property of depth between the target and the surrounding object(s). Depth complements the existing RGB features in scenarios where the RGB features fail to adapt and hence become unstable over long durations. The parameters of the model are learned efficiently by the deep network, which consists of three modules: (1) the spatial attention layer, which discards the majority of the background by selecting a region containing the object of interest; (2) the appearance attention layer, which extracts appearance and spatial information about the tracked object; and (3) the state estimation layer, which enables the framework to predict future object appearance and location. Three different models were trained and tested to analyze the effect of depth along with RGB information. In addition, a model is proposed that uses only depth as a standalone input for tracking. The proposed models were also evaluated in real time using a Kinect V2 sensor and showed very promising results. The results of our proposed network structures and their comparison with the state-of-the-art RGB tracking model demonstrate that adding depth significantly improves tracking accuracy in more challenging environments (i.e., cluttered and camouflaged scenes). Furthermore, the results of the depth-based models show that depth data can provide enough information for accurate tracking, even without RGB information.
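The abstract outlines a three-module architecture: a spatial attention layer that crops a region of interest, an appearance attention layer (a convolutional feature extractor over the RGBD glimpse), and an LSTM-based state estimation layer that predicts the target's location over time. The sketch below is only a rough illustration of that kind of pipeline under stated assumptions, not the authors' implementation: the PyTorch framing, the glimpse parameterization, all layer sizes, and the class names (SpatialAttention, AppearanceAttention, RGBDTracker) are illustrative choices.

```python
# Minimal sketch (assumptions throughout, not the authors' code) of a three-module
# RGBD tracker: spatial-attention glimpse -> CNN appearance features -> LSTM state.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Predicts a glimpse (center, scale) from the previous LSTM state and crops it."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.fc = nn.Linear(hidden_size, 3)  # -> (cx, cy, scale)

    def forward(self, frame, prev_state, out_size: int = 64):
        params = self.fc(prev_state)
        cx, cy = torch.tanh(params[:, 0]), torch.tanh(params[:, 1])  # centers in [-1, 1]
        s = torch.sigmoid(params[:, 2])                              # crop scale in (0, 1)
        theta = torch.zeros(frame.size(0), 2, 3, device=frame.device)
        theta[:, 0, 0] = s
        theta[:, 1, 1] = s
        theta[:, 0, 2] = cx
        theta[:, 1, 2] = cy
        grid = F.affine_grid(
            theta, [frame.size(0), frame.size(1), out_size, out_size], align_corners=False
        )
        return F.grid_sample(frame, grid, align_corners=False)  # cropped glimpse


class AppearanceAttention(nn.Module):
    """Convolutional feature extractor over the 4-channel (RGB + depth) glimpse."""

    def __init__(self, in_channels: int = 4, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, glimpse):
        return self.net(glimpse)


class RGBDTracker(nn.Module):
    """Chains the two attention modules with an LSTM that estimates the target box."""

    def __init__(self, feat_dim: int = 256, hidden_size: int = 256):
        super().__init__()
        self.hidden_size = hidden_size
        self.spatial = SpatialAttention(hidden_size)
        self.appearance = AppearanceAttention(in_channels=4, feat_dim=feat_dim)
        self.lstm = nn.LSTMCell(feat_dim, hidden_size)
        self.bbox_head = nn.Linear(hidden_size, 4)  # (x, y, w, h) per frame

    def forward(self, frames):
        # frames: (T, B, 4, H, W) sequence of RGBD frames
        T, B = frames.shape[:2]
        h = frames.new_zeros(B, self.hidden_size)
        c = frames.new_zeros(B, self.hidden_size)
        boxes = []
        for t in range(T):
            glimpse = self.spatial(frames[t], h)   # (1) discard most of the background
            feat = self.appearance(glimpse)        # (2) appearance/spatial features
            h, c = self.lstm(feat, (h, c))         # (3) state estimation over time
            boxes.append(self.bbox_head(h))
        return torch.stack(boxes)                  # (T, B, 4) predicted boxes
```

A depth-only variant of the kind mentioned in the abstract would, under the same assumptions, simply use in_channels=1 for the appearance module and feed depth frames alone.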
Pages: 14