A spatiotemporal attention-based ResC3D model for large-scale gesture recognition

Cited by: 16
Authors
Li, Yunan [1, 2]
Miao, Qiguang [1, 2]
Qi, Xiangda [1, 2]
Ma, Zhenxin [1, 2]
Ouyang, Wanli [3]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Shaanxi, Peoples R China
[2] Xian Key Lab Big Data & Intelligent Vis, Xian, Shaanxi, Peoples R China
[3] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW, Australia
Funding
National Key Research and Development Program of China;
Keywords
Gesture recognition; Spatiotemporal attention mechanism; ResC3D model; BEHAVIOR DETECTION; OPTICAL-FLOW; FUSION; SCENES;
DOI
10.1007/s00138-018-0996-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Abnormal gesture recognition has many applications in visual surveillance, crowd behavior analysis, and sensitive video content detection. However, recognizing dynamic gestures in large-scale videos remains challenging because of gesture-irrelevant factors such as variations in illumination, movement path, and background. In this paper, we propose a spatiotemporal attention-based ResC3D model for abnormal gesture recognition with large-scale videos. One key idea is to find a compact and effective representation of the gesture in both spatial and temporal contexts. To eliminate the influence of gesture-irrelevant factors, we first employ enhancement techniques such as Retinex and a hybrid median filter to improve the quality of the RGB and depth inputs. Then, we design a spatiotemporal attention scheme that focuses on the most valuable cues related to the moving parts of the gesture. On top of these representations, a ResC3D network, which leverages the advantages of both the residual network and the C3D model, is developed to extract features, together with a canonical correlation analysis-based fusion scheme for blending features from different modalities. The performance of our method is evaluated on the ChaLearn IsoGD dataset. Experiments demonstrate the effectiveness of each module of our method and show that the final accuracy reaches 68.14%, which outperforms other state-of-the-art methods, including our earlier work at the 2017 ChaLearn Looking at People Workshop of ICCV.
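The abstract names a hybrid median filter as one of the enhancement steps but does not say which variant is used. As an illustration only, the sketch below implements a common 3x3 definition: take the median of the plus-shaped neighborhood, the median of the diagonal neighborhood, and the center pixel, then output the median of those three values. The function name `hybrid_median_3x3`, the replicated-border handling, and the sample data are assumptions made for this example, not details from the paper.

```python
from statistics import median


def hybrid_median_3x3(img):
    """Apply a 3x3 hybrid median filter to a 2D list of numbers.

    Hypothetical helper for illustration; the paper's exact variant
    and window size are not given in the abstract.
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        # Replicate border pixels by clamping indices (an assumption).
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            center = px(r, c)
            # Plus-shaped neighborhood: center and its 4 edge neighbors.
            plus = [center, px(r - 1, c), px(r + 1, c),
                    px(r, c - 1), px(r, c + 1)]
            # Diagonal neighborhood: center and its 4 corner neighbors.
            diag = [center, px(r - 1, c - 1), px(r - 1, c + 1),
                    px(r + 1, c - 1), px(r + 1, c + 1)]
            # Final value: median of the two sub-medians and the center.
            out[r][c] = median([median(plus), median(diag), center])
    return out


# A flat depth patch with one isolated outlier (made-up values).
depth = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 255, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
filtered = hybrid_median_3x3(depth)

# A one-pixel-wide vertical line (made-up values).
line = [[0, 0, 100, 0, 0] for _ in range(5)]
line_filtered = hybrid_median_3x3(line)
```

In this example the isolated outlier in `depth` is replaced by the surrounding value, while the thin vertical line in `line` passes through unchanged. That behavior is the usual reason for preferring a hybrid median over a plain square-window median when thin structures such as fingers or object edges matter.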
Pages: 875-888
Page count: 14
Source
Machine Vision and Applications, 2019, 30: 875-888