Action recognition for depth video using multi-view dynamic images

Cited by: 76
Authors
Xiao, Yang [1]
Chen, Jun [1]
Wang, Yancheng [1]
Cao, Zhiguo [1]
Zhou, Joey Tianyi [2]
Bai, Xiang [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Key Lab Sci & Technol Multispectral Informa, Wuhan 430074, Hubei, Peoples R China
[2] ASTAR, Inst High Performance Comp, Singapore, Singapore
[3] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Action recognition; Depth video; Multi-view dynamic image; Convolutional neural network; Action proposal;
DOI
10.1016/j.ins.2018.12.050
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow for motion characterization, dynamic imaging exhibits superior efficiency and compactness. Inspired by the success of dynamic imaging in RGB video, this study extends it to the depth domain. To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed. In particular, the raw depth video is densely projected with respect to different virtual imaging viewpoints by rotating the virtual camera within the 3D space. Subsequently, dynamic images are extracted from the obtained multi-view depth videos and multi-view dynamic images are thus constructed from these images. Accordingly, more view-tolerant visual cues can be involved. A novel CNN model is then proposed to perform feature learning on multi-view dynamic images. Particularly, the dynamic images from different views share the same convolutional layers but correspond to different fully connected layers. This is aimed at enhancing the tuning effectiveness on shallow convolutional layers by alleviating the gradient vanishing problem. Moreover, as the spatial occurrence variation of the actions may impair the CNN, an action proposal approach is also put forth. In experiments, the proposed approach can achieve state-of-the-art performance on three challenging datasets. (C) 2018 Elsevier Inc. All rights reserved.
Pages: 287-304
Number of pages: 18
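
The pipeline outlined in the abstract (rotate a virtual camera, re-project each depth frame, then collapse each projected video into a dynamic image) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an orthographic projection with depth already scaled to pixel units, a depth value of 0 meaning "no measurement", and the simplified approximate rank pooling weights 2t - T - 1 known from the RGB dynamic image literature. The function names `rotate_and_project`, `dynamic_image`, and `multi_view_dynamic_images` are hypothetical.

```python
import numpy as np


def rotate_and_project(depth, theta_deg, phi_deg=0.0):
    """Re-render one depth frame from a rotated virtual viewpoint.

    Assumption: orthographic camera, depth in pixel units, 0 = background.
    """
    h, w = depth.shape
    ys, xs = np.nonzero(depth)  # foreground pixels only
    if ys.size == 0:
        return np.zeros((h, w))
    pts = np.stack([xs, ys, depth[ys, xs]], axis=1).astype(float)
    center = pts.mean(axis=0)  # rotate about the point-cloud centroid

    t, p = np.deg2rad(theta_deg), np.deg2rad(phi_deg)
    rot_y = np.array([[np.cos(t), 0.0, np.sin(t)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(t), 0.0, np.cos(t)]])
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(p), -np.sin(p)],
                      [0.0, np.sin(p), np.cos(p)]])
    rotated = (pts - center) @ (rot_x @ rot_y).T + center

    u = np.rint(rotated[:, 0]).astype(int)
    v = np.rint(rotated[:, 1]).astype(int)
    z = rotated[:, 2]
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # z-buffer: keep the nearest point that lands on each pixel
    out = np.full((h, w), np.inf)
    np.minimum.at(out, (v[valid], u[valid]), z[valid])
    return np.where(np.isinf(out), 0.0, out)


def dynamic_image(frames):
    """Collapse a video of shape (T, H, W) into one image.

    Uses the simplified approximate rank pooling weights alpha_t = 2t - T - 1.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames = frames.shape[0]
    alpha = 2.0 * np.arange(1, num_frames + 1) - num_frames - 1
    return np.tensordot(alpha, frames, axes=(0, 0))


def multi_view_dynamic_images(depth_video, angles):
    """Return one dynamic image per (theta, phi) virtual viewpoint."""
    return [
        dynamic_image([rotate_and_project(f, th, ph) for f in depth_video])
        for th, ph in angles
    ]
```

For example, `multi_view_dynamic_images(video, [(-30, 0), (0, 0), (30, 0)])` would yield three images, one per viewpoint, which could then be fed to a CNN whose convolutional layers are shared across views. The angle set here is purely illustrative; the abstract does not state which viewpoints the authors sample.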