Action recognition for depth video using multi-view dynamic images

Cited by: 76
Authors
Xiao, Yang [1]
Chen, Jun [1]
Wang, Yancheng [1]
Cao, Zhiguo [1]
Zhou, Joey Tianyi [2]
Bai, Xiang [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Key Lab Sci & Technol Multispectral Informa, Wuhan 430074, Hubei, Peoples R China
[2] ASTAR, Inst High Performance Comp, Singapore, Singapore
[3] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Action recognition; Depth video; Multi-view dynamic image; Convolutional neural network; Action proposal;
DOI
10.1016/j.ins.2018.12.050
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow for motion characterization, dynamic imaging exhibits superior efficiency and compactness. Inspired by the success of dynamic imaging in RGB video, this study extends it to the depth domain. To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed. In particular, the raw depth video is densely projected with respect to different virtual imaging viewpoints by rotating the virtual camera within the 3D space. Subsequently, a dynamic image is extracted from each of the resulting multi-view depth videos, and together these form the multi-view dynamic images. Accordingly, more view-tolerant visual cues can be captured. A novel CNN model is then proposed to perform feature learning on multi-view dynamic images. Specifically, the dynamic images from different views share the same convolutional layers but correspond to different fully connected layers. This is aimed at enhancing the tuning effectiveness on shallow convolutional layers by alleviating the gradient vanishing problem. Moreover, as the spatial occurrence variation of the actions may impair the CNN, an action proposal approach is also put forth. In experiments, the proposed approach achieves state-of-the-art performance on three challenging datasets. © 2018 Elsevier Inc. All rights reserved.
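The pipeline described in the abstract (project the depth video to virtual viewpoints, then collapse each projected video into one dynamic image) can be illustrated with a minimal sketch. This is not the authors' implementation, and everything below is an assumption made for illustration: the function names are hypothetical, the virtual camera rotates about a single vertical axis, the projection is orthographic with depth expressed in pixel units, and the dynamic image uses the simple linear weighting alpha_t = 2t - T - 1, a common approximation to rank pooling that the abstract itself does not specify.

```python
import math


def rotate_depth_frame(depth, theta_deg):
    """Re-render one depth frame from a virtual camera rotated about the vertical axis.

    `depth` is a list of rows; 0 marks a pixel with no measurement.
    Assumption: orthographic projection, depth expressed in pixel units.
    """
    h, w = len(depth), len(depth[0])
    points = [(x, y, depth[y][x])
              for y in range(h) for x in range(w) if depth[y][x] > 0]
    out = [[0.0] * w for _ in range(h)]
    if not points:
        return out
    # Rotate about the vertical axis passing through the point cloud's centroid.
    cx = sum(p[0] for p in points) / len(points)
    cz = sum(p[2] for p in points) / len(points)
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    for x, y, z in points:
        xr = c * (x - cx) + s * (z - cz) + cx
        zr = -s * (x - cx) + c * (z - cz) + cz
        u = int(round(xr))
        if 0 <= u < w and zr > 0:
            # Z-buffer: keep the surface nearest to the virtual camera.
            if out[y][u] == 0.0 or zr < out[y][u]:
                out[y][u] = zr
    return out


def dynamic_image(frames):
    """Collapse a video into one image with weights alpha_t = 2t - T - 1.

    Assumption: linear approximation to rank pooling. Later frames get
    positive weight and earlier frames negative weight, so static pixels cancel.
    """
    T = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for t, frame in enumerate(frames, start=1):
        alpha = 2 * t - T - 1
        for y in range(h):
            for x in range(w):
                out[y][x] += alpha * frame[y][x]
    return out


def multi_view_dynamic_images(video, angles_deg):
    """Return one dynamic image per virtual viewpoint, keyed by rotation angle."""
    return {a: dynamic_image([rotate_depth_frame(f, a) for f in video])
            for a in angles_deg}
```

Under these assumptions, `multi_view_dynamic_images(video, [-30, 0, 30])` would yield three images, each of which could feed a view-specific fully connected branch on top of shared convolutional layers, as the abstract outlines.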
Pages: 287-304
Number of pages: 18