Learning multi-temporal-scale deep information for action recognition

Cited: 25
Authors
Yao, Guangle [1 ,2 ,3 ]
Lei, Tao [1 ]
Zhong, Jiandan [1 ,2 ,3 ]
Jiang, Ping [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Opt & Elect, Chengdu, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Sichuan, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
Action recognition; Convolutional neural networks; Deep learning; Spatiotemporal information; HISTOGRAMS; NETWORKS;
DOI
10.1007/s10489-018-1347-3
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition in video is widely applied in video indexing, intelligent surveillance, multimedia understanding, and other fields. A typical human action contains spatiotemporal information at various scales. Learning and fusing multi-temporal-scale information makes action recognition more reliable in terms of recognition accuracy. To demonstrate this, in this paper we use Res3D, a 3D Convolutional Neural Network (CNN) architecture, to extract information at multiple temporal scales. Within each temporal scale, we transfer the knowledge learned from RGB to 3-channel optical flow (OF) and learn information from both the RGB and OF fields. We also propose Parallel Pair Discriminant Correlation Analysis (PPDCA) to fuse the multi-temporal-scale information into an action representation of lower dimension. Experimental results show that, compared with the single-temporal-scale method, the proposed multi-temporal-scale method achieves higher recognition accuracy; it spends more time on feature extraction but less time on classification, owing to the lower-dimensional representation. Moreover, the proposed method achieves recognition performance comparable to that of state-of-the-art methods. The source code and 3D filter animations are available online: https://github.com/JerryYaoGl/multi-temporal-scale.
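The pipeline the abstract describes (per-scale feature extraction followed by a dimensionality-reducing fusion) can be sketched in a few lines. This is a minimal illustration only: `extract_scale_features` is a hypothetical stand-in for a trained Res3D forward pass on RGB/OF clips, and the random projection in `fuse_multi_scale` merely stands in for PPDCA to show the shape of the computation, not the paper's actual discriminant-analysis fusion.

```python
import numpy as np

def extract_scale_features(video, clip_len, feat_dim=512, seed=0):
    """Hypothetical stand-in for a Res3D forward pass: maps one clip of
    `clip_len` frames to a fixed-length feature vector."""
    rng = np.random.default_rng(seed)
    clip = video[:clip_len]                      # (clip_len, H, W, C)
    pooled = clip.mean(axis=(0, 1, 2))           # crude spatiotemporal pooling
    W = rng.standard_normal((pooled.size, feat_dim))
    return pooled @ W                            # (feat_dim,)

def fuse_multi_scale(features, out_dim=256, seed=1):
    """Stand-in for PPDCA: concatenates per-scale features, then projects
    to a lower dimension (random projection here, not the paper's method)."""
    x = np.concatenate(features)
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((x.size, out_dim)) / np.sqrt(x.size)
    return x @ P                                 # (out_dim,)

video = np.random.default_rng(2).random((32, 112, 112, 3))  # toy video tensor
scales = [8, 16, 32]                             # temporal scales, in frames
feats = [extract_scale_features(video, s) for s in scales]
rep = fuse_multi_scale(feats)                    # fused low-dim representation
print(rep.shape)
```

The point of the sketch is the trade-off stated in the abstract: running the extractor once per temporal scale multiplies feature-extraction cost, while the fusion step shrinks the final representation, making the downstream classifier cheaper.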
Pages: 2017-2029
Page count: 13
Related papers
50 total
  • [21] Action Recognition in Radio Signals Based on Multi-Scale Deep Features
    Hao, Xiaojun
    Xu, Guangying
    Ma, Hongbin
    Yang, Shuyuan
    TENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2018), 2019, 11069
  • [22] Deep Temporal Feature Encoding for Action Recognition
    Li, Lin
    Zhang, Zhaoxiang
    Huang, Yan
    Wang, Liang
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 1109 - 1114
  • [23] SPATIO-TEMPORAL MULTI-SCALE SOFT QUANTIZATION LEARNING FOR SKELETON-BASED HUMAN ACTION RECOGNITION
    Yang, Jianyu
    Zhu, Chen
    Yuan, Junsong
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1078 - 1083
  • [24] DEEP TEMPORAL PYRAMID DESIGN FOR ACTION RECOGNITION
    Mazari, Ahmed
    Sahbi, Hichem
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2077 - 2081
  • [25] Multi-scale Spatiotemporal Information Fusion Network for Video Action Recognition
    Cai, Yutong
    Lin, Weiyao
    See, John
    Cheng, Ming-Ming
    Liu, Guangcan
    Xiong, Hongkai
    2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP), 2018,
  • [26] Temporal Order Information for Complex Action Recognition
    Liu, Fang
    Xu, Xiangmin
    Ling, Chunmei
    2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-CHINA (ICCE-CHINA), 2016,
  • [27] Deep Learning for Human Action Recognition
    Shekokar, R. U.
    Kale, S. N.
    2021 6TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2021,
  • [28] MTT: Multi-Scale Temporal Transformer for Skeleton-Based Action Recognition
    Kong, Jun
    Bian, Yuhang
    Jiang, Min
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 528 - 532
  • [29] MULTI-SCALE TEMPORAL FEATURE FUSION FOR FEW-SHOT ACTION RECOGNITION
    Lee, Jun-Tae
    Yun, Sungrack
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1785 - 1789
  • [30] Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain
    Husain, Farzad
    Dellen, Babette
    Torras, Carme
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2016, 1 (02): : 984 - 991