Going deeper with two-stream ConvNets for action recognition in video surveillance

Cited by: 53
Authors
Han, Yamin [1]
Zhang, Peng [1]
Zhuo, Tao [2]
Huang, Wei [3]
Zhang, Yanning [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Shaanxi, Peoples R China
[2] Natl Univ Singapore, Sensor Enhanced Social Media SeSaMe Ctr, Singapore, Singapore
[3] Nanchang Univ, Sch Informat Engn, Nanchang, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China; National Research Foundation, Singapore
Keywords
Deeper; Two-stream; ConvNets; Action recognition; Video surveillance;
DOI
10.1016/j.patrec.2017.08.015
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep convolutional networks have shown outstanding effectiveness in a variety of vision-based classification tasks, for which large datasets are a prerequisite for high performance. In many realistic settings, however, such as human action recognition in videos, a massive quantity of training samples is hard to obtain. The resulting data deficiency, especially of labeled data, critically limits the use of deeper model structures as a promising solution because of their high risk of overfitting. In addition, when modeling capacity is constrained by model depth, high-level visual cues that accompany human action, such as object interaction, scene context, and pose variation, pose extrinsic and intrinsic challenges for traditional deep convolutional networks. To address these limitations, this paper proposes a dataset remodeling strategy: parameters of ResNet-101 layers trained on the ImageNet dataset are transferred to initialize the learning model, and an augmented data variation approach is adopted to overcome the overfitting caused by sample deficiency. To improve the model structure, a novel deeper two-stream ConvNet is designed to learn action complexity. Combined with a dis-order strategy for the training/testing video sets, the proposed model and learning strategy jointly achieve a significant improvement in action recognition. Experiments on two challenging datasets, UCF101 and KTH, verify superior performance in comparison with other state-of-the-art methods. (C) 2017 Published by Elsevier B.V.
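As a rough illustration of what "two-stream" means in practice, the sketch below combines class scores from a spatial (RGB frame) stream and a temporal (optical flow) stream by weighted averaging of their softmax outputs. This late-fusion scheme is the one commonly used in two-stream networks; the abstract does not say how this paper fuses its streams, so the fusion rule, the function names, and the `temporal_weight` parameter are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Weighted average of per-class probabilities from the two streams.

    Assumption: late fusion by score averaging; not confirmed for this paper.
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    w = temporal_weight
    return [(1.0 - w) * a + w * b for a, b in zip(p_spatial, p_temporal)]


def predict_action(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_two_streams(spatial_scores, temporal_scores, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `predict_action([2.0, 0.0, 0.0], [0.0, 0.0, 3.0])` lets the more confident temporal stream decide, while setting `temporal_weight=0.0` falls back to the spatial stream alone.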
Pages: 83-90
Number of pages: 8