Multiple depth-levels features fusion enhanced network for action recognition

Cited by: 3
Authors
Wang, Shengquan [1 ]
Kong, Jun [1 ]
Jiang, Min [1 ]
Liu, Tianshan [2 ]
Affiliations
[1] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong 999077, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Action recognition; Two-stream; Multiple depth-levels features fusion; Group-wise spatial-channel enhance; HISTOGRAMS;
DOI
10.1016/j.jvcir.2020.102929
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
As a challenging video classification task, action recognition has become a significant topic in the computer vision community. To date, the most popular methods based on the two-stream architecture still simply fuse the prediction scores of the two streams. As a result, the complementary characteristics of the two streams are not fully exploited, and the contribution of shallower features is often overlooked. In addition, treating all features equally may weaken the role of the features that contribute most to classification. Accordingly, a novel network called the Multiple Depth-levels Features Fusion Enhanced Network (MDFFEN) is proposed, which improves the two-stream architecture in two respects. For the two-stream interaction mechanism, multiple depth-levels features fusion (MDFF) aggregates the spatial-temporal features extracted from several sub-modules of the original two streams through spatial-temporal features fusion (STFF). To further refine the resulting spatiotemporal features, a group-wise spatial-channel enhance (GSCE) module is proposed to automatically highlight meaningful regions and expressive channels via priority assignment. Competitive results are achieved when MDFFEN is validated on three challenging public action recognition datasets: HMDB51, UCF101 and ChaLearn LAP IsoGD.
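The abstract names the two ideas (STFF and GSCE) but does not spell out their internal formulation, so the following PyTorch sketch is only an illustration of what they plausibly look like: fusing spatial-stream and temporal-stream feature maps from one depth level, then re-weighting spatial positions group-wise and channels globally. All module and parameter names here are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn


class STFF(nn.Module):
    """Hypothetical spatial-temporal features fusion: concatenate the
    spatial- and temporal-stream maps from one depth level and mix
    them with a 1x1 convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([spatial, temporal], dim=1))


class GSCE(nn.Module):
    """Hypothetical group-wise spatial-channel enhance: per channel group,
    gate spatial positions by their similarity to the group's global
    descriptor, then gate channels with a squeeze-and-excitation branch."""

    def __init__(self, channels: int, groups: int = 8, reduction: int = 16):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g = x.view(b * self.groups, c // self.groups, h, w)
        # Spatial priority: dot product of each position with the group mean.
        context = g.mean(dim=(2, 3), keepdim=True)
        attn = (g * context).sum(dim=1, keepdim=True)        # (b*G, 1, h, w)
        attn = attn - attn.mean(dim=(2, 3), keepdim=True)    # zero-center
        attn = attn / (attn.std(dim=(2, 3), keepdim=True) + 1e-5)
        g = g * torch.sigmoid(attn)                          # gate positions
        x = g.view(b, c, h, w)
        # Channel priority: SE-style gating on globally pooled features.
        w_c = self.channel_gate(x.mean(dim=(2, 3)))          # (b, c)
        return x * w_c.view(b, c, 1, 1)


# Usage: fuse one depth level from the two streams, then enhance it.
stff, gsce = STFF(256), GSCE(256, groups=8)
spatial_feat = torch.randn(2, 256, 14, 14)   # from the RGB stream
temporal_feat = torch.randn(2, 256, 14, 14)  # from the optical-flow stream
fused = gsce(stff(spatial_feat, temporal_feat))  # (2, 256, 14, 14)
```

In an MDFF-style design, one such fusion would be applied at each of several depth levels of the two backbones, with the fused maps contributing to the final prediction alongside the original streams.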
Pages: 11
Related papers
50 records in total
  • [1] Spatiotemporal attention enhanced features fusion network for action recognition
    Zhuang, Danfeng
    Jiang, Min
    Kong, Jun
    Liu, Tianshan
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (03) : 823 - 841
  • [2] Fusing Multiple Features for Depth-Based Action Recognition
    Zhu, Yu
    Chen, Wenbin
    Guo, Guodong
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2015, 6 (02)
  • [3] DMMs-Based Multiple Features Fusion for Human Action Recognition
    Bulbul, Mohammad Farhad
    Jiang, Yunsheng
    Ma, Jinwen
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2015, 6 (04) : 23 - 39
  • [4] Diverse Features Fusion Network for video-based action recognition
    Deng, Haoyang
    Kong, Jun
    Jiang, Min
    Liu, Tianshan
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 77
  • [5] Symmetrical Enhanced Fusion Network for Skeleton-Based Action Recognition
    Kong, Jun
    Deng, Haoyang
    Jiang, Min
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (11) : 4394 - 4408
  • [6] Deep learning network model based on fusion of spatiotemporal features for action recognition
    Yang, Ge
    Zou, Wu-xing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (07) : 9875 - 9896
  • [7] A deep multimodal network based on bottleneck layer features fusion for action recognition
    Singh, Tej
    Vishwakarma, Dinesh Kumar
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (24) : 33505 - 33525
  • [8] Hybrid features for skeleton-based action recognition based on network fusion
    Chen, Zhangmeng
    Pan, Junjun
    Yang, Xiaosong
    Qin, Hong
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)