Video-level Multi-model Fusion for Action Recognition

Times cited: 3
Authors
Wang, Xiaomin [1]
Zhang, Junsan [1]
Wang, Leiquan [1]
Yu, Philip S. [2]
Zhu, Jie [3]
Li, Haisheng [4]
Affiliations
[1] China Univ Petr East China, Coll Comp Sci & Technol, Qingdao, Shandong, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA
[3] Natl Police Univ Criminal Justice, Dept Informat Management, Hangzhou, Peoples R China
[4] Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
action recognition; video-level recognition; 3D convolution; multi-model fusion
DOI
10.1145/3357384.3357935
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Approaches to video action recognition based on spatio-temporal features, such as two-stream methods and 3D-convolution methods, have emerged. However, current methods suffer from problems such as partial observation and being restricted to modeling a single type of information. Segment-level recognition results obtained from dense sampling cannot represent the entire video and therefore lead to partial observation. Moreover, it is hard for a single model to capture the complementary spatial, temporal and spatio-temporal information in a video at the same time. The challenge is therefore to build a video-level representation and to capture multiple types of information. In this paper, a video-level multi-model fusion method for action recognition is proposed to solve these problems. First, an efficient video-level 3D convolution model, which assembles segment-level 3D convolution models, is proposed to capture global information from the whole video. Second, a multi-model fusion architecture is proposed to capture multiple types of information for video action recognition: the spatial, temporal and spatio-temporal information are aggregated with an SVM classifier. Experimental results show that the method achieves state-of-the-art performance on UCF-101 (97.6%) without pre-training on Kinetics.
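The abstract states only that segment-level 3D convolution models are assembled into a video-level model and that spatial, temporal and spatio-temporal information is aggregated with an SVM classifier; it gives no further details. The following is a minimal sketch of that two-stage idea under our own assumptions, not the authors' implementation: segment-level class scores are averaged into a video-level score vector, the video-level scores of the three streams are concatenated, and a one-vs-rest linear SVM (trained here by plain full-batch subgradient descent on the regularized hinge loss, which is not necessarily the solver the authors used) makes the final prediction. The stream names, the helpers `video_level_scores`, `fuse` and `toy_video`, the toy scores and all hyperparameters are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical stream names; the abstract does not name the individual models.
STREAMS = ("spatial", "temporal", "spatiotemporal")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def video_level_scores(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed consensus: average the per-class scores of all segments of one video."""
    n = len(segment_scores)
    return [sum(column) / n for column in zip(*segment_scores)]


def fuse(stream_scores: Dict[str, List[float]]) -> List[float]:
    """Concatenate each stream's video-level scores and append a constant bias feature."""
    return [s for name in STREAMS for s in stream_scores[name]] + [1.0]


class LinearSVM:
    """One-vs-rest linear SVM. For each class c it minimizes
    lam/2 * ||w||^2 + mean(max(0, 1 - t_i * w.x_i)) by full-batch subgradient
    descent; the bias is the weight on the constant feature, so it is regularized too."""

    def __init__(self, num_classes: int, lam: float = 0.01, lr: float = 0.5, iters: int = 1000):
        self.num_classes, self.lam, self.lr, self.iters = num_classes, lam, lr, iters
        self.weights: List[List[float]] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "LinearSVM":
        n, dim = len(X), len(X[0])
        self.weights = []
        for c in range(self.num_classes):
            targets = [1.0 if yi == c else -1.0 for yi in y]  # class c vs. the rest
            w = [0.0] * dim
            for _ in range(self.iters):
                grad = [self.lam * wj for wj in w]  # gradient of the L2 term
                for xi, ti in zip(X, targets):
                    if ti * dot(w, xi) < 1.0:  # inside the margin: hinge subgradient is -t*x
                        for j in range(dim):
                            grad[j] -= ti * xi[j] / n
                w = [wj - self.lr * gj for wj, gj in zip(w, grad)]
            self.weights.append(w)
        return self

    def predict(self, x: List[float]) -> int:
        scores = [dot(w, x) for w in self.weights]
        return max(range(self.num_classes), key=lambda c: scores[c])


def toy_video(label: int, shift: float, num_classes: int = 3) -> Dict[str, List[float]]:
    """Hypothetical per-stream outputs: 4 segments per stream, each favoring `label`."""
    streams = {}
    for name in STREAMS:
        segments = []
        for s in range(4):
            top = 0.6 + 0.05 * s + shift
            rest = (1.0 - top) / (num_classes - 1)
            segments.append([top if k == label else rest for k in range(num_classes)])
        streams[name] = video_level_scores(segments)
    return streams


if __name__ == "__main__":
    X = [fuse(toy_video(c, sh)) for c in range(3) for sh in (0.0, 0.1)]
    y = [c for c in range(3) for _ in (0.0, 0.1)]
    svm = LinearSVM(num_classes=3).fit(X, y)
    print([svm.predict(x) for x in X])             # predictions on the toy training videos
    print(svm.predict(fuse(toy_video(2, 0.05))))  # prediction on an unseen toy video
```

Averaging is only one possible way to turn segment scores into a video-level score; the abstract does not say how the video-level 3D convolution model assembles its segment-level models, so `video_level_scores` should be read as a placeholder. Fusing at the score level with an SVM keeps each stream's network independent, so the streams can be trained separately and only the small fusion classifier sees all three outputs.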
Pages: 159-168
Page count: 10