Video-level Multi-model Fusion for Action Recognition

Cited by: 3
Authors
Wang, Xiaomin [1]
Zhang, Junsan [1]
Wang, Leiquan [1]
Yu, Philip S. [2]
Zhu, Jie [3]
Li, Haisheng [4]
Affiliations
[1] China Univ Petr EastChina, Coll Comp Sci & Technol, Qingdao, Shandong, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Chicago, IL 60680 USA
[3] Natl Police Univ Criminal Justice, Dept Informat Management, Hangzhou, Peoples R China
[4] Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
action recognition; video-level recognition; 3D convolution; multi-model fusion
DOI
10.1145/3357384.3357935
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Approaches to video action recognition based on spatio-temporal features have emerged, such as two-stream methods and 3D convolution methods. However, current methods suffer from partial observation or are restricted to modeling a single type of information. Segment-level recognition results obtained from dense sampling cannot represent the entire video and therefore lead to partial observation. In addition, a single model can hardly capture the complementary spatial, temporal, and spatio-temporal information in a video at the same time. The challenge is therefore to build a video-level representation and to capture multiple types of information. In this paper, a video-level multi-model fusion method for action recognition is proposed to solve these problems. First, an efficient video-level 3D convolution model is proposed that assembles segment-level 3D convolution models to obtain the global information in the video. Second, a multi-model fusion architecture is proposed for video action recognition to capture multiple types of information: the spatial, temporal, and spatio-temporal information is aggregated with an SVM classifier. Experimental results show that this method achieves state-of-the-art performance on the UCF-101 dataset (97.6%) without pre-training on Kinetics.
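The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how segment-level outputs are assembled or how the three streams are combined before the SVM, so averaging segment scores and concatenating stream scores are assumptions made here for illustration. The function names, the toy scores, and the small binary linear SVM (trained by subgradient descent on the hinge loss, whereas the paper works on the 101 classes of UCF-101) are likewise hypothetical.

```python
def video_level_scores(segment_scores):
    """Average per-class scores over all segments of one video (assumed consensus)."""
    n_segments = len(segment_scores)
    n_classes = len(segment_scores[0])
    return [sum(seg[c] for seg in segment_scores) / n_segments
            for c in range(n_classes)]


def fuse_features(spatial, temporal, spatiotemporal):
    """Concatenate the three video-level score vectors (assumed fusion input)."""
    return list(spatial) + list(temporal) + list(spatiotemporal)


def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Binary linear SVM via subgradient descent on the hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            violated = margin < 1
            for i in range(len(w)):
                # L2 regularisation always applies; the hinge term only on violations.
                grad = reg * w[i] - (label * x[i] if violated else 0.0)
                w[i] -= lr * grad
            if violated:
                b += lr * label
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy data: two videos, two classes, two segments per stream (made-up scores).
video_a = fuse_features(
    video_level_scores([[0.9, 0.1], [0.7, 0.3]]),  # spatial stream
    video_level_scores([[0.6, 0.4], [0.6, 0.4]]),  # temporal stream
    video_level_scores([[0.8, 0.2], [0.6, 0.4]]),  # spatio-temporal stream
)
video_b = fuse_features(
    video_level_scores([[0.1, 0.9], [0.3, 0.7]]),
    video_level_scores([[0.4, 0.6], [0.4, 0.6]]),
    video_level_scores([[0.2, 0.8], [0.4, 0.6]]),
)
w, b = train_linear_svm([video_a, video_b], [1, -1])
print(predict(w, b, video_a), predict(w, b, video_b))
```

Averaging over segments is what turns segment-level predictions into a single video-level vector, and training the classifier on the concatenated vectors lets it weight the three streams against each other rather than trusting any single one.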
Pages: 159-168
Number of pages: 10