Audio and video feature fusion for activity recognition in unconstrained videos

Cited by: 0
Authors
Lopes, Jose [1]
Singh, Sameer [1]
Affiliations
[1] Univ Loughborough, Res Sch Informat, Loughborough LE11 3TU, Leics, England
Source
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2006, PROCEEDINGS | 2006, Vol. 4224
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Combining audio and image processing for understanding video content has several benefits compared with using either modality on its own. For the task of context and activity recognition in video sequences, it is important to explore both data streams to gather relevant information. In this paper we describe a video context and activity recognition model. Our work extracts a range of audio and visual features, followed by feature reduction and information fusion. We show that combining audio with video-based decision making improves the quality of context and activity recognition in videos by 4% over audio data and 18% over image data.
Pages: 823-831
Page count: 9
Related papers
50 in total
  • [1] Audio/Video Fusion for Objects recognition
    Lacheze, Loic
    Guo, Yan
    Benosman, Ryad
    Gas, Bruno
    Couverture, Charlie
    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 652 - 657
  • [2] End-to-End Bloody Video Recognition by Audio-Visual Feature Fusion
    Hou, Congcong
    Wu, Xiaoyu
    Wang, Ge
    PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I, 2018, 11256 : 501 - 510
  • [3] Video-Audio Emotion Recognition Based on Feature Fusion Deep Learning Method
    Song, Yanan
    Cai, Yuanyang
    Tan, Lizhe
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 611 - 616
  • [4] RECOGNITION OF BLUE MOVIES BY FUSION OF AUDIO AND VIDEO
    Zuo, Haiqiang
    Wu, Ou
    Hu, Weiming
    Xu, Bo
    2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 37 - 40
  • [5] A feature map aggregation network for unconstrained video face recognition
    Zhang, Luyang
    Wang, Huaibin
    Wang, Haitao
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2413 - 2425
  • [6] Low-level fusion of audio and video feature for multi-modal emotion recognition
    Wimmer, Matthias
    Schuller, Bjoern
    Arsic, Dejan
    Rigoll, Gerhard
    Radig, Bernd
    VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2008, : 145 - +
  • [7] A Stochastic Late Fusion Approach to Human Action Recognition in Unconstrained Images and Videos
    Cheema, Muhammad Shahzad
    Eweiwi, Abdalrahman
    Bauckhage, Christian
    PATTERN RECOGNITION, GCPR 2014, 2014, 8753 : 616 - 628
  • [8] Emotion Recognition Using Fusion of Audio and Video Features
    Ortega, Juan D. S.
    Cardinal, Patrick
    Koerich, Alessandro L.
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 3847 - 3852
  • [9] ACTION RECOGNITION IN UNCONSTRAINED AMATEUR VIDEOS
    Liu, Jingen
    Luo, Jiebo
    Shah, Mubarak
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3549 - +
  • [10] Audio-Visual Event Localization in Unconstrained Videos
    Tian, Yapeng
    Shi, Jing
    Li, Bochen
    Duan, Zhiyao
    Xu, Chenliang
    COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 252 - 268