Audio and video feature fusion for activity recognition in unconstrained videos

Cited by: 0

Authors:

Lopes, Jose [1]

Singh, Sameer [1]

Affiliations:

[1] Univ Loughborough, Res Sch Informat, Loughborough LE11 3TU, Leics, England

Source:

INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2006, PROCEEDINGS | 2006, Vol. 4224

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Combining audio and image processing for understanding video content has several benefits compared to using each modality on its own. For the task of context and activity recognition in video sequences, it is important to explore both data streams to gather relevant information. In this paper we describe a video context and activity recognition model. Our work extracts a range of audio and visual features, followed by feature reduction and information fusion. We show that combining audio-based with video-based decision making improves the quality of context and activity recognition in videos by 4% over audio data alone and 18% over image data alone.

Pages: 823-831

Page count: 9
