Recognizing high-level audio-visual concepts using context

Times cited: 0

Authors:

Naphade, MR [1]

Huang, TS [1]

Affiliation:

[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA

Source:

2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS | 2001

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

Recognition of high-level semantics from audio-visual data is a challenging multimedia understanding problem. The difficulty mainly lies in the gap that exists between low-level media features and high-level semantic concepts. In an attempt to bridge this gap, we proposed a probabilistic framework for semantic understanding [6, 5]. The components of this framework are probabilistic multimedia objects and a graphical network of such objects. In this paper, we show how the framework supports detection of multiple high-level concepts, which enjoy spatial and temporal support. More importantly, we show why context matters and how it can be modeled. Using a factor graph framework, we model context and use it to improve detection of sites, objects and events. Using the concepts Outdoor and flying-helicopter, we demonstrate how the factor graph multinet models context. Using ROC curves and probability of error curves, we support the intuition that context should help.
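
The intuition that context should help can be illustrated with a toy example. The following is a minimal sketch, not the authors' factor graph multinet: it treats Outdoor and flying-helicopter as two binary variables, each with an independent detector score, joined by a single pairwise compatibility factor, and computes the context-adjusted marginals by exact enumeration. The function name `fuse_with_context`, the table `COMPAT`, and all numeric values are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical compatibility factor, indexed as COMPAT[outdoor, helicopter].
# Assumption: a flying helicopter in a non-outdoor shot is unlikely,
# so that one combination is down-weighted. The value 0.05 is illustrative.
COMPAT = np.array([[1.0, 0.05],
                   [1.0, 1.0]])


def fuse_with_context(p_outdoor, p_heli, compat=COMPAT):
    """Return context-adjusted P(outdoor=1) and P(helicopter=1).

    p_outdoor and p_heli are independent detector scores in [0, 1].
    The joint is proportional to the product of the two local factors
    and the pairwise compatibility factor.
    """
    local_o = np.array([1.0 - p_outdoor, p_outdoor])
    local_h = np.array([1.0 - p_heli, p_heli])
    # joint[o, h] is proportional to local_o[o] * local_h[h] * compat[o, h]
    joint = local_o[:, None] * local_h[None, :] * compat
    joint = joint / joint.sum()
    post_outdoor = joint.sum(axis=1)[1]  # marginalize over helicopter
    post_heli = joint.sum(axis=0)[1]     # marginalize over outdoor
    return float(post_outdoor), float(post_heli)


if __name__ == "__main__":
    # An uncertain Outdoor detector paired with a confident helicopter detector.
    print(fuse_with_context(0.5, 0.8))
```

In this sketch, a confident flying-helicopter score pulls an uncertain Outdoor score upward, because the compatibility factor penalizes the "helicopter but not outdoor" combination. With an all-ones compatibility table the two detectors are left unchanged, which corresponds to ignoring context.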

Pages: 46-49

Page count: 4
