Recognizing high-level audio-visual concepts using context

Cited by: 0
Authors
Naphade, MR [1]
Huang, TS [1]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL 61801 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Recognition of high-level semantics from audio-visual data is a challenging multimedia understanding problem. The difficulty mainly lies in the gap that exists between low-level media features and high-level semantic concepts. In an attempt to bridge this gap, we proposed a probabilistic framework for semantic understanding [6, 5]. The components of this framework are probabilistic multimedia objects and a graphical network of such objects. In this paper we show how the framework supports detection of multiple high-level concepts, which enjoy spatial and temporal support. More importantly, we show why context matters and how it can be modeled. Using a factor graph framework, we model context and use it to improve detection of sites, objects and events. Using the concepts Outdoor and flying-helicopter, we demonstrate how the factor graph multinet models context. Using ROC curves and probability of error curves, we support the intuition that context should help.
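As a rough illustration of the idea in the abstract, the sketch below runs sum-product message passing on a tiny factor graph with two binary concepts, Outdoor and flying-helicopter, joined by a single context factor. It is not the authors' multinet: the function names, the local detector scores, and the compatibility table are all hypothetical values chosen for illustration. Because the graph is a tree, one round of messages gives exact marginals.

```python
def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def contextual_beliefs(local_outdoor, local_heli, context):
    """Sum-product on a two-variable factor graph (illustrative sketch).

    local_outdoor, local_heli: [score for absent, score for present],
        standing in for frame-level detector outputs (hypothetical).
    context[o][h]: compatibility of Outdoor=o with flying-helicopter=h
        (hypothetical table, not taken from the paper).
    Returns the normalized beliefs for each concept.
    """
    states = (0, 1)
    # Message from the context factor to the helicopter variable:
    # sum over Outdoor states of local evidence times compatibility.
    msg_to_heli = [
        sum(local_outdoor[o] * context[o][h] for o in states) for h in states
    ]
    # Message from the context factor to the Outdoor variable.
    msg_to_outdoor = [
        sum(local_heli[h] * context[o][h] for h in states) for o in states
    ]
    # Belief = local evidence times incoming message, then normalized.
    belief_outdoor = normalize(
        [local_outdoor[o] * msg_to_outdoor[o] for o in states]
    )
    belief_heli = normalize([local_heli[h] * msg_to_heli[h] for h in states])
    return belief_outdoor, belief_heli


# Hypothetical compatibility: a flying helicopter in a non-outdoor
# scene is penalized; every other combination is left neutral.
CONTEXT = [[1.0, 0.1],
           [1.0, 1.0]]

if __name__ == "__main__":
    weak_heli = [0.6, 0.4]  # ambiguous helicopter detector (assumed)
    _, heli_outdoor = contextual_beliefs([0.1, 0.9], weak_heli, CONTEXT)
    _, heli_indoor = contextual_beliefs([0.9, 0.1], weak_heli, CONTEXT)
    print("P(helicopter | outdoor-looking scene):", heli_outdoor[1])
    print("P(helicopter | indoor-looking scene): ", heli_indoor[1])
```

With the same ambiguous helicopter score, the belief in flying-helicopter ends up lower when the scene looks indoor than when it looks outdoor, which is the qualitative effect of context the abstract describes. A neutral (all-ones) context table leaves the local scores unchanged.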
Pages: 46-49
Page count: 4