Semantic indexing of multimedia content using visual, audio, and text cues

Cited by: 0
Authors
[1] Adams, W.H.
[2] Iyengar, Giridharan
[3] Lin, Ching-Yung
[4] Naphade, Milind Ramesh
[5] Neti, Chalapathy
[6] Nock, Harriet J.
[7] Smith, John R.
Source
Adams, W.H. (whadams@us.ibm.com) | Hindawi Publishing Corporation, 2003
Keywords
Information analysis - Learning systems - Markov processes - Semantics - Statistical methods
DOI
Not available
Abstract
We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
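The late-fusion idea in the abstract can be illustrated with a minimal sketch: each modality's detector emits a confidence score, and a second-stage model learns to combine those scores. Everything below is an assumption made for illustration and is not the authors' implementation. The toy scores, the labels, and the function names (`train_fusion`, `fuse`, `relative_improvement`) are hypothetical, and a small logistic regression stands in for the Bayesian-network and SVM fusion models the abstract names. The last function shows how a relative improvement over a unimodal baseline would be computed.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over per-modality detector scores.

    scores: list of [audio, visual, text] confidences (hypothetical values).
    labels: list of 0/1 ground-truth concept labels.
    Stand-in for the SVM / Bayesian-network fusion named in the abstract.
    """
    n_modalities = len(scores[0])
    w = [0.0] * n_modalities
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def fuse(w, b, x):
    """Combine one shot's per-modality scores into a fused confidence."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def relative_improvement(baseline, fused):
    """Relative gain of a fused metric over the best unimodal metric."""
    return (fused - baseline) / baseline


# Toy training set: each row is [audio, visual, text] detector output.
train_scores = [
    [0.9, 0.4, 0.8], [0.6, 0.9, 0.7], [0.8, 0.7, 0.3],  # concept present
    [0.2, 0.5, 0.1], [0.4, 0.1, 0.3], [0.1, 0.3, 0.6],  # concept absent
]
train_labels = [1, 1, 1, 0, 0, 0]

w, b = train_fusion(train_scores, train_labels)
print(fuse(w, b, [0.9, 0.8, 0.9]))       # fused confidence for a strong shot
print(relative_improvement(0.50, 0.55))  # gain of 0.55 over a 0.50 baseline
```

The fusion model sees only detector outputs, never raw features, which is what distinguishes late fusion from concatenating audio, visual, and text features before classification.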
Related papers
50 in total
  • [31] Detection and classification of vehicles using audio visual cues
    Anuja Prasad S.
    Leena Mary
    Bino I. Koshy
    Multimedia Tools and Applications, 2023, 82 : 44087 - 44106
  • [32] Automatic speech recognition using audio visual cues
    Yashwanth, H
    Mahendrakar, H
    David, S
    PROCEEDINGS OF THE IEEE INDICON 2004, 2004, : 166 - 169
  • [33] USING AUDIO AND VISUAL CUES FOR SPEAKER DIARISATION INITIALISATION
    Garau, Giulia
    Bourlard, Herve
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4942 - 4945
  • [34] Video Description Generation using Audio and Visual Cues
    Jin, Qin
    Liang, Junwei
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 239 - 242
  • [35] Discovering meaningful multimedia patterns with audio-visual concepts and associated text
    Xie, L
    Kennedy, L
    Chang, SE
    Divakaran, A
    Sun, H
    Lin, CY
    ICIP: 2004 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, 2004, : 2383 - 2386
  • [36] Determining the context of text using augmented latent semantic indexing
    Rishel, Tom
    Perkins, Louise A.
    Yenduri, Sumanth
    Zand, Farnaz
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2007, 58 (14): : 2197 - 2204
  • [37] Audio-visual Encoding of Multimedia Content for Enhancing Movie Recommendations
    Deldjoo, Yashar
    Constantin, Mihai Gabriel
    Eghbal-Zadeh, Hamid
    Ionescu, Bogdan
    Schedl, Markus
    Cremonesi, Paolo
    12TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS), 2018, : 455 - 459
  • [38] Improving text classification using local latent semantic indexing
    Liu, T
    Chen, H
    Zhang, BY
    Ma, WY
    Wu, GY
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 162 - 169
  • [39] Cross-lingual audio-to-text alignment for multimedia content management
    Lyu, Dau-Cheng
    Lyu, Ren-Yuan
    Chiang, Yuang-Chin
    Hsu, Chun-Nan
    DECISION SUPPORT SYSTEMS, 2008, 45 (03) : 554 - 566
  • [40] Audio-Visual Content Analysis Based Clustering for Unsupervised Debate Indexing
    Keum, Ji-Soo
    Lee, Hyon-Soo
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2008, 27 (05): : 244 - 251