Semantic indexing of multimedia content using visual, audio, and text cues

Cited by: 0

Authors:

[1] Adams, W.H.

[2] Iyengar, Giridharan

[3] Lin, Ching-Yung

[4] Naphade, Milind Ramesh

[5] Neti, Chalapathy

[6] Nock, Harriet J.

[7] Smith, John R.

Source:

Adams, W.H. (whadams@us.ibm.com) | Hindawi Publishing Corporation, 2003

Keywords:

Information analysis; Learning systems; Markov processes; Semantics; Statistical methods

DOI:

Not available

Abstract:

We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
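
The late-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract names Bayesian networks and SVMs as fusion models, whereas this sketch substitutes a plain weighted average of normalized per-modality detector scores. The function names, the weights, and the toy scores and labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one detector's scores to [0, 1] so modalities are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant detector carries no ranking information.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion(scores_by_modality: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of normalized per-modality scores, one value per shot.

    Assumption: a simple stand-in for the learned fusion models
    (Bayesian networks, SVMs) mentioned in the abstract.
    """
    normalized = {m: min_max_normalize(s) for m, s in scores_by_modality.items()}
    total_weight = sum(weights[m] for m in normalized)
    n_shots = len(next(iter(normalized.values())))
    return [
        sum(weights[m] * normalized[m][i] for m in normalized) / total_weight
        for i in range(n_shots)
    ]


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of the ranking induced by the scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical unimodal detector outputs for four video shots.
toy_scores = {
    "audio": [0.2, 0.9, 0.4, 0.1],
    "visual": [2.0, 1.0, 3.0, 0.0],
    "text": [0.0, 1.0, 1.0, 0.0],
}
toy_weights = {"audio": 1.0, "visual": 1.0, "text": 1.0}
toy_labels = [0, 1, 1, 0]  # 1 = concept present in the shot

fused = late_fusion(toy_scores, toy_weights)
for name, scores in toy_scores.items():
    print(name, round(average_precision(scores, toy_labels), 3))
print("fused", round(average_precision(fused, toy_labels), 3))
```

In practice the fusion weights, or a learned fusion classifier in their place, would be fit on held-out data rather than fixed by hand as they are here.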

