Combining text and audio-visual features in video indexing

Cited by: 0

Authors:

Chang, SF ^{[1]}

Manmatha, R ^{[1]}

Chua, TS ^{[1]}

Affiliation:

[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA

Source:

2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING | 2005

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance will be described, primarily in the broadcast news video domain.

Pages: 1005-1008

Page count: 4
