Speech retrieval for TV news programs by fusing the audio and video information

Times cited: 0
Authors
Gao, XB [1]
Li, J [1]
Ji, HB [1]
Affiliation
[1] Xidian University, School of Electronic Engineering, Xi'an 710071, People's Republic of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A typical news story contains a brief report by the anchor person(s) in the studio, as well as news footage shot in the field. Investigation shows that our speech recognizer performs better when indexing audio recorded in the studio than audio recorded in the field. To automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our research is based on 146 news stories collected from the Hong Kong TVB Jade station. Retrieval using the entire audio track gave an average inverse rank (AIR) of 0.759, while retrieval based only on the studio recordings, identified with the help of video parsing, produced an AIR of 0.765.
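The two steps in the abstract (keeping only studio audio, then scoring retrieval by AIR) can be sketched in Python under stated assumptions. Video parsing is assumed to yield shots already labeled as studio (anchor) or field; the `Shot` class and the `studio_intervals` and `average_inverse_rank` helpers are hypothetical names for illustration. AIR is assumed to follow the known-item setting, where each query has exactly one relevant story, so it is the mean of 1/rank over queries (the same quantity as mean reciprocal rank). This is a sketch, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """A video shot labeled by video parsing (hypothetical structure)."""
    start: float     # shot start time in seconds
    end: float       # shot end time in seconds
    is_studio: bool  # True for anchor-person studio shots, False for field footage


def studio_intervals(shots: list[Shot]) -> list[tuple[float, float]]:
    """Return the audio time spans covered by studio shots, merging adjacent ones."""
    intervals: list[tuple[float, float]] = []
    for shot in sorted(shots, key=lambda s: s.start):
        if not shot.is_studio:
            continue  # field footage is dropped as "unreliable" audio
        if intervals and shot.start <= intervals[-1][1]:
            # Contiguous or overlapping studio shots form one audio segment.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], shot.end))
        else:
            intervals.append((shot.start, shot.end))
    return intervals


def average_inverse_rank(ranks: list[int]) -> float:
    """Mean of 1/rank over queries, where each rank is the 1-based position
    of the single relevant story in that query's result list (assumption)."""
    if not ranks:
        raise ValueError("need at least one query")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based positive integers")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    shots = [
        Shot(0.0, 12.5, True),    # anchor introduces the story
        Shot(12.5, 40.0, False),  # field report
        Shot(40.0, 48.0, True),   # back to the studio
        Shot(48.0, 50.0, True),
    ]
    print(studio_intervals(shots))        # [(0.0, 12.5), (40.0, 50.0)]
    print(average_inverse_rank([1, 2, 4]))
```

With three queries whose target stories are ranked 1, 2 and 4, AIR is (1 + 1/2 + 1/4) / 3 ≈ 0.583; a perfect system that always ranks the target first scores 1.0. Merging contiguous studio shots keeps each anchor report as one indexing unit rather than splitting it at shot boundaries.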
Pages: 994-997
Page count: 4
Published in: International Conference on Signal Processing Proceedings (ICSP), 2002, Vol. 2