SPEECH SHOT EXTRACTION FROM BROADCAST NEWS VIDEOS

Times cited: 4
Authors
Kumagai, Shogo [1, 5]
Doman, Keisuke [1, 4]
Takahashi, Tomokazu [2]
Deguchi, Daisuke [3]
Ide, Ichiro [1]
Murase, Hiroshi [1]
Affiliations
[1] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648601, Japan
[2] Gifu Shotoku Gakuen Univ, Fac Econ & Informat, Gifu 5008288, Japan
[3] Nagoya Univ, Informat & Commun Headquarters, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[4] Japan Soc Promot Sci, Tokyo, Japan
[5] Ricoh Co Ltd, Tokyo, Japan
Keywords
Speech shot extraction; audio-visual integration; broadcast news videos
DOI
10.1142/S1793351X12400077
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a method for discriminating between speech shots and narrated shots in order to extract genuine speech shots from broadcast news videos. Speech shots in news videos carry rich multimedia information about the speaker and are therefore valuable as archival material. One existing approach extracts speech shots using the position and size of the face region. However, this approach alone is insufficient, because news videos also contain non-speech shots in which the person appearing on screen is not the speaker, namely narrated shots. To solve this problem, the proposed method discriminates between speech shots and narrated shots in two stages. The first stage directly evaluates the inconsistency between the on-screen subject and the speaker, based on the co-occurrence between lip motion and voice. The second stage evaluates intra- and inter-shot features that capture the characteristic tendencies of speech shots. By combining both stages, the proposed method accurately discriminates between speech shots and narrated shots. In experiments, the proposed method achieved an overall speech-shot extraction accuracy of 0.871, confirming its effectiveness.
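The two-stage idea above can be illustrated with a small sketch. As an assumption for illustration only, it scores the first-stage lip-motion/voice co-occurrence as the Pearson correlation between a per-frame lip-motion magnitude and a per-frame audio energy sequence, and treats the second stage as a precomputed score derived from intra- and inter-shot features. The correlation measure, the cascade rule, the thresholds (`corr_threshold`, `shot_threshold`), and the function names are hypothetical and are not the authors' actual method; feature extraction (lip tracking, audio energy, shot features) is assumed to happen elsewhere.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # no variation: treat as no co-occurrence
    return cov / math.sqrt(vx * vy)


def classify_shot(lip_motion, audio_energy, shot_score,
                  corr_threshold=0.3, shot_threshold=0.5):
    """Hypothetical two-stage decision returning 'speech' or 'narrated'.

    lip_motion, audio_energy: per-frame values for one shot (assumed inputs).
    shot_score: assumed output of a stage-two classifier on intra/inter-shot features.
    """
    # Stage 1 (assumed): weak lip/voice co-occurrence suggests the on-screen
    # subject is not the speaker, so the shot is rejected as narrated.
    if pearson(lip_motion, audio_energy) < corr_threshold:
        return "narrated"
    # Stage 2 (assumed): accept only if the shot-level score is high enough.
    return "speech" if shot_score >= shot_threshold else "narrated"


if __name__ == "__main__":
    lips = [0, 1, 0, 1, 0, 1]
    voice = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9]
    print(classify_shot(lips, voice, shot_score=0.8))
```

A cascade is used here only because it is the simplest way to let the first stage veto shots whose lip motion does not match the audio; a weighted fusion of both scores would be an equally plausible reading of "combination of both stages".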
Pages: 179-204
Number of pages: 26