SPEECH SHOT EXTRACTION FROM BROADCAST NEWS VIDEOS

Times cited: 4
Authors
Kumagai, Shogo [1, 5]
Doman, Keisuke [1, 4]
Takahashi, Tomokazu [2]
Deguchi, Daisuke [3]
Ide, Ichiro [1]
Murase, Hiroshi [1]
Affiliations
[1] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648601, Japan
[2] Gifu Shotoku Gakuen Univ, Fac Econ & Informat, Gifu 5008288, Japan
[3] Nagoya Univ, Informat & Commun Headquarters, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[4] Japan Soc Promot Sci, Tokyo, Japan
[5] Ricoh Co Ltd, Tokyo, Japan
Keywords
Speech shot extraction; audio-visual integration; broadcast news videos
DOI
10.1142/S1793351X12400077
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a method for discriminating between speech shots and narrated shots in order to extract genuine speech shots from broadcast news videos. Speech shots in news videos contain rich multimedia information about the speaker and could therefore be valuable as archival material. One existing approach extracts speech shots using the position and size of the face region. However, this approach alone is insufficient, because news videos also contain non-speech shots in which the person appearing on screen is not the speaker, namely, narrated shots. To solve this problem, we propose a two-stage method for discriminating between speech shots and narrated shots. The first stage directly evaluates the inconsistency between the on-screen subject and the speaker based on the co-occurrence between lip motion and voice. The second stage evaluates each shot using intra- and inter-shot features that reflect the characteristic tendencies of speech shots. By combining both stages, the proposed method accurately discriminates between speech shots and narrated shots. In experiments, the proposed method achieved an overall speech shot extraction accuracy of 0.871, confirming its effectiveness.
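The first stage compares lip motion with voice. One simple way to score such co-occurrence is sketched below, assuming that per-frame lip-motion magnitudes and audio energies have already been extracted and aligned to the same frame rate. It computes the Pearson correlation between the two signals and thresholds it into a speech/narrated label. The correlation measure, the function names `lip_voice_cooccurrence` and `classify_shot`, and the 0.3 threshold are hypothetical illustrations, not the paper's actual formulation. The second-stage intra- and inter-shot features are not modeled here.

```python
import math
from typing import Sequence


def lip_voice_cooccurrence(lip_motion: Sequence[float],
                           audio_energy: Sequence[float]) -> float:
    """Hypothetical co-occurrence score: Pearson correlation between
    per-frame lip-motion magnitude and per-frame audio energy.

    Assumes both signals are already aligned frame by frame.
    Returns 0.0 when either signal is constant (nothing co-varies).
    """
    if len(lip_motion) != len(audio_energy) or len(lip_motion) < 2:
        raise ValueError("signals must be aligned and contain at least two frames")
    n = len(lip_motion)
    mean_l = sum(lip_motion) / n
    mean_a = sum(audio_energy) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((l - mean_l) * (a - mean_a) for l, a in zip(lip_motion, audio_energy))
    var_l = sum((l - mean_l) ** 2 for l in lip_motion)
    var_a = sum((a - mean_a) ** 2 for a in audio_energy)
    if var_l == 0 or var_a == 0:
        return 0.0
    return cov / math.sqrt(var_l * var_a)


def classify_shot(lip_motion: Sequence[float],
                  audio_energy: Sequence[float],
                  threshold: float = 0.3) -> str:
    """Hypothetical stage-1 rule: label the shot 'speech' if the on-screen
    lip motion co-occurs with the voice, otherwise 'narrated'."""
    score = lip_voice_cooccurrence(lip_motion, audio_energy)
    return "speech" if score >= threshold else "narrated"


if __name__ == "__main__":
    audio = [0.0, 0.7, 0.1, 0.8, 0.2]
    moving_lips = [0.1, 0.8, 0.2, 0.9, 0.1]   # lips move with the voice
    static_lips = [0.1, 0.1, 0.1, 0.1, 0.1]   # subject is silent on screen
    print(classify_shot(moving_lips, audio))  # speech
    print(classify_shot(static_lips, audio))  # narrated
```

A correlation-based score is used only because it is a transparent way to express "lip motion and voice rise and fall together". A narrated shot, where the voice comes from an off-screen narrator, tends to show little such co-variation. A practical system would also need face and lip localization and would combine this cue with further evidence, as the abstract's second stage does.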
Pages: 179-204
Number of pages: 26