SPEECH SHOT EXTRACTION FROM BROADCAST NEWS VIDEOS

Times cited: 4

Authors:

Kumagai, Shogo [1,5]

Doman, Keisuke [1,4]

Takahashi, Tomokazu [2]

Deguchi, Daisuke [3]

Ide, Ichiro [1]

Murase, Hiroshi [1]

Affiliations:

[1] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648601, Japan

[2] Gifu Shotoku Gakuen Univ, Fac Econ & Informat, Gifu 5008288, Japan

[3] Nagoya Univ, Informat & Commun Headquarters, Chikusa Ku, Nagoya, Aichi 4648601, Japan

[4] Japan Soc Promot Sci, Tokyo, Japan

[5] Ricoh Co Ltd, Tokyo, Japan

Source:

INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING | 2012 / Vol. 6 / No. 02

Keywords:

Speech shot extraction; audio-visual integration; broadcast news videos

DOI:

10.1142/S1793351X12400077

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a method for discriminating between speech shots and narrated shots in order to extract genuine speech shots from broadcast news videos. Speech shots in news videos contain rich multimedia information about the speaker and could therefore be valuable as archival material. One existing approach extracts speech shots using the position and size of the face region. However, this approach alone is insufficient, because news videos also contain non-speech shots in which the person appearing on screen is not the speaker, namely narrated shots. To solve this problem, we propose a two-stage method for discriminating between speech shots and narrated shots. The first stage directly evaluates the inconsistency between the on-screen subject and the speaker based on the co-occurrence of lip motion and voice. The second stage evaluates each shot using intra- and inter-shot features that capture the typical tendencies of speech shots. By combining both stages, the proposed method accurately discriminates between speech shots and narrated shots. In experiments, the proposed method achieved an overall speech shot extraction accuracy of 0.871, which confirms its effectiveness.
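The abstract does not specify the exact co-occurrence measure or how the two stages are combined. The following Python sketch illustrates the general idea only, under stated assumptions: per-frame lip-motion magnitudes and audio energies are assumed to be already extracted and time-aligned, Pearson correlation stands in for the paper's co-occurrence measure, and the stage-2 input `shot_score`, the cascade combination, and both thresholds are hypothetical placeholders rather than the authors' design.

```python
from statistics import mean, pstdev


def cooccurrence_score(lip_motion, audio_energy):
    """Pearson correlation between per-frame lip motion and audio energy.

    Assumption: a simple stand-in for the paper's lip-motion/voice
    co-occurrence measure, not the authors' formulation.
    """
    if len(lip_motion) != len(audio_energy) or len(lip_motion) < 2:
        raise ValueError("need two aligned sequences of at least 2 frames")
    mx, my = mean(lip_motion), mean(audio_energy)
    sx, sy = pstdev(lip_motion), pstdev(audio_energy)
    if sx == 0 or sy == 0:
        # A constant sequence (e.g. motionless lips) gives no evidence of co-occurrence.
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(lip_motion, audio_energy))
    return cov / (sx * sy)


def classify_shot(lip_motion, audio_energy, shot_score,
                  corr_threshold=0.3, shot_threshold=0.5):
    """Hypothetical two-stage cascade; thresholds are illustrative only."""
    # Stage 1: weak lip/voice co-occurrence suggests the on-screen subject
    # is not the speaker, so the shot is treated as narrated.
    if cooccurrence_score(lip_motion, audio_energy) < corr_threshold:
        return "narrated"
    # Stage 2: shot_score is a hypothetical score derived from
    # intra- and inter-shot features (higher = more speech-like).
    return "speech" if shot_score >= shot_threshold else "narrated"


lip = [0.1, 0.8, 0.2, 0.9, 0.1]     # hypothetical per-frame lip motion
energy = [0.0, 1.0, 0.1, 1.1, 0.0]  # hypothetical per-frame audio energy
print(classify_shot(lip, energy, shot_score=0.7))  # prints: speech
```

In this sketch, a shot whose lip motion rises and falls with the audio energy passes stage 1 and is then labeled by the stage-2 score, while a shot with motionless lips over an active voice track is rejected as narrated in stage 1. A cascade is only one possible way to combine the stages; a weighted sum of the two scores would be another reasonable reading of the abstract.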


Pages: 179-204

Number of pages: 26

50 entries in total

[21] From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News
Maamouri, Mohamed
Bies, Ann
Kulick, Seth
Zaghouani, Wajdi
Graff, David
Ciul, Michael
LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 2117 - 2122
[22] Investigation on Mandarin Broadcast News Speech Recognition
Hwang, Mei-Yuh
Lei, Xin
Wang, Wen
Shinozaki, Takahiro
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1233 - +
[23] A study on Mandarin broadcast news speech recognition
Chen, CL
Wang, YR
Chen, SH
2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 257 - 260
[24] Voice retrieval of Mandarin broadcast news speech
Chen, B
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2006, 20 (01) : 91 - 109
[25] Online Speech Activity Detection in Broadcast News
Gao, Chao
Saikumar, Guruprasad
Khanwalkar, Saurabh
Herscovici, Avi
Kumar, Anoop
Srivastava, Amit
Natarajan, Premkumar
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2648 - 2651
[26] Japanese broadcast news transcription and information extraction
Furui, S
Ohtsuki, K
Zhang, ZP
COMMUNICATIONS OF THE ACM, 2000, 43 (02) : 71 - 73
[27] Discrimination of speech from nonspeeech in broadcast news based on modulation frequency features
Markaki, Maria
Stylianou, Yannis
SPEECH COMMUNICATION, 2011, 53 (05) : 726 - 735
[28] A multi-expert approach for shot classification in news videos
De Santo, M
Percannella, G
Sansone, C
Vento, M
IMAGE ANALYSIS AND RECOGNITION, PT 1, PROCEEDINGS, 2004, 3211 : 564 - 571
[29] Structural Metadata Annotation of Speech Corpora: Comparing Broadcast News and Broadcast Conversations
Kolar, Jachym
Svec, Jan
SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 77 - 82
[30] RUNDKAST: An Annotated Norwegian Broadcast News Speech Corpus
Amdal, Ingunn
Strand, Ole Morten
Almberg, Jorn
Svendsen, Torbjorn
SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1907 - 1913
