SPEECH SHOT EXTRACTION FROM BROADCAST NEWS VIDEOS

Cited by: 4

Authors:

Kumagai, Shogo [1, 5]

Doman, Keisuke [1, 4]

Takahashi, Tomokazu [2]

Deguchi, Daisuke [3]

Ide, Ichiro [1]

Murase, Hiroshi [1]

Affiliations:

[1] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648601, Japan

[2] Gifu Shotoku Gakuen Univ, Fac Econ & Informat, Gifu 5008288, Japan

[3] Nagoya Univ, Informat & Commun Headquarters, Chikusa Ku, Nagoya, Aichi 4648601, Japan

[4] Japan Soc Promot Sci, Tokyo, Japan

[5] Ricoh Co Ltd, Tokyo, Japan

Source:

INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING | 2012 / Vol. 6 / No. 02

Keywords:

Speech shot extraction; audio-visual integration; broadcast news videos

DOI:

10.1142/S1793351X12400077

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a method for discriminating between speech shots and narrated shots in order to extract genuine speech shots from broadcast news videos. Speech shots in news videos contain a wealth of multimedia information about the speaker, and can thus be considered valuable as archived material. One existing approach to extracting speech shots from news videos uses the position and size of the face region. However, that approach alone is insufficient, since news videos also contain narrated shots: non-speech shots in which the speaker is not the subject who appears on screen. To solve this problem, we propose a two-stage method to discriminate between a speech shot and a narrated shot. The first stage directly evaluates the inconsistency between the subject and the speaker based on the co-occurrence between lip motion and voice. The second stage evaluates intra- and inter-shot features that capture the tendencies of speech shots. By combining both stages, the proposed method accurately discriminates between speech shots and narrated shots. In the experiments, the overall accuracy of speech shot extraction by the proposed method was 0.871, which confirmed its effectiveness.


Pages: 179-204

Number of pages: 26
