Integrating audio-visual features and text information for story segmentation of news video

Cited by: 0

Authors:

Liu, Hua-Yong [1]

Zhou, Dong-Ru [1]

Affiliation:

[1] Sch. of Comp., Wuhan Univ., Wuhan 430072, China

Source:

Wuhan University Journal of Natural Sciences | 2003, Vol. 8, No. 4

Keywords:

School of Computer, Wuhan University, Wuhan 430072, Hubei, China

Abstract: Video data are composed of multimodal information streams, including visual, auditory and textual streams, so this paper describes an approach to story segmentation of news video based on multimodal analysis. The proposed approach detects the topic-caption frames and integrates them with the silence clip detection results and the shot segmentation results to locate the news story boundaries. Integrating audio-visual features and text information overcomes the weakness of approaches that use only image analysis techniques. On test data with 135 400 frames, an accuracy rate of 85.8% and a recall rate of 97.5% are obtained when the boundaries between news stories are detected. The experimental results show that the approach is valid and robust.

Key words: news video; story segmentation; audio-visual features analysis; text detection

CLC number: TP 311.5

Received date: 2002-12-23

Foundation item: Supported by the National Natural Science Foundation of China (60173045)

Biography: Liu Hua-yong (1978-), male, Ph.D. candidate, research direction: video retrieval and speech signal processing. E-mail: hyhut9 _en@ sina. corn

To whom correspondence should be addressed.

DOI:

10.1007/bf02903674

Pages: 1070-1074

50 items in total

[11] Automated generation of news content hierarchy by integrating audio, video, and text information
Huang, Q
Liu, Z
Rosenberg, A
Gibbon, D
Shahraray, B
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 3025 - 3028
[12] Audio-visual speaker recognition for video broadcast news
Maison, B
Neti, C
Senior, A
JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2001, 29 (1-2): 71 - 79
[13] Audio-Visual Speaker Recognition for Video Broadcast News
Benoît Maison
Chalapathy Neti
Andrew Senior
Journal of VLSI signal processing systems for signal, image and video technology, 2001, 29 : 71 - 79
[14] Extracting semantic information from basketball video based on audio-visual features
Kim, K
Choi, J
Kim, N
Kim, P
IMAGE AND VIDEO RETRIEVAL, 2002, 2383 : 278 - 288
[15] Audio-Visual Segmentation
Zhou, Jinxing
Wang, Jianyuan
Zhang, Jiayi
Sun, Weixuan
Zhang, Jing
Birchfield, Stan
Guo, Dan
Kong, Lingpeng
Wang, Meng
Zhong, Yiran
COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 386 - 403
[16] VIDEO CAMERA IDENTIFICATION USING AUDIO-VISUAL FEATURES
Milani, S.
Cuccovillo, L.
Tagliasacchi, M.
Tubaro, S.
Aichroth, P.
2014 5TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP 2014), 2014
[17] BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge
Liu, Chen
Li, Peike
Zhang, Hu
Li, Lincheng
Huang, Zi
Wang, Dadong
Yu, Xin
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10015 - 10028
[18] Identification of story units in audio-visual sequences by joint audio and video processing
Saraceno, C
Leonardi, R
1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 1, 1998: 363 - 367
[19] Audio-Visual Segmentation with Semantics
Zhou, Jinxing
Shen, Xuyang
Wang, Jianyuan
Zhang, Jiayi
Sun, Weixuan
Zhang, Jing
Birchfield, Stan
Guo, Dan
Kong, Lingpeng
Wang, Meng
Zhong, Yiran
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024: 1644 - 1664
[20] Integration of audio and video semantic features for news video scene segmentation
Xu, J
Liu, HB
Zhou, DR
VISUALIZATION AND OPTIMIZATION TECHNIQUES, 2001, 4553 : 227 - 232
