Automated generation of news content hierarchy by integrating audio, video, and text information

Cited by: 16

Authors:

Huang, Q [1]

Liu, Z [1]

Rosenberg, A [1]

Gibbon, D [1]

Shahraray, B [1]

Affiliation:

[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA

Source:

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999

DOI:

10.1109/ICASSP.1999.757478

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper addresses the problem of generating semantically meaningful content by integrating information from different media. The goal is to automatically construct a compact yet meaningful abstraction of the multimedia data that can serve as an effective index table, allowing users to browse through large amounts of data in a non-linear fashion with flexibility, efficiency, and confidence. We propose an integrated solution in the context of broadcast news that simultaneously utilizes cues from video, audio, and text to achieve this goal. Some experimental results are presented and discussed in the paper.

Pages: 3025-3028

Page count: 4

