Automated generation of news content hierarchy by integrating audio, video, and text information

Cited by: 16
Authors
Huang, Q [1]
Liu, Z [1]
Rosenberg, A [1]
Gibbon, D [1]
Shahraray, B [1]
Affiliations
[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA
DOI
10.1109/ICASSP.1999.757478
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper addresses the problem of generating semantically meaningful content by integrating information from different media. The goal is to automatically construct a compact yet meaningful abstraction of the multimedia data that can serve as an effective index table, allowing users to browse through large amounts of data in a non-linear fashion with flexibility, efficiency, and confidence. We propose an integrated solution in the context of broadcast news that simultaneously utilizes cues from video, audio, and text to achieve this goal. Some experimental results are presented and discussed in the paper.
Pages: 3025 - 3028
Page count: 4
Related papers
50 in total
  • [21] Mining Audio, Text and Visual Information for Talking Face Generation
    Yu, Lingyun
    Yu, Jun
    Ling, Qiang
    2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 787 - 795
  • [22] MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
    Hayes, Thomas
    Zhang, Songyang
    Yin, Xi
    Pang, Guan
    Sheng, Sasha
    Yang, Harry
    Ge, Songwei
    Hu, Qiyuan
    Parikh, Devi
    COMPUTER VISION, ECCV 2022, PT VIII, 2022, 13668 : 431 - 449
  • [23] Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations
    Lawless, Seamus
    Lavin, Peter
    Bayomi, Mostafa
    Cabral, Joao P.
    Ghorab, M. Rami
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2015, 2015, 9103 : 307 - 320
  • [24] Video content parsing based on combined audio and visual information
    Zhang, T
    Kuo, CCJ
    MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS IV, 1999, 3846 : 78 - 89
  • [25] What Is the Content of the World's Technologically Mediated Information and Communication Capacity: How Much Text, Image, Audio, and Video?
    Hilbert, Martin
    INFORMATION SOCIETY, 2014, 30 (02): : 127 - 143
  • [26] Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
    Yariv, Guy
    Gat, Itai
    Benaim, Sagie
    Wolf, Lior
    Schwartz, Idan
    Adi, Yossi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6639 - 6647
  • [27] Content Based Lecture Video Retrieval Using Speech and Video Text Information
    Yang, Haojin
    Meinel, Christoph
    IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, 2014, 7 (02): : 142 - 154
  • [28] TA2V: Text-Audio Guided Video Generation
    Zhao, Minglu
    Wang, Wenmin
    Chen, Tongbao
    Zhang, Rui
    Li, Ruochen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7250 - 7264
  • [29] Modern trends of data stream generation of audio-video content
    Shvaichenko, Volodymyr
    TCSET 2006: MODERN PROBLEMS OF RADIO ENGINEERING, TELECOMMUNICATIONS AND COMPUTER SCIENCE, PROCEEDINGS, 2006, : 345 - 346
  • [30] Low level processing of audio and video information for extracting the semantics of content
    Adami, N
    Bugatti, A
    Leonardi, R
    Migliorati, P
    2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2001, : 607 - 612