50 results in total
- [21] Mining Audio, Text and Visual Information for Talking Face Generation. 2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019: 787-795.
- [22] MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration. COMPUTER VISION, ECCV 2022, PT VIII, 2022, 13668: 431-449.
- [23] Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2015, 2015, 9103: 307-320.
- [24] Video content parsing based on combined audio and visual information. MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS IV, 1999, 3846: 78-89.
- [25] What Is the Content of the World's Technologically Mediated Information and Communication Capacity: How Much Text, Image, Audio, and Video? INFORMATION SOCIETY, 2014, 30(02): 127-143.
- [26] Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024: 6639-6647.
- [27] Content Based Lecture Video Retrieval Using Speech and Video Text Information. IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, 2014, 7(02): 142-154.
- [28] TA2V: Text-Audio Guided Video Generation. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 7250-7264.
- [29] Modern trends of data stream generation of audio-video content. TCSET 2006: MODERN PROBLEMS OF RADIO ENGINEERING, TELECOMMUNICATIONS AND COMPUTER SCIENCE, PROCEEDINGS, 2006: 345-346.
- [30] Low level processing of audio and video information for extracting the semantics of content. 2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2001: 607-612.