Music Conditioned Generation for Human-Centric Video

Authors
Zhao, Zimeng [1]
Zuo, Binghui [1]
Wang, Yangang [1]
Affiliations
[1] Southeast Univ, Sch Automat, Key Lab Measurement & Control Complex Syst Engn, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multiple signal classification; Generative adversarial networks; Correlation; Visualization; Training; Task analysis; Feature extraction; Video generation; signal processing; cross-modal learning; human-centric
DOI
10.1109/LSP.2024.3358978
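Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract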
Music and human-centric video are two fundamental signals that transcend language. Correlation analysis between the two is currently used in choreography and film accompaniment. This letter explores this correlation in a new task: generating human-centric video from a start-end image pair and transitional music. Existing human-centric generation methods are ill-suited to this task because they either require frame-wise poses as input or have difficulty handling long-duration videos. Our key idea is to build a temporal generation framework dominated by a DDPM and assisted by a VAE and a GAN. The framework reduces the computational cost of music-image diffusion by exploiting the compactness of the VAE latent space and the image translation efficiency of the GAN. To produce videos that are both long and high quality, our framework first generates small-scale keyframes and then generates high-resolution videos. To strengthen the frame-wise consistency of the human body, a frame-aligned correspondence map is adopted as intermediate supervision. Extensive experiments comparing against the SOTA method demonstrate the rationality and effectiveness of this signal generation framework.
Pages: 506-510
Number of pages: 5