Diff-BGM: A Diffusion Model for Video Background Music Generation

Cited by: 1
Authors
Li, Sizhe [1]
Qin, Yiming [1]
Zheng, Minghang [1]
Jin, Xin [2, 3]
Liu, Yang [1]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Beijing Elect Sci & Technol Inst, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02582
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attractive background music is indispensable when editing a video. However, video background music generation faces several challenges, such as the lack of suitable training datasets and the difficulty of flexibly controlling the generation process and of sequentially aligning video and music. In this work, we first propose BGM909, a high-quality music-video dataset with detailed annotation and shot detection that provides multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and, via retrieval precision metrics, the alignment between music and video. Finally, we propose the Diff-BGM framework to automatically generate background music for a given video. It uses different signals to control different aspects of the music during generation: dynamic video features control the rhythm, and semantic features control the melody and atmosphere. We align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of the proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.
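The abstract does not spell out how the segment-aware cross-attention layer works. The sketch below shows one plausible reading, offered only as an illustration and not as the authors' implementation: each music position attends only to video positions that carry the same segment id. The function names, the segment-id arrays, and the hard masking rule are all assumptions introduced here.

```python
import numpy as np

def segment_mask(music_seg, video_seg):
    """Boolean mask of shape (n_music, n_video).

    Entry (i, j) is True when music position i and video position j
    share a segment id. The hard same-segment rule is an assumption.
    """
    return music_seg[:, None] == video_seg[None, :]

def masked_cross_attention(q, k, v, mask):
    """Scaled dot-product cross-attention restricted by a boolean mask.

    q: (n_music, d) queries from the music sequence
    k, v: (n_video, d) keys and values from the video sequence
    Assumes every music position has at least one allowed video position.
    """
    assert mask.any(axis=-1).all(), "each query needs an allowed key"
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed pairs get -inf so their softmax weight is exactly zero.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With hypothetical segment ids such as `music_seg = np.array([0, 0, 1, 1])` and `video_seg = np.array([0, 0, 0, 1, 1, 1])`, the first two music positions place all of their attention weight on the first three video positions, and the last two on the remaining three.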
Pages: 27338-27347
Page count: 10