Diff-BGM: A Diffusion Model for Video Background Music Generation

Cited by: 1
Authors
Li, Sizhe [1]
Qin, Yiming [1]
Zheng, Minghang [1]
Jin, Xin [2, 3]
Liu, Yang [1]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Beijing Elect Sci & Technol Inst, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02582
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attractive background music is indispensable when editing a video. However, video background music generation faces several challenges: suitable training datasets are lacking, the generation process is difficult to control flexibly, and video and music are difficult to align sequentially. In this work, we first propose BGM909, a high-quality music-video dataset with detailed annotation and shot detection that provides multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and, via retrieval precision metrics, the alignment between music and video. Finally, we propose the Diff-BGM framework to automatically generate background music for a given video. It uses different signals to control different aspects of the music during generation: dynamic video features control the rhythm, and semantic features control the melody and atmosphere. We align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of the proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.
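The abstract names a segment-aware cross-attention layer but does not specify it. The following is a minimal sketch of one possible reading, not the authors' implementation: each music time step attends only to video frames that share its segment id. The function names `segment_mask` and `masked_cross_attention`, the per-step segment ids, and the toy vectors are all hypothetical and chosen for illustration only.

```python
import math


def segment_mask(music_seg, video_seg):
    """mask[i][j] is True when music step i and video frame j share a segment id.

    Assumption: both sequences carry an integer segment id per position.
    """
    return [[m == v for v in video_seg] for m in music_seg]


def masked_cross_attention(queries, keys, values, mask):
    """Scaled dot-product cross-attention restricted by a boolean mask.

    queries: music-step vectors; keys/values: video-frame vectors.
    A query with no allowed key yields a zero vector.
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for q, allowed in zip(queries, mask):
        # Score only the permitted (same-segment) video frames.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) if ok else None
            for k, ok in zip(keys, allowed)
        ]
        live = [s for s in scores if s is not None]
        if not live:
            outputs.append([0.0] * out_dim)
            continue
        peak = max(live)  # subtract the max for a numerically stable softmax
        weights = [0.0 if s is None else math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(out_dim)
        ])
    return outputs


# Toy example: four music steps in two segments, one video frame per segment.
music_seg = [0, 0, 1, 1]
video_seg = [0, 1]
queries = [[1.0, 0.0]] * 4
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

mask = segment_mask(music_seg, video_seg)
attended = masked_cross_attention(queries, keys, values, mask)
```

In this toy setup each music step sees exactly one video frame, so its output is that frame's value vector regardless of the query. With several frames per segment, the softmax would weight them by query-key similarity within the segment only.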
Pages: 27338-27347
Page count: 10