Diff-BGM: A Diffusion Model for Video Background Music Generation

Cited by: 1
Authors
Li, Sizhe [1 ]
Qin, Yiming [1 ]
Zheng, Minghang [1 ]
Jin, Xin [2 ,3 ]
Liu, Yang [1 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Beijing Elect Sci & Technol Inst, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPR52733.2024.02582
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When editing a video, an attractive piece of background music is indispensable. However, video background music generation faces several challenges, such as the lack of suitable training datasets and the difficulty of flexibly controlling the generation process while sequentially aligning the video and music. In this work, we first propose BGM909, a high-quality music-video dataset with detailed annotations and shot detection that provides multi-modal information about the video and music. We then present evaluation metrics that assess music quality, including music diversity and music-video alignment measured with retrieval-precision metrics. Finally, we propose the Diff-BGM framework, which automatically generates background music for a given video and uses different signals to control different aspects of the music during generation: dynamic video features control the rhythm, while semantic features control the melody and atmosphere. To align the video and music sequentially, we introduce a segment-aware cross-attention layer. Experiments verify the effectiveness of our proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.
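The segment-aware cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch: each music time step is restricted to attending only to video features from its aligned segment (e.g., the shot it belongs to). This is a hypothetical NumPy illustration of the general masking technique, not the authors' actual implementation; the function name, shapes, and segment-id scheme are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segment_aware_cross_attention(music_q, video_kv, music_seg, video_seg):
    """Cross-attention where each music step only attends to video
    features from its aligned segment (illustrative sketch only).

    music_q:   (T_m, d) query features from the music stream
    video_kv:  (T_v, d) key/value features from the video stream
    music_seg: (T_m,) segment id of each music step
    video_seg: (T_v,) segment id of each video frame
    """
    d = music_q.shape[-1]
    scores = music_q @ video_kv.T / np.sqrt(d)        # (T_m, T_v) similarity
    mask = music_seg[:, None] == video_seg[None, :]   # True where segments match
    scores = np.where(mask, scores, -1e9)             # block cross-segment attention
    attn = softmax(scores, axis=-1)                   # rows sum to 1 over own segment
    return attn @ video_kv, attn
```

With this mask, a music bar in segment 0 receives zero attention weight on frames from segment 1, which is one simple way to realize the sequential video-music alignment the paper describes.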
Pages: 27338-27347
Page count: 10
Related Papers
50 items total
  • [41] A system for automatic generation of music sports-video
    Zhang, WG
    Xing, LY
    Huang, QM
    Gao, W
    2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, : 1287 - 1290
  • [42] Automated Music Video Generation Using Emotion Synchronization
    Shin, Ki-Ho
    Kim, Hye-Rin
    Lee, In-Kwon
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 2594 - 2597
  • [43] DSR-Diff: Depth map super-resolution with diffusion model
    Shi, Yuan
    Cao, Huiyun
    Xia, Bin
    Zhu, Rui
    Liao, Qingmin
    Yang, Wenming
    PATTERN RECOGNITION LETTERS, 2024, 184 : 225 - 231
  • [44] ASD-Diff: Unsupervised Anomalous Sound Detection with Masked Diffusion Model
    Fan, Xin
    Fang, Wenjie
    Hu, Ying
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 55 - 65
  • [45] Automatic Recommendation Algorithm for Video Background Music Based on Deep Learning
    Kai, Hong
    COMPLEXITY, 2021, 2021
  • [46] Automatic synthesis of background music track data by analysis of video contents
    Modegi, T
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2004, PT 3, PROCEEDINGS, 2004, 3333 : 591 - 598
  • [47] BACKGROUND MUSIC RECOMMENDATION FOR VIDEO BASED ON MULTIMODAL LATENT SEMANTIC ANALYSIS
    Kuo, Fang-Fei
    Shan, Man-Kwan
    Lee, Suh-Yin
    2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME 2013), 2013,
  • [48] UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction
    Wang, Zeyu
    Hao, Zecheng
    Zhang, Yuhan
    Feng, Yuchao
    Guo, Yufei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2025, 22
  • [49] Automatic synthesis of background music track data by analysis of video contents
    Modegi, T
    IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2: SMART INFO-MEDIA SYSTEMS, 2004, : 431 - 436
  • [50] Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing
    Lin, Jen-Chun
    Wei, Wen-Li
    Yang, James
    Wang, Hsin-Min
    Liao, Hong-Yuan Mark
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 519 - 527