Continuous Emotion-Based Image-to-Music Generation

Authors
Wang, Yajie [1 ,2 ]
Chen, Mulin [1 ,2 ]
Li, Xuelong [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence, Opt & Elect iOPEN, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Key Lab Intelligent Interact & Applicat, Minist Ind & Informat Technol, Xian 710072, Peoples R China
Keywords
Image-to-music generation; valence-arousal space; multi-modal cognitive computing; vicinagearth security
DOI
10.1109/TMM.2023.3338089
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image-to-music generation aims to generate realistic pure music from a given image. Although many previous works bridge image and music, they mainly focus on content-based cross-modal matching, for example, matching a Christmas song to an image that contains a Christmas tree. By comparison, image-to-music generation is a more challenging task due to its ambiguity and subjectivity. Specifically, without lyrics or human voice, there is no explicit correlation between image content and musical melody. Meanwhile, the perception of generated music varies from person to person. Inspired by the phenomenon of synesthesia, we argue that if an image tends to elicit a certain emotion in humans, the generated music should leave a similar impression. Therefore, in this paper, we propose a continuous emotion-based image-to-music generation framework, which uses emotion as the key for cross-modal generation. Specifically, a new image-music dataset is established, which uses the valence-arousal (VA) space to capture the complex and nuanced nature of emotions. A plug-and-play model is then proposed to translate an image into a piece of music with similar emotion; it projects the emotions into continuous-valued labels and explores both intra-modal and inter-modal emotional consistency with contrastive learning. To the best of our knowledge, this is the first end-to-end framework for the task of pure music generation from natural images. Extensive experiments show that the generated music achieves satisfactory emotional consistency with the input images, as well as impressive quality.
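The abstract does not give the loss function, so the following is only an illustrative sketch of how inter-modal emotional consistency with continuous valence-arousal labels could be expressed as a soft-target contrastive loss. It is not the authors' formulation. The function name `va_soft_contrastive_loss`, the `temperature` and `bandwidth` parameters, and the use of squared VA distance to build soft targets are all assumptions made for this example; embeddings are assumed to be non-zero vectors.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def va_soft_contrastive_loss(img_emb, mus_emb, img_va, mus_va,
                             temperature=0.1, bandwidth=0.5):
    """Hypothetical image-to-music contrastive loss with soft VA targets.

    For each image, the target distribution over music clips is a softmax
    of negative squared valence-arousal distance, so clips with closer
    emotion receive more weight. The loss is the cross-entropy between
    that target and the softmax of scaled cosine similarities.
    """
    img = [normalize(v) for v in img_emb]
    mus = [normalize(v) for v in mus_emb]
    total = 0.0
    for z_i, va_i in zip(img, img_va):
        # Cosine similarity between this image and every music clip.
        logits = [sum(a * b for a, b in zip(z_i, z_m)) / temperature
                  for z_m in mus]
        # Soft targets derived from distance in the VA plane.
        neg_dist = [-((va_i[0] - va_m[0]) ** 2 + (va_i[1] - va_m[1]) ** 2)
                    / bandwidth for va_m in mus_va]
        targets = [math.exp(v) for v in log_softmax(neg_dist)]
        log_probs = log_softmax(logits)
        total -= sum(t * lp for t, lp in zip(targets, log_probs))
    return total / len(img)


# Toy data: four items at the corners of the VA plane, one-hot embeddings.
corners = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

aligned = va_soft_contrastive_loss(identity, identity, corners, corners)
mismatched = va_soft_contrastive_loss(identity, identity[::-1],
                                      corners, corners)
print(aligned < mismatched)
```

In this toy setup, reversing the music embeddings pairs each image with the clip of opposite emotion, so the mismatched loss is larger than the aligned one. An intra-modal term could be sketched the same way by passing one modality for both embedding arguments.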
Pages: 5670-5679
Number of pages: 10