Bridging Music and Image via Cross-Modal Ranking Analysis

Cited by: 16
Authors
Wu, Xixuan [1]
Qiao, Yu [2]
Wang, Xiaogang [3]
Tang, Xiaoou [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518000, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Cross-modal; feature embedding; lyric-based image attribute; music-image matching; ordinal regression
DOI
10.1109/TMM.2016.2557722
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human perceptions of music and image are closely related, since both can evoke similar sensations such as emotion, motion, and power. This paper explores whether and how music and image can be matched automatically by machines. The main contributions are threefold. First, we construct a benchmark dataset of more than 45 000 music-image pairs. Human labelers are recruited to annotate whether these pairs are well matched. The results show that the labelers generally agree with each other on the matching degree of music-image pairs. Second, we investigate suitable semantic representations of music and image for this cross-modal matching task. In particular, we adopt lyrics as an intermediate medium to connect music and image, and design a set of lyric-based attributes for image representation. Third, we propose cross-modal ranking analysis (CMRA) to learn the semantic similarity between music and image from ranking labels. CMRA seeks the optimal embedding spaces for both music and image in the sense of maximizing the ordinal margin between music-image pairs. The proposed method is able to learn the non-linear relationship between music and image, and to integrate heterogeneous ranking data from different modalities into a unified space. Experimental results demonstrate that the proposed method outperforms state-of-the-art cross-modal methods in the music-image matching task, and achieves a consistency rate of 91.5% with human labelers.
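The idea of an ordinal margin between music-image pairs can be illustrated with a small sketch. The code below is not the CMRA algorithm from the paper; it is a generic pairwise hinge ranking loss over two linear projections, written under several assumptions: the function names `pair_scores` and `ordinal_margin_loss`, the projection matrices `W_m` and `W_i`, the use of cosine similarity, and the `margin` value are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pair_scores(music, images, W_m, W_i):
    """Cosine similarity of each music vector with its paired image vector
    after projecting both into a shared space (W_m, W_i are hypothetical
    linear embeddings, one per modality)."""
    zm = music @ W_m
    zi = images @ W_i
    zm = zm / np.linalg.norm(zm, axis=1, keepdims=True)
    zi = zi / np.linalg.norm(zi, axis=1, keepdims=True)
    return np.sum(zm * zi, axis=1)


def ordinal_margin_loss(scores, ranks, margin=0.1):
    """Mean hinge loss over all ordered pairs (a, b) with ranks[a] > ranks[b].

    A pair contributes zero loss once scores[a] exceeds scores[b] by at
    least `margin`; otherwise it is penalized linearly."""
    rank_diff = ranks[:, None] - ranks[None, :]
    score_diff = scores[:, None] - scores[None, :]
    mask = rank_diff > 0
    if not mask.any():
        return 0.0
    return float(np.mean(np.maximum(0.0, margin - score_diff[mask])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    music = rng.normal(size=(5, 8))    # 5 music feature vectors (toy data)
    images = rng.normal(size=(5, 6))   # 5 paired image feature vectors
    W_m = rng.normal(size=(8, 3))      # music -> shared 3-d space
    W_i = rng.normal(size=(6, 3))      # image -> shared 3-d space
    ranks = np.array([2, 0, 1, 2, 0])  # toy human matching grades
    print(ordinal_margin_loss(pair_scores(music, images, W_m, W_i), ranks))
```

Minimizing a loss of this kind with respect to `W_m` and `W_i` would push better-matched pairs toward higher similarity than worse-matched ones. The abstract states that the proposed method also learns non-linear relationships, which this linear sketch does not capture.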
Pages: 1305-1318
Number of pages: 14