Bridging Music and Image via Cross-Modal Ranking Analysis

Cited by: 16
Authors
Wu, Xixuan [1]
Qiao, Yu [2]
Wang, Xiaogang [3]
Tang, Xiaoou [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518000, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Cross-modal; feature embedding; lyric-based image attribute; music-image matching; ordinal regression;
DOI
10.1109/TMM.2016.2557722
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Human perceptions of music and image are closely related, since both can evoke similar sensations, such as emotion, motion, and power. This paper explores whether and how music and image can be automatically matched by machines. The main contributions are threefold. First, we construct a benchmark dataset composed of more than 45 000 music-image pairs. Human labelers are recruited to annotate whether these pairs are well matched. The results show that the labelers generally agree with each other on the matching degree of music-image pairs. Second, we investigate suitable semantic representations of music and image for this cross-modal matching task. In particular, we adopt lyrics as an intermediate medium to connect music and image, and design a set of lyric-based attributes for image representation. Third, we propose cross-modal ranking analysis (CMRA) to learn the semantic similarity between music and image from ranking labels. CMRA aims to find the optimal embedding spaces for both music and image in the sense of maximizing the ordinal margin between music-image pairs. The proposed method is able to learn the non-linear relationship between music and image, and to integrate heterogeneous ranking data from different modalities into a unified space. Experimental results demonstrate that the proposed method outperforms state-of-the-art cross-modal methods in the music-image matching task, and achieves a consistency rate of 91.5% with human labelers.
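The ordinal-margin idea in the abstract can be illustrated with a minimal sketch. This is not the authors' CMRA implementation: it assumes linear projections (the hypothetical matrices `W_m` and `W_i`), a dot-product similarity, and a pairwise hinge loss, whereas the abstract states that CMRA learns a non-linear relationship. All function and parameter names here are invented for illustration.

```python
import numpy as np

def similarity(W_m, W_i, music, image):
    """Score each music-image pair by the dot product of its two embeddings.

    music: (n, d_music) features, image: (n, d_image) features.
    W_m, W_i: assumed linear projections into a shared k-dimensional space.
    """
    return np.sum((music @ W_m) * (image @ W_i), axis=1)

def ordinal_margin_loss(scores, ranks, margin=1.0):
    """Mean hinge loss over all (a, b) where pair a is ranked above pair b.

    The loss is zero once every higher-ranked pair outscores every
    lower-ranked pair by at least `margin`.
    """
    diff = scores[:, None] - scores[None, :]   # diff[a, b] = s_a - s_b
    higher = ranks[:, None] > ranks[None, :]   # True where a should outscore b
    if not higher.any():
        return 0.0
    return float(np.maximum(0.0, margin - diff[higher]).mean())
```

Minimizing such a loss with respect to `W_m` and `W_i` pushes well-matched pairs above poorly matched ones in the shared space; the choice of a hinge on score differences is one common way to express an ordinal margin, not necessarily the formulation used in the paper.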
Pages: 1305 - 1318
Number of pages: 14
Related papers
50 in total
  • [31] Deep Cross-Modal Hashing Based on Semantic Consistent Ranking
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Hsia, Chih-Hsien
    Ma, Kai-Kuang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9530 - 9542
  • [32] Ranking-Based Supervised Discrete Cross-Modal Hashing
    Li H.-Q.
    Wang Y.-X.
    Chen Z.-D.
    Luo X.
    Xu X.-S.
    Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (08) : 1620 - 1635
  • [33] Deep Cross-Modal Hashing With Ranking Learning for Noisy Labels
    Shu, Zhenqiu
    Bai, Yibing
    Yong, Kailing
    Yu, Zhengtao
    IEEE TRANSACTIONS ON BIG DATA, 2025, 11 (02) : 553 - 565
  • [34] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [35] Domain Adaptive Cross-Modal Image Retrieval via Modality and Domain Translations
    Yanagi, Rintaro
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2021, E104A (06) : 866 - 875
  • [36] CMAF: Cross-modal Augmentation via Fusion for Underwater Acoustic Image Recognition
    Yang, Shih-Wei
    Shen, Li-Hsiang
    Shuai, Hong-Han
    Feng, Kai-Ten
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)
  • [37] Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking
    Zhang, Qing
    Xiang, Wei
    APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [38] Cross-Modal Manifold Propagation for Image Recommendation
    Jian, Meng
    Guo, Jingjing
    Fu, Xin
    Wu, Lifang
    Jia, Ting
    APPLIED SCIENCES-BASEL, 2022, 12 (06)
  • [39] Cross-Modal Saliency Correlation for Image Annotation
    Yun Gu
    Haoyang Xue
    Jie Yang
    Neural Processing Letters, 2017, 45 : 777 - 789
  • [40] Cross-Modal Saliency Correlation for Image Annotation
    Gu, Yun
    Xue, Haoyang
    Yang, Jie
    NEURAL PROCESSING LETTERS, 2017, 45 (03) : 777 - 789