Bridging Music and Image via Cross-Modal Ranking Analysis

Cited by: 16
Authors
Wu, Xixuan [1]
Qiao, Yu [2]
Wang, Xiaogang [3]
Tang, Xiaoou [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518000, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords
Cross-modal; feature embedding; lyric-based image attribute; music-image matching; ordinal regression;
DOI
10.1109/TMM.2016.2557722
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Human perceptions of music and images are closely related, since both can evoke similar sensations, such as emotion, motion, and power. This paper explores whether and how music and images can be matched automatically by machines. The main contributions are threefold. First, we construct a benchmark dataset of more than 45,000 music-image pairs. Human labelers are recruited to annotate whether these pairs are well matched. The results show that the labelers generally agree with each other on the matching degree of music-image pairs. Second, we investigate suitable semantic representations of music and images for this cross-modal matching task. In particular, we adopt lyrics as an intermediate medium to connect music and images, and design a set of lyric-based attributes for image representation. Third, we propose cross-modal ranking analysis (CMRA) to learn the semantic similarity between music and images from ranking labels. CMRA seeks the optimal embedding spaces for both music and images in the sense of maximizing the ordinal margin between music-image pairs. The proposed method can learn the non-linear relationship between music and images, and can integrate heterogeneous ranking data from different modalities into a unified space. Experimental results demonstrate that the proposed method outperforms state-of-the-art cross-modal methods on the music-image matching task, and achieves a consistency rate of 91.5% with human labelers.
Pages: 1305-1318
Number of pages: 14