Bridging Music and Image via Cross-Modal Ranking Analysis

Cited by: 16
Authors:
Wu, Xixuan [1 ]
Qiao, Yu [2 ]
Wang, Xiaogang [3 ]
Tang, Xiaoou [1 ]
Affiliations:
[1] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518000, Peoples R China
[3] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Hong Kong, Peoples R China
Keywords:
Cross-modal; feature embedding; lyric-based image attribute; music-image matching; ordinal regression
DOI:
10.1109/TMM.2016.2557722
Chinese Library Classification (CLC): TP [Automation Technology; Computer Technology]
Discipline Classification Code: 0812
Abstract:
Human perceptions of music and image are closely related, since both can inspire similar human sensations, such as emotion, motion, and power. This paper explores whether and how music and images can be automatically matched by machines. The main contributions are threefold. First, we construct a benchmark dataset of more than 45,000 music-image pairs. Human labelers are recruited to annotate whether these pairs are well matched, and they generally agree with one another on the matching degree of music-image pairs. Second, we investigate suitable semantic representations of music and image for this cross-modal matching task; in particular, we adopt lyrics as an intermediate medium connecting music and image, and design a set of lyric-based attributes for image representation. Third, we propose cross-modal ranking analysis (CMRA) to learn the semantic similarity between music and image from ranking label information. CMRA seeks embedding spaces for both music and image that maximize the ordinal margin between music-image pairs. The proposed method can learn the non-linear relationship between music and image and integrate heterogeneous ranking data from different modalities into a unified space. Experimental results demonstrate that the proposed method outperforms state-of-the-art cross-modal methods on the music-image matching task and achieves a consistency rate of 91.5% with human labelers.
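To make the ranking idea in the abstract concrete, the following minimal Python/NumPy sketch scores matched music-image pairs above mismatched ones by a hinged ordinal margin in a shared embedding space. It is not the authors' CMRA implementation: the linear projections, feature dimensions, and exact loss form are illustrative assumptions, whereas CMRA itself learns non-linear embeddings from labeled ranking data.

    # Illustrative sketch only (not the paper's CMRA): a pairwise ranking-margin
    # objective over jointly embedded music and image features. Shapes, the
    # linear maps, and the margin value are assumptions made for this example.
    import numpy as np

    rng = np.random.default_rng(0)
    n_pairs, d_music, d_image, d_embed = 8, 64, 128, 32

    # Toy features standing in for music descriptors and lyric-based image attributes.
    music = rng.normal(size=(n_pairs, d_music))
    image = rng.normal(size=(n_pairs, d_image))

    # Linear embeddings into a shared space (CMRA learns non-linear mappings).
    W_m = rng.normal(scale=0.1, size=(d_music, d_embed))
    W_i = rng.normal(scale=0.1, size=(d_image, d_embed))

    def embed(X, W):
        """Project features and L2-normalise so dot products act as similarities."""
        Z = X @ W
        return Z / np.linalg.norm(Z, axis=1, keepdims=True)

    def ranking_margin_loss(music, image, W_m, W_i, margin=0.2):
        """Hinge loss: each matched music-image pair should score higher than
        any mismatched pair for the same music clip by at least `margin`."""
        Zm, Zi = embed(music, W_m), embed(image, W_i)
        sims = Zm @ Zi.T                      # (n, n) cross-modal similarity matrix
        pos = np.diag(sims)[:, None]          # matched-pair scores
        hinge = np.maximum(0.0, margin - pos + sims)
        np.fill_diagonal(hinge, 0.0)          # ignore the matched pair itself
        return hinge.mean()

    print(f"toy ranking loss: {ranking_margin_loss(music, image, W_m, W_i):.4f}")

In a learning setting, the projections W_m and W_i would be fit to the labeled pairs (for example by gradient descent on such a loss) so that well-matched pairs out-rank mismatched ones by the chosen margin.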
Pages: 1305-1318
Number of pages: 14
Related Papers (50 total):
  • [41] Zelaszczyk, Maciej; Mandziuk, Jacek. Audio-to-Image Cross-Modal Generation. 2022 International Joint Conference on Neural Networks (IJCNN), 2022.
  • [42] Crisinel, Anne-Sylvie; Jacquier, Caroline; Deroy, Ophelia; Spence, Charles. Composing with Cross-modal Correspondences: Music and Odors in Concert. Chemosensory Perception, 2013, 6(1): 45-52.
  • [43] Drai-Zerbib, Veronique; Baccino, Thierry. The effect of expertise in music reading: cross-modal competence. Journal of Eye Movement Research, 2013, 6(5).
  • [44] Hentschel, Frank. Cross-Modal Descriptions of Music in the Latin Middle Ages. Archiv für Musikwissenschaft, 2023, 80(4): 252-287.
  • [45] Li, Ang; Ni, Shouxiang; Chen, Yanan; Chen, Jianxin; Wei, Xin; Zhou, Liang; Guizani, Mohsen. Cross-Modal Object Detection Via UAV. IEEE Transactions on Vehicular Technology, 2023, 72(8): 10894-10905.
  • [46] Gao, Quanxue; Lian, Huanhuan; Wang, Qianqian; Sun, Gan. Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020, 34: 3938-3945.
  • [47] Meyer, Kaspar; Kaplan, Jonas T. Cross-Modal Multivariate Pattern Analysis. JoVE - Journal of Visualized Experiments, 2011, (57).
  • [48] Dai, Xue-mei; Li, Sheng-Gang. Cross-modal deep discriminant analysis. Neurocomputing, 2018, 314: 437-444.
  • [49] Barzelay, Zohar; Schechner, Yoav Y. Onsets Coincidence for Cross-Modal Analysis. IEEE Transactions on Multimedia, 2010, 12(2): 108-120.
  • [50] Xu, Wenxuan; Li, Cangxin; Bian, Yun; Meng, Qingquan; Zhu, Weifang; Shi, Fei; Chen, Xinjian; Shao, Chengwei; Xiang, Dehui. Cross-Modal Consistency for Single-Modal MR Image Segmentation. IEEE Transactions on Biomedical Engineering, 2024, 71(9): 2557-2567.