Training audio transformers for cover song identification

Cited by: 0
Authors
Te Zeng
Francis C. M. Lau
Affiliation
[1] The University of Hong Kong, Department of Computer Science
Keywords
Cover song identification; Transformer; Music representation learning
DOI
Not available
Abstract
Over the past decades, convolutional neural networks (CNNs) have been widely adopted in audio perception tasks to learn latent representations. However, CNNs may be limited in how effectively they model temporal context in audio. Motivated by the success of transformer architectures in computer vision and audio classification, and aiming to better capture long-range global context, we extend this line of work and propose the Audio Similarity Transformer (ASimT), a convolution-free, purely transformer-based architecture for learning effective representations of audio signals. Furthermore, we introduce a novel loss, MAPLoss, which is used in tandem with a classification loss to directly improve mean average precision (mAP). In experiments on public datasets, ASimT achieves state-of-the-art performance in cover song identification.
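Cover song identification is typically evaluated as a retrieval task: each track queries the collection, and other versions of the same work should be ranked highest. The sketch below shows one common way to compute the mean average precision (mAP) that MAPLoss targets, starting from a pairwise similarity matrix. It is a plain-Python illustration under stated assumptions (a square similarity matrix, integer work labels, the query excluded from its own ranking, and queries with no cover in the collection skipped). The function names are hypothetical, and this is neither the authors' evaluation code nor their MAPLoss, whose formulation the abstract does not give.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """AP for a single query (hypothetical helper).

    scores[i] is the similarity between the query and candidate i;
    relevant[i] is True if candidate i is a version (cover) of the same work.
    """
    # Rank candidates by descending similarity (sorted() is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_relevant = sum(bool(r) for r in relevant)
    if n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank  # precision at this rank, counted only at relevant items
    return total / n_relevant


def mean_average_precision(sim: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """mAP over all queries (hypothetical helper; assumed evaluation protocol).

    sim is an N x N similarity matrix and labels holds N work (clique) ids.
    Every track queries all other tracks; candidates sharing its label are relevant.
    """
    aps = []
    for q in range(len(labels)):
        others = [j for j in range(len(labels)) if j != q]  # exclude the query itself
        relevant = [labels[j] == labels[q] for j in others]
        if any(relevant):  # assumption: queries without any cover are skipped
            aps.append(average_precision([sim[q][j] for j in others], relevant))
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: tracks 0 and 1 are versions of one work, tracks 2 and 3 of another.
labels = [0, 0, 1, 1]
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.3, 0.1],
       [0.1, 0.3, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
print(mean_average_precision(sim, labels))  # 1.0: every cover is ranked first
```

Because AP averages the precision only at the ranks where relevant items appear, a model scores well only if it pushes all versions of a work toward the top of each ranking. mAP itself is not differentiable with respect to the similarity scores, since it depends on their sort order, so an mAP-oriented training loss generally has to use some relaxation of this computation.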