Training audio transformers for cover song identification

Authors
Te Zeng
Francis C. M. Lau
Affiliation
[1] The University of Hong Kong, Department of Computer Science
Keywords
Cover song identification; Transformer; Music representation learning
DOI
Not available
Abstract
In the past decades, convolutional neural networks (CNNs) have been widely adopted in audio perception tasks that aim to learn latent representations. For audio analysis, however, CNNs may be limited in how effectively they model temporal contextual information. Motivated by the success of transformer architectures in computer vision and audio classification, and aiming to better capture long-range global context, we extend this line of work and propose the Audio Similarity Transformer (ASimT), a convolution-free, purely transformer-based architecture for learning effective representations of audio signals. Furthermore, we introduce a novel loss, MAPLoss, used in tandem with a classification loss, to directly improve mean average precision. In experiments, ASimT achieves state-of-the-art performance in cover song identification on public datasets.
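The abstract names mean average precision (MAP) as the quantity the proposed loss targets but does not define it or MAPLoss itself. As a minimal sketch of the standard retrieval metric only (not the authors' loss), the following assumes each track is queried against all others and that tracks sharing a work label count as covers; the function names `average_precision` and `mean_average_precision` and the input layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def average_precision(relevance):
    """Average precision for one query.

    `relevance` is a ranked 0/1 sequence (best match first); this layout
    is an assumption made for illustration.
    """
    relevance = np.asarray(relevance, dtype=float)
    n_relevant = relevance.sum()
    if n_relevant == 0:
        return 0.0
    # Precision at rank k = (relevant items in the top k) / k.
    precision_at_k = np.cumsum(relevance) / np.arange(1, len(relevance) + 1)
    # Average the precision values at the ranks holding a relevant item.
    return float((precision_at_k * relevance).sum() / n_relevant)


def mean_average_precision(similarity, labels):
    """MAP over all queries, each track queried against every other track.

    `similarity[i, j]` is a higher-is-closer score and `labels[i]` is the
    work identifier of track i (both hypothetical inputs).
    """
    similarity = np.asarray(similarity, dtype=float)
    labels = np.asarray(labels)
    aps = []
    for q in range(len(labels)):
        # Exclude the query itself from its own ranking.
        others = np.array([i for i in range(len(labels)) if i != q])
        order = others[np.argsort(-similarity[q, others], kind="stable")]
        relevance = (labels[order] == labels[q]).astype(int)
        # Queries with no cover in the collection are skipped.
        if relevance.sum() > 0:
            aps.append(average_precision(relevance))
    return float(np.mean(aps)) if aps else 0.0
```

For a ranked list with covers at positions 1 and 3, the average precision is (1/1 + 2/3) / 2. Because the ranking step is not differentiable, a loss that targets MAP directly would need some smooth surrogate; the abstract does not say which form MAPLoss takes.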
Related papers
50 in total
  • [1] Training audio transformers for cover song identification
    Zeng, Te
    Lau, Francis C. M.
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [2] Audio cover song identification based on tonal sequence alignment
    Serra, Joan
    Gomez, Emilia
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 61 - 64
  • [3] Efficient Training of Audio Transformers with Patchout
    Koutini, Khaled
    Schlueter, Jan
    Eghbal-zadeh, Hamid
    Widmer, Gerhard
    INTERSPEECH 2022, 2022, : 2753 - 2757
  • [4] BYTECOVER: COVER SONG IDENTIFICATION VIA MULTI-LOSS TRAINING
    Du, Xingjian
    Yu, Zhesong
    Zhu, Bilei
    Chen, Xiaoou
    Ma, Zejun
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 551 - 555
  • [5] Audio hashing technique for automatic song identification
    Mapelli, F
    Lancini, R
    ITRE2003: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: RESEARCH AND EDUCATION, 2003, : 84 - 88
  • [6] Similarity fusion scheme for cover song identification
    Chen, Ning
    Xiao, Hai-dong
    ELECTRONICS LETTERS, 2016, 52 (13) : 1173 - 1174
  • [7] Fusing similarity functions for cover song identification
    Chen, Ning
    Li, Wei
    Xiao, Haidong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 : 2629 - 2652
  • [8] A HEURISTIC FOR DISTANCE FUSION IN COVER SONG IDENTIFICATION
    Degani, Alessio
    Dalai, Marco
    Leonardi, Riccardo
    Migliorati, Pierangelo
    2013 14TH INTERNATIONAL WORKSHOP ON IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES (WIAMIS), 2013
  • [9] Deep feature learning for cover song identification
    Fang, Jiunn-Tsair
    Day, Chi-Ting
    Chang, Pao-Chi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (22) : 23225 - 23238
  • [10] Cross recurrence quantification for cover song identification
    Serra, Joan
    Serra, Xavier
    Andrzejak, Ralph G.
    NEW JOURNAL OF PHYSICS, 2009, 11