Training audio transformers for cover song identification

Authors
Te Zeng
Francis C. M. Lau
Affiliation
[1] The University of Hong Kong, Department of Computer Science
Keywords
Cover song identification; Transformer; Music representation learning
Abstract
Over the past decades, convolutional neural networks (CNNs) have been widely adopted in audio perception tasks to learn latent representations. For audio analysis, however, CNNs can be limited in how well they model temporal context. Motivated by the success of transformer architectures in computer vision and audio classification at capturing long-range global context, we extend this line of work and propose the Audio Similarity Transformer (ASimT), a convolution-free, purely transformer-based architecture for learning effective representations of audio signals. We also introduce a novel loss, MAPLoss, used in tandem with a classification loss, to directly improve mean average precision. In experiments, ASimT achieves state-of-the-art performance in cover song identification on public datasets.
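For readers unfamiliar with the metric the abstract refers to, the following is a minimal sketch of how mean average precision is commonly computed for a retrieval task such as cover song identification. It is not the paper's MAPLoss, whose formulation is not given here. The function names and the boolean-ranking input format are assumptions made for illustration, and each ranking is assumed to cover every candidate, so that all relevant items appear somewhere in the list.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the item at rank i + 1 is a cover of
    the query (hypothetical input format, assumed for this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / rank.
            precision_sum += hits / rank
    # Assumes the ranking holds every relevant item, so hits is their total.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not all_rankings:
        return 0.0
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)
```

For example, a query whose covers are ranked first and third among three candidates has an average precision of (1/1 + 2/3) / 2. Because this ranking-based quantity is not differentiable as written, a training loss that targets it generally has to rely on some smooth surrogate.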