Music auto-tagging using scattering transform and convolutional neural network with self-attention

Cited by: 10
Authors
Song, Guangxiao [1 ,2 ]
Wang, Zhijie [1 ]
Han, Fang [1 ]
Ding, Shenyi [1 ]
Gu, Xiaochun [1 ]
Affiliations
[1] Donghua Univ, Coll Informat Sci & Technol, Shanghai 201620, Peoples R China
[2] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
Funding
National Natural Science Foundation of China;
Keywords
Music auto-tagging; Convolutional neural network; Attention mechanism; Scattering transform; Deep learning; DEEP; CLASSIFICATION; IMAGE;
DOI
10.1016/j.asoc.2020.106702
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a branch of machine learning, deep learning has been used to tackle the music auto-tagging problem. Deep learning methods, especially those with a convolutional neural network (CNN) architecture, have exhibited good performance on this multi-label classification task. However, the preprocessing and feature-extraction parts of this architecture need to be improved. In this paper, we propose a deep-learning model based on a CNN with a scattering transform and a self-attention mechanism for automatic music tagging. To strike a balance between information integrity and feature extraction in the preprocessing phase, we employ the scattering transform. Then, a multi-layer CNN is used to extract higher-level features from the scattering coefficients. To select better receptive fields of the CNN, a self-attention sub-network is appended to the last layer of the CNN. Experimental results on the MagnaTagATune dataset and the Million Song Dataset (MSD) show that the proposed model is a good choice for the music auto-tagging task, since the scores for the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC) obtained in this paper surpass those of the state-of-the-art models. Furthermore, we visualize the distributions of attention weights, the activations of the CNN, and the ROC-AUC scores on each tag for a better understanding of the model. © 2020 Elsevier B.V. All rights reserved.
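The abstract reports ROC-AUC scores per tag for a multi-label task. As a minimal sketch of how such a per-tag score can be computed (not the authors' evaluation code), the Python below uses the pairwise-ranking definition of ROC-AUC: the fraction of (positive, negative) clip pairs in which the positive clip receives the higher score, with ties counted as one half. The function names and the toy labels and scores are hypothetical and are not taken from the paper or its datasets.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one tag via pairwise ranking (ties count as 0.5).

    Returns None when the tag has no positive or no negative clips,
    because the score is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def per_tag_roc_auc(y_true, y_score):
    """Compute ROC-AUC separately for each tag (column)."""
    n_tags = len(y_true[0])
    return [
        roc_auc([row[t] for row in y_true], [row[t] for row in y_score])
        for t in range(n_tags)
    ]


def macro_roc_auc(y_true, y_score):
    """Average the per-tag scores, skipping tags where AUC is undefined."""
    aucs = [a for a in per_tag_roc_auc(y_true, y_score) if a is not None]
    return sum(aucs) / len(aucs)


# Hypothetical toy data: 4 clips (rows) and 3 tags (columns).
Y_TRUE = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
Y_SCORE = [[0.9, 0.2, 0.6], [0.3, 0.8, 0.4], [0.7, 0.1, 0.5], [0.1, 0.3, 0.2]]

if __name__ == "__main__":
    print(per_tag_roc_auc(Y_TRUE, Y_SCORE))
    print(macro_roc_auc(Y_TRUE, Y_SCORE))
```

The pairwise form is quadratic in the number of clips, so it suits small illustrations; a rank-based implementation would be the usual choice for datasets of the size named in the abstract.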
Pages: 8