Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation

Cited by: 18
Authors
Wu, Yu-Te [1 ]
Chen, Berlin [1 ]
Su, Li [2 ]
Affiliations
[1] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei 116, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Keywords
Instruments; Task analysis; Music; Multiple signal classification; Hidden Markov models; Speech processing; Deep learning; Automatic music transcription; deep learning; multi-pitch estimation; multi-pitch streaming; self-attention;
DOI
10.1109/TASLP.2020.3030482
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Multi-instrument automatic music transcription (AMT) is a critical but less investigated problem in the field of music information retrieval (MIR). In addition to the difficulties faced by traditional AMT research, multi-instrument AMT calls for further investigation into high-level music semantic modeling, efficient training methods for multiple attributes, and a clear problem scenario for system performance evaluation. In this article, we propose a multi-instrument AMT method that combines signal processing techniques specifying pitch saliency, novel deep learning techniques, and concepts partly inspired by multi-object recognition, instance segmentation, and image-to-image translation in computer vision. The proposed method is flexible enough to cover all the sub-tasks of multi-instrument AMT, including multi-instrument note tracking, a task that has rarely been investigated before. State-of-the-art performance is also reported on the sub-task of multi-pitch streaming.
Pages: 2796 - 2809
Number of pages: 14
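The abstract describes self-attention-based instance segmentation over a time-frequency representation, with a separate pitch-activation map predicted for each instrument. The sketch below illustrates that general idea only; it is not the authors' implementation. The module name, layer sizes, number of instruments, and pitch range are assumptions chosen for illustration, and the paper's actual architecture, input features (e.g., pitch-saliency representations), and training procedure differ.

```python
# Minimal illustrative sketch (not the authors' method): a Transformer encoder
# applies self-attention across spectral frames and emits one pitch-activation
# "channel" per instrument, i.e. instance segmentation of the time-frequency plane.
# All dimensions below (352 bins, 88 pitches, 11 instruments) are assumptions.
import torch
import torch.nn as nn


class SelfAttentionTranscriber(nn.Module):
    def __init__(self, n_bins=352, n_pitches=88, n_instruments=11,
                 d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Project each spectral frame (n_bins) to the model dimension.
        self.frame_proj = nn.Linear(n_bins, d_model)
        # Standard Transformer encoder: self-attention over time frames.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # One pitch-activation map per instrument.
        self.head = nn.Linear(d_model, n_instruments * n_pitches)
        self.n_instruments = n_instruments
        self.n_pitches = n_pitches

    def forward(self, spec):
        # spec: (batch, time, n_bins) time-frequency representation
        x = self.frame_proj(spec)            # (batch, time, d_model)
        x = self.encoder(x)                  # self-attention across frames
        logits = self.head(x)                # (batch, time, n_instruments * n_pitches)
        # Reshape to per-instrument piano-roll logits.
        return logits.view(spec.size(0), spec.size(1),
                           self.n_instruments, self.n_pitches)


if __name__ == "__main__":
    model = SelfAttentionTranscriber()
    dummy = torch.randn(2, 100, 352)         # 2 clips, 100 frames, 352 spectral bins
    rolls = model(dummy)
    print(rolls.shape)                       # torch.Size([2, 100, 11, 88])
```

In practice, such per-instrument logits would be thresholded into note events for multi-pitch streaming and note tracking; this sketch stops at the prediction maps.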