Polyphonic pitch tracking with deep layered learning

Cited by: 6
Authors
Elowsson, Anders [1,2]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[2] Univ Oslo, RITMO Ctr Interdisciplinary Studies Rhythm Time & Motion, Oslo, Norway
Source
The Journal of the Acoustical Society of America, 2020, 148(1): 446-468
Funding
Swedish Research Council
Keywords
FUNDAMENTAL-FREQUENCY ESTIMATION; MULTIPITCH ESTIMATION; MUSIC TRANSCRIPTION;
DOI
10.1121/10.0001468
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This article presents a polyphonic pitch tracking system that extracts both framewise and note-based estimates from audio. The system uses several artificial neural networks trained individually in a deep layered learning setup. First, cascading networks are applied to a spectrogram for framewise fundamental frequency (f0) estimation. A sparse receptive field is learned by the first network and then used as a filter kernel for parameter sharing throughout the system. The f0 activations are connected across time to extract pitch contours. These contours define a framework within which subsequent networks perform onset and offset detection, operating jointly across time and smaller pitch fluctuations. As input, the networks use, e.g., variations of latent representations from the f0 estimation network. Finally, erroneous tentative notes are removed one by one in an iterative procedure that allows a network to classify notes within a correct context. The system was evaluated on four public test sets: MAPS, Bach10, TRIOS, and the MIREX Woodwind Quintet, achieving state-of-the-art results on all four. It performs well across all subtasks: f0, pitched onset, and pitched offset tracking.
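
The abstract outlines a multi-stage pipeline: cascading networks produce framewise f0 activations, the activations are linked across time into pitch contours, and those contours scaffold the later onset/offset detection and note classification. The Python sketch below illustrates only the contour-linking idea in a simplified form; the function name, thresholds, pitch-bin resolution, and greedy nearest-peak rule are assumptions made for illustration and are not the paper's implementation, in which this step and the subsequent stages are handled by trained networks.

# Illustrative sketch (not the paper's method): link per-frame f0 activation
# peaks across time into pitch contours using a greedy nearest-peak rule.
import numpy as np

def track_contours(activations, threshold=0.5, max_jump_bins=2, min_length=5):
    """Greedily link framewise f0 activation peaks into pitch contours.

    activations: array of shape (num_frames, num_pitch_bins), values in [0, 1].
    Returns a list of contours, each a list of (frame_index, bin_index) pairs.
    """
    num_frames, _ = activations.shape
    contours = []   # finished contours
    active = []     # contours still being extended: (contour, last_bin)

    for t in range(num_frames):
        frame = activations[t]
        # Candidate bins: local maxima above the activation threshold.
        peaks = [b for b in range(1, len(frame) - 1)
                 if frame[b] >= threshold
                 and frame[b] >= frame[b - 1]
                 and frame[b] >= frame[b + 1]]

        still_active = []
        used = set()
        for contour, last_bin in active:
            # Extend each active contour with the nearest unused peak
            # within max_jump_bins of its previous bin.
            candidates = [b for b in peaks
                          if b not in used and abs(b - last_bin) <= max_jump_bins]
            if candidates:
                b = min(candidates, key=lambda c: abs(c - last_bin))
                contour.append((t, b))
                used.add(b)
                still_active.append((contour, b))
            elif len(contour) >= min_length:
                contours.append(contour)   # contour ended; keep it if long enough
        # Unmatched peaks start new contours.
        for b in peaks:
            if b not in used:
                still_active.append(([(t, b)], b))
        active = still_active

    contours.extend(c for c, _ in active if len(c) >= min_length)
    return contours

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.random((100, 88)) * 0.3   # sub-threshold background
    acts[10:60, 40] = 0.9                # one synthetic 50-frame contour
    print(len(track_contours(acts)), "contour(s) found")

In the paper, the contours extracted at this stage define the frames over which the later onset, offset, and note-classification networks operate; the hard-coded greedy rule above merely stands in for that step to make the data flow concrete.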
Pages: 446-468
Page count: 23