Polyphonic pitch tracking with deep layered learning

Cited by: 6
Authors
Elowsson, Anders [1 ,2 ]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[2] Univ Oslo, RITMO Ctr Interdisciplinary Studies Rhythm Time &, Oslo, Norway
Source
Funding
Swedish Research Council;
Keywords
FUNDAMENTAL-FREQUENCY ESTIMATION; MULTIPITCH ESTIMATION; MUSIC TRANSCRIPTION;
DOI
10.1121/10.0001468
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This article presents a polyphonic pitch tracking system that is able to extract both framewise and note-based estimates from audio. The system uses several artificial neural networks trained individually in a deep layered learning setup. First, cascading networks are applied to a spectrogram for framewise fundamental frequency (f0) estimation. A sparse receptive field is learned by the first network and then used as a filter kernel for parameter sharing throughout the system. The f0 activations are connected across time to extract pitch contours. These contours define a framework within which subsequent networks perform onset and offset detection, operating across both time and smaller pitch fluctuations at the same time. As input, the networks use, e.g., variations of latent representations from the f0 estimation network. Finally, erroneous tentative notes are removed one by one in an iterative procedure that allows a network to classify notes within a correct context. The system was evaluated on four public test sets: MAPS, Bach10, TRIOS, and the MIREX Woodwind quintet, and achieved state-of-the-art results for all four datasets. It performs well across all subtasks: f0, pitched onset, and pitched offset tracking.
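The step in which f0 activations are "connected across time to extract pitch contours" can be illustrated with a minimal sketch. This is not the method from the article, which is not specified in this record. It is a simple greedy linker written under assumptions: the function name `link_contours`, the `threshold` and `max_jump` parameters, and the activation layout (one list of per-bin values per frame) are all hypothetical.

```python
def link_contours(activations, threshold=0.5, max_jump=1):
    """Greedily link above-threshold pitch bins in adjacent frames.

    Illustrative assumption only, not the article's algorithm.
    activations: list of frames, each a list of per-bin activation values.
    Returns a list of contours, each a list of (frame_index, bin_index).
    """
    finished = []
    active = []  # contours that were extended in the previous frame
    for t, frame in enumerate(activations):
        candidates = [b for b, a in enumerate(frame) if a >= threshold]
        next_active = []
        used = set()
        for contour in active:
            last_bin = contour[-1][1]
            # Nearest unused candidate within max_jump bins of the contour end.
            near = [b for b in candidates
                    if b not in used and abs(b - last_bin) <= max_jump]
            if near:
                best = min(near, key=lambda b: abs(b - last_bin))
                used.add(best)
                contour.append((t, best))
                next_active.append(contour)
            else:
                # No continuation in this frame: the contour ends here.
                finished.append(contour)
        # Candidates not claimed by any contour start new contours.
        for b in candidates:
            if b not in used:
                next_active.append([(t, b)])
        active = next_active
    return finished + active
```

Given a small activation matrix, the function returns one contour per connected run of active bins. Such contours would then serve as the regions within which later stages look for onsets and offsets.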
Pages: 446-468
Page count: 23