Polyphonic pitch tracking with deep layered learning

Cited by: 6
Authors
Elowsson, Anders [1 ,2 ]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[2] Univ Oslo, RITMO Ctr Interdisciplinary Studies Rhythm Time &, Oslo, Norway
Source
Funding
Swedish Research Council;
Keywords
FUNDAMENTAL-FREQUENCY ESTIMATION; MULTIPITCH ESTIMATION; MUSIC TRANSCRIPTION;
DOI
10.1121/10.0001468
CLC Number
O42 [Acoustics];
Discipline Classification Codes
070206 ; 082403 ;
Abstract
This article presents a polyphonic pitch tracking system that is able to extract both framewise and note-based estimates from audio. The system uses several artificial neural networks trained individually in a deep layered learning setup. First, cascading networks are applied to a spectrogram for framewise fundamental frequency (f0) estimation. A sparse receptive field is learned by the first network and then used as a filter kernel for parameter sharing throughout the system. The f0 activations are connected across time to extract pitch contours. These contours define a framework within which subsequent networks perform onset and offset detection, operating across both time and smaller pitch fluctuations. As input, the networks use, e.g., variations of latent representations from the f0 estimation network. Finally, erroneous tentative notes are removed one by one in an iterative procedure that allows a network to classify notes within a correct context. The system was evaluated on four public test sets: MAPS, Bach10, TRIOS, and the MIREX Woodwind quintet, and achieved state-of-the-art results on all four datasets. It performs well across all subtasks: f0, pitched onset, and pitched offset tracking.
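
The abstract describes connecting framewise f0 activations across time to extract pitch contours. The Python sketch below is a minimal illustration of how such linking could be done, not the paper's implementation: greedy frame-to-frame peak linking over an activation map, with hypothetical thresholds, tolerances, and log-frequency grid.

import numpy as np

def bin_to_hz(b, fmin_midi=21.0, bins_per_semitone=4):
    """Convert a log-frequency bin index to Hz (hypothetical grid)."""
    midi = fmin_midi + b / bins_per_semitone
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

def extract_contours(activations, act_threshold=0.5, max_jump_bins=2, min_length=5):
    """Greedily link framewise f0 activations (n_frames x n_bins, values in [0, 1])
    into contours; each contour is a list of (frame, bin, activation) tuples."""
    n_frames, n_bins = activations.shape
    finished, active = [], []
    for t in range(n_frames):
        # Local peaks above threshold in the current frame.
        peaks = [b for b in range(1, n_bins - 1)
                 if activations[t, b] >= act_threshold
                 and activations[t, b] >= activations[t, b - 1]
                 and activations[t, b] >= activations[t, b + 1]]
        used, still_active = set(), []
        for contour in active:
            last_bin = contour[-1][1]
            # Continue the contour with the closest unused peak within the pitch tolerance.
            cands = [b for b in peaks if b not in used and abs(b - last_bin) <= max_jump_bins]
            if cands:
                b = min(cands, key=lambda x: abs(x - last_bin))
                contour.append((t, b, float(activations[t, b])))
                used.add(b)
                still_active.append(contour)
            elif len(contour) >= min_length:
                finished.append(contour)  # ended contour kept only if long enough
        for b in peaks:                   # unmatched peaks start new contours
            if b not in used:
                still_active.append([(t, b, float(activations[t, b]))])
        active = still_active
    finished.extend(c for c in active if len(c) >= min_length)
    return finished

# Usage with a random activation map standing in for the f0 network's output:
acts = np.random.rand(200, 300) ** 4
contours = extract_contours(acts)
for c in contours[:3]:
    print(len(c), "frames, starting near", round(bin_to_hz(c[0][1]), 1), "Hz")

In the paper itself this step feeds subsequent networks that perform onset and offset detection within each contour; the sketch only shows the contour-extraction idea.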
Pages: 446-468
Number of pages: 23