Time-Frequency Representations for Single-Channel Music Source Separation

Cited by: 0
Authors
Tan, Vanessa H. [1]
de Leon, Franz [1]
Affiliations
[1] Univ Philippines Diliman, Elect & Elect Engn Inst, Quezon City, Philippines
Keywords
music source separation; single-channel; time-frequency representations; deep learning; melody extraction
DOI
10.1109/ismac.2019.8836141
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification
0808; 0809
Abstract
Inspired by the success of deep learning in image classification and speech recognition, deep learning algorithms have been explored for music source separation. Solving this problem would open up a wide range of applications, such as automatic transcription and audio post-production. Most algorithms use the Short-Time Fourier Transform (STFT) as the time-frequency (T-F) input representation, yet each deep learning model adopts a different STFT configuration; there is no standard set of STFT parameters for music source separation. This paper explores different STFT parameters and investigates an alternative representation, the Constant-Q Transform (CQT), for separating three individual sound sources. Experimental results show that dilated convolutional layers work well with the STFT, while standard convolutional layers work well with the CQT. The best-performing T-F representation for music source separation is the STFT combined with dilated CNNs and a soft-masking method. Researchers should therefore still tune the parameters of their T-F representations to obtain better performance from their deep learning models.
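A minimal sketch of the two T-F front ends compared in the abstract is given below: it computes an STFT and a CQT of a mixture with librosa and applies a soft mask to recover one source. The library choice, window and hop sizes, the example clip, and the placeholder source estimate are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import librosa

# Load a mono mixture; librosa's bundled example clip stands in for a real mix.
y, sr = librosa.load(librosa.example("trumpet"), sr=22050, mono=True)

# STFT front end (n_fft and hop_length are assumed values, not the paper's).
stft = librosa.stft(y, n_fft=2048, hop_length=512)
stft_mag = np.abs(stft)

# CQT front end: log-spaced frequency bins, again with assumed parameters.
cqt = librosa.cqt(y, sr=sr, hop_length=512, n_bins=84, bins_per_octave=12)
cqt_mag = np.abs(cqt)

# Soft masking: a network would predict a magnitude estimate for the target
# source; here a constant fraction of the mixture acts as a placeholder.
target_mag = 0.5 * stft_mag
residual_mag = stft_mag - target_mag
mask = librosa.util.softmask(target_mag, residual_mag, power=2)

# Apply the mask to the complex mixture STFT and invert to the time domain.
target_audio = librosa.istft(mask * stft, hop_length=512, length=len(y))

The masking step is representation-agnostic: the same pattern applies to the CQT magnitudes, which is what allows the paper to compare front ends while holding the separation strategy fixed.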
Pages: 6