Predominant audio source separation in polyphonic music

Cited by: 0
Authors
Lekshmi Chandrika Reghunath
Rajeev Rajan
Affiliations
[1] College of Engineering Trivandrum, Department of Electronics and Communication Engineering
[2] APJ Abdul Kalam Technological University, Department of Electronics and Communication Engineering
[3] Government Engineering College Barton Hill
[4] APJ Abdul Kalam Technological University
Keywords
Predominant; Spectrogram; Time-frequency filtering; Generative adversarial network; Binary masking;
DOI: Not available
Abstract
Predominant source separation is the separation of one or more desired predominant signals, such as the voice or leading instruments, from polyphonic music. The proposed work applies time-frequency filtering to predominant source separation and uses conditional adversarial networks to improve the perceived quality of the isolated sounds. The pitch tracks corresponding to the prominent sound sources of the polyphonic music are estimated using a predominant pitch extraction algorithm, and a binary mask corresponding to each pitch track and its harmonics is generated. Time-frequency filtering is then performed on the spectrogram of the input signal using this binary mask, which isolates the dominant sources based on pitch. The perceptual quality of the source-separated music signal is enhanced using a CycleGAN-based conditional adversarial network operating on spectrogram images, and the reconstructed spectrogram is converted back to a music signal by applying the inverse short-time Fourier transform. The intelligibility of the separated audio is further enhanced using an intelligibility enhancement module based on an audio style transfer scheme. The proposed work is systematically evaluated on the IRMAS and ADC 2004 datasets through both subjective and objective evaluations. The performance of the proposed method is compared with the state-of-the-art Demucs and Wave-U-Net architectures and shows competitive performance both objectively and subjectively.
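The pitch-guided masking and inverse-STFT reconstruction described in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal illustration, not the authors' implementation: it assumes a per-frame pitch track f0 is already available from a predominant pitch extractor, and the harmonic count, tolerance in cents, and STFT parameters are placeholder values rather than the paper's settings. It builds a binary mask covering each pitch value and its harmonics, applies the mask to the mixture's spectrogram, and reconstructs the isolated source with the inverse short-time Fourier transform.

```python
import numpy as np
import librosa


def harmonic_binary_mask_separation(y, sr, f0, n_harmonics=10, tol_cents=50,
                                    n_fft=2048, hop_length=512):
    """Isolate a predominant source by binary time-frequency masking.

    y   : mixture waveform
    f0  : per-frame pitch track in Hz (0 for unvoiced frames), assumed to
          come from a predominant pitch extraction algorithm
    n_harmonics, tol_cents, n_fft, hop_length : illustrative values only
    """
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    n_frames = S.shape[1]
    mask = np.zeros(S.shape, dtype=bool)

    # The pitch track may use a different frame rate than the STFT;
    # resample it so there is one pitch value per spectrogram frame.
    f0 = np.interp(np.linspace(0.0, 1.0, n_frames),
                   np.linspace(0.0, 1.0, len(f0)), f0)

    for t, pitch in enumerate(f0):
        if pitch <= 0:          # unvoiced frame / no predominant pitch
            continue
        for h in range(1, n_harmonics + 1):
            target = h * pitch
            if target >= sr / 2:
                break
            # Keep frequency bins within +/- tol_cents of each harmonic.
            lo = target * 2 ** (-tol_cents / 1200.0)
            hi = target * 2 ** (tol_cents / 1200.0)
            mask[(freqs >= lo) & (freqs <= hi), t] = True

    # Apply the binary mask and invert back to a time-domain signal.
    y_sep = librosa.istft(S * mask, hop_length=hop_length, length=len(y))
    return y_sep
```

In this sketch the masked spectrogram is inverted directly; in the proposed pipeline the masked spectrogram image would additionally be passed through the CycleGAN-based enhancement stage before the inverse transform.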