Predominant audio source separation in polyphonic music

Cited: 0
Authors
Lekshmi Chandrika Reghunath
Rajeev Rajan
Affiliations
[1] College of Engineering Trivandrum, Department of Electronics and Communication Engineering
[2] APJ Abdul Kalam Technological University, Department of Electronics and Communication Engineering
[3] Government Engineering College Barton Hill
[4] APJ Abdul Kalam Technological University
Keywords
Predominant; Spectrogram; Time-frequency filtering; Generative adversarial network; Binary masking
DOI: not available
Abstract
Predominant source separation is the extraction of one or more desired predominant signals, such as the singing voice or a lead instrument, from polyphonic music. The proposed work combines time-frequency filtering for predominant source separation with conditional adversarial networks to improve the perceived quality of the isolated sounds. Pitch tracks corresponding to the prominent sound sources of the polyphonic music are estimated using a predominant pitch extraction algorithm, and a binary mask covering each pitch track and its harmonics is generated. Time-frequency filtering is then performed on the spectrogram of the input signal using this binary mask, which isolates the dominant sources on the basis of pitch. The perceptual quality of the source-separated signal is enhanced by a CycleGAN-based conditional adversarial network operating on spectrogram images, and the reconstructed spectrogram is converted back to an audio signal by applying the inverse short-time Fourier transform. The intelligibility of the separated audio is further improved by an intelligibility enhancement module based on an audio style transfer scheme. The proposed method is systematically evaluated on the IRMAS and ADC 2004 datasets through both subjective and objective evaluations, and it shows competitive performance against the state-of-the-art Demucs and Wave-U-Net architectures.
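The pitch-based masking and ISTFT reconstruction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it builds a binary mask around a single hypothetical pitch track and its harmonics, applies it to the STFT of a synthetic two-tone mixture, and reconstructs the isolated tone with the inverse STFT. All signal parameters (the 440 Hz pitch, 30 Hz tolerance, and four harmonics) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)            # predominant source (f0 = 440 Hz)
interferer = 0.8 * np.sin(2 * np.pi * 987 * t)  # accompaniment tone
mix = target + interferer

f, frames, Z = stft(mix, fs=fs, nperseg=1024)

# Binary mask: keep bins within +/- 30 Hz of the pitch track and its
# first few harmonics; zero everything else.
f0, tol, n_harm = 440.0, 30.0, 4
mask = np.zeros_like(Z, dtype=bool)
for h in range(1, n_harm + 1):
    mask |= np.abs(f[:, None] - h * f0) < tol

# Time-frequency filtering, then reconstruction via the inverse STFT.
_, sep = istft(Z * mask, fs=fs, nperseg=1024)

# The separated signal should correlate strongly with the clean target.
n = min(len(sep), len(target))
corr = np.corrcoef(sep[:n], target[:n])[0, 1]
print(corr)
```

In the paper's pipeline the pitch track comes from a predominant pitch extraction algorithm and varies over time, so the mask would be built frame by frame rather than from one fixed `f0` as here.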
Published in: EURASIP Journal on Audio, Speech, and Music Processing, 2023