Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning

Cited by: 0
Authors
Bhattacharjee, Mrinmoy [1 ]
Prasanna, S. R. M. [2 ]
Guha, Prithwijit [1 ]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, Assam, India
[2] Indian Inst Technol Dharwad, Dept Elect Engn, Dharwad 580011, Karnataka, India
Keywords
Spectrogram; task analysis; harmonic analysis; multiple signal classification; speech processing; feature extraction; training; speech-music overlap detection; harmonic-percussive source separation; multi-task learning; radio broadcast audio classification; background music; network; noise
DOI
10.1109/TASLP.2022.3164199
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
Detection of speech and music signals in isolated and overlapped conditions is an essential preprocessing step for many audio applications. Speech signals have wavy and continuous harmonics, while music signals exhibit horizontally linear and discontinuous harmonic patterns. Music signals also contain more percussive components than speech signals, manifested as vertical striations in spectrograms. In the case of speech-music overlap, it might be challenging for automatic feature-learning systems to extract class-specific horizontal and vertical striations from the combined spectrogram representation. A pre-processing step that separates the harmonic and percussive components before training might aid the classifier. Thus, this work proposes using the harmonic-percussive source separation method to generate features for better detection of speech and music signals. Additionally, this work explores the traditional and cascaded-information multi-task learning (MTL) frameworks to design better classifiers. The MTL framework aids the training of the main task through simultaneous learning of several related auxiliary tasks. Results are reported both on synthetically generated speech-music overlapped signals and on real recordings. Four state-of-the-art approaches are used for performance comparison. Experiments show that harmonic and percussive decompositions of spectrograms perform better as features. Moreover, the MTL-framework-based classifiers further improve performance.
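The abstract rests on two ingredients: harmonic-percussive source separation (HPSS) applied to spectrograms to obtain class-discriminative features, and a multi-task learning classifier trained on those features. The sketch below illustrates the general HPSS feature-extraction idea using librosa's median-filtering-based decomposition; it is not the authors' exact pipeline, and all parameter values (sample rate, n_fft, hop_length, margin) are assumptions chosen only for demonstration.

```python
# Minimal sketch: harmonic/percussive spectrogram features via librosa HPSS.
# Parameter values are illustrative assumptions, not the paper's settings.
import numpy as np
import librosa

def hpss_features(wav_path, sr=16000, n_fft=512, hop_length=160, margin=2.0):
    """Return log-magnitude harmonic and percussive spectrograms as 2 channels."""
    y, _ = librosa.load(wav_path, sr=sr)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    # Median-filtering HPSS: harmonic energy forms horizontal ridges,
    # percussive energy forms vertical striations in the spectrogram.
    harmonic, percussive = librosa.decompose.hpss(stft, margin=margin)
    log_h = librosa.amplitude_to_db(np.abs(harmonic), ref=np.max)
    log_p = librosa.amplitude_to_db(np.abs(percussive), ref=np.max)
    # Stack as two channels for a downstream CNN classifier.
    return np.stack([log_h, log_p], axis=-1)
```

For the classification stage, a minimal hard-parameter-sharing MTL model can be sketched in tf.keras: one shared trunk, a main speech/music/overlap head, and an auxiliary head trained jointly. The layer sizes, auxiliary task, input shape, and loss weights below are assumptions; the paper's traditional and cascaded-information MTL architectures are not reproduced here.

```python
# Minimal sketch of a hard-parameter-sharing MTL classifier (assumed layout).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_mtl_model(input_shape=(257, 64, 2), n_main=3, n_aux=2):
    inp = layers.Input(shape=input_shape)            # stacked HPSS features
    x = layers.Conv2D(16, 3, activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    shared = layers.Dense(64, activation="relu")(x)  # shared representation
    main = layers.Dense(n_main, activation="softmax", name="main")(shared)
    aux = layers.Dense(n_aux, activation="softmax", name="aux")(shared)
    model = Model(inp, [main, aux])
    model.compile(
        optimizer="adam",
        loss={"main": "categorical_crossentropy",
              "aux": "categorical_crossentropy"},
        loss_weights={"main": 1.0, "aux": 0.3},      # auxiliary task down-weighted
        metrics={"main": "accuracy"},
    )
    return model
```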
Pages: 1-10
Page count: 10