Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning

Cited by: 0
Authors
Bhattacharjee, Mrinmoy [1 ]
Prasanna, S. R. M. [2 ]
Guha, Prithwijit [1 ]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, Assam, India
[2] Indian Inst Technol Dharwad, Dept Elect Engn, Dharwad 580011, Karnataka, India
Keywords
Spectrogram; task analysis; harmonic analysis; multiple signal classification; speech processing; feature extraction; training; speech-music overlap detection; harmonic-percussive source separation; multi-task learning; radio broadcast audio classification; background music; network; noise
DOI
10.1109/TASLP.2022.3164199
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
Detection of speech and music signals in isolated and overlapped conditions is an essential preprocessing step for many audio applications. Speech signals have wavy and continuous harmonics, while music signals exhibit horizontally linear and discontinuous harmonic patterns. Music signals also contain more percussive components than speech signals, manifested as vertical striations in spectrograms. In the case of speech-music overlap, it might be challenging for automatic feature-learning systems to extract class-specific horizontal and vertical striations from the combined spectrogram representation. A pre-processing step that separates the harmonic and percussive components before training might aid the classifier. Thus, this work proposes using the harmonic-percussive source separation method to generate features for better detection of speech and music signals. Additionally, this work explores the traditional and cascaded-information multi-task learning (MTL) frameworks to design better classifiers. The MTL framework aids the training of the main task through simultaneous learning of several related auxiliary tasks. Results are reported both on synthetically generated speech-music overlapped signals and on real recordings. Four state-of-the-art approaches are used for performance comparison. Experiments show that harmonic and percussive decompositions of spectrograms perform better as features. Moreover, the MTL-framework-based classifiers further improve performance.
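The abstract rests on two ingredients: harmonic-percussive source separation (HPSS) applied to spectrograms to obtain class-discriminative features, and a multi-task learning classifier trained on those features. The sketch below illustrates the general HPSS feature-extraction idea using librosa's median-filtering-based decomposition; it is not the authors' exact pipeline, and all parameter values (sample rate, n_fft, hop_length, margin) are assumptions chosen only for demonstration.

```python
# Minimal sketch: harmonic/percussive spectrogram features via librosa HPSS.
# Parameter values are illustrative assumptions, not the paper's settings.
import numpy as np
import librosa

def hpss_features(wav_path, sr=16000, n_fft=512, hop_length=160, margin=2.0):
    """Return log-magnitude harmonic and percussive spectrograms as 2 channels."""
    y, _ = librosa.load(wav_path, sr=sr)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    # Median-filtering HPSS: harmonic energy forms horizontal ridges,
    # percussive energy forms vertical striations in the spectrogram.
    harmonic, percussive = librosa.decompose.hpss(stft, margin=margin)
    log_h = librosa.amplitude_to_db(np.abs(harmonic), ref=np.max)
    log_p = librosa.amplitude_to_db(np.abs(percussive), ref=np.max)
    # Stack as two channels for a downstream CNN classifier.
    return np.stack([log_h, log_p], axis=-1)
```

For the classification stage, a minimal hard-parameter-sharing MTL model can be sketched in tf.keras: one shared trunk, a main speech/music/overlap head, and an auxiliary head trained jointly. The layer sizes, auxiliary task, input shape, and loss weights below are assumptions; the paper's traditional and cascaded-information MTL architectures are not reproduced here.

```python
# Minimal sketch of a hard-parameter-sharing MTL classifier (assumed layout).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_mtl_model(input_shape=(257, 64, 2), n_main=3, n_aux=2):
    inp = layers.Input(shape=input_shape)            # stacked HPSS features
    x = layers.Conv2D(16, 3, activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    shared = layers.Dense(64, activation="relu")(x)  # shared representation
    main = layers.Dense(n_main, activation="softmax", name="main")(shared)
    aux = layers.Dense(n_aux, activation="softmax", name="aux")(shared)
    model = Model(inp, [main, aux])
    model.compile(
        optimizer="adam",
        loss={"main": "categorical_crossentropy",
              "aux": "categorical_crossentropy"},
        loss_weights={"main": 1.0, "aux": 0.3},      # auxiliary task down-weighted
        metrics={"main": "accuracy"},
    )
    return model
```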
Pages: 1-10
Page count: 10