Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Cited by: 8
Authors:
Sudo, Yui [1]
Itoyama, Katsutoshi [1]
Nishida, Kenji [1]
Nakadai, Kazuhiro [1,2]
Affiliations:
[1] Tokyo Institute of Technology, School of Engineering, Department of Systems and Control Engineering, Tokyo, Japan
[2] Honda Research Institute Japan Co., Ltd., Saitama, Japan
DOI: 10.1109/IEEECONF49454.2021.9382730
CLC number: TP39 (applications of computers)
Discipline codes: 081203; 0835
Abstract:
This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated task that combines sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can improve the separation of signals arriving from different directions; however, conventional methods have two drawbacks: (a) because sound source localization and separation based on spatial features and class identification based on spectral features are trained in a single neural network, the network overfits to the relationship between the direction of arrival and the class; and (b) although the permutation invariant training used in speech recognition could be extended to this task, it is impractical for environmental sounds because of its limitation on the maximum number of sources. The proposed method combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds. Decoupling the two stages prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments on datasets containing 75 classes of environmental sounds showed that the proposed method achieved a lower root mean squared error than the conventional method.
Pages: 382-387 (6 pages)
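To make the two-stage design described in the abstract concrete, the sketch below shows one way such a pipeline could be wired up in PyTorch: a U-Net maps multi-channel spectrogram features to direction-wise separation masks (localization and separation in one network), and a separate CNN classifies each separated spectrogram. The channel count, number of direction bins, layer widths, and the masking of a reference-channel magnitude are illustrative assumptions; only the two-stage structure and the 75-class output come from the abstract, so this is a minimal sketch rather than the authors' exact network.

```python
# Minimal PyTorch sketch of the two-stage idea described in the abstract:
# (1) a U-Net maps multi-channel spectrogram features to direction-wise
#     separation masks (localization + separation), and
# (2) an independent CNN classifies each separated spectrogram.
# Shapes, layer widths, the number of direction bins and the feature choice
# are illustrative assumptions; only the 75-class count is from the abstract.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the usual U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class SeparationUNet(nn.Module):
    """U-Net that turns multi-channel features into per-direction masks."""

    def __init__(self, in_ch=8, n_directions=8, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_directions, 1)  # one mask per direction bin

    def forward(self, x):                      # x: (B, in_ch, F, T)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))    # (B, n_directions, F, T) masks


class SoundClassifier(nn.Module):
    """CNN that labels each separated spectrogram (75 classes in the paper)."""

    def __init__(self, n_classes=75):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32), nn.MaxPool2d(2),
            conv_block(32, 64), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, spec):                   # spec: (B, 1, F, T)
        return self.fc(self.features(spec).flatten(1))


if __name__ == "__main__":
    B, C, F, T = 2, 8, 64, 64                  # toy multi-channel feature tensor
    feats = torch.randn(B, C, F, T)            # e.g. magnitudes + phase features
    mixture_mag = feats[:, :1]                 # reference-channel magnitude
    masks = SeparationUNet(in_ch=C)(feats)     # (B, n_directions, F, T)
    separated = masks * mixture_mag            # mask the mixture per direction bin
    logits = SoundClassifier()(separated[:, 0:1])  # classify one separated source
    print(masks.shape, logits.shape)
```

Because the classifier sees only the already separated single-channel spectrogram and never the spatial features, its decision cannot depend on the direction of arrival, which is the decoupling the abstract credits with preventing overfitting to the direction-class relationship.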