Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Cited by: 8
Authors:
Sudo, Yui [1]
Itoyama, Katsutoshi [1]
Nishida, Kenji [1]
Nakadai, Kazuhiro [1,2]
Affiliations:
[1] Tokyo Institute of Technology, School of Engineering, Department of Systems and Control Engineering, Tokyo, Japan
[2] Honda Research Institute Japan Co., Ltd., Saitama, Japan
DOI: 10.1109/IEEECONF49454.2021.9382730
CLC number: TP39 (applications of computers)
Discipline codes: 081203; 0835
Abstract:
This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated task that combines sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can improve the separation of signals arriving from different directions; however, conventional methods have two drawbacks: (a) because sound source localization and separation based on spatial features and class identification based on spectral features are trained in a single neural network, the network overfits to the relationship between the direction of arrival and the class; and (b) although the permutation invariant training used in speech recognition could be extended to this task, it is impractical for environmental sounds because of its limitation on the maximum number of sources. The proposed method combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds. Decoupling the two stages prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments on datasets containing 75 classes of environmental sounds showed that the proposed method achieved a lower root mean squared error than the conventional method.
Pages: 382-387 (6 pages)
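To make the two-stage design described in the abstract concrete, the sketch below shows one way such a pipeline could be wired up in PyTorch: a U-Net maps multi-channel spectrogram features to direction-wise separation masks (localization and separation in one network), and a separate CNN classifies each separated spectrogram. The channel count, number of direction bins, layer widths, and the masking of a reference-channel magnitude are illustrative assumptions; only the two-stage structure and the 75-class output come from the abstract, so this is a minimal sketch rather than the authors' exact network.

```python
# Minimal PyTorch sketch of the two-stage idea described in the abstract:
# (1) a U-Net maps multi-channel spectrogram features to direction-wise
#     separation masks (localization + separation), and
# (2) an independent CNN classifies each separated spectrogram.
# Shapes, layer widths, the number of direction bins and the feature choice
# are illustrative assumptions; only the 75-class count is from the abstract.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the usual U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )


class SeparationUNet(nn.Module):
    """U-Net that turns multi-channel features into per-direction masks."""

    def __init__(self, in_ch=8, n_directions=8, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_directions, 1)  # one mask per direction bin

    def forward(self, x):                      # x: (B, in_ch, F, T)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))    # (B, n_directions, F, T) masks


class SoundClassifier(nn.Module):
    """CNN that labels each separated spectrogram (75 classes in the paper)."""

    def __init__(self, n_classes=75):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32), nn.MaxPool2d(2),
            conv_block(32, 64), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, spec):                   # spec: (B, 1, F, T)
        return self.fc(self.features(spec).flatten(1))


if __name__ == "__main__":
    B, C, F, T = 2, 8, 64, 64                  # toy multi-channel feature tensor
    feats = torch.randn(B, C, F, T)            # e.g. magnitudes + phase features
    mixture_mag = feats[:, :1]                 # reference-channel magnitude
    masks = SeparationUNet(in_ch=C)(feats)     # (B, n_directions, F, T)
    separated = masks * mixture_mag            # mask the mixture per direction bin
    logits = SoundClassifier()(separated[:, 0:1])  # classify one separated source
    print(masks.shape, logits.shape)
```

Because the classifier sees only the already separated single-channel spectrogram and never the spatial features, its decision cannot depend on the direction of arrival, which is the decoupling the abstract credits with preventing overfitting to the direction-class relationship.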