Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Cited by: 8
Authors
Sudo, Yui [1]
Itoyama, Katsutoshi [1]
Nishida, Kenji [1]
Nakadai, Kazuhiro [1, 2]
Affiliations
[1] Tokyo Inst Technol, Sch Engn, Dept Syst & Control Engn, Tokyo, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama, Japan
DOI
10.1109/IEEECONF49454.2021.9382730
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated approach that handles sound source localization, sound source separation, and class identification together. When multiple microphones are available, spatial features can be used to improve the separation accuracy of signals arriving from different directions; however, conventional methods have two drawbacks: (a) because sound source localization and separation using spatial features and class identification using spectral features are trained in the same neural network, the network overfits to the relationship between the direction of arrival and the class; (b) although the permutation invariant training used in speech recognition could be extended, it is not practical for environmental sounds because of its limit on the maximum number of speakers. The proposed method combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds. This design prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments on purpose-built datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of the conventional method.
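The evaluation metric named in the abstract, root mean squared error, can be illustrated with a minimal sketch. The abstract does not say which quantities are compared, so this example assumes the error is taken element-wise between a predicted and a reference time-frequency grid; the function name `rmse` and the toy values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equally shaped 2-D grids (lists of rows)."""
    if len(predicted) != len(reference):
        raise ValueError("row count mismatch")
    total = 0.0
    count = 0
    for p_row, r_row in zip(predicted, reference):
        if len(p_row) != len(r_row):
            raise ValueError("row length mismatch")
        for p, r in zip(p_row, r_row):
            total += (p - r) ** 2  # accumulate squared error per cell
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return math.sqrt(total / count)


# Toy 2x2 grids (assumed values, for illustration only).
reference = [[0.0, 1.0], [1.0, 0.0]]
predicted = [[0.0, 0.5], [1.0, 0.5]]
score = rmse(predicted, reference)
```

A lower value means the predicted grid is closer to the reference, which is the sense in which the abstract reports the proposed method as better than the conventional one.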
Pages: 382-387
Number of pages: 6