Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Cited by: 8
Authors
Sudo, Yui [1]
Itoyama, Katsutoshi [1]
Nishida, Kenji [1]
Nakadai, Kazuhiro [1, 2]
Affiliations
[1] Tokyo Inst Technol, Sch Engn, Dept Syst & Control Engn, Tokyo, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama, Japan
DOI
10.1109/IEEECONF49454.2021.9382730
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated method that deals with sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can be used to improve the separation accuracy of signals arriving from different directions. However, conventional methods have two drawbacks: (a) because sound source localization and separation using spatial features and class identification using spectral features are trained in the same neural network, the network overfits to the relationship between the direction of arrival and the class; (b) although the permutation invariant training used in speech recognition could be extended, it is not practical for environmental sounds because of its limit on the maximum number of speakers. The proposed method combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds. This design prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments on purpose-built datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of the conventional method.
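The abstract reports results as a root mean squared error (RMSE) but does not state the exact quantity being compared. As a minimal sketch, assuming the error is taken element-wise between a predicted and a reference time-frequency magnitude grid, the metric could be computed as follows. The function name `rmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equally shaped 2-D grids.

    Assumption: each grid is a list of rows of floats, e.g. frames by
    frequency bins. The paper's actual evaluation target may differ.
    """
    if len(predicted) != len(reference):
        raise ValueError("row count mismatch")
    total = 0.0
    count = 0
    for p_row, r_row in zip(predicted, reference):
        if len(p_row) != len(r_row):
            raise ValueError("column count mismatch")
        for p, r in zip(p_row, r_row):
            total += (p - r) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return math.sqrt(total / count)


# Hypothetical 2 x 3 magnitude grids, for illustration only.
reference = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
predicted = [[0.0, 0.5, 0.0], [1.0, 0.0, 0.5]]
print(rmse(predicted, reference))
```

A lower value means the predicted grid is closer to the reference, which is the sense in which the abstract compares the proposed and conventional methods.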
Pages: 382-387
Page count: 6