Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Cited by: 8
Authors
Sudo, Yui [1]
Itoyama, Katsutoshi [1]
Nishida, Kenji [1]
Nakadai, Kazuhiro [1,2]
Affiliations
[1] Tokyo Inst Technol, Sch Engn, Dept Syst & Control Engn, Tokyo, Japan
[2] Honda Res Inst Japan Co Ltd, Saitama, Japan
DOI
10.1109/IEEECONF49454.2021.9382730
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated task comprising sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can improve the separation accuracy for signals arriving from different directions; however, conventional methods have two drawbacks: (a) because sound source localization and sound source separation based on spatial features are trained in the same neural network as class identification based on spectral features, the network overfits to the relationship between the direction of arrival and the class; (b) although the permutation invariant training used in speech recognition could be extended to this task, it is impractical for environmental sounds because it restricts the maximum number of sound sources that can be handled. The proposed method combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds, thereby preventing overfitting to the relationship between the direction of arrival and the class. Simulation experiments on datasets created from 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of the conventional method.
Pages: 382-387
Page count: 6
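
Below is a minimal PyTorch sketch of the two-stage idea the abstract describes: a U-Net maps multi-channel spectral and spatial features to per-direction time-frequency masks (joint localization and separation), and a separate CNN classifies each separated spectrogram from spectral features alone, so class labels never see spatial cues. All sizes here (N_MICS, N_DIRS, layer widths, input dimensions) are illustrative assumptions, not the authors' configuration; only the 75-class count comes from the abstract.

# Sketch of the decoupled separation/classification pipeline.
# N_MICS, N_DIRS and all layer widths are hypothetical choices.
import torch
import torch.nn as nn

N_MICS = 8      # assumed microphone count
N_DIRS = 36     # assumed number of direction-of-arrival bins
N_CLASSES = 75  # class count taken from the abstract

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
    )

class SeparationUNet(nn.Module):
    """U-Net: multi-channel features -> one T-F mask per direction bin."""
    def __init__(self):
        super().__init__()
        # assumed input: 1 magnitude channel + (N_MICS - 1) phase-difference channels
        c_in = 1 + (N_MICS - 1)
        self.enc1 = conv_block(c_in, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bott = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, N_DIRS, 1)

    def forward(self, x):                    # x: (B, c_in, F, T)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bott(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))  # (B, N_DIRS, F, T) masks

class SpectralClassifier(nn.Module):
    """Separate CNN: classifies one separated spectrogram using spectral
    features only, so the classifier cannot tie classes to directions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32), nn.MaxPool2d(2),
            conv_block(32, 64), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, N_CLASSES)

    def forward(self, spec):                 # spec: (B, 1, F, T)
        return self.fc(self.features(spec).flatten(1))

# Usage sketch with stand-in features: 128 frequency bins x 64 frames
# (both divisible by 4 to survive the two pooling stages).
feats = torch.randn(2, N_MICS, 128, 64)      # placeholder input features
masks = SeparationUNet()(feats)              # (2, N_DIRS, 128, 64)
sep = masks[:, :1] * feats[:, :1]            # one direction's mask on the mixture magnitude
logits = SpectralClassifier()(sep)           # (2, 75) class scores

Because the classifier only ever receives a single-channel separated spectrogram, the direction-of-arrival information consumed by the U-Net cannot leak into class identification, which is the overfitting the abstract aims to prevent.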