MULTI-CHANNEL SPEECH ENHANCEMENT USING GRAPH NEURAL NETWORKS

Cited by: 18
Authors
Tzirakis, Panagiotis [1]
Kumar, Anurag [1]
Donley, Jacob [1]
Affiliation
[1] Facebook Reality Labs Research, Redmond, WA 98052 USA
Keywords
Speech enhancement; deep learning; multi-channel processing; graph neural networks
DOI
10.1109/ICASSP39728.2021.9413955
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by combining deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space, specifically a graph. This formulation allows us to apply graph neural networks (GNNs) to find spatial correlations among the different channels (nodes). We utilize graph convolutional networks (GCNs) by incorporating them in the embedding space of a U-Net architecture. We use the LibriSpeech dataset and simulated room acoustics data to experiment extensively with our approach using different array types and numbers of microphones. Results indicate the superiority of our approach when compared to a prior state-of-the-art method.
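The channels-as-nodes idea in the abstract can be illustrated with a minimal sketch of one graph convolution step. The abstract does not give the layer equation, so this assumes the commonly used symmetric normalization, ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the fully connected, unit-weight microphone graph, the toy embedding values, and all function names are hypothetical choices for illustration and not taken from the paper.

```python
import math

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node (channel) keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def gcn_layer(node_feats, adj, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    a_norm = normalized_adjacency(adj)
    mixed = matmul(a_norm, node_feats)   # aggregate features across channels
    projected = matmul(mixed, weight)    # shared linear projection
    return [[max(0.0, v) for v in row] for row in projected]

# Assumption: 3 microphones, every pair connected with weight 1.
num_mics = 3
adj = [[0.0 if i == j else 1.0 for j in range(num_mics)]
       for i in range(num_mics)]

# Assumption: each channel already has a 2-dimensional embedding,
# standing in for the bottleneck features of an encoder.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability

out = gcn_layer(embeddings, adj, weight)
print(out)
```

With a fully connected unit-weight graph and an identity projection, every channel's output is the average of all channel embeddings, which shows how the layer mixes information across microphones. A learned or geometry-dependent adjacency would replace `adj` in practice.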
Pages: 3415-3419
Page count: 5
Related papers
50 in total
  • [1] A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks
    Chau, Hoang Ngoc
    Bui, Tien Dat
    Nguyen, Huu Binh
    Duong, Thanh Thi Hien
    Nguyen, Quoc Cuong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1133 - 1144
  • [2] Multi-Channel Graph Neural Networks
    Zhou, Kaixiong
    Song, Qingquan
    Huang, Xiao
    Zha, Daochen
    Zou, Na
    Hu, Xia
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1352 - 1358
  • [3] CONSISTENCY-AWARE MULTI-CHANNEL SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS
    Masuyama, Yoshiki
    Togami, Masahito
    Komatsu, Tatsuya
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 821 - 825
  • [4] Multi-channel speech enhancement using early and late fusion convolutional neural networks
    S. Siva Priyanka
    T. Kishore Kumar
    [J]. Signal, Image and Video Processing, 2023, 17 : 973 - 979
  • [5] Multi-channel speech enhancement using early and late fusion convolutional neural networks
    Priyanka, S. Siva
    Kumar, T. Kishore
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 973 - 979
  • [6] Multi-Channel Pooling Graph Neural Networks
    Du, Jinlong
    Wang, Senzhang
    Miao, Hao
    Zhang, Jiaqiang
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1442 - 1448
  • [7] All-Neural Multi-Channel Speech Enhancement
    Wang, Zhong-Qiu
    Wang, DeLiang
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3234 - 3238
  • [8] DeepMCGCN: Multi-channel Deep Graph Neural Networks
    Lei Meng
    Zhonglin Ye
    Yanlin Yang
    Haixing Zhao
    [J]. International Journal of Computational Intelligence Systems, 17
  • [9] DeepMCGCN: Multi-channel Deep Graph Neural Networks
    Meng, Lei
    Ye, Zhonglin
    Yang, Yanlin
    Zhao, Haixing
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [10] Adaptive Multi-Channel Deep Graph Neural Networks
    Wang, Renbiao
    Li, Fengtai
    Liu, Shuwei
    Li, Weihao
    Chen, Shizhan
    Feng, Bin
    Jin, Di
    [J]. SYMMETRY-BASEL, 2024, 16 (04):