MULTI-CHANNEL SPEECH ENHANCEMENT USING GRAPH NEURAL NETWORKS

Cited by: 18

Authors:

Tzirakis, Panagiotis [1]

Kumar, Anurag [1]

Donley, Jacob [1]

Affiliations:

[1] Facebook Reality Labs Research, Redmond, WA 98052, USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Speech enhancement; deep learning; multi-channel processing; graph neural networks

DOI:

10.1109/ICASSP39728.2021.9413955

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by combining deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space, specifically a graph. This formulation allows us to apply graph neural networks (GNN) to find spatial correlations among the different channels (nodes). We utilize graph convolution networks (GCN) by incorporating them in the embedding space of a U-Net architecture. We use the LibriSpeech dataset and simulated room acoustics data to experiment extensively with our approach across different array types and numbers of microphones. Results indicate the superiority of our approach when compared to a prior state-of-the-art method.
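
To make the "channels as graph nodes" idea concrete, the following is a minimal NumPy sketch of a single graph convolution applied across microphone channels. It is not the authors' implementation: it uses the widely known symmetric-normalization propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), and the array sizes, the fully connected adjacency matrix, the random weights, and the names `normalized_adjacency` and `gcn_layer` are all assumptions chosen for illustration. The abstract does not specify how the graph edges are defined.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalize A + I (self-loops added), as in the common GCN rule."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)                   # node degrees (>= 1 because of self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(node_feats, adj, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(adj) @ node_feats @ weight, 0.0)

rng = np.random.default_rng(0)
num_mics, feat_dim, out_dim = 4, 8, 8         # hypothetical sizes

# Hypothetical per-channel embeddings, standing in for encoder (bottleneck) features.
embeddings = rng.standard_normal((num_mics, feat_dim))

# Assumed graph: every microphone is connected to every other microphone.
adjacency = np.ones((num_mics, num_mics)) - np.eye(num_mics)

weight = rng.standard_normal((feat_dim, out_dim)) * 0.1
mixed = gcn_layer(embeddings, adjacency, weight)
print(mixed.shape)  # (4, 8)
```

With this assumed fully connected graph, the normalized adjacency is uniform, so the layer averages the channel embeddings before the linear map; a learned or geometry-dependent adjacency would weight channels differently.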


Pages: 3415-3419

Page count: 5
