A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks

Cited by: 1
Authors
Chau, Hoang Ngoc [1]
Bui, Tien Dat [2]
Nguyen, Huu Binh [1]
Duong, Thanh Thi Hien [3]
Nguyen, Quoc Cuong [1]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi 100000, Vietnam
[2] Viettel Grp, Viettel Cyberspace Ctr, Hanoi 100000, Vietnam
[3] Hanoi Univ Min & Geol, Hanoi 100000, Vietnam
Keywords
Multi-channel speech enhancement; deep learning-based; graph convolutional networks; complex ideal ratio mask; time-frequency masking; beamformer; domain
DOI
10.1109/TASLP.2024.3352259
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Multi-channel speech enhancement aims to efficiently exploit the spatial relationships between signals captured by a microphone array, together with temporal-spectral information, to estimate the clean target speech. An emerging approach is to design deep learning-based end-to-end architectures. In this work, we propose a new way to process latent multi-channel representations. We introduce an end-to-end system called the temporal graph convolutional network, which treats the embedding space of multi-channel signals as a graph and uses graph neural networks (GNNs) to analyze spatial correlations and temporal-spectral information simultaneously. Specifically, graph convolutional networks (GCNs), a popular GNN variant, are integrated into a complex convolutional encoder-decoder structure to compute a complex ideal ratio mask. The estimated mask is then multiplied with the reference microphone spectrogram to obtain the enhanced speech. We demonstrate the superiority of our approach by comparing it with state-of-the-art methods on the ConferencingSpeech 2021 Challenge data. Our results and analyses indicate that GCNs are a novel and promising mechanism for speech enhancement systems, offering an interesting alternative to recent deep learning-based approaches and inspiration for future research.
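The two operations named in the abstract, graph convolution over channel embeddings and multiplication of a complex mask with the reference microphone spectrogram, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names `gcn_propagate` and `apply_complex_mask`, the toy adjacency matrix, and the toy values are assumptions made for illustration. The propagation step uses the common symmetric-normalized rule with self-loops and omits the learned weight matrix and nonlinearity; the abstract does not state which GCN variant the paper uses.

```python
import math


def gcn_propagate(adjacency, features):
    """One parameter-free GCN propagation step: D^{-1/2} (A + I) D^{-1/2} X.

    adjacency: square nested list of edge weights between graph nodes.
    features:  nested list indexed as [node][feature].
    The learned weight matrix and nonlinearity are omitted in this sketch.
    """
    n = len(adjacency)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    n_feat = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * n_feat
        for j in range(n):
            weight = inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
            for k in range(n_feat):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


def apply_complex_mask(mask, reference_spec):
    """Multiply a complex mask with a spectrogram, bin by bin.

    Both arguments are nested lists indexed as [frequency][time]
    that hold Python complex numbers.
    """
    if len(mask) != len(reference_spec):
        raise ValueError("mask and spectrogram need the same number of rows")
    enhanced = []
    for mask_row, spec_row in zip(mask, reference_spec):
        if len(mask_row) != len(spec_row):
            raise ValueError("mask and spectrogram rows differ in length")
        enhanced.append([m * y for m, y in zip(mask_row, spec_row)])
    return enhanced


if __name__ == "__main__":
    # Hypothetical two-node graph (e.g. two microphone embeddings), fully connected.
    smoothed = gcn_propagate([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]])
    print(smoothed)

    # Hypothetical 1x1 spectrogram: the ideal mask is clean / noisy.
    noisy = [[2 + 0j]]
    clean = [[1 + 1j]]
    ideal_mask = [[clean[0][0] / noisy[0][0]]]
    print(apply_complex_mask(ideal_mask, noisy))
```

In this toy case the ideal mask is the clean bin divided by the noisy bin, so applying it to the noisy spectrogram recovers the clean bin. In a trained system the mask would be the network's estimate, not this ratio.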
Pages: 1133-1144
Number of pages: 12
Related papers
50 in total
  • [31] A multi-channel subband generalized singular value decomposition approach to speech enhancement
    Spriet, A
    Moonen, M
    Wouters, J
    [J]. EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, 2002, 13 (02): 149-158
  • [32] Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
    Taherian, Hassan
    Wang, Zhong-Qiu
    Chang, Jorge
    Wang, DeLiang
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28: 1293-1302
  • [33] A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain
    Jiang, Tao
    Liu, Hongqing
    Zhou, Yi
    Gan, Lu
    [J]. COMMUNICATIONS AND NETWORKING (CHINACOM 2021), 2022: 129-139
  • [34] A Feature Integration Network for Multi-Channel Speech Enhancement
    Zeng, Xiao
    Zhang, Xue
    Wang, Mingjiang
    [J]. Sensors, 2024, 24 (22)
  • [35] Beamforming and lightweight GRU neural network combination model for multi-channel speech enhancement
    Cao, Zhengdong
    Li, Dongmei
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9): 5677-5683
  • [36] Multi-Channel Graph Neural Network for Entity Alignment
    Cao, Yixin
    Liu, Zhiyuan
    Li, Chengjiang
    Liu, Zhiyuan
    Li, Juanzi
    Chua, Tat-Seng
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 1452-1461
  • [37] Adaptive multi-channel Bayesian Graph Neural Network
    Yang, Dong
    Liu, Zhaowei
    Wang, Yingjie
    Xu, Jindong
    Yan, Weiqing
    Li, Ranran
    [J]. NEUROCOMPUTING, 2024, 575
  • [38] EXPLORING MULTI-CHANNEL FEATURES FOR DENOISING-AUTOENCODER-BASED SPEECH ENHANCEMENT
    Araki, Shoko
    Hayashi, Tomoki
    Delcroix, Marc
    Fujimoto, Masakiyo
    Takeda, Kazuya
    Nakatani, Tomohiro
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 116-120
  • [39] A SUPERVISED MULTI-CHANNEL SPEECH ENHANCEMENT ALGORITHM BASED ON BAYESIAN NMF MODEL
    Chung, Hanwook
    Plourde, Eric
    Champagne, Benoit
    [J]. 2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018: 221-225
  • [40] A New Neural Beamformer for Multi-channel Speech Separation
    Liu, Ruqiao
    Zhou, Yi
    Liu, Hongqing
    Xu, Xinmeng
    Jia, Jie
    Chen, Binbin
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2022, 94 (10): 977-987