A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks

Cited by: 1
Authors
Chau, Hoang Ngoc [1]
Bui, Tien Dat [2]
Nguyen, Huu Binh [1]
Duong, Thanh Thi Hien [3]
Nguyen, Quoc Cuong [1]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi 100000, Vietnam
[2] Viettel Grp, Viettel Cyberspace Ctr, Hanoi 100000, Vietnam
[3] Hanoi Univ Min & Geol, Hanoi 100000, Vietnam
Keywords
Multi-channel speech enhancement; deep learning-based; graph convolutional networks; complex ideal ratio mask; time-frequency masking; beamformer; domain
DOI
10.1109/TASLP.2024.3352259
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Multi-channel speech enhancement aims at utilizing spatial relationships between signals captured from a microphone array along with temporal-spectral information efficiently to estimate the clean target. An emerging approach is to design deep learning-based end-to-end architectures. In this work, we provide a new way to process latent multi-channel representations. We introduce a novel end-to-end system called temporal graph convolutional network, which views the embedding space of multi-channel signals as a graph and leverages the power of graph neural networks (GNNs) to analyze spatial correlations as well as temporal-spectral information simultaneously. To be specific, graph convolutional networks (GCNs), a popular GNN variant, are integrated into a complex convolutional encoder-decoder structure to compute a complex ideal ratio mask. The estimated mask is subsequently multiplied with the reference microphone spectrogram to get enhanced speech. We demonstrate the superiority of our approach by comparing it to state-of-the-art methods on ConferencingSpeech 2021 Challenge data. Our results and analyses prove that GCN is a novel yet promising mechanism for speech enhancement systems, providing an interesting alternative for recent deep learning-based approaches and inspiration for future research.
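As a rough illustration of two operations the abstract names, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the widely used normalized graph-convolution rule ReLU(D^-1/2 (A + I) D^-1/2 H W) for the GCN step and treats each microphone channel as a graph node with a fully connected adjacency; the record does not say how the paper builds its graph. The function names `gcn_layer` and `apply_complex_mask`, the array shapes, and the random data are all hypothetical.

```python
import numpy as np


def gcn_layer(H, A, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    H: (nodes, in_features) node embeddings (assumed: one node per microphone)
    A: (nodes, nodes) non-negative adjacency matrix without self-loops
    W: (in_features, out_features) weight matrix
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    degrees = A_hat.sum(axis=1)               # node degrees, all >= 1
    D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    propagated = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(propagated, 0.0)        # ReLU


def apply_complex_mask(mask, ref_spec):
    """Element-wise complex product of a mask with the reference spectrogram."""
    return mask * ref_spec


# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
n_mics, feat_in, feat_out = 4, 8, 5
H = rng.standard_normal((n_mics, feat_in))
A = np.ones((n_mics, n_mics)) - np.eye(n_mics)   # assumed fully connected graph
W = rng.standard_normal((feat_in, feat_out))
H_next = gcn_layer(H, A, W)

n_freq, n_frames = 6, 10
clean = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
noise = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
noisy_ref = clean + noise                         # reference-microphone spectrogram
ideal_mask = clean / noisy_ref                    # complex ideal ratio mask
enhanced = apply_complex_mask(ideal_mask, noisy_ref)
```

With the ideal mask, the product recovers the clean spectrogram exactly; a trained network would only approximate this mask, so its output would be an estimate of the clean speech.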
Pages: 1133-1144
Page count: 12