A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks

Cited by: 1
Authors
Chau, Hoang Ngoc [1]
Bui, Tien Dat [2]
Nguyen, Huu Binh [1]
Duong, Thanh Thi Hien [3]
Nguyen, Quoc Cuong [1]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi 100000, Vietnam
[2] Viettel Grp, Viettel Cyberspace Ctr, Hanoi 100000, Vietnam
[3] Hanoi Univ Min & Geol, Hanoi 100000, Vietnam
Keywords
Multi-channel speech enhancement; deep learning-based; graph convolutional networks; complex ideal ratio mask; time-frequency masking; beamformer; domain
DOI
10.1109/TASLP.2024.3352259
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Multi-channel speech enhancement aims to estimate the clean target speech by efficiently exploiting the spatial relationships between signals captured by a microphone array together with temporal-spectral information. An emerging approach is to design deep learning-based end-to-end architectures. In this work, we provide a new way to process latent multi-channel representations. We introduce a novel end-to-end system, called the temporal graph convolutional network, which views the embedding space of multi-channel signals as a graph and uses graph neural networks (GNNs) to analyze spatial correlations and temporal-spectral information simultaneously. Specifically, graph convolutional networks (GCNs), a popular GNN variant, are integrated into a complex convolutional encoder-decoder structure to compute a complex ideal ratio mask. The estimated mask is then multiplied with the reference microphone spectrogram to obtain the enhanced speech. We demonstrate the superiority of our approach by comparing it with state-of-the-art methods on ConferencingSpeech 2021 Challenge data. Our results and analyses indicate that GCNs are a novel and promising mechanism for speech enhancement systems, offering an interesting alternative to recent deep learning-based approaches and inspiration for future research.
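The two operations named in the abstract, a graph convolution over microphone-channel embeddings and the application of a complex ratio mask to the reference spectrogram, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the function names, array shapes, fully connected channel graph, and random weights are all assumptions chosen for illustration, and the graph convolution uses the common symmetric normalization with self-loops.

```python
import numpy as np

def gcn_layer(x, adj, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    x:      (nodes, features) node embeddings, e.g. one node per microphone (assumed)
    adj:    (nodes, nodes) non-negative adjacency matrix without self-loops
    weight: (features, out_features) weight matrix (learned in a real model)
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # degrees, >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

def ideal_complex_ratio_mask(clean_spec, noisy_spec):
    """Complex ideal ratio mask M = S / Y (assumes no exactly-zero bins in Y)."""
    return clean_spec / noisy_spec

def apply_complex_mask(mask, ref_spec):
    """Element-wise complex multiplication of a mask with a reference spectrogram."""
    return mask * ref_spec

# Hypothetical toy data: 3 microphones, 8-dim embeddings, 4x5 spectrogram bins.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((3, 8))
adjacency = np.ones((3, 3)) - np.eye(3)      # fully connected channel graph (assumption)
weights = rng.standard_normal((8, 6))
hidden = gcn_layer(embeddings, adjacency, weights)

clean = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
noise = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
noisy_ref = clean + 0.3 * noise
mask = ideal_complex_ratio_mask(clean, noisy_ref)
enhanced = apply_complex_mask(mask, noisy_ref)
```

In a trained system the mask would be predicted by the network rather than computed from the clean signal. The ideal mask is used here only to show that multiplying it with the reference spectrogram recovers the clean spectrogram.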
Pages: 1133-1144
Page count: 12