VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK

Times cited: 4
Authors
Xu, Xinmeng [1 ,2 ]
Wang, Yang [1 ]
Xu, Dongxiang [1 ]
Peng, Yiyuan [1 ]
Zhang, Cong [1 ]
Jia, Jie [1 ]
Chen, Binbin [1 ]
Affiliations
[1] Vivo AI Lab, Shenzhen, Peoples R China
[2] Trinity Coll Dublin, EE Engn, Dublin, Ireland
Keywords
speech enhancement; visual information; multi-layer feature fusion convolution network; generative adversarial network
DOI
10.1109/ICASSP43922.2022.9747187
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech enhancement is an essential task for improving speech quality in noisy scenarios. Several state-of-the-art approaches have introduced visual information into speech enhancement, since the visual aspect of speech is essentially unaffected by the acoustic environment. This paper proposes a novel framework that incorporates visual information into speech enhancement by means of a Generative Adversarial Network (GAN). In particular, the proposed visual speech enhancement GAN consists of two networks trained in an adversarial manner: i) a generator that adopts a multi-layer feature fusion convolution network to enhance the input noisy speech, and ii) a discriminator that attempts to minimize the discrepancy between the distributions of the clean speech signal and the enhanced speech signal. Experimental results demonstrate the superior performance of the proposed model over several state-of-the-art models.
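The adversarial setup summarized in the abstract can be illustrated with a minimal, non-authoritative sketch. It is not the authors' implementation: the abstract specifies neither the loss functions nor the fusion scheme, so this example assumes the least-squares GAN (LSGAN) losses plus an L1 reconstruction term weighted by 100 that were used in the earlier audio-only SEGAN, and a hypothetical `fuse_frames` helper that repeats each visual (lip) feature vector to the audio frame rate before concatenation. All names here are illustrative, and features are plain Python lists rather than tensors.

```python
"""Toy sketch of a conditional GAN objective for audio-visual speech enhancement.

Assumptions (not stated in the VSEGAN abstract): LSGAN losses plus an L1
reconstruction term weighted by 100, as in the audio-only SEGAN, and a
hypothetical fuse_frames helper that aligns visual features to audio frames.
"""
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def fuse_frames(audio_frames: Sequence[Sequence[float]],
                visual_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each audio frame with the visual frame covering the same time span.

    Hypothetical fusion step: visual features (e.g. lip embeddings at video rate)
    are nearest-neighbour repeated to the audio frame rate, then concatenated.
    """
    n_audio, n_visual = len(audio_frames), len(visual_frames)
    fused = []
    for i, audio in enumerate(audio_frames):
        # Index of the visual frame aligned with audio frame i (always < n_visual).
        v_idx = i * n_visual // n_audio
        fused.append(list(audio) + list(visual_frames[v_idx]))
    return fused


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """LSGAN discriminator loss: push clean-speech scores to 1, enhanced-speech scores to 0."""
    real_term = _mean([(d - 1.0) ** 2 for d in d_real])
    fake_term = _mean([d ** 2 for d in d_fake])
    return 0.5 * real_term + 0.5 * fake_term


def generator_loss(d_fake: Sequence[float], enhanced: Sequence[float],
                   clean: Sequence[float], l1_weight: float = 100.0) -> float:
    """LSGAN generator loss (make D score enhanced speech as 1) plus weighted L1 distance."""
    adversarial = 0.5 * _mean([(d - 1.0) ** 2 for d in d_fake])
    l1 = _mean([abs(e - c) for e, c in zip(enhanced, clean)])
    return adversarial + l1_weight * l1


if __name__ == "__main__":
    print(fuse_frames([[0.1], [0.2], [0.3], [0.4]], [[1.0], [2.0]]))
    print(discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1]))
    print(generator_loss(d_fake=[0.2, 0.1], enhanced=[0.5, 0.4], clean=[0.5, 0.5]))
```

The heavily weighted L1 term keeps the generator's output close to the clean waveform while the adversarial term sharpens it; whether VSEGAN uses this particular balance is an assumption carried over from SEGAN.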
Pages: 7307-7311
Number of pages: 5