VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK

Cited by: 4
Authors
Xu, Xinmeng [1, 2]
Wang, Yang [1]
Xu, Dongxiang [1]
Peng, Yiyuan [1]
Zhang, Cong [1]
Jia, Jie [1]
Chen, Binbin [1]
Affiliations
[1] Vivo AI Lab, Shenzhen, People's Republic of China
[2] Trinity College Dublin, EE Engineering, Dublin, Ireland
Keywords
speech enhancement; visual information; multi-layer feature fusion convolution network; generative adversarial network
DOI
10.1109/ICASSP43922.2022.9747187
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech enhancement is the essential task of improving speech quality in noisy scenarios. Several state-of-the-art approaches have introduced visual information for speech enhancement, since the visual aspect of speech is essentially unaffected by the acoustic environment. This paper proposes a novel framework that uses visual information for speech enhancement by incorporating a Generative Adversarial Network (GAN). In particular, the proposed visual speech enhancement GAN consists of two networks trained in an adversarial manner: i) a generator that adopts a multi-layer feature fusion convolution network to enhance the input noisy speech, and ii) a discriminator that attempts to minimize the discrepancy between the distributions of the clean speech signal and the enhanced speech signal. Experimental results demonstrate superior performance of the proposed model against several state-of-the-art models.
Pages: 7307-7311
Number of pages: 5