SPEECH INTELLIGIBILITY ENHANCEMENT USING NON-PARALLEL SPEAKING STYLE CONVERSION WITH STARGAN AND DYNAMIC RANGE COMPRESSION

Authors
Li, Gang [1 ,2 ]
Hu, Ruimin [1 ,2 ]
Ke, Shanfa [1 ]
Zhang, Rui [1 ]
Wang, Xiaochen [1 ,3 ]
Gao, Li [1 ]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Hubei, Peoples R China
[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen, Peoples R China
Keywords
speech intelligibility; Lombard effect; speaking style conversion (SSC); StarGAN; dynamic range compression (DRC); LOMBARD SPEECH; VOCODER; NOISE;
DOI
10.1109/icme46284.2020.9102916
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments. It is typically used in the listening stage of multimedia communications. In this study, we enhance speech intelligibility by speaking style conversion (SSC), a data-driven approach inspired by a vocal mechanism known as the Lombard effect. The proposed SSC method combines star generative adversarial network (StarGAN) based mapping with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion used in previous studies, StarGAN can learn the speech features of different genders separately, providing gender-specific conversion with a single model and non-parallel training data; 2) we design a multi-level enhancement strategy that uses DRC within the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms baseline methods.
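As a rough illustration of what dynamic range compression does to a waveform, the sketch below applies a static, sample-wise compressor with NumPy. It is not the DRC stage described in the paper, whose design the abstract does not specify. The function name `compress_dynamic_range`, the threshold and ratio values, and the RMS renormalisation step are all assumptions made for this example.

```python
import numpy as np


def compress_dynamic_range(signal, threshold_db=-20.0, ratio=4.0, eps=1e-12):
    """Static (memoryless) compressor; parameters are illustrative assumptions."""
    x = np.asarray(signal, dtype=float)
    # Instantaneous level of each sample in dB relative to full scale (1.0).
    level_db = 20.0 * np.log10(np.abs(x) + eps)
    # Amount by which each sample exceeds the threshold (0 dB if below it).
    over_db = np.maximum(level_db - threshold_db, 0.0)
    # Above the threshold, the output level rises by only 1/ratio dB per input dB.
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    y = x * 10.0 ** (gain_db / 20.0)
    # Rescale so the output has the same RMS energy as the input (an assumption
    # made here so that loud and compressed signals are compared at equal power).
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(y ** 2))
    return y * (rms_in / (rms_out + eps))


def crest_factor(signal):
    """Peak-to-RMS ratio; a smaller value means a narrower dynamic range."""
    s = np.asarray(signal, dtype=float)
    return np.max(np.abs(s)) / np.sqrt(np.mean(s ** 2))


# Toy input: a quiet tone followed by a loud tone.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
tone = np.sin(2.0 * np.pi * 100.0 * t)
x = np.concatenate([0.01 * tone, 0.9 * tone])
y = compress_dynamic_range(x)
print(crest_factor(x), crest_factor(y))
```

With equal RMS energy, the compressed signal has a lower peak-to-RMS ratio, so quiet segments sit closer in level to loud ones. That is the general property that makes DRC attractive when speech must compete with noise.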
Pages: 6