SPEECH INTELLIGIBILITY ENHANCEMENT USING NON-PARALLEL SPEAKING STYLE CONVERSION WITH STARGAN AND DYNAMIC RANGE COMPRESSION

Cited by: 0
Authors
Li, Gang [1 ,2 ]
Hu, Ruimin [1 ,2 ]
Ke, Shanfa [1 ]
Zhang, Rui [1 ]
Wang, Xiaochen [1 ,3 ]
Gao, Li [1 ]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Hubei, Peoples R China
[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen, Peoples R China
Keywords
speech intelligibility; Lombard effect; speaking style conversion (SSC); StarGAN; dynamic range compression (DRC); Lombard speech; vocoder; noise;
DOI
10.1109/icme46284.2020.9102916
Chinese Library Classification
TP31 [Computer software];
Discipline Codes
081202; 0835;
Abstract
Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments, and it is typically applied at the listening stage of multimedia communications. In this study, we enhance speech intelligibility by speaking style conversion (SSC), a data-driven approach inspired by the Lombard effect, the involuntary tendency of speakers to adapt their vocal effort in noise. The proposed SSC method combines star generative adversarial network (StarGAN) based mapping with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion of previous studies, StarGAN learns the speech features of each gender separately, providing gender-differentiated conversion with a single model trained on non-parallel data; 2) we design a multi-level enhancement strategy that applies DRC within the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms the baseline methods.
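This record gives no implementation details, but the DRC component of the pipeline can be illustrated with a short, self-contained sketch. The Python snippet below is my own minimal example, not the authors' code: the function name simple_drc and the threshold, ratio, and frame-length parameters are assumptions chosen for illustration. It applies a static frame-wise compressor that attenuates frames above a level threshold and then re-normalizes the peak, which raises the relative level of low-energy speech segments and conveys the basic intuition behind using DRC for intelligibility enhancement.

```python
import numpy as np

def simple_drc(signal, sr, threshold_db=-20.0, ratio=4.0, frame_ms=10.0):
    """Illustrative static dynamic range compressor (not the paper's exact DRC).

    Frames whose RMS level exceeds `threshold_db` (dB relative to full scale)
    are attenuated so that each dB of overshoot at the input becomes 1/ratio dB
    at the output; quieter frames pass through unchanged.  After peak
    re-normalization, low-energy segments end up relatively louder.
    """
    frame_len = max(1, int(sr * frame_ms / 1000.0))
    out = np.copy(signal).astype(np.float64)
    for start in range(0, len(out), frame_len):
        frame = out[start:start + frame_len]          # view into `out`
        rms = np.sqrt(np.mean(frame ** 2) + 1e-12)
        level_db = 20.0 * np.log10(rms + 1e-12)
        if level_db > threshold_db:
            # Gain (dB) that maps the overshoot onto the compressed slope.
            gain_db = threshold_db + (level_db - threshold_db) / ratio - level_db
            frame *= 10.0 ** (gain_db / 20.0)          # in-place attenuation
    # Re-normalize so the output peak matches the input peak.
    peak_in = np.max(np.abs(signal)) + 1e-12
    peak_out = np.max(np.abs(out)) + 1e-12
    return out * (peak_in / peak_out)
```

As a usage example, simple_drc(x, 16000) applied to a 16 kHz waveform x scaled to [-1, 1] returns a compressed waveform with the same peak level; in the paper's setting such a compressor would operate alongside the StarGAN-based feature mapping rather than replace it.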
Pages: 6
Related Papers
50 records (first 10 shown)
  • [1] Speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN
    Xiao, Jing
    Liu, Jiaqi
    Li, Dengshi
    Zhao, Lanxin
    Wang, Qianrui
    MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 544 - 556
  • [2] StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition
    Sakamoto, Shoki
    Taniguchi, Akira
    Taniguchi, Tadahiro
    Kameoka, Hirokazu
    INTERSPEECH 2021, 2021, : 1359 - 1363
  • [3] Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
    Paul, Dipjyoti
    Shifas, Muhammed P. V.
    Pantazis, Yannis
    Stylianou, Yannis
    INTERSPEECH 2020, 2020, : 1361 - 1365
  • [4] Speech-in-noise enhancement using amplification and dynamic range compression controlled by the speech intelligibility index
    Schepker, Henning
    Rennies, Jan
    Doclo, Simon
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2015, 138 (05) : 2692 - 2706
  • [5] Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder
    Seki, Shogo
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    IEEE ACCESS, 2023, 11 : 44590 - 44599
  • [6] Non-parallel Many-to-many Voice Conversion with PSR-StarGAN
    Li, Yanping
    Xu, Dongxiang
    Zhang, Yan
    Wang, Yang
    Chen, Binbin
    INTERSPEECH 2020, 2020, : 781 - 785
  • [7] Transferring Source Style in Non-Parallel Voice Conversion
    Liu, Songxiang
    Cao, Yuewen
    Kang, Shiyin
    Hu, Na
    Liu, Xunying
    Su, Dan
    Yu, Dong
    Meng, Helen
    INTERSPEECH 2020, 2020, : 4721 - 4725
  • [8] Intelligibility of time-compressed synthetic speech: Compression method and speaking style
    Valentini-Botinhao, Cassia
    Toman, Markus
    Pucher, Michael
    Schabus, Dietmar
    Yamagishi, Junichi
    SPEECH COMMUNICATION, 2015, 74 : 52 - 64
  • [9] Cycle-Consistent Adversarial Networks for Non-Parallel Vocal Effort Based Speaking Style Conversion
    Seshadri, Shreyas
    Juvela, Lauri
    Yamagishi, Junichi
    Rasanen, Okko
    Alku, Paavo
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6835 - 6839
  • [10] Parallel vs. Non-parallel Voice Conversion for Esophageal Speech
    Serrano, Luis
    Raman, Sneha
    Tavarez, David
    Navas, Eva
    Hernaez, Inma
    INTERSPEECH 2019, 2019, : 4549 - 4553