SPEECH INTELLIGIBILITY ENHANCEMENT USING NON-PARALLEL SPEAKING STYLE CONVERSION WITH STARGAN AND DYNAMIC RANGE COMPRESSION

Times cited: 0
Authors
Li, Gang [1, 2]
Hu, Ruimin [1, 2]
Ke, Shanfa [1]
Zhang, Rui [1]
Wang, Xiaochen [1, 3]
Gao, Li [1]
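Affiliations
[1] Wuhan University, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan, Hubei, People's Republic of China
[2] Wuhan University, Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan, Hubei, People's Republic of China
[3] Wuhan University Shenzhen Research Institute, Shenzhen, People's Republic of China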
Keywords
speech intelligibility; Lombard effect; speaking style conversion (SSC); StarGAN; dynamic range compression (DRC); Lombard speech; vocoder; noise
DOI
10.1109/icme46284.2020.9102916
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments, and it is typically applied at the listening stage of multimedia communications. In this study, we enhance speech intelligibility through speaking style conversion (SSC), a data-driven approach inspired by the Lombard effect, the involuntary tendency of speakers to raise their vocal effort when speaking in noise. The proposed SSC method combines mapping based on a star generative adversarial network (StarGAN) with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion used in previous studies, StarGAN learns the speech features of each gender separately, providing gender-dependent conversion with a single model and non-parallel training data; 2) we design a multi-level enhancement strategy that incorporates DRC into the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms the baseline methods.
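The abstract does not give the DRC configuration used in the multi-level strategy, so the following is only a generic sketch of what a dynamic range compressor does, not the authors' implementation. It assumes a simple feed-forward design: a one-pole peak envelope follower with separate attack and release times, followed by a static threshold/ratio gain curve. The function name `compress_dynamic_range` and all parameter defaults (`threshold_db`, `ratio`, `attack_ms`, `release_ms`, `makeup_db`) are hypothetical illustrations.

```python
import numpy as np


def compress_dynamic_range(x, fs, threshold_db=-20.0, ratio=4.0,
                           attack_ms=5.0, release_ms=50.0, makeup_db=0.0):
    """Hypothetical feed-forward compressor (not the paper's DRC settings).

    A one-pole peak envelope follower tracks the signal level; above
    `threshold_db`, each extra dB of input level yields only 1/ratio dB
    of extra output level. `makeup_db` adds a constant gain afterwards.
    """
    x = np.asarray(x, dtype=np.float64)

    # One-pole smoothing coefficients: fast when the level rises (attack),
    # slow when it falls (release).
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))

    env = np.empty_like(x)
    level = 0.0
    for n, s in enumerate(np.abs(x)):
        coeff = a_att if s > level else a_rel
        level = coeff * level + (1.0 - coeff) * s
        env[n] = level

    # Envelope in dB; the small offset avoids log10(0) on silence.
    env_db = 20.0 * np.log10(env + 1e-10)

    # Static gain curve: attenuate only the part of the level above threshold.
    over_db = np.maximum(env_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio) + makeup_db

    return x * 10.0 ** (gain_db / 20.0)
```

Applied to a loud segment followed by a quiet one, the compressor attenuates the loud part while leaving the sub-threshold part untouched, which narrows the level gap between them. Raising the quieter portions relative to the peaks (for example by adding `makeup_db` after compression) is one common way DRC is used to help speech survive masking noise; how the paper combines this with the StarGAN output is not specified in the abstract.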
Pages: 6