SPEECH INTELLIGIBILITY ENHANCEMENT USING NON-PARALLEL SPEAKING STYLE CONVERSION WITH STARGAN AND DYNAMIC RANGE COMPRESSION

Authors
Li, Gang [1, 2]
Hu, Ruimin [1, 2]
Ke, Shanfa [1]
Zhang, Rui [1]
Wang, Xiaochen [1, 3]
Gao, Li [1]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan, Hubei, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Hubei, Peoples R China
[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen, Peoples R China
Keywords
speech intelligibility; Lombard effect; speaking style conversion (SSC); StarGAN; dynamic range compression (DRC); LOMBARD SPEECH; VOCODER; NOISE
DOI
10.1109/icme46284.2020.9102916
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Speech intelligibility enhancement is a perceptual enhancement technique for clean speech reproduced in noisy environments. It is typically used in the listening stage of multimedia communications. In this study, we enhance speech intelligibility by speaking style conversion (SSC), a data-driven approach inspired by a vocal mechanism known as the Lombard effect. The proposed SSC method combines star generative adversarial network (StarGAN) based mapping with dynamic range compression (DRC). It has two main advantages: 1) unlike the gender-independent conversion used in previous studies, StarGAN can learn the speech features of each gender separately, providing gender-specific conversion with a single model and non-parallel training data; 2) we design a multi-level enhancement strategy that uses DRC within the StarGAN architecture, which improves SSC performance under strong noise interference. Experiments show that our method outperforms baseline methods.
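As a rough illustration of what dynamic range compression does to a signal, the sketch below applies a static, sample-wise compressor and then rescales the result to a chosen RMS level. It is not the DRC stage described in the paper, whose details are not given in this record. The function names, the -20 dB threshold, the 4:1 ratio, and the equal-RMS rescaling are all assumptions made for the example.

```python
import math


def compress_dynamic_range(samples, threshold_db=-20.0, ratio=4.0):
    """Static, sample-wise compressor (illustrative assumption, not the paper's DRC).

    Samples louder than threshold_db are attenuated so that every dB above
    the threshold becomes 1/ratio dB. Quieter samples pass through unchanged.
    """
    out = []
    for x in samples:
        mag = abs(x)
        if mag == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(mag)
        if level_db > threshold_db:
            # Shrink the excess over the threshold by the compression ratio.
            target_db = threshold_db + (level_db - threshold_db) / ratio
            gain = 10.0 ** ((target_db - level_db) / 20.0)
        else:
            gain = 1.0
        out.append(x * gain)
    return out


def normalize_rms(samples, target_rms):
    """Rescale samples so that their RMS equals target_rms."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    return [x * target_rms / rms for x in samples]


if __name__ == "__main__":
    signal = [1.0, -0.1, 0.01]
    compressed = compress_dynamic_range(signal)
    # Peak-to-quietest ratio shrinks from 40 dB to 25 dB with these settings.
    print(compressed)
```

With these hypothetical settings, a sample at 0 dB is reduced to -15 dB while samples at or below -20 dB are left alone, so the gap between loud and quiet parts narrows. Rescaling afterwards to the input's RMS then raises the quiet parts relative to the original.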
Pages: 6