Parallel vs. Non-parallel Voice Conversion for Esophageal Speech

Cited by: 3
Authors
Serrano, Luis [1]
Raman, Sneha [1]
Tavarez, David [1]
Navas, Eva [1]
Hernaez, Inma [1]
Affiliations
[1] University of the Basque Country (UPV/EHU), Leioa, Spain
Source
Funding
European Union Horizon 2020;
Keywords
voice conversion; speech and voice disorders; alaryngeal voices; speech intelligibility; tracheoesophageal speech; neural networks; enhancement; transformation
DOI
10.21437/Interspeech.2019-2194
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
State-of-the-art systems for voice conversion have been shown to generate highly natural-sounding converted speech. Voice conversion techniques have also been applied to alaryngeal speech, with the aim of improving its quality or its intelligibility. In this paper, we present an attempt to apply a voice conversion strategy based on phonetic posteriorgrams (PPGs), which produces very high-quality converted speech, to improve the characteristics of esophageal speech. The main advantage of this PPG-based architecture is that it can convert speech from any source, without the need to train the system beforehand on a parallel corpus. However, our results show that the PPG approach degrades the intelligibility of the converted speech considerably, especially when the input speech is already poorly intelligible. Two systems are compared: an LSTM-based one-to-one conversion system, referred to as the baseline, and the new system using phonetic posteriorgrams. Both spectral parameters and f0 are converted using DNN (Deep Neural Network) based architectures. Results from both objective and subjective evaluations are presented, showing that although ASR (Automated Speech Recognition) errors are reduced, the original esophageal speech is still preferred by subjects.
Pages: 4549 - 4553
Page count: 5
Related papers
50 in total
  • [1] Singing Voice Conversion with Non-Parallel Data
    Chen, Xin
    Chu, Wei
    Guo, Jinxi
    Xu, Ning
    [J]. 2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019 : 292 - 296
  • [2] Non-Parallel Voice Conversion for ASR Augmentation
    Wang, Gary
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Biadsy, Fadi
    Huang, Yinghui
    Emond, Jesse
    Mengibar, Pedro Moreno
    [J]. INTERSPEECH 2022, 2022 : 3408 - 3412
  • [3] Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression
    Wu, Yi-Chiao
    Tobing, Patrick Lumban
    Kobayashi, Kazuhiro
    Hayashi, Tomoki
    Toda, Tomoki
    [J]. IEEE ACCESS, 2020, 8 : 62094 - 62106
  • [4] CVC: Contrastive Learning for Non-parallel Voice Conversion
    Li, Tingle
    Liu, Yichen
    Hu, Chenxu
    Zhao, Hang
    [J]. INTERSPEECH 2021, 2021 : 1324 - 1328
  • [5] Novel Metric Learning for Non-Parallel Voice Conversion
    Shah, Nirmesh J.
    Patil, Hemant A.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 3722 - 3726
  • [6] Frame Labeling and Mapping for Non-parallel Voice Conversion
    Dong, Minghui
    Yang, Chenyu
    Ehnes, Jochen Walter
    Lu, Yanfeng
    Ming, Huaiping
    Huang, Dongyan
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017 : 361 - 365
  • [7] Non-parallel Voice Conversion with Generative Attentional Networks
    Chiu, Tse Wei
    Guo, You Sheng
    Chang, Pao-Chi
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021 : 141 - 145
  • [8] Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. INTERSPEECH 2019, 2019 : 674 - 678
  • [9] Transferring Source Style in Non-Parallel Voice Conversion
    Liu, Songxiang
    Cao, Yuewen
    Kang, Shiyin
    Hu, Na
    Liu, Xunying
    Su, Dan
    Yu, Dong
    Meng, Helen
    [J]. INTERSPEECH 2020, 2020 : 4721 - 4725
  • [10] Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion
    Wu, Zhizheng
    Kinnunen, Tomi
    Chng, Eng Siong
    Li, Haizhou
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (12) : 914 - 917