Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders

Cited by: 7
Authors
Chen, Mingjie [1]
Hain, Thomas [1]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Source
Interspeech 2020
Keywords
voice conversion; acoustic unit discovery; speech representation; discovery; speaker
DOI
10.21437/Interspeech.2020-1785
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Unsupervised representation learning of speech has attracted keen interest in recent years, as is evident, for example, in the wide interest in the ZeroSpeech challenges. This work presents a new method for learning frame-level representations based on WaveNet auto-encoders. Of particular interest in the ZeroSpeech Challenge 2019 were models with discrete latent variables, such as the Vector Quantized Variational Auto-Encoder (VQVAE). However, these models generate speech of relatively poor quality. In this work we aim to address this with two approaches: first, WaveNet is used as the decoder to generate waveform data directly from the latent representation; second, the low complexity of the latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization. The method was developed and tested in the context of the recent ZeroSpeech Challenge 2020. The system output submitted to the challenge obtained the top position for naturalness (Mean Opinion Score 4.06), the top position for intelligibility (Character Error Rate 0.15), and third position for the quality of the representation (ABX test score 12.5). These results and the further analysis in this paper illustrate that the quality of the converted speech and of the acoustic unit representation can be well balanced.
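The two disentanglement methods named in the abstract can be illustrated with a minimal sketch. The abstract does not define either method in detail, so the following rests on assumptions: instance normalization is taken to mean normalizing each feature channel over the frames of a single utterance, and "sliced" vector quantization is assumed to mean splitting each frame vector into equal slices that are each mapped to the nearest entry of one shared codebook. The function names, the toy codebook, and the dimensions are hypothetical and are not taken from the paper.

```python
import math
import random


def instance_normalize(frames, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over time.

    frames: list of T frames, each a list of D floats (one utterance).
    Assumption: statistics are computed per utterance and per channel.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n_frames
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dim)]
        for f in frames
    ]


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])),
    )


def sliced_quantize(frame, codebook, n_slices):
    """Quantize one frame slice by slice (assumed reading of 'sliced' VQ).

    The frame is cut into n_slices equal pieces; each piece is replaced by
    its nearest codebook entry. Returns (code indices, quantized frame).
    """
    slice_dim = len(frame) // n_slices
    indices, quantized = [], []
    for s in range(n_slices):
        piece = frame[s * slice_dim:(s + 1) * slice_dim]
        k = nearest_code(piece, codebook)
        indices.append(k)
        quantized.extend(codebook[k])
    return indices, quantized


# Toy utterance: 50 frames of 4-dimensional features with a global offset.
rng = random.Random(0)
frames = [[rng.gauss(3.0, 2.0) for _ in range(4)] for _ in range(50)]
normalized = instance_normalize(frames)

# Hypothetical 2-dimensional codebook shared by both slices.
codebook = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
indices, quantized = sliced_quantize(normalized[0], codebook, n_slices=2)
print(indices, quantized)
```

In this sketch the normalization removes per-utterance offsets and scales, which is one common way to suppress global (for example speaker-dependent) statistics, while the slicing lets a small codebook describe each frame with several discrete indices instead of one.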
Pages: 4866-4870
Number of pages: 5