LEARNING ACCENT REPRESENTATION WITH MULTI-LEVEL VAE TOWARDS CONTROLLABLE SPEECH SYNTHESIS

Cited by: 2
Authors
Melechovsky, Jan [1 ]
Mehrish, Ambuj [1 ]
Herremans, Dorien [1 ]
Sisman, Berrak [2 ]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
[2] Univ Texas Dallas, Richardson, TX USA
Keywords
Accent; Text-to-Speech; Multi-level Variational Autoencoder; Disentanglement; Controllable speech synthesis; Conversion
DOI
10.1109/SLT54892.2023.10023072
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Accent is a crucial aspect of speech that helps define one's identity. We note that state-of-the-art Text-to-Speech (TTS) systems can generate high-quality voices but still lack versatility and customizability, and they generally do not take accent, an important feature of speaking style, into account. In this work, we utilize the concept of the Multi-level VAE (ML-VAE) to build a control mechanism that aims to disentangle accent from a reference accented speaker and to synthesize voices in different accents such as English, American, Irish, and Scottish. The proposed framework also achieves high-quality accented voice generation in a multi-speaker setup, which we believe is remarkable. We investigate performance through objective metrics and conduct listening experiments for subjective assessment. We show that the proposed method achieves good performance in terms of naturalness, speaker similarity, and accent similarity.
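The abstract describes using a Multi-level VAE to separate a group-level accent representation from an utterance-level speaker representation. The following is a minimal PyTorch-style sketch of that general ML-VAE grouping idea, not the authors' actual model: the layer sizes, the utterance-level input features, and the product-of-Gaussians grouping step are illustrative assumptions.

```python
# Sketch of an ML-VAE-style encoder that splits an utterance feature into a
# group-level (accent) latent and an instance-level (speaker) latent.
# All dimensions and the grouping rule are assumptions for illustration only.
import torch
import torch.nn as nn


class MLVAEEncoder(nn.Module):
    """Produces two Gaussian posteriors: one for accent (group level),
    one for speaker/style (instance level)."""

    def __init__(self, feat_dim=80, accent_dim=16, speaker_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.accent_mu = nn.Linear(256, accent_dim)
        self.accent_logvar = nn.Linear(256, accent_dim)
        self.speaker_mu = nn.Linear(256, speaker_dim)
        self.speaker_logvar = nn.Linear(256, speaker_dim)

    def forward(self, x):
        h = self.backbone(x)
        return (self.accent_mu(h), self.accent_logvar(h),
                self.speaker_mu(h), self.speaker_logvar(h))


def group_gaussians(mu, logvar, group_ids):
    """Combine per-utterance accent posteriors that share an accent label into
    one group posterior via a precision-weighted product of Gaussians."""
    out_mu, out_logvar = mu.clone(), logvar.clone()
    for g in group_ids.unique():
        idx = (group_ids == g).nonzero(as_tuple=True)[0]
        prec = torch.exp(-logvar[idx])              # 1 / sigma^2 per utterance
        group_var = 1.0 / prec.sum(dim=0)
        group_mu = group_var * (prec * mu[idx]).sum(dim=0)
        out_mu[idx] = group_mu                      # same accent code for the group
        out_logvar[idx] = torch.log(group_var)
    return out_mu, out_logvar


def reparameterize(mu, logvar):
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


if __name__ == "__main__":
    enc = MLVAEEncoder()
    feats = torch.randn(4, 80)                      # e.g. averaged mel frames per utterance
    accents = torch.tensor([0, 0, 1, 1])            # two utterances per accent group
    a_mu, a_lv, s_mu, s_lv = enc(feats)
    a_mu, a_lv = group_gaussians(a_mu, a_lv, accents)
    z_accent = reparameterize(a_mu, a_lv)           # shared accent code per group
    z_speaker = reparameterize(s_mu, s_lv)          # per-utterance speaker code
    print(z_accent.shape, z_speaker.shape)
```

In such a setup, the shared accent code can be swapped at synthesis time to control the accent of the generated voice while the speaker code is kept fixed.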
Pages: 928-935
Page count: 8