Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech

Cited by: 0
Authors
Lee, Jaeuk [1]
Shin, Yoonsoo [1]
Chang, Joon-Hyuk [1]
Affiliation
[1] Hanyang University, School of Electronics, Seoul, 04763, Republic of Korea
DOI: 10.1109/LSP.2024.3495578
Pages: 3154-3158