Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

Cited by: 1
Authors
Makarov, Peter [1]
Abbas, Ammar [1]
Lajszczak, Mateusz [1]
Joly, Arnaud [1]
Karlapati, Sri [1]
Moinet, Alexis [1]
Drugman, Thomas [1]
Karanasou, Penny [1]
Affiliations
[1] Amazon, Alexa AI, Cambridge, England
Source
Interspeech 2022
Keywords
neural text-to-speech; long-form TTS; multi-speaker TTS; contextual word embeddings; FastSpeech; BERT
DOI
10.21437/Interspeech.2022-379
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based FastSpeech-like system, with the goal of improving prosody for multi-sentence TTS. We find that long context, powerful text features, and training on multi-speaker data all improve prosody. More interestingly, they result in synergies. Long context disambiguates prosody, improves coherence, and plays to the strengths of Transformers. Finetuning word-level features from a powerful language model, such as BERT, appears to benefit from more training data, readily available in a multi-speaker setting. We look into objective metrics on pausing and pacing and perform thorough subjective evaluations for speech naturalness. Our main system, which incorporates all the extensions, achieves consistently strong results, including statistically significant improvements in speech naturalness over all its competitors.
Pages: 3368-3372
Page count: 5
Related papers
50 in total
  • [21] Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
    Hessel, Jack
    Lee, Lillian
    Mimno, David
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 2034-2045
  • [22] Multi-sentence Compression Using Word Graph and Integer Linear Programming
    Tuan, Dung Tran
    Van Chi, Nam
    Nghiem, Minh-Quoc
    ADVANCED TOPICS IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, 2017, 710 : 367 - 377
  • [23] A Neural Semantic Parser for Math Problems Incorporating Multi-Sentence Information
    Sun, Ruiyong
    Zhao, Yijia
    Zhang, Qi
    Ding, Keyu
    Wang, Shijin
    Wei, Cui
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (04)
  • [24] Multi-Document Abstractive Summarization Using ILP Based Multi-Sentence Compression
    Banerjee, Siddhartha
    Mitra, Prasenjit
    Sugiyama, Kazunari
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015: 1208-1214
  • [25] Using Semantically Connected Parse Trees to Answer Multi-Sentence Queries
    Ilvovsky, D. A.
    AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS, 2014, 48 (01) : 33 - 41
  • [26] Multi-Sentence Argument Linking via An Event-Aware Hierarchical Encoder
    Yang, Hang
    Chen, Yubo
    Liu, Kang
    Zhao, Jun
    Wang, Taifeng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021: 3578-3582
  • [27] A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
    Pontes, Elvys Linhares
    Torres-Moreno, Juan-Manuel
    Huet, Stephane
    Linhares, Andrea Carneiro
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018: 3192-3196
  • [28] Finding Maximal Common Sub-parse Thickets for Multi-sentence Search
    Galitsky, Boris A.
    Ilvovsky, Dmitry
    Kuznetsov, Sergei O.
    Strok, Fedor
    GRAPH STRUCTURES FOR KNOWLEDGE REPRESENTATION AND REASONING, GKR 2013, 2014, 8323 : 39 - 57
  • [29] Paragraph-based Transformer Pre-training for Multi-Sentence Inference
    Di Liello, Luca
    Garg, Siddhant
    Soldaini, Luca
    Moschitti, Alessandro
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 2521-2531
  • [30] Construction risk identification using a multi-sentence context-aware method
    Gao, Nan
    Touran, Ali
    Wang, Qi
    Beauchamp, Nicholas
    AUTOMATION IN CONSTRUCTION, 2024, 164