SegINR: Segment-Wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

Cited by: 0
Authors
Kim, Minchan [1 ,2 ]
Jeong, Myeonghun [1 ,2 ]
Lee, Joun Yeop [3 ]
Kim, Nam Soo [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
[3] Samsung Res, Seoul 06765, South Korea
Keywords
Semantics; Predictive models; Computational modeling; Transducers; Training; Indexes; Regulation; Linguistics; Computational efficiency; Implicit neural representation; sequence alignment; text-to-speech;
DOI
10.1109/LSP.2025.3528858
CLC Classification Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809
Abstract
We present SegINR, a novel approach to neural Text-to-Speech (TTS) that eliminates the need for either an auxiliary duration predictor or autoregressive (AR) sequence modeling for alignment. SegINR simplifies the TTS process by directly converting text sequences into frame-level features. Encoded text embeddings are transformed into segments of frame-level features with length regulation using a conditional implicit neural representation (INR). This method, termed Segment-wise INR (SegINR), captures temporal dynamics within each segment while autonomously defining segment boundaries, resulting in lower computational costs. Integrated into a two-stage TTS framework, SegINR is employed for semantic token prediction. Experiments in zero-shot adaptive TTS scenarios show that SegINR outperforms conventional methods in speech quality while remaining computationally efficient.
Pages: 646-650
Number of pages: 5
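
The abstract's core mechanism, a conditional INR that maps each encoded text embedding to a variable-length segment of frame-level features while deciding the segment boundary on its own, can be illustrated with a minimal sketch. The record gives no implementation details, so everything below is an assumption made for illustration: the module name ConditionalSegmentINR, the sinusoidal frame-index encoding, the layer sizes, and the sigmoid boundary head are hypothetical stand-ins rather than the authors' design.

```python
# Minimal illustration only: names, dimensions, and the boundary head are
# assumptions; the actual SegINR architecture is not described in this record.
import math

import torch
import torch.nn as nn


class ConditionalSegmentINR(nn.Module):
    """Maps (segment-local frame index, text-token embedding) -> frame feature.

    Each encoded text token conditions its own implicit function over frame
    indices; an extra head scores whether a given index lies past the segment
    boundary, standing in for an explicit duration predictor.
    """

    def __init__(self, text_dim=256, feat_dim=80, hidden=256, max_frames=64, n_freqs=8):
        super().__init__()
        self.max_frames = max_frames
        # Fixed sinusoidal frequencies for encoding the frame index t.
        self.register_buffer("freqs", (2.0 ** torch.arange(n_freqs).float()) * math.pi)
        in_dim = text_dim + 2 * n_freqs  # token embedding + sin/cos index code
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.to_feat = nn.Linear(hidden, feat_dim)  # frame-level feature
        self.to_stop = nn.Linear(hidden, 1)         # boundary (stop) logit

    def forward(self, token_emb):
        """token_emb: (B, text_dim) -> features (B, T, feat_dim), stop logits (B, T)."""
        b = token_emb.size(0)
        t = torch.arange(self.max_frames, device=token_emb.device).float()
        pos = torch.cat([torch.sin(t[:, None] * self.freqs),
                         torch.cos(t[:, None] * self.freqs)], dim=-1)   # (T, 2*n_freqs)
        pos = pos.unsqueeze(0).expand(b, -1, -1)                        # (B, T, 2*n_freqs)
        cond = token_emb.unsqueeze(1).expand(-1, self.max_frames, -1)   # (B, T, text_dim)
        h = self.mlp(torch.cat([cond, pos], dim=-1))
        return self.to_feat(h), self.to_stop(h).squeeze(-1)


if __name__ == "__main__":
    inr = ConditionalSegmentINR()
    token_emb = torch.randn(4, 256)          # 4 encoded text tokens
    feats, stop_logits = inr(token_emb)
    # Crude length decoding for illustration: count frames whose stop
    # probability stays below 0.5 (not necessarily the paper's rule).
    lengths = (torch.sigmoid(stop_logits) < 0.5).sum(dim=-1)
    print(feats.shape, stop_logits.shape, lengths)
```

Folding the boundary decision into the same conditional network is what would let such a decoder dispense with a separate duration predictor, which is the property the abstract highlights.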