Text normalization in Mandarin Text-to-Speech system

Cited by: 0
Authors
Jia, Yuxiang [1, 2]
Huang, Dezhi [2]
Liu, Wu [2]
Dong, Yuan [2, 3]
Yu, Shiwen [1]
Wang, Haila [2]
Affiliations
[1] Peking Univ, Inst Computat Linguist, Beijing 100871, Peoples R China
[2] France Telecom R&D Beijing, Speech & Nat Language Proc Unit, Beijing, Peoples R China
[3] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Text-to-Speech (TTS); text normalization; finite state automata; maximum entropy classifier
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Text normalization is an important component of a Text-to-Speech (TTS) system, and its main difficulty is disambiguating Non-Standard Words (NSWs). This paper develops a taxonomy of NSWs based on a large-scale Chinese corpus and proposes a two-stage NSW disambiguation strategy: Finite State Automata (FSA) for initial classification, followed by Maximum Entropy (ME) classifiers for subclass disambiguation. Using this taxonomy, the two-stage approach achieves an F-score of 98.53% in an open test, 5.23% higher than that of the FSA-based approach. Experiments show that the NSW taxonomy gives the FSA a high baseline performance, that the ME classifiers yield a considerable improvement, and that the two-stage approach adapts well to new domains.
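The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: regular expressions stand in for the finite state automata, a small softmax (maximum entropy) model trained by gradient ascent stands in for the ME classifiers, and the class names (DATE, PERCENT, NUMBER, CARDINAL, DIGITS), the context features, and the four training examples are all hypothetical choices made for illustration.

```python
import math
import re
from collections import defaultdict

# Stage 1: regular expressions stand in for the finite state automata.
# These coarse class names are hypothetical, not the paper's taxonomy.
COARSE_PATTERNS = [
    ("DATE", re.compile(r"\d{4}-\d{1,2}-\d{1,2}")),
    ("PERCENT", re.compile(r"\d+(\.\d+)?%")),
    ("NUMBER", re.compile(r"\d+")),
]

def coarse_class(token):
    """Return the first coarse class whose pattern matches the whole token."""
    for name, pattern in COARSE_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "OTHER"

def features(prev_word, token, next_word):
    """Hypothetical context features for stage 2."""
    return [f"prev={prev_word}", f"next={next_word}", f"len={len(token)}"]

class MaxEnt:
    """Tiny maximum entropy (softmax) classifier over binary features."""

    def __init__(self, labels):
        self.labels = labels
        self.w = defaultdict(float)  # (feature, label) -> weight

    def probs(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def train(self, data, epochs=200, lr=0.5):
        # Stochastic gradient ascent on the conditional log-likelihood.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y, py in zip(self.labels, p):
                    grad = (1.0 if y == gold else 0.0) - py
                    for f in feats:
                        self.w[(f, y)] += lr * grad

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[p.index(max(p))]

# Invented toy data: is a digit string read as a cardinal or digit by digit?
TRAIN = [
    (("拨打", "110", "报警"), "DIGITS"),
    (("电话", "62751234", "。"), "DIGITS"),
    (("共", "110", "人"), "CARDINAL"),
    (("有", "35", "个"), "CARDINAL"),
]

model = MaxEnt(["CARDINAL", "DIGITS"])
model.train([(features(*ctx), label) for ctx, label in TRAIN])

def classify(prev_word, token, next_word):
    """Stage 1 assigns a coarse class; stage 2 refines ambiguous NUMBER tokens."""
    coarse = coarse_class(token)
    if coarse != "NUMBER":
        return coarse
    return model.predict(features(prev_word, token, next_word))
```

In this sketch only the NUMBER class is passed to the second stage, which mirrors the division of labour in the abstract: the pattern stage settles unambiguous tokens cheaply, and the statistical stage uses context to decide among subclasses.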
Pages: 4693 / +
Number of pages: 2