A WFST-based Log-linear Framework for Speaking-style Transformation

Authors
Neubig, Graham [1]
Mori, Shinsuke [1]
Kawahara, Tatsuya [1]
Affiliation
[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
speaking style transformation; disfluency detection; weighted finite state transducers; log-linear model
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When creating transcripts from automatic speech recognition (ASR) results, disfluencies must be deleted, colloquial expressions transformed, and dropped words inserted so that the final product is clean, transcript-style text. This paper introduces a system for automatically transforming spoken language into transcript-style language that handles not only disfluency deletion but also substitution of colloquial expressions and insertion of dropped words. A number of potentially useful features are combined in a log-linear probabilistic framework, and the utility of each is examined. The system is implemented using weighted finite-state transducers (WFSTs), which makes it easy to combine features and to integrate the system with other WFST-based systems. In evaluation, the best system achieved a word error rate (WER) of 5.37%. This is an absolute improvement of 5.49% over a rule-based baseline and of 1.54% over a simple noisy-channel model.
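The abstract does not give the system's actual feature set. As a rough illustration of what "combining features in a log-linear framework" means, the sketch below scores candidate transcript-style outputs y for a spoken input x as P(y | x) = exp(Σ_k w_k f_k(x, y)) / Z. The filler list, the two feature functions (`fillers_kept`, `content_deleted`), and the weights are hypothetical and chosen only to make the example concrete. The sketch also enumerates a small candidate set explicitly, whereas the paper searches the space of transformations with WFSTs.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A feature maps (spoken input, candidate output) to a real value.
Feature = Callable[[Sequence[str], Sequence[str]], float]

# Hypothetical filler list, used only for illustration.
FILLERS = {"uh", "um", "er"}


def fillers_kept(spoken: Sequence[str], cand: Sequence[str]) -> float:
    """Count filler words that survive in the candidate (a disfluency feature)."""
    return float(sum(w in FILLERS for w in cand))


def content_deleted(spoken: Sequence[str], cand: Sequence[str]) -> float:
    """Count non-filler input words missing from the candidate (multiset difference)."""
    missing = Counter(spoken) - Counter(cand)
    return float(sum(n for w, n in missing.items() if w not in FILLERS))


def log_linear_posteriors(
    spoken: Sequence[str],
    candidates: List[Sequence[str]],
    features: Dict[str, Feature],
    weights: Dict[str, float],
) -> List[float]:
    """P(y | x) = exp(sum_k w_k * f_k(x, y)) / Z, normalized over the candidate set."""
    scores = [
        sum(weights[name] * f(spoken, cand) for name, f in features.items())
        for cand in candidates
    ]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


spoken = ["uh", "i", "think", "so"]
candidates = [spoken, ["i", "think", "so"], ["think", "so"]]
features = {"fillers_kept": fillers_kept, "content_deleted": content_deleted}
weights = {"fillers_kept": -2.0, "content_deleted": -3.0}  # hypothetical weights

probs = log_linear_posteriors(spoken, candidates, features, weights)
best = candidates[max(range(len(candidates)), key=lambda i: probs[i])]
print(best)  # ['i', 'think', 'so']
```

With these weights, keeping the filler costs 2.0 and deleting the content word "i" costs 3.0. The filler-free candidate that keeps every content word therefore wins. In a WFST implementation, one common approach is to store the negated weighted feature sum as arc costs in the tropical semiring. Finding the best output then becomes a shortest-path search rather than explicit enumeration.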
Pages: 1503-1506
Number of pages: 4