A WFST-based Log-linear Framework for Speaking-style Transformation

Cited by: 0

Authors:

Neubig, Graham [1]

Mori, Shinsuke [1]

Kawahara, Tatsuya [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Keywords:

speaking style transformation; disfluency detection; weighted finite state transducers; log-linear model

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

When attempting to make transcripts from automatic speech recognition results, disfluency deletion, transformation of colloquial expressions, and insertion of dropped words must be performed to ensure that the final product is clean transcript-style text. This paper introduces a system for the automatic transformation of the spoken word to transcript-style language that enables not only deletion of disfluencies, but also substitutions of colloquial expressions and insertion of dropped words. A number of potentially useful features are combined in a log-linear probabilistic framework, and the utility of each is examined. The system is implemented using weighted finite state transducers (WFSTs) to allow for easy combination of features and integration with other WFST-based systems. On evaluation, the best system achieved a 5.37% word error rate, a 5.49% absolute gain over a rule-based baseline and a 1.54% absolute gain over a simple noisy-channel model.
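
The log-linear combination of features mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's system is built on WFSTs, whereas this sketch scores a small, explicitly listed candidate set. The feature names (`lm`, `tm`, `filler_deleted`), the weights, the probabilities, and the English toy sentences are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical feature weights (illustrative values, not taken from the paper).
WEIGHTS = {"lm": 1.0, "tm": 0.8, "filler_deleted": 0.5}

# Toy candidate outputs for one spoken input, each with hypothetical features:
# "lm" is a language-model log probability, "tm" a transformation-model log
# probability, and "filler_deleted" a binary indicator feature.
CANDIDATES = {
    "uh I wanna go": {"lm": math.log(0.001), "tm": math.log(0.9), "filler_deleted": 0.0},
    "I wanna go": {"lm": math.log(0.01), "tm": math.log(0.5), "filler_deleted": 1.0},
    "I want to go": {"lm": math.log(0.05), "tm": math.log(0.3), "filler_deleted": 1.0},
}


def loglinear_score(features, weights):
    """Unnormalized log-linear score: the weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


def posterior(candidates, weights):
    """Normalize the scores over the candidate set with a softmax."""
    scores = {text: loglinear_score(f, weights) for text, f in candidates.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {text: math.exp(s - top) / z for text, s in scores.items()}


def best_candidate(candidates, weights):
    """Return the candidate with the highest posterior probability."""
    probs = posterior(candidates, weights)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    for text, p in posterior(CANDIDATES, WEIGHTS).items():
        print(f"{p:.3f}  {text}")
    print("best:", best_candidate(CANDIDATES, WEIGHTS))
```

With all weights set to 1 and only the two log-probability features kept, the score reduces to a noisy-channel-style product of a language model and a transformation model. Adding further weighted features is what the log-linear form makes straightforward.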


Pages: 1503-1506

Page count: 4
