A Possibilistic Approach for Building Statistical Language Models

Cited by: 1
Authors
Momtazi, Saeedeh [1]
Sameti, Hossein [2]
Affiliations
[1] Univ Saarland, Saarbrucken, Germany
[2] Sharif Univ Technol, Tehran, Iran
DOI
10.1109/ISDA.2009.197
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class-based n-gram language models are the models most frequently used in continuous speech recognition systems, especially for languages for which no richly annotated corpora are available. Various word clustering algorithms have been proposed to build such class-based models. In this work, we discuss the superiority of soft approaches to class construction, whereby each word can be assigned to more than one class. We also propose a new method for possibilistic word clustering, using the possibilistic C-means algorithm as the clustering method. Various parameters of this algorithm are investigated, e.g., centroid initialization, distance measure, and the words' feature vectors. In the experiments reported here, this algorithm is applied to the 20,000 most frequent Persian words, and the language model built with the resulting clusters is evaluated on its perplexity and on the accuracy of a continuous speech recognition system. Our results indicate a 10% reduction in perplexity and a 4% reduction in word error rate.
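As an illustration of the clustering step named in the abstract, the following is a minimal sketch of the textbook possibilistic C-means updates (typicality u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))), followed by a typicality-weighted centroid update). It is not the authors' implementation: the function names, the squared Euclidean distance, the fixed per-cluster scale `eta`, and the toy two-dimensional "feature vectors" are all assumptions made for this example. The abstract states only that initialization, distance measure, and feature vectors were varied.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pcm_typicality(points, centroids, eta, m=2.0):
    """Typicality of every point in every cluster.

    Unlike fuzzy C-means memberships, a point's typicalities are not forced
    to sum to 1 over clusters, so a word may be typical of several classes
    or of none.
    """
    exponent = 1.0 / (m - 1.0)
    return [
        [1.0 / (1.0 + (sq_dist(x, c) / eta_i) ** exponent) for x in points]
        for c, eta_i in zip(centroids, eta)
    ]


def pcm_update_centroids(points, u, m=2.0):
    """Move each centroid to the mean of the points, weighted by u ** m."""
    dim = len(points[0])
    centroids = []
    for row in u:
        weights = [t ** m for t in row]
        total = sum(weights)  # strictly positive: typicalities are > 0
        centroids.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centroids


def pcm(points, centroids, eta, m=2.0, iters=50, tol=1e-6):
    """Alternate the two updates until the centroids stop moving."""
    for _ in range(iters):
        u = pcm_typicality(points, centroids, eta, m)
        new_centroids = pcm_update_centroids(points, u, m)
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, pcm_typicality(points, centroids, eta, m)


# Toy data standing in for word feature vectors (hypothetical values).
toy_points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
toy_centroids, toy_u = pcm(toy_points, [[0.0, 0.2], [5.0, 5.2]], eta=[1.0, 1.0])
```

In a soft class-based model, a word could then be assigned to every class in which its typicality exceeds some threshold; how the paper turns typicalities into class assignments is not stated in the abstract.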
Pages: 1014 / +
Page count: 2