Extended N-gram Model for Analysis of Polish Texts

Cited by: 2
Authors
Banasiak, Dariusz [1]
Mierzwa, Jaroslaw [1]
Sterna, Antoni [1]
Affiliations
[1] Wroclaw Univ Sci & Technol, Fac Elect, Dept Comp Engn, Wroclaw, Poland
Keywords
Natural language processing; N-gram models; Morphological analysis
DOI
10.1007/978-3-319-67792-7_35
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The paper presents an extended N-gram model designed for the analysis of texts in Polish. One possible application of the model is the automatic detection and correction of errors that occur during computer-based text editing. N-grams belong to the group of statistical methods in Natural Language Processing (NLP). They are created by analyzing sufficiently large language data resources called corpora. In the classic version, N-grams represent sequences of words of a given length that appear in the analyzed language resources. The presented approach introduces N-grams that also include the results of morphological analysis of the text. As a result, three types of N-grams can be distinguished: lexical (containing the original words from the text or their basic forms), morphosyntactic (sequences of morphosyntactic tags assigned to words), and mixed (a combination of lexical and morphological descriptions). The extended model with the new types of N-grams accounts for language properties specific to Polish, such as free word order and complex inflection.
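The three N-gram types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` structure, the helper names, the hand-written tags (loosely following a colon-separated Polish tagset style), and in particular the layout chosen for mixed N-grams (lemma of the first word followed by tags of the remaining words) are all assumptions made for illustration only.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    form: str   # word as it appears in the text
    lemma: str  # basic (dictionary) form
    tag: str    # morphosyntactic tag, hand-written here for illustration


def windows(items, n):
    """Return all contiguous length-n windows of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def extended_ngrams(tokens, n):
    """Count lexical, morphosyntactic and mixed N-grams for one sentence."""
    # Lexical N-grams built from basic forms (original words would also work).
    lexical = Counter(windows([t.lemma for t in tokens], n))
    # Morphosyntactic N-grams: sequences of tags only.
    morphosyntactic = Counter(windows([t.tag for t in tokens], n))
    # Assumed mixed layout: lemma of the first word, tags of the remaining ones.
    mixed = Counter(
        (w[0].lemma,) + tuple(t.tag for t in w[1:]) for w in windows(tokens, n)
    )
    return lexical, morphosyntactic, mixed


# Hypothetical pre-analyzed sentence "Ala ma kota" ("Ala has a cat").
sentence = [
    Token("Ala", "Ala", "subst:sg:nom:f"),
    Token("ma", "mieć", "fin:sg:ter:imperf"),
    Token("kota", "kot", "subst:sg:acc:m2"),
]
lexical, morphosyntactic, mixed = extended_ngrams(sentence, 2)
```

In this sketch the morphological analysis is assumed to have been done beforehand; in practice the tokens would come from a morphological analyzer and tagger, and the counts would be accumulated over a whole corpus rather than a single sentence.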
Pages: 355-364
Page count: 10