Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Cited by: 8
Authors
Hamarashid, Hozan K. [1 ]
Saeed, Soran A. [2 ]
Rashid, Tarik A. [3 ]
Affiliations
[1] Sulaimani Polytech Univ, Comp Sci Inst, Dept Informat Technol, Sulaimani, Krg, Iraq
[2] Sulaimani Polytech Univ, Sulaimani, Krg, Iraq
[3] Univ Kurdistan Hewler, Sch Sci & Engn, Comp Sci & Engn Dept, Erbil, Krg, Iraq
Source
NEURAL COMPUTING & APPLICATIONS | 2021, Vol. 33, No. 9
Keywords
Next word prediction; Kurdish language; N-gram; Corpus
DOI
10.1007/s00521-020-05245-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Next word prediction is an input technology that simplifies typing by suggesting the next word for the user to select, since typing out a conversation is time-consuming. Only a few previous studies have focused on the Kurdish language, including next word prediction. Three problems stand in the way: the lack of a Kurdish text corpus; an insufficient number of N-grams for Kurdish (for instance, five-grams), which explains why next word prediction is rarely used for Kurdish; and the improper display of several Kurdish letters in the RStudio software. This paper provides a Kurdish corpus, creates five-grams, and presents a unique study of next word prediction for Kurdish Sorani and Kurmanji. Because little work has been conducted on Kurdish next word prediction, the N-gram model is used to suggest text accurately and to reduce typing time in Kurdish. The application is built with R and RStudio. The reported accuracy of the model is 96.3%.
Pages: 4547-4566
Page count: 20
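
The N-gram approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper reports using R and RStudio, while the sketch below is in Python, and the function names (`build_ngram_counts`, `predict_next`), the backoff-to-shorter-context rule, and the toy English corpus are all assumptions made for illustration. Only the maximum order of five reflects the five-grams mentioned in the abstract.

```python
from collections import Counter, defaultdict


def build_ngram_counts(sentences, max_order=5):
    """Count which word follows each context of 0..max_order-1 preceding words.

    Hypothetical helper: whitespace tokenization is an assumption, not the
    paper's preprocessing.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for k in range(max_order):
                if i - k < 0:
                    break
                # The k words before position i form the context (empty for k = 0).
                context = tuple(tokens[i - k:i])
                counts[context][word] += 1
    return counts


def predict_next(counts, text, max_order=5, top_k=3):
    """Suggest up to top_k next words, backing off to shorter contexts."""
    tokens = text.split()
    # Try the longest available context first, then progressively shorter ones.
    for k in range(min(max_order - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = counts.get(context)
        if followers:
            return [word for word, _ in followers.most_common(top_k)]
    return []


# Toy corpus (hypothetical; a real system would use a Kurdish corpus).
corpus = [
    "i am going to the market",
    "i am going to the school",
    "i am going to the market today",
    "she is going to the market",
]
counts = build_ngram_counts(corpus)
print(predict_next(counts, "i am going to the"))
print(predict_next(counts, "we are going to"))
```

In this sketch the first query matches a full four-word context, so its suggestions come from five-gram counts; the second query contains unseen words, so the predictor falls back to the shorter context "going to". Ranking by raw counts is the simplest choice; a production system would more likely use smoothed probabilities.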