Automatic restoration of diacritics based on word n-grams for Slovak texts

Authors
Toth, Stefan [1]
Zaymus, Emanuel [1]
Duracik, Michal [1]
Mesko, Matej [1]
Hrkut, Patrik [1]
Affiliation
[1] Univ Zilina, Dept Software Technol, Fac Management Sci & Informat, Zilina, Slovakia
Keywords
diacritic; diacritics restoration; n-gram; Slovak language
DOI
10.1109/informatics47936.2019.9119328
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many people still write texts without diacritics, especially in chat messages, e-mails and discussion posts. The habit has historical roots: writers either ran into text-encoding problems in their messages or wanted to type faster. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that restores diacritics in text written in the Slovak language. We also evaluate our results and compare them with existing algorithms developed for Slovak texts.
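The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of word n-gram diacritics restoration, here limited to bigrams (n = 2). The names `strip_diacritics` and `BigramRestorer`, the greedy left-to-right choice, and the unigram fallback are assumptions made for this illustration and are not taken from the paper.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Remove combining marks, e.g. 'ťažký' -> 'tazky'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class BigramRestorer:
    """Hypothetical greedy restorer built on word unigram and bigram counts."""

    def __init__(self):
        self.unigrams = Counter()            # word -> count
        self.bigrams = Counter()             # (previous word, word) -> count
        self.candidates = defaultdict(set)   # stripped form -> accented forms

    def train(self, sentences):
        """Collect counts from sentences that carry correct diacritics."""
        for sentence in sentences:
            words = unicodedata.normalize("NFC", sentence).lower().split()
            for i, word in enumerate(words):
                self.unigrams[word] += 1
                self.candidates[strip_diacritics(word)].add(word)
                if i > 0:
                    self.bigrams[(words[i - 1], word)] += 1

    def restore(self, text):
        """Pick, word by word, the candidate that best follows the previous
        restored word; fall back to unigram frequency when bigrams tie."""
        restored = []
        for word in text.lower().split():
            options = self.candidates.get(strip_diacritics(word))
            if not options:
                restored.append(word)  # unseen word: leave it unchanged
                continue
            prev = restored[-1] if restored else None
            # sorted() makes the choice deterministic when scores are equal
            best = max(
                sorted(options),
                key=lambda cand: (self.bigrams[(prev, cand)], self.unigrams[cand]),
            )
            restored.append(best)
        return " ".join(restored)
```

Trained on the three toy sentences "mám rád čaj", "dlhý rad ľudí" and "stojí tam dlhý rad", `restore("mam rad caj")` returns "mám rád čaj": the bigram (mám, rád) outweighs the more frequent unigram "rad". A practical system would need a large corpus, longer n-grams and a search over whole sentences rather than this greedy pass.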
Pages: 243-248
Number of pages: 6