Automatic restoration of diacritics based on word n-grams for Slovak texts

Cited by: 0
Authors
Toth, Stefan [1 ]
Zaymus, Emanuel [1 ]
Duracik, Michal [1 ]
Mesko, Matej [1 ]
Hrkut, Patrik [1 ]
Affiliations
[1] Univ Zilina, Dept Software Technol, Fac Management Sci & Informat, Zilina, Slovakia
Keywords
diacritic; diacritics restoration; n-gram; Slovak language
DOI
10.1109/informatics47936.2019.9119328
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Many people still write texts without diacritics, especially in chat messages, e-mails, and discussion posts. The habit has historical roots: people either had problems with text encoding in messages or wanted to type faster. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that restores diacritics in text written in the Slovak language. We also evaluate our results against existing algorithms developed for Slovak texts.
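The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea of n-gram-based diacritics restoration, not the authors' method. It assumes a bigram model (n = 2), a toy two-sentence corpus, and hypothetical helper names (`strip_diacritics`, `train`, `restore`). Each undiacritized word is mapped to the diacritized forms seen in training, and the form that most often follows the previously restored word is chosen.

```python
import unicodedata
from collections import Counter


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. 'ťažký' -> 'tazky'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Count unigrams and bigrams and index diacritized forms by stripped form."""
    unigrams, bigrams, candidates = Counter(), Counter(), {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            unigrams[word] += 1
            candidates.setdefault(strip_diacritics(word), set()).add(word)
            if i > 0:
                bigrams[(words[i - 1], word)] += 1
    return unigrams, bigrams, candidates


def restore(text, model):
    """Greedy left-to-right restoration; input is lowercased (an assumption)."""
    unigrams, bigrams, candidates = model
    restored = []
    for word in text.lower().split():
        # Words never seen in training are kept unchanged.
        options = candidates.get(word, {word})
        prev = restored[-1] if restored else None
        # Prefer the highest bigram count with the previous word, then the
        # highest unigram count; sorting makes remaining ties deterministic.
        best = max(sorted(options), key=lambda c: (bigrams[(prev, c)], unigrams[c]))
        restored.append(best)
    return " ".join(restored)


# Toy corpus (hypothetical): 'súd' = court, 'sud' = barrel.
model = train(["Krajský súd rozhodol", "Plný sud vína"])
print(restore("krajsky sud rozhodol", model))  # krajský súd rozhodol
print(restore("plny sud vina", model))         # plný sud vína
```

The two calls show why context matters: the stripped form "sud" is ambiguous, and only the preceding word decides between "súd" and "sud". A realistic system would need a large corpus, longer n-grams with backoff, and case handling, none of which this sketch attempts.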
Pages: 243-248
Page count: 6