A Spell Checker for a Low-resourced and Morphologically Rich Language

Times cited: 0
Authors
Octaviano, Manolito, Jr. [1]
Borra, Allan [1]
Affiliation
[1] De La Salle University, College of Computer Studies, Manila, Philippines
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Spell checking plays an important role in improving the quality of documents by identifying misspelled words. Considerable effort has gone into advancing spell checkers for other languages such as English, whose spell checking systems (e.g., Microsoft Word) are nearly perfected. However, few efforts have been made to develop an efficient Filipino spell checker. A major challenge for existing Filipino spell checkers, which are dictionary-based, is the lack of a complete dictionary that captures all inflected forms of a word (e.g., isinasama 'including', isasama 'will be included', and isinama 'included', with the base form sama 'include'), its borrowed forms (e.g., magtex 'to text' and nagtex 'texted'), and its code-switched forms (e.g., magtext 'to text' and nag-text 'texted', with the base form 'text'). In addition, existing systems cannot handle code-switching, so valid code-switched words are marked as erroneous. In this research, a spell checker is designed for Filipino, a low-resourced and morphologically rich language. It detects and corrects typographical errors in the language and introduces a modified version of the Metaphone algorithm for ranking candidate suggestions. Evaluated on 100 sentences taken from exercise documents of Filipino students, the system achieves 81% recall, 53.64% precision, 64.53% F-measure, and 87.78% suggestion adequacy.
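The abstract does not say how inflected, borrowed, and code-switched forms are recognized, nor how the modified Metaphone ranks candidates. The Python sketch below is an illustrative assumption only, not the authors' method. It undoes a few affixation patterns taken from the abstract's own examples (the prefixes mag-, nag-, and i-, the infix -in-, CV reduplication, and hyphenated code-switched forms) against a tiny hypothetical base-form lexicon. It then ranks suggestions with a toy phonetic key that is loosely inspired by Metaphone and is not the paper's modified Metaphone. All names (`BASE_FORMS`, `PREFIXES`, `reductions`, `is_valid`, `phonetic_key`, `edit_distance`, `suggest`) are hypothetical.

```python
from collections import deque

# Hypothetical toy lexicon of base forms; a real checker would need a large Filipino word list.
BASE_FORMS = {"sama", "text", "tex", "bahay", "kain"}
# Illustrative subset of Filipino prefixes (assumption, not the paper's affix inventory).
PREFIXES = ("mag", "nag", "i")
VOWELS = set("aeiou")


def reductions(word):
    """Yield forms obtained by undoing one affixation or reduplication step."""
    for p in PREFIXES:
        # Strip a prefix, plus the hyphen used in code-switched forms such as "nag-text".
        if word.startswith(p) and len(word) > len(p) + 1:
            yield word[len(p):].lstrip("-")
    # Undo the infix -in- after an initial consonant: "sinama" -> "sama".
    if len(word) > 3 and word[0] not in VOWELS and word[1:3] == "in":
        yield word[0] + word[3:]
    # Undo CV reduplication of the first syllable: "sasama" -> "sama".
    if len(word) > 4 and word[:2] == word[2:4]:
        yield word[2:]


def is_valid(word, lexicon=BASE_FORMS):
    """Accept a word if some chain of reductions (breadth-first) reaches a known base form."""
    seen, queue = set(), deque([word.lower()])
    while queue:
        form = queue.popleft()
        if form in lexicon:
            return True
        for reduced in reductions(form):
            if reduced not in seen:
                seen.add(reduced)
                queue.append(reduced)
    return False


def phonetic_key(word):
    """Toy Metaphone-like key (hypothetical): normalize spellings, keep first letter, drop later vowels."""
    w = word.lower().replace("-", "")
    for src, dst in (("ph", "f"), ("ng", "N"), ("c", "k"), ("q", "k"),
                     ("x", "ks"), ("z", "s"), ("v", "b")):
        w = w.replace(src, dst)
    key = w[:1]
    for ch in w[1:]:
        # Skip vowels and collapse a consonant that repeats the previous key letter.
        if ch not in VOWELS and ch != key[-1]:
            key += ch
    return key


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(word, vocabulary, max_edits=2, k=3):
    """Keep words within max_edits, then rank by phonetic-key distance, edit distance, and spelling."""
    target = phonetic_key(word)
    candidates = [v for v in vocabulary if edit_distance(word, v) <= max_edits]
    candidates.sort(key=lambda v: (edit_distance(target, phonetic_key(v)), edit_distance(word, v), v))
    return candidates[:k]
```

With this toy lexicon, every example form in the abstract (isinasama, isasama, isinama, magtex, nagtex, magtext, nag-text) is accepted, while the typo "isinasma" is rejected. `suggest("tekst", ["test", "text"])` returns `["text", "test"]`: "test" is fewer raw edits away, but "text" shares the phonetic key "tkst", which illustrates why a phonetic criterion can help rank suggestions for code-switched English words.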
Pages: 1853-1856
Number of pages: 4