A Spell Checker for a Low-resourced and Morphologically Rich Language

Citations: 0
Authors
Octaviano, Manolito, Jr. [1 ]
Borra, Allan [1 ]
Affiliations
[1] De La Salle Univ, Coll Comp Studies, Manila, Philippines
Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808; 0809;
Abstract
Spell checking plays an important role in improving the quality of documents by identifying the misspelled words in them. Considerable effort has gone into advancing spell checkers for other languages such as English, which has an almost perfected spell checking system (e.g. Microsoft Word). However, few efforts have been made to develop an efficient Filipino spell checker. One major challenge of existing Filipino spell checkers, which are dictionary-based, is the lack of a complete dictionary that captures all inflected forms of a word (e.g. isinasama 'including', isasama 'will be included', and isinama 'included' with the base form sama 'include'), borrowing (e.g. magtex 'to text' and nagtex 'texted'), and code-switching (e.g. magtext 'to text' and nag-text 'texted' with the base form 'text'). In addition, existing systems cannot handle code-switching, so valid words are marked as erroneous. In this research, a spell checker is designed for Filipino, a low-resourced and morphologically rich language. It detects and corrects typographical errors in the language and introduces a modified version of the Metaphone algorithm for ranking the candidate suggestions. The system achieves 81% recall, 53.64% precision, 64.53% F-measure, and 87.78% suggestion adequacy on 100 sentences taken from exercise documents of Filipino students.
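The idea of ranking candidate suggestions by a phonetic key can be sketched as follows. This is a minimal illustration only: the abstract does not describe the modified Metaphone algorithm, so `phonetic_key` is a hypothetical toy encoding (not Metaphone and not the authors' variant), and `edit_distance`, `rank_suggestions`, and the small lexicon built from the abstract's examples are likewise assumptions made for this sketch.

```python
from typing import List


def phonetic_key(word: str) -> str:
    """Toy phonetic key (an assumption, NOT the paper's modified Metaphone).

    Lowercases the word, keeps letters only, merges a few similar-sounding
    letters, collapses doubled letters, and drops vowels after position 0.
    """
    merged = {"c": "k", "q": "k", "x": "ks", "z": "s", "v": "b", "f": "p"}
    text = "".join(merged.get(ch, ch) for ch in word.lower() if ch.isalpha())
    key = []
    for i, ch in enumerate(text):
        if i > 0 and ch == text[i - 1]:
            continue  # collapse doubled letters
        if i > 0 and ch in "aeiou":
            continue  # drop non-initial vowels
        key.append(ch)
    return "".join(key)


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def rank_suggestions(word: str, lexicon: List[str], limit: int = 3) -> List[str]:
    """Rank lexicon entries by phonetic-key distance, then by spelling distance."""
    key = phonetic_key(word)
    ranked = sorted(
        lexicon,
        key=lambda cand: (
            edit_distance(key, phonetic_key(cand)),
            edit_distance(word.lower(), cand.lower()),
            cand,
        ),
    )
    return ranked[:limit]


# Hypothetical lexicon taken from the inflected forms quoted in the abstract.
LEXICON = ["isinasama", "isasama", "isinama", "sama"]

if __name__ == "__main__":
    typo = "isinasma"  # invented typo of 'isinasama'
    if typo not in LEXICON:  # dictionary-based detection step
        print(rank_suggestions(typo, LEXICON))
```

Sorting on the phonetic distance first means that a typo which preserves the consonant skeleton of the intended word still ranks that word at the top, while the plain spelling distance only breaks ties.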
Pages: 1853-1856
Page count: 4