Arabic spelling error detection and correction

Cited by: 18
Authors
Attia, Mohammed [1, 2]
Pecina, Pavel [3]
Samih, Younes [4]
Shaalan, Khaled [2]
Van Genabith, Josef [1]
Affiliations
[1] Dublin City Univ, Sch Comp, Dublin, Ireland
[2] British Univ Dubai, Fac Engn & IT, Dubai, U Arab Emirates
[3] Charles Univ Prague, Fac Math & Phys, Prague, Czech Republic
[4] Univ Dusseldorf, Dept Linguist & Informat Sci, Dusseldorf, Germany
Funding
Science Foundation Ireland; National Research Foundation, Singapore
Keywords
WORDS
DOI
10.1017/S1351324915000030
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
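The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the toy English word list, the unigram counts, and the function names (`detect_errors`, `suggest`) are all assumptions made for illustration. The dictionary is a plain set, the error model is unweighted Levenshtein distance, and the language model is reduced to unigram frequency used as a tie-breaker.

```python
from collections import Counter

# Hypothetical toy resources; the paper's dictionary has 9.2 million Arabic word types.
DICTIONARY = {"spelling", "spilling", "selling", "correction", "detection"}
UNIGRAM_COUNTS = Counter(
    {"spelling": 50, "spilling": 5, "selling": 20, "correction": 30, "detection": 25}
)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def detect_errors(tokens, dictionary):
    """Detection step: flag every token that is missing from the dictionary."""
    return [t for t in tokens if t not in dictionary]


def suggest(word, dictionary, unigram_counts, max_distance=2, top_n=3):
    """Correction step: rank dictionary entries by edit distance (error model),
    breaking ties by unigram frequency (a stand-in for the language model)."""
    candidates = []
    for entry in dictionary:
        d = edit_distance(word, entry)
        if d <= max_distance:
            candidates.append((d, -unigram_counts.get(entry, 0), entry))
    candidates.sort()
    return [entry for _, _, entry in candidates[:top_n]]


if __name__ == "__main__":
    print(detect_errors(["speling", "correction"], DICTIONARY))
    print(suggest("speling", DICTIONARY, UNIGRAM_COUNTS))
```

Scanning the whole dictionary for each misspelling is only workable at toy scale; a real system with millions of entries would need an indexed candidate generator, and the abstract indicates the authors additionally re-rank candidates based on an analysis of error types rather than using unweighted distance.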
Pages: 751-773
Number of pages: 23