A Hybrid Ranking Approach to Chinese Spelling Check

Cited by: 7
Authors
Liu, Xiaodong [1]
Cheng, Fei [1]
Duh, Kevin [1]
Matsumoto, Yuji [1]
Affiliation
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
Algorithms; Performance; Languages; Chinese spelling check; candidate generation; candidate ranking; FEATURE-SELECTION; CLASSIFICATION; TUTORIAL;
DOI
10.1145/2822264
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We propose a novel framework for Chinese Spelling Check (CSC), the task of automatically detecting and correcting Chinese spelling errors. The framework has two key components: candidate generation and candidate ranking. Previous research relied on either a Statistical Machine Translation (SMT) based model or a Language Model (LM) based model; our framework instead uses both SMT and LM models to generate correction candidates, in order to maximize recall. To improve precision, we further employ a Support Vector Machine (SVM) classifier to rank the candidates generated by the SMT and LM models. Experiments show that our framework outperforms the other systems in the SIGHAN 7 shared task that used the same or similar resources. Even compared with state-of-the-art systems that used more resources, such as a considerably large dictionary, an idiom dictionary, and other semantic information, our framework still obtains competitive results. Furthermore, to address the scarcity of resources for training the SMT model, we generate around 2 million artificial training sentences using the Chinese character confusion sets provided by the SIGHAN 7 shared task, which group Chinese characters with similar shapes and similar pronunciations.
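The abstract mentions generating artificial training sentences from character confusion sets but does not spell out the procedure. The following is only a minimal sketch of one plausible reading, assuming a single confusable character is substituted per sentence; the tiny `CONFUSION_SETS` table and the function name `make_artificial_pairs` are hypothetical and are not taken from the paper or from the SIGHAN 7 data.

```python
# Hypothetical toy confusion sets: each character maps to characters that
# look or sound similar. The real SIGHAN 7 sets are much larger.
CONFUSION_SETS = {
    "在": ["再"],
    "的": ["得", "地"],
}


def make_artificial_pairs(sentence, confusion_sets):
    """Build (erroneous, correct) sentence pairs.

    Assumption: each pair differs from the correct sentence in exactly
    one character, replaced by a member of its confusion set.
    """
    pairs = []
    for i, ch in enumerate(sentence):
        for wrong in confusion_sets.get(ch, []):
            erroneous = sentence[:i] + wrong + sentence[i + 1:]
            pairs.append((erroneous, sentence))
    return pairs


if __name__ == "__main__":
    for erroneous, correct in make_artificial_pairs("我在家", CONFUSION_SETS):
        print(erroneous, "->", correct)
```

Pairs of this form could serve as parallel data (erroneous side as source, correct side as target) for a translation-style corrector; how the authors actually sampled or filtered their roughly 2 million sentences is not stated in the abstract.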
Pages: 17