A Hybrid Ranking Approach to Chinese Spelling Check

Cited by: 7
Authors
Liu, Xiaodong [1]
Cheng, Fei [1]
Duh, Kevin [1]
Matsumoto, Yuji [1]
Affiliations
[1] Nara Institute of Science and Technology, Graduate School of Information Science, Nara, Japan
Keywords
Algorithms; Performance; Languages; Chinese spelling check; candidate generation; candidate ranking; FEATURE-SELECTION; CLASSIFICATION; TUTORIAL;
DOI
10.1145/2822264
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a novel framework for Chinese Spelling Check (CSC), the task of automatically detecting and correcting Chinese spelling errors. Our framework contains two key components: candidate generation and candidate ranking. It differs from previous research, such as Statistical Machine Translation (SMT) based models or Language Model (LM) based models, in that we use both SMT and LM models as components for generating the correction candidates, in order to maximize recall; to improve precision, we further employ a Support Vector Machine (SVM) classifier to rank the candidates generated by the SMT and LM models. Experiments show that our framework outperforms other systems in the SIGHAN 7 shared task that adopted the same or similar resources as ours; even compared with the state-of-the-art systems, which used more resources, such as a considerably large dictionary, an idiom dictionary, and other semantic information, our framework still obtains competitive results. Furthermore, to address the scarcity of resources for training the SMT model, we generate around 2 million artificial training sentences using the Chinese character confusion sets provided by the SIGHAN 7 shared task, which group Chinese characters with similar shapes and similar pronunciations.
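The abstract does not spell out how the artificial training sentences are built, so the following is only a minimal sketch of one plausible reading: take a correct sentence and substitute one character at a time with a member of its confusion set, yielding (erroneous, correct) pairs of the kind an SMT model could be trained on. The toy `CONFUSION_SETS` table and the function name `make_artificial_pairs` are assumptions for illustration, not the authors' resources or code.

```python
import random

# Hypothetical toy confusion sets: each character maps to characters with a
# similar shape or pronunciation. The real SIGHAN 7 sets are far larger.
CONFUSION_SETS = {
    "在": ["再"],
    "再": ["在"],
    "他": ["她", "它"],
}


def make_artificial_pairs(sentence, confusion_sets, rng, max_pairs=None):
    """Return (erroneous, correct) pairs, each with exactly one substituted character."""
    pairs = []
    for i, ch in enumerate(sentence):
        # Characters without a confusion set produce no artificial errors.
        for wrong in confusion_sets.get(ch, []):
            noisy = sentence[:i] + wrong + sentence[i + 1:]
            pairs.append((noisy, sentence))
    # Shuffle so that truncating with max_pairs samples error positions evenly.
    rng.shuffle(pairs)
    return pairs if max_pairs is None else pairs[:max_pairs]


pairs = make_artificial_pairs("他在家", CONFUSION_SETS, random.Random(0))
for noisy, correct in pairs:
    print(noisy, "->", correct)
```

Restricting each artificial sentence to a single substitution is a simplification made here; whether the original corpus allows several errors per sentence is not stated in the abstract.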
Pages: 17
Related papers
50 in total
  • [1] A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check
    Wang, Dingmin
    Song, Yan
    Li, Jing
    Han, Jialong
    Zhang, Haisong
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2517 - 2527
  • [2] A Hybrid Model for Chinese Spelling Check
    Zhao, Hai
    Cai, Deng
    Xin, Yang
    Wang, Yuzhu
    Jia, Zhongye
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2017, 16 (03)
  • [3] A Chinese OCR spelling check approach based on statistical language models
    Li, Z
    Bao, T
    Zhu, XY
    Wang, CH
    Naoi, SS
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 4727 - 4732
  • [4] Improve Chinese Spelling Check by Reevaluation
    Wang, Shuai
    Shang, Lin
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT III, 2022, 13282 : 237 - 248
  • [5] A Probabilistic Framework for Chinese Spelling Check
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Chen, Hsin-Hsi
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2015, 14 (04)
  • [6] Improving Chinese Spelling Correction by Ranking
    Feng, Junjia
    Wang, Shuai
    Yin, Wenbiao
    Shang, Lin
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] CCCSpell: A Consistent and Contrastive Learning Approach with Character Similarity for Chinese Spelling Check
    Su, Jindian
    Lin, Xiaobin
    Xie, Yunhao
    Cheng, Zehua
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [8] Chinese Spelling Check based on Sequence Labeling
    Han, Zijia
    Lv, Chengguo
    Wang, Qiansheng
    Fu, Guohong
    [J]. PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 373 - 378
  • [9] Dynamic Connected Networks for Chinese Spelling Check
    Wang, Baoxin
    Che, Wanxiang
    Wu, Dayong
    Wang, Shijin
    Hu, Guoping
    Liu, Ting
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2437 - 2446
  • [10] Prompt as a Knowledge Probe for Chinese Spelling Check
    Peng, Kun
    Sun, Nannan
    Cao, Jiahao
    Liu, Rui
    Ren, Jiaqian
    Jiang, Lei
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2022, PT III, 2022, 13370 : 516 - 527