A Hybrid Model for Chinese Spelling Check

Cited by: 16
Authors
Zhao, Hai [1 ,2 ]
Cai, Deng [1 ,2 ]
Xin, Yang [3 ]
Wang, Yuzhu [3 ]
Jia, Zhongye [4 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, 800 Dongchuan Rd, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, 800 Dongchuan Rd, Shanghai 200240, Peoples R China
[3] Huawei Technol Co Ltd, 2222 Xinjinqiao Rd, Shanghai 201206, Peoples R China
[4] Baosteel Res Inst, 655 Fujin Rd, Shanghai 201900, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese spelling check; hybrid model; graph model; conditional random field; rule-based model;
DOI
10.1145/3047405
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spelling check is more challenging for Chinese than for other languages. This article presents a hybrid model for Chinese spelling check. The model consists of three components: one graph-based model for generic errors and two independently trained models for specific errors. In the graph model, a directed acyclic graph is generated for each sentence, and a single-source shortest-path algorithm is run on the graph to detect and correct general spelling errors at the same time. Before that step, two types of errors over function words (characters) are handled by conditional random fields: the confusion between 在 (at; pinyin: zai) and 再 (again, more, then; pinyin: zai), and the confusion among 的 (of; pinyin: de), 地 (-ly, adverb-forming particle; pinyin: de), and 得 (so that, have to; pinyin: de). Finally, a rule-based model is used to resolve pronoun confusion between 她 (she; pinyin: ta) and 他 (he; pinyin: ta), along with some other common collocation errors. The proposed model is evaluated on the standard datasets released by the SIGHAN Bake-off shared tasks, giving state-of-the-art results.
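The graph component in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the vocabulary, probabilities, confusion set, and cost constants below are hypothetical toy values, and the scoring (negative log unigram probability plus a fixed substitution penalty) is an assumption chosen only to make the example run. It builds a word lattice over character positions, which is a directed acyclic graph, and finds the cheapest path from the start to the end of the sentence, so detection and correction happen in one pass.

```python
import math

# Toy resources, all hypothetical and not taken from the article.
WORD_PROB = {"我": 0.2, "在": 0.1, "再": 0.05, "家": 0.05, "在家": 0.1}
CONFUSION = {"再": ["在"], "在": ["再"]}  # characters sharing the pinyin "zai"
SUB_PENALTY = 1.0   # assumed extra cost for replacing one character
OOV_COST = 10.0     # assumed cost of an unknown single character
MAX_WORD_LEN = 2    # longest word considered in this toy vocabulary


def candidates(span):
    """Yield (word, substitution count) for a span and its one-character variants."""
    yield span, 0
    for k, ch in enumerate(span):
        for alt in CONFUSION.get(ch, []):
            yield span[:k] + alt + span[k + 1:], 1


def correct(sentence):
    """Return the character sequence along the cheapest path through the lattice."""
    n = len(sentence)
    best = [math.inf] * (n + 1)   # best[i]: cheapest known cost to reach position i
    back = [None] * (n + 1)       # back[j]: (previous position, word on the edge)
    best[0] = 0.0
    # Nodes are character positions; visiting them left to right is a
    # topological order, so one pass gives single-source shortest paths.
    for i in range(n):
        for j in range(i + 1, min(n, i + MAX_WORD_LEN) + 1):
            for word, subs in candidates(sentence[i:j]):
                if word in WORD_PROB:
                    cost = -math.log(WORD_PROB[word])
                elif j - i == 1 and subs == 0:
                    cost = OOV_COST  # fallback edge keeps the graph connected
                else:
                    continue
                cost += subs * SUB_PENALTY
                if best[i] + cost < best[j]:
                    best[j] = best[i] + cost
                    back[j] = (i, word)
    # Walk the back-pointers from the end node to recover the chosen words.
    words, j = [], n
    while j > 0:
        i, word = back[j]
        words.append(word)
        j = i
    return "".join(reversed(words))


print(correct("我再家"))
```

With these toy values, the path through the substituted word 在家 is cheaper than the path that keeps 再 as a separate word, so the misspelled character is replaced. A real system would presumably draw edge weights from a language model and a much larger confusion set.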
Pages: 22