A Probabilistic Framework for Chinese Spelling Check

Times cited: 3
Authors
Chen, Kuan-Yu [1 ,2 ]
Wang, Hsin-Min [1 ]
Chen, Hsin-Hsi [3 ]
Affiliations
[1] Academia Sinica, Institute of Information Science, Taipei, Taiwan
[2] National Taiwan University, Taipei, Taiwan
[3] National Taiwan University, Department of Computer Science and Information Engineering, Taipei, Taiwan
Keywords
Algorithms; Performance; Theory; Language model; Chinese; Spelling check; Topic modeling; Probabilistic
DOI
10.1145/2826234
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Chinese spelling check (CSC) remains an unsolved problem because Chinese contains many homophonous and homomorphous (visually similar) characters. A growing number of CSC systems have been proposed in recent years, and to the best of our knowledge, language modeling is one of their major components owing to its simplicity and moderately good predictive power. A close analysis of this line of research shows that most of these systems employ only conventional n-gram language models. The contributions of this article are threefold. First, we propose a novel probabilistic framework for CSC that naturally combines several important components, such as the substitution model and the language model, so as to inherit their individual merits and overcome their limitations. Second, we incorporate topic language models into the CSC system in an unsupervised fashion. Topic language models can capture long-span semantic information from a word (character) string, whereas conventional n-gram language models preserve only local regularity information. Third, we further integrate Web resources into the proposed framework to enhance overall performance. Rigorous empirical experiments demonstrate the consistent effectiveness and utility of the proposed framework on the CSC task.
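The abstract states that the framework combines a substitution model with a language model but gives no formulas. One standard way to make such a combination concrete is the noisy-channel view: for an observed sentence S, pick the correction C that maximizes log P(S | C) + log P(C). The Python sketch below illustrates only that general idea and is not the authors' model. Everything in it is an assumption: the confusion sets, the keep probability of 0.8, the toy training sentences, the hypothetical names CONFUSION, BigramLM, and correct, the add-one-smoothed character bigram model used for P(C), and Viterbi decoding.

```python
import math
from collections import Counter

# Hypothetical, symmetric confusion sets (similar-sounding characters).
# Real CSC systems use much larger sets; these are illustrative only.
CONFUSION = {"再": {"在"}, "在": {"再"}, "做": {"作"}, "作": {"做"}}


def candidates(ch):
    """The observed character itself plus its confusable alternatives."""
    return [ch] + sorted(CONFUSION.get(ch, ()))


def channel_logprob(observed, intended, keep=0.8):
    """Substitution model: log P(observed | intended).

    Assumes (hypothetically) a character is typed correctly with probability
    `keep` and is otherwise replaced uniformly by one of its confusables.
    """
    if observed == intended:
        return math.log(keep)
    return math.log((1 - keep) / len(CONFUSION[intended]))


class BigramLM:
    """Character bigram language model with add-one smoothing."""

    def __init__(self, sentences):
        self.vocab = set()
        self.history = Counter()  # occurrences of each symbol (incl. <s>) as a bigram history
        self.bigrams = Counter()
        for s in sentences:
            chars = ["<s>"] + list(s)
            self.vocab.update(s)
            self.history.update(chars[:-1])
            self.bigrams.update(zip(chars, chars[1:]))

    def logprob(self, prev, ch):
        num = self.bigrams[(prev, ch)] + 1
        den = self.history[prev] + len(self.vocab)
        return math.log(num / den)


def correct(sentence, lm, keep=0.8):
    """Viterbi search for argmax_C log P(S | C) + log P(C)."""
    # Maps the last intended char to (best log score, best corrected prefix).
    best = {"<s>": (0.0, "")}
    for obs in sentence:
        new_best = {}
        for cand in candidates(obs):
            score, path = max(
                (prev_score + lm.logprob(prev, cand), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[cand] = (score + channel_logprob(obs, cand, keep), path + cand)
        best = new_best
    return max(best.values())[1]


# Toy training data (hypothetical), standing in for a real corpus.
TRAIN = ["我在家", "我在家里", "他在家", "她在学校", "再见"]
lm = BigramLM(TRAIN)
print(correct("我再家", lm))  # 我在家
print(correct("再见", lm))    # 再见
```

With this toy data, the bigram evidence for 在家 outweighs the substitution penalty, so 我再家 is corrected to 我在家, while 再见 is left unchanged because 再 is already the more probable character there. The sketch leaves out the topic language models and Web resources that the abstract also describes.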
Pages: 17