SpellBERT: A Lightweight Pretrained Model for Chinese Spelling Check

Authors
Ji, Tuo [1]
Yan, Hang
Qiu, Xipeng
Affiliation
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Chinese Spelling Check (CSC) is the task of detecting and correcting Chinese spelling errors. Many models use a predefined confusion set to learn a mapping between correct characters and their visually or phonetically similar misuses, but this mapping may be out-of-domain. To address this, we propose SpellBERT, a pretrained model that uses graph-based extra features and does not depend on a confusion set. To explicitly capture the two error patterns, we employ a graph neural network to introduce radical and pinyin information as visual and phonetic features. To better fuse these features with character representations, we devise pre-training tasks similar to masked language modeling. With this feature-rich pre-training, SpellBERT, at only half the size of BERT, shows competitive performance and achieves a state-of-the-art result on the OCR dataset, where most of the errors are not covered by the existing confusion set.
Pages: 3544-3551
Page count: 8
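The abstract describes using a graph to attach pinyin and radical information to character representations. The sketch below illustrates that general idea only; it is not the SpellBERT architecture. It assumes a toy graph in which each character node is linked to one pinyin node and to its component nodes, and it replaces a trained graph neural network with a single round of unweighted mean aggregation. The names `CHAR_FEATURES`, `build_graph`, `propagate`, and `cosine` are hypothetical, and the three-character decomposition table is an illustrative assumption rather than data from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: each character with a toneless pinyin and its components.
CHAR_FEATURES = {
    "晴": {"pinyin": "qing", "components": ["日", "青"]},
    "情": {"pinyin": "qing", "components": ["忄", "青"]},
    "睛": {"pinyin": "jing", "components": ["目", "青"]},
}


def build_graph(char_features):
    """Return an undirected adjacency map over character, pinyin, and component nodes."""
    adj = defaultdict(set)
    for ch, feats in char_features.items():
        char_node = ("char", ch)
        feature_nodes = [("pinyin", feats["pinyin"])]
        feature_nodes += [("comp", c) for c in feats["components"]]
        for node in feature_nodes:
            adj[char_node].add(node)
            adj[node].add(char_node)
    return adj


def one_hot_embeddings(nodes):
    """Give every node its own axis so that unrelated nodes start out orthogonal."""
    order = sorted(nodes)
    return {
        node: [1.0 if i == j else 0.0 for j in range(len(order))]
        for i, node in enumerate(order)
    }


def propagate(adj, h):
    """One round of mean aggregation over each node and its neighbours (no learned weights)."""
    out = {}
    for node, neighbours in adj.items():
        group = [h[node]] + [h[n] for n in neighbours]
        out[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return out


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


adj = build_graph(CHAR_FEATURES)
h0 = one_hot_embeddings(adj.keys())
h1 = propagate(adj, h0)

# 晴 and 情 share both a pinyin node and a component node; 晴 and 睛 share only a component.
sim_before = cosine(h0[("char", "晴")], h0[("char", "情")])
sim_both = cosine(h1[("char", "晴")], h1[("char", "情")])
sim_visual = cosine(h1[("char", "晴")], h1[("char", "睛")])
print(sim_before, sim_both, sim_visual)
```

In this toy setting the characters start out orthogonal, and after one aggregation round characters that share more pinyin or component nodes end up with more similar vectors. A trained model would learn the aggregation weights and combine the result with contextual character representations, which this sketch does not attempt.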