ABC-Fusion: Adapter-based BERT-level confusion set fusion approach for Chinese spelling correction

Cited by: 2
Authors
Xie, Jiaying [1 ]
Dang, Kai [1 ]
Liu, Jie [1 ]
Liang, Enlei [1 ]
Affiliations
[1] Nankai Univ, Coll Artificial Intelligence, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese spelling correction; BERT adapter; Knowledge fusion;
DOI
10.1016/j.csl.2023.101540
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Chinese spelling correction (CSC) aims to automatically detect and correct spelling errors in Chinese sentences. Recently, methods that combine a pre-trained language model with external knowledge have achieved excellent performance. The knowledge is derived either from multi-modal information such as pronunciations and glyphs, or from a confusion set that collects pairs of easily confused characters. However, existing multi-modal knowledge based methods achieve their superior performance at the cost of a greatly increased model size; and although context semantics is essential for CSC, current confusion set based methods fail to use the confusion set to model the semantics because they do not fuse the lexical features. To address these issues, we propose an Adapter-based BERT-level Confusion Set Fusion method, which fuses BERT with the semantics of confusing characters during the semantic encoding phase. A lightweight adapter placed between BERT layers dynamically extracts the relevant knowledge among the confusing candidates and integrates it with the context, so that the contextual information and the semantics of the candidates can fully interact within BERT. Experiments on three benchmarks demonstrate that our method outperforms previous confusion set based methods and performs comparably with multi-modal knowledge based methods.
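The abstract does not give implementation details, but the core idea of the adapter, attending from a character's contextual representation over the embeddings of its confusion-set candidates and folding the result back in residually, can be sketched in plain Python. All names here (`confusion_adapter`, the toy vectors) are hypothetical illustrations, not the paper's actual architecture:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    # Dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def confusion_adapter(hidden, candidate_embs):
    """Hypothetical sketch of confusion-set fusion: attend from the
    character's contextual hidden vector over the embeddings of its
    confusing candidates, then add the weighted sum residually."""
    if not candidate_embs:
        return hidden  # character has no confusing candidates: pass through
    # Dot-product attention weights over the candidates.
    weights = softmax([dot(hidden, c) for c in candidate_embs])
    # Weighted sum of candidate embeddings.
    fused = [sum(w * c[i] for w, c in zip(weights, candidate_embs))
             for i in range(len(hidden))]
    # Residual connection preserves the original contextual signal,
    # as in standard BERT adapter designs.
    return [h + f for h, f in zip(hidden, fused)]
```

In the paper this module sits between BERT layers, so the fusion happens during semantic encoding rather than as a post-hoc re-ranking step; the sketch above only shows the per-character fusion arithmetic for a single layer.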
Pages: 14
Related papers
1 result
  • [1] Zhou Y., Sun Z., Wu X., Yu K. Chinese Spelling Correction Model Based on Gated Feature Fusion. Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2023, 46 (04): 91-122