Chinese Spelling Correction Model Based on Gated Feature Fusion

Cited by: 0
Authors
Zhou Y. [1 ]
Sun Z. [1 ]
Wu X. [1 ]
Yu K. [1 ]
Affiliation
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing
Keywords
Chinese spelling correction; four corner code; gated feature fusion; pre-training
DOI
10.13190/j.jbupt.2022-167
Abstract
When Chinese spelling correction models fuse the semantic, phonetic and glyph information of Chinese characters with equal weight, incorrect pronunciation or glyph information can degrade performance. To address this, a Chinese spelling correction model based on gated feature fusion is proposed. It uses adaptive gates to selectively fuse semantic, phonetic and glyph information, which improves performance and makes the model more interpretable. An improved four corner code is used to encode the glyph features of Chinese characters, and the glyph-similarity confusion set used in the pre-training stage is expanded on this basis. A pre-training mask strategy based on confusion set replacement enables the model to learn the error knowledge contained in text. On the public SIGHAN13, SIGHAN14 and SIGHAN15 datasets, the proposed model achieves correction F1-scores of 78.7%, 67.8% and 77.7%, respectively, which are 1.5%, 1.5% and 1.0% higher than the best baseline model. © 2023 Beijing University of Posts and Telecommunications. All rights reserved.
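The gating idea in the abstract can be illustrated with a minimal sketch. It assumes one scalar sigmoid gate per auxiliary modality, computed from the concatenated features, and a residual-style sum onto the semantic vector. The function name `gated_fusion`, the weight vectors and this exact formulation are hypothetical and are not taken from the paper, whose gate design the abstract does not specify.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, values):
    """Plain dot product of two equal-length lists."""
    return sum(w * v for w, v in zip(weights, values))


def gated_fusion(semantic, phonetic, glyph, w_phonetic, w_glyph):
    """Fuse three per-character feature vectors with scalar gates.

    Assumed (hypothetical) formulation, not the paper's:
        g_p   = sigmoid(w_phonetic . [semantic; phonetic; glyph])
        g_g   = sigmoid(w_glyph    . [semantic; phonetic; glyph])
        fused = semantic + g_p * phonetic + g_g * glyph
    """
    context = semantic + phonetic + glyph  # list concatenation
    g_p = sigmoid(dot(w_phonetic, context))
    g_g = sigmoid(dot(w_glyph, context))
    fused = [s + g_p * p + g_g * g
             for s, p, g in zip(semantic, phonetic, glyph)]
    return fused, g_p, g_g


# Toy 2-dimensional features for a single character.
semantic = [1.0, 2.0]
phonetic = [0.5, 0.5]
glyph = [4.0, -4.0]

# Zero weights give neutral gates; a strongly negative glyph weight
# drives the glyph gate toward 0, suppressing the glyph features.
neutral = [0.0] * 6
suppress_glyph = [0.0, 0.0, 0.0, 0.0, -100.0, 0.0]

print(gated_fusion(semantic, phonetic, glyph, neutral, neutral))
print(gated_fusion(semantic, phonetic, glyph, neutral, suppress_glyph))
```

In this sketch the gate values themselves can be inspected per character, which is one way a gated design can support the interpretability the abstract mentions. In a trained model the gate weights would be learned rather than set by hand.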
Pages: 91 / 122
Page count: 31
Related papers
14 in total
  • [1] LIU C L, LAI M H, CHUANG Y H, et al., Visually and phonologically similar characters in incorrect simplified Chinese words, Proceedings of the 23rd International Conference on Computational Linguistics, pp. 739-747, (2010)
  • [2] XIN Y, ZHAO H, WANG Y Z, et al., An improved graph model for Chinese spell checking, Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing, pp. 157-166, (2014)
  • [3] DEVLIN J, CHANG M W, LEE K, et al., BERT: pre-training of deep bidirectional transformers for language understanding, Proceedings of NAACL-HLT 2019, pp. 4171-4186, (2019)
  • [4] CHENG X Y, XU W D, CHEN K L, et al., SpellGCN: incorporating phonological and visual similarities into language models for Chinese spelling check, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 871-881, (2020)
  • [5] LIU S L, YANG T, YUE T C, et al., PLOME: pre-training with misspelled knowledge for Chinese spelling correction, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 2991-3000, (2021)
  • [6] ZHANG S H, HUANG H R, LIU J C, et al., Spelling error correction with soft-masked BERT, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 882-890, (2020)
  • [7] WU S H, LIU C L, LEE L H, Chinese spelling check evaluation at SIGHAN bake-off 2013, Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing, pp. 35-42, (2013)
  • [8] YU L C, LEE L H, TSENG Y H, et al., Overview of SIGHAN 2014 bake-off for Chinese spelling check, Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing, pp. 126-132, (2014)
  • [9] TSENG Y H, LEE L H, CHANG L P, et al., Introduction to SIGHAN 2015 bake-off for Chinese spelling check, Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing, pp. 32-37, (2015)
  • [10] WANG D M, SONG Y, LI J, et al., A hybrid approach to automatic corpus generation for Chinese spelling check, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2517-2527, (2018)