Character-level Adversarial Samples Generation Approach for Chinese Text Classification

Cited by: 0
Authors
Zhang, Shunxiang [1]
Wu, Houyue
Zhu, Guangli
Xu, Xin
Su, Mingxing
Affiliations
[1] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Anti-sample generation; Text classification; Sentimental classification; Polyphonic characters; Character-level adversarial samples
DOI
10.11999/JEIT220563
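Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809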
Abstract
Adversarial sample generation is a technique that causes a neural network to misclassify its input by adding small perturbations, and it can be used to test the robustness of text classification models. At present, adversarial sample generation methods for Chinese text rely mainly on substituting traditional characters and homophones, which leads to large perturbations and low-quality samples. This paper proposes Polyphonic characters Generation Adversarial Sample (PGAS), a character-level adversarial sample generation approach that produces high-quality adversarial samples with minor perturbation by replacing polyphonic characters. First, a polyphonic word dictionary is constructed to label polyphonic words. Then, polyphonic word replacement is applied to the input text. Finally, an adversarial attack experiment is conducted in the black-box setting. Experiments on multiple sentiment classification datasets verify the effectiveness of the proposed method against a variety of recent classification models.
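The three steps in the abstract (dictionary, replacement, black-box attack) can be illustrated with a minimal sketch. This is not the PGAS implementation: the abstract does not specify how the dictionary is built or how replacements are chosen, so the two-entry dictionary `POLYPHONE_SUBSTITUTIONS`, the keyword-based `toy_classifier`, and the single-character greedy search in `polyphone_attack` are all hypothetical stand-ins chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy dictionary (an assumption, not the paper's resource):
# maps a character to polyphonic characters that share one of its readings.
POLYPHONE_SUBSTITUTIONS: Dict[str, List[str]] = {
    "好": ["号"],  # both can be read "hao"; 号 has more than one reading
    "常": ["长"],  # both can be read "chang"; 长 can also be read "zhang"
}


def toy_classifier(text: str) -> str:
    """Stand-in black-box model: only its predicted label is observable."""
    return "positive" if "好" in text else "negative"


def polyphone_attack(
    text: str,
    classify: Callable[[str], str],
    substitutions: Dict[str, List[str]],
) -> Optional[str]:
    """Try single-character substitutions until the predicted label flips.

    Returns the first perturbed text whose label differs from the original
    prediction, or None if no single substitution changes the label.
    """
    original_label = classify(text)
    for i, ch in enumerate(text):
        for candidate in substitutions.get(ch, []):
            perturbed = text[:i] + candidate + text[i + 1:]
            # Black-box query: only the output label is used.
            if classify(perturbed) != original_label:
                return perturbed
    return None


# Example call with the toy model and dictionary defined above.
adversarial = polyphone_attack(
    "这部电影很好", toy_classifier, POLYPHONE_SUBSTITUTIONS
)
```

Limiting the search to one substituted character keeps the perturbation small, which matches the abstract's stated goal of minor disturbance; a real attack would query an actual trained classifier and would likely rank candidate positions rather than scan them in order.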
Pages: 2226-2235
Page count: 10