A CHINESE CHARACTER-LEVEL AND WORD-LEVEL COMPLEMENTARY TEXT CLASSIFICATION METHOD

Cited by: 1
Authors
Chen, Wentong [1 ]
Fan, Chunxiao [1 ]
Wu, Yuexin [1 ]
Lou, Zhixiong [1 ]
Affiliation
[1] Beijing Univ Posts & Telecommun, Sch Elect Engn, Beijing, Peoples R China
Keywords
text classification; word-level and character-level feature fusion; attention mechanism; feature alignment
DOI
10.1109/TAAI51410.2020.00042
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Text classification is a basic but important task in natural language processing. Mainstream classification methods now mostly rely on deep learning, which has shown strong accuracy and stability on English text. Unlike English, Chinese text classification involves choosing the granularity of feature description when decomposing text. The two commonly used granularities are word-level and character-level features. The former suffers semantic loss during word segmentation, while the latter cannot exploit the higher-level semantic features available in pre-trained word vectors. We propose a method that fuses word-level and character-level information with an attention mechanism. We train CWC-Net, which combines the two feature types so that the embedded information of characters and words is complementary, improving the network's semantic understanding of Chinese text and reducing semantic loss. Comparative experiments on four Chinese text datasets, covering topic classification and sentiment analysis, show that our model is more accurate than traditional models that rely only on word-level or character-level features. This verifies the effectiveness of fusing word-level and character-level features for improving model capability.
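The abstract does not give CWC-Net's exact architecture, but the core idea it describes (align word-level features to the character sequence, then let an attention gate decide per position how much of each view to keep) can be sketched as follows. This is a hypothetical numpy illustration; the dimensions, the shared projection `Wq`, and the two-way softmax gate are all assumptions, not the paper's specification.

```python
import numpy as np

# Illustrative sketch only: names, dimensions, and the gating scheme
# are assumptions, not the CWC-Net architecture from the paper.
rng = np.random.default_rng(0)
d = 8                                     # shared embedding dimension
chars = ["北", "京", "大", "学"]            # character sequence
words = [("北京", 2), ("大学", 2)]          # (word, span length in characters)

char_emb = {c: rng.standard_normal(d) for c in chars}
word_emb = {w: rng.standard_normal(d) for w, _ in words}

# Feature alignment: broadcast each word vector over its own characters,
# so both feature sequences have character-level length.
C = np.stack([char_emb[c] for c in chars])                            # (4, d)
W = np.concatenate([np.tile(word_emb[w], (n, 1)) for w, n in words])  # (4, d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Per-position attention gate over the two views (would be learned in
# practice; here Wq is random just to make the sketch runnable).
Wq = rng.standard_normal((d, 2))
fused = np.empty_like(C)
for i in range(len(chars)):
    a = softmax((C[i] + W[i]) @ Wq)       # weights over {char, word} views
    fused[i] = a[0] * C[i] + a[1] * W[i]  # convex combination of the views

print(fused.shape)  # (4, 8)
```

The fused sequence keeps character-level length, so downstream layers (e.g. a CNN or RNN classifier) see one position per character while still receiving word-level semantics wherever segmentation is reliable.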
Pages: 187 - 192
Page count: 6
Related papers
50 entries total
  • [1] An Efficient Character-Level and Word-Level Feature Fusion Method for Chinese Text Classification
    Jin Wenzhen
    Zhu Hong
    Yang Guocai
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON MACHINE VISION AND INFORMATION TECHNOLOGY (CMVIT 2019), 2019, 1229
  • [2] Integrating Character-level and Word-level Representation for Affect in Arabic Tweets
    Alharbi, Abdullah I.
    Smith, Phillip
    Lee, Mark
    [J]. Data and Knowledge Engineering, 2022, 138
  • [4] Chinese text classification based on character-level CNN and SVM
    Wu, Huaiguang
    Li, Daiyi
    Cheng, Ming
    [J]. International Journal of Intelligent Information and Database Systems, 2019, 12 (03) : 212 - 228
  • [5] Word-Level and Pinyin-Level Based Chinese Short Text Classification
    Sun, Xinjie
    Huo, Xingying
    [J]. IEEE ACCESS, 2022, 10 : 125552 - 125563
  • [6] Character-level Convolutional Networks for Text Classification
    Zhang, Xiang
    Zhao, Junbo
LeCun, Yann
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [7] Character-level Adversarial Samples Generation Approach for Chinese Text Classification
    Zhang, Shunxiang
    Wu, Houyue
    Zhu, Guangli
    Xu, Xin
    Su, Mingxing
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (06) : 2226 - 2235
  • [8] Character-level Neural Networks for Short Text Classification
    Liu, Jingxue
    Meng, Fanrong
    Zhou, Yong
    Liu, Bing
    [J]. 2017 INTERNATIONAL SMART CITIES CONFERENCE (ISC2), 2017,
  • [9] OPEN VOCABULARY HANDWRITING RECOGNITION USING COMBINED WORD-LEVEL AND CHARACTER-LEVEL LANGUAGE MODELS
    Kozielski, Michal
    Rybach, David
    Hahn, Stefan
    Schlueter, Ralf
    Ney, Hermann
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8257 - 8261
  • [10] Application of the character-level statistical method in text categorization
    Yang, Zhen
    Nie, Xiangfei
    Xu, Weiran
    Guo, Jun
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 1412 - 1417