A Chinese text classification based on active

Cited by: 2
Authors
Deng, Song [1 ]
Li, Qianliang [1 ]
Dai, Renjie [2 ]
Wei, Siming [2 ]
Wu, Di [3 ]
He, Yi [4 ]
Wu, Xindong [5 ]
Affiliations
[1] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[2] State Grid Shanghai Municipal Elect Power Co, Shanghai 200122, Peoples R China
[3] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[4] Old Dominion Univ, Norfolk, VA 23462 USA
[5] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ China, Hefei 230009, Peoples R China
Keywords
Natural language processing; Deep active learning; Hierarchical confidence; Power text; Knowledge graph; ALGORITHM;
DOI
10.1016/j.asoc.2023.111067
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Constructing a knowledge graph benefits grid production, electrical safety protection, fault diagnosis, and traceability in an observable and controllable way. A high-precision text classification algorithm is crucial for building a professional knowledge graph for the power system. Unfortunately, the power business system contains a large number of poorly described, specialized texts, and only a small share of them carry valid labels, which makes it very challenging to improve the precision of text classification models. To close this gap, we propose a classification algorithm for Chinese text in the power system based on deep active learning (CCTP-DAL). Our core idea is to apply a hierarchical confidence strategy to a deep active learning model to balance the trade-off between the amount of training data and the accuracy of text classification. CCTP-DAL (1) trains the BERT model on a small amount of labeled data to calculate the confidence level of each short text, (2) selects high-confidence text data with optimal model generalization capability based on the hierarchical confidence level, and (3) fuses deep learning models and active learning strategies to ensure high text classification accuracy with less labeled training data. We benchmark our model on real data crawled from the web with extensive experiments. The experimental results demonstrate that our proposed model achieves higher text classification accuracy with less labeled training data than other deep learning models.
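The abstract does not spell out how the hierarchical confidence strategy works, so the following is only a minimal sketch of one plausible reading, not the authors' CCTP-DAL implementation. It assumes that confidence is the largest predicted class probability, that samples fall into three tiers separated by two hypothetical thresholds (`high` and `low`), that high-tier samples are kept with their predicted label, and that the least confident remaining samples are sent to a human annotator. The function names, thresholds, and the probability rows (stand-ins for the output of a fine-tuned BERT classifier) are all illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def confidence(probs: Sequence[float]) -> float:
    """Confidence of one prediction: the largest class probability (assumed definition)."""
    return max(probs)


def split_by_confidence(
    prob_rows: Sequence[Sequence[float]], high: float = 0.9, low: float = 0.6
) -> Dict[str, List[int]]:
    """Partition sample indices into three tiers using two hypothetical thresholds."""
    tiers: Dict[str, List[int]] = {"high": [], "middle": [], "low": []}
    for idx, probs in enumerate(prob_rows):
        c = confidence(probs)
        if c >= high:
            tiers["high"].append(idx)
        elif c >= low:
            tiers["middle"].append(idx)
        else:
            tiers["low"].append(idx)
    return tiers


def select_round(
    prob_rows: Sequence[Sequence[float]],
    budget: int,
    high: float = 0.9,
    low: float = 0.6,
) -> Tuple[Dict[int, int], List[int]]:
    """One selection round of the assumed strategy.

    Returns (pseudo_labels, to_annotate):
    - pseudo_labels maps each high-tier index to its predicted class;
    - to_annotate lists up to `budget` remaining indices, least confident first.
    """
    tiers = split_by_confidence(prob_rows, high, low)
    pseudo_labels = {
        i: max(range(len(prob_rows[i])), key=lambda k: prob_rows[i][k])
        for i in tiers["high"]
    }
    remaining = tiers["low"] + tiers["middle"]
    ranked = sorted(remaining, key=lambda i: confidence(prob_rows[i]))
    return pseudo_labels, ranked[:budget]


# Made-up probabilities for four short texts and two classes.
rows = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.10, 0.90]]
pseudo, queries = select_round(rows, budget=1)
print(pseudo)   # {0: 0, 3: 1}
print(queries)  # [1]
```

In a full active learning loop, the annotated and pseudo-labeled samples would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget is spent.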
Pages: 11