A Chinese text classification based on active

Cited by: 2
Authors
Deng, Song [1 ]
Li, Qianliang [1 ]
Dai, Renjie [2 ]
Wei, Siming [2 ]
Wu, Di [3 ]
He, Yi [4 ]
Wu, Xindong [5 ]
Affiliations
[1] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[2] State Grid Shanghai Municipal Elect Power Co, Shanghai 200122, Peoples R China
[3] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[4] Old Dominion Univ, Norfolk, VA 23462 USA
[5] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ China, Hefei 230009, Peoples R China
Keywords
Natural language processing; Deep active learning; Hierarchical confidence; Power text; Knowledge graph; ALGORITHM;
DOI
10.1016/j.asoc.2023.111067
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Constructing a knowledge graph supports grid production, electrical safety protection, fault diagnosis, and traceability in an observable and controllable way. A high-precision text classification algorithm is crucial for building a professional knowledge graph for the power system. Unfortunately, the power business system contains a large number of poorly described and highly specialized texts, and only a small share of them carry valid labels. This makes it very difficult to improve the precision of text classification models. To bridge this gap, we propose a classification algorithm for Chinese text in the power system based on deep active learning (CCTP-DAL). Our core idea is to apply a hierarchical confidence strategy to a deep active learning model to balance the trade-off between the amount of training data and the accuracy of text classification. CCTP-DAL (1) trains a BERT model on a small amount of labeled data to calculate the confidence level of each short text, (2) selects high-confidence text data with optimal model generalization capability based on the hierarchical confidence level, and (3) fuses deep learning models and active learning strategies to ensure high text classification accuracy with less labeled training data. We benchmark our model on real data crawled from the web in extensive experiments. The experimental results demonstrate that our proposed model achieves higher text classification accuracy with less labeled training data than other deep learning models.
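The confidence-based selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify how the hierarchical confidence levels are defined, so everything below is an assumption made for illustration: confidence is taken as the top softmax probability, the texts are split into three tiers, and the thresholds `high=0.9` and `low=0.6`, the function names, and the example logits are all hypothetical. A real pipeline would obtain the logits from a fine-tuned BERT classifier rather than hard-coding them.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Confidence of a prediction, assumed here to be the top class probability."""
    return max(probs)


def split_by_confidence(pool_probs, high=0.9, low=0.6):
    """Partition unlabeled texts into three confidence tiers.

    The thresholds `high` and `low` are illustrative assumptions,
    not values taken from the paper.
    """
    tiers = {"high": [], "medium": [], "low": []}
    for idx, probs in enumerate(pool_probs):
        c = confidence(probs)
        if c >= high:
            tiers["high"].append(idx)
        elif c >= low:
            tiers["medium"].append(idx)
        else:
            tiers["low"].append(idx)
    return tiers


# Hypothetical classifier outputs (logits) for four unlabeled short texts.
pool_logits = [
    [4.0, 0.0, 0.0],  # strongly favors class 0
    [1.0, 0.9, 0.8],  # nearly uniform, so the model is unsure
    [2.0, 0.5, 0.0],  # moderately sure of class 0
    [0.0, 0.0, 5.0],  # strongly favors class 2
]
pool_probs = [softmax(logits) for logits in pool_logits]
tiers = split_by_confidence(pool_probs)
print(tiers)
```

In an active learning loop built on this kind of split, one plausible design is to reuse the high tier with the model's own predicted labels and to route the low tier to a human annotator, then retrain and repeat. Whether CCTP-DAL treats the tiers this way is not stated in the abstract.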
Pages: 11
Related papers
50 in total
  • [1] Chinese Text Classification Based On LDA and KSVM
    Liang, Congwei
    Liu, Yong
    Du, Haiqing
    [J]. PROCEEDINGS OF THE 2015 JOINT INTERNATIONAL MECHANICAL, ELECTRONIC AND INFORMATION TECHNOLOGY CONFERENCE (JIMET 2015), 2015, 10 : 379 - 383
  • [2] Chinese web page classification based on text contents
    Liang, JZ
    [J]. ISTM/2003: 5TH INTERNATIONAL SYMPOSIUM ON TEST AND MEASUREMENT, VOLS 1-6, CONFERENCE PROCEEDINGS, 2003, : 4733 - 4736
  • [3] A Chinese text classification algorithm based on granular computing
    Qiu, Taorong
    Huang, Houkuan
    Liu, Qing
    [J]. WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 4042 - +
  • [4] Chinese Text Sentiment Classification based on Granule Network
    Zhang Xia
    Wang Suzhen
    Xu Mingzhu
    Yin Yixin
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2009), 2009, : 775 - +
  • [5] Chinese Text Classification Model Based on Deep Learning
    Li, Yue
    Wang, Xutao
    Xu, Pengjian
    [J]. FUTURE INTERNET, 2018, 10 (11):
  • [6] Chinese Text Classification Based on Ant Colony Optimization
    Luo Xin
    [J]. PROCEEDINGS OF THE 2015 4TH NATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING (NCEECE 2015), 2016, 47 : 37 - 41
  • [7] A vector-based algorithm for Chinese text classification
    Luo, CR
    He, TT
    [J]. PACLIC 17: Language, Information and Computation, Proceedings, 2003, : 235 - 242
  • [8] The Instructional Design of Chinese Text Classification based on SVM
    Wei, Sichao
    Guo, Jianyi
    Yu, Zhengtao
    Chen, Peng
    Xian, Yantuan
    [J]. 2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 5114 - 5117
  • [9] Short Chinese Text Classification Based on Correlation Analysis
    Zheng, Chenyang
    Usagawa, Tsuyoshi
    [J]. PROCEEDINGS OF 2017 11TH INTERNATIONAL CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGY AND SYSTEMS (ICTS), 2017, : 265 - 268
  • [10] Integrated features based sentiment classification for Chinese text
    Gan, Xiaohong
    [J]. Journal of Convergence Information Technology, 2012, 7 (19) : 450 - 458