Cross-Domain Text Classification Based on BERT Model

Cited by: 3
Authors
Zhang, Kuan [1]
Hei, Xinhong [1]
Fei, Rong [1]
Guo, Yufan [1]
Jiao, Rui [1]
Affiliation
[1] Xi'an University of Technology, Xi'an, People's Republic of China
Keywords
Text classification; BERT model; K-means
DOI
10.1007/978-3-030-73216-5_14
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The diversity of structures and categories makes information security data difficult to classify. With the spread of big data technology, cross-domain text classification has become increasingly important in the information security domain. In this paper, we propose a new text classification structure based on the BERT model. First, the BERT model generates a vector for each text sentence, and a similarity matrix is constructed from the pairwise cosine similarities between these vectors. Then, k-means and mean-shift clustering are used to extract the feature structure of the data. Using this structure, we perform clustering on benchmark datasets and on real-world problems, classify the text information, and obtain effective clustering results. Clustering evaluation indices are also used to verify the model's performance on these datasets. Experimental results demonstrate the effectiveness of the proposed structure on two indices: the silhouette coefficient and the Calinski-Harabasz index.
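The pipeline in the abstract (sentence vectors, cosine similarity matrix, k-means clustering, then silhouette and Calinski-Harabasz evaluation) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It rests on several assumptions: the abstract does not say how the similarity matrix feeds the clustering step, so here each row of the matrix is treated as that sentence's feature vector; the small `sentence_vectors` list is hypothetical toy data standing in for real BERT embeddings; k-means uses a deterministic farthest-point initialization chosen for reproducibility; and mean-shift is omitted for brevity. All function names are illustrative.

```python
import math


def cosine_similarity_matrix(vectors):
    """S[i][j] = <v_i, v_j> / (||v_i|| * ||v_j||); assumes no all-zero vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points (Euclidean); singletons score 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)
            continue

        def mean_dist(c):
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(d) / len(d)

        a = mean_dist(own)  # mean distance to the point's own cluster
        b = min(mean_dist(c) for c in clusters if c != own)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def calinski_harabasz(points, labels):
    """[between-cluster dispersion / (k - 1)] / [within-cluster dispersion / (n - k)]."""
    clusters = set(labels)
    n, k = len(points), len(clusters)
    overall = mean_point(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = mean_point(members)
        between += len(members) * sq_dist(centroid, overall)
        within += sum(sq_dist(p, centroid) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 4-d "sentence vectors" standing in for BERT outputs (BERT-base gives 768-d).
sentence_vectors = [
    [0.9, 0.1, 0.0, 0.1], [0.8, 0.2, 0.1, 0.0], [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.0], [0.0, 1.0, 0.9, 0.1], [0.2, 0.8, 1.0, 0.0],
]
S = cosine_similarity_matrix(sentence_vectors)
labels = kmeans(S, k=2)  # each row of S is used as that sentence's feature vector
print(labels, round(silhouette(S, labels), 3), round(calinski_harabasz(S, labels), 1))
```

Higher values are better for both indices: the silhouette coefficient lies in [-1, 1], and the Calinski-Harabasz index grows as clusters become tighter and further apart. For real workloads, scikit-learn's `KMeans`, `MeanShift`, `silhouette_score`, and `calinski_harabasz_score` would be the more practical choice than hand-written loops.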
Pages: 197-208
Number of pages: 12