Text clustering based on pre-trained models and autoencoders

被引:3
|
作者
Xu, Qiang [1 ]
Gu, Hao [1 ]
Ji, ShengWei [1 ]
机构
[1] Hefei Univ, Sch Artificial Intelligence & Big Data, Hefei, Anhui, Peoples R China
关键词
text clustering; medical; deep learning; pre-trained models; autoencoder; deep embedded clustering model; NEURAL-NETWORKS;
D O I
10.3389/fncom.2023.1334436
中图分类号
Q [生物科学];
学科分类号
07 ; 0710 ; 09 ;
摘要
Text clustering is the task of grouping text data based on similarity, and it holds particular importance in the medical field. sIn healthcare, medical data clustering is a highly active and effective research area. It not only provides strong support for making correct medical decisions from medical datasets but also aids in patient record management and medical information retrieval. With the development of the healthcare industry, a large amount of medical data is being generated, and traditional medical data clustering faces significant challenges. Many existing text clustering algorithms are primarily based on the bag-of-words model, which has issues such as high dimensionality, sparsity, and the neglect of word positions and context. Pre-trained models are a deep learning-based approach that treats text as a sequence to accurately capture word positions and context information. Moreover, compared to traditional K-means and fuzzy C-means clustering models, deep learning-based clustering algorithms are better at handling high-dimensional, complex, and nonlinear data. In particular, clustering algorithms based on autoencoders can learn data representations and clustering information, effectively reducing noise interference and errors during the clustering process. This paper combines pre-trained language models with deep embedding clustering models. Experimental results demonstrate that our model performs exceptionally well on four public datasets, outperforming most existing text clustering algorithms, and can be applied to medical data clustering.
引用
收藏
页数:13
相关论文
共 50 条
  • [31] From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader
    Xu, Weiwen
    Li, Xin
    Zhang, Wenxuan
    Zhou, Meng
    Lam, Wai
    Si, Luo
    Bing, Lidong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [32] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503
  • [33] Effects of Pre-trained Word Embeddings on Text-based Deception Detection
    Nam, David
    Yasmin, Jerin
    Zulkernine, Farhana
    [J]. 2020 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2020, : 437 - 443
  • [34] Lottery Jackpots Exist in Pre-Trained Models
    Zhang, Yuxin
    Lin, Mingbao
    Zhong, Yunshan
    Chao, Fei
    Ji, Rongrong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 14990 - 15004
  • [35] LaoPLM: Pre-trained Language Models for Lao
    Lin, Nankai
    Fu, Yingwen
    Yang, Ziyu
    Chen, Chuwei
    Jiang, Shengyi
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6506 - 6512
  • [36] Interpreting Art by Leveraging Pre-Trained Models
    Penzel, Niklas
    Denzler, Joachim
    [J]. 2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [37] Natural Attack for Pre-trained Models of Code
    Yang, Zhou
    Shi, Jieke
    He, Junda
    Lo, David
    [J]. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 1482 - 1493
  • [38] Generalization of vision pre-trained models for histopathology
    Milad Sikaroudi
    Maryam Hosseini
    Ricardo Gonzalez
    Shahryar Rahnamayan
    H. R. Tizhoosh
    [J]. Scientific Reports, 13
  • [39] Pre-trained models: Past, present and future
    Han, Xu
    Zhang, Zhengyan
    Ding, Ning
    Gu, Yuxian
    Liu, Xiao
    Huo, Yuqi
    Qiu, Jiezhong
    Yao, Yuan
    Zhang, Ao
    Zhang, Liang
    Han, Wentao
    Huang, Minlie
    Jin, Qin
    Lan, Yanyan
    Liu, Yang
    Liu, Zhiyuan
    Lu, Zhiwu
    Qiu, Xipeng
    Song, Ruihua
    Tang, Jie
    Wen, Ji-Rong
    Yuan, Jinhui
    Zhao, Wayne Xin
    Zhu, Jun
    [J]. AI OPEN, 2021, 2 : 225 - 250
  • [40] Learning to Modulate pre-trained Models in RL
    Schmied, Thomas
    Hofmarcher, Markus
    Paischer, Fabian
    Pascanu, Razvan
    Hochreiter, Sepp
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,