Clustering Documents using the Document to Vector Model for Dimensionality Reduction

Cited by: 18
Authors
Radu, Robert-George [1]
Radulescu, Iulia-Maria [1]
Truica, Ciprian-Octavian [1]
Apostol, Elena-Simona [1]
Mocanu, Mariana [1]
Affiliation
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Comp Sci & Engn Dept, Bucharest, Romania
Keywords
text clustering; document embeddings; text preprocessing; clustering evaluation
DOI
10.1109/aqtr49680.2020.9129967
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The TF-IDF model is the most common way of representing documents in a vector space. However, the resulting vectors are high-dimensional, which poses problems for classic clustering algorithms because of the curse of dimensionality. Recent techniques based on word embeddings can reduce the dimensionality of document representations while preserving the semantic relationships between words. In this paper, we analyze the accuracy of four classical clustering algorithms (K-Means, Spherical K-Means, LDA, and DBSCAN) in combination with the Document to Vector model.
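The dimensionality problem described in the abstract can be illustrated with a minimal sketch, which is not the authors' pipeline. It builds TF-IDF vectors by hand, showing that each document gets one dimension per vocabulary term, and then groups them with a small cosine-based (spherical) k-means. The toy corpus, the function names `tfidf_vectors` and `spherical_kmeans`, the smoothed IDF formula, and the deterministic initialization from the first `k` documents are all assumptions made for illustration. The Document to Vector step itself needs a trained embedding model and is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
        vec = [
            (tf[t] / len(toks)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t in vocab
        ]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity.

    Centroids start as the first k vectors (assumed here for determinism).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the normalised mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue
            mean = [sum(col) / len(members) for col in zip(*members)]
            norm = math.sqrt(sum(x * x for x in mean)) or 1.0
            centroids[c] = [x / norm for x in mean]
    return labels


# Hypothetical toy corpus: two documents about pets, two about finance.
docs = [
    "cats and dogs are pets",
    "stocks and bonds are investments",
    "dogs and cats chase mice",
    "bonds and stocks pay interest",
]
vocab, vectors = tfidf_vectors(docs)
labels = spherical_kmeans(vectors, k=2)
print("dimensions per document:", len(vocab))
print("cluster labels:", labels)
```

Even in this four-sentence corpus the vector length equals the vocabulary size, so it grows with every new term. A learned document embedding replaces that with a fixed, much smaller dimension, which is the reduction the paper evaluates.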
Pages: 57-62
Page count: 6