Subspace Clustering of Very Sparse High-Dimensional Data

Cited by: 0
Authors
Peng, Hankui [1]
Pavlidis, Nicos [2]
Eckley, Idris [3]
Tsalamanis, Ioannis [4]
Affiliations
[1] Univ Lancaster, STOR I Ctr Doctoral Training, Lancaster, England
[2] Univ Lancaster, Dept Management Sci, Lancaster, England
[3] Univ Lancaster, Dept Math & Stat, Lancaster, England
[4] Off Natl Stat, Data Sci Campus, Newport, Gwent, Wales
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Subspace clustering; Principal angles; High-dimensionality; Short texts;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive against state-of-the-art clustering algorithms.
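The representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, only a toy picture of the input data: each short text becomes a binary vector over the corpus vocabulary, stored here as the set of its non-zero column indices. The product names and the helper names (`build_vocabulary`, `to_sparse_binary`, `cosine`) are hypothetical and chosen for illustration.

```python
import math
from typing import Dict, List, Set

# Hypothetical product names, invented for illustration only.
texts = [
    "usb charging cable",
    "usb wall charger",
    "stainless steel kettle",
    "electric steel kettle",
]

def build_vocabulary(docs: List[str]) -> Dict[str, int]:
    """Map each unique term to a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for term in doc.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab

def to_sparse_binary(doc: str, vocab: Dict[str, int]) -> Set[int]:
    """Represent a text as the set of column indices whose entry is 1."""
    return {vocab[term] for term in doc.lower().split() if term in vocab}

def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine similarity between two binary vectors given as index sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

vocab = build_vocabulary(texts)
rows = [to_sparse_binary(doc, vocab) for doc in texts]

n_docs, n_terms = len(rows), len(vocab)
nonzeros = sum(len(row) for row in rows)
# Fraction of non-zero entries in the n_docs x n_terms binary matrix.
density = nonzeros / (n_docs * n_terms)

print(f"{n_docs} texts, {n_terms} unique terms, density {density:.3f}")
print("kettle vs kettle:", cosine(rows[2], rows[3]))
print("cable vs kettle:", cosine(rows[0], rows[2]))
```

In this toy corpus the density is still fairly high because the vocabulary is tiny. In a realistic corpus the vocabulary grows to many thousands of terms while each text keeps only a handful of non-zero entries, so the density falls towards zero and texts from different categories typically share no terms at all.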
Pages: 3780 - 3783
Number of pages: 4
Related papers
50 in total
  • [41] A Hybrid EA for High-dimensional Subspace Clustering Problem
    Lin, Lin
    Gen, Mitsuo
    Liang, Yan
    [J]. 2014 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2014, : 2855 - 2860
  • [42] On the anonymization of sparse high-dimensional data
    Ghinita, Gabriel
    Tao, Yufei
    Kalnis, Panos
    [J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 715 - +
  • [43] Interpolation of sparse high-dimensional data
    Lux, Thomas C. H.
    Watson, Layne T.
    Chang, Tyler H.
    Hong, Yili
    Cameron, Kirk
    [J]. NUMERICAL ALGORITHMS, 2021, 88 (01) : 281 - 313
  • [44] Interpolation of sparse high-dimensional data
    Thomas C. H. Lux
    Layne T. Watson
    Tyler H. Chang
    Yili Hong
    Kirk Cameron
    [J]. Numerical Algorithms, 2021, 88 : 281 - 313
  • [45] Sparse Regularization in Fuzzy c-Means for High-Dimensional Data Clustering
    Chang, Xiangyu
    Wang, Qingnan
    Liu, Yuewen
    Wang, Yu
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (09) : 2616 - 2627
  • [46] SPARSE CLUSTERING FOR CUSTOMER SEGMENTATION WITH HIGH-DIMENSIONAL MIXED-TYPE DATA
    Wang, Feifei
    Xu, Shaodong
    Qin, Yichen
    Shen, Ye
    Li, Yang
    [J]. ANNALS OF APPLIED STATISTICS, 2024, 18 (03) : 2382 - 2402
  • [47] Analyzing high-dimensional data by subspace validity
    Amir, A
    Kashi, R
    Netanyahu, NS
    [J]. THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 473 - 476
  • [48] High-dimensional data clustering using k-means subspace feature selection
    Wang, Xiao-Dong
    Chen, Rung-Ching
    Yan, Fei
    [J]. Journal of Network Intelligence, 2019, 4 (03) : 80 - 87
  • [49] Self-organizing subspace clustering for high-dimensional and multi-view data
    Araujo, Aluizio F. R.
    Antonino, Victor O.
    Ponce-Guevara, Karina L.
    [J]. NEURAL NETWORKS, 2020, 130 (130) : 253 - 268
  • [50] Clustering in high-dimensional data spaces
    Murtagh, FD
    [J]. STATISTICAL CHALLENGES IN ASTRONOMY, 2003, : 279 - 292