Text clustering with local semantic kernels

Cited by: 10
Authors
AlSumait, Loulwah [1]
Domeniconi, Carlotta [1]
Affiliations
[1] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
DOI
10.1007/978-1-84800-046-9_5
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance, and to group documents with respect to the features (or words) that matter the most. This chapter presents a subspace clustering technique based on a locally adaptive clustering (LAC) algorithm. To improve the subspace clustering of documents and the identification of keywords achieved by LAC, kernel methods and semantic distances are deployed. The basic idea is to define a local kernel for each cluster by which semantic distances between pairs of words are computed to derive the clustering and local term weightings. The proposed approach, called semantic LAC, is evaluated using benchmark datasets. Our experiments show that semantic LAC is capable of improving the clustering quality.
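The idea described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes a k-means-style loop, an exponential weighting scheme with a hypothetical bandwidth parameter `h`, and a term-similarity matrix `S` supplied by the caller as a stand-in for the semantic kernel. The function name `semantic_lac_sketch` and all parameter names are illustrative only.

```python
import numpy as np

def semantic_lac_sketch(X, k, S, h=1.0, n_iter=20, seed=0):
    """Illustrative subspace clustering with per-cluster term weights.

    X : (n_docs, n_terms) document-term matrix.
    S : (n_terms, n_terms) term-similarity matrix (assumed given;
        the identity matrix disables the semantic smoothing).
    h : bandwidth of the exponential weighting (hypothetical parameter).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Start from k distinct documents as centroids and uniform term weights.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: each cluster measures distance with its own weights,
        # after mapping term differences through the similarity matrix S.
        dists = np.empty((n, k))
        for j in range(k):
            diff = (X - centroids[j]) @ S
            dists[:, j] = (diff ** 2) @ weights[j]
        labels = dists.argmin(axis=1)

        # Update step: recompute centroids, then give larger weight to
        # terms along which the cluster's documents are tightly grouped.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid and weights
            centroids[j] = members.mean(axis=0)
            spread = (((members - centroids[j]) @ S) ** 2).mean(axis=0)
            logits = -spread / h
            logits -= logits.max()  # numerical stability before exp
            w = np.exp(logits)
            weights[j] = w / w.sum()

    return labels, centroids, weights

# Toy usage: four documents over three terms, no semantic smoothing.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.0, 0.1, 0.9]])
labels, centroids, weights = semantic_lac_sketch(docs, k=2, S=np.eye(3))
```

In this sketch each row of `weights` sums to one and plays the role of a cluster's local term weighting; terms with the highest weight in a row are the candidate keywords for that cluster.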
Pages: 87-105
Number of pages: 19