Text clustering with local semantic kernels

Cited by: 10
Authors
AlSumait, Loulwah [1]
Domeniconi, Carlotta [1]
Affiliations
[1] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
DOI
10.1007/978-1-84800-046-9_5
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance, and to group documents with respect to the features (or words) that matter the most. This chapter presents a subspace clustering technique based on a locally adaptive clustering (LAC) algorithm. To improve the subspace clustering of documents and the identification of keywords achieved by LAC, kernel methods and semantic distances are deployed. The basic idea is to define a local kernel for each cluster by which semantic distances between pairs of words are computed to derive the clustering and local term weightings. The proposed approach, called semantic LAC, is evaluated using benchmark datasets. Our experiments show that semantic LAC is capable of improving the clustering quality.
Pages: 87-105
Page count: 19
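Illustrative sketch
The abstract describes grouping documents with respect to locally relevant features, with one set of term weightings per cluster. The Python sketch below illustrates only that general idea and is not the chapter's algorithm: the abstract gives no formulas, so the exponential weighting rule, the bandwidth parameter `h`, and the function names `lac_weights` and `weighted_assign` are all assumptions made for illustration. The local semantic kernel that distinguishes semantic LAC is omitted entirely.

```python
import numpy as np

def lac_weights(X, labels, centroids, h=1.0):
    """Per-cluster feature weights (assumed exponential form, not from the source).

    Features along which a cluster's members lie close to the centroid
    receive larger weights; each row of the result sums to 1.
    """
    k, d = centroids.shape
    W = np.full((k, d), 1.0 / d)  # uniform fallback for empty clusters
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            continue
        # Average squared deviation from the centroid along each feature.
        spread = ((members - centroids[j]) ** 2).mean(axis=0)
        # Subtracting the minimum leaves the normalized weights unchanged
        # and avoids underflow in the exponential.
        e = np.exp(-(spread - spread.min()) / h)
        W[j] = e / e.sum()
    return W

def weighted_assign(X, centroids, W):
    """Assign each point to the cluster with the smallest weighted distance."""
    diff = X[:, None, :] - centroids[None, :, :]       # shape (n, k, d)
    dist = (W[None, :, :] * diff ** 2).sum(axis=2)     # shape (n, k)
    return dist.argmin(axis=1)

# Toy data: cluster 0 is tight along feature 0, cluster 1 along feature 1.
X = np.array([[0.0, 0.0], [0.0, 4.0], [0.1, -4.0],
              [5.0, 1.0], [9.0, 1.0], [7.0, 1.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])

W = lac_weights(X, labels, centroids)
new_labels = weighted_assign(X, centroids, W)
```

In a full clustering loop, the weight update, the assignment step, and a centroid recomputation would alternate until the assignments stop changing. For text, each row of `X` would be a document's term vector, so a row of `W` would act as a keyword weighting for one cluster.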