Topic-Constrained Hierarchical Clustering for Document Datasets

Times cited: 0
Author
Zhao, Ying [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Keywords
Constrained hierarchical clustering; Semi-supervised learning; Criterion functions
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose topic-constrained hierarchical clustering, which organizes document datasets into hierarchical trees consistent with a given set of topics. The proposed algorithm is based on a constrained agglomerative clustering framework and a semi-supervised criterion function. This function simultaneously emphasizes two relationships: the relationship between documents and topics, and the relationship among the documents themselves. The experimental evaluation shows that our algorithm outperformed the traditional agglomerative algorithm by 7.8% to 11.4%.
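The abstract describes the method only at a high level. The Python sketch below is a hypothetical illustration. It is not the paper's algorithm and not its criterion function. It makes these assumptions:

* Documents and topics are dense vectors.
* Each document is constrained to its most cosine-similar topic.
* Agglomeration runs within each topic using an invented score, `merge_score`. This score mixes document-document cohesion with document-topic similarity, weighted by an assumed parameter `alpha`.
* The per-topic subtrees are finally merged by centroid similarity.

All function names here are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def merge_score(a, b, docs, topic, alpha):
    """Hypothetical mixed criterion for merging clusters a and b (index lists).

    alpha weights document-document cohesion (mean pairwise cosine);
    1 - alpha weights how close the merged centroid is to its topic vector.
    """
    vecs = [docs[i] for i in a + b]
    pairs = list(combinations(vecs, 2))  # at least one pair after any merge
    cohesion = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    topical = cosine(centroid(vecs), topic)
    return alpha * cohesion + (1 - alpha) * topical


def agglomerate(clusters, score):
    """Greedily merge the highest-scoring pair until one cluster remains.

    clusters: list of (member_indices, subtree); subtrees are nested tuples.
    """
    clusters = list(clusters)
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: score(clusters[p[0]][0], clusters[p[1]][0]))
        (ma, ta), (mb, tb) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((ma + mb, (ta, tb)))
    return clusters[0]


def topic_constrained_tree(docs, topics, alpha=0.5):
    """Build a tree in which each topic's documents form one subtree."""
    if not docs:
        raise ValueError("need at least one document")
    # Constraint step (assumed): assign every document to its most similar topic.
    assignment = [max(range(len(topics)), key=lambda t: cosine(d, topics[t]))
                  for d in docs]
    subtrees = []
    for t, topic in enumerate(topics):
        group = [([i], i) for i, ti in enumerate(assignment) if ti == t]
        if group:
            # Within-topic agglomeration with the hypothetical mixed score.
            subtrees.append(agglomerate(
                group,
                lambda a, b, topic=topic: merge_score(a, b, docs, topic, alpha)))
    # Top level: merge topic subtrees by centroid similarity (an assumption).
    _, tree = agglomerate(
        subtrees,
        lambda a, b: cosine(centroid([docs[i] for i in a]),
                            centroid([docs[i] for i in b])))
    return tree, assignment


docs = [[1, 0.1], [0.9, 0.2], [0.1, 1], [0.2, 0.9]]
topics = [[1, 0], [0, 1]]
tree, assignment = topic_constrained_tree(docs, topics)
print(assignment, tree)
```

All documents are partitioned by topic before any merging, so each topic's documents always end up in a single subtree. This is one simple way to make the tree "consistent with a given set of topics". It was chosen here for clarity, not for fidelity to the criterion function used in the paper.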
Pages: 181-192
Number of pages: 12
Related papers
50 in total
  • [41] NBC: An Efficient Hierarchical Clustering Algorithm for Large Datasets
    Zhang, Wei
    Zhang, Gongxuan
    Wang, Yongli
    Zhu, Zhaomeng
    Li, Tao
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2015, 9 (03) : 307 - 331
  • [42] Effective data summarization for hierarchical clustering in large datasets
    Patra, Bidyut Kr.
    Nandi, Sukumar
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 42 : 1 - 20
  • [43] Topic Detection based on Group Average Hierarchical Clustering
    Gao, Ni
    Gao, Ling
    He, Yiyue
    Wang, Hai
    Sun, Qian
    2013 INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD), 2013, : 88 - 92
  • [44] Building and Assessing a Constrained Clustering Hierarchical Algorithm
    Concepcion Morales, Eduardo R.
    Yurramendi Mendizabal, Yosu
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2008, 5197 : 211 - +
  • [45] Constrained Agglomerative Hierarchical Clustering Algorithms with Penalties
    Miyamoto, Sadaaki
    Terami, Akihisa
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 422 - 427
  • [46] A clustering scheme for large high-dimensional document datasets
    Jiang, Jung-Yi
    Chen, Jing-Wen
    Lee, Shie-Jue
    ADVANCES IN COMPUTATION AND INTELLIGENCE, PROCEEDINGS, 2007, 4683 : 511 - 519
  • [47] iVisClustering: An Interactive Visual Document Clustering via Topic Modeling
    Lee, Hanseung
    Kihm, Jaeyeon
    Choo, Jaegul
    Stasko, John
    Park, Haesun
    COMPUTER GRAPHICS FORUM, 2012, 31 (03) : 1155 - 1164
  • [48] Novel Similarity Measure for Document Clustering Based on Topic Phrases
    ELdesoky, A. E.
    Saleh, M.
    Sakr, N. A.
    ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE, 2007, : 92 - +
  • [49] A Novel Graph Based Clustering Approach to Document Topic Modeling
    Chanda, Prateek
    Das, Asit Kumar
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018,
  • [50] TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING
    Crossno, Patricia J.
    Wilson, Andrew T.
    Shead, Timothy M.
    Davis, Warren L.
    Dunlavy, Daniel M.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2013, 22 (05)