Clustering web documents using hierarchical representation with multi-granularity

Cited by: 11
Authors
Huang, Faliang [1]
Zhang, Shichao [2, 5]
He, Minghua [3]
Wu, Xindong [4]
Affiliations
[1] Fujian Normal Univ, Fac Software, Fuzhou 350007, Peoples R China
[2] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[3] Aston Univ, Birmingham B4 7ET, Aston Triangle, England
[4] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[5] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
Funding
Australian Research Council
Keywords
web document clustering; hierarchical representation; multi-granularity; information granulation
DOI
10.1007/s11280-012-0197-x
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two levels of knowledge granularity (document and term) and ignores the bridging paragraph granularity. This two-level granularity may lead to unsatisfactory clustering results with "false correlation". To deal with this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with the ontology strategy all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-score.
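The "false correlation" described in the abstract can be illustrated with a small sketch. This is not the authors' HRMM; it is a toy comparison, under assumed definitions, between a document-level bag-of-words cosine similarity and the best cosine similarity between any pair of paragraphs. The function names (`bag`, `cosine`, `doc_similarity`, `best_paragraph_similarity`) and the two example documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bag(text):
    """Turn whitespace-separated text into a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def doc_similarity(doc_a, doc_b):
    """Two-level view (document, term): paragraphs are merged into one bag."""
    return cosine(bag(" ".join(doc_a)), bag(" ".join(doc_b)))


def best_paragraph_similarity(doc_a, doc_b):
    """Paragraph-level view: highest similarity over all paragraph pairs."""
    return max(cosine(bag(p), bag(q)) for p in doc_a for q in doc_b)


# Each document is a list of paragraphs (hypothetical toy data).
# Both documents use the same six terms, but group them differently.
doc_a = ["jaguar speed habitat", "engine price dealer"]
doc_b = ["jaguar engine", "speed price", "habitat dealer"]

doc_level = doc_similarity(doc_a, doc_b)
paragraph_level = best_paragraph_similarity(doc_a, doc_b)

print(f"document-level similarity:  {doc_level:.3f}")
print(f"best paragraph similarity:  {paragraph_level:.3f}")
```

At the document level the two toy documents look nearly identical, because they contain the same terms, while no single pair of paragraphs is a close match. The same `cosine` function returns 0.0 for two paragraphs with no shared terms, which is a simple instance of the zero-valued similarity that a sparse term-paragraph matrix produces.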
Pages: 105-126
Page count: 22