Clustering web documents using hierarchical representation with multi-granularity

Cited by: 11
Authors
Huang, Faliang [1]
Zhang, Shichao [2, 5]
He, Minghua [3]
Wu, Xindong [4]
Affiliations
[1] Fujian Normal Univ, Fac Software, Fuzhou 350007, Peoples R China
[2] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[3] Aston Univ, Birmingham B4 7ET, Aston Triangle, England
[4] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[5] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
Funding
Australian Research Council
Keywords
web document clustering; hierarchical representation; multi-granularity; information granulation
DOI
10.1007/s11280-012-0197-x
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two levels of knowledge granularity (document and term) and ignores the bridging paragraph granularity. This two-level granularity may lead to unsatisfactory clustering results with "false correlation". To address this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with the ontology strategy all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-Score.
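The contrast between document-level and paragraph-level granularity described in the abstract can be illustrated with a small sketch. This is not the HRMM algorithm: the toy documents, the plain term-count vectors, and the helper names (`cosine`, `doc_vector`, `best_paragraph_similarity`) are assumptions made for illustration only. It shows one possible reading of "false correlation" (two documents look identical as bags of words although none of their paragraphs match closely) and of the zero-valued similarity that sparse paragraph vectors can produce.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def doc_vector(paragraphs):
    """Two-level (document, term) view: paragraph boundaries are discarded."""
    return Counter(term for p in paragraphs for term in p.split())


def best_paragraph_similarity(doc_a, doc_b):
    """Paragraph-level view: highest similarity over all paragraph pairs."""
    return max(
        cosine(Counter(p.split()), Counter(q.split()))
        for p in doc_a
        for q in doc_b
    )


# Hypothetical toy documents, each given as a list of paragraphs.
doc_a = ["jaguar habitat", "engine speed"]
doc_b = ["jaguar engine", "habitat speed"]

# As bags of words the two documents are indistinguishable.
doc_level = cosine(doc_vector(doc_a), doc_vector(doc_b))

# At paragraph level, every pair of paragraphs shares only one of two terms.
par_level = best_paragraph_similarity(doc_a, doc_b)

# Short paragraphs with no shared surface terms get similarity zero,
# even when the terms are synonyms.
zero_sim = cosine(Counter("car engine".split()),
                  Counter("automobile motor".split()))

print(doc_level, par_level, zero_sim)
```

In this sketch the document-level score is maximal while the best paragraph-level score is only about half of that, and the synonym pair scores zero. The abstract's ontology-based and tolerance-rough-set-based strategies address that last kind of sparsity; they are not implemented here.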
Pages: 105-126
Number of pages: 22