Clustering web documents using hierarchical representation with multi-granularity

Cited by: 11
Authors
Huang, Faliang [1]
Zhang, Shichao [2, 5]
He, Minghua [3]
Wu, Xindong [4]
Affiliations
[1] Fujian Normal Univ, Fac Software, Fuzhou 350007, Peoples R China
[2] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[3] Aston Univ, Birmingham B4 7ET, Aston Triangle, England
[4] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[5] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
Funding
Australian Research Council;
Keywords
web document clustering; hierarchical representation; multi-granularity; INFORMATION GRANULATION;
DOI
10.1007/s11280-012-0197-x
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which considers only two levels of knowledge granularity (document and term) and ignores the bridging paragraph granularity. This two-level granularity may lead to unsatisfactory clustering results with "false correlation". To address the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with the ontology strategy all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of the F-Score.
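The "false correlation" mentioned in the abstract can be illustrated with a small sketch. This is not the HRMM algorithm from the paper: the function names, the toy documents, and the best-match averaging used for paragraph-level similarity are all assumptions made for illustration. It only shows one plausible reading of the problem, namely that two documents sharing many terms can look similar at the document level even when no pair of their paragraphs is close.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_similarity(doc_a, doc_b):
    """Two-level (document, term) view: paragraphs are merged before comparing."""
    return cosine(tf_vector(" ".join(doc_a)), tf_vector(" ".join(doc_b)))


def paragraph_similarity(doc_a, doc_b):
    """Hypothetical paragraph-aware view: for each paragraph of doc_a, take its
    best cosine match in doc_b, then average. This aggregation is an
    illustrative assumption, not the measure defined in the paper."""
    best = [max(cosine(tf_vector(p), tf_vector(q)) for q in doc_b) for p in doc_a]
    return sum(best) / len(best)


# Toy documents, each given as a list of paragraphs (invented for this sketch).
doc_a = ["apple fruit juice", "python code bug"]
doc_b = ["apple phone code", "python snake fruit"]

doc_level = document_similarity(doc_a, doc_b)
par_level = paragraph_similarity(doc_a, doc_b)
print(f"document-level: {doc_level:.3f}, paragraph-level: {par_level:.3f}")
```

In this toy case the merged documents share four of six terms, while each paragraph shares only one term with any paragraph of the other document, so the document-level score is noticeably higher than the paragraph-level one.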
Pages: 105 - 126
Page count: 22
Related papers
50 in total
  • [31] Multi-Granularity Representation Learning for Encrypted Malicious Traffic Detection
    Gu, Yong-Hao
    Xu, Hao
    Zhang, Xiao-Qing
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (09) : 1888 - 1899
  • [32] Robust Object Tracking Based on Multi-granularity Sparse Representation
    Chu, Honglin
    Wen, Jiajun
    Lai, Zhihui
    [J]. INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: VISUAL DATA ENGINEERING, PT I, 2019, 11935 : 142 - 154
  • [33] Multi-granularity context model for dynamic Web service composition
    Niu, Wenjia
    Li, Gang
    Zhao, Zhijun
    Tang, Hui
    Shi, Zhongzhi
    [J]. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2011, 34 (01) : 312 - 326
  • [34] Supporting web query expansion efficiently using multi-granularity indexing and query processing
    Li, WS
    Agrawal, D
    [J]. DATA & KNOWLEDGE ENGINEERING, 2000, 35 (03) : 239 - 257
  • [35] A Multi-Granularity FPGA with Hierarchical Interconnects for Efficient and Flexible Mobile Computing
    Wang, Cheng C.
    Yuan, Fang-Li
    Yu, Tsung-Han
    Markovic, Dejan
    [J]. 2014 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE DIGEST OF TECHNICAL PAPERS (ISSCC), 2014, 57 : 460 - +
  • [36] Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
    Chen, Jingzhou
    Wang, Peng
    Liu, Jian
    Qian, Yuntao
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 4848 - 4857
  • [37] Multi-granularity Hierarchical Feature Extraction for Question-Answering Understanding
    Xingguo Qin
    Ya Zhou
    Guimin Huang
    Maolin Li
    Jun Li
    [J]. Cognitive Computation, 2023, 15 : 121 - 131
  • [38] Improving unsupervised keyphrase extraction by modeling hierarchical multi-granularity features
    Zhang, Zhihao
    Liang, Xinnian
    Zuo, Yuan
    Lin, Chenghua
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (04)
  • [39] Design of ring networks based on parallel multi-granularity hierarchical OADMs
    Qi, YM
    Su, YK
    Jin, YH
    Hu, WS
    Zhu, Y
    Zhang, Y
    [J]. NETWORK ARCHITECTURES, MANAGEMENT, AND APPLICATIONS III, PTS 1 AND 2, 2005, 6022
  • [40] Multiple heterogeneous network representation learning based on multi-granularity fusion
    Manyi Liu
    Guoyin Wang
    Jun Hu
    Ke Chen
    [J]. International Journal of Machine Learning and Cybernetics, 2023, 14 : 817 - 832