Using Web structure and summarisation techniques for Web content mining

Cited by: 13
Authors
Chen, LH [1 ]
Chue, WL [1 ]
Affiliation
[1] Nanyang Technol Univ, Div Informat Engn, Sch Elect & Elect Engn, Singapore 639798, Singapore
Keywords
knowledge representation of Web documents; Web structure; summarisation; Web content mining; content-based automatic Web document clustering;
DOI
10.1016/j.ipm.2004.08.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The dynamic nature and size of the Internet can make it difficult to find relevant information. Most users express their information need via short queries to search engines, and they often have to manually sift through the search results based on the relevance ranking set by the search engines, which makes relevance judgement time-consuming. In this paper, we describe a novel representation technique that uses the Web structure together with summarisation techniques to better represent knowledge in actual Web documents. We call the proposed technique the Semantic Virtual Document (SVD). We discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve automatic content-based categorization of similar Web documents. The auto-categorization facility, together with a "Tree-like" Graphical User Interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. Furthermore, we introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of the short queries typically given by users. We outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining. (c) 2004 Elsevier Ltd. All rights reserved.
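The abstract does not specify how an SVD is built or which clustering algorithm is used, so the following is only a minimal sketch of the general idea of combining Web structure with page summaries before content-based clustering. Everything in it is an assumption for illustration: the hypothetical `build_virtual_document` helper simply merges a page summary with the anchor texts of its in-links, the weighting is plain smoothed TF-IDF, and the clustering is a greedy single-pass threshold scheme. None of these choices should be read as the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_virtual_document(summary, inlink_anchor_texts):
    """Hypothetical stand-in for a virtual document: the page summary
    merged with the anchor texts of links pointing at the page."""
    return tokenize(summary + " " + " ".join(inlink_anchor_texts))


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy single-pass clustering: join the first cluster whose seed
    document is similar enough, otherwise start a new cluster."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy search results for the ambiguous query "python" (made-up data).
pages = [
    ("python programming language tutorial", ["learn python", "python tutorial"]),
    ("python language reference guide", ["python docs"]),
    ("python snake habitat and diet", ["snake facts", "reptile care"]),
    ("reptile and snake species care", ["snake care"]),
]
docs = [build_virtual_document(summary, anchors) for summary, anchors in pages]
clusters = cluster(tfidf_vectors(docs))
print(clusters)
```

On this toy data the programming pages and the reptile pages fall into separate groups, which is the kind of grouping a post-retrieval browsing interface could present for an ambiguous short query. The threshold of 0.3 is arbitrary and would need tuning on real data.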
Pages: 1225-1242
Number of pages: 18