Using Web structure and summarisation techniques for Web content mining

Cited by: 13
Authors
Chen, LH [1]
Chue, WL [1]
Affiliations
[1] Nanyang Technol Univ, Div Informat Engn, Sch Elect & Elect Engn, Singapore 639798, Singapore
Keywords
knowledge representation of Web documents; Web structure; summarisation; Web content mining; content-based automatic Web document clustering;
DOI
10.1016/j.ipm.2004.08.003
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The dynamic nature and size of the Internet can make it difficult to find relevant information. Most users express their information need via short queries to search engines, and they often have to manually sift through the search results based on the relevance ranking set by the search engines, making the process of relevance judgement time-consuming. In this paper, we describe a novel representation technique which makes use of the Web structure together with summarisation techniques to better represent knowledge in actual Web documents. We call the proposed technique Semantic Virtual Document (SVD). We discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve automatic content-based categorization of similar Web documents. The auto-categorization facility, together with a "Tree-like" Graphical User Interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. Furthermore, we introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of short queries typically given by users. We outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining. (c) 2004 Elsevier Ltd. All rights reserved.
Pages: 1225-1242
Number of pages: 18
Related papers
50 in total
  • [1] Web archiving strategies by using web mining techniques
    Kawano, H
    [J]. 2003 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS, AND SIGNAL PROCESSING, VOLS 1 AND 2, CONFERENCE PROCEEDINGS, 2003, : 915 - 918
  • [2] Optimizing web structures using web mining techniques
    Jeffrey, Jonathan
    Karski, Peter
    Lohrmann, Bjoern
    Kianmehr, Keivan
    Alhajj, Reda
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2007, 2007, 4881 : 653 - 662
  • [3] Web content mining using web design patterns
    Kudelka, Milos
    Snasel, Vaclav
    Lehecka, Ondrej
    El-Qawasmeh, Eyas
    [J]. PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 232 - +
  • [4] Using Some Web Content Mining Techniques for Arabic Text Classification
    Zubi, Zakaria Suliman
    [J]. PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON DATA NETWORKS, COMMUNICATIONS, COMPUTERS (DNCOCO '09), 2009, : 73 - 84
  • [5] Enhancing Web Caching Using Web Usage Mining Techniques
    Saidi, Samia
    Slimani, Yahya
    [J]. RECENT TRENDS IN WIRELESS AND MOBILE NETWORKS, 2010, 84 : 425 - 435
  • [6] Web Content Extraction Using Clustering with Web Structure
    Huang, Xiaotao
    Gao, Yan
    Huang, Liqun
    Zhang, Zhizhao
    Li, Yuhua
    Wang, Fen
    Kang, Ling
    [J]. ADVANCES IN NEURAL NETWORKS, PT I, 2017, 10261 : 95 - 103
  • [7] Web Page Ranking Using Web Mining Techniques: A Comprehensive Survey
    Sharma, Prem Sagar
    Yadav, Divakar
    Thakur, R. N.
    [J]. MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [8] Cyberbullying detection using web content mining
    Kovacevic, Ana
    [J]. 2014 22ND TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2014, : 939 - 942
  • [9] Machine Learning Techniques in Web Content Mining: A Comparative Analysis
    Anami, Basavaraj S.
    Wadawadagi, Ramesh S.
    Pagi, Veerappa B.
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2014, 13 (01)
  • [10] Improving web sites with web usage mining, web content mining, and semantic analysis
    Norguet, JP
    Zimányi, E
    Steinberger, R
    [J]. SOFSEM 2006: THEORY AND PRACTICE OF COMPUTER SCIENCE, PROCEEDINGS, 2006, 3831 : 430 - 439