Topic-based web site summarization

Cited by: 2
Authors
Zhang, Yongzheng [1]
Milios, Evangelos [1, 2]
Zincir-Heywood, Nur [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Comp Sci Dept, Halifax, NS, Canada
[2] York Univ, Dept Comp Sci, York, N Yorkshire, England
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Programming and algorithm theory; Internet; Cluster analysis; Data handling;
DOI
10.1108/17440081011090220
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic-based framework to address this problem.
Design/methodology/approach - A two-stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering, and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single-topic summarization approach.
Findings - The user study demonstrates that the clustering-summarization approach outperforms the plain summarization approach in the multi-topic web site summarization task, and that the difference is statistically significant. Text-based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.
Research limitations/implications - More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.
Practical implications - The proposed summarization framework can be applied to the effective organization of search engine results and to faceted or topical browsing of large web sites.
Originality/value - Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight into the contributions of links and content to topic-based summarization was gained. A classification approach is used to minimize the number of parameters.
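The two-stage idea in the abstract (cluster pages into topics, then summarize each topic separately) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain k-means step, and the sentence-scoring rule are all assumptions chosen for brevity, and link analysis and key phrase extraction are omitted. Only the use of high-variance term features for clustering is taken from the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_high_variance_terms(page_counts, num_terms):
    """Keep the terms whose counts vary most across pages (ties: alphabetical)."""
    vocabulary = sorted(set().union(*page_counts))
    n = len(page_counts)

    def variance(term):
        values = [counts[term] for counts in page_counts]
        mean = sum(values) / n
        return sum((v - mean) ** 2 for v in values) / n

    return sorted(vocabulary, key=lambda t: (-variance(t), t))[:num_terms]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic farthest-first initialization.

    Assumption: a stand-in for whatever clustering method the paper uses.
    """
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize_cluster(pages, terms):
    """Return the sentence with the most occurrences of the selected terms."""
    sentences = [
        s.strip()
        for p in pages
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    term_set = set(terms)
    return max(sentences, key=lambda s: sum(tok in term_set for tok in tokenize(s)))


def summarize_site(pages, k, num_terms=20):
    """Stage 1: cluster pages by topic. Stage 2: one summary sentence per topic."""
    page_counts = [Counter(tokenize(p)) for p in pages]
    terms = select_high_variance_terms(page_counts, num_terms)
    vectors = [[counts[t] for t in terms] for counts in page_counts]
    labels = kmeans(vectors, k)
    summaries = {}
    for j in sorted(set(labels)):
        cluster_pages = [p for p, lab in zip(pages, labels) if lab == j]
        summaries[j] = summarize_cluster(cluster_pages, terms)
    return labels, summaries
```

Calling `summarize_site(pages, k=2)` on a list of page texts returns one cluster label per page and one representative sentence per cluster. The scoring rule counts raw term occurrences, so it favors longer sentences; a real system would normalize for length.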
Pages: 266 / +
Page count: 39