Topic-based web site summarization

Cited by: 2
Authors
Zhang, Yongzheng [1]
Milios, Evangelos [1,2]
Zincir-Heywood, Nur [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Comp Sci Dept, Halifax, NS, Canada
[2] York Univ, Dept Comp Sci, York, N Yorkshire, England
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Programming and algorithm theory; Internet; Cluster analysis; Data handling;
DOI
10.1108/17440081011090220
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Purpose - Summarizing an entire web site with diverse content may produce a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic-based framework that addresses this problem.

Design/methodology/approach - A two-stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering, and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single-topic summarization approach.

Findings - The user study demonstrates that the clustering-summarization approach outperforms the plain summarization approach on the multi-topic web site summarization task, and that the difference is statistically significant. Text-based clustering that selects features with high variance across web pages is reliable; outgoing links are useful when a rich set of cross links is available.

Research limitations/implications - Clustering methods more sophisticated than those used in this study are worth investigating. The proposed method should also be tested on web content that is less structured than organizational web sites, such as blogs.

Practical implications - The proposed summarization framework can be applied to organizing search engine results effectively and to faceted or topical browsing of large web sites.

Originality/value - Several key components, including feature selection, link analysis, key phrase extraction and key sentence extraction, are integrated for web site summarization for the first time. The study also provides insight into how links and content contribute to topic-based summarization. A classification approach is used to minimize the number of parameters.
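The abstract describes the two-stage idea (cluster pages into topics, then summarize each topic) only at a high level. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. Every specific here is an assumption: pages are plain-text strings, the small stop list is illustrative, terms are ranked by the variance of their per-page counts (echoing the abstract's "features with high variance over web pages"), clustering is plain k-means with farthest-first initialization, single words stand in for key phrases, and sentences are ranked by the summed frequency of the key words they contain. Link analysis and the paper's classification approach are not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; the paper's preprocessing is not described in the abstract.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "our", "the", "to"}


def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def select_high_variance_terms(pages, n_terms):
    """Keep the n_terms words whose per-page counts vary most across pages."""
    counts = [Counter(tokenize(p)) for p in pages]
    vocab = set().union(*counts)

    def variance(term):
        xs = [c[term] for c in counts]
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Highest variance first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-variance(t), t))[:n_terms]


def unit_vector(page, terms):
    """Count vector of `page` over `terms`, scaled to unit length."""
    c = Counter(tokenize(page))
    v = [float(c[t]) for t in terms]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each page to its nearest centroid, then move centroids to cluster means.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize_cluster(pages, n_phrases=3, n_sentences=1):
    """Top words stand in for key phrases; sentences are ranked by key-word weight."""
    counts = Counter(t for p in pages for t in tokenize(p))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    key_terms = [t for t, _ in ranked[:n_phrases]]
    sentences = [s.strip() for p in pages for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]

    def score(sentence):
        words = set(tokenize(sentence))
        return sum(counts[t] for t in key_terms if t in words)

    return key_terms, sorted(sentences, key=score, reverse=True)[:n_sentences]


def topic_based_summary(pages, k, n_terms=4):
    """Stage 1: cluster pages into k topics. Stage 2: summarize each topic separately."""
    terms = select_high_variance_terms(pages, n_terms)
    labels = kmeans([unit_vector(p, terms) for p in pages], k)
    return {
        j: summarize_cluster([p for p, lab in zip(pages, labels) if lab == j])
        for j in sorted(set(labels))
    }


# Toy "web site" with two topics (hypothetical data).
pages = [
    "Undergraduate admissions open in March. Students apply online for admissions.",
    "Admissions advice for new students. Tuition and admissions deadlines are listed.",
    "The robotics lab publishes research on robot vision. Lab research is funded.",
    "Research seminars in the robotics lab. Our research covers robot learning.",
]

if __name__ == "__main__":
    for topic, (phrases, sentences) in topic_based_summary(pages, k=2).items():
        print(topic, phrases, sentences)
```

On this toy input the admissions pages and the lab pages fall into separate clusters, so each topic gets its own key words and key sentence instead of one summary dominated by whichever topic has more text, which is the bias the framework is meant to avoid.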
Pages: 266 / +
Number of pages: 39
Related papers
50 in total
  • [31] A topic-based document correlation model
    Jia, Xi-Ping
    Peng, Hong
    Zheng, Qj-Lun
    Jiang, Zhuo-Lin
    Li, Zhao
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2487 - 2491
  • [33] A Topic-based Reviewer Assignment System
    Kou, Ngai Meng
    Hou, Leong U.
    Mamoulis, Nikos
    Li, Yuhong
    Li, Ye
    Gong, Zhiguo
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (12) : 1852 - 1855
  • [34] Topic-Based Image Caption Generation
    Dash, Sandeep Kumar
    Acharya, Shantanu
    Pakray, Partha
    Das, Ranjita
    Gelbukh, Alexander
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2020, 45 (04) : 3025 - 3034
  • [35] Exploring Topic-Based Sharing Mechanisms
    Sleeper, Manya
    Cranor, Lorrie Faith
    Pearman, Sarah K.
    PROCEEDINGS OF THE 2017 ACM SIGCHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI'17), 2017, : 6973 - 6985
  • [36] InfoBee topic-based multimedia structuring
    NTT Human Interface Labs
    NTT R&D, 10 (1115-1122):
  • [37] Collaborative topic regression for predicting topic-based social influence
    Hamzehei, Asso
    Wong, Raymond K.
    Koutra, Danai
    Chen, Fang
    MACHINE LEARNING, 2019, 108 (10) : 1831 - 1850
  • [39] Topic-based Defect Prediction (NIER Track)
    Tung Thanh Nguyen
    Nguyen, Tien N.
    Tu Minh Phuong
    2011 33RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2011, : 932 - 935
  • [40] Content Patterns in Topic-Based Overlapping Communities
    Rios, Sebastian A.
    Munoz, Ricardo
    SCIENTIFIC WORLD JOURNAL, 2014,