Topic-based web site summarization

Cited by: 2
Authors
Zhang, Yongzheng [1 ]
Milios, Evangelos [1 ,2 ]
Zincir-Heywood, Nur [1 ]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Comp Sci Dept, Halifax, NS, Canada
[2] York Univ, Dept Comp Sci, York, N Yorkshire, England
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Programming and algorithm theory; Internet; Cluster analysis; Data handling;
DOI
10.1108/17440081011090220
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Purpose - Summarizing an entire web site with diverse content may produce a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic-based framework to address this problem.
Design/methodology/approach - A two-stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering, and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single-topic summarization approach.
Findings - The user study demonstrates that the clustering-summarization approach outperforms the plain summarization approach, with statistical significance, in the multi-topic web site summarization task. Text-based clustering that selects features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.
Research limitations/implications - More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.
Practical implications - The proposed summarization framework can be applied to the effective organization of search engine results and to faceted or topical browsing of large web sites.
Originality/value - Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight was gained into the contributions of links and content to topic-based summarization. A classification approach is used to minimize the number of parameters.
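The two-stage idea in the abstract (cluster pages into topics, then summarize each topic separately) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny stop-word list, the relative-frequency features, the plain k-means with farthest-point seeding, and the frequency-based choice of key terms and key sentence are all assumptions made for illustration, and every function name below is hypothetical.

```python
import re
from collections import Counter
from statistics import pvariance

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "each", "is", "for"}

def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def high_variance_terms(pages, n_features):
    """Select the terms whose relative frequency varies most across pages."""
    counts = [Counter(tokenize(p)) for p in pages]
    totals = [max(sum(c.values()), 1) for c in counts]
    vocab = sorted(set().union(*counts))
    def variance(term):
        return pvariance([c[term] / t for c, t in zip(counts, totals)])
    return sorted(vocab, key=lambda term: (-variance(term), term))[:n_features]

def vectorize(page, terms):
    """Represent a page by the relative frequency of each selected term."""
    counts = Counter(tokenize(page))
    total = max(sum(counts.values()), 1)
    return [counts[t] / total for t in terms]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; seeds are chosen deterministically by farthest-point selection."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sqdist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def summarize_topic(pages, n_terms=3):
    """Stage 2 stand-in: most frequent terms and the sentence richest in them."""
    text = " ".join(pages)
    freq = Counter(tokenize(text))
    key_terms = [t for t, _ in freq.most_common(n_terms)]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    key_sentence = max(sentences, key=lambda s: sum(freq[w] for w in tokenize(s)), default="")
    return {"key_terms": key_terms, "key_sentence": key_sentence}

def summarize_site(pages, k=2, n_features=6):
    """Stage 1: cluster pages into k topics. Stage 2: summarize each topic."""
    terms = high_variance_terms(pages, n_features)
    labels = kmeans([vectorize(p, terms) for p in pages], k)
    summaries = {
        j: summarize_topic([p for p, lab in zip(pages, labels) if lab == j])
        for j in sorted(set(labels))
    }
    return labels, summaries
```

Calling `summarize_site(pages, k=2)` on a list of page texts returns one cluster label per page and one small summary per topic. Summarizing each cluster on its own is what keeps a minority topic from being crowded out by the site's dominant content, which is the bias the abstract describes.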
Pages: 266 / +
Number of pages: 39
Related papers
50 in total
  • [1] CATS: Customizable Abstractive Topic-based Summarization
    Bahrainian, Seyed Ali
    Zerveas, George
    Crestani, Fabio
    Eickhoff, Carsten
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2022, 40 (01)
  • [2] A method for the automatic summarization of topic-based clusters of documents
    Pons-Porrata, A
    Ruiz-Shulcloper, J
    Berlanga-Llavori, R
    PROGRESS IN PATTERN RECOGNITION, SPEECH AND IMAGE ANALYSIS, 2003, 2905 : 596 - 603
  • [3] TIARA: Interactive, Topic-Based Visual Text Summarization and Analysis
    Liu, Shixia
    Zhou, Michelle X.
    Pan, Shimei
    Song, Yangqiu
    Qian, Weihong
    Cai, Weijia
    Lian, Xiaoxiao
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2012, 3 (02)
  • [4] TASP : Topic-based abstractive summarization of Facebook text posts
    Benedetto, Irene
    La Quatra, Moreno
    Cagliero, Luca
    Vassio, Luca
    Trevisan, Martino
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [5] Topic-based automatic summarization algorithm for Chinese short text
    Ma, Tinghuai
    Wang, Hongmei
    Zhao, Yuwei
    Tian, Yuan
    Al-Nabhan, Najla
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2020, 17 (04) : 3582 - 3600
  • [6] Corpus construction for topic-based summarization of multi-party conversation
    Nakayama, Yuri
    Shiota, Tsukasa
    Shimada, Kazutaka
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 229 - 234
  • [7] Topic-based Web Page Recommendation Using Tags
    Peng, Jing
    Zeng, Daniel
    ISI: 2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS, 2009, : 269 - +
  • [8] Multi-document summarization using probabilistic topic-based network models
    1613, Institute of Information Science (32):
  • [9] Text summarization using topic-based vector space model and semantic measure
    Belwal, Ramesh Chandra
    Rai, Sawan
    Gupta, Atul
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (03)
  • [10] Multi-document Summarization using Probabilistic Topic-based Network Models
    Yang, Cheng-Zen
    Fan, Jhih-Shang
    Liu, Yu-Fan
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2016, 32 (06) : 1613 - 1634