Topic-based web site summarization

Times cited: 2
Authors
Zhang, Yongzheng [1]
Milios, Evangelos [1,2]
Zincir-Heywood, Nur [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Comp Sci Dept, Halifax, NS, Canada
[2] York Univ, Dept Comp Sci, York, N Yorkshire, England
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Programming and algorithm theory; Internet; Cluster analysis; Data handling
DOI
10.1108/17440081011090220
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic-based framework to address this problem.
Design/methodology/approach - A two-stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering, and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single-topic summarization approach.
Findings - The user study demonstrates that the clustering-summarization approach statistically significantly outperforms the plain summarization approach in the multi-topic web site summarization task. Text-based clustering that selects features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.
Research limitations/implications - More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.
Practical implications - The proposed summarization framework can be applied to the effective organization of search engine results and to faceted or topical browsing of large web sites.
Originality/value - Several key components, including feature selection, link analysis, key phrase extraction and key sentence extraction, are integrated for web site summarization for the first time. Insight was gained into the contributions of links and content to topic-based summarization. A classification approach is used to minimize the number of parameters.
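The two-stage idea in the abstract can be illustrated with a toy sketch: first cluster pages into topics, then summarize each topic on its own. This is not the authors' system. Only the idea of keeping terms with high variance across pages comes from the abstract. The following are assumptions: the tokenizer, raw term counts as features, k-means with farthest-first seeding as the clustering step, and the most frequent terms per cluster. That last choice is a crude stand-in for the paper's key phrase and key sentence extraction. Every function name below is hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer (assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(pages):
    """One term -> count Counter per page."""
    return [Counter(tokenize(page)) for page in pages]


def select_high_variance_terms(tfs, k):
    """Keep the k terms whose counts vary most across pages."""
    vocab = sorted(set().union(*tfs))
    n = len(tfs)

    def variance(term):
        values = [tf[term] for tf in tfs]  # Counter returns 0 for absent terms
        mean = sum(values) / n
        return sum((v - mean) ** 2 for v in values) / n

    # Highest variance first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-variance(t), t))[:k]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-first seeding (assumed clusterer)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from all seeds chosen so far.
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topic_key_terms(tfs, labels, top_n=2):
    """Per cluster, the most frequent terms over all its pages (stand-in summary)."""
    summaries = {}
    for c in sorted(set(labels)):
        totals = Counter()
        for tf, label in zip(tfs, labels):
            if label == c:
                totals.update(tf)
        ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
        summaries[c] = [term for term, _ in ranked[:top_n]]
    return summaries


def summarize_site(pages, num_features, num_topics, top_n=2):
    """Stage 1: cluster pages on high-variance terms. Stage 2: summarize each cluster."""
    tfs = term_frequencies(pages)
    terms = select_high_variance_terms(tfs, num_features)
    vectors = [[tf[t] for t in terms] for tf in tfs]
    labels = kmeans(vectors, num_topics)
    return labels, topic_key_terms(tfs, labels, top_n)


if __name__ == "__main__":
    pages = [
        "admissions tuition scholarships admissions",
        "tuition scholarships admissions deadlines",
        "research labs publications research",
        "labs publications research grants",
    ]
    print(summarize_site(pages, num_features=4, num_topics=2))
```

The toy example uses four made-up pages, two about admissions and two about research. On these, the sketch places the first two pages and the last two pages in separate clusters. It reports `admissions, scholarships` and `research, labs` as the clusters' key terms. Ranking terms by variance pushes words with near-constant counts across pages to the bottom. Such words contribute little to telling topics apart, so they are the first to be dropped when only a few features are kept.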
Pages: 266 / +
Number of pages: 39
Related papers (50 in total)
  • [21] A Hidden Topic-Based Framework toward Building Applications with Short Web Documents
    Xuan-Hieu Phan
    Cam-Tu Nguyen
    Dieu-Thu Le
    Le-Minh Nguyen
    Horiguchi, Susumu
    Quang-Thuy Ha
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (07) : 961 - 976
  • [22] Every Page is Page One: Topic-based Writing for Technical Communication and the Web
    Pohland, Liz
    TECHNICAL COMMUNICATION, 2014, 61 (02) : 132 - 132
  • [23] Topic-based Indexing of Federated Datasets
    Sorrentino, Ciro
    Giallonardo, Ester
    Zimeo, Eugenio
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019 : 1090 - 1098
  • [24] Signaling Context in Topic-Based Writing
    Swarts, Jason
    TECHNICAL COMMUNICATION, 2022, 69 (01) : 40 - 53
  • [25] Topic-Based Communication Between Agents
    Galimullin, Rustam
    Velazquez-Quesada, Fernando R.
    STUDIA LOGICA, 2024
  • [26] Personalized topic-based tag recommendation
    Krestel, Ralf
    Fankhauser, Peter
    NEUROCOMPUTING, 2012, 76 (01) : 61 - 70
  • [27] Topic-based Targeted Influence Maximization
    Srinivasan, Balaji V.
    Anandhavelu, N.
    Dalal, Aseem
    Yenugula, Madhavi
    Srikanthan, Prashanth
    Layek, Arijit
    2014 SIXTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORKS (COMSNETS), 2014
  • [28] Topic-based Video Analysis: A Survey
    Pal, Ratnabali
    Sekh, Arif Ahmed
    Dogra, Debi Prosad
    Kar, Samarjit
    Roy, Partha Pratim
    Prasad, Dilip K.
    ACM COMPUTING SURVEYS, 2021, 54 (06)
  • [29] Topic-based software defect explanation
    Chen, Tse-Hsun
    Shang, Weiyi
    Nagappan, Meiyappan
    Hassan, Ahmed E.
    Thomas, Stephen W.
    JOURNAL OF SYSTEMS AND SOFTWARE, 2017, 129 : 79 - 106
  • [30] Topic tracking with dynamic topic model and topic-based weighting method
    Zhang X.
    Wang T.
    Journal of Software, 2010, 5 (05) : 482 - 489