Learnable Topical Crawler through Online Semi-supervised Clustering

Cited by: 0
Authors
Wu, Qing-Yao [1]
Ye, Yunming [1]
Fu, Jian [1]
Affiliation
[1] Harbin Inst Technol, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China
Keywords
Constrained k-means; semi-supervised clustering; sample generation; topical crawler;
DOI
10.1109/ICMLC.2009.5212484
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The performance of a traditional topical crawler depends heavily on the quality and comprehensiveness of its initial training samples. However, high-quality samples are often unavailable in real applications, since preparing them is difficult and time-consuming. Ideally, a topical crawler would learn about its target topics from the ever-changing environment and adapt to these changes over successive crawls. In this paper, we present a semi-supervised clustering method for building a learnable topical crawler. Our approach employs a constrained k-means clustering algorithm to detect new samples among crawled pages; these samples are fed to the page classifier and link predictor to update the learned models. This gives topical crawling systems an incremental learning capability, which in turn improves crawling performance. We experimentally compare our approach with a traditional sample generation approach based on relevance scores. The results show that our approach achieves better performance.
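The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common form of constrained k-means (seeded clustering in which labeled points never change cluster), not the authors' implementation. It assumes that pages are already represented as numeric feature vectors (for example TF-IDF weights) and that new samples are the unlabeled pages lying close to the on-topic centroid. The function names, the `radius` threshold, and the toy data are all hypothetical.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def constrained_kmeans(seeds, unlabeled, max_iter=50):
    """Seeded k-means in which seed points never change cluster.

    seeds: dict mapping cluster label -> non-empty list of labeled vectors.
    unlabeled: list of vectors from newly crawled pages.
    Returns (centroids, assignments), where assignments[i] is the label
    given to unlabeled[i].
    """
    labels = sorted(seeds)
    # Initialise each centroid from its labeled seeds.
    centroids = {c: _mean(seeds[c]) for c in labels}
    assignments = [None] * len(unlabeled)
    for _ in range(max_iter):
        # Assignment step: only unlabeled points are free to move.
        new_assignments = [
            min(labels, key=lambda c: _dist(x, centroids[c])) for x in unlabeled
        ]
        if new_assignments == assignments:
            break  # converged: no unlabeled point changed cluster
        assignments = new_assignments
        # Update step: seeds always contribute to their own cluster.
        for c in labels:
            members = list(seeds[c])
            members += [x for x, a in zip(unlabeled, assignments) if a == c]
            centroids[c] = _mean(members)
    return centroids, assignments


def select_new_samples(centroids, unlabeled, assignments, topic, radius):
    """Pick unlabeled points assigned to `topic` within `radius` of its centroid."""
    return [
        x
        for x, a in zip(unlabeled, assignments)
        if a == topic and _dist(x, centroids[topic]) <= radius
    ]


# Hypothetical toy data: one on-topic seed, one off-topic seed, three crawled pages.
seeds = {"on": [[1.0, 1.0]], "off": [[5.0, 5.0]]}
unlabeled = [[1.2, 0.9], [4.8, 5.1], [2.5, 2.5]]
centroids, assignments = constrained_kmeans(seeds, unlabeled)
new_samples = select_new_samples(centroids, unlabeled, assignments, "on", radius=1.0)
```

In a crawler built along these lines, `new_samples` would be appended to the training set used to update the page classifier and link predictor. The distance threshold controls the trade-off between adding many samples and adding only confident ones.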
Pages: 231-236
Number of pages: 6