A sampling method based on URL clustering for fast web accessibility evaluation

Cited by: 8
Authors
Zhang, Meng-ni [1]
Wang, Can [1]
Bu, Jia-jun [1]
Yu, Zhi [1]
Zhou, Yu [1]
Chen, Chun [1]
Affiliation
[1] Zhejiang University, College of Computer Science and Technology, Hangzhou 310027, Zhejiang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Page sampling; URL clustering; Web accessibility evaluation
DOI
10.1631/FITEE.1400377
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
When evaluating the accessibility of a large website, we rely on sampling methods to reduce the cost of evaluation. This may lead to a biased evaluation when the distribution of checkpoint violations in a website is skewed and the selected samples do not represent the entire website well. To improve sampling quality, stratified sampling methods first cluster the web pages in a site and then draw samples from each cluster. In existing stratified sampling methods, however, all the pages in a website need to be analyzed for clustering, causing huge I/O and computation costs. To address this issue, we propose a novel page sampling method based on URL clustering for web accessibility evaluation, namely URLSamp. Using only URL information for stratified page sampling, URLSamp can efficiently scale to large websites. Meanwhile, by exploiting similarities in URL patterns, URLSamp clusters pages by their generating scripts and can thus effectively detect accessibility problems that originate in web page templates. We use a data set of 45 websites to validate our method. Experimental results show that our URLSamp method is both effective and efficient for web accessibility evaluation.
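The idea of stratified sampling over URL clusters can be illustrated with a minimal Python sketch. This is not the URLSamp algorithm itself, whose details the abstract does not give. The pattern heuristic (numeric path segments collapsed to a placeholder, query strings reduced to parameter names), the proportional allocation rule, the function names `url_pattern` and `stratified_sample`, and the `example.com` URLs are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url):
    """Reduce a URL to a coarse pattern (hypothetical heuristic, not from the paper)."""
    parts = urlsplit(url)
    # Collapse purely numeric path segments, which usually identify a record.
    segments = ["<num>" if seg.isdigit() else seg
                for seg in parts.path.split("/") if seg]
    # Keep query parameter names only; the values vary from page to page.
    keys = sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return (parts.netloc, "/".join(segments), tuple(keys))


def stratified_sample(urls, sample_size, seed=0):
    """Group URLs by pattern, then draw from every group in proportion to its size."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_pattern(url)].append(url)
    sample = []
    for members in clusters.values():
        # Assumed allocation rule: proportional share, but at least one page per cluster.
        k = max(1, round(sample_size * len(members) / len(urls)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return clusters, sample


# Hypothetical site: eight news pages, two item pages, one static page.
urls = [f"http://example.com/news/{i}" for i in range(8)]
urls += ["http://example.com/item.php?id=1",
         "http://example.com/item.php?id=2",
         "http://example.com/about"]
clusters, sample = stratified_sample(urls, sample_size=4)
for pattern, members in clusters.items():
    print(pattern, len(members))
```

Because every cluster contributes at least one page, the returned sample can be slightly larger than `sample_size`. The benefit is that a template used by only a few pages is still evaluated, and no page content has to be fetched to build the clusters.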
Pages: 449-456
Page count: 8
Source
Frontiers of Information Technology & Electronic Engineering, 2015, 16: 449-456