DOM-based Web Pages to Determine the Structure of the Similarity Algorithm

Times cited: 3
Author
Kang, Chunying [1]
Affiliation
[1] Heilongjiang Univ, Coll Informat Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
Keywords
DOM; Similarity Algorithm; Web
DOI
10.1109/IITA.2009.218
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Web data currently exists mainly as HTML pages. HTML rendered by a browser is suited to human reading, but not to data exchange or processing by a computer. This paper decomposes a web page into a DOM tree and then, starting from the body node of the tree, traverses it breadth-first, comparing the DOM trees of two pages layer by layer and counting the changes at each layer. The changes across all layers are then summed: if the total is below a certain threshold, the two pages are considered structurally similar; otherwise they are dissimilar. Because the algorithm considers only the structure of the page and ignores its content, it is highly efficient, and because it is not tied to any specific web page, it is broadly applicable.
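The layer-by-layer comparison described in the abstract could be sketched roughly as follows. This is an illustration only, not the paper's implementation: the abstract does not define how "changes" are counted, so the sketch assumes the change at each layer is the number of tag names present in one page's layer but not the other's (a multiset difference). The names `TreeBuilder`, `structural_distance`, `structurally_similar`, and the default `threshold=3` are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import zip_longest

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tag-only tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def find_body(node):
    """Return the first <body> node in the tree, or None."""
    if node.tag == "body":
        return node
    for child in node.children:
        found = find_body(child)
        if found is not None:
            return found
    return None


def level_tags(root):
    """Yield a Counter of tag names for each layer, breadth-first."""
    level = [root]
    while level:
        yield Counter(n.tag for n in level)
        level = [c for n in level for c in n.children]


def structural_distance(html_a, html_b):
    """Sum the per-layer tag differences between two pages."""
    layers = []
    for html in (html_a, html_b):
        builder = TreeBuilder()
        builder.feed(html)
        builder.close()
        start = find_body(builder.root) or builder.root
        layers.append(list(level_tags(start)))
    total = 0
    for a, b in zip_longest(*layers, fillvalue=Counter()):
        # Assumed change measure: tags in one layer but not the other.
        diff = (a - b) + (b - a)
        total += sum(diff.values())
    return total


def structurally_similar(html_a, html_b, threshold=3):
    """Pages are similar when the summed change is below the threshold."""
    return structural_distance(html_a, html_b) < threshold
```

Under these assumptions, two pages that share the same tag layout but differ in text yield a distance of zero, since only tag names enter the comparison. The threshold would need tuning for real pages.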
Pages: 245-248
Page count: 4