DOM-based Web Pages to Determine the Structure of the Similarity Algorithm

Cited by: 3
Authors
Kang, Chunying [1]
Affiliations
[1] Heilongjiang Univ, Coll Informat Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
Keywords
DOM; Similarity Algorithm; Web;
DOI
10.1109/IITA.2009.218
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Web data currently exists mainly as HTML pages. Content written in HTML and rendered by a browser is suitable for people to read, but not for exchange and processing by a computer. This article parses each web page into a DOM tree and then, starting from the body node of the tree, traverses it breadth-first, comparing the nodes of the two DOM trees layer by layer and counting the changes at each layer. The changes over all layers are then summed: if the sum is less than a certain threshold, the two pages are considered structurally similar; otherwise they are considered dissimilar. Because the algorithm looks only at the structure of a page and ignores its content, it runs very efficiently, and because it is not tied to any specific web page, it is broadly applicable.
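The procedure in the abstract can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the abstract does not say how a "change" between two layers is counted, so the sketch assumes it is the number of tag names that appear in one page's layer but not in the other's (a multiset difference). The names `TreeBuilder`, `structural_difference`, and `is_structurally_similar`, and the default threshold of 3, are hypothetical choices made for the example.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element-only tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def find_body(node):
    """Breadth-first search for the first <body> element; None if absent."""
    queue = [node]
    while queue:
        current = queue.pop(0)
        if current.tag == "body":
            return current
        queue.extend(current.children)
    return None


def level_tag_counts(root):
    """Breadth-first traversal returning one Counter of tag names per layer."""
    levels, frontier = [], [root]
    while frontier:
        levels.append(Counter(n.tag for n in frontier))
        frontier = [child for n in frontier for child in n.children]
    return levels


def structural_difference(html_a, html_b):
    """Sum of per-layer changes (assumed here: multiset difference of tags)."""
    per_page = []
    for html in (html_a, html_b):
        builder = TreeBuilder()
        builder.feed(html)
        builder.close()
        start = find_body(builder.root) or builder.root
        per_page.append(level_tag_counts(start))
    a, b = per_page
    total = 0
    for depth in range(max(len(a), len(b))):
        ca = a[depth] if depth < len(a) else Counter()
        cb = b[depth] if depth < len(b) else Counter()
        # Tags present on one side but not the other count as changes.
        total += sum(((ca - cb) + (cb - ca)).values())
    return total


def is_structurally_similar(html_a, html_b, threshold=3):
    """Pages are similar when the summed change is below the threshold."""
    return structural_difference(html_a, html_b) < threshold


page_a = "<html><body><div><p>Hello</p><p>World</p></div></body></html>"
page_b = "<html><body><div><p>Other</p><p>text</p></div></body></html>"
page_c = ("<html><body><table><tr><td>x</td></tr></table>"
          "<ul><li>a</li></ul></body></html>")

print(structural_difference(page_a, page_b))  # 0: same structure, different text
print(structural_difference(page_a, page_c))  # 8: layers differ at every depth below body
```

Because only tag names per layer are compared, two pages with different text but the same layout score 0, which matches the abstract's point that content is ignored. A real system would need to pick the threshold empirically and might prefer a more tolerant parser than the standard-library `html.parser` for malformed pages.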
Pages: 245-248
Number of pages: 4
Related papers
50 in total
  • [31] DOM-based XHTML document structure analysis separating content from navigation elements
    Mantratzis, Constantine
    Cassidy, Steve
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 1, PROCEEDINGS, 2006, : 632 - +
  • [32] A Method for Calculating the Similarity of Web Pages Based on Financial Ontology
    Xiong, Lu
    Li, Kangshun
    Liu, Suping
    [J]. COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS, (ISICA 2015), 2016, 575 : 445 - 455
  • [33] TPS: an Unsupervised Web Page Segmentation Algorithm Based on Dom Tree Structure Mining
    Li, Chunshan
    Ye, Yunming
    Zhang, Xiaofeng
    [J]. INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (01): : 387 - 394
  • [34] The Role of Structure and Content in Perception of Visual Similarity Between Web Pages
    Song, Guangfeng
    [J]. INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2011, 27 (08) : 793 - 816
  • [35] Extracting Knowledge from Web Tables Based on DOM Tree Similarity
    Wu, Xiaolong
    Cao, Cungen
    Wang, Ya
    Fu, Jianhui
    Wang, Shi
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2016, 2016, 9983 : 302 - 313
  • [36] Mapping XML data to relational data: A DOM-based approach
    Atay, M
    Sun, YZ
    Liu, DP
    Lu, SY
    Fotouhi, F
    [J]. PROCEEDINGS OF THE EIGHTH IASTED INTERNATIONAL CONFERENCE ON INTERNET AND MULTIMEDIA SYSTEMS AND APPLICATIONS, 2004, : 59 - 64
  • [37] DEXTERJS: Robust Testing Platform for DOM-Based XSS Vulnerabilities
    Parameshwaran, Inian
    Budianto, Enrico
    Shinde, Shweta
    Dang, Hung
    Sadhu, Atul
    Saxena, Prateek
    [J]. 2015 10TH JOINT MEETING OF THE EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND THE ACM SIGSOFT SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE 2015) PROCEEDINGS, 2015, : 946 - 949
  • [38] DOM Semantic Expansion-Based Extraction of Topical Information from Web Pages
    Chen, Junjie
    Jia, Junyao
    Duan, Liguo
    [J]. WEB INFORMATION SYSTEMS AND MINING, PT II, 2011, 6988 : 343 - 350
  • [39] TitleFinder: Extracting the Headline of News Web Pages based on Cosine Similarity and Overlap Scoring Similarity
    Mohammadzadeh, Hadi
    Gottron, Thomas
    Schweiggert, Franz
    Heyer, Gerhard
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL WORKSHOP ON WEB INFORMATION AND DATA MANAGEMENT, 2012, : 65 - 71
  • [40] Similarity Measurement of Web Sites Using Sink Web Pages
    Popescu, Doru Anastasiu
    Maria, Danauta Catrinel
    [J]. 2011 34TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2011, : 24 - 26