Data Driven XPath Generation

Cited by: 0
Authors
De Mol, Robin [1]
Bronselaer, Antoon [1]
Nielandt, Joachim [1]
De Tre, Guy [1]
Affiliations
[1] Univ Ghent, Dept Telecommun & Informat Proc, B-9000 Ghent, Belgium
Keywords
XPath Generation; Data Driven; HTML; Web Data Extraction; Information Extraction
DOI
10.1007/978-3-319-11313-5_50
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The XPath query language offers a standard for information extraction from HTML documents. To this end, the DOM tree representation, which models the hierarchical structure of the document, is typically used. One of the key aspects of HTML is the separation of data from the structure that is used to represent it. As a consequence, data extraction algorithms usually fail to identify data if the structure of a document is changed. This paper investigates how a set of tabular-oriented XPath queries can be adapted so that it copes with modifications in the DOM tree of an HTML document. The basic idea is that data already extracted in the past can be used to reconstruct XPath queries that retrieve the same data from a different DOM tree. Experimental results show the accuracy of our method.
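The basic idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes well-formed markup, exact string matching between previously extracted values and node text, and purely positional paths. The function names `positional_path` and `regenerate_paths` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def positional_path(root, target):
    """Build a positional path (ElementTree syntax) from root down to target."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


def regenerate_paths(new_root, known_values):
    """Map each previously extracted value to the paths where it occurs now."""
    paths = {}
    for node in new_root.iter():
        text = (node.text or "").strip()
        if text in known_values:
            paths.setdefault(text, []).append(positional_path(new_root, node))
    return paths


# Hypothetical layout change: the table is now wrapped in a <div>.
new_root = ET.fromstring(
    "<html><body><div><table><tr>"
    "<td>Ghent</td><td>Belgium</td>"
    "</tr></table></div></body></html>"
)

# A query written for the old layout no longer matches anything.
old_query = "./body/table/tr/td[1]"
stale_result = new_root.find(old_query)

# Values extracted in the past are used to locate the data again.
known_values = {"Ghent", "Belgium"}
paths = regenerate_paths(new_root, known_values)
```

Each regenerated path can be passed back to `new_root.find` to retrieve the matching cell. A more robust variant would have to handle values that have changed, repeated values, and generalisation of the paths over table rows, none of which this sketch attempts.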
Pages: 569-580
Number of pages: 12
Related papers
  • [1] AXIOMATIZING HYBRID XPATH WITH DATA
    Areces, Carlos
    Fervari, Raul
    [J]. LOGICAL METHODS IN COMPUTER SCIENCE, 2021, 17 (03) : 5:1 - 5:37
  • [2] Tableaux for Hybrid XPath with Data
    Areces, Carlos
    Fervari, Raul
    Seiler, Nahuel
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017), 2017, 10423 : 611 - 623
  • [3] Axiomatizations for downward XPath on data trees
    Abriola, Sergio
    Emilia Descotte, Maria
    Fervari, Raul
    Figueira, Santiago
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2017, 89 : 209 - 245
  • [4] AN EXTENSION OF DATA AUTOMATA THAT CAPTURES XPATH
    Bojanczyk, Mikolaj
    Lasota, Slawomir
    [J]. LOGICAL METHODS IN COMPUTER SCIENCE, 2012, 8 (01)
  • [5] Generating XML Data for XPath Queries
    Rychnovsky, Dusan
    Holubova, Irena
    [J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 724 - 731
  • [6] An extension of data automata that captures XPath
    Bojanczyk, Mikolaj
    Lasota, Slawomir
    [J]. 25TH ANNUAL IEEE SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE (LICS 2010), 2010, : 243 - 252
  • [7] Satisfiability of Downward XPath with Data Equality Tests
    Figueira, Diego
    [J]. PODS'09: PROCEEDINGS OF THE TWENTY-EIGHTH ACM SIGMOD-SIGACT-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2009, : 197 - 206
  • [8] Evolution of XPath Lists for Document Data Selection
    Garcia-Sanchez, Pablo
    Merelo Guervos, Juan J.
    Angel Castillo, Pedro
    Gonzalez, Jesus
    Jimenez Laredo, Juan L.
    Mora Garcia, Antonio M.
    Garcia Arenas, Maria I.
    [J]. PARALLEL PROBLEM SOLVING FROM NATURE-PPSN XI, PT II, 2010, 6239 : 341 - 350
  • [9] XPathMark: An XPath benchmark for the XMark generated data
    Franceschet, M
    [J]. DATABASE AND XML TECHNOLOGIES, PROCEEDINGS, 2005, 3671 : 129 - 143
  • [10] Alternating Automata on Data Trees and XPath Satisfiability
    Jurdzinski, Marcin
    Lazic, Ranko
    [J]. ACM TRANSACTIONS ON COMPUTATIONAL LOGIC, 2011, 12 (03)