Constant-Delay Enumeration for Nondeterministic Document Spanners

Cited by: 19
Authors
Amarilli, Antoine [1, 2, 3]
Bourhis, Pierre [4, 5, 6, 7]
Mengel, Stefan [8, 9]
Niewerth, Matthias [10]
Affiliations
[1] LTCI, 19 Pl Marguerite Perey, F-91120 Palaiseau, France
[2] Telecom Paris, 19 Pl Marguerite Perey, F-91120 Palaiseau, France
[3] Inst Polytech Paris, Palaiseau, France
[4] Parc Sci Haute Borne, 40 Ave Halley, Bat B, Pk Plaza, F-59650 Villeneuve d'Ascq, France
[5] CRIStAL, Paris, France
[6] CNRS, UMR 9189, Paris, France
[7] Inria Lille, Lille, France
[8] CNRS, CRIL, Paris, France
[9] Univ Artois, Fac Sci Jean Perrin, CNRS, CRIL, Rue Jean Souvraz, SP 18, F-62307 Lens, France
[10] Univ Bayreuth, Univ Str 30, D-95447 Bayreuth, Germany
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2021, Vol. 46, No. 1
Keywords
Document spanners; information extraction; constant delay enumeration
DOI
10.1145/3436487
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We consider the information extraction framework known as document spanners and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm that is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS'18 proposed such algorithms but with linear delay in the document size or with an exponential dependency on the size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, particularly for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.
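The two-phase setting described in the abstract (a preprocessing phase, then results produced with a small delay) can be illustrated with a toy sketch. This is not the algorithm of the paper and does not handle variable-set automata: it assumes a much simpler, hypothetical extraction task with a single variable that captures every occurrence of a fixed keyword. The names `preprocess`, `enumerate_spans`, and `next_match` are illustrative assumptions only.

```python
from typing import Iterator, List, Tuple

Span = Tuple[int, int]  # half-open interval [start, end) of document positions


def preprocess(document: str, keyword: str) -> List[int]:
    """Build a jump table in one backward pass over the document.

    next_match[i] is the smallest j >= i such that keyword occurs at
    position j, or -1 if there is no such j. Each startswith check costs
    O(len(keyword)), so the pass is linear in the document size.
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    n = len(document)
    next_match = [-1] * (n + 1)  # sentinel entry at index n
    for i in range(n - 1, -1, -1):
        if document.startswith(keyword, i):
            next_match[i] = i
        else:
            next_match[i] = next_match[i + 1]
    return next_match


def enumerate_spans(next_match: List[int], keyword: str) -> Iterator[Span]:
    """Yield each matching span; every step is a single table lookup,
    so the delay between results does not depend on the document size."""
    i = next_match[0]
    while i != -1:
        yield (i, i + len(keyword))
        i = next_match[i + 1]


index = preprocess("abcab", "ab")
print(list(enumerate_spans(index, "ab")))  # [(0, 2), (3, 5)]
```

Without the jump table, a scan between two consecutive matches could take time linear in the document; the table is what moves that cost into preprocessing in this toy setting.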
Pages: 30