Querying Probabilistic Information Extraction

Cited by: 8
Authors
Wang, Daisy Zhe [1]
Franklin, Michael J. [1]
Garofalakis, Minos [2]
Hellerstein, Joseph M. [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Tech Univ Crete, Khania, Greece
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010 / Volume 3 / Issue 01
DOI
10.14778/1920841.1920974
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, resulting in reduced answer quality. Second, performance inefficiencies arise due to the separation of IE from query processing. In this paper, we address these two problems by building on an in-database implementation of a leading IE model, Conditional Random Fields (CRFs), using the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likelihood extractions, with optimizations to push the relational operators into the Viterbi algorithm. The second extends the Viterbi algorithm to produce a set of possible extraction "worlds", from which we compute top-k probabilistic query answers. We describe these approaches and explore the trade-offs of efficiency and effectiveness between them using two datasets.
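For readers unfamiliar with the Viterbi step the abstract refers to, the following is a minimal sketch of maximum-likelihood labeling over a linear-chain model. It is not the paper's in-database implementation: the function name `viterbi`, the toy `emit` scoring function, the `TRANS` table, and the label set are all hypothetical and chosen only for illustration. Scores are treated as additive log-scores, and the token list is assumed to be non-empty.

```python
from typing import Callable, Dict, List, Tuple

def viterbi(tokens: List[str],
            labels: List[str],
            emit: Callable[[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> Tuple[float, List[str]]:
    """Return (best_score, best_label_sequence) under additive scores."""
    # best[i][y]: best score of any labeling of tokens[:i+1] that ends in label y
    best = [{y: emit(tokens[0], y) for y in labels}]
    back = []  # back[i][y]: label chosen at position i when position i+1 is y
    for tok in tokens[1:]:
        scores, ptrs = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans[(p, y)])
            scores[y] = best[-1][prev] + trans[(prev, y)] + emit(tok, y)
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final label to recover the sequence.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical toy model: capitalized tokens look like names.
LABELS = ["NAME", "OTHER"]
TRANS = {("NAME", "NAME"): 0.5, ("NAME", "OTHER"): 0.0,
         ("OTHER", "NAME"): 0.0, ("OTHER", "OTHER"): 0.0}

def emit(token: str, label: str) -> float:
    looks_like_name = token[:1].isupper()
    return 2.0 if (label == "NAME") == looks_like_name else 0.0

TOKENS = ["Alice", "visited", "Berkeley"]
score, path = viterbi(TOKENS, LABELS, emit, TRANS)
print(score, path)
```

The single returned sequence corresponds to the "maximum-likelihood extraction" used by the first query approach; the second approach described in the abstract would instead keep several high-scoring sequences per position, which this sketch does not attempt.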
Pages: 1057-1067
Number of pages: 11
Related papers
50 in total
  • [31] Synopsis information extraction in documents through probabilistic text classifiers
    Polpinij, Jantima
    Ghose, Aditya
    ASIAN DIGITAL LIBRARIES: LOOKING BACK 10 YEARS AND FORGING NEW FRONTIERS, PROCEEDINGS, 2007, 4822 : 508 - +
  • [32] Querying Visible and Invisible Information
    Benedikt, Michael
    Bourhis, Pierre
    ten Cate, Balder
    Puppis, Gabriele
    PROCEEDINGS OF THE 31ST ANNUAL ACM-IEEE SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE (LICS 2016), 2016, : 297 - 306
  • [33] Querying in Spaces of Music Information
    Homenda, Wladyslaw
    Rybnik, Mariusz
    INTEGRATED UNCERTAINTY IN KNOWLEDGE MODELLING AND DECISION MAKING, 2011, 7027 : 243 - +
  • [34] Ontological extraction of content for text querying
    Andreasen, T
    Jensen, PA
    Nilsson, JF
    Paggio, P
    Pedersen, BS
    Thomsen, HE
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2002, 2553 : 123 - 136
  • [35] Flexible Content Extraction and Querying for Videos
    Demir, Utku
    Koyuncu, Murat
    Yazici, Adnan
    Yilmaz, Turgay
    Sert, Mustafa
    FLEXIBLE QUERY ANSWERING SYSTEMS, 2011, 7022 : 460 - +
  • [36] Querying Probabilistic Business Processes for Sub-Flows
    Deutch, Daniel
    THEORY OF COMPUTING SYSTEMS, 2013, 52 (03) : 367 - 402
  • [38] Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently
    von Looz, Moritz
    Meyerhenke, Henning
    Combinatorial Algorithms, 2016, 9843 : 449 - 460
  • [39] A probabilistic approach for adapting information extraction wrappers and discovering new attributes
    Wong, TL
    Lam, W
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 257 - 264
  • [40] A Distributed Information Extraction System Integrating Ontological Knowledge and Probabilistic Classifiers
    Alicante, Anita
    Benerecetti, Massimo
    Corazza, Anna
    Silvestri, Stefano
    2014 NINTH INTERNATIONAL CONFERENCE ON P2P, PARALLEL, GRID, CLOUD AND INTERNET COMPUTING (3PGCIC), 2014, : 420 - 425