Querying Probabilistic Information Extraction

Cited by: 8
Authors
Wang, Daisy Zhe [1]
Franklin, Michael J. [1]
Garofalakis, Minos [2]
Hellerstein, Joseph M. [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Tech Univ Crete, Khania, Greece
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010, Vol. 3, Issue 1
DOI
10.14778/1920841.1920974
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, resulting in reduced answer quality. Second, performance inefficiencies arise due to the separation of IE from query processing. In this paper, we address these two problems by building on an in-database implementation of a leading IE model, Conditional Random Fields, using the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likelihood extractions, with optimizations to push the relational operators into the Viterbi algorithm. The second extends the Viterbi algorithm to produce a set of possible extraction "worlds", from which we compute top-k probabilistic query answers. We describe these approaches and explore the trade-offs of efficiency and effectiveness between them using two datasets.
Pages: 1057-1067
Page count: 11
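
Illustrative note (not part of the indexed record): the abstract refers to maximum-likelihood extraction with the Viterbi algorithm over a Conditional Random Field. The sketch below shows generic Viterbi decoding over a linear-chain model in plain Python. It is not the paper's in-database implementation; the function name `viterbi`, the dictionary-based score tables, and all toy scores are assumptions made up for this example.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    tokens: Sequence[str],
    labels: Sequence[str],
    emit: Dict[Tuple[str, str], float],
    trans: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (best_score, best_label_sequence) for a linear-chain model.

    emit[(token, label)] and trans[(prev_label, label)] are additive
    (log-scale) scores; missing entries are treated as 0.0.
    All names and score tables here are hypothetical.
    """
    if not tokens:
        return 0.0, []

    # best[i][y]: best score of any labeling of tokens[:i+1] that ends in y
    best = [{y: emit.get((tokens[0], y), 0.0) for y in labels}]
    back = []  # back[i][y]: best previous label when tokens[i+1] gets label y

    for tok in tokens[1:]:
        scores, ptrs = {}, {}
        for y in labels:
            # Pick the predecessor label that maximizes the running score.
            prev = max(labels, key=lambda p: best[-1][p] + trans.get((p, y), 0.0))
            scores[y] = (
                best[-1][prev] + trans.get((prev, y), 0.0) + emit.get((tok, y), 0.0)
            )
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)

    # Trace back from the best final label to recover the full sequence.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy scores, invented purely for illustration.
TOKENS = ["Daisy", "Wang", "Berkeley"]
LABELS = ["NAME", "ORG"]
EMIT = {
    ("Daisy", "NAME"): 2.0,
    ("Wang", "NAME"): 1.5,
    ("Berkeley", "ORG"): 2.0,
    ("Berkeley", "NAME"): 0.5,
}
TRANS = {
    ("NAME", "NAME"): 0.5,
    ("NAME", "ORG"): 0.2,
    ("ORG", "ORG"): 0.5,
    ("ORG", "NAME"): -0.5,
}

score, path = viterbi(TOKENS, LABELS, EMIT, TRANS)
print(path, score)
```

With these toy scores the single best labeling is NAME, NAME, ORG. The second approach in the abstract would need the k best labelings rather than one; this sketch does not attempt that extension.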