Querying Probabilistic Information Extraction

Cited by: 8
Authors
Wang, Daisy Zhe [1]
Franklin, Michael J. [1]
Garofalakis, Minos [2]
Hellerstein, Joseph M. [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Tech Univ Crete, Khania, Greece
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010 / Vol. 3 / Issue 01
DOI
10.14778/1920841.1920974
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, resulting in reduced answer quality. Second, performance inefficiencies arise due to the separation of IE from query processing. In this paper, we address these two problems by building on an in-database implementation of a leading IE model, Conditional Random Fields, using the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likelihood extractions, with optimizations to push the relational operators into the Viterbi algorithm. The second extends the Viterbi algorithm to produce a set of possible extraction "worlds", from which we compute top-k probabilistic query answers. We describe these approaches and explore the trade-offs of efficiency and effectiveness between them using two datasets.
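For readers unfamiliar with the Viterbi inference step mentioned in the abstract, the following is a minimal, generic sketch of maximum-score decoding over a linear-chain model. It is not the paper's in-database implementation: the function name `viterbi`, the `emit` and `trans` score tables, and the use of plain Python lists are all assumptions made for illustration, and scores are treated as additive (log-space) values.

```python
from typing import List, Sequence, Tuple


def viterbi(
    emit: Sequence[Sequence[float]],
    trans: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring label sequence and its total score.

    emit[t][y]  : hypothetical additive score of label y at token position t
    trans[p][y] : hypothetical additive score of moving from label p to label y
    """
    n_labels = len(trans)
    # Best score of any partial labeling that ends in each label at position 0.
    score = list(emit[0])
    # back[t-1][y] records the best previous label for label y at position t.
    back: List[List[int]] = []

    for t in range(1, len(emit)):
        new_score: List[float] = []
        pointers: List[int] = []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + trans[p][y])
            new_score.append(score[best_prev] + trans[best_prev][y] + emit[t][y])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Pick the best final label, then follow back-pointers to recover the path.
    last = max(range(n_labels), key=lambda y: score[y])
    best_score = score[last]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best_score
```

Calling `viterbi(emit, trans)` on a three-token input with two labels returns one label index per token plus the total score of that labeling. The second approach described in the abstract would instead keep several candidate labelings rather than only the single best one; that extension is not sketched here.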
Pages: 1057 - 1067
Number of pages: 11