Querying Probabilistic Information Extraction

Cited by: 8
Authors
Wang, Daisy Zhe [1]
Franklin, Michael J. [1]
Garofalakis, Minos [2]
Hellerstein, Joseph M. [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Tech Univ Crete, Khania, Greece
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010 / Volume 3 / Issue 01
DOI
10.14778/1920841.1920974
Chinese Library Classification code
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, resulting in reduced answer quality. Second, performance inefficiencies arise due to the separation of IE from query processing. In this paper, we address these two problems by building on an in-database implementation of a leading IE model, Conditional Random Fields (CRFs), using the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likelihood extractions, with optimizations to push the relational operators into the Viterbi algorithm. The second extends the Viterbi algorithm to produce a set of possible extraction "worlds", from which we compute top-k probabilistic query answers. We describe these approaches and explore the trade-offs of efficiency and effectiveness between them using two datasets.
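To make the maximum-likelihood extraction step concrete, the following is a minimal sketch of Viterbi decoding over a linear-chain model. It is not the paper's in-database implementation: the function name `viterbi`, the score tables `emit` and `trans`, and the example numbers are all hypothetical, and the scores are assumed to be precomputed log-scores rather than learned CRF feature weights.

```python
from typing import List, Sequence, Tuple


def viterbi(emit: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring label sequence and its total log-score.

    emit[t][y]  : assumed log-score of label y at token position t
    trans[p][y] : assumed log-score of moving from label p to label y
    """
    n_labels = len(trans)
    # best[y] = best score of any labeling of tokens 0..t that ends in label y
    best = list(emit[0])
    back: List[List[int]] = []  # back[t-1][y] = best previous label for y at t
    for t in range(1, len(emit)):
        new_best, pointers = [], []
        for y in range(n_labels):
            prev = max(range(n_labels), key=lambda p: best[p] + trans[p][y])
            pointers.append(prev)
            new_best.append(best[prev] + trans[prev][y] + emit[t][y])
        best = new_best
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: 3 tokens, 2 labels (0 and 1).
emit = [[-0.1, -2.0], [-1.5, -0.2], [-0.3, -1.0]]
trans = [[-0.5, -1.0], [-1.0, -0.5]]
path, score = viterbi(emit, trans)
```

The single returned path corresponds to the one extraction that a deterministic query would run over. A top-k variant would keep several candidate paths per label and position instead of one, which is one way to obtain a set of alternative extraction "worlds".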
Pages: 1057 - 1067
Page count: 11