Extracting Contextualized Quantity Facts from Web Tables

Cited by: 11
Authors
Ho, Vinh Thinh [1]
Pal, Koninika [1]
Razniewski, Simon [1]
Berberich, Klaus [1, 2]
Weikum, Gerhard [1]
Affiliations
[1] Max Planck Institute for Informatics, Saarbrücken, Germany
[2] HTW Saar, Saarbrücken, Germany
Keywords
Information Extraction; Quantity Facts; Web Tables
DOI
10.1145/3442381.3450072
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Quantity queries, with filter conditions on quantitative measures of entities, are beyond the functionality of search engines and QA assistants. To enable such queries over web contents, this paper develops a novel method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method includes a new approach to aligning quantity columns to entity columns. Prior works assumed a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method.
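As a rough illustration of the first step named in the abstract (recognizing quantities with normalized values and units), the following minimal Python sketch parses a single table cell. It is not the authors' method: the regular expression, the `SCALES` and `UNITS` tables, and the names `Quantity` and `parse_quantity` are all hypothetical and chosen only for this example.

```python
import re
from typing import NamedTuple, Optional


class Quantity(NamedTuple):
    value: float  # numeric value after scaling and unit conversion
    unit: str     # normalized unit, or "" when the cell has no unit


# Hypothetical lookup tables, for illustration only.
SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
UNITS = {
    "km": ("m", 1000.0),  # kilometres are converted to metres
    "m": ("m", 1.0),
    "kg": ("kg", 1.0),
}

# number, optional scale word, optional unit token
PATTERN = re.compile(
    r"^\s*(-?\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion)?\s*([A-Za-z]+)?\s*$",
    re.IGNORECASE,
)


def parse_quantity(cell: str) -> Optional[Quantity]:
    """Return a normalized Quantity, or None if the cell is not a quantity."""
    match = PATTERN.match(cell)
    if match is None:
        return None
    number, scale, unit = match.groups()
    value = float(number.replace(",", ""))  # drop thousands separators
    if scale:
        value *= SCALES[scale.lower()]
    if unit is None:
        return Quantity(value, "")
    # Unknown units are kept as written, with no conversion.
    base_unit, factor = UNITS.get(unit, (unit, 1.0))
    return Quantity(value * factor, base_unit)
```

Under these assumptions, `parse_quantity("35 km")` yields a value in metres, while a cell such as `"n/a"` yields `None`. Aligning the parsed quantity with the right entity column and attaching context cues, which the abstract describes as the core contributions, are outside the scope of this sketch.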
Pages: 4033-4042
Page count: 10