Extracting Contextualized Quantity Facts from Web Tables

Cited by: 11
Authors
Ho, Vinh Thinh [1]
Pal, Koninika [1]
Razniewski, Simon [1]
Berberich, Klaus [1, 2]
Weikum, Gerhard [1]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Htw Saar, Saarbrucken, Germany
Keywords
Information Extraction; Quantity Facts; Web Tables
DOI
10.1145/3442381.3450072
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Quantity queries, with filter conditions on quantitative measures of entities, are beyond the functionality of search engines and QA assistants. To enable such queries over web contents, this paper develops a novel method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method includes a new approach to aligning quantity columns to entity columns. Prior works assumed a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method.
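As a rough illustration of the "recognizing quantities, with normalized values and units" step named in the abstract, the sketch below parses a single table cell into a value and a base unit. It is not the authors' method: the regular expression, the SCALES and UNITS lookup tables, and the names Quantity and parse_quantity are all hypothetical and cover only a handful of cases.

```python
import re
from typing import NamedTuple, Optional


class Quantity(NamedTuple):
    value: float  # value expressed in the base unit
    unit: str     # base unit symbol


# Hypothetical lookup tables for illustration only; not taken from the paper.
SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
UNITS = {
    "km": (1000.0, "m"),
    "m": (1.0, "m"),
    "kg": (1.0, "kg"),
    "g": (0.001, "kg"),
    "usd": (1.0, "USD"),
    "$": (1.0, "USD"),
}

# Optional "$" prefix, a number with optional thousands separators,
# an optional scale word, and an optional trailing unit token.
CELL = re.compile(
    r"^\s*(\$)?\s*([\d,]*\.?\d+)\s*(thousand|million|billion)?\s*([a-z$]+)?\s*$",
    re.IGNORECASE,
)


def parse_quantity(cell: str) -> Optional[Quantity]:
    """Return a normalized Quantity, or None if the cell is not recognized."""
    m = CELL.match(cell)
    if m is None:
        return None
    prefix, number, scale, unit = m.groups()
    value = float(number.replace(",", ""))
    if scale:
        value *= SCALES[scale.lower()]
    # Prefer a trailing unit; fall back to the "$" prefix if present.
    unit_key = (unit or prefix or "").lower()
    if unit_key not in UNITS:
        return None  # bare numbers and unknown units are rejected
    factor, base = UNITS[unit_key]
    return Quantity(value * factor, base)


if __name__ == "__main__":
    for cell in ["350 km", "$1.2 million", "5,000 kg", "n/a"]:
        print(repr(cell), "->", parse_quantity(cell))
```

A parser of this kind handles only the recognition step; aligning each quantity with the right entity column and attaching context cues, which the abstract presents as the main contributions, are not covered here.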
Pages: 4033-4042
Page count: 10