Annotating and Searching Web Tables Using Entities, Types and Relationships

Cited by: 191
Authors
Limaye, Girija [1]
Sarawagi, Sunita [1]
Chakrabarti, Soumen [1]
Affiliations
[1] Indian Inst Technol, Bombay, Maharashtra, India
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010 / Vol. 3 / Issue 01
DOI
10.14778/1920841.1921005
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from "organic" Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than make separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DBPedia, tables from Wikipedia, and over 25 million HTML tables from a 500 million page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
Pages: 1338 - 1347
Page count: 10
Related papers
50 in total
  • [1] ANNOTATING WEB TABLES WITH THE CROWD
    Wang, Ning
    Liu, Huaxi
    [J]. COMPUTING AND INFORMATICS, 2018, 37 (04) : 969 - 991
  • [2] Joint Learning of Representations for Web-tables, Entities and Types using Graph Convolutional Network
    Pramanick, Aniket
    Bhattacharya, Indrajit
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1197 - 1206
  • [3] Swell - Annotating and searching semantic web services
    Condack, J
    Schwabe, D
    [J]. THIRD LATIN AMERICAN WEB CONGRESS, PROCEEDINGS, 2005, : 102 - 105
  • [4] BIOSMILE web search: a web application for annotating biomedical entities and relations
    Dai, Hong-Jie
    Huang, Chi-Hsin
    Lin, Ryan T. K.
    Tsai, Richard Tzong-Han
    Hsu, Wen-Lian
    [J]. NUCLEIC ACIDS RESEARCH, 2008, 36 : W390 - W398
  • [5] Making Sense of Entities and Quantities in Web Tables
    Ibrahim, Yusra
    Riedewald, Mirek
    Weikum, Gerhard
    [J]. CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1703 - 1712
  • [6] Annotating Entities with Fine-Grained Types in Austrian Court Decisions
    Revenko, Artem
    Breit, Anna
    Mireles, Victor
    Moreno-Schneider, Julian
    Sageder, Christian
    Karampatakis, Sotirios
    [J]. FURTHER WITH KNOWLEDGE GRAPHS, 2021, 53 : 139 - 153
  • [7] Annotating Web Tables through Knowledge Bases: A Context-Based Approach
    Eslahi, Yasamin
    Bhardwaj, Akansha
    Rosso, Paolo
    Stockinger, Kurt
    Cudre-Mauroux, Philippe
    [J]. 2020 7TH SWISS CONFERENCE ON DATA SCIENCE, SDS, 2020, : 29 - 34
  • [8] Searching on the Web: Two types of expertise
    Hoelscher, C
    Strube, G
    [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 305 - 306
  • [9] WETSUIT: An Efficient Mashup Tool for Searching and Fusing Web Entities
    Endrullis, Stefan
    Thor, Andreas
    Rahm, Erhard
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (12) : 1970 - 1973
  • [10] MedTable: Extracting Disease Types from Web Tables
    Koutraki, Maria
    Fetahu, Besnik
    [J]. SEMANTIC WEB: ESWC 2020 SATELLITE EVENTS, 2020, 12124 : 152 - 157