Annotating and Searching Web Tables Using Entities, Types and Relationships

Cited by: 191
Authors
Limaye, Girija [1]
Sarawagi, Sunita [1]
Chakrabarti, Soumen [1]
Affiliations
[1] Indian Inst Technol, Bombay, Maharashtra, India
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010 / Vol. 3 / Issue 01
DOI
10.14778/1920841.1921005
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from "organic" Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than make separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DBPedia, tables from Wikipedia, and over 25 million HTML tables from a 500 million page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
Pages: 1338 - 1347
Page count: 10
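
The abstract contrasts joint labeling of a whole table with separate local decisions per cell. The toy sketch below illustrates only that contrast and is not the authors' graphical model or inference procedure: it enumerates all assignments by brute force, and the catalog, similarity scores, and `TYPE_BONUS` weight are hypothetical values invented for the example.

```python
from itertools import product

# Hypothetical toy catalog mapping entities to types (not taken from YAGO).
ENTITY_TYPES = {
    "Paris (city)": {"city"},
    "Paris Hilton": {"person"},
    "Berlin (city)": {"city"},
    "Irving Berlin": {"person"},
}

# Hypothetical cell-to-entity similarity scores for one column
# whose cells read "Paris" and "Berlin".
CELL_CANDIDATES = [
    {"Paris Hilton": 0.6, "Paris (city)": 0.5},
    {"Berlin (city)": 0.7, "Irving Berlin": 0.4},
]

# Assumed reward when a cell's entity belongs to the chosen column type.
TYPE_BONUS = 0.3


def annotate_locally(cells):
    """Pick the best-scoring entity for each cell independently."""
    return [max(cell, key=cell.get) for cell in cells]


def annotate_jointly(cells, types=("city", "person")):
    """Pick the column type and all cell entities together.

    Brute-force search over every (type, entity assignment) pair;
    only feasible for tiny examples like this one.
    """
    best = None
    for col_type in types:
        for assignment in product(*[list(cell) for cell in cells]):
            score = sum(
                cell[entity]
                + (TYPE_BONUS if col_type in ENTITY_TYPES[entity] else 0.0)
                for cell, entity in zip(cells, assignment)
            )
            if best is None or score > best[0]:
                best = (score, col_type, list(assignment))
    return best[1], best[2]


if __name__ == "__main__":
    print(annotate_locally(CELL_CANDIDATES))
    print(annotate_jointly(CELL_CANDIDATES))
```

With these made-up scores, the local decision labels the first cell "Paris Hilton", while the joint decision selects the column type "city" and labels both cells as cities, because the shared column type rewards a consistent assignment.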