An Analysis of Structured Data on the Web

Cited by: 20
Authors
Dalvi, Nilesh [1]
Machanavajjhala, Ashwin [1]
Pang, Bo [1]
Affiliations
[1] Yahoo Res, 4301 Great America Pkwy, Santa Clara, CA 95054 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012, Vol. 5, No. 7
Keywords
Structured Data on the Web; Information Spread; Information Connectivity
DOI
10.14778/2180912.2180920
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. We perform a study to understand and quantify the value of Web-scale extraction, and how structured information is distributed amongst top aggregator websites and tail sites for various interesting domains. We believe this is the first study of its kind, and gives us new insights for information extraction over the Web.
Pages: 680-691
Number of pages: 12