An Analysis of Structured Data on the Web

Cited by: 20
Authors
Dalvi, Nilesh [1]
Machanavajjhala, Ashwin [1]
Pang, Bo [1]
Affiliations
[1] Yahoo Res, 4301 Great America Pkwy, Santa Clara, CA 95054 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Vol. 5 / No. 07
Keywords
Structured Data on the Web; Information Spread; Information Connectivity;
DOI
10.14778/2180912.2180920
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. We perform a study to understand and quantify the value of Web-scale extraction, and how structured information is distributed amongst top aggregator websites and tail sites for various interesting domains. We believe this is the first study of its kind, and gives us new insights for information extraction over the Web.
Pages: 680-691
Page count: 12