An Analysis of Structured Data on the Web

Cited by: 20
Authors
Dalvi, Nilesh [1]
Machanavajjhala, Ashwin [1]
Pang, Bo [1]
Affiliations
[1] Yahoo Res, 4301 Great America Pkwy, Santa Clara, CA 95054 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Vol. 5 / No. 07
Keywords
Structured Data on the Web; Information Spread; Information Connectivity
DOI
10.14778/2180912.2180920
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. We perform a study to understand and quantify the value of Web-scale extraction, and how structured information is distributed amongst top aggregator websites and tail sites for various interesting domains. We believe this is the first study of its kind, and gives us new insights for information extraction over the Web.
Pages: 680 - 691
Page count: 12
Related papers
50 in total
  • [1] Analysis of approaches to structured data on the web
    Pohorec, Sandi
    Zorman, Milan
    Kokol, Peter
    COMPUTER STANDARDS & INTERFACES, 2013, 36 (01) : 256 - 262
  • [2] Structured Data on the Web
    Cafarella, Michael J.
    Halevy, Alon
    Madhavan, Jayant
    COMMUNICATIONS OF THE ACM, 2011, 54 (02) : 72 - 79
  • [3] Evolution of structured data on the web
    Guha, R.V.
    Brickley, Dan
    Macbeth, Steve
    Queue, 2015, 13 (09):
  • [4] Structured Data in Web Search
    Halevy, Alon
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 7 - 7
  • [5] Annotating structured data of the deep Web
    Lu, Yiyao
    He, Hai
    Zhao, Hongkun
    Meng, Weiyi
    Yu, Clement
    2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2007, : 351 - +
  • [6] A comprehensive data quality methodology for web and structured data
    Batini, Carlo
    Cabitza, Federico
    Cappiello, Cinzia
    Francalanci, Chiara
    International Journal of Innovative Computing and Applications, 2008, 1 (03) : 205 - 218
  • [7] A comprehensive data quality methodology for web and structured data
    Batini, Carlo
    Cabitza, Federico
    Cappiello, Cinzia
    Francalanci, Chiara
    2006 1ST INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, 2006, : 448 - +
  • [8] ENRICHED MARKING OF STRUCTURED DATA FOR WEB DOCUMENTS
    Adida, Ben
    Herman, Ivan
    Sporny, Manu
    Birbeck, Mark
    ANALES DE DOCUMENTACION, 2013, 16 (01):
  • [9] Towards Unveiling Dark Web Structured Data
    Shams, Montasir
    Pavia, Sophie
    Khan, Rituparna
    Pyayt, Anna
    Gubanov, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 5275 - 5282
  • [10] Fuzzy Matching of Web Queries to Structured Data
    Cheng, Tao
    Lauw, Hady W.
    Paparizos, Stelios
    26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010, : 713 - 716