Extracting large-scale knowledge bases from the web

Times cited: 0
Authors
Kumar, R [1]
Raghavan, P [1]
Rajagopalan, S [1]
Tomkins, A [1]
Institution
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
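To make the webring signature described in the abstract concrete, here is a minimal sketch of a local check for it: a page counts as a candidate hub when it has bidirectional links to at least some number of other pages. This is an illustration under stated assumptions and not the authors' enumeration algorithm. The function name webring_hubs, the (source, target) edge-list input, and the min_spokes threshold are all hypothetical. The sketch tests only link reciprocity and does not check any further structure the paper's signature may require.

```python
from collections import defaultdict


def webring_hubs(edges, min_spokes):
    """Return {hub: sorted spokes} for pages with >= min_spokes bidirectional links.

    Hypothetical helper, not from the paper: `edges` is an iterable of
    (source, target) page pairs and `min_spokes` is an assumed threshold.
    """
    # Build the out-link sets of the (directed) web graph.
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out_links[src].add(dst)

    hubs = {}
    for page, targets in out_links.items():
        # A spoke is a target that links back to the candidate hub.
        # dict.get does not insert keys, so iterating stays safe.
        spokes = [t for t in targets if page in out_links.get(t, ())]
        if len(spokes) >= min_spokes:
            hubs[page] = sorted(spokes)
    return hubs


edges = [
    ("hub", "a"), ("a", "hub"),
    ("hub", "b"), ("b", "hub"),
    ("hub", "c"), ("c", "hub"),
    ("a", "b"),  # one-way link, so it is not a bidirectional spoke
]
print(webring_hubs(edges, min_spokes=3))  # {'hub': ['a', 'b', 'c']}
```

Because each candidate is checked against only its own out-links, a single pass over the adjacency sets suffices. On a real crawl, min_spokes would need tuning, and spokes that are themselves densely interlinked might indicate a community rather than a ring.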
Pages: 639-650
Page count: 8