Learning to Discover Domain-Specific Web Content

Cited by: 4
Authors
Pham, Kien [1]
Santos, Aecio [1]
Freire, Juliana [1]
Affiliations
[1] NYU, New York, NY 10003 USA
Keywords
DOI
10.1145/3159652.3159724
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The ability to discover all content relevant to an information domain has many applications, from helping in the understanding of humanitarian crises to countering human and arms trafficking. In such applications, time is of the essence: it is crucial to both maximize coverage and identify new content as soon as it becomes available, so that appropriate actions can be taken. In this paper, we propose new methods for efficient domain-specific re-crawling that maximize the yield for new content. By learning patterns of pages that have a high yield, our methods select a small set of pages that can be re-crawled frequently, increasing the coverage and freshness while conserving resources. Unlike previous approaches to this problem, our methods combine different factors to optimize the re-crawling strategy, do not require full snapshots for the learning step, and dynamically adapt the strategy as the crawl progresses. In an empirical evaluation, we have simulated the framework over 600 partial crawl snapshots in three different domains. The results show that our approach can achieve 150% higher coverage compared to existing, state-of-the-art techniques. In addition, it is also able to capture 80% of new relevant content within less than 4 hours of publication.
Pages: 432-440
Page count: 9
Related papers
50 in total
  • [41] A Sequence Learning Method for Domain-Specific Entity Linking
    Inan, Emrah
    Dikenelli, Oguz
    NAMED ENTITIES, 2018, : 14 - 21
  • [42] Inductive Learning of Declarative Domain-Specific Heuristics for ASP
    Comploi-Taupe, Richard
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2023, (385): : 129 - 140
  • [43] Domain-specific and domain-general constraints on word and sequence learning
    Lisa M. D. Archibald
    Marc F. Joanisse
    Memory & Cognition, 2013, 41 : 268 - 280
  • [44] A domain-specific language for describing machine learning datasets
    Giner-Miguelez, Joan
    Gomez, Abel
    Cabot, Jordi
    JOURNAL OF COMPUTER LANGUAGES, 2023, 76
  • [45] Domain-specific metacognitive calibration in children with learning disabilities
    Crane, N.
    Zusho, A.
    Ding, Y.
    Cancelli, A.
    CONTEMPORARY EDUCATIONAL PSYCHOLOGY, 2017, 50 : 72 - 79
  • [46] Deep Learning for Domain-Specific Action Recognition in Tennis
    Mora, Silvia Vinyes
    Knottenbelt, William J.
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 170 - 178
  • [47] SUPPORT OF BLENDED LEARNING IN DOMAIN-SPECIFIC TRANSLATION STUDIES
    Sosnina, E.
    11TH INTERNATIONAL CONFERENCE OF EDUCATION, RESEARCH AND INNOVATION (ICERI2018), 2018, : 5112 - 5116
  • [48] Learning to Compose Domain-Specific Transformations for Data Augmentation
    Ratner, Alexander J.
    Ehrenberg, Henry R.
    Hussain, Zeshan
    Dunnmon, Jared
    Re, Christopher
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [49] DOMAIN-SPECIFIC PRINCIPLES AFFECT LEARNING AND TRANSFER IN CHILDREN
    BROWN, AL
    COGNITIVE SCIENCE, 1990, 14 (01) : 107 - 133
  • [50] Arbiter: A Domain-Specific Language for Ethical Machine Learning
    Zucker, Julian
    d'Leeuwen, Myraeka
    PROCEEDINGS OF THE 3RD AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY AIES 2020, 2020, : 421 - 425