Towards Temporal URI Collections for Named Entities

Authors
Wildemann, Sergej [1]
Holzmann, Helge [2]
Affiliations
[1] L3S Research Center, Hannover, Germany
[2] Internet Archive, San Francisco, CA, USA
Keywords
Web Archives; Temporal Information Retrieval; Collaborative Knowledge
DOI
10.1109/JCDL.2019.00-68
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Web archives represent crucial endeavors in preserving the Web of the past and provide a valuable resource for researchers of different disciplines. Due to their size, navigation in these collections is often limited to specifying a URI and the desired date. However, typical research questions often revolve around the evolution of entities rather than specific websites. Full-text search often seems to be the first choice for looking up web pages, and it provides a quick way to find the best match for a keyword, but its diversified ranking is not made for compiling reliable entity-related collections. Further, it generally ignores the temporal relevance that is needed to find pages from the past, e.g., in web archives. In this paper, we present a collection of ranked resource identifiers characterizing named entities over time. For this purpose, different datasets were collected and evaluated by comparing each with a combination of the others. Benchmarked against web search engines, our approach achieves a remarkable precision of 83.3% and shows promising results for high-quality lookups and temporal collection building. To avoid relying only on existing datasets, we have implemented an interactive platform that brings humans into the loop to expand the collection by contributing URIs, metadata, and temporal information, as well as to correct errors.
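The idea of a collection of ranked resource identifiers that characterize an entity over time can be illustrated with a small sketch. This is not the authors' implementation: the `TemporalUri` schema, the `uris_at` helper, the score scale, and all URIs and dates below are hypothetical placeholders chosen only to show how a lookup by entity and date might behave.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalUri:
    """One URI tied to an entity for a bounded period (hypothetical schema)."""
    uri: str
    valid_from: date
    valid_to: Optional[date]  # None means the URI is still considered valid
    score: float              # higher means more relevant; the scale is assumed


def uris_at(records: list[TemporalUri], when: date) -> list[str]:
    """Return the URIs valid on `when`, highest score first."""
    active = [
        r for r in records
        if r.valid_from <= when and (r.valid_to is None or when <= r.valid_to)
    ]
    return [r.uri for r in sorted(active, key=lambda r: r.score, reverse=True)]


# Illustrative placeholder data, not taken from the paper.
collection = {
    "Example Entity": [
        TemporalUri("http://old.example.org/", date(2001, 1, 1), date(2009, 12, 31), 0.9),
        TemporalUri("http://example.org/", date(2010, 1, 1), None, 0.8),
        TemporalUri("http://fan.example.net/", date(2005, 1, 1), None, 0.4),
    ]
}

print(uris_at(collection["Example Entity"], date(2006, 6, 1)))
```

A lookup of this shape returns a ranked list for a given entity and date, which could then be passed to a web archive that is navigated by URI and date.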
Pages: 241-250
Page count: 10