De-duplicating the OpenAIRE Scholarly Communication Big Graph

Cited by: 1
Authors
Atzori, Claudio [1]
Manghi, Paolo [1]
Bardi, Alessia [1]
Affiliation
[1] CNR, Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" (ISTI), Via Moruzzi 1, Pisa, Italy
Funding
European Union Horizon 2020
Keywords
deduplication; graph; big data; scholarly communication; OpenAIRE
DOI
10.1109/eScience.2018.00104
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. To de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup provides the functionality needed to realize a fully fledged entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph.
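The abstract names the stages of the workflow (identify duplicates, merge them, emit a disambiguated graph) without detailing them. The following is a minimal single-process sketch of that generic pipeline, not GDup's implementation or API: it assumes a toy graph of record dicts keyed by id plus (source, relation, target) edge triples. Records are grouped by a clustering key (blocking), compared pairwise within each block, grouped into duplicate sets with union-find, merged into one representative per set, and edges are redirected to the representatives. The blocking key, the `difflib` name similarity, the 0.9 threshold, the relation names, and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(name):
    """Hypothetical clustering key: first 5 alphanumeric characters of the lowercased name."""
    return "".join(c for c in name.lower() if c.isalnum())[:5]


def is_duplicate(a, b, threshold=0.9):
    """Hypothetical pairwise test: case-insensitive similarity ratio of the names."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


class UnionFind:
    """Disjoint sets whose root is always the smallest id in the set."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if rb < ra:
            ra, rb = rb, ra
        self.parent[rb] = ra  # smaller id becomes the representative


def deduplicate(nodes, edges, threshold=0.9):
    """nodes: id -> record dict; edges: (source_id, relation, target_id) triples."""
    # 1. Blocking: only records sharing a clustering key are compared.
    blocks = defaultdict(list)
    for nid, rec in nodes.items():
        blocks[blocking_key(rec["name"])].append(nid)

    # 2. Pairwise matching inside each block; 3. transitive closure via union-find.
    uf = UnionFind(nodes)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            if is_duplicate(nodes[a], nodes[b], threshold):
                uf.union(a, b)
    rep = {nid: uf.find(nid) for nid in nodes}

    # 4. Merge: start from the representative's record, fill empty fields from duplicates.
    merged = {}
    for nid in sorted(nodes):
        target = merged.setdefault(rep[nid], dict(nodes[rep[nid]]))
        for key, value in nodes[nid].items():
            if not target.get(key) and value:
                target[key] = value

    # 5. Redirect edges to representatives; drop self-loops and parallel duplicates.
    new_edges = sorted({(rep[s], r, rep[t]) for s, r, t in edges if rep[s] != rep[t]})
    return merged, new_edges, rep


if __name__ == "__main__":
    # Toy, hypothetical data: pub:1 and pub:2 are near-identical records.
    nodes = {
        "pub:1": {"name": "GDup: De-duplication of Scholarly Communication Big Graphs", "year": None},
        "pub:2": {"name": "GDup: de-duplication of scholarly communication big graphs.", "year": 2018},
        "pub:3": {"name": "OpenAIRE LOD Services", "year": 2016},
        "org:1": {"name": "ISTI-CNR", "year": None},
    }
    edges = [
        ("pub:1", "affiliatedWith", "org:1"),
        ("pub:2", "affiliatedWith", "org:1"),
        ("pub:2", "cites", "pub:3"),
        ("pub:1", "isVersionOf", "pub:2"),
    ]
    merged, new_edges, rep = deduplicate(nodes, edges)
    print(rep["pub:2"])  # pub:1
    print(merged["pub:1"]["year"])  # 2018
    print(new_edges)  # [('pub:1', 'affiliatedWith', 'org:1'), ('pub:1', 'cites', 'pub:3')]
```

Blocking keeps the number of pairwise comparisons manageable, at the cost that duplicates falling into different blocks are never compared. Union-find makes the duplicate relation transitive, so each connected set of matches collapses to one node. A system operating at OpenAIRE scale would distribute these steps, which this sketch does not attempt.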
Pages: 372-373
Page count: 2
Related papers
24 in total
  • [1] De-duplicating a large crowd-sourced catalogue of bibliographic records
    Subasic, Ilija
    Gvozdenovic, Nebojsa
    Jack, Kris
    Program: Electronic Library and Information Systems, 2016, 50(2): 138-156
  • [2] OpenAIRE LOD Services: Scholarly Communication Data as Linked Data
    Alexiou, Giorgos
    Vahdati, Sahar
    Lange, Christoph
    Papastefanatos, George
    Lohmann, Steffen
    Semantics, Analytics, Visualization: Enhancing Scholarly Data (SAVE-SD 2016), 2016, 9792: 45-50
  • [3] APPCOMMUNE: Automated Third-Party Libraries De-duplicating and Updating for Android Apps
    Li, Bodong
    Zhang, Yuanyuan
    Li, Juanru
    Feng, Runhan
    Gu, Dawu
    2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019: 344-354
  • [4] Considerations for conducting systematic reviews: evaluating the performance of different methods for de-duplicating references
    McKeown, Sandra
    Mir, Zuhaib M.
    Systematic Reviews, 2021, 10(1)
  • [5] GDup: De-duplication of Scholarly Communication Big Graphs
    Atzori, Claudio
    Manghi, Paolo
    Bardi, Alessia
    2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT), 2018: 142-151
  • [6] De-duplicating patient records from three independent data sources reveals the incidence of rare neuromuscular disorders in Germany
    König, Kirsten
    Pechmann, Astrid
    Thiele, Simone
    Walter, Maggie C.
    Schorling, David
    Tassoni, Adrian
    Lochmüller, Hanns
    Müller-Reible, Clemens
    Kirschner, Janbernd
    Orphanet Journal of Rare Diseases, 2019, 14(1)
  • [7] Entity deduplication in big data graphs for scholarly communication
    Manghi, Paolo
    Atzori, Claudio
    De Bonis, Michele
    Bardi, Alessia
    Data Technologies and Applications, 2020, 54(4): 409-435
  • [8] Research Beyond Scholarly Communication: The Big Challenge of Scientometrics 2.0
    Glänzel, Wolfgang
    Chi, Pei-Shan
    17th International Conference on Scientometrics & Informetrics (ISSI2019), Vol. I, 2019: 424-436