Flexible data integration and curation using a graph-based approach

Cited by: 6
Authors
Croset, Samuel [1]
Rupp, Joachim [1]
Romacker, Martin [1]
Affiliations
[1] F Hoffmann La Roche & Cie AG, Roche Innovat Ctr Basel, CH-4070 Basel, Switzerland
DOI
10.1093/bioinformatics/btv644
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The increasing diversity of data available to the biomedical scientist holds promise for a better understanding of diseases and the discovery of new treatments for patients. To provide a complete picture of a biomedical question, data from many different origins need to be combined into a unified representation. During this data integration process, inevitable errors and ambiguities present in the initial sources compromise the quality of the resulting data warehouse and greatly diminish the scientific value of the content. Expensive and time-consuming manual curation is then required to improve the quality of the information. However, it becomes increasingly difficult to dedicate and optimize resources for data integration projects, as available repositories are growing both in size and in number every day.
Results: We present a new generic methodology to identify problematic records, which cause what we describe as 'data hairball' structures. The approach is graph-based and relies on two metrics traditionally used in the social sciences: graph density and betweenness centrality. We evaluate and discuss these measures and show their relevance for flexible, optimized and automated data curation and linkage. The methodology focuses on information coherence and correctness to improve the scientific meaningfulness of data integration endeavors, such as knowledge bases and large data warehouses.
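The two metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the record names (a1, b1, x, and so on), the toy edge list, and the idea of simply ranking nodes by score are assumptions made for illustration. Graph density is computed as 2m / (n(n-1)) for an undirected simple graph, and betweenness centrality uses Brandes' algorithm for unweighted graphs.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def graph_density(adj):
    """Density of an undirected simple graph: 2m / (n(n-1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical toy data: two tight clusters of records (a*, b*) that are
# linked only through one ambiguous record "x".
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "x"), ("x", "b1"),
]
adj = build_adjacency(edges)
density = graph_density(adj)
bc = betweenness_centrality(adj)
suspect = max(bc, key=bc.get)  # node lying on the most shortest paths
```

In this toy graph the bridging record `x` receives the highest betweenness score, since every shortest path between the two clusters passes through it. Ranking nodes this way is one plausible way to shortlist records for manual review; any threshold on either metric would have to be chosen for the data at hand.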
Pages: 918-925
Page count: 8