Flexible data integration and curation using a graph-based approach

Cited by: 6
Authors
Croset, Samuel [1]
Rupp, Joachim [1]
Romacker, Martin [1]
Affiliation
[1] F Hoffmann La Roche & Cie AG, Roche Innovat Ctr Basel, CH-4070 Basel, Switzerland
DOI
10.1093/bioinformatics/btv644
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The increasing diversity of data available to the biomedical scientist holds promise for a better understanding of diseases and the discovery of new treatments for patients. To provide a complete picture of a biomedical question, data from many different origins need to be combined into a unified representation. During this data integration process, inevitable errors and ambiguities present in the initial sources compromise the quality of the resulting data warehouse and greatly diminish the scientific value of the content. Expensive and time-consuming manual curation is then required to improve the quality of the information. However, it becomes increasingly difficult to dedicate and optimize the resources for data integration projects as available repositories grow in both size and number every day.
Results: We present a new generic methodology to identify problematic records that cause what we describe as 'data hairball' structures. The approach is graph-based and relies on two metrics traditionally used in the social sciences: graph density and betweenness centrality. We evaluate and discuss these measures and show their relevance for flexible, optimized and automated data curation and linkage. The methodology focuses on information coherence and correctness to improve the scientific meaningfulness of data integration endeavors, such as knowledge bases and large data warehouses.
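The two metrics named in the abstract can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the record names, the edge list and the helper functions (`build_graph`, `density`, `betweenness`) are hypothetical. It assumes an undirected, unweighted graph in which nodes are records and edges are cross-references, and it uses Brandes' algorithm for unnormalized betweenness centrality.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Undirected graph density: 2E / (N * (N - 1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    edge_count = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * edge_count / (n * (n - 1))


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, BFS version)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical example: two tight clusters of records joined only by record "X".
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "X"), ("X", "b1"),
]
graph = build_graph(edges)
scores = betweenness(graph)
suspect = max(scores, key=scores.get)
print(f"density = {density(graph):.3f}")
print(f"highest betweenness: {suspect} ({scores[suspect]})")
```

In this toy graph the bridging record "X" lies on every shortest path between the two clusters, so it receives the highest betweenness score. Under the assumptions above, such a record would be a natural first candidate for manual review.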
Pages: 918-925
Page count: 8