The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping

Cited by: 2
Authors
Ashish, Naveen [1]
Dewan, Peehoo [1]
Toga, Arthur W. [1]
Affiliation
[1] Univ So Calif, Keck Sch Med, Lab Neuro Imaging, Stevens Neuroimaging & Informat Inst, Los Angeles, CA 90033 USA
Source
Funding
National Institutes of Health (USA);
Keywords
data mapping; machine learning; active learning; data harmonization; common data model; UNIFORM DATA SET;
DOI
10.3389/fninf.2015.00030
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
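As a rough illustration of the similarity step mentioned in the abstract, the sketch below ranks candidate target elements for each source element by comparing their data-dictionary descriptions. It is not GEM's actual method: the abstract only says that unsupervised text mining is used, and TF-IDF with cosine similarity is swapped in here as an assumed stand-in. The function names and the element names in the usage example (`PTGENDER`, `MMSCORE`, `SEX`, `MMSE`, `EDUC`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so that no weight is zero or undefined.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_mappings(source, target, top_k=1):
    """Rank target elements for each source element by description similarity.

    source, target: dicts mapping element name -> description text
    (hypothetical stand-ins for data-dictionary entries).
    """
    src_names, tgt_names = list(source), list(target)
    vecs = tfidf_vectors(
        [source[s] for s in src_names] + [target[t] for t in tgt_names]
    )
    src_vecs, tgt_vecs = vecs[: len(src_names)], vecs[len(src_names):]
    suggestions = {}
    for name, sv in zip(src_names, src_vecs):
        ranked = sorted(
            ((cosine(sv, tv), t) for t, tv in zip(tgt_names, tgt_vecs)),
            reverse=True,
        )
        suggestions[name] = [(t, score) for score, t in ranked[:top_k]]
    return suggestions


if __name__ == "__main__":
    # Hypothetical data-dictionary entries, for illustration only.
    source = {
        "PTGENDER": "Participant gender",
        "MMSCORE": "Mini mental state examination total score",
    }
    target = {
        "SEX": "Gender of the subject",
        "MMSE": "Total score on the mini mental state examination",
        "EDUC": "Years of education",
    }
    for name, candidates in suggest_mappings(source, target).items():
        print(name, "->", candidates)
```

In a pipeline like the one the abstract outlines, scores of this kind would plausibly feed a classifier as features, and an active-learning loop would ask the user to label the pairs the classifier is least certain about; neither step is shown here.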
Pages: 1 / 10
Number of pages: 10
Related papers
50 items in total
  • [31] An active learning system for mining time-changing data streams
    Huang, Shucheng
    Dong, Yisheng
    INTELLIGENT DATA ANALYSIS, 2007, 11 (04) : 401 - 419
  • [32] Intelligent Medical Data Storage System Using Machine Learning Approach
    Saranya, M. S.
    Selvi, M.
    Ganapathy, S.
    Muthurajkumar, S.
    Ramesh, L. Sai
    Kannan, A.
    2016 EIGHTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2017, : 191 - 195
  • [33] Deep Learning Assisted Medical Insurance Data Analytics With Multimedia System
    Zhang, Cheng
    Vinodhini, B.
    Muthu, Bala Anand
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2023, 8 (02): : 69 - 80
  • [34] MKGB: A Medical Knowledge Graph Construction Framework Based on Data Lake and Active Learning
    Ren, Peng
    Hou, Wei
    Sheng, Ming
    Li, Xin
    Li, Chao
    Zhang, Yong
    HEALTH INFORMATION SCIENCE, HIS 2021, 2021, 13079 : 245 - 253
  • [35] Combining an Expert-Based Medical Entity Recognizer to a Machine-Learning System: Methods and a Case Study
    Zweigenbaum, Pierre
    Lavergne, Thomas
    Grabar, Natalia
    Hamon, Thierry
    Rosset, Sophie
    Grouin, Cyril
    BIOMEDICAL INFORMATICS INSIGHTS, 2013, 6 : 51 - 62
  • [36] Deep Active Learning Framework for Lymph Node Metastasis Prediction in Medical Support System
    Zhuang, Qinghe
    Dai, Zhehao
    Wu, Jia
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [37] Development and evaluation of RapTAT: A machine learning system for concept mapping of phrases from medical narratives
    Gobbel, Glenn T.
    Reeves, Ruth
    Jayaramaraja, Shrimalini
    Giuse, Dario
    Speroff, Theodore
    Brown, Steven H.
    Elkin, Peter L.
    Matheny, Michael E.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 48 : 54 - 65
  • [38] Development and evaluation of RapTAT: A machine learning system for concept mapping of phrases from medical narratives
    Gobbel, G.T. (glenn.t.gobbel@vanderbilt.edu), 1600, Academic Press Inc. (48):
  • [39] Active Balancing Mechanism for Imbalanced Medical Data in Deep Learning-Based Classification Models
    Zhang, Hongyi
    Zhang, Haoke
    Pirbhulal, Sandeep
    Wu, Wanqing
    De Albuquerque, Victor Hugo C.
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (01)
  • [40] Power System Online Stability Assessment Using Active Learning and Synchrophasor Data
    Malbasa, Vuk
    Zheng, Ce
    Kezunovic, Mladen
    2013 IEEE GRENOBLE POWERTECH (POWERTECH), 2013,