The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping

Cited by: 2
Authors
Ashish, Naveen [1]
Dewan, Peehoo [1]
Toga, Arthur W. [1]
Affiliations
[1] Univ So Calif, Keck Sch Med, Lab Neuro Imaging, Stevens Neuroimaging & Informat Inst, Los Angeles, CA 90033 USA
Funding
US National Institutes of Health
Keywords
data mapping; machine learning; active learning; data harmonization; common data model; Uniform Data Set
DOI
10.3389/fninf.2015.00030
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
This work focuses on mapping biomedical datasets to a common representation, an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. GEM automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements found in associated dataset documentation, such as the data dictionaries that typically accompany biomedical datasets. It employs unsupervised text mining techniques to determine the similarity between data elements, and machine-learning classifiers to identify element matches. It further provides an active-learning capability that optimizes the process of training the system. Our experimental evaluations show that GEM provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing GEM to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets than other state-of-the-art database schema matching tools with similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
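The general idea in the abstract (score the similarity of data-dictionary descriptions, then ask the user about the least certain mapping) can be illustrated with a minimal sketch. This is not GEM's implementation: TF-IDF cosine similarity stands in for the unspecified text mining techniques, a smallest-margin rule stands in for the active-learning query strategy, the classifier stage is omitted, and all function names, element names, and descriptions below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def suggest_mappings(source, target):
    """Rank target elements for each source element by description similarity.

    source, target: dicts mapping element name -> data-dictionary description.
    Returns {source_name: [(target_name, score), ...]} sorted by score.
    """
    s_names, t_names = list(source), list(target)
    vecs = tfidf_vectors([source[n] for n in s_names] +
                         [target[n] for n in t_names])
    s_vecs, t_vecs = vecs[:len(s_names)], vecs[len(s_names):]
    return {
        name: sorted(((t, cosine(u, v)) for t, v in zip(t_names, t_vecs)),
                     key=lambda pair: pair[1], reverse=True)
        for name, u in zip(s_names, s_vecs)
    }


def most_uncertain(suggestions):
    """Pick the source element whose top two candidates are closest in score.

    Assumed query strategy (margin-based uncertainty sampling): this is the
    element a user would be asked to label next.
    """
    def margin(ranked):
        if len(ranked) < 2:
            return ranked[0][1]
        return ranked[0][1] - ranked[1][1]
    return min(suggestions, key=lambda n: margin(suggestions[n]))


# Hypothetical data-dictionary entries, for illustration only.
source = {
    "PTGENDER": "participant gender male or female",
    "AGE": "age of participant at baseline visit in years",
    "MMSE": "mini mental state examination total score",
}
target = {
    "sex": "subject sex male female",
    "age_years": "subject age in years",
    "mmse_total": "mini mental state exam total score",
    "education": "years of education",
}

suggestions = suggest_mappings(source, target)
for name, ranked in suggestions.items():
    print(name, "->", ranked[0][0], round(ranked[0][1], 3))
print("ask the user about:", most_uncertain(suggestions))
```

In a fuller pipeline, the user's answer to each query would be added to the labeled set used to retrain a match classifier, which is the part this sketch leaves out.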
Pages: 1 / 10
Page count: 10