Amalur: The Convergence of Data Integration and Machine Learning

Times cited: 0
Authors
Li, Ziyu [1]
Sun, Wenbo [1]
Zhan, Danning [1]
Kang, Yan [2]
Chen, Lydia [3,4]
Bozzon, Alessandro [1]
Hai, Rihan [1]
Affiliations
[1] Delft University of Technology, Department of Software Technology, NL-2628 CD Delft, Netherlands
[2] WeBank, Shenzhen 518052, People's Republic of China
[3] University of Neuchâtel, Department of Computer Science, CH-2000 Neuchâtel, Switzerland
[4] Delft University of Technology, NL-2628 CD Delft, Netherlands
Funding
Dutch Research Council (NWO)
Keywords
Metadata; Data integration; Training; Federated learning; Data privacy; Soft sensors; Training data; Machine learning
DOI
10.1109/TKDE.2024.3357389
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data that reside in different sources demands substantial manual work and computational resources. Under data privacy constraints, data often cannot leave the premises of its silo, so model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern machine learning systems. We explore how metadata obtained from data integration processes can be used to improve the effectiveness, efficiency, and privacy of ML models. To this end, we analyze ML training and inference over data silos. Bringing data integration and machine learning together, we highlight new research opportunities in systems, representations, factorized learning, and federated learning.
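The abstract names factorized learning as one way DI metadata can make training over silos more efficient. As a minimal sketch of that general idea (not the authors' implementation), assume two hypothetical silos: a base table S and a dimension table R, linked by a key-foreign-key join whose row mapping fk was recorded during data integration. That mapping acts as an indicator matrix K with K[i][fk[i]] = 1, so the integrated table is T = [S | K R], and a linear model's predictions T w can be computed as S w_s + K (R w_r) without ever materializing T. All function and variable names below are illustrative assumptions, and plain Python lists stand in for a real linear-algebra library.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def materialize_join(s: Matrix, r: Matrix, fk: List[int]) -> Matrix:
    """Build the integrated table T = [S | K R] row by row.

    Hypothetical setup: fk[i] is the row of R that row i of S joins with,
    i.e. the nonzero column of row i of the indicator matrix K.
    """
    return [s_row + r[fk[i]] for i, s_row in enumerate(s)]


def predict_materialized(t: Matrix, w: Vector) -> Vector:
    """Linear-model predictions T w over the materialized table."""
    return [dot(row, w) for row in t]


def predict_factorized(s: Matrix, r: Matrix, fk: List[int], w: Vector) -> Vector:
    """Compute T w = S w_s + K (R w_r) without materializing T."""
    d_s = len(s[0])
    w_s, w_r = w[:d_s], w[d_s:]
    # Each R row's partial score is computed once, however many S rows reference it.
    r_scores = [dot(r_row, w_r) for r_row in r]
    # Multiplying by K reduces to a gather along the foreign-key mapping.
    return [dot(s_row, w_s) + r_scores[fk[i]] for i, s_row in enumerate(s)]


if __name__ == "__main__":
    # Hypothetical silos: S has 4 rows x 2 features, R has 2 rows x 3 features.
    s = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0], [2.0, 2.0]]
    r = [[1.0, 0.0, 2.0], [0.5, 1.0, 1.0]]
    fk = [0, 1, 1, 0]
    w = [0.5, -1.0, 2.0, 1.0, 0.25]

    print(predict_materialized(materialize_join(s, r, fk), w))  # [1.0, 1.25, 2.75, 1.5]
    print(predict_factorized(s, r, fk, w))                       # [1.0, 1.25, 2.75, 1.5]
```

Both paths return the same predictions; the factorized one avoids building T, and the saving grows with the number of S rows that share each R row.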
Pages: 7353-7367
Page count: 15