Amalur: The Convergence of Data Integration and Machine Learning

Times cited: 0
Authors
Li, Ziyu [1]
Sun, Wenbo [1]
Zhan, Danning [1]
Kang, Yan [2]
Chen, Lydia [3, 4]
Bozzon, Alessandro [1]
Hai, Rihan [1]
Affiliations
[1] Delft Univ Technol, Dept Software Technol, NL-2628 CD Delft, Netherlands
[2] WeBank, Shenzhen 518052, Peoples R China
[3] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
[4] Delft Univ Technol, NL-2628 CD Delft, Netherlands
Funding
Dutch Research Council
Keywords
Metadata; Data integration; Training; Federated learning; Data privacy; Soft sensors; Training data; Machine learning
DOI
10.1109/TKDE.2024.3357389
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data that resides in different sources demands substantial manual work and computational resources. Under data privacy constraints, data often cannot leave the premises of data silos; hence, model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern ML systems. We explore how metadata obtained from DI processes can be used to improve the effectiveness, efficiency, and privacy of ML models. To this end, we analyze ML training and inference over data silos. Bringing DI and ML together, we highlight new research opportunities in systems, representations, factorized learning, and federated learning.
Pages: 7353-7367
Page count: 15
Related papers
50 in total
  • [31] Survey on the Convergence of Machine Learning and Blockchain
    Ding, Shengwen
    Hu, Chenhui
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 170 - 189
  • [32] Machine Learning Robustness, Fairness, and their Convergence
    Lee, Jae-Gil
    Roh, Yuji
    Song, Hwanjun
    Whang, Steven Euijong
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 4046 - 4047
  • [33] Integration of Machine Learning and Optimization for Robot Learning
    Mosavi, Amir
    Varkonyi-Koczy, Annamaria R.
    RECENT GLOBAL RESEARCH AND EDUCATION: TECHNOLOGICAL CHALLENGES, 2017, 519 : 349 - 355
  • [34] Using Unsupervised Machine Learning for Data Quality. Application to Financial Governmental Data Integration
    Necba, Hanae
    Rhanoui, Maryem
    El Asri, Bouchra
    BIG DATA, CLOUD AND APPLICATIONS, BDCA 2018, 2018, 872 : 197 - 209
  • [35] Integration of machine learning and data analysis for the SAGD production performance with infill wells
    Huang, Ziteng
    Yang, Min
    Chen, Zhangxin
    CANADIAN JOURNAL OF CHEMICAL ENGINEERING, 2023, 101 (12): : 6928 - 6943
  • [36] Integration of multimodal imaging data with machine learning for improved diagnosis and prognosis in neuroimaging
    Bhattacharya, Saurabh
    Prusty, Sashikanta
    Pande, Sanjay P.
    Gulhane, Monali
    Lavate, Santosh H.
    Rakesh, Nitin
    Veerasamy, Saravanan
    FRONTIERS IN HUMAN NEUROSCIENCE, 2025, 19
  • [37] Machine Learning for multi-omics data integration and variant pathogenicity estimation
    Li, Shuang
    van der Velde, K. Joeri
    Swertz, Morris A.
    2018 IEEE 14TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE 2018), 2018, : 301 - 301
  • [38] Machine Learning for Earnings Prediction: A Nonlinear Tensor Approach for Data Integration and Completion
    Uddin, Ajim
    Tao, Xinyuan
    Chou, Chia-Ching
    Yu, Dantong
    3RD ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2022, 2022, : 282 - 290
  • [39] Advancing post-genome data and system integration through machine learning
    Azuaje, F
    COMPARATIVE AND FUNCTIONAL GENOMICS, 2002, 3 (01): : 28 - 31
  • [40] Advancing Clinical Psychiatry: Integration of Clinical and Omics Data Using Machine Learning
    Qi, Bill
    Trakadis, Yannis J.
    BIOLOGICAL PSYCHIATRY, 2023, 94 (12) : 908 - 909