Amalur: The Convergence of Data Integration and Machine Learning

Times cited: 0
Authors
Li, Ziyu [1]
Sun, Wenbo [1]
Zhan, Danning [1]
Kang, Yan [2]
Chen, Lydia [3, 4]
Bozzon, Alessandro [1]
Hai, Rihan [1]
Affiliations
[1] Delft Univ Technol, Dept Software Technol, NL-2628 CD Delft, Netherlands
[2] WeBank, Shenzhen 518052, Peoples R China
[3] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
[4] Delft Univ Technol, NL-2628 CD Delft, Netherlands
Funding
Dutch Research Council
Keywords
Metadata; Data integration; Training; Federated learning; Data privacy; Soft sensors; Training data; Machine learning
DOI
10.1109/TKDE.2024.3357389
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data that resides in different sources demands a lot of manual work and computational resources. Under data privacy constraints, data often cannot leave the premises of data silos; hence, model training should proceed in a decentralized manner. In this work, we present a vision of bridging traditional data integration (DI) techniques with the requirements of modern machine learning systems. We explore the possibilities of utilizing metadata obtained from data integration processes to improve the effectiveness, efficiency, and privacy of ML models. To this end, we analyze ML training and inference over data silos. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning, and federated learning.
Pages: 7353-7367
Number of pages: 15
Related papers
50 in total
  • [41] Integration of whole genome data with machine learning technology in breast cancer subtyping
    Majer-Burman, W.
    Pacewicz, K.
    Meler, M.
    Gniot, M.
    Sielski, D.
    Piernik, M.
    Sztromwasser, P.
    Wozna, A.
    Zawadzki, P.
    ANNALS OF ONCOLOGY, 2022, 33 : S130 - S130
  • [42] A review on machine learning principles for multi-view biological data integration
    Li, Yifeng
    Wu, Fang-Xiang
    Ngom, Alioune
    BRIEFINGS IN BIOINFORMATICS, 2018, 19 (02) : 325 - 340
  • [43] Improvement of an Online Education Model with the Integration of Machine Learning and Data Analysis in an LMS
    Villegas-Ch, William
    Roman-Canizares, Milton
    Palacios-Pacheco, Xavier
    APPLIED SCIENCES-BASEL, 2020, 10 (15)
  • [44] Integration of Machine Learning and Open Access Geospatial Data for Land Cover Mapping
    Mardani, Mohammad
    Mardani, Hossein
    De Simone, Lorenzo
    Varas, Samuel
    Kita, Naoki
    Saito, Takafumi
    REMOTE SENSING, 2019, 11 (16)
  • [45] Machine Learning for Intelligent Bioinformatics - Part 1 Machine Learning Integration
    Hamdi-Cherif, Aboubekeur
    PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING AND DATA BASES, 2010, : 315 - +
  • [46] Machine Learning approaches for convergence of IoT and Blockchain
    Singh, Krishna Kant
    Balamurugan, B.
    Chilamkurti, Naveen
    Kshatriya, Bharat S. Rawal
    OPEN COMPUTER SCIENCE, 2020, 10 (01) : 459 - 460
  • [47] The bounds on the rate of uniform convergence for learning machine
    Zou, B
    Li, LQ
    Xu, J
    ADVANCES IN NEURAL NETWORKS - ISNN 2005, PT 1, PROCEEDINGS, 2005, 3496 : 538 - 545
  • [48] IoT convergence with machine learning & blockchain: A review
    Fazel, Elham
    Nezhad, Mahmoud Zahedian
    Rezazadeh, Javad
    Moradi, Marjan
    Ayoade, John
    INTERNET OF THINGS, 2024, 26
  • [49] Machine Learning: A Convergence of Emerging Technologies in Computing
    Kiadi, Morteza
    Tan, Qing
    INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723 : 181 - 192
  • [50] Integration of Machine Learning with Quantum Annealing
    Salloum, Hadi
    Aldaghstany, Hamza Shafee
    Orabi, Osama
    Haidar, Ahmad
    Bahrami, Mohammad Reza
    Mazzara, Manuel
    ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOL 3, AINA 2024, 2024, 201 : 338 - 348