A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning

Cited by: 3
Authors
Koutsovasilis, Panos [1]
Venugopal, Srikumar [1]
Gkoufas, Yiannis [1]
Pinto, Christian [1]
Affiliation
[1] IBM Res Europe Dublin, Dublin, Ireland
DOI
10.1109/CLOUD53861.2021.00084
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud providers offer a variety of storage solutions for hosting data, differing in both price and performance. For analytics and machine learning applications, object storage services are the go-to solution for hosting datasets that exceed tens of gigabytes in size. However, this choice degrades the performance of these applications and requires extra engineering effort, in the form of code changes, to access the data on remote storage. In this paper, we present a generic end-to-end solution that offers seamless data access to remote object storage services, transparent data caching within the compute infrastructure, and data-aware topologies that boost the performance of applications deployed in Kubernetes. We introduce a custom-implemented cache mechanism that supports all of these requirements, and we demonstrate that our holistic solution leads to up to 48% improvement for a Spark implementation of the TPC-DS benchmark and up to 191% improvement for the training of deep learning models from the MLPerf benchmark suite.
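The abstract describes transparent caching of remote object-store data within the compute infrastructure but does not detail the cache design. As a rough illustration only, the Python sketch below shows a generic read-through cache with least-recently-used (LRU) eviction: the application keeps calling `get()` unchanged, and only cache misses reach the remote store. This is not the authors' cache mechanism; `ObjectStore`, `InMemoryStore`, `ReadThroughCache`, and `capacity_bytes` are hypothetical names, and the in-memory store merely stands in for a remote bucket.

```python
from collections import OrderedDict
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical interface for a remote object store (e.g. an S3-compatible bucket)."""

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Hypothetical stand-in for a remote store; counts how often it is contacted."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects
        self.remote_reads = 0

    def get(self, key: str) -> bytes:
        self.remote_reads += 1
        return self.objects[key]


class ReadThroughCache:
    """Serves reads from local memory and fetches from the store only on a miss.

    Evicts least-recently-used objects once cached bytes exceed capacity_bytes.
    """

    def __init__(self, store: ObjectStore, capacity_bytes: int) -> None:
        self.store = store
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.store.get(key)  # remote fetch on a miss
        if len(data) <= self.capacity_bytes:  # skip objects larger than the cache
            self.entries[key] = data
            self.used_bytes += len(data)
            while self.used_bytes > self.capacity_bytes:
                _, evicted = self.entries.popitem(last=False)  # drop the LRU entry
                self.used_bytes -= len(evicted)
        return data


if __name__ == "__main__":
    # Three "epochs" over the same two dataset shards.
    store = InMemoryStore({"train/part-0": b"x" * 40, "train/part-1": b"y" * 40})
    cache = ReadThroughCache(store, capacity_bytes=100)
    for epoch in range(3):
        for key in ("train/part-0", "train/part-1"):
            cache.get(key)
    print(store.remote_reads, cache.hits, cache.misses)  # 2 4 2
```

Repeated passes over the same data, as in multi-epoch training, are where such a cache pays off: in the usage example only the first epoch touches the remote store.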
Pages: 654–659
Number of pages: 6