A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning

Cited by: 3
Authors
Koutsovasilis, Panos [1]
Venugopal, Srikumar [1]
Gkoufas, Yiannis [1]
Pinto, Christian [1]
Affiliations
[1] IBM Res Europe Dublin, Dublin, Ireland
DOI
10.1109/CLOUD53861.2021.00084
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cloud providers offer a variety of storage solutions for hosting data that differ in both price and performance. For analytics and machine learning applications, object storage services are the go-to solution for hosting datasets that exceed tens of gigabytes in size. However, this choice degrades the performance of these applications and requires extra engineering effort, in the form of code changes, to access the data on remote storage. In this paper, we present a generic end-to-end solution that offers seamless data access for remote object storage services, transparent data caching within the compute infrastructure, and data-aware topologies that boost the performance of applications deployed in Kubernetes. We introduce a custom-implemented cache mechanism that supports all of these requirements, and we demonstrate that our holistic solution yields up to a 48% improvement for a Spark implementation of the TPC-DS benchmark and up to a 191% improvement for the training of deep learning models from the MLPerf benchmark suite.
Pages: 654-659
Page count: 6