A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning

Cited by: 3
Authors
Koutsovasilis, Panos [1]
Venugopal, Srikumar [1]
Gkoufas, Yiannis [1]
Pinto, Christian [1]
Affiliation
[1] IBM Research Europe, Dublin, Ireland
DOI
10.1109/CLOUD53861.2021.00084
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud providers offer a variety of storage solutions for hosting data, varying in both price and performance. For analytics and machine learning applications, object storage services are the go-to solution for hosting datasets that exceed tens of gigabytes in size. However, this choice degrades the performance of these applications and requires extra engineering effort, in the form of code changes, to access the data on remote storage. In this paper, we present a generic end-to-end solution that offers seamless data access for remote object storage services, transparent data caching within the compute infrastructure, and data-aware topologies that boost the performance of applications deployed in Kubernetes. We introduce a custom-implemented cache mechanism that supports all of the above requirements, and we demonstrate that our holistic solution yields up to a 48% improvement for the Spark implementation of the TPC-DS benchmark and up to a 191% improvement for the training of deep learning models from the MLPerf benchmark suite.
Pages: 654-659
Page count: 6