A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning

Cited by: 3
Authors
Koutsovasilis, Panos [1]
Venugopal, Srikumar [1]
Gkoufas, Yiannis [1]
Pinto, Christian [1]
Affiliation
[1] IBM Research Europe, Dublin, Ireland
DOI
10.1109/CLOUD53861.2021.00084
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cloud providers offer a variety of storage solutions for hosting data, varying in both price and performance. For analytics and machine learning applications, object storage services are the go-to solution for hosting datasets that exceed tens of gigabytes in size. However, this choice degrades the performance of these applications and requires extra engineering effort, in the form of code changes, to access the data on remote storage. In this paper, we present a generic end-to-end solution that offers seamless data access to remote object storage services, transparent data caching within the compute infrastructure, and data-aware topologies that boost the performance of applications deployed in Kubernetes. We introduce a custom-implemented cache mechanism that supports all of these requirements, and we demonstrate that our holistic solution leads to up to 48% improvement for a Spark implementation of the TPC-DS benchmark and up to 191% improvement for the training of deep learning models from the MLPerf benchmark suite.
Pages: 654-659
Number of pages: 6