Datashim and Its Applications in Bioinformatics

Cited by: 1
Authors
Gkoufas, Yiannis [1]
Yuan, David Yu [2]
Pinto, Christian [1]
Koutsovasilis, Panagiotis [1]
Venugopal, Srikumar [1]
Affiliations
[1] IBM Research, Dublin, Ireland
[2] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Technology and Science Integration, Cambridge, England
Source
High Performance Computing - ISC High Performance Digital 2021 International Workshops, 2021, Vol. 12761
Funding
European Union Horizon 2020
Keywords
Datashim; Kubeflow; Kubernetes; Bioinformatics
DOI
10.1007/978-3-030-90539-2_28
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bioinformatics pipelines depend on shared POSIX file systems for their input, output and intermediate data storage. Containerization makes it more difficult for workloads to access these shared file systems. In our previous study, we successfully ran both ML and non-ML pipelines on Kubeflow; however, the storage solutions were complex and suboptimal. In this article, we introduce the concept of a Dataset and a corresponding resource as a native Kubernetes object. We have implemented the concept in a new framework, Datashim, which handles all low-level details of data access in Kubernetes pods. Its pluggable architecture is designed for the development of caching, scheduling and governance plugins, which together manage the entire lifecycle of the Dataset custom resource. We use Datashim to serve data from object stores to both ML and non-ML pipelines on Kubeflow. We feed training data into ML models directly through Datashim instead of downloading it to local disks, which makes the input scalable. We have improved the durability of training metadata by storing it in a dataset, which also simplifies setting up TensorBoard independently of the notebook server. For the non-ML pipeline, we have simplified the 1000 Genomes Project pipeline by injecting datasets into it dynamically. We have thus established a new resource type, Dataset, to represent data sources on Kubernetes, with our novel framework Datashim managing its lifecycle.
Pages: 416-427
Page count: 12
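The abstract describes Dataset as a native Kubernetes object that pods consume without handling data-access details themselves. The sketch below builds such manifests as plain Python dictionaries to make that concrete. It rests on several assumptions taken from our reading of the Datashim project README, not from this paper:

- the `Dataset` kind lives in the API group `com.ie.ibm.hpsys/v1alpha1`, and later releases may use a different group;
- a `spec.local` block with `type: "COS"` describes an S3-compatible bucket;
- pods request datasets through `dataset.<i>.id` and `dataset.<i>.useas` labels.

Verify all of these against the CRD installed in your cluster. The helper names `dataset_manifest` and `pod_with_datasets` are hypothetical, as are the bucket, endpoint and image values. The script only prints JSON and never contacts a cluster.

```python
import json

# Assumed Datashim API group/version (taken from early README examples);
# verify against the Dataset CRD installed in your cluster.
DATASET_API_VERSION = "com.ie.ibm.hpsys/v1alpha1"


def dataset_manifest(name, bucket, endpoint, readonly=True):
    """Build a (hypothetical) Dataset resource for an S3-compatible bucket.

    The spec.local field names follow the Datashim README (assumption).
    Credentials are placeholders, not real keys.
    """
    return {
        "apiVersion": DATASET_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",
                "accessKeyID": "<ACCESS_KEY_ID>",
                "secretAccessKey": "<SECRET_ACCESS_KEY>",
                "endpoint": endpoint,
                "bucket": bucket,
                # The README examples pass boolean-like values as strings.
                "readonly": "true" if readonly else "false",
            }
        },
    }


def pod_with_datasets(pod_name, image, dataset_names):
    """Build a Pod whose labels ask Datashim to mount each named Dataset.

    The dataset.<i>.id / dataset.<i>.useas label scheme is an assumption
    based on the Datashim README; "mount" requests a filesystem mount.
    """
    labels = {}
    for i, ds in enumerate(dataset_names):
        labels[f"dataset.{i}.id"] = ds
        labels[f"dataset.{i}.useas"] = "mount"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": labels},
        "spec": {
            "containers": [{"name": "main", "image": image}],
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    # Hypothetical 1000 Genomes-style step: one read-only input dataset
    # and one writable output dataset, both injected via pod labels.
    endpoint = "https://s3.example.com"
    inputs = dataset_manifest("genomes-input", "genomes-vcf", endpoint)
    outputs = dataset_manifest("genomes-output", "genomes-results", endpoint, readonly=False)
    pod = pod_with_datasets("frequency-step", "python:3.11-slim",
                            ["genomes-input", "genomes-output"])
    # A v1 List lets `kubectl apply -f -` create all three objects at once.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [inputs, outputs, pod]}, indent=2))
```

Under these assumptions, piping the output to `kubectl apply -f -` would create two datasets and a pod that sees them as mounted directories. This loosely mirrors the dynamic dataset injection the abstract describes for the 1000 Genomes pipeline. Keeping the manifests as plain dictionaries avoids depending on the Kubernetes Python client.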