Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Cited by: 1
Authors
Moreno, Marta [1,2]
Vilaça, Ricardo [4,5]
Ferreira, Pedro G. [1,2,3]
Affiliations
[1] Univ Porto, Fac Sci, Dept Comp Sci, Rua Campo Alegre, P-4169007 Porto, Portugal
[2] INESC TEC, Lab Artificial Intelligence & Decis Support, Rua Dr Roberto Frias, P-4200465 Porto, Portugal
[3] Univ Porto, Inst Mol Pathol & Immunol, Inst Res & Innovat Hlth i3s, R Alfredo Allen 208, P-4200135 Porto, Portugal
[4] INESC TEC, High Assurance Software Lab, Rua Dr Roberto Frias, P-4200465 Porto, Portugal
[5] Univ Minho, Minho Adv Comp Ctr, Dept Informat, P-4710070 Braga, Portugal
Keywords
Machine learning; Scalable data science; Gene expression; Transcriptomics; Data analysis; EXPRESSION; CLASSIFICATION; TUBERCULOSIS; PREDICTION; TRENDS;
DOI
10.1186/s12859-022-05065-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science, and specifically machine learning, has many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary.

Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science, including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics.

Results: This review illustrates the role of Dask in boosting data science applications through different case studies. Detailed documentation and code for these procedures are available at https://github.com/martaccmoreno/gexp-ml-dask.

Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
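As a minimal sketch of the chunked, lazy evaluation model that Dask brings to the Python scientific stack, the example below computes per-gene variance on a samples × genes expression matrix and keeps the most variable genes. The synthetic data, matrix shape, chunk sizes, log1p transform and top-100 variance filter are illustrative assumptions, not taken from the paper or its repository; only standard `numpy` and `dask.array` calls are used.

```python
import numpy as np
import dask.array as da

# Synthetic expression matrix: 1,000 samples x 2,000 genes (illustrative data only).
rng = np.random.default_rng(0)
expr = rng.lognormal(mean=1.0, sigma=1.0, size=(1_000, 2_000))

# Wrap it in a Dask array split into 250 x 500 blocks; no computation happens yet.
x = da.from_array(expr, chunks=(250, 500))

# Log-transform and per-gene (column-wise) variance are recorded lazily as a task graph.
log_x = da.log1p(x)
gene_var_lazy = log_x.var(axis=0)

# .compute() runs the graph, processing blocks in parallel with the default scheduler.
gene_var = gene_var_lazy.compute()

# Hypothetical feature-selection step: keep the 100 most variable genes.
top_idx = np.argsort(gene_var)[::-1][:100]
selected = log_x[:, top_idx].compute()
print(selected.shape)  # (1000, 100)
```

With real data the same graph would typically be built from files too large for memory, so that each block is loaded and reduced independently; here `da.from_array` simply partitions an in-memory array to keep the sketch self-contained.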
Pages: 20
Published in: BMC Bioinformatics, vol. 23