Spacemake: processing and analysis of large-scale spatial transcriptomics data

Cited by: 10
Authors
Sztanka-Toth, Tamas Ryszard [1 ,2 ]
Jens, Marvin [1 ]
Karaiskos, Nikos [1 ]
Rajewsky, Nikolaus [1 ,2 ,3 ,4 ]
Affiliations
[1] Max Delbruck Ctr Mol Med, Helmholtz Assoc MDC, Berlin Inst Med Syst Biol BIMSB, Syst Biol Gene Regulatory Elements, D-10115 Berlin, Germany
[2] Humboldt Univ, Inst Biol, D-10099 Berlin, Germany
[3] DZHK German Ctr Cardiovasc Res, Partner Site Berlin, D-10117 Berlin, Germany
[4] Univ Med Charite, Dept Pediat Oncol, D-13353 Berlin, Germany
Source
GIGASCIENCE | 2022 / Volume 11
Funding
EU Horizon 2020;
Keywords
bioinformatics; computational biology; computational pipeline; sequence analysis; spatial transcriptomics; single-cell transcriptomics; reproducibility; modularity; scalability; workflow; SEQ; ARCHITECTURE; EXPRESSION;
DOI
10.1093/gigascience/giac064
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Background: Spatial sequencing methods increasingly gain popularity within RNA biology studies. State-of-the-art techniques quantify messenger RNA expression levels from tissue sections and at the same time register information about the original locations of the molecules in the tissue. The resulting data sets are processed and analyzed by accompanying software that, however, is incompatible across inputs from different technologies.
Findings: Here, we present spacemake, a modular, robust, and scalable spatial transcriptomics pipeline built in Snakemake and Python. Spacemake is designed to handle all major spatial transcriptomics data sets and can be readily configured for other technologies. It can process and analyze several samples in parallel, even if they stem from different experimental methods. Spacemake's unified framework enables reproducible data processing from raw sequencing data to automatically generated downstream analysis reports. Spacemake is built with a modular design and offers additional functionality such as sample merging, saturation analysis, and analysis of long reads as separate modules. Moreover, spacemake employs novoSpaRc to integrate spatial and single-cell transcriptomics data, resulting in increased gene counts for the spatial data set. Spacemake is open source and extendable, and it can be seamlessly integrated with existing computational workflows.
Pages: 14