Language-Agnostic Reproducible Data Analysis Using Literate Programming

Times cited: 1
Authors
Vassilev, Boris [1]
Louhimo, Riku [2]
Ikonen, Elina [1,3]
Hautaniemi, Sampsa [2]
Affiliations
[1] Univ Helsinki, Fac Med, Dept Anat, Helsinki, Finland
[2] Univ Helsinki, Genome Scale Biol, Res Programs Unit, Helsinki, Finland
[3] Minerva Fdn, Inst Med Res, Helsinki, Finland
Source
PLOS ONE | 2016 / Vol. 11 / No. 10
Funding
Academy of Finland;
Keywords
BREAST-CANCER; EXPRESSION ANALYSES; TARGET INTERACTIONS; LAPTM4B; INTEGRATION; PREDICTOR; MICRORNAS; RESOURCE; GENES; CELLS;
DOI
10.1371/journal.pone.0164023
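Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract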
A modern biomedical research project can easily contain hundreds of analysis steps, and lack of reproducibility of the analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that in practice reproducibility is not easily achieved. Literate programming is an approach to presenting computer programs to human readers: the code is rearranged to follow the logic of the program, and that logic is explained in natural language. The code executed by the computer is then extracted from the literate source code. As such, literate programming is an ideal formalism for systematizing analysis steps in biomedical research. We have developed the reproducible computing tool Lir (literate, reproducible computing), which allows a tool-agnostic approach to biomedical data analysis. We demonstrate the utility of Lir by applying it to a case study. Our aim was to investigate the role of endosomal trafficking regulators in the progression of breast cancer. In this analysis, a variety of tools were combined to interpret the available data: a relational database, standard command-line tools, and a statistical computing environment. The analysis revealed that the lipid transport-related genes LAPTM4B and NDRG1 are coamplified in breast cancer patients, and identified genes potentially cooperating with LAPTM4B in breast cancer progression. Our case study demonstrates that with Lir, an array of tools can be combined in the same data analysis to improve efficiency, reproducibility, and ease of understanding. Lir is open-source software available at github.com/borisvassilev/lir.
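The extraction step mentioned in the abstract (usually called "tangling" in literate programming) can be illustrated with a small sketch. It assumes a noweb-style chunk syntax chosen purely for illustration: `<<name>>=` opens a named code chunk, a line starting with `@` returns to prose, and a line consisting only of `<<name>>` inside code is replaced by that chunk. The functions `parse_chunks` and `tangle` are hypothetical helpers. This is not Lir's implementation, and it handles only whole-line references.

```python
import re

# Assumed noweb-style syntax (illustrative only, not taken from Lir):
#   <<name>>=   opens a code chunk called "name"
#   @           closes the chunk and returns to prose
#   <<name>>    on its own line inside code: expanded in place
DEF = re.compile(r"^<<(.+)>>=\s*$")
REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def parse_chunks(source):
    """Collect code lines per chunk name; repeated definitions are appended."""
    chunks = {}
    current = None  # None while reading prose (documentation)
    for line in source.splitlines():
        m = DEF.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])
        elif line == "@" or line.startswith("@ "):
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent="", seen=()):
    """Recursively expand chunk `name`, keeping each reference's indentation."""
    if name in seen:
        raise ValueError(f"cyclic chunk reference: {name}")
    out = []
    for line in chunks[name]:  # KeyError if the chunk was never defined
        m = REF.match(line)
        if m:
            out.extend(tangle(chunks, m.group(2), indent + m.group(1), seen + (name,)))
        else:
            out.append(indent + line if line else line)
    return out


# Usage: prose and code are interleaved; chunks may appear in any order.
doc = """Prose explaining the query.
<<main.sh>>=
<<fetch data>>
sort -k2 data.tsv > sorted.tsv
@ More prose.
<<fetch data>>=
sqlite3 db.sqlite < query.sql > data.tsv
@
"""
print("\n".join(tangle(parse_chunks(doc), "main.sh")))
```

The point of the design is that the reader sees chunks in explanatory order while the machine receives them in execution order; established tools such as noweb's `notangle` or Org-mode Babel implement the same idea far more completely.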
Pages: 14
Related papers
  • [1] MultiEmo: Language-Agnostic Sentiment Analysis
    Milkowski, Piotr
    Gruza, Marcin
    Kazienko, Przemyslaw
    Szolomicka, Joanna
    Wozniak, Stanislaw
    Kocon, Jan
    COMPUTATIONAL SCIENCE, ICCS 2022, PT II, 2022, pp. 72-79
  • [2] Language-agnostic Injection Detection
    Hermerschmidt, Lars
    Straub, Andreas
    Piskachev, Goran
    2020 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2020), 2020, pp. 268-275
  • [3] LANGUAGE-AGNOSTIC MULTILINGUAL MODELING
    Datta, Arindrima
    Ramabhadran, Bhuvana
    Emond, Jesse
    Kannan, Anjuli
    Roark, Brian
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, pp. 8239-8243
  • [4] Multi-model Analysis of Language-Agnostic Sentiment Classification on MultiEmo Data
    Milkowski, Piotr
    Gruza, Marcin
    Kazienko, Przemyslaw
    Szolomicka, Joanna
    Wozniak, Stanislaw
    Kocon, Jan
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022, 2022, vol. 13501, pp. 163-175
  • [5] Language-agnostic Topic Classification for Wikipedia
    Johnson, Isaac
    Gerlach, Martin
    Saez-Trumper, Diego
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, pp. 594-601
  • [6] Language-agnostic BERT Sentence Embedding
    Feng, Fangxiaoyu
    Yang, Yinfei
    Cer, Daniel
    Arivazhagan, Naveen
    Wang, Wei
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022, pp. 878-891
  • [7] Inducing Language-Agnostic Multilingual Representations
    Zhao, Wei
    Eger, Steffen
    Bjerva, Johannes
    Augenstein, Isabelle
    10TH CONFERENCE ON LEXICAL AND COMPUTATIONAL SEMANTICS (SEM 2021), 2021, pp. 229-240
  • [8] Language-agnostic speech anger identification
    Saitta, Alessandra
    Ntalampiras, Stavros
    2021 44TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2021, pp. 249-253
  • [9] Advancing Static Code Analysis With Language-Agnostic Component Identification
    Schiewe, Micah
    Curtis, Jacob
    Bushong, Vincent
    Cerny, Tomas
    IEEE Access, 2022, vol. 10, pp. 30743-30761
  • [10] A Multi-Language Computing Environment for Literate Programming and Reproducible Research
    Schulte, Eric
    Davison, Dan
    Dye, Thomas
    Dominik, Carsten
    JOURNAL OF STATISTICAL SOFTWARE, 2012, vol. 46, no. 3, pp. 1-24