PipeVal: light-weight extensible tool for file validation

Cited by: 1
Authors
Patel, Yash [1, 2]
Beshlikyan, Arpi [1, 2]
Jordan, Madison [1, 2]
Kim, Gina [1, 2]
Holmes, Aaron [1, 2]
Yamaguchi, Takafumi N. [1, 2, 3]
Boutros, Paul C. [1, 2, 3, 4, 5, 6]
Affiliations
[1] Univ Calif Los Angeles, Jonsson Comprehens Canc Ctr, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Inst Precis Hlth, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Urol, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Broad Stem Cell Res Ctr, Los Angeles, CA 90095 USA
[6] Univ Calif Los Angeles, Dept Human Genet, 10833 Le Conte Ave, Los Angeles, CA 90095 USA
Funding
US National Institutes of Health
Keywords
FORMAT
DOI
10.1093/bioinformatics/btae079
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: The volume of biomedical data generated each year is growing exponentially as high-throughput molecular, imaging and mHealth technologies expand. This rise in data volume has contributed to an increasing reliance on and demand for computational methods, and consequently to increased attention to software quality and data integrity.

Results: To simplify data verification in diverse data-processing pipelines, we created PipeVal, a light-weight, easy-to-use, extensible tool for file validation. It is open-source, easy to integrate with complex workflows, and modularized for extensibility for new file formats. PipeVal can be rapidly inserted into existing methods and pipelines to automatically validate and verify inputs and outputs. This can reduce wasted compute time attributed to file corruption or invalid file paths, and significantly improve the quality of data-intensive software.

Availability and implementation: PipeVal is an open-source Python package under the GPLv2 license and it is freely available at https://github.com/uclahs-cds/package-PipeVal. The docker image is available at: https://github.com/uclahs-cds/package-PipeVal/pkgs/container/pipeval.
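
Illustrative sketch (not part of the original record): the kind of check the abstract describes, catching an invalid file path or a corrupted file before a pipeline step spends compute on it, might look roughly like the following. This is not PipeVal's API. The function names `sha256_of` and `validate_file` are hypothetical, and the choice of an emptiness check plus a SHA-256 comparison is an assumption made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_file(path, expected_sha256=None):
    """Hypothetical validator: return a list of problems (empty means OK).

    Checks, in order: the path points to a regular file, the file is
    non-empty, and (if a checksum is supplied) the content matches it.
    """
    path = Path(path)
    problems = []

    # Invalid path: nothing else can be checked, so stop here.
    if not path.is_file():
        problems.append(f"{path}: not found or not a regular file")
        return problems

    # A zero-byte file is a common symptom of a failed upstream step.
    if path.stat().st_size == 0:
        problems.append(f"{path}: file is empty")

    # Optional integrity check against a known-good checksum.
    if expected_sha256 is not None:
        actual = sha256_of(path)
        if actual != expected_sha256.lower():
            problems.append(f"{path}: checksum mismatch (got {actual})")

    return problems
```

A pipeline step could call `validate_file` on each input and abort early if the returned list is non-empty, rather than failing midway through a long computation.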
Pages: 4