TRCMGene: A two-step referential compression method for the efficient storage of genetic data

Cited by: 2
Authors
Tang, You [1]
Li, Min [2]
Sun, Jing [3]
Zhang, Tao [2]
Zhang, Jicheng [2]
Zheng, Ping [2]
Affiliations
[1] JiLin Agr Sci & Technol Univ, Elect & Informat Engn Coll, Jilin, Jilin, Peoples R China
[2] Northeast Agr Univ, Coll Elect & Informat, Harbin, Heilongjiang, Peoples R China
[3] Qiqihar Univ, Coll Life Sci & Agr, Qiqihar, Peoples R China
Source
PLOS ONE | 2018 / Volume 13 / Issue 11
DOI
10.1371/journal.pone.0206521
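Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09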
Abstract
Background: The massive quantities of genetic data generated by high-throughput sequencing pose challenges for data storage, transmission and analysis. Data compression addresses these problems by reducing the storage footprint and speeding up transmission. Several options are available for compressing and storing genetic data. However, most of them either do not achieve sufficient compression rates or require considerable time for decompression and loading.

Results: Here, we propose TRCMGene, a lossless genetic data compression method that uses a referential compression scheme. TRCMGene introduces a two-step compression method that builds an index structure using K-means and k-nearest neighbours. Evaluation on several real datasets showed that the compression factor of TRCMGene ranges from 9 to 21. TRCMGene offers a good balance between compression factor and reading time: on average, reading the compressed data takes 60% of the time needed to read the uncompressed data. Thus, TRCMGene saves disc space, reduces file access time and speeds up data loading. Together, these effects improve genetic data storage and transmission in the current hardware environment and render system upgrades unnecessary. TRCMGene, its user manual and demos can be accessed freely at https://github.com/tangyou79/TRCM. The data mentioned in this manuscript can be downloaded from https://github.com/tangyou79/TRCM/wiki.
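To make the idea of referential compression concrete, the following is a minimal Python sketch, not the TRCMGene implementation. It assumes equal-length sequences and stores only the positions where a target differs from a reference; the function names (`ref_encode`, `ref_decode`, `compression_factor`) are hypothetical, and the compression factor is assumed here to mean original size divided by compressed size. The K-means and k-nearest-neighbour indexing step described in the abstract is not shown.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (position, symbol in the target)


def ref_encode(target: str, reference: str) -> List[Diff]:
    """Keep only the positions where target differs from reference.

    Illustrative assumption: both sequences have the same length.
    """
    if len(target) != len(reference):
        raise ValueError("this sketch assumes equal-length sequences")
    return [(i, t) for i, (t, r) in enumerate(zip(target, reference)) if t != r]


def ref_decode(diffs: List[Diff], reference: str) -> str:
    """Rebuild the target losslessly by patching the reference."""
    out = list(reference)
    for i, symbol in diffs:
        out[i] = symbol
    return "".join(out)


def compression_factor(original_bytes: int, compressed_bytes: int) -> float:
    """Assumed definition: original size divided by compressed size."""
    return original_bytes / compressed_bytes


reference = "0120120120"
target = "0120220120"  # differs from the reference at one position
diffs = ref_encode(target, reference)
restored = ref_decode(diffs, reference)
```

The closer the chosen reference is to the target, the shorter the difference list, which is presumably why a method would select references by similarity before encoding.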
Pages: 12