TRCMGene: A two-step referential compression method for the efficient storage of genetic data

Cited by: 2
Authors
Tang, You [1]
Li, Min [2]
Sun, Jing [3]
Zhang, Tao [2]
Zhang, Jicheng [2]
Zheng, Ping [2]
Affiliations
[1] JiLin Agr Sci & Technol Univ, Elect & Informat Engn Coll, Jilin, Jilin, Peoples R China
[2] Northeast Agr Univ, Coll Elect & Informat, Harbin, Heilongjiang, Peoples R China
[3] Qiqihar Univ, Coll Life Sci & Agr, Qiqihar, Peoples R China
Source
PLOS ONE | 2018 / Vol. 13 / Issue 11
DOI
10.1371/journal.pone.0206521
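Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09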
Abstract
Background: The massive quantities of genetic data generated by high-throughput sequencing pose challenges to data storage, transmission and analysis. Data compression addresses these problems by reducing storage size and speeding up transmission. Several options are available for compressing and storing genetic data. However, most of these options either do not provide sufficient compression rates or require considerable time for decompression and loading.
Results: Here, we propose TRCMGene, a lossless genetic data compression method that uses a referential compression scheme. TRCMGene introduces a novel two-step compression method that builds an index structure using K-means and k-nearest neighbours. Evaluation with several real datasets showed that the compression factor of TRCMGene ranges from 9 to 21. TRCMGene strikes a good balance between compression factor and reading time. On average, the reading time of compressed data is 60% of that of uncompressed data. Thus, TRCMGene saves not only disk space but also file access time, and it speeds up data loading. These effects collectively improve genetic data storage and transmission in the current hardware environment and render system upgrades unnecessary. TRCMGene, its user manual and demos can be accessed freely from https://github.com/tangyou79/TRCM. The data mentioned in this manuscript can be downloaded from https://github.com/tangyou79/TRCM/wiki.
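The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind referential compression, not the TRCMGene implementation. It assumes equal-length sequences, picks the closest reference by Hamming distance (a simple 1-nearest-neighbour stand-in for the K-means/k-nearest-neighbour index the abstract mentions), and stores only the positions where the target differs from that reference. All function names are hypothetical.

```python
from typing import List, Tuple

# A diff is a list of (position, symbol) pairs where target != reference.
Diff = List[Tuple[int, str]]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_reference(target: str, references: List[str]) -> int:
    """Index of the reference closest to the target (1-nearest neighbour).

    Assumption: a plain linear scan replaces any real index structure.
    """
    return min(range(len(references)),
               key=lambda i: hamming(target, references[i]))


def encode(target: str, reference: str) -> Diff:
    """Keep only the symbols that differ from the reference."""
    if len(target) != len(reference):
        raise ValueError("sketch assumes equal-length sequences")
    return [(i, t) for i, (t, r) in enumerate(zip(target, reference)) if t != r]


def decode(diff: Diff, reference: str) -> str:
    """Losslessly rebuild the target from the reference and the diff."""
    out = list(reference)
    for i, symbol in diff:
        out[i] = symbol
    return "".join(out)


# Toy data, made up for illustration only.
references = ["ACGTACGTAC", "TTGGCCAATT"]
target = "ACGTACCTAC"

ref_id = nearest_reference(target, references)
diff = encode(target, references[ref_id])
restored = decode(diff, references[ref_id])
```

Because decoding restores the target exactly, the scheme is lossless; the fewer positions differ from the chosen reference, the smaller the stored diff, which is why choosing a close reference matters.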
Pages: 12