TRCMGene: A two-step referential compression method for the efficient storage of genetic data

Cited by: 2
Authors
Tang, You [1]
Li, Min [2]
Sun, Jing [3]
Zhang, Tao [2]
Zhang, Jicheng [2]
Zheng, Ping [2]
Affiliations
[1] JiLin Agr Sci & Technol Univ, Elect & Informat Engn Coll, Jilin, Jilin, Peoples R China
[2] Northeast Agr Univ, Coll Elect & Informat, Harbin, Heilongjiang, Peoples R China
[3] Qiqihar Univ, Coll Life Sci & Agr, Qiqihar, Peoples R China
Source
PLOS ONE | 2018 / Vol. 13 / Issue 11
DOI
10.1371/journal.pone.0206521
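Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract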
Background: The massive quantities of genetic data generated by high-throughput sequencing pose challenges for data storage, transmission and analysis. These problems can be effectively addressed through data compression, which reduces the storage size of the data and speeds up its transmission. Several options are available for compressing and storing genetic data. However, most of them either do not achieve sufficient compression rates or require considerable time for decompression and loading.

Results: Here, we propose TRCMGene, a lossless genetic data compression method that uses a referential compression scheme. TRCMGene introduces a novel two-step compression method that builds an index structure using K-means and k-nearest neighbours. Evaluation on several real datasets showed that the compression factor of TRCMGene ranges from 9 to 21. TRCMGene strikes a good balance between compression factor and reading time: on average, reading the compressed data takes 60% of the time needed to read the uncompressed data. Thus, TRCMGene saves not only disc space but also file access time, and it speeds up data loading. Together, these effects improve genetic data storage and transmission in the current hardware environment and render system upgrades unnecessary. TRCMGene, its user manual and demos are freely available at https://github.com/tangyou79/TRCM. The data mentioned in this manuscript can be downloaded from https://github.com/tangyou79/TRCM/wiki.
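The general idea of referential compression mentioned in the abstract can be illustrated with a minimal Python sketch. This is not TRCMGene's actual implementation: the function names, the integer genotype coding and the use of Hamming distance are all assumptions made for illustration, and the candidate references are simply given here rather than derived with K-means. The sketch first picks the closest reference for a sample (a 1-nearest-neighbour lookup), then stores only the positions where the sample differs from that reference, which is enough to restore the sample losslessly.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a list of (position, value) pairs where the
# sample differs from its chosen reference.
Diff = List[Tuple[int, int]]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_reference(target: Sequence[int],
                      references: Sequence[Sequence[int]]) -> int:
    """Step 1 (assumed): index of the reference closest to the target."""
    return min(range(len(references)),
               key=lambda i: hamming(target, references[i]))


def encode(target: Sequence[int],
           references: Sequence[Sequence[int]]) -> Tuple[int, Diff]:
    """Step 2 (assumed): keep only the differences from that reference."""
    ref_id = nearest_reference(target, references)
    ref = references[ref_id]
    diffs = [(pos, val)
             for pos, (val, r) in enumerate(zip(target, ref))
             if val != r]
    return ref_id, diffs


def decode(ref_id: int, diffs: Diff,
           references: Sequence[Sequence[int]]) -> List[int]:
    """Rebuild the original vector from the reference plus the differences."""
    out = list(references[ref_id])
    for pos, val in diffs:
        out[pos] = val
    return out


# Toy data: genotypes coded as 0/1/2 (an assumption for this example).
references = [[0, 0, 1, 2, 0, 1, 0, 0],
              [2, 2, 1, 0, 2, 1, 2, 2]]
sample = [0, 0, 1, 2, 1, 1, 0, 0]

ref_id, diffs = encode(sample, references)
restored = decode(ref_id, diffs, references)
```

In this toy case the sample differs from its nearest reference at a single position, so one (position, value) pair replaces eight stored values. The gain in practice depends on how similar the samples are to the chosen references.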
Pages: 12