Tersect: a set theoretical utility for exploring sequence variant data

Cited by: 4
Authors
Kurowski, Tomasz J. [1]
Mohareb, Fady [1]
Affiliations
[1] Cranfield University, School of Water, Energy and Environment, Bioinformatics Group, Bedford MK43 0AL, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
DOI
10.1093/bioinformatics/btz634
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Summary: Comparing genomic features across a large panel of individuals of the same species is now a core part of bioinformatics analysis. This typically involves a series of complex set theoretical expressions to compare, intersect, and extract symmetric differences between individuals within a large set of genotypes. Several publicly available tools can perform such tasks; however, because of the sheer number of variants being queried, these operations can be computationally expensive, with runtimes ranging from a few minutes to several hours depending on dataset size. This makes existing tools unsuitable for interactive data queries or for use within genomic data visualization platforms such as genome browsers. Tersect is a lightweight, high-performance command-line utility that interprets and applies flexible set theoretical expressions to sets of sequence variant data. Thanks to its highly optimized storage and indexing algorithms for variant data, it can be used both for interactive data exploration and as part of a larger pipeline.
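The kinds of set theoretical queries the abstract mentions (intersection, symmetric difference, difference) can be illustrated with a minimal Python sketch. This is not Tersect's implementation or its query syntax; the `Variant` type, the sample names, and the variant records are all hypothetical and exist only to show what such expressions compute.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant record: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Made-up variant sets for three made-up samples (illustration only).
samples = {
    "sample_a": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 250, "C", "T"),
        Variant("chr2", 40, "G", "A"),
    },
    "sample_b": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr2", 40, "G", "A"),
        Variant("chr2", 90, "T", "C"),
    },
    "sample_c": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 250, "C", "T"),
    },
}

a, b, c = samples["sample_a"], samples["sample_b"], samples["sample_c"]

# Intersection: variants present in every sample.
shared = a & b & c

# Symmetric difference: variants found in exactly one of sample_a and sample_b.
sym_diff = a ^ b

# Compound expression: variants shared by sample_a and sample_b but absent from sample_c.
in_a_and_b_not_c = (a & b) - c

for name, result in [
    ("shared", shared),
    ("sym_diff", sym_diff),
    ("in_a_and_b_not_c", in_a_and_b_not_c),
]:
    print(name, sorted(result))
```

Plain in-memory sets like these work for toy data; the abstract attributes Tersect's suitability for interactive use to optimized storage and indexing, which this sketch does not attempt to reproduce.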
Pages: 934-935
Page count: 2