Tersect: a set theoretical utility for exploring sequence variant data

Cited by: 4
Authors
Kurowski, Tomasz J. [1]
Mohareb, Fady [1]
Affiliations
[1] Cranfield University, School of Water, Energy and Environment, Bioinformatics Group, Bedford MK43 0AL, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
DOI
10.1093/bioinformatics/btz634
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Summary: Comparing genomic features across a large panel of individuals of the same species is now a core part of bioinformatics analysis. This typically involves a series of complex set theoretical expressions to compare, intersect and extract symmetric differences between individuals within a large set of genotypes. Several publicly available tools can perform such tasks; however, because of the sheer number of variants being queried, they can be computationally expensive, with runtimes ranging from a few minutes to several hours depending on the dataset size. This makes existing tools unsuitable for interactive data queries or for use within genomic data visualization platforms such as genome browsers. Tersect is a lightweight, high-performance command-line utility that interprets and applies flexible set theoretical expressions to sets of sequence variant data. Thanks to its highly optimized storage and indexing algorithms for variant data, it can be used both for interactive data exploration and as part of a larger pipeline.
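The set operations named in the abstract (intersection, symmetric difference, difference) can be illustrated with a minimal Python sketch. It assumes each sample's variants are held as an in-memory set of (chromosome, position, reference allele, alternate allele) tuples; the `Variant` type, the sample names and the variant values are all hypothetical, and the sketch does not reflect Tersect's actual query syntax, storage format or indexing.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Made-up variant sets for three samples (illustrative data only).
sample_a = {
    Variant("chr1", 1042, "A", "G"),
    Variant("chr1", 2210, "C", "T"),
    Variant("chr2", 517, "G", "A"),
}
sample_b = {
    Variant("chr1", 2210, "C", "T"),
    Variant("chr2", 517, "G", "A"),
    Variant("chr2", 9034, "T", "C"),
}
sample_c = {
    Variant("chr2", 517, "G", "A"),
}

# Intersection: variants present in both A and B.
shared = sample_a & sample_b

# Symmetric difference: variants present in exactly one of A and B.
sym_diff = sample_a ^ sample_b

# Difference against a union: variants found in A but in neither B nor C.
unique_to_a = sample_a - (sample_b | sample_c)

for name, result in [("shared", shared), ("sym_diff", sym_diff), ("unique_to_a", unique_to_a)]:
    print(name, sorted(result))
```

Plain hash sets like these work for small examples but scale poorly to whole-genome panels, which is the performance gap the abstract says Tersect's optimized storage and indexing are meant to address.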
Pages: 934-935
Page count: 2