Tersect: a set theoretical utility for exploring sequence variant data

Cited by: 4
Authors
Kurowski, Tomasz J. [1]
Mohareb, Fady [1]
Affiliations
[1] Cranfield Univ, Sch Water Energy & Environm, Bioinformat Grp, Bedford MK43 0AL, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
DOI
10.1093/bioinformatics/btz634
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Summary: Comparing genomic features among a large panel of individuals of the same species is now a core part of bioinformatics analyses. This typically involves a series of complex set theoretical expressions to compare, intersect, and extract symmetric differences between individuals within a large set of genotypes. Several publicly available tools can perform such tasks; however, due to the sheer number of variants being queried, such tasks can be computationally expensive, with runtimes ranging from a few minutes to several hours depending on the dataset size. This makes existing tools unsuitable for interactive data queries or for use within genomic data visualization platforms such as genome browsers. Tersect is a lightweight, high-performance command-line utility which interprets and applies flexible set theoretical expressions to sets of sequence variant data. It can be used both for interactive data exploration and as part of a larger pipeline thanks to its highly optimized storage and indexing algorithms for variant data.
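To make the set operations named in the abstract concrete, the following is a minimal Python sketch that models each sample's variants as a plain set and computes an intersection, a difference, and a symmetric difference. It does not reproduce Tersect's query syntax, storage format, or indexing; the `Variant` type, the sample names, and the variant values are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not Tersect's internal format)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


# Made-up variant sets for three made-up samples.
samples = {
    "sample_a": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 250, "C", "T"),
        Variant("chr2", 75, "G", "A"),
    },
    "sample_b": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr2", 75, "G", "A"),
        Variant("chr2", 300, "T", "C"),
    },
    "sample_c": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 400, "G", "C"),
    },
}

# Intersection: variants present in every sample.
shared_by_all = set.intersection(*samples.values())

# Difference: variants found in sample_a but in neither other sample.
only_in_a = samples["sample_a"] - samples["sample_b"] - samples["sample_c"]

# Symmetric difference: variants in exactly one of sample_a and sample_b.
sym_diff_ab = samples["sample_a"] ^ samples["sample_b"]

for name, result in [
    ("shared by all", shared_by_all),
    ("only in sample_a", only_in_a),
    ("sample_a xor sample_b", sym_diff_ab),
]:
    print(name, sorted(result))
```

In-memory sets like these are adequate for a handful of small samples; the abstract's point is that whole-genome panels need purpose-built storage and indexing for such queries to stay interactive.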
Pages: 934-935
Number of pages: 2