Computing Maximal Covers for Protein Sequences

Cited by: 1

Authors:

Golding, G. Brian ^{[1]}

Koponen, Holly ^{[2]}

Mhaskar, Neerja ^{[2,3]}

Smyth, W. F. ^{[2]}

Affiliations:

[1] McMaster Univ, Dept Biol, Hamilton, ON, Canada

[2] McMaster Univ, Dept Comp & Software, Hamilton, ON, Canada

[3] McMaster Univ, Dept Comp & Software, 1280 Main St West, Hamilton, ON L8S 4L8, Canada

Source:

JOURNAL OF COMPUTATIONAL BIOLOGY | 2023 / Vol. 30 / No. 02

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

MAXCOVER; MUMmer; protein; repeats; string covers; ARRAY;

DOI:

10.1089/cmb.2021.0520

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

A partial cover of a string or sequence of length n, which we model as an array x = x[1..n], is a repeating substring u of x such that "many" positions in x lie within occurrences of u. A maximal cover u* (introduced in 2018 by Mhaskar and Smyth as optimal cover) is a partial cover that, over all partial covers u, maximizes the positions covered. Applying data structures also introduced by Mhaskar and Smyth, our software MAXCOVER for the first time enables efficient computation of u* for any x; in particular, as described here, for protein sequences of Arabidopsis, Caenorhabditis elegans, Drosophila melanogaster, and humans. In this protein context, we also compare an extended version of MAXCOVER with existing software (MUMmer's repeat-match) for the closely related task of computing non-extendible repeating substrings (a.k.a. maximal repeats). In practice, MAXCOVER is an order of magnitude faster than MUMmer, with much lower space requirements, while producing more compact output that, nevertheless, yields a more exact and user-friendly specification of the repeats.
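To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not MAXCOVER and does not use the data structures the abstract refers to; it simply tries every substring that occurs at least twice and counts the positions lying within its occurrences. The function names (`occurrences`, `positions_covered`, `naive_maximal_cover`) are hypothetical, indices are 0-based rather than the 1-based x[1..n] of the abstract, and breaking ties in favor of the longer substring is an assumption made here, not something the abstract states.

```python
def occurrences(x: str, u: str) -> list[int]:
    """Return the 0-based start positions of all occurrences of u in x,
    including overlapping ones, in increasing order."""
    starts = []
    i = x.find(u)
    while i != -1:
        starts.append(i)
        i = x.find(u, i + 1)  # step by one so overlapping occurrences are found
    return starts


def positions_covered(x: str, u: str) -> int:
    """Count the positions of x that lie within at least one occurrence of u."""
    covered = 0
    prev_end = 0  # first position not yet counted
    for start in occurrences(x, u):
        end = start + len(u)
        covered += end - max(start, prev_end)  # count only the new positions
        prev_end = end
    return covered


def naive_maximal_cover(x: str) -> tuple[str, int]:
    """Brute-force search over all repeating substrings of x.

    Returns (u, positions covered by u). Ties go to the longer substring,
    which is an assumption of this sketch. Far too slow for real protein
    data; intended only to illustrate the definition.
    """
    best_u, best_count = "", 0
    seen = set()
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = x[i:j]
            if u in seen:
                continue
            seen.add(u)
            if len(occurrences(x, u)) < 2:
                continue  # u must repeat to be a partial cover
            count = positions_covered(x, u)
            if count > best_count or (count == best_count and len(u) > len(best_u)):
                best_u, best_count = u, count
    return best_u, best_count


print(naive_maximal_cover("abcab"))  # ('ab', 4)
```

In "abcab" the substring "ab" occurs at positions 0 and 3 and therefore covers four of the five positions, more than any other repeating substring. A string with no repeating substring yields the empty result `("", 0)` in this sketch.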


Pages: 149-160

Page count: 12
