A new algorithm for detecting low-complexity regions in protein sequences

Cited by: 20
Authors
Shin, SW [1]
Kim, SM [1]
Affiliations
[1] Kyungpook Natl Univ, Dept Comp Engn, Taegu 702701, South Korea
Keywords
DOI
10.1093/bioinformatics/bth497
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Pairwise alignment of protein sequences and local similarity searches produce many false positives because of compositionally biased regions of amino acid residues, also called low-complexity regions (LCRs). Masking and filtering such regions significantly improves the reliability of homology searches and, consequently, of functional predictions. Most of the available algorithms are based on a statistical approach. We wished to investigate the structural properties of LCRs in biological sequences and develop an algorithm for filtering them.
Results: We present an algorithm for detecting and masking LCRs in protein sequences to improve the quality of database searches. We developed the algorithm based on the complexity analysis of subsequences delimited by a pair of identical, repeating subsequences. Given a protein sequence, the algorithm first computes the suffix tree of the sequence. It then collects repeating subsequences from the tree. Finally, the algorithm iteratively tests whether each subsequence delimited by a pair of repeating subsequences meets a given criterion. Test results with 1000 proteins from 20 families in Pfam show that the repeating subsequences are a good indicator of low-complexity regions, and that the algorithm based on such structural information competes strongly with others.
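The three steps in the abstract (index the repeats, collect repeated subsequences, test each region delimited by a pair of repeats) can be illustrated with a rough Python sketch. This is not the authors' implementation. It swaps the suffix tree for a plain dictionary of fixed-length k-mers, and it assumes Shannon entropy as the test criterion, which the abstract does not specify. The function names and the parameters `k`, `max_entropy`, `min_len` and `max_len` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import log2


def repeated_kmers(seq, k=2):
    """Map every k-mer that occurs at least twice to its start positions.

    Stand-in for the suffix tree in the abstract: it only finds repeats
    of one fixed length k.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def shannon_entropy(s):
    """Shannon entropy of the residue composition of s, in bits."""
    counts = defaultdict(int)
    for ch in s:
        counts[ch] += 1
    n = len(s)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mask_lcrs(seq, k=2, max_entropy=1.5, min_len=6, max_len=40, mask="X"):
    """Mask regions delimited by a pair of identical k-mers.

    A region runs from the start of one occurrence to the end of a later
    occurrence. It is masked when its entropy is at most max_entropy
    (an assumed criterion, not the one used in the paper).
    """
    flagged = [False] * len(seq)
    for pos in repeated_kmers(seq, k).values():
        for i, a in enumerate(pos):
            for b in pos[i + 1:]:
                end = b + k
                if end - a > max_len:
                    break  # positions are sorted, later pairs are longer
                if end - a >= min_len and shannon_entropy(seq[a:end]) <= max_entropy:
                    for j in range(a, end):
                        flagged[j] = True
    return "".join(mask if f else ch for ch, f in zip(seq, flagged))


if __name__ == "__main__":
    print(mask_lcrs("MKVLATQQQQQQQQQQDEFGHIKL"))
    # prints MKVLATXXXXXXXXXXDEFGHIKL
```

Checking every pair of occurrences is quadratic in the number of repeats. A suffix tree, as used in the paper, would collect repeats of all lengths more efficiently.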
Pages: 160-170
Number of pages: 11