Word Decoding of Protein Amino Acid Sequences with Availability Analysis: A Linguistic Approach

Times cited: 19
Authors
Motomura, Kenta [1 ,2 ]
Fujita, Tomohiro [1 ]
Tsutsumi, Motosuke [1 ]
Kikuzato, Satsuki [1 ]
Nakamura, Morikazu [2 ]
Otaki, Joji M. [1 ]
Affiliations
[1] Univ Ryukyus, Dept Chem Biol & Marine Sci, BCPH Unit Mol Physiol, Nishihara, Okinawa 90301, Japan
[2] Univ Ryukyus, Dept Informat Sci, Nishihara, Okinawa 90301, Japan
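Source
PLOS ONE | 2012 / Vol. 7 / Issue 11
Keywords
POWER-LAW DISTRIBUTIONS; N-GRAM PATTERNS; ZIPFS LAW; SECONDARY STRUCTURE; BIOLOGICAL SEQUENCES; RANDOM TEXTS; LANGUAGE; DATABASE; TOPOLOGY; EXHIBIT
DOI
10.1371/journal.pone.0050039
Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract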
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least part of the sequence information can be extracted by treating the amino acid sequences of proteins as a collection of English words, based on the working hypothesis that protein sequences are composed of short constituent amino acid sequences (SCSs), or "words". We first confirmed that English very likely follows Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. Compared with natural English and "compressed" English without spaces between words, the amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., "key words") and vice versa. In fact, the highest availability peak within a given protein sequence often corresponds directly to a sequence motif. The amino acid composition of high-availability sites within motifs differs from that of entire motifs and of all protein sequences, suggesting that specific SCSs and their constituent amino acids within motifs may be functionally important. We anticipate that our availability-based word decoding approach will complement sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
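The pipeline described in the abstract can be sketched in Python. It splits sequences into short "words", ranks them by frequency, checks for a power-law rank-frequency relation and scores each position of a protein by the availability of its SCS. This is a minimal sketch under stated assumptions, not the authors' implementation. SCSs are assumed to be overlapping windows of a fixed length k (k = 5 here is an assumption). The "availability" of an SCS is assumed to be its occurrence count in the reference corpus. The Zipf exponent is estimated by an ordinary least-squares fit in log-log space over a user-chosen rank range. All function names are hypothetical.

```python
import math
from collections import Counter


def count_scs(sequences, k=5):
    """Count every overlapping length-k window (assumed SCS definition)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def rank_frequency(counts):
    """Return (rank, frequency) pairs; rank 1 is the most frequent SCS."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(pairs, max_rank=None):
    """Estimate s in freq ~ rank**(-s) by least squares on log-log data.

    max_rank drops rarer (low-ranking) SCSs so that only the linear
    range of the rank-frequency plot is fitted.
    """
    if max_rank is not None:
        pairs = [p for p in pairs if p[0] <= max_rank]
    if len(pairs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is -s, so negate it


def availability_profile(seq, counts, k=5):
    """Score each window of seq by its SCS count in the corpus (assumption)."""
    return [counts.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


def peak_window(seq, counts, k=5):
    """Return (start, SCS, score) of the highest-availability window."""
    profile = availability_profile(seq, counts, k)
    i = max(range(len(profile)), key=profile.__getitem__)
    return i, seq[i:i + k], profile[i]
```

For example, with the toy corpus `["ACDEFGHIKACDEF", "ACDEFMMMM"]`, `ACDEF` occurs three times, and `peak_window("GACDEFM", count_scs(corpus))` returns `(1, 'ACDEF', 3)`. A log-log least-squares fit is used here only for brevity. Maximum-likelihood estimators are generally preferred for fitting power laws to real data.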
Pages: 15