A Measure of Protein Sequence Characteristics Based on the Frequency and the Position Entropy of Existing K-words

Times cited: 0
Authors
Qi, Zhao-Hui [1]
Jin, Meng-Zhe [1]
Yang, Hong [2]
Affiliations
[1] Shijiazhuang Tiedao Univ, Coll Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
[2] Qingdao Binhai Univ, Qingdao 266555, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
2-D GRAPHICAL REPRESENTATION; PHYSICOCHEMICAL PROPERTIES; DNA-SEQUENCES; ALIGNMENT; CURVE; SIMILARITY; PHYLOGENY; GENE
DOI
Not available
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Based on the frequencies and the position distribution entropies of the existing k-words, we construct a modified statistical method for k-words, which we call the Existing-k-word method. The method consists of two parts. The first extracts only the k-words that actually occur in the proteins, rather than all 20^k possible k-words. The second designs a feature vector consisting of the frequencies and the position distribution entropies of these existing k-words. The proposed method is then applied to two datasets: nine ND5 (NADH dehydrogenase subunit 5) proteins and twenty-four transferrin protein sequences. The results illustrate the utility of the proposed method.
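The two parts described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the position distribution entropy, so `position_entropy` here assumes the Shannon entropy (in bits) of the gaps between successive occurrences of a k-word, including the gap from the sequence start and the gap to the sequence end. Frequencies are assumed to be counts divided by the number of k-word windows, L - k + 1; words absent from a sequence are assumed to contribute (0, 0); the Euclidean distance in the usage part is an assumed dissimilarity measure. All function names and the toy sequences are hypothetical.

```python
import math
from collections import defaultdict


def kword_positions(seq, k):
    """Map each k-word that actually occurs in seq to its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return dict(positions)


def existing_vocabulary(seqs, k):
    """Union of the existing k-words over all sequences (instead of all 20**k words)."""
    vocab = set()
    for s in seqs:
        vocab.update(kword_positions(s, k))
    return sorted(vocab)


def position_entropy(pos, n_windows):
    """ASSUMED definition: Shannon entropy (bits) of the gaps between successive
    occurrences, including the gap from the start and the gap to the end."""
    bounds = [0] + pos + [n_windows + 1]
    gaps = [b - a for a, b in zip(bounds, bounds[1:])]  # every gap is >= 1
    total = sum(gaps)  # always equals n_windows + 1
    return -sum((g / total) * math.log2(g / total) for g in gaps)


def feature_vector(seq, k, vocab):
    """(frequency, position entropy) pair for each word in vocab; absent words get (0, 0)."""
    n = len(seq) - k + 1  # number of k-word windows
    if n <= 0:
        raise ValueError("sequence is shorter than k")
    pos = kword_positions(seq, k)
    vec = []
    for w in vocab:
        p = pos.get(w, [])
        vec.append(len(p) / n)
        vec.append(position_entropy(p, n) if p else 0.0)
    return vec


if __name__ == "__main__":
    seqs = ["MKVLAAGIVG", "MKVLSAGLVG"]  # hypothetical toy fragments
    vocab = existing_vocabulary(seqs, 2)
    print(len(vocab), "existing 2-words instead of", 20 ** 2)
    u, v = (feature_vector(s, 2, vocab) for s in seqs)
    # ASSUMPTION: Euclidean distance as the dissimilarity between feature vectors.
    print(round(math.dist(u, v), 4))
```

For k = 2 the two toy fragments share a vocabulary of 13 existing 2-words rather than 400, so each sequence maps to a 26-dimensional vector. Restricting the vocabulary to observed words is what keeps the vector size manageable as k grows, since 20^k increases much faster than the number of distinct words a protein of realistic length can contain.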
Pages: 731-748
Number of pages: 18