Data augmentation algorithms for detecting conserved domains in protein sequences: A comparative study

Cited by: 6
Author
Bi, Chengpeng [1, 2, 3]
Affiliations
[1] Univ Missouri, Bioinformat & Intelligent Comp Lab, Childrens Mercy Hosp & Clin, Sch Med, Kansas City, MO 64108 USA
[2] Univ Missouri, Bioinformat & Intelligent Comp Lab, Childrens Mercy Hosp & Clin, Sch Comp, Kansas City, MO 64108 USA
[3] Univ Missouri, Bioinformat & Intelligent Comp Lab, Childrens Mercy Hosp & Clin, Sch Engn, Kansas City, MO 64108 USA
Keywords
data augmentation; expectation maximization (EM); Gibbs sampling; Markov chain Monte Carlo; motif discovery; multiple local alignment; protein sequence analysis
DOI
10.1021/pr070475q
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Protein conserved domains are distinct units of molecular structure, usually associated with particular aspects of molecular function such as catalysis or binding. These conserved subsequences are often unobserved and therefore must be detected. Motif discovery methods can find such unobserved domains in a given set of sequences. This paper presents a data augmentation (DA) framework that unifies a suite of motif-finding algorithms, all of which maximize the same likelihood function by imputing the unobserved data. Data augmentation refers to methods that formulate iterative optimization by exploiting the unobserved data. Two categories of maximum-likelihood-based motif-finding algorithms are illustrated under the DA framework. The first comprises deterministic algorithms, which maximize the likelihood function by iteratively performing an optimal local search in the alignment space. The second comprises stochastic algorithms, which iteratively draw motif location samples via Monte Carlo simulation while keeping track of the solution with the best likelihood found so far. Four DA motif discovery algorithms are described, evaluated, and compared by aligning real and simulated protein sequences.
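The two categories named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not reproduce any of its four algorithms: the one-site-per-sequence model, the pseudocount of 0.5, the log-odds score, and every name here (`profile`, `log_odds`, `alignment_score`, `da_motif_search`) are assumptions made for illustration. With `stochastic=False` each motif location is moved to its best-scoring position (a deterministic local search); with `stochastic=True` each location is drawn at random in proportion to its likelihood ratio (a Gibbs-style sampler), and the best-scoring alignment seen so far is retained.

```python
import math
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumption: sequences use only these 20 residues

def profile(sites, pseudo=0.5):
    """Column-wise residue probabilities (with pseudocounts) for aligned sites."""
    total = len(sites) + pseudo * len(ALPHABET)
    cols = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        cols.append({a: (counts[a] + pseudo) / total for a in ALPHABET})
    return cols

def log_odds(segment, cols, bg):
    """Log-likelihood ratio of a segment: motif profile versus background."""
    return sum(math.log(cols[j][a] / bg[a]) for j, a in enumerate(segment))

def alignment_score(seqs, starts, w, bg):
    """Total log-odds of the sites selected by `starts` under their own profile."""
    sites = [s[p:p + w] for s, p in zip(seqs, starts)]
    cols = profile(sites)
    return sum(log_odds(site, cols, bg) for site in sites)

def da_motif_search(seqs, w, sweeps=50, stochastic=True, seed=0):
    """Impute one motif start per sequence; return (best_starts, best_score)."""
    rng = random.Random(seed)
    text = "".join(seqs)
    counts = Counter(text)
    # Background residue frequencies, smoothed so no probability is zero.
    bg = {a: (counts[a] + 1) / (len(text) + len(ALPHABET)) for a in ALPHABET}
    # The unobserved data: motif start positions, initialised at random.
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    best_starts = list(starts)
    best_score = alignment_score(seqs, starts, w, bg)
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            # Profile built from the current sites of all other sequences.
            others = [t[p:p + w]
                      for k, (t, p) in enumerate(zip(seqs, starts)) if k != i]
            cols = profile(others)
            scores = [log_odds(s[p:p + w], cols, bg)
                      for p in range(len(s) - w + 1)]
            if stochastic:
                # Sample a start with probability proportional to its ratio.
                top = max(scores)
                weights = [math.exp(x - top) for x in scores]
                starts[i] = rng.choices(range(len(scores)), weights=weights)[0]
            else:
                # Deterministic update: take the best-scoring start.
                starts[i] = max(range(len(scores)), key=scores.__getitem__)
        score = alignment_score(seqs, starts, w, bg)
        if score > best_score:  # keep track of the best solution so far
            best_starts, best_score = list(starts), score
    return best_starts, best_score
```

Calling `da_motif_search(seqs, w=6, stochastic=False)` and `da_motif_search(seqs, w=6, stochastic=True)` on the same list of sequences gives one way to compare the two update rules. Both can stop at a local optimum, so several seeds would normally be tried.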
Pages: 192-201
Page count: 10