A greedy, graph-based algorithm for the alignment of multiple homologous gene lists

Cited by: 21
Authors
Fostier, Jan [2]
Proost, Sebastian [1, 3]
Dhoedt, Bart [2]
Saeys, Yvan [1, 3]
Demeester, Piet [2]
Van de Peer, Yves [1, 3]
Vandepoele, Klaas [1, 3]
Affiliations
[1] VIB, Dept Plant Syst Biol, Ghent, Belgium
[2] Ghent Univ IBBT, Dept Informat Technol INTEC, Ghent, Belgium
[3] Univ Ghent, Dept Plant Biotechnol & Genet, Ghent, Belgium
Keywords
SEQUENCE ALIGNMENT; GENOMIC PROFILES; IDENTIFICATION; PHYLOGENY; BENCHMARK; ACCURACY; PROTEINS; CLUSTAL; SYNTENY; COFFEE
DOI
10.1093/bioinformatics/btr008
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors made in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists.
Results: Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes.
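To make the problem setting in the abstract concrete, the following is a minimal Python sketch of how two genomic segments can be represented as ordered gene lists, how homology links between them form a graph, and how conflicting links might be removed greedily. It is not the authors' algorithm: the paper aligns multiple segments and uses heuristics based on properties of the graph that the abstract does not spell out. The function names (homology_links, in_conflict, greedy_resolve), the gene family labels and the "drop the most conflicting link first" rule are all assumptions made purely for illustration.

```python
from itertools import combinations


def homology_links(seg_a, seg_b):
    """Return (i, j) index pairs where seg_a[i] and seg_b[j] belong to the
    same gene family. Each segment is an ordered list of family labels."""
    return [(i, j)
            for i, fam_a in enumerate(seg_a)
            for j, fam_b in enumerate(seg_b)
            if fam_a == fam_b]


def in_conflict(p, q):
    """Two distinct links conflict if they cross (rearrangement) or share
    an endpoint (duplication): they cannot both appear in one colinear
    alignment of the two segments."""
    (i1, j1), (i2, j2) = p, q
    return (i1 - i2) * (j1 - j2) <= 0


def greedy_resolve(links):
    """Illustrative heuristic (an assumption, not the paper's method):
    repeatedly drop the link involved in the most conflicts until the
    remaining links are mutually consistent."""
    links = list(links)
    while links:
        counts = {link: 0 for link in links}
        for p, q in combinations(links, 2):
            if in_conflict(p, q):
                counts[p] += 1
                counts[q] += 1
        worst = max(links, key=lambda link: counts[link])
        if counts[worst] == 0:
            break  # no conflicts left
        links.remove(worst)
    return sorted(links)


# Hypothetical segments: families f2 and f3 are swapped on the second one.
segment_a = ["f1", "f2", "f3", "f4"]
segment_b = ["f1", "f3", "f2", "f4"]
aligned = greedy_resolve(homology_links(segment_a, segment_b))
```

In this toy case the links for f2 and f3 cross, so one of them is discarded and the remaining links define a colinear alignment. A pairwise sketch like this does not show the multi-segment case, where the abstract notes that errors made in early pairwise steps can propagate under a progressive strategy.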
Pages: 749-756
Number of pages: 8