A greedy, graph-based algorithm for the alignment of multiple homologous gene lists

Cited by: 21
Authors
Fostier, Jan [2]
Proost, Sebastian [1,3]
Dhoedt, Bart [2]
Saeys, Yvan [1,3]
Demeester, Piet [2]
Van de Peer, Yves [1,3]
Vandepoele, Klaas [1,3]
Affiliations
[1] VIB, Dept Plant Syst Biol, Ghent, Belgium
[2] Ghent Univ IBBT, Dept Informat Technol INTEC, Ghent, Belgium
[3] Univ Ghent, Dept Plant Biotechnol & Genet, Ghent, Belgium
Keywords
SEQUENCE ALIGNMENT; GENOMIC PROFILES; IDENTIFICATION; PHYLOGENY; BENCHMARK; ACCURACY; PROTEINS; CLUSTAL; SYNTENY; COFFEE;
DOI
10.1093/bioinformatics/btr008
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors made in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists.

Results: Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes.
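To make the notion of an alignment conflict concrete, the following is a minimal Python sketch. It is not the algorithm from the paper: the representation (each segment as a list of gene-family labels), the function names `homology_links` and `crossing_pairs`, and the pairwise crossing test are all assumptions chosen for illustration. It only shows how links between homologous genes on two ordered gene lists can become mutually inconsistent after a rearrangement or a duplication.

```python
from itertools import combinations


def homology_links(segments):
    """Link every pair of same-family genes that lie on different segments.

    Each segment is an ordered list of gene-family labels (an assumed,
    simplified stand-in for homology information). A link is a pair
    ((s, i), (t, j)): gene i on segment s is homologous to gene j on segment t.
    """
    links = []
    for (s, seg_s), (t, seg_t) in combinations(enumerate(segments), 2):
        for i, fam_i in enumerate(seg_s):
            for j, fam_j in enumerate(seg_t):
                if fam_i == fam_j:
                    links.append(((s, i), (t, j)))
    return links


def crossing_pairs(links):
    """Return pairs of links between the same two segments that cannot both
    be placed in one colinear alignment.

    Two links conflict when they cross (gene order is reversed, as after a
    rearrangement) or share an endpoint (one gene matches two genes on the
    other segment, as after a duplication).
    """
    conflicts = []
    for a, b in combinations(links, 2):
        (s1, i1), (t1, j1) = a
        (s2, i2), (t2, j2) = b
        if (s1, t1) != (s2, t2):
            continue  # only compare links joining the same segment pair
        if (i1 - i2) * (j1 - j2) <= 0:
            conflicts.append((a, b))
    return conflicts


# Hypothetical example: families B and C are swapped on the second segment.
segments = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
conflicts = crossing_pairs(homology_links(segments))
print(conflicts)  # one conflict: the B link crosses the C link
```

A complete method would additionally have to decide which of the conflicting links to discard and handle more than two segments jointly; according to the abstract, the paper addresses this with heuristics based on properties of the graph structure, which this sketch does not attempt to reproduce.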
Pages: 749-756
Number of pages: 8
Related papers
50 in total
  • [1] A graph-based genetic algorithm for the multiple sequence alignment problem
    Lopes, Heitor S.
    Moritz, Guilherme L.
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING - ICAISC 2006, PROCEEDINGS, 2006, 4029 : 420 - 429
  • [2] A graph-based algorithm for alignment of OWL ontologies
    Le, Bach Thanh
    Dieng-Kuntz, Rose
    [J]. PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE: WI 2007, 2007, : 466 - +
  • [3] On Graph-Based Data Structures to Multiple Genome Alignment
    Jafarzadeh, Nafiseh
    Iranmanesh, Ali
    [J]. MATCH-COMMUNICATIONS IN MATHEMATICAL AND IN COMPUTER CHEMISTRY, 2020, 83 (01) : 33 - 62
  • [4] A Greedy Clustering Algorithm for Multiple Sequence Alignment
    Lebsir, Rabah
    Layeb, Abdesslem
    Fariza, Tahi
    [J]. INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [5] Graph-based molecular alignment (GMA)
    Marialke, J.
    Koerner, R.
    Tietze, S.
    Apostolakis, Joannis
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2007, 47 (02) : 591 - 601
  • [6] Graph-based Alignment and Uniformity for Recommendation
    Yang, Liangwei
    Liu, Zhiwei
    Wang, Chen
    Yang, Mingdai
    Liu, Xiaolong
    Ma, Jing
    Yu, Philip S.
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 4395 - 4399
  • [7] A Novel Ant Based Algorithm for Multiple Graph Alignment
    Tran Ngoc Ha
    Do Duc Dong
    Hoang Xuan Huan
    [J]. 2014 INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR COMMUNICATIONS (ATC), 2014, : 181 - 186
  • [8] Greedy Maximization Framework for Graph-based Influence Functions
    Cohen, Edith
    [J]. PROCEEDINGS OF 2016 FOURTH IEEE WORKSHOP ON HOT TOPICS IN WEB SYSTEMS AND TECHNOLOGIES (HOTWEB), 2016, : 29 - 35
  • [9] Research on a Graph-Based Algorithm
    Dai, Shang-ping
    Duan Xin
    [J]. PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN, VOL 1, 2008, : 17 - 20
  • [10] Graph-based modeling of tandem repeats improves global multiple sequence alignment
    Szalkowski, Adam M.
    Anisimova, Maria
    [J]. NUCLEIC ACIDS RESEARCH, 2013, 41 (17)