Efficient gene orthology inference via large-scale rearrangements

Authors
Rubert, Diego P. [1, 2, 3]
Braga, Marilia D. V. [2, 3]
Affiliations
[1] Univ Fed Mato Grosso do Sul, Fac Comp, Campo Grande, Brazil
[2] Bielefeld Univ, Fac Technol, Bielefeld, Germany
[3] Bielefeld Univ, Ctr Biotechnol CeBiTec, Bielefeld, Germany
Keywords
Comparative genomics; Double-cut-and-join; Indels; Gene orthology; DOUBLE-CUT; GENOME; ALGORITHM; DISTANCE; JOIN;
DOI
10.1186/s13015-023-00238-y
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. It then runs pairwise integer linear programming (ILP) comparisons to compute optimal gene matchings, which take the similarities into account to minimize the weighted rearrangement distance between the analyzed genomes (an NP-hard problem). In the final step, the gene matchings are integrated into gene families. The ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment of the other genome, which increases the search space exponentially.

Results: In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping. It clusters the linear segments, based on their gene content intersections, into m ≥ 1 subsets whose ends are capped independently. Furthermore, within each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and although the heuristic may yield sub-optimal rather than optimal gene matchings, it works very well in practice, in terms of both speed and the quality of the computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates, the version with heuristic capping reports orthologies that are very similar to those computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed in hundreds or even thousands of contigs, obtaining gene families that are very similar to FlyBase families. Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with FlyBase, than the gene families computed by other inference tools. We added a post-processing step that uses the mcl algorithm to refine our ambiguous families (those with more than one gene per genome), which further improves the accuracy of our results. Our approach is implemented as a pipeline that incorporates the pre-computation of gene similarities and the post-processing refinement of ambiguous families with mcl. Both the original version with optimal capping and the new version with heuristic capping can be downloaded, together with their detailed documentation, at https://gitlab.ub.uni-bielefeld.de/gi/FFGC or as a Conda package at https://anaconda.org/bioconda/ffgc.
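
The clustering idea behind the heuristic capping can be illustrated with a minimal sketch. This is not the FFGC implementation: it assumes, for illustration only, that each linear segment is given as a set of labels for comparable genes, and that two segments are "content-related" when those sets intersect. The function names `cluster_segments` and `related_pairs` and the toy segment names are hypothetical.

```python
from itertools import combinations


def cluster_segments(segments: dict[str, set[str]]) -> list[set[str]]:
    """Group segments whose gene contents intersect, directly or transitively."""
    parent = {name: name for name in segments}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(segments, 2):
        if segments[a] & segments[b]:  # non-empty gene content intersection
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in segments:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def related_pairs(segments: dict[str, set[str]]) -> set[frozenset[str]]:
    """Pairs of segments that directly share gene content (assumed notion of 'content-related')."""
    return {
        frozenset((a, b))
        for a, b in combinations(segments, 2)
        if segments[a] & segments[b]
    }


# Toy input (hypothetical): segment name -> labels of comparable genes.
segments = {
    "A_chr1": {"g1", "g2", "g3"},
    "B_ctg1": {"g2", "g7"},
    "B_ctg2": {"g3"},
    "A_chr2": {"g8", "g9"},
    "B_ctg3": {"g9"},
}
clusters = cluster_segments(segments)  # the m subsets, capped independently
pairs = related_pairs(segments)        # candidate end connections
```

In this toy input the segments fall into m = 2 subsets. `B_ctg1` and `B_ctg2` land in the same subset through `A_chr1`, yet they are not directly content-related, so under this simplified model their ends would not be candidates for connection to each other.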
Pages: 18