An Efficient Motif Finding Algorithm for Large DNA Data Sets

Authors
Yu, Qiang [1]
Huo, Hongwei [1]
Chen, Xiaoyang [1]
Guo, Haitao [1]
Vitter, Jeffrey Scott [2]
Huan, Jun [2]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[2] Univ Kansas, Informat & Telecommun Technol Ctr, Lawrence, KS 66047 USA
Keywords
Motif discovery; ChIP-seq; emerging substrings; MapReduce; discovery; search
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Planted (l, d) motif discovery has been used successfully over the past decade to locate transcription factor binding sites in sets of dozens of promoter sequences. However, little work has addressed identifying (l, d) motifs in next-generation sequencing (ChIP-seq) data sets, which contain thousands of input sequences and therefore make accurate identification within a reasonable time a new challenge. To meet this need, we propose a new planted (l, d) motif discovery algorithm named MCES, which identifies motifs by mining and combining emerging substrings. Specifically, to handle larger data sets, we design a MapReduce-based strategy to mine emerging substrings in a distributed manner. Experimental results on simulated data show that (i) MCES identifies (l, d) motifs efficiently and effectively in thousands to millions of input sequences and runs faster than state-of-the-art (l, d) motif discovery algorithms such as F-motif and TraverStringsR; and (ii) MCES can identify motifs of unknown length and achieves better identification accuracy than the competing algorithm CisFinder. The validity of MCES is also tested on real data sets.
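To make the two ideas named in the abstract concrete, the following is a minimal Python sketch, not the MCES implementation. It assumes that an "emerging substring" is a length-k substring that occurs in a much larger fraction of the test (ChIP-seq) sequences than of the control sequences, and that an (l, d) motif occurrence is a length-l window within Hamming distance d of the motif. The function names, the `min_support` and `min_growth` thresholds, and the add-one smoothing are all illustrative assumptions; the combining step and the MapReduce distribution are not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_substrings(seqs: Iterable[str], k: int) -> Counter:
    """For each length-k substring, count how many sequences contain it."""
    counts: Counter = Counter()
    for s in seqs:
        # A set ensures each sequence contributes at most once per substring.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def emerging_substrings(test: List[str], control: List[str], k: int,
                        min_support: float = 0.5,
                        min_growth: float = 2.0) -> Dict[str, float]:
    """Return substrings frequent in `test` and comparatively rare in `control`.

    Thresholds and smoothing are hypothetical choices for illustration.
    """
    test_counts = count_substrings(test, k)
    control_counts = count_substrings(control, k)
    result: Dict[str, float] = {}
    for sub, c in test_counts.items():
        support = c / len(test)
        if support < min_support:
            continue
        # Add-one smoothing avoids division by zero when the substring
        # never appears in the control set.
        control_support = (control_counts.get(sub, 0) + 1) / (len(control) + 1)
        growth = support / control_support
        if growth >= min_growth:
            result[sub] = growth
    return result


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of `seq` differs from `motif` in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))
```

In a distributed setting, the per-sequence counting in `count_substrings` is the part that would naturally map onto a map step (emit each distinct substring per sequence) and a reduce step (sum the counts), though how MCES actually partitions this work is not described in the abstract.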
Pages: 6