An Efficient Motif Finding Algorithm for Large DNA Data Sets

Authors
Yu, Qiang [1]
Huo, Hongwei [1]
Chen, Xiaoyang [1]
Guo, Haitao [1]
Vitter, Jeffrey Scott [2]
Huan, Jun [2]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
[2] Univ Kansas, Informat & Telecommun Technol Ctr, Lawrence, KS 66047 USA
Source
2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2014
Keywords
Motif discovery; ChIP-seq; emerging substrings; MapReduce; DISCOVERY; SEARCH;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Planted (l, d) motif discovery has been used successfully over the past decade to locate transcription factor binding sites in dozens of promoter sequences. However, little work has been done on identifying (l, d) motifs in next-generation sequencing (ChIP-seq) data sets, which contain thousands of input sequences and therefore pose a new challenge: making a good identification in reasonable time. To meet this need, we propose a new planted (l, d) motif discovery algorithm named MCES, which identifies motifs by mining and combining emerging substrings. In particular, to handle larger data sets, we design a MapReduce-based strategy to mine emerging substrings in a distributed manner. Experimental results on simulated data show that i) MCES identifies (l, d) motifs efficiently and effectively in thousands to millions of input sequences, and runs faster than state-of-the-art (l, d) motif discovery algorithms such as F-motif and TraverStringsR; ii) MCES can identify motifs whose lengths are not known in advance, and has better identification accuracy than the competing algorithm CisFinder. The validity of MCES is also tested on real data sets.
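The abstract does not spell out how emerging substrings are mined, so the following is only a minimal sketch of the general idea, not the MCES implementation. It assumes that an emerging substring is a k-mer that occurs in a much larger fraction of the input (foreground) sequences than of a control (background) set. The function names, the thresholds `min_support` and `min_growth`, and the add-one smoothing are all illustrative assumptions. The map and reduce steps are written as plain Python functions to mirror the MapReduce structure the abstract mentions; the step that combines emerging substrings into motifs is not attempted here.

```python
from collections import Counter
from itertools import chain


def map_kmers(seq, k):
    """Map step (assumed): emit each distinct k-mer of one sequence once."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reduce_counts(emitted):
    """Reduce step (assumed): count how many sequences emitted each k-mer."""
    return Counter(chain.from_iterable(emitted))


def emerging_substrings(foreground, background, k,
                        min_support=0.5, min_growth=2.0):
    """Return {k-mer: growth rate} for k-mers enriched in the foreground.

    min_support and min_growth are hypothetical thresholds chosen for
    illustration; they are not taken from the paper.
    """
    fg = reduce_counts(map_kmers(s, k) for s in foreground)
    bg = reduce_counts(map_kmers(s, k) for s in background)
    result = {}
    for kmer, count in fg.items():
        fg_freq = count / len(foreground)
        # Add-one smoothing (an assumption) avoids division by zero
        # when a k-mer never appears in the background set.
        bg_freq = (bg[kmer] + 1) / (len(background) + 1)
        growth = fg_freq / bg_freq
        if fg_freq >= min_support and growth >= min_growth:
            result[kmer] = growth
    return result


# Toy data: "ACGT" is planted in every foreground sequence.
foreground = ["TTACGTAA", "GGACGTCC", "CAACGTTG"]
background = ["TTTTTTTT", "GGGGCCCC", "CACACACA"]
print(emerging_substrings(foreground, background, k=4))  # {'ACGT': 4.0}
```

In a real distributed setting, each call to `map_kmers` would run on a separate shard of sequences and the counting would be done by reducers keyed on the k-mer; the single-process version above only shows the data flow.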
Pages: 6