Estimating motifs under order restrictions

Authors
van Zwet, Erik W. [1]
Kechris, Katherina J.
Bickel, Peter J.
Eisen, Michael B.
Affiliations
[1] Leiden Univ, Inst Math, NL-2300 RA Leiden, Netherlands
[2] Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94143 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[5] Ernest Orlando Lawrence Berkeley Natl Lab, Div Life Sci, Berkeley, CA USA
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Transcription factors and many other DNA-binding proteins recognize more than one specific sequence. Among sequences recognized by a given DNA-binding protein, different positions exhibit varying degrees of conservation. The reason is that base pairs that are more extensively contacted by the protein tend to be more conserved. This observation can be used in the discovery of transcription factor binding sites. Here we present a rigorous means to accomplish this. In particular, we constrain the order of the information (entropy) in the columns of the position-specific weight matrix (PWM) that characterizes the motif being sought. We then show how to compute the maximum likelihood estimate of a PWM under such order restrictions. This computation is easily integrated with the EM algorithm or the Gibbs sampler to enhance performance in the search for motifs in unaligned sequences. We demonstrate our method on a well-known data set of binding sites of the transcription factor Crp in E. coli.
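To make the quantities in the abstract concrete, the following minimal Python sketch computes the unconstrained maximum likelihood estimate of a PWM from aligned sites (the per-column base frequencies), the Shannon entropy of each column, and whether a given ordering of column entropies holds. It does not implement the paper's order-restricted estimator, which the abstract does not specify. The function names, the `order` argument, and the toy sites are assumptions made purely for illustration.

```python
import math

BASES = "ACGT"

def column_frequencies(sites):
    """Unconstrained MLE of each PWM column: relative base frequencies.

    `sites` is a list of equal-length aligned DNA strings (hypothetical input).
    """
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = [0] * 4
        for s in sites:
            counts[BASES.index(s[j])] += 1
        total = sum(counts)
        pwm.append([c / total for c in counts])
    return pwm

def entropy(column):
    """Shannon entropy in bits: 0 for a fully conserved column, 2 for uniform."""
    return -sum(p * math.log2(p) for p in column if p > 0)

def satisfies_order(pwm, order):
    """True if entropy is non-decreasing along the column indices in `order`."""
    h = [entropy(pwm[j]) for j in order]
    return all(a <= b + 1e-12 for a, b in zip(h, h[1:]))

# Toy aligned sites (made up for illustration, not real Crp data).
sites = ["ACGT", "ACGA", "ACTC", "AGAG"]
pwm = column_frequencies(sites)
print([round(entropy(col), 3) for col in pwm])
print(satisfies_order(pwm, [0, 1, 2, 3]))
```

With a uniform background, the information content of a column is 2 minus its entropy in bits, so a restriction on the order of information is the reverse of the corresponding restriction on entropy. When the unconstrained estimate violates the chosen order, a constrained estimator would be needed in its place.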
Pages: 18