Bayesian Unsupervised Learning of DNA Regulatory Binding Regions
[摘要] Identification of regulatory binding motifs, that is, short specific words, within DNA sequences is a commonly occurring problem in computational bioinformatics. A wide variety of probabilistic approaches have been proposed in the literature to either scan for previously known motif types or to attempt de novo identification of a fixed number (typically one) of putative motifs. Mostapproaches assume the existence of reliable biodatabase information to build probabilistic a priori description of the motif classes. Examples of attempts to do probabilistic unsupervised learning about the number of putative de novo motif types and theirpositions within a set of DNA sequences are very rare in the literature. Here we show how such a learning problem can be formulated using a Bayesian model that targets to simultaneously maximize the marginal likelihood of sequence data arising under multiple motif types as well as under the background DNA model, which equals a variable length Markov chain. It is demonstrated how the adopted Bayesian modelling strategy combined with recently introduced nonstandard stochastic computation tools yields a more tractable learning procedure than is possible with the standard Monte Carlo approaches. Improvements and extensions of the proposed approach are also discussed.
[发布日期] [发布机构]
[效力级别] [学科分类] 人工智能
[关键词] [时效性]