A specialized learner for inferring structured cis-regulatory modules

[Abstract]

Background

Transcription is controlled by systems of transcription factors, which bind to specific patterns of binding sites in the transcriptional control regions of genes, called cis-regulatory modules (CRMs). We present an expressive and easily comprehensible CRM representation that captures several aspects of a CRM's structure and distinguishes DNA sequences that contain the CRM from those that do not. We also present a learning algorithm tailored to this domain, and a novel method that avoids overfitting by controlling the expressivity of the model.

Results

We find statistically significant CRMs more often than a current state-of-the-art approach on the same data sets. We also show experimentally that each aspect of our expressive CRM model space makes a positive contribution to the learned models on yeast and fly data.

Conclusion

Structural aspects are an important part of CRMs, both in terms of interpreting them biologically and learning them accurately. Source code for our algorithm is available at: http://www.cs.wisc.edu/~noto/crm

[Publication date] 2006-12-05

