Top-Down Clustering for Protein Subfamily Identification
[Abstract] We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and therefore more accurate subfamily identification, while additionally indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
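To make the top-down idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes equal-length aligned sequences, uses single-site tests of the form "position p holds residue r", scores a split by the reduction in summed pairwise Hamming distance (an assumed heuristic), and replaces the post-pruning step with a simple depth limit. All function and parameter names (`best_split`, `build_tree`, `classify`, `max_depth`) are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of alignment columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def within_cost(seqs):
    """Sum of pairwise Hamming distances inside one cluster."""
    return sum(hamming(a, b) for a, b in combinations(seqs, 2))


def best_split(seqs):
    """Best single-site test as (gain, position, residue, yes, no), or None."""
    base = within_cost(seqs)
    best = None
    for pos in range(len(seqs[0])):
        for res in sorted({s[pos] for s in seqs}):
            yes = [s for s in seqs if s[pos] == res]
            no = [s for s in seqs if s[pos] != res]
            if not yes or not no:
                continue  # the test does not separate anything
            gain = base - within_cost(yes) - within_cost(no)
            if best is None or gain > best[0]:
                best = (gain, pos, res, yes, no)
    return best


def build_tree(seqs, max_depth=2):
    """Recursively divide the cluster; the depth limit stands in for pruning."""
    split = best_split(seqs) if max_depth > 0 and len(seqs) > 1 else None
    if split is None:
        return {"leaf": list(seqs)}
    _, pos, res, yes, no = split
    return {
        "test": (pos, res),  # the mutation associated with this division
        "yes": build_tree(yes, max_depth - 1),
        "no": build_tree(no, max_depth - 1),
    }


def classify(tree, seq):
    """Route a new aligned sequence to a leaf cluster using only the site tests."""
    while "leaf" not in tree:
        pos, res = tree["test"]
        tree = tree["yes"] if seq[pos] == res else tree["no"]
    return tree["leaf"]
```

Because every internal node stores the site test that defines it, a new sequence can be assigned to a cluster by checking a handful of alignment positions, which is the property the abstract refers to when it mentions classifying new sequences from mutations alone.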
[Subject classification] Biotechnology
[Keywords] clustering trees; top-down clustering; decision trees; protein subfamily identification; phylogenomics