Exploiting topic modeling to boost metagenomic reads binning
[Abstract]
Background: With the rapid development of high-throughput technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. Assigning these metagenomic reads to different species or taxonomic classes, referred to as binning of metagenomic data, is a vital step in metagenomic analysis.
Results: In this paper, we propose a new method, TM-MCluster, for binning metagenomic reads. First, we represent each metagenomic read as a set of "k-mers" together with the frequency with which each occurs in the read. Then, we apply a probabilistic topic model, the Latent Dirichlet Allocation (LDA) model, to the reads; it generates a number of hidden "topics" such that each read can be represented by a distribution vector over the generated topics. Finally, as in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature weighting mechanism, to cluster the reads represented by their topic distributions.
Conclusions: Experiments show that the new method TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that exploiting topic modeling can effectively improve the binning performance of metagenomic reads.
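The first step of the pipeline, representing each read by its k-mer frequencies, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_counts` and `count_matrix`, the default `k=4`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration, since the abstract does not state them. The LDA and SKWIC steps are not shown.

```python
from collections import Counter
from itertools import product


def kmer_counts(read, k=4):
    """Count overlapping k-mers in one read.

    Assumption: k-mers containing anything other than A, C, G, T
    (for example N) are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def count_matrix(reads, k=4):
    """Build a reads-by-k-mers count matrix over the full 4**k vocabulary.

    Each row plays the role of a "document" and each k-mer the role of a
    "word", which is the input shape a topic model such as LDA expects.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: j for j, kmer in enumerate(vocab)}
    rows = []
    for read in reads:
        row = [0] * len(vocab)
        for kmer, n in kmer_counts(read, k).items():
            row[index[kmer]] = n
        rows.append(row)
    return vocab, rows


# Hypothetical toy reads, far shorter than real sequencing reads.
reads = ["ACGTACGT", "TTTTNACG"]
vocab, matrix = count_matrix(reads, k=2)
```

A count matrix of this shape could then be passed to any LDA implementation to obtain a per-read topic distribution, which is the representation the abstract says is clustered in the final step.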
[Publication date] 2015-03-18
[Keywords] Metagenomics; Metagenomic data binning; Topic modeling