Functional Interpretation of High-Throughput Sequencing Data.
[Abstract] Functional interpretation of high-throughput sequencing (HTS) data provides insight into biological systems, including important pathways in the context under study. A common approach is gene set enrichment (GSE) testing. GSE emerged in the age of microarrays as a way to biologically interpret long lists of differentially expressed genes (DEGs). However, HTS data has characteristics not present in microarray data that can bias GSE results. My thesis is focused on identifying, characterizing, and accounting for biases to improve functional interpretation in HTS data. In this thesis, I present GSE tests designed for ChIP-seq data and RNA-seq data. Our tests have applications beyond HTS data, which we show by using them to analyze genomic features, including mappability and repeat content. ChIP-Enrich is a GSE test for ChIP-seq data. It includes a database of locus definitions to annotate peaks to different gene loci (such as exons, introns, promoters, and other intergenic regions), which allows for biological discovery unique to different regions. ChIP-Enrich empirically adjusts for the observed bias due to the varying lengths of these gene loci in its enrichment test. RNA-Enrich is a GSE test for RNA-seq data. RNA-Enrich corrects for the selection bias often observed in RNA-seq data, where long and highly expressed genes are more likely to be identified as DEGs. Unlike other GSE tests for RNA-seq data, RNA-Enrich does not require permutations or a cut-off to define DEGs, and works well with small sample sizes. For both ChIP-Enrich and RNA-Enrich, we show well-calibrated type I error compared to competing methods. Finally, we characterize sequence mappability, which is one potential bias in the interpretation of HTS data. We characterize properties of the main contributors to low mappability (transposons and segmental duplications), overall mappability, and their relationship with gene locus length and function.
Across different transcribed and regulatory regions, certain gene functions showed unique signatures involving significantly more/fewer associated repeats, higher/lower mappability, and longer/shorter locus length. Our analyses provide insight into evolutionary selection pressures that maintain complexity of gene regulation. Overall, we demonstrate that considering characteristics of the human genome is essential to improving functional interpretation of HTS data.
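For context, the classic cut-off-based GSE test that the abstract contrasts with can be sketched as a one-sided hypergeometric (over-representation) test. This is only an illustration of that baseline, not ChIP-Enrich or RNA-Enrich: it treats every gene as equally likely to be called a DEG, which is exactly the assumption that locus length and expression level can violate in HTS data. The function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_deg, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes tested
    n_in_set:   number of those genes belonging to the gene set
    n_deg:      number of genes called differentially expressed
    n_overlap:  number of DEGs that belong to the gene set
    """
    total = comb(n_universe, n_deg)
    upper = min(n_in_set, n_deg)
    # Sum the probability of observing n_overlap or more set members among the DEGs.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment_from_sets(universe, gene_set, degs):
    """Convenience wrapper that takes collections of gene identifiers."""
    universe = set(universe)
    gene_set = set(gene_set) & universe  # ignore set members that were not tested
    degs = set(degs) & universe
    return hypergeom_enrichment_p(
        len(universe), len(gene_set), len(degs), len(gene_set & degs)
    )


if __name__ == "__main__":
    # Toy example with hypothetical gene identifiers.
    universe = ["g1", "g2", "g3", "g4"]
    pathway = ["g1", "g2"]
    degs = ["g1", "g2"]
    print(enrichment_from_sets(universe, pathway, degs))
```

Because this baseline depends on a hard DEG cut-off and ignores gene-level covariates, a long or highly expressed gene set can look enriched for purely technical reasons, which is the bias the tests described in the abstract are designed to account for.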
[Publication date]  [Publishing institution] University of Michigan
[Validity level] next-generation sequencing [Subject classification]
[Keywords] bioinformatics;next-generation sequencing;gene set enrichment testing;functional interpretation;ChIP-seq;RNA-seq;Genetics;Molecular;Cellular and Developmental Biology;Science (General);Statistics and Numeric Data;Health Sciences;Science;Bioinformatics [Timeliness]