Detecting and Correcting Contamination in Genetic Data.
[Abstract] While technological innovation has dramatically increased the amount and variety of genomic data available to geneticists, no assay is perfect, and both human error and technical artifacts can lead to erroneous data. A proper analysis pipeline must both detect errors and, if possible, correct them. One common source of errors in genetic data is sample-to-sample contamination. This dissertation identifies methods to address contamination in the most common types of genetic studies.
Chapter 2 focuses on methods for detecting and quantifying contamination in both array-based and next-generation sequencing (NGS) genotype data. For the array-based data, we use the observed intensities from the genotyping instruments to quantify contamination with two distinct methods: 1) a regression-based model using intensities and population allele frequencies and 2) a multivariate normal mixture model that looks at the clustering of intensities. For NGS data, we model the reads using a mixture model to determine the proportion of reads from the true sample and the contaminating sample.
Chapter 3 outlines a method to make accurate genotype calls with contaminated NGS data. Given an estimated level of contamination, we propose a likelihood that can be maximized to call genotypes and estimate allele frequencies for samples with no previous genotype data. We evaluate the method on data from two common sequencing strategies: 1) low-pass (2-4x depth) genome-wide sequencing and 2) high-depth (50-100x depth) exome sequencing.
Chapter 4 looks at contamination in the context of RNA sequencing (RNA-Seq) data. While the technology to generate RNA-Seq data is similar to exome sequencing, the difference in expression between the contaminating and true sample makes it more difficult to accurately estimate the contamination proportion. We propose methods to improve the quality of these estimates.
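[Illustrative sketch] The abstract does not give the dissertation's actual likelihood, so the following is only a minimal sketch of the general idea behind a read-level mixture model for NGS contamination, under simplifying assumptions that are not taken from the source: the true sample's genotype is known at each site, the contaminating reads carry the alternate allele at the population allele frequency, sequencing errors are symmetric, and the contamination proportion is found by a simple grid search. All function names and parameters (`alt_read_prob`, `log_likelihood`, `estimate_alpha`, `error`, `steps`) are hypothetical.

```python
import math


def alt_read_prob(genotype, alt_freq, alpha, error=0.01):
    """Probability that one read shows the alternate allele.

    genotype: alt-allele count of the true sample (0, 1, or 2), assumed known.
    alt_freq: population alt-allele frequency, used here as a stand-in
              for the unknown contaminating sample (an assumption).
    alpha:    proportion of reads coming from the contaminating sample.
    error:    assumed symmetric per-read error rate (must be > 0).
    """
    # Mixture of the true sample and the contaminating sample.
    p = (1.0 - alpha) * genotype / 2.0 + alpha * alt_freq
    # A sequencing error flips the observed allele.
    return p * (1.0 - error) + (1.0 - p) * error


def log_likelihood(alpha, sites, error=0.01):
    """Binomial log-likelihood (constant terms dropped) over all sites.

    sites: iterable of (genotype, alt_freq, alt_reads, depth) tuples.
    """
    total = 0.0
    for genotype, alt_freq, alt_reads, depth in sites:
        p = alt_read_prob(genotype, alt_freq, alpha, error)
        total += alt_reads * math.log(p)
        total += (depth - alt_reads) * math.log(1.0 - p)
    return total


def estimate_alpha(sites, error=0.01, steps=500):
    """Grid-search estimate of the contamination proportion in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, sites, error))


if __name__ == "__main__":
    # Two made-up sites: a homozygous-reference and a homozygous-alternate
    # site, each with 1000 reads and a population alt frequency of 0.5.
    example_sites = [(0, 0.5, 59, 1000), (2, 0.5, 941, 1000)]
    print(estimate_alpha(example_sites))
```

In this toy setup, excess alternate reads at homozygous-reference sites (and excess reference reads at homozygous-alternate sites) are what pull the estimate away from zero. A fuller treatment would model the contaminating genotype per site rather than per read and would not assume known genotypes.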
[Publication date]  [Publishing institution] University of Michigan
[Effect level] genetic sequencing [Subject classification] 
[Keywords] contamination;genetic sequencing;Genetics;Statistics and Numeric Data;Science;Biostatistics [Timeliness] 