Detecting Cross-Contamination in Sequencing Data Using Regression Techniques
Abstract:
Cross-contamination of a test sample used for cancer detection is identified from gene sequencing data. Each test sample includes a number of test sequences, any of which may include a single nucleotide polymorphism (SNP) that can be indicative of cancer. The test sequences are filtered to remove or negate at least some of the SNPs. Negating the test sequences allows more test sequences to be analyzed simultaneously when determining cross-contamination. Cross-contamination is determined by modeling the variant allele frequency of the test sequences as a function of minor allele frequency, contamination level, and background noise. In some cases, the variant allele frequency is based on a probability function that includes the minor allele frequency. The test sample is determined to be cross-contaminated if the estimated contamination level is above a threshold and statistically significant.
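The regression step can be illustrated with a minimal sketch. The abstract does not give the functional form of the model, so the linear relation VAF = contamination_level * MAF + noise used here is an assumption, as are the function names, the default threshold of 0.005, the significance level of 0.05, and the use of a normal approximation for the one-sided test on the slope. It is not the claimed method, only one way such a fit could look.

```python
import math
from statistics import NormalDist


def fit_contamination(maf, vaf):
    """Ordinary least-squares fit of vaf = level * maf + noise.

    Assumed model (not stated in the source): the slope plays the role of
    the contamination level and the intercept the background noise.
    Returns (level, noise, p_value), where p_value is a one-sided test of
    level > 0 using a normal approximation to the slope's t-statistic.
    """
    n = len(maf)
    if n != len(vaf) or n < 3:
        raise ValueError("need at least 3 paired (maf, vaf) observations")
    mean_x = sum(maf) / n
    mean_y = sum(vaf) / n
    sxx = sum((x - mean_x) ** 2 for x in maf)
    if sxx == 0:
        raise ValueError("maf values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(maf, vaf))
    level = sxy / sxx
    noise = mean_y - level * mean_x
    # Residual variance and standard error of the slope.
    rss = sum((y - (noise + level * x)) ** 2 for x, y in zip(maf, vaf))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the test is degenerate, so decide by sign alone.
        p_value = 0.0 if level > 0 else 1.0
    else:
        p_value = 1.0 - NormalDist().cdf(level / se)
    return level, noise, p_value


def is_contaminated(maf, vaf, threshold=0.005, alpha=0.05):
    """Flag the sample only if the level exceeds the threshold AND is significant."""
    level, _noise, p_value = fit_contamination(maf, vaf)
    return level > threshold and p_value < alpha
```

Here `maf` would hold the population minor allele frequency at each retained SNP site and `vaf` the variant allele frequency observed at that site in the test sample. Requiring both conditions mirrors the abstract: a large but noisy estimate, or a significant but negligible one, does not trigger a call.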