Abstract:
Provided herein are methods for simultaneously identifying genomic copy number variations (CNVs) and sequence variations in an enriched genomic sample, as well as compositions, systems, and kits for performing such methods. In some aspects, the methods include: (a) obtaining a plurality of sequence reads from an enriched genomic sample that includes a plurality of genomic backbone regions and a plurality of genomic mutation regions of interest in a genomic locus of a subject; (b) obtaining a plurality of sequence reads from the corresponding genomic backbone regions and genomic mutation regions of at least one reference genomic sample; (c) assembling the sequence reads from the enriched genomic sample and the at least one reference genomic sample; and (d) determining, based on computational analysis of the assembly, whether the genomic locus has a CNV and/or a sequence variation. The present disclosure further includes aspects in which the methods are performed by a computer and provide an output to a user identifying a genomic CNV and/or sequence variation.
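The abstract leaves the "computational analysis" of step (d) unspecified. The following is a minimal sketch of one common approach, not the disclosed method. It assumes that the reads have already been assembled and counted per region. CNVs are inferred by comparing median-normalized read counts over the backbone regions against the reference samples. Sequence variants are inferred from the alternate-allele fraction at a mutation-region position. All function names, the diploid default, and the cutoffs (`tolerance`, `min_depth`, `min_fraction`) are hypothetical choices for illustration.

```python
from statistics import mean, median


def normalize(depths: dict[str, float]) -> dict[str, float]:
    """Divide each region's read count by the sample's median count so that
    samples sequenced to different total depths become comparable."""
    med = median(depths.values())
    return {region: d / med for region, d in depths.items()}


def estimate_copy_number(
    sample: dict[str, float],
    references: list[dict[str, float]],
    ploidy: int = 2,
) -> dict[str, float]:
    """Estimate copy number per backbone region as ploidy times the ratio of
    the sample's normalized depth to the mean normalized reference depth."""
    norm_sample = normalize(sample)
    norm_refs = [normalize(ref) for ref in references]
    return {
        region: ploidy * value / mean(ref[region] for ref in norm_refs)
        for region, value in norm_sample.items()
    }


def call_cnv(copy_numbers, ploidy=2, tolerance=0.5):
    """Label each region as loss, gain, or normal (hypothetical cutoffs)."""
    calls = {}
    for region, cn in copy_numbers.items():
        if cn < ploidy - tolerance:
            calls[region] = "loss"
        elif cn > ploidy + tolerance:
            calls[region] = "gain"
        else:
            calls[region] = "normal"
    return calls


def call_sequence_variant(ref_count, alt_count, min_depth=20, min_fraction=0.2):
    """Flag a mutation-region position when enough reads carry the alternate
    allele; the depth check also guards against dividing by zero."""
    depth = ref_count + alt_count
    return depth >= min_depth and alt_count / depth >= min_fraction


# Backbone-region read counts for a test sample and two reference samples.
sample = {"b1": 100, "b2": 100, "b3": 50, "b4": 100}
references = [
    {"b1": 100, "b2": 100, "b3": 100, "b4": 100},
    {"b1": 200, "b2": 200, "b3": 200, "b4": 200},
]
print(call_cnv(estimate_copy_number(sample, references)))
# → {'b1': 'normal', 'b2': 'normal', 'b3': 'loss', 'b4': 'normal'}
print(call_sequence_variant(ref_count=60, alt_count=40))  # → True
```

The sample median is used as the normalizer because a single deleted or amplified region barely shifts it. This lets the remaining backbone regions anchor the baseline.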
Abstract:
Provided herein is a method for identifying a sequence variant in an enriched sample. In certain embodiments, this method may comprise: (a) obtaining: (i) a plurality of sequence reads from a sample that has been enriched for a genomic region and (ii) a reference sequence for the genomic region; (b) assembling the sequence reads to obtain a plurality of discrete sequence assemblies that correspond to potential variants; (c) determining which of the potential variants are true and which are artifacts by examining the sequence reads that make up each of the discrete sequence assemblies; (d) optionally determining whether each of the true potential variants contains a mutation that is known to be associated with the reference sequence; and (e) outputting a report indicating whether the sample comprises a sequence variant.
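Steps (b) through (e) can be illustrated with a deliberately simplified sketch. It assumes that every read has already been trimmed to span exactly the enriched region. Under that assumption, grouping identical read sequences stands in for real sequence assembly. It also assumes, as one possible artifact criterion, that a potential variant is true only if at least `min_reads` reads support it on both strands. The abstract does not say which read properties are examined. The names `Read`, `assemble`, `is_true_variant`, and `variant_report`, as well as the strand-based filter, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    sequence: str  # read bases, assumed pre-trimmed to span the enriched region
    strand: str    # "+" or "-"


def assemble(reads):
    """Group reads that carry an identical sequence; each group stands in
    for one discrete sequence assembly (a potential variant)."""
    assemblies = defaultdict(list)
    for read in reads:
        assemblies[read.sequence].append(read)
    return assemblies


def is_true_variant(supporting_reads, min_reads=3):
    """Treat an assembly as real only if enough reads support it and those
    reads come from both strands; otherwise treat it as an artifact."""
    strands = {read.strand for read in supporting_reads}
    return len(supporting_reads) >= min_reads and strands == {"+", "-"}


def variant_report(reads, reference, known_mutations=frozenset()):
    variants = []
    for sequence, supporting in assemble(reads).items():
        if sequence == reference or not is_true_variant(supporting):
            continue
        # Optional step (d): flag variants that match a known mutation.
        variants.append((sequence, len(supporting), sequence in known_mutations))
    # Step (e): report whether any true variant was found.
    return {"has_variant": bool(variants), "variants": sorted(variants)}


reference = "ACGT"
reads = (
    [Read("ACGT", "+")] * 3 + [Read("ACGT", "-")] * 2    # reference allele
    + [Read("ACTT", "+")] * 2 + [Read("ACTT", "-")] * 2  # supported on both strands
    + [Read("AGGT", "+")] * 4                            # single-strand artifact
)
print(variant_report(reads, reference, known_mutations={"ACTT"}))
# → {'has_variant': True, 'variants': [('ACTT', 4, True)]}
```

The "AGGT" assembly has as many reads as the true variant but is rejected because all of its reads come from one strand. This is a typical signature of a PCR or sequencing artifact.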
Abstract:
Provided herein, among other things, is a computer-implemented method for assigning a sequence read to a genomic location, the method including: a) accessing a file containing a sequence read, wherein the sequence read is obtained from a nucleic acid sample that has been enriched by hybridization to a plurality of capture sequences; and b) assigning the sequence read to a genomic location by: i) identifying a capture sequence as a match for the sequence read if the sequence read contains one or more subsequences of the capture sequence; ii) calculating, using a computer, a score indicating the degree of sequence similarity between each of the matched capture sequences and the sequence read; and iii) assigning the sequence read to the genomic location if the calculated score for a matched capture sequence is above a threshold.
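Steps i) through iii) can be sketched as follows, with several assumptions. The "subsequences" of step i) are taken to be fixed-length k-mers looked up in an index built from the capture sequences. The similarity score of step ii) is taken to be the fraction of the read's k-mers that also occur in the matched capture sequence. When several captures clear the threshold, the read is assigned to the best-scoring one. The abstract specifies none of these choices. The names `Capture`, `build_index`, and `assign_read`, the location-label format, and the defaults `k=5` and `threshold=0.8` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    name: str
    sequence: str
    location: str  # hypothetical "chrom:start-end" label


def kmers(seq, k):
    """All length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(captures, k):
    """Map each k-mer to the capture sequences that contain it."""
    index = {}
    for capture in captures:
        for kmer in kmers(capture.sequence, k):
            index.setdefault(kmer, set()).add(capture)
    return index


def assign_read(read, index, k=5, threshold=0.8):
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k: nothing to match
        return None
    # Step i: a capture matches if it shares at least one k-mer with the read.
    matched = set()
    for kmer in read_kmers:
        matched |= index.get(kmer, set())
    # Step ii: score = fraction of the read's k-mers found in the capture.
    best_location, best_score = None, threshold
    for capture in sorted(matched, key=lambda c: c.name):
        score = len(read_kmers & kmers(capture.sequence, k)) / len(read_kmers)
        # Step iii: keep the best capture whose score is above the threshold.
        if score > best_score:
            best_location, best_score = capture.location, score
    return best_location


captures = [
    Capture("probe_A", "ACGTACGTTGCA", "chr1:100-111"),
    Capture("probe_B", "TTTTGGGGCCCC", "chr2:500-511"),
]
index = build_index(captures, k=5)
print(assign_read("ACGTACGTTG", index))  # → chr1:100-111
print(assign_read("TTTTGGAAAA", index))  # → None (score 2/6 is not above 0.8)
```

The k-mer index keeps step i) cheap: only captures sharing a k-mer with the read are scored, rather than every capture in the panel.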