DISCOVERY ROUTING SYSTEMS AND ENGINES
    Published invention application

    Publication/grant No.: US20240355427A1

    Publication/grant date: 2024-10-24

    Application No.: US18759642

    Application date: 2024-06-28

    CPC classification number: G16B50/30 G16B50/00 G16H50/20 G16H80/00 G16Z99/00

    Abstract: The inventive subject matter provides apparatus, systems, and methods that accelerate the discovery of new practical information from large collected datasets. In most cases, anomalies in the datasets are automatically identified, flagged, and validated by a cross-validation engine. Only validated anomalies are then associated with a subject matter expert who is qualified to act on the anomaly. In other words, the inventive subject matter bridges the gap between the overwhelming amount of scientific data that can now be harvested and the comparatively limited analytical resources available to extract practical information from those data. Practical information can take the form of trends, patterns, maps, hypotheses, or predictions, for example, and has implications in medicine, environmental sciences, entertainment, travel, shopping, social interactions, or other areas.
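
    A minimal Python sketch of the flag, validate, and route flow, built under assumptions that are ours and not the patent's. The abstract does not specify the detectors or how the cross-validation engine works. Here, the hypothetical functions `zscore_flags` and `mad_flags` act as two independent anomaly detectors, and `cross_validate` accepts an anomaly only when both agree. `route` then assigns validated anomalies to a hypothetical per-domain expert table. None of these names come from the patent.

```python
from collections import defaultdict
from statistics import mean, median, stdev

def zscore_flags(values, threshold=2.0):
    """First detector: indices whose |z-score| exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}

def mad_flags(values, threshold=3.5):
    """Second, independent detector: modified z-score based on the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()
    return {i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold}

def cross_validate(values):
    """Hypothetical stand-in for the cross-validation engine: keep anomalies both detectors flag."""
    return zscore_flags(values) & mad_flags(values)

def route(records, experts):
    """Group records by domain, validate anomalies per domain, assign them to that domain's expert."""
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[rec["domain"]].append(rec)
    assignments = {}
    for domain, recs in by_domain.items():
        expert = experts.get(domain)
        if expert is None or len(recs) < 3:
            continue  # no qualified expert, or too few points to judge
        validated = cross_validate([r["value"] for r in recs])
        if validated:
            assignments.setdefault(expert, []).extend(recs[i]["id"] for i in sorted(validated))
    return assignments
```

    Requiring agreement between detectors trades some sensitivity for fewer false alarms. This mirrors the abstract's point that only validated anomalies consume an expert's limited attention.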

    Methods and systems for detecting sequence variants

    Publication/grant No.: US12106826B2

    Publication/grant date: 2024-10-01

    Application No.: US18494317

    Application date: 2023-10-25

    Inventor: Deniz Kural

    CPC classification number: G16B30/10 G16B30/00 G16B30/20 G16B50/00

    Abstract: The invention provides methods for identifying rare variants near a structural variation in a genetic sequence, for example, in a nucleic acid sample taken from a subject. The invention additionally includes methods for aligning reads (e.g., nucleic acid reads) to a reference sequence construct that accounts for the structural variation, methods for building a reference sequence construct that accounts for the structural variation (or for both the structural variation and the rare variant), and systems that use the alignment methods to identify rare variants. The method is scalable and can be used to align millions of reads to a construct thousands of bases long, or longer.
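
    The alignment idea can be sketched in Python under simplifying assumptions that are ours, not the patent's. The reference construct is represented as two linear paths: the reference and an alternate path carrying one structural variant. Reads are aligned without gaps by counting mismatches, and mismatches against the best-scoring path are reported as candidate rare variants. `build_construct`, `best_ungapped_hit`, and `call_variants` are hypothetical names.

```python
def build_construct(reference, sv_start, sv_end, insertion=""):
    """Return named linear paths: the plain reference and an alternate path in which
    reference[sv_start:sv_end] is replaced by `insertion` (empty insertion = deletion)."""
    alt = reference[:sv_start] + insertion + reference[sv_end:]
    return {"ref": reference, "alt": alt}

def best_ungapped_hit(read, paths):
    """Slide the read along every path without gaps; return (mismatches, path_name, offset)."""
    best = None
    for name, seq in paths.items():
        for off in range(len(seq) - len(read) + 1):
            mismatches = sum(a != b for a, b in zip(read, seq[off:off + len(read)]))
            hit = (mismatches, name, off)
            if best is None or hit < best:
                best = hit
    return best

def call_variants(read, paths, max_mismatches=2):
    """Report (path, position, path_base, read_base) for each mismatch on the best path."""
    hit = best_ungapped_hit(read, paths)
    if hit is None or hit[0] > max_mismatches:
        return []  # read does not align confidently to any path
    _, name, off = hit
    seq = paths[name]
    return [(name, off + i, seq[off + i], base)
            for i, base in enumerate(read) if base != seq[off + i]]
```

    In this sketch, a read spanning a deletion breakpoint matches the alternate path with one mismatch, which is then reported as a candidate variant. Against the reference path alone, the same read aligns poorly and no call is made. A real implementation would use gapped alignment against a graph rather than enumerating paths.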

    EFFICIENT PAYLOAD EXTRACTION FROM POLYNUCLEOTIDE SEQUENCE READS

    Publication/grant No.: US20240312567A1

    Publication/grant date: 2024-09-19

    Application No.: US18678358

    Application date: 2024-05-30

    CPC classification number: G16B40/00 G16B30/00 G16B50/00

    Abstract: Systems and techniques for extracting information-containing payloads from DNA or other polynucleotides are provided. Decoding the sequence of payload regions from multiple polynucleotides to obtain encoded information includes sequencing the molecules with a polynucleotide sequencer. Reads generated by the polynucleotide sequencer can include information from multiple different sources mixed together. Primer sequences present in the reads identify which reads contain information from the same source. A computationally efficient technique for finding primer sequences in the reads includes comparing hashes of the reads with hashes of the primer sequences to find an approximate location, then computing edit distances between the primer sequences and the reads to find an exact location. Reads that include the same primer sequences may be clustered together. Sequences of the payload regions are extracted based on the locations of the primer sequences.
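
    A minimal Python sketch of the two-stage primer search, assuming k-mers hashed through Python's built-in dict serve as the "hashes". Vote counting picks the approximate primer start, and a Levenshtein distance scan over a small window around that start finds the exact location. The function names, `k`, `slack`, and `max_dist` are illustrative assumptions, as is treating everything after the primer as the payload.

```python
from collections import Counter, defaultdict

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def kmer_index(seq, k):
    """Hash every k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def approximate_start(read_index, primer, k):
    """Each primer k-mer found in the read votes for a primer start; return the top vote."""
    votes = Counter()
    for j in range(len(primer) - k + 1):
        for p in read_index.get(primer[j:j + k], ()):
            votes[p - j] += 1
    return votes.most_common(1)[0][0] if votes else None

def refine(read, primer, approx, slack):
    """Scan starts near the approximate location; return (edit_distance, start) of the best."""
    best = None
    last = min(len(read) - len(primer), approx + slack)
    for s in range(max(0, approx - slack), last + 1):
        d = levenshtein(primer, read[s:s + len(primer)])
        if best is None or d < best[0]:
            best = (d, s)
    return best

def extract_payloads(reads, primers, k=4, slack=2, max_dist=1):
    """Cluster reads by best-matching primer; keep the sequence after the primer as payload."""
    clusters = defaultdict(list)
    for read in reads:
        index = kmer_index(read, k)  # hash the read once, reuse it for every primer
        hits = []
        for source, primer in primers.items():
            approx = approximate_start(index, primer, k)
            if approx is None:
                continue  # no shared k-mer: skip the costly edit-distance step
            hit = refine(read, primer, approx, slack)
            if hit is not None and hit[0] <= max_dist:
                hits.append((hit[0], source, hit[1]))
        if hits:
            _, source, start = min(hits)
            clusters[source].append(read[start + len(primers[source]):])
    return dict(clusters)
```

    Under these assumptions, the quadratic edit-distance computation runs only over a window of `2 * slack + 1` candidate starts rather than over every offset of every read. This is the efficiency the two-stage approach is after.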

    Method for compressing genomic data

    Publication/grant No.: US12080384B2

    Publication/grant date: 2024-09-03

    Application No.: US15736166

    Application date: 2016-06-16

    Abstract: The present invention relates to a method for compressing genomic data, wherein the genomic data are stored in at least one data file containing at least a plurality of reads produced by a genome sequencing method, and each read includes a mapping position, a CIGAR string, and the actual sequenced nucleotide sequence as a local part of the donor genome. The method comprises the steps of:
    - unwinding the nucleotide sequence of a current read of one of said data files by using the mapping position and the CIGAR string of said current read, wherein said current read has at least one previous read;
    - computing a difference between the unwound nucleotide sequence of said current read and the unwound nucleotide sequence of at least one of said previous reads, wherein said difference contains the differences of the mapping positions and of the nucleotide sequences;
    - passing said computed difference to an entropy coder to compress said difference;
    - encoding said current read by the compressed difference; and
    - repeating the foregoing steps, with said current read as one of said previous reads and a following read as the new current read, until no more following reads are available.
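
    A minimal Python sketch of the unwind, difference, and entropy-coding loop, resting on several assumptions:
    - Only the CIGAR operations M, =, X, and D are handled. A real codec must also store I, S, and the other operations.
    - "Unwinding" is taken to mean projecting the read onto reference coordinates, with `-` for deleted bases.
    - Python's `zlib` (DEFLATE, i.e. LZ77 plus Huffman coding) stands in for the unspecified entropy coder.
    - The function names and the text record format are hypothetical.

```python
import re
import zlib

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CHANGE = re.compile(r"(\d+)(\D)")

def unwind(seq, cigar):
    """Project a read onto reference coordinates: M/=/X copy read bases, D writes '-'."""
    out, i = [], 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            out.append(seq[i:i + n])
            i += n
        elif op == "D":
            out.append("-" * n)
        else:
            raise ValueError(f"CIGAR op {op!r} is not handled in this sketch")
    return "".join(out)

def diff(prev_pos, prev_unwound, pos, unwound):
    """Position delta plus the bases that differ from the previous read at the same
    reference coordinate; bases outside the previous read's span are always kept."""
    delta = pos - prev_pos
    changes = []
    for k, base in enumerate(unwound):
        j = k + delta  # index of the same reference coordinate in the previous read
        if not 0 <= j < len(prev_unwound) or prev_unwound[j] != base:
            changes.append((k, base))
    return delta, len(unwound), changes

def compress_reads(reads):
    """reads: iterable of (mapping_position, cigar, sequence), ideally sorted by position."""
    coder = zlib.compressobj(level=9)  # DEFLATE as a stand-in entropy coder
    chunks, prev_pos, prev_unwound = [], 0, ""
    for pos, cigar, seq in reads:
        cur = unwind(seq, cigar)
        delta, length, changes = diff(prev_pos, prev_unwound, pos, cur)
        record = f"{delta} {length} " + "".join(f"{k}{b}" for k, b in changes) + "\n"
        chunks.append(coder.compress(record.encode("ascii")))
        prev_pos, prev_unwound = pos, cur  # the current read becomes the previous read
    chunks.append(coder.flush())
    return b"".join(chunks)

def decompress_reads(blob):
    """Invert compress_reads, returning (mapping_position, unwound_sequence) pairs."""
    out, prev_pos, prev_unwound = [], 0, ""
    for line in zlib.decompress(blob).decode("ascii").splitlines():
        delta_s, length_s, body = line.split(" ", 2)
        delta = int(delta_s)
        changed = {int(k): b for k, b in CHANGE.findall(body)}
        cur = "".join(changed[k] if k in changed else prev_unwound[k + delta]
                      for k in range(int(length_s)))
        prev_pos, prev_unwound = prev_pos + delta, cur
        out.append((prev_pos, cur))
    return out
```

    Overlapping reads agree on most reference positions, so each record stores mostly a position delta plus the few bases that changed, which is what lets the coder compress the stream well. The decoder rebuilds positions and unwound sequences. Recovering the exact original CIGAR (for example, distinguishing M from = and X) is outside this sketch.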
