-
Publication No.: US09008974B2
Publication date: 2015-04-14
Application No.: US13337382
Filing date: 2011-12-27
CPC classification number: G06F19/14 , G06F19/24 , G06K9/6276
Abstract: In an implementation, a query signature corresponding to a query sequence based on a set of closest cluster centroids is generated. Based on the query signature, one or more target reference signatures from a plurality of reference signatures are identified. Further based on the one or more target reference signatures, a probable taxonomic group is identified and assigned to the query sequence.
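The signature-based assignment in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how sequences are turned into vectors, how signatures are compared, or how the taxonomic group is chosen. Here a signature is assumed to be the set of indices of the nearest centroids, target references are assumed to be those sharing the most centroid indices with the query, and the group is picked by majority vote. The names `signature` and `assign_taxon` are hypothetical.

```python
from collections import Counter
import math


def signature(vec, centroids, n_closest=2):
    """Return the indices of the n_closest centroids to vec (assumed signature form)."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(vec, centroids[i]))
    return frozenset(order[:n_closest])


def assign_taxon(query_vec, centroids, references, n_closest=2):
    """Assign a probable taxonomic group to a query vector.

    references: list of (reference_signature, taxon) pairs.
    Assumption: target references are those whose signature shares the
    most centroid indices with the query signature.
    """
    q_sig = signature(query_vec, centroids, n_closest)
    scored = [(len(q_sig & ref_sig), taxon) for ref_sig, taxon in references]
    best = max((score for score, _ in scored), default=0)
    if best == 0:
        return None  # no reference shares any centroid with the query
    votes = Counter(taxon for score, taxon in scored if score == best)
    return votes.most_common(1)[0][0]


# Toy usage with 2-D vectors standing in for real sequence features.
centroids = [(0, 0), (10, 0), (0, 10), (10, 10)]
references = [
    (signature((2, 1), centroids), "A"),
    (signature((9, 8), centroids), "B"),
]
print(assign_taxon((3, 1), centroids, references))
```

Comparing short signatures rather than full vectors keeps the reference lookup cheap, which is presumably the motivation for the signature step.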
-
Publication No.: US09116839B2
Publication date: 2015-08-25
Application No.: US13472737
Filing date: 2012-05-16
Applicant: Sharmila Shekhar Mande , Varun Mehra , Tarini Shankar Ghosh
Inventor: Sharmila Shekhar Mande , Varun Mehra , Tarini Shankar Ghosh
CPC classification number: G06F19/14 , G06F19/24 , G06K9/622 , G06K9/6268
Abstract: Method(s) and system(s) for identifying horizontally transferred genes are described herein. The method includes defining a cuboid in a three dimensional space, wherein the cuboid includes fragment points corresponding to the genomic fragments belonging to a plurality of sequenced microbial genomes, and dividing the cuboid into a plurality of grids. The method further includes selecting one or more grids corresponding to a selected genome and classifying each of the selected grids as one of majority, minority, and mixed grids, based on number of fragment points corresponding to the selected genome in each of the selected grids. Further, at least one genomic fragment from the minority and the mixed grids is identified as the horizontally transferred gene based on a distance ratio assessment.
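The grid classification step in this abstract might look like the sketch below. It assumes each genomic fragment has already been mapped to a 3-D point and labelled with its source genome. The cut-offs `hi` and `lo` are hypothetical, since the abstract only says the classification depends on the number of points from the selected genome in each grid. The distance ratio assessment is not specified in the abstract and is left out.

```python
from collections import defaultdict


def grid_index(point, origin, cell_size):
    """Map a 3-D point to the integer index of the grid cell containing it."""
    return tuple(int((p - o) // cell_size) for p, o in zip(point, origin))


def classify_grids(points, genome, origin=(0.0, 0.0, 0.0),
                   cell_size=1.0, hi=0.7, lo=0.3):
    """Classify every grid holding a point of `genome`.

    points: iterable of ((x, y, z), genome_label).
    hi / lo: assumed fraction thresholds (not taken from the patent).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [points of genome, all points]
    for point, label in points:
        cell = grid_index(point, origin, cell_size)
        counts[cell][1] += 1
        if label == genome:
            counts[cell][0] += 1

    result = {}
    for cell, (own, total) in counts.items():
        if own == 0:
            continue  # grid does not correspond to the selected genome
        frac = own / total
        if frac >= hi:
            result[cell] = "majority"
        elif frac <= lo:
            result[cell] = "minority"
        else:
            result[cell] = "mixed"
    return result
```

Fragments falling in the minority and mixed grids would then be the candidates passed on to the distance ratio assessment.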
-
Publication No.: US08972201B2
Publication date: 2015-03-03
Application No.: US13428794
Filing date: 2012-03-23
Applicant: Sharmila Shekhar Mande , Monzoorul Haque Mohammed , Anirban Dutta , Tungadri Bose
Inventor: Sharmila Shekhar Mande , Monzoorul Haque Mohammed , Anirban Dutta , Tungadri Bose
IPC: G06F19/22
CPC classification number: G06F19/22
Abstract: Systems and methods for compression of a genomic data file are described herein. In one embodiment, genomic sequences, sequence headers, and quality sequences associated with a plurality of data streams provided in a genomic data file are identified. Each of the genomic sequences includes at least one of primary characters and secondary characters. Further, the secondary characters from each of the genomic sequences may be removed to obtain an intermediate genomic sequence file, and a quality score corresponding to each secondary character may be modified in the quality sequences to obtain an intermediate quality sequence file. Based on the intermediate genomic sequence file and the intermediate quality sequence file, a modified genomic sequence file and a modified quality sequence file, respectively, are generated. A compressed genomic data file is obtained using at least the modified genomic sequence file and the modified quality sequence file.
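One way to picture the secondary-character step is the sketch below. It makes several assumptions that the abstract does not state: primary characters are taken to be `A`, `C`, `G`, `T`; the only secondary character is `N`; and the quality score at each removed position is overwritten with a reserved marker (`~`, a hypothetical choice) so the position can be recovered later. The original quality value at those positions is lost in this toy scheme, and it only works if `~` does not already occur in the quality string.

```python
PRIMARY = set("ACGT")   # assumed primary characters
MARKER = "~"            # hypothetical reserved quality symbol


def split_secondary(seq, qual):
    """Drop secondary characters from seq and flag their positions in qual."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    kept, new_qual = [], []
    for base, q in zip(seq, qual):
        if base in PRIMARY:
            kept.append(base)
            new_qual.append(q)
        else:
            new_qual.append(MARKER)  # quality score modified at this position
    return "".join(kept), "".join(new_qual)


def restore(seq, qual, secondary="N"):
    """Rebuild the original sequence from the intermediate sequence and quality."""
    bases = iter(seq)
    return "".join(secondary if q == MARKER else next(bases) for q in qual)


inter_seq, inter_qual = split_secondary("ACNGT", "IIIII")
print(inter_seq, inter_qual)
print(restore(inter_seq, inter_qual))
```

With only four symbols left, the intermediate sequence can be packed at two bits per base before a general-purpose compressor is applied, which is one plausible reason for separating out the secondary characters.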
-
Publication No.: US20130166518A1
Publication date: 2013-06-27
Application No.: US13428794
Filing date: 2012-03-23
Applicant: Sharmila Shekhar MANDE , Monzoorul Haque MOHAMMED , Anirban DUTTA , Tungadri BOSE
Inventor: Sharmila Shekhar MANDE , Monzoorul Haque MOHAMMED , Anirban DUTTA , Tungadri BOSE
CPC classification number: G06F19/22
Abstract: Systems and methods for compression of a genomic data file are described herein. In one embodiment, genomic sequences, sequence headers, and quality sequences associated with a plurality of data streams provided in a genomic data file are identified. Each of the genomic sequences includes at least one of primary characters and secondary characters. Further, the secondary characters from each of the genomic sequences may be removed to obtain an intermediate genomic sequence file, and a quality score corresponding to each secondary character may be modified in the quality sequences to obtain an intermediate quality sequence file. Based on the intermediate genomic sequence file and the intermediate quality sequence file, a modified genomic sequence file and a modified quality sequence file, respectively, are generated. A compressed genomic data file is obtained using at least the modified genomic sequence file and the modified quality sequence file.
-
Publication No.: US09372959B2
Publication date: 2016-06-21
Application No.: US13484885
Filing date: 2012-05-31
Applicant: Sharmila Shekhar Mande , Tarini Shankar Ghosh , Varun Mehra
Inventor: Sharmila Shekhar Mande , Tarini Shankar Ghosh , Varun Mehra
Abstract: Systems and methods for assembly of metagenomic sequences are described herein. In one embodiment, a plurality of metagenomic sequences is represented in three dimensional space to obtain a plurality of sequence vectors. Based on the plurality of sequence vectors, a cuboid having a plurality of grids is defined in the three dimensional space such that it encompasses the plurality of metagenomic sequences. Further, the plurality of metagenomic sequences is assembled into one or more contigs based on traversal of the plurality of grids. In one implementation, the one or more contigs are assembled such that a contig includes metagenomic sequences probably originating from the same genome.
-
Publication No.: US20130325428A1
Publication date: 2013-12-05
Application No.: US13484885
Filing date: 2012-05-31
Applicant: Sharmila Shekhar Mande , Tarini Shankar Ghosh , Varun Mehra
Inventor: Sharmila Shekhar Mande , Tarini Shankar Ghosh , Varun Mehra
IPC: G06F19/10
Abstract: Systems and methods for assembly of metagenomic sequences are described herein. In one embodiment, a plurality of metagenomic sequences is represented in three dimensional space to obtain a plurality of sequence vectors. Based on the plurality of sequence vectors, a cuboid having a plurality of grids is defined in the three dimensional space such that it encompasses the plurality of metagenomic sequences. Further, the plurality of metagenomic sequences is assembled into one or more contigs based on traversal of the plurality of grids. In one implementation, the one or more contigs are assembled such that a contig includes metagenomic sequences probably originating from the same genome.
-
Publication No.: US20130158880A1
Publication date: 2013-06-20
Application No.: US13331557
Filing date: 2011-12-20
Applicant: Sharmila Shekhar Mande , Kuntal Kumar Bhusan , Tarini Shankar Ghosh
Inventor: Sharmila Shekhar Mande , Kuntal Kumar Bhusan , Tarini Shankar Ghosh
IPC: G06F19/00
CPC classification number: G06F19/26 , G06F19/14 , G06F19/20 , G06F19/28 , G06T11/206
Abstract: Systems and methods for analyzing community structures within a plurality of environmental samples are described herein. The method includes obtaining taxa data corresponding to taxonomic groups within the plurality of the environmental samples. Based on the taxa data, an abundance value for each of the taxonomic groups with respect to each of the plurality of environmental samples is determined. Further, based on abundance values, an interaction factor for each pair of the taxonomic groups in the plurality of environmental samples is computed. The interaction factor is indicative of a degree of interaction between a pair of taxonomic groups from among the taxonomic groups. Based in part on interaction factors and abundance values, the plurality of the environmental samples is clustered.
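The abundance and interaction-factor steps of this abstract can be illustrated with the rough sketch below. The abstract does not define the interaction factor, so Pearson correlation of relative abundances across samples is used here purely as a stand-in. The function names and the input layout (`{sample: {taxon: read_count}}`) are assumptions, every sample is assumed to have at least one read, and the final clustering of samples is omitted.

```python
from itertools import combinations
import math


def relative_abundance(counts):
    """counts: {sample: {taxon: read_count}} -> {sample: {taxon: fraction}}."""
    out = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())  # assumed to be non-zero
        out[sample] = {taxon: c / total for taxon, c in taxa.items()}
    return out


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def interaction_factors(abund):
    """Stand-in interaction factor for each pair of taxa across all samples."""
    samples = sorted(abund)
    taxa = sorted({t for per_sample in abund.values() for t in per_sample})
    profile = {t: [abund[s].get(t, 0.0) for s in samples] for t in taxa}
    return {(a, b): pearson(profile[a], profile[b])
            for a, b in combinations(taxa, 2)}


counts = {
    "s1": {"A": 1, "B": 3},
    "s2": {"A": 2, "B": 2},
    "s3": {"A": 3, "B": 1},
}
print(interaction_factors(relative_abundance(counts)))
```

A strongly negative value for a pair, as in this toy input, would suggest the two groups tend to exclude each other across samples; the per-sample abundances together with such pairwise values could then feed a standard clustering routine.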
-
Publication No.: US20130132353A1
Publication date: 2013-05-23
Application No.: US13428790
Filing date: 2012-03-23
Applicant: Sharmila Shekhar MANDE , Monzoorul Haque MOHAMMED , Anirban DUTTA , Tungadri BOSE , Sudha CHADARAM
Inventor: Sharmila Shekhar MANDE , Monzoorul Haque MOHAMMED , Anirban DUTTA , Tungadri BOSE , Sudha CHADARAM
CPC classification number: G06F19/22
Abstract: The present subject matter discloses a system and a method for compression of genomic data. In one embodiment, the method for compression of genomic data includes obtaining modified genomic data from genomic data based at least in part on intermediary data identified from the genomic data. In one implementation, the modified genomic data includes a plurality of primary characters. The genomic data may then be modified to generate one or more most-frequent character files based at least on a most-frequent character and a second most-frequent character from among the plurality of primary characters. Further, based at least on the one or more most-frequent character files and the modified genomic data, a least-frequent characters file may be created from the modified genomic data.
-
Publication No.: US09014987B2
Publication date: 2015-04-21
Application No.: US13331557
Filing date: 2011-12-20
Applicant: Sharmila Shekhar Mande , Kuntal Kumar Bhusan , Tarini Shankar Ghosh
Inventor: Sharmila Shekhar Mande , Kuntal Kumar Bhusan , Tarini Shankar Ghosh
CPC classification number: G06F19/26 , G06F19/14 , G06F19/20 , G06F19/28 , G06T11/206
Abstract: Systems and methods for analyzing community structures within a plurality of environmental samples are described herein. The method includes obtaining taxa data corresponding to taxonomic groups within the plurality of the environmental samples. Based on the taxa data, an abundance value for each of the taxonomic groups with respect to each of the plurality of environmental samples is determined. Further, based on abundance values, an interaction factor for each pair of the taxonomic groups in the plurality of environmental samples is computed. The interaction factor is indicative of a degree of interaction between a pair of taxonomic groups from among the taxonomic groups. Based in part on interaction factors and abundance values, the plurality of the environmental samples is clustered.
-
Publication No.: US08972200B2
Publication date: 2015-03-03
Application No.: US13428790
Filing date: 2012-03-23
Applicant: Sharmila Shekhar Mande , Monzoorul Haque Mohammed , Anirban Dutta , Tungadri Bose , Sudha Chadaram
Inventor: Sharmila Shekhar Mande , Monzoorul Haque Mohammed , Anirban Dutta , Tungadri Bose , Sudha Chadaram
IPC: G06F19/22
CPC classification number: G06F19/22
Abstract: The present subject matter discloses a system and a method for compression of genomic data. In one embodiment, the method for compression of genomic data includes obtaining modified genomic data from genomic data based at least in part on intermediary data identified from the genomic data. In one implementation, the modified genomic data includes a plurality of primary characters. The genomic data may then be modified to generate one or more most-frequent character files based at least on a most-frequent character and a second most-frequent character from among the plurality of primary characters. Further, based at least on the one or more most-frequent character files and the modified genomic data, a least-frequent characters file may be created from the modified genomic data.