Patent search ap:"MARVIN MENDELSSOHN" Page 1

1.

Granted patent
Automatic selection of blocking column for de-duplication (status: lapsed)

Publication number: US08560505B2

Publication date: 2013-10-15

Application number: US13313518

Application date: 2011-12-07

Applicant: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventor: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC: G06F7/00

CPC classification number: G06F17/30303

Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
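
The abstract does not state how the two parameters are computed or combined, so the following is only a minimal sketch under assumed definitions: the first parameter is taken to be the normalized entropy of block sizes (how evenly rows spread across blocks), the second is one minus the mean block size as a fraction of the table, and blockability is their product. The function names and both formulas are hypothetical illustrations, not the patented method.

```python
import math
from collections import Counter

def block_sizes(rows, column_set):
    """Group rows by their values in column_set and return the block sizes."""
    blocks = Counter(tuple(row[c] for c in column_set) for row in rows)
    return list(blocks.values())

def distribution_score(sizes):
    """Assumed first parameter: normalized entropy of block sizes (1.0 = even)."""
    if len(sizes) < 2:
        return 0.0  # a single block gives no partitioning at all
    total = sum(sizes)
    entropy = -sum((s / total) * math.log(s / total) for s in sizes)
    return entropy / math.log(len(sizes))

def size_score(sizes, total_rows):
    """Assumed second parameter: higher when the mean block is small."""
    mean_size = sum(sizes) / len(sizes)
    return 1.0 - mean_size / total_rows

def blockability(rows, column_set):
    """Assumed combination: product of the two parameters."""
    sizes = block_sizes(rows, column_set)
    return distribution_score(sizes) * size_score(sizes, len(rows))

def rank_column_sets(rows, column_sets):
    """Rank candidate column sets from most to least blockable."""
    return sorted(column_sets, key=lambda cs: blockability(rows, cs), reverse=True)

rows = [
    {"country": "IN", "city": "Delhi"},
    {"country": "IN", "city": "Delhi"},
    {"country": "IN", "city": "Pune"},
    {"country": "IN", "city": "Pune"},
]
ranking = rank_column_sets(rows, [("country",), ("city",)])
```

In this toy table every row shares one country, so `("country",)` forms a single block and ranks below `("city",)`. Note that this sketch does not penalize columns whose blocks are all singletons, which a practical measure would likely need to do.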

2.

Patent application
AUTOMATIC SELECTION OF BLOCKING COLUMN FOR DE-DUPLICATION (status: lapsed)

Publication number: US20130151487A1

Publication date: 2013-06-13

Application number: US13313518

Application date: 2011-12-07

Applicant: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM

Inventor: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM

IPC: G06F7/00, G06F17/30

CPC classification number: G06F17/30303

Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.

3.

Granted patent
Automatically mining patterns for rule based data standardization systems (status: in force)

Publication number: US08996524B2

Publication date: 2015-03-31

Application number: US13415144

Application date: 2012-03-08

Applicant: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventor: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC: G06F7/00, G06F17/30

CPC classification number: G06F17/30705, G06F17/2775, G06F17/30675, G06F2216/03, G06Q10/06, G06Q10/10, G06Q30/02

Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
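
The steps named in this abstract (find N frequent sub-patterns, extract them, group them by a distance D) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: patterns are lists of token-class labels, sub-patterns are contiguous n-grams, the distance is one minus the `difflib.SequenceMatcher` ratio, and grouping is a greedy pass that admits a sub-pattern only if it is within `max_distance` of every existing member, so the number of groups K falls out of the threshold rather than being fixed in advance.

```python
from collections import Counter
from difflib import SequenceMatcher

def frequent_subpatterns(patterns, n, min_len=2, max_len=3):
    """Return the n most frequent contiguous sub-patterns (as tuples)."""
    counts = Counter()
    for pattern in patterns:
        for length in range(min_len, max_len + 1):
            for start in range(len(pattern) - length + 1):
                counts[tuple(pattern[start:start + length])] += 1
    return [sub for sub, _ in counts.most_common(n)]

def distance(a, b):
    """Assumed distance D: 0.0 for identical sequences, 1.0 for disjoint ones."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def cluster(subpatterns, max_distance):
    """Greedy grouping: join the first group whose every member is close enough."""
    groups = []
    for sub in subpatterns:
        for group in groups:
            if all(distance(sub, member) <= max_distance for member in group):
                group.append(sub)
                break
        else:
            groups.append([sub])  # no compatible group, so start a new one
    return groups

patterns = [
    ["NUM", "STREET", "TYPE"],
    ["NUM", "STREET", "TYPE"],
    ["NUM", "STREET", "CITY"],
]
top = frequent_subpatterns(patterns, n=2, min_len=2, max_len=2)
groups = cluster([("A", "B"), ("A", "B", "C"), ("X", "Y")], max_distance=0.5)
```

Requiring closeness to every member of a group mirrors the abstract's wording about similarity to "every other sub-pattern within the same group"; a different linkage rule or a fixed K would need a different clustering routine.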

4.

Granted patent
Systems and methods for discovering synonymous elements using context over multiple similar addresses (status: lapsed)

Publication number: US08682898B2

Publication date: 2014-03-25

Application number: US12771543

Application date: 2010-04-30

Applicant: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah

Inventor: Sachindra Joshi, Tanveer Faruquie, Hima Prasad Karanam, Marvin Mendelssohn, Mukesh Kumar Mohania, Angel Marie Smith, L Venkata Subramaniam, Girish Venkatachaliah

IPC: G06F7/00, G06F17/00

CPC classification number: G06F17/2735, G06F17/2795

Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
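
As a rough illustration of how clustered addresses can supply context for synonym discovery, the sketch below groups addresses by a caller-supplied feature and then aligns token sequences within each cluster, treating spans that differ between otherwise matching addresses as candidate synonyms. The choice of feature (the trailing postal code), the use of `difflib.SequenceMatcher` for alignment, and the function names are all assumptions; the abstract does not specify them, and the final standardization step is not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def cluster_by_feature(addresses, feature):
    """Group tokenized addresses by the value of a feature function."""
    clusters = defaultdict(list)
    for address in addresses:
        clusters[feature(address)].append(address.split())
    return clusters

def candidate_synonyms(clusters):
    """Collect token spans that replace each other between clustered addresses."""
    pairs = set()
    for tokenized in clusters.values():
        for a, b in combinations(tokenized, 2):
            matcher = SequenceMatcher(None, a, b)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "replace":  # differing span surrounded by shared context
                    pairs.add((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs

addresses = [
    "12 MG Road 560001",
    "12 Mahatma Gandhi Road 560001",
    "7 Park Street 700016",
]
# Hypothetical feature: the last token, assumed here to be a postal code.
clusters = cluster_by_feature(addresses, feature=lambda a: a.split()[-1])
synonyms = candidate_synonyms(clusters)
```

Here the two addresses sharing a postal code agree on every token except one span, so that span is proposed as a synonym pair. A real system would need stronger evidence, such as the pair recurring across many clusters, before accepting it.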

5.

Granted patent
Automatically mining patterns for rule based data standardization systems (status: in force)

Publication number: US10163063B2

Publication date: 2018-12-25

Application number: US13414374

Application date: 2012-03-07

Applicant: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventor: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC: G06F17/30, G06Q10/06, G06Q10/10, G06Q30/02

Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.

6.

Patent application
Automatically Mining Patterns for Rule Based Data Standardization Systems (status: under examination, published)

Publication number: US20130238611A1

Publication date: 2013-09-12

Application number: US13415144

Application date: 2012-03-08

Applicant: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventor: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC: G06F17/30

CPC classification number: G06F17/30705, G06F17/2775, G06F17/30675, G06F2216/03, G06Q10/06, G06Q10/10, G06Q30/02

Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.

7.

Patent application
Automatically Mining Patterns For Rule Based Data Standardization Systems (status: under examination, published)

Publication number: US20130238610A1

Publication date: 2013-09-12

Application number: US13414374

Application date: 2012-03-07

Applicant: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventor: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC: G06F17/30

CPC classification number: G06F17/30705, G06F17/2775, G06F17/30675, G06F2216/03, G06Q10/06, G06Q10/10, G06Q30/02, Y04S10/54

Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.

8.

Patent application
Systems and Methods for Discovering Synonymous Elements Using Context Over Multiple Similar Addresses (status: lapsed)

Publication number: US20110270808A1

Publication date: 2011-11-03

Application number: US12771543

Application date: 2010-04-30

Applicant: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah

Inventor: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah

IPC: G06F17/30

CPC classification number: G06F17/2735, G06F17/2795

Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.

9.

Granted patent
Automatic selection of blocking column for de-duplication (status: lapsed)

Publication number: US08560506B2

Publication date: 2013-10-15

Application number: US13447726

Application date: 2012-04-16

Applicant: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventor: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC: G06F7/00

CPC classification number: G06F17/30303

Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.

10.

Patent application
AUTOMATIC SELECTION OF BLOCKING COLUMN FOR DE-DUPLICATION (status: lapsed)

Publication number: US20130151490A1

Publication date: 2013-06-13

Application number: US13447726

Application date: 2012-04-16

Applicant: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM

Inventor: SNIGDHA CHATURVEDI, TANVEER A. FARUQUIE, HIMA P. KARANAM, MARVIN MENDELSSOHN, MUKESH K. MOHANIA, L. VENKATA SUBRAMANIAM

IPC: G06F17/30

CPC classification number: G06F17/30303

Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
