System and method for dynamically evaluating latent concepts in unstructured documents
    1.
    Type: Granted patent
    Legal status: In force

    Publication number: US06978274B1

    Publication date: 2005-12-20

    Application number: US09944474

    Filing date: 2001-08-31

    Abstract: A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
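
    The pipeline in this abstract (lexicon, ordered frequency representation, threshold filter, document-by-cluster matrix) can be sketched roughly as follows. This is a minimal illustration, not the patented method: whitespace tokens stand in for concepts, the weighted clusters are supplied by hand rather than generated, and a weighted sum of counts stands in for the "best fit approximation". The names build_lexicon, select_concepts, and fit_matrix are hypothetical.

```python
from collections import Counter


def build_lexicon(docs):
    """Map each concept (assumed here: a lowercase token) to its corpus frequency."""
    lexicon = Counter()
    for text in docs:
        lexicon.update(text.lower().split())
    return lexicon


def select_concepts(lexicon, threshold):
    """Order concepts by descending frequency, then keep those at or above threshold."""
    ordered = sorted(lexicon.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, freq in ordered if freq >= threshold]


def fit_matrix(docs, clusters):
    """Score each document against each weighted cluster (assumed: weighted count sum)."""
    matrix = []
    for text in docs:
        counts = Counter(text.lower().split())
        row = [sum(w * counts[c] for c, w in cluster.items()) for cluster in clusters]
        matrix.append(row)
    return matrix


docs = [
    "contract breach contract damages",
    "patent claim patent prior art",
    "contract claim",
]
lexicon = build_lexicon(docs)
selected = select_concepts(lexicon, threshold=2)

# Hand-written weighted clusters over the selected concepts (illustrative only).
clusters = [{"contract": 1.0}, {"patent": 1.0, "claim": 0.5}]
scores = fit_matrix(docs, clusters)
```

    Each row of scores corresponds to one document and each column to one cluster, so the largest entry in a row marks the cluster that fits that document best under this simplified scoring.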

    System And Method For Thematically Grouping Documents Into Clusters
    3.
    Type: Patent application
    Legal status: In force

    Publication number: US20110022597A1

    Publication date: 2011-01-27

    Application number: US12897710

    Filing date: 2010-10-04

    Abstract: A system and method for thematically grouping documents into clusters is provided. Concepts are extracted from a plurality of documents. The concepts include nouns or noun phrases. A number of occurrences for each concept are determined within each document. A bounded range is applied to the concepts and a subset of the concepts is selected by removing the concepts that fall outside the bounded range. The bounded range includes upper edge conditions and lower edge conditions. Themes are generated from the subset of concepts by identifying two or more concepts with common semantic meaning. Clusters of the documents are generated based on the themes.
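
    As a rough illustration of the bounded range and theme steps described here, the sketch below drops concepts whose total count falls outside a lower and upper edge, then groups documents by their dominant theme. It rests on several assumptions: concept extraction has already produced per-document counts, a hand-written theme_of mapping stands in for the identification of concepts with common semantic meaning, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def concepts_in_range(doc_counts, lower, upper):
    """Keep concepts whose total occurrence count lies within [lower, upper]."""
    totals = Counter()
    for counts in doc_counts:
        totals.update(counts)
    return {concept for concept, n in totals.items() if lower <= n <= upper}


def cluster_by_theme(doc_counts, kept, theme_of):
    """Place each document in the theme with the most occurrences of kept concepts."""
    clusters = defaultdict(list)
    for doc_id, counts in enumerate(doc_counts):
        theme_scores = Counter()
        for concept, n in counts.items():
            if concept in kept and concept in theme_of:
                theme_scores[theme_of[concept]] += n
        if theme_scores:
            # Sorting first makes ties resolve alphabetically by theme name.
            best = max(sorted(theme_scores), key=theme_scores.get)
            clusters[best].append(doc_id)
    return dict(clusters)


doc_counts = [
    Counter({"lawsuit": 2, "court": 1, "the": 9}),
    Counter({"litigation": 3, "the": 7}),
    Counter({"invoice": 2, "payment": 2, "the": 8, "court": 1}),
]
# Assumed mapping from concepts to themes; the abstract does not say how it is built.
theme_of = {
    "lawsuit": "legal",
    "litigation": "legal",
    "court": "legal",
    "invoice": "billing",
    "payment": "billing",
}
kept = concepts_in_range(doc_counts, lower=2, upper=10)
themes = cluster_by_theme(doc_counts, kept, theme_of)
```

    In this toy corpus the upper edge removes the very frequent token "the", while the lower edge would remove any concept that appears only once overall.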

    System and method for dynamically evaluating latent concepts in unstructured documents
    4.
    Type: Patent application
    Legal status: In force

    Publication number: US20060089947A1

    Publication date: 2006-04-27

    Application number: US11304406

    Filing date: 2005-12-14

    Abstract: A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.

    System and method for grouping similar documents
    5.
    Type: Granted patent
    Legal status: In force

    Publication number: US08380718B2

    Publication date: 2013-02-19

    Application number: US13225325

    Filing date: 2011-09-02

    Abstract: A system and method for grouping similar documents is provided. Frequencies of occurrences are determined for terms and noun phrases within a set of documents. A subset of the documents is selected by removing those documents having terms and noun phrases that fall outside a bounded range of upper and lower conditions for frequency of occurrence. Each of the documents in the subset is mapped to a cluster of documents based on a similarity of the documents to the cluster documents.
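
    The final step, mapping a document to a cluster by its similarity to the cluster's documents, might look like the sketch below. Cosine similarity against the summed term vector of each cluster is an assumed measure, since the abstract does not name one, and the frequency-based document filtering is omitted. The helper names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_vector(members):
    """Sum the term frequencies of all documents already in a cluster."""
    total = Counter()
    for member in members:
        total.update(member)
    return total


def assign(doc, clusters):
    """Return the index of the cluster whose documents are most similar to doc."""
    sims = [cosine(doc, cluster_vector(members)) for members in clusters]
    return max(range(len(sims)), key=sims.__getitem__)


clusters = [
    [Counter({"lawsuit": 2, "court": 1})],
    [Counter({"invoice": 2, "payment": 1})],
]
legal_doc = Counter({"court": 2, "lawsuit": 1})
billing_doc = Counter({"payment": 3})

legal_cluster = assign(legal_doc, clusters)
billing_cluster = assign(billing_doc, clusters)
```

    Because cosine similarity depends only on direction, using the summed vector gives the same ranking as using the mean vector of the cluster.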

    System and method for clustering unstructured documents
    6.
    Type: Granted patent
    Legal status: In force

    Publication number: US07809727B2

    Publication date: 2010-10-05

    Application number: US11964000

    Filing date: 2007-12-24

    Abstract: A system and method for clustering unstructured documents is provided. Documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions are selected. Concepts are generated for the selected documents. The selected documents are grouped into clusters of the documents. A weight for each of the clusters is evaluated. A similarity value is determined from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights for each selected document. Each selected document is assigned into one such cluster based on the similarity value of the selected document.
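
    One possible reading of the selection and assignment steps is sketched below. The abstract does not give the similarity formula, so the version here (cluster weight multiplied by the document's total frequency over the cluster's concept terms) is purely an assumption, as are the rule that a document qualifies when at least one term frequency lies between the edge conditions and all function names.

```python
def select_documents(doc_counts, lower, upper):
    """Keep (index, counts) for documents with a term frequency in [lower, upper]."""
    return [
        (i, counts)
        for i, counts in enumerate(doc_counts)
        if any(lower <= n <= upper for n in counts.values())
    ]


def similarity(counts, concept_terms, cluster_weight):
    """Assumed formula: cluster weight times total frequency of the concept terms."""
    return cluster_weight * sum(counts.get(term, 0) for term in concept_terms)


def assign(counts, clusters):
    """clusters is a list of (concept_terms, weight); return the best cluster index."""
    sims = [similarity(counts, terms, weight) for terms, weight in clusters]
    return max(range(len(sims)), key=sims.__getitem__)


doc_counts = [
    {"merger": 3, "board": 1},
    {"the": 1},
    {"email": 2, "server": 2, "merger": 1},
]
clusters = [
    ({"merger", "board"}, 1.0),
    ({"email", "server"}, 0.8),
]

selected = select_documents(doc_counts, lower=2, upper=5)
assignments = {i: assign(counts, clusters) for i, counts in selected}
```

    The second document is dropped before clustering because none of its term frequencies satisfies the lower edge condition.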

    System and method for dynamically evaluating latent concepts in unstructured documents
    7.
    Type: Granted patent
    Legal status: In force

    Publication number: US07313556B2

    Publication date: 2007-12-25

    Application number: US11304406

    Filing date: 2005-12-14

    Abstract: A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.

    System and method for thematically grouping documents into clusters
    8.
    Type: Granted patent
    Legal status: In force

    Publication number: US08015188B2

    Publication date: 2011-09-06

    Application number: US12897710

    Filing date: 2010-10-04

    Abstract: A system and method for thematically grouping documents into clusters is provided. Concepts are extracted from a plurality of documents. The concepts include nouns or noun phrases. A number of occurrences for each concept are determined within each document. A bounded range is applied to the concepts and a subset of the concepts is selected by removing the concepts that fall outside the bounded range. The bounded range includes upper edge conditions and lower edge conditions. Themes are generated from the subset of concepts by identifying two or more concepts with common semantic meaning. Clusters of the documents are generated based on the themes.

    System And Method For Clustering Unstructured Documents
    9.
    Type: Patent application
    Legal status: In force

    Publication number: US20080104063A1

    Publication date: 2008-05-01

    Application number: US11964000

    Filing date: 2007-12-24

    Abstract: A system and method for clustering unstructured documents is provided. Documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions are selected. Concepts are generated for the selected documents. The selected documents are grouped into clusters of the documents. A weight for each of the clusters is evaluated. A similarity value is determined from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights for each selected document. Each selected document is assigned into one such cluster based on the similarity value of the selected document.

    System and method for reorienting clusters within a display to provide a perspective-corrected representation
    10.
    Type: Granted patent
    Legal status: In force

    Publication number: US07948491B2

    Publication date: 2011-05-24

    Application number: US12060005

    Filing date: 2008-03-31

    Applicant: Dan Gallivan

    Inventor: Dan Gallivan

    CPC classification number: G06T11/206 G06T11/20 G06T17/00

    Abstract: A system and method for reorienting clusters within a display is provided. Clusters are maintained within a display. Each cluster includes a center located at a distance relative to a common origin for the display and a radius measured from the center. A pair of the clusters is selected and a bounding region is determined for each cluster in the pair by forming a pair of tangent vectors about the cluster and originating at the common origin. The bounding regions of the clusters in the pair are compared. The distance from the common origin of one of the clusters in the pair is increased upon overlap of the bounding regions as a perspective-corrected distance, which is determined as a function of the distances, the radii, and an angle between tangent vectors. The one cluster is moved to reorient the cluster's center at the perspective-corrected distance in the display.
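
    The geometry in this abstract can be sketched as follows, with each cluster given in polar form (distance, angle, radius) about the common origin. The bounding region follows directly from the tangent construction: the two tangent rays from the origin sit at asin(radius / distance) on either side of the ray to the center. The abstract does not state the corrected-distance formula, so corrected_distance below is a stand-in that moves one cluster outward along its own ray until the two circles just touch (law of cosines). It is not the patented function, and all names are hypothetical.

```python
import math


def half_angle(distance, radius):
    """Angle between the ray to the center and either tangent ray from the origin."""
    return math.asin(min(1.0, radius / distance))


def regions_overlap(c1, c2):
    """True if the angular bounding regions of two (distance, angle, radius) clusters overlap."""
    delta = abs(math.remainder(c1[1] - c2[1], 2 * math.pi))
    return delta < half_angle(c1[0], c1[2]) + half_angle(c2[0], c2[2])


def corrected_distance(fixed, moved):
    """Stand-in correction: smallest distance on moved's ray where the circles just touch."""
    d1, a1, r1 = fixed
    d2, a2, r2 = moved
    delta = math.remainder(a1 - a2, 2 * math.pi)
    # Center separation s satisfies s^2 = d1^2 + d^2 - 2*d1*d*cos(delta);
    # setting s = r1 + r2 and solving for d gives the expression below.
    disc = (r1 + r2) ** 2 - (d1 * math.sin(delta)) ** 2
    if disc < 0:
        return d2  # the circles can never touch along this ray
    return max(d2, d1 * math.cos(delta) + math.sqrt(disc))


fixed = (10.0, 0.0, 2.0)
moved = (11.0, 0.0, 2.0)        # same ray as fixed, so the regions overlap
aside = (10.0, math.pi / 2, 2.0)  # at a right angle to fixed, so no overlap

overlap_same_ray = regions_overlap(fixed, moved)
overlap_aside = regions_overlap(fixed, aside)
new_distance = corrected_distance(fixed, moved)
```

    Like the function described in the abstract, this stand-in depends only on the two distances, the two radii, and an angle, but it uses the angle between the center rays rather than the angle between tangent vectors.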
