Method and apparatus for adjusting the model threshold of a support vector machine for text classification and filtering
    1.
    Patent application (status: lapsed)

    Publication number: US20050228783A1

    Publication date: 2005-10-13

    Application number: US10822327

    Filing date: 2004-04-12

    CPC classification number: G06F17/3069 G06K9/6269

    Abstract: An information need can be modeled by a binary classifier such as a support vector machine (SVM). SVMs can exhibit very conservative, precision-oriented behavior when modeling information needs. This conservative behavior can be overcome by adjusting the position of the hyperplane, the geometric representation of an SVM. The present invention describes a couple of automatic techniques for adjusting the position of an SVM model based upon a beta-gamma thresholding procedure, cross-fold validation and retrofitting. This adjustment technique can also be applied to other types of learning strategies.
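
    The idea of moving the decision threshold away from the SVM default of zero can be illustrated with a minimal sketch. It is not the beta-gamma procedure named in the abstract, whose details are not given here; it assumes only that held-out documents already have signed SVM scores and binary relevance labels, and it picks the threshold that maximizes F1 on them. The function names `f1_at_threshold` and `pick_threshold` are hypothetical.

```python
from typing import Sequence


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 when every document scoring at or above `threshold` is accepted."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Choose the candidate threshold with the best held-out F1.

    Candidates are the observed scores plus 0.0 (the unadjusted hyperplane).
    Ties are broken in favour of the higher, more conservative threshold.
    """
    candidates = sorted(set(scores) | {0.0})
    return max(candidates, key=lambda t: (f1_at_threshold(scores, labels, t), t))


if __name__ == "__main__":
    # Illustrative held-out data (assumed): two relevant documents fall
    # just below the default threshold of 0.
    scores = [-1.2, -0.6, -0.3, -0.1, 0.4, 0.9]
    labels = [0, 0, 1, 1, 1, 1]
    print("F1 at default threshold 0:", f1_at_threshold(scores, labels, 0.0))
    print("adjusted threshold:", pick_threshold(scores, labels))
```

    Lowering the threshold in this way trades some precision for recall, which is the direction of adjustment the abstract describes for an overly conservative model.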

    Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
    2.
    Patent application (status: lapsed)

    Publication number: US20070136336A1

    Publication date: 2007-06-14

    Application number: US11298500

    Filing date: 2005-12-12

    CPC classification number: G06F17/30705 Y10S707/99935 Y10S707/99942

    Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $\frac{N^2 - N}{2}$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.
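
    A minimal sketch of such a thresholded structure for the symmetric case follows. It assumes a dictionary keyed by document-index pairs and a simple Jaccard word-overlap measure; both are illustrative choices, not the representation or similarity function of the patent. Only pairs whose similarity exceeds a positive threshold are stored, so the entry count stays below $\frac{N^2 - N}{2}$ whenever at least one pair falls at or under the threshold.

```python
from typing import Callable, Dict, Sequence, Tuple

PairKey = Tuple[int, int]


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (symmetric); an assumed stand-in measure."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_sparse_similarity(
    docs: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Dict[PairKey, float]:
    """Keep only pairs (i, j), i < j, whose similarity exceeds `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be greater than zero")
    entries: Dict[PairKey, float] = {}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            value = similarity(docs[i], docs[j])
            if value > threshold:
                entries[(i, j)] = value
    return entries


def lookup(entries: Dict[PairKey, float], i: int, j: int) -> float:
    """Return the stored similarity, or 0.0 for pairs that were not kept."""
    key = (i, j) if i < j else (j, i)
    return entries.get(key, 0.0)


if __name__ == "__main__":
    docs = [
        "svm text filtering",
        "svm text classification",
        "workflow graph mining",
        "graph mining tasks",
    ]
    entries = build_sparse_similarity(docs, jaccard, threshold=0.3)
    n = len(docs)
    print(entries)
    print(len(entries), "entries stored; dense symmetric bound is", (n * n - n) // 2)
```

    For an asymmetric measure the same approach would store ordered pairs (i, j) and (j, i) separately, with $N^2 - N$ as the dense bound.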

      Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
      3.
      Granted patent (status: lapsed)

      Publication number: US07949644B2

      Publication date: 2011-05-24

      Application number: US12152522

      Filing date: 2008-05-15

      CPC classification number: G06F17/30705 Y10S707/99935 Y10S707/99942

      Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $\frac{N^2 - N}{2}$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.

      Methods and apparatus for identifying workflow graphs using an iterative analysis of empirical data
      4.
      Patent application (status: under examination, published)

      Publication number: US20080065448A1

      Publication date: 2008-03-13

      Application number: US11517244

      Filing date: 2006-09-08

      CPC classification number: G06Q10/04 G06Q10/06 G06Q10/06316 G06Q10/0633

      Abstract: A method and system for generating a workflow graph from empirical data of a process are described. A processing system obtains data corresponding to multiple instances of a process, the process including a set of tasks, the data including information about order of occurrences of the tasks. The processing system analyzes the occurrences of the tasks to identify order constraints. The processing system partitions nodes representing tasks into subsets based upon the order constraints, wherein the subsets are sequence ordered with respect to each other such that all nodes associated with a given subset either precede or follow all nodes associated with another subset. The processing system partitions nodes representing tasks into subgroups, wherein each subgroup includes one or more nodes that occur without order constraints relative to nodes associated with other subgroups. A workflow graph representative of the process is constructed wherein nodes are connected by edges.
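
      The first analysis step, identifying order constraints from observed task sequences, can be sketched as follows. This is an illustrative reading only: it assumes each process instance is a list of task names in order of occurrence, that a task occurs at most once per instance, and that a constraint "a before b" holds when a precedes b in every instance containing both. The function name `order_constraints` is hypothetical, and the later partitioning into subsets and subgroups is not shown.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple


def order_constraints(traces: Sequence[Sequence[str]]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a came before b in every trace containing both."""
    observed: Set[Tuple[str, str]] = set()
    for trace in traces:
        # Position of each task within this instance (assumes no repeats).
        position = {task: index for index, task in enumerate(trace)}
        for a, b in combinations(position, 2):
            if position[a] < position[b]:
                observed.add((a, b))
            else:
                observed.add((b, a))
    # A pair seen in both orders carries no order constraint.
    return {(a, b) for (a, b) in observed if (b, a) not in observed}


if __name__ == "__main__":
    # Two assumed instances of one process: B and C occur in either order.
    traces = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
    for a, b in sorted(order_constraints(traces)):
        print(f"{a} precedes {b}")
```

      In this example B and C appear in both orders, so they end up unconstrained relative to each other, which is the kind of relationship the abstract assigns to separate subgroups.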

      Method and apparatus for comparing scores in a vector space retrieval process
      5.
      Granted patent (status: lapsed)

      Publication number: US07356604B1

      Publication date: 2008-04-08

      Application number: US09551014

      Filing date: 2000-04-18

      Applicant: Norbert Roma

      Inventor: Norbert Roma

      CPC classification number: G06F17/3069 Y10S707/99933 Y10S707/99936

      Abstract: A delivery ratio $r$ (a fraction between 0 and 1) partitions a stream of documents into the top-scoring $r$-fraction of documents and the remainder. In this way a set of successively bigger delivery ratios, $r_1, r_2, r_3, \ldots$, sections the stream into tiers. Any given document is assigned to a tier according to how many delivery ratio thresholds it matched or surpassed and how many it failed to reach. This creates a scoring structure which reflects the specificity of the document with respect to a profile in terms of density of relevant documents in the stream. In other words, a document in the $k$th tier is such that it failed to be classified in the top $r_k$ ratio of the stream (thus an $r_k$ fraction of the stream is more relevant to the given profile than the document under consideration). At the same time this document was classified as being in the top $r_{k-1}$ part of the stream. Thus this mechanism defines a score (let's call it σ) for a document depending on how it compares to other documents in the stream when scored against a given profile.
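
      The tiering idea can be sketched under some stated assumptions: a sample of reference scores for the profile is available, the score threshold for a delivery ratio $r$ is the lowest score inside the top $r$-fraction of that sample, and a document is characterized by how many of those thresholds it matches or surpasses. The function names `ratio_thresholds` and `matched_count` are hypothetical, and the mapping from this count to a tier index follows no particular numbering from the patent.

```python
import math
from typing import List, Sequence


def ratio_thresholds(reference_scores: Sequence[float], ratios: Sequence[float]) -> List[float]:
    """Score cut-off for each delivery ratio, taken from a reference sample."""
    ranked = sorted(reference_scores, reverse=True)
    if not ranked:
        raise ValueError("reference_scores must not be empty")
    thresholds = []
    for r in ratios:
        if not 0 < r <= 1:
            raise ValueError("delivery ratios must lie in (0, 1]")
        # Number of reference documents inside the top r-fraction.
        top_count = max(1, math.ceil(r * len(ranked)))
        thresholds.append(ranked[top_count - 1])
    return thresholds


def matched_count(score: float, thresholds: Sequence[float]) -> int:
    """How many delivery-ratio thresholds the document matches or surpasses."""
    return sum(1 for t in thresholds if score >= t)


if __name__ == "__main__":
    # Assumed reference scores for one profile and three increasing ratios.
    reference = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
    ratios = [0.125, 0.25, 0.5]
    cutoffs = ratio_thresholds(reference, ratios)
    print("cut-offs:", cutoffs)
    for score in (9.0, 7.5, 6.5, 1.0):
        print(score, "matches", matched_count(score, cutoffs), "thresholds")
```

      Because the count depends on the document's rank within the stream and not on its raw score, counts obtained against different profiles are more directly comparable than the raw scores themselves.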

      Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
      6.
      Granted patent (status: lapsed)

      Publication number: US07472131B2

      Publication date: 2008-12-30

      Application number: US11298500

      Filing date: 2005-12-12

      CPC classification number: G06F17/30705 Y10S707/99935 Y10S707/99942

      Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $\frac{N^2 - N}{2}$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.

        Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
        7.
        Patent application (status: lapsed)

        Publication number: US20080275870A1

        Publication date: 2008-11-06

        Application number: US12152522

        Filing date: 2008-05-15

        CPC classification number: G06F17/30705 Y10S707/99935 Y10S707/99942

        Abstract: A computer-readable medium comprises a data structure for providing information about levels of similarity between pairs of N documents. The data structure comprises a plurality of entries of similarity values representing levels of similarity for a plurality of pairs of the documents. Each of the similarity values represents a level of similarity of one document of a given pair relative to the other document of the given pair. The similarity value of each entry is greater than a threshold similarity value that is greater than zero. The plurality of similarity-value entries are fewer than $N^2 - N$ in number if the similarity values are asymmetric with regard to document pairing, and the plurality of similarity-value entries are fewer than $\frac{N^2 - N}{2}$ in number if the similarity values are symmetric with regard to document pairing. A method and apparatus for generating the data structure are described.

          Method and apparatus for adjusting the model threshold of a support vector machine for text classification and filtering
          8.
          Granted patent (status: lapsed)

          Publication number: US07356187B2

          Publication date: 2008-04-08

          Application number: US10822327

          Filing date: 2004-04-12

          CPC classification number: G06F17/3069 G06K9/6269

          Abstract: An information need can be modeled by a binary classifier such as a support vector machine (SVM). SVMs can exhibit very conservative, precision-oriented behavior when modeling information needs. This conservative behavior can be overcome by adjusting the position of the hyperplane, the geometric representation of an SVM. The present invention describes a couple of automatic techniques for adjusting the position of an SVM model based upon a beta-gamma thresholding procedure, cross-fold validation and retrofitting. This adjustment technique can also be applied to other types of learning strategies.
