SYSTEMS AND METHODS FOR TRAINING DOCUMENT ANALYSIS SYSTEM FOR AUTOMATICALLY EXTRACTING DATA FROM DOCUMENTS
    Invention application
    Status: Under examination (published)

    Publication number: US20110258150A1

    Publication date: 2011-10-20

    Application number: US13007430

    Filing date: 2011-01-14

    CPC classification number: G06K9/00442 G06K9/48 G06K9/72 G06K2209/01

    Abstract: A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with the corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted text features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
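
The compare-then-route step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Category`, `Result`, and `route_features` are hypothetical, the document category is assumed to have been assigned already by the earlier analysis step, and text features are modeled as plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Category:
    """Hypothetical record: a document category and its known text features."""
    name: str
    known_features: Set[str]  # characters, words, and phrases seen for this category


@dataclass
class Result:
    """Hypothetical sink for extracted data and for features awaiting training."""
    stored: Dict[str, List[str]] = field(default_factory=dict)
    training_queue: List[Tuple[str, str]] = field(default_factory=list)


def route_features(doc_id: str, category: Category,
                   extracted: List[str], result: Result) -> List[str]:
    """Store the features if all are known; otherwise queue the unknown ones.

    Assumption: the document was already associated with `category`.
    Returns the list of unrecognized features (empty if none).
    """
    unrecognized = [f for f in extracted if f not in category.known_features]
    if not unrecognized:
        # Every feature belongs to the category's set: keep as the document's data.
        result.stored[doc_id] = list(extracted)
    else:
        # At least one feature is unknown: submit those to the training phase.
        result.training_queue.extend((category.name, f) for f in unrecognized)
    return unrecognized
```

For example, with a category whose known features are `{"Invoice", "Total"}`, a document yielding `["Invoice", "Total"]` is stored, while one yielding `["Invoice", "Remit to"]` is not stored and `"Remit to"` is queued for training. How the training phase then updates the category's feature set is outside what the abstract specifies, so the sketch leaves it out.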

