Patent search ap:"Jian-Wu XU" Page 1

1.

Invention application
SYSTEMS AND METHODS FOR TRAINING DOCUMENT ANALYSIS SYSTEM FOR AUTOMATICALLY EXTRACTING DATA FROM DOCUMENTS (Under examination, published)
Publication (announcement) number: US20110258150A1

Publication (announcement) date: 2011-10-20

Application number: US13007430

Application date: 2011-01-14

Applicant: Depankar NEOGI, Steven K. LADD, Girish WELLING, Arjun KUMAR, Vartika SINGH, Matthew DUGGAN, Tushar MAHATA, Xiaobin YANG, Jian-Wu XU, Janice O'NEIL, Nirupam SARKAR, Gopal KRISHNA

Inventor: Depankar NEOGI, Steven K. LADD, Girish WELLING, Arjun KUMAR, Vartika SINGH, Matthew DUGGAN, Tushar MAHATA, Xiaobin YANG, Jian-Wu XU, Janice O'NEIL, Nirupam SARKAR, Gopal KRISHNA

IPC: G06F15/18

CPC classification number: G06K9/00442, G06K9/48, G06K9/72, G06K2209/01

Abstract: A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with the corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
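The store-or-train decision described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `RoutingResult`, `route_features`, and `known_by_category` are hypothetical, the document category is assumed to have been assigned already by the classification step, and membership is assumed to be a simple exact-match set lookup.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingResult:
    # Features accepted as the document's data (hypothetical container).
    stored: list = field(default_factory=list)
    # Features handed to the training phase because they were not recognized.
    to_training: list = field(default_factory=list)


def route_features(extracted, known_by_category, category):
    """Store the features if all are known for the category, else queue unknowns.

    `known_by_category` maps a category name to the set of characters, words,
    and phrases associated with it (an assumed data layout).
    """
    known = known_by_category.get(category, set())
    unrecognized = [f for f in extracted if f not in known]
    if not unrecognized:
        # Every extracted feature belongs to the category's feature set.
        return RoutingResult(stored=list(extracted))
    # At least one feature is unknown: submit only the unknown ones for training.
    return RoutingResult(to_training=unrecognized)


known = {"invoice": {"Invoice", "Total", "Due Date"}}
print(route_features(["Invoice", "Total"], known, "invoice"))
print(route_features(["Invoice", "Amnt"], known, "invoice"))
```

The first call returns the features for storage; the second sends only the unrecognized feature to the training queue, mirroring the two branches in the abstract.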

2.

Invention application
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS CONTAINING MULTIPLE LAYOUT FEATURES (Under examination, published)
Publication (announcement) number: US20110255789A1

Publication (announcement) date: 2011-10-20

Application number: US13007443

Application date: 2011-01-14

Applicant: Depankar NEOGI, Steven K. LADD, Girish WELLING, Arjun KUMAR, Vartika SINGH, Matthew DUGGAN, Tushar MAHATA, Xiaobin YANG, Jian-Wu XU, Janice O'NEIL, Nirupam SARKAR, Gopal KRISHNA

Inventor: Depankar NEOGI, Steven K. LADD, Girish WELLING, Arjun KUMAR, Vartika SINGH, Matthew DUGGAN, Tushar MAHATA, Xiaobin YANG, Jian-Wu XU, Janice O'NEIL, Nirupam SARKAR, Gopal KRISHNA

IPC: G06K9/46

CPC classification number: G06K9/00442, G06K9/48, G06K9/72, G06K2209/01

Abstract: A method of automatically extracting data from an electronic document containing a plurality of layout features through progressive refinement is provided. The method includes: analyzing each document to automatically extract images and text features, wherein each document includes at least two features that are related to each other, and wherein said analyzing compares extracted features with a first search space of candidate features to try to recognize the extracted features; if one of the at least two related features is not recognized and at least one feature is recognized, selecting a second search space of candidate features in response thereto and in response to predefined rules about the relationship between the two features; and comparing the unrecognized feature with said selected second search space.
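The progressive refinement described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the method claimed in the application: the function `extract_pair`, the `related_spaces` rule table, and the city/state example are hypothetical, the first pass is assumed to be an exact lookup, and the second pass is assumed to use approximate string matching via Python's standard `difflib.get_close_matches`.

```python
import difflib


def extract_pair(features, first_space, related_spaces, cutoff=0.8):
    """Recognize two related features, narrowing the search for a missed one.

    features:       {name: raw extracted text} for exactly two related features
    first_space:    {name: set of candidate values} used for the first pass
    related_spaces: {(known_name, known_value): [candidates for the other feature]}
                    standing in for the predefined relationship rules (assumed)
    """
    # First pass: exact match against the broad first search space.
    recognized = {
        name: (text if text in first_space[name] else None)
        for name, text in features.items()
    }
    missing = [name for name, value in recognized.items() if value is None]
    if len(missing) == 1:
        unknown = missing[0]
        known = next(name for name in recognized if name != unknown)
        # Select the second search space from the recognized feature and the rules.
        second_space = related_spaces.get((known, recognized[known]), [])
        # Second pass: tolerant match, feasible because the space is now small.
        close = difflib.get_close_matches(
            features[unknown], second_space, n=1, cutoff=cutoff
        )
        if close:
            recognized[unknown] = close[0]
    return recognized


first_space = {"city": {"Boston", "Austin"}, "state": {"MA", "TX"}}
related_spaces = {
    ("state", "MA"): ["Boston", "Cambridge"],
    ("state", "TX"): ["Austin", "Dallas"],
}
# "Bostcn" simulates an OCR error; the recognized state narrows the candidates.
print(extract_pair({"city": "Bostcn", "state": "MA"}, first_space, related_spaces))
```

Restricting the tolerant match to the second search space keeps it from pairing a misread value with an unrelated candidate, which is one plausible reason for narrowing the space before retrying.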

