Patent search ap:"Tushar MAHATA"

1.

Patent application
SYSTEMS AND METHODS TO AUTOMATICALLY CLASSIFY ELECTRONIC DOCUMENTS USING EXTRACTED IMAGE AND TEXT FEATURES AND USING A MACHINE LEARNING SUBSYSTEM
Status: Pending (published)

Publication number: US20090116736A1

Publication date: 2009-05-07

Application number: US12266462

Application date: 2008-11-06

Applicant: Depankar Neogi , Steven K. Ladd , Dilnawaj Ahmed , Arjun Kumar , Tushar Mahata

Inventor: Depankar Neogi , Steven K. Ladd , Dilnawaj Ahmed , Arjun Kumar , Tushar Mahata

IPC: G06K9/62

CPC classification number: G06K9/00442 , G06K9/6885

Abstract: A document analysis system that automatically classifies documents by recognizing distinctive features in each document comprises a document acquisition system, a document recognition training system, a document classification system, a document feature recognition system, and a job organization system. The document acquisition system receives jobs, wherein each job contains at least one electronic document. The document feature recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each corresponding category of documents, wherein the training system, using extracted features of unrecognized documents, automatically modifies the feature set for a document category. The job organization system automatically organizes each job according to the document categories it contains.
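
The classification, training, and job-organization steps in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented method: features are plain strings, "best match" is scored with Jaccard overlap, the 0.3 threshold is arbitrary, and the caller decides which category a previously unrecognized document trains (the abstract does not say how that choice is made).

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    features: set = field(default_factory=set)


def similarity(a: set, b: set) -> float:
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(doc_features: set, categories: list, threshold: float = 0.3):
    """Return the best-matching category, or None if the document is unrecognized."""
    best = max(categories, key=lambda c: similarity(doc_features, c.features), default=None)
    if best is None or similarity(doc_features, best.features) < threshold:
        return None
    return best


def train(category: Category, doc_features: set) -> None:
    """Fold the features of a previously unrecognized document into a category."""
    category.features |= doc_features


def organize_job(job: dict, categories: list) -> dict:
    """Group the documents of one job by the category they were classified into."""
    groups = {}
    for doc_id, features in job.items():
        match = classify(features, categories)
        groups.setdefault(match.name if match else "unrecognized", []).append(doc_id)
    return groups


# Hypothetical categories and feature names, chosen only for the example.
invoice = Category("invoice", {"text:invoice", "text:total", "image:logo"})
tax_form = Category("tax_form", {"text:wages", "text:employer", "image:grid"})
categories = [invoice, tax_form]

job = {
    "doc1": {"text:invoice", "text:total", "text:due"},
    "doc2": {"text:receipt", "text:paid"},
}
before = organize_job(job, categories)
train(invoice, job["doc2"])  # assumption: the caller assigns doc2 to "invoice"
after = organize_job(job, categories)
print(before)
print(after)
```

Before training, doc2 shares no features with any category and is grouped as unrecognized; after its features are added to the invoice feature set, both documents match that category.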

2.

Granted patent
Systems and methods for automatically processing electronic documents
Status: In force

Publication number: US08897563B1

Publication date: 2014-11-25

Application number: US14064935

Application date: 2013-10-28

Applicant: Girish Welling , Nirupam Sarkar , Tushar Mahata , Vartika Singh , Depankar Neogi , Steven K. Ladd

Inventor: Girish Welling , Nirupam Sarkar , Tushar Mahata , Vartika Singh , Depankar Neogi , Steven K. Ladd

IPC: G06K9/34 , G06K9/00

CPC classification number: G06K9/00442 , G06K9/48 , G06K9/72 , G06K2209/01

Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method is provided for automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction on each of the image variations of the pieces to determine, from the plurality of outputs for each piece, which output is best.
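
A rough Python sketch of the partition, transform, and select loop described in this abstract follows. It rests on several assumptions that are not in the source: a page is a small grid of grayscale pixels, pieces are horizontal strips, the two pre-processing algorithms are trivial stand-ins, and `extraction_score` is a hypothetical proxy for the quality of the downstream data extraction.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def partition(page: Image, rows_per_piece: int) -> List[Image]:
    """Split a page into horizontal strips of at most rows_per_piece rows."""
    return [page[i:i + rows_per_piece] for i in range(0, len(page), rows_per_piece)]


def identity(piece: Image) -> Image:
    """Pre-processing stand-in that returns an unchanged copy."""
    return [row[:] for row in piece]


def binarize(piece: Image, threshold: int = 128) -> Image:
    """Pre-processing stand-in that maps every pixel to black or white."""
    return [[255 if p >= threshold else 0 for p in row] for row in piece]


def extraction_score(piece: Image) -> float:
    """Hypothetical proxy for extraction quality: share of pure black/white pixels."""
    pixels = [p for row in piece for p in row]
    return sum(p in (0, 255) for p in pixels) / len(pixels) if pixels else 0.0


def best_variations(
    page: Image,
    algorithms: Dict[str, Callable[[Image], Image]],
    rows_per_piece: int = 2,
) -> List[Tuple[str, Image]]:
    """For each piece, run every algorithm and keep the best-scoring variation."""
    chosen = []
    for piece in partition(page, rows_per_piece):
        variations = {name: algo(piece) for name, algo in algorithms.items()}
        best = max(variations, key=lambda name: extraction_score(variations[name]))
        chosen.append((best, variations[best]))
    return chosen


page = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [100, 150, 90, 200],
    [120, 130, 110, 140],
]
result = best_variations(page, {"identity": identity, "binarize": binarize})
print([name for name, _ in result])
```

The point of the sketch is that the choice is made per piece: the already clean top strip keeps its original pixels, while the low-contrast bottom strip is better served by the binarized variation.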

3.

Patent application
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS CONTAINING MULTIPLE LAYOUT FEATURES
Status: Pending (published)

Publication number: US20110255789A1

Publication date: 2011-10-20

Application number: US13007443

Application date: 2011-01-14

Applicant: Depankar NEOGI , Steven K. LADD , Girish WELLING , Arjun KUMAR , Vartika SINGH , Matthew DUGGAN , Tushar MAHATA , Xiaobin YANG , Jian-Wu XU , Janice O'NEIL , Nirupam SARKAR , Gopal KRISHNA

Inventor: Depankar NEOGI , Steven K. LADD , Girish WELLING , Arjun KUMAR , Vartika SINGH , Matthew DUGGAN , Tushar MAHATA , Xiaobin YANG , Jian-Wu XU , Janice O'NEIL , Nirupam SARKAR , Gopal KRISHNA

IPC: G06K9/46

CPC classification number: G06K9/00442 , G06K9/48 , G06K9/72 , G06K2209/01

Abstract: A method of automatically extracting data, through progressive refinement, from an electronic document containing a plurality of layout features is provided. The method includes: analyzing each document to automatically extract image and text features, wherein each document includes at least two features that are related to each other, and wherein said analyzing compares extracted features with a first search space of candidate features to try to recognize the extracted features; if one of the at least two related features is not recognized and at least one feature is recognized, selecting a second search space of candidate features in response thereto and in response to predefined rules about the relationship between the two features; and comparing the unrecognized feature with said selected second search space.
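
The progressive refinement in this abstract can be sketched in Python as a two-pass lookup. The example is illustrative only and assumes things the source does not state: the two related features are a city and a ZIP code, the predefined rule is a city-to-ZIP table, and matching uses the standard-library `difflib.get_close_matches` with a strict cutoff for the first search space and a relaxed cutoff for the narrowed second one.

```python
import difflib

# Hypothetical first search space: every candidate value known for each feature.
FIRST_SPACE = {
    "city": ["Boston", "Chicago"],
    "zip": ["02108", "02109", "60601", "60602"],
}

# Hypothetical predefined rule relating the two features: ZIP codes per city.
ZIPS_BY_CITY = {
    "Boston": ["02108", "02109"],
    "Chicago": ["60601", "60602"],
}


def recognize(value, candidates, cutoff):
    """Return the closest candidate at or above the cutoff, else None."""
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract(raw):
    """Recognize city and ZIP, refining the ZIP search space when needed."""
    # Pass 1: strict comparison against the broad first search space.
    result = {
        name: recognize(raw[name], FIRST_SPACE[name], cutoff=0.9)
        for name in ("city", "zip")
    }
    # Pass 2: the city was recognized but the ZIP was not, so the rule selects
    # a smaller second search space where a relaxed comparison is applied.
    if result["zip"] is None and result["city"] is not None:
        second_space = ZIPS_BY_CITY[result["city"]]
        result["zip"] = recognize(raw["zip"], second_space, cutoff=0.5)
    return result


noisy = extract({"city": "Boston", "zip": "O2l08"})  # OCR read 0 as O and 1 as l
clean = extract({"city": "Chicago", "zip": "60601"})
print(noisy)
print(clean)
```

The recognized feature does the narrowing: once the city is known, the garbled ZIP only has to be told apart from two candidates instead of four.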

4.

Patent application
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS USING MULTIPLE CHARACTER RECOGNITION ENGINES
Status: Pending (published)

Publication number: US20110255784A1

Publication date: 2011-10-20

Application number: US13007434

Application date: 2011-01-14

Applicant: Girish WELLING , Vartika SINGH , Gopal KRISHNA , Tushar MAHATA , Nirupam SARKAR , Depankar NEOGI , Steven K. LADD

Inventor: Girish WELLING , Vartika SINGH , Gopal KRISHNA , Tushar MAHATA , Nirupam SARKAR , Depankar NEOGI , Steven K. LADD

IPC: G06K9/18

CPC classification number: G06K9/00442 , G06K9/48 , G06K9/72 , G06K2209/01

Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method is provided for automatically extracting data from each received electronic document using a plurality of character recognition engines. The method includes: automatically processing each received electronic document page using each of a plurality of recognition engines to extract data; comparing the quality of data extracted from each of the recognition engines to assign a confidence score to the extracted data; and selecting the extracted data having the highest confidence score as the correct extracted data.
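
As a hedged illustration of the multi-engine selection step, the Python sketch below runs several stand-in recognition engines over the same page and keeps the output with the highest confidence. The abstract does not say how the confidence score is computed; this sketch assumes it is the share of engines that produced the same text. The engines are hypothetical lambdas, not real OCR libraries, and the "page" is a string standing in for a scanned image.

```python
from collections import Counter
from typing import Callable, Dict, Tuple


def extract_with_engines(
    page: str, engines: Dict[str, Callable[[str], str]]
) -> Tuple[str, float, str]:
    """Run every engine (at least one is required) and return the winning
    text, its confidence, and the name of the first engine that produced it."""
    outputs = {name: engine(page) for name, engine in engines.items()}
    votes = Counter(outputs.values())
    # Assumed confidence: fraction of engines that agree on this exact text.
    scored = [(votes[text] / len(outputs), name, text) for name, text in outputs.items()]
    confidence, name, text = max(scored, key=lambda item: item[0])
    return text, confidence, name


# Hypothetical engines: two read the page correctly, one confuses l/1 and 0/O.
engines = {
    "engine_a": lambda page: page,
    "engine_b": lambda page: page,
    "engine_c": lambda page: page.replace("l", "1").replace("0", "O"),
}
text, confidence, winner = extract_with_engines("Total: 100", engines)
print(text, round(confidence, 2), winner)
```

Agreement voting is only one possible scoring rule; a real system could equally use per-engine confidence values or dictionary checks, which this sketch does not attempt.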

5.

Granted patent
Systems and methods for automatically processing electronic documents using multiple image transformation algorithms
Status: In force

Publication number: US08571317B2

Publication date: 2013-10-29

Application number: US13007452

Application date: 2011-01-14

Applicant: Girish Welling , Nirupam Sarkar , Tushar Mahata , Vartika Singh , Depankar Neogi , Steven K. Ladd

Inventor: Girish Welling , Nirupam Sarkar , Tushar Mahata , Vartika Singh , Depankar Neogi , Steven K. Ladd

IPC: G06K9/34

CPC classification number: G06K9/00442 , G06K9/48 , G06K9/72 , G06K2209/01

Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method is provided for automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction on each of the image variations of the pieces to determine, from the plurality of outputs for each piece, which output is best.

6.

Patent application
SYSTEMS AND METHODS FOR TRAINING DOCUMENT ANALYSIS SYSTEM FOR AUTOMATICALLY EXTRACTING DATA FROM DOCUMENTS
Status: Pending (published)

Publication number: US20110258150A1

Publication date: 2011-10-20

Application number: US13007430

Application date: 2011-01-14

Applicant: Depankar NEOGI , Steven K. LADD , Girish WELLING , Arjun KUMAR , Vartika SINGH , Matthew DUGGAN , Tushar MAHATA , Xiaobin YANG , Jian-Wu XU , Janice O'NEIL , Nirupam SARKAR , Gopal KRISHNA

Inventor: Depankar NEOGI , Steven K. LADD , Girish WELLING , Arjun KUMAR , Vartika SINGH , Matthew DUGGAN , Tushar MAHATA , Xiaobin YANG , Jian-Wu XU , Janice O'NEIL , Nirupam SARKAR , Gopal KRISHNA

IPC: G06F15/18

CPC classification number: G06K9/00442 , G06K9/48 , G06K9/72 , G06K2209/01

Abstract: A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing image and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with the corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted text features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
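
The store-or-train decision at the end of this abstract can be sketched in a few lines of Python. The function name, the dictionary layout, and the sample feature set are hypothetical. The abstract also does not say what happens to the recognized features of a document that has at least one unrecognized feature; this sketch assumes nothing is stored until the training phase has dealt with them.

```python
def review_extraction(extracted, category_features):
    """Store the extracted text features if all of them belong to the category's
    feature set; otherwise queue the unrecognized ones for the training phase."""
    unrecognized = [f for f in extracted if f not in category_features]
    if unrecognized:
        # Assumption: hold back the whole document until training resolves it.
        return {"stored": [], "training_queue": unrecognized}
    return {"stored": list(extracted), "training_queue": []}


# Hypothetical text feature set (words and phrases) for an "invoice" category.
invoice_features = {"Invoice", "Total", "Due date", "Amount due"}

ok = review_extraction(["Invoice", "Total"], invoice_features)
flagged = review_extraction(["Invoice", "Tota1", "Due date"], invoice_features)
print(ok)
print(flagged)
```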
