Methods and systems for extracting information from document images

Granted invention patent

US11816913B2 Methods and systems for extracting information from document images (in force)

Patent title: Methods and systems for extracting information from document images
Application number: US17332021
Filing date: 2021-05-27
Publication number: US11816913B2
Publication date: 2023-11-14
Inventors: Mouli Rastogi, Syed Afshan Ali, Mrinal Rawat, Lovekesh Vig, Puneet Agarwal, Gautam Shroff, Ashwin Srinivasan
Applicant: Tata Consultancy Services Limited
Applicant address: IN Mumbai
Patentee: Tata Consultancy Services Limited
Current patentee: Tata Consultancy Services Limited
Current patentee address: IN Mumbai
Agent: Finnegan, Henderson, Farabow, Garrett & Dunner, LLP
Priority: IN 2121008796, 2021-03-02
Main classification: G06K9/00
IPC classifications: G06K9/00; G06F16/93; G06F16/901; G06K9/62; G06K9/40; G06K9/46; G06V30/418; G06V10/30; G06V10/426; G06V10/75; G06V30/413; G06V30/414; G06F18/21; G06F18/22; G06V30/18

Abstract:

This disclosure relates to a method and system for extracting information from images of one or more templatized documents. A knowledge graph with a fixed schema based on background knowledge captures the spatial and semantic relationships of entities present in a scanned document. An adaptive lattice-based approach built on formal concept analysis (FCA) determines a similarity metric that uses both spatial and semantic information to decide whether the structure of the scanned document image adheres to any known document template. If a known document template whose structure closely matches that of the scanned document is detected, an inductive rule learning based approach is used to learn symbolic rules to extract the information present in the scanned document image. If a new document template is detected, future scanned document images belonging to the new document template are automatically processed using the learnt rules.
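
The template-matching idea in the abstract can be illustrated with a small sketch of formal concept analysis. This is not the patented method: the abstract does not give the similarity metric, so the Jaccard overlap of concept intents used here is an assumption, as are the function names (`formal_concepts`, `similarity`, `match_template`), the attribute labels, and the 0.8 threshold. Each document is modelled as a binary context that maps an entity to the set of spatial and semantic attributes a knowledge graph might record for it.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    context: dict mapping an object (entity) to its set of attributes.
    Brute force over object subsets, so only suitable for tiny examples.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that carry every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset)
            concepts.add((extent(shared), shared))
    return concepts


def similarity(ctx_a, ctx_b):
    """Hypothetical metric: Jaccard overlap of the two sets of concept intents."""
    intents_a = {i for _, i in formal_concepts(ctx_a)}
    intents_b = {i for _, i in formal_concepts(ctx_b)}
    return len(intents_a & intents_b) / len(intents_a | intents_b)


def match_template(scan, templates, threshold=0.8):
    """Return (name, score) of the best known template, or (None, score)."""
    best_name, best_score = None, 0.0
    for name, ctx in templates.items():
        score = similarity(scan, ctx)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score  # below threshold: treat as a new template


# Illustrative data only: entity -> attributes.
template = {
    "invoice_no": {"key", "top", "value_right"},
    "date": {"key", "top", "value_right"},
    "total": {"key", "bottom", "value_right"},
}
# Same layout with one extra field of an already known kind.
scan = dict(template, vendor={"key", "top", "value_right"})
# Different layout: values sit below their keys.
other = {
    "invoice_no": {"key", "top", "value_below"},
    "total": {"key", "bottom", "value_below"},
}

print(match_template(scan, {"invoice_A": template}))
print(match_template(other, {"invoice_A": template}))
```

In this toy data the extra `vendor` field does not change the set of concept intents, so `scan` still matches `invoice_A`, while `other` shares no intents with it and would be treated as a new template.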

Published/granted documents

US20220284215A1 METHODS AND SYSTEMS FOR EXTRACTING INFORMATION FROM DOCUMENT IMAGES Publication/grant date: 2022-09-08

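IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)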