- Patent title: Methods and systems for extracting information from document images
- Application number: US17332021
- Filing date: 2021-05-27
- Publication number: US11816913B2
- Publication date: 2023-11-14
- Inventors: Mouli Rastogi, Syed Afshan Ali, Mrinal Rawat, Lovekesh Vig, Puneet Agarwal, Gautam Shroff, Ashwin Srinivasan
- Applicant: Tata Consultancy Services Limited
- Applicant address: IN Mumbai
- Assignee: Tata Consultancy Services Limited
- Current assignee: Tata Consultancy Services Limited
- Current assignee address: IN Mumbai
- Agency: Finnegan, Henderson, Farabow, Garrett & Dunner, LLP
- Priority: IN 2121008796 2021.03.02
- Main classification: G06K9/00
- IPC classification: G06K9/00; G06F16/93; G06F16/901; G06K9/62; G06K9/40; G06K9/46; G06V30/418; G06V10/30; G06V10/426; G06V10/75; G06V30/413; G06V30/414; G06F18/21; G06F18/22; G06V30/18
Abstract:
This disclosure relates to a method and system for extracting information from images of one or more templatized documents. A knowledge graph with a fixed schema, built from background knowledge, captures the spatial and semantic relationships among entities in a scanned document. An adaptive lattice-based approach based on formal concept analysis (FCA) then computes a similarity metric that uses both spatial and semantic information to determine whether the structure of the scanned document image adheres to any known document template. If a known document template whose structure closely matches that of the scanned document is detected, an inductive rule learning approach is used to learn symbolic rules that extract the information present in the scanned document image. If a new document template is detected, future scanned document images belonging to that template are processed automatically using the learned rules.
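
The template-matching idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not define the similarity metric, so Jaccard overlap between sets of FCA concept intents stands in for it. The function names (`intents_of`, `jaccard`, `match_template`), the attribute strings, and the 0.5 threshold are likewise hypothetical. Each document is modelled as a formal context that maps an entity to the set of spatial and semantic attributes it carries.

```python
def intents_of(context):
    """Return every concept intent of a formal context.

    context: dict mapping an entity name to its set of attributes.
    The intents are all intersections of the entities' attribute sets,
    plus the full attribute set (the empty intersection).
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Close the current family under intersection with this entity.
        intents |= {intent & attrs for intent in intents}
    return intents


def jaccard(a, b):
    """Jaccard overlap of two sets; stand-in for the patented metric."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_template(doc, templates, threshold=0.5):
    """Return (template_name, score) for the closest known template.

    template_name is None when no template reaches the threshold,
    which this sketch treats as "new template detected".
    """
    doc_intents = intents_of(doc)
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = jaccard(doc_intents, intents_of(template))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

In this sketch a document whose concept lattice coincides with a stored template scores 1.0, while a document that shares no intents with any template returns `None`, which corresponds to the "new document template" branch described in the abstract.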