Invention Grant
- Patent Title: Document processing method, system and medium
- Application No.: US09891080
- Application Date: 2001-06-25
- Publication No.: US07046847B2
- Publication Date: 2006-05-16
- Inventor: Matthew F. Hurst, Tetsuya Nasukawa
- Applicant: Matthew F. Hurst, Tetsuya Nasukawa
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Satheesh Kara; Louis Percello; David Aker
- Priority: JP2000-190335 (2000-06-23)
- Main IPC: G06K9/34
- IPC: G06K9/34

Abstract:
A technique for extracting meaningful text blocks from a document in which tables, itemized lists, multiple columns, and the like are arbitrarily laid out. A document that is laid out using blanks or similar spacing is input, and symbols associated with spatial coordinates of the document are acquired. Consecutive characters of the same type are extracted from the symbols to generate tokens and spaces. Streams are generated from spaces that are consecutive in the column direction, and text blocks are generated from the streams and tokens. Links are generated between the text blocks to form a document graph. The validity of each connection (link) between text blocks in the document graph is evaluated using a language model, and the text blocks are merged if the connection is valid.
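The first two steps of the abstract (generating tokens and spaces, then finding spaces that line up in the column direction) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Span`, `tokenize`, and `vertical_streams` are hypothetical, "same type" is simplified to blank versus non-blank, and a stream is approximated as a single character column that is blank on at least `min_rows` consecutive rows.

```python
from itertools import groupby
from typing import NamedTuple


class Span(NamedTuple):
    """A run of same-type characters with its position (hypothetical type)."""
    kind: str   # "token" or "space"
    row: int    # line index in the document
    start: int  # first column, inclusive
    end: int    # last column, exclusive
    text: str


def tokenize(lines):
    """Split each line into runs of blank or non-blank characters.

    Assumption: only two character types (blank / non-blank) are used here.
    """
    spans = []
    for row, line in enumerate(lines):
        col = 0
        for is_blank, chars in groupby(line, key=str.isspace):
            text = "".join(chars)
            kind = "space" if is_blank else "token"
            spans.append(Span(kind, row, col, col + len(text), text))
            col += len(text)
    return spans


def vertical_streams(lines, min_rows=2):
    """Find columns that stay blank over at least min_rows consecutive rows.

    Returns (column, first_row, end_row_exclusive) tuples. Lines are padded
    to equal width, so trailing padding can also be reported as a stream.
    """
    width = max((len(line) for line in lines), default=0)
    grid = [line.ljust(width) for line in lines]
    streams = []
    for col in range(width):
        run_start = None
        # One extra iteration flushes a run that reaches the last row.
        for row in range(len(grid) + 1):
            blank = row < len(grid) and grid[row][col].isspace()
            if blank and run_start is None:
                run_start = row
            elif not blank and run_start is not None:
                if row - run_start >= min_rows:
                    streams.append((col, run_start, row))
                run_start = None
    return streams
```

For example, `tokenize(["Name  Age"])` yields the token "Name", a two-character space, and the token "Age", each with its row and column range. The later steps (building text blocks, linking them into a document graph, and checking links with a language model) are not covered by this sketch.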
Public/Granted literature
- US20020016796A1 Document processing method, system and medium (Public/Granted day: 2002-02-07)