Image reading apparatus and information processing apparatus that reads documents and generates image data
摘要:
Provided is an image reading apparatus capable of eliminating the need for a user to correct a character portion that cannot be recognized by OCR and improve the operation burden on the user. A non-word detecting unit detects a non-word that is not considered to be a word among a plurality of words constituting the text in a document. A determining unit determines whether or not a compound word obtained by combining the non-word with at least one of the word immediately before the non-word and the word immediately after the non-word in that arrangement order is a word. A character correcting unit identifies the text portion corresponding to the compound word in the text in the document as a failed character recognition portion, and corrects the text of the failed character recognition portion to the text of the compound word.
信息查询
0/0