Automatic generation of training data for hand-printed text recognition
摘要:
A method for generating training data for hand-printed text recognition includes obtaining a structured document, obtaining a set of hand-printed character images and database metadata from a database, generating a modified document page image, and outputting a training file. The structured document includes a document page image that includes text characters and document metadata that associates each of the text characters to a document character label. The database metadata associates each of the set of hand-printed character images to a database character label. The modified document page image is generated by iteratively processing each of the text characters. The iterative processing includes determining whether an individual text character should be replaced, selecting a replacement hand-printed character image from the set of hand-printed character images, scaling the replacement hand-printed character image, and inserting the replacement hand-printed character image into the modified document page image.
信息查询
0/0