Abstract:
Some examples include detecting errors in text that has been recognized using automated text recognition technology. For instance, errors in the recognized text may be detected based on glyph image similarity and the use of a language model, dictionary information, or the like. Some implementations may group together glyphs based on association of the glyphs with the same glyph identifier and a similarity of the appearance of the glyphs. Furthermore, the words associated with each glyph may be checked against a language model, such as to check a spelling or other validity of the words, and a score may be assigned to each group of glyphs based on the validity of the words corresponding to the glyphs in that group. Groups that have a score that fails to meet a threshold may be reviewed by a person or may undergo automated correction techniques.
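The grouping and scoring step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `score_glyph_groups`, the tuple layout, the plain set used as a stand-in for the language model, and the 0.5 threshold are all hypothetical, and the visual-similarity clustering is assumed to have already produced a `cluster_id` for each glyph.

```python
from collections import defaultdict


def score_glyph_groups(glyphs, dictionary, threshold=0.5):
    """Score groups of glyphs by the validity of the words they occur in.

    glyphs: iterable of (glyph_id, cluster_id, word) tuples, where cluster_id
        is an assumed, precomputed label for visually similar glyph images.
    dictionary: set of valid lowercase words (stand-in for a language model).
    Returns (scores, flagged): a score per group, and the groups whose score
    falls below the threshold and so may need review or correction.
    """
    groups = defaultdict(list)
    for glyph_id, cluster_id, word in glyphs:
        # A group shares both the recognized identifier and the appearance.
        groups[(glyph_id, cluster_id)].append(word)

    # Score = fraction of the group's words that the dictionary accepts.
    scores = {
        key: sum(w.lower() in dictionary for w in words) / len(words)
        for key, words in groups.items()
    }
    flagged = sorted(key for key, score in scores.items() if score < threshold)
    return scores, flagged


if __name__ == "__main__":
    words = {"modern", "map", "burn", "corn"}
    recognized = [
        ("m", 0, "modern"), ("m", 0, "map"),  # genuine "m" glyphs
        ("m", 1, "bum"), ("m", 1, "com"),     # likely "rn" misread as "m"
    ]
    print(score_glyph_groups(recognized, words))
```

A fraction-of-valid-words score is just one possible choice; a real language model could assign per-word probabilities instead.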
Abstract:
An improved system and method are provided for increasing the available workspace of a graphical user interface by reducing the opacity of an element in the interface so that the workspace beneath the semi-transparent element becomes visible. Later, the semi-transparent element may be made opaque again for better visibility to a user. An opacity manager may be operably coupled to a graphics interface of an operating system to change the opacity of an element of the graphical user interface. Any type of element of a graphical user interface may have its opacity reduced, including a window, a dialog box, a message box, a toolbar, a control, a button, a menu, and so forth. The system and method may reduce or increase the opacity of an element of the graphical user interface in response to any event, including a system event, an application event, or a user interface event.
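As a rough illustration of an event-driven opacity manager, the sketch below maps event names to opacity changes on interface elements. The `Element` and `OpacityManager` classes, the event names, and the opacity values are hypothetical, and no real operating-system graphics interface is called.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Any GUI element (window, toolbar, menu, ...); 1.0 means fully opaque."""
    name: str
    opacity: float = 1.0


class OpacityManager:
    """Changes element opacity in response to named events (hypothetical API)."""

    def __init__(self):
        self._rules = {}  # event name -> list of (element, target opacity)

    def on(self, event, element, opacity):
        """Register a rule: when `event` occurs, set `element` to `opacity`."""
        self._rules.setdefault(event, []).append((element, opacity))

    def handle(self, event):
        """Apply every rule registered for `event`, clamping to [0.0, 1.0]."""
        for element, opacity in self._rules.get(event, []):
            element.opacity = min(1.0, max(0.0, opacity))


if __name__ == "__main__":
    toolbar = Element("toolbar")
    manager = OpacityManager()
    manager.on("pointer_leave", toolbar, 0.3)  # reveal the workspace beneath
    manager.on("pointer_enter", toolbar, 1.0)  # restore full visibility
    manager.handle("pointer_leave")
    print(toolbar)
```

In a real system the `handle` step would forward the new value to the graphics interface rather than only updating a field.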
Abstract:
Determination of an underlying grid structure that facilitates layout of East Asian text is disclosed. The underlying grid structure includes both a size of character frames and a size of a text block frame. The East Asian text may be obtained from a scan of printed material that has the text formatted according to layout conventions established by the publisher. The text may be reformatted to appear on a display of an electronic device in a manner similar to the formatting in the original scanned document. Reformatting may include reflowing the text in order to fit a greater or lesser number of characters on a line. The reflowing may maintain character spacing from the original document and follow formatting rules against locating certain characters at the start or end of a line.
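The reflow step with line-start and line-end restrictions can be sketched as below. This is a simplified illustration, not the disclosed method: the function name `reflow` and the two small character sets are assumptions, every character is treated as occupying one fixed-size frame, and grid inference and character-spacing preservation are not modeled.

```python
# Illustrative (incomplete) sets of characters with placement restrictions.
NO_LINE_START = set("、。,.)」』】")  # must not begin a line
NO_LINE_END = set("(「『【")          # must not end a line


def reflow(text, chars_per_line):
    """Break text into lines of at most chars_per_line characters.

    A break is moved earlier whenever it would put a restricted character
    at the start of the next line or at the end of the current line.
    """
    lines = []
    i = 0
    while i < len(text):
        end = min(i + chars_per_line, len(text))
        # Back up while the break is illegal, keeping at least one character.
        while end - i > 1 and end < len(text) and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines


if __name__ == "__main__":
    for width in (4, 6):
        print(width, reflow("今日は「晴れ」です。", width))
```

Moving the break earlier leaves a short line; other conventions, such as letting punctuation hang past the line end, are equally plausible and are not shown here.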
Abstract:
Systems and methods are provided for optimizing a glyph-based file. Individual components may be identified within glyphs of a file. Each identified component within a glyph may be a portion of the glyph, and may be a joint component or disjoint component. Groupings of components may then be determined, where the groupings are determined based at least in part by identifying similarly shaped components. A representative component may then be selected from each grouping. Composite glyphs may be generated and stored in an optimized file, where each composite glyph includes a reference to at least one representative component.
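One way to picture the component grouping is the sketch below, which assumes each component is a flattened bitmap of equal length and treats two components as similarly shaped when their Hamming distance is within a tolerance. The function names, the bitmap representation, the distance measure, and the choice of the first-seen component as the representative are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-length bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def optimize(glyphs, tolerance=1):
    """Replace similar components with references to shared representatives.

    glyphs: {glyph name: [component, ...]}, each component a tuple of 0/1 pixels.
    Returns (representatives, composites), where composites maps each glyph
    name to a list of indices into the representatives list.
    """
    representatives = []
    composites = {}
    for name, components in glyphs.items():
        refs = []
        for comp in components:
            for idx, rep in enumerate(representatives):
                if len(rep) == len(comp) and hamming(rep, comp) <= tolerance:
                    refs.append(idx)  # reuse an existing representative
                    break
            else:
                # No similar shape seen yet: this component starts a new group.
                representatives.append(comp)
                refs.append(len(representatives) - 1)
        composites[name] = refs
    return representatives, composites


if __name__ == "__main__":
    stem, noisy_stem, dot = (1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 1, 0)
    print(optimize({"i": [stem, dot], "j": [noisy_stem, dot]}))
```

Storing only two representatives for four components is where the size reduction would come from in this toy case.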
Abstract:
Systems and methods for improving automated processing of electronic media items are disclosed. In one embodiment, a computer system identifies a first set of regions of a page of an electronic media item, and a respective region type for at least one region of the first set, where the identification of the respective region type is based on one or more typographical features, historical data, and, optionally, the position and/or dimensions of the region. The computer system receives an identification by a user of a second set of regions of the page and a respective region type for at least one region of the second set, and then modifies the historical data when there is a difference between the regions and respective region types of the first set, and the regions and respective region types of the second set.
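The feedback loop between automatic and user-supplied region types might look like the sketch below. It is a deliberately small illustration: the `RegionTyper` class, the tuple of typographical features, the `"body"` default, and the use of simple counts as historical data are all hypothetical, and region position and dimensions are ignored.

```python
from collections import Counter, defaultdict


class RegionTyper:
    """Predicts a region type from typographical features and past corrections."""

    def __init__(self):
        # Historical data: feature tuple -> counts of user-confirmed types.
        self.history = defaultdict(Counter)

    def predict(self, features, default="body"):
        """Return the most frequent historical type for these features."""
        counts = self.history.get(features)
        return counts.most_common(1)[0][0] if counts else default

    def reconcile(self, features, predicted, user_label):
        """Update the history only when the user disagrees with the prediction.

        Returns True if the historical data was modified.
        """
        if predicted != user_label:
            self.history[features][user_label] += 1
            return True
        return False


if __name__ == "__main__":
    typer = RegionTyper()
    features = ("bold", 18)  # e.g. font weight and point size
    guess = typer.predict(features)
    typer.reconcile(features, guess, "heading")
    print(guess, "->", typer.predict(features))
```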
Abstract:
A method for detecting and correcting skew in scanned vertical text includes identifying an image of vertically oriented characters, and identifying a plurality of vertical lines corresponding to character positions of the vertically oriented characters in the image. The method further includes generating an average slope of a subset of the plurality of lines, and causing the image to be deskewed based on the average slope.
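The slope-averaging step can be sketched as follows, assuming the vertical lines have already been detected and are given as pairs of endpoints. The function name `skew_angle_degrees`, the `keep` parameter, and the rule of keeping the slopes closest to the median are assumptions; the abstract does not say how the subset is chosen. Slope is measured as horizontal change per unit of vertical change so that a perfectly vertical line has slope 0.

```python
import math
from statistics import mean, median


def skew_angle_degrees(lines, keep=0.5):
    """Estimate skew from near-vertical line segments.

    lines: [((x0, y0), (x1, y1)), ...] segments through character positions.
    keep: fraction of lines, those closest to the median slope, to average.
    Returns the skew angle in degrees (0.0 if no usable line is given).
    """
    # dx/dy stays finite for vertical lines, unlike the usual dy/dx.
    slopes = [(x1 - x0) / (y1 - y0) for (x0, y0), (x1, y1) in lines if y1 != y0]
    if not slopes:
        return 0.0
    mid = median(slopes)
    slopes.sort(key=lambda s: abs(s - mid))  # most typical slopes first
    subset = slopes[: max(1, int(len(slopes) * keep))]
    return math.degrees(math.atan(mean(subset)))


if __name__ == "__main__":
    detected = [
        ((0, 0), (10, 100)),
        ((50, 0), (60, 100)),
        ((100, 0), (110, 100)),
        ((150, 0), (350, 100)),  # outlier, excluded from the average
    ]
    print(round(skew_angle_degrees(detected), 2))
```

The image would then be rotated by the opposite of this angle using an imaging library; the sign convention depends on whether the y axis points up or down in the image coordinates.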