Abstract:
A lexical processor and a method of using it are provided. The lexical processor includes an input interface (300) and a word generator (302) that produces an output as a function of an input word and a confusion matrix. The confusion matrix is a handwriting error model based on the recognition capabilities of the classifiers used to preprocess inputs to the lexical processor. The lexical processor output comprises any of the following: the input word, a rejection indicator, a candidate replacement word, or a suggestion list of related words.
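The four output types described above can be illustrated with a minimal sketch. It is not the patented implementation: the confusion table, the probabilities, the thresholds, and the names `CONFUSIONS`, `word_score`, and `process_word` are all hypothetical, and candidate words are restricted to lexicon entries of the same length as the input.

```python
# Hypothetical confusion model: CONFUSIONS[(observed, intended)] is an assumed
# probability that the upstream classifier reports `observed` when the writer
# meant `intended`. All values here are illustrative, not measured.
CONFUSIONS = {("1", "l"): 0.3, ("0", "o"): 0.3, ("c", "e"): 0.2, ("o", "a"): 0.03}
MATCH_P = 0.9    # assumed probability that a character is recognized correctly
FLOOR_P = 0.001  # assumed probability for any confusion not listed above


def char_likelihood(observed, intended):
    if observed == intended:
        return MATCH_P
    return CONFUSIONS.get((observed, intended), FLOOR_P)


def word_score(observed, candidate):
    """Likelihood of seeing `observed` if the writer intended `candidate`."""
    if len(observed) != len(candidate):
        return 0.0  # simplification: insertions and deletions are not modeled
    score = 1.0
    for o, c in zip(observed, candidate):
        score *= char_likelihood(o, c)
    return score


def process_word(word, lexicon, accept=0.05, suggest=0.001, max_suggestions=3):
    """Return one of: accept, replace, suggest, or reject."""
    if word in lexicon:
        return ("accept", word)
    scored = sorted(((word_score(word, w), w) for w in lexicon), reverse=True)
    scored = [(s, w) for s, w in scored if s >= suggest]
    if not scored:
        return ("reject", None)
    best_score, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score >= accept and best_score > runner_up:
        return ("replace", best)
    return ("suggest", [w for _, w in scored[:max_suggestions]])
```

With a lexicon such as `{"hello", "help", "world", "cat", "eat"}`, the input `"he1lo"` yields a replacement, `"cot"` yields a suggestion list, and `"xyzzy"` is rejected, because no lexicon entry scores above the assumed suggestion threshold.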
Abstract:
The invention provides a computer-implementable method for detecting substroke boundaries in handwriting input. The method selects pen-tip velocity extrema to represent substroke boundaries. The method includes steps for generating a velocity profile from the handwriting input; identifying a plurality of peak extrema within the velocity profile; identifying a plurality of in-line extrema within the velocity profile; and detecting the substroke boundaries by filtering the plurality of peak extrema and the plurality of in-line extrema.
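The pipeline above (velocity profile, extrema, filtering) might be sketched as follows. The abstract does not define how peak extrema differ from in-line extrema, so this sketch does not model that distinction: it treats every strict local minimum or maximum as a candidate and applies an assumed filter based on the change in speed. The function names and the `min_change` threshold are hypothetical.

```python
from math import hypot


def velocity_profile(samples):
    """Speed between consecutive pen samples, each given as (x, y, t)."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speeds.append(hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    return speeds


def local_extrema(values):
    """Indices where the profile is a strict local minimum or maximum."""
    found = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if (cur < prev and cur < nxt) or (cur > prev and cur > nxt):
            found.append(i)
    return found


def filter_extrema(values, indices, min_change=0.5):
    """Assumed filter: keep an extremum only if its speed differs from the
    previously kept extremum by at least `min_change`."""
    kept = []
    for i in indices:
        if not kept or abs(values[i] - values[kept[-1]]) >= min_change:
            kept.append(i)
    return kept
```

The indices returned by `filter_extrema` are candidate substroke boundaries within the velocity profile. Small ripples in the speed signal are discarded because they fall under the threshold.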
Abstract:
A diacritical marker recognition system and method recognize diacritical markers in a character image based upon an analysis by a neural network of the portion of the character image most likely to contain a diacritical marker. Once the neural network determines that a diacritical marker most likely exists in the character image, the system uses heuristics to determine whether a diacritical marker actually exists or whether the apparent diacritical marker in the character image is in fact a regular character.
Abstract:
A method and system for identifying text in a handwriting input are provided. The system includes a feature extractor (30) and a classifier (32). The feature extractor (30) extracts a plurality of features from the handwriting input. The classifier (32) classifies the handwriting input according to a discriminant function that is based on a polynomial expansion. The text is identified according to the discriminant function output.
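A discriminant function based on a polynomial expansion can be illustrated with a small sketch. The abstract specifies neither the expansion order nor how coefficients are obtained, so the degree-2 default, the per-label weight vectors, and the names `polynomial_expansion` and `classify` are assumptions made for illustration only.

```python
from itertools import combinations_with_replacement


def polynomial_expansion(features, degree=2):
    """All monomials of the features up to `degree`, starting with the constant 1."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(features)), d):
            term = 1.0
            for i in combo:
                term *= features[i]
            terms.append(term)
    return terms


def classify(features, weights_by_label, degree=2):
    """Evaluate one linear discriminant per label on the expanded features
    and return the label with the largest output, plus all scores."""
    expanded = polynomial_expansion(features, degree)
    scores = {
        label: sum(w * t for w, t in zip(weights, expanded))
        for label, weights in weights_by_label.items()
    }
    return max(scores, key=scores.get), scores
```

For two features `[x1, x2]` and degree 2, the expansion is `[1, x1, x2, x1*x1, x1*x2, x2*x2]`, so each hypothetical weight vector needs six coefficients.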
Abstract:
A post-processing method for optical character recognition (OCR) is provided that combines different OCR engines to identify and resolve characters, and attributes of those characters, that are erroneously recognized by multiple OCR engines. The characters can originate from many different types of character environments. The OCR engine outputs are synchronized using synchronization heuristics in order to detect matches and mismatches between them. The mismatches are resolved using resolution heuristics and neural networks. The resolution heuristics and neural networks are based on observing many different conventional OCR engines in different character environments to find which specific OCR engine correctly identifies a certain character having particular attributes. The results are encoded into the resolution heuristics and neural networks to create an optimal OCR post-processing solution.
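The synchronize-then-resolve flow can be sketched for two engines as follows. This is an illustration under stated assumptions, not the method claimed: alignment uses Python's `difflib.SequenceMatcher` in place of the unspecified synchronization heuristics, and the single rule `prefer_letters` is a hypothetical stand-in for the resolution heuristics and neural networks.

```python
from difflib import SequenceMatcher


def synchronize(a, b):
    """Align two OCR outputs into ('match' | 'mismatch', a_part, b_part) segments."""
    segments = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        kind = "match" if tag == "equal" else "mismatch"
        segments.append((kind, a[i1:i2], b[j1:j2]))
    return segments


def prefer_letters(a_part, b_part):
    """Hypothetical resolution rule: trust engine A only when it produced
    letters and engine B did not; otherwise fall back to engine B."""
    return a_part.isalpha() and not b_part.isalpha()


def resolve(segments, prefer_a=prefer_letters):
    """Keep matched text as-is and pick one engine's text for each mismatch."""
    out = []
    for kind, a_part, b_part in segments:
        if kind == "match":
            out.append(a_part)
        else:
            out.append(a_part if prefer_a(a_part, b_part) else b_part)
    return "".join(out)
```

For example, `resolve(synchronize("he1lo world", "hello w0rld"))` combines the two outputs into `"hello world"`, taking the letter from whichever engine did not report a digit. A fuller system would replace `prefer_letters` with rules derived from observed engine behavior in each character environment.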