Abstract:
Embodiments herein provide computer-implemented techniques for allowing a user computing device to extract financial card information using optical character recognition ("OCR"). Extracting financial card information may be improved by applying various classifiers and other transformations to the image data. For example, applying a linear classifier to the image to determine digit locations before applying the OCR algorithm allows the user computing device to use less processing capacity to extract accurate card data. The OCR application may train a classifier to use the wear patterns of a card to improve OCR algorithm performance. The OCR application may apply a linear classifier and then a nonlinear classifier to improve the performance and the accuracy of the OCR algorithm. The OCR application uses the known digit patterns of typical credit and debit cards to improve the accuracy of the OCR algorithm.
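A minimal sketch of the first idea, not the patented implementation: a linear classifier (here, scikit-learn's LogisticRegression) scores sliding windows of the card image so that the more expensive OCR step only runs on likely digit locations. The window geometry, stride, features, and toy training data are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

WIN_H, WIN_W, STRIDE = 24, 16, 8  # assumed window geometry

def candidate_digit_windows(gray, clf, threshold=0.5):
    """Return (row, col) offsets of windows the linear classifier marks as digits."""
    hits = []
    for r in range(0, gray.shape[0] - WIN_H + 1, STRIDE):
        for c in range(0, gray.shape[1] - WIN_W + 1, STRIDE):
            patch = gray[r:r + WIN_H, c:c + WIN_W].ravel() / 255.0
            if clf.predict_proba([patch])[0, 1] >= threshold:
                hits.append((r, c))
    return hits  # only these regions are handed to the OCR algorithm

# Toy stand-in training data: random "digit" vs. "background" patches.
rng = np.random.default_rng(0)
X = rng.random((200, WIN_H * WIN_W))
y = rng.integers(0, 2, 200)
clf = LogisticRegression(max_iter=1000).fit(X, y)

card = (rng.random((96, 160)) * 255).astype(np.uint8)
print(len(candidate_digit_windows(card, clf)), "candidate windows for OCR")
```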
Abstract:
Extracting financial card information with relaxed alignment comprises a method to receive an image of a card, determine one or more edge finder zones in locations of the image, and identify lines in the one or more edge finder zones. The method further identifies one or more quadrilaterals formed by intersections of extrapolations of the identified lines, determines an aspect ratio of each of the one or more quadrilaterals, and compares the determined aspect ratios to an expected aspect ratio. The method then identifies a quadrilateral that matches the expected aspect ratio, rectifies it into a model of the card, and performs an optical character recognition algorithm on the rectified model. A similar method is performed on multiple cards in an image. The results of the analysis of each of the cards are compared to improve the accuracy of the extracted data.
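A hedged sketch of the quadrilateral test, not the patented method: given four boundary lines found by a line detector in the edge-finder zones, their extrapolations are intersected into a quadrilateral, its aspect ratio is checked against the ISO/IEC 7810 ID-1 card ratio (about 1.586), and a match is rectified with a perspective warp for OCR. The output size and tolerance are illustrative assumptions.

```python
import cv2
import numpy as np

CARD_ASPECT = 85.60 / 53.98   # standard payment-card width / height
OUT_W, OUT_H = 856, 540       # assumed size of the rectified model

def intersect(l1, l2):
    """Intersection of two infinite (non-parallel) lines, each (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def rectify_card(image, top, bottom, left, right, tol=0.08):
    """Return the rectified card image, or None if the aspect ratio is off."""
    corners = np.float32([intersect(top, left), intersect(top, right),
                          intersect(bottom, right), intersect(bottom, left)])
    w = np.linalg.norm(corners[1] - corners[0])
    h = np.linalg.norm(corners[3] - corners[0])
    if h == 0 or abs(w / h - CARD_ASPECT) > tol:
        return None                      # quadrilateral does not match a card
    dst = np.float32([[0, 0], [OUT_W, 0], [OUT_W, OUT_H], [0, OUT_H]])
    M = cv2.getPerspectiveTransform(corners, dst)
    return cv2.warpPerspective(image, M, (OUT_W, OUT_H))  # fed to the OCR step
```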
Abstract:
Methods, systems, and apparatus, including computer program products, for ranking search results for queries. The method includes calculating a visual similarity score for one or more pairs of images in a plurality of images based on visual features of images in each of the one or more pairs; building a graph of images by linking each of one or more images in the plurality of images to one or more nearest neighbor images based on the visual similarity scores; associating a respective score with each of one or more images in the graph based on data indicative of user behavior relative to the image as a search result for a query; and determining a new score for each of one or more images in the graph based on the respective score of the image, and the respective scores of one or more nearest neighbors to the image.
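A minimal sketch of the score-propagation idea, assuming cosine similarity over precomputed feature vectors and a single blending step between an image's own behavior-based score and the scores of its nearest neighbors. The k and alpha parameters are illustrative assumptions, not values from the source.

```python
import numpy as np

def propagate_scores(features, behavior_scores, k=3, alpha=0.5):
    """New score = alpha * own score + (1 - alpha) * mean of k-NN scores."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                      # visual similarity for every pair
    np.fill_diagonal(sim, -np.inf)     # an image is not its own neighbor
    new_scores = np.empty_like(behavior_scores, dtype=float)
    for i in range(len(behavior_scores)):
        nbrs = np.argsort(sim[i])[-k:]  # link to the k nearest neighbors
        new_scores[i] = (alpha * behavior_scores[i]
                         + (1 - alpha) * behavior_scores[nbrs].mean())
    return new_scores

# Toy usage: 10 images with 64-dimensional features and click-based scores.
print(propagate_scores(np.random.rand(10, 64), np.random.rand(10)))
```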
Abstract:
Extracting card data comprises receiving, by one or more computing devices, a digital image of a card; performing an image recognition process on the digital representation of the card; identifying an image in the digital representation of the card; comparing the identified image to an image database comprising a plurality of images and determining that the identified image matches a stored image in the image database; determining a card type associated with the stored image and associating the card type with the card based on the determination that the identified image matches the stored image; and performing a particular optical character recognition algorithm on the digital representation of the card, the particular optical character recognition algorithm being based on the determined card type. Another example uses an issuer identification number to improve data extraction. Another example compares extracted data with user data to improve accuracy.
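A hedged sketch of type-directed OCR: a logo found on the card image is matched against a small database (here, by ORB feature matching in OpenCV), the matching entry's card type is looked up, and a card-type-specific OCR configuration is selected. The database contents, match threshold, and per-type OCR settings are hypothetical.

```python
import cv2

MIN_MATCHES = 25  # assumed threshold for declaring a database hit

# Hypothetical database: logo image file -> card type, and type -> OCR layout.
LOGO_DB = {"visa_logo.png": "visa", "amex_logo.png": "amex"}
OCR_CONFIG = {"visa": {"digit_groups": [4, 4, 4, 4]},
              "amex": {"digit_groups": [4, 6, 5]}}

def detect_card_type(card_image):
    """card_image: 8-bit grayscale image of the card, assumed to contain a logo."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    _, card_desc = orb.detectAndCompute(card_image, None)
    best_type, best_count = None, 0
    for path, card_type in LOGO_DB.items():
        logo = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, logo_desc = orb.detectAndCompute(logo, None)
        count = len(matcher.match(card_desc, logo_desc))
        if count > best_count:
            best_type, best_count = card_type, count
    return best_type if best_count >= MIN_MATCHES else None

# The detected type then selects the OCR layout, e.g. 4-6-5 digit groups
# for an Amex-style card versus 4-4-4-4 for most other issuers.
```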
Abstract:
Techniques are provided for segmenting an input by cut point classification and training a cut classifier. A method may include receiving, by a computerized text recognition system, an input in a script. A heuristic may be applied to the input to insert multiple cut points. For each of the cut points, a probability may be generated and the probability may indicate a likelihood that the cut point is correct. Multiple segments of the input may be selected, and the segments may be defined by cut points having a probability over a threshold. Next, the segments of the input may be provided to a character recognizer. Additionally, a method may include training a cut classifier using a machine learning technique, based on multiple text training examples, to determine the correctness of a cut point in an input.
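A minimal sketch of cut-point segmentation, assuming a binarized ink image (rows by columns), a heuristic that proposes cuts at local minima of the column ink profile, and a hypothetical trained classifier exposing a scikit-learn-style predict_proba. The features and probability threshold are illustrative assumptions.

```python
import numpy as np

def heuristic_cut_points(ink, min_gap=5):
    """Propose cut columns at local minima of the vertical ink profile."""
    profile = ink.sum(axis=0)
    cuts, last = [], -min_gap
    for c in range(1, len(profile) - 1):
        if profile[c] <= profile[c - 1] and profile[c] <= profile[c + 1]:
            if c - last >= min_gap:
                cuts.append(c)
                last = c
    return cuts

def cut_features(ink, c):
    """Hypothetical per-cut features: ink near and at the candidate column."""
    window = ink[:, max(0, c - 3):c + 4]
    return [window.mean(), float(ink[:, c].sum())]

def segment(ink, classifier, threshold=0.7):
    """Keep cuts whose probability of being correct clears the threshold,
    then slice the input into segments for the character recognizer."""
    kept = [c for c in heuristic_cut_points(ink)
            if classifier.predict_proba([cut_features(ink, c)])[0, 1] >= threshold]
    bounds = [0] + kept + [ink.shape[1]]
    return [ink[:, a:b] for a, b in zip(bounds, bounds[1:])]

# Training the cut classifier itself is an ordinary supervised step, e.g.:
#   clf = sklearn.linear_model.LogisticRegression().fit(X_cuts, y_is_correct)
```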
Abstract:
Capturing information from payment instruments comprises receiving, using one or more computing devices, an image of a back side of a payment instrument, the payment instrument comprising information imprinted thereon such that the imprinted information protrudes from a front side of the payment instrument and the imprinted information is indented into the back side of the payment instrument; extracting sets of characters from the image of the back side of the payment instrument based on the imprinted information indented into the back side of the payment instrument and depicted in the image of the back side of the payment instrument; applying a first character recognition application to process the sets of characters extracted from the image of the back side of the payment instrument; and categorizing each of the sets of characters into one of a plurality of categories relating to information required to conduct a payment transaction.
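A hedged sketch of the categorization step only: once character sets have been recognized from the back-side image of the indentations, simple pattern rules assign each set to a payment-relevant category. The category names and patterns are illustrative assumptions.

```python
import re

def categorize(char_set):
    """Map an extracted character string to a payment-field category."""
    digits = re.sub(r"\D", "", char_set)
    if len(digits) in (15, 16):
        return "account_number"
    if re.fullmatch(r"(0[1-9]|1[0-2])/?\d{2}", char_set.strip()):
        return "expiration_date"
    if char_set.replace(" ", "").isalpha():
        return "cardholder_name"
    return "unknown"

for s in ["4111 1111 1111 1111", "09/27", "JANE DOE"]:
    print(s, "->", categorize(s))
```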
Abstract:
This specification relates to presenting image search results. In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving an image query, the image query being a query for image search results; receiving ranked image search results responsive to the image query, the image search results each including an identification of a corresponding image resource; generating a similarity matrix for images identified by the image search results; generating a hierarchical grouping of the images using the similarity matrix; identifying a canonical image for each group in the hierarchical grouping using a ranking measure; and presenting a visual representation of the image search results based on the hierarchical grouping and the identified canonical images.
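A minimal sketch of the grouping and canonical-image selection, assuming precomputed feature vectors, cosine distances, SciPy's agglomerative clustering as the hierarchical grouping, and "best original search rank" as the ranking measure. All of these concrete choices are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def group_results(features, cut_distance=0.5):
    """Hierarchically group images; rows of `features` arrive in ranked order."""
    dist = pdist(features, metric="cosine")    # pairwise similarity, condensed
    tree = linkage(dist, method="average")     # hierarchical grouping
    labels = fcluster(tree, t=cut_distance, criterion="distance")
    groups = {}
    for rank, label in enumerate(labels):      # rank 0 = top search result
        groups.setdefault(label, []).append(rank)
    # Canonical image per group: the member with the best search rank.
    return {label: min(members) for label, members in groups.items()}

canon = group_results(np.random.rand(12, 32))
print("canonical image index per group:", canon)
```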
Abstract:
Methods and systems for recognizing Devanagari script handwriting are provided. A method may include receiving a handwritten input and determining that the handwritten input comprises a shirorekha stroke based on one or more shirorekha detection criteria. Shirorekha detection criteria may include at least one criterion such as a length of the shirorekha stroke, a horizontality of the stroke, a straightness of the stroke, and a position in time at which the stroke is made in relation to one or more other strokes in the handwritten input. Next, one or more recognized characters may be provided corresponding to the handwritten input.
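A hedged sketch of the shirorekha test: a stroke, given as a sequence of (x, y) points, is treated as a shirorekha candidate when it is long enough, nearly horizontal, and nearly straight. The thresholds are illustrative assumptions, and the timing criterion relative to other strokes is omitted for brevity.

```python
import math

def is_shirorekha(stroke, min_len=50, max_angle_deg=10, max_dev=0.1):
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length < min_len:
        return False                               # too short
    angle = abs(math.degrees(math.atan2(y1 - y0, x1 - x0)))
    if min(angle, 180 - angle) > max_angle_deg:
        return False                               # not horizontal enough
    # Straightness: mean perpendicular deviation from the chord, normalized.
    dev = sum(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
              for x, y in stroke) / len(stroke)
    return dev / length <= max_dev

print(is_shirorekha([(0, 10), (30, 11), (60, 10), (90, 12)]))  # True
```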
Abstract:
Comparing extracted card data from a continuous scan comprises receiving, by one or more computing devices, a digital scan of a card; obtaining a plurality of images of the card from the digital scan of the physical card; performing an optical character recognition algorithm on each of the plurality of images; comparing the results of the optical character recognition algorithm for each of the plurality of images; determining whether a configured threshold of the results for the plurality of images match each other; and verifying the results when they match. A threshold confidence level for the extracted card data can be employed to determine the accuracy of the extraction. Data is further extracted from blended images and three-dimensional models of the card. Embossed text and holograms in the images may be used to prevent fraud.
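A minimal sketch of the cross-frame verification: the same OCR routine runs on every frame captured during the continuous scan, and the extraction is verified only when at least a configured fraction of the frames agree. The ocr callable and the 0.8 threshold are illustrative assumptions.

```python
from collections import Counter

def verify_scan(frames, ocr, threshold=0.8):
    """Return the extracted value if enough frames agree on it, else None."""
    results = [ocr(frame) for frame in frames]      # OCR each captured image
    value, count = Counter(results).most_common(1)[0]
    return value if count / len(results) >= threshold else None

# Example with a stand-in OCR function: five frames, four of which agree.
readings = iter(["4111111111111111"] * 4 + ["4111111111111117"])
print(verify_scan(range(5), lambda f: next(readings)))
```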