METHODS AND SYSTEMS FOR DECISION-TREE-BASED AUTOMATED SYMBOL RECOGNITION
    3.
    Invention application
    METHODS AND SYSTEMS FOR DECISION-TREE-BASED AUTOMATED SYMBOL RECOGNITION (under examination, published)

    Publication number: US20160217123A1

    Publication date: 2016-07-28

    Application number: US14662570

    Filing date: 2015-03-19

    Abstract: The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems create and store a decision tree whose nodes include classifiers, each of which recognizes the symbol that corresponds to a symbol image. Inputting a symbol image to the decision tree and processing it through one or more nodes of the tree returns the symbol corresponding to the symbol image.


    Text detection using features associated with neighboring glyph pairs
    4.
    Invention grant
    Text detection using features associated with neighboring glyph pairs (in force)

    Publication number: US09367736B1

    Publication date: 2016-06-14

    Application number: US14842125

    Filing date: 2015-09-01

    IPC classification: G06K9/46 G06K9/00 G06K9/34

    Abstract: A multi-orientation text detection method and associated system are disclosed that utilize orientation-variant glyph features to determine a text line in an image regardless of the orientation of the text line. Glyph features are determined for each glyph in an image with respect to a neighboring glyph. The glyph features are provided to a learned classifier that outputs a glyph pair score for each neighboring glyph pair. Each glyph pair score indicates the likelihood that the corresponding pair of neighboring glyphs forms part of the same text line. The glyph pair scores are used to identify candidate text lines, which are then ranked to select a final set of text lines in the image.
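
The last two steps of this abstract (grouping glyphs into candidate text lines from pair scores, then ranking the candidates) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patented method: the pair scores are taken as given (the learned classifier is not modeled), glyphs are linked when their score reaches a hypothetical `min_score`, and candidates are ranked by mean pair score.

```python
def candidate_lines(num_glyphs, pair_scores, min_score=0.5):
    """Group glyphs into candidate lines and rank them (illustrative only).

    pair_scores maps (glyph_a, glyph_b) index pairs to a score in [0, 1].
    min_score and the mean-score ranking are assumptions, not from the source.
    """
    parent = list(range(num_glyphs))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair whose score says "probably the same text line".
    for (a, b), score in pair_scores.items():
        if score >= min_score:
            parent[find(a)] = find(b)

    # Collect the linking scores of each connected group.
    group_scores = {}
    for (a, b), score in pair_scores.items():
        if score >= min_score:
            group_scores.setdefault(find(a), []).append(score)

    lines = []
    for root, scores in group_scores.items():
        members = sorted(g for g in range(num_glyphs) if find(g) == root)
        lines.append((sum(scores) / len(scores), members))

    # Highest mean pair score first.
    lines.sort(key=lambda item: item[0], reverse=True)
    return lines
```

With scores `{(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.6}` over five glyphs, the sketch yields two candidates, glyphs 0 to 2 ranked ahead of glyphs 3 and 4. A glyph with no sufficiently strong pair belongs to no candidate.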


    Method for segmenting text words in document images
    5.
    Invention grant
    Method for segmenting text words in document images (in force)

    Publication number: US08965127B2

    Publication date: 2015-02-24

    Application number: US13826093

    Filing date: 2013-03-14

    Applicant: Chaohong Wu, Wei Ming

    Inventor: Chaohong Wu, Wei Ming

    IPC classification: G06K9/34

    Abstract: A word segmentation method for processing a document image applies clustering analysis to the spacing segments of a line. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Taking advantage of the bimodal distribution of spacing lengths in text lines, a k-means clustering algorithm is used, with the number of clusters pre-set to two, to classify the spacing segments as either character spacing or word spacing. Moreover, k-means++ initialization is used to enhance the performance of the cluster analysis. The clustering results, such as cluster centers and compactness, are used to prune single-word text lines, single table items, etc. The locations of the word spacing segments are then used to segment the line of text into words.
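
The pipeline in this abstract (threshold the projection profile, cluster gap lengths into two groups, treat the longer group as word spacing) can be sketched as follows. This is an illustrative sketch under assumptions, not the patented implementation: the function names, the `threshold` and `seed` parameters, and the choice to ignore the left and right margins are all hypothetical, and the pruning step based on cluster compactness is omitted.

```python
import random


def spacing_segments(profile, threshold=0):
    """Return (start, length) runs of columns whose ink count is <= threshold.

    Runs touching the left or right edge are treated as margins and dropped
    (an assumption made for this sketch).
    """
    runs, start = [], None
    for x, value in enumerate(profile):
        if value <= threshold:
            if start is None:
                start = x
        elif start is not None:
            if start > 0:
                runs.append((start, x - start))
            start = None
    return runs


def two_means(lengths, seed=0, iters=50):
    """1-D k-means with k=2 and k-means++ seeding; returns (low, high) centers."""
    rng = random.Random(seed)
    first = rng.choice(lengths)
    # k-means++: pick the second center with probability ~ squared distance.
    weights = [(v - first) ** 2 for v in lengths]
    if sum(weights) == 0:
        return None  # all gaps have the same length: nothing to separate
    second = rng.choices(lengths, weights=weights, k=1)[0]
    lo, hi = sorted((float(first), float(second)))
    for _ in range(iters):
        small = [v for v in lengths if abs(v - lo) <= abs(v - hi)]
        large = [v for v in lengths if abs(v - lo) > abs(v - hi)]
        if not small or not large:
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def word_gaps(profile, threshold=0, seed=0):
    """Return the spacing segments classified as word spacing."""
    runs = spacing_segments(profile, threshold)
    centers = two_means([n for _, n in runs], seed) if runs else None
    if centers is None:
        return []
    lo, hi = centers
    return [(s, n) for s, n in runs if abs(n - hi) < abs(n - lo)]
```

Fixing the cluster count at two mirrors the bimodal assumption in the abstract: gap lengths are expected to fall into a short (character) mode and a long (word) mode. When every gap has the same length, the sketch reports no word gaps rather than forcing a split.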


    METHOD FOR CUTTING OUT CHARACTER, CHARACTER RECOGNITION APPARATUS USING THIS METHOD, AND PROGRAM
    6.
    Invention application
    METHOD FOR CUTTING OUT CHARACTER, CHARACTER RECOGNITION APPARATUS USING THIS METHOD, AND PROGRAM (in force)

    Publication number: US20150015603A1

    Publication date: 2015-01-15

    Application number: US14378580

    Filing date: 2012-11-28

    Applicant: OMRON Corporation

    Inventor: Shiro Fujieda

    IPC classification: G06T11/60

    Abstract: A method for cutting out, from a gray-scale image generated by capturing an image of a character string, each character in the character string for recognition, includes a first step of repeating projection processing that projects the highest or lowest gray level in a line along a direction crossing the character string in the gray-scale image onto an axis along the character string. The lowest gray level is selected when a character in the gray-scale image is darker than the background, the highest gray level is selected when the character is brighter than the background, and the projection target position is moved along the character string.
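
The projection step described in this abstract can be sketched for a horizontal character string, where each image column is a line crossing the string. The sketch assumes the image is a list of rows of gray levels. The `cut_ranges` function and its fixed `threshold` are hypothetical follow-up steps added only to show how the projection might be used, since the abstract describes just the first step.

```python
def project_extreme(image, dark_on_light=True):
    """Project each column's lowest (dark text) or highest (bright text) gray level.

    image is a list of rows; the character string is assumed to run
    horizontally, so each column crosses it.
    """
    pick = min if dark_on_light else max
    return [pick(column) for column in zip(*image)]


def cut_ranges(projection, threshold, dark_on_light=True):
    """Hypothetical follow-up: half-open column ranges that contain a character."""
    if dark_on_light:
        is_char = lambda v: v < threshold
    else:
        is_char = lambda v: v > threshold
    ranges, start = [], None
    for x, value in enumerate(projection):
        if is_char(value):
            if start is None:
                start = x
        elif start is not None:
            ranges.append((start, x))
            start = None
    if start is not None:
        ranges.append((start, len(projection)))
    return ranges
```

Taking the extreme value instead of a column sum means a single dark pixel is enough to mark a column as belonging to a character, which is the property the abstract's choice of highest or lowest gray level provides.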


    Character recognition apparatus, character recognition method and program
    7.
    Invention grant
    Character recognition apparatus, character recognition method and program (in force)

    Publication number: US08861862B2

    Publication date: 2014-10-14

    Application number: US13478585

    Filing date: 2012-05-23

    Applicant: Ichiko Sata

    Inventor: Ichiko Sata

    Abstract: The character recognition apparatus recognizes characters from a read original document and corrects the character string obtained as the recognition result in word units, with a space character as the separator. The character recognition apparatus includes a circumscribed rectangle formation portion which forms a circumscribed rectangle for each recognized alphabetic character string; a fixed-pitch font determination portion which determines whether or not a font is a fixed-pitch font based on the distance between center lines, in the width direction, of adjacent circumscribed rectangles; an excess space determination portion which determines, in the case of a fixed-pitch font, that a space character is excess when the width of the space character in the character string is narrower than a predetermined width; and a portion for deleting from the character string the space character determined to be excess.


    METHOD FOR SEGMENTING TEXT WORDS IN DOCUMENT IMAGES
    8.
    Invention application
    METHOD FOR SEGMENTING TEXT WORDS IN DOCUMENT IMAGES (in force)

    Publication number: US20140270526A1

    Publication date: 2014-09-18

    Application number: US13826093

    Filing date: 2013-03-14

    Applicant: Chaohong Wu, Wei Ming

    Inventor: Chaohong Wu, Wei Ming

    IPC classification: G06K9/34

    Abstract: A word segmentation method for processing a document image applies clustering analysis to the spacing segments of a line. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Taking advantage of the bimodal distribution of spacing lengths in text lines, a k-means clustering algorithm is used, with the number of clusters pre-set to two, to classify the spacing segments as either character spacing or word spacing. Moreover, k-means++ initialization is used to enhance the performance of the cluster analysis. The clustering results, such as cluster centers and compactness, are used to prune single-word text lines, single table items, etc. The locations of the word spacing segments are then used to segment the line of text into words.


    APPARATUS, METHOD AND PROGRAM FOR CHARACTER RECOGNITION
    9.
    Invention application
    APPARATUS, METHOD AND PROGRAM FOR CHARACTER RECOGNITION (in force)

    Publication number: US20140185106A1

    Publication date: 2014-07-03

    Application number: US14142079

    Filing date: 2013-12-27

    Inventor: Hiroshi NAKAMURA

    IPC classification: G06K9/78 G06K9/00

    CPC classification: G06K9/348 G06K2209/01

    Abstract: A character recognition apparatus may include an imaging element configured to read a character string placed on an information recording medium; an image memory configured to store image data of the character string; and a character segmenting unit configured to segment the characters constituting the character string. The character segmenting unit may include a minimum intensity curve creating unit configured to detect a minimum intensity value among light intensity values, and to create a minimum intensity curve of the image data according to the minimum intensity value of each pixel row; a character segmenting position detecting unit configured to calculate the space between neighboring characters in the created minimum intensity curve, in order to detect a character segmenting position between the characters; and a character segmenting process unit configured to segment each character according to the detected character segmenting position.


    System and Method for Selecting and Displaying Segmentation Parameters for Optical Character Recognition
    10.
    Invention application
    System and Method for Selecting and Displaying Segmentation Parameters for Optical Character Recognition (in force)

    Publication number: US20140105497A1

    Publication date: 2014-04-17

    Application number: US13684007

    Filing date: 2012-11-21

    IPC classification: G06K9/34

    Abstract: A computer-implemented method for selecting at least one segmentation parameter for optical character recognition is provided. The method can include receiving an image having a character string that includes one or more characters. The method can also include receiving a character string identifying each of the one or more characters. The method can also include automatically generating at least one segmentation parameter. The method can also include performing segmentation on the image having the character string using the at least one segmentation parameter. The method can also include determining whether the resultant segmentation satisfies one or more criteria and, if it does, selecting the at least one segmentation parameter.
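
The generate, segment, check, select loop in this abstract can be sketched in a few lines. The specifics are assumptions chosen for illustration: the image is reduced to a column ink profile, the single segmentation parameter is an ink `threshold`, the candidate values are supplied by the caller, and the only criterion is that the number of segments equals the number of characters in the supplied string. The abstract does not state which parameters or criteria the method uses.

```python
def segment(profile, threshold):
    """Hypothetical segmenter: half-open runs of columns with ink above threshold."""
    segments, start = [], None
    for x, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = x
        elif start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def select_parameter(profile, expected_text, candidates):
    """Return the first candidate threshold whose segmentation meets the criterion.

    Criterion (an assumption): one segment per character of expected_text.
    Returns (threshold, segments), or None if no candidate qualifies.
    """
    for threshold in candidates:
        segments = segment(profile, threshold)
        if len(segments) == len(expected_text):
            return threshold, segments
    return None
```

For the profile `[0, 5, 6, 1, 7, 8, 0, 0, 4, 5, 0]` and the string `"ABC"`, a threshold of 0 merges the first two characters through the faint column between them, while a threshold of 1 separates them and is therefore the value selected.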
