    1. Determination of inputted image to be document or non-document
    Type: Granted invention patent
    Status: In force
    Publication No.: US08385643B2
    Publication date: 2013-02-26
    Application No.: US12353440
    Filing date: 2009-01-14
    IPC classification: G06K9/00 G06K9/54 G06F17/00

    Abstract: A preprocessing section binarizes input image data and calculates a total black pixel ratio. A feature extracting section detects connected components included in the binary image data and detects circumscribing bounding boxes of the connected components. Predetermined connected components are removed from all of the connected components based on the sizes of the detected circumscribing bounding boxes and bounding box black pixel ratios. By using the connected components that remain after removing the unnecessary connected components, a histogram is generated by specifying the sizes of the circumscribing bounding boxes as classes and numbers of the connected components as the frequencies of occurrence. A determining section determines whether the input image data is document image data or non-document image data based on information related to the generated histogram and the total black pixel ratio.
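
    The feature-extraction steps in this abstract (connected components, circumscribing bounding boxes, filtering, size histogram) can be sketched as follows. This is a minimal illustration, not the patented implementation: the 4-connectivity rule, the use of the longer box side as the histogram class, and the thresholds `min_side`, `max_side` and `min_fill` are all assumptions, since the abstract gives no concrete values. The final document/non-document decision is left out because its criteria are not stated.

```python
from collections import Counter, deque

def connected_components(binary):
    """Return (top, left, bottom, right, black_count) for each 4-connected black region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, count = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right, count))
    return boxes

def size_histogram(binary, min_side=2, max_side=50, min_fill=0.1):
    """Map bounding-box size class -> number of components that survive filtering.

    min_side, max_side and min_fill are hypothetical thresholds.
    """
    hist = Counter()
    for top, left, bottom, right, count in connected_components(binary):
        height, width = bottom - top + 1, right - left + 1
        side = max(height, width)            # assumed definition of "size"
        fill = count / (height * width)      # bounding-box black pixel ratio
        if side < min_side or side > max_side or fill < min_fill:
            continue                         # drop the unwanted component
        hist[side] += 1
    return hist

def total_black_ratio(binary):
    """Fraction of black (1) pixels in the whole binary image."""
    total = sum(len(row) for row in binary)
    return sum(sum(row) for row in binary) / total

image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
print(dict(size_histogram(image)))   # {2: 2}
print(total_black_ratio(image))      # 0.375
```

    In the toy image, the isolated pixel in the bottom-right corner is discarded as too small, leaving two components in size class 2.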

    2. Image determination apparatus, image search apparatus and computer readable recording medium storing an image search program
    Type: Granted invention patent
    Status: In force
    Publication No.: US08200012B2
    Publication date: 2012-06-12
    Application No.: US12393772
    Filing date: 2009-02-26
    IPC classification: G06K9/34
    CPC classification: G06K9/54 G06K9/346 G06K9/522

    Abstract: A preprocessing section binarizes input image data and calculates a total black pixel ratio. A feature extracting section detects connected components contained in the binarized image data and detects circumscribing bounding boxes that circumscribe these connected components, respectively. Based on sizes of the circumscribing bounding boxes detected and numbers of black pixels contained therein, predetermined connected components are removed. A determining section generates an edge map by using the residual connected components, and performs two-dimensional fast Fourier transform thereon to generate spectral data. The determining section performs two-dimensional fast Fourier transform on template images to generate spectral data. The determining section determines, based on these pieces of spectral data, whether or not a circular shape is contained in the input image data.
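
    The abstract does not say how the two sets of spectral data are compared. One plausible approach, shown here purely as an assumption, is to correlate the magnitude spectra of the edge map and a ring-shaped template; magnitude spectra are unchanged by circular shifts, so the position of the shape does not matter. The sketch uses a naive 2-D DFT (same result as a 2-D FFT, only slower) to stay within the standard library, and the names `dft2` and `magnitude_similarity` are hypothetical.

```python
import cmath

def dft2(image):
    """Naive 2-D discrete Fourier transform; a 2-D FFT yields the same values faster."""
    h, w = len(image), len(image[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            out[u][v] = acc
    return out

def magnitude_similarity(edge_map, template):
    """Cosine similarity of the two magnitude spectra (assumed comparison rule)."""
    a = [abs(c) for row in dft2(edge_map) for c in row]
    b = [abs(c) for row in dft2(template) for c in row]
    dot = sum(p * q for p, q in zip(a, b))
    norm = (sum(p * p for p in a) * sum(q * q for q in b)) ** 0.5
    return dot / norm if norm else 0.0

# 8x8 ring-like template (a coarse circle outline).
ring = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# Same ring moved one pixel down and right (circular shift).
shifted = [[ring[(y - 1) % 8][(x - 1) % 8] for x in range(8)] for y in range(8)]
# A horizontal bar, which is not circular.
bar = [[1 if y == 3 else 0 for x in range(8)] for y in range(8)]

print(magnitude_similarity(shifted, ring))  # close to 1.0
print(magnitude_similarity(bar, ring))      # clearly lower
```

    A real system would need a decision threshold on the similarity and templates at several radii; neither is specified in the abstract.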

    3. IMAGE DETERMINATION APPARATUS, IMAGE SEARCH APPARATUS AND COMPUTER READABLE RECORDING MEDIUM STORING AN IMAGE SEARCH PROGRAM
    Type: Invention patent application
    Status: In force
    Publication No.: US20090263025A1
    Publication date: 2009-10-22
    Application No.: US12393772
    Filing date: 2009-02-26
    IPC classification: G06K9/46
    CPC classification: G06K9/54 G06K9/346 G06K9/522

    Abstract: A preprocessing section binarizes input image data and calculates a total black pixel ratio. A feature extracting section detects connected components contained in the binarized image data and detects circumscribing bounding boxes that circumscribe these connected components, respectively. Based on sizes of the circumscribing bounding boxes detected and numbers of black pixels contained therein, predetermined connected components are removed. A determining section generates an edge map by using the residual connected components, and performs two-dimensional fast Fourier transform thereon to generate spectral data. The determining section performs two-dimensional fast Fourier transform on template images to generate spectral data. The determining section determines, based on these pieces of spectral data, whether or not a circular shape is contained in the input image data.

    4. IMAGE DETERMINATION APPARATUS, IMAGE SEARCH APPARATUS AND A RECORDING MEDIUM ON WHICH AN IMAGE SEARCH PROGRAM IS RECORDED
    Type: Invention patent application
    Status: In force
    Publication No.: US20090245640A1
    Publication date: 2009-10-01
    Application No.: US12353440
    Filing date: 2009-01-14
    IPC classification: G06K9/34

    Abstract: A preprocessing section binarizes input image data and calculates a total black pixel ratio. A feature extracting section detects connected components included in the binary image data and detects circumscribing bounding boxes of the connected components. Predetermined connected components are removed from all of the connected components based on the sizes of the detected circumscribing bounding boxes and bounding box black pixel ratios. By using the connected components that remain after removing the unnecessary connected components, a histogram is generated by specifying the sizes of the circumscribing bounding boxes as classes and numbers of the connected components as the frequencies of occurrence. A determining section determines whether the input image data is document image data or non-document image data based on information related to the generated histogram and the total black pixel ratio.

    5. Image document processing device, image document processing method, program, and storage medium
    Type: Granted invention patent
    Status: In force
    Publication No.: US08290269B2
    Publication date: 2012-10-16
    Application No.: US11953695
    Filing date: 2007-12-10
    CPC classification: G06K9/6828 G06F17/30253

    Abstract: A headline-region initial processing section clips a headline-region image in an image document, divides the image into individual character images, and extracts features of the individual character images. Based on the features, a candidate-character-sequence generating section selects N (N is an integer more than 1) character images as candidate characters in the order of degree of matching from a font-feature dictionary for storing features of individual character images, and generates M×N index matrix where M is the number of characters in an extracted character sequence. Based on the index matrix, a document-name generating section generates a meaningful document name according to the image document. An image-document-DB management section manages accumulated image documents using the document name. This provides an image document processing device and an image document processing method each allowing automatically generating and managing the meaningful document name that represents the contents of the image document, without user's operation.
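
    The M×N index matrix described in this abstract can be illustrated with a short sketch. It assumes that character features are plain numeric vectors and that "degree of matching" is measured by squared Euclidean distance; the abstract specifies neither. The function name `build_index_matrix` and the toy dictionary are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def build_index_matrix(char_features, dictionary, n):
    """Return an M x N matrix of candidate characters.

    Row i lists the n dictionary characters whose stored features best match
    the i-th extracted character image, best match first.
    """
    matrix = []
    for feat in char_features:
        ranked = sorted(dictionary, key=lambda ch: squared_distance(feat, dictionary[ch]))
        matrix.append(ranked[:n])
    return matrix

# Toy font-feature dictionary: character -> feature vector (made-up values).
font_features = {
    "A": (1.0, 0.0),
    "B": (0.0, 1.0),
    "H": (0.9, 0.2),
    "8": (0.1, 0.9),
}
# Features extracted from M = 2 character images of a headline.
extracted = [(1.0, 0.1), (0.0, 0.8)]

print(build_index_matrix(extracted, font_features, n=2))
# [['A', 'H'], ['8', 'B']]
```

    The first column holds the top-ranked candidate for each position; lower-ranked candidates remain available to later stages that pick a meaningful character sequence.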

    6. DOCUMENT IMAGE PROCESSING APPARATUS, DOCUMENT IMAGE PROCESSING METHOD, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED
    Type: Invention patent application
    Status: In force
    Publication No.: US20090028446A1
    Publication date: 2009-01-29
    Application No.: US11972446
    Filing date: 2008-01-10
    IPC classification: G06K9/72

    Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided into separate characters. Image features of each character image are extracted. Based on the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters, from a character image feature dictionary which stores the image features of character image in units of character, and a first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting a first column of the first index matrix, is subjected to a lexical analysis according to a language model, and whereby a second index matrix having a character string which makes sense is prepared. In the language model, statistics are taken and then, the lexical analysis is performed.
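
    The step from the first index matrix to the second one can be sketched as a search for the candidate sequence that a language model scores highest, followed by moving the chosen candidate of each row into the first column. The abstract does not name the language model or the search method; the bigram log-score function and the Viterbi-style dynamic program below are assumptions, and `adjust_first_column` is a hypothetical name.

```python
def adjust_first_column(index_matrix, bigram_score):
    """Reorder each row so that the first column spells the best-scoring string.

    index_matrix: M rows of N candidate characters (first index matrix).
    bigram_score(prev, cur): assumed log-score of `cur` following `prev`.
    """
    # best[i][j] = (score of the best path ending at candidate j of row i, back-pointer)
    best = [[(0.0, None) for _ in index_matrix[0]]]
    for i in range(1, len(index_matrix)):
        prev_row = index_matrix[i - 1]
        row = []
        for cand in index_matrix[i]:
            # On ties, max() keeps the earlier (higher-ranked) candidate.
            k_best = max(
                range(len(prev_row)),
                key=lambda k: best[i - 1][k][0] + bigram_score(prev_row[k], cand),
            )
            score = best[i - 1][k_best][0] + bigram_score(prev_row[k_best], cand)
            row.append((score, k_best))
        best.append(row)

    # Trace the best path back from the last row.
    j = max(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chosen = []
    for i in range(len(index_matrix) - 1, -1, -1):
        chosen.append(j)
        j = best[i][j][1]
    chosen.reverse()

    # Second index matrix: chosen candidate first, the others keep their order.
    return [[row[c]] + row[:c] + row[c + 1:] for row, c in zip(index_matrix, chosen)]

# Toy bigram model: known pairs get a high log-score, everything else a low one.
pairs = {("c", "a"): -1.0, ("a", "t"): -1.0}
score = lambda prev, cur: pairs.get((prev, cur), -5.0)

first = [["e", "c"], ["a", "o"], ["f", "t"]]
print(adjust_first_column(first, score))
# [['c', 'e'], ['a', 'o'], ['t', 'f']]
```

    Here the raw first column reads "eaf", while the adjusted first column reads "cat", which is the sequence the toy model prefers.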

    7. Image document processing device, image document processing method, program, and storage medium
    Type: Invention patent application
    Status: In force
    Publication No.: US20080181505A1
    Publication date: 2008-07-31
    Application No.: US11953695
    Filing date: 2007-12-10
    IPC classification: G06K9/46
    CPC classification: G06K9/6828 G06F17/30253

    Abstract: A headline-region initial processing section clips a headline-region image in an image document, divides the image into individual character images, and extracts features of the individual character images. Based on the features, a candidate-character-sequence generating section selects N (N is an integer more than 1) character images as candidate characters in the order of degree of matching from a font-feature dictionary for storing features of individual character images, and generates M×N index matrix where M is the number of characters in an extracted character sequence. Based on the index matrix, a document-name generating section generates a meaningful document name according to the image document. An image-document-DB management section manages accumulated image documents using the document name. This provides an image document processing device and an image document processing method each allowing automatically generating and managing the meaningful document name that represents the contents of the image document, without user's operation.

    8. Document image processing apparatus
    Type: Granted invention patent
    Status: In force
    Publication No.: US08160402B2
    Publication date: 2012-04-17
    Application No.: US11972477
    Filing date: 2008-01-10
    IPC classification: G06K9/03 G06K9/18

    Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided character by character, and image features of each character image are extracted. On the basis of the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters from a character image feature dictionary which stores the image features of character image in units of character, and the first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting the first column of the first index matrix, is subjected to a lexical analysis according to a predetermined language model, whereby a second index matrix adjusted into a character string which makes sense is prepared to be utilized for searching.

    9. HANDWRITING RECOGNITION METHOD AND DEVICE
    Type: Invention patent application
    Status: Under examination (published)
    Publication No.: US20120014601A1
    Publication date: 2012-01-19
    Application No.: US13258084
    Filing date: 2010-06-23
    IPC classification: G06K9/34

    Abstract: A handwriting recognition method and a handwriting recognition device are provided to recognize a character sequence continuously inputted by a user for convenience. The present method comprises steps of calculating various features of the inputted character sequence which include single character recognition accuracy features and space geometry features of different stroke combinations in the inputted character sequence, calculating segmentation reliabilities of respective stroke combinations in different segmented patterns by using a probabilistic model in which coefficients of the probabilistic model are estimated by a parameter estimation method through sample trainings, recognizing characters in different writing patterns by using a multiple-template matching method when performing single character recognition of the stroke combinations, searching for the best segmentation path and conducting post-processing to optimize the recognition results. The present method and device have advantages of simple structure, low hardware requirement, fast recognition speed and high recognition accuracy and can be implemented in an embedded system.
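
    The "best segmentation path" search mentioned in this abstract can be illustrated with a dynamic program over stroke positions. This is a sketch under stated assumptions: strokes are grouped only in writing order, each group's reliability is supplied as a log-score by a hypothetical `reliability(start, end)` function (standing in for the trained probabilistic model), and a character uses at most `max_len` strokes. None of these details come from the abstract.

```python
def best_segmentation(n_strokes, reliability, max_len=4):
    """Split strokes 0..n_strokes-1 into consecutive groups with the highest total score.

    reliability(start, end): assumed log-reliability that strokes[start:end]
    form exactly one character.
    Returns a list of (start, end) pairs covering all strokes.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n_strokes + 1)   # best[e] = best score for the first e strokes
    back = [0] * (n_strokes + 1)         # back[e] = start of the last group in that path
    best[0] = 0.0
    for end in range(1, n_strokes + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + reliability(start, end)
            if score > best[end]:
                best[end], back[end] = score, start

    # Walk the back-pointers to recover the groups.
    groups, end = [], n_strokes
    while end > 0:
        groups.append((back[end], end))
        end = back[end]
    return groups[::-1]

# Toy scores: strokes 0-1 and strokes 2-4 each look like one character.
table = {(0, 2): -0.1, (2, 5): -0.2}
toy_reliability = lambda s, e: table.get((s, e), -3.0)

print(best_segmentation(5, toy_reliability))
# [(0, 2), (2, 5)]
```

    In a fuller system each group would then be passed to the single-character recognizer, and the post-processing step from the abstract would refine the result.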

    10. Search and retrieval of documents indexed by optical character recognition
    Type: Granted invention patent
    Status: In force
    Publication No.: US08208765B2
    Publication date: 2012-06-26
    Application No.: US11972446
    Filing date: 2008-01-10
    IPC classification: G06K9/00

    Abstract: An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided into separate characters. Image features of each character image are extracted. Based on the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters, from a character image feature dictionary which stores the image features of character image in units of character, and a first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting a first column of the first index matrix, is subjected to a lexical analysis according to a language model, and whereby a second index matrix having a character string which makes sense is prepared. In the language model, statistics are taken and then, the lexical analysis is performed.
