Patent application
- Patent title: METHOD FOR SEGMENTING TEXT WORDS IN DOCUMENT IMAGES
- Patent title (Chinese): 在文件图像中分隔文本词的方法
- Application number: US13826093
- Filing date: 2013-03-14
- Publication number: US20140270526A1
- Publication date: 2014-09-18
- Inventors: Chaohong Wu, Wei Ming
- Applicants: Chaohong Wu, Wei Ming
- Applicant address: US CA San Mateo
- Assignee: KONICA MINOLTA LABORATORY U.S.A., INC.
- Current assignee: KONICA MINOLTA LABORATORY U.S.A., INC.
- Current assignee address: US CA San Mateo
- Main classification: G06K9/34
- IPC classification: G06K9/34
Abstract:
A word segmentation method for processing a document image applies clustering analysis to the spacing segments of a line of text. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Taking advantage of the bimodal distribution of spacing lengths in text lines, a k-means clustering algorithm is used, with the number of clusters pre-set to two, to classify each spacing segment as either character spacing or word spacing. K-means++ initialization is used to improve the performance of the cluster analysis. The clustering results, such as the cluster centers and compactness, are used to prune cases such as single-word text lines and single table items. The locations of the word spacing segments are then used to segment the line of text into words.
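
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names, the `threshold` default, and the `min_center_ratio` pruning rule are assumptions made for illustration. The abstract states only that cluster centers and compactness are used for pruning, without giving a formula, so a simple ratio of cluster centers stands in for that step here. The line image is assumed to be binarized, with 1 for ink and 0 for background.

```python
import random


def vertical_profile(rows):
    """Sum ink pixels in each column of a binarized line image (list of rows)."""
    return [sum(column) for column in zip(*rows)]


def spacing_segments(profile, threshold=0):
    """Return (start, length) for each interior run of columns at or below threshold.

    Leading and trailing margins are skipped because they are not gaps
    between characters or words.
    """
    segments = []
    start = None
    for x, value in enumerate(profile):
        if value <= threshold:
            if start is None:
                start = x
        elif start is not None:
            if start > 0:  # a run starting at column 0 is the left margin
                segments.append((start, x - start))
            start = None
    return segments  # a run still open at the end is the right margin


def kmeans_pp_init(values, rng):
    """k-means++ seeding for two 1-D centers: the second center is drawn
    with probability proportional to squared distance from the first."""
    first = rng.choice(values)
    weights = [(v - first) ** 2 for v in values]
    if sum(weights) == 0:  # all values identical
        return [first, first]
    second = rng.choices(values, weights=weights, k=1)[0]
    return [first, second]


def two_means(values, seed=0, max_iter=100):
    """Plain 1-D k-means with the number of clusters fixed at two."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(values, rng)
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def word_boundaries(profile, threshold=0, min_center_ratio=2.0):
    """Return (start, end) column ranges of the gaps classified as word spacing.

    min_center_ratio is a hypothetical pruning rule: if the two cluster
    centers are too close, the line is treated as a single word.
    """
    segments = spacing_segments(profile, threshold)
    lengths = [length for _, length in segments]
    if len(lengths) < 2:
        return []
    centers, labels = two_means(lengths)
    low, high = min(centers), max(centers)
    if high / low < min_center_ratio:
        return []
    word_label = centers.index(high)
    return [(start, start + length)
            for (start, length), lab in zip(segments, labels)
            if lab == word_label]
```

Calling `word_boundaries(vertical_profile(rows))` on a binarized line image returns the column ranges at which the line could be cut into words. A production system would more likely rely on a tested clustering library and a pruning rule tuned on real documents.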
Published/granted documents
- US08965127B2 Method for segmenting text words in document images (publication/grant date: 2015-02-24)