Patent application
US20140270526A1 METHOD FOR SEGMENTING TEXT WORDS IN DOCUMENT IMAGES (legal status: active)

Abstract:
A word segmentation method for processing a document image applies clustering analysis to the spacing segments of a text line. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Because the spacing lengths within a text line typically follow a bimodal distribution, a k-means clustering algorithm with the number of clusters preset to two is used to classify each spacing segment as either character spacing or word spacing. In addition, k-means++ initialization is used to improve the performance of the cluster analysis. Clustering results such as the cluster centers and cluster compactness are used to prune cases such as single-word text lines and single table items. The locations of the word spacing segments are then used to segment the line of text into words.
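The abstract describes the pipeline only at a high level. A minimal Python sketch of the same steps follows. It is an illustration under stated assumptions, not the patented implementation. The line image is assumed to be already binarized (1 = ink) and cropped to a single text line. All function names are hypothetical, as are the projection `threshold` and the `min_center_ratio` pruning rule. That rule stands in for the center and compactness test, which the abstract mentions but does not specify. k-means++ seeding and Lloyd iterations are written out for one-dimensional gap widths, with k fixed at two.

```python
import random
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open column range [start, end)


def vertical_projection(line_image: Sequence[Sequence[int]]) -> List[int]:
    """Sum each column of a binarized line image (1 = ink, 0 = background)."""
    return [sum(col) for col in zip(*line_image)]


def spacing_segments(profile: Sequence[int], threshold: int = 0) -> List[Span]:
    """Maximal runs of columns with projection <= threshold, excluding the
    blank margins before the first and after the last ink column."""
    segments: List[Span] = []
    start = None
    for x, v in enumerate(profile):
        if v <= threshold:
            if start is None:
                start = x
        elif start is not None:
            segments.append((start, x))
            start = None
    # A run still open at the end (right margin) is never appended;
    # a run starting at column 0 (left margin) is dropped here.
    if segments and segments[0][0] == 0:
        segments.pop(0)
    return segments


def kmeans_pp_init(values: Sequence[int], k: int, rng: random.Random) -> List[float]:
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [float(rng.choice(values))]
    while len(centers) < k:
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        total = sum(d2)
        if total == 0:
            raise ValueError("need at least k distinct values")
        r = rng.uniform(0, total)
        acc = 0.0
        for v, w in zip(values, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(float(v))
                break
        else:  # guard against round-off in the running sum
            centers.append(float(max(zip(d2, values))[1]))
    return centers


def kmeans_1d(values: Sequence[int], k: int = 2, iters: int = 100,
              seed: int = 0) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalar values with k-means++ seeding."""
    centers = kmeans_pp_init(values, k, random.Random(seed))
    labels: List[int] = []
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute the means.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def segment_words(line_image: Sequence[Sequence[int]], threshold: int = 0,
                  min_center_ratio: float = 2.0, seed: int = 0) -> List[Span]:
    """Split one text line into word column spans."""
    profile = vertical_projection(line_image)
    ink = [x for x, v in enumerate(profile) if v > threshold]
    if not ink:
        return []
    left, right = ink[0], ink[-1] + 1
    gaps = spacing_segments(profile, threshold)
    widths = [e - s for s, e in gaps]
    # Hypothetical pruning rule: fewer than two distinct gap widths, or word
    # spacing not clearly wider than character spacing, means one word.
    if len(set(widths)) < 2:
        return [(left, right)]
    centers, labels = kmeans_1d(widths, k=2, seed=seed)
    if max(centers) < min_center_ratio * min(centers):
        return [(left, right)]
    word_label = centers.index(max(centers))  # cluster with the wider gaps
    words: List[Span] = []
    start = left
    for (s, e), lab in zip(gaps, labels):
        if lab == word_label:  # a word gap closes the current word
            words.append((start, s))
            start = e
    words.append((start, right))
    return words
```

Consider a synthetic line whose internal gaps are 1, 4, 1, 1 and 4 columns wide. The two cluster centers converge to 1 and 4, so only the two 4-column gaps are treated as word spacing, and the line splits into three words. A fuller version could also check cluster compactness, for example the within-cluster sum of squares, before accepting the split.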