Corpus generation device and method, human-machine interaction system
摘要:
A corpus generation device and method, the device comprising: a segmentation module, connected to at least one monolingual parallel corpus for segmenting a sentence into words and processing the segmented words by a knowledge-driven approach; a classification module, for classifying sentences having different tag sequences but the same meaning into the same sentence cluster; a mapping module, for determining the categories of sentence structures of all the sentences in the sentence cluster, recording and storing a mapping mode for transforming tags between sentence structures when different categories of sentence structures in the same sentence cluster are transformed; a sentence structure generation module, for generating sentence structures according to a first mapping mode between a first category of sentence structures in one of the sentence clusters and other categories of sentence structures in the same sentence cluster; and a corpus generation module, for nesting a word corresponding to a sequence tag to generate a new monolingual parallel corpus.
信息查询
0/0