- Patent title: Corpus generation device and method, human-machine interaction system
- Application number: US15694918
- Filing date: 2017-09-04
- Publication number: US10268678B2
- Publication date: 2019-04-23
- Inventors: Nan Qiu, Haofen Wang
- Applicant: SHENZHEN GOWILD ROBOTICS CO., LTD.
- Applicant address: CN Shenzhen
- Assignee: SHENZHEN GOWILD ROBOTICS CO., LTD.
- Current assignee: SHENZHEN GOWILD ROBOTICS CO., LTD.
- Current assignee address: CN Shenzhen
- Agency: Im IP Law
- Agent: C. Andrew Im; Chai Im
- Main classification: G06F17/21
- IPC classification: G06F17/21; G06F17/27; G06F17/28

Abstract:
A corpus generation device and method, the device comprising: a segmentation module, connected to at least one monolingual parallel corpus for segmenting a sentence into words and processing the segmented words by a knowledge-driven approach; a classification module, for classifying sentences having different tag sequences but the same meaning into the same sentence cluster; a mapping module, for determining the categories of sentence structures of all the sentences in the sentence cluster, recording and storing a mapping mode for transforming tags between sentence structures when different categories of sentence structures in the same sentence cluster are transformed; a sentence structure generation module, for generating sentence structures according to a first mapping mode between a first category of sentence structures in one of the sentence clusters and other categories of sentence structures in the same sentence cluster; and a corpus generation module, for nesting a word corresponding to a sequence tag to generate a new monolingual parallel corpus.
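
The flow described in the abstract (tag the segmented words, collect the distinct sentence structures within one same-meaning cluster, then nest words back into the tags) can be illustrated with a toy sketch. This is not the patented implementation. The tag inventory `FILLERS`, the whitespace segmentation, and the function names `to_structure` and `generate_corpus` are assumptions made for illustration, and the cluster is supplied by hand rather than produced by a classification module.

```python
from itertools import product

# Hypothetical tag inventory: tag -> words that may fill it (not from the patent).
FILLERS = {
    "<CITY>": ["Beijing", "Shanghai"],
    "<DATE>": ["today", "tomorrow"],
}
WORD_TO_TAG = {word: tag for tag, words in FILLERS.items() for word in words}


def to_structure(sentence):
    """Segment on whitespace and replace known words with their tags."""
    return tuple(WORD_TO_TAG.get(word, word) for word in sentence.split())


def generate_corpus(cluster):
    """Fill every sentence structure found in one same-meaning cluster."""
    # Distinct tag sequences (sentence structures), in first-seen order.
    structures = list(dict.fromkeys(to_structure(s) for s in cluster))
    tags = sorted(FILLERS)
    corpus = []
    for structure in structures:
        # Try every combination of filler words for the known tags.
        for combo in product(*(FILLERS[tag] for tag in tags)):
            binding = dict(zip(tags, combo))
            corpus.append(" ".join(binding.get(tok, tok) for tok in structure))
    # Drop duplicates that arise when a structure lacks one of the tags.
    return list(dict.fromkeys(corpus))


# Two hand-picked sentences assumed to share one meaning.
cluster = ["weather in Beijing today", "today Beijing weather"]
new_corpus = generate_corpus(cluster)
```

Because both structures share the same tag names, a word bound to `<CITY>` or `<DATE>` in one structure carries over directly to the other. The abstract's recorded mapping mode between structures is reduced here to that shared naming.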