Compressing data for natural language processing

Granted invention patent

US09146918B2 Compressing data for natural language processing (status: in force)

Patent title: Compressing data for natural language processing
Application number: US14026240

Filing date: 2013-09-13
Publication number: US09146918B2

Publication date: 2015-09-29
Inventors: Yousuf Mohamed Ashparie, Aaron Keith Baughman
Applicant: International Business Machines Corporation
Applicant address: US NY Armonk
Patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current patentee address: US NY Armonk
Agency: Garg Law Firm, PLLC
Agents: Rakesh Garg; Matthew Chung
Main classification: G06F17/30
IPC classification: G06F17/30; G06F7/00; G06F17/28

Abstract:

Data pertaining to a subject matter domain, a set of text strings forming a set of seeds, a description of a linguistic structure present in a language of the domain-related data, and a statistical model applicable to the domain-related data are received. A set of portions of the domain-related data is extracted, a portion in the set of portions forming a nugget. A nugget matches the statistical model according to a criterion, and conforms to the linguistic structure within a threshold degree. The nugget is scored according to a subset of a set of features found in the nuggets. A subset of nuggets is selected. A score of each nugget included in the subset of nuggets exceeds a score threshold. The subset of nuggets is combined to form a pseudo-document. The pseudo-document is submitted to an application for answering a question related to the domain.
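The steps in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented method: sentences stand in for extracted portions, the "statistical model" is approximated by seed coverage, the "linguistic structure" by a regular expression, and every name here (`Nugget`, `model_match`, `structure_conformity`, `build_pseudo_document`, the feature functions and the threshold values) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Nugget:
    text: str
    score: float = 0.0


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; a stand-in for real portion extraction."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def model_match(sentence: str, seeds: List[str]) -> float:
    """Toy stand-in for the statistical model: fraction of seeds present."""
    if not seeds:
        return 0.0
    lowered = sentence.lower()
    return sum(1 for s in seeds if s.lower() in lowered) / len(seeds)


def structure_conformity(sentence: str, pattern: str) -> float:
    """Toy stand-in for the linguistic structure: 1.0 if the regex matches."""
    return 1.0 if re.search(pattern, sentence, flags=re.IGNORECASE) else 0.0


def build_pseudo_document(
    domain_text: str,
    seeds: List[str],
    pattern: str,
    features: Dict[str, Callable[[str], float]],
    match_criterion: float = 0.5,       # assumed value
    conformity_threshold: float = 0.5,  # assumed value
    score_threshold: float = 1.0,       # assumed value
) -> str:
    nuggets: List[Nugget] = []
    for sentence in split_sentences(domain_text):
        # Keep only portions that match the model and conform to the structure.
        if model_match(sentence, seeds) < match_criterion:
            continue
        if structure_conformity(sentence, pattern) < conformity_threshold:
            continue
        # Score the nugget as the sum of its feature values.
        score = sum(f(sentence) for f in features.values())
        nuggets.append(Nugget(sentence, score))
    # Select nuggets whose score exceeds the threshold and combine them.
    selected = [n for n in nuggets if n.score > score_threshold]
    return "\n".join(n.text for n in selected)


text = (
    "Aspirin is a drug that reduces fever. The weather was nice. "
    "Aspirin is sold in many countries. "
    "Ibuprofen is a drug that treats aspirin-like pain quickly."
)
features = {
    "mentions_fever": lambda s: 1.0 if "fever" in s.lower() else 0.0,
    "short": lambda s: 1.0 if len(s.split()) <= 10 else 0.0,
}
pseudo_document = build_pseudo_document(
    text, ["aspirin", "drug"], r"\bis a\b", features
)
print(pseudo_document)
```

With these toy inputs only the first sentence passes all three filters, so the pseudo-document is much smaller than the input text. In the setting the abstract describes, that compressed pseudo-document is what would be submitted to the question-answering application.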

Publication/grant documents

US20150081275A1 COMPRESSING DATA FOR NATURAL LANGUAGE PROCESSING Publication/grant date: 2015-03-19
