Granted invention patent
- Patent title: Compressing data for natural language processing
- Patent title (Chinese): 压缩自然语言处理的数据
- Application number: US14026240
- Application date: 2013-09-13
- Publication (grant) number: US09146918B2
- Publication (grant) date: 2015-09-29
- Inventors: Yousuf Mohamed Ashparie, Aaron Keith Baughman
- Applicant: International Business Machines Corporation
- Applicant address: Armonk, NY, US
- Patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current patentee address: Armonk, NY, US
- Law firm: Garg Law Firm, PLLC
- Agents: Rakesh Garg; Matthew Chung
- Main classification: G06F17/30
- IPC classifications: G06F17/30; G06F7/00; G06F17/28
Abstract:
The following are received: data pertaining to a subject matter domain; a set of text strings forming a set of seeds; a description of a linguistic structure present in the language of the domain-related data; and a statistical model applicable to the domain-related data. A set of portions of the domain-related data is extracted, each portion in the set forming a nugget. A nugget matches the statistical model according to a criterion and conforms to the linguistic structure within a threshold degree. Each nugget is scored according to a subset of a set of features found in the nuggets. A subset of nuggets is selected such that the score of each selected nugget exceeds a score threshold. The selected nuggets are combined to form a pseudo-document, which is submitted to an application for answering a question related to the domain.
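The abstract does not say which statistical model, linguistic structure, features, or thresholds are used, so the following is only a minimal sketch of the described pipeline (extract, filter, score, select, combine) under stated assumptions. Everything concrete here is hypothetical: the "statistical model" is a toy unigram probability table with a mean log-probability criterion, the "linguistic structure" is a list of boolean checks whose pass fraction is the degree of conformance, and the seeds are assumed to feed a `seed_hits` feature. None of these names (`extract_nuggets`, `score_nugget`, `build_pseudo_document`) come from the patent.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence

# HYPOTHETICAL: a "linguistic structure" is modeled as a list of boolean checks
# over a token list; the degree of conformance is the fraction of checks passed.
StructureCheck = Callable[[List[str]], bool]


@dataclass
class Nugget:
    text: str
    score: float = 0.0


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def avg_log_prob(tokens: List[str], unigram: Dict[str, float], floor: float = 1e-6) -> float:
    """HYPOTHETICAL statistical model: mean log-probability under a unigram table."""
    if not tokens:
        return float("-inf")
    return sum(math.log(unigram.get(t, floor)) for t in tokens) / len(tokens)


def conformance(tokens: List[str], checks: Sequence[StructureCheck]) -> float:
    """Fraction of structural checks satisfied, in [0, 1]."""
    return sum(1 for check in checks if check(tokens)) / len(checks)


def extract_nuggets(portions: Iterable[str], unigram: Dict[str, float],
                    checks: Sequence[StructureCheck], min_log_prob: float,
                    min_conformance: float) -> List[Nugget]:
    """Keep portions that match the model (criterion) and the structure (threshold degree)."""
    nuggets = []
    for portion in portions:
        tokens = tokenize(portion)
        if (avg_log_prob(tokens, unigram) >= min_log_prob
                and conformance(tokens, checks) >= min_conformance):
            nuggets.append(Nugget(portion))
    return nuggets


def score_nugget(nugget: Nugget, seed_terms: set, weights: Dict[str, float]) -> float:
    """Weighted sum over the subset of features that have a weight."""
    tokens = tokenize(nugget.text)
    features = {
        "seed_hits": sum(1 for t in tokens if t in seed_terms),  # ASSUMED use of seeds
        "length": len(tokens),
    }
    return sum(w * features[name] for name, w in weights.items() if name in features)


def build_pseudo_document(portions, seeds, unigram, checks, weights,
                          min_log_prob, min_conformance, score_threshold) -> str:
    seed_terms = {t for s in seeds for t in tokenize(s)}
    selected = []
    for nugget in extract_nuggets(portions, unigram, checks, min_log_prob, min_conformance):
        nugget.score = score_nugget(nugget, seed_terms, weights)
        if nugget.score > score_threshold:  # score must strictly exceed the threshold
            selected.append(nugget)
    # Combine the selected nuggets, in original order, into one pseudo-document.
    return " ".join(n.text for n in selected)


# Toy usage with made-up data.
portions = [
    "Aspirin reduces fever in adults.",
    "Click here to subscribe!",
    "Ibuprofen reduces inflammation and fever.",
]
unigram = {"aspirin": 0.01, "reduces": 0.02, "fever": 0.02, "in": 0.05, "adults": 0.01,
           "ibuprofen": 0.01, "inflammation": 0.01, "and": 0.05}
checks = [
    lambda toks: 3 <= len(toks) <= 40,                # toy length constraint
    lambda toks: any(t.endswith("s") for t in toks),  # crude verb proxy (toy)
]
doc = build_pseudo_document(portions, ["fever", "inflammation"], unigram, checks,
                            {"seed_hits": 1.0, "length": 0.1},
                            min_log_prob=-6.0, min_conformance=0.5, score_threshold=2.0)
print(doc)  # Ibuprofen reduces inflammation and fever.
```

In this toy run the "Click here to subscribe!" portion is rejected by the model criterion (all of its words are out of vocabulary), both medical sentences become nuggets, and only the one with two seed hits (score 2.5) exceeds the threshold of 2.0 and ends up in the pseudo-document. A real system would substitute its own model, structure description, and feature set for these placeholders.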
Related publications:
- US20150081275A1 COMPRESSING DATA FOR NATURAL LANGUAGE PROCESSING, publication date: 2015-03-19