- Patent title: Unsupervised information extraction dictionary creation
- Application number: US15342292
- Filing date: 2016-11-03
- Publication number: US10558756B2
- Publication date: 2020-02-11
- Inventors: Sheng Hua Bao, Su Yan
- Applicant: International Business Machines Corporation
- Applicant address: US NY Armonk
- Patentee: International Business Machines Corporation
- Current patentee: International Business Machines Corporation
- Current patentee address: US NY Armonk
- Agent: ZIP Group PLLC
- Main classification: G06F17/27
- IPC classification: G06F17/27; G06F16/33; G06F16/36; G06F16/332
Abstract:
A data handling system enables the unsupervised creation of an information extraction dictionary by expanding upon a word or phrase included within an expansion query. Prior to receiving the expansion query, the data handling system performs unsupervised learning on an information corpus that includes text, assigning a corpus vector to each word and phrase of the text. After receiving the expansion query, the data handling system compares the expansion query to the corpus vectors. The data handling system ranks the corpus vectors by similarity to the expansion query and provides a ranked list of the words or phrases associated with the ranked corpus vectors. The ranked list may subsequently be used as the information extraction dictionary.
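The flow in the abstract (learn vectors first, then compare a query against them and rank) can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the abstract does not say how corpus vectors are learned or which similarity measure is used, so the co-occurrence counting, the cosine similarity, the toy corpus, and the names `learn_corpus_vectors`, `cosine`, and `expand` are all assumptions. For brevity it handles single words only, not phrases.

```python
import math
from collections import Counter, defaultdict


def learn_corpus_vectors(sentences, window=2):
    """Assumed stand-in for the unsupervised learning step:
    give each word a sparse vector of neighbor co-occurrence counts."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(query, vectors, top_k=5):
    """Rank corpus words by similarity to the query word.
    Returns (word, score) pairs; an unknown query yields an empty list."""
    if query not in vectors:
        return []
    query_vector = vectors[query]
    scored = [
        (word, cosine(query_vector, vector))
        for word, vector in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy corpus, learned before any query arrives.
corpus = [
    "the patient took aspirin for pain",
    "the patient took ibuprofen for pain",
    "the doctor prescribed aspirin daily",
    "the doctor prescribed ibuprofen daily",
    "the cat sat on the mat",
]
vectors = learn_corpus_vectors(corpus)

# The expansion query is a single seed word; the ranked list is the
# candidate dictionary.
ranked = expand("aspirin", vectors, top_k=3)
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

In this toy corpus "ibuprofen" appears in exactly the same contexts as "aspirin", so it ranks first. A realistic system would likely learn dense embeddings over a much larger corpus and would also assign vectors to multi-word phrases, as the abstract describes.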
Published/granted documents
- US20180121443A1 UNSUPERVISED INFORMATION EXTRACTION DICTIONARY CREATION Publication/grant date: 2018-05-03