Granted invention patent
- Patent title: Using canonical forms to develop a dictionary of names in a text
- Patent title (Chinese): 使用规范形式在文本中开发名称字典
- Application number: US678929
- Filing date: 1996-07-12
- Publication number: US5832480A
- Publication date: 1998-11-03
- Inventors: Roy Jefferson Byrd, Jr.; Misook A. Choi; Yael Ravin; Faye Nina Wacholder
- Applicants: Roy Jefferson Byrd, Jr.; Misook A. Choi; Yael Ravin; Faye Nina Wacholder
- Applicant address: Armonk, NY
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: Armonk, NY
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
Descriptive canonical forms of entity types are created by scanning one or more documents in a database of a computer system to identify one or more proper names that appear in the documents as raw names. Each of the raw names has zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names of one or more documents are "cleaned" and "split" until certain "cleaning and splitting conditions" are no longer met to obtain a list of clean and split candidate names. Anchor names are selected from the list that unambiguously represent an entity type. The anchor names have one or more entity-type attribute values. Variant names, clean and split candidate names having one or more shared attribute (values) with the anchor name, are combined with the anchor name to create an equivalence group of names that refer to the same entity. A canonical form is generated for the group from a subset of the anchor name attributes. A canonical form is created in this manner for all of the clean and split candidate names on the list.
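The pipeline in the abstract (clean and split raw names, pick anchor names, attach variants, emit a canonical form) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and not the patent's actual rule set: the noise and split tables, the title and suffix lists, the attribute names (`first`, `last`, `core`), and the function names are all hypothetical.

```python
# Hypothetical rule tables; the abstract does not enumerate the real ones.
LEADING_NOISE = ("the ", "a ", "an ")   # leading substrings removed by cleaning
TRAILING_NOISE = ("'s",)                 # trailing substrings removed by cleaning
SPLIT_MARKERS = (" and ",)               # medial substrings that trigger a split
PERSON_TITLES = {"mr.", "ms.", "dr."}    # assumed cue for a PERSON anchor
ORG_SUFFIXES = {"inc.", "corp.", "ltd."}  # assumed cue for an ORGANIZATION anchor


def clean_and_split(raw_names):
    """Apply cleaning and splitting rules until none of them fires."""
    pending, done = list(raw_names), []
    while pending:
        name = pending.pop().strip()
        lower = name.lower()
        prefix = next((p for p in LEADING_NOISE if lower.startswith(p)), None)
        suffix = next((s for s in TRAILING_NOISE if lower.endswith(s)), None)
        marker = next((m for m in SPLIT_MARKERS if m in lower), None)
        if prefix:
            pending.append(name[len(prefix):])
        elif suffix:
            pending.append(name[:-len(suffix)])
        elif marker:
            i = lower.index(marker)
            pending += [name[:i], name[i + len(marker):]]
        elif name and name not in done:
            done.append(name)
    return done


def attributes(name):
    """Return entity-type attribute values, or {} if the name is ambiguous."""
    tokens = name.split()
    if tokens[0].lower() in PERSON_TITLES and len(tokens) >= 3:
        return {"type": "PERSON", "first": tokens[1], "last": tokens[-1]}
    if tokens[-1].lower() in ORG_SUFFIXES and len(tokens) >= 2:
        return {"type": "ORGANIZATION", "core": " ".join(tokens[:-1])}
    return {}


def canonical_form(attrs):
    """Build the canonical form from a subset of the anchor's attributes."""
    if attrs["type"] == "PERSON":
        return f"{attrs['first']} {attrs['last']}"
    return attrs["core"]


def build_dictionary(raw_names):
    """Group candidate names around anchors, keyed by canonical form."""
    candidates = clean_and_split(raw_names)
    anchors = {n: attributes(n) for n in candidates if attributes(n)}
    groups = {}
    for anchor, attrs in anchors.items():
        shared = {v for k, v in attrs.items() if k != "type"}
        # A variant shares at least one attribute value with the anchor.
        variants = [c for c in candidates
                    if c not in anchors and (c in shared or shared & set(c.split()))]
        entry = groups.setdefault(canonical_form(attrs),
                                  {"type": attrs["type"], "names": set()})
        entry["names"].update([anchor, *variants])
    return {k: {"type": v["type"], "names": sorted(v["names"])}
            for k, v in groups.items()}


if __name__ == "__main__":
    raw = ["The Dr. Jane Smith and IBM Corp.", "Smith's", "IBM", "Jane Smith"]
    for canonical, entry in build_dictionary(raw).items():
        print(canonical, entry)
```

The sketch matches variants on a single shared token, so an unrelated name containing "Smith" would also join the "Jane Smith" group. A fuller implementation would presumably need stricter matching and a way to handle candidates that fit more than one anchor.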
Published/granted documents
- US5239627A Bi-directional parallel printer interface (publication/grant date: 1993-08-24)