Granted invention patent
- Patent title: Techniques for comparing and clustering documents
- Patent title (Chinese): 技术比较和聚类文件
- Application number: US13177849
- Filing date: 2011-07-07
- Publication (announcement) number: US08983963B2
- Publication (announcement) date: 2015-03-17
- Inventors: Klaus Fittges, Khalid El Mansouri
- Applicants: Klaus Fittges, Khalid El Mansouri
- Applicant address: DE Darmstadt
- Patentee: Software AG
- Current patentee: Software AG
- Current patentee address: DE Darmstadt
- Agency: Nixon & Vanderhye PC
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
Certain example embodiments relate to techniques for analyzing documents. A plurality of documents/document portions are imported into a database, with at least some of the documents/document portions being structured and at least some being unstructured. The imported documents/document portions are organized into one or more collections. A selection of at least one of the one or more collections is made. An index of words and/or groups of words is built (and optionally refined in accordance with one or more predefined rules) based on each document or document portion in each selection. A document-word matrix is built (and optionally weighted using a semantic approach), with the matrix including a value indicative of the number of times each word and/or group of words in the index appears in each document/document portion. One or more clusters of documents are generated using the document-word matrix.
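
The index, document-word matrix, and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not name a refinement rule, a similarity measure, or a clustering algorithm, so the stop-word list, the cosine similarity, the greedy single-pass clustering, the 0.5 threshold, and all function names below are assumptions chosen for illustration. The optional semantic weighting is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical "predefined rule" for refining the index: drop common stop words.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "to"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Return the sorted list of distinct words across all selected documents."""
    return sorted({w for d in docs for w in tokenize(d)})


def build_matrix(docs, index):
    """Rows are documents, columns are index words, values are occurrence counts."""
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([counts[w] for w in index])
    return rows


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(matrix, threshold=0.5):
    """Assumed algorithm: each row joins the first cluster whose seed row
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for i, row in enumerate(matrix):
        for members in clusters:
            if cosine(matrix[members[0]], row) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "The invoice lists the payment terms",
    "Payment terms of the invoice",
    "Server restart fixed the database error",
    "Database error after server restart",
]
index = build_index(docs)
matrix = build_matrix(docs, index)
clusters = cluster(matrix)
print(clusters)  # [[0, 1], [2, 3]]
```

With these four sample texts, the two invoice-related documents share three index words and the two server-related documents share four, so each pair ends up in its own cluster while the pairs have no words in common.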
Published/granted documents
- US20130013612A1 TECHNIQUES FOR COMPARING AND CLUSTERING DOCUMENTS Publication/grant date: 2013-01-10