ORTHOGRAPHICAL VARIANT DETECTION APPARATUS AND ORTHOGRAPHICAL VARIANT DETECTION PROGRAM
    1.
    Patent application
    ORTHOGRAPHICAL VARIANT DETECTION APPARATUS AND ORTHOGRAPHICAL VARIANT DETECTION PROGRAM [Granted]

    Publication No.: US20130151239A1

    Publication date: 2013-06-13

    Application No.: US13759528

    Application date: 2013-02-05

    IPC classification: G06F17/27

    CPC classification: G06F17/2785 G06F17/2795

    Abstract: Provided is an orthographical variant detection apparatus that detects orthographical variant candidates with high precision. The apparatus includes a term extraction unit that extracts terms from document data; a similarity computation unit that computes the similarity of an arbitrary pair of the extracted terms; an orthographical variant candidate determination unit that determines, based on the similarity, whether or not the terms in the pair are orthographical variant candidates; and a group classification unit that groups the orthographical variant candidates based on a character string commonly included in the pair of terms determined to be orthographical variant candidates.

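The four units named in the abstract (term extraction, similarity computation, candidate determination, group classification) can be sketched as below. This is only an illustration under stated assumptions, not the method claimed in the application: the regex tokenisation, the use of `difflib.SequenceMatcher.ratio()` as the similarity measure, the 0.8 threshold, and the choice of the longest shared substring as the grouping key are all hypothetical stand-ins, as are the function names.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def extract_terms(documents):
    """Term extraction: distinct lower-cased word tokens (assumed tokenisation)."""
    terms = set()
    for text in documents:
        terms.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(terms)


def similarity(a, b):
    """Similarity in [0, 1]; difflib's ratio is an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def common_string(a, b):
    """Longest contiguous substring shared by a and b (assumed grouping key)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def detect_variants(documents, threshold=0.8):
    """Group candidate pairs by their shared string.

    Returns {common string: set of terms} for every pair of extracted
    terms whose similarity is at or above the (assumed) threshold.
    """
    groups = defaultdict(set)
    for a, b in combinations(extract_terms(documents), 2):
        if similarity(a, b) >= threshold:
            groups[common_string(a, b)].update((a, b))
    return dict(groups)


if __name__ == "__main__":
    docs = ["The colour of the sky", "The color of the sea"]
    print(detect_variants(docs))
```

With the two sample documents, only "color" and "colour" clear the threshold, and they are grouped under their shared string "colo". Comparing every pair is quadratic in the number of terms, so a larger vocabulary would need some form of candidate pruning before the similarity step.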

    TOPIC EXTRACTION APPARATUS AND PROGRAM
    2.
    Patent application
    TOPIC EXTRACTION APPARATUS AND PROGRAM [Granted]

    Publication No.: US20140019445A1

    Publication date: 2014-01-16

    Application No.: US14023108

    Application date: 2013-09-10

    IPC classification: G06F17/30

    CPC classification: G06F17/3053 G06F17/2775

    Abstract: According to one embodiment, a topic extracting apparatus extracts each term from a target document set and calculates, for each term, an appearance frequency and a document frequency, i.e. how many documents the term appears in. The apparatus acquires, for each extracted term, the set of documents in which the term appears, calculates a topic degree, extracts each term whose topic degree is not lower than a predetermined value as a topic word, and calculates the freshness of each extracted topic word based on its appearance date and time. The apparatus presents the extracted topic words in order of freshness and also presents the number of appearance documents of each presented topic word per unit span.

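The flow described in the abstract (count term and document frequencies, score a topic degree, keep terms at or above a threshold, rank by freshness, report appearance documents per unit span) can be sketched as follows. The abstract does not define the topic degree or the freshness, so this sketch assumes a TF-IDF-style weight for the former and the latest appearance date for the latter, with one day as the unit span. The function name, the threshold value, and the tokenisation are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from datetime import date


def extract_topics(documents, min_topic_degree=1.0):
    """documents: list of (date, text) pairs.

    Returns a list of (term, freshness, {date: document count}) tuples,
    freshest first, for terms whose topic degree meets the threshold.
    """
    term_freq = Counter()            # appearance frequency of each term
    appearances = defaultdict(list)  # term -> dates of documents containing it
    for day, text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        term_freq.update(tokens)
        for term in set(tokens):
            appearances[term].append(day)

    n_docs = len(documents)
    topics = []
    for term, days in appearances.items():
        doc_freq = len(days)
        # Assumed topic degree: a TF-IDF-style weight, not the patent's formula.
        topic_degree = term_freq[term] * math.log(n_docs / doc_freq)
        if topic_degree >= min_topic_degree:
            freshness = max(days)  # assumed: date of the latest appearance
            per_day = dict(sorted(Counter(days).items()))
            topics.append((term, freshness, per_day))

    # Freshest first; ties are broken alphabetically (both sorts are stable).
    topics.sort(key=lambda t: t[0])
    topics.sort(key=lambda t: t[1], reverse=True)
    return topics


if __name__ == "__main__":
    docs = [
        (date(2013, 9, 1), "budget meeting budget"),
        (date(2013, 9, 2), "budget review"),
        (date(2013, 9, 3), "server outage outage"),
    ]
    for term, fresh, per_day in extract_topics(docs, min_topic_degree=1.15):
        print(term, fresh, per_day)
```

On the sample data with a threshold of 1.15, "outage" is listed before "budget" because it appeared more recently, and "budget" is reported with one appearance document on each of its two days.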