Abstract:
Provided is an orthographical variant detection apparatus that detects orthographical variant candidates with high precision. The apparatus includes a term extraction unit that extracts terms from document data, a similarity computation unit that computes the similarity of an arbitrary pair of the extracted terms, an orthographical variant candidate determination unit that determines, based on the similarity, whether the two terms in the pair are orthographical variant candidates, and a group classification unit that groups the orthographical variant candidates based on a character string shared by each pair of terms determined to be candidates.
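The abstract names four units but specifies neither the similarity measure nor how the shared character string is chosen. The following is a minimal Python sketch of that pipeline under stated assumptions, not the patented method. Terms are lowercase word tokens, similarity is `difflib.SequenceMatcher.ratio()`, the candidate threshold is 0.85, and the shared string is the longest common substring of the pair. All function names (`extract_terms`, `similarity`, `shared_string`, `detect_variant_groups`) are hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def extract_terms(documents: list[str]) -> list[str]:
    """Term extraction unit (assumed): lowercase word tokens, hyphenated words kept whole."""
    terms: set[str] = set()
    for doc in documents:
        terms.update(re.findall(r"[a-z]+(?:-[a-z]+)*", doc.lower()))
    return sorted(terms)


def similarity(a: str, b: str) -> float:
    """Similarity computation unit (assumed measure): difflib ratio 2*M/T, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def shared_string(a: str, b: str) -> str:
    """Longest substring common to both terms, used here (by assumption) as the group key."""
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[m.a : m.a + m.size]


def detect_variant_groups(documents: list[str], threshold: float = 0.85) -> dict[str, list[str]]:
    groups: dict[str, set[str]] = defaultdict(set)
    for a, b in combinations(extract_terms(documents), 2):
        # Candidate determination unit: a pair is a candidate if its similarity >= threshold.
        if similarity(a, b) >= threshold:
            # Group classification unit: file the pair under the string both terms share.
            groups[shared_string(a, b)].update((a, b))
    return {key: sorted(members) for key, members in groups.items()}


if __name__ == "__main__":
    docs = ["The color of the sky", "Colour choice matters", "An e-mail and an email"]
    print(detect_variant_groups(docs))
```

On the sample documents this prints `{'colo': ['color', 'colour'], 'mail': ['e-mail', 'email']}`. Both variant pairs score 10/11 ≈ 0.91. The pair "an"/"and" scores exactly 0.8 and is rejected. Because groups are keyed by the shared string, a term that forms candidate pairs with several other terms can land in more than one group. A fuller implementation might merge such groups.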
Abstract:
According to one embodiment, a topic extracting apparatus extracts terms from a target document set and calculates, for each term, an appearance frequency and a document frequency (the number of documents in which the term appears). For each extracted term, the apparatus acquires the set of documents in which that term appears, calculates a topic degree, extracts each term whose topic degree is equal to or greater than a predetermined value as a topic word, and calculates the freshness of each extracted topic word based on its appearance dates and times. The apparatus presents the extracted topic words in order of freshness, together with the number of documents in which each presented topic word appears per unit time span.
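The abstract does not define the topic degree, the freshness measure, or the unit span. The following is a minimal Python sketch of the described flow under explicit assumptions, not the patented formulas. The topic degree is assumed to be appearance frequency × ln(N / document frequency), a TF-IDF-style score. Freshness is assumed to be the most recent appearance date and time. The unit span is assumed to be one calendar day. The function name `extract_topics` and the `min_degree` parameter (standing in for the "predetermined value") are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from datetime import date, datetime


def extract_topics(
    docs: list[tuple[str, datetime]], min_degree: float = 2.0
) -> list[tuple[str, datetime, dict[date, int]]]:
    n_docs = len(docs)
    appearance_freq: Counter[str] = Counter()  # total occurrences of each term
    appearance_docs: dict[str, list[int]] = defaultdict(list)  # term -> indices of appearance documents
    for i, (text, _) in enumerate(docs):
        tokens = re.findall(r"[a-z]+", text.lower())
        appearance_freq.update(tokens)
        for term in set(tokens):
            appearance_docs[term].append(i)

    topics = []
    for term, doc_ids in appearance_docs.items():
        doc_freq = len(doc_ids)
        # Assumed topic degree: appearance frequency x ln(N / document frequency).
        degree = appearance_freq[term] * math.log(n_docs / doc_freq)
        if degree < min_degree:
            continue
        times = [docs[i][1] for i in doc_ids]
        freshness = max(times)  # assumed: most recent appearance date and time
        per_day = Counter(t.date() for t in times)  # assumed unit span: one calendar day
        topics.append((term, freshness, dict(sorted(per_day.items()))))

    # Present topic words from freshest to stalest.
    topics.sort(key=lambda topic: topic[1], reverse=True)
    return topics


if __name__ == "__main__":
    sample = [
        ("the outage began", datetime(2024, 5, 1, 9)),
        ("the outage ended", datetime(2024, 5, 1, 17)),
        ("the release slipped", datetime(2024, 5, 2, 10)),
        ("the release shipped", datetime(2024, 5, 3, 8)),
        ("the release notes", datetime(2024, 5, 3, 12)),
        ("the weather", datetime(2024, 5, 3, 15)),
    ]
    for term, fresh, counts in extract_topics(sample):
        print(term, fresh, counts)
```

In the six-document sample, "the" scores 0 because it appears in every document. Each one-off word scores ln 6 ≈ 1.79, which falls below the threshold. "release" (3 × ln 2 ≈ 2.08) and "outage" (2 × ln 3 ≈ 2.20) are kept. "release" is listed first because its latest appearance (2024-05-03 12:00) is more recent, and it has per-day counts of 1 on May 2 and 2 on May 3.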