Abstract:
Provided is an expression extraction device that extracts evaluation expressions from text describing evaluations of a specific evaluation target. The device includes: a registered expression storage unit that stores, as a registered expression, an evaluation expression having a predetermined polarity; an expression extraction unit that extracts multiple evaluation expressions and a conjunction expression from the text; a registered expression detection unit that detects, among the extracted evaluation expressions, an evaluation expression that includes a registered expression held in the registered expression storage unit; and a polarity judgment unit. The polarity judgment unit judges two kinds of evaluation expressions to have the same polarity as the registered expression: (1) an evaluation expression joined to the evaluation expression including the registered expression by a conjunction expression in the form of an ordinary conjunction, and (2) a series of evaluation expressions that are joined neither to that evaluation expression nor to each other by any conjunction expression, whether ordinary or adversative/concessive.
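The polarity judgment described above can be illustrated with a minimal sketch. It assumes the expressions have already been extracted in text order and that the link between each adjacent pair has been classified as "ordinary", "adversative", or None (no conjunction); the function name, the link labels, and the sample data are hypothetical and not taken from the abstract. An adversative/concessive link simply blocks propagation here, since the abstract does not say what polarity to assign across it.

```python
def propagate_polarity(expressions, links, registered):
    """Spread registered polarities to neighbouring expressions.

    expressions: evaluation expressions in text order.
    links[i]:    link between expressions[i] and expressions[i + 1]:
                 "ordinary", "adversative", or None (no conjunction).
    registered:  dict mapping a registered expression to its polarity.
    Returns one polarity (or None if undetermined) per expression.
    """
    polarity = [registered.get(e) for e in expressions]
    changed = True
    while changed:  # repeat until no new polarity can be assigned
        changed = False
        for i, link in enumerate(links):
            if link == "adversative":
                continue  # assumption: adversative links block propagation
            left, right = polarity[i], polarity[i + 1]
            if left is not None and right is None:
                polarity[i + 1] = left
                changed = True
            elif right is not None and left is None:
                polarity[i] = right
                changed = True
    return polarity


# Hypothetical example: "sharp screen and fast, but heavy"
result = propagate_polarity(
    ["sharp screen", "fast", "heavy"],
    ["ordinary", "adversative"],
    {"fast": "positive"},
)
```

In this example "sharp screen" inherits the positive polarity of the registered expression "fast", while "heavy" stays undetermined because it is reached only through an adversative link.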
Abstract:
A system and method for detecting preference expressions that indicate evaluators' likes and dislikes of a product from evaluations of the product. The system stores each text describing an evaluation of the product in association with an attribute of the text. The method extracts an evaluating expression describing evaluation of the product from each of the texts and determines whether the extracted evaluating expression has positive or negative polarity, where positive polarity indicates a favorable evaluation and negative polarity indicates an unfavorable evaluation. The system receives as input a text attribute designated as the target for detecting preference expressions, detects as preference expressions those extracted evaluating expressions that come from texts having the input attribute, and outputs each preference expression in association with the frequency with which it was determined to have positive or negative polarity in the texts having that attribute.
Abstract:
A system, method, and program product for correctly detecting preference expressions that indicate persons' likes and dislikes of a commercial product or the like. Specifically, the expression detecting system detects preference expressions indicating evaluators' likes and dislikes of a specific object from texts describing evaluation of the specific object, and stores each of the texts in association with an attribute of the text. The method extracts an evaluating expression describing evaluation of the specific object from each of the texts and determines whether the extracted evaluating expression has positive polarity or negative polarity, where positive polarity indicates a favorable evaluation of the specific object and negative polarity indicates an unfavorable evaluation of the specific object. The system then receives as input a text attribute designated as the target for detecting preference expressions, detects as preference expressions those extracted evaluating expressions that come from texts having the input attribute, and outputs each preference expression in association with the frequency with which it was determined to have positive polarity or negative polarity in the texts having that attribute.
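The final counting step can be sketched as follows. The sketch assumes that extraction and polarity determination have already been done upstream, so each stored text is represented as a record with an "attribute" field and a list of (expression, polarity) pairs; this record layout, the function name, and the sample attributes are hypothetical.

```python
from collections import Counter, defaultdict


def preference_frequencies(records, attribute):
    """Count positive and negative judgments per expression.

    Only texts whose attribute equals the requested one are considered.
    Returns {expression: (positive_count, negative_count)}.
    """
    counts = defaultdict(Counter)
    for record in records:
        if record["attribute"] != attribute:
            continue
        for expression, polarity in record["expressions"]:
            counts[expression][polarity] += 1
    return {
        expression: (c["positive"], c["negative"])
        for expression, c in counts.items()
    }


# Hypothetical sample data: attribute values and expressions are made up.
records = [
    {"attribute": "female", "expressions": [("design", "positive"), ("price", "negative")]},
    {"attribute": "female", "expressions": [("design", "positive")]},
    {"attribute": "male", "expressions": [("design", "negative")]},
]
frequencies = preference_frequencies(records, "female")
```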
Abstract:
A system, a method, and a program for selecting advertisements to display in association with content when the content is sent via communication means such as the Internet. The appearance of an advertisement on the Internet is suppressed when one of the role labels “Victimizer”, “Victim”, or “Beneficiary” is systematically assigned to a corresponding thing referred to in the article: a company, a person, or a product. The advertisement selection mechanism controls suppression of the appearance of advertisements associated with such role labels and can calculate the affinity or linkage between an article and an advertisement as a value. Suppression is applied by reducing the affinity value. The degree of suppression by role labels is preferentially decreased over time.
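One way to picture the scoring is a role-dependent penalty that is subtracted from the affinity value and fades with the age of the article. The abstract gives neither penalty values nor a decay law, so the penalty table, the exponential half-life, and the function name below are all assumptions made for illustration.

```python
import math

# Hypothetical penalty per role label; the abstract gives no values.
ROLE_PENALTY = {"Victimizer": 0.8, "Victim": 0.5, "Beneficiary": 0.1}


def adjusted_affinity(affinity, role, days_since_article, half_life_days=30.0):
    """Subtract a time-decayed role penalty from the article/ad affinity.

    The exponential half-life decay is an assumption; the abstract only
    says that the degree of suppression decreases over time.
    """
    penalty = ROLE_PENALTY.get(role, 0.0)
    decay = 0.5 ** (days_since_article / half_life_days)
    return affinity - penalty * decay


fresh = adjusted_affinity(0.9, "Victimizer", days_since_article=0)
month_old = adjusted_affinity(0.9, "Victimizer", days_since_article=30)
```

With these assumed numbers, an advertisement tied to a "Victimizer" is almost fully suppressed on the day the article appears and recovers half of the lost affinity after one half-life.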
Abstract:
A defect predicate expression extraction device. The device extracts, as candidates for predicate expressions representing defects, predicate expressions occurring in the neighborhood of predicate modifying expressions representing suddenness or predicate modifying expressions representing repeatability. The defect predicate expression extraction device further extracts, as predicate expressions representing normality, predicate expressions occurring in the neighborhood of predicate modifying expressions representing normality and extracts predicate expressions representing defects by removing the predicate expressions representing normality from a list of the candidates for predicate expressions representing defects.
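The set-difference logic of the extraction can be sketched as follows. The sketch assumes pre-tokenized sentences, a known set of predicate words, and a "neighborhood" defined as a fixed token window; the modifier word lists, the window size, and the sample sentences are hypothetical.

```python
# Hypothetical modifier lexicons; the abstract does not list actual words.
SUDDEN_OR_REPEATED = {"suddenly", "repeatedly"}
NORMAL = {"normally", "properly"}


def predicates_near(sentences, modifiers, predicates, window=1):
    """Collect predicates within `window` tokens of any given modifier."""
    found = set()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token in modifiers:
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                found.update(t for t in tokens[lo:hi] if t in predicates)
    return found


def extract_defect_predicates(sentences, predicates, window=1):
    """Candidates near suddenness/repeatability minus those near normality."""
    candidates = predicates_near(sentences, SUDDEN_OR_REPEATED, predicates, window)
    normal = predicates_near(sentences, NORMAL, predicates, window)
    return candidates - normal


sentences = [
    ["engine", "suddenly", "stops"],
    ["screen", "repeatedly", "freezes"],
    ["engine", "suddenly", "starts"],
    ["engine", "normally", "starts"],
]
defects = extract_defect_predicates(sentences, {"stops", "freezes", "starts"})
```

Here "starts" is a candidate because it follows "suddenly", but it is removed because it also occurs next to "normally".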
Abstract:
Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries.
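The head/tail filtering and boundary marking can be sketched in a simplified form. The sketch uses single words (unigrams) instead of general n-grams, takes already-segmented training sentences as input, and omits noise removal and turn handling; the filtering threshold `max_mid_ratio` and all names are assumptions.

```python
from collections import Counter


def learn_boundary_words(training_sentences, max_mid_ratio=0.5):
    """Return (heads, tails): words that reliably start or end sentences.

    A word is dropped when its mid-sentence count exceeds
    max_mid_ratio times its count at the sentence start (or end).
    """
    head, tail, mid = Counter(), Counter(), Counter()
    for sentence in training_sentences:
        if not sentence:
            continue
        head[sentence[0]] += 1
        tail[sentence[-1]] += 1
        mid.update(sentence[1:-1])
    heads = {w for w, n in head.items() if mid[w] <= max_mid_ratio * n}
    tails = {w for w, n in tail.items() if mid[w] <= max_mid_ratio * n}
    return heads, tails


def mark_boundaries(tokens, heads, tails):
    """Split tokens before every head word and after every tail word."""
    cuts = set()
    for i, word in enumerate(tokens):
        if word in heads and i > 0:
            cuts.add(i)
        if word in tails and i < len(tokens) - 1:
            cuts.add(i + 1)
    sentences, start = [], 0
    for cut in sorted(cuts):
        sentences.append(tokens[start:cut])
        start = cut
    sentences.append(tokens[start:])
    return sentences


training = [
    ["okay", "let", "me", "check"],
    ["okay", "one", "moment", "please"],
    ["thank", "you", "please"],
    ["i", "think", "i", "know", "i", "agree"],
]
heads, tails = learn_boundary_words(training)
segments = mark_boundaries(["okay", "i", "will", "check", "thank", "you"], heads, tails)
```

The word "i" starts one training sentence but appears twice mid-sentence, so it is filtered out and does not trigger a boundary.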
Abstract:
A technique for extracting a meaningful text block from a document in which tables, itemized lists, multiple columns, and the like are arbitrarily laid out. A document laid out using blanks or the like is input, and a symbol associated with a spatial coordinate of the document is acquired. Consecutive characters of the same type are extracted from the symbol to generate a token and a space. A stream is generated from consecutive spaces in the column direction, and a text block is generated from streams and tokens. A link is generated between the text blocks to form a document graph. The validity of a connection (link) between the text blocks in the document graph is evaluated using a language model, and the text blocks are merged if the connection is valid.
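The first step, turning runs of same-type characters into tokens and spaces with coordinates, can be sketched as follows. The sketch distinguishes only two character types, whitespace and non-whitespace, which is a simplification; the tuple layout and the function name are hypothetical, and the later stream, graph, and language-model steps are not covered.

```python
from itertools import groupby


def tokenize_line(line, row):
    """Split one line into runs of whitespace and non-whitespace.

    Returns (kind, row, column, text) tuples, where kind is "token" or
    "space" and column is the zero-based start position in the line.
    """
    runs = []
    column = 0
    for is_space, group in groupby(line, key=str.isspace):
        text = "".join(group)
        runs.append(("space" if is_space else "token", row, column, text))
        column += len(text)
    return runs


runs = tokenize_line("Name   Qty", row=0)
```

Keeping the column of each space run is what would later allow spaces at the same column on consecutive rows to be grouped into a vertical stream.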
Abstract:
A computer-implemented method, system, and product for finding correspondence between terms in two different languages. The method includes the steps of: creating a technical term set and a general term set for each of i) a first language and ii) a second language; creating two bipartite graphs, one for each language, each connecting the technical term set and the general term set of that language with links weighted on the basis of corpus information; creating a third bipartite graph by creating weighted links between general terms in the first language and general terms in the second language using a translation dictionary; creating an association matrix M corresponding to the three bipartite graphs; calculating a similarity matrix Q by calculating an inverse matrix; and outputting correspondence between the technical term sets of the first and second languages on the basis of the similarity matrix.
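The abstract does not give the form of M or the formula for Q. As an illustration only, one common construction that fits the description is a symmetric block matrix over the four term sets together with a Neumann-series kernel; the block layout, the symbols A_1, A_2, B, and the damping factor \beta are assumptions, not the patented formula.

```latex
% Node order: T_1 (technical, language 1), G_1 (general, language 1),
%             G_2 (general, language 2), T_2 (technical, language 2).
% A_1: |T_1| x |G_1| corpus weights, A_2: |T_2| x |G_2| corpus weights,
% B:   |G_1| x |G_2| translation-dictionary weights.
\[
M =
\begin{pmatrix}
0          & A_1      & 0        & 0          \\
A_1^{\top} & 0        & B        & 0          \\
0          & B^{\top} & 0        & A_2^{\top} \\
0          & 0        & A_2      & 0
\end{pmatrix},
\qquad
Q = (I - \beta M)^{-1} = \sum_{k=0}^{\infty} \beta^{k} M^{k},
\quad 0 < \beta < \frac{1}{\rho(M)} .
\]
```

Under this assumption the (T_1, T_2) block of Q scores each pair of technical terms, and its lowest-order contribution is \beta^{3} A_1 B A_2^{\top}, the three-step path from a technical term through general terms in both languages to a technical term in the other language.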
Abstract:
A method and system for extracting opinions about a subject of interest from a text document in which each sentence is analyzed individually to identify the opinions. The most relevant feature terms related to the subject are extracted from the document based on their relevancy scores. Candidate feature terms are definite noun phrases at the beginning of the sentences. For each sentence that refers to the subject or a feature term, the invention determines whether the sentence includes an opinion polarity about the subject or the feature term. The opinion polarity is detected by identifying opinion terms in the sentence using an opinion dictionary or an opinion rule base, parsing the sentence with an English parser to identify grammatical components in the sentence and their relationships, and finding a matching entry in the dictionary or the rule base.
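The sentence-by-sentence lookup can be sketched in a much reduced form. The sketch replaces the English parser and the rule base with a plain word match against a tiny opinion dictionary, and it takes the subject and feature terms as given rather than ranking them by relevancy score; the dictionary entries and all names are hypothetical.

```python
import re

# Hypothetical opinion dictionary; a real one would be far larger.
OPINION_DICTIONARY = {
    "excellent": "positive",
    "sharp": "positive",
    "poor": "negative",
    "noisy": "negative",
}


def sentence_opinions(document, targets):
    """Return (target, polarity, sentence) for sentences with one clear polarity.

    A sentence qualifies when it mentions a target term and all opinion
    words found in it agree on a single polarity.
    """
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        words = re.findall(r"[a-z']+", sentence.lower())
        mentioned = [t for t in targets if t in words]
        polarities = {OPINION_DICTIONARY[w] for w in words if w in OPINION_DICTIONARY}
        if mentioned and len(polarities) == 1:
            results.append((mentioned[0], polarities.pop(), sentence))
    return results


opinions = sentence_opinions(
    "The lens is excellent. The battery is poor. It shipped on Monday.",
    ["lens", "battery"],
)
```

A word match like this cannot tell which noun an opinion word modifies; resolving that is the role the abstract assigns to parsing the sentence and matching grammatical relationships.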