METHOD, PROGRAM AND SYSTEM FOR FINDING CORRESPONDENCE BETWEEN TERMS
    1.
    Patent application
    Status: Lapsed

    Publication No.: US20120232884A1

    Publication date: 2012-09-13

    Application No.: US13413866

    Filing date: 2012-03-07

    CPC classification number: G06F17/2809

    Abstract: A computer-implemented method, system, and product for finding correspondence between terms in two different languages. The method includes the steps of: creating a technical term set and a general term set for each of i) a first language and ii) a second language; creating two bipartite graphs, where each graph corresponds to one of the two languages and connects the technical term set and general term set of that language with weighted links based on corpus information; creating a third bipartite graph by creating weighted links between general terms in the first language and general terms in the second language by using a translation dictionary; creating an association matrix M corresponding to the three bipartite graphs; calculating a similarity matrix Q by calculation of an inverse matrix; and outputting correspondence between the technical term sets of the first and second languages on the basis of the similarity matrix.
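    The abstract only says that Q is obtained "by calculation of an inverse matrix" from M. As a minimal sketch, assuming the common graph-kernel form Q = (I - alpha*M)^-1 (an assumption, not taken from the patent), the steps could look as follows. The toy weights, the damping factor `alpha`, and all function names are hypothetical.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f:
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def association_matrix(w_t1g1, w_g1g2, w_g2t2):
    """Symmetric matrix M over the three bipartite graphs.

    Node order (an assumption): technical terms of language 1, general terms
    of language 1, general terms of language 2, technical terms of language 2.
    """
    n_t1, n_g1 = len(w_t1g1), len(w_t1g1[0])
    n_g2, n_t2 = len(w_g2t2), len(w_g2t2[0])
    n = n_t1 + n_g1 + n_g2 + n_t2
    m = [[0.0] * n for _ in range(n)]

    def put(block, row0, col0):
        for i, row in enumerate(block):
            for j, w in enumerate(row):
                m[row0 + i][col0 + j] = w
                m[col0 + j][row0 + i] = w

    put(w_t1g1, 0, n_t1)                           # corpus links, language 1
    put(w_g1g2, n_t1, n_t1 + n_g1)                 # dictionary links
    put(w_g2t2, n_t1 + n_g1, n_t1 + n_g1 + n_g2)   # corpus links, language 2
    return m


def similarity_matrix(m, alpha=0.1):
    """Q = (I - alpha*M)^-1; alpha must keep I - alpha*M invertible."""
    n = len(m)
    return invert([[(1.0 if i == j else 0.0) - alpha * m[i][j]
                    for j in range(n)] for i in range(n)])


def best_matches(q, n_t1, n_t2):
    """For each language-1 technical term, index of the most similar language-2 one."""
    offset = len(q) - n_t2
    return [max(range(n_t2), key=lambda j: q[i][offset + j]) for i in range(n_t1)]


# Hypothetical toy weights: two technical and two general terms per language.
w_t1g1 = [[1.0, 0.2], [0.0, 1.0]]
w_g1g2 = [[1.0, 0.0], [0.0, 1.0]]
w_g2t2 = [[1.0, 0.0], [0.0, 1.0]]
q = similarity_matrix(association_matrix(w_t1g1, w_g1g2, w_g2t2))
print(best_matches(q, n_t1=2, n_t2=2))
```

    In this form, the entry of Q for a pair of technical terms sums the weights of all paths between them through the general terms, damped by path length, so terms linked through strongly co-occurring and dictionary-matched general terms score highest.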


    Method and system for extracting opinions from text documents
    2.
    Granted patent
    Status: In force

    Publication No.: US08200477B2

    Publication date: 2012-06-12

    Application No.: US10692025

    Filing date: 2003-10-22

    Abstract: A method and system for extracting opinions about a subject of interest from a text document in which each sentence is analyzed individually to identify the opinions. The most relevant feature terms related to the subject are extracted from the document based on their relevancy scores. Candidate feature terms are definite noun phrases at the beginning of the sentences. For each sentence that refers to the subject or a feature term, the invention determines whether the sentence includes an opinion polarity about the subject or the feature term. The opinion polarity is detected by identifying opinion terms in the sentence using an opinion dictionary or an opinion rule base, parsing the sentence with an English parser to identify grammatical components in the sentence and their relationships, and finding a matching entry in the dictionary or the rule base.


    Expression detecting system, an expression detecting method and a program
    3.
    Granted patent
    Status: In force

    Publication No.: US07546310B2

    Publication date: 2009-06-09

    Application No.: US11273924

    Filing date: 2005-11-14

    Abstract: A system and method for detecting preference expressions, which indicate evaluators' likes and dislikes of a product, from evaluations of the product. The system stores texts describing evaluations of the product, each in association with an attribute of the text. The method extracts an evaluating expression describing evaluation of the specific object from each of the texts and determines whether the extracted evaluating expression has positive or negative polarity, where positive indicates a favorable evaluation and negative indicates an unfavorable evaluation. The system takes as input a text attribute designated as the target for detecting preference expressions, detects as preference expressions those extracted evaluating expressions that come from texts having the input attribute, and outputs the preference expressions in association with the frequency with which each was determined to have positive or negative polarity in the texts having that attribute.


    Method and system to analyze data
    4.
    Granted patent
    Status: In force

    Publication No.: US07493252B1

    Publication date: 2009-02-17

    Application No.: US09612136

    Filing date: 2000-07-07

    CPC classification number: G06F17/3061 Y10S707/99931 Y10S707/99935

    Abstract: Useful knowledge is acquired from a large amount of data by extracting concepts that have a unique characteristic. The present invention comprises a concept extractor and a unique concept extractor. The concept extractor extracts categorized concepts from the data. The unique concept extractor extracts unique concepts from those extracted concepts: among the categorized concepts belonging to the same category, it extracts a concept whose statistical characteristic differs by more than a threshold from that of the set to which it belongs.
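    The abstract does not say which statistical characteristic or threshold is used. A minimal sketch, assuming the characteristic is a concept's frequency and the test is a z-score against the other concepts in the same category (both assumptions), might look like this. The function name and sample counts are hypothetical.

```python
from statistics import mean, pstdev


def unique_concepts(counts_by_category, threshold=2.0):
    """Return, per category, the concepts whose count deviates from the
    category mean by more than `threshold` population standard deviations."""
    unique = {}
    for category, counts in counts_by_category.items():
        values = list(counts.values())
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            # All concepts behave identically, so none stands out.
            unique[category] = []
            continue
        unique[category] = sorted(
            concept for concept, value in counts.items()
            if abs(value - mu) / sigma > threshold
        )
    return unique


# Hypothetical categorized concept counts, e.g. mentions in call-center logs.
counts = {
    "component": {"battery": 50, "screen": 5, "keyboard": 6,
                  "speaker": 4, "hinge": 5, "fan": 6},
    "color": {"red": 3, "blue": 3},
}
print(unique_concepts(counts))
```

    With small categories a z-score can never exceed a fairly low bound, so a real system would need a threshold (or a different statistic) suited to the category size.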


    Parsing method and system for natural language processing
    5.
    Granted patent
    Status: Lapsed

    Publication No.: US5761631A

    Publication date: 1998-06-02

    Application No.: US501749

    Filing date: 1995-07-12

    Inventor: Tetsuya Nasukawa

    CPC classification number: G06F17/2755 G06F17/271 G06F17/2725

    Abstract: The present invention provides a method and system to improve the accuracy of natural language processing systems by enabling them to acquire a parsing output that is accurate to a degree for any type of sentence. The feature of the present invention is that a sentence (a non-grammatical, ill-formed sentence) that cannot be parsed when using a conventional parsing process, which relies on grammatical knowledge, is re-analyzed by using the parsing results for an identical word row in a sentence (a well-formed sentence) in the same context that could be parsed. More specifically, the processing below is performed: (1) a well-formed sentence in a context that includes a word row that is identical to a word row in an ill-formed sentence is searched for, and a dependency structure for that word row is extracted from the parsing results for the well-formed sentence; and (2) the dependency structures of the phrases obtained in (1) are linked by referring to linking information for the well-formed sentence and a grammar, and the dependency structure of the entire sentence is introduced.


    Defect predicate expression extraction
    6.
    Granted patent
    Status: In force

    Publication No.: US08484622B2

    Publication date: 2013-07-09

    Application No.: US13087639

    Filing date: 2011-04-15

    CPC classification number: G06F11/008

    Abstract: A defect predicate expression extraction device. The device extracts, as candidates for predicate expressions representing defects, predicate expressions occurring in the neighborhood of predicate modifying expressions representing suddenness or predicate modifying expressions representing repeatability. The defect predicate expression extraction device further extracts, as predicate expressions representing normality, predicate expressions occurring in the neighborhood of predicate modifying expressions representing normality and extracts predicate expressions representing defects by removing the predicate expressions representing normality from a list of the candidates for predicate expressions representing defects.
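    The two-stage extraction described above (collect candidates near suddenness or repeatability modifiers, then subtract predicates seen near normality modifiers) can be sketched as a set difference. The modifier lists, the token-window definition of "neighborhood", and the pre-supplied predicate vocabulary are all assumptions for illustration. A real system would presumably identify predicates with a parser.

```python
# Hypothetical modifier lexicons; the patent does not list concrete words.
SUDDENNESS = {"suddenly", "abruptly"}
REPEATABILITY = {"repeatedly", "frequently"}
NORMALITY = {"normally", "properly"}


def predicates_near(sentences, modifiers, predicates, window=2):
    """Collect predicates within `window` tokens of any of the modifiers."""
    found = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in modifiers:
                lo, hi = max(0, i - window), i + window + 1
                found.update(t for t in tokens[lo:hi] if t in predicates)
    return found


def extract_defect_predicates(sentences, predicates, window=2):
    """Candidates near suddenness/repeatability, minus those near normality."""
    candidates = predicates_near(
        sentences, SUDDENNESS | REPEATABILITY, predicates, window)
    normal = predicates_near(sentences, NORMALITY, predicates, window)
    return candidates - normal


# Hypothetical sample reports and predicate vocabulary.
reports = [
    "the engine suddenly stalled",
    "the screen repeatedly flickers",
    "the engine suddenly started",
    "the engine normally started",
    "the pump works properly",
]
vocabulary = {"stalled", "flickers", "started", "works"}
print(sorted(extract_defect_predicates(reports, vocabulary)))
```

    Here "started" is first picked up as a candidate because it follows "suddenly", then dropped because it also occurs next to "normally".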


    METHOD FOR AUTOMATICALLY IDENTIFYING SENTENCE BOUNDARIES IN NOISY CONVERSATIONAL DATA
    7.
    Patent application
    Status: In force

    Publication No.: US20090063150A1

    Publication date: 2009-03-05

    Application No.: US11845462

    Filing date: 2007-08-27

    CPC classification number: G10L15/26

    Abstract: Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries.
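    The n-gram part of this procedure can be sketched as follows. It covers only the counting, filtering, and boundary-marking steps; noise removal and turn handling are left out. The ratio-based filter, the `min_ratio` value, and the function names are assumptions, since the abstract does not define what counts as "a significant number of times".

```python
from collections import Counter


def train(sentences, n=1, min_ratio=0.6):
    """Learn head and tail n-grams from sentences given as token lists.

    An n-gram is kept only if at least `min_ratio` of its occurrences are at
    the sentence start (head) or end (tail) rather than in the middle.
    """
    head, tail, middle = Counter(), Counter(), Counter()
    for tokens in sentences:
        if len(tokens) < n:
            continue
        head[tuple(tokens[:n])] += 1
        tail[tuple(tokens[-n:])] += 1
        for i in range(1, len(tokens) - n):
            middle[tuple(tokens[i:i + n])] += 1
    heads = {g for g, c in head.items() if c / (c + middle[g]) >= min_ratio}
    tails = {g for g, c in tail.items() if c / (c + middle[g]) >= min_ratio}
    return heads, tails


def segment(tokens, heads, tails, n=1):
    """Mark a boundary before every head n-gram and after every tail n-gram."""
    cuts = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in heads:
            cuts.add(i)
        if gram in tails:
            cuts.add(i + n)
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(tokens)) + [len(tokens)]
    return [tokens[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


# Hypothetical training sentences, e.g. split on long silences.
training = [
    "so i called yesterday".split(),
    "so the card failed okay".split(),
    "i need a car okay".split(),
    "so it is fine okay".split(),
]
heads, tails = train(training)
stream = "so i lost my card okay so i need a new one okay".split()
for sentence in segment(stream, heads, tails):
    print(" ".join(sentence))
```

    In the toy data "i" starts one sentence but also appears mid-sentence once, so it falls below the ratio and is filtered out as a head n-gram, while "so" and "okay" survive.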


    Expression extraction device, expression extraction method, and recording medium
    8.
    Granted patent
    Status: In force

    Publication No.: US07475007B2

    Publication date: 2009-01-06

    Application No.: US11061335

    Filing date: 2005-02-18

    CPC classification number: G06F17/2785

    Abstract: Provided is an expression extraction device for extracting evaluation expressions from text having descriptions on evaluations of a specific evaluation target, which includes a registered expression storage unit for registering an evaluation expression including a predetermined polarity as a registered expression, an expression extraction unit for extracting multiple evaluation expressions and a conjunction expression from the text, a registered expression detection unit for detecting the evaluation expression including the registered expression registered with the registered expression storage unit out of the multiple evaluation expressions, and a polarity judgment unit for judging that the evaluation expression, which is in conjunction with the evaluation expression including the registered expression by means of the conjunction expression in a form of ordinary conjunction, and the series of evaluation expressions, which are not in conjunction with the evaluation expression by means of the conjunction expression in any form of the ordinary conjunction and adversative/concessive conjunction and are not in conjunction with each other by means of the conjunction expression in any form of the ordinary conjunction and the adversative/concessive conjunction, are of the same polarity as the registered expression.


    Document processing method, system and medium

    Publication No.: US07046847B2

    Publication date: 2006-05-16

    Application No.: US09891080

    Filing date: 2001-06-25

    CPC classification number: G06F17/2745 G06F17/2229

    Abstract: A technique for extracting a meaningful text block from a document where a table, an itemized list, a multiple column, etc., are arbitrarily laid out. A document is input which is laid out using blanks or the like, then a symbol is acquired which is associated with a spatial coordinate of the document. Consecutive characters of the same type are extracted from the symbol to generate a token and a space. A stream is generated from consecutive spaces in the column direction, while a text block is generated from streams and tokens. A link is generated between the text blocks to form a document graph. Validity of a connection (link) between the text blocks in the document graph is evaluated using a language model, then the text blocks are merged if the connection is valid.
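    The first step, grouping consecutive characters of the same type into tokens and spaces while keeping their coordinates, can be sketched as below. The four character classes and the (kind, row, column, text) tuple layout are assumptions for illustration. The later steps (streams, text blocks, the document graph, and language-model validation) are not shown.

```python
from itertools import groupby


def char_type(ch):
    """Classify a character; the classes chosen here are an assumption."""
    if ch.isspace():
        return "space"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "symbol"


def tokenize_line(line, row=0):
    """Split one line into runs of same-type characters.

    Each run is returned as (kind, row, start_column, text), so later steps
    can reason about layout, e.g. by stacking space runs in the column direction.
    """
    runs, col = [], 0
    for kind, group in groupby(line, key=char_type):
        text = "".join(group)
        runs.append((kind, row, col, text))
        col += len(text)
    return runs


# Hypothetical two-line table fragment laid out with blanks.
page = ["Item   Qty", "Bolt   12"]
for row, line in enumerate(page):
    print(tokenize_line(line, row))
```

    Space runs that start at the same column on consecutive rows (column 4 in this fragment) are the kind of evidence that would suggest a column separator.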

    METHOD, PROGRAM AND SYSTEM FOR FINDING CORRESPONDENCE BETWEEN TERMS

    Publication No.: US20120316863A1

    Publication date: 2012-12-13

    Application No.: US13591292

    Filing date: 2012-08-22

    CPC classification number: G06F17/2809

    Abstract: A computer-implemented method, system, and product for finding correspondence between terms in two different languages. The method includes the steps of: creating a technical term set and a general term set for each of i) a first language and ii) a second language; creating two bipartite graphs, where each graph corresponds to one of the two languages and connects the technical term set and general term set of that language with weighted links based on corpus information; creating a third bipartite graph by creating weighted links between general terms in the first language and general terms in the second language by using a translation dictionary; creating an association matrix M corresponding to the three bipartite graphs; calculating a similarity matrix Q by calculation of an inverse matrix; and outputting correspondence between the technical term sets of the first and second languages on the basis of the similarity matrix.
