Text mining device, method thereof, and program
    91.
    Granted patent
    Text mining device, method thereof, and program (in force)

    Publication No.: US08612207B2

    Publication date: 2013-12-17

    Application No.: US10593375

    Filing date: 2005-03-17

    Abstract: Language analysis means 21 analyzes texts read from a text DB 11 and generates a sentence structure as the analysis result. Similar-structure generation adjustment means 25 generates, from an input of an input device 6, a determination item for determining, for each type of difference between sentence structures, whether or not the structures are treated as identical. Similar-structure determination adjustment means 26 generates, from an input of the input device 6, a determination item for determining, for each type of attribute value, whether or not differences between attribute values are ignored. Similar-structure generating means 22 generates a similar structure of a partial structure forming the sentence structure obtained by the language analysis means 21, in accordance with the determination item from the similar-structure generation adjustment means 25, and sets the generated similar structure as an equivalent class of the partial structure from which it was generated. Frequent-similar-pattern detection means 24 ignores attribute values in accordance with the determination item given by the similar-structure determination adjustment means 26, detects frequent patterns on the basis of the set of equivalent classes from the similar-structure generating means 22, and outputs the frequent patterns to an output device 3.

    SYSTEM AND METHODS FOR SEMIAUTOMATIC GENERATION AND TUNING OF NATURAL LANGUAGE INTERACTION APPLICATIONS
    92.
    Patent application
    SYSTEM AND METHODS FOR SEMIAUTOMATIC GENERATION AND TUNING OF NATURAL LANGUAGE INTERACTION APPLICATIONS (in force)

    Publication No.: US20130268260A1

    Publication date: 2013-10-10

    Application No.: US13731091

    Filing date: 2012-12-30

    Abstract: A system for supervised automatic code generation and tuning for natural language interaction applications, comprising a build environment comprising a developer user interface, automated coding tools, automated testing tools, and automated optimization tools, and an analytics framework software module. Text samples are imported into the build environment and automated clustering is performed to assign them to a plurality of input groups, each input group comprising a plurality of semantically related inputs. Language recognition rules are generated by automated coding tools. Automated testing tools carry out automated testing of language recognition rules and generate recommendations for tuning language recognition rules. The analytics framework performs analysis of interaction log files to identify problems in a candidate natural language interaction application. Optimizations to the candidate natural language interaction application are carried out and an optimized natural language interaction application is deployed into production and stored in the solution data repository.

    DOMAIN SPECIFIC NATURAL LANGUAGE NORMALIZATION

    Publication No.: US20130238313A1

    Publication date: 2013-09-12

    Application No.: US13414687

    Filing date: 2012-03-07

    CPC classification number: G06F17/28 G06F17/21 G06F17/2795

    Abstract: Embodiments of the present invention provide a method, system and computer program product for the domain specific normalization of a corpus of text. In an embodiment of the invention, a method for domain specific normalization of a corpus of text is provided, where the domain may be an industrial, organizational, demographic or geographic domain. The method includes loading a corpus of text in memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes simplifying the text of the corpus using the retrieved lexicon. In one aspect of the embodiment, the domain is determined through inference based upon words already present in the corpus of text. In another aspect of the embodiment, the domain is determined based upon meta-data provided with the corpus of text.
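
The normalization steps in this abstract (determine a domain, retrieve its lexicon, substitute words) can be illustrated with a minimal Python sketch. The lexicons, cue words, and function names below are hypothetical and chosen only for illustration; the patent does not specify them, and the overlap-counting inference is an assumed stand-in for whatever inference the claimed method uses.

```python
import re

# Hypothetical per-domain replacement lexicons (not taken from the patent).
LEXICONS = {
    "medical": {"mi": "heart attack", "bp": "blood pressure"},
    "automotive": {"mi": "mileage", "ecu": "engine control unit"},
}

# Hypothetical cue words used to infer the domain from the text itself.
DOMAIN_CUES = {
    "medical": {"patient", "dose", "bp"},
    "automotive": {"engine", "brake", "ecu"},
}


def infer_domain(text):
    """Pick the domain whose cue words overlap most with the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return max(DOMAIN_CUES, key=lambda d: len(DOMAIN_CUES[d] & words))


def normalize(text, domain=None):
    """Replace words using the lexicon of the given or inferred domain."""
    domain = domain or infer_domain(text)  # metadata wins if supplied
    lexicon = LEXICONS[domain]
    return re.sub(
        r"\w+",
        lambda m: lexicon.get(m.group(0).lower(), m.group(0)),
        text,
    )
```

Passing `domain` explicitly corresponds to the metadata-based aspect; omitting it corresponds to inference from words already present in the text.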

    DOCUMENT ANALYSIS SYSTEM, DOCUMENT ANALYSIS METHOD, DOCUMENT ANALYSIS PROGRAM AND RECORDING MEDIUM
    94.
    Patent application
    DOCUMENT ANALYSIS SYSTEM, DOCUMENT ANALYSIS METHOD, DOCUMENT ANALYSIS PROGRAM AND RECORDING MEDIUM (in force)

    Publication No.: US20130151957A1

    Publication date: 2013-06-13

    Application No.: US13817807

    Filing date: 2011-06-16

    Applicant: Yukiko Kuroiwa

    Inventor: Yukiko Kuroiwa

    CPC classification number: G06F17/2211 G06F17/2223 G06F17/2755 G06F17/2795

    Abstract: As a document analysis system to calculate a similarity degree between texts with high accuracy, an information processing device includes: a common character string calculation unit to extract character strings that are common between two texts and to determine, based on the number of extracted common character strings, whether or not the two texts are to be set as calculation objects; and a similarity degree calculation unit to calculate, when the two texts are determined to be calculation objects, a similarity degree between them by using an approximation of a Kolmogorov complexity, and, when the two texts are not calculation objects, to treat the two texts as dissimilar.
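
A rough Python sketch of the two-stage idea follows. It assumes that common character strings are counted as shared character n-grams and that Kolmogorov complexity is approximated by zlib-compressed size, which yields the normalized compression distance. The thresholds, the n-gram length, and the function names are illustrative assumptions, not details from the patent.

```python
import zlib


def common_substrings(a, b, n=4):
    """Return the set of length-n character strings found in both texts."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    return grams_a & grams_b


def compressed_size(s):
    """Compressed length in bytes, a computable stand-in for complexity."""
    return len(zlib.compress(s.encode("utf-8"), 9))


def ncd(a, b):
    """Normalized compression distance between two texts."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def similarity(a, b, min_common=3, n=4):
    """Similarity in [0, 1]; pairs failing the prefilter score 0.0."""
    if len(common_substrings(a, b, n)) < min_common:
        return 0.0  # not a calculation object: treated as dissimilar
    return min(1.0, max(0.0, 1.0 - ncd(a, b)))
```

The prefilter avoids running the compressor on pairs that share almost no character strings, which is the role the abstract assigns to the common character string calculation unit.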

    Text mining device, text mining method, text mining program, and recording medium
    95.
    Granted patent
    Text mining device, text mining method, text mining program, and recording medium (in force)

    Publication No.: US08452782B2

    Publication date: 2013-05-28

    Application No.: US12919463

    Filing date: 2009-03-06

    CPC classification number: G06F17/2211 G06F17/2795

    Abstract: Provided is a text mining device that properly analyzes the differences between a plurality of related document data. The device comprises: an element extracting section 140 that extracts language elements from each of two or more related document data; a differential processing section 150 that extracts a difference between the document data by comparing the elements extracted from each document data by the element extracting section 140; and a statistical processing section 170 that performs statistical processing on the difference extracted by the differential processing section 150. The differential processing section 150 has: an element associating section 151 that compares the elements extracted by the element extracting section 140 across the document data and associates elements that are in an identical, similar, synonymous, or analogous relation; and a differential element extracting section 152 that extracts each element left without a corresponding paired element in the association made by the element associating section 151.
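
As a simplified illustration of the associate-then-extract step, the Python sketch below treats lowercase words as the language elements and associates elements only when they are identical or listed as synonyms. The function names and the synonym-group representation are assumptions for this sketch; the similar and analogous relations mentioned in the abstract are not modeled.

```python
import re
from collections import Counter


def extract_elements(doc):
    """Extract language elements; here simply lowercase words."""
    return re.findall(r"[a-z]+", doc.lower())


def differential_elements(doc_a, doc_b, synonyms=()):
    """Count elements of either document left without a paired element.

    `synonyms` is a sequence of word groups; every word in a group is
    mapped to the group's first word before pairing.
    """
    canon = {}
    for group in synonyms:
        for word in group:
            canon[word] = group[0]
    a = Counter(canon.get(w, w) for w in extract_elements(doc_a))
    b = Counter(canon.get(w, w) for w in extract_elements(doc_b))
    # Counter subtraction drops paired elements; the sum keeps both sides.
    return (a - b) + (b - a)
```

Summing the returned counters over many document pairs would give a simple form of the statistical processing applied to the extracted differences.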

    Methodology to establish term co-relationship using sentence boundary detection
    96.
    Granted patent
    Methodology to establish term co-relationship using sentence boundary detection (in force)

    Publication No.: US08452774B2

    Publication date: 2013-05-28

    Application No.: US13044873

    Filing date: 2011-03-10

    CPC classification number: G06F17/30734 G06F17/2795

    Abstract: A method and system for splitting a text document into individual sentences using sentence boundary detection, and establishing co-relationships between terms which are present in the same sentence. A document corpus, or collection of text records, is provided, containing text with terms to be extracted. The text records in the document corpus are divided into individual sentences, using a set of rules for sentence boundary detection. The individual sentences are then analyzed to extract and correlate terms, such as parts and symptoms, symptoms and actions, or parts and failure modes. The correlated terms are then validated based on frequency of occurrence, with term pairs being considered valid if their frequency of occurrence exceeds a minimum frequency threshold. The validated term correlations can be used for fault model development, document classification, and document clustering.
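
The pipeline described here (split into sentences, pair terms that share a sentence, keep pairs above a frequency threshold) can be sketched in a few lines of Python. The single punctuation-based splitting rule, the restriction to single-word terms, and all names below are simplifying assumptions rather than the patented rule set.

```python
import re
from collections import Counter
from itertools import product


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (one naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def correlate(records, parts, symptoms, min_freq=1):
    """Count (part, symptom) pairs that co-occur within one sentence.

    Only pairs whose count exceeds `min_freq` are returned as valid.
    """
    counts = Counter()
    for record in records:
        for sentence in split_sentences(record):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            for pair in product(parts & words, symptoms & words):
                counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n > min_freq}
```

Restricting co-occurrence to a single sentence keeps a part mentioned in one sentence from being paired with a symptom mentioned in an unrelated sentence of the same record.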

    Text paraphrasing method and program, conversion rule computing method and program, and text paraphrasing system
    97.
    Granted patent
    Text paraphrasing method and program, conversion rule computing method and program, and text paraphrasing system (in force)

    Publication No.: US08447589B2

    Publication date: 2013-05-21

    Application No.: US12448421

    Filing date: 2007-12-21

    CPC classification number: G06F17/2795 G06F17/28

    Abstract: A paraphrase model of a question text inputted by a user is learned, and a paraphrase expression is generated in real time. When information in a text set storage unit is updated, a text pair extracting unit extracts a paraphrase text pair from the text set storage unit and stores it in a text pair storage unit. A model learning unit learns a question text paraphrase model from the paraphrase text pair in the text pair storage unit, and stores it in a model storage unit. The text pair extracting unit extracts a paraphrase text pair again from the text set storage unit by using the question text paraphrase model held in the model storage unit, and stores it in the text pair storage unit. When the newly extracted paraphrase text pair is the same as the paraphrase text pair already stored in the text pair storage unit, learning of the question text paraphrase model ends. A candidate creating unit reads the question text paraphrase model from the model storage unit and generates a paraphrase candidate of the inputted question text.

    SYSTEM AND METHOD FOR SUGGESTION MINING
    98.
    Patent application
    SYSTEM AND METHOD FOR SUGGESTION MINING (in force)

    Publication No.: US20130096909A1

    Publication date: 2013-04-18

    Application No.: US13272553

    Filing date: 2011-10-13

    CPC classification number: G06F17/2795 G06F17/271 G06F17/278

    Abstract: A system and method for extraction of suggestions for improvement from a corpus of documents, such as customer reviews, are disclosed. A structured terminology provided for a topic includes a set of semantic classes, each including a set of terms. A thesaurus of terms relating to suggestions of improvement is provided. Text elements of text strings in the documents which are instances of terms in the structured terminology are labeled with the corresponding semantic class, and text elements which are instances of terms in the thesaurus are also labeled. A set of patterns is applied to the labeled text strings to identify suggestions of improvement expressions. The patterns define syntactic relations between text elements, some of which are required to be instances of one of the terms in a particular semantic class or thesaurus. A set of suggestions for improvements is output based on the identified suggestions of improvement expressions.

    Automatic answering device, automatic answering system, conversation scenario editing device, conversation server, and automatic answering method
    99.
    Granted patent
    Automatic answering device, automatic answering system, conversation scenario editing device, conversation server, and automatic answering method (in force)

    Publication No.: US08374859B2

    Publication date: 2013-02-12

    Application No.: US12542170

    Filing date: 2009-08-17

    Abstract: An automatic answering device and an automatic answering method for automatically answering a user utterance are configured: to prepare a conversation scenario that is a set of input sentences and reply sentences, the input sentences each corresponding to a user utterance assumed to be uttered by a user, the reply sentences each being an automatic reply to the inputted sentence; to accept a user utterance; to determine the reply sentence to the accepted user utterance on the basis of the conversation scenario; and to present the determined reply sentence to the user. Data of the conversation scenario have a data structure that enables the inputted sentences and the reply sentences to be expressed in a state transition diagram in which each of the inputted sentences is defined as a morphism and the reply sentence corresponding to the inputted sentence is defined as an object.

    System and a Method for Generating Semantically Similar Sentences for Building a Robust SLM
    100.
    Patent application
    System and a Method for Generating Semantically Similar Sentences for Building a Robust SLM (in force)

    Publication No.: US20130018649A1

    Publication date: 2013-01-17

    Application No.: US13181923

    Filing date: 2011-07-13

    CPC classification number: G06F17/274 G06F17/2795 G06F17/2881 G10L15/26

    Abstract: A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database.
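
The generate-then-verify flow can be sketched as a Cartesian product over per-word sets of similar words, followed by a filter. In the Python sketch below the similar-word table and the bigram-based check are hypothetical stand-ins for the semantic class generator and the grammatical sentence verifier; neither is specified at this level in the abstract.

```python
from itertools import product

# Hypothetical table of semantically similar words (illustration only).
SIMILAR = {
    "book": ["book", "reserve"],
    "flight": ["flight", "airplane"],
}


def candidates(utterance):
    """Build candidate sentences, one word chosen per similar-word set."""
    slots = [SIMILAR.get(w, [w]) for w in utterance.lower().split()]
    return [" ".join(choice) for choice in product(*slots)]


def is_grammatical(sentence, bad_bigrams):
    """Toy verifier: reject sentences containing a disallowed word pair."""
    words = sentence.split()
    return not any(pair in bad_bigrams for pair in zip(words, words[1:]))


def similar_sentences(utterance, bad_bigrams=frozenset()):
    """Return the candidates that pass the verifier, in generation order."""
    return [s for s in candidates(utterance) if is_grammatical(s, bad_bigrams)]
```

Because candidates are composed word by word rather than looked up, the output is not limited to sentences already stored in a database, which matches the final remark of the abstract.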
