Object location and processing
    21.
    Granted patent

    Publication No.: US09773056B1

    Publication date: 2017-09-26

    Application No.: US13070465

    Filing date: 2011-03-23

    CPC classification number: G06F17/30666 G06F17/30864

    Abstract: Embodiments described herein locate objects in input. Embodiments first parse the input into a form that can be used to perform the analysis required to construct a set of one or more objects. Embodiments then form, when possible, object character strings by using the grammatical values of the underlying terms. The set of object character strings can be used in a variety of textual analysis procedures, such as search, comparisons, and other combinatorial analysis that requires the use of objects in performing tasks related to an information repository of documents, files, messages, etc.

    Category-based lemmatizing of a phrase in a document

    Publication No.: US09672278B2

    Publication date: 2017-06-06

    Application No.: US14820601

    Filing date: 2015-08-07

    Abstract: A processor receives a string of binary data that represents an initial phrase that includes multiple words and is associated with a specific category. The processor removes one or more letters from an end of a word in the initial phrase to form an initial truncated version of the phrase. The processor runs a TF-IDF algorithm on the initial truncated version of the phrase, and lemmatizes subsequent truncated versions of the initial phrase by recursively removing remaining letters from the end of the word. The processor runs the TF-IDF algorithm on subsequent truncated versions of the initial truncated version of the initial phrase until a highest TF-IDF value is identified. The processor defines a breadth of a lemma for a lexeme based on the specific category of the phrase, and assigns the specific truncated version having the highest TF-IDF value to the specific category.
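
The truncate-and-score loop described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that only the last word of the phrase is truncated, that term frequency is a case-insensitive substring count in one category document, and that ties keep the longer form. The names `tf_idf`, `best_truncation`, and `min_len` are hypothetical.

```python
import math


def tf_idf(term, target_doc, corpus):
    """TF-IDF of a substring `term` in `target_doc`, relative to `corpus`."""
    needle = term.lower()
    tf = target_doc.lower().count(needle)
    df = sum(1 for doc in corpus if needle in doc.lower())
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def best_truncation(phrase, target_doc, corpus, min_len=3):
    """Strip letters from the end of the last word; keep the top-scoring form.

    Assumption: only the last word is truncated, one letter at a time,
    down to `min_len` letters. Ties keep the longer (earlier) candidate.
    """
    words = phrase.split()
    head, last = words[:-1], words[-1]
    best, best_score = phrase, tf_idf(phrase, target_doc, corpus)
    for end in range(len(last) - 1, min_len - 1, -1):
        candidate = " ".join(head + [last[:end]])
        score = tf_idf(candidate, target_doc, corpus)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy data: one category document plus three unrelated documents.
CATEGORY_DOC = "heart attack risk; heart attacks in adults; heart attacked tissue"
CORPUS = [
    CATEGORY_DOC,
    "stock market news",
    "weather report for today",
    "football results",
]

lemma, score = best_truncation("heart attacks", CATEGORY_DOC, CORPUS)
```

In this toy corpus the truncated form "heart attack" matches three variants in the category document, so it outscores the original phrase and would be the form assigned to the category.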

    INFORMATION REPLYING METHOD AND APPARATUS
    23.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20160269326A1

    Publication date: 2016-09-15

    Application No.: US15163337

    Filing date: 2016-05-24

    Abstract: The information replying method includes: receiving to-be-replied information, where the to-be-replied information includes text content and contact information; searching a database for corresponding dialog style information according to the text content and the contact information; performing preprocessing on the text content, where the preprocessing includes word segmentation processing and stop word removal processing; and searching, according to data that has undergone the preprocessing, the database corresponding to the dialog style information, to determine reply information.
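
The steps listed in this abstract (look up a dialog style, preprocess the text, search the style-specific data) can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the claimed method. The style is looked up from the contact alone, word segmentation is a simple regular-expression split suited to whitespace-delimited text, and reply selection is by keyword overlap. The tables `STYLE_BY_CONTACT` and `REPLIES` and all addresses are hypothetical.

```python
import re

# Hypothetical stop-word list and lookup tables, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "you", "to", "for"}

STYLE_BY_CONTACT = {
    "boss@example.com": "formal",
    "friend@example.com": "casual",
}

REPLIES = {
    "formal": [
        ({"lunch", "free"}, "I would be glad to join you."),
        ({"meeting"}, "I will attend the meeting."),
    ],
    "casual": [
        ({"lunch", "free"}, "Sure, see you there!"),
        ({"meeting"}, "OK, I'll be there."),
    ],
}


def preprocess(text):
    """Segment text into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def choose_reply(text, contact, default_style="formal"):
    """Pick the reply whose keywords overlap most with the preprocessed text."""
    style = STYLE_BY_CONTACT.get(contact, default_style)
    tokens = set(preprocess(text))
    best_reply, best_overlap = None, 0
    for keywords, reply in REPLIES[style]:
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_reply, best_overlap = reply, overlap
    return best_reply


casual = choose_reply("Are you free for lunch tomorrow?", "friend@example.com")
formal = choose_reply("Are you free for lunch tomorrow?", "boss@example.com")
```

The same incoming text yields a different reply per contact because the search runs only over the entries stored for that contact's dialog style.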

    CATEGORY-BASED LEMMATIZING OF A PHRASE IN A DOCUMENT
    25.
    Patent application
    Legal status: in force

    Publication No.: US20150347575A1

    Publication date: 2015-12-03

    Application No.: US14820601

    Filing date: 2015-08-07

    Abstract: A processor receives a string of binary data that represents an initial phrase that includes multiple words and is associated with a specific category. The processor removes one or more letters from an end of a word in the initial phrase to form an initial truncated version of the phrase. The processor runs a TF-IDF algorithm on the initial truncated version of the phrase, and lemmatizes subsequent truncated versions of the initial phrase by recursively removing remaining letters from the end of the word. The processor runs the TF-IDF algorithm on subsequent truncated versions of the initial truncated version of the initial phrase until a highest TF-IDF value is identified. The processor defines a breadth of a lemma for a lexeme based on the specific category of the phrase, and assigns the specific truncated version having the highest TF-IDF value to the specific category.

    Category-based lemmatizing of a phrase in a document
    26.
    Granted patent
    Legal status: in force

    Publication No.: US09158755B2

    Publication date: 2015-10-13

    Application No.: US13663563

    Filing date: 2012-10-30

    Abstract: A processor-implemented method, system, and/or computer program product lemmatizes a phrase for a specific category. An initial phrase, which is associated with a specific category, is received by a processor. The processor removes a last letter or set of letters from a word in the initial phrase to form an initial truncated version of the phrase, and then runs a term frequency-inverse document frequency (TF-IDF) algorithm on the initial truncated version of the phrase. The processor lemmatizes subsequent truncated versions of the initial phrase, and then runs the TF-IDF algorithm until a highest TF-IDF value is identified for a specific truncated version of the initial phrase when compared to TF-IDF values of other truncated versions of the initial phrase. The specific truncated version of the initial phrase that is associated with the highest TF-IDF value is then associated with the specific category.

    Question answering device, question answering method, and question answering program
    27.
    Granted patent
    Legal status: in force

    Publication No.: US08983977B2

    Publication date: 2015-03-17

    Application No.: US12281350

    Filing date: 2007-02-20

    Abstract: A question-answering device, a question-answering method, and a question-answering program that can obtain an answer to an inputted query with high probability are described. A score calculation element 305 determines a matching degree between the group of the style and the topic of an inputted query and the group of the style and the topic of the query of question-answer pairs. A search result presentation element 306 narrows the question-answer pairs on the basis of the matching degree.
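
The abstract does not say how the matching degree is computed. The sketch below is one hypothetical reading: each stored question-answer pair carries a style label and a topic label, the matching degree is the fraction of those two labels that equal the query's, and narrowing keeps pairs at or above a threshold. The names `QAPair`, `matching_degree`, `narrow`, and `min_degree` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    style: str
    topic: str


def matching_degree(style, topic, pair):
    """Fraction of the (style, topic) labels shared by the query and the pair."""
    return 0.5 * (style == pair.style) + 0.5 * (topic == pair.topic)


def narrow(style, topic, pairs, min_degree=1.0):
    """Keep pairs whose matching degree reaches `min_degree`, best first."""
    scored = [(matching_degree(style, topic, p), p) for p in pairs]
    kept = [(d, p) for d, p in scored if d >= min_degree]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in kept]


PAIRS = [
    QAPair("How do I reset my password?", "Use the reset link.", "how-to", "account"),
    QAPair("Why was I charged twice?", "A refund is issued.", "why", "billing"),
    QAPair("How do I update my card?", "Open billing settings.", "how-to", "billing"),
]

exact = narrow("how-to", "billing", PAIRS)
loose = narrow("how-to", "billing", PAIRS, min_degree=0.5)
```

With the default threshold only the pair that matches on both style and topic survives. Lowering `min_degree` to 0.5 also keeps pairs that match on one of the two labels.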

    Systems and methods for accessing web pages using natural language
    28.
    Granted patent
    Legal status: in force

    Publication No.: US08943095B2

    Publication date: 2015-01-27

    Application No.: US13624762

    Filing date: 2012-09-21

    Abstract: Systems and methods for building an interface that receives and responds to varied natural language expressions. In an embodiment, the system receives a natural language expression in text or audio, and translates it by building at least one data structure which reflects the concepts expressed in the natural language expression. The data structure may comprise a symbol representing each concept. In an embodiment, a parser utilizes the data structure to parse language expressions to single concept symbols that represent the meaning of the expressions. Response actions may also be performed in response to the parsed language expressions. In addition, a parser may receive a single concept symbol, and generate one or many natural language expressions of the meaning of the concept symbol. Furthermore, the system may be configured to understand the local meaning of words and phrases.

    CATEGORY-BASED LEMMATIZING OF A PHRASE IN A DOCUMENT
    29.
    Patent application
    Legal status: in force

    Publication No.: US20140122514A1

    Publication date: 2014-05-01

    Application No.: US13663563

    Filing date: 2012-10-30

    Abstract: A processor-implemented method, system, and/or computer program product lemmatizes a phrase for a specific category. An initial phrase, which is associated with a specific category, is received by a processor. The processor removes a last letter or set of letters from a word in the initial phrase to form an initial truncated version of the phrase, and then runs a term frequency-inverse document frequency (TF-IDF) algorithm on the initial truncated version of the phrase. The processor lemmatizes subsequent truncated versions of the initial phrase, and then runs the TF-IDF algorithm until a highest TF-IDF value is identified for a specific truncated version of the initial phrase when compared to TF-IDF values of other truncated versions of the initial phrase. The specific truncated version of the initial phrase that is associated with the highest TF-IDF value is then associated with the specific category.

    Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
    30.
    Granted patent
    Legal status: in force

    Publication No.: US08626787B1

    Publication date: 2014-01-07

    Application No.: US13922968

    Filing date: 2013-06-20

    Applicant: Google Inc.

    Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
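
The comparison of context data with and without a candidate stopword can be sketched as below. This is a minimal sketch of the document-retrieval variant only. The abstract does not define "substantially similar", so the sketch assumes Jaccard similarity with a threshold. The toy index, the `KNOWN_STOPWORDS` list, and the `threshold` parameter are hypothetical.

```python
KNOWN_STOPWORDS = {"the", "a", "of", "in"}

# Hypothetical toy document index: id -> text.
INDEX = {
    1: "the matrix film review",
    2: "matrix algebra notes",
    3: "review of the film",
    4: "history of the film",
}


def search(terms, index):
    """Return ids of documents that contain every term as a whole word."""
    hits = set()
    for doc_id, text in index.items():
        words = set(text.lower().split())
        if all(t in words for t in terms):
            hits.add(doc_id)
    return hits


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def removable_stopwords(query, index, threshold=0.8):
    """List known stopwords whose removal leaves the results nearly unchanged."""
    terms = query.lower().split()
    full = search(terms, index)
    removable = []
    for term in terms:
        if term not in KNOWN_STOPWORDS:
            continue
        reduced = search([t for t in terms if t != term], index)
        if jaccard(full, reduced) >= threshold:
            removable.append(term)
    return removable


safe_to_drop = removable_stopwords("review of the film", INDEX)
material = removable_stopwords("the matrix", INDEX)
```

In the toy index, dropping "the" from "review of the film" returns the same document, so it is treated as removable. Dropping "the" from "the matrix" pulls in the algebra notes as well, so the stopword is treated as material to that query and kept.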
