Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems

    Publication No.: US09817920B1

    Publication date: 2017-11-14

    Application No.: US14628692

    Filing date: 2015-02-23

    Applicant: GOOGLE INC.

    Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
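
The comparison step in this abstract can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the patent. The context data is taken to be the set of document IDs retrieved for a query. "Substantially similar" is taken to mean a Jaccard overlap of at least 0.5. Each potential stopword is tested by removing it on its own. The stopword list, toy index, and function names (`material_stopwords`, `retrieve`) are all hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stopword list; the abstract only refers to "a list of known stopwords".
KNOWN_STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "or"}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets of context data (here, document IDs)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def material_stopwords(
    query: List[str],
    retrieve: Callable[[List[str]], Set[str]],
    threshold: float = 0.5,  # assumed cut-off for "substantially similar"
) -> Set[str]:
    """Return the potential stopwords whose removal noticeably changes the context data."""
    baseline = retrieve(query)
    material = set()
    for term in query:
        if term.lower() not in KNOWN_STOPWORDS:
            continue  # not a potential stopword
        without_term = [t for t in query if t != term]
        # Context sets that are not substantially similar mark the stopword as material.
        if jaccard(baseline, retrieve(without_term)) < threshold:
            material.add(term)
    return material


# Toy document index: document ID -> set of terms (hypothetical data).
INDEX: Dict[str, Set[str]] = {
    "d1": {"the", "who", "band"},
    "d2": {"who", "is", "president"},
    "d3": {"the", "who", "concert"},
    "d4": {"history", "of", "rome"},
    "d5": {"history", "rome"},
    "d6": {"who", "won"},
    "d7": {"who", "invented"},
}


def retrieve(terms: List[str]) -> Set[str]:
    """Conjunctive retrieval: IDs of documents containing every query term."""
    return {doc for doc, words in INDEX.items() if all(t in words for t in terms)}


print(material_stopwords(["the", "who"], retrieve))            # {'the'}
print(material_stopwords(["history", "of", "rome"], retrieve))  # set()
```

In the toy data, dropping "the" from "the who" widens the result set from two documents to five. The overlap falls below the threshold, so "the" is kept. Dropping "of" from "history of rome" leaves an overlap of exactly 0.5, so "of" is treated as removable.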

    Document retrieval using internal dictionary-hierarchies to adjust per-subject match results
    Granted invention patent (legal status: in force)

    Publication No.: US09430559B2

    Publication date: 2016-08-30

    Application No.: US14854767

    Filing date: 2015-09-15

    Abstract: Techniques for managing big data include retrieval using per-subject dictionaries having multiple levels of sub-classification hierarchy within the subject. Entries may include subject-determining-power (SDP) scores that provide an indication of the descriptive power of the entry term with respect to the subject of the dictionary containing the term. The same term may have entries in multiple dictionaries with different SDP scores in each of the dictionaries. A retrieval request for one or more documents containing search terms descriptive of the one or more documents can be processed by identifying a set of candidate documents tagged with subjects, i.e., identifiers of per-subject dictionaries having entries corresponding to a search term, then using affinity values to adjust the aggregate score for the terms in the dictionaries. Documents are then selected for best match to the subject based on the adjusted scores. Alternatively, the adjustment may be performed after selecting the documents by re-ordering them according to adjusted scores.
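
A minimal Python sketch can show how per-subject SDP scores and affinity values might combine to rank documents. The sketch rests on several assumptions. The sub-classification hierarchy inside each dictionary is flattened away. An affinity value is assumed to multiply a subject's aggregate score. A document is assumed to take the score of its best-matching subject. The dictionaries, tags, and the `rank` function are hypothetical illustrations, not the patented method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-subject dictionaries: subject -> {term: SDP score}.
# The same term ("bank", "river") has entries in both, with different SDP scores.
DICTIONARIES: Dict[str, Dict[str, float]] = {
    "finance": {"bank": 0.9, "interest": 0.8, "river": 0.1},
    "geography": {"bank": 0.4, "river": 0.9, "delta": 0.7},
}

# Hypothetical subject tags attached to documents.
DOC_SUBJECTS: Dict[str, Set[str]] = {
    "doc-a": {"finance"},
    "doc-b": {"geography"},
    "doc-c": {"finance", "geography"},
}


def rank(query: List[str], affinity: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank candidate documents by affinity-adjusted aggregate SDP scores."""
    # Aggregate the SDP scores of the query terms in every dictionary with a matching entry.
    aggregate = {
        subject: sum(entries.get(term, 0.0) for term in query)
        for subject, entries in DICTIONARIES.items()
        if any(term in entries for term in query)
    }
    # Adjust each subject's aggregate with its affinity value (assumed multiplicative, default 1.0).
    adjusted = {s: score * affinity.get(s, 1.0) for s, score in aggregate.items()}
    # Candidates are documents tagged with a matching subject; each document
    # takes the score of its best-matching subject.
    scored = []
    for doc, subjects in DOC_SUBJECTS.items():
        matching = subjects.intersection(adjusted)
        if matching:
            scored.append((doc, max(adjusted[s] for s in matching)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank(["bank", "river"], {"geography": 1.5}))
```

With a geography affinity of 1.5, documents tagged "geography" outrank the finance-only document for "bank river". The abstract's alternative corresponds to scoring documents first and re-ordering them with the adjusted scores afterwards.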

    Spell correcting long queries
    Granted invention patent (legal status: in force)

    Publication No.: US09317606B1

    Publication date: 2016-04-19

    Application No.: US13757271

    Filing date: 2013-02-01

    Applicant: Google Inc.

    CPC classification number: G06F17/30864 G06F17/273 G06F17/30666

    Abstract: A computer implemented method and system for spell correcting terms within a string of terms that a computer system receives from a computer readable data string representative of a user search query.
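
The abstract does not describe how individual terms are corrected. For orientation only, here is a generic baseline in Python, not the patented technique. It corrects each term of the query independently against a vocabulary, using the standard library's `difflib.get_close_matches`. The vocabulary, the 0.75 similarity cut-off, and the function names are hypothetical assumptions.

```python
import difflib
from typing import List

# Hypothetical vocabulary of correctly spelled terms.
VOCABULARY: List[str] = ["long", "search", "query", "queries", "spelling", "correction"]


def correct_term(term: str, cutoff: float = 0.75) -> str:
    """Replace an unknown term with its closest vocabulary entry, if one is close enough."""
    if term in VOCABULARY:
        return term
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def correct_query(query: str) -> str:
    """Correct each whitespace-separated term of the query string independently."""
    return " ".join(correct_term(t) for t in query.lower().split())


print(correct_query("long serch quary"))
```

Correcting terms one at a time keeps the cost linear in query length. However, it ignores the context of neighbouring terms, which a method aimed specifically at long queries would presumably need to handle.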

    Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
    Granted invention patent (legal status: in force)

    Publication No.: US08965919B1

    Publication date: 2015-02-24

    Application No.: US14143161

    Filing date: 2013-12-30

    Applicant: Google Inc.

    Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.

    Linguistically-adapted structural query annotation
    Granted invention patent (legal status: in force)

    Publication No.: US08812301B2

    Publication date: 2014-08-19

    Application No.: US13245147

    Filing date: 2011-09-26

    Abstract: A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.
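
The two stages in this abstract can be sketched in Python. The preprocessing stage attaches lexicon-based recapitalization information, including the part of speech of the capitalized form, to lowercase-initial tokens. A disambiguation stage then applies a rule to that information. The lexicon entries, the `also_common_word` flag, and the adjacency rule are hypothetical assumptions chosen for illustration; the abstract does not state which rules are applied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RecapInfo:
    capitalized: str        # capitalized form of the text element
    pos: str                # part-of-speech feature of the capitalized form
    also_common_word: bool  # True if the lowercase form is also an ordinary word


# Hypothetical lexicon of text elements recognized as proper nouns when capitalized.
LEXICON: Dict[str, RecapInfo] = {
    "paris": RecapInfo("Paris", "PROPER_NOUN", False),
    "new": RecapInfo("New", "PROPER_NOUN", True),
    "york": RecapInfo("York", "PROPER_NOUN", False),
    "bill": RecapInfo("Bill", "PROPER_NOUN", True),
}


def preprocess(tokens: List[str]) -> List[Optional[RecapInfo]]:
    """Attach recapitalization info to lowercase-initial tokens found in the lexicon."""
    return [LEXICON.get(t) if t[:1].islower() else None for t in tokens]


def recapitalize(tokens: List[str]) -> List[str]:
    """Apply an illustrative disambiguation rule using the recapitalization info."""
    info = preprocess(tokens)
    result = []
    for i, (token, rec) in enumerate(zip(tokens, info)):
        if rec is None or rec.pos != "PROPER_NOUN":
            result.append(token)
            continue
        # Is an adjacent token also a recapitalization candidate (e.g. "new york")?
        next_to_candidate = (i > 0 and info[i - 1] is not None) or (
            i + 1 < len(info) and info[i + 1] is not None
        )
        # Unambiguous entries are always recapitalized; ambiguous ones only
        # inside a run of candidates.
        if not rec.also_common_word or next_to_candidate:
            result.append(rec.capitalized)
        else:
            result.append(token)
    return result


print(recapitalize(["flights", "to", "new", "york"]))  # ['flights', 'to', 'New', 'York']
print(recapitalize(["pay", "the", "bill"]))            # ['pay', 'the', 'bill']
```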

    SEARCHING FOR ASSOCIATED EVENTS IN LOG DATA

    Publication No.: US20130185286A1

    Publication date: 2013-07-18

    Application No.: US13668847

    Filing date: 2012-11-05

    CPC classification number: G06F17/30424 G06F17/30637 G06F17/30666

    Abstract: To retrieve a sequence of associated events in log data, a request expression is parsed to retrieve types of dependencies between events which are searched, and the constraints (e.g., keywords) which characterize each event. Based on the parsing results, query components can be formed, expressing the constraints for individual events and interrelations (e.g., time spans) between events. A resultant span query comprising the query components can then be run against an index of events, which encodes a mutual location of associated events in storage.
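
The parse, build, and run pipeline described in this abstract can be sketched in Python. Several pieces are hypothetical. The request syntax `"<keyword> THEN <keyword> WITHIN <seconds>"` is invented for this example. Event timestamps stand in for each event's location in storage. A keyword-to-timestamps inverted index stands in for the patent's index of events.

```python
import re
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    timestamp: int  # seconds; stands in for the event's location in storage
    message: str


# Hypothetical request syntax: "<keyword> THEN <keyword> WITHIN <seconds>".
REQUEST = re.compile(r"^\s*(\S+)\s+THEN\s+(\S+)\s+WITHIN\s+(\d+)\s*$", re.IGNORECASE)


def parse(expr: str) -> Tuple[str, str, int]:
    """Extract the two event constraints and the allowed time span."""
    m = REQUEST.match(expr)
    if m is None:
        raise ValueError(f"unsupported request expression: {expr!r}")
    first, second, span = m.groups()
    return first.lower(), second.lower(), int(span)


def build_index(events: List[Event]) -> Dict[str, List[int]]:
    """Inverted index: keyword -> sorted timestamps of events containing it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        for word in set(event.message.lower().split()):
            index[word].append(event.timestamp)
    return index


def span_query(index: Dict[str, List[int]], expr: str) -> List[Tuple[int, int]]:
    """Pairs (t1, t2): a `first` event followed by a `second` event within the span."""
    first, second, span = parse(expr)
    later = index.get(second, [])
    hits = []
    for t1 in index.get(first, []):
        j = bisect_right(later, t1)  # earliest `second` event strictly after t1
        if j < len(later) and later[j] - t1 <= span:
            hits.append((t1, later[j]))
    return hits


events = [
    Event(10, "disk error on sda"),
    Event(40, "service restart"),
    Event(100, "disk error again"),
    Event(500, "service restart"),
]
print(span_query(build_index(events), "error THEN restart WITHIN 60"))  # [(10, 40)]
```

Only the first error is followed by a restart within 60 seconds. The second error's nearest restart comes 400 seconds later.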

    Scoring concepts for contextual personalized information retrieval
    Granted invention patent (legal status: in force)

    Publication No.: US08463810B1

    Publication date: 2013-06-11

    Application No.: US11757040

    Filing date: 2007-06-01

    Applicant: Earl Rennison

    Inventor: Earl Rennison

    Abstract: Information retrieval systems face challenging problems with delivering highly relevant and highly inclusive search results in response to a user's query. Contextual personalized information retrieval uses a set of integrated methodologies that can combine automatic concept extraction/matching from text, a powerful fuzzy search engine, and a collaborative user preference learning engine to provide accurate and personalized search results. The system can include constructing a search query to execute a search of a database. The system can parse an input query from a user conducting the search of the database into sub-strings, and can match the sub-strings to concepts in a semantic concept network of a knowledge base. The system can further map the matched concepts to criteria and criteria values that specify a set of constraints on and scoring parameters for the matched concepts.
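
The query-construction step in this abstract (parse the query into sub-strings, match them to concepts, map concepts to criteria) can be sketched in Python. The knowledge base is reduced to a hypothetical phrase-to-concept table. Sub-strings are found by greedy longest match over word n-grams of up to three words, which is an assumed segmentation strategy. Each concept maps to a hypothetical (criterion, value, weight) triple. The fuzzy search engine and the preference-learning engine are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: surface phrase -> concept ID.
CONCEPTS: Dict[str, str] = {
    "italian restaurant": "cuisine:italian",
    "restaurant": "venue:restaurant",
    "cheap": "price:low",
    "downtown": "area:downtown",
}

# Hypothetical mapping from concepts to (criterion, constraint value, scoring weight).
CRITERIA: Dict[str, Tuple[str, str, float]] = {
    "cuisine:italian": ("cuisine", "italian", 1.0),
    "venue:restaurant": ("category", "restaurant", 0.5),
    "price:low": ("price_level", "<=2", 0.8),
    "area:downtown": ("neighborhood", "downtown", 0.6),
}

MAX_NGRAM = 3  # assumed longest phrase length to try


def match_concepts(query: str) -> List[str]:
    """Greedy longest-match segmentation of the query into known concept phrases."""
    words = query.lower().split()
    concepts, i = [], 0
    while i < len(words):
        for n in range(min(MAX_NGRAM, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in CONCEPTS:
                concepts.append(CONCEPTS[phrase])
                i += n
                break
        else:
            i += 1  # no concept starts at this word; skip it
    return concepts


def to_criteria(query: str) -> List[Tuple[str, str, float]]:
    """Map the matched concepts to criteria and scoring parameters."""
    return [CRITERIA[c] for c in match_concepts(query)]


print(to_criteria("cheap Italian restaurant downtown"))
```

Longest-match-first makes "italian restaurant" resolve to the cuisine concept rather than the generic "restaurant" concept.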

    Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
    Granted invention patent (legal status: in force)

    Publication No.: US08214385B1

    Publication date: 2012-07-03

    Application No.: US13098956

    Filing date: 2011-05-02

    Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
