Generating synthetic descriptive text
    1.
    Type: Granted patent
    Status: In force

    Publication number: US09208232B1

    Publication date: 2015-12-08

    Application number: US13731891

    Application date: 2012-12-31

    Applicant: Google Inc.

    CPC classification number: G06F17/30864

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating synthetic descriptive text. One of the methods includes identifying a group of linking resources, wherein each of the linking resources includes a link to a respective target resource; determining, from a search engine index, that at least some of the target resources are associated with seed queries; generating term location information that identifies, for each seed query, locations of terms from the seed query in the linking resource that links to the target resource associated with the seed query; generating synthetic descriptive text for the target resources based on the term location information; and associating the synthetic descriptive text with the target resources in the search engine index.
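
The abstract's "term location information" step can be illustrated with a small sketch. Everything below is an assumption made for illustration: the whitespace tokenizer, the `[LINK]` placeholder token, the use of token offsets as "locations", and the function names `term_offsets` and `synthetic_text`. The patent abstract does not specify any of these.

```python
def term_offsets(tokens, link_index, seed_query):
    """Offsets, relative to the link token, where seed-query terms occur."""
    terms = set(seed_query.lower().split())
    return [i - link_index for i, tok in enumerate(tokens) if tok.lower() in terms]


def synthetic_text(tokens, link_index, offsets):
    """Read the tokens found at the learned offsets around another link."""
    picked = []
    for off in sorted(offsets):
        pos = link_index + off
        if 0 <= pos < len(tokens) and pos != link_index:
            picked.append(tokens[pos])
    return " ".join(picked)


# Hypothetical linking resource whose target has the seed query "blue widget".
seeded = "review of blue widget : [LINK] posted today".split()
offsets = term_offsets(seeded, seeded.index("[LINK]"), "blue widget")

# A second link on a similarly structured page, with no seed query for its target.
unseeded = "review of red gadget : [LINK] posted yesterday".split()
text = synthetic_text(unseeded, unseeded.index("[LINK]"), offsets)
print(offsets, text)
```

Here the seed query terms sit two and three tokens before the link, so the same positions around the second link yield "red gadget" as synthetic descriptive text for its target.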

    GENERATING DESCRIPTIVE TEXT FOR IMAGES
    2.
    Type: Patent application
    Status: In force

    Publication number: US20150161086A1

    Publication date: 2015-06-11

    Application number: US14211487

    Application date: 2014-03-14

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating descriptive text for images. In one aspect, a method includes identifying a set of seed descriptors for an image in a document that is hosted on a website. For each seed descriptor, structure information is generated that specifies a structure of the document with respect to the image and the seed descriptor. One or more templates are generated for each seed descriptor using the structure information for the seed descriptor. Each template can include image location information, document structure information, image feature information, and a generative rule that generates descriptive text for other images in other documents. Descriptive text for other images is generated using the templates and the other documents. The descriptive text is associated with the images.

    Systems and methods for providing search query refinements

    Publication number: US10223439B1

    Publication date: 2019-03-05

    Application number: US15348314

    Application date: 2016-11-10

    Applicant: Google Inc.

    Abstract: A system and method for generating query refinement suggestions may include collecting refinement data for at least one received source query. The collected refinement data is then clustered to form at least one cluster. At least one potential refinement query suggestion is identified from the refinement data within the at least one cluster.

    Determining a quality measure for a resource
    4.
    Type: Granted patent
    Status: In force

    Publication number: US09558233B1

    Publication date: 2017-01-31

    Application number: US13731354

    Application date: 2012-12-31

    Applicant: Google Inc.

    CPC classification number: G06F17/30386 G06F17/30 G06F17/30864

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a measure of quality for a resource. In one aspect, a method includes determining a seed score for each seed resource in a set. The seed score for a seed resource can be based on a number of resources that include a link to the seed resource and a number of selections of the links. A set of source resources is identified. A source score is determined for each source resource. The source score for a source resource is based on the seed score for each seed resource linked to by the source resource. Source-referenced resources are identified. A resource score is determined for each source-referenced resource. The resource score for a source-referenced resource can be based on the source score for each source resource that includes a link to the source-referenced resource.
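
The three-stage propagation in this abstract (seed scores, then source scores, then resource scores) can be sketched as follows. The abstract names the inputs but not the formulas, so the ratio used for the seed score and the plain averages used for the other two stages are assumptions, as are the function names and the sample data.

```python
from statistics import mean


def seed_score(num_linking_resources, num_link_selections):
    # Assumed formula: link selections per linking resource.
    if num_linking_resources == 0:
        return 0.0
    return num_link_selections / num_linking_resources


def propagate(seed_stats, links):
    """seed_stats: seed -> (linking resources, selections); links: source -> set of targets."""
    seeds = {s: seed_score(n, c) for s, (n, c) in seed_stats.items()}

    # Source score: assumed to be the mean seed score of the seeds a source links to.
    sources = {}
    for src, targets in links.items():
        linked = [seeds[t] for t in targets if t in seeds]
        if linked:
            sources[src] = mean(linked)

    # Resource score: assumed to be the mean score of the sources linking to it.
    incoming = {}
    for src, score in sources.items():
        for target in links[src]:
            if target not in seeds:
                incoming.setdefault(target, []).append(score)
    resources = {t: mean(scores) for t, scores in incoming.items()}
    return seeds, sources, resources


seed_stats = {"seedA": (10, 20), "seedB": (5, 5)}
links = {
    "src1": {"seedA", "seedB", "pageX"},
    "src2": {"seedB", "pageX", "pageY"},
}
seeds, sources, resources = propagate(seed_stats, links)
print(seeds, sources, resources)
```

With this data, `src1` averages the two seed scores (2.0 and 1.0) to 1.5, `src2` scores 1.0, and `pageX`, linked from both sources, receives 1.25.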

    Providing result-based query suggestions
    6.
    Type: Granted patent
    Status: In force

    Publication number: US09092528B1

    Publication date: 2015-07-28

    Application number: US14075366

    Application date: 2013-11-08

    Applicant: Google Inc.

    Abstract: In general, one aspect of the subject matter described can be embodied in a method that includes, for a first document that is included in first search results responsive to a first user-submitted query, selecting a plurality of previously submitted queries for which the first document was a responsive search result. The method can further include determining whether second documents that are relevant to the previously submitted query have at least a threshold level of diversity in comparison to the first search results, wherein second documents are determined to be relevant to the previously submitted query based on data that is indicative of user behavior. The method can additionally include identifying one or more queries from the selected previously submitted queries to provide as first suggested queries, and providing the one or more identified queries as first suggested queries with the first search results for the first user-submitted query.
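
The diversity check described above can be sketched in a few lines. The abstract does not define how diversity is measured, so the Jaccard-based measure, the 0.5 threshold, and the names `diversity` and `suggest` are assumptions chosen for illustration; the document identifiers are made up.

```python
def diversity(first_results, other_docs):
    """Assumed measure: 1 minus the Jaccard overlap of the two document sets."""
    a, b = set(first_results), set(other_docs)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def suggest(first_results, candidates, threshold=0.5):
    """Keep previously submitted queries whose relevant documents are diverse enough."""
    return [
        query
        for query, docs in candidates.items()
        if diversity(first_results, docs) >= threshold
    ]


first_results = ["d1", "d2", "d3"]
# Previously submitted queries for which d1 was a responsive result,
# mapped to documents assumed relevant based on user-behavior data.
candidates = {
    "jaguar car": ["d1", "d2", "d3"],
    "jaguar animal": ["d1", "d7", "d8"],
}
suggestions = suggest(first_results, candidates)
print(suggestions)
```

The first candidate returns the same documents as the original query and is dropped; the second shares only one of five distinct documents and is kept as a suggestion.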

    Personally identifiable information detection
    7.
    Type: Granted patent
    Status: In force

    Publication number: US09015802B1

    Publication date: 2015-04-21

    Application number: US14024943

    Application date: 2013-09-12

    Applicant: Google Inc.

    CPC classification number: G06F21/6245 G06F21/577 H04L63/0823

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for privacy protection. In one aspect, a method includes accessing personally identifiable information (PII) type definitions that characterize PII types; identifying PII type information included in content of a web page, the PII type information being information matching at least one PII type definition; identifying secondary information included in the content of the web page, the secondary information being information that is predefined as being associated with PII type information; determining a risk score from the PII type information and the secondary information; and classifying the web page as a personal information exposure risk if the risk score meets a confidentiality threshold, wherein the personal information exposure risk is indicative of the web page including personally identifiable information.
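
A minimal sketch of the scoring flow in this abstract follows. The regular expressions standing in for PII type definitions, the list of secondary terms, the weights, and the threshold value are all assumptions; the abstract only says that a risk score is derived from both kinds of information and compared against a confidentiality threshold.

```python
import re

# Hypothetical PII type definitions, expressed as regular expressions.
PII_TYPES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical secondary information: terms predefined as associated with PII.
SECONDARY_TERMS = ("date of birth", "password", "mother's maiden name")


def risk_score(text, pii_weight=1.0, secondary_weight=0.5):
    """Assumed score: weighted count of PII matches plus secondary-term hits."""
    pii_hits = sum(len(pattern.findall(text)) for pattern in PII_TYPES.values())
    lowered = text.lower()
    secondary_hits = sum(lowered.count(term) for term in SECONDARY_TERMS)
    return pii_weight * pii_hits + secondary_weight * secondary_hits


def is_exposure_risk(text, threshold=2.0):
    """Classify a page as an exposure risk when the score meets the threshold."""
    return risk_score(text) >= threshold


page = "Contact jane@example.com, SSN 123-45-6789, date of birth 1980-01-01"
print(risk_score(page), is_exposure_risk(page))
```

The sample page contains one email address, one SSN-like number, and one secondary term, giving a score of 2.5 under these weights, which meets the assumed threshold of 2.0.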

    SYSTEM AND METHOD FOR PROVIDING SEARCH QUERY REFINEMENTS
    8.
    Type: Patent application
    Status: In force

    Publication number: US20140149415A1

    Publication date: 2014-05-29

    Application number: US14169879

    Application date: 2014-01-31

    Applicant: Google Inc.

    Abstract: A system and method for providing search query refinements are presented. A stored query and a stored document are associated as a logical pairing. A weight is assigned to the logical pairing. The search query is issued and a set of search documents is produced. At least one search document is matched to at least one stored document. The stored query and the assigned weight associated with the matching at least one stored document are retrieved. At least one cluster is formed based on the stored query and the assigned weight associated with the matching at least one stored document. The stored query associated with the matching at least one stored document are scored for the at least one cluster relative to at least one other cluster. At least one such scored search query is suggested as a set of query refinements.

    Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
    9.
    Type: Granted patent
    Status: In force

    Publication number: US08626787B1

    Publication date: 2014-01-07

    Application number: US13922968

    Application date: 2013-06-20

    Applicant: Google Inc.

    Abstract: A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
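
The comparison of context data with and without a potential stopword can be sketched as below. This is an illustration under stated assumptions: the tiny in-memory index, the all-terms-must-match `retrieve` function, the Jaccard similarity, and the 0.8 cutoff for "substantially similar" are hypothetical and not taken from the patent.

```python
KNOWN_STOPWORDS = {"the", "a", "of", "in"}

# Hypothetical document index used as context data.
DOCS = {
    "d1": "the matrix film",
    "d2": "matrix multiplication tutorial",
    "d3": "matrix algebra notes",
    "d4": "history of the roman empire",
}


def retrieve(terms):
    """Toy retrieval: ids of documents containing every query term."""
    return [d for d, text in DOCS.items() if all(t in text.split() for t in terms)]


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def removable_stopwords(query, min_similarity=0.8):
    """Stopwords whose removal leaves the retrieved context substantially similar."""
    terms = query.lower().split()
    baseline = set(retrieve(terms))
    removable = []
    for term in terms:
        if term not in KNOWN_STOPWORDS:
            continue
        reduced = [t for t in terms if t != term]
        if jaccard(baseline, set(retrieve(reduced))) >= min_similarity:
            removable.append(term)
    return removable


print(removable_stopwords("history of roman empire"))
print(removable_stopwords("the matrix"))
```

In the first query, dropping "of" retrieves the same document, so it is treated as immaterial. In the second, dropping "the" pulls in two unrelated documents, so the stopword is kept as material to the search.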

    Query generation using structural similarity between documents

    Publication number: US09436747B1

    Publication date: 2016-09-06

    Application number: US14750483

    Application date: 2015-06-25

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer program products, for generating synthetic queries using seed queries and structural similarity between documents are described. In one aspect, a method includes identifying embedded coding fragments (e.g., HTML tag) from a structured document and a seed query; generating one or more query templates, each query template corresponding to at least one coding fragment, the query template including a generative rule to be used in generating candidate synthetic queries; generating the candidate synthetic queries by applying the query templates to other documents that are hosted on the same web site as the document; identifying terms that match structure of the query templates as candidate synthetic queries; measuring a performance for each of the candidate synthetic queries; and designating as synthetic queries the candidate synthetic queries that have performance measurements exceeding a performance threshold.
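
The template idea in this abstract can be sketched with regular expressions. This is a simplified illustration, not the patented method: the choice of a single enclosing HTML element as the "coding fragment", the regex-based generative rule, the function names, and the sample pages are all assumptions, and the performance-measurement and thresholding steps are left out.

```python
import re


def make_template(html, seed_query):
    """Build a regex capturing text that sits where the seed query sits in its tag."""
    m = re.search(
        r"<(\w+)([^>]*)>([^<]*)" + re.escape(seed_query) + r"([^<]*)</\1>", html
    )
    if m is None:
        return None
    tag, attrs, prefix, suffix = m.groups()
    return re.compile(
        "<" + tag + re.escape(attrs) + ">"
        + re.escape(prefix) + "(.+?)" + re.escape(suffix)
        + "</" + tag + ">"
    )


def candidate_queries(template, other_pages):
    """Apply the template to other pages from the same site."""
    found = []
    for page in other_pages:
        found.extend(match.strip() for match in template.findall(page))
    return found


seed_page = '<html><h1 class="title">Buy Acme Anvil online</h1></html>'
template = make_template(seed_page, "Acme Anvil")
others = [
    '<html><h1 class="title">Buy Road Runner Trap online</h1></html>',
    "<html><h2>About us</h2></html>",
]
candidates = candidate_queries(template, others)
print(candidates)
```

The template learned from the seed page matches the heading on the first of the other pages and extracts "Road Runner Trap" as a candidate synthetic query; the second page has no matching structure and contributes nothing.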
