-
1.
Publication No.: US20150066934A1
Publication date: 2015-03-05
Application No.: US14480528
Filing date: 2014-09-08
Applicant: Yahoo! Inc.
Inventor: Lei Duan, Fan Li, Srinivas Vadrevu, Emre Velipasaoglu, Swapnil Hajela, Deepayan Chakrabarti
CPC classification number: G06F17/30598, G06F15/18, G06F17/30873, G06K9/6256, G06N5/04, G06N99/005, G06Q10/10
Abstract: Exemplary methods and apparatuses are provided which may be used for classifying and indexing segmented portions of web pages and providing related information for use in information extraction and/or information retrieval systems.
-
2.
Publication No.: US20160378858A1
Publication date: 2016-12-29
Application No.: US15261546
Filing date: 2016-09-09
Applicant: Yahoo! Inc.
Inventor: Srinivas Vadrevu, Yi Chang, Zhaohui Zheng, Bo Long
IPC: G06F17/30
CPC classification number: G06F16/35, G06F16/334, G06F16/338, G06F16/93
Abstract: One particular embodiment clusters a plurality of documents using one or more clustering algorithms to obtain one or more first sets of clusters, wherein: each first set of clusters results from clustering the documents using one of the clustering algorithms; and with respect to each first set of clusters, each of the documents belongs to one of the clusters from the first set of clusters; accesses a search query; identifies a search result in response to the search query, wherein the search result comprises two or more of the documents; and clusters the search result to obtain a second set of clusters, wherein each document of the search result belongs to one of the clusters from the second set of clusters.
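The two-stage flow in this abstract (cluster the whole collection offline, then cluster the documents returned for a query) can be illustrated with a minimal Python sketch. The abstract does not say which clustering algorithms are used or how the second set of clusters is derived, so everything here is an assumption for illustration: the toy "algorithms" `cluster_by_first_token` and `cluster_by_length`, the substring-matching `search`, and the choice to form the second set by grouping results under their offline labels are all hypothetical.

```python
from collections import defaultdict


def cluster_by_first_token(docs):
    """Toy offline clustering (hypothetical): label = first word of the text."""
    return {doc_id: text.split()[0].lower() for doc_id, text in docs.items()}


def cluster_by_length(docs, max_short=4):
    """Toy offline clustering (hypothetical): label = 'short' or 'long'."""
    return {
        doc_id: "short" if len(text.split()) <= max_short else "long"
        for doc_id, text in docs.items()
    }


def offline_clusterings(docs, algorithms):
    """Build one first set of clusters per algorithm.

    Each first set maps every document to exactly one cluster label.
    """
    return {name: algo(docs) for name, algo in algorithms.items()}


def search(docs, query):
    """Stand-in retrieval (assumption): case-insensitive substring match."""
    q = query.lower()
    return [doc_id for doc_id, text in docs.items() if q in text.lower()]


def cluster_search_result(result_ids, first_set):
    """Build the second set of clusters for one search result.

    Assumption: results are grouped by the label they received offline,
    so each result document lands in exactly one second-stage cluster.
    """
    groups = defaultdict(list)
    for doc_id in result_ids:
        groups[first_set[doc_id]].append(doc_id)
    return dict(groups)


docs = {
    "d1": "Apple releases new phone",
    "d2": "Apple pie recipe with cinnamon and sugar",
    "d3": "Phone makers report earnings",
    "d4": "Apple orchard tours open",
}
first_sets = offline_clusterings(
    docs, {"first_token": cluster_by_first_token, "length": cluster_by_length}
)
result = search(docs, "apple")
second_set = cluster_search_result(result, first_sets["length"])
```

Reusing offline labels at query time is only one possible reading; a system could equally re-run a clustering algorithm over the result documents alone.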
-
3.
Publication No.: US09514216B2
Publication date: 2016-12-06
Application No.: US14480528
Filing date: 2014-09-08
Applicant: Yahoo! Inc.
Inventor: Lei Duan, Fan Li, Srinivas Vadrevu, Emre Velipasaoglu, Swapnil Hajela, Deepayan Chakrabarti
CPC classification number: G06F17/30598, G06F15/18, G06F17/30873, G06K9/6256, G06N5/04, G06N99/005, G06Q10/10
Abstract: Exemplary methods and apparatuses are provided which may be used for classifying and indexing segmented portions of web pages and providing related information for use in information extraction and/or information retrieval systems. In an embodiment, an index of segmented portions may be used by a search engine to respond to a search query. In an embodiment, one or more machine learned models may be used to identify one or more feature properties of a plurality of segmented portions within one or more files, or otherwise inferable from the one or more files. In an embodiment, one or more machine learned models may be used to classify one or more of a plurality of segmented portions as being at least one of a plurality of segment types.
-
4.
Publication No.: US20140222800A1
Publication date: 2014-08-07
Application No.: US14249055
Filing date: 2014-04-09
Applicant: YAHOO! INC.
Inventor: Srinivas Vadrevu, Su-Lin Wu, Ben Shahshahani
IPC: G06F17/30
CPC classification number: G06F17/3053, G06F17/30477, G06F17/30705, G06F17/30864, G06F17/30867, G06F17/3087
Abstract: News search and browse experience is personalized based on user preferences. User attributes like a geographic location are obtained and news sources preferred by other users with attributes similar to those of a requesting user are identified. News sources that are popular across different user groups are eliminated and relevant news items from the remaining news sources are retrieved and presented to the requesting user.
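The steps in this abstract (group users by an attribute, find sources preferred by similar users, drop sources popular across groups, return relevant items from what remains) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patented method: the data layout, the use of location as the only attribute, the `max_groups` threshold for "popular across different user groups", the keyword-based relevance test, and all source names are hypothetical.

```python
from collections import Counter, defaultdict


def preferred_sources_by_group(other_users):
    """Map each location to a Counter of sources its users prefer."""
    prefs = defaultdict(Counter)
    for user in other_users:
        prefs[user["location"]].update(user["sources"])
    return prefs


def distinctive_sources(prefs, location, max_groups=1):
    """Sources preferred in `location` that are not popular across groups.

    Assumption: a source counts as 'popular across different user groups'
    when it is preferred in more than `max_groups` locations.
    """
    groups_per_source = Counter()
    for counter in prefs.values():
        groups_per_source.update(counter.keys())
    return {
        source
        for source in prefs.get(location, {})
        if groups_per_source[source] <= max_groups
    }


def personalized_news(items, other_users, location, query):
    """Return items relevant to `query` from the remaining sources.

    Relevance here is a simple case-insensitive title match (assumption).
    """
    prefs = preferred_sources_by_group(other_users)
    keep = distinctive_sources(prefs, location)
    q = query.lower()
    return [
        item for item in items
        if item["source"] in keep and q in item["title"].lower()
    ]


# Hypothetical users and news items, for illustration only.
other_users = [
    {"location": "Austin", "sources": ["austin-daily", "global-wire"]},
    {"location": "Austin", "sources": ["austin-daily"]},
    {"location": "Boston", "sources": ["boston-post", "global-wire"]},
]
items = [
    {"source": "austin-daily", "title": "City council votes on transit plan"},
    {"source": "global-wire", "title": "Transit strikes spread"},
    {"source": "boston-post", "title": "Transit fares rise"},
]
feed = personalized_news(items, other_users, "Austin", "transit")
```

In this example `global-wire` is dropped because users in both locations prefer it, and `boston-post` is dropped because no Austin user prefers it, so only the `austin-daily` item is returned for an Austin reader.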
-