Managing selection of a representative data subset according to user-specified parameters with clustering

Granted invention patent

US10585910B1 Managing selection of a representative data subset according to user-specified parameters with clustering (in force)

Patent title: Managing selection of a representative data subset according to user-specified parameters with clustering
Application number: US15421406

Filing date: 2017-01-31
Publication number: US10585910B1

Publication date: 2020-03-10
Inventors: R. David Carasso, Micah James Delfino
Applicant: Splunk Inc.
Applicant address: San Francisco, CA, US
Patentee: SPLUNK INC.
Current patentee: SPLUNK INC.
Current patentee address: San Francisco, CA, US
Agent: Perkins Coie LLP
Main classification: G06F16/00
IPC classification: G06F16/00; G06F16/25; G06F7/24; G06F3/0482; G06F16/28; G06F16/904; G06F3/0488

Abstract:

Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
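The iteration described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not name a clustering algorithm, so the digit-masking `signature` key, the function name `sample_representative`, and the parameters `batch_size`, `cluster_threshold`, and `outlier_max_size` are all hypothetical choices made for illustration. The sketch clusters an initial batch of records, checks whether the cluster count exceeds a threshold, and pulls additional records when it does not.

```python
import re
from collections import defaultdict
from itertools import islice


def signature(record: str) -> str:
    """Hypothetical clustering key: mask digit runs so similar lines collide."""
    return re.sub(r"\d+", "#", record)


def sample_representative(records, batch_size=4, cluster_threshold=2,
                          outlier_max_size=1):
    """Cluster records batch by batch until the cluster count exceeds a threshold.

    Returns (diverse, outliers): the first record of each cluster larger than
    outlier_max_size, and the first record of each cluster at or below it.
    All parameter names are assumptions, not terms from the patent.
    """
    clusters = defaultdict(list)
    source = iter(records)
    while True:
        batch = list(islice(source, batch_size))
        if not batch:
            break  # data source exhausted before the threshold was exceeded
        for rec in batch:
            clusters[signature(rec)].append(rec)
        # Iteration analysis: stop once enough clusters have been generated.
        if len(clusters) > cluster_threshold:
            break
    diverse = [m[0] for m in clusters.values() if len(m) > outlier_max_size]
    outliers = [m[0] for m in clusters.values() if len(m) <= outlier_max_size]
    return diverse, outliers
```

Treating small clusters as outliers and large clusters as the source of diverse records is one plausible reading of the abstract. A real system would likely use a similarity-based clustering method rather than exact key matching.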

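IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models are classified in G06N)
G06F16/00	Information retrieval; database structures; file system structures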