- Patent title: Managing selection of a representative data subset according to user-specified parameters with clustering
- Application number: US15421406
- Filing date: 2017-01-31
- Publication number: US10585910B1
- Publication date: 2020-03-10
- Inventors: R. David Carasso, Micah James Delfino
- Applicant: Splunk Inc.
- Applicant address: US CA San Francisco
- Assignee: SPLUNK INC.
- Current assignee: SPLUNK INC.
- Current assignee address: US CA San Francisco
- Agent: Perkins Coie LLP
- Main classification: G06F16/00
- IPC classifications: G06F16/00; G06F16/25; G06F7/24; G06F3/0482; G06F16/28; G06F16/904; G06F3/0488
Abstract:
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
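The iterative clustering described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not name a clustering algorithm, so the token-based Jaccard similarity, the greedy assignment to the first matching cluster, and every name here (`cluster_until_enough`, `cluster_threshold`, `similarity`, `diverse_subset`, `outlier_subset`) are assumptions chosen for illustration. Records are clustered batch by batch, and after each batch the cluster count is checked against a threshold; more records are pulled only while the threshold has not been exceeded.

```python
from typing import Iterable, Iterator, List


def tokens(record: str) -> frozenset:
    """Split an unstructured record into a set of lowercase tokens."""
    return frozenset(record.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def batches(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` records."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def cluster_until_enough(records: Iterable[str], batch_size: int = 4,
                         cluster_threshold: int = 2,
                         similarity: float = 0.5) -> List[List[str]]:
    """Cluster records batch by batch until the cluster count exceeds
    `cluster_threshold` or the records run out.

    Each cluster is a list whose first element acts as its representative.
    """
    clusters: List[List[str]] = []
    for batch in batches(records, batch_size):
        for record in batch:
            for cluster in clusters:
                # Greedy assignment: join the first sufficiently similar cluster.
                if jaccard(tokens(record), tokens(cluster[0])) >= similarity:
                    cluster.append(record)
                    break
            else:
                clusters.append([record])  # no match, so start a new cluster
        # Iteration analysis: stop once enough clusters have been generated.
        if len(clusters) > cluster_threshold:
            break
    return clusters


def diverse_subset(clusters: List[List[str]], k: int) -> List[str]:
    """One representative from each of the k largest clusters."""
    ranked = sorted(clusters, key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]


def outlier_subset(clusters: List[List[str]], k: int) -> List[str]:
    """One representative from each of the k smallest clusters."""
    ranked = sorted(clusters, key=len)
    return [cluster[0] for cluster in ranked[:k]]
```

In this sketch a "diverse" subset draws from the most populated clusters and an "outlier" subset from the least populated ones; that mapping is one plausible reading of the abstract rather than a detail it states.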