Abstract:
Event data comprising an unordered string set may be received. String set dictionary indexes may be assigned for strings of the unordered string set in a string set dictionary. The unordered string set may be sorted to provide a sorted series based on the string set dictionary indexes for the unordered string set. A differential series may be computed from the sorted series. The differential series may be encoded into binary code words. In an embodiment, the event data also may comprise strings. A schema version associated with the strings in a row may be determined. Computing resources may be allocated based on the schema version.
Abstract:
Techniques provided herein allow for management of data. In various embodiments, systems and methods prune and retain data being managed by a data management system, where the managed data can include log data aggregated from one or more servers for analysis purposes. According to some embodiments, pruning can be triggered according to one or more constraints, such as the age of managed data (e.g., retain only 30 days of managed data) or the memory space required to store the managed data (e.g., retain only 100 GB worth of managed data). The constraints that trigger data pruning can be based on a data retention policy. When triggered, pruning can be performed on a fraction of the managed data stored based on the data retention policy (e.g., 3 days of full managed data, 27 days of pruned managed data). The pruning may be performed by sampling, at a desired rate, the managed data.
Abstract:
Event data comprising an unordered string set may be received. String set dictionary indexes may be assigned for strings of the unordered string set in a string set dictionary. The unordered string set may be sorted to provide a sorted series based on the string set dictionary indexes for the unordered string set. A differential series may be computed from the sorted series. The differential series may be encoded into binary code words. In an embodiment, the event data also may comprise strings. A schema version associated with the strings in a row may be determined. Computing resources may be allocated based on the schema version.
Abstract:
A query may be provided to aggregators at hierarchical levels in an in-memory data storage module. The query may be provided to leaf nodes of the in-memory data storage module. The leaf nodes may execute the query, returning results of the query to the aggregators. One or more aggregations may be performed based on the results. In an embodiment, log entries associated with a logged event may be serialized and divided into distributed chunks for storage in the leaf nodes. A leaf node, from the leaf nodes, having storage capacity for a distributed chunk may be identified. The distributed chunk may be stored in the leaf node.
Abstract:
A query may be provided to aggregators at hierarchical levels in an in-memory data storage module. The query may be provided to leaf nodes of the in-memory data storage module. The leaf nodes may execute the query, returning results of the query to the aggregators. One or more aggregations may be performed based on the results. In an embodiment, log entries associated with a logged event may be serialized and divided into distributed chunks for storage in the leaf nodes. A leaf node, from the leaf nodes, having storage capacity for a distributed chunk may be identified. The distributed chunk may be stored in the leaf node.
Abstract:
Techniques provided herein allow for management of data. In various embodiments, systems and methods prune and retain data being managed by a data management system, where the managed data can include log data aggregated from one or more servers for analysis purposes. According to some embodiments, pruning can be triggered according to one or more constraints, such as the age of managed data (e.g., retain only 30 days of managed data) or the memory space required to store the managed data (e.g., retain only 100 GB worth of managed data). The constraints that trigger data pruning can be based on a data retention policy. When triggered, pruning can be performed on a fraction of the managed data stored based on the data retention policy (e.g., 3 days of full managed data, 27 days of pruned managed data). The pruning may be performed by sampling, at a desired rate, the managed data.
Abstract:
Techniques provided herein allow for management of data. In various embodiments, systems and methods prune and retain data being managed by a data management system, where the managed data can include log data aggregated from one or more servers for analysis purposes. According to some embodiments, pruning can be triggered according to one or more constraints, such as the age of managed data (e.g., retain only 30 days of managed data) or the memory space required to store the managed data (e.g., retain only 100 GB worth of managed data). The constraints that trigger data pruning can be based on a data retention policy. When triggered, pruning can be performed on a fraction of the managed data stored based on the data retention policy (e.g., 3 days of full managed data, 27 days of pruned managed data). The pruning may be performed by sampling, at a desired rate, the managed data.
Abstract:
Techniques provided herein allow for estimating data missing in query results provided in response to queries performed on data managed by a data management system. In the event that one or more leaf nodes are unable or unavailable to process a query, a final query result provided in response to the original query may be missing data that exists on those leaf nodes. A data accounting service monitors what managed data is being stored on the leaf nodes and on what leaf node. The data accounting service can estimate how much data is missing from a final query result when one or more of the leaf nodes are unable or unavailable to process a query.
Abstract:
Techniques provided herein allow for estimating data missing in query results provided in response to queries performed on data managed by a data management system. In the event that one or more leaf nodes are unable or unavailable to process a query, a final query result provided in response to the original query may be missing data that exists on those leaf nodes. A data accounting service monitors what managed data is being stored on the leaf nodes and on what leaf node. The data accounting service can estimate how much data is missing from a final query result when one or more of the leaf nodes are unable or unavailable to process a query.