Mechanism for stream processing efficiency using probabilistic model to reduce data redundancy
Abstract:
A method and system for deduplicating data streams in a multi-tenant system. The method receives, at a data accuracy manager, an event from an activity tracking component; determines whether the event is recorded in a probabilistic model that tracks previously received events from the activity tracking component, where the probabilistic model can accurately identify that the event has not been previously received but may return a false positive response indicating that the event has been previously received; determines whether information for the event is stored in a metric storage, where the metric storage is a database of metrics derived from the previously received events; and discards the event in response to determining that the event is recorded in both the probabilistic model and the metric storage.
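The two-stage check described in the abstract can be illustrated with a minimal Python sketch. It assumes the probabilistic model is a Bloom filter, which the abstract does not name, and it uses an in-memory dictionary as a stand-in for the metric storage. The names `BloomFilter`, `DataAccuracyManager`, `handle`, and `event_id` are hypothetical and chosen only for illustration; this is not the patented implementation.

```python
import hashlib


class BloomFilter:
    """Assumed probabilistic model: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[pos] for pos in self._positions(key))


class DataAccuracyManager:
    """Hypothetical manager that discards events found in both stores."""

    def __init__(self):
        self.seen = BloomFilter()
        self.metric_storage = {}  # stand-in for the metrics database

    def handle(self, event_id, metric):
        """Return True if the event was accepted, False if it was discarded."""
        in_model = self.seen.might_contain(event_id)
        # The second lookup guards against a false positive from the model.
        if in_model and event_id in self.metric_storage:
            return False
        self.seen.add(event_id)
        self.metric_storage[event_id] = metric
        return True


manager = DataAccuracyManager()
first = manager.handle("evt-1", 10)    # new event, accepted
repeat = manager.handle("evt-1", 10)   # duplicate, discarded
other = manager.handle("evt-2", 7)     # different event, accepted
```

In this sketch the model's negative answer is trusted on its own, while a positive answer leads to a discard only when the metric storage also holds the event, so a false positive from the model cannot cause a new event to be dropped.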