Probabilistic indexing of textual data

    Publication number: US11138246B1

    Publication date: 2021-10-05

    Application number: US15194339

    Filing date: 2016-06-27

    Abstract: Techniques for searching a corpus of textual data using probabilistic data structures are described herein. The corpus of textual data is indexed using the probabilistic data structure on a piece-by-piece basis, and the pieces are combined so that the textual data can be searched. Search results are returned that indicate a likelihood that a searched-for data item is in the textual data.
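
As an illustration of piece-by-piece probabilistic indexing, the following is a minimal sketch that assumes a Bloom filter as the probabilistic data structure (the abstract does not name one). The class `BloomFilter`, the helper `index_piece`, the bit-array size, and the hash scheme are all hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # Filters built with the same parameters combine with a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share parameters to be combined")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def index_piece(text):
    """Index one piece of text by adding each lowercase token to a new filter."""
    bloom = BloomFilter()
    for token in text.lower().split():
        bloom.add(token)
    return bloom


# Index each piece separately, then combine the per-piece filters.
pieces = ["error connecting to database", "request completed in 12 ms"]
combined = index_piece(pieces[0])
for piece in pieces[1:]:
    combined = combined.union(index_piece(piece))

print(combined.might_contain("database"))
```

A positive answer from `might_contain` only indicates that the term is probably in the corpus, since Bloom filters admit false positives but never false negatives.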

    Partitioned search of log events
    Granted patent

    Publication number: US10235417B1

    Publication date: 2019-03-19

    Application number: US14843850

    Filing date: 2015-09-02

    Abstract: A technology is provided for enabling a partitioned search to be performed on log events from multiple log streams that are stored by multiple hosts. A search query may be submitted to identify the log streams whose log events are to be searched and to indicate a time interval in which log events are to have occurred as indicated by the log events' time stamps. The multiple hosts may search stored log events in parallel and return a set of log-event search results satisfying the search query. A pagination token can be included with the set of log-event search results. The pagination token may be used to resume the search if the multiple hosts were not able to completely finish searching the stored log events before the set of log-event search results had to be returned to prevent a timeout of a search client.
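
The resume-by-token idea can be sketched as follows. This is a simplified, sequential illustration under stated assumptions: the token format (base64-encoded JSON of per-stream offsets), the function names, and the use of a scan budget in place of a wall-clock timeout are all hypothetical, and the parallel search across hosts described in the abstract is not modeled.

```python
import base64
import json


def encode_token(offsets):
    """Serialize per-stream offsets into an opaque string token."""
    raw = json.dumps(offsets, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def search(streams, term, start, end, budget, token=None):
    """Scan at most `budget` events; return (matches, next_token or None)."""
    offsets = decode_token(token) if token else {name: 0 for name in streams}
    matches, scanned = [], 0
    for name in sorted(streams):
        events = streams[name]
        while offsets[name] < len(events):
            if scanned >= budget:
                # Out of budget with work remaining: hand back a resume token.
                return matches, encode_token(offsets)
            timestamp, message = events[offsets[name]]
            offsets[name] += 1
            scanned += 1
            if start <= timestamp <= end and term in message:
                matches.append((name, timestamp, message))
    return matches, None


# Hypothetical log streams: stream name -> list of (timestamp, message).
streams = {
    "host-a": [(1, "boot ok"), (2, "disk error"), (3, "disk error again")],
    "host-b": [(2, "net error"), (9, "late error")],
}
first, tok = search(streams, "error", start=1, end=5, budget=3)
second, tok2 = search(streams, "error", start=1, end=5, budget=3, token=tok)
print(first, second)
```

The client repeats the call with the returned token until the token comes back as `None`, which signals that every selected stream has been fully searched.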

    Dynamic clustering for unstructured data

    Publication number: US10331722B1

    Publication date: 2019-06-25

    Application number: US15607162

    Filing date: 2017-05-26

    Abstract: A dynamic clustering algorithm is used to process log data to generate pattern information. A word frequency map may be generated and/or updated based at least in part on entries of the log data. The word frequency map may indicate occurrences of words in the log data. In addition, a modified word frequency map may be determined based at least in part on the frequency of adjacent words as indicated in the word frequency map. Based at least in part on the modified word frequency map, a line threshold is determined. The line threshold indicates a common frequency indicated in the modified word frequency map. The line threshold may then be used to generate a pattern for an entry of the log data.
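
A heavily simplified sketch of frequency-based pattern generation follows. It is not the patented algorithm: it omits the modified word frequency map built from adjacent words, and it assumes that the line threshold is the most common frequency among a line's words. The function names and the `<*>` wildcard are hypothetical.

```python
from collections import Counter


def build_frequency_map(lines):
    """Count how often each whitespace-separated word occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def line_pattern(line, counts):
    """Keep words at or above the line threshold; replace the rest with a wildcard."""
    words = line.split()
    if not words:
        return ""
    # Assumed stand-in for the line threshold: the most common frequency
    # value among the words of this line.
    frequencies = [counts[word] for word in words]
    threshold = Counter(frequencies).most_common(1)[0][0]
    return " ".join(word if counts[word] >= threshold else "<*>" for word in words)


log_lines = [
    "user alice logged in",
    "user bob logged in",
    "user carol logged in",
]
frequency_map = build_frequency_map(log_lines)
patterns = {line_pattern(line, frequency_map) for line in log_lines}
print(patterns)
```

Words that recur across entries survive as the constant part of the pattern, while rare words such as user names collapse into the wildcard, so the three sample lines share one pattern.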

    Data log stream processing using probabilistic data structures

    Publication number: US10853359B1

    Publication date: 2020-12-01

    Application number: US14977497

    Filing date: 2015-12-21

    Abstract: A computing resource monitoring service receives a request to obtain data for various computing resources. The service obtains, from the various computing resources, one or more data log streams that include the requested data. The service utilizes the one or more data log streams to generate a probabilistic data structure that can be used to indicate that data log streams have been processed. If the one or more data log streams are not completely processed prior to the end of an allotted time period for processing of the request, the service generates a token that specifies partially processed data log streams and the probabilistic data structure. The token can be used to enable resumption of processing of the request.

    Clustered architecture design
    Granted patent

    Publication number: US10178021B1

    Publication date: 2019-01-08

    Application number: US14981646

    Filing date: 2015-12-28

    Abstract: Systems and methods are provided for organizing data channels and processing hosts included in a system into clusters. A cluster management service may receive data from a stream of data and may route the data to a cluster associated with the data stream. A data channel routing service included in the cluster may route the data to the set of processing hosts included in the cluster through multiple data channels included in the cluster. In some instances, the data channel routing service may use any of the data channels to send data to the set of processing hosts. Because incoming data may be distributed among multiple data channels, the cluster may experience less congestion. Further, the system may also process the stream of data using the same processing hosts by routing the stream of data to the same cluster, thereby avoiding split processing of the data stream.

    Verifiable record storage service
    Patent application

    Publication number: US20190007393A1

    Publication date: 2019-01-03

    Application number: US16127091

    Filing date: 2018-09-10

    Abstract: A record storage system maintains an interdependent series of hash values for records submitted to the record storage service by one or more clients. The record storage service generates a hash value for each record based at least in part on the content of the record and a hash value of one or more previous records. In some examples, the generated hash values are saved in an audit database by the clients. Clients may retain some, all, or none of the hash values based on the amount of auditing desired and the amount of storage space available in the audit database. The clients are able to verify the integrity of records submitted to the record storage system by retrieving the records from the system, recalculating the hash values of the records, and comparing the recalculated hash values to the hash values retained by the client.

    Verifiable log service
    Granted patent

    Publication number: US10075425B1

    Publication date: 2018-09-11

    Application number: US15249136

    Filing date: 2016-08-26

    CPC classification number: H04L63/123 H04L9/3239 H04L63/1425 H04L2209/38

    Abstract: A logging service maintains an interdependent series of hash values for log entries submitted to the logging service by one or more clients. The logging service generates a hash value for each log entry based at least in part on the content of the log entry and a hash value of one or more previous log entries. The generated hash values are saved in an audit database by the clients. Clients may retain some, all, or none of the hash values based at least in part on the amount of auditing desired and the amount of storage space available in the audit database. The clients are able to verify the integrity of log entries submitted to the logging service by retrieving the log entries from the logging service, recalculating the hash values, and comparing the recalculated hash values to the hash values in the audit database.
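
The interdependent series of hash values can be sketched as a simple hash chain. This is a minimal illustration assuming SHA-256, a chain that depends only on the immediately preceding hash, and an all-zero starting value; the names `chain_hash`, `append_entries`, `verify`, and `GENESIS` are hypothetical and do not come from the patent.

```python
import hashlib

# Assumed starting value for the chain before any entry exists.
GENESIS = "0" * 64


def chain_hash(previous_hash, entry):
    """Hash an entry together with the hash of the previous entry."""
    data = previous_hash.encode("ascii") + b"\n" + entry.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_entries(entries, previous_hash=GENESIS):
    """Return the chained hash for each entry, in order."""
    hashes = []
    for entry in entries:
        previous_hash = chain_hash(previous_hash, entry)
        hashes.append(previous_hash)
    return hashes


def verify(entries, retained):
    """Recompute the chain and compare against the hashes a client retained.

    `retained` maps entry index -> hash, so a client may keep only some hashes.
    """
    recomputed = append_entries(entries)
    return all(recomputed[index] == value for index, value in retained.items())


log_entries = ["login alice", "delete file.txt", "logout alice"]
hashes = append_entries(log_entries)

# The client keeps only the last hash, which still covers all earlier entries.
audit_db = {2: hashes[2]}
print(verify(log_entries, audit_db))

tampered = ["login alice", "delete other.txt", "logout alice"]
print(verify(tampered, audit_db))
```

Because each hash depends on its predecessor, retaining even a single late hash lets a client detect changes to any earlier entry, which is why the number of retained hashes can be traded against storage space.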

    Metrics prediction using dynamic confidence coefficients

    Publication number: US11295224B1

    Publication date: 2022-04-05

    Application number: US15373369

    Filing date: 2016-12-08

    Abstract: A method includes obtaining time series data for a usage or performance metric for computing resources in a service provider network comprising a plurality of observations recorded in a plurality of respective time steps. A prediction error is determined for a previous prediction of an observation in the time series data. The prediction error is used to update a standard deviation of a set of prediction errors for the usage or performance metric. The standard deviation and the prediction error are then used to update a confidence coefficient. A prediction limit for the usage or performance metric is then determined based on an expected value, the confidence coefficient, and the standard deviation. One or more events may be generated based on the prediction limit, which may be used to trigger a reconfiguration or auto-scaling of the computing resources.
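
The update loop described in the abstract can be sketched as below. The abstract does not state how the confidence coefficient is updated, so the rule here (raise it when an error exceeds the current band, otherwise let it decay toward a floor) is an assumption, as are the class name `PredictionLimiter`, its parameters, and the form limit = expected value + coefficient × standard deviation.

```python
import math


class PredictionLimiter:
    """Tracks prediction errors and derives an upper prediction limit."""

    def __init__(self, coefficient=2.0, step=0.5, decay=0.05, floor=1.0):
        self.coefficient = coefficient
        self.step = step      # assumed increase when an error escapes the band
        self.decay = decay    # assumed decrease when an error stays inside it
        self.floor = floor    # assumed lower bound for the coefficient
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations (Welford)

    @property
    def std(self):
        """Population standard deviation of the errors seen so far."""
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

    def update(self, predicted, observed):
        error = observed - predicted
        # Welford's online update of the error mean and variance.
        self.count += 1
        delta = error - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (error - self.mean)
        # Assumed coefficient rule: widen after a surprise, else tighten.
        std = self.std
        if std > 0 and abs(error) > self.coefficient * std:
            self.coefficient += self.step
        else:
            self.coefficient = max(self.floor, self.coefficient - self.decay)
        return error

    def upper_limit(self, expected):
        return expected + self.coefficient * self.std


limiter = PredictionLimiter()
for predicted, observed in [(10, 11), (10, 9), (10, 11), (10, 9)]:
    limiter.update(predicted, observed)
print(limiter.upper_limit(10))
```

An observation above `upper_limit(expected)` could then raise an event, for example to trigger auto-scaling, while the adaptive coefficient keeps the band from staying too tight after a burst of large errors.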

    Verifiable record storage service
    Granted patent

    Publication number: US10904264B2

    Publication date: 2021-01-26

    Application number: US16127091

    Filing date: 2018-09-10

    Abstract: A record storage system maintains an interdependent series of hash values for records submitted to the record storage service by one or more clients. The record storage service generates a hash value for each record based at least in part on the content of the record and a hash value of one or more previous records. In some examples, the generated hash values are saved in an audit database by the clients. Clients may retain some, all, or none of the hash values based on the amount of auditing desired and the amount of storage space available in the audit database. The clients are able to verify the integrity of records submitted to the record storage system by retrieving the records from the system, recalculating the hash values of the records, and comparing the recalculated hash values to the hash values retained by the client.
