Data field extraction by a data intake and query system

    Publication No.: US12205022B2

    Publication Date: 2025-01-21

    Application No.: US16945415

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for extracting data fields from logs ingested in a data processing pipeline or otherwise stored. For example, a log can be applied as an input to an artificial intelligence model trained to infer a log sourcetype of logs, and the artificial intelligence model can output an inferred log sourcetype of the log. The inferred log sourcetype can be used to select another artificial intelligence model trained to extract data fields from logs having the inferred log sourcetype, and the log can then be applied as an input to the other artificial intelligence model. The other artificial intelligence model may then output one or more data fields extracted from the log.
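The two-stage flow in this abstract (infer a sourcetype, then route the log to an extractor selected by that sourcetype) can be sketched as below. This is only an illustration of the routing idea: the regular expressions are hypothetical stand-ins for the trained models, and the names `infer_sourcetype`, `EXTRACTORS`, and `extract_fields` are assumptions, not anything named in the patent.

```python
import re

# Hypothetical stand-ins for per-sourcetype extraction models.
ACCESS_RE = re.compile(r'(?P<client>\S+) .*"(?P<method>[A-Z]+) (?P<path>\S+)')
SYSLOG_RE = re.compile(r'(?P<host>\S+) (?P<process>\w+)(?:\[\d+\])?: (?P<message>.*)')


def infer_sourcetype(log: str) -> str:
    """Stand-in for the sourcetype inference model (a simple heuristic here)."""
    return "access" if "HTTP/" in log else "syslog"


def _extract_with(pattern: re.Pattern, log: str) -> dict:
    """Return the named groups of the first match, or an empty dict."""
    match = pattern.search(log)
    return match.groupdict() if match else {}


# One extractor per sourcetype; the inferred sourcetype selects which one runs.
EXTRACTORS = {
    "access": lambda log: _extract_with(ACCESS_RE, log),
    "syslog": lambda log: _extract_with(SYSLOG_RE, log),
}


def extract_fields(log: str):
    """Stage 1: infer the sourcetype. Stage 2: apply the matching extractor."""
    sourcetype = infer_sourcetype(log)
    return sourcetype, EXTRACTORS[sourcetype](log)


access_line = '10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 512'
syslog_line = "web01 sshd[412]: Failed password for root"
print(extract_fields(access_line))
print(extract_fields(syslog_line))
```

Keeping the extractors in a mapping keyed by sourcetype means a new sourcetype only needs a new entry, which mirrors the "select another model" step described above.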

    System and method for categorical drift detection

    Publication No.: US11995052B1

    Publication Date: 2024-05-28

    Application No.: US17591528

    Application Date: 2022-02-02

    Applicant: Splunk Inc.

    CPC classification number: G06F16/215

    Abstract: A computerized method for detecting categorical drift within an incoming data stream. An error threshold is computed based on a first set of training data samples selected to detect categorical drift occurring in a data stream. Thereafter, probability distributions associated with the content of first and second data samples of the data stream are computed. Analytics are conducted to compute a difference between content of the first probability distribution, which is based on a first data point of the first data sample, and content of the second probability distribution, which is based on a first data point of the second data sample. After computing the difference, a determination is made as to whether categorical drift has been detected.
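A minimal sketch of comparing the category distributions of two samples against a threshold follows. The abstract does not name the difference measure or say how the error threshold is derived from training data, so total variation distance and a hand-picked `threshold` are assumptions made purely for illustration, as are all function names.

```python
from collections import Counter


def category_distribution(sample):
    """Empirical probability of each category in a non-empty sample."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions (assumed metric)."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def drift_detected(first_sample, second_sample, threshold):
    """Flag drift when the distance between the two distributions exceeds the threshold."""
    p = category_distribution(first_sample)
    q = category_distribution(second_sample)
    return total_variation(p, q) > threshold


reference = ["a", "a", "b", "b"]
incoming = ["a", "a", "a", "a"]
print(total_variation(category_distribution(reference), category_distribution(incoming)))
print(drift_detected(reference, incoming, threshold=0.2))
```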

    System and method for changepoint detection in streaming data

    Publication No.: US11907227B1

    Publication Date: 2024-02-20

    Application No.: US17591511

    Application Date: 2022-02-02

    Applicant: Splunk, Inc.

    CPC classification number: G06F16/24568 G06F16/22 G06F16/2462 G06F16/24552

    Abstract: A computerized method is disclosed including operations of receiving a data stream, performing a changepoint detection resulting in a detection of changepoints in the data stream including: maintaining a listing of starting indices for each run within the data stream in a buffer of size L wherein each index of the listing has a run length probability representing a likelihood of being a changepoint, receiving a new data point within the data stream and adding a new index to the buffer resulting in the buffer having size L+1, calculating a posterior run length probability that the new data point is a changepoint, and removing an index from the listing that has a lowest run length probability thereby returning the buffer to size L, and responsive to determining the index removed from the listing does not correspond to the new data point, identifying a changepoint associated with the new data point.
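The buffer bookkeeping in this abstract can be sketched as a single pruning step. The sketch assumes the posterior run length probabilities for all L+1 candidate indices have already been computed elsewhere (the abstract does not give that calculation), and the name `prune_buffer` and the dictionary layout are hypothetical.

```python
def prune_buffer(probs, new_index):
    """Drop the least probable start index and report whether a changepoint is flagged.

    probs: dict mapping each candidate start index to its posterior run length
           probability; it holds L+1 entries, including new_index.
    Returns (pruned buffer of size L, removed index, changepoint flag).
    """
    removed = min(probs, key=probs.get)
    pruned = {index: p for index, p in probs.items() if index != removed}
    # Per the abstract: a changepoint is associated with the new data point
    # when the removed index is NOT the one that was just added.
    is_changepoint = removed != new_index
    return pruned, removed, is_changepoint


# The new index (25) survives pruning, so a changepoint is flagged.
print(prune_buffer({0: 0.50, 10: 0.05, 25: 0.45}, new_index=25))
# The new index is the least probable and is removed, so no changepoint.
print(prune_buffer({0: 0.90, 10: 0.09, 25: 0.01}, new_index=25))
```

Removing exactly one entry per arriving data point keeps memory bounded at L indices regardless of stream length.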

    Systems and methods for detecting DNS communications through time-to-live analyses

    Publication No.: US11477161B1

    Publication Date: 2022-10-18

    Application No.: US17514814

    Application Date: 2021-10-29

    Applicant: SPLUNK Inc.

    Abstract: A computerized method is disclosed that includes: accessing domain name server (DNS) record data including a plurality of DNS records spanning a first time period; performing a time-to-live (TTL) analysis to determine a TTL run length distribution for the DNS record data, wherein the TTL analysis includes generating a vector of the TTL values of each DNS record ordered sequentially in time, parsing the vector of the TTL values into segments, where a segment consists of one or more TTL values in which a current TTL value is less than an immediately preceding TTL value, and determining the TTL run length distribution; determining whether DNS beaconing is present based on a result of the TTL analysis; and, in response to determining that DNS beaconing is present, generating an alert for a system administrator.
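The segmentation step can be sketched as follows: a time-ordered TTL vector is split into runs in which each value is strictly less than the one before it, and the run lengths are tallied. The function names are hypothetical, and how the resulting distribution is judged to indicate beaconing is not stated in the abstract, so that decision is left out.

```python
from collections import Counter


def ttl_run_lengths(ttls):
    """Lengths of consecutive strictly decreasing runs in a time-ordered TTL list."""
    if not ttls:
        return []
    lengths = []
    current = 1
    for previous, value in zip(ttls, ttls[1:]):
        if value < previous:
            current += 1             # still counting down: extend the run
        else:
            lengths.append(current)  # TTL reset or repeated: close the run
            current = 1
    lengths.append(current)
    return lengths


def ttl_run_length_distribution(ttls):
    """How many runs of each length occur in the TTL vector."""
    return Counter(ttl_run_lengths(ttls))


ttls = [300, 200, 100, 300, 250, 300]
print(ttl_run_lengths(ttls))              # [3, 2, 1]
print(ttl_run_length_distribution(ttls))
```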

    LOG SOURCETYPE INFERENCE MODEL TRAINING FOR A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20220036002A1

    Publication Date: 2022-02-03

    Application No.: US16945448

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for training an artificial intelligence model to infer a log sourcetype of a log. For example, logs may have different log sourcetypes, and logs having the same log sourcetypes may have different messagetypes. The artificial intelligence model may be a machine learning model, and can be trained using training data that includes logs with known log sourcetypes. Each log can be tokenized, filtered, converted into a vector, and applied to a machine learning model as an input to perform the training. The machine learning model may output an inferred log sourcetype, which can be compared with the known log sourcetype to update model parameters to improve the machine learning model accuracy. The trained machine learning model may be trained to infer a log sourcetype of a log regardless of the messagetype of the log.

    DATA FIELD EXTRACTION MODEL TRAINING FOR A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20220035775A1

    Publication Date: 2022-02-03

    Application No.: US16945229

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for training an artificial intelligence model to extract one or more data fields from a log. For example, the artificial intelligence model may be a neural network. The neural network may be trained using training data obtained by iterating through a plurality of logs using active learning, and selecting a subset of the logs in the plurality to be labeled by a user. For example, the selected subset of logs may be logs that are not similar to other logs already labeled by a user. The user may be prompted to label the selected subset of logs to identify one or more data fields to extract. Once the selected subset of logs are labeled, these labeled logs can be used as the training data to train the neural network.
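The selection step (pick logs that are not similar to logs already labeled) can be sketched as below. The abstract does not say how similarity is measured, so Jaccard similarity over whitespace tokens and the `max_similarity` cutoff are assumptions for illustration, and the function names are hypothetical. The sketch also assumes each selected log will be labeled by the user, so it counts as labeled for later comparisons.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_for_labeling(logs, labeled, max_similarity=0.5):
    """Return the logs that differ enough from every labeled log to be worth labeling."""
    labeled_sets = [set(log.split()) for log in labeled]
    selected = []
    for log in logs:
        tokens = set(log.split())
        if all(jaccard(tokens, seen) <= max_similarity for seen in labeled_sets):
            selected.append(log)
            labeled_sets.append(tokens)  # treated as labeled from here on
    return selected


logs = [
    "user alice logged in",
    "user bob logged in",
    "disk sda1 is full",
]
print(select_for_labeling(logs, labeled=[]))
# ['user alice logged in', 'disk sda1 is full']
```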

    DATA FIELD EXTRACTION BY A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20220036177A1

    Publication Date: 2022-02-03

    Application No.: US16945415

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for extracting data fields from logs ingested in a data processing pipeline or otherwise stored. For example, a log can be applied as an input to an artificial intelligence model trained to infer a log sourcetype of logs, and the artificial intelligence model can output an inferred log sourcetype of the log. The inferred log sourcetype can be used to select another artificial intelligence model trained to extract data fields from logs having the inferred log sourcetype, and the log can then be applied as an input to the other artificial intelligence model. The other artificial intelligence model may then output one or more data fields extracted from the log.

    Systems and methods for DNS text classification

    Publication No.: US12056169B1

    Publication Date: 2024-08-06

    Application No.: US17513670

    Application Date: 2021-10-28

    Applicant: SPLUNK Inc.

    CPC classification number: G06F16/334 G06F16/35 G06N20/00

    Abstract: A computerized method is disclosed that includes operations of training a machine learning model using a labeled training set of data, wherein the machine learning model is configured to classify domain name server (DNS) records, obtaining DNS record data including at least a first DNS Txt record, applying the trained machine learning model to the first DNS Txt record to classify the first DNS Txt record and responsive to the classification of the first DNS Txt record, generating a flag for a system administrator. The trained machine learning model may classify the first DNS Txt record using logistic regression. In some instances, applying the trained machine learning model to the first DNS Txt record includes performing a tokenizing operation on the first DNS Txt record to generate a tokenized first DNS Txt record.

    Log sourcetype inference model training for a data intake and query system

    Publication No.: US11704490B2

    Publication Date: 2023-07-18

    Application No.: US16945448

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    CPC classification number: G06F40/284 G06F16/3347 G06F40/242 G06N5/04 G06N20/00

    Abstract: Systems and methods are described for training an artificial intelligence model to infer a log sourcetype of a log. For example, logs may have different log sourcetypes, and logs having the same log sourcetypes may have different messagetypes. The artificial intelligence model may be a machine learning model, and can be trained using training data that includes logs with known log sourcetypes. Each log can be tokenized, filtered, converted into a vector, and applied to a machine learning model as an input to perform the training. The machine learning model may output an inferred log sourcetype, which can be compared with the known log sourcetype to update model parameters to improve the machine learning model accuracy. The trained machine learning model may be trained to infer a log sourcetype of a log regardless of the messagetype of the log.
