-
Publication No.: US12205022B2
Publication Date: 2025-01-21
Application No.: US16945415
Application Date: 2020-07-31
Applicant: Splunk Inc.
Inventor: Ram Sriharsha , Zhaohui Wang , Kristal Curtis
IPC: G06N3/08 , G06F16/23 , G06F16/245
Abstract: Systems and methods are described for extracting data fields from logs ingested in a data processing pipeline or otherwise stored. For example, a log can be applied as an input to an artificial intelligence model trained to infer a log sourcetype of logs, and the artificial intelligence model can output an inferred log sourcetype of the log. The inferred log sourcetype can be used to select another artificial intelligence model trained to extract data fields from logs having the inferred log sourcetype, and the log can then be applied as an input to the other artificial intelligence model. The other artificial intelligence model may then output one or more data fields extracted from the log.
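The two-stage flow in this abstract (infer the sourcetype, then route the log to an extractor chosen for that sourcetype) can be sketched as follows. This is only an illustration: the keyword rule and the regular expressions stand in for the trained models, and every name here (`infer_sourcetype`, `EXTRACTORS`, the sourcetype labels, the sample field names) is hypothetical rather than taken from the patent.

```python
import re
from typing import Callable, Dict, Tuple


def infer_sourcetype(log: str) -> str:
    """Stand-in for the first model: a trivial keyword rule, not a trained model."""
    return "access_combined" if '"GET ' in log or '"POST ' in log else "syslog"


def extract_access(log: str) -> Dict[str, str]:
    """Stand-in for a per-sourcetype extraction model (web access logs)."""
    m = re.search(r'^(?P<client>\S+) .*"(?P<method>[A-Z]+) (?P<path>\S+)', log)
    return m.groupdict() if m else {}


def extract_syslog(log: str) -> Dict[str, str]:
    """Stand-in for a per-sourcetype extraction model (syslog-style lines)."""
    m = re.search(r'^(?P<host>\S+) (?P<process>[^:\[]+)', log)
    return m.groupdict() if m else {}


# One extractor per sourcetype; the inferred sourcetype selects which one runs.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "access_combined": extract_access,
    "syslog": extract_syslog,
}


def extract_fields(log: str) -> Tuple[str, Dict[str, str]]:
    """Infer the sourcetype, then apply the extractor selected for it."""
    sourcetype = infer_sourcetype(log)
    return sourcetype, EXTRACTORS[sourcetype](log)
```

Calling `extract_fields` on a web access line and on a syslog line exercises both routes; in a real pipeline each stand-in would be replaced by a trained model.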
-
Publication No.: US11995052B1
Publication Date: 2024-05-28
Application No.: US17591528
Application Date: 2022-02-02
Applicant: Splunk Inc.
Inventor: Zhaohui Wang , Ryan Gannon , Xiao Lin , Chandrima Sarkar
IPC: G06F16/215
CPC classification number: G06F16/215
Abstract: A computerized method for detection of categorical drift within an incoming data stream. Herein, an error threshold is computed based on a first set of training data samples selected to detect categorical drift occurring for a data stream. Thereafter, probability distributions associated with content of first and second data samples of the data stream are computed. Analytics are conducted to compute a difference between content of the first probability distribution that is based on a first data point of the first data sample and content of the second probability distribution that is based on a first data point of the second data sample. After the difference is computed, it is used to determine whether categorical drift has occurred.
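A minimal sketch of the idea, under stated assumptions: the abstract does not name the distance measure or say how the error threshold is derived, so this example uses total variation distance and sets the threshold to the largest distance seen between consecutive training windows plus a margin. The function names and the `margin` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distribution(sample: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical probability of each category in a non-empty sample."""
    counts = Counter(sample)
    total = len(sample)
    return {category: n / total for category, n in counts.items()}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Total variation distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def error_threshold(training_windows: Sequence[Sequence[Hashable]],
                    margin: float = 0.05) -> float:
    """Assumed rule: largest distance between consecutive drift-free windows, plus a margin.

    Requires at least two training windows.
    """
    dists = [distribution(w) for w in training_windows]
    baseline = max(total_variation(a, b) for a, b in zip(dists, dists[1:]))
    return baseline + margin


def has_drifted(first: Sequence[Hashable], second: Sequence[Hashable],
                threshold: float) -> bool:
    """Flag drift when the two samples differ by more than the threshold."""
    return total_variation(distribution(first), distribution(second)) > threshold
```

The threshold is learned once from windows assumed to be drift-free, then reused for each pair of samples taken from the live stream.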
-
Publication No.: US11792157B1
Publication Date: 2023-10-17
Application No.: US17941502
Application Date: 2022-09-09
Applicant: SPLUNK Inc.
Inventor: Abhinav Mishra , Giovanni Mola , Ram Sriharsha , Zhaohui Wang
IPC: H04L61/4511 , H04L67/141 , G06F40/205 , H04L43/067 , H04L47/28
CPC classification number: H04L61/4511 , G06F40/205 , H04L43/067 , H04L47/286 , H04L67/141
Abstract: The disclosure provides implementations for determining whether domain name server (DNS) beaconing is present within a communication session. Some implementations provide a method that includes multiple analyses directed to analyzing each of a time-to-live (TTL) run length distribution for a plurality of DNS records within the communication session and analyzing whether the communication is comprised of at least a threshold number of transmissions. As used in the analyses, the communication session may be comprised of transmissions between a first source device and a first DNS. When DNS beaconing is detected within the communication session, some implementations of the disclosure provide for generating an alert to an administrator or other user.
-
Publication No.: US11907227B1
Publication Date: 2024-02-20
Application No.: US17591511
Application Date: 2022-02-02
Applicant: Splunk, Inc.
Inventor: Zhaohui Wang , Ryan Gannon , Xiao Lin , Abhinav Mishra , Chandrima Sarkar , Ram Sriharsha
IPC: G06F16/00 , G06F16/2455 , G06F16/22 , G06F16/2458
CPC classification number: G06F16/24568 , G06F16/22 , G06F16/2462 , G06F16/24552
Abstract: A computerized method is disclosed including operations of receiving a data stream, performing a changepoint detection resulting in a detection of changepoints in the data stream including: maintaining a listing of starting indices for each run within the data stream in a buffer of size L wherein each index of the listing has a run length probability representing a likelihood of being a changepoint, receiving a new data point within the data stream and adding a new index to the buffer resulting in the buffer having size L+1, calculating a posterior run length probability that the new data point is a changepoint, and removing an index from the listing that has a lowest run length probability thereby returning the buffer to size L, and responsive to determining the index removed from the listing does not correspond to the new data point, identifying a changepoint associated with the new data point.
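The buffer bookkeeping described in this abstract can be sketched as below. The sketch covers only the add, score, and evict step: the abstract does not say how the posterior run length probability is computed, so `posterior` is a caller-supplied placeholder, and `update_buffer`, `capacity`, and the behavior while the buffer is still filling are assumptions.

```python
from typing import Callable, Dict, Tuple


def update_buffer(buffer: Dict[int, float],
                  new_index: int,
                  posterior: Callable[[int], float],
                  capacity: int) -> Tuple[Dict[int, float], bool]:
    """Add the new index, rescore every run start, evict the least likely one.

    `buffer` maps a run's starting index to its run length probability.
    Returns the updated buffer and whether a changepoint is reported for
    the new data point.
    """
    # Buffer grows from L to L+1 entries; every entry gets a fresh score.
    candidates = {start: posterior(start) for start in [*buffer, new_index]}
    total = sum(candidates.values())
    if total > 0:
        candidates = {s: p / total for s, p in candidates.items()}

    # Assumption: nothing is evicted (or reported) while the buffer is filling.
    if len(candidates) <= capacity:
        return candidates, False

    # Evict the lowest-probability index, returning the buffer to size L.
    evicted = min(candidates, key=candidates.get)
    del candidates[evicted]

    # Per the abstract: if the evicted index is not the new point, the new
    # point survived pruning and a changepoint is associated with it.
    return candidates, evicted != new_index
```

Keeping the buffer at a fixed size bounds the memory and per-point work regardless of stream length, which is presumably the motivation for the pruning step.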
-
Publication No.: US11477161B1
Publication Date: 2022-10-18
Application No.: US17514814
Application Date: 2021-10-29
Applicant: SPLUNK Inc.
Inventor: Abhinav Mishra , Giovanni Mola , Ram Sriharsha , Zhaohui Wang
IPC: H04L61/4511 , H04L67/141 , H04L43/067 , H04L47/28 , G06F40/205
Abstract: A computerized method is disclosed that includes accessing domain name server (DNS) record data including a plurality of DNS records spanning a first time period, performing a time-to-live (TTL) analysis to determine a TTL run length distribution for the DNS record data, wherein the TTL analysis includes: generating a vector of the TTL values of each DNS record ordered sequentially in time, parsing the vector of the TTL values into segments, where a segment consists of one or more TTL values where a current TTL value is less than an immediately preceding TTL value, and determining the TTL run length distribution, determining whether DNS beaconing is present based on a result of the TTL analysis and in response to determining that DNS beaconing is present, generating an alert for a system administrator.
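The segmentation rule in this abstract (a segment continues while each TTL value is less than the one immediately before it) can be sketched as follows. The function names are hypothetical, and the sketch stops at the run length distribution because the abstract does not state how that distribution is turned into a beaconing decision.

```python
from collections import Counter
from typing import List, Sequence


def ttl_segments(ttls: Sequence[int]) -> List[List[int]]:
    """Split a time-ordered TTL vector into strictly decreasing runs."""
    segments: List[List[int]] = []
    for ttl in ttls:
        if segments and ttl < segments[-1][-1]:
            # TTL is still counting down: extend the current segment.
            segments[-1].append(ttl)
        else:
            # TTL did not decrease (e.g. the record was refreshed): new segment.
            segments.append([ttl])
    return segments


def run_length_distribution(ttls: Sequence[int]) -> Counter:
    """Count how many segments there are of each length."""
    return Counter(len(segment) for segment in ttl_segments(ttls))
```

For the TTL vector `[300, 240, 180, 300, 250, 300]` the segments are `[300, 240, 180]`, `[300, 250]`, and `[300]`, so the distribution has one run each of length 3, 2, and 1.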
-
Publication No.: US20220036002A1
Publication Date: 2022-02-03
Application No.: US16945448
Application Date: 2020-07-31
Applicant: Splunk Inc.
Inventor: Ram Sriharsha , Zhaohui Wang
IPC: G06F40/284 , G06N20/00 , G06N5/04 , G06F16/33 , G06F40/242
Abstract: Systems and methods are described for training an artificial intelligence model to infer a log sourcetype of a log. For example, logs may have different log sourcetypes, and logs having the same log sourcetypes may have different messagetypes. The artificial intelligence model may be a machine learning model, and can be trained using training data that includes logs with known log sourcetypes. Each log can be tokenized, filtered, converted into a vector, and applied to a machine learning model as an input to perform the training. The machine learning model may output an inferred log sourcetype, which can be compared with the known log sourcetype to update model parameters to improve the machine learning model accuracy. The trained machine learning model may be trained to infer a log sourcetype of a log regardless of the messagetype of the log.
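The training loop in this abstract (tokenize, filter, vectorize, infer, compare with the known sourcetype, update parameters) can be sketched with a small multiclass perceptron. The perceptron, the bag-of-words vectorizer, the length-based token filter, and all function names are assumptions chosen to keep the example self-contained; the abstract only says "machine learning model".

```python
import re
from typing import Dict, List, Sequence, Tuple

Weights = Dict[str, List[float]]


def preprocess(log: str) -> List[str]:
    """Tokenize on letter runs, then filter out very short tokens (assumed filter)."""
    tokens = re.findall(r"[a-z_]+", log.lower())
    return [t for t in tokens if len(t) > 2]


def build_vocab(examples: Sequence[Tuple[str, str]]) -> Dict[str, int]:
    """Assign one vector dimension to each token seen in training."""
    vocab: Dict[str, int] = {}
    for log, _ in examples:
        for token in preprocess(log):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(log: str, vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for token in preprocess(log):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def predict(weights: Weights, vec: List[float]) -> str:
    """Return the sourcetype whose weight vector scores highest."""
    return max(weights, key=lambda label: sum(w * x for w, x in zip(weights[label], vec)))


def train(examples: Sequence[Tuple[str, str]], epochs: int = 10) -> Tuple[Dict[str, int], Weights]:
    """Perceptron updates: adjust weights only when the inference is wrong."""
    vocab = build_vocab(examples)
    weights: Weights = {label: [0.0] * len(vocab) for label in sorted({l for _, l in examples})}
    for _ in range(epochs):
        for log, known in examples:
            vec = vectorize(log, vocab)
            inferred = predict(weights, vec)
            if inferred != known:  # compare inferred with known sourcetype
                for i, x in enumerate(vec):
                    weights[known][i] += x
                    weights[inferred][i] -= x
    return vocab, weights
```

Because the features are token counts rather than message templates, logs of one sourcetype with different messagetypes can still share a label, which is the property the abstract emphasizes.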
-
Publication No.: US20220035775A1
Publication Date: 2022-02-03
Application No.: US16945229
Application Date: 2020-07-31
Applicant: Splunk Inc.
Inventor: Ram Sriharsha , Zhaohui Wang
Abstract: Systems and methods are described for training an artificial intelligence model to extract one or more data fields from a log. For example, the artificial intelligence model may be a neural network. The neural network may be trained using training data obtained by iterating through a plurality of logs using active learning, and selecting a subset of the logs in the plurality to be labeled by a user. For example, the selected subset of logs may be logs that are not similar to other logs already labeled by a user. The user may be prompted to label the selected subset of logs to identify one or more data fields to extract. Once the selected subset of logs are labeled, these labeled logs can be used as the training data to train the neural network.
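The selection step in this abstract (pick logs that are not similar to logs already labeled) can be sketched as a greedy filter. The similarity measure is not named in the abstract, so this example assumes Jaccard similarity over token sets; `select_for_labeling`, `threshold`, and `budget` are hypothetical names and parameters.

```python
import re
from typing import List, Sequence, Set


def token_set(log: str) -> Set[str]:
    """Lowercased alphabetic tokens of a log line."""
    return set(re.findall(r"[a-z]+", log.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_for_labeling(unlabeled: Sequence[str],
                        labeled: Sequence[str],
                        threshold: float = 0.5,
                        budget: int = 10) -> List[str]:
    """Greedily pick logs whose similarity to every covered log is below the threshold."""
    covered = [token_set(log) for log in labeled]
    selected: List[str] = []
    for log in unlabeled:
        if len(selected) >= budget:
            break
        tokens = token_set(log)
        if all(jaccard(tokens, seen) < threshold for seen in covered):
            selected.append(log)
            # Assume the user will label it, so later near-duplicates are skipped.
            covered.append(tokens)
    return selected
```

The selected logs would then be shown to the user for field labeling, and the labeled results used as training data for the extraction model.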
-
Publication No.: US20220036177A1
Publication Date: 2022-02-03
Application No.: US16945415
Application Date: 2020-07-31
Applicant: Splunk Inc.
Inventor: Ram Sriharsha , Zhaohui Wang
IPC: G06N3/08 , G06F16/245 , G06F16/23
Abstract: Systems and methods are described for extracting data fields from logs ingested in a data processing pipeline or otherwise stored. For example, a log can be applied as an input to an artificial intelligence model trained to infer a log sourcetype of logs, and the artificial intelligence model can output an inferred log sourcetype of the log. The inferred log sourcetype can be used to select another artificial intelligence model trained to extract data fields from logs having the inferred log sourcetype, and the log can then be applied as an input to the other artificial intelligence model. The other artificial intelligence model may then output one or more data fields extracted from the log.
-
Publication No.: US12056169B1
Publication Date: 2024-08-06
Application No.: US17513670
Application Date: 2021-10-28
Applicant: SPLUNK Inc.
Inventor: Abhinav Mishra , Giovanni Mola , Ram Sriharsha , Abraham Starosta , Zhaohui Wang
CPC classification number: G06F16/334 , G06F16/35 , G06N20/00
Abstract: A computerized method is disclosed that includes operations of training a machine learning model using a labeled training set of data, wherein the machine learning model is configured to classify domain name server (DNS) records, obtaining DNS record data including at least a first DNS Txt record, applying the trained machine learning model to the first DNS Txt record to classify the first DNS Txt record and responsive to the classification of the first DNS Txt record, generating a flag for a system administrator. The trained machine learning model may classify the first DNS Txt record using logistic regression. In some instances, applying the trained machine learning model to the first DNS Txt record includes performing a tokenizing operation on the first DNS Txt record to generate a tokenized first DNS Txt record.
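A minimal sketch of tokenizing TXT records and classifying them with logistic regression, as the abstract describes. The tokenizer, the bag-of-tokens features, the stochastic gradient descent settings, the 0/1 labels, and the function names are all assumptions made for illustration.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[str, int], List[float], float]  # (vocabulary, weights, bias)


def tokenize(txt: str) -> List[str]:
    """Split a TXT record into lowercase alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", txt.lower())


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(records: Sequence[Tuple[str, int]], epochs: int = 200, lr: float = 0.5) -> Model:
    """Fit logistic regression by plain SGD; label 1 means 'flag', 0 means 'benign'."""
    vocab: Dict[str, int] = {}
    for txt, _ in records:
        for token in tokenize(txt):
            vocab.setdefault(token, len(vocab))
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for txt, label in records:
            idx = [vocab[t] for t in tokenize(txt)]
            predicted = sigmoid(bias + sum(weights[i] for i in idx))
            gradient = predicted - label  # derivative of log loss w.r.t. the logit
            bias -= lr * gradient
            for i in idx:
                weights[i] -= lr * gradient
    return vocab, weights, bias


def should_flag(txt: str, model: Model) -> bool:
    """Classify one TXT record; True means a flag would be raised."""
    vocab, weights, bias = model
    z = bias + sum(weights[vocab[t]] for t in tokenize(txt) if t in vocab)
    return sigmoid(z) >= 0.5
```

A labeled training set is passed to `train` as `(record_text, label)` pairs, and `should_flag` is then applied to each new TXT record.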
-
Publication No.: US11704490B2
Publication Date: 2023-07-18
Application No.: US16945448
Application Date: 2020-07-31
Applicant: Splunk Inc.
Inventor: Ram Sriharsha , Zhaohui Wang , Kristal Curtis
IPC: G06F40/284 , G06N20/00 , G06F40/242 , G06F16/33 , G06N5/04
CPC classification number: G06F40/284 , G06F16/3347 , G06F40/242 , G06N5/04 , G06N20/00
Abstract: Systems and methods are described for training an artificial intelligence model to infer a log sourcetype of a log. For example, logs may have different log sourcetypes, and logs having the same log sourcetypes may have different messagetypes. The artificial intelligence model may be a machine learning model, and can be trained using training data that includes logs with known log sourcetypes. Each log can be tokenized, filtered, converted into a vector, and applied to a machine learning model as an input to perform the training. The machine learning model may output an inferred log sourcetype, which can be compared with the known log sourcetype to update model parameters to improve the machine learning model accuracy. The trained machine learning model may be trained to infer a log sourcetype of a log regardless of the messagetype of the log.