-
公开(公告)号:US20230421528A1
公开(公告)日:2023-12-28
申请号:US18237853
申请日:2023-08-24
Applicant: Oracle International Corporation
Inventor: Renata Khasanova , Felix Schmidt , Stuart Wray , Craig Schelp , Nipun Agarwal , Matteo Casserini
IPC: H04L61/4511 , G06N20/00 , H04L41/16
CPC classification number: H04L61/4511 , G06F40/30 , H04L41/16 , G06N20/00
Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.
-
公开(公告)号:US20230419169A1
公开(公告)日:2023-12-28
申请号:US17851120
申请日:2022-06-28
Applicant: Oracle International Corporation
Inventor: Kenyu Kobayashi , Arno Schneuwly , Renata Khasanova , Matteo Casserini , Felix Schmidt
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: Herein are machine learning (ML) explainability (MLX) techniques that perturb a non-anomalous tuple to generate an anomalous tuple as adversarial input to any explainer that is based on feature attribution. In an embodiment, a computer generates, from a non-anomalous tuple, an anomalous tuple that contains a perturbed value of a perturbed feature. In the anomalous tuple, the perturbed value of the perturbed feature is modified to cause a change in reconstruction error for the anomalous tuple. The change in reconstruction error includes a decrease in reconstruction error of the perturbed feature and/or an increase in a sum of reconstruction error of all features that are not the perturbed feature. After modifying the perturbed value, an attribution-based explainer automatically generates an explanation that identifies an identified feature as a cause of the anomalous tuple being anomalous. Whether the identified feature of the explanation is or is not the perturbed feature is detected.
-
公开(公告)号:US20230368054A1
公开(公告)日:2023-11-16
申请号:US17745103
申请日:2022-05-16
Applicant: Oracle International Corporation
Inventor: Marija Nikolic , Matteo Casserini , Arno Schneuwly , Nikola Milojkovic , Milos Vasic , Renata Khasanova , Felix Schmidt
Abstract: The present invention relates to threshold estimation and calibration for anomaly detection. Herein are machine learning (ML) and extreme value theory (EVT) techniques for normalizing and thresholding anomaly scores without presuming a values distribution. In an embodiment, a computer receives many unnormalized anomaly scores and, according to peak over threshold (POT), selects a highest subset of the unnormalized anomaly scores that exceed a tail threshold. Based on the highest subset of the unnormalized anomaly scores, parameters of a probability density function are trained according to EVT. After training and in a production environment, a normalized anomaly score is generated based on an unnormalized anomaly score and the trained parameters of the probability density function. Anomaly detection compares the normalized anomaly score to an optimized anomaly threshold.
-
公开(公告)号:US20220294757A1
公开(公告)日:2022-09-15
申请号:US17197375
申请日:2021-03-10
Applicant: Oracle International Corporation
Inventor: Renata Khasanova , Felix Schmidt , Stuart Wray , Craig Schelp , Nipun Agarwal , Matteo Casserini
Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.
-
-
-