-
21.
Publication No.: US20220138504A1
Publication date: 2022-05-05
Application No.: US17083536
Application date: 2020-10-29
Applicant: Oracle International Corporation
Inventor: Hesam Fathi Moghadam, Anatoly Yakovlev, Sandeep Agrawal, Venkatanathan Varadarajan, Robert Hopkins, Matteo Casserini, Milos Vasic, Sanjay Jinturkar, Nipun Agarwal
Abstract: In an embodiment based on computer(s), an ML model is trained to detect outliers. The ML model calculates anomaly scores that include a respective anomaly score for each item in a validation dataset. The anomaly scores are automatically organized by sorting and/or clustering. Based on the organized anomaly scores, a separation is measured that indicates fitness of the ML model. In an embodiment, a computer performs two-clustering of anomaly scores into a first organization that consists of a first normal cluster of anomaly scores and a first anomaly cluster of anomaly scores. The computer performs three-clustering of the same anomaly scores into a second organization that consists of a second normal cluster of anomaly scores, a second anomaly cluster of anomaly scores, and a middle cluster of anomaly scores. A distribution difference between the first organization and the second organization is measured. An ML model is processed based on the distribution difference.
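The two-clustering versus three-clustering comparison above can be sketched on a list of scalar anomaly scores as follows. This is an illustrative sketch, not the patented method: it assumes higher scores are more anomalous, uses plain 1-D k-means as the clustering step, and uses a hypothetical "distribution difference" (total variation distance between cluster-size proportions). The names `kmeans_1d` and `organization_difference` are made up for illustration.

```python
from statistics import mean


def kmeans_1d(scores, k, iters=100):
    """Lloyd's k-means on scalar anomaly scores.

    Returns k clusters ordered by centroid, lowest (most normal) first.
    """
    data = sorted(scores)
    n = len(data)
    # Deterministic initialization: centroids spread evenly over the sorted scores.
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in data:
            nearest = min(range(k), key=lambda j: abs(s - centroids[j]))
            clusters[nearest].append(s)
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    order = sorted(range(k), key=lambda j: centroids[j])
    return [clusters[j] for j in order]


def organization_difference(scores):
    """Hypothetical distribution difference between the 2- and 3-clusterings.

    Total variation distance between cluster-size proportions, treating the
    two-clustering as having an empty middle cluster.
    """
    n = len(scores)
    normal2, anomaly2 = kmeans_1d(scores, 2)
    normal3, middle3, anomaly3 = kmeans_1d(scores, 3)
    p = [len(normal2) / n, len(anomaly2) / n, 0.0]
    q = [len(normal3) / n, len(anomaly3) / n, len(middle3) / n]
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

The result lies between 0 and 1. How such a number would be thresholded to accept or reject a model is not stated in the abstract and is left open here.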
-
22.
Publication No.: US20250060951A1
Publication date: 2025-02-20
Application No.: US18235461
Application date: 2023-08-18
Applicant: Oracle International Corporation
Inventor: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt
IPC: G06F8/41, G06F16/901
Abstract: In an embodiment providing natural language processing (NLP), a computer generates a histogram that correctly represents a graph that represents a lexical text, and generates a token sequence encoder that is trainable and untrained. During training, such as pretraining, the token sequence encoder infers an encoded sequence that incorrectly represents the lexical text; the encoded sequence is dense and saves space. To increase the accuracy of the token sequence encoder through learning, the encoder is adjusted based on an indirectly measured numeric difference between the encoded sequence that incorrectly represents the lexical text and the histogram that correctly represents the graph.
-
Publication No.: US20250036934A1
Publication date: 2025-01-30
Application No.: US18227758
Application date: 2023-07-28
Applicant: Oracle International Corporation
Inventor: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt
IPC: G06N3/08
Abstract: Described herein is validation of a trained classifier based on a novel, accelerated estimation of a confusion matrix. In an embodiment, a computer hosts a trained classifier that infers, from many objects, an inferred frequency of each class. An upscaled magnitude of each class is generated from the inferred frequency of the class, and an integer of each class is generated from the upscaled magnitude of the class. Based on those integers and a target integer for each class, counts are generated of the objects that are true positives, false positives, and false negatives of the class. Based on those counts, estimated totals of true positives, false positives, and false negatives are generated that characterize the fitness of the trained classifier. In an embodiment, those counts and totals are downscaled to fractions from zero to one.
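A minimal sketch of estimating per-class and total true/false positive and false negative rates from class frequencies alone, loosely following the upscale, round-to-integer, compare, and downscale steps above. The comparison rule is an assumption, not taken from the abstract: it treats predictions as overlapping the target as much as possible (TP = min(predicted, target)). The function name `estimate_confusion` and the `scale` parameter are hypothetical.

```python
def estimate_confusion(inferred_freq, target_freq, scale=10_000):
    """Estimate per-class and total TP/FP/FN rates from class frequencies only.

    inferred_freq: class -> fraction of objects the classifier assigned to it.
    target_freq:   class -> expected (target) fraction of objects in that class.
    """
    per_class = {}
    tp_total = fp_total = fn_total = 0
    for cls in sorted(set(inferred_freq) | set(target_freq)):
        # Upscale each frequency and round it to an integer count.
        predicted = round(inferred_freq.get(cls, 0.0) * scale)
        target = round(target_freq.get(cls, 0.0) * scale)
        # Assumption: predictions overlap the target as much as possible.
        tp = min(predicted, target)
        fp = predicted - tp
        fn = target - tp
        # Downscale the integer counts back to fractions in [0, 1].
        per_class[cls] = (tp / scale, fp / scale, fn / scale)
        tp_total += tp
        fp_total += fp
        fn_total += fn
    totals = (tp_total / scale, fp_total / scale, fn_total / scale)
    return per_class, totals
```

For example, inferred frequencies `{"a": 0.6, "b": 0.4}` against targets `{"a": 0.5, "b": 0.5}` give class `a` the triple `(0.5, 0.1, 0.0)` and totals of `(0.9, 0.1, 0.1)`. Because only frequencies are compared, this optimistic overlap rule yields an upper bound on true positives rather than a measured value.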
-
Publication No.: US12143408B2
Publication date: 2024-11-12
Application No.: US17739968
Application date: 2022-05-09
Applicant: Oracle International Corporation
Inventor: Milos Vasic, Saeid Allahdadian, Matteo Casserini, Felix Schmidt, Andrew Brownsword
Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is input into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is input into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
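The two-stage flow above (unsupervised detector first, supervised classifier second, unknown anomalies fed back as training instances) can be sketched with scalar data items. Both models here are simple stand-ins chosen for illustration, not the models the patent uses: a z-score detector and a nearest-example classifier that answers "unknown" when nothing labeled is close enough. `label_fn` is a hypothetical hook standing in for whatever supplies a label (for example, an analyst).

```python
from statistics import mean, pstdev


class ZScoreDetector:
    """Stand-in unsupervised detector: flags values far from the training mean."""

    def __init__(self, normal_values, threshold=3.0):
        self.mu = mean(normal_values)
        self.sigma = pstdev(normal_values) or 1.0
        self.threshold = threshold

    def is_anomaly(self, x):
        return abs(x - self.mu) / self.sigma > self.threshold


class NearestExampleClassifier:
    """Stand-in supervised classifier: nearest labeled example within `radius`, else unknown."""

    def __init__(self, radius=1.0):
        self.examples = []  # (value, label) training instances
        self.radius = radius

    def classify(self, x):
        best = min(self.examples, key=lambda e: abs(e[0] - x), default=None)
        if best is None or abs(best[0] - x) > self.radius:
            return None  # None means "unknown"
        return best[1]

    def update(self, x, label):
        self.examples.append((x, label))


def process(x, detector, classifier, label_fn):
    """Route one data item through the two-stage pipeline."""
    if not detector.is_anomaly(x):
        return "normal"
    label = classifier.classify(x)
    if label is None:
        # Unknown anomaly: turn it into a training instance and update the classifier.
        classifier.update(x, label_fn(x))
        return "unknown"
    return label
```

After an unknown anomaly such as `50` has been labeled and added, a nearby later item such as `50.5` is classified with that label instead of being reported as unknown.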
-
Publication No.: US20240126798A1
Publication date: 2024-04-18
Application No.: US18203195
Application date: 2023-05-30
Applicant: Oracle International Corporation
Inventor: Arno Schneuwly, Desislava Wagenknecht-Dimitrova, Felix Schmidt, Marija Nikolic, Matteo Casserini, Milos Vasic, Renata Khasanova
IPC: G06F16/34, G06F16/335, G06F40/186
CPC classification number: G06F16/345, G06F16/335, G06F40/186
Abstract: In an embodiment, a computer stores, in memory or storage, many explanation profiles, many log entries, and definitions of many features that log entries contain. Some features may contain a logic statement such as a database query, and these are specially aggregated based on similarity. Based on the entity specified by an explanation profile, statistics are materialized for some or all features. Statistics calculation may be based on scheduled batches of log entries or a stream of live log entries. At runtime, an inference that is based on a new log entry is received. Based on an entity specified in the new log entry, a particular explanation profile is dynamically selected. Based on the new log entry and statistics of features for the selected explanation profile, a local explanation of the inference is generated. In an embodiment, an explanation text template is used to generate the local explanation.
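A minimal sketch of per-entity explanation profiles and template-based local explanations, assuming log entries are dicts with an `"entity"` key and numeric feature values. The statistics chosen (per-feature mean and population standard deviation), the "most deviant feature" rule, and the names `build_profiles`, `explain_inference`, and `TEMPLATE` are illustrative assumptions. The special similarity-based aggregation of logic-statement features mentioned above is not shown.

```python
from statistics import mean, pstdev

# Hypothetical explanation text template.
TEMPLATE = ("Score {score:.2f} for {entity}: {feature}={value} is unusual; "
            "this entity's typical value is {avg:.1f} (z={z:.1f}).")


def build_profiles(log_entries, features):
    """Materialize per-entity (mean, stdev) statistics for each feature from a batch."""
    by_entity = {}
    for entry in log_entries:
        by_entity.setdefault(entry["entity"], []).append(entry)
    profiles = {}
    for entity, entries in by_entity.items():
        profiles[entity] = {}
        for f in features:
            values = [e[f] for e in entries]
            # Guard against zero spread so z-scores stay finite.
            profiles[entity][f] = (mean(values), pstdev(values) or 1.0)
    return profiles


def explain_inference(new_entry, score, profiles, features):
    """Select the profile for the entry's entity and explain via its most deviant feature."""
    profile = profiles[new_entry["entity"]]

    def z(f):
        avg, sd = profile[f]
        return (new_entry[f] - avg) / sd

    top = max(features, key=lambda f: abs(z(f)))
    return TEMPLATE.format(score=score, entity=new_entry["entity"], feature=top,
                           value=new_entry[top], avg=profile[top][0], z=z(top))
```

Rebuilding `profiles` on a schedule corresponds to the batch option in the abstract; a streaming variant would update running statistics per entry instead.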
-
Publication No.: US20240070156A1
Publication date: 2024-02-29
Application No.: US17893519
Application date: 2022-08-23
Applicant: Oracle International Corporation
Inventor: Kenyu Kobayashi, Arno Schneuwly, Renata Khasanova, Matteo Casserini, Felix Schmidt
IPC: G06F16/2457
CPC classification number: G06F16/24575
Abstract: Techniques for propagating scores in subgraphs are provided. In one technique, multiple path scores are stored, each path score associated with a path (or subgraph), of multiple paths, in a graph of nodes. The path scores may be generated by a machine-learned model. For each path score, a path that is associated with that path score is identified and nodes of that path are identified. For each identified node, a node score for that node is determined or computed based on the corresponding path score and the node score is stored in association with that node. Subsequently, for each node in a subset of the graph, multiple node scores that are associated with that node are identified and aggregated to generate a propagated score for that node. In a related technique, a propagated score of a node is used to compute a score for each leaf node of the node.
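The path-to-node score propagation above can be sketched as follows, assuming each path is a sequence of node ids and that each node on a scored path simply receives that path's score as a node score (the abstract allows the node score to be computed differently). `max` is one possible aggregate; the function name `propagate_scores` and its parameters are hypothetical, and the leaf-score step is not shown.

```python
from collections import defaultdict


def propagate_scores(path_scores, aggregate=max, subset=None):
    """Aggregate path scores into one propagated score per node.

    path_scores: iterable of (path, score), where a path is a sequence of node ids.
    subset: optional collection of node ids to report; None reports every node.
    """
    node_scores = defaultdict(list)
    for path, score in path_scores:
        for node in path:
            # Assumption: each node on the path inherits the path score as a node score.
            node_scores[node].append(score)
    return {node: aggregate(scores) for node, scores in node_scores.items()
            if subset is None or node in subset}
```

With paths `("a", "b", "c")` scored 0.9 and `("a", "d")` scored 0.2, `max` gives node `a` a propagated score of 0.9, while a mean aggregate gives about 0.55. The choice of aggregate determines whether one highly scored path dominates a node.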
-
Publication No.: US20240061997A1
Publication date: 2024-02-22
Application No.: US17891350
Application date: 2022-08-19
Applicant: Oracle International Corporation
Inventor: Kenyu Kobayashi, Arno Schneuwly, Renata Khasanova, Matteo Casserini, Felix Schmidt
IPC: G06F40/205, G06N20/00
CPC classification number: G06F40/205, G06N20/00
Abstract: Described herein is a machine learning (ML) explainability (MLX) approach in which a natural language explanation is generated based on analysis of a parse tree, such as that of a suspicious database query or web browser JavaScript. In an embodiment, a computer selects, based on a respective relevance score for each non-leaf node in a parse tree of a statement, a relevant subset of non-leaf nodes. The non-leaf nodes in the parse tree are grouped into groups that represent respective portions of the statement. Based on a relevant subset of the groups, namely those that contain at least one non-leaf node in the relevant subset of non-leaf nodes, a natural language explanation of why the statement is anomalous is generated.
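The select, group, and explain steps above can be sketched on a toy parse tree. Several pieces are assumptions for illustration only: relevance scores are given on the nodes, a non-leaf node is relevant when its score reaches a fixed threshold, groups are defined by the nearest ancestor whose label is in a hypothetical `CLAUSES` set, and the explanation is a fixed sentence pattern. `Node` and `explain_statement` are made-up names.

```python
from dataclasses import dataclass, field

# Hypothetical labels whose subtrees form the "portions" of a statement.
CLAUSES = {"select_clause", "from_clause", "where_clause"}


@dataclass
class Node:
    label: str
    relevance: float = 0.0
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

    def tokens(self):
        """Source tokens covered by this subtree (leaf labels, left to right)."""
        if self.is_leaf():
            return [self.label]
        return [t for child in self.children for t in child.tokens()]


def explain_statement(root, threshold=0.5):
    """Explain an anomalous statement via groups holding a relevant non-leaf node."""
    groups = {}  # id(clause node) -> clause node, in traversal order

    def walk(node, clause):
        if node.is_leaf():
            return
        if node.label in CLAUSES:
            clause = node
        if clause is not None and node.relevance >= threshold:
            groups[id(clause)] = clause
        for child in node.children:
            walk(child, clause)

    walk(root, None)
    if not groups:
        return "No portion of the statement stands out."
    parts = []
    for g in groups.values():
        text = " ".join(g.tokens())
        parts.append(f"{g.label.replace('_', ' ')} ({text})")
    return "The statement is anomalous mainly because of its " + " and ".join(parts) + "."
```

For a query whose `where_clause` contains a highly relevant `1 = 1` condition node, the sketch returns "The statement is anomalous mainly because of its where clause (WHERE 1 = 1)."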
-
Publication No.: US20230376743A1
Publication date: 2023-11-23
Application No.: US17748226
Application date: 2022-05-19
Applicant: Oracle International Corporation
Inventor: Marija Nikolic, Nikola Milojkovic, Arno Schneuwly, Matteo Casserini, Milos Vasic, Renata Khasanova, Felix Schmidt
Abstract: The present invention avoids overfitting in deep neural network (DNN) training by using multitask learning (MTL) and self-supervised learning (SSL) techniques when training a multi-branch DNN to encode a sequence. In an embodiment, a computer first trains the DNN to perform a first task. The DNN contains a first encoder in a first branch, a second encoder in a second branch, and an interpreter layer that combines data from the first branch and the second branch. The computer then trains the DNN to perform a second task. After the first and second trainings, production encoding and inferencing occur. The first encoder encodes a sparse feature vector into a dense feature vector from which an inference is made. In an embodiment, a sequence of log messages is encoded into an encoded trace. An anomaly detector infers whether the sequence is anomalous. In an embodiment, the log messages are database commands.
-
Publication No.: US20230362180A1
Publication date: 2023-11-09
Application No.: US17739968
Application date: 2022-05-09
Applicant: Oracle International Corporation
Inventor: Milos Vasic, Saeid Allahdadian, Matteo Casserini, Felix Schmidt, Andrew Brownsword
CPC classification number: H04L63/1425, G06N20/20
Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is input into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is input into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.