APPROXIMATE CONFUSION MATRIX FOR MULTI-LABEL CLASSIFICATION

    Publication No.: US20250036934A1

    Publication date: 2025-01-30

    Application No.: US18227758

    Filing date: 2023-07-28

    Abstract: Herein is validation of a trained classifier based on novel and accelerated estimation of a confusion matrix. In an embodiment, a computer hosts a trained classifier that infers, from many objects, an inferred frequency of each class. An upscaled magnitude of each class is generated from the inferred frequency of the class. An integer of each class is generated from the upscaled magnitude of the class. Based on those integers of the classes and a target integer for each class, counts are generated of the objects that are true positives, false positives, and false negatives of the class. Based on those counts, estimated totals of true positives, false positives, and false negatives are generated that characterize fitness of the trained classifier. In an embodiment, those counts and totals are downscaled to be fractions from zero to one.
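
The steps in this abstract can be sketched roughly as follows. This is only an illustration, not the patented method: the abstract does not say how the per-class counts are derived from the integers, so the min/max rule below is an assumption, as are the function names `approximate_confusion` and `downscale` and the choice of the object count as the upscaling factor.

```python
def approximate_confusion(inferred_freq, target_counts, n_objects):
    """Estimate per-class and total TP/FP/FN from class frequencies alone."""
    per_class = {}
    for cls, freq in inferred_freq.items():
        magnitude = freq * n_objects      # upscaled magnitude of the class
        predicted = round(magnitude)      # integer of the class
        target = target_counts[cls]       # target integer of the class
        # Assumed rule: overlap counts as true positives, surplus as false
        # positives, shortfall as false negatives.
        per_class[cls] = {
            "tp": min(predicted, target),
            "fp": max(predicted - target, 0),
            "fn": max(target - predicted, 0),
        }
    totals = {
        key: sum(counts[key] for counts in per_class.values())
        for key in ("tp", "fp", "fn")
    }
    return per_class, totals


def downscale(counts, n_objects):
    """Turn integer counts into fractions from zero to one."""
    return {key: value / n_objects for key, value in counts.items()}


per_class, totals = approximate_confusion(
    inferred_freq={"cat": 0.5, "dog": 0.3, "bird": 0.2},
    target_counts={"cat": 4, "dog": 4, "bird": 2},
    n_objects=10,
)
print(totals)                 # {'tp': 9, 'fp': 1, 'fn': 1}
print(downscale(totals, 10))  # {'tp': 0.9, 'fp': 0.1, 'fn': 0.1}
```

Because only per-class frequencies are compared, no object-by-object comparison of predictions and labels is needed, which is presumably where the acceleration comes from.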

    Semi-supervised framework for purpose-oriented anomaly detection

    Publication No.: US12143408B2

    Publication date: 2024-11-12

    Application No.: US17739968

    Filing date: 2022-05-09

    Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is input into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is input into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
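
The two-stage flow in this abstract might look like the sketch below. Everything concrete here is an assumption made for illustration: the z-score detector, the nearest-example classifier, the class names `ZScoreDetector` and `KnownAnomalyClassifier`, and the `label_fn` callback that stands in for whatever supplies the label of a new training instance (the abstract does not say).

```python
from statistics import mean, pstdev


class ZScoreDetector:
    """Hypothetical unsupervised model: flags values far from the mean."""

    def __init__(self, values, threshold=3.0):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0
        self.threshold = threshold

    def is_anomaly(self, x):
        return abs(x - self.mu) / self.sigma > self.threshold


class KnownAnomalyClassifier:
    """Hypothetical supervised model: nearest labeled example, else 'unknown'."""

    def __init__(self, radius=5.0):
        self.examples = []  # list of (value, label) training instances
        self.radius = radius

    def predict(self, x):
        close = [(abs(x - v), label) for v, label in self.examples
                 if abs(x - v) <= self.radius]
        return min(close)[1] if close else "unknown"

    def update(self, x, label):
        self.examples.append((x, label))


def process(x, detector, classifier, label_fn):
    """Run one data item through both stages and return a verdict."""
    if not detector.is_anomaly(x):
        return "normal"
    verdict = classifier.predict(x)
    if verdict == "unknown":
        # Generate a training instance from the item and update the model.
        classifier.update(x, label_fn(x))
    return verdict


detector = ZScoreDetector([10, 11, 9, 10, 12, 8, 10])
classifier = KnownAnomalyClassifier()
print(process(10.5, detector, classifier, lambda x: "spike"))  # normal
print(process(50, detector, classifier, lambda x: "spike"))    # unknown
print(process(52, detector, classifier, lambda x: "spike"))    # spike
```

The third call shows the effect of the update: an anomaly close to one seen before is now recognized rather than reported as unknown.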

    PROFILE-ENRICHED EXPLANATIONS OF DATA-DRIVEN MODELS

    Publication No.: US20240126798A1

    Publication date: 2024-04-18

    Application No.: US18203195

    Filing date: 2023-05-30

    CPC classification number: G06F16/345 G06F16/335 G06F40/186

    Abstract: In an embodiment, a computer stores, in memory or storage, many explanation profiles, many log entries, and definitions of many features that log entries contain. Some features may contain a logic statement such as a database query, and these are specially aggregated based on similarity. Based on the entity specified by an explanation profile, statistics are materialized for some or all features. Statistics calculation may be based on scheduled batches of log entries or a stream of live log entries. At runtime, an inference that is based on a new log entry is received. Based on an entity specified in the new log entry, a particular explanation profile is dynamically selected. Based on the new log entry and statistics of features for the selected explanation profile, a local explanation of the inference is generated. In an embodiment, an explanation text template is used to generate the local explanation.
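
As a rough illustration of the runtime path described in this abstract, the sketch below materializes per-entity statistics from a batch of log entries, selects the profile that matches a new entry, and fills a text template. The field names (`entity`, `rows_read`), the use of a mean as the only statistic, and the wording of `TEMPLATE` are all assumptions; the abstract does not specify them.

```python
from collections import defaultdict
from statistics import mean

TEMPLATE = "{feature} was {value}, versus a typical {typical:.1f} for {entity}."


def build_profiles(log_entries, features):
    """Materialize one statistic (the mean) per entity and feature."""
    values = defaultdict(lambda: defaultdict(list))
    for entry in log_entries:
        for feature in features:
            values[entry["entity"]][feature].append(entry[feature])
    return {
        entity: {feature: mean(seen) for feature, seen in per_feature.items()}
        for entity, per_feature in values.items()
    }


def explain(entry, profiles, features):
    """Select the profile for the entry's entity and render the template."""
    profile = profiles[entry["entity"]]
    return [
        TEMPLATE.format(feature=feature, value=entry[feature],
                        typical=profile[feature], entity=entry["entity"])
        for feature in features
    ]


history = [
    {"entity": "alice", "rows_read": 10},
    {"entity": "alice", "rows_read": 20},
    {"entity": "bob", "rows_read": 500},
]
profiles = build_profiles(history, ["rows_read"])
print(explain({"entity": "alice", "rows_read": 900}, profiles, ["rows_read"]))
# ['rows_read was 900, versus a typical 15.0 for alice.']
```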

    SCORE PROPAGATION ON GRAPHS WITH DIFFERENT SUBGRAPH MAPPING STRATEGIES

    Publication No.: US20240070156A1

    Publication date: 2024-02-29

    Application No.: US17893519

    Filing date: 2022-08-23

    CPC classification number: G06F16/24575

    Abstract: Techniques for propagating scores in subgraphs are provided. In one technique, multiple path scores are stored, each path score associated with a path (or subgraph), of multiple paths, in a graph of nodes. The path scores may be generated by a machine-learned model. For each path score, a path that is associated with that path score is identified and nodes of that path are identified. For each identified node, a node score for that node is determined or computed based on the corresponding path score and the node score is stored in association with that node. Subsequently, for each node in a subset of the graph, multiple node scores that are associated with that node are identified and aggregated to generate a propagated score for that node. In a related technique, a propagated score of a node is used to compute a score for each leaf node of the node.
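
A minimal sketch of the propagation described in this abstract follows. It assumes that each node simply inherits the score of every path it lies on and that the aggregation is `max` by default; the abstract leaves both the node-score computation and the aggregation function open, and the name `propagate` is hypothetical.

```python
from collections import defaultdict


def propagate(path_scores, aggregate=max, subset=None):
    """Map path scores onto nodes, then aggregate per node.

    path_scores: iterable of (path, score), where path is a sequence of nodes.
    subset: optional set of nodes to report; all nodes if None.
    """
    node_scores = defaultdict(list)
    for path, score in path_scores:
        for node in path:
            # Assumption: the node score equals the path score.
            node_scores[node].append(score)
    return {
        node: aggregate(scores)
        for node, scores in node_scores.items()
        if subset is None or node in subset
    }


paths = [
    (("a", "b", "c"), 0.9),
    (("a", "d"), 0.2),
    (("b", "d"), 0.5),
]
print(propagate(paths))
# {'a': 0.9, 'b': 0.9, 'c': 0.9, 'd': 0.5}
```

Passing a different `aggregate`, such as `statistics.mean`, changes how strongly one high-scoring path dominates a node that also lies on low-scoring paths.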

    TEXTUAL EXPLANATIONS FOR ABSTRACT SYNTAX TREES WITH SCORED NODES

    Publication No.: US20240061997A1

    Publication date: 2024-02-22

    Application No.: US17891350

    Filing date: 2022-08-19

    CPC classification number: G06F40/205 G06N20/00

    Abstract: Herein is a machine learning (ML) explainability (MLX) approach in which a natural language explanation is generated based on analysis of a parse tree such as for a suspicious database query or web browser JavaScript. In an embodiment, a computer selects, based on a respective relevance score for each non-leaf node in a parse tree of a statement, a relevant subset of non-leaf nodes. The non-leaf nodes are grouped in the parse tree into groups that represent respective portions of the statement. Based on a relevant subset of the groups that contain at least one non-leaf node in the relevant subset of non-leaf nodes, a natural language explanation of why the statement is anomalous is generated.
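
The selection and grouping steps in this abstract can be illustrated with a small sketch. It assumes a fixed relevance threshold, treats each child of the root as one group (one clause of the statement), and uses a single hard-coded sentence pattern; none of these choices come from the abstract, and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    score: float = 0.0
    children: list = field(default_factory=list)


def non_leaves(node):
    """Yield every node in the subtree that has at least one child."""
    if node.children:
        yield node
        for child in node.children:
            yield from non_leaves(child)


def explain(root, threshold=0.5):
    """Name the clauses that contain at least one relevant non-leaf node."""
    relevant = {id(n) for n in non_leaves(root) if n.score >= threshold}
    flagged = []
    for clause in root.children:  # assumed grouping: one group per clause
        group = list(non_leaves(clause))
        if any(id(n) in relevant for n in group):
            flagged.append(clause.label)
    if not flagged:
        return "No part of the statement stands out as anomalous."
    return ("The statement looks anomalous because of its "
            + " and ".join(flagged) + ".")


tree = Node("query", 0.2, [
    Node("SELECT clause", 0.1, [Node("column")]),
    Node("WHERE clause", 0.8, [
        Node("comparison", 0.9, [Node("literal")]),
    ]),
])
print(explain(tree))
# The statement looks anomalous because of its WHERE clause.
```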

    TRACE REPRESENTATION LEARNING

    Publication No.: US20230376743A1

    Publication date: 2023-11-23

    Application No.: US17748226

    Filing date: 2022-05-19

    CPC classification number: G06N3/08 G06N3/088 G06N20/00

    Abstract: The present invention avoids overfitting in deep neural network (DNN) training by using multitask learning (MTL) and self-supervised learning (SSL) techniques when training a multi-branch DNN to encode a sequence. In an embodiment, a computer first trains the DNN to perform a first task. The DNN contains: a first encoder in a first branch, a second encoder in a second branch, and an interpreter layer that combines data from the first branch and the second branch. The DNN is then trained to perform a second task. After the first and second trainings, production encoding and inferencing occur. The first encoder encodes a sparse feature vector into a dense feature vector from which an inference is made. In an embodiment, a sequence of log messages is encoded into an encoded trace. An anomaly detector infers whether the sequence is anomalous. In an embodiment, the log messages are database commands.

    SEMI-SUPERVISED FRAMEWORK FOR PURPOSE-ORIENTED ANOMALY DETECTION

    Publication No.: US20230362180A1

    Publication date: 2023-11-09

    Application No.: US17739968

    Filing date: 2022-05-09

    CPC classification number: H04L63/1425 G06N20/20

    Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is input into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is input into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
