Automatically determining whether an activation cluster contains poisonous data

    Publication No.: US11487963B2

    Publication Date: 2022-11-01

    Application No.: US16571321

    Application Date: 2019-09-16

    IPC Classification: G06K9/62 G06N3/04 G06N3/08

    Abstract: Embodiments relate to a system, program product, and method for automatically determining which activation data points in a neural model have been poisoned to erroneously indicate association with a particular label or labels. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of the last hidden layer, and those activations are segmented by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a cluster assessment is conducted for each cluster associated with each label to distinguish clusters with potentially poisoned activations from clusters populated with legitimate activations. The assessment includes analyzing, for each cluster, the distance from the median of the activations therein to the medians of the activations in the labels.
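
    The per-label clustering and median-distance assessment described above can be sketched in a few lines. The sketch below is illustrative only: the k=2 clustering choice, the Euclidean distance, and the `assess_label_clusters` helper are assumptions rather than the patented reference implementation, and `activations` is presumed to hold the last-hidden-layer activations already collected per training point.

```python
import numpy as np
from sklearn.cluster import KMeans

def assess_label_clusters(activations, labels, n_clusters=2):
    """activations: (N, D) last-hidden-layer activations; labels: (N,) ints.
    Clusters each label's activations and compares each cluster's median
    against the per-label activation medians."""
    label_medians = {y: np.median(activations[labels == y], axis=0)
                     for y in np.unique(labels)}
    report = {}
    for y in np.unique(labels):
        segment = activations[labels == y]          # segment by label
        assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(segment)
        for c in range(n_clusters):
            cluster_median = np.median(segment[assign == c], axis=0)
            # Distance from this cluster's median to every label's median; a
            # cluster whose median lies closer to another label's median than
            # to its own is flagged as potentially poisoned.
            dists = {y2: float(np.linalg.norm(cluster_median - m))
                     for y2, m in label_medians.items()}
            report[(int(y), c)] = {"distances": dists,
                                   "suspicious": min(dists, key=dists.get) != y}
    return report
```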

    Automatically Determining Whether an Activation Cluster Contains Poisonous Data

    Publication No.: US20210081708A1

    Publication Date: 2021-03-18

    Application No.: US16571321

    Application Date: 2019-09-16

    IPC Classification: G06K9/62 G06N3/08 G06N3/04

    Abstract: Embodiments relate to a system, program product, and method for automatically determining which activation data points in a neural model have been poisoned to erroneously indicate association with a particular label or labels. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of the last hidden layer, and those activations are segmented by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a cluster assessment is conducted for each cluster associated with each label to distinguish clusters with potentially poisoned activations from clusters populated with legitimate activations. The assessment includes analyzing, for each cluster, the distance from the median of the activations therein to the medians of the activations in the labels.

    DETECTING AND MITIGATING POISON ATTACKS USING DATA PROVENANCE

    Publication No.: US20200019821A1

    Publication Date: 2020-01-16

    Application No.: US16031953

    Application Date: 2018-07-10

    IPC Classification: G06K9/62 G06F15/18 H04L29/06

    Abstract: Computer-implemented methods, program products, and systems for provenance-based defense against poison attacks are disclosed. In one approach, a method includes: receiving observations and corresponding provenance data from data sources; determining whether the observations are poisoned based on the corresponding provenance data; and removing the poisoned observation(s) from a final training dataset used to train a final prediction model. Another implementation involves provenance-based defense against poison attacks in a fully untrusted data environment. Untrusted data points are grouped according to provenance signature, and the groups are used to train learning algorithms and generate complete and filtered prediction models. The results of applying the prediction models to an evaluation dataset are compared, and poisoned data points are identified where the performance of the filtered prediction model exceeds the performance of the complete prediction model. Poisoned data points are removed from the set to generate a final prediction model.
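
    A minimal sketch of the fully untrusted variant follows: data points are grouped by provenance signature, a complete model is trained on everything, and per-group filtered models are trained with that group held out; a group whose removal improves evaluation performance is flagged as poisoned. The logistic-regression stand-in and the `(features, label, signature)` tuple format are assumptions for illustration.

```python
from collections import defaultdict
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def find_poisoned_groups(points, X_eval, y_eval,
                         base=LogisticRegression(max_iter=1000)):
    """points: iterable of (features, label, provenance_signature) tuples."""
    groups = defaultdict(list)
    for x, y, sig in points:
        groups[sig].append((x, y))

    def fit_and_score(samples):
        X = [x for x, _ in samples]
        y = [lbl for _, lbl in samples]
        return clone(base).fit(X, y).score(X_eval, y_eval)

    # "Complete" model trained on all untrusted points.
    complete_score = fit_and_score([s for g in groups.values() for s in g])

    poisoned = []
    for sig in groups:
        held_out = [s for other, g in groups.items() if other != sig for s in g]
        # Removing a poisoned provenance group should *raise* performance.
        if fit_and_score(held_out) > complete_score:
            poisoned.append(sig)
    return poisoned
```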

    Detection of an adversarial backdoor attack on a trained model at inference time

    Publication No.: US11601468B2

    Publication Date: 2023-03-07

    Application No.: US16451110

    Application Date: 2019-06-25

    IPC Classification: H04L9/40 G06N5/04 G06N20/00

    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate detection of an adversarial backdoor attack on a trained model at inference time are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a log component that records predictions and corresponding activation values generated by a trained model based on inference requests. The computer executable components can further comprise an analysis component that employs a model at an inference time to detect a backdoor trigger request based on the predictions and the corresponding activation values. In some embodiments, the log component records the predictions and the corresponding activation values from one or more layers of the trained model.
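
    The two claimed components map naturally onto a logger and an analyzer, sketched below. This is a loose interpretation: the abstract does not specify the detection model, so an IsolationForest fit on activations from known-clean requests stands in for the model the analysis component employs.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

class ActivationLog:
    """Log component: records predictions and corresponding activation
    values generated by the trained model for each inference request."""
    def __init__(self):
        self.records = []
    def record(self, prediction, activations):
        self.records.append((prediction, np.asarray(activations)))

class BackdoorAnalyzer:
    """Analysis component: employs a model at inference time to detect a
    backdoor trigger request from logged activation values."""
    def __init__(self, clean_activations):
        self.detector = IsolationForest(random_state=0).fit(clean_activations)
    def is_trigger_request(self, activations):
        # -1 marks an outlier relative to the clean activation distribution,
        # treated here as a potential backdoor trigger request.
        x = np.asarray(activations).reshape(1, -1)
        return self.detector.predict(x)[0] == -1
```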

    Detecting backdoor attacks using exclusionary reclassification

    Publication No.: US11538236B2

    Publication Date: 2022-12-27

    Application No.: US16571318

    Application Date: 2019-09-16

    Abstract: Embodiments relate to a system, program product, and method for processing an untrusted data set to automatically determine which data points therein are poisonous. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of at least one hidden layer, and those activations are segmented by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a clustering assessment is conducted to remove an identified cluster from the data set, form a new training set, and train a second neural model with the new training set. The removed cluster and corresponding data are applied to the trained second neural model to analyze and classify the data in the removed cluster as either legitimate or poisonous.
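
    The exclusionary-reclassification step can be sketched as follows, with a scikit-learn MLP standing in for the second neural model; the majority-agreement rule and the 0.5 threshold are illustrative assumptions rather than claimed values.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def exclusionary_reclassification(X, y, suspect_idx, threshold=0.5):
    """X: (N, D) array, y: (N,) labels, suspect_idx: indices of the removed
    cluster. Retrains on the remaining data and reclassifies the removed
    points with the second model."""
    keep = np.setdiff1d(np.arange(len(X)), suspect_idx)
    second_model = MLPClassifier(max_iter=500).fit(X[keep], y[keep])
    preds = second_model.predict(X[suspect_idx])
    agreement = np.mean(preds == y[suspect_idx])
    # Poisoned points tend to lose their (backdoored) label once the trigger
    # data is excluded from training, so low agreement marks the cluster as
    # poisonous rather than legitimate.
    return "legitimate" if agreement >= threshold else "poisonous"
```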

    ADVERSARIAL INTERPOLATION BACKDOOR DETECTION

    Publication No.: US20220114259A1

    Publication Date: 2022-04-14

    Application No.: US17068853

    Application Date: 2020-10-13

    IPC Classification: G06F21/56 G06N20/00 G06N5/04

    Abstract: One or more computer processors determine a tolerance value and a norm value associated with an untrusted model and an adversarial training method. The one or more computer processors generate a plurality of interpolated adversarial images ranging between a pair of images utilizing the adversarial training method, wherein each image in the pair of images is from a different class. The one or more computer processors detect a backdoor associated with the untrusted model utilizing the generated plurality of interpolated adversarial images. The one or more computer processors harden the untrusted model by training the untrusted model with the generated plurality of interpolated adversarial images.
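
    One way to read the claim is sketched below: images are generated along a path between a pair drawn from two different classes, and the untrusted model is probed along that path. The linear interpolation schedule and the "dominant off-path class" heuristic are assumptions; the abstract specifies only that interpolated adversarial images between the pair are generated under a tolerance and norm value.

```python
import numpy as np

def interpolation_path(x_a, x_b, steps=16):
    """Images ranging between x_a (class A) and x_b (class B)."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - a) * x_a + a * x_b for a in alphas])

def probe_for_backdoor(model_predict, x_a, x_b, class_a, class_b, steps=16):
    """model_predict: callable mapping a batch of images to class labels."""
    preds = model_predict(interpolation_path(x_a, x_b, steps))
    # On a clean model the path should flip between class_a and class_b; a
    # third class dominating intermediate images hints at a backdoor target.
    off_path = [p for p in preds if p not in (class_a, class_b)]
    return max(set(off_path), key=off_path.count) if off_path else None
```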

    DETECTING POISONING ATTACKS ON NEURAL NETWORKS BY ACTIVATION CLUSTERING

    Publication No.: US20200050945A1

    Publication Date: 2020-02-13

    Application No.: US16057706

    Application Date: 2018-08-07

    IPC Classification: G06N3/08 G06N3/04

    Abstract: One embodiment provides a method comprising receiving a training set comprising a plurality of data points, where a neural network is trained as a classifier based on the training set. The method further comprises, for each data point of the training set, classifying the data point with one of a plurality of classification labels using the trained neural network, and recording neuronal activations of a portion of the trained neural network in response to the data point. The method further comprises, for each classification label that a portion of the training set has been classified with, clustering a portion of all recorded neuronal activations that are in response to the portion of the training set, and detecting one or more poisonous data points in the portion of the training set based on the clustering.
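
    A compact sketch of the per-label clustering and detection step appears below. The FastICA dimensionality reduction, the k=2 clustering, and the relative-cluster-size rule are assumptions drawn from the activation-clustering literature, not values fixed by the claim text.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def flag_poisonous_points(activations, labels, size_ratio=0.35):
    """activations: (N, D) recorded neuronal activations, one row per
    training point; labels: (N,) classification labels. Returns indices of
    data points flagged as poisonous."""
    flagged = []
    idx = np.arange(len(labels))
    for y in np.unique(labels):
        mask = labels == y
        reduced = FastICA(n_components=10, random_state=0).fit_transform(
            activations[mask])
        assign = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
        sizes = np.bincount(assign, minlength=2)
        small = int(np.argmin(sizes))
        # Poisoned data typically forms the markedly smaller cluster.
        if sizes[small] < size_ratio * sizes.sum():
            flagged.extend(idx[mask][assign == small].tolist())
    return flagged
```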