-
Publication No.: US11487963B2
Publication Date: 2022-11-01
Application No.: US16571321
Filing Date: 2019-09-16
Abstract: Embodiments relate to a system, program product, and method for automatically determining which activation data points in a neural model have been poisoned to erroneously indicate association with a particular label or labels. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of the last hidden layer and to segment those activations by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a cluster assessment is conducted for each cluster associated with each label to distinguish clusters with potentially poisoned activations from clusters populated with legitimate activations. The assessment includes analyzing, for each cluster, the distance from the median of the activations in that cluster to the medians of the activations associated with the labels.
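As a rough illustration of the median-distance assessment (not the claimed method itself), the following Python sketch assumes the last-hidden-layer activations have already been collected per label as NumPy arrays; the two-cluster split and the helper name assess_clusters_by_median_distance are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans

def assess_clusters_by_median_distance(activations_by_label, n_clusters=2):
    """For each label, cluster its activations and flag any cluster whose
    median is closer to another label's activation median than to its own."""
    label_medians = {lbl: np.median(acts, axis=0)
                     for lbl, acts in activations_by_label.items()}
    suspicious = {}
    for lbl, acts in activations_by_label.items():
        assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(acts)
        for c in range(n_clusters):
            cluster_median = np.median(acts[assignments == c], axis=0)
            dist_own = np.linalg.norm(cluster_median - label_medians[lbl])
            dist_other = min(np.linalg.norm(cluster_median - m)
                             for other, m in label_medians.items() if other != lbl)
            if dist_other < dist_own:   # cluster sits nearer a foreign label
                suspicious.setdefault(lbl, []).append(c)
    return suspicious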
-
Publication No.: US20210081708A1
Publication Date: 2021-03-18
Application No.: US16571321
Filing Date: 2019-09-16
Abstract: Embodiments relate to a system, program product, and method for automatically determining which activation data points in a neural model have been poisoned to erroneously indicate association with a particular label or labels. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of the last hidden layer and to segment those activations by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a cluster assessment is conducted for each cluster associated with each label to distinguish clusters with potentially poisoned activations from clusters populated with legitimate activations. The assessment includes analyzing, for each cluster, the distance from the median of the activations in that cluster to the medians of the activations associated with the labels.
-
Publication No.: US20200019821A1
Publication Date: 2020-01-16
Application No.: US16031953
Filing Date: 2018-07-10
Abstract: Computer-implemented methods, program products, and systems for provenance-based defense against poison attacks are disclosed. In one approach, a method includes: receiving observations and corresponding provenance data from data sources; determining whether the observations are poisoned based on the corresponding provenance data; and removing the poisoned observation(s) from a final training dataset used to train a final prediction model. Another implementation involves provenance-based defense against poison attacks in a fully untrusted data environment. Untrusted data points are grouped according to provenance signature, and the groups are used to train learning algorithms and generate complete and filtered prediction models. The results of applying the prediction models to an evaluation dataset are compared, and poisoned data points are identified where the performance of the filtered prediction model exceeds the performance of the complete prediction model. Poisoned data points are removed from the set to generate a final prediction model.
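A minimal sketch of the leave-one-group-out comparison, assuming each untrusted point carries a provenance signature and that a scikit-learn LogisticRegression stands in for the learning algorithm; the helper provenance_filter and the trusted evaluation set are assumptions, not the patented procedure:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def provenance_filter(X, y, provenance, X_eval, y_eval):
    """Drop provenance groups whose removal improves accuracy on a trusted
    evaluation set, then train the final model on what remains."""
    complete = LogisticRegression(max_iter=1000).fit(X, y)
    baseline = accuracy_score(y_eval, complete.predict(X_eval))
    keep = np.ones(len(X), dtype=bool)
    for sig in np.unique(provenance):
        mask = provenance != sig                     # leave this group out
        filtered = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
        score = accuracy_score(y_eval, filtered.predict(X_eval))
        if score > baseline:                         # group looks poisoned
            keep &= mask
    final_model = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return final_model, keep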
-
Publication No.: US11601468B2
Publication Date: 2023-03-07
Application No.: US16451110
Filing Date: 2019-06-25
Inventors: Nathalie Baracaldo Angel, Yi Zhou, Bryant Chen, Ali Anwar, Heiko H. Ludwig
Abstract: Systems, computer-implemented methods, and computer program products that can facilitate detection of an adversarial backdoor attack on a trained model at inference time are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a log component that records predictions and corresponding activation values generated by a trained model based on inference requests. The computer executable components can further comprise an analysis component that employs a model at an inference time to detect a backdoor trigger request based on the predictions and the corresponding activation values. In some embodiments, the log component records the predictions and the corresponding activation values from one or more layers of the trained model.
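A hypothetical sketch of the log-and-analyze idea: predictions and activation values are recorded per inference request, and a request is flagged when its activation is an outlier relative to clean activations of the predicted class. The ActivationLog class, the clean reference activations, and the z-score threshold are all assumptions rather than the claimed components:

import numpy as np

class ActivationLog:
    """Records (prediction, activation) pairs produced at inference time."""
    def __init__(self):
        self.records = []

    def record(self, prediction, activation):
        self.records.append((prediction, np.asarray(activation)))

def detect_trigger_requests(log, clean_activations_by_class, threshold=3.0):
    """Flag logged requests whose activation is an outlier (by z-scored
    distance to the class centroid) among clean activations of the
    predicted class."""
    flagged = []
    for i, (pred, act) in enumerate(log.records):
        ref = clean_activations_by_class[pred]
        centroid = ref.mean(axis=0)
        dists = np.linalg.norm(ref - centroid, axis=1)
        z = (np.linalg.norm(act - centroid) - dists.mean()) / (dists.std() + 1e-12)
        if z > threshold:
            flagged.append(i)
    return flagged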
-
Publication No.: US11188789B2
Publication Date: 2021-11-30
Application No.: US16057706
Filing Date: 2018-08-07
Inventors: Bryant Chen, Wilka Carvalho, Heiko H. Ludwig, Ian Michael Molloy, Taesung Lee, Jialong Zhang, Benjamin J. Edwards
Abstract: One embodiment provides a method comprising receiving a training set comprising a plurality of data points, where a neural network is trained as a classifier based on the training set. The method further comprises, for each data point of the training set, classifying the data point with one of a plurality of classification labels using the trained neural network, and recording neuronal activations of a portion of the trained neural network in response to the data point. The method further comprises, for each classification label that a portion of the training set has been classified with, clustering a portion of all recorded neuronal activations that are in response to the portion of the training set, and detecting one or more poisonous data points in the portion of the training set based on the clustering.
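For illustration only, a short sketch of the per-label clustering step, assuming the recorded neuronal activations are available as NumPy arrays keyed by classification label; the ICA dimensionality reduction, the two-cluster split, and the 35% size threshold are assumptions rather than the claimed method:

import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def find_poisonous_points(activations_by_label, size_threshold=0.35):
    """For each label, split the recorded activations into two clusters and
    return the indices of the smaller cluster when it is suspiciously small."""
    poisoned = {}
    for lbl, acts in activations_by_label.items():
        reduced = FastICA(n_components=min(10, acts.shape[1])).fit_transform(acts)
        assignments = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
        sizes = np.bincount(assignments, minlength=2)
        small = int(np.argmin(sizes))
        if sizes[small] / sizes.sum() < size_threshold:
            poisoned[lbl] = np.flatnonzero(assignments == small)
    return poisoned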
-
Publication No.: US11645515B2
Publication Date: 2023-05-09
Application No.: US16571323
Filing Date: 2019-09-16
IPC Classes: G06G7/00, G06N3/08, G06N20/00, G06F18/23, G06F18/24, G06V10/762, G06V10/771, G06V10/776
CPC Classes: G06N3/08, G06F18/23, G06F18/24, G06N20/00, G06V10/762, G06V10/771, G06V10/776
Abstract: Embodiments relate to a system, program product, and method for automatically determining which activation data points in a neural model have been poisoned to erroneously indicate association with a particular label or labels. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of the last hidden layer and to segment those activations by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a cluster assessment is conducted for each cluster associated with each label to distinguish clusters with potentially poisoned activations from clusters populated with legitimate activations. The assessment includes executing a set of analyses and integrating their results into a determination of whether the training data set is poisonous, based on whether the resultant activation clusters are poisoned.
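A loose sketch of integrating several analyses into one verdict, assuming a label's activations and the other labels' activation medians are available as NumPy arrays; the three analyses shown (relative cluster size, silhouette score, median distance) and the majority-vote rule are illustrative assumptions, not the claimed set of analyses:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def assess_label(acts, other_label_medians, size_thr=0.35, sil_thr=0.15):
    """Run three analyses on a label's activation clusters and combine them
    by majority vote into a single poisoned/clean verdict."""
    assignments = KMeans(n_clusters=2, n_init=10).fit_predict(acts)
    sizes = np.bincount(assignments, minlength=2)
    small = int(np.argmin(sizes))

    votes = []
    # Analysis 1: a markedly smaller cluster suggests injected poison.
    votes.append(sizes[small] / sizes.sum() < size_thr)
    # Analysis 2: a well-separated two-way split also raises suspicion.
    votes.append(silhouette_score(acts, assignments) > sil_thr)
    # Analysis 3: the small cluster's median sits near another label's median.
    small_median = np.median(acts[assignments == small], axis=0)
    own_median = np.median(acts, axis=0)
    dist_other = min(np.linalg.norm(small_median - m) for m in other_label_medians)
    votes.append(dist_other < np.linalg.norm(small_median - own_median))

    is_poisoned = sum(votes) >= 2                    # simple majority vote
    return is_poisoned, np.flatnonzero(assignments == small)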
-
Publication No.: US11538236B2
Publication Date: 2022-12-27
Application No.: US16571318
Filing Date: 2019-09-16
IPC Classes: G06V10/774, G06K9/62, G06N3/04, G06N3/08, G06V10/764, G06V10/762
Abstract: Embodiments relate to a system, program product, and method for processing an untrusted data set to automatically determine which data points therein are poisonous. A neural network is trained using potentially poisoned training data. Each of the training data points is classified using the network to retain the activations of at least one hidden layer and to segment those activations by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a clustering assessment is conducted to remove an identified cluster from the data set, form a new training set, and train a second neural model with the new training set. The removed cluster and corresponding data are applied to the trained second neural model to analyze and classify the data in the removed cluster as either legitimate or poisonous.
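A simplified sketch of the exclusionary retraining step, with a scikit-learn MLPClassifier standing in for the second neural model; the helper name reclassify_removed_cluster and the match-the-label decision rule are assumptions:

import numpy as np
from sklearn.neural_network import MLPClassifier

def reclassify_removed_cluster(X, y, removed_idx):
    """Retrain on the data set minus the suspect cluster, then judge each
    removed point: if the second model reproduces its training label it is
    treated as legitimate, otherwise as poisonous."""
    removed_idx = np.asarray(removed_idx)
    keep = np.setdiff1d(np.arange(len(X)), removed_idx)
    second_model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    second_model.fit(X[keep], y[keep])

    preds = second_model.predict(X[removed_idx])
    legitimate = removed_idx[preds == y[removed_idx]]
    poisonous = removed_idx[preds != y[removed_idx]]
    return legitimate, poisonous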
-
Publication No.: US20220114259A1
Publication Date: 2022-04-14
Application No.: US17068853
Filing Date: 2020-10-13
Inventors: Heiko H. Ludwig, Ebube Chuba, Bryant Chen, Benjamin James Edwards, Taesung Lee, Ian Michael Molloy
Abstract: One or more computer processors determine a tolerance value and a norm value associated with an untrusted model and an adversarial training method. The one or more computer processors generate a plurality of interpolated adversarial images ranging between a pair of images utilizing the adversarial training method, wherein each image in the pair of images is from a different class. The one or more computer processors detect a backdoor associated with the untrusted model utilizing the generated plurality of interpolated adversarial images. The one or more computer processors harden the untrusted model by training the untrusted model with the generated plurality of interpolated adversarial images.
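A loose, hypothetical illustration of probing with images that range between a cross-class pair; the perturb hook, the tolerance eps, and the rule of flagging predictions outside the endpoint classes are placeholders, not the claimed detection method:

import numpy as np

def interpolate_pair(img_a, img_b, steps=10):
    """Images ranging linearly between a pair drawn from two different classes."""
    return [(1 - a) * img_a + a * img_b for a in np.linspace(0.0, 1.0, steps)]

def probe_for_backdoor(model, img_a, cls_a, img_b, cls_b,
                       perturb=None, eps=0.1, steps=10):
    """Flag a possible backdoor if some interpolated (and optionally
    adversarially perturbed) image is classified as neither endpoint class."""
    suspects = []
    for x in interpolate_pair(img_a, img_b, steps):
        if perturb is not None:          # assumed adversarial step, bounded by eps
            x = perturb(model, x, eps)
        pred = model.predict(x[np.newaxis, ...])[0]
        if pred not in (cls_a, cls_b):
            suspects.append((x, pred))
    return suspects

Hardening would then amount to fine-tuning the model on such interpolated images labelled with their legitimate endpoint classes.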
-
Publication No.: US20200050945A1
Publication Date: 2020-02-13
Application No.: US16057706
Filing Date: 2018-08-07
Inventors: Bryant Chen, Wilka Carvalho, Heiko H. Ludwig, Ian Michael Molloy, Taesung Lee, Jialong Zhang, Benjamin J. Edwards
Abstract: One embodiment provides a method comprising receiving a training set comprising a plurality of data points, where a neural network is trained as a classifier based on the training set. The method further comprises, for each data point of the training set, classifying the data point with one of a plurality of classification labels using the trained neural network, and recording neuronal activations of a portion of the trained neural network in response to the data point. The method further comprises, for each classification label that a portion of the training set has been classified with, clustering a portion of all recorded neuronal activations that are in response to the portion of the training set, and detecting one or more poisonous data points in the portion of the training set based on the clustering.
-
Publication No.: US12019747B2
Publication Date: 2024-06-25
Application No.: US17068853
Filing Date: 2020-10-13
Inventors: Heiko H. Ludwig, Ebube Chuba, Bryant Chen, Benjamin James Edwards, Taesung Lee, Ian Michael Molloy
CPC Classes: G06F21/566, G06N5/04, G06N20/00, G06F2221/034
Abstract: One or more computer processors determine a tolerance value and a norm value associated with an untrusted model and an adversarial training method. The one or more computer processors generate a plurality of interpolated adversarial images ranging between a pair of images utilizing the adversarial training method, wherein each image in the pair of images is from a different class. The one or more computer processors detect a backdoor associated with the untrusted model utilizing the generated plurality of interpolated adversarial images. The one or more computer processors harden the untrusted model by training the untrusted model with the generated plurality of interpolated adversarial images.