-
Publication No.: US11625640B2
Publication Date: 2023-04-11
Application No.: US16152578
Filing Date: 2018-10-05
Inventors: Radek Starosta, Jan Brabec, Lukas Machlica
Abstract: In one embodiment, a device distributes sets of training records from a training dataset for a random forest-based classifier among a plurality of workers of a computing cluster. Each worker determines whether it can perform a node split operation locally on the random forest by comparing a number of training records at the worker to a predefined threshold. The device determines, for each of the split operations, a data size and entropy measure of the training records to be used for the split operation. The device applies a machine learning-based predictor to the determined data size and entropy measure of the training records to be used for the split operation, to predict its completion time. The device coordinates the workers of the computing cluster to perform the node split operations in parallel such that the node split operations in a given batch are grouped based on their predicted completion times.
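The batching step described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical linear function (`predict_seconds`, with made-up coefficients) in place of the trained machine learning-based predictor, and it simply sorts split operations by predicted time and cuts them into consecutive batches. None of the names below come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitOp:
    node_id: str
    data_size: int   # number of training records used by the split
    entropy: float   # entropy measure of those records


def predict_seconds(op: SplitOp) -> float:
    # Hypothetical stand-in for the learned predictor; coefficients are illustrative only.
    return 0.001 * op.data_size * (1.0 + op.entropy)


def batch_by_predicted_time(ops: List[SplitOp], batch_size: int) -> List[List[SplitOp]]:
    # Sort by predicted completion time, then cut into consecutive batches so that
    # operations in one batch are expected to finish at roughly the same time.
    ordered = sorted(ops, key=predict_seconds)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


ops = [
    SplitOp("a", 10_000, 0.9),
    SplitOp("b", 200, 0.1),
    SplitOp("c", 9_000, 1.0),
    SplitOp("d", 300, 0.5),
]
batches = batch_by_predicted_time(ops, batch_size=2)
batch_ids = [[op.node_id for op in batch] for batch in batches]
```

Grouping operations of similar predicted duration keeps fast workers from idling while one slow split in the same batch finishes.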
-
Publication No.: US20210152526A1
Publication Date: 2021-05-20
Application No.: US16686364
Filing Date: 2019-11-18
Inventors: Jan Kohout, Martin Kopp, Jan Brabec, Lukas Bajer
Abstract: In one embodiment, a traffic analysis service obtains telemetry data regarding encrypted traffic associated with a particular device in the network, wherein the telemetry data comprises Transport Layer Security (TLS) features of the traffic. The service determines, based on the TLS features from the obtained telemetry data, a set of one or more TLS fingerprints for the traffic associated with the particular device. The service calculates a measure of similarity between the set of one or more TLS fingerprints for the traffic associated with the particular device and a set of one or more TLS fingerprints of traffic associated with a second device. The service determines, based on the measure of similarity, that the particular device and the second device were operated by the same user.
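The abstract does not name the similarity measure. As one possible illustration, the sketch below assumes the Jaccard index over two sets of fingerprint strings and a hypothetical decision threshold; the fingerprint values are placeholders, not real TLS fingerprints.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Jaccard index: shared fingerprints divided by all distinct fingerprints.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder fingerprint identifiers for two devices (illustrative only).
device_a = {"fp1", "fp2", "fp3"}
device_b = {"fp2", "fp3", "fp4"}
SAME_USER_THRESHOLD = 0.4  # hypothetical cut-off, not taken from the patent

similarity = jaccard(device_a, device_b)
same_user = similarity >= SAME_USER_THRESHOLD
```

Here two of four distinct fingerprints are shared, so the similarity is 0.5 and the pair is attributed to the same user under the assumed threshold.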
-
Publication No.: US10885469B2
Publication Date: 2021-01-05
Application No.: US15722412
Filing Date: 2017-10-02
Inventors: Jan Brabec, Lukas Machlica
Abstract: In one embodiment, a device trains a machine learning-based malware classifier using a first randomly selected subset of samples from a training dataset. The classifier comprises a random decision forest. The device identifies, using at least a portion of the training dataset as input to the malware classifier, a set of misclassified samples from the training dataset that the malware classifier misclassifies. The device retrains the malware classifier using a second randomly selected subset of samples from the training dataset and the identified set of misclassified samples. The device adjusts prediction labels of individual leaves of the random decision forest of the retrained malware classifier based in part on decision changes in the forest that result from assessing the entire training dataset with the classifier. The device sends the malware classifier with the adjusted prediction labels for deployment into a network.
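How the retraining set is assembled can be sketched as follows. This covers only the sample-selection step (second random subset plus misclassified samples); the forest training and the leaf-label adjustment are not shown. The `first_model` function is a hypothetical toy stand-in for the classifier trained in the first round.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)
Predictor = Callable[[Sequence[float]], int]


def misclassified(samples: List[Sample], predict: Predictor) -> List[Sample]:
    # Samples whose predicted label disagrees with the training label.
    return [s for s in samples if predict(s[0]) != s[1]]


def retraining_set(samples: List[Sample], predict: Predictor,
                   subset_size: int, rng: random.Random) -> List[Sample]:
    # Second random subset, followed by every sample the first model got wrong.
    second_subset = rng.sample(samples, subset_size)
    return second_subset + misclassified(samples, predict)


def first_model(features: Sequence[float]) -> int:
    # Toy first-round classifier whose decision boundary is slightly off.
    return 1 if features[0] >= 7 else 0


dataset = [((float(x),), 1 if x >= 5 else 0) for x in range(10)]
hard = misclassified(dataset, first_model)
second_training_set = retraining_set(dataset, first_model, 3, random.Random(0))
```

A misclassified sample may also be drawn into the random subset, in which case it appears twice; whether to deduplicate is a design choice this sketch leaves open.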
-
Publication No.: US10728271B2
Publication Date: 2020-07-28
Application No.: US16437417
Filing Date: 2019-06-11
Inventors: Jan Brabec, Lukas Machlica
Abstract: In one embodiment, a computing device provides a feature vector as input to a random decision forest comprising a plurality of decision trees trained using a training dataset, each decision tree being configured to output a classification label prediction for the input feature vector. For each of the decision trees, the computing device determines a conditional probability of the decision tree based on a true classification label and the classification label prediction from the decision tree for the input feature vector. The computing device generates weightings for the classification label predictions from the decision trees based on the determined conditional probabilities. The computing device applies a final classification label to the feature vector based on the weightings for the classification label predictions from the decision trees.
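One way to read the weighting scheme is sketched below, under the assumption that each tree's conditional probability is estimated as P(true label = c | tree predicts c) from a record of its past predictions, and that this probability is used directly as the vote weight. The function names and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def conditional_probs(history: List[Tuple[str, str]]) -> Dict[str, float]:
    # history holds (predicted_label, true_label) pairs for one tree.
    # Returns P(true == c | predicted == c) for every label c the tree predicted.
    predicted = defaultdict(int)
    correct = defaultdict(int)
    for pred, true in history:
        predicted[pred] += 1
        if pred == true:
            correct[pred] += 1
    return {label: correct[label] / predicted[label] for label in predicted}


def weighted_vote(predictions: List[str], probs: List[Dict[str, float]]) -> str:
    # Each tree votes for its predicted label with weight equal to how often
    # that prediction was right in the past.
    scores = defaultdict(float)
    for pred, tree_probs in zip(predictions, probs):
        scores[pred] += tree_probs.get(pred, 0.0)
    return max(scores, key=scores.get)


tree_histories = [
    [("malware", "malware")] * 4,                                  # reliable tree
    [("benign", "benign")] + [("benign", "malware")] * 3,          # right 1 time in 4
    [("benign", "benign")] * 2 + [("benign", "malware")] * 2,      # right 2 times in 4
]
probs = [conditional_probs(h) for h in tree_histories]
final_label = weighted_vote(["malware", "benign", "benign"], probs)
```

In this example a plain majority vote would return "benign", while the weighted vote returns "malware" because the single reliable tree (weight 1.0) outweighs the two unreliable ones (0.25 + 0.5).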
-
Publication No.: US20240333733A1
Publication Date: 2024-10-03
Application No.: US18127501
Filing Date: 2023-03-28
Inventors: Jan Brabec, Radek Starosta
CPC Classification: H04L63/1425, G06V10/82, H04L63/1416, H04L63/1441
Abstract: In some aspects, the techniques described herein relate to a method for detecting malicious emails, the method including: receiving an email, wherein the email is associated with a markup payload; determining, based on the markup payload, text data associated with the email; determining, using the text data and a first machine learning model, a first representation of the email representing text associated with the email; rendering the email to generate image data that represents a rendering of the email; determining, using the image data and a second machine learning model, a second representation of the email that represents at least the rendering of the email; and determining a prediction for the email based on the first representation and the second representation, wherein the prediction represents whether the email is predicted to be malicious based on the first representation and the second representation.
-
Publication No.: US20220191244A1
Publication Date: 2022-06-16
Application No.: US17117942
Filing Date: 2020-12-10
Inventors: Tomas Komarek, Jan Brabec, Cenek Skarda
IPC Classification: H04L29/06
Abstract: Inverse imbalance subspace searching techniques are used to detect potential malware among samples of network communication data. A large number of samples of network communication data, such as proxy log data and/or network flows, are received and analyzed by a malware detection system. A number of the samples are associated with known malware, while other unlabeled samples are either benign or may be associated with unknown malware. An inverse imbalance subspace search may be performed, in which the sample sets are divided into subsets based on random feature thresholds, and each subset is evaluated based on the ratio of known malware samples to unlabeled samples. Unlabeled samples within subsets having high malware sample ratios may be identified, aggregated, and processed as potential malware.
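The search loop described above can be sketched in simplified form. The sketch assumes a single random feature and threshold per round, features scaled to [0, 1], and a hypothetical `min_ratio` cut-off; the patented procedure may combine several thresholds or aggregate differently. All names and data are illustrative.

```python
import random
from collections import Counter
from typing import List, Tuple

# (sample id, feature vector, is_known_malware)
Sample = Tuple[str, Tuple[float, ...], bool]


def subspace_search(samples: List[Sample], rounds: int,
                    min_ratio: float, rng: random.Random) -> Counter:
    # Counts how often each unlabeled sample lands in a malware-heavy subset.
    hits: Counter = Counter()
    n_features = len(samples[0][1])
    for _ in range(rounds):
        feature = rng.randrange(n_features)
        threshold = rng.random()  # assumes features are scaled to [0, 1]
        below = [s for s in samples if s[1][feature] < threshold]
        above = [s for s in samples if s[1][feature] >= threshold]
        for subset in (below, above):
            malware = sum(1 for s in subset if s[2])
            unlabeled = [s[0] for s in subset if not s[2]]
            # Flag the unlabeled samples only when known malware dominates the subset.
            if unlabeled and malware / len(unlabeled) >= min_ratio:
                hits.update(unlabeled)
    return hits


samples: List[Sample] = [
    ("m1", (0.90, 0.90), True), ("m2", (0.80, 0.95), True),
    ("m3", (0.85, 0.80), True), ("m4", (0.95, 0.85), True),
    ("suspicious", (0.90, 0.88), False),
    ("b1", (0.10, 0.20), False), ("b2", (0.20, 0.10), False),
    ("b3", (0.30, 0.30), False), ("b4", (0.15, 0.25), False),
]
hits = subspace_search(samples, rounds=200, min_ratio=3.0, rng=random.Random(0))
flagged = set(hits)
```

In this toy data the only unlabeled sample that can ever share a malware-dominated subset is the one sitting among the known malware samples, so it is the only one flagged.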
-
Publication No.: US11245675B2
Publication Date: 2022-02-08
Application No.: US16686364
Filing Date: 2019-11-18
Inventors: Jan Kohout, Martin Kopp, Jan Brabec, Lukas Bajer
Abstract: In one embodiment, a traffic analysis service obtains telemetry data regarding encrypted traffic associated with a particular device in the network, wherein the telemetry data comprises Transport Layer Security (TLS) features of the traffic. The service determines, based on the TLS features from the obtained telemetry data, a set of one or more TLS fingerprints for the traffic associated with the particular device. The service calculates a measure of similarity between the set of one or more TLS fingerprints for the traffic associated with the particular device and a set of one or more TLS fingerprints of traffic associated with a second device. The service determines, based on the measure of similarity, that the particular device and the second device were operated by the same user.
-
Publication No.: US20240106836A1
Publication Date: 2024-03-28
Application No.: US18225517
Filing Date: 2023-07-24
Inventors: Petr Somol, Martin Kopp, Jan Kohout, Jan Brabec, Marc René Jacques Marie Dupont, Cenek Skarda, Lukas Bajer, Danila Khikhlukha
Abstract: In one embodiment, a device obtains input features for a neural network-based model. The device pre-defines a set of neurons of the model to represent known behaviors associated with the input features. The device constrains weights for a plurality of outputs of the model. The device trains the neural network-based model using the constrained weights for the plurality of outputs of the model and by excluding the pre-defined set of neurons from updates during the training.
-
Publication No.: US20230376836A1
Publication Date: 2023-11-23
Application No.: US17749740
Filing Date: 2022-05-20
Inventors: Tomas Komarek, Stepan Dvorak, Jan Brabec
CPC Classification: G06N20/00, H04L63/1441
Abstract: Techniques and architecture are described for converting tree structured data such as, for example, JavaScript Object Notation (JSON) data, into multiple feature vectors to train multiple instance learning (MIL) models for providing cybersecurity in networks. In particular, a data set is provided, wherein the data set comprises a sample configured as a hierarchal tree. The sample is converted into a set of path and value pairs, e.g., flattened into a set of path and value pairs, where the path is a sequence of field names and array indices encoding a position of a value. Each path and value pair of the set of path and value pairs is converted into a respective feature vector to form a set of feature vectors. The set of feature vectors is used to train a multiple instance learning (MIL) model, wherein each feature vector has a same, fixed length.
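The flattening step can be sketched directly from the description: each leaf value is paired with the sequence of field names and array indices that leads to it. The conversion of each pair into a fixed-length vector is not specified in the abstract, so `to_vector` below is a hypothetical hashing-based encoding chosen only to show that every pair maps to a vector of the same length.

```python
import hashlib
import json
from typing import Any, Iterator, List, Tuple

Path = Tuple[Any, ...]  # field names (str) and array indices (int)


def flatten(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    # Walk the tree depth-first and yield one (path, value) pair per leaf.
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, path + (index,))
    else:
        yield path, node


def to_vector(path: Path, value: Any, dim: int = 8) -> List[float]:
    # Hypothetical encoding: hash the path and the value into buckets of a
    # fixed-length vector, so every pair yields a vector of length `dim`.
    vec = [0.0] * dim
    for part in (json.dumps(path), json.dumps(value)):
        digest = hashlib.sha256(part.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


sample = json.loads('{"process": {"name": "cmd.exe", "args": ["/c", "dir"]}, "pid": 42}')
pairs = list(flatten(sample))
bag = [to_vector(path, value) for path, value in pairs]  # one "bag" of instances for MIL
```

For the sample above, `pairs` contains four entries such as `(("process", "args", 0), "/c")`, and `bag` holds four vectors of length 8 that together form one multiple-instance training example.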
-
Publication No.: US11799904B2
Publication Date: 2023-10-24
Application No.: US17117942
Filing Date: 2020-12-10
Inventors: Tomas Komarek, Jan Brabec, Cenek Skarda
IPC Classification: H04L9/40
CPC Classification: H04L63/1466, H04L63/1416, H04L63/1425, H04L63/1433, H04L63/20
Abstract: Inverse imbalance subspace searching techniques are used to detect potential malware among samples of network communication data. A large number of samples of network communication data, such as proxy log data and/or network flows, are received and analyzed by a malware detection system. A number of the samples are associated with known malware, while other unlabeled samples are either benign or may be associated with unknown malware. An inverse imbalance subspace search may be performed, in which the sample sets are divided into subsets based on random feature thresholds, and each subset is evaluated based on the ratio of known malware samples to unlabeled samples. Unlabeled samples within subsets having high malware sample ratios may be identified, aggregated, and processed as potential malware.