Distance and method of indexing sandbox logs for mapping program behavior

    Publication number: US10437986B2

    Publication date: 2019-10-08

    Application number: US15374670

    Filing date: 2016-12-09

    Inventor: Martin Vejmelka

    Abstract: Systems and methods index and search log files created after execution of binaries. A plurality of log files each have one or more sequences. An index tree is created for the log files. A first log file is placed into a bucket of the index tree according to the lengths of the one or more sequences of the first log file. Remaining log files are placed into the index tree according to their respective sequence lengths. Each log becomes a representative in the bucket or is associated with a representative in the bucket. The index tree can be searched, where an incurred distance and a remaining distance are maintained during the search. Nodes are pruned based, at least in part, on the incurred distance and the remaining distance.
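
The indexing and pruned search described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that every log is a tuple with the same number of sequences, that the distance between two logs is the sum of per-sequence Levenshtein distances, that the "incurred distance" is the accumulated length-difference lower bound along the path from the root, and that the "remaining distance" is the search radius minus that bound. The names `LengthIndex`, `rep_radius`, `log_distance`, and `edit_distance` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def log_distance(p, q):
    """Assumed log distance: sum of per-sequence edit distances."""
    return sum(edit_distance(a, b) for a, b in zip(p, q))


class LengthIndex:
    """Tree keyed by sequence lengths; leaves are buckets of representatives."""

    def __init__(self, rep_radius):
        self.rep_radius = rep_radius  # hypothetical threshold for joining a representative
        self.root = {}

    def insert(self, log):
        # One tree level per sequence; the child key is that sequence's length.
        node = self.root
        for seq in log[:-1]:
            node = node.setdefault(len(seq), {})
        bucket = node.setdefault(len(log[-1]), [])
        # Attach the log to a close representative, else make it a new one.
        for rep, members in bucket:
            if log_distance(rep, log) <= self.rep_radius:
                members.append(log)
                return
        bucket.append((log, []))

    def search(self, query, radius):
        hits = []

        def visit(node, depth, incurred):
            remaining = radius - incurred
            if depth == len(query):  # reached a bucket
                for rep, members in node:
                    for cand in [rep] + members:
                        d = log_distance(cand, query)
                        if d <= radius:
                            hits.append((d, cand))
                return
            for length, child in node.items():
                # Edit distance is at least the difference in lengths,
                # so a child whose gap exceeds the remaining budget is pruned.
                gap = abs(length - len(query[depth]))
                if gap <= remaining:
                    visit(child, depth + 1, incurred + gap)

        visit(self.root, 0, 0)
        return sorted(hits, key=lambda h: h[0])
```

In this sketch, a query only descends into length buckets that could still contain a log within the radius, so subtrees whose sequence lengths differ too much from the query are never compared in full.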

    Identification of mislabeled samples via phantom nodes in label propagation

    Publication number: US10198576B2

    Publication date: 2019-02-05

    Application number: US15374865

    Filing date: 2016-12-09

    Inventor: Martin Vejmelka

    Abstract: Systems and methods identify potentially mislabeled file samples. A graph is created from a plurality of sample files. The graph includes nodes associated with the sample files and behavior nodes associated with behavior signatures. Phantom nodes are created in the graph for those sample files having a known label. During a label propagation operation, a node receives data indicating a label distribution of a neighbor node in the graph. In response to determining that the current label for the node is known, a neighborhood opinion is determined for the associated phantom node, based at least in part on the label distribution of the neighboring nodes. After the label propagation operation has completed, differences between the neighborhood opinion and the current label distribution for nodes are determined. If the difference exceeds a threshold, then the current label may be incorrect.
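
The phantom-node idea in this abstract can be illustrated with a small Python sketch. It is a simplified reading, not the patented method. It assumes a bipartite graph of sample-to-behavior edges, synchronous averaging as the propagation rule, labeled nodes clamped to their known label, and half the L1 distance (total variation) as the difference measure. The names `propagate`, `suspicious`, and the label set are hypothetical.

```python
def propagate(edges, known, labels=("clean", "malware"), iterations=20):
    """Averaging label propagation with phantom nodes for labeled samples.

    edges: iterable of (sample, behavior) pairs.
    known: dict mapping a labeled sample to its label.
    Returns (dist, phantom): per-node label distributions and, for each
    labeled sample, the neighborhood opinion held by its phantom node.
    """
    neighbors = {}
    for s, b in edges:
        neighbors.setdefault(s, set()).add(b)
        neighbors.setdefault(b, set()).add(s)

    uniform = [1.0 / len(labels)] * len(labels)
    dist = {n: uniform[:] for n in neighbors}
    for n, lab in known.items():
        dist[n] = [1.0 if l == lab else 0.0 for l in labels]
    phantom = {n: uniform[:] for n in known}

    for _ in range(iterations):
        new = {}
        for n, nbrs in neighbors.items():
            avg = [sum(dist[m][k] for m in nbrs) / len(nbrs)
                   for k in range(len(labels))]
            if n in known:
                phantom[n] = avg   # what the neighborhood thinks of n
                new[n] = dist[n]   # the known label itself stays clamped
            else:
                new[n] = avg
        dist = new
    return dist, phantom


def suspicious(dist, phantom, threshold):
    """Labeled nodes whose neighborhood opinion disagrees with their label."""
    flagged = []
    for n, opinion in phantom.items():
        diff = 0.5 * sum(abs(a - b) for a, b in zip(opinion, dist[n]))
        if diff > threshold:
            flagged.append(n)
    return sorted(flagged)
```

For example, if three samples labeled "malware" and one labeled "clean" all share a single behavior signature, the phantom node of the "clean" sample ends up leaning toward "malware", and that sample is flagged for review.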

    DISTANCE AND METHOD OF INDEXING SANDBOX LOGS FOR MAPPING PROGRAM BEHAVIOR

    Publication number: US20170169214A1

    Publication date: 2017-06-15

    Application number: US15374670

    Filing date: 2016-12-09

    Inventor: Martin Vejmelka

    CPC classification number: G06F21/53 G06F16/2246 G06F16/2358 G06F21/552

    Abstract: Systems and methods index and search log files created after execution of binaries. A plurality of log files each have one or more sequences. An index tree is created for the log files. A first log file is placed into a bucket of the index tree according to the lengths of the one or more sequences of the first log file. Remaining log files are placed into the index tree according to their respective sequence lengths. Each log becomes a representative in the bucket or is associated with a representative in the bucket. The index tree can be searched, where an incurred distance and a remaining distance are maintained during the search. Nodes are pruned based, at least in part, on the incurred distance and the remaining distance.

    IDENTIFICATION OF MISLABELED SAMPLES VIA PHANTOM NODES IN LABEL PROPAGATION

    Publication number: US20170169215A1

    Publication date: 2017-06-15

    Application number: US15374865

    Filing date: 2016-12-09

    Inventor: Martin Vejmelka

    CPC classification number: G06F21/53 G06F21/564

    Abstract: Systems and methods identify potentially mislabeled file samples. A graph is created from a plurality of sample files. The graph includes nodes associated with the sample files and behavior nodes associated with behavior signatures. Phantom nodes are created in the graph for those sample files having a known label. During a label propagation operation, a node receives data indicating a label distribution of a neighbor node in the graph. In response to determining that the current label for the node is known, a neighborhood opinion is determined for the associated phantom node, based at least in part on the label distribution of the neighboring nodes. After the label propagation operation has completed, differences between the neighborhood opinion and the current label distribution for nodes are determined. If the difference exceeds a threshold, then the current label may be incorrect.
