Semi-automatic rule generator
摘要:
A computer-implemented method for generating a first set of longest common sequences from a plurality of known malicious webpages, the first set of longest common sequences representing input data from which a human generates a set of regular expressions for detecting phishing webpages. There is included obtaining HTML source strings from the plurality of known malicious webpages and transforming the HTML source strings to reduce the number of at least one of stop words and repeated tags, thereby obtaining a set of transformed source strings. There is further included performing string alignment on the set of transformed source strings, thereby obtaining at least a scoring matrix. There is additionally included obtaining a second set of longest common sequences responsive to the performing the string alignment. There is further included filtering the second set of longest common sequences, thereby obtaining the first set of longest common sequences.
公开/授权文献
信息查询
0/0