Identifying spam using near-duplicate detection for text and images
Abstract:
Embodiments described herein provide systems, methods, and computer storage media for detecting spam by comparing hash values of content. In embodiments, hash values are generated based on the type of content and compared to other hash values in storage buckets. The similarity of content is determined by calculating the distance between two hash values and determining whether the distance exceeds a distance index. Counter values associated with hash values in storage are incremented when the distances between hash values exceed the distance index. Spam indications are communicated when the counter values associated with hash values exceed a count threshold.
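The flow described in the abstract (bucketed hash storage, a distance test, per-hash counters, and a count threshold) can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several things are assumptions made only for illustration: the class name `NearDuplicateSpamDetector`, the use of 64-bit integer hashes that have already been computed for each piece of content, Hamming distance as the distance measure, and bucketing by the leading bits of the hash. The sketch also treats two hashes as similar when their Hamming distance is at most `distance_index`; the abstract words the test as the distance exceeding the index, so the comparison direction depends on how the distance measure is defined and is likewise an assumption here.

```python
from collections import defaultdict


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


class NearDuplicateSpamDetector:
    """Hypothetical sketch: counts near-duplicate hashes per storage bucket."""

    def __init__(self, distance_index: int = 3, count_threshold: int = 3,
                 hash_bits: int = 64, bucket_bits: int = 8) -> None:
        self.distance_index = distance_index    # max distance treated as "similar" (assumption)
        self.count_threshold = count_threshold  # counter value above which spam is indicated
        self.hash_bits = hash_bits
        self.bucket_bits = bucket_bits
        # bucket key -> {stored hash value: counter value}
        self.buckets = defaultdict(dict)

    def _bucket_key(self, hash_value: int) -> int:
        # Assumption: the leading bits of the hash select the storage bucket.
        return hash_value >> (self.hash_bits - self.bucket_bits)

    def observe(self, hash_value: int) -> bool:
        """Record one content hash; return True if a spam indication is raised."""
        bucket = self.buckets[self._bucket_key(hash_value)]
        matched = False
        spam = False
        for stored in bucket:
            if hamming_distance(stored, hash_value) <= self.distance_index:
                bucket[stored] += 1  # increment the counter of the similar stored hash
                matched = True
                if bucket[stored] > self.count_threshold:
                    spam = True
        if not matched:
            bucket[hash_value] = 1   # first sighting: store the hash with a counter of 1
        return spam
```

In this sketch, calling `observe` repeatedly with hashes that differ in only a few low-order bits increments a single stored counter, and the call returns `True` once that counter passes `count_threshold`. Near-duplicates whose leading bits differ land in different buckets and are missed; that is a limitation of this simplified bucketing choice, not a statement about the described embodiments.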