WEB CONTENT RELIABILITY CLASSIFICATION
摘要:
Technology described herein assigns a reliability score to web content, such as a web site or portion of a website. In one aspect, an output of the technology is a high reliability score and a low reliability score for a web content. The high reliability score represents conformance to high reliability sites, while the low reliability score represents conformance to low reliability sites. The high reliability score may be generated by first identifying high reliability online content within a compressed web graph. In a first iteration, the high reliability score of the seeds is used to score online content that is linked to the seed sites. At a high level, the more links that originate from high reliability sources, the higher the reliability score for the linked content. The low reliability score is similar, but uses outgoing links to low reliability sites instead of incoming links from high reliability sites.
公开/授权文献
信息查询
0/0