Methods and systems for generating a stable identifier for nodes likely including primary content within an information resource

    Publication No.: US09665617B1

    Publication date: 2017-05-30

    Application No.: US14254349

    Application date: 2014-04-16

    Applicant: Google Inc.

    Abstract: Systems and methods of generating a stable identifier for nodes likely to include primary content of an information resource are disclosed. A processor identifies, on an information resource, a plurality of content-related Document Object Model (DOM) nodes based on a primary content detection policy including one or more rules. The processor determines one or more container nodes containing one or more of the identified content-related DOM nodes. The processor generates, for each of the container nodes, one or more identifiers corresponding to the container node. The processor then determines, for each of the generated identifiers, one or more container nodes to which the identifier corresponds. The processor identifies, from the generated identifiers, a subset of the generated identifiers that correspond only to container nodes that contain the content-related DOM nodes and selects one of the identifiers of the subset as a stable identifier.
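The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` class, the single detection rule (a `p` element with at least 40 characters of text), the choice of direct parents as containers, the CSS-like identifier forms (`tag`, `tag#id`, `tag.class`), and the final selection heuristic are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM node (not a real DOM API)."""
    tag: str
    node_id: Optional[str] = None
    classes: List[str] = field(default_factory=list)
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(root: Node) -> Iterator[Node]:
    """Yield every node of the tree in document order."""
    yield root
    for child in root.children:
        yield from walk(child)


def is_content_node(node: Node, min_chars: int = 40) -> bool:
    """Assumed one-rule detection policy: a paragraph with enough text."""
    return node.tag == "p" and len(node.text) >= min_chars


def candidate_identifiers(node: Node) -> List[str]:
    """Assumed identifier forms: tag, tag#id, and tag.class."""
    idents = [node.tag]
    if node.node_id:
        idents.append(f"{node.tag}#{node.node_id}")
    idents.extend(f"{node.tag}.{cls}" for cls in node.classes)
    return idents


def stable_identifier(root: Node) -> Optional[str]:
    nodes = list(walk(root))
    # Containers: here, direct parents of content-related nodes (an assumption).
    containers = [n for n in nodes if any(is_content_node(c) for c in n.children)]
    container_keys = {id(n) for n in containers}

    # Generate identifiers for each container, keeping first-seen order.
    generated: List[str] = []
    for container in containers:
        for ident in candidate_identifiers(container):
            if ident not in generated:
                generated.append(ident)

    # For each identifier, find every node in the document it corresponds to,
    # and keep only identifiers that correspond to container nodes alone.
    matched: Dict[str, List[Node]] = {}
    for ident in generated:
        hits = [n for n in nodes if ident in candidate_identifiers(n)]
        if all(id(n) in container_keys for n in hits):
            matched[ident] = hits

    if not matched:
        return None
    # Assumed selection heuristic: cover the most containers, then prefer
    # the shorter identifier. The abstract leaves this choice open.
    return min(matched, key=lambda s: (-len(matched[s]), len(s)))


body = "Body text. " * 5  # 55 characters, passes the assumed length rule
example_page = Node("div", node_id="page", children=[
    Node("div", classes=["sidebar"], children=[Node("p", text="Ad")]),
    Node("div", classes=["post"], children=[Node("p", text=body)]),
    Node("div", classes=["post"], children=[Node("p", text=body)]),
])

print(stable_identifier(example_page))  # div.post
```

In this example the bare `div` identifier is rejected because it also corresponds to the page wrapper and the sidebar, neither of which contains a content-related node, while `div.post` corresponds only to the two containers and is therefore selected.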
