Optimizing web crawling through web page pruning

    Publication No.: US09996619B2

    Publication Date: 2018-06-12

    Application No.: US15688167

    Filing Date: 2017-08-28

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.
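The execution-vector idea in the abstract above can be illustrated with a minimal Python sketch. It assumes the static analysis has already reduced the page's scripts to a hypothetical "leads-to" graph: each `Segment` (an illustrative name, not taken from the patent) lists the segments whose execution it triggers and whether it calls an entity outside the document. An execution vector is then the external-call segment plus every segment that can reach it in that graph, found here by a backward breadth-first search. Any segment outside all vectors is pruned from the crawl.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical model of one program segment (e.g., a script function) in a page."""

    name: str
    leads_to: list[str] = field(default_factory=list)  # segments this one triggers
    calls_external: bool = False  # True if it calls an entity outside the document


def execution_vectors(segments: dict[str, Segment]) -> dict[str, set[str]]:
    """Map each external-call segment to its execution vector: itself plus every
    segment whose execution precedes and ultimately leads to it."""
    # Reverse the "leads to" edges so we can walk backwards from each external call.
    predecessors: dict[str, set[str]] = {name: set() for name in segments}
    for seg in segments.values():
        for target in seg.leads_to:
            predecessors.setdefault(target, set()).add(seg.name)

    vectors: dict[str, set[str]] = {}
    for seg in segments.values():
        if not seg.calls_external:
            continue
        vector = {seg.name}
        queue = deque([seg.name])
        while queue:  # breadth-first search over predecessors
            for pred in predecessors[queue.popleft()]:
                if pred not in vector:
                    vector.add(pred)
                    queue.append(pred)
        vectors[seg.name] = vector
    return vectors


def segments_to_execute(segments: dict[str, Segment]) -> set[str]:
    """Union of all execution vectors; every other segment is pruned from the crawl."""
    keep: set[str] = set()
    for vector in execution_vectors(segments).values():
        keep |= vector
    return keep


# Hypothetical page: only the onLoad -> loadComments -> fetchComments chain
# reaches an external call.
page = {
    "onLoad": Segment("onLoad", leads_to=["initMenu", "loadComments"]),
    "initMenu": Segment("initMenu"),
    "loadComments": Segment("loadComments", leads_to=["fetchComments"]),
    "fetchComments": Segment("fetchComments", calls_external=True),
    "animateBanner": Segment("animateBanner"),
}
print(sorted(segments_to_execute(page)))  # ['fetchComments', 'loadComments', 'onLoad']
```

In this toy page, `initMenu` and `animateBanner` never lead to an external call, so a crawler following this sketch would skip them. Building the graph from real page scripts (event handlers, timers, dynamic dispatch) is the hard part and is not attempted here.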

    CRAWLING COMPUTER-BASED OBJECTS
    Patent application (status: under examination, published)

    Publication No.: US20160179793A1

    Publication Date: 2016-06-23

    Application No.: US15068799

    Filing Date: 2016-03-14

    CPC classification number: G06F16/00 G06F16/951

    Abstract: Crawling computer-based objects by identifying a dependency between a first portion of a computer-based object set and a second portion of the computer-based object set, where the second portion is data-dependent on the first portion, and responsive to identifying the dependency, effecting a crawling of the first portion and thereafter a crawling of the second portion.
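The dependency-ordered crawl described above can also be sketched in Python. The abstract speaks of two portions; as an assumption, this sketch generalizes to any number of portions by crawling them in topological order with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The portion names and the `crawl` callback are hypothetical placeholders for a real crawler's fetch-and-parse step.

```python
from graphlib import TopologicalSorter
from typing import Callable


def crawl_in_dependency_order(
    depends_on: dict[str, set[str]],
    crawl: Callable[[str], None],
) -> list[str]:
    """Crawl every portion only after the portions it is data-dependent on.

    `depends_on` maps each portion to the set of portions whose data it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    for portion in order:
        crawl(portion)  # caller-supplied hook, e.g. fetch and parse a page
    return order


# Hypothetical object set: the account page needs session data produced by
# submitting the login form, and the order history needs the account page.
dependencies = {
    "login_form": set(),
    "account_page": {"login_form"},
    "order_history": {"account_page"},
    "public_faq": set(),
}
visited: list[str] = []
order = crawl_in_dependency_order(dependencies, visited.append)
print(order)
```

A topological sort guarantees only that each portion comes after the portions it depends on; an independent portion such as `public_faq` may appear anywhere in the order. A circular dependency raises `graphlib.CycleError` instead of producing an order.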

    Optimizing web crawling through web page pruning

    Publication No.: US09390177B2

    Publication Date: 2016-07-12

    Application No.: US14227456

    Filing Date: 2014-03-27

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.

    OPTIMIZING WEB CRAWLING THROUGH WEB PAGE PRUNING
    Patent application (status: in force)

    Publication No.: US20150278202A1

    Publication Date: 2015-10-01

    Application No.: US14227456

    Filing Date: 2014-03-27

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.

    CRAWLING COMPUTER-BASED OBJECTS
    Patent application (status: under examination, published)

    Publication No.: US20150095304A1

    Publication Date: 2015-04-02

    Application No.: US14040861

    Filing Date: 2013-09-30

    CPC classification number: G06F16/00 G06F16/951

    Abstract: Crawling computer-based objects is implemented by identifying a dependency between a first portion of a computer-based object set and a second portion of the computer-based object set, where the second portion is data-dependent on the first portion, and responsive to identifying the dependency, effecting a crawling of the first portion and thereafter a crawling of the second portion.

    Optimizing web crawling through web page pruning

    Publication No.: US09754033B2

    Publication Date: 2017-09-05

    Application No.: US15244427

    Filing Date: 2016-08-23

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.

    OPTIMIZING WEB CRAWLING THROUGH WEB PAGE PRUNING

    Publication No.: US20160350423A1

    Publication Date: 2016-12-01

    Application No.: US15244427

    Filing Date: 2016-08-23

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.

    Optimizing web crawling through web page pruning
    Granted patent (status: in force)

    Publication No.: US09495459B2

    Publication Date: 2016-11-15

    Application No.: US15068961

    Filing Date: 2016-03-14

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.

    OPTIMIZING WEB CRAWLING THROUGH WEB PAGE PRUNING

    Publication No.: US20160179960A1

    Publication Date: 2016-06-23

    Application No.: US15068961

    Filing Date: 2016-03-14

    CPC classification number: G06F17/30864 G06F17/2235 G06F17/30958

    Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.
