A Transferable Neural Architecture for Structured Data Extraction From Web Documents

    Publication No.: US20230014465A1

    Publication Date: 2023-01-19

    Application No.: US17792788

    Filing Date: 2020-01-29

    Applicant: Google LLC

    Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.
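As a rough illustration of what "processing the raw HTML" can look like as a first step, the sketch below walks an HTML string and collects each text node together with its tag path. This is only an assumption about a plausible preprocessing stage: the abstract does not specify the node representation or the network architecture, and the names `TextNodeCollector` and `extract_text_nodes` are hypothetical. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextNodeCollector(HTMLParser):
    """Collects (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags from the root down to the current element
        self.nodes = []   # collected (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/" + "/".join(self.stack), text))


def extract_text_nodes(html):
    """Hypothetical helper: return candidate text nodes from raw HTML."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


page = ("<html><body><h1>Acme Phone</h1>"
        "<div><span>Price:</span><span>$199</span></div></body></html>")
for path, text in extract_text_nodes(page):
    print(path, "|", text)
```

Each `(tag_path, text)` pair is a candidate that a trained model could then score as a field of interest (for example, a product name or a price). Keeping the features tied to document structure rather than to one site's templates is one way a model trained on seed websites might transfer to unseen ones.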
