-
Publication number: US20240126827A1
Publication date: 2024-04-18
Application number: US18538584
Filing date: 2023-12-13
Applicant: Google LLC
Inventor: Ying Sheng, Yuchen Lin, Sandeep Tata, Nguyen Vo
IPC: G06F16/958, G06F16/957, G06F40/14
CPC classification number: G06F16/986, G06F16/957, G06F40/14
Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.
-
Publication number: US11886533B2
Publication date: 2024-01-30
Application number: US17792788
Filing date: 2020-01-29
Applicant: Google LLC
Inventor: Ying Sheng, Yuchen Lin, Sandeep Tata, Nguyen Vo
IPC: G06F16/958, G06F16/957, G06F40/14
CPC classification number: G06F16/986, G06F16/957, G06F40/14
Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.
-
Publication number: US20230014465A1
Publication date: 2023-01-19
Application number: US17792788
Filing date: 2020-01-29
Applicant: Google LLC
Inventor: Ying Sheng, Yuchen Lin, Sandeep Tata, Nguyen Vo
IPC: G06F16/958, G06F16/957, G06F40/14
Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.