Publication number: US20230222285A1
Publication date: 2023-07-13
Application number: US17928984
Filing date: 2020-12-22
Applicant: Google LLC
Inventor: Mingyang Zhang, Cheng Li, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky, Marc Alexander Najork, Te-Lin Wu
IPC: G06F40/166, G06F40/284, G06V30/413, G06F40/109
CPC classification number: G06F40/166, G06F40/284, G06V30/413, G06F40/109
Abstract: Systems and methods for document processing that can process and understand the layout, text size, text style, and multimedia of a document can generate more accurate and informed document representations. The layout of a document, paired with text size and style, can indicate which portions of the document are likely more important, and understanding that importance can help with understanding the document. Systems and methods utilizing a hierarchical framework that processes a document at both the block level and the document level can capitalize on these indicators to generate a better document representation.
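
Illustrative sketch (not part of the patent record): the two-level idea in the abstract can be shown with a toy Python example. Each block is encoded on its own, and the block encodings are then combined into one document vector, with larger or bold text counting for more. Everything here is an assumption made for illustration: the `Block` fields, the `DIM` width, the hashed bag-of-words encoding, and the font-size weighting are hypothetical stand-ins, and the abstract does not specify how the actual system encodes or weights blocks.

```python
import zlib
from dataclasses import dataclass
from typing import List

DIM = 8  # hypothetical width of the hashed text features


@dataclass
class Block:
    """One layout block of a document (hypothetical fields)."""
    text: str
    font_size: float
    bold: bool = False


def encode_block(block: Block) -> List[float]:
    """Block level: hashed bag-of-words followed by two style features."""
    vec = [0.0] * DIM
    for token in block.text.lower().split():
        # crc32 is used instead of hash() so the result is stable across runs.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec + [block.font_size, 1.0 if block.bold else 0.0]


def block_weight(block: Block) -> float:
    """Toy importance score: larger and bold text counts for more (assumption)."""
    return block.font_size * (1.5 if block.bold else 1.0)


def encode_document(blocks: List[Block]) -> List[float]:
    """Document level: importance-weighted average of the block vectors."""
    doc = [0.0] * (DIM + 2)
    weights = [block_weight(b) for b in blocks]
    total = sum(weights)
    if total <= 0:
        # No blocks, or no usable size information: return the zero vector.
        return doc
    for block, weight in zip(blocks, weights):
        for i, value in enumerate(encode_block(block)):
            doc[i] += (weight / total) * value
    return doc


if __name__ == "__main__":
    page = [
        Block("Quarterly report", font_size=24.0, bold=True),
        Block("Revenue grew in the third quarter", font_size=10.0),
    ]
    print(encode_document(page))
```

In this sketch the heading block dominates the document vector only because of the assumed weighting rule; a learned model would replace both the block encoder and the weighting.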