Publication number: US20240062560A1
Publication date: 2024-02-22
Application number: US17901617
Filing date: 2022-09-01
Applicant: Google LLC
Inventor: Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michail Raptis
IPC: G06V20/62, G06V30/414, G06V10/82, G06V30/14
CPC classification number: G06V20/63, G06V30/414, G06V10/82, G06V30/1448
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly performing text detection and layout analysis. In one aspect, a method comprises processing the image and a set of object queries to generate an encoded representation of the image and an encoded representation of the set of object queries; processing the encoded representation of the image and the encoded representation of the set of object queries to generate a set of text detection masks; processing the encoded representation of the set of object queries to generate layout relevance measures; processing the encoded representation of the set of object queries to generate textness scores for the text detection masks; generating a text detection output that defines respective areas of the image that include text items; and generating a layout analysis output that defines clusters of respective areas of the image identified by the text detection masks.
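
The data flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the masks, textness scores, or layout relevance measures are computed, so everything below is an assumption made for illustration: the function name `joint_outputs`, the dot-product mask head, the linear textness head with weights `textness_w`, the scaled dot-product relevance measure, both thresholds, and the union-find clustering. The encoder that would produce `pixel_feats` and `query_feats` is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def joint_outputs(pixel_feats, query_feats, textness_w,
                  score_thr=0.5, link_thr=0.5):
    """Hypothetical heads on top of already-encoded inputs.

    pixel_feats: H x W x D nested lists (encoded image).
    query_feats: N x D nested lists (encoded object queries).
    textness_w:  length-D weights of an assumed linear textness head.
    """
    n, d = len(query_feats), len(textness_w)

    # One binary mask per query (assumed: query/pixel dot product).
    masks = [[[sigmoid(dot(q, px)) > 0.5 for px in row]
              for row in pixel_feats] for q in query_feats]

    # One textness score per mask (assumed: linear head on the query).
    textness = [sigmoid(dot(q, textness_w)) for q in query_feats]

    # Pairwise layout relevance (assumed: scaled dot product of queries).
    relevance = [[sigmoid(dot(a, b) / math.sqrt(d)) for b in query_feats]
                 for a in query_feats]

    # Text detection output: masks whose textness passes the threshold.
    keep = [i for i in range(n) if textness[i] > score_thr]
    text_areas = {i: masks[i] for i in keep}

    # Layout analysis output: group kept masks linked by high relevance,
    # using union-find to take connected components.
    parent = {i: i for i in keep}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in keep:
        for b in keep:
            if a < b and relevance[a][b] > link_thr:
                parent[find(a)] = find(b)

    groups = {}
    for i in keep:
        groups.setdefault(find(i), []).append(i)
    clusters = sorted(sorted(g) for g in groups.values())
    return text_areas, clusters
```

Under these assumptions, `text_areas` maps each retained query index to its mask, and `clusters` lists the retained indices grouped into layout units (for example, lines grouped into a paragraph). A real system would learn the heads rather than use fixed dot products.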