MULTIPLE CHANNELS OF RASTERIZED CONTENT FOR PAGE DECOMPOSITION USING MACHINE LEARNING

    Publication No.: US20210117666A1

    Publication Date: 2021-04-22

    Application No.: US16655363

    Filing Date: 2019-10-17

    Applicant: Adobe Inc.

    Abstract: Techniques are provided for identifying structural elements of a document. One methodology includes generating a first channel of rasterized content by rasterizing a full page of the document, and generating one or more additional channels of rasterized content from that page by rasterizing one or more corresponding content types from it. Each additional channel contains a specific type of content that differs from the content type of every other additional channel. The methodology further includes inputting the first channel and the one or more additional channels of rasterized content into a machine learning (ML) model. The methodology continues with using the ML model to determine a location and a classification for each of a plurality of structural elements on the page.
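    The input-construction step described in the abstract (one full-page channel plus one channel per distinct content type) can be illustrated with a minimal sketch. It assumes that page elements are already available as typed, pixel-space bounding boxes, and it uses a toy rasterizer that simply fills each box. The `PageItem` class, the `rasterize` and `build_channels` functions, and the content-type names "text", "image", and "vector" are all hypothetical and are not taken from the patent. The sketch stops at building the multi-channel array and does not model the ML step.

    ```python
    from dataclasses import dataclass

    import numpy as np


    @dataclass
    class PageItem:
        """Hypothetical page element: a content type plus a pixel-space bounding box."""
        content_type: str  # assumed categories, e.g. "text", "image", "vector"
        x0: int
        y0: int
        x1: int
        y1: int


    def rasterize(items, height, width):
        """Toy rasterizer (assumption): paint each item's bounding box as 1.0."""
        canvas = np.zeros((height, width), dtype=np.float32)
        for it in items:
            canvas[it.y0:it.y1, it.x0:it.x1] = 1.0
        return canvas


    def build_channels(items, height, width, content_types):
        """Stack the full-page channel with one channel per content type.

        Returns an array of shape (1 + K, H, W). Channel 0 holds every item,
        and channel k (k >= 1) holds only items of content_types[k - 1].
        """
        full_page = rasterize(items, height, width)
        per_type = [
            rasterize([it for it in items if it.content_type == ct], height, width)
            for ct in content_types
        ]
        return np.stack([full_page, *per_type], axis=0)


    # Usage: one text block and one image on a 10x10 page, three type channels.
    items = [
        PageItem("text", 0, 0, 8, 2),
        PageItem("image", 2, 4, 6, 8),
    ]
    x = build_channels(items, height=10, width=10,
                       content_types=["text", "image", "vector"])
    print(x.shape)  # (4, 10, 10)
    ```

    Stacking along a leading channel axis produces the (channels, height, width) layout that many convolutional models accept. Under this assumption, a model would need its first layer to accept 1 + K input channels instead of the usual 1 or 3.
    
    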
