-
Publication number: US12118294B2
Publication date: 2024-10-15
Application number: US18309857
Filing date: 2023-05-01
Applicant: Open Text Corporation
Inventors: David Comeau, Jeffrey Williams, Evgeny Kolesnikov, Michael Itkin, June Qiang, James Relunia, Brian Sue
IPC: G06F40/16, G06F40/154, G06N20/00, G06V30/413
CPC classification number: G06F40/16, G06F40/154, G06N20/00, G06V30/413
Abstract: Systems, methods, and products for auto tagging structured PDF documents that do not have accessibility tags. In one embodiment, structured PDF documents having accessibility tags are first parsed and analyzed to organize the visual components of the documents. The relationships of the identified objects to DOM elements (e.g., tags) are determined, and the objects and related DOM elements are stored in training files. The training files are used to train various classifiers. Untagged PDF documents are then parsed to identify included visual objects, and the classifiers are used to determine DOM elements that should be associated with visual objects identified in the untagged PDF documents. This information is used to construct a DOM structure corresponding to each untagged document. A new PDF is then generated corresponding to each untagged document using the generated DOM structure and visual object information.
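The abstract describes a train-then-tag flow. Objects from already-tagged PDFs, paired with their DOM tags, become training data. Classifiers then assign tags to objects parsed from untagged PDFs, and those tags are assembled into a DOM. The sketch below illustrates that flow under explicit assumptions; it is not the patented method. Each visual object is reduced to two hypothetical features (font size and a bold flag) plus a vertical position. A simple nearest-centroid classifier stands in for the unspecified "various classifiers". The tag names `H1`, `H2`, and `P` are borrowed from common PDF structure-tag naming. `VisualObject`, `NearestCentroidTagger`, `build_dom`, and `render` are illustrative names only. PDF parsing and generation of the new PDF are omitted, so `render` simply prints the DOM as markup.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from math import dist


@dataclass
class VisualObject:
    """A visual object parsed from a PDF page (hypothetical, minimal feature set)."""
    text: str
    font_size: float
    is_bold: bool
    y: float  # vertical position; larger values are further down the page

    def features(self) -> tuple[float, float]:
        return (self.font_size, 1.0 if self.is_bold else 0.0)


@dataclass
class DomElement:
    tag: str
    text: str = ""
    children: list[DomElement] = field(default_factory=list)


class NearestCentroidTagger:
    """Stand-in classifier (assumption): predicts the tag whose mean feature vector is closest."""

    def __init__(self) -> None:
        self.centroids: dict[str, tuple[float, ...]] = {}

    def fit(self, samples: list[tuple[VisualObject, str]]) -> NearestCentroidTagger:
        # "Training files": (visual object, DOM tag) pairs taken from tagged PDFs.
        sums: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0])
        counts: dict[str, int] = defaultdict(int)
        for obj, tag in samples:
            for i, value in enumerate(obj.features()):
                sums[tag][i] += value
            counts[tag] += 1
        self.centroids = {
            tag: tuple(s / counts[tag] for s in totals) for tag, totals in sums.items()
        }
        return self

    def predict(self, obj: VisualObject) -> str:
        f = obj.features()
        return min(self.centroids, key=lambda tag: dist(f, self.centroids[tag]))


def build_dom(objects: list[VisualObject], tagger: NearestCentroidTagger) -> DomElement:
    # Classify each object of an untagged document and attach it in top-to-bottom order.
    root = DomElement("Document")
    for obj in sorted(objects, key=lambda o: o.y):
        root.children.append(DomElement(tagger.predict(obj), obj.text))
    return root


def render(el: DomElement) -> str:
    # Serialize the DOM as nested markup (stands in for generating the new PDF).
    inner = el.text + "".join(render(child) for child in el.children)
    return f"<{el.tag}>{inner}</{el.tag}>"


# Hypothetical objects from a tagged PDF, with the tags they carried.
training = [
    (VisualObject("Annual Report", 24, True, 50), "H1"),
    (VisualObject("Overview", 16, True, 120), "H2"),
    (VisualObject("Revenue grew.", 11, False, 150), "P"),
    (VisualObject("Costs fell.", 11, False, 200), "P"),
]
tagger = NearestCentroidTagger().fit(training)

# Hypothetical objects parsed from an untagged PDF (no tags yet).
untagged = [
    VisualObject("Sales were flat.", 10.5, False, 110),
    VisualObject("Quarterly Summary", 22, True, 40),
    VisualObject("Highlights", 15, True, 90),
]
print(render(build_dom(untagged, tagger)))
# prints <Document><H1>Quarterly Summary</H1><H2>Highlights</H2><P>Sales were flat.</P></Document>
```

Nearest-centroid is used here only to keep the example dependency-free. A practical tagger would likely rely on richer layout features (position, spacing, neighboring objects) and stronger models, and would build nested structure such as lists and tables rather than a flat sequence.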
-
Publication number: US20230315974A1
Publication date: 2023-10-05
Application number: US18309857
Filing date: 2023-05-01
Applicant: Open Text Corporation
Inventors: David Comeau, Jeffrey Williams, Evgeny Kolesnikov, Michael Itkin, June Qiang, James Relunia, Brian Sue
IPC: G06F40/16, G06F40/154, G06N20/00, G06V30/413
CPC classification number: G06F40/16, G06F40/154, G06N20/00, G06V30/413
Abstract: Systems, methods, and products for auto tagging structured PDF documents that do not have accessibility tags. In one embodiment, structured PDF documents having accessibility tags are first parsed and analyzed to organize the visual components of the documents. The relationships of the identified objects to DOM elements (e.g., tags) are determined, and the objects and related DOM elements are stored in training files. The training files are used to train various classifiers. Untagged PDF documents are then parsed to identify included visual objects, and the classifiers are used to determine DOM elements that should be associated with visual objects identified in the untagged PDF documents. This information is used to construct a DOM structure corresponding to each untagged document. A new PDF is then generated corresponding to each untagged document using the generated DOM structure and visual object information.
-
Publication number: US11675970B2
Publication date: 2023-06-13
Application number: US17174686
Filing date: 2021-02-12
Applicant: Open Text Corporation
Inventors: David Comeau, Jeffrey Williams, Evgeny Kolesnikov, Michael Itkin, June Qiang, James Relunia, Brian Sue
IPC: G06F40/16, G06F40/154, G06N20/00, G06V30/413
CPC classification number: G06F40/16, G06F40/154, G06N20/00, G06V30/413
Abstract: Systems, methods, and products for auto tagging structured PDF documents that do not have accessibility tags. In one embodiment, structured PDF documents having accessibility tags are first parsed and analyzed to organize the visual components of the documents. The relationships of the identified objects to DOM elements (e.g., tags) are determined, and the objects and related DOM elements are stored in training files. The training files are used to train various classifiers. Untagged PDF documents are then parsed to identify included visual objects, and the classifiers are used to determine DOM elements that should be associated with visual objects identified in the untagged PDF documents. This information is used to construct a DOM structure corresponding to each untagged document. A new PDF is then generated corresponding to each untagged document using the generated DOM structure and visual object information.
-
Publication number: US20210271805A1
Publication date: 2021-09-02
Application number: US17174686
Filing date: 2021-02-12
Applicant: Open Text Corporation
Inventors: David Comeau, Jeffrey Williams, Evgeny Kolesnikov, Michael Itkin, June Qiang, James Relunia, Brian Sue
IPC: G06F40/16, G06N20/00, G06F40/154
Abstract: Systems, methods, and products for auto tagging structured PDF documents that do not have accessibility tags. In one embodiment, structured PDF documents having accessibility tags are first parsed and analyzed to organize the visual components of the documents. The relationships of the identified objects to DOM elements (e.g., tags) are determined, and the objects and related DOM elements are stored in training files. The training files are used to train various classifiers. Untagged PDF documents are then parsed to identify included visual objects, and the classifiers are used to determine DOM elements that should be associated with visual objects identified in the untagged PDF documents. This information is used to construct a DOM structure corresponding to each untagged document. A new PDF is then generated corresponding to each untagged document using the generated DOM structure and visual object information.