-
Publication No.: US12189693B2
Publication Date: 2025-01-07
Application No.: US18345886
Application Date: 2023-06-30
Applicant: Open Text Corporation
Inventor: Lei Zhang, Chao Chen, Kun Zhao, Jingjing Liu, Ying Teng
IPC: G06F16/93, G06F16/22, G06F16/2455, G06F16/2457
Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.
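The abstract does not say how frequent terms are chosen, how each term is encoded, or how identifiers are compared. What follows is a minimal sketch under stated assumptions, not the patented implementation. It assumes the top-k most frequent words serve as the frequent terms, a truncated MD5 digest serves as each term identifier (a hypothetical choice), the set of term identifiers forms the content identifier, and Jaccard similarity yields the document similarity value. The names `term_id`, `content_identifier`, `similarity` and `rank_against_repository` are illustrative only.

```python
import hashlib
import re
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def term_id(term: str) -> int:
    """Encode a term as a 32-bit term identifier (assumed: truncated MD5)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def content_identifier(text: str, top_k: int = 10) -> FrozenSet[int]:
    """Combine the ids of the top_k most frequent terms into one identifier."""
    # Assumed tokenization: lowercase alphanumeric runs.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    frequent = [term for term, _ in Counter(terms).most_common(top_k)]
    return frozenset(term_id(term) for term in frequent)


def similarity(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity of two content identifiers (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_against_repository(
    reference_text: str,
    repository: Dict[str, FrozenSet[int]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score a reference document against archived content identifiers."""
    reference = content_identifier(reference_text, top_k)
    scores = [(doc_id, similarity(reference, cid)) for doc_id, cid in repository.items()]
    # Highest similarity value first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical archive: content identifiers precomputed when documents are stored.
    archive = {
        "doc-a": content_identifier("invoice payment invoice total payment due", top_k=3),
        "doc-b": content_identifier("stroke character ideogram search", top_k=3),
    }
    print(rank_against_repository("payment invoice total", archive, top_k=3))
```

This sketch scans every archived identifier only to keep it readable. A larger repository would more plausibly index the term ids so that only documents sharing at least one id are scored.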
-
Publication No.: US11734364B2
Publication Date: 2023-08-22
Application No.: US16791628
Application Date: 2020-02-14
Applicant: Open Text Corporation
Inventor: Lei Zhang, Chao Chen, Kun Zhao, Jingjing Liu, Ying Teng
IPC: G06F16/93, G06F16/22, G06F16/2455, G06F16/2457
CPC classification number: G06F16/93, G06F16/2246, G06F16/2455, G06F16/24578
Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.
-
Publication No.: US20200183986A1
Publication Date: 2020-06-11
Application No.: US16791628
Application Date: 2020-02-14
Applicant: Open Text Corporation
Inventor: Lei Zhang, Chao Chen, Kun Zhao, Jingjing Liu, Ying Teng
IPC: G06F16/93, G06F16/2457, G06F16/2455, G06F16/22
Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.
-
Publication No.: US11321384B2
Publication Date: 2022-05-03
Application No.: US15033309
Application Date: 2015-09-30
Applicant: Open Text Corporation
Inventor: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
IPC: G06F16/583, G06F16/93, G06F16/9038, G06F40/129, G06F40/166, G06K9/00
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes, and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result, and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
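The abstract does not define the stroke inventory, the distance between stroke id sequences, or the search backend. What follows is a minimal sketch under stated assumptions, not the patented implementation. It uses a small hand-built stroke table (hypothetical, with stroke ids from the common five-class scheme), Levenshtein edit distance as the threshold distance, and plain substring search over OCR text. The names `STROKES`, `candidates`, `expand_phrase` and `search` are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical stroke table for illustration only. Stroke ids:
# 1 horizontal, 2 vertical, 3 left-falling, 4 dot / right-falling, 5 turning.
STROKES: Dict[str, List[int]] = {
    "日": [2, 5, 1, 1],
    "曰": [2, 5, 1, 1],
    "目": [2, 5, 1, 1, 1],
    "田": [2, 5, 1, 2, 1],
    "大": [1, 3, 4],
    "太": [1, 3, 4, 4],
    "犬": [1, 3, 4, 4],
}


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two stroke id sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x with y
        prev = curr
    return prev[-1]


def candidates(char: str, threshold: int = 1) -> List[str]:
    """Characters whose stroke id sequence is within `threshold` of `char`'s."""
    original = STROKES[char]
    scored = [(edit_distance(original, seq), c)
              for c, seq in STROKES.items() if c != char]
    close = [pair for pair in scored if pair[0] <= threshold]
    # Closest first; sorted() is stable, so ties keep table order.
    return [c for _, c in sorted(close, key=lambda pair: pair[0])]


def expand_phrase(phrase: str, char: str, threshold: int = 1) -> List[str]:
    """The original search phrase plus one new phrase per candidate character."""
    # str.replace substitutes every occurrence of `char` in the phrase.
    return [phrase] + [phrase.replace(char, c) for c in candidates(char, threshold)]


def search(corpus: Dict[str, str], phrases: List[str]) -> List[str]:
    """Ids of documents whose text contains any of the phrases."""
    return [doc_id for doc_id, text in corpus.items()
            if any(p in text for p in phrases)]


if __name__ == "__main__":
    # Hypothetical OCR output in which 日 was misrecognized as 目 in scan-2.
    corpus = {"scan-1": "合同日期 2015", "scan-2": "合同目期 2015", "scan-3": "金额"}
    phrases = expand_phrase("日期", "日")
    print(phrases)                  # ['日期', '曰期', '目期', '田期']
    print(search(corpus, phrases))  # ['scan-1', 'scan-2']
```

Searching with the original phrase alone would miss `scan-2`. The expanded phrases recover it because 目 differs from 日 by a single stroke. The same `candidates` list could also drive the abstract's other embodiment, replacing a suspect character in the recognized document itself.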
-
Publication No.: US10572544B1
Publication Date: 2020-02-25
Application No.: US14968421
Application Date: 2015-12-14
Applicant: Open Text Corporation
Inventor: Lei Zhang, Chao Chen, Kun Zhao, Jingjing Liu, Ying Teng
IPC: G06F16/93, G06F16/22, G06F16/2455, G06F16/2457
Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.
-
Publication No.: US20250124079A1
Publication Date: 2025-04-17
Application No.: US18929730
Application Date: 2024-10-29
Applicant: Open Text Corporation
Inventor: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
IPC: G06F16/583, G06F16/9038, G06F16/93, G06F40/129, G06F40/166, G06V10/20, G06V10/70, G06V10/98, G06V30/28, G06V30/32
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
-
Publication No.: US12153624B2
Publication Date: 2024-11-26
Application No.: US17713074
Application Date: 2022-04-04
Applicant: Open Text Corporation
Inventor: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
IPC: G06F16/583, G06F16/9038, G06F16/93, G06F40/129, G06F40/166, G06V10/20, G06V10/70, G06V10/98, G06V30/28, G06V30/32
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
-
Publication No.: US20230342403A1
Publication Date: 2023-10-26
Application No.: US18345886
Application Date: 2023-06-30
Applicant: Open Text Corporation
Inventor: Lei Zhang, Chao Chen, Kun Zhao, Jingjing Liu, Ying Teng
IPC: G06F16/93, G06F16/22, G06F16/2455, G06F16/2457
CPC classification number: G06F16/93, G06F16/2246, G06F16/2455, G06F16/24578
Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.
-
Publication No.: US20220222292A1
Publication Date: 2022-07-14
Application No.: US17713074
Application Date: 2022-04-04
Applicant: Open Text Corporation
Inventor: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
IPC: G06F16/583, G06F16/93, G06F16/9038, G06F40/129, G06F40/166, G06V30/32
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.