-
Publication No.: US10970536B2
Publication date: 2021-04-06
Application No.: US16692005
Filing date: 2019-11-22
Applicant: Open Text Corporation
Inventor: Jeroen Mattijs van Rotterdam, Michael T Mohen, Chao Chen, Kun Zhao
IPC: G06K9/00, G06F16/93, G06F16/335
Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.
-
Publication No.: US09852337B1
Publication date: 2017-12-26
Application No.: US14871501
Filing date: 2015-09-30
Applicant: Open Text Corporation
Inventor: Jeroen Mattijs van Rotterdam, Michael T Mohen, Chao Chen, Kun Zhao
CPC classification number: G06K9/00483, G06F17/30011, G06F17/30699, G06K9/00469
Abstract: A method for assessing similarity of documents. The method includes extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. Quantifying the reference and archived documents includes tokenizing sentences of the reference document and archived document, respectively, and vectorizing the tokenized sentences to obtain a reference document text vector and an archived document text vector for each sentence of the reference and archived document, respectively. The method also includes determining a document similarity value of the quantified reference document and the quantified archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.
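The quantify-then-compare flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name a vectorization scheme or a vector similarity measure, so the bag-of-words counts, the cosine similarity, the regex-based sentence splitter, and every function name below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter
from itertools import product


def split_sentences(text):
    """Naive sentence splitter (assumption: '.', '!' or '?' ends a sentence)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def vectorize(sentence):
    """Tokenize a sentence into lowercase words and count them.

    Assumption: a bag-of-words count vector stands in for the
    'document text vector' mentioned in the abstract.
    """
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(reference_text, archived_text):
    """Sum the vector similarities over every (reference, archived) sentence pair."""
    ref_vectors = [vectorize(s) for s in split_sentences(reference_text)]
    arc_vectors = [vectorize(s) for s in split_sentences(archived_text)]
    return sum(cosine(r, a) for r, a in product(ref_vectors, arc_vectors))
```

Calling `document_similarity(reference_text, archived_text)` returns a single score that grows with the number of similar sentence pairs. Because the score is a raw sum, it also grows with document length; whether and how the claimed method normalizes the sum is not stated in the abstract.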
-
Publication No.: US20180068183A1
Publication date: 2018-03-08
Application No.: US15811118
Filing date: 2017-11-13
Applicant: Open Text Corporation
Inventor: Jeroen Mattijs van Rotterdam, Michael T Mohen, Chao Chen, Kun Zhao
CPC classification number: G06K9/00483, G06F16/335, G06F16/93, G06K9/00469
Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.
-
Publication No.: US11682226B2
Publication date: 2023-06-20
Application No.: US17192498
Filing date: 2021-03-04
Applicant: Open Text Corporation
Inventor: Jeroen Mattijs van Rotterdam, Michael T Mohen, Chao Chen, Kun Zhao
IPC: G06V30/418, G06F16/93, G06F16/335, G06V30/416, G06V30/40
CPC classification number: G06V30/418, G06F16/335, G06F16/93, G06V30/40, G06V30/416, G06V2201/10
Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.
-
Publication No.: US10521656B2
Publication date: 2019-12-31
Application No.: US15811118
Filing date: 2017-11-13
Applicant: Open Text Corporation
Inventor: Jeroen Mattijs van Rotterdam, Michael T Mohen, Chao Chen, Kun Zhao
IPC: G06K9/68, G06K9/00, G06F16/93, G06F16/335
Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.