Method and system for document similarity analysis

    Publication No.: US12189693B2

    Publication date: 2025-01-07

    Application No.: US18345886

    Filing date: 2023-06-30

    Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.
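
The abstract above describes building a content identifier from a document's frequently occurring terms and comparing it against archived identifiers. A minimal Python sketch of that idea follows. It is not the patented implementation: the abstract does not say how terms are encoded, combined, or compared, so the truncated SHA-256 hash, the set of term identifiers, the Jaccard overlap score, and every function name and parameter here (`term_identifier`, `content_identifier`, `identifier_similarity`, `top_n`) are assumptions chosen for illustration.

```python
import hashlib
import re
from collections import Counter


def term_identifier(term):
    """Encode one term as a short hex string (assumed encoding: truncated SHA-256)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()[:16]


def content_identifier(text, top_n=5):
    """Combine the identifiers of the top_n most frequent terms into one identifier.

    Assumption: the combined identifier is modelled as a frozenset of term
    identifiers. No stop-word filtering is applied in this sketch.
    """
    words = re.findall(r"[a-z]+", text.lower())
    frequent_terms = [term for term, _count in Counter(words).most_common(top_n)]
    return frozenset(term_identifier(term) for term in frequent_terms)


def identifier_similarity(reference_id, archived_id):
    """Assumed similarity measure: Jaccard overlap of the two identifier sets."""
    if not reference_id and not archived_id:
        return 1.0
    return len(reference_id & archived_id) / len(reference_id | archived_id)


# Hypothetical repository mapping document names to stored content identifiers.
repository = {
    "archived_invoice": content_identifier("payment due invoice payment invoice invoice"),
    "archived_weather": content_identifier("rain forecast weather rain weather"),
}

reference_id = content_identifier("invoice payment invoice due payment invoice")
similarity_values = {
    name: identifier_similarity(reference_id, archived_id)
    for name, archived_id in repository.items()
}
```

In this sketch `similarity_values` holds one similarity value per archived document, so the reference document is compared against stored identifiers rather than against the full archived texts.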

    Method and system for assessing similarity of documents

    Publication No.: US10970536B2

    Publication date: 2021-04-06

    Application No.: US16692005

    Filing date: 2019-11-22

    Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.

    METHOD AND SYSTEM FOR ASSESSING SIMILARITY OF DOCUMENTS

    Publication No.: US20200089947A1

    Publication date: 2020-03-19

    Application No.: US16692005

    Filing date: 2019-11-22

    Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.

    METHOD AND SYSTEM FOR ASSESSING SIMILARITY OF DOCUMENTS

    Publication No.: US20210192204A1

    Publication date: 2021-06-24

    Application No.: US17192498

    Filing date: 2021-03-04

    Abstract: Systems and methods for assessing similarity of documents are provided. Embodiments of the systems and methods include extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. The systems and methods may also include determining a document similarity value of the quantified reference document and the archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.

    METHOD AND SYSTEM FOR DOCUMENT SIMILARITY ANALYSIS

    Publication No.: US20200183986A1

    Publication date: 2020-06-11

    Application No.: US16791628

    Filing date: 2020-02-14

    Abstract: A method for document similarity analysis. The method includes generating a reference document content identifier for a reference document, including identifying frequently occurring terms in reference document content, encoding each frequently occurring term in a term identifier and combining the term identifiers to form the reference document content identifier associated with the reference document. The method also includes obtaining at least one document similarity value by comparing the reference document content identifier to a set of archived document content identifiers stored in a document repository.

    Method and system for assessing similarity of documents

    Publication No.: US09852337B1

    Publication date: 2017-12-26

    Application No.: US14871501

    Filing date: 2015-09-30

    CPC classification number: G06K9/00483 G06F17/30011 G06F17/30699 G06K9/00469

    Abstract: A method for assessing similarity of documents. The method includes extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. Quantifying the reference and archived documents includes tokenizing sentences of the reference document and archived document, respectively, and vectorizing the tokenized sentences to obtain a reference document text vector and an archived document text vector for each sentence of the reference and archived document, respectively. The method also includes determining a document similarity value of the quantified reference document and the quantified archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.
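
The steps in the abstract above (tokenize sentences, vectorize them, compute vector similarity values for combinations of reference and archived sentence vectors, then sum) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the abstract does not specify the vectorization, the similarity measure, or which combinations are used, so the bag-of-words counts, the cosine similarity, the use of all sentence pairs, and the function names `sentence_vectors`, `cosine`, and `document_similarity` are all hypothetical choices.

```python
import math
import re
from collections import Counter


def sentence_vectors(text):
    """Split text into sentences and return one bag-of-words vector per sentence.

    Assumption: sentences end at '.', '!' or '?', and each vector is a Counter
    of lowercase word counts.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed similarity measure)."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(reference_text, archived_text):
    """Sum the vector similarity values over all reference/archived sentence pairs.

    Assumption: every pair is used and the sum is left unnormalized.
    """
    reference_vectors = sentence_vectors(reference_text)
    archived_vectors = sentence_vectors(archived_text)
    return sum(cosine(r, a) for r in reference_vectors for a in archived_vectors)


score = document_similarity("The cat sat. Dogs bark.", "The cat sat.")
```

Because the sum in this sketch is unnormalized, longer documents tend to produce larger values; a practical system would likely need some normalization, which the abstract does not describe.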
