-
Publication No.: US20230070497A1
Publication date: 2023-03-09
Application No.: US17589675
Filing date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Jered McInerney, Wojciech Kryscinski, Nazneen Rajani
IPC: G06F40/166, G06F40/40, G06F40/20, G06N5/02
Abstract: Embodiments described herein provide methods and systems for summarizing multiple documents. A system receives a plurality of documents and generates embeddings of the sentences from the plurality of documents. The embedded sentences are clustered in a representation space. Sentences from a reference summary are embedded and aligned with the closest cluster. Sentences from each cluster are summarized with the aligned reference sentences as a target. A loss is computed based on the summarized sentences and the aligned references, and the natural language processing model is updated based on the loss. Sentences may be masked from being used in the summarization by identifying sentences that are contradicted by other sentences within the plurality of documents.
-
Publication No.: US11699026B2
Publication date: 2023-07-11
Application No.: US17589675
Filing date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Jered McInerney, Wojciech Kryscinski, Nazneen Rajani
IPC: G06F17/00, G06F40/166, G06N5/022, G06F40/20, G06F40/40
CPC classification number: G06F40/166, G06F40/20, G06F40/40, G06N5/022
Abstract: Embodiments described herein provide methods and systems for summarizing multiple documents. A system receives a plurality of documents and generates embeddings of the sentences from the plurality of documents. The embedded sentences are clustered in a representation space. Sentences from a reference summary are embedded and aligned with the closest cluster. Sentences from each cluster are summarized with the aligned reference sentences as a target. A loss is computed based on the summarized sentences and the aligned references, and the natural language processing model is updated based on the loss. Sentences may be masked from being used in the summarization by identifying sentences that are contradicted by other sentences within the plurality of documents.
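The embed, cluster, and align steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the hashed bag-of-words `embed` function stands in for a real sentence encoder, plain k-means stands in for whatever clustering the embodiments use, and all function names, the sample sentences, and the parameters `dim`, `k`, `iters`, and `seed` are assumptions made for illustration. The summarization model, the loss computation, and the contradiction-based masking are omitted.

```python
import numpy as np


def embed(sentences, dim=64):
    """Toy stand-in for a sentence encoder (assumption): hashed bag-of-words, L2-normalized."""
    out = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            # Deterministic hash so results do not change between runs.
            out[i, sum(ord(c) for c in token) % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)


def squared_distances(points, centroids):
    """Pairwise squared Euclidean distances, shape (n_points, n_centroids)."""
    diff = points[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=2)


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (assumption: the clustering method is not fixed by the abstract)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = squared_distances(x, centroids).argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    labels = squared_distances(x, centroids).argmin(axis=1)
    return labels, centroids


def align(reference_vecs, centroids):
    """Assign each reference-summary sentence to its closest cluster centroid."""
    return squared_distances(reference_vecs, centroids).argmin(axis=1)


# Hypothetical input: sentences pooled from several source documents.
source_sentences = [
    "The storm closed several roads on Monday.",
    "Roads were closed by the storm on Monday.",
    "The city council approved a new budget.",
    "A new budget was approved by the council.",
]
# Hypothetical reference summary, split into sentences.
reference_sentences = [
    "A storm closed roads.",
    "The council approved a budget.",
]

labels, centroids = kmeans(embed(source_sentences), k=2)
assignment = align(embed(reference_sentences), centroids)

# Each cluster's sentences become one model input; the aligned reference
# sentences become its training target.
for cluster_id in range(2):
    inputs = [s for s, l in zip(source_sentences, labels) if l == cluster_id]
    targets = [r for r, a in zip(reference_sentences, assignment) if a == cluster_id]
    print(cluster_id, inputs, targets)
```

In a training setup along these lines, each (cluster sentences, aligned reference sentences) pair would be fed to a sequence-to-sequence model, and the loss between the generated summary and the aligned references would drive the parameter update.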
-