-
Publication No.: US12204847B2
Publication date: 2025-01-21
Application No.: US17938572
Filing date: 2022-10-06
Applicant: Salesforce, Inc.
Inventor: Alexander R. Fabbri, Prafulla Kumar Choubey, Jesse Vig, Chien-Sheng Wu, Caiming Xiong
IPC: G06F17/00, G06F40/166, G06F40/284, G06N20/00
Abstract: Embodiments described herein provide a method for text summarization. The method includes receiving a training dataset having at least an uncompressed text, a compressed text, and one or more information entities accompanying the compressed text. The method also includes generating, using a perturber model, a perturbed text with the one or more information entities being inserted into the compressed text. The method further includes training the perturber model based on a first training objective, and generating, using the trained perturber model, a perturbed summary in response to an input of a reference summary. The method further includes generating, via an editor model, a predicted summary by removing information from the perturbed summary conditioned on a source document of the reference summary, and training the editor model based on a second training objective.
-
Publication No.: US12050855B2
Publication date: 2024-07-30
Application No.: US17749837
Filing date: 2022-05-20
Applicant: Salesforce, Inc.
Inventor: Wojciech Kryscinski, Alexander R. Fabbri, Jesse Vig
IPC: G06F16/00, G06F16/332, G06F16/34, G06F40/166
CPC classification number: G06F40/166, G06F16/3329, G06F16/345
Abstract: Embodiments described herein provide a query-focused summarization model that employs a single or dual encoder model. A two-step approach may be adopted that first extracts parts of the source document and then synthesizes the extracted segments into a final summary. In another embodiment, an end-to-end approach may be adopted that splits the source document into overlapping segments, and then concatenates encodings into a single embedding sequence for the decoder to output a summary.
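The end-to-end approach in this abstract (overlapping segments, then concatenated encodings) can be illustrated with a minimal sketch. The function names, the `segment_len` and `overlap` parameters, and the toy `encode` callable are assumptions made for illustration only; the abstract does not specify segment sizes, the encoder, or how the query is attached to each segment.

```python
from typing import Callable, List, Sequence


def split_into_overlapping_segments(
    tokens: Sequence[int], segment_len: int = 8, overlap: int = 2
) -> List[List[int]]:
    """Split a token sequence into windows that share `overlap` tokens.

    `segment_len` and `overlap` are hypothetical parameters, not values
    taken from the patent.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    if not tokens:
        return []

    stride = segment_len - overlap
    segments: List[List[int]] = []
    start = 0
    while True:
        segments.append(list(tokens[start:start + segment_len]))
        # Stop once the current window reaches the end of the document.
        if start + segment_len >= len(tokens):
            break
        start += stride
    return segments


def concatenate_encodings(
    segments: List[List[int]],
    encode: Callable[[List[int]], List[List[float]]],
) -> List[List[float]]:
    """Encode each segment independently, then join the per-token vectors
    into one embedding sequence that a decoder could attend over."""
    sequence: List[List[float]] = []
    for segment in segments:
        sequence.extend(encode(segment))
    return sequence


def toy_encode(segment: List[int]) -> List[List[float]]:
    """Stand-in encoder (assumption): one 1-dimensional vector per token."""
    return [[float(token)] for token in segment]


document_tokens = list(range(10))
segments = split_into_overlapping_segments(document_tokens, segment_len=4, overlap=2)
embedding_sequence = concatenate_encodings(segments, toy_encode)
```

In this sketch each window is encoded on its own, so overlapping tokens appear more than once in the concatenated sequence; whether the patented model deduplicates or pools them is not stated in the abstract.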
-
Publication No.: US20220277135A1
Publication date: 2022-09-01
Application No.: US17749837
Filing date: 2022-05-20
Applicant: Salesforce, Inc.
Inventor: Wojciech Kryscinski, Alexander R. Fabbri, Jesse Vig
IPC: G06F40/166, G06F16/34, G06F16/332
Abstract: Embodiments described herein provide a query-focused summarization model that employs a single or dual encoder model. A two-step approach may be adopted that first extracts parts of the source document and then synthesizes the extracted segments into a final summary. In another embodiment, an end-to-end approach may be adopted that splits the source document into overlapping segments, and then concatenates encodings into a single embedding sequence for the decoder to output a summary.
-
Publication No.: US20240370640A1
Publication date: 2024-11-07
Application No.: US18774375
Filing date: 2024-07-16
Applicant: Salesforce, Inc.
Inventor: Wojciech Kryscinski, Alexander R. Fabbri, Jesse Vig
IPC: G06F40/166, G06F16/332, G06F16/34
Abstract: Embodiments described herein provide a query-focused summarization model that employs a single or dual encoder model. A two-step approach may be adopted that first extracts parts of the source document and then synthesizes the extracted segments into a final summary. In another embodiment, an end-to-end approach may be adopted that splits the source document into overlapping segments, and then concatenates encodings into a single embedding sequence for the decoder to output a summary.
-
Publication No.: US20240249082A1
Publication date: 2024-07-25
Application No.: US18460373
Filing date: 2023-09-01
Applicant: Salesforce, Inc.
Inventor: Philippe Laban, Jesse Vig, Wojciech Kryscinski
IPC: G06F40/40, G06F40/166, G06F40/284, G06F40/30
CPC classification number: G06F40/40, G06F40/166, G06F40/284, G06F40/30
Abstract: A method of training a text simplification model is provided. A training dataset including a first set of original textual samples and original revision histories and a second set of simplified textual samples and simplified revision histories is received via a data interface. A training pair including an original textual sample and corresponding original revision history from the first set and a counterpart simplified textual sample and corresponding simplified revision history from the second set is identified. An alignment label for a first revision in the corresponding original revision history and a second revision in the corresponding simplified revision history is generated using a neural network-based alignment model from a score. A revision category label for each of the first revision and second revision is generated using a neural network-based classification model. A neural network-based text simplification model is trained based on the updated training dataset.
-
Publication No.: US20230419017A1
Publication date: 2023-12-28
Application No.: US17938572
Filing date: 2022-10-06
Applicant: Salesforce, Inc.
Inventor: Alexander R. Fabbri, Prafulla Kumar Choubey, Jesse Vig, Chien-Sheng Wu, Caiming Xiong
IPC: G06F40/166, G06F40/284, G06N20/00
CPC classification number: G06F40/166, G06F40/284, G06N20/00
Abstract: Embodiments described herein provide a method for text summarization. The method includes receiving a training dataset having at least an uncompressed text, a compressed text, and one or more information entities accompanying the compressed text. The method also includes generating, using a perturber model, a perturbed text with the one or more information entities being inserted into the compressed text. The method further includes training the perturber model based on a first training objective, and generating, using the trained perturber model, a perturbed summary in response to an input of a reference summary. The method further includes generating, via an editor model, a predicted summary by removing information from the perturbed summary conditioned on a source document of the reference summary, and training the editor model based on a second training objective.
-
Publication No.: US20230376677A1
Publication date: 2023-11-23
Application No.: US17880502
Filing date: 2022-08-03
Applicant: Salesforce, Inc.
Inventor: Prafulla Kumar Choubey, Alexander R. Fabbri, Jesse Vig, Chien-Sheng Wu, Wenhao Liu, Nazneen Rajani
IPC: G06F40/166, G06N20/00
CPC classification number: G06F40/166, G06N20/00
Abstract: Embodiments described herein provide a document summarization framework that employs an ensemble of summarization models, each of which is a modified version of a base summarization model to control hallucination. For example, a base summarization model may first be trained on a full training data set. The trained base summarization model is then fine-tuned using a first filtered subset of the training data which contains noisy data, resulting in an “anti-expert” model. The parameters of the anti-expert model are subtracted from the parameters of the trained base model to produce a final summarization model which yields robust factual performance.
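The parameter subtraction described in this abstract can be sketched on plain Python dictionaries. This is one plausible reading, not the patented procedure: it assumes the quantity subtracted is the anti-expert's shift away from the base model, scaled by a hypothetical coefficient `alpha`. The abstract states neither a scaling factor nor whether raw parameters or parameter differences are subtracted, and the function and parameter names below are invented for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def subtract_anti_expert(base: Params, anti_expert: Params, alpha: float = 0.5) -> Params:
    """Move the base parameters away from the anti-expert.

    Computes base - alpha * (anti_expert - base) for every parameter.
    `alpha` is an assumed hyperparameter, not a value from the source.
    """
    if base.keys() != anti_expert.keys():
        raise ValueError("models must have the same parameter names")

    final: Params = {}
    for name, base_values in base.items():
        anti_values = anti_expert[name]
        if len(base_values) != len(anti_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        final[name] = [
            b - alpha * (a - b) for b, a in zip(base_values, anti_values)
        ]
    return final


# Toy weights standing in for a trained base model and its noisy-data fine-tune.
base_model = {"w": [1.0, 2.0]}
anti_expert_model = {"w": [1.5, 1.0]}
final_model = subtract_anti_expert(base_model, anti_expert_model, alpha=0.5)
```

With `alpha=0`, the sketch returns the base parameters unchanged; larger values push the final model further from whatever the anti-expert learned from the noisy subset.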