-
Publication No.: US11087092B2
Publication Date: 2021-08-10
Application No.: US16399871
Filing Date: 2019-04-30
Applicant: salesforce.com, inc.
Inventor: Stephan Zheng , Wojciech Kryscinski , Michael Shum , Richard Socher , Caiming Xiong
IPC: G06F40/30 , G06N3/08 , G06F40/205
Abstract: Approaches for determining a response for an agent in an undirected dialogue are provided. The approaches include a dialogue generating framework comprising an encoder neural network, a decoder neural network, and a language model neural network. The dialogue generating framework generates a sketch sentence response with at least one slot. The sketch sentence response is generated word by word and takes into account the undirected dialogue and agent traits of the agent making the response. The dialogue generating framework generates sentence responses by filling the slot with words from the agent traits. The dialogue generating framework ranks the sentence responses according to perplexity by passing the sentence responses through a language model and selects as the final response the sentence response that has the lowest perplexity.
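The slot-filling and perplexity-ranking steps in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `@slot` marker, the function names, and the add-one-smoothed bigram model standing in for the language model neural network are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

SLOT = "@slot"  # hypothetical slot marker, not taken from the patent


def fill_slots(sketch, trait_words):
    """Return every sentence obtained by filling each slot with a trait word."""
    tokens = sketch.split()
    n_slots = tokens.count(SLOT)
    candidates = []
    for choice in product(trait_words, repeat=n_slots):
        fill = iter(choice)
        candidates.append(" ".join(next(fill) if t == SLOT else t for t in tokens))
    return candidates


def train_bigram_lm(sentences):
    """Toy stand-in for the language model: add-one-smoothed bigram log-probabilities."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.split()
        vocab.update(toks)
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    v = len(vocab) + 1  # one extra slot for unseen words

    def log_prob(prev, word):
        return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))

    return log_prob


def perplexity(sentence, log_prob):
    """exp of the negative mean token log-probability."""
    toks = ["<s>"] + sentence.split()
    total = sum(log_prob(p, w) for p, w in zip(toks, toks[1:]))
    return math.exp(-total / (len(toks) - 1))


def select_response(sketch, trait_words, log_prob):
    """Fill the slots, then return the candidate with the lowest perplexity."""
    candidates = fill_slots(sketch, trait_words)
    return min(candidates, key=lambda s: perplexity(s, log_prob))


lm = train_bigram_lm(["i like hiking", "i like dogs", "i have two dogs"])
best = select_response("i like @slot", ["hiking", "two"], lm)
```

With this toy corpus, the candidate "i like hiking" contains a bigram seen in training and so scores a lower perplexity than "i like two".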
-
Publication No.: US20210124876A1
Publication Date: 2021-04-29
Application No.: US16750598
Filing Date: 2020-01-23
Applicant: salesforce.com, inc.
Inventor: Wojciech Kryscinski , Bryan McCann
IPC: G06F40/30 , G06F40/268 , G06F16/34 , G06K9/62
Abstract: A weakly-supervised, model-based approach is provided for verifying or checking factual consistency and identifying conflicts between source documents and a generated summary. In some embodiments, an artificially generated training dataset is created by applying rule-based transformations to sentences sampled from one or more unannotated source documents of a dataset. Each of the resulting transformed sentences can be either semantically variant or invariant from the respective original sampled sentence, and labeled accordingly. In some embodiments, the generated training dataset is used to train a factual consistency checking model. The factual consistency checking model can classify whether a corresponding text summary is factually consistent with a source text document, and if so, may identify a span in the source text document that supports the corresponding text summary.
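As a rough illustration of how rule-based transformations could yield labeled training pairs, consider the sketch below. The three rules (negation, number change, contraction expansion) and the label strings `CONSISTENT` and `INCONSISTENT` are assumptions chosen for the example; the abstract does not enumerate the actual rules or labels.

```python
import re


def negate(sentence):
    """Semantically variant: insert or remove 'not' after the first auxiliary verb."""
    m = re.search(r"\b(is|are|was|were|will|can)\b( not)?", sentence)
    if m is None:
        return None
    if m.group(2):  # already negated, so drop the 'not'
        return sentence[:m.end(1)] + sentence[m.end():]
    return sentence[:m.end(1)] + " not" + sentence[m.end(1):]


def change_number(sentence):
    """Semantically variant: increment the first integer in the sentence."""
    m = re.search(r"\d+", sentence)
    if m is None:
        return None
    return sentence[:m.start()] + str(int(m.group()) + 1) + sentence[m.end():]


def expand_contractions(sentence):
    """Semantically invariant: rewrite a contraction without changing meaning."""
    out = sentence.replace("isn't", "is not").replace("aren't", "are not")
    return out if out != sentence else None


# (rule, label) pairs; label names are hypothetical
RULES = [
    (negate, "INCONSISTENT"),
    (change_number, "INCONSISTENT"),
    (expand_contractions, "CONSISTENT"),
]


def build_examples(source_sentences):
    """Return (original, transformed, label) triples for every rule that applies."""
    examples = []
    for sentence in source_sentences:
        for rule, label in RULES:
            transformed = rule(sentence)
            if transformed is not None:
                examples.append((sentence, transformed, label))
    return examples


examples = build_examples(["The company was founded in 1999."])
```

Because the labels follow from the rule that produced each sentence, no manual annotation of the source documents is needed, which is the sense in which the approach is weakly supervised.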
-
Publication No.: US20230065155A1
Publication Date: 2023-03-02
Application No.: US17572549
Filing Date: 2022-01-10
Applicant: salesforce.com, inc.
Inventor: Tanya Goyal , Wojciech Kryscinski , Nazneen Rajani
IPC: G06F40/166 , G06N7/00
Abstract: The decoder network includes multiple decoders trained to generate different types of summaries. The lower layers of the multiple decoders are shared. The upper layers of the multiple decoders do not overlap. The multiple decoders generate probability distributions. A gating mechanism combines the probability distributions of the multiple decoders into a probability distribution of the decoder network. Words in the summary are selected based on the probability distribution of the decoder network.
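The gating step can be sketched as a weighted mixture of per-decoder distributions. The softmax gate, the convex combination, and the greedy word choice below are assumptions made for illustration; the abstract states only that a gating mechanism combines the distributions and that words are selected from the result.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mix_distributions(decoder_probs, gate_logits):
    """Combine per-decoder vocabulary distributions using softmax gate weights.

    decoder_probs: one probability list per decoder, all over the same vocabulary.
    gate_logits: one unnormalized gate score per decoder (hypothetical input).
    """
    gates = softmax(gate_logits)
    vocab_size = len(decoder_probs[0])
    return [
        sum(g * probs[i] for g, probs in zip(gates, decoder_probs))
        for i in range(vocab_size)
    ]


def pick_word(vocab, mixed):
    """Greedy selection: return the word with the highest mixed probability."""
    return vocab[max(range(len(vocab)), key=mixed.__getitem__)]


vocab = ["the", "model", "summary"]
decoder_a = [0.7, 0.2, 0.1]  # e.g. a decoder trained on one summary type
decoder_b = [0.1, 0.2, 0.7]  # e.g. a decoder trained on another summary type
mixed = mix_distributions([decoder_a, decoder_b], gate_logits=[2.0, 0.0])
word = pick_word(vocab, mixed)
```

Since the gate weights sum to one and each input is a probability distribution, the mixture is itself a valid distribution over the vocabulary.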
-
Publication No.: US20220067302A1
Publication Date: 2022-03-03
Application No.: US17161327
Filing Date: 2021-01-28
Applicant: salesforce.com, inc.
Inventor: Hiroaki Hayashi , Wojciech Kryscinski
IPC: G06F40/40 , G06F40/284 , G06N20/00 , G06N5/04
Abstract: Embodiments described herein provide natural language processing (NLP) systems and methods that provide a customized summarization of scientific or technical articles, which disentangles background information from new contributions, and summarizes the background information or the new information (or both) based on a user's preference. Specifically, the systems and methods utilize machine learning classifiers to classify portions of sentences within the article as containing background information or as containing a new contribution attributable to the article. The systems and methods then incorporate the background information in the summary or incorporate the new contribution in the summary and output the summary. In this way, the systems and methods can provide summaries of scientific literature, which substantially accelerates literature review in scientific fields.
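The preference-driven selection described here can be sketched as a filter over classified sentences. The cue-phrase lookup below is only a stand-in for the trained machine learning classifiers the abstract mentions, and the cue list, label names, and function names are hypothetical.

```python
# Hypothetical cue phrases; a real system would use a trained classifier instead.
CONTRIBUTION_CUES = ("we propose", "we introduce", "we show", "our method", "this paper")

PREFERENCES = ("background", "contribution", "both")


def classify(sentence):
    """Stand-in classifier: label a sentence 'contribution' or 'background'."""
    lowered = sentence.lower()
    if any(cue in lowered for cue in CONTRIBUTION_CUES):
        return "contribution"
    return "background"


def select_sentences(sentences, preference):
    """Keep the sentences that match the user's preference."""
    if preference not in PREFERENCES:
        raise ValueError(f"preference must be one of {PREFERENCES}")
    return [s for s in sentences if preference == "both" or classify(s) == preference]


article = [
    "Summarization models often hallucinate facts.",
    "We propose a classifier that separates background from contributions.",
]
new_only = select_sentences(article, "contribution")
```

The selected sentences would then be passed to a summarizer; that step is omitted because the abstract does not describe it in detail.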
-
Publication No.: US11790184B2
Publication Date: 2023-10-17
Application No.: US17161327
Filing Date: 2021-01-28
Applicant: salesforce.com, inc.
Inventor: Hiroaki Hayashi , Wojciech Kryscinski
IPC: G06F40/40 , G06F40/284 , G06N5/04 , G06N20/00 , G06F40/30
CPC classification number: G06F40/40 , G06F40/284 , G06N5/04 , G06N20/00 , G06F40/30
Abstract: Embodiments described herein provide natural language processing (NLP) systems and methods that provide a customized summarization of scientific or technical articles, which disentangles background information from new contributions, and summarizes the background information or the new information (or both) based on a user's preference. Specifically, the systems and methods utilize machine learning classifiers to classify portions of sentences within the article as containing background information or as containing a new contribution attributable to the article. The systems and methods then incorporate the background information in the summary or incorporate the new contribution in the summary and output the summary. In this way, the systems and methods can provide summaries of scientific literature, which substantially accelerates literature review in scientific fields.
-
Publication No.: US11755637B2
Publication Date: 2023-09-12
Application No.: US17572549
Filing Date: 2022-01-10
Applicant: salesforce.com, inc.
Inventor: Tanya Goyal , Wojciech Kryscinski , Nazneen Rajani
IPC: G06F16/34 , G06F40/166 , G06N3/02 , G06N7/01
CPC classification number: G06F16/345 , G06F40/166 , G06N3/02 , G06N7/01
Abstract: The decoder network includes multiple decoders trained to generate different types of summaries. The lower layers of the multiple decoders are shared. The upper layers of the multiple decoders do not overlap. The multiple decoders generate probability distributions. A gating mechanism combines the probability distributions of the multiple decoders into a probability distribution of the decoder network. Words in the summary are selected based on the probability distribution of the decoder network.
-
Publication No.: US11741142B2
Publication Date: 2023-08-29
Application No.: US17589522
Filing Date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Haopeng Zheng , Semih Yavuz , Wojciech Kryscinski , Kazuma Hashimoto , Yingbo Zhou
IPC: G06F16/34 , G06F40/166 , G06N20/00 , G06F40/117 , G06F40/279
CPC classification number: G06F16/345 , G06F40/166 , G06N20/00 , G06F40/117 , G06F40/279
Abstract: Embodiments described herein provide document summarization systems and methods that utilize fine-tuning of pre-trained abstractive summarization models to produce summaries that more faithfully track the content of the documents. Such abstractive summarization models may be pre-trained using a corpus consisting of pairs of articles and associated summaries. For each article-summary pair, a pseudo label or control code is generated and represents a faithfulness of the summary with respect to the article. The pre-trained model is then fine-tuned based on the article-summary pairs and the corresponding control codes. The resulting fine-tuned models then provide improved faithfulness in document summarization tasks.
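One way to picture the control-code step is shown below. The abstract does not say how faithfulness is measured or where the code is attached, so the token-overlap score, the 0.8 threshold, the `<faithful>` and `<unfaithful>` strings, and the choice to prepend the code to the article are all assumptions made for this sketch.

```python
def faithfulness_code(article, summary, threshold=0.8):
    """Assign a control code from a crude proxy: share of summary tokens found in the article."""
    article_tokens = set(article.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return "<unfaithful>"
    overlap = sum(t in article_tokens for t in summary_tokens) / len(summary_tokens)
    return "<faithful>" if overlap >= threshold else "<unfaithful>"


def build_finetuning_pairs(pairs):
    """Prepend each pair's control code to the article to form the model input."""
    return [
        (f"{faithfulness_code(article, summary)} {article}", summary)
        for article, summary in pairs
    ]


data = [
    ("the cat sat on the mat", "the cat sat"),
    ("the cat sat on the mat", "a dog ran away"),
]
finetuning_pairs = build_finetuning_pairs(data)
```

Under this setup, a fine-tuned model could be steered at inference time by prepending the `<faithful>` code to a new article.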
-
Publication No.: US20230070497A1
Publication Date: 2023-03-09
Application No.: US17589675
Filing Date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Jered McInerney , Wojciech Kryscinski , Nazneen Rajani
IPC: G06F40/166 , G06F40/40 , G06F40/20 , G06N5/02
Abstract: Embodiments described herein provide methods and systems for summarizing multiple documents. A system receives a plurality of documents and generates embeddings of the sentences from the plurality of documents. The embedded sentences are clustered in a representation space. Sentences from a reference summary are embedded and aligned with the closest cluster. Sentences from each cluster are summarized with the aligned reference sentences as a target. A loss is computed based on the summarized sentences and the aligned references, and the natural language processing model is updated based on the loss. Sentences may be masked from being used in the summarization by identifying sentences that are contradicted by other sentences within the plurality of documents.
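The alignment of reference sentences to clusters can be sketched as a nearest-centroid lookup. The sketch assumes sentence embeddings and cluster assignments are already available, and it uses Euclidean distance to cluster centroids; the abstract does not specify the distance measure or how a cluster's position is represented.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def align_references(clusters, reference_vectors):
    """Map each cluster id to the indices of the reference sentences closest to it.

    clusters: dict of cluster id -> list of embedded source sentences.
    reference_vectors: embedded reference-summary sentences, in order.
    """
    centroids = {cid: centroid(vectors) for cid, vectors in clusters.items()}
    aligned = {cid: [] for cid in clusters}
    for index, ref in enumerate(reference_vectors):
        closest = min(centroids, key=lambda cid: distance(centroids[cid], ref))
        aligned[closest].append(index)
    return aligned


# Toy 2-D embeddings standing in for real sentence embeddings
clusters = {"a": [[0.0, 0.0], [0.0, 2.0]], "b": [[10.0, 10.0], [12.0, 10.0]]}
references = [[1.0, 1.0], [11.0, 9.0]]
alignment = align_references(clusters, references)
```

Each cluster's aligned reference sentences would then serve as the target when summarizing that cluster's source sentences during training.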
-
Publication No.: US11699026B2
Publication Date: 2023-07-11
Application No.: US17589675
Filing Date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Jered McInerney , Wojciech Kryscinski , Nazneen Rajani
IPC: G06F17/00 , G06F40/166 , G06N5/022 , G06F40/20 , G06F40/40
CPC classification number: G06F40/166 , G06F40/20 , G06F40/40 , G06N5/022
Abstract: Embodiments described herein provide methods and systems for summarizing multiple documents. A system receives a plurality of documents and generates embeddings of the sentences from the plurality of documents. The embedded sentences are clustered in a representation space. Sentences from a reference summary are embedded and aligned with the closest cluster. Sentences from each cluster are summarized with the aligned reference sentences as a target. A loss is computed based on the summarized sentences and the aligned references, and the natural language processing model is updated based on the loss. Sentences may be masked from being used in the summarization by identifying sentences that are contradicted by other sentences within the plurality of documents.
-
Publication No.: US20230054068A1
Publication Date: 2023-02-23
Application No.: US17589522
Filing Date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Haopeng Zheng , Semih Yavuz , Wojciech Kryscinski , Kazuma Hashimoto , Yingbo Zhou
IPC: G06F40/166 , G06F40/279 , G06F40/117 , G06N20/00
Abstract: Embodiments described herein provide document summarization systems and methods that utilize fine-tuning of pre-trained abstractive summarization models to produce summaries that more faithfully track the content of the documents. Such abstractive summarization models may be pre-trained using a corpus consisting of pairs of articles and associated summaries. For each article-summary pair, a pseudo label or control code is generated and represents a faithfulness of the summary with respect to the article. The pre-trained model is then fine-tuned based on the article-summary pairs and the corresponding control codes. The resulting fine-tuned models then provide improved faithfulness in document summarization tasks.