-
91.
Publication No.: US20210042604A1
Publication date: 2021-02-11
Application No.: US17080656
Filing date: 2020-10-26
Applicant: salesforce.com, inc.
Inventor: Kazuma Hashimoto , Caiming Xiong , Richard Socher
IPC: G06N3/04 , G06N3/08 , G06F40/205 , G06F40/284 , G06F40/253 , G06F40/216 , G06N3/063 , G06F40/30 , G10L15/16 , G06F40/00 , G10L15/18 , G10L25/30
Abstract: The technology disclosed provides a so-called “joint many-task neural network model” to solve a variety of increasingly complex natural language processing (NLP) tasks using growing depth of layers in a single end-to-end model. The model is successively trained by considering linguistic hierarchies, directly connecting word representations to all model layers, explicitly using predictions in lower tasks, and applying a so-called “successive regularization” technique to prevent catastrophic forgetting. Three examples of lower level model layers are the part-of-speech (POS) tagging layer, the chunking layer, and the dependency parsing layer. Two examples of higher level model layers are the semantic relatedness layer and the textual entailment layer. The model achieves state-of-the-art results on chunking, dependency parsing, semantic relatedness and textual entailment.
-
Publication No.: US20200372339A1
Publication date: 2020-11-26
Application No.: US16592474
Filing date: 2019-10-03
Applicant: salesforce.com, inc.
Inventor: Tong Che , Caiming Xiong
Abstract: Verification of discriminative models includes receiving an input; receiving a prediction from a discriminative model for the input; encoding, using an encoder, a latent variable based on the input; decoding, using a decoder, a reconstructed input based on the prediction and the latent variable; and determining, using an anomaly detection module, whether the prediction is reliable based on the input, the reconstructed input, and the latent variable. The encoder and the decoder are jointly trained to maximize an evidence lower bound of the encoder and the decoder. In some embodiments, the encoder and the decoder are further trained using a disentanglement constraint between the prediction and the latent variable. In some embodiments, the encoder and the decoder are further trained without using inputs that are out of a distribution of inputs used to train the discriminative model or that are adversarial to the discriminative model.
-
Publication No.: US10817650B2
Publication date: 2020-10-27
Application No.: US15982841
Filing date: 2018-05-17
Applicant: salesforce.com, inc.
Inventor: Bryan McCann , Caiming Xiong , Richard Socher
IPC: G06F40/126 , G06N3/08 , G06N3/04 , G06F40/30 , G06F40/47 , G06F40/205 , G06F40/289 , G06F40/44 , G06F40/58
Abstract: A system is provided for natural language processing. In some embodiments, the system includes an encoder for generating context-specific word vectors for at least one input sequence of words. The encoder is pre-trained using training data for performing a first natural language processing task. A neural network performs a second natural language processing task on the at least one input sequence of words using the context-specific word vectors. The first natural language processing task is different from the second natural language processing task and the neural network is separately trained from the encoder. In some embodiments, the first natural language processing task can be machine translation, and the second natural language processing task can be one of sentiment analysis, question classification, entailment classification, and question answering.
-
94.
Publication No.: US20200334334A1
Publication date: 2020-10-22
Application No.: US16518905
Filing date: 2019-07-22
Applicant: salesforce.com, inc.
Inventor: Nitish Shirish Keskar , Bryan McCann , Richard Socher , Caiming Xiong
IPC: G06F17/27
Abstract: Systems and methods for unifying question answering and text classification via span extraction include a preprocessor for preparing a source text and an auxiliary text based on a task type of a natural language processing task, an encoder for receiving the source text and the auxiliary text from the preprocessor and generating an encoded representation of a combination of the source text and the auxiliary text, and a span-extractive decoder for receiving the encoded representation and identifying a span of text within the source text that is a result of the NLP task. The task type is one of entailment, classification, or regression. In some embodiments, the source text includes one or more of text received as input when the task type is entailment, a list of classifications when the task type is entailment or classification, or a list of similarity options when the task type is regression.
-
Publication No.: US20200302178A1
Publication date: 2020-09-24
Application No.: US16394964
Filing date: 2019-04-25
Applicant: salesforce.com, inc.
Inventor: Mingfei Gao , Richard Socher , Caiming Xiong
Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start. A fusion component is coupled to the classification module and the localization module for generating, based on the set of action scores and the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.
-
Publication No.: US10699060B2
Publication date: 2020-06-30
Application No.: US16000638
Filing date: 2018-06-05
Applicant: salesforce.com, inc.
Inventor: Bryan McCann , Caiming Xiong , Richard Socher
IPC: G06F40/126 , G06N3/08 , G06N3/04 , G06F40/30 , G06F40/47 , G06F40/205 , G06F40/289 , G06F40/44 , G06F40/58
Abstract: A system includes a neural network for performing a first natural language processing task. The neural network includes a first rectifier linear unit capable of executing an activation function on a first input related to a first word sequence, and a second rectifier linear unit capable of executing an activation function on a second input related to a second word sequence. A first encoder is capable of receiving the result from the first rectifier linear unit and generating a first task specific representation relating to the first word sequence, and a second encoder is capable of receiving the result from the second rectifier linear unit and generating a second task specific representation relating to the second word sequence. A biattention mechanism is capable of computing, based on the first and second task specific representations, an interdependent representation related to the first and second word sequences. In some embodiments, the first natural language processing task performed by the neural network is one of sentiment classification and entailment classification.
-
Publication No.: US20200065651A1
Publication date: 2020-02-27
Application No.: US16664508
Filing date: 2019-10-25
Applicant: salesforce.com, inc.
Inventor: Stephen Joseph Merity , Caiming Xiong , James Bradbury , Richard Socher
Abstract: The technology disclosed provides a so-called “pointer sentinel mixture architecture” for neural network sequence models that has the ability to either reproduce a token from a recent context or produce a token from a predefined vocabulary. In one implementation, a pointer sentinel-LSTM architecture achieves state of the art language modeling performance of 70.9 perplexity on the Penn Treebank dataset, while using far fewer parameters than a standard softmax LSTM.
-
Publication No.: US10565493B2
Publication date: 2020-02-18
Application No.: US15421016
Filing date: 2017-01-31
Applicant: salesforce.com, inc.
Inventor: Stephen Joseph Merity , Caiming Xiong , James Bradbury , Richard Socher
Abstract: The technology disclosed provides a so-called “pointer sentinel mixture architecture” for neural network sequence models that has the ability to either reproduce a token from a recent context or produce a token from a predefined vocabulary. In one implementation, a pointer sentinel-LSTM architecture achieves state of the art language modeling performance of 70.9 perplexity on the Penn Treebank dataset, while using far fewer parameters than a standard softmax LSTM.
-
Publication No.: US20190251431A1
Publication date: 2019-08-15
Application No.: US15974075
Filing date: 2018-05-08
Applicant: salesforce.com, inc.
Inventor: Nitish Shirish Keskar , Bryan McCann , Caiming Xiong , Richard Socher
CPC classification number: G06N3/08 , G06F17/2785 , G06F17/2881 , G06N3/0445 , G06N3/0454 , G06N5/04
Abstract: Approaches for multitask learning as question answering include a method for training that includes receiving a plurality of training samples including training samples from a plurality of task types, presenting the training samples to a neural model to generate an answer, determining an error between the generated answer and the natural language ground truth answer for each training sample presented, and adjusting parameters of the neural model based on the error. Each of the training samples includes a natural language context, question, and ground truth answer. An order in which the training samples are presented to the neural model includes initially selecting the training samples according to a first training strategy and switching to selecting the training samples according to a second training strategy. In some embodiments the first training strategy is a sequential training strategy and the second training strategy is a joint training strategy.
-
Publication No.: US10282663B2
Publication date: 2019-05-07
Application No.: US15237575
Filing date: 2016-08-15
Applicant: salesforce.com, inc.
Inventor: Richard Socher , Caiming Xiong , Kai Sheng Tai
Abstract: The technology disclosed uses a 3D deep convolutional neural network architecture (DCNNA) equipped with so-called subnetwork modules which perform dimensionality reduction operations on 3D radiological volume before the 3D radiological volume is subjected to computationally expensive operations. Also, the subnetworks convolve 3D data at multiple scales by subjecting the 3D data to parallel processing by different 3D convolutional layer paths. Such multi-scale operations are computationally cheaper than the traditional CNNs that perform serial convolutions. In addition, performance of the subnetworks is further improved through 3D batch normalization (BN) that normalizes the 3D input fed to the subnetworks, which in turn increases learning rates of the 3D DCNNA. After several layers of 3D convolution and 3D sub-sampling with 3D across a series of subnetwork modules, a feature map with reduced vertical dimensionality is generated from the 3D radiological volume and fed into one or more fully connected layers.
-