-
Publication No.: US20210089882A1
Publication Date: 2021-03-25
Application No.: US16658399
Application Date: 2019-10-21
Applicant: salesforce.com, inc.
Inventor: Lichao SUN , Jia LI , Caiming XIONG , Yingbo ZHOU
Abstract: Systems and methods are provided for near-zero-cost (NZC) query framework or approach for differentially private deep learning. To protect the privacy of training data during learning, the near-zero-cost query framework transfers knowledge from an ensemble of teacher models trained on partitions of the data to a student model. Privacy guarantees may be understood intuitively and expressed rigorously in terms of differential privacy. Other features are also provided.
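The patent does not spell out the aggregation mechanism, but the teacher-to-student knowledge transfer with differential privacy guarantees is commonly realized as a noisy vote over an ensemble of teachers. The sketch below is an illustrative PATE-style noisy argmax, not the patented NZC method itself; the function name and the `gamma` noise parameter are assumptions for illustration:

```python
import numpy as np

def noisy_teacher_vote(teacher_preds, num_classes, gamma, rng=None):
    """Aggregate labels predicted by an ensemble of teachers (each trained
    on a disjoint partition of the private data) into one label for the
    student, adding Laplace noise to the vote counts so that no single
    training example can noticeably change the outcome."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    # Smaller gamma -> more noise -> stronger differential privacy.
    noisy_counts = counts + rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_counts))
```

With a strong teacher consensus the noisy vote almost always returns the majority label, which is what lets the privacy cost of each student query stay low.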
-
Publication No.: US20200084465A1
Publication Date: 2020-03-12
Application No.: US16687405
Application Date: 2019-11-18
Applicant: salesforce.com, inc.
Inventor: Yingbo ZHOU , Luowei ZHOU , Caiming XIONG , Richard SOCHER
IPC: H04N19/46 , H04N19/132 , H04N19/126 , H04N19/33 , H04N21/81 , H04N19/187 , H04N19/60 , H04N19/44
Abstract: Systems and methods for dense captioning of a video include a multi-layer encoder stack configured to receive information extracted from a plurality of video frames, a proposal decoder coupled to the encoder stack and configured to receive one or more outputs from the encoder stack, a masking unit configured to mask the one or more outputs from the encoder stack according to one or more outputs from the proposal decoder, and a decoder stack coupled to the masking unit and configured to receive the masked one or more outputs from the encoder stack. The dense captioning is generated based on one or more outputs of the decoder stack. In some embodiments, the one or more outputs from the proposal decoder include a differentiable mask. In some embodiments, during training, error in the dense captioning is back propagated to the decoder stack, the encoder stack, and the proposal decoder.
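A differentiable mask lets caption error flow back through the proposal decoder, since a hard 0/1 segment mask would block gradients. The sketch below shows one plausible soft-gate form (a product of two sigmoids around a proposed event segment); the exact functional form, names, and `sharpness` parameter are assumptions, not taken from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def differentiable_mask(num_frames, center, length, sharpness=1.0):
    """Soft, fully differentiable gate over frame positions for one event
    proposal: close to 1 inside [center - length/2, center + length/2]
    and decaying smoothly to 0 outside it."""
    t = np.arange(num_frames)
    left = sigmoid(sharpness * (t - (center - length / 2)))
    right = sigmoid(sharpness * ((center + length / 2) - t))
    return left * right

def mask_encoder_outputs(encoder_out, mask):
    """Gate encoder outputs (num_frames x d_model) frame-by-frame before
    they are fed to the caption decoder stack."""
    return encoder_out * mask[:, None]
```

Because every operation here is smooth, back-propagating the captioning loss through the masked features also updates whatever produced `center` and `length`, i.e. the proposal decoder.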
-
Publication No.: US20200005765A1
Publication Date: 2020-01-02
Application No.: US16562257
Application Date: 2019-09-05
Applicant: salesforce.com, inc.
Inventor: Yingbo ZHOU , Caiming XIONG
Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, including using multi-objective learning criteria to train a deep end-to-end speech recognition model on training data comprising speech samples temporally labeled with ground truth transcriptions. The multi-objective learning criteria updates model parameters of the model over one thousand to millions of backpropagation iterations by combining, at each iteration, a maximum likelihood objective function that modifies the model parameters to maximize a probability of outputting a correct transcription and a policy gradient function that modifies the model parameters to maximize a positive reward defined based on a non-differentiable performance metric which penalizes incorrect transcriptions in accordance with their conformity to corresponding ground truth transcriptions; and upon convergence after a final backpropagation iteration, persisting the modified model parameters learned by using the multi-objective learning criteria with the model to be applied to further end-to-end speech recognition.
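The combination of a maximum-likelihood objective with a reward-based policy-gradient objective can be made concrete at the level of per-step logit gradients. The sketch below is a minimal REINFORCE-style illustration of that combination, assuming a sampled transcription scored by a non-differentiable reward (e.g. negative edit distance); the function names and the `alpha` mixing weight are hypothetical:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_objective_grad(logits, target, sampled, reward, alpha=0.5):
    """Gradient of a combined objective w.r.t. per-step logits (T x V):
    a maximum-likelihood term pushing probability toward the ground-truth
    transcription, plus a policy-gradient term that scales a sampled
    transcription's log-likelihood gradient by its reward."""
    p = softmax(logits)
    T = len(target)
    # ML term: d(-log p(target)) / d logits = p - onehot(target)
    ml = p.copy()
    ml[np.arange(T), target] -= 1.0
    # PG term: d(-reward * log p(sampled)) / d logits = reward * (p - onehot(sampled))
    pg = p.copy()
    pg[np.arange(T), sampled] -= 1.0
    pg *= reward
    return alpha * ml + (1 - alpha) * pg
```

For a positive reward, gradient descent on this quantity raises the logits of the sampled tokens, so transcriptions that score well under the task metric are reinforced while the ML term keeps training anchored to the ground truth.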
-
Publication No.: US20190130896A1
Publication Date: 2019-05-02
Application No.: US15851579
Application Date: 2017-12-21
Applicant: salesforce.com, inc.
Inventor: Yingbo ZHOU , Caiming XIONG , Richard SOCHER
IPC: G10L15/06 , G10L15/24 , G10L13/033 , G10L13/04 , G10L15/20
Abstract: The disclosed technology teaches regularizing a deep end-to-end speech recognition model to reduce overfitting and improve generalization: synthesizing sample speech variations on original speech samples labelled with text transcriptions, and modifying a particular original speech sample to independently vary tempo and pitch of the original speech sample while retaining the labelled text transcription of the original speech sample, thereby producing multiple sample speech variations having multiple degrees of variation from the original speech sample. The disclosed technology includes training a deep end-to-end speech recognition model, on thousands to millions of original speech samples and the sample speech variations on the original speech samples, that outputs recognized text transcriptions corresponding to speech detected in the original speech samples and the sample speech variations. Additional sample speech variations include augmented volume, temporal alignment offsets and the addition of pseudo-random noise to the particular original speech sample.
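The augmentations named in the abstract (tempo, volume, alignment offset, pseudo-random noise) can be sketched on a raw waveform as below. This is an illustrative stand-in, not the patented pipeline: linear resampling approximates a tempo change, and an independent pitch shift (which the patent also varies) would need a proper DSP routine rather than plain resampling. All names and defaults here are assumptions:

```python
import numpy as np

def augment_speech(samples, tempo=1.0, gain=1.0, noise_std=0.0, shift=0, rng=None):
    """Produce one variation of a 1-D waveform while keeping its text
    label: time-stretch by `tempo` (linear resampling), offset temporal
    alignment by `shift` samples, scale volume by `gain`, and add
    pseudo-random Gaussian noise with std `noise_std`."""
    rng = rng or np.random.default_rng()
    # tempo > 1 speeds up (fewer samples); tempo < 1 slows down.
    out = np.interp(np.arange(0, len(samples), tempo),
                    np.arange(len(samples)), samples)
    out = np.roll(out, shift) * gain  # alignment offset, then volume
    if noise_std > 0:
        out = out + rng.normal(scale=noise_std, size=len(out))
    return out
```

Sampling several `(tempo, gain, shift, noise_std)` combinations per original clip yields the "multiple degrees of variation" used to regularize the recognition model without relabeling anything.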
-