DECREASING NEURAL NETWORK INFERENCE TIMES USING SOFTMAX APPROXIMATION

    Publication Number: US20200104686A1

    Publication Date: 2020-04-02

    Application Number: US16586702

    Filing Date: 2019-09-27

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for decreasing neural network inference times using softmax approximation. One of the methods includes maintaining data specifying a respective softmax weight vector for each output in a vocabulary of possible neural network outputs; receiving a neural network input; processing the neural network input using one or more initial neural network layers to generate a context vector for the neural network input; and generating an approximate score distribution over the vocabulary of possible neural network outputs for the neural network input, comprising: processing the context vector using a screening model configured to predict a proper subset of the vocabulary for the context vector; and generating a respective logit for each output that is in the proper subset, comprising applying the softmax weight vector for the output to the context vector.
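    The method in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the screening model here is a stand-in linear scorer whose top-k outputs form the predicted proper subset, and all names (`screening_model`, `approximate_softmax`, the matrix shapes, the subset size) are illustrative assumptions. The key saving is that softmax weight vectors are applied only to outputs in the predicted subset, not to the full vocabulary.

    ```python
    import numpy as np

    def screening_model(context, screen_W, k):
        # Stand-in screening model (illustrative): a linear scorer whose
        # top-k scoring outputs form the predicted proper subset of the
        # vocabulary for this context vector.
        scores = screen_W @ context
        return np.argsort(scores)[-k:]

    def approximate_softmax(context, softmax_W, screen_W, k):
        # Predict a proper subset of the vocabulary for the context vector.
        subset = screening_model(context, screen_W, k)
        # Apply the softmax weight vector of each subset output to the
        # context vector to get its logit; only k dot products are computed
        # instead of one per vocabulary entry.
        logits = softmax_W[subset] @ context
        # Normalize over the subset only; outputs outside the subset are
        # assigned probability zero in the approximate distribution.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        dist = np.zeros(softmax_W.shape[0])
        dist[subset] = probs
        return dist

    # Usage: vocabulary of 10,000 outputs, 64-dim context, subset of 100.
    rng = np.random.default_rng(0)
    V, d, k = 10_000, 64, 100
    softmax_W = rng.standard_normal((V, d))   # one weight vector per output
    screen_W = rng.standard_normal((V, d))    # stand-in screening weights
    context = rng.standard_normal(d)
    dist = approximate_softmax(context, softmax_W, screen_W, k)
    ```

    The resulting `dist` is a valid probability distribution supported on only k of the V vocabulary entries, which is what makes inference cheaper when k is much smaller than V.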

    Decreasing neural network inference times using softmax approximation

    Publication Number: US10671909B2

    Publication Date: 2020-06-02

    Application Number: US16586702

    Filing Date: 2019-09-27

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for decreasing neural network inference times using softmax approximation. One of the methods includes maintaining data specifying a respective softmax weight vector for each output in a vocabulary of possible neural network outputs; receiving a neural network input; processing the neural network input using one or more initial neural network layers to generate a context vector for the neural network input; and generating an approximate score distribution over the vocabulary of possible neural network outputs for the neural network input, comprising: processing the context vector using a screening model configured to predict a proper subset of the vocabulary for the context vector; and generating a respective logit for each output that is in the proper subset, comprising applying the softmax weight vector for the output to the context vector.