DECREASING NEURAL NETWORK INFERENCE TIMES USING SOFTMAX APPROXIMATION

    Publication Number: US20200104686A1

    Publication Date: 2020-04-02

    Application Number: US16586702

    Filing Date: 2019-09-27

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for decreasing neural network inference times using softmax approximation. One of the methods includes maintaining data specifying a respective softmax weight vector for each output in a vocabulary of possible neural network outputs; receiving a neural network input; processing the neural network input using one or more initial neural network layers to generate a context vector for the neural network input; and generating an approximate score distribution over the vocabulary of possible neural network outputs for the neural network input, comprising: processing the context vector using a screening model configured to predict a proper subset of the vocabulary for the context vector; and generating a respective logit for each output that is in the proper subset, comprising applying the softmax weight vector for the output to the context vector.
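    The method in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the screening model here is a stand-in linear scorer whose top-k outputs form the predicted proper subset, and all names (`screening_model`, `approximate_softmax`, the matrix shapes, the subset size) are illustrative assumptions. The key saving is that softmax weight vectors are applied only to outputs in the predicted subset, not to the full vocabulary.

    ```python
    import numpy as np

    def screening_model(context, screen_W, k):
        # Stand-in screening model (illustrative): a linear scorer whose
        # top-k scoring outputs form the predicted proper subset of the
        # vocabulary for this context vector.
        scores = screen_W @ context
        return np.argsort(scores)[-k:]

    def approximate_softmax(context, softmax_W, screen_W, k):
        # Predict a proper subset of the vocabulary for the context vector.
        subset = screening_model(context, screen_W, k)
        # Apply the softmax weight vector of each subset output to the
        # context vector to get its logit; only k dot products are computed
        # instead of one per vocabulary entry.
        logits = softmax_W[subset] @ context
        # Normalize over the subset only; outputs outside the subset are
        # assigned probability zero in the approximate distribution.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        dist = np.zeros(softmax_W.shape[0])
        dist[subset] = probs
        return dist

    # Usage: vocabulary of 10,000 outputs, 64-dim context, subset of 100.
    rng = np.random.default_rng(0)
    V, d, k = 10_000, 64, 100
    softmax_W = rng.standard_normal((V, d))   # one weight vector per output
    screen_W = rng.standard_normal((V, d))    # stand-in screening weights
    context = rng.standard_normal(d)
    dist = approximate_softmax(context, softmax_W, screen_W, k)
    ```

    The resulting `dist` is a valid probability distribution supported on only k of the V vocabulary entries, which is what makes inference cheaper when k is much smaller than V.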

    Decreasing neural network inference times using softmax approximation

    Publication Number: US10671909B2

    Publication Date: 2020-06-02

    Application Number: US16586702

    Filing Date: 2019-09-27

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for decreasing neural network inference times using softmax approximation. One of the methods includes maintaining data specifying a respective softmax weight vector for each output in a vocabulary of possible neural network outputs; receiving a neural network input; processing the neural network input using one or more initial neural network layers to generate a context vector for the neural network input; and generating an approximate score distribution over the vocabulary of possible neural network outputs for the neural network input, comprising: processing the context vector using a screening model configured to predict a proper subset of the vocabulary for the context vector; and generating a respective logit for each output that is in the proper subset, comprising applying the softmax weight vector for the output to the context vector.