PATH DROPOUT FOR NATURAL LANGUAGE PROCESSING
    Invention Publication

    Publication No.: US20230154455A1

    Publication Date: 2023-05-18

    Application No.: US17988125

    Filing Date: 2022-11-16

    Abstract: Techniques are provided for improved training of a machine-learning model that includes multiple layers and is configured to process textual language input. The machine-learning model includes one or more blocks in which each block includes a multi-head self-attention network, a first connection for providing input to the multi-head self-attention network, and a second (residual) connection for providing the input to a normalization layer, bypassing the multi-head self-attention network. During training, the second connection is dropped out according to a dropout parameter. Additionally, or alternatively, an attention weight matrix is used for dropout by blocking diagonal entries in the attention weight matrix. As a result, the machine-learning model increasingly focuses on contextual information, which provides more accurate language processing results.
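The two training-time mechanisms described in the abstract — dropping the residual (second) connection with some probability, and blocking diagonal entries of the attention weight matrix so tokens must attend to context — can be sketched as follows. This is a minimal illustration assuming plain Python lists for tensors; all function names (`diagonal_attention_mask`, `residual_path`) and the `dropout_p` parameter are hypothetical, not taken from the patent's claims.

```python
import random

def diagonal_attention_mask(n):
    """Additive n x n mask whose diagonal is -inf: after softmax,
    each token's attention to itself is zero, forcing the model to
    rely on contextual (off-diagonal) positions."""
    neg_inf = float("-inf")
    return [[neg_inf if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def residual_path(x, sublayer_out, dropout_p, training, rng=random):
    """Path dropout on the residual connection: during training the
    residual (bypass) path is dropped with probability dropout_p, so
    only the sub-layer output reaches the normalization layer;
    otherwise the usual x + sublayer(x) sum is forwarded."""
    if training and rng.random() < dropout_p:
        return list(sublayer_out)          # residual path dropped
    return [a + b for a, b in zip(x, sublayer_out)]
```

For example, with `dropout_p=1.0` and `training=True` the block output is the sub-layer output alone, while at inference (`training=False`) the standard residual sum is always used.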
