MODEL TRAINING METHOD AND APPARATUS

    Publication No.: US20230035351A1

    Publication Date: 2023-02-02

    Application No.: US17950521

    Filing Date: 2022-09-22

    IPC Classification: G06N3/08

    Abstract: A model training method and apparatus are disclosed. The model training method acquires a recognition result of a teacher model and a recognition result of a student model for an input sequence, and trains the student model such that the two recognition results cannot be distinguished from each other.
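
The training idea in this abstract can be illustrated with a small adversarial sketch. The abstract does not say how "not distinguished" is enforced, so everything below is an assumption made for illustration: a hypothetical logistic discriminator scores whether a recognition result came from the teacher, the student is reduced to a single logit vector for one fixed input sequence, and the name `adversarial_distill` is invented here. It is not the patented implementation.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(w, b, y):
    """Estimated probability that recognition result y came from the teacher."""
    return sigmoid(sum(wi * yi for wi, yi in zip(w, y)) + b)


def adversarial_distill(teacher_probs, steps=200, lr=0.5):
    """Alternate discriminator and student updates (illustrative assumption)."""
    n = len(teacher_probs)
    z = [0.0] * n            # student logits, a stand-in for student parameters
    w, b = [0.0] * n, 0.0    # hypothetical linear discriminator
    for _ in range(steps):
        s = softmax(z)
        # Discriminator step: minimize -log D(teacher) - log(1 - D(student)).
        d_t = discriminator(w, b, teacher_probs)
        d_s = discriminator(w, b, s)
        w = [wi - lr * (-(1 - d_t) * ti + d_s * si)
             for wi, ti, si in zip(w, teacher_probs, s)]
        b = b - lr * (-(1 - d_t) + d_s)
        # Student step: minimize -log D(student), i.e. try to look like the teacher.
        d_s = discriminator(w, b, s)
        g = [-(1 - d_s) * wi for wi in w]              # dL/ds
        g_bar = sum(si * gi for si, gi in zip(s, g))
        z = [zi - lr * si * (gi - g_bar)               # chain rule through softmax
             for zi, si, gi in zip(z, s, g)]
    return softmax(z)
```

Calling `adversarial_distill([0.7, 0.2, 0.1], steps=1)` starts from a uniform student and shifts probability mass toward the class the teacher favors. Adversarial updates of this kind can oscillate over many steps, so the sketch makes no convergence claim.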

LANGUAGE MODEL AND ELECTRONIC DEVICE INCLUDING THE SAME

    Publication No.: US20220319500A1

    Publication Date: 2022-10-06

    Application No.: US17425211

    Filing Date: 2021-07-08

    IPC Classification: G10L15/16 G10L15/18 G10L15/06

    Abstract: Disclosed is an electronic device including a processor and a memory that is operatively connected to the processor and stores a language model. The electronic device may enter data into the language model, generate an embedding vector in the input embedding layer, add position information to the embedding vector in the positional encoding layer, branch the embedding vector based on domain information, normalize the branched embedding vectors, enter the normalized embedding vectors into the multi-head attention layer, enter output data of the multi-head attention layer into the first layer, normalize pieces of output data of the first layer, enter the normalized pieces of output data of the first layer into the feed-forward layer, enter output data of the feed-forward layer into the second layer, normalize pieces of output data of the second layer, and enter the normalized pieces of output data of the second layer into the linearization layer and the softmax layer to obtain result data. Various other embodiments, as understood from the specification, may also be possible.
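
The first stages of this pipeline (embedding, positional encoding, domain branching, normalization) can be sketched as follows. The abstract does not state how the branch is realized, so the sketch assumes one set of layer-normalization parameters per domain and a sinusoidal positional encoding. The function `embed_and_normalize`, the domain names, and all values are hypothetical; the attention, feed-forward, linearization, and softmax layers are omitted.

```python
import math


def positional_encoding(pos, dim):
    """Sinusoidal position vector (an assumption; the abstract names no scheme)."""
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim))
            for i in range(dim)]


def layer_norm(vec, gain, bias, eps=1e-5):
    """Normalize one vector to zero mean and unit variance, then scale and shift."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(vec, gain, bias)]


def embed_and_normalize(token_ids, domain, embedding_table, domain_norms):
    """Embed tokens, add position information, then normalize per domain."""
    dim = len(embedding_table[0])
    gain, bias = domain_norms[domain]   # branch: select this domain's parameters
    out = []
    for pos, tok in enumerate(token_ids):
        pe = positional_encoding(pos, dim)
        vec = [e + p for e, p in zip(embedding_table[tok], pe)]
        out.append(layer_norm(vec, gain, bias))
    return out


# Hypothetical two-token vocabulary and two domains, for illustration only.
table = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
norms = {"news": ([1.0] * 4, [0.0] * 4), "chat": ([2.0] * 4, [1.0] * 4)}
news = embed_and_normalize([0, 1], "news", table, norms)
chat = embed_and_normalize([0, 1], "chat", table, norms)
```

Under this assumption the same token sequence yields different normalized vectors for each domain, and those vectors would then feed the shared multi-head attention layer.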