Method and apparatus for training retrieval model, device and computer storage medium

    Publication No.: US11847150B2

    Publication date: 2023-12-19

    Application No.: US17407320

    Filing date: 2021-08-20

    CPC classification number: G06F16/3347 G06F16/3344 G06N20/20

    Abstract: The present application discloses a method and apparatus for training a retrieval model, a device, and a computer storage medium, which relate to intelligent search and natural language processing technologies. An implementation includes: acquiring initial training data; performing a training operation using the initial training data to obtain an initial retrieval model; using the initial retrieval model to select, from candidate texts, texts whose degrees of correlation with a query in the training data meet a preset first requirement, and updating the training data with them; performing a training operation using the updated training data to obtain a first retrieval model; using the first retrieval model to select, from the candidate texts, texts whose degrees of correlation with the query meet a preset second requirement, and/or texts whose degrees of correlation with the query meet a preset third requirement, and expanding the training data with them; and performing a training operation using the expanded training data to obtain a second retrieval model.
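
The staged procedure in this abstract can be sketched roughly as follows. This is a toy illustration, not the patented implementation: the token-weight `train` function, the `pos`/`neg` data layout, and the reading of the first requirement as "mine negatives" and the second as "mine extra positives" are all assumptions made here, and the optional third requirement is omitted.

```python
def train(data):
    """Toy stand-in for a training operation: learn per-token weights (assumption)."""
    weights = {}
    for example in data.values():
        for label, sign in (("pos", 1.0), ("neg", -1.0)):
            for text in example[label]:
                for tok in text.split():
                    weights[tok] = weights.get(tok, 0.0) + sign

    def score(query, text):
        # Correlation degree = word overlap with the query + mean learned weight.
        toks = text.split()
        overlap = len(set(query.split()) & set(toks))
        learned = sum(weights.get(t, 0.0) for t in toks) / max(len(toks), 1)
        return overlap + learned

    return score


def select(model, query, pool, requirement):
    """Keep the texts whose correlation degree meets the given requirement."""
    return [text for text in pool if requirement(model(query, text))]


def three_stage_training(data, candidates, first_req, second_req):
    # Stage 1: initial retrieval model from the initial training data.
    initial_model = train(data)

    # Update the training data with texts selected by the initial model
    # (assumed here to serve as negatives; keep the old ones if none are found).
    updated = {}
    for query, ex in data.items():
        pool = [t for t in candidates if t not in ex["pos"]]
        mined = select(initial_model, query, pool, first_req)
        updated[query] = {"pos": list(ex["pos"]), "neg": mined or list(ex["neg"])}

    # Stage 2: first retrieval model from the updated training data.
    first_model = train(updated)

    # Expand the training data with texts selected by the first model
    # (assumed here to serve as additional positives).
    expanded = {}
    for query, ex in updated.items():
        pool = [t for t in candidates if t not in ex["pos"] and t not in ex["neg"]]
        extra = select(first_model, query, pool, second_req)
        expanded[query] = {"pos": ex["pos"] + extra, "neg": list(ex["neg"])}

    # Stage 3: second retrieval model from the expanded training data.
    second_model = train(expanded)
    return second_model, expanded
```

The requirements are passed in as predicates on the score, for example `lambda s: 1.0 <= s < 3.0` for the first and `lambda s: s >= 3.0` for the second, so the thresholds stay outside the training loop.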

Text recognition method, electronic device, and storage medium

    Publication No.: US11663404B2

    Publication date: 2023-05-30

    Application No.: US17101789

    Filing date: 2020-11-23

    CPC classification number: G06F40/279 G06F40/166 G06F40/30 G06N20/00

    Abstract: The disclosure provides a text recognition method, an electronic device, and a storage medium. The method includes: obtaining N segments of a sample text; inputting each of the N segments into a preset initial language model in sequence, to obtain first text vector information corresponding to the N segments; inputting each of the N segments into the initial language model in sequence again, to obtain second text vector information corresponding to a currently input segment; in response to determining that the currently input segment has a mask, predicting the mask according to the second text vector information and the first text vector information, to obtain a predicted word at a target position corresponding to the mask; training the initial language model according to an original word and the predicted word to generate a long text language model; and recognizing an input text through the long text language model.
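
The two-pass control flow described here can be illustrated with a small count-based stand-in. It is only a sketch of the data flow: the `encode` function, the `[MASK]` token, and the "most frequent unseen word" prediction rule are assumptions chosen for brevity, where the abstract refers to a trained language model.

```python
from collections import Counter

MASK = "[MASK]"  # hypothetical mask token


def encode(segment):
    """Toy 'text vector information': token counts of one segment."""
    return Counter(tok for tok in segment.split() if tok != MASK)


def two_pass_predict(segments):
    # First pass: feed every segment in sequence and keep its vector.
    first_vectors = [encode(seg) for seg in segments]
    memory = sum(first_vectors, Counter())  # context from the whole text

    predictions = {}
    # Second pass: feed the segments again, one at a time.
    for index, seg in enumerate(segments):
        second_vector = encode(seg)
        if MASK not in seg.split():
            continue
        # Combine both sources: the most frequent word in the first-pass
        # context that the current segment does not already contain.
        scores = {t: n for t, n in memory.items() if t not in second_vector}
        if scores:
            predictions[index] = max(scores, key=scores.get)
    return predictions


def training_signal(original_word, predicted_word):
    """0/1 loss between the original and the predicted word (assumption)."""
    return 0.0 if original_word == predicted_word else 1.0
```

The point of the sketch is that the prediction for a masked segment draws on vectors computed for all N segments in the first pass, which is what lets the context extend beyond a single segment.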

Method, electronic device, and storage medium for training text generation model

    Publication No.: US11574133B2

    Publication date: 2023-02-07

    Application No.: US17133381

    Filing date: 2020-12-23

    Abstract: The disclosure may provide a method for training a text generation model, an electronic device, and a storage medium. The method may include: obtaining a plurality of pieces of first sample data; extracting structured information from each piece of first sample data as the target structured information corresponding to that piece; inputting the plurality of pieces of first sample data into an initial text generation model to generate predicted structured information corresponding to each piece; generating a first loss value based on a difference between the predicted structured information corresponding to each piece of first sample data and the corresponding target structured information; and training a phrase generation ability of the initial text generation model based on the first loss value to generate the text generation model.
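
One way to picture the first loss value is as an averaged mismatch between predicted and target structured information. The sketch below assumes structured information is a set of tuples and measures the difference as 1 minus the F1 overlap; both choices are illustrative assumptions, and a real model would typically need a differentiable loss instead.

```python
def structured_difference(predicted, target):
    """1 - F1 overlap between two collections of structured items (assumed measure)."""
    predicted, target = set(predicted), set(target)
    if not predicted and not target:
        return 0.0  # nothing expected, nothing produced
    overlap = len(predicted & target)
    if overlap == 0:
        return 1.0
    precision = overlap / len(predicted)
    recall = overlap / len(target)
    return 1.0 - 2 * precision * recall / (precision + recall)


def first_loss_value(predicted_batch, target_batch):
    """Average the per-sample differences over all pieces of first sample data."""
    losses = [structured_difference(p, t) for p, t in zip(predicted_batch, target_batch)]
    return sum(losses) / len(losses)
```

A prediction that contains every target item plus one spurious item has precision 0.5 and recall 1.0, so its difference is 1 - 2/3 = 1/3.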

    METHOD AND APPARATUS FOR TRAINING MODELS IN MACHINE TRANSLATION, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: US20210390266A1

    Publication date: 2021-12-16

    Application No.: US17200551

    Filing date: 2021-03-12

    Abstract: A method and apparatus for training models in machine translation, an electronic device and a storage medium are disclosed, which relate to the field of natural language processing and the field of deep learning. An implementation includes: mining similar target sentences of a group of samples based on a parallel corpus using a machine translation model and a semantic similarity model, and creating a first training sample set; training the machine translation model with the first training sample set; mining a negative sample of each sample in the group of samples based on the parallel corpus using the machine translation model and the semantic similarity model, and creating a second training sample set; and training the semantic similarity model with the second training sample set. With this technical solution, the two models are trained jointly: while the semantic similarity model is trained, the machine translation model may also be optimized and in turn feed back into the semantic similarity model, further improving the accuracy of the semantic similarity model.
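
A single round of the joint training described above might look like the sketch below. The mining rules are assumptions, since the abstract does not specify them: the machine translation model is represented by a hypothetical `translate_nbest` callable returning candidate translations, the most similar non-reference candidate is taken as the similar target sentence, and the least similar one as the negative sample. `jaccard` is a toy stand-in for the semantic similarity model.

```python
def jaccard(a, b):
    """Toy semantic similarity: word-set overlap of two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def mine(source, target, translate_nbest, similarity, pick):
    """Pick one candidate translation (not the reference) by similarity to the target."""
    candidates = [c for c in translate_nbest(source) if c != target]
    return pick(candidates, key=lambda c: similarity(target, c), default=None)


def joint_training_round(corpus, translate_nbest, similarity, train_mt, train_sim):
    # Step 1: mine similar target sentences and build the first training sample set.
    first_set = []
    for source, target in corpus:
        similar = mine(source, target, translate_nbest, similarity, max)
        if similar is not None:
            first_set.append((source, similar))
    train_mt(first_set)  # Step 2: train the machine translation model.

    # Step 3: mine a negative sample per sample and build the second set.
    second_set = []
    for source, target in corpus:
        negative = mine(source, target, translate_nbest, similarity, min)
        if negative is not None:
            second_set.append((source, target, negative))
    train_sim(second_set)  # Step 4: train the semantic similarity model.
    return first_set, second_set
```

Repeating the round lets each model's improvement change what the other one is trained on, which is the mutual reinforcement the abstract describes.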

    HUMAN-MACHINE INTERACTION
    Invention application

    Publication No.: US20210280190A1

    Publication date: 2021-09-09

    Application No.: US17327706

    Filing date: 2021-05-22

    Abstract: A method and apparatus for human-machine interaction, a device, and a medium are provided. A specific implementation solution is: generating reply text of a reply to a received speech signal based on the speech signal; generating a reply speech signal corresponding to the reply text based on a mapping relationship between a speech signal unit and a text unit, the reply text including a group of text units; determining an identifier of an expression and/or action based on the reply text, the expression and/or action being presented by a virtual object; and generating an output video including the virtual object based on the reply speech signal and the identifier of the expression and/or action, the output video including a lip shape sequence determined based on the reply speech signal and to be presented by the virtual object.
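
The four steps of this abstract form a simple pipeline, sketched below with lookup tables in place of real speech and video components. Every name and table entry is hypothetical, and the incoming speech signal is represented by its transcript to keep the example self-contained.

```python
from dataclasses import dataclass

# Hypothetical mapping between text units and speech signal units.
TEXT_TO_SPEECH_UNIT = {"hello": "speech:hello", "there": "speech:there"}
# Hypothetical rules from reply text to an expression/action identifier.
EXPRESSION_RULES = {"hello": "expr_smile"}
DEFAULT_EXPRESSION = "expr_neutral"


@dataclass
class OutputVideo:
    lip_shapes: list    # lip shape sequence derived from the reply speech
    expression_id: str  # expression/action presented by the virtual object


def generate_reply_text(speech_signal):
    """Stand-in for recognition plus dialogue: returns a group of text units."""
    return ["hello", "there"] if "hi" in speech_signal.split() else ["there"]


def synthesize(text_units):
    """Build the reply speech signal unit by unit from the mapping."""
    return [TEXT_TO_SPEECH_UNIT[unit] for unit in text_units]


def choose_expression(text_units):
    """Determine the expression/action identifier from the reply text."""
    for unit in text_units:
        if unit in EXPRESSION_RULES:
            return EXPRESSION_RULES[unit]
    return DEFAULT_EXPRESSION


def interact(speech_signal):
    text_units = generate_reply_text(speech_signal)
    speech_units = synthesize(text_units)
    # One lip shape per speech unit, so lips stay aligned with the audio.
    lip_shapes = ["lip:" + unit.split(":", 1)[1] for unit in speech_units]
    return OutputVideo(lip_shapes, choose_expression(text_units))
```

Deriving the lip shapes from the speech units rather than from the text is what keeps the virtual object's mouth movement tied to the reply speech signal.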
