Method and apparatus for obtaining word vectors based on language model, device and storage medium

    Publication No.: US11526668B2

    Publication date: 2022-12-13

    Application No.: US17095955

    Filing date: 2020-11-12

    Inventors: Zhen Li, Yukun Li, Yu Sun

    Abstract: A method and apparatus for obtaining word vectors based on a language model, a device and a storage medium are disclosed, relating to the field of natural language processing technologies in artificial intelligence. An implementation includes: inputting each of at least two first sample text language materials into the language model, and outputting, via the language model, a context vector of a first word mask in each first sample text language material; determining the word vector corresponding to each first word mask based on a first word vector parameter matrix, a second word vector parameter matrix and a fully connected matrix, respectively; and training the language model and the fully connected matrix based on the word vectors corresponding to the first word masks in the at least two first sample text language materials, so as to obtain the word vectors.
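    The abstract does not say exactly how the three matrices are combined. A minimal sketch of one plausible reading, under loudly stated assumptions: the mask's context vector is projected through the fully connected matrix, the two word vector parameter matrices are stacked so that each row is one vocabulary entry, and a softmax over the dot products gives a distribution over words for the mask. The function names (matvec, mask_distribution, mask_loss) are hypothetical, and this is not presented as the patented method.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(v: Vector, m: Matrix) -> Vector:
    """Row vector v (length d) times matrix m (d x k), giving a vector of length k."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_distribution(context: Vector, fc: Matrix, emb1: Matrix, emb2: Matrix) -> Vector:
    # Project the mask's context vector through the (d x d) fully connected matrix.
    projected = matvec(context, fc)
    # Assumption: the first and second word vector parameter matrices are stacked
    # row-wise, so each row is the vector of one vocabulary entry.
    vocab = emb1 + emb2
    scores = [sum(p * e for p, e in zip(projected, row)) for row in vocab]
    return softmax(scores)


def mask_loss(probs: Vector, target_index: int) -> float:
    # Cross-entropy for the true word at the mask; minimizing it would update
    # the language model and the fully connected matrix during training.
    return -math.log(probs[target_index])
```

    With a 2-dimensional context vector [1.0, 0.0], an identity fully connected matrix and a three-word combined vocabulary, the first vocabulary row receives the highest probability. Gradient updates are left out; in practice a tensor library would compute them.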

    METHOD AND APPARATUS FOR TRAINING SEMANTIC REPRESENTATION MODEL, DEVICE AND COMPUTER STORAGE MEDIUM

    Publication No.: US20220004716A1

    Publication date: 2022-01-06

    Application No.: US17209124

    Filing date: 2021-03-22

    Abstract: The present application discloses a method and apparatus for training a semantic representation model, a device and a computer storage medium, relating to the field of natural language processing technologies in artificial intelligence. An implementation includes: acquiring a semantic representation model that has been trained for a first language as a first semantic representation model; taking the bottom layer and the top layer of the first semantic representation model as trained layers, initializing the trained layers, keeping the model parameters of the other layers unchanged, and training the trained layers using training language materials of a second language until a training ending condition is met; successively bringing the untrained layers into the trained layers from bottom to top and, for each newly added layer, keeping the model parameters of the layers other than the trained layers unchanged and training the trained layers using the training language materials of the second language until the training ending condition is met; and obtaining a semantic representation model for the second language after all the layers have been trained.
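    The layer-by-layer schedule above can be sketched as a function that lists which layers are trainable at each stage; all other layers would be frozen, and each stage would run until the training ending condition is met. Two assumptions not fixed by the abstract: layer 0 is the bottom (input side), and untrained layers are brought in one at a time. The name layer_schedule is hypothetical.

```python
from typing import List


def layer_schedule(num_layers: int) -> List[List[int]]:
    """Return the sorted indices of trainable layers for each training stage.

    Layer 0 is assumed to be the bottom layer and num_layers - 1 the top layer.
    """
    if num_layers < 2:
        # Degenerate case: nothing to schedule beyond training every layer.
        return [list(range(num_layers))]
    # Stage 1: only the bottom and top layers are (re-initialized and) trained.
    trained = {0, num_layers - 1}
    stages = [sorted(trained)]
    # Later stages: add the remaining layers one at a time, from bottom to top.
    for layer in range(1, num_layers - 1):
        trained.add(layer)
        stages.append(sorted(trained))
    return stages
```

    For a 4-layer model this yields [[0, 3], [0, 1, 3], [0, 1, 2, 3]]: after the last stage every layer has been trained on the second language.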

    Method, apparatus, electronic device and storage medium for training semantic similarity model

    Publication No.: US12118063B2

    Publication date: 2024-10-15

    Application No.: US17209051

    Filing date: 2021-03-22

    Inventors: Zhen Li, Yukun Li, Yu Sun

    CPC classification number: G06F18/2148 G06F18/24147 G06F40/30

    Abstract: The present disclosure provides a method, apparatus, electronic device and storage medium for training a semantic similarity model, relating to the field of artificial intelligence. A specific implementation is as follows: obtaining the target field in which the semantic similarity model to be trained will be used; calculating the correlation between the target field and the application field of each of multiple known training datasets; and training the semantic similarity model with the training datasets in turn, according to these correlations. With this solution, in the fine-tuning phase the semantic similarity model can be trained more purposefully by taking into account the correlation between the target field and the application field of each training dataset, which effectively improves the learning capability of the semantic similarity model and the accuracy of the trained model.
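    The correlation-driven ordering can be sketched as follows. The abstract names neither the correlation measure nor the direction of the ordering, so both are assumptions here: fields are described by keyword sets, correlation is their Jaccard similarity, and datasets are visited from least to most correlated so that the last fine-tuning steps use the data closest to the target field. The names jaccard and training_order are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (assumed correlation measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def training_order(target_keywords: Set[str],
                   dataset_fields: Dict[str, Set[str]]) -> List[str]:
    # Correlation between the target field and each dataset's application field.
    corr = {name: jaccard(target_keywords, kw) for name, kw in dataset_fields.items()}
    # Assumption: least correlated first, most correlated last.
    # sorted() is stable, so ties keep their original insertion order.
    return sorted(corr, key=lambda name: corr[name])
```

    For a target field described as {"finance", "bank", "loan"}, a news dataset with no overlapping keywords would be used first and a banking Q&A dataset sharing "bank" and "loan" would be used last; the model would then be fine-tuned on each dataset in that order.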

    Text recognition method, electronic device, and storage medium

    Publication No.: US11663404B2

    Publication date: 2023-05-30

    Application No.: US17101789

    Filing date: 2020-11-23

    CPC classification number: G06F40/279 G06F40/166 G06F40/30 G06N20/00

    Abstract: The disclosure provides a text recognition method, an electronic device, and a storage medium. The method includes: obtaining N segments of a sample text; inputting each of the N segments into a preset initial language model in sequence to obtain first text vector information corresponding to the N segments; inputting each of the N segments into the initial language model in sequence again to obtain second text vector information corresponding to the currently input segment; in response to determining that the currently input segment contains a mask, predicting the mask according to the second text vector information and the first text vector information to obtain a predicted word at the target position corresponding to the mask; training the initial language model according to the original word and the predicted word to generate a long text language model; and recognizing an input text through the long text language model.
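    The two-pass control flow in this abstract can be sketched with a deliberately toy "model": a segment's text vector information is stood in for by unmasked token counts, and a mask is filled with the most frequent token in the combined first-pass memory and current-segment information. Only the structure is meant to be illustrative (first pass over all N segments, second pass that predicts masks using both kinds of information); the unigram predictor, the "[MASK]" token and the function names are assumptions, not the patented language model.

```python
from collections import Counter
from typing import List, Tuple

MASK = "[MASK]"  # assumed mask token


def encode(segment: List[str]) -> Counter:
    """Toy stand-in for a segment's text vector information: unmasked token counts."""
    return Counter(tok for tok in segment if tok != MASK)


def predict_masks(segments: List[List[str]]) -> List[Tuple[int, int, str]]:
    # Pass 1: feed every segment in order and accumulate first text vector information.
    memory: Counter = Counter()
    for seg in segments:
        memory.update(encode(seg))

    predictions = []
    # Pass 2: feed the segments again; predict masks in the current segment
    # from its own (second) information combined with the first-pass memory.
    for s_idx, seg in enumerate(segments):
        if MASK not in seg:
            continue
        combined = memory + encode(seg)  # toy fusion of the two kinds of information
        best = combined.most_common(1)[0][0] if combined else ""
        for t_idx, tok in enumerate(seg):
            if tok == MASK:
                predictions.append((s_idx, t_idx, best))
    return predictions
```

    Each prediction is a (segment index, token index, predicted word) triple; in training, it would be compared with the original word at that position to compute the loss.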

    METHOD AND APPARATUS FOR DETERMINING A TOPIC
    Invention patent application

    Publication No.: US20200210522A1

    Publication date: 2020-07-02

    Application No.: US16691104

    Filing date: 2019-11-21

    Abstract: Embodiments of the present disclosure disclose a method and apparatus for determining a topic. A specific embodiment of the method comprises: determining a to-be-recognized sentence sequence; calculating the similarity between the to-be-recognized sentence sequence and each topic template in a topic template set for a target area, where each topic template corresponds to one of at least one topic in the target area, a topic template includes a topic section sequence, and a topic section includes a topic sentence sequence; and determining the topic of the to-be-recognized sentence sequence according to an associated parameter, the associated parameter including the similarities between the to-be-recognized sentence sequence and each topic template in the set. This embodiment reduces labor costs in topic segmentation.
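    A minimal sketch of template-based topic determination, assuming that the associated parameter is just the template similarity and that the topic with the highest similarity wins. The abstract does not define the similarity function, so word-overlap (Jaccard) similarity between sentences, with each input sentence matched to its best template sentence and the scores averaged, is an assumption; all function names are hypothetical.

```python
from typing import Dict, List

# A topic template is a sequence of topic sections; each section is a sequence of sentences.
Template = List[List[str]]


def sentence_similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity between two sentences (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def template_similarity(sentences: List[str], template: Template) -> float:
    # Flatten the topic section sequence into one topic sentence sequence.
    template_sentences = [s for section in template for s in section]
    if not sentences or not template_sentences:
        return 0.0
    # Match each input sentence to its most similar template sentence, then average.
    best = [max(sentence_similarity(s, t) for t in template_sentences) for s in sentences]
    return sum(best) / len(sentences)


def determine_topic(sentences: List[str], templates: Dict[str, Template]) -> str:
    # Assumption: the associated parameter is only the similarity; pick the best topic.
    scores = {topic: template_similarity(sentences, tpl) for topic, tpl in templates.items()}
    return max(scores, key=scores.get)
```

    For example, the sentences "rain expected tomorrow" and "temperature drops at night" score higher against a weather template containing "it will rain tomorrow" and "temperature drops" than against a sports template, so the weather topic is returned.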
