-
1.
Publication No.: US20220171941A1
Publication Date: 2022-06-02
Application No.: US17348104
Filing Date: 2021-06-15
Inventor: Xuan OUYANG , Shuohuan WANG , Chao PANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
Abstract: The present disclosure provides a multi-lingual model training method, apparatus, electronic device and readable storage medium, and relates to the technical fields of deep learning and natural language processing. The technical solution for training the multi-lingual model is: obtaining training corpora comprising a plurality of bilingual corpora and a plurality of monolingual corpora; training a multi-lingual model with a first training task by using the plurality of bilingual corpora; training the multi-lingual model with a second training task by using the plurality of monolingual corpora; and completing the training of the multi-lingual model upon determining that the loss functions of the first and second training tasks converge. In this way, the multi-lingual model can achieve semantic interaction between different languages, improving the accuracy of the semantic representations it learns.
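The alternating two-task loop described in the abstract can be sketched as follows. The `step` callback, the loss functions, and the convergence threshold are illustrative assumptions, not the patented implementation:

```python
def train_multilingual_model(bilingual_corpus, monolingual_corpus,
                             step, loss_fn_1, loss_fn_2,
                             tol=1e-4, max_epochs=100):
    """Alternate the first training task (bilingual corpora) with the
    second training task (monolingual corpora) until both loss
    functions stop improving by more than `tol`."""
    prev_l1, prev_l2 = float("inf"), float("inf")
    for epoch in range(max_epochs):
        # first training task: bilingual corpora (cross-lingual alignment)
        l1 = sum(step(pair, loss_fn_1) for pair in bilingual_corpus) / len(bilingual_corpus)
        # second training task: monolingual corpora
        l2 = sum(step(text, loss_fn_2) for text in monolingual_corpus) / len(monolingual_corpus)
        if abs(prev_l1 - l1) < tol and abs(prev_l2 - l2) < tol:
            return epoch, (l1, l2)        # both loss functions converged
        prev_l1, prev_l2 = l1, l2
    return max_epochs, (prev_l1, prev_l2)
```

The convergence test mirrors the abstract's stopping criterion: training completes only once both tasks' losses have stabilized, not when either one does alone.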
-
2.
Publication No.: US20210192141A1
Publication Date: 2021-06-24
Application No.: US16939947
Filing Date: 2020-07-27
Inventor: Chao PANG , Shuohuan WANG , Yu SUN , Zhi LI
Abstract: A method for generating a vector representation of a text includes dividing the text into text segments. Each text segment is represented as a corresponding segment vector by employing a first-level semantic model, the segment vector indicating the semantics of that text segment. Text semantics recognition is then performed on the segment vectors by employing a second-level semantic model to obtain a text vector indicating the topic of the text.
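A minimal sketch of this two-level scheme, with a bag-of-words hash standing in for the trained first-level model (an assumption for illustration only):

```python
def segment_vector(segment, dim=8):
    """First level: represent one text segment as a fixed-size vector
    (a simple hashed bag of words, standing in for a trained model)."""
    v = [0.0] * dim
    for word in segment.split():
        v[hash(word) % dim] += 1.0
    return v

def text_vector(text, seg_len=2, dim=8):
    """Second level: split the text into segments, encode each with the
    first-level model, then pool the segment vectors into one
    topic-level text vector (mean pooling stands in for the model)."""
    words = text.split()
    segments = [" ".join(words[i:i + seg_len])
                for i in range(0, len(words), seg_len)]
    seg_vecs = [segment_vector(s, dim) for s in segments]
    return [sum(col) / len(seg_vecs) for col in zip(*seg_vecs)]
```

In the patented method both levels are learned semantic models; here the hierarchy (segment vectors feeding a second-level pooler) is the point being illustrated.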
-
3.
Publication No.: US20220198327A1
Publication Date: 2022-06-23
Application No.: US17348270
Filing Date: 2021-06-15
Inventor: Shuohuan WANG , Chao PANG , Yu SUN
IPC: G06N20/00 , G06N5/02 , G10L25/54 , G10L25/27 , G06F16/9032 , G06F16/907
Abstract: The present disclosure provides a method, apparatus, device and storage medium for training a dialogue understanding model, and relates to the technical field of computers, specifically to artificial intelligence fields such as natural language processing and deep learning. The method includes: obtaining dialogue understanding training data; and performing joint training on a dialogue understanding pre-training task and a general pre-training task by using the dialogue understanding training data, to obtain a dialogue understanding model. According to the present disclosure, a model specially adapted to a dialogue understanding task may be obtained by training.
-
4.
Publication No.: US20220019743A1
Publication Date: 2022-01-20
Application No.: US17318577
Filing Date: 2021-05-12
Inventor: Xuan OUYANG , Shuohuan WANG , Yu SUN
IPC: G06F40/30 , G06F40/237 , G06N20/00 , G06N5/04
Abstract: Technical solutions relate to the natural language processing field based on artificial intelligence. According to an embodiment, a multilingual semantic representation model is trained using a plurality of training language materials represented in a plurality of languages respectively, such that the multilingual semantic representation model learns the semantic representation capability of each language; a corresponding mixed-language language material is generated for each of the plurality of training language materials, and the mixed-language language material includes language materials in at least two languages; and the multilingual semantic representation model is trained using each mixed-language language material and the corresponding training language material, such that the multilingual semantic representation model learns semantic alignment information of different languages.
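The mixed-language material generation might look like the following toy sketch, where some words of a training material are swapped for their counterparts in another language. The substitution rule and the tiny dictionary are illustrative assumptions:

```python
def make_mixed_language(tokens, bilingual_dict, swap_every=2):
    """Generate a mixed-language material: replace every
    `swap_every`-th dictionary word with its translation, yielding
    code-switched input in at least two languages."""
    mixed, swapped = [], 0
    for tok in tokens:
        if tok in bilingual_dict:
            swapped += 1
            if swapped % swap_every == 0:
                mixed.append(bilingual_dict[tok])  # swap to the other language
                continue
        mixed.append(tok)                          # keep the original word
    return mixed
```

Training on the mixed-language material paired with its original gives the model explicit cross-lingual anchors, which is how the abstract's semantic alignment would be learned.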
-
5.
Publication No.: US20210374344A1
Publication Date: 2021-12-02
Application No.: US17094943
Filing Date: 2020-11-11
Inventor: Shuohuan WANG , Chao PANG , Yu SUN
IPC: G06F40/284 , G06N20/00 , G06F16/23 , G06F7/08
Abstract: A method for resource sorting, a method for training a sorting model, and corresponding apparatuses, which relate to the technical field of natural language processing under artificial intelligence, are disclosed. The method according to some embodiments includes: forming an input sequence, in order, from an item to be matched and information of candidate resources; performing embedding processing on each token in the input sequence, the embedding processing including word embedding, position embedding and statement embedding; and inputting the result of the embedding processing into a sorting model to obtain the sorting model's scores for the candidate resources, where the sorting model is obtained by pre-training a Transformer model.
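The embedding processing reads like the standard Transformer-style sum of three lookups. A sketch under that assumption, with toy tables standing in for trained parameters:

```python
def embed_sequence(tokens, statement_ids, word_table, pos_table, stmt_table):
    """Return one input vector per token: the element-wise sum of its
    word embedding, position embedding, and statement embedding."""
    out = []
    for pos, (tok, sid) in enumerate(zip(tokens, statement_ids)):
        w = word_table[tok]     # word embedding lookup
        p = pos_table[pos]      # position embedding lookup
        s = stmt_table[sid]     # statement (segment) embedding lookup
        out.append([a + b + c for a, b, c in zip(w, p, s)])
    return out
```

The statement embedding is what lets one input sequence carry both the item to be matched and the candidate resource information while keeping them distinguishable to the model.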
-
6.
Publication No.: US20210248484A1
Publication Date: 2021-08-12
Application No.: US17205894
Filing Date: 2021-03-18
Inventor: Shuohuan WANG , Siyu DING , Yu SUN
IPC: G06N5/02 , G06K9/62 , G06F40/279
Abstract: The disclosure provides a method and an apparatus for generating a semantic representation model, and a storage medium. The detailed implementation includes: performing recognition and segmentation on each original text in an original text set to obtain knowledge units and non-knowledge units in the original text; performing knowledge-unit-level disorder processing on the knowledge units and non-knowledge units to obtain a disordered text; generating a training text set based on the character attributes of each character in the disordered text; and training an initial semantic representation model with the training text set to generate the semantic representation model.
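The disorder step can be sketched as follows, assuming the knowledge and non-knowledge units have already been recognized; each character carries an attribute (its original position) from which training labels are derived. The recognizer itself is out of scope here:

```python
import random

def make_disorder_sample(units, seed=0):
    """Shuffle recognized (non-)knowledge units and label every character
    of the disordered text with its original character position."""
    starts, pos = [], 0
    for u in units:                        # original offset of each unit
        starts.append(pos)
        pos += len(u)
    order = list(range(len(units)))
    random.Random(seed).shuffle(order)     # knowledge-unit-level disorder
    disordered = "".join(units[i] for i in order)
    labels = [starts[i] + k               # per-character original position
              for i in order for k in range(len(units[i]))]
    return disordered, labels
```

Because whole units are shuffled rather than individual characters, a model trained to restore the original order must reason about unit-level (knowledge-level) structure.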
-
7.
Publication No.: US20210397780A1
Publication Date: 2021-12-23
Application No.: US17405813
Filing Date: 2021-08-18
Inventor: Chao PANG , Shuohuan WANG , Yu SUN , Zhi LI
IPC: G06F40/166 , G06K9/46 , G06K9/62 , G06N20/00
Abstract: A method for correcting an error in a text, an electronic device, and a storage medium are provided. The method includes: obtaining an original text; obtaining a training text by preprocessing the original text; extracting a plurality of feature vectors corresponding to each word in the training text; obtaining an input vector by processing the plurality of feature vectors; obtaining a target text by inputting the input vector into a text error correction model; and adjusting parameters of the text error correction model based on a difference between the target text and the original text.
-
8.
Publication No.: US20210182498A1
Publication Date: 2021-06-17
Application No.: US16885358
Filing Date: 2020-05-28
Inventor: Yu SUN , Haifeng WANG , Shuohuan WANG , Yukun LI , Shikun FENG , Hao TIAN , Hua WU
Abstract: The present disclosure provides a method, apparatus, electronic device and storage medium for processing a semantic representation model, and relates to the field of artificial intelligence technologies. A specific implementation solution is: collecting a training corpus set including a plurality of training corpora; and training the semantic representation model using the training corpus set based on at least one of lexicon, grammar and semantics. By building unsupervised or weakly-supervised training tasks at these three levels, the semantic representation model is enabled to learn lexical, grammatical and semantic knowledge from massive data, enhancing its universal semantic representation capability and improving the processing effect on NLP tasks.
-
9.
Publication No.: US20190065507A1
Publication Date: 2019-02-28
Application No.: US16054920
Filing Date: 2018-08-03
Inventor: Shuohuan WANG , Yu SUN , Dianhai YU
Abstract: Embodiments of the present disclosure disclose a method and apparatus for processing information. A specific implementation of the method includes: acquiring a search result set related to a search statement inputted by a user; parsing the search statement to generate a first syntax tree, and parsing each search result in the search result set to generate a second syntax tree set; calculating a similarity between the search statement and each search result using a pre-trained semantic matching model on the basis of the first syntax tree and the second syntax tree set, the semantic matching model being used to determine the similarity between syntax trees; and sorting the search results on the basis of these similarities and pushing the sorted search result set to the user.
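The sorting-and-pushing step reduces to scoring and ordering; a sketch with a shared-word overlap standing in for the pre-trained, syntax-tree-based semantic matching model (an assumption for illustration):

```python
def sort_results(search_statement, results, similarity):
    """Score each search result against the search statement and return
    the results sorted by descending similarity, ready to be pushed."""
    return sorted(results,
                  key=lambda r: similarity(search_statement, r),
                  reverse=True)

def word_overlap(a, b):
    """Toy stand-in for the semantic matching model: count shared words."""
    return len(set(a.split()) & set(b.split()))
```

In the patented method the similarity callback would be the semantic matching model operating on the two syntax trees rather than raw word sets.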
-
10.
Publication No.: US20210383064A1
Publication Date: 2021-12-09
Application No.: US17101789
Filing Date: 2020-11-23
Inventor: Shuohuan WANG , Siyu DING , Yu SUN , Hua WU , Haifeng WANG
IPC: G06F40/279 , G06F40/166 , G06F40/30 , G06N20/00
Abstract: The disclosure provides a text recognition method, an electronic device, and a storage medium. The method includes: obtaining N segments of a sample text; inputting each of the N segments into a preset initial language model in sequence to obtain first text vector information corresponding to the N segments; inputting each of the N segments into the initial language model in sequence again to obtain second text vector information corresponding to the currently input segment; in response to determining that the currently input segment contains a mask, predicting the mask according to the second text vector information and the first text vector information to obtain a predicted word at the target position corresponding to the mask; training the initial language model according to the original word and the predicted word to generate a long text language model; and recognizing an input text through the long text language model.
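The two passes over the N segments might be wired up as below; the encoder and the mask predictor are toy stand-ins for the initial language model, not the disclosed implementation:

```python
def two_pass_predict(segments, encode, predict_mask, mask_token="[MASK]"):
    """First pass: encode all N segments to build a memory of first text
    vectors. Second pass: re-encode each segment in sequence and predict
    any mask from the second vector plus the full first-pass memory."""
    memory = [encode(seg) for seg in segments]      # first pass, whole text
    predictions = {}
    for idx, seg in enumerate(segments):            # second pass, in sequence
        second_vec = encode(seg)
        for pos, tok in enumerate(seg):
            if tok == mask_token:                   # mask in current segment
                predictions[(idx, pos)] = predict_mask(second_vec, memory)
    return predictions
```

The point of the second pass is that a mask early in the text can draw on first-pass vectors of segments that come after it, which a single left-to-right pass over a long text cannot do.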