-
Publication No.: US20240104346A1
Publication Date: 2024-03-28
Application No.: US17945978
Filing Date: 2022-09-15
Applicant: Huawei Technologies Co., Ltd.
Inventor: Lu HOU , Chaofan TAO , Wei ZHANG , Lifeng SHANG , Xin JIANG , Qun LIU , Li QIAN
IPC: G06N3/04
CPC classification number: G06N3/0454
Abstract: A method, performed by a processing system, is provided for quantizing a neural network model. The method comprises determining a scaling factor based on a distribution of weights associated with the neural network model, determining quantized weights based on the scaling factor and the weights associated with the distribution, determining a training loss of the neural network model based on the quantized weights during training of the neural network model, and determining an updated scaling factor for the neural network model based on a gradient of the training loss.
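As a rough illustration of the approach the abstract describes, the sketch below fake-quantizes weights with a learnable scaling factor and lets the training loss back-propagate into that scale through a straight-through estimator. The PyTorch module, the max-abs initialization, and the bit width are assumptions for illustration, not the claimed implementation.

```python
# Hypothetical sketch of quantization with a learnable scaling factor.
import torch
import torch.nn as nn

class LearnedScaleQuantizer(nn.Module):
    """Quantizes weights with a scaling factor that is updated by the
    gradient of the training loss (via a straight-through estimator)."""

    def __init__(self, weight: torch.Tensor, num_bits: int = 8):
        super().__init__()
        q_max = 2 ** (num_bits - 1) - 1
        self.q_min, self.q_max = -q_max - 1, q_max
        # Initialize the scale from the weight distribution (assumption:
        # a simple max-abs heuristic; the application may use another rule).
        init_scale = weight.abs().max() / q_max
        self.scale = nn.Parameter(init_scale.clone())

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # Fake-quantize: clip and round on the integer grid, then rescale.
        q = torch.clamp(weight / self.scale, self.q_min, self.q_max)
        q_rounded = q + (q.round() - q).detach()  # straight-through rounding
        return q_rounded * self.scale


# Usage: quantized weights enter the forward pass, so the training loss
# back-propagates into both the weights and the scaling factor.
w = nn.Parameter(torch.randn(64, 64))
quantizer = LearnedScaleQuantizer(w.detach())
loss = (quantizer(w) ** 2).mean()
loss.backward()
print(quantizer.scale.grad)  # gradient of the training loss w.r.t. the scale
```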
-
Publication No.: US20230048031A1
Publication Date: 2023-02-16
Application No.: US17964165
Filing Date: 2022-10-12
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
IPC: G06N3/08 , G06F40/279 , G06F40/103
Abstract: Relating to the field of artificial intelligence, and specifically to the field of natural language processing, a data processing method and apparatus are provided. The method includes: determining original text samples on which masking processing has not been performed; and performing mask processing on the original text samples to obtain mask training samples, where the mask processing makes the mask proportions of the mask training samples unfixed, and each mask training sample is used to train a pretrained language model (PLM). Training the PLM with mask training samples whose mask proportions are unfixed enhances the mode diversity of the training samples of the PLM. Therefore, the features learned by the PLM are also diversified, the generalization capability of the PLM can be improved, and the natural language understanding capability of the PLM obtained through training can be improved.
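A minimal sketch of what "unfixed mask proportions" could look like in practice, assuming the proportion is drawn at random for every sample; the sampling range and function names are illustrative only, not the patented procedure.

```python
# Masking with a per-sample mask proportion (assumed uniform sampling).
import random

MASK_TOKEN = "[MASK]"

def mask_sample(tokens, min_ratio=0.05, max_ratio=0.4):
    """Replace a randomly sized subset of tokens with [MASK]."""
    ratio = random.uniform(min_ratio, max_ratio)   # proportion is not fixed
    num_to_mask = max(1, int(len(tokens) * ratio))
    positions = random.sample(range(len(tokens)), num_to_mask)
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_TOKEN
    return masked, positions

original = "the quick brown fox jumps over the lazy dog".split()
masked_tokens, masked_positions = mask_sample(original)
print(masked_tokens)
```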
-
Publication No.: US20240046067A1
Publication Date: 2024-02-08
Application No.: US18380581
Filing Date: 2023-10-16
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Junqiu WEI , Yi LIAO , Xin JIANG , Qun LIU , Li QIAN
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: A data processing method includes: obtaining a first embedding vector indicating a known data unit and the position of the known data unit, and a second embedding vector indicating the position of a to-be-predicted data unit; processing the first embedding vector by using a target encoder to obtain an output vector; and processing the output vector and the second embedding vector by using a target prediction network to obtain the to-be-predicted data unit. According to the method, M pieces of additional position information do not need to be separately set as input of the target encoder, and the quantity of latent variables in the intermediate output of the target encoder is consistent with the quantity of input embedding vectors, thereby reducing the computation amount and memory consumption of the target encoder.
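The sketch below illustrates the two-stage structure the abstract outlines: an encoder that only sees embeddings of the known data units and their positions, and a prediction network that combines the encoder output with the position embedding of the to-be-predicted unit. The concrete modules (a Transformer encoder, cross-attention, the dimensions) are assumptions for illustration.

```python
# Hypothetical two-stage predictor: encoder over known units only,
# prediction network driven by the target position embedding.
import torch
import torch.nn as nn

d_model, vocab_size, max_len = 128, 1000, 64

token_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
# Prediction network: attends from the target position embedding to the
# encoder output, then maps to the vocabulary.
predictor_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
predictor_out = nn.Linear(d_model, vocab_size)

known_ids = torch.randint(0, vocab_size, (1, 10))       # known data units
known_pos = torch.arange(10).unsqueeze(0)                # their positions
target_pos = torch.tensor([[10]])                        # position to predict

# First embedding vectors: known units plus their positions (encoder input).
enc_in = token_emb(known_ids) + pos_emb(known_pos)
enc_out = encoder(enc_in)            # same length as the input embeddings

# Second embedding vector: only the position of the to-be-predicted unit.
query = pos_emb(target_pos)
fused, _ = predictor_attn(query, enc_out, enc_out)
logits = predictor_out(fused)        # prediction for the missing unit
print(logits.shape)                  # (1, 1, vocab_size)
```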
-
Publication No.: US20230177410A1
Publication Date: 2023-06-08
Application No.: US18161620
Filing Date: 2023-01-30
Applicant: Huawei Technologies Co., Ltd.
Abstract: A model training method applied to the field of artificial intelligence is disclosed. The method includes: sending a first submodel to a first device, where the first submodel is obtained by compressing a to-be-trained model; receiving a first gradient sent by the first device, where the first gradient is obtained when the first device trains the first submodel; and performing model training on the to-be-trained model based on at least the first gradient, to obtain an updated to-be-trained model. In the method, the server compresses the to-be-trained model and delivers it to a terminal device, so that the terminal device does not need to train a large model of the same scale as that on the server.
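A minimal server-side sketch of the flow described in the abstract, assuming for illustration that "compressing" means keeping a prefix of the layers; the actual compression scheme and gradient aggregation in the application may differ.

```python
# Server-side sketch: compress, dispatch, receive gradient, fold back.
import copy
import torch
import torch.nn as nn

full_model = nn.Sequential(*[nn.Linear(32, 32) for _ in range(8)])

def compress(model: nn.Sequential, keep: int) -> nn.Sequential:
    """Build a smaller submodel, here by keeping the first `keep` layers."""
    return copy.deepcopy(nn.Sequential(*list(model.children())[:keep]))

def device_train_step(submodel: nn.Sequential):
    """Simulates the first device: run one step and return its gradients."""
    x = torch.randn(4, 32)
    loss = submodel(x).pow(2).mean()
    loss.backward()
    return [p.grad.clone() for p in submodel.parameters()]

submodel = compress(full_model, keep=4)          # sent to the first device
first_gradient = device_train_step(submodel)     # received from the device

# Server update: apply the device gradient to the matching full-model layers.
lr = 0.01
with torch.no_grad():
    for p, g in zip(list(full_model.parameters())[: len(first_gradient)],
                    first_gradient):
        p -= lr * g
```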
-
Publication No.: US20240220730A1
Publication Date: 2024-07-04
Application No.: US18604138
Filing Date: 2024-03-13
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Xiaojun MENG , Yasheng WANG , Xin JIANG , Qun LIU
IPC: G06F40/30
CPC classification number: G06F40/30
Abstract: A text data processing method, a neural-network training method, and related devices are provided. The methods may be applied to the text data processing field in the artificial intelligence field. The method includes: obtaining a to-be-processed text, where the to-be-processed text includes a plurality of characters; and processing the to-be-processed text by using a target model to obtain a prediction result, where the prediction result indicates that the to-be-processed text is split into a plurality of target character sets, the prediction result further includes a plurality of first labels, one first label indicates the semantics of one target character set, and the plurality of first labels are used to determine an intention of the to-be-processed text.
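For illustration only, the sketch below shows the shape of the prediction result the abstract describes: a split of the text into target character sets, one first label per set, and an intention derived from the combination of labels. The label names and the toy intention rule are assumptions, not the claimed model.

```python
# Hypothetical prediction-result structure and label-to-intent rule.
from dataclasses import dataclass

@dataclass
class PredictionResult:
    character_sets: list[str]   # the split of the to-be-processed text
    first_labels: list[str]     # one semantic label per character set

def determine_intent(result: PredictionResult) -> str:
    # Toy rule: the intention is derived from the combination of labels.
    if "city" in result.first_labels and "date" in result.first_labels:
        return "book_travel"
    return "unknown"

result = PredictionResult(
    character_sets=["fly to", "Shanghai", "tomorrow"],
    first_labels=["action", "city", "date"],
)
print(determine_intent(result))  # book_travel
```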
-
Publication No.: US20220147715A1
Publication Date: 2022-05-12
Application No.: US17526832
Filing Date: 2021-11-15
Applicant: Huawei Technologies Co., Ltd. , TSINGHUA UNIVERSITY
Inventor: Yasheng WANG , Xin JIANG , Xiao CHEN , Qun LIU , Zhengyan ZHANG , Fanchao QI , Zhiyuan LIU
IPC: G06F40/295
Abstract: This application relates to the field of artificial intelligence, and provides a text processing method, a model training method, and an apparatus. The method includes: obtaining target knowledge data; processing the target knowledge data to obtain a target knowledge vector; processing to-be-processed text to obtain a target text vector; fusing the target text vector and the target knowledge vector based on a target fusion model, to obtain a fused target text vector and a fused target knowledge vector; and processing the fused target text vector and/or the fused target knowledge vector based on a target processing model, to obtain a processing result corresponding to a target task. The foregoing technical solution can improve accuracy of a result of processing a target task by the target processing model.
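A minimal sketch of the fusion step, assuming cross-attention as the target fusion model and a linear head as the target processing model; both choices are illustrative assumptions rather than the method fixed by the application.

```python
# Hypothetical text-knowledge fusion via cross-attention.
import torch
import torch.nn as nn

d = 128
fusion_text_to_kg = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
fusion_kg_to_text = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
task_head = nn.Linear(d, 3)            # e.g., a 3-class target task

text_vec = torch.randn(1, 16, d)       # target text vector (16 tokens)
knowledge_vec = torch.randn(1, 5, d)   # target knowledge vector (5 entities)

# Fusion: each representation attends to the other one.
fused_text, _ = fusion_text_to_kg(text_vec, knowledge_vec, knowledge_vec)
fused_knowledge, _ = fusion_kg_to_text(knowledge_vec, text_vec, text_vec)

# Target processing model consumes the fused text vector (and/or the
# fused knowledge vector) to produce the task result.
logits = task_head(fused_text.mean(dim=1))
print(logits.shape)  # (1, 3)
```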