MEMORY-EFFICIENT DIFFERENTIABLE WEIGHT CLUSTERING FOR LARGE LANGUAGE MODEL COMPRESSION

    Publication Number: US20250037018A1

    Publication Date: 2025-01-30

    Application Number: US18658919

    Application Date: 2024-05-08

    Applicant: Apple Inc.

Abstract: The subject technology provides memory-efficient differentiable weight clustering for large language model compression. An apparatus determines a tensor including an attention map between learned weights of a trained machine learning model and corresponding centroids. The apparatus also determines a compressed attention table and a plurality of index lists during compression of the trained machine learning model based on a uniquification of the attention map and sharding of an associated index list. The apparatus determines, using a marshaling layer, whether the tensor exists at a destination device during compression of the trained machine learning model. The apparatus refrains from copying the tensor to the destination device when the tensor exists at the destination device, or copies the tensor to the destination device when the tensor does not exist at the destination device. The apparatus deploys a compressed machine learning model based on the compression of the trained machine learning model.
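The abstract describes three cooperating mechanisms: a soft attention map between weights and centroids, memory reduction through uniquification of that map plus sharding of the associated index list, and a marshaling step that skips redundant cross-device copies. The following is a minimal PyTorch-style sketch of how such pieces might fit together; all function names, shapes, and the device-check logic are illustrative assumptions, not the patented implementation.

```python
import torch

def attention_map(weights, centroids, temperature=1.0):
    # Soft assignment of each weight to each centroid, computed from
    # negative distances; shapes are illustrative ([N] weights, [K] centroids).
    dists = torch.cdist(weights.view(-1, 1), centroids.view(-1, 1))  # [N, K]
    return torch.softmax(-dists / temperature, dim=-1)

def uniquify_and_shard(attn, num_shards=4):
    # Keep only the unique attention rows (a compressed attention table) and
    # an index list mapping every weight back to its unique row; the index
    # list is then sharded to bound peak memory held at any one time.
    table, indices = torch.unique(attn, dim=0, return_inverse=True)
    shards = torch.chunk(indices, num_shards)
    return table, shards

def marshal(tensor, destination):
    # Hypothetical marshaling step: copy the tensor to the destination device
    # only if it is not already resident there.
    if tensor.device == torch.device(destination):
        return tensor              # already at destination; refrain from copying
    return tensor.to(destination)  # otherwise copy across devices
```

In this sketch, `uniquify_and_shard` stands in for the claimed compression of the attention table, and `marshal` stands in for the claimed copy-avoidance check; a production system would apply these inside the differentiable clustering loop rather than as standalone calls.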
