-
Publication No.: US20230100930A1
Publication Date: 2023-03-30
Application No.: US17449576
Filing Date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Gennady Pekhimenko, Randy Renfu Huang
Abstract: Techniques for compressing a neural network model by mixing compression ratios (sparsity patterns) are described. The weight tensor of a neural network model is divided into weight groups. The pruning cost of compressing the weight values according to a compression ratio is determined for each weight group, and a pruning cost distribution for the compression ratio is generated from the pruning costs of the weight groups. A cost threshold can then be selected from the pruning cost distribution, and weight groups having a pruning cost below the selected cost threshold are compressed according to the compression ratio. The remaining weight groups can be compressed using one or more less aggressive compression ratios. The cost threshold can be adjusted to tune the overall sparsity and accuracy of the compressed neural network.
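The mixed-ratio pruning described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the pruning cost is assumed here to be the L1 norm of the weights a group would lose under the aggressive ratio, and the cost threshold is taken as a tunable percentile of the per-group cost distribution.

```python
import numpy as np

def prune_mixed(weights, group_size=8, aggressive_keep=1, fallback_keep=2,
                cost_percentile=50):
    """Compress weight groups with mixed sparsity patterns (sketch).

    Groups whose pruning cost falls below the selected threshold use the
    aggressive ratio (keep `aggressive_keep` of `group_size` weights); the
    remaining groups use the less aggressive fallback ratio. The cost metric
    (L1 norm of pruned weights) is an illustrative choice.
    """
    groups = weights.reshape(-1, group_size)
    # Pruning cost per group for the aggressive ratio: total magnitude lost
    # when all but the top-`aggressive_keep` elements are zeroed.
    sorted_mag = np.sort(np.abs(groups), axis=1)
    costs = sorted_mag[:, :group_size - aggressive_keep].sum(axis=1)
    # Cost threshold selected from the pruning cost distribution; adjusting
    # `cost_percentile` tunes overall sparsity vs. accuracy.
    threshold = np.percentile(costs, cost_percentile)
    out = np.zeros_like(groups)
    for i, g in enumerate(groups):
        keep = aggressive_keep if costs[i] <= threshold else fallback_keep
        idx = np.argsort(np.abs(g))[-keep:]  # largest-magnitude weights survive
        out[i, idx] = g[idx]
    return out.reshape(weights.shape)
```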
-
Publication No.: US20240111528A1
Publication Date: 2024-04-04
Application No.: US17934147
Filing Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
CPC classification number: G06F9/30036, G06F9/30145, G06F9/3555
Abstract: A technique to execute transpose and compute operations may include retrieving a set of machine instructions from an instruction buffer of a data processor. The instruction buffer has multiple entries, and each entry stores one machine instruction. A machine instruction from the set of machine instructions is executed to transpose a submatrix of an input tensor and perform computations on column elements of the submatrix. The machine instruction combines the transpose operation with computational operations into a single machine instruction.
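The fused instruction flow above can be emulated in software. In this sketch, each instruction buffer entry is modeled as a callable and the per-column computation is a sum; both are illustrative simplifications, not the patent's machine encoding.

```python
import numpy as np

class InstructionBuffer:
    """Instruction buffer with multiple entries, one machine instruction per
    entry (modeled here as a Python callable, an assumed simplification)."""
    def __init__(self, instructions):
        self.entries = list(instructions)

    def retrieve(self):
        # Retrieve the next machine instruction for execution.
        return self.entries.pop(0)

def fused_transpose_compute(submatrix):
    """One 'machine instruction' combining transpose and compute: transpose
    the submatrix, then compute over the column elements of the original
    (rows after transposing). The sum is an illustrative computation."""
    return submatrix.T.sum(axis=1)

buf = InstructionBuffer([fused_transpose_compute])
instruction = buf.retrieve()
result = instruction(np.array([[1, 2], [3, 4]]))  # column sums: [4, 6]
```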
-
Publication No.: US12099840B1
Publication Date: 2024-09-24
Application No.: US18185236
Filing Date: 2023-03-16
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Ron Diamant
IPC: G06F9/30
CPC classification number: G06F9/30018, G06F9/30032
Abstract: A technique for performing a tensor operation includes inputting concatenated data words of a first input tensor and concatenated data words of a second input tensor into a compute channel having a plurality of compute stages coupled in series. The concatenated data words of the first input tensor and the second input tensor represented in a first datatype can be converted into data elements represented in a second datatype using a first subset of the compute stages. A binary operation can be performed on each data element represented in the second datatype from the first input tensor with a corresponding data element represented in the second datatype from the second input tensor to generate output data elements of an output tensor represented in the second datatype using a second subset of the compute stages. The output data elements of the output tensor can then be outputted from the compute channel.
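The two stage subsets of the compute channel can be sketched as below. The word layout (each 32-bit word concatenating four int8 elements), the second datatype (float32), and the binary operation (addition) are all assumed for illustration; the patent does not fix these choices.

```python
import numpy as np

def compute_channel(words_a, words_b):
    """Sketch of the compute-channel dataflow (assumed details).

    First subset of stages: unpack each 32-bit concatenated data word into
    four int8 data elements and convert them to a second datatype (float32).
    Second subset of stages: apply a binary operation (addition, an
    illustrative choice) between corresponding elements of the two tensors.
    """
    def unpack(words):
        # Reinterpret each uint32 word as four bytes, then as int8 elements,
        # then convert to the second datatype.
        raw = np.ascontiguousarray(np.asarray(words, dtype=np.uint32))
        return raw.view(np.int8).astype(np.float32)

    # Output data elements of the output tensor, in the second datatype.
    return unpack(words_a) + unpack(words_b)
```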
-
Publication No.: US20240103813A1
Publication Date: 2024-03-28
Application No.: US17934145
Filing Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
Abstract: An integrated circuit that combines transpose and compute operations may include a transpose circuit coupled to a set of compute channels. Each compute channel may include multiple arithmetic logic unit (ALU) circuits coupled in series. The transpose circuit is operable to receive an input tensor, transpose the input tensor, and output a transposed tensor to the set of compute channels. The set of compute channels is operable to generate outputs in parallel, with each of the outputs being generated from a corresponding vector of the transposed tensor.
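The circuit's dataflow can be modeled in software as follows. Each compute channel's series of ALU circuits is represented here as a fold of one binary operation over a vector; the sequential loop stands in for channels that operate in parallel in hardware. Both are illustrative assumptions.

```python
import numpy as np

def alu_chain(vector, op, n_stages):
    """Model ALU circuits coupled in series: fold `vector` through the
    binary operation `op`, one element per stage."""
    acc = vector[0]
    for x in vector[1:1 + n_stages]:
        acc = op(acc, x)
    return acc

def transpose_and_compute(input_tensor, op):
    """Sketch of the integrated circuit (assumed behavior): the transpose
    circuit transposes the input tensor and feeds the set of compute
    channels; each channel generates one output from its corresponding
    vector of the transposed tensor."""
    transposed = input_tensor.T
    n_stages = transposed.shape[1] - 1
    # Hardware channels run in parallel; emulated sequentially here.
    return np.array([alu_chain(row, op, n_stages) for row in transposed])
```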
-
Publication No.: US12008368B2
Publication Date: 2024-06-11
Application No.: US17934147
Filing Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
CPC classification number: G06F9/30036, G06F9/30145, G06F9/3555
Abstract: A technique to execute transpose and compute operations may include retrieving a set of machine instructions from an instruction buffer of a data processor. The instruction buffer has multiple entries, and each entry stores one machine instruction. A machine instruction from the set of machine instructions is executed to transpose a submatrix of an input tensor and perform computations on column elements of the submatrix. The machine instruction combines the transpose operation with computational operations into a single machine instruction.
-
Publication No.: US11941397B1
Publication Date: 2024-03-26
Application No.: US17804796
Filing Date: 2022-05-31
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer
CPC classification number: G06F9/30101, G06F9/3001, G06F9/30032, G06F9/30036, G06F9/30043, G06F9/3887
Abstract: Techniques to take advantage of the single-instruction-multiple-data (SIMD) capabilities of a processor to process data blocks can include implementing an instruction to fuse the data blocks together. The fuse input instruction can have a first input vector, a second input vector, a select input, a first output vector, and a second output vector. The fuse input instruction selects a portion of the first input vector and a portion of the second input vector based on the select input, sign extends the selected portion of the first input vector and the selected portion of the second input vector, and shuffles data elements of the sign extended portion of the first input vector with data elements of the sign extended portion of the second input vector to generate the first and second output vectors.
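The select / sign-extend / shuffle pipeline of the fuse input instruction can be sketched on 8-lane int8 vectors. The lane width, the `select` encoding (0 = low half, 1 = high half), and the interleaving order are all assumptions for illustration, not the instruction's actual encoding.

```python
import numpy as np

def fuse_input(vec_a, vec_b, select):
    """Sketch of the fuse input instruction (assumed semantics).

    Selects a portion (low or high half) of each int8 input vector based on
    `select`, sign-extends the selected lanes to int16, and shuffles
    (interleaves) the sign-extended lanes of the two vectors to generate
    the first and second output vectors.
    """
    half = len(vec_a) // 2
    sel_a = vec_a[select * half:(select + 1) * half]  # selected portion of A
    sel_b = vec_b[select * half:(select + 1) * half]  # selected portion of B
    # Sign extension: int8 -> int16 preserves negative values.
    ext_a = sel_a.astype(np.int16)
    ext_b = sel_b.astype(np.int16)
    # Shuffle: interleave lanes from the two sign-extended portions.
    interleaved = np.empty(2 * half, dtype=np.int16)
    interleaved[0::2] = ext_a
    interleaved[1::2] = ext_b
    return interleaved[:half], interleaved[half:]
```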