-
Publication No.: US20230100930A1
Publication Date: 2023-03-30
Application No.: US17449576
Filing Date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Gennady Pekhimenko, Randy Renfu Huang
Abstract: Techniques for compressing a neural network model by mixing compression ratios (sparsity patterns) are described. The weight tensor of a neural network model is divided into weight groups. The pruning cost of compressing the weight values according to a compression ratio is determined for each weight group, and a pruning cost distribution for the compression ratio is generated from the pruning costs of the weight groups. A cost threshold can then be selected from the pruning cost distribution, and weight groups having a pruning cost below the selected cost threshold are compressed according to the compression ratio. The remaining weight groups can be compressed using one or more less aggressive compression ratios. The cost threshold can be adjusted to tune the overall sparsity and accuracy of the compressed neural network.
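The mixed-ratio pruning described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the pruning cost is assumed here to be the L1 norm of the weights a group would lose under the aggressive ratio, and the cost threshold is taken as a tunable percentile of the per-group cost distribution.

```python
import numpy as np

def prune_mixed(weights, group_size=8, aggressive_keep=1, fallback_keep=2,
                cost_percentile=50):
    """Compress weight groups with mixed sparsity patterns (sketch).

    Groups whose pruning cost falls below the selected threshold use the
    aggressive ratio (keep `aggressive_keep` of `group_size` weights); the
    remaining groups use the less aggressive fallback ratio. The cost metric
    (L1 norm of pruned weights) is an illustrative choice.
    """
    groups = weights.reshape(-1, group_size)
    # Pruning cost per group for the aggressive ratio: total magnitude lost
    # when all but the top-`aggressive_keep` elements are zeroed.
    sorted_mag = np.sort(np.abs(groups), axis=1)
    costs = sorted_mag[:, :group_size - aggressive_keep].sum(axis=1)
    # Cost threshold selected from the pruning cost distribution; adjusting
    # `cost_percentile` tunes overall sparsity vs. accuracy.
    threshold = np.percentile(costs, cost_percentile)
    out = np.zeros_like(groups)
    for i, g in enumerate(groups):
        keep = aggressive_keep if costs[i] <= threshold else fallback_keep
        idx = np.argsort(np.abs(g))[-keep:]  # largest-magnitude weights survive
        out[i, idx] = g[idx]
    return out.reshape(weights.shape)
```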
-
Publication No.: US20240111528A1
Publication Date: 2024-04-04
Application No.: US17934147
Filing Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
CPC classification number: G06F9/30036, G06F9/30145, G06F9/3555
Abstract: A technique to execute transpose and compute operations may include retrieving a set of machine instructions from an instruction buffer of a data processor. The instruction buffer has multiple entries, and each entry stores one machine instruction. A machine instruction from the set of machine instructions is executed to transpose a submatrix of an input tensor and perform computations on column elements of the submatrix. The machine instruction combines the transpose operation with computational operations into a single machine instruction.
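The fused instruction flow above can be emulated in software. In this sketch, each instruction buffer entry is modeled as a callable and the per-column computation is a sum; both are illustrative simplifications, not the patent's machine encoding.

```python
import numpy as np

class InstructionBuffer:
    """Instruction buffer with multiple entries, one machine instruction per
    entry (modeled here as a Python callable, an assumed simplification)."""
    def __init__(self, instructions):
        self.entries = list(instructions)

    def retrieve(self):
        # Retrieve the next machine instruction for execution.
        return self.entries.pop(0)

def fused_transpose_compute(submatrix):
    """One 'machine instruction' combining transpose and compute: transpose
    the submatrix, then compute over the column elements of the original
    (rows after transposing). The sum is an illustrative computation."""
    return submatrix.T.sum(axis=1)

buf = InstructionBuffer([fused_transpose_compute])
instruction = buf.retrieve()
result = instruction(np.array([[1, 2], [3, 4]]))  # column sums: [4, 6]
```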
-
Publication No.: US12099840B1
Publication Date: 2024-09-24
Application No.: US18185236
Filing Date: 2023-03-16
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Ron Diamant
IPC: G06F9/30
CPC classification number: G06F9/30018, G06F9/30032
Abstract: A technique for performing a tensor operation includes inputting concatenated data words of a first input tensor and concatenated data words of a second input tensor into a compute channel having a plurality of compute stages coupled in series. The concatenated data words of the first input tensor and the second input tensor represented in a first datatype can be converted into data elements represented in a second datatype using a first subset of the compute stages. A binary operation can be performed on each data element represented in the second datatype from the first input tensor with a corresponding data element represented in the second datatype from the second input tensor to generate output data elements of an output tensor represented in the second datatype using a second subset of the compute stages. The output data elements of the output tensor can then be outputted from the compute channel.
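The two stage subsets of the compute channel can be sketched as below. The word layout (each 32-bit word concatenating four int8 elements), the second datatype (float32), and the binary operation (addition) are all assumed for illustration; the patent does not fix these choices.

```python
import numpy as np

def compute_channel(words_a, words_b):
    """Sketch of the compute-channel dataflow (assumed details).

    First subset of stages: unpack each 32-bit concatenated data word into
    four int8 data elements and convert them to a second datatype (float32).
    Second subset of stages: apply a binary operation (addition, an
    illustrative choice) between corresponding elements of the two tensors.
    """
    def unpack(words):
        # Reinterpret each uint32 word as four bytes, then as int8 elements,
        # then convert to the second datatype.
        raw = np.ascontiguousarray(np.asarray(words, dtype=np.uint32))
        return raw.view(np.int8).astype(np.float32)

    # Output data elements of the output tensor, in the second datatype.
    return unpack(words_a) + unpack(words_b)
```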
-
Publication No.: US20240103813A1
Publication Date: 2024-03-28
Application No.: US17934145
Filing Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
Abstract: An integrated circuit that combines transpose and compute operations may include a transpose circuit coupled to a set of compute channels. Each compute channel may include multiple arithmetic logic unit (ALU) circuits coupled in series. The transpose circuit is operable to receive an input tensor, transpose the input tensor, and output a transposed tensor to the set of compute channels. The set of compute channels is operable to generate outputs in parallel, with each of the outputs being generated from a corresponding vector of the transposed tensor.
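The circuit's dataflow can be modeled in software as follows. Each compute channel's series of ALU circuits is represented here as a fold of one binary operation over a vector; the sequential loop stands in for channels that operate in parallel in hardware. Both are illustrative assumptions.

```python
import numpy as np

def alu_chain(vector, op, n_stages):
    """Model ALU circuits coupled in series: fold `vector` through the
    binary operation `op`, one element per stage."""
    acc = vector[0]
    for x in vector[1:1 + n_stages]:
        acc = op(acc, x)
    return acc

def transpose_and_compute(input_tensor, op):
    """Sketch of the integrated circuit (assumed behavior): the transpose
    circuit transposes the input tensor and feeds the set of compute
    channels; each channel generates one output from its corresponding
    vector of the transposed tensor."""
    transposed = input_tensor.T
    n_stages = transposed.shape[1] - 1
    # Hardware channels run in parallel; emulated sequentially here.
    return np.array([alu_chain(row, op, n_stages) for row in transposed])
```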
-
Publication No.: US12008368B2
Publication Date: 2024-06-11
Application No.: US17934147
Filing Date: 2022-09-21
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer, Sheng Xu, Ron Diamant
CPC classification number: G06F9/30036, G06F9/30145, G06F9/3555
Abstract: A technique to execute transpose and compute operations may include retrieving a set of machine instructions from an instruction buffer of a data processor. The instruction buffer has multiple entries, and each entry stores one machine instruction. A machine instruction from the set of machine instructions is executed to transpose a submatrix of an input tensor and perform computations on column elements of the submatrix. The machine instruction combines the transpose operation with computational operations into a single machine instruction.
-
Publication No.: US11941397B1
Publication Date: 2024-03-26
Application No.: US17804796
Filing Date: 2022-05-31
Applicant: Amazon Technologies, Inc.
Inventor: Xiaodan Tan, Paul Gilbert Meyer
CPC classification number: G06F9/30101, G06F9/3001, G06F9/30032, G06F9/30036, G06F9/30043, G06F9/3887
Abstract: Techniques to take advantage of the single-instruction-multiple-data (SIMD) capabilities of a processor to process data blocks can include implementing an instruction to fuse the data blocks together. The fuse input instruction can have a first input vector, a second input vector, a select input, a first output vector, and a second output vector. The fuse input instruction selects a portion of the first input vector and a portion of the second input vector based on the select input, sign extends the selected portion of the first input vector and the selected portion of the second input vector, and shuffles data elements of the sign extended portion of the first input vector with data elements of the sign extended portion of the second input vector to generate the first and second output vectors.
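The select / sign-extend / shuffle pipeline of the fuse input instruction can be sketched on 8-lane int8 vectors. The lane width, the `select` encoding (0 = low half, 1 = high half), and the interleaving order are all assumptions for illustration, not the instruction's actual encoding.

```python
import numpy as np

def fuse_input(vec_a, vec_b, select):
    """Sketch of the fuse input instruction (assumed semantics).

    Selects a portion (low or high half) of each int8 input vector based on
    `select`, sign-extends the selected lanes to int16, and shuffles
    (interleaves) the sign-extended lanes of the two vectors to generate
    the first and second output vectors.
    """
    half = len(vec_a) // 2
    sel_a = vec_a[select * half:(select + 1) * half]  # selected portion of A
    sel_b = vec_b[select * half:(select + 1) * half]  # selected portion of B
    # Sign extension: int8 -> int16 preserves negative values.
    ext_a = sel_a.astype(np.int16)
    ext_b = sel_b.astype(np.int16)
    # Shuffle: interleave lanes from the two sign-extended portions.
    interleaved = np.empty(2 * half, dtype=np.int16)
    interleaved[0::2] = ext_a
    interleaved[1::2] = ext_b
    return interleaved[:half], interleaved[half:]
```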