-
Publication number: US11748622B1
Publication date: 2023-09-05
Application number: US16292236
Application date: 2019-03-04
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic, Se jong Oh
IPC: G06N3/082, G06F16/901, G06N3/063, G06F8/41
CPC classification number: G06N3/082, G06F8/433, G06F16/9024, G06N3/063
Abstract: A computing system is configured to access intermediate outputs of a neural network by augmenting a data flow graph generated for the neural network. The data flow graph includes a plurality of nodes interconnected by connections, each node representing an operation to be executed by the neural network. To access the intermediate output, the data flow graph is augmented by inserting a node representing an operation that saves the output of a node which produces the intermediate output. The node representing the save operation is inserted while maintaining all existing nodes and connections in the data flow graph, thereby preserving the behavior of the data flow graph. The augmenting can be performed using a compiler that generates the data flow graph from program code.
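The augmentation described in this abstract can be illustrated with a minimal sketch. It assumes a toy graph representation; the class `DataFlowGraph`, the function `insert_save_node`, and the node names are all hypothetical and are not taken from the patent. The point it shows is that the save node is purely additive: every original node and connection is left in place.

```python
from dataclasses import dataclass, field


@dataclass
class DataFlowGraph:
    """Toy data flow graph (hypothetical representation, for illustration only)."""
    nodes: dict = field(default_factory=dict)   # node name -> operation label
    edges: list = field(default_factory=list)   # (producer, consumer) pairs

    def add_node(self, name, op):
        self.nodes[name] = op

    def connect(self, src, dst):
        self.edges.append((src, dst))


def insert_save_node(graph, target):
    """Attach a save node to `target` without modifying existing nodes or edges."""
    if target not in graph.nodes:
        raise KeyError(f"unknown node: {target}")
    save_name = f"save_{target}"
    if save_name in graph.nodes:
        raise ValueError(f"save node already present for: {target}")
    graph.add_node(save_name, "save")      # new node only
    graph.connect(target, save_name)       # new edge only
    return save_name


# Small three-operation network: conv1 -> relu1 -> fc1
g = DataFlowGraph()
g.add_node("conv1", "conv2d")
g.add_node("relu1", "relu")
g.add_node("fc1", "matmul")
g.connect("conv1", "relu1")
g.connect("relu1", "fc1")

nodes_before = dict(g.nodes)
edges_before = list(g.edges)

# Expose the intermediate output produced by relu1.
save_name = insert_save_node(g, "relu1")
```

Because the original edge from `relu1` to `fc1` is untouched, the downstream computation sees the same inputs as before; the save node is only an extra consumer of the intermediate output.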
-
Publication number: US11308396B2
Publication date: 2022-04-19
Application number: US16455329
Application date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Se jong Oh, Ron Diamant, Randy Renfu Huang
Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and the debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
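The first half of the procedure in this abstract, finding the shortest reduced network whose device output diverges from the reference, might be sketched as follows. This is an assumption-laden illustration: the network is modeled as a list of plain Python functions, "compile and execute on the target" is simulated by a second list containing a deliberately faulty layer, and the lengths are scanned linearly from shortest to longest. The abstract does not say how the lengths are chosen or ordered, and all names here are hypothetical. The second stage (enabling tensor output in a lower-level intermediate representation) is not modeled.

```python
import math


def run_prefix(layers, x, length):
    """Execute only the first `length` layers (the 'reduced' network)."""
    for layer in layers[:length]:
        x = layer(x)
    return x


def tensors_match(a, b, tol=1e-6):
    """Element-wise comparison of two flat tensors within an absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(p, q, abs_tol=tol) for p, q in zip(a, b)
    )


def shortest_failing_length(reference_layers, device_layers, x):
    """Return the smallest prefix length whose device tensor differs from the
    reference tensor, or None if every length matches."""
    for length in range(1, len(reference_layers) + 1):
        ref_tensor = run_prefix(reference_layers, x, length)
        dev_tensor = run_prefix(device_layers, x, length)
        if not tensors_match(ref_tensor, dev_tensor):
            return length
    return None


def scale(v):
    return [2.0 * e for e in v]


def shift(v):
    return [e + 1.0 for e in v]


def relu(v):
    return [max(0.0, e) for e in v]


def bad_shift(v):
    # Hypothetical target-side bug: wrong constant.
    return [e + 1.5 for e in v]


reference = [scale, shift, relu, scale]
device = [scale, bad_shift, relu, scale]

first_bad = shortest_failing_length(reference, device, [-1.0, 0.5, 2.0])
```

In this toy setup the mismatch first appears at length 2, which is the prefix ending in the faulty layer; that prefix is the candidate for finer-grained inspection.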
-
Publication number: US10884707B1
Publication date: 2021-01-05
Application number: US16455201
Application date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are systems and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
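Why an identity multiplication can yield a transpose is easier to see with a small sketch. The model below is an assumption, not the patented hardware: it treats the processing element array as computing `out[j][n] = sum_k stationary[k][j] * moving[k][n]`, i.e. contracting over the row dimension of the loaded block. Under that assumed model, streaming an identity matrix through the array returns the loaded block transposed. The function names and the tile size are hypothetical.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def pe_array_matmul(stationary, moving):
    """Assumed array model: out[j][n] = sum_k stationary[k][j] * moving[k][n].

    The sum over k stands in for accumulation in the results buffer.
    """
    k_dim = len(stationary)
    j_dim = len(stationary[0])
    n_dim = len(moving[0])
    return [
        [sum(stationary[k][j] * moving[k][n] for k in range(k_dim))
         for n in range(n_dim)]
        for j in range(j_dim)
    ]


def transpose_by_blocks(matrix, tile):
    """Transpose `matrix` tile by tile using identity multiplications only."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0.0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Block sized to fit the (hypothetical) array dimensions.
            block = [row[c0:c0 + tile] for row in matrix[r0:r0 + tile]]
            # Identity multiplication: the product is the block transposed.
            product = pe_array_matmul(block, identity(len(block)))
            # Write the result at the mirrored block position.
            for j, product_row in enumerate(product):
                for n, value in enumerate(product_row):
                    out[c0 + j][r0 + n] = value
    return out


m = [[1, 2, 3],
     [4, 5, 6]]
transposed = transpose_by_blocks(m, tile=2)
```

The tile size of 2 is chosen only to force more than one block, including a ragged edge block; a real array would fix the block dimensions in hardware.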
-
Publication number: US11347480B2
Publication date: 2022-05-31
Application number: US17122136
Application date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.
-
Publication number: US20210096823A1
Publication date: 2021-04-01
Application number: US17122136
Application date: 2020-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Haichen Li, Ron Diamant, Jeffrey T. Huynh, Yu Zhou, Se jong Oh
Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of the results buffer can then be mapped to row partitions of a buffer memory for further processing.