Publication No.: US11494321B1
Publication Date: 2022-11-08
Application No.: US17449586
Filing Date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Yunxuan Yu, Hongbin Zheng, Qingrui Liu
Abstract: A computer-implemented method includes identifying, from instruction code for executing by a computing system to implement a neural network, a first instruction for allocating a first region of a local memory of an accelerator of the computing system to a tensor, and a first direct memory access (DMA) load instruction for loading the tensor from a location of a system memory of the computing system to a second region of the local memory; adding a first tensor copy instruction in the instruction code to save the tensor in the first region of the local memory to a third region of the local memory that has dimensions different from dimensions of the first region; and replacing the first DMA load instruction with a second tensor copy instruction for saving data in the third region of the local memory to the second region of the local memory.
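Illustration: the rewrite described in the abstract can be sketched as a small pass over an instruction list. This is a minimal sketch only, not the patented implementation. The instruction classes (Alloc, DmaLoad, TensorCopy), the Region type with its two-number shape, and the function name replace_dma_reload are all hypothetical names introduced here. The sketch also assumes the save copy can be placed directly after the allocation; the abstract does not say where it is inserted, and a real compiler would have to place it after the tensor is written and before the first region is reused.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """A block of accelerator local memory (hypothetical model)."""
    name: str
    shape: Tuple[int, int]  # assumed layout: (rows, bytes per row)


@dataclass(frozen=True)
class Alloc:
    """Allocate a local-memory region to a tensor."""
    tensor: str
    region: Region


@dataclass(frozen=True)
class DmaLoad:
    """Load a tensor from system memory into a local-memory region."""
    tensor: str
    dst: Region


@dataclass(frozen=True)
class TensorCopy:
    """Copy a tensor between two local-memory regions."""
    tensor: str
    src: Region
    dst: Region


def replace_dma_reload(code: List[object], tensor: str, spare: Region) -> List[object]:
    """Return a copy of `code` in which a later DMA load of `tensor` is
    replaced by local copies routed through `spare` (the "third region")."""
    # Step 1: identify the allocation (first region) and a later DMA load
    # of the same tensor (second region).
    alloc_idx = next((i for i, ins in enumerate(code)
                      if isinstance(ins, Alloc) and ins.tensor == tensor), None)
    if alloc_idx is None:
        return list(code)
    load_idx = next((i for i, ins in enumerate(code)
                     if i > alloc_idx and isinstance(ins, DmaLoad)
                     and ins.tensor == tensor), None)
    if load_idx is None:
        return list(code)

    first = code[alloc_idx].region
    second = code[load_idx].dst
    # The abstract requires the third region's dimensions to differ
    # from those of the first region.
    if spare.shape == first.shape:
        raise ValueError("spare region must differ in dimensions from the first region")

    out = list(code)
    # Step 3: replace the DMA load with a copy from the third region
    # to the second region.
    out[load_idx] = TensorCopy(tensor, spare, second)
    # Step 2: add a copy that saves the tensor from the first region to
    # the third region (placement right after the allocation is an assumption).
    out.insert(alloc_idx + 1, TensorCopy(tensor, first, spare))
    return out
```

Calling replace_dma_reload on a list containing an Alloc and a later DmaLoad for the same tensor yields a list of the same instructions with one extra TensorCopy after the allocation and a TensorCopy in place of the DMA load. Instruction lists that lack either instruction are returned unchanged.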