-
Publication No.: US11568238B2
Publication Date: 2023-01-31
Application No.: US16456414
Filing Date: 2019-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang, Ron Diamant, Richard John Heaton
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, and dividing the tensor operation into sub-operations. The sub-operations include at least two sub-operations that have no data dependency between them. The computer-implemented method further includes assigning a first sub-operation of the two sub-operations to a first computing engine, assigning a second sub-operation of the two sub-operations to a second computing engine, and generating instructions for performing, in parallel, the first sub-operation by the first computing engine and the second sub-operation by the second computing engine. An inference is then made based on a result of the first sub-operation, a result of the second sub-operation, or both. The first computing engine and the second computing engine are in a same integrated circuit device or in two different integrated circuit devices.
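The splitting idea in this abstract can be illustrated with a minimal sketch. It assumes the tensor operation is a matrix product and that the sub-operations are contiguous row blocks of the left operand, which have no data dependency on each other. The names `matmul`, `split_rows`, and `parallel_matmul` are hypothetical, and worker threads merely stand in for the computing engines; none of this is taken from the patent itself.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_rows(a, parts):
    """Divide the rows of `a` into at most `parts` contiguous chunks.

    Each chunk is one independent sub-operation (assumed split strategy).
    """
    step = max(1, -(-len(a) // parts))  # ceiling division, never zero
    return [a[i:i + step] for i in range(0, len(a), step)]


def parallel_matmul(a, b, engines=2):
    """Run each row chunk on its own worker, then concatenate the results."""
    chunks = split_rows(a, engines)
    with ThreadPoolExecutor(max_workers=engines) as pool:
        # pool.map preserves chunk order, so the rows come back in order.
        partials = list(pool.map(lambda chunk: matmul(chunk, b), chunks))
    return [row for part in partials for row in part]
```

Because each chunk reads only its own rows of `a` and the shared, read-only `b`, the partial results can be computed in any order and simply concatenated.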
-
Publication No.: US11138106B1
Publication Date: 2021-10-05
Application No.: US16836780
Filing Date: 2020-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Renfu Huang
Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, the integrated circuit device can include a target port operable to receive transactions from a master port. The target port can be configured with a multicast address range that is associated with a plurality of indices corresponding to memory banks of the device. When the target port receives a write transaction that has an address that is within the multicast address range, the target port can determine an index from the plurality of indices, and can use the index to determine a second address, which combines the index and the offset value with the address. The target port can then use the second address to write the data to the memory.
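One possible reading of the multicast write is sketched below. The abstract does not specify how the index and offset are combined, so the formula used here (bank index times bank size, plus the offset of the address within the multicast range) is an assumption, as are the function name `multicast_write` and all of its parameters. Memory is modeled as a dictionary keyed by address.

```python
def multicast_write(memory, address, data, mc_base, mc_size, bank_indices, bank_size):
    """Write `data` to `memory`, fanning out if `address` is in the multicast range.

    Returns the list of addresses actually written.
    """
    if not (mc_base <= address < mc_base + mc_size):
        # Ordinary (unicast) write: the address is used unchanged.
        memory[address] = data
        return [address]

    offset = address - mc_base  # position inside the multicast window
    # Assumed address formation: one "second address" per configured bank index.
    targets = [index * bank_size + offset for index in bank_indices]
    for target in targets:
        memory[target] = data
    return targets
```

With this layout a single write transaction lands at the same offset in every selected bank, which is the effect the abstract describes.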
-
Publication No.: US10956584B1
Publication Date: 2021-03-23
Application No.: US16141770
Filing Date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton, Randy Renfu Huang, Ron Diamant, David James Borland
Abstract: Systems and methods for performing neural network processing are provided. In one example, a system comprises a neural network processor comprising: a data decryption engine that receives encrypted data and decrypts the encrypted data, the encrypted data comprising at least one of: encrypted weights data, encrypted input data, or encrypted instruction data related to a neural network model; and a computing engine that receives the weights data and performs computations of neural network processing using the input data and the weights data and based on the instruction data.
-
Publication No.: US10929063B1
Publication Date: 2021-02-23
Application No.: US16368538
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja, Yu Zhou, Ron Diamant, Randy Renfu Huang, Richard John Heaton
Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
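The self-modifying descriptor idea can be sketched as follows. This is a toy model under stated assumptions: memory is a flat Python list, descriptors are dictionaries, and the `patch`/`copy` descriptor kinds and their field names are hypothetical. A `patch` descriptor plays the role of the first set of memory instructions, rewriting the source address of a later `copy` descriptor from an index stored in memory.

```python
def run_dma(memory, descriptors):
    """Execute descriptors in order; earlier ones may rewrite later ones."""
    for desc in descriptors:
        if desc["kind"] == "patch":
            # Read a dynamically computed index from memory and use it to
            # set the source address of a later copy descriptor.
            index = memory[desc["index_addr"]]
            descriptors[desc["target"]]["src"] = desc["base"] + index
        elif desc["kind"] == "copy":
            memory[desc["dst"]] = memory[desc["src"]]
        else:
            raise ValueError(f"unknown descriptor kind: {desc['kind']}")


# Usage: fetch table[index] without the engine supporting indirect addressing.
memory = [0] * 16
memory[0] = 5           # index, computed at run time
memory[10 + 5] = 99     # table entry at base 10, index 5
descriptors = [
    {"kind": "patch", "index_addr": 0, "base": 10, "target": 1},
    {"kind": "copy", "src": 0, "dst": 1},  # src is a placeholder until patched
]
run_dma(memory, descriptors)
```

The copy descriptor itself only ever performs a direct transfer; the indirection comes entirely from the preceding descriptor editing its `src` field.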
-
Publication No.: US10901492B1
Publication Date: 2021-01-26
Application No.: US16369696
Filing Date: 2019-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Nafea Bshara, Ron Diamant, Randy Renfu Huang, Ali Ghassan Saidi
Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.
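A behavioral sketch of this gating scheme follows. It assumes the "particular value" is zero and the operation is a multiplication, so the result is known without running the ALU; the abstract names neither. The class `GatedMultiplier` and its `alu_activations` counter are hypothetical, with the counter serving as a rough stand-in for switching power.

```python
class GatedMultiplier:
    """Register plus a side flag that records whether the operand is zero."""

    def __init__(self):
        self.reg = 0
        self.is_zero = True       # models the non-architectural flag memory
        self.alu_activations = 0  # how many times the ALU actually ran

    def load(self, value):
        """Write the operand register and update the flag with the check result."""
        self.is_zero = (value == 0)
        self.reg = value

    def multiply(self, other):
        """Multiply by the stored operand, skipping the ALU when the flag is set."""
        if self.is_zero:
            return 0  # result is known without activating the ALU
        self.alu_activations += 1
        return self.reg * other
```

In workloads with many zero operands, such as sparse activations after a ReLU, most calls would take the early-return path in this model.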
-
Publication No.: US10740432B1
Publication Date: 2020-08-11
Application No.: US16219604
Filing Date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Renfu Huang, Mohammad El-Shabani, Sundeep Amirineni, Kenneth Wayne Patton, Willis Wang
Abstract: Methods and systems for performing hardware computations of mathematical functions are provided. In one example, a system comprises a mapping table that maps each base value of a plurality of base values to parameters related to a mathematical function; a selection module configured to select, based on an input value, a first base value and first parameters mapped to the first base value in the mapping table; and arithmetic circuits configured to: receive, from the mapping table, the first base value and the first plurality of parameters; and compute, based on a relationship between the input value and the first base value, and based on the first parameters, an estimated output value of the mathematical function for the input value.
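The table-plus-arithmetic scheme can be sketched in software. This example assumes the stored parameters are the function value and first derivative at each base value, and that the estimate is a first-order Taylor expansion around the nearest base value; the abstract does not say which parameters or which relationship are used. The names `build_table` and `estimate` are hypothetical.

```python
import math


def build_table(func, deriv, start, stop, step):
    """Map evenly spaced base values to (base, f(base), f'(base))."""
    count = int(round((stop - start) / step)) + 1
    bases = [start + i * step for i in range(count)]
    return [(base, func(base), deriv(base)) for base in bases]


def estimate(table, step, x):
    """Select the nearest base value, then extrapolate linearly from it."""
    start = table[0][0]
    i = int(round((x - start) / step))
    i = min(max(i, 0), len(table) - 1)  # clamp to the table range
    base, value, slope = table[i]
    return value + slope * (x - base)


# Usage: approximate exp(x) on [0, 1] with a 9-entry table.
table = build_table(math.exp, math.exp, 0.0, 1.0, 0.125)
approx = estimate(table, 0.125, 0.3)
```

A finer step shrinks the distance to the nearest base value and therefore the error, at the cost of a larger table.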
-
Publication No.: US10678479B1
Publication Date: 2020-06-09
Application No.: US16204943
Filing Date: 2018-11-29
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh
Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank, and can read from and write to only the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
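The gather and scatter patterns described above can be modeled in a few lines. This is a behavioral sketch only: the class `BankedMemory`, its method names, and the choice to lay serial data out at consecutive offsets within one bank are assumptions, not details from the abstract.

```python
class BankedMemory:
    """Memory banks with one transfer register per bank."""

    def __init__(self, num_banks, bank_size):
        self.banks = [[0] * bank_size for _ in range(num_banks)]
        self.regs = [0] * num_banks

    def parallel_load(self, offset):
        """Every register loads from its own bank at the same offset."""
        self.regs = [bank[offset] for bank in self.banks]

    def parallel_store(self, offset):
        """Every register stores to its own bank at the same offset."""
        for bank, value in zip(self.banks, self.regs):
            bank[offset] = value

    def serial_load(self, bank_index, offset):
        """Registers are filled one by one from consecutive words of one bank."""
        bank = self.banks[bank_index]
        for i in range(len(self.regs)):
            self.regs[i] = bank[offset + i]

    def serial_store(self, bank_index, offset):
        """Registers are drained one by one into consecutive words of one bank."""
        bank = self.banks[bank_index]
        for i, value in enumerate(self.regs):
            bank[offset + i] = value
```

In this model, `parallel_load` followed by `serial_store` gathers one word from every bank into a single bank, and `serial_load` followed by `parallel_store` scatters words from a single bank across all banks.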