-
Publication No.: US12204757B1
Publication Date: 2025-01-21
Application No.: US18067514
Filing Date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Ilya Minkin, Raymond S. Whiteside
IPC: G06F3/06
Abstract: A technique for processing strong ordered transactions in a direct memory access engine may include retrieving a memory descriptor to perform a strong ordered transaction, and delaying the strong ordered transaction until pending write transactions associated with previous memory descriptors retrieved prior to the memory descriptor are complete. Subsequent transactions associated with memory descriptors following the memory descriptor are allowed to be issued while waiting for the pending write transactions to complete. Upon completion of the pending write transactions, the strong ordered transaction is performed.
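
The ordering rule in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: the names `Descriptor` and `simulate`, the one-issue-per-cycle model, and the fixed `write_latency` are all assumptions made for illustration, and every transaction is treated as a write.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    name: str
    strong_ordered: bool = False


def simulate(descriptors, write_latency):
    """Return [(issue_cycle, name), ...] in issue order.

    Hypothetical model: one transaction issues per cycle, and each write
    completes `write_latency` cycles after it is issued.
    """
    issued = []      # (cycle, name) pairs in the order they were issued
    done_at = {}     # descriptor index -> cycle at which its write completes
    held = []        # indices of strong ordered descriptors being delayed
    next_idx = 0
    cycle = 0

    def earlier_writes_done(i):
        # True once every descriptor before i has issued and completed.
        return all(j in done_at and done_at[j] <= cycle for j in range(i))

    while next_idx < len(descriptors) or held:
        ready = [i for i in held if earlier_writes_done(i)]
        if ready:
            # Pending writes are complete: perform the delayed transaction.
            i = ready[0]
            held.remove(i)
        elif next_idx < len(descriptors):
            i = next_idx
            next_idx += 1
            if descriptors[i].strong_ordered and not earlier_writes_done(i):
                # Delay it, but keep retrieving the descriptors that follow.
                held.append(i)
                continue
        else:
            # Nothing can issue this cycle; wait for writes to complete.
            cycle += 1
            continue
        issued.append((cycle, descriptors[i].name))
        done_at[i] = cycle + write_latency
        cycle += 1
    return issued
```

With descriptors A, B, S (strong ordered), C, D and a write latency of 3 cycles, this model issues C and D while S waits for the writes of A and B, and S issues only once both have completed.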
-
Publication No.: US11983128B1
Publication Date: 2024-05-14
Application No.: US18067109
Filing Date: 2022-12-16
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam
CPC classification number: G06F13/30, G06F13/1621, G06F13/1642
Abstract: Techniques to reduce overhead in a direct memory access (DMA) engine can include processing descriptors from a descriptor queue to obtain a striding configuration to generate tensorized memory descriptors. The striding configuration can include, for each striding dimension, a stride and a repetition number indicating a number of times to repeat striding in the corresponding striding dimension. One or more sets of tensorized memory descriptors can be generated based on the striding configuration. Data transfers are then performed based on the generated tensorized memory descriptors.
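
To make the striding configuration concrete, the sketch below expands one base address and a list of (stride, repetitions) pairs into a set of memory descriptors. The names `MemoryDescriptor` and `expand_strides` are hypothetical, and the sketch assumes that a repetition number of N means N positions are visited in that dimension; the patent may define the count differently.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MemoryDescriptor:
    address: int
    length: int


def expand_strides(base, length, dims):
    """Generate descriptors from a striding configuration.

    dims is a list of (stride_in_bytes, repetitions) pairs, outermost
    dimension first. This layout is an assumption for illustration.
    """
    ranges = [range(reps) for _, reps in dims]
    descriptors = []
    for index in product(*ranges):
        # The offset is the sum of index * stride over all dimensions.
        offset = sum(i * stride for i, (stride, _) in zip(index, dims))
        descriptors.append(MemoryDescriptor(base + offset, length))
    return descriptors
```

For example, `expand_strides(0x1000, 64, [(1024, 2), (128, 3)])` yields six 64-byte descriptors: three spaced 128 bytes apart starting at 0x1000, then three more starting 1024 bytes later.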
-
Publication No.: US11907144B1
Publication Date: 2024-02-20
Application No.: US17805410
Filing Date: 2022-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Raymond S. Whiteside, Thomas A. Volpe
IPC: G06F13/28
CPC classification number: G06F13/28, G06F2213/28
Abstract: Techniques to reduce the latency in notifying that space in a memory has been freed up are described. For example, when moving data from on-chip memory of a computing engine to system memory, the computing engine can be notified that its on-chip memory is free before an acknowledgment is provided by the system memory that the data being moved has been written into the system memory. The computing engine can be given access to the on-chip memory sooner by generating an early semaphore update based on a determination that the set of data being moved to system memory has been read out from the on-chip memory. The early semaphore update need not wait for the acknowledgement from the system memory, thus reducing the latency of notifying the computing engine that the on-chip memory is free.
-
Publication No.: US11550736B1
Publication Date: 2023-01-10
Application No.: US17449581
Filing Date: 2021-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Kun Xu, Ron Diamant, Ilya Minkin, Mohammad El-Shabani, Raymond S. Whiteside, Uday Shilton Udayaselvam
Abstract: To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors to perform a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, which can be used to generate the series of memory descriptors. Hence, instead of having to retrieve each of the memory descriptors to perform the series of DMA transfers, a single tensorized descriptor can be retrieved to perform a series of data transfers.
-
Publication No.: US10956248B1
Publication Date: 2021-03-23
Application No.: US16200602
Filing Date: 2018-11-26
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A. Volpe, Raymond S. Whiteside
IPC: G06F11/07
Abstract: An integrated circuit configured to execute program instructions can generate, based on a configuration, any combination of a notification message, a halt signal, or an interrupt signal for a condition detected in the integrated circuit. The detected condition can be an error condition or a non-error condition. The notification message for the condition may be written to memory accessible by a host processor. The non-error condition may be used by the host processor to monitor internal states of the integrated circuit. The halt signal may be used to stop the integrated circuit from executing the instructions.
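
The configurable response described here can be sketched as a per-condition bitmask of actions. This is an illustrative model only: the `Action` flags, the `Device` class, and the condition names are hypothetical, and a plain list stands in for the host-accessible memory that receives notification messages.

```python
from enum import Flag, auto


class Action(Flag):
    NONE = 0
    NOTIFY = auto()
    HALT = auto()
    INTERRUPT = auto()


class Device:
    def __init__(self, config):
        self.config = config            # condition name -> Action combination
        self.notifications = []         # stands in for host-accessible memory
        self.halted = False
        self.interrupt_pending = False

    def report(self, condition):
        """Apply whichever actions are configured for a detected condition."""
        actions = self.config.get(condition, Action.NONE)
        if Action.NOTIFY in actions:
            self.notifications.append(condition)
        if Action.INTERRUPT in actions:
            self.interrupt_pending = True
        if Action.HALT in actions:
            self.halted = True
```

Under this model, a non-error condition such as a hypothetical `"queue_empty"` could be configured with `Action.NOTIFY` alone so the host can monitor internal state, while an error condition could combine `Action.NOTIFY | Action.HALT | Action.INTERRUPT`.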