-
Publication number: US12236134B2
Publication date: 2025-02-25
Application number: US17953723
Filing date: 2022-09-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Mahzabeen Islam, Shaizeen Dilawarhusen Aga, Johnathan Robert Alsop, Mohamed Assem Abd ElMohsen Ibrahim, Nuwan S Jayasena
IPC: G06F3/06
Abstract: In accordance with the described techniques for bank-level parallelism for processing in memory, a plurality of commands are received for execution by a processing in memory component embedded in a memory. The memory includes a first bank and a second bank. The plurality of commands include a first stream of commands which cause the processing in memory component to perform operations that access the first bank and a second stream of commands which cause the processing in memory component to perform operations that access the second bank. A next row of the first bank that is to be accessed by the processing in memory component is identified. Further, a precharge command is scheduled to close a first row of the first bank and an activate command is scheduled to open the next row of the first bank in parallel with execution of the second stream of commands.
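The benefit of scheduling the precharge and activate for one bank while the other bank's stream executes can be illustrated with a toy cycle count. This is a minimal sketch, not the patented scheduler: the timing constants `T_PRE`, `T_ACT`, and `T_OP`, the `(bank, row, operations)` stream tuples, and both function names are assumptions made for illustration.

```python
# Hypothetical timing parameters in cycles; none of these values come from the patent.
T_PRE = 3  # precharge: close the currently open row
T_ACT = 3  # activate: open the next row
T_OP = 1   # one PIM operation on an already open row


def serialized_cycles(streams):
    """Every stream pays its row switch inline, right before it executes."""
    return sum(T_PRE + T_ACT + n_ops * T_OP for _bank, _row, n_ops in streams)


def overlapped_cycles(streams):
    """Start each row switch as soon as the target bank is idle, so it can
    overlap with a stream that is still executing on a different bank."""
    bank_free = {}  # bank -> cycle at which its last stream finished
    pim_free = 0    # cycle at which the PIM component finishes the previous stream
    for bank, _row, n_ops in streams:
        row_ready = bank_free.get(bank, 0) + T_PRE + T_ACT
        start = max(pim_free, row_ready)
        pim_free = start + n_ops * T_OP
        bank_free[bank] = pim_free
    return pim_free


# Two streams that alternate between bank 0 and bank 1: (bank, row, operations).
streams = [(0, 0, 8), (1, 0, 8), (0, 1, 8), (1, 1, 8)]
print(serialized_cycles(streams), overlapped_cycles(streams))
```

When consecutive streams target the same bank, the model finds nothing to overlap and both functions return the same count.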
-
Publication number: US20240069915A1
Publication date: 2024-02-29
Application number: US17899231
Filing date: 2022-08-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Meysam Taassori, Shaizeen Dilawarhusen Aga, Mohamed Assem Abd ElMohsen Ibrahim, Johnathan Robert Alsop
CPC classification number: G06F9/30036, G06F12/10, G06F16/2237
Abstract: A virtual padding unit provides a virtual padded data structure (e.g., virtually padded matrix) that provides output values for a padded data structure without storing all of the padding elements in memory. When the virtual padding unit receives a virtual memory address of a location in the virtual padded data structure, the virtual padding unit checks whether the location is a non-padded location in the virtual padded data structure or a padded location in the virtual padded data structure. If the location is a padded location in the virtual padded data structure, the virtual padding unit outputs a padding value rather than a value stored in the virtual padded data structure. If the location is a non-padded location in the virtual padded data structure, a value stored at the location is output.
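The padded/non-padded check can be sketched in software as follows. This is an illustrative model only: the class name `VirtualPaddedMatrix`, the uniform `pad` width, and the use of a row-major linear index as a stand-in for a virtual memory address are all assumptions, not details from the patent.

```python
class VirtualPaddedMatrix:
    """Stores only the unpadded elements; padding values are produced on read."""

    def __init__(self, data, pad, pad_value=0):
        self.data = data            # unpadded rows, e.g. [[1, 2], [3, 4]]
        self.rows = len(data)
        self.cols = len(data[0])
        self.pad = pad              # hypothetical: same pad width on every side
        self.pad_value = pad_value

    @property
    def padded_shape(self):
        return self.rows + 2 * self.pad, self.cols + 2 * self.pad

    def read(self, index):
        """Read by row-major index into the padded view (stand-in for an address)."""
        padded_rows, padded_cols = self.padded_shape
        if not 0 <= index < padded_rows * padded_cols:
            raise IndexError("index outside the virtual padded structure")
        r, c = divmod(index, padded_cols)
        r -= self.pad
        c -= self.pad
        if 0 <= r < self.rows and 0 <= c < self.cols:
            return self.data[r][c]  # non-padded location: return the stored value
        return self.pad_value       # padded location: nothing is stored for it


m = VirtualPaddedMatrix([[1, 2], [3, 4]], pad=1)
print([m.read(i) for i in range(16)])
```

Only the four real elements occupy storage here, while reads cover the full 4x4 padded view.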
-
Publication number: US20240220122A1
Publication date: 2024-07-04
Application number: US18147088
Filing date: 2022-12-28
Applicant: Advanced Micro Devices, Inc.
IPC: G06F3/06
CPC classification number: G06F3/0613, G06F3/0659, G06F3/0673
Abstract: Partial address memory requests for data are described. In accordance with the described techniques, an accelerator receives a request for data that does not include address information for a data storage location from which the data is to be retrieved. The accelerator identifies at least one data storage location that includes data produced by the accelerator and retrieves the data from the at least one data storage location. A result is then output by the accelerator that includes the data retrieved from the at least one data storage location.
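One way to picture a request that carries no address is an accelerator that remembers where it wrote its own results. The sketch below is hypothetical throughout: the `Accelerator` class, its `produce` and `handle_request` methods, and the dictionary-backed memory are illustrative assumptions rather than the claimed design.

```python
class Accelerator:
    def __init__(self):
        self.memory = {}       # address -> value
        self.produced_at = []  # addresses this accelerator has written, in order

    def produce(self, address, value):
        """Write a result and remember where it was stored."""
        self.memory[address] = value
        if address not in self.produced_at:
            self.produced_at.append(address)

    def handle_request(self, address=None):
        """With no address, return the data from every location this accelerator produced."""
        if address is not None:
            return [self.memory[address]]
        return [self.memory[a] for a in self.produced_at]


acc = Accelerator()
acc.produce(0x10, "partial sum A")
acc.produce(0x20, "partial sum B")
print(acc.handle_request())  # the requester supplies no address
```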
-
Publication number: US20230401154A1
Publication date: 2023-12-14
Application number: US17835810
Filing date: 2022-06-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Abd ElMohsen Ibrahim, Onur Kayiran, Shaizeen Dilawarhusen Aga, Yasuko Eckert
IPC: G06F12/0862
CPC classification number: G06F12/0862, G06F2212/602
Abstract: A system and method for efficiently accessing sparse data for a workload are described. In various implementations, a computing system includes an integrated circuit and a memory for storing tasks of a workload that includes sparse accesses of data items stored in one or more tables. The integrated circuit receives a user query, and generates a result based on multiple data items targeted by the user query. To reduce the latency of processing the workload even with sparse lookup operations performed on the one or more tables, a prefetch engine of the integrated circuit stores a subset of data items in prefetch data storage. The prefetch engine also determines which data items to store in the prefetch data storage based on one or more of a frequency of reuse, a distance or latency of access of a corresponding table of the one or more tables, or other criteria.
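A selection policy of this kind might be sketched as a simple ranking. The scoring rule below (reuse count multiplied by the latency of the item's table) is an assumption chosen for illustration, as are the function name `choose_prefetch_set` and the `(table, key)` trace format; the abstract does not specify how the criteria are combined.

```python
from collections import Counter


def choose_prefetch_set(access_trace, table_latency, capacity):
    """Pick up to `capacity` items with the highest reuse_count * table_latency score.

    access_trace: iterable of (table, key) lookups seen so far
    table_latency: table -> access latency in arbitrary units (hypothetical input)
    """
    reuse = Counter(access_trace)

    def score(item):
        table, _key = item
        return reuse[item] * table_latency[table]

    # Sort by descending score; break ties by item for a deterministic result.
    ranked = sorted(reuse, key=lambda item: (-score(item), item))
    return set(ranked[:capacity])


trace = [("A", 1)] * 3 + [("B", 7)] * 2 + [("A", 2)]
latency = {"A": 10, "B": 100}
print(choose_prefetch_set(trace, latency, capacity=2))
```

In this trace the item from the slower table `B` outranks a more frequently reused item from table `A`, which shows how latency and reuse frequency can trade off.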
-
Publication number: US20230359558A1
Publication date: 2023-11-09
Application number: US17739817
Filing date: 2022-05-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga, Mohamed Assem Abd ElMohsen Ibrahim
IPC: G06F12/0804
CPC classification number: G06F12/0804, G06F2212/251
Abstract: An approach is provided for skipping, i.e., not processing and/or deleting, near-memory processing commands when one or more skip criteria are satisfied. Examples of skip criteria include, without limitation, specific operations, specific operands, and combinations of specific operations and specific operands. The approach is implemented at one or more memory command processing elements in the memory pipeline of a processor, such as memory controllers, caches, queues, and buffers. Implementations include exceptions to skipping in certain situations and software support for configuring skip criteria, including particular operations and operands for which skip checking is performed. The approach provides the benefits of reducing command bus traffic and power consumption while maintaining functional correctness.
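A skip check on combinations of operation and operand could look like the sketch below. The command layout (`PimCommand`), the example criteria (adding 0, multiplying by 1, OR with 0), and the function names are assumptions for illustration; the exceptions to skipping that the abstract mentions are not modeled.

```python
from typing import NamedTuple


class PimCommand(NamedTuple):
    op: str
    operand: int
    addr: int


# Hypothetical, software-configurable skip criteria: (operation, operand) pairs
# whose execution would leave the target value unchanged.
SKIP_CRITERIA = {("add", 0), ("mul", 1), ("or", 0)}


def should_skip(cmd, criteria=SKIP_CRITERIA):
    return (cmd.op, cmd.operand) in criteria


def filter_commands(commands, criteria=SKIP_CRITERIA):
    """Drop commands that satisfy a skip criterion before they reach the command bus."""
    return [cmd for cmd in commands if not should_skip(cmd, criteria)]


queue = [
    PimCommand("add", 0, 0x100),  # no effect: skipped
    PimCommand("add", 5, 0x100),
    PimCommand("mul", 1, 0x104),  # no effect: skipped
    PimCommand("mul", 3, 0x104),
]
print(filter_commands(queue))
```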
-
Publication number: US20230098421A1
Publication date: 2023-03-30
Application number: US17490703
Filing date: 2021-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Onur Kayiran, Mohamed Assem Abd ElMohsen Ibrahim, Shaizeen Aga
Abstract: Methods and apparatuses include a processing unit which helps control the speed and computational resources required for arithmetic operations of two numbers in a first format. The control unit of the processing unit approximates the arithmetic operations using a plurality of decomposed numbers in a second format that facilitates faster calculations than the first format, such that performing arithmetic operations using the decomposed numbers is capable of approximating the results of the arithmetic operations of the two numbers in the first format.
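The idea of approximating an operation with decomposed lower-precision numbers can be sketched with a two-term split. The abstract does not name the formats, so this example assumes IEEE 754 half precision as the second format and Python floats (double precision) as the first; the helper names `to_half`, `decompose`, and `approx_mul` are hypothetical, and the partial products are accumulated in double precision for simplicity.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def decompose(x):
    """Split x into a half-precision leading part and a half-precision residual."""
    hi = to_half(x)
    lo = to_half(x - hi)
    return hi, lo


def approx_mul(x, y):
    """Approximate x * y from half-precision pieces, dropping the smallest term lo*lo."""
    x_hi, x_lo = decompose(x)
    y_hi, y_lo = decompose(y)
    return x_hi * y_hi + x_hi * y_lo + x_lo * y_hi


x, y = 3.14159265, 2.71828183
exact = x * y
single_term = to_half(x) * to_half(y)
print(abs(single_term - exact), abs(approx_mul(x, y) - exact))
```

Keeping more partial products trades extra low-precision multiplications for a smaller approximation error, which is the kind of speed and accuracy control the abstract describes.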
-
Publication number: US20230065546A1
Publication date: 2023-03-02
Application number: US17489576
Filing date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Abd ElMohsen Ibrahim, Onur Kayiran, Shaizeen Aga
IPC: G06F16/2457
Abstract: An electronic device includes a plurality of nodes, each node having a processor that performs operations for processing instances of input data through a model, a local memory that stores a separate portion of model data for the model, and a controller. The controller identifies model data that meets one or more predetermined conditions in the separate portion of the model data in the local memory in some or all of the nodes that is accessible by the processors when processing the instances of input data through the model. The controller then copies the model data that meets the one or more predetermined conditions from the separate portion of the model data in the local memory in the some or all of the nodes to local memories in other nodes. In this way, the controller distributes model data that meets the one or more predetermined conditions among the nodes, making the model data that meets the one or more predetermined conditions available to the nodes without performing remote memory accesses.
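The copy step can be sketched with per-node dictionaries standing in for local memories. The predetermined condition used here (an access count at or above a threshold) is only one assumed possibility, and the function name `replicate_hot_rows` and its arguments are hypothetical.

```python
def replicate_hot_rows(nodes, access_counts, threshold):
    """Copy model data that meets the condition into every node's local memory.

    nodes: list of dicts, each mapping row id -> value held in that node's local memory
    access_counts: row id -> observed number of accesses (hypothetical condition input)
    """
    # Identify qualifying rows first, so the copies below do not affect the selection.
    hot = {
        row: value
        for local_memory in nodes
        for row, value in local_memory.items()
        if access_counts.get(row, 0) >= threshold
    }
    for local_memory in nodes:
        for row, value in hot.items():
            local_memory.setdefault(row, value)  # a later lookup needs no remote access
    return sorted(hot)


nodes = [{0: "a", 1: "b"}, {2: "c", 3: "d"}]
counts = {1: 5, 2: 9, 3: 1}
print(replicate_hot_rows(nodes, counts, threshold=5), nodes)
```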
-
Publication number: US20240411462A1
Publication date: 2024-12-12
Application number: US18207314
Filing date: 2023-06-08
Applicant: Advanced Micro Devices, Inc.
IPC: G06F3/06
Abstract: Local and dynamic triggering of operations executed by a processing-in-memory component is described. In accordance with the described techniques, a processing-in-memory component receives a command from a host for execution by the processing-in-memory component. The processing-in-memory component references a tracking table that includes at least one entry associated with an operation performed as part of executing the command and identifies at least one additional command to be triggered locally after executing the command received from the host. Responsive to identifying that conditions associated with the at least one additional command are satisfied, the processing-in-memory component executes the at least one additional command, independent of instructions from the host.
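The tracking-table lookup can be sketched as follows. Everything here is an illustrative assumption: the `PimComponent` class, the `(op, register, value)` command tuples, the two toy operations `add` and `set`, and the table layout that maps an operation to `(condition, follow-up command)` entries.

```python
class PimComponent:
    def __init__(self):
        self.registers = {}
        self.tracking_table = {}  # op name -> list of (condition, follow-up command)
        self.executed = []        # (source, command) pairs, for inspection

    def add_trigger(self, op, condition, follow_up):
        self.tracking_table.setdefault(op, []).append((condition, follow_up))

    def _run(self, command, source):
        op, reg, value = command
        if op == "add":
            self.registers[reg] = self.registers.get(reg, 0) + value
        elif op == "set":
            self.registers[reg] = value
        else:
            raise ValueError(f"unknown operation: {op}")
        self.executed.append((source, command))

    def execute_from_host(self, command):
        """Run a host command, then fire any locally triggered follow-ups."""
        self._run(command, "host")
        for condition, follow_up in self.tracking_table.get(command[0], []):
            if condition(self.registers):
                self._run(follow_up, "local")  # no host instruction involved


pim = PimComponent()
pim.add_trigger("add", lambda regs: regs.get("count", 0) >= 3, ("set", "done", 1))
for _ in range(3):
    pim.execute_from_host(("add", "count", 1))
print(pim.registers, pim.executed[-1])
```

In this sketch follow-up commands do not trigger further entries; chained triggers would need an explicit policy to bound them.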
-
Publication number: US12118354B2
Publication date: 2024-10-15
Application number: US17899231
Filing date: 2022-08-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Meysam Taassori, Shaizeen Dilawarhusen Aga, Mohamed Assem Abd ElMohsen Ibrahim, Johnathan Robert Alsop
CPC classification number: G06F9/30036, G06F12/10, G06F16/2237
Abstract: A virtual padding unit provides a virtual padded data structure (e.g., virtually padded matrix) that provides output values for a padded data structure without storing all of the padding elements in memory. When the virtual padding unit receives a virtual memory address of a location in the virtual padded data structure, the virtual padding unit checks whether the location is a non-padded location in the virtual padded data structure or a padded location in the virtual padded data structure. If the location is a padded location in the virtual padded data structure, the virtual padding unit outputs a padding value rather than a value stored in the virtual padded data structure. If the location is a non-padded location in the virtual padded data structure, a value stored at the location is output.
-
Publication number: US11977782B2
Publication date: 2024-05-07
Application number: US17855442
Filing date: 2022-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Abd ElMohsen Ibrahim, Meysam Taassori, Mahzabeen Islam, Shaizeen Aga
IPC: G06F3/06
CPC classification number: G06F3/0659, G06F3/0613, G06F3/0673
Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.
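Reordering register-only PIM commands ahead of memory-referencing ones, subject to dependency constraints, can be sketched as a conservative hoisting pass. The `Command` record, the register-set dependency test, and the function `hoist_register_only` are assumptions for illustration; shadow row buffers and the interleaving with host memory commands are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    touches_memory: bool
    reads: frozenset = frozenset()   # registers read
    writes: frozenset = frozenset()  # registers written


def conflicts(earlier, later):
    """True if swapping the two would break a RAW, WAW, or WAR register dependency."""
    return bool(
        earlier.writes & (later.reads | later.writes) or later.writes & earlier.reads
    )


def hoist_register_only(commands):
    """Move each register-only command ahead of the memory-referencing commands
    directly before it, as long as no register dependency is crossed."""
    out = []
    for cmd in commands:
        pos = len(out)
        if not cmd.touches_memory:
            while pos > 0 and out[pos - 1].touches_memory and not conflicts(out[pos - 1], cmd):
                pos -= 1
        out.insert(pos, cmd)
    return out


program = [
    Command("load r0", True, writes=frozenset({"r0"})),
    Command("mov r1", False, writes=frozenset({"r1"})),
    Command("add r2", False, reads=frozenset({"r0", "r1"}), writes=frozenset({"r2"})),
    Command("store r2", True, reads=frozenset({"r2"})),
    Command("mov r3", False, writes=frozenset({"r3"})),
]
print([c.name for c in hoist_register_only(program)])
```

The pass never reorders register-only commands relative to each other, which keeps it simple at the cost of missing some legal moves.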