-
Publication Number: US12067642B2
Publication Date: 2024-08-20
Application Number: US17030024
Filing Date: 2020-09-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Niti Madan , Michael L. Chu , Ashwin Aji
CPC classification number: G06T1/60 , G06F3/0604 , G06F3/0631 , G06F3/0679 , G06F9/5016 , G06T1/20
Abstract: One or more processing units, such as a graphics processing unit (GPU), execute an application. A resource manager selectively allocates a first memory portion or a second memory portion to the processing units based on memory access characteristics. The first memory portion has a first latency that is lower than a second latency of the second memory portion. In some cases, the memory access characteristics indicate a latency sensitivity. In some cases, hints included in corresponding program code are used to determine the memory access characteristics. The memory access characteristics can also be determined by monitoring memory access requests, measuring a cache miss rate or a row buffer miss rate for the monitored memory access requests, and determining the memory access characteristics based on the cache miss rate or the row buffer miss rate.
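A minimal sketch of how such a resource manager might choose between the two memory portions, using either program hints or measured miss rates. The class names, the 20% miss-rate threshold, and the hint strings are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch: choose a lower-latency or higher-latency memory
# portion from program hints or measured cache / row-buffer miss rates.
# Names, threshold, and hint strings are assumptions for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AccessStats:
    cache_misses: int = 0
    row_buffer_misses: int = 0
    accesses: int = 0

class ResourceManager:
    def __init__(self, miss_rate_threshold: float = 0.2):
        self.miss_rate_threshold = miss_rate_threshold

    def latency_sensitive(self, stats: AccessStats, hint: Optional[str] = None) -> bool:
        # Hints embedded in the program code take precedence over measurement.
        if hint == "low_latency":
            return True
        if hint == "high_latency_ok":
            return False
        if stats.accesses == 0:
            return False
        miss_rate = max(stats.cache_misses, stats.row_buffer_misses) / stats.accesses
        # Frequent misses expose the workload to raw memory latency.
        return miss_rate > self.miss_rate_threshold

    def allocate(self, stats: AccessStats, hint: Optional[str] = None) -> str:
        return ("first_portion_low_latency"
                if self.latency_sensitive(stats, hint)
                else "second_portion_high_latency")

rm = ResourceManager()
print(rm.allocate(AccessStats(cache_misses=40, row_buffer_misses=10, accesses=100)))
```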
-
Publication Number: US09582402B2
Publication Date: 2017-02-28
Application Number: US14164220
Filing Date: 2014-01-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Steven K. Reinhardt , Michael L. Chu , Vinod Tipparaju , Walter B. Benton
CPC classification number: G06F11/3672 , G06F9/4843 , G06F11/34 , G06F11/3419 , G06F11/3471 , G06F11/3612
Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
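A rough sketch of the described flow, using threads to stand in for the networking and processing subsystems: the networking side writes task information straight into a shared task queue, and the processing side drains the queue later. The message layout, task fields, and queue API are assumptions for illustration.

```python
# Illustrative sketch: a "networking subsystem" thread stores task info in a
# shared task queue without interrupting the "processing subsystem" thread,
# which later retrieves and performs the tasks.
import queue
import threading

task_queue = queue.Queue()
STOP = object()  # sentinel used to shut the worker down

def networking_subsystem(task_message):
    # Update a task-queue entry with information from the task message;
    # no callback or interrupt is delivered to the processing thread.
    task_queue.put({"task_id": task_message["task_id"],
                    "payload": task_message["payload"]})

def processing_subsystem():
    # Subsequently retrieves task information and performs the task.
    while True:
        task = task_queue.get()
        if task is STOP:
            break
        print(f"performing task {task['task_id']} with {task['payload']}")

worker = threading.Thread(target=processing_subsystem)
worker.start()
networking_subsystem({"task_id": 1, "payload": "run kernel A"})
task_queue.put(STOP)
worker.join()
```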
-
Publication Number: US12019560B2
Publication Date: 2024-06-25
Application Number: US17556431
Filing Date: 2021-12-20
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sooraj Puthoor , Muhammad Amber Hassaan , Ashwin Aji , Michael L. Chu , Nuwan Jayasena
IPC: G06F12/10 , G06F12/02 , G06F12/1009 , G06F12/1045 , G06F12/1072 , G06F13/16
CPC classification number: G06F12/1072 , G06F12/0238 , G06F12/1009 , G06F12/1054 , G06F12/1063 , G06F13/1673 , G06F2212/7201
Abstract: Process isolation for a PIM device includes: receiving, from a process, a call to allocate a virtual address space where the process stores a PIM configuration context; allocating the virtual address space, which includes mapping a physical address space containing the PIM device configuration registers into the virtual address space only if that physical address space is not already mapped into another process's virtual address space; and programming the PIM device configuration space according to the configuration context. When a PIM command is executed, a translation mechanism determines whether there is a valid mapping of a virtual address of the PIM command to a physical address of a PIM resource, such as a LIS entry. If a valid mapping exists, the translation is completed and the resource is accessed; if there is no valid mapping, the translation fails and the process is blocked from accessing the PIM resource.
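A minimal sketch of the isolation check under these assumptions: the PIM configuration register space is mapped into at most one process's virtual address space, and a PIM command is serviced only if the issuing process holds a valid translation. The class, method names, and addresses are hypothetical.

```python
# Illustrative sketch: map the PIM config register space to one process at a
# time, and block PIM accesses from processes without a valid translation.
class PimIsolationManager:
    def __init__(self):
        self.config_space_owner = None   # pid that currently holds the mapping
        self.page_tables = {}            # pid -> {virtual address: physical address}

    def map_pim_config_space(self, pid, virt_addr, phys_addr):
        # Map only if the physical config space is not mapped by another process.
        if self.config_space_owner not in (None, pid):
            raise PermissionError("PIM config space already mapped by another process")
        self.config_space_owner = pid
        self.page_tables.setdefault(pid, {})[virt_addr] = phys_addr

    def translate(self, pid, virt_addr):
        # A PIM command reaches the PIM resource only via a valid mapping.
        mapping = self.page_tables.get(pid, {})
        if virt_addr not in mapping:
            raise PermissionError("no valid mapping: PIM access blocked")
        return mapping[virt_addr]

mgr = PimIsolationManager()
mgr.map_pim_config_space(pid=100, virt_addr=0x7000, phys_addr=0xF000)
print(hex(mgr.translate(100, 0x7000)))   # allowed for the owning process
# mgr.translate(200, 0x7000)             # would raise: process 200 is blocked
```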
-
Publication Number: US11934698B2
Publication Date: 2024-03-19
Application Number: US17556503
Filing Date: 2021-12-20
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sooraj Puthoor , Muhammad Amber Hassaan , Ashwin Aji , Michael L. Chu , Nuwan Jayasena
CPC classification number: G06F3/0659 , G06F3/0622 , G06F3/0631 , G06F3/0656 , G06F3/0679 , G06F7/575
Abstract: Process isolation for a PIM device through exclusive locking includes receiving, from a process, a call requesting ownership of a PIM device. The request includes one or more PIM configuration parameters. The exclusive locking technique also includes granting the process ownership of the PIM device responsive to determining that ownership is available. The PIM device is configured according to the PIM configuration parameters.
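A short sketch of the exclusive-locking idea: a process requests ownership of the PIM device along with its configuration parameters, and ownership is granted only if no other process currently holds it. The class, the non-blocking lock, and the parameter shape are assumptions.

```python
# Illustrative sketch: grant exclusive ownership of a PIM device on request,
# configuring it from the caller's parameters; deny if already owned.
import threading

class PimDevice:
    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None
        self.config = None

    def request_ownership(self, pid, pim_config):
        # Grant ownership only if it is currently available.
        if not self._lock.acquire(blocking=False):
            return False
        self.owner = pid
        self.config = dict(pim_config)   # configure per the caller's parameters
        return True

    def release_ownership(self, pid):
        if self.owner == pid:
            self.owner = None
            self.config = None
            self._lock.release()

device = PimDevice()
assert device.request_ownership(100, {"vector_width": 256}) is True
assert device.request_ownership(200, {"vector_width": 128}) is False  # already owned
device.release_ownership(100)
```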
-
Publication Number: US12158842B2
Publication Date: 2024-12-03
Application Number: US17956995
Filing Date: 2022-09-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Benjamin Youngjae Cho , Armand Bahram Behroozi , Michael L. Chu , Ashwin Aji
Abstract: A processing system allocates memory so that the input and output operands of processing-in-memory (PIM) operations are co-located in the same PIM-local memory, while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies virtual rows that span all of the banks of a memory device as "super rows". Each super row has a different bank-interleaving pattern, referred to as a "color", and a group of contiguous super rows with the same PIM-interleaving pattern is referred to as a "color group". The processing system assigns the memory addresses of each operand (e.g., vector) of a PIM operation to a super row of a different color within the same color group, co-locating the operands for each PIM execution unit, and uses address hashing to alternate between the banks assigned to elements of the first operand and elements of the second operand of the operation.
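A toy model of the color idea, assuming a simple rotate-and-unhash mapping; the bank count, the rotation scheme, and the hash function are illustrative assumptions rather than the patented address mapping.

```python
# Toy model (an assumption, not the patented mapping): a super row spans all
# banks, each color rotates the bank-interleaving pattern, and an address
# hash folds the color back out of the bank index so that element i of every
# operand in one color group resolves to the same bank, i.e. sits next to the
# same PIM execution unit.
NUM_BANKS = 8

def physical_bank(element_index, color):
    # Raw interleaving of a super row: each color is a rotated bank pattern.
    return (element_index + color) % NUM_BANKS

def hashed_bank(element_index, color):
    # Address hashing compensates for the color rotation on PIM accesses.
    return (physical_bank(element_index, color) - color) % NUM_BANKS

vector_a_color, vector_b_color = 0, 1      # two operands, same color group
for i in range(2 * NUM_BANKS):
    assert hashed_bank(i, vector_a_color) == hashed_bank(i, vector_b_color)
print("corresponding elements of the two operands are co-located per bank")
```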
-
Publication Number: US11468001B1
Publication Date: 2022-10-11
Application Number: US17217792
Filing Date: 2021-03-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Muhammad Amber Hassaan , Michael L. Chu , Ashwin Aji
IPC: G06F15/78
Abstract: A processing system includes a processing unit and a memory device. The memory device includes a processing-in-memory (PIM) module that performs processing operations on behalf of the processing unit. An instruction set architecture (ISA) of the PIM module has fewer instructions than an ISA of the processing unit. Instructions received from the processing unit are translated such that processing resources of the PIM module are virtualized. As a result, the PIM module concurrently performs processing operations for multiple threads or applications of the processing unit.
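A small sketch of the translation and virtualization step: host instructions are lowered into a smaller PIM instruction set, and the PIM module's physical resources are virtualized so requests from multiple threads can share them. The opcode tables, slot counts, and class names are assumptions for illustration.

```python
# Illustrative sketch: translate host ops into a reduced PIM ISA and map each
# thread's virtual PIM resources onto shared physical slots.
PIM_ISA = {"load", "store", "add", "mul"}          # reduced PIM instruction set
HOST_TO_PIM = {
    "vload": ["load"],
    "vstore": ["store"],
    "vfma": ["mul", "add"],                        # one host op -> several PIM ops
}

class PimVirtualizer:
    def __init__(self, physical_slots=4):
        self.free_slots = list(range(physical_slots))
        self.slot_of = {}                          # (thread_id, virt_slot) -> phys slot

    def translate(self, host_instruction):
        pim_ops = HOST_TO_PIM[host_instruction]
        assert all(op in PIM_ISA for op in pim_ops)
        return pim_ops

    def bind(self, thread_id, virt_slot):
        # Lazily map a thread's virtual PIM resource to a free physical slot.
        key = (thread_id, virt_slot)
        if key not in self.slot_of:
            self.slot_of[key] = self.free_slots.pop()
        return self.slot_of[key]

v = PimVirtualizer()
print(v.translate("vfma"), "-> slot", v.bind(thread_id=0, virt_slot=0))
print(v.translate("vload"), "-> slot", v.bind(thread_id=1, virt_slot=0))
```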
-
Publication Number: US20140331230A1
Publication Date: 2014-11-06
Application Number: US14164220
Filing Date: 2014-01-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Steven K. Reinhardt , Michael L. Chu , Vinod Tipparaju , Walter B. Benton
IPC: G06F9/48
CPC classification number: G06F11/3672 , G06F9/4843 , G06F11/34 , G06F11/3419 , G06F11/3471 , G06F11/3612
Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
-
Publication Number: US20240111672A1
Publication Date: 2024-04-04
Application Number: US17956995
Filing Date: 2022-09-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Benjamin Youngjae Cho , Armand Bahram Behroozi , Michael L. Chu , Ashwin Aji
CPC classification number: G06F12/0607 , G06F12/0223 , G06F2212/1024
Abstract: A processing system allocates memory so that the input and output operands of processing-in-memory (PIM) operations are co-located in the same PIM-local memory, while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies virtual rows that span all of the banks of a memory device as "super rows". Each super row has a different bank-interleaving pattern, referred to as a "color", and a group of contiguous super rows with the same PIM-interleaving pattern is referred to as a "color group". The processing system assigns the memory addresses of each operand (e.g., vector) of a PIM operation to a super row of a different color within the same color group, co-locating the operands for each PIM execution unit, and uses address hashing to alternate between the banks assigned to elements of the first operand and elements of the second operand of the operation.
-
Publication Number: US20250004730A1
Publication Date: 2025-01-02
Application Number: US18342347
Filing Date: 2023-06-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Emily Anne Furst , Robin Conradine Knauerhase , Sangeeta Chowdhary , Michael L. Chu
IPC: G06F8/41
Abstract: Selecting intermediate representation transformations for compilation is described. In accordance with the described techniques, source code is received to be compiled by a compilation system for execution by a hardware processor. Intermediate representation transformations are selected for the source code based on system load information associated with the hardware, and the selected transformations are output to the compilation system.
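A minimal sketch of load-aware transformation selection under stated assumptions: the load metric is the Unix 1-minute load average normalized by CPU count, and the pass names and thresholds are hypothetical stand-ins for the system's actual transformation set.

```python
# Illustrative sketch: pick which intermediate-representation transformations
# to apply based on current system load. Pass names, thresholds, and the load
# metric are assumptions for illustration only.
import os

def system_load_fraction():
    # 1-minute load average normalized by CPU count (Unix-style), standing in
    # for the "system load information associated with the hardware".
    return os.getloadavg()[0] / (os.cpu_count() or 1)

def select_ir_transformations(load_fraction):
    if load_fraction > 0.8:
        # Heavily loaded: keep compilation cheap.
        return ["mem2reg", "simplifycfg"]
    if load_fraction > 0.4:
        return ["mem2reg", "simplifycfg", "inline"]
    # Lightly loaded: afford more aggressive optimization passes.
    return ["mem2reg", "simplifycfg", "inline", "loop-unroll", "vectorize"]

passes = select_ir_transformations(system_load_fraction())
print("IR transformations for this compile:", passes)
```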
-
Publication Number: US20240272791A1
Publication Date: 2024-08-15
Application Number: US18108653
Filing Date: 2023-02-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Benjamin Youngjae Cho , Armand Bahram Behroozi , Michael L. Chu , Emily Anne Furst
IPC: G06F3/06
CPC classification number: G06F3/0604 , G06F3/0629 , G06F3/0671 , G06F9/4881 , G06F9/5016 , G06F9/5066
Abstract: Automatic generation of data layout instructions for locating data objects in memory that are involved in a sequence of operations for a computational task is described. In accordance with the described techniques, an interference graph is generated for the sequence of operations, where individual nodes in the interference graph represent data objects involved in the computational task. The interference graph includes edges connecting different pairs of nodes, such that an edge indicates the connected data objects are involved in a common operation of the sequence of operations. Weights are assigned to edges based on architectural characteristics of a system performing the computational task as well as a size of the data objects connected by an edge. Individual data objects are then assigned to locations in memory based on edge weights of edges connected to a node representing the data object, optimizing system performance during the computational task.
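A compact sketch of the interference-graph construction and a greedy placement pass; the weighting formula, the two-tier memory model, and the greedy heuristic are assumptions and not the patented assignment algorithm.

```python
# Illustrative sketch: build a weighted interference graph over data objects
# and place the most heavily connected objects into the fastest memory first.
from collections import defaultdict

def build_interference_graph(operations, object_sizes, arch_factor=1.0):
    # operations: list of sets of data-object names used by a common operation.
    weights = defaultdict(float)
    for op_objects in operations:
        objs = sorted(op_objects)
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                # Edge weight grows with the connected objects' sizes, scaled
                # by an architecture-dependent factor.
                edge = (objs[i], objs[j])
                weights[edge] += arch_factor * (object_sizes[objs[i]] + object_sizes[objs[j]])
    return weights

def assign_locations(weights, fast_capacity, object_sizes):
    # Rank objects by the total weight of their incident edges.
    score = defaultdict(float)
    for (a, b), w in weights.items():
        score[a] += w
        score[b] += w
    placement, used = {}, 0
    for obj in sorted(score, key=score.get, reverse=True):
        if used + object_sizes[obj] <= fast_capacity:
            placement[obj] = "fast_memory"
            used += object_sizes[obj]
        else:
            placement[obj] = "slow_memory"
    return placement

ops = [{"A", "B"}, {"B", "C"}, {"A", "C", "D"}]
sizes = {"A": 4, "B": 8, "C": 2, "D": 16}
graph = build_interference_graph(ops, sizes)
print(assign_locations(graph, fast_capacity=12, object_sizes=sizes))
```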