Memory latency-aware GPU architecture

    Publication No.: US12067642B2

    Publication Date: 2024-08-20

    Application No.: US17030024

    Application Date: 2020-09-23

    Abstract: One or more processing units, such as a graphics processing unit (GPU), execute an application. A resource manager selectively allocates a first memory portion or a second memory portion to the processing units based on memory access characteristics. The first memory portion has a first latency that is lower than a second latency of the second memory portion. In some cases, the memory access characteristics indicate a latency sensitivity. In some cases, hints included in corresponding program code are used to determine the memory access characteristics. The memory access characteristics can also be determined by monitoring memory access requests, measuring a cache miss rate or a row-buffer miss rate for the monitored requests, and deriving the memory access characteristics from those rates.
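
    As a rough illustration of this policy, the Python sketch below models a resource manager choosing between two memory pools. It is a minimal model under stated assumptions, not the patented implementation: the Pool and AccessStats classes, the hint strings, and the 0.25 miss-rate threshold are all invented for illustration.

```python
# Minimal model of latency-aware allocation. Pool names, hint strings,
# and the miss-rate threshold are illustrative assumptions.

class Pool:
    """A toy memory pool; alloc() just describes the request."""
    def __init__(self, name: str, latency_ns: int):
        self.name, self.latency_ns = name, latency_ns

    def alloc(self, size: int) -> str:
        return f"{size} B from {self.name} ({self.latency_ns} ns)"

class AccessStats:
    """Counters gathered by monitoring memory access requests."""
    def __init__(self, accesses=0, cache_misses=0, row_buffer_misses=0):
        self.accesses = accesses
        self.cache_misses = cache_misses
        self.row_buffer_misses = row_buffer_misses

    def worst_miss_rate(self) -> float:
        if self.accesses == 0:
            return 0.0
        return max(self.cache_misses, self.row_buffer_misses) / self.accesses

class ResourceManager:
    MISS_RATE_THRESHOLD = 0.25   # assumed cutoff for latency sensitivity

    def __init__(self, low_latency: Pool, high_latency: Pool):
        self.low, self.high = low_latency, high_latency

    def allocate(self, size: int, stats: AccessStats, hint: str = ""):
        # A hint embedded in program code overrides monitored statistics.
        if hint == "latency_sensitive":
            return self.low.alloc(size)
        if hint == "latency_tolerant":
            return self.high.alloc(size)
        # Otherwise infer sensitivity: frequent cache or row-buffer misses
        # mean more requests pay the full memory latency.
        if stats.worst_miss_rate() > self.MISS_RATE_THRESHOLD:
            return self.low.alloc(size)
        return self.high.alloc(size)

rm = ResourceManager(Pool("on-package", 60), Pool("off-package", 120))
print(rm.allocate(4096, AccessStats(accesses=100, cache_misses=40)))  # low latency
print(rm.allocate(4096, AccessStats(accesses=100, cache_misses=5)))   # high latency
```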

    Remote task queuing by networked computing devices
    Granted Patent (In Force)

    Publication No.: US09582402B2

    Publication Date: 2017-02-28

    Application No.: US14164220

    Application Date: 2014-01-26

    Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.
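
    The division of labor in this abstract can be sketched with two Python threads standing in for the two subsystems. This is a toy model, not the patented design: the message fields, the stop sentinel, and the queue layout are assumptions.

```python
# Toy model: the "networking subsystem" enqueues task information directly,
# so the "processing subsystem" does no work until it dequeues a task.

import queue
import threading

task_queue: "queue.Queue[dict]" = queue.Queue()

def networking_subsystem(task_message: dict) -> None:
    # Update a task-queue entry with the task information from the message;
    # the processing subsystem is not involved in handling the message.
    task_queue.put({"op": task_message["op"], "args": task_message["args"]})

def processing_subsystem() -> None:
    # Later, retrieve task information from the queue and perform the task.
    while True:
        task = task_queue.get()
        if task["op"] == "stop":
            break
        print(f"performing {task['op']} on {task['args']}")

worker = threading.Thread(target=processing_subsystem)
worker.start()
# Task messages arriving "from the first computing device":
networking_subsystem({"op": "sum", "args": [1, 2, 3]})
networking_subsystem({"op": "stop", "args": []})
worker.join()
```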


    Data co-location using address hashing for high-performance processing in memory

    Publication No.: US12158842B2

    Publication Date: 2024-12-03

    Application No.: US17956995

    Application Date: 2022-09-30

    Abstract: A processing system allocates memory so that the input and output operands of processing-in-memory (PIM) operations are co-located in the same PIM-local memory, while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies as “super rows” virtual rows that span all the banks of a memory device. Each super row has a different bank-interleaving pattern, referred to as a “color”, and a group of contiguous super rows that shares the same PIM-interleaving pattern is referred to as a “color group”. The processing system assigns each operand (e.g., vector) of a PIM operation to a super row having a different color within the same color group, co-locating the operands for each PIM execution unit, and uses address hashing to alternate between the banks assigned to elements of the first operand and elements of the second operand.
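
    The color/bank interplay can be made concrete with a toy address hash. The mapping below is purely an assumption for illustration (the abstract does not disclose the actual hash): element i of a super row with color c is steered to bank (i XOR c) mod NUM_BANKS, so operands placed in different-color super rows of the same color group never collide on a bank at the same element index.

```python
# Toy XOR address hash for color-based bank interleaving. This exact
# mapping is an illustrative assumption, not the patented hash function.

NUM_BANKS = 4

def bank_of(element_index: int, color: int) -> int:
    # Each super row spans all banks; its color picks the interleaving.
    return (element_index ^ color) % NUM_BANKS

# Operand A lives in a color-0 super row and operand B in a color-1
# super row of the same color group, so element i of A and element i
# of B always land in different banks of the same PIM-local memory.
for i in range(8):
    print(f"element {i}: A -> bank {bank_of(i, 0)}, B -> bank {bank_of(i, 1)}")
```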

    Processing-in-memory concurrent processing system and method

    Publication No.: US11468001B1

    Publication Date: 2022-10-11

    Application No.: US17217792

    Application Date: 2021-03-30

    Abstract: A processing system includes a processing unit and a memory device. The memory device includes a processing-in-memory (PIM) module that performs processing operations on behalf of the processing unit. An instruction set architecture (ISA) of the PIM module has fewer instructions than an ISA of the processing unit. Instructions received from the processing unit are translated such that processing resources of the PIM module are virtualized. As a result, the PIM module concurrently performs processing operations for multiple threads or applications of the processing unit.
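
    A sketch of what translation plus register virtualization might look like follows. The three-instruction PIM ISA, the register count, and the tuple encoding are all invented for illustration; only the underlying idea, renaming each thread's virtual registers onto distinct physical PIM registers, reflects the abstract.

```python
# Toy instruction translation with register virtualization. The reduced
# PIM ISA, register count, and instruction encoding are all assumptions.

PIM_ISA = {"LOAD", "STORE", "ADD"}   # assumed reduced instruction set
NUM_PHYS_REGS = 8

reg_map: dict = {}   # (thread_id, virtual_reg) -> physical PIM register

def map_register(thread_id: int, virt_reg: int) -> int:
    key = (thread_id, virt_reg)
    if key not in reg_map:
        if len(reg_map) >= NUM_PHYS_REGS:
            raise RuntimeError("out of physical PIM registers")
        reg_map[key] = len(reg_map)
    return reg_map[key]

def translate(thread_id: int, host_instr: tuple) -> tuple:
    # Rewrite a host instruction into the smaller PIM ISA, renaming the
    # thread's virtual registers onto distinct physical registers.
    op, *virt_regs = host_instr
    if op not in PIM_ISA:
        raise ValueError(f"no PIM equivalent for {op} in this sketch")
    return (op, *[map_register(thread_id, r) for r in virt_regs])

# Two threads use the same virtual register numbers yet receive distinct
# physical registers, so their PIM operations can be interleaved.
print(translate(0, ("ADD", 0, 1)))   # -> ('ADD', 0, 1)
print(translate(1, ("ADD", 0, 1)))   # -> ('ADD', 2, 3)
```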

    Remote Task Queuing by Networked Computing Devices
    Patent Application (In Force)

    Publication No.: US20140331230A1

    Publication Date: 2014-11-06

    Application No.: US14164220

    Application Date: 2014-01-26

    Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.


    Data co-location using address hashing for high-performance processing in memory

    Publication No.: US20240111672A1

    Publication Date: 2024-04-04

    Application No.: US17956995

    Application Date: 2022-09-30

    CPC classification number: G06F12/0607 G06F12/0223 G06F2212/1024

    Abstract: A processing system allocates memory so that the input and output operands of processing-in-memory (PIM) operations are co-located in the same PIM-local memory, while exploiting row-buffer locality and complying with conventional memory abstraction. The processing system identifies as “super rows” virtual rows that span all the banks of a memory device. Each super row has a different bank-interleaving pattern, referred to as a “color”, and a group of contiguous super rows that shares the same PIM-interleaving pattern is referred to as a “color group”. The processing system assigns each operand (e.g., vector) of a PIM operation to a super row having a different color within the same color group, co-locating the operands for each PIM execution unit, and uses address hashing to alternate between the banks assigned to elements of the first operand and elements of the second operand.

    Automatic Data Layout for Operation Chains
    Published Application

    Publication No.: US20240272791A1

    Publication Date: 2024-08-15

    Application No.: US18108653

    Application Date: 2023-02-12

    Abstract: Automatic generation of data layout instructions for locating in memory the data objects involved in a sequence of operations for a computational task is described. In accordance with the described techniques, an interference graph is generated for the sequence of operations, where individual nodes represent the data objects involved in the computational task. The interference graph includes edges connecting pairs of nodes, where an edge indicates that the connected data objects are involved in a common operation of the sequence. Weights are assigned to edges based on architectural characteristics of the system performing the computational task and on the sizes of the data objects the edge connects. Each data object is then assigned to a location in memory based on the weights of the edges connected to its node, optimizing system performance during the computational task.
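
    A compact sketch of this pipeline follows. The operand chains, the size-based edge weights, and the two-region greedy placement are all illustrative assumptions standing in for real architectural characteristics.

```python
# Toy interference-graph layout. The operand sets, the size-based edge
# weights, and the two-region greedy placement are assumptions.

from collections import defaultdict
from itertools import combinations

operations = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]  # operand sets
sizes = {"A": 4096, "B": 8192, "C": 4096, "D": 1024}           # bytes
NUM_REGIONS = 2   # assumed architectural characteristic (e.g., channels)

# Edge between two data objects whenever they share an operation; the
# weight here is simply the combined size of the connected objects.
weights = defaultdict(int)
for operands in operations:
    for u, v in combinations(operands, 2):
        weights[frozenset((u, v))] += sizes[u] + sizes[v]

# Greedy placement: walk edges by descending weight and steer each pair
# of heavily interfering objects toward different memory regions.
placement = {}
for edge, _ in sorted(weights.items(), key=lambda kv: -kv[1]):
    u, v = sorted(edge)
    placement.setdefault(u, len(placement) % NUM_REGIONS)
    placement.setdefault(v, (placement[u] + 1) % NUM_REGIONS)
print(placement)   # e.g., {'A': 0, 'B': 1, 'C': 0, 'D': 1}
```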
