Abstract:
A data processing device is provided that includes an array of working memory banks and an associated processing engine. The working memory bank array is configured with at least one independently activatable memory bank. A dirty data counter (DDC) is associated with the independently activatable memory bank and is configured to reflect a count of dirty data migrated from that bank upon its selective deactivation. The DDC is configured to selectively decrement the count of dirty data upon reactivation of the independently activatable memory bank in connection with a transient state. In the transient state, each dirty data access by the processing engine to the reactivated memory bank is also conducted with respect to another memory bank of the array. When dirty data is found in the other memory bank, the count of dirty data is decremented.
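A minimal Python sketch of the counting scheme described above, assuming a two-bank array; the class and method names (Bank, DDCBankArray, access_dirty) are illustrative, not terminology from the abstract beyond the DDC itself.

```python
# Behavioral sketch of the dirty data counter (DDC) scheme described above.
# Class and method names are illustrative assumptions, not taken from the source.

class Bank:
    def __init__(self, name):
        self.name = name
        self.active = True
        self.lines = {}          # address -> (data, dirty_flag)

class DDCBankArray:
    def __init__(self):
        self.banks = {"A": Bank("A"), "B": Bank("B")}
        self.ddc = {"A": 0, "B": 0}            # dirty data counter per bank
        self.transient = {"A": False, "B": False}

    def deactivate(self, name, spill_to):
        """Migrate dirty lines to another bank and record their count in the DDC."""
        bank, peer = self.banks[name], self.banks[spill_to]
        dirty = {a: v for a, v in bank.lines.items() if v[1]}
        peer.lines.update(dirty)
        self.ddc[name] = len(dirty)
        bank.lines.clear()
        bank.active = False

    def reactivate(self, name):
        """Re-enable the bank; a nonzero DDC puts it in the transient state."""
        self.banks[name].active = True
        self.transient[name] = self.ddc[name] > 0

    def access_dirty(self, name, addr, peer_name):
        """During the transient state a dirty access also probes the peer bank;
        finding migrated dirty data there decrements the DDC."""
        bank, peer = self.banks[name], self.banks[peer_name]
        if self.transient[name] and addr in peer.lines and peer.lines[addr][1]:
            data = peer.lines.pop(addr)
            bank.lines[addr] = data
            self.ddc[name] -= 1
            if self.ddc[name] == 0:
                self.transient[name] = False   # all migrated dirty data reclaimed
            return data[0]
        return bank.lines.get(addr, (None, False))[0]

arr = DDCBankArray()
arr.banks["A"].lines[0x10] = ("stack_frame", True)   # a dirty line in bank A
arr.deactivate("A", spill_to="B")
arr.reactivate("A")                                  # DDC["A"] == 1 -> transient state
arr.access_dirty("A", 0x10, peer_name="B")           # dirty hit in B; DDC drops to 0
```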
Abstract:
A method of storing stack data in a cache hierarchy is provided. The cache hierarchy comprises a data cache and a stack filter cache. Responsive to a request to access a stack data block, the method stores the stack data block in the stack filter cache, wherein the stack filter cache is configured to store any requested stack data block.
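A hedged Python sketch of the placement decision described above, assuming simple dictionary-backed caches; the SimpleCache and CacheHierarchy names, the capacities, and the FIFO eviction are illustrative choices, not the claimed design.

```python
# Sketch of routing stack accesses into a dedicated stack filter cache.

class SimpleCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}          # block address -> data (FIFO eviction for brevity)

    def insert(self, addr, data):
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))   # evict the oldest entry
        self.store[addr] = data

    def lookup(self, addr):
        return self.store.get(addr)

class CacheHierarchy:
    def __init__(self):
        self.data_cache = SimpleCache(capacity=1024)
        self.stack_filter_cache = SimpleCache(capacity=64)

    def access(self, addr, data, is_stack):
        """Any requested stack data block is stored in the stack filter cache;
        other requests go to the ordinary data cache."""
        target = self.stack_filter_cache if is_stack else self.data_cache
        hit = target.lookup(addr)
        if hit is None:
            target.insert(addr, data)     # fill on miss
        return hit

hier = CacheHierarchy()
hier.access(0x7FFF_F000, b"frame", is_stack=True)    # fills the stack filter cache
hier.access(0x0040_0000, b"heap",  is_stack=False)   # fills the ordinary data cache
```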
Abstract:
Permute instructions for register-based lookups are described. In accordance with the described techniques, a processor is configured to perform a register-based lookup by retrieving a first result from a first lookup table based on a subset of bits included in an index of a destination register, retrieving a second result from a second lookup table based on the same subset of bits, selecting the first result or the second result based on a bit in the index that is excluded from the subset of bits, and overwriting data included in the index of the destination register using the selected one of the first result or the second result.
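A small Python sketch of the lane-wise selection described above; the 3-bit subset, the choice of bit 3 as the excluded selector bit, and the 8-entry tables are assumptions made for illustration.

```python
# Sketch of the register-based lookup: low bits address both tables, one
# excluded bit picks which table's result overwrites the lane.

def permute_lookup(dest_register, table0, table1):
    """For each index held in the destination register, fetch a result from both
    lookup tables using the bit subset, pick one result with the excluded bit,
    and overwrite that lane with the selected result."""
    out = []
    for index in dest_register:
        subset = index & 0b111          # subset of bits used to address each table
        selector = (index >> 3) & 0b1   # bit excluded from the subset
        first = table0[subset]          # result retrieved from the first table
        second = table1[subset]         # result retrieved from the second table
        out.append(second if selector else first)
    return out

# Example: indexes 0-7 select from table0, indexes 8-15 select from table1.
table0 = list(range(100, 108))
table1 = list(range(200, 208))
print(permute_lookup([0, 9, 3, 12], table0, table1))   # [100, 201, 103, 204]
```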
Abstract:
A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
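A rough Python sketch of the probe-filter bookkeeping described above; the signature format, the class name PimProbeFilter, and the exact load/store responses are assumptions about one plausible behavior, not the claimed implementation.

```python
# Sketch of a PIM probe filter (PimPF) keeping a directory, separate from the
# system-level directories, keyed by the address signatures of broadcast PIM
# commands.

class PimProbeFilter:
    def __init__(self):
        # signature -> set of cache IDs that may hold copies of the
        # addresses covered by a broadcast PIM command
        self.directory = {}

    def record(self, signature, cache_id):
        self.directory.setdefault(signature, set()).add(cache_id)

    def pim_load(self, signature):
        """Lightweight path: a broadcast PIM load into registers only needs the
        caches named in the PimPF entry to be probed, not every system directory."""
        return sorted(self.directory.get(signature, set()))

    def pim_store(self, signature):
        """Heavyweight path: a PIM store into memory also drops the tracked
        entry so stale cached copies are not consulted afterwards."""
        return sorted(self.directory.pop(signature, set()))

pf = PimProbeFilter()
pf.record(signature=0xAB, cache_id="L2_0")
pf.record(signature=0xAB, cache_id="L2_3")
print(pf.pim_load(0xAB))    # only L2_0 and L2_3 need to be probed
```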
Abstract:
Methods and systems for load balancing in a neural network system using metadata are disclosed. Any one or a combination of one or more kernels, one or more neurons, and one or more layers of the neural network system are tagged with metadata. A scheduler detects whether there are neurons that are available to execute. The scheduler uses the metadata to schedule and load balance computations across the available compute resources.
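A minimal Python sketch of how metadata tags could drive the scheduling step; the metadata fields (layer, cost), the readiness flag, and the greedy least-loaded policy are assumptions for illustration, not the claimed method.

```python
# Sketch of metadata-driven scheduling and load balancing across compute units.

def schedule(neurons, compute_units):
    """Assign each neuron that is available to execute to a compute unit,
    balancing estimated cost (taken from its metadata) across the units."""
    load = {cu: 0 for cu in compute_units}
    placement = {}
    ready = [n for n in neurons if n["ready"]]                # available to execute
    for neuron in sorted(ready, key=lambda n: -n["metadata"]["cost"]):
        target = min(load, key=load.get)                      # least-loaded unit
        placement[neuron["id"]] = target
        load[target] += neuron["metadata"]["cost"]
    return placement

neurons = [
    {"id": "n0", "ready": True,  "metadata": {"layer": 0, "cost": 4}},
    {"id": "n1", "ready": True,  "metadata": {"layer": 0, "cost": 1}},
    {"id": "n2", "ready": False, "metadata": {"layer": 1, "cost": 8}},
]
print(schedule(neurons, ["cu0", "cu1"]))   # {'n0': 'cu0', 'n1': 'cu1'}
```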
Abstract:
A processing system includes a first set of one or more processing units including a first processing unit, a second set of one or more processing units including a second processing unit, and a memory having an address space shared by the first and second sets. The processing system further includes a distributed coherence directory subsystem having a first coherence directory to support a first subset of one or more address regions of the address space and a second coherence directory to support a second subset of one or more address regions of the address space. In some implementations, the first coherence directory is implemented in the system so as to have a lower access latency for the first set, whereas the second coherence directory is implemented in the system so as to have a lower access latency for the second set.
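A short Python sketch of steering coherence lookups to the directory that covers a given address region; the fixed 1 GiB region size and the directory labels are illustrative assumptions about placement near each processing-unit set.

```python
# Sketch of a distributed coherence directory split by address region.

REGION_BITS = 30                      # assume 1 GiB address regions

class DistributedDirectory:
    def __init__(self, regions_for_dir0, regions_for_dir1):
        # Each directory supports a subset of the shared address space and is
        # placed so it has lower access latency for the set that uses it most.
        self.home = {}
        self.home.update({r: "dir0_near_set1" for r in regions_for_dir0})
        self.home.update({r: "dir1_near_set2" for r in regions_for_dir1})

    def directory_for(self, address):
        region = address >> REGION_BITS
        return self.home[region]

dirs = DistributedDirectory(regions_for_dir0={0, 1}, regions_for_dir1={2, 3})
print(dirs.directory_for(0x1234_5678))       # region 0 -> dir0_near_set1
print(dirs.directory_for(0xC000_0000))       # region 3 -> dir1_near_set2
```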
Abstract:
A processor distributes memory timing parameters and data among different memory modules based upon memory access patterns. The memory access patterns indicate different types, or classes, of data for an executing workload, with each class associated with different memory access characteristics, such as different row buffer hit rate levels, different frequencies of access, different criticalities, and the like. The processor assigns each memory module to a data class and sets the memory timing parameters for each memory module according to the module's assigned data class, thereby tailoring the memory timing parameters for efficient access of the corresponding data.
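A brief Python sketch of mapping each memory module to the timing profile of its assigned data class; the class names and the (tRCD, tRP, tRAS) values are placeholder assumptions, not values from the source.

```python
# Sketch of per-module timing assignment based on observed data classes.

TIMING_PROFILES = {
    # data class observed from access patterns -> (tRCD, tRP, tRAS) in cycles
    "high_row_buffer_hit": (10, 10, 28),   # open-row hits dominate, so activate
                                           # timings are exercised less often
    "frequent_random":     (8,  8,  24),   # tightened for frequent row misses
    "latency_critical":    (7,  7,  22),   # fastest profile for critical data
}

def assign_timings(modules, classification):
    """Set each memory module's timing parameters according to its data class."""
    return {m: TIMING_PROFILES[classification[m]] for m in modules}

classification = {"DIMM0": "latency_critical", "DIMM1": "high_row_buffer_hit"}
print(assign_timings(["DIMM0", "DIMM1"], classification))
```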
Abstract:
Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
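A compact Python sketch of the priority-vector indexing described above, assuming a 16-wide wavefront; the initial per-lane priorities and the promotion values are illustrative assumptions.

```python
# Sketch of per-work-item priority assignment via priority vectors.

WAVEFRONT_WIDTH = 16

# First priority vector: priority of each memory request, indexed by lane ID.
priority_vector = [lane % 4 for lane in range(WAVEFRONT_WIDTH)]

# Priority promotion vector applied when a given event is detected.
promotion_vector = [2] * WAVEFRONT_WIDTH

def request_priority(vector, lane_id):
    """Index into the active priority vector with lane-specific information."""
    return vector[lane_id]

def promote(vector, promotion):
    """Build the second priority vector by applying the promotion vector."""
    return [p + b for p, b in zip(vector, promotion)]

print(request_priority(priority_vector, 5))                  # 1
second_vector = promote(priority_vector, promotion_vector)   # used for later wavefronts
print(request_priority(second_vector, 5))                    # 3
```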
Abstract:
A method of dynamic cache configuration includes determining, for a first clustering configuration, whether a current cache miss rate exceeds a miss rate threshold. The first clustering configuration includes a plurality of graphics processing unit (GPU) compute units clustered into a first plurality of compute unit clusters. The method further includes clustering, based on the current cache miss rate exceeding the miss rate threshold, the plurality of GPU compute units into a second clustering configuration having a second plurality of compute unit clusters fewer than the first plurality of compute unit clusters.
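A small Python sketch of the reclustering decision, assuming the second configuration halves the cluster count; the halving policy and the round-robin grouping of compute units are illustrative choices, not the claimed method.

```python
# Sketch of miss-rate-driven reclustering of GPU compute units.

def recluster(compute_units, num_clusters, miss_rate, miss_rate_threshold):
    """If the current miss rate exceeds the threshold, regroup the GPU compute
    units into fewer clusters (here, half as many); otherwise keep the current
    clustering configuration."""
    if miss_rate > miss_rate_threshold and num_clusters > 1:
        num_clusters = max(1, num_clusters // 2)
    clusters = [[] for _ in range(num_clusters)]
    for i, cu in enumerate(compute_units):
        clusters[i % num_clusters].append(cu)
    return clusters

cus = [f"CU{i}" for i in range(8)]
print(recluster(cus, num_clusters=4, miss_rate=0.35, miss_rate_threshold=0.2))
# -> 2 clusters of 4 compute units each
```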
Abstract:
A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy, and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. While servicing the miss request, the lower level shared cache merges identification information of multiple memory access requests targeting the same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until it is ready to issue that response. An intermediate router in the communication fabric expands the merged memory access response into multiple memory access responses, which it sends to the corresponding compute units.
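A minimal Python sketch of merging and then expanding responses; the message dictionaries, the class names, and the point at which the merged response is considered ready to issue are assumptions made for illustration.

```python
# Sketch of merging request identifiers at a lower level shared cache and
# expanding the merged response at an intermediate router.

class LowerLevelCache:
    def __init__(self):
        self.pending = {}     # cache line address -> list of (compute_unit, request_id)

    def record_miss(self, line_addr, compute_unit, request_id):
        """Merge identification info of requests that target the same cache line."""
        self.pending.setdefault(line_addr, []).append((compute_unit, request_id))

    def issue_merged_response(self, line_addr, data):
        """Stop merging and issue one response carrying all collected identifiers."""
        return {"line": line_addr, "data": data, "targets": self.pending.pop(line_addr)}

def router_expand(merged_response):
    """Intermediate router turns the merged response into one response per compute unit."""
    return [
        {"line": merged_response["line"], "data": merged_response["data"],
         "compute_unit": cu, "request_id": rid}
        for cu, rid in merged_response["targets"]
    ]

llc = LowerLevelCache()
llc.record_miss(0x80, "CU0", 11)
llc.record_miss(0x80, "CU3", 42)
for resp in router_expand(llc.issue_merged_response(0x80, b"line-data")):
    print(resp)
```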