Patent search ap:("Intel Corporation") AND inv:"Maxim Kazakov" Page 1

1.

发明公开
SCALABLE AND CONFIGURABLE CLUSTERED SYSTOLIC ARRAY 审中-公开

公开(公告)号：US20240220448A1

公开(公告)日：2024-07-04

申请号：US18148998

申请日：2022-12-30

Applicant: Intel Corporation

Inventor： Chunhui Mei , Jiasheng Chen , Ben J. Ashbaugh , Fangwen Fu , Hong Jiang , Guei-Yuan Lueh , Rama S.B. Harihara , Maxim Kazakov

IPC: G06F15/80 , G06F13/16

CPC classification number: G06F15/8046 , G06F13/1668

Abstract: A scalable and configurable clustered systolic array is described. An example of apparatus includes a cluster including multiple cores; and a cache memory coupled with the cluster, wherein each core includes multiple processing resources, a memory coupled with the plurality of processing resources, a systolic array coupled with the memory, and one or more interconnects with one or more other cores of the plurality of cores; and wherein the systolic arrays of the cores are configurable by the apparatus to form a logically combined systolic array for processing of an operation by a cooperative group of threads running on one or more of the plurality of cores in the cluster.

2.

发明公开
SYNCHRONIZATION FOR DATA MULTICAST IN COMPUTE CORE CLUSTERS 审中-公开

公开(公告)号：US20240220335A1

公开(公告)日：2024-07-04

申请号：US18148993

申请日：2022-12-30

Applicant: Intel Corporation

Inventor： Chunhui Mei , Yongsheng Liu , John A. Wiegert , Vasanth Ranganathan , Ben J. Ashbaugh , Fangwen Fu , Hong Jiang , Guei-Yuan Lueh , James Valerio , Alan M. Curtis , Maxim Kazakov

IPC: G06F9/52 , G06F9/38 , G06F9/50

CPC classification number: G06F9/522 , G06F9/3877 , G06F9/5072 , G06F9/3887

Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.

3.

发明公开
SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING 审中-公开

公开(公告)号：US20240112295A1

公开(公告)日：2024-04-04

申请号：US17958216

申请日：2022-09-30

Applicant: Intel Corporation

Inventor： Biju George , Fangwen Fu , Supratim Pal , Jorge Parra , Chunhui Mei , Maxim Kazakov , Joydeep Ray

IPC: G06T1/20 , G06F9/30 , G06F9/38

CPC classification number: G06T1/20 , G06F9/30098 , G06F9/3836

Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphic processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly reference in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

4.

发明公开
CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE 审中-公开

公开(公告)号：US20240168807A1

公开(公告)日：2024-05-23

申请号：US18056949

申请日：2022-11-18

Applicant: Intel Corporation

Inventor： Jorge Eduardo Parra Osorio , Guei-Yuan Lueh , Maxim Kazakov , Fangwen Fu , Supratim Pal , Kaiyu Chen

IPC: G06F9/50 , G06F9/48 , G06F9/52 , G06F15/80

CPC classification number: G06F9/5027 , G06F9/48 , G06F9/522 , G06F15/8046

Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.

5.

发明公开
DETERMINISTIC BROADCASTING FROM SHARED MEMORY 审中-公开

公开(公告)号：US20240111534A1

公开(公告)日：2024-04-04

申请号：US17957486

申请日：2022-09-30

Applicant: Intel Corporation

Inventor： Fangwen Fu , Chunhui Mei , Maxim Kazakov , Biju George , Jorge Parra , Supratim Pal

IPC: G06F9/30 , G06F9/54

CPC classification number: G06F9/30047 , G06F9/3009 , G06F9/542

Abstract: Embodiments described herein provide a technique enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load request from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.

6.

发明申请
INSTRUCTION ENCODING TO IMPLEMENT INCREASED REGISTER CAPACITY PER THREAD 有权

公开(公告)号：US20250068423A1

公开(公告)日：2025-02-27

申请号：US18453861

申请日：2023-08-22

Applicant: Intel Corporation

Inventor： Jorge Eduardo Parra Osorio , Jiasheng Chen , Supratim Pal , Vasanth Ranganathan , Guei-Yuan Lueh , James Valerio , Pradeep Golconda , Brent Schwartz , Fangwen Fu , Sabareesh Ganapathy , Peter Caday , Wei-Yu Chen , Po-Yu Chen , Timothy Bauer , Maxim Kazakov , Stanley Gambarin , Samir Pandya

IPC: G06F9/30 , G06F9/38

Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.

7.

发明公开
DATA MULTICAST IN COMPUTE CORE CLUSTERS 审中-公开

公开(公告)号：US20240220254A1

公开(公告)日：2024-07-04

申请号：US18148997

申请日：2022-12-30

Applicant: Intel Corporation

Inventor： Chunhui Mei , Yongsheng Liu , John A. Wiegert , Vasanth Ranganathan , Ben J. Ashbaugh , Fangwen Fu , Hong Jiang , Guei-Yuan Lueh , James Valerio , Alan M. Curtis , Maxim Kazakov

IPC: G06F9/30 , G06F9/38 , G06F9/50 , G06F9/54

CPC classification number: G06F9/30087 , G06F9/3877 , G06F9/5072 , G06F9/544

Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification