-
Publication No.: US20240362084A1
Publication Date: 2024-10-31
Application No.: US18677458
Filing Date: 2024-05-29
Applicant: Intel Corporation
CPC Classes: G06F9/52, G06F9/4881, G06F9/522, G06T1/20
Abstract: An apparatus to facilitate thread synchronization is disclosed. The apparatus comprises one or more processors to execute a producer thread to generate a plurality of commands, execute a consumer thread to process the plurality of commands, and synchronize the producer thread with the consumer thread, including updating a producer fence value upon generation of in-order commands, updating a consumer fence value upon processing of the in-order commands, and performing a synchronization operation based on the consumer fence value, wherein the producer fence value and the consumer fence value each correspond to an order position of an in-order command.
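The fence mechanism in this abstract can be sketched in software. The following is a minimal illustrative model, not the patented hardware: fence values track the order position of the last in-order command generated or processed, and a synchronization operation blocks until the consumer fence reaches a target position. All class and function names are assumptions for the example.

```python
import threading

class FenceSync:
    """Illustrative producer/consumer fence pair keyed by order position."""

    def __init__(self):
        self.producer_fence = 0   # order position of last command generated
        self.consumer_fence = 0   # order position of last command processed
        self._cv = threading.Condition()

    def signal_produced(self, position):
        with self._cv:
            self.producer_fence = position
            self._cv.notify_all()

    def signal_consumed(self, position):
        with self._cv:
            self.consumer_fence = position
            self._cv.notify_all()

    def wait_produced(self, position):
        with self._cv:
            self._cv.wait_for(lambda: self.producer_fence >= position)

    def wait_consumed(self, position):
        # Synchronization operation: block until the consumer has processed
        # the in-order command at `position`.
        with self._cv:
            self._cv.wait_for(lambda: self.consumer_fence >= position)

def run_demo(n_commands=5):
    sync, queue = FenceSync(), []

    def producer():
        for i in range(1, n_commands + 1):
            queue.append(f"cmd{i}")
            sync.signal_produced(i)       # fence advances per in-order command

    def consumer():
        for i in range(1, n_commands + 1):
            sync.wait_produced(i)
            _ = queue[i - 1]              # process the command in order
            sync.signal_consumed(i)

    t1, t2 = threading.Thread(target=producer), threading.Thread(target=consumer)
    t1.start(); t2.start()
    sync.wait_consumed(n_commands)        # synchronize on the last command
    t1.join(); t2.join()
    return sync.producer_fence, sync.consumer_fence
```

Using order positions rather than a simple flag lets a waiter target any intermediate command, which is the property the abstract's "order position" wording suggests.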
-
Publication No.: US20240329990A1
Publication Date: 2024-10-03
Application No.: US18740430
Filing Date: 2024-06-11
Applicant: Apple Inc.
Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
CPC Classes: G06F9/30087, G06F9/3802, G06F9/522
Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.
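The key claim is an ordering relaxation: younger-instruction fetch may begin after context consumption but before retirement. A toy event-ordering model (not real hardware, names illustrative) makes the distinction concrete:

```python
# Toy event-ordering model of the relaxed ISB described in the abstract:
# fetching of an instruction younger than the ISB begins once the contexts
# required by older instructions are consumed, without waiting for those
# older instructions to retire.

def isb_timeline(older_count=3):
    events = []
    consumed = [False] * older_count
    for i in range(older_count):
        consumed[i] = True                  # older instruction i reads its context
        events.append(f"consume[{i}]")
    if all(consumed):                       # the ISB's relaxed wait condition
        events.append("fetch-younger")      # younger fetch starts here...
    for i in range(older_count):
        events.append(f"retire[{i}]")       # ...before retirement completes
    return events
```

A strict ISB would place `fetch-younger` after every `retire[i]`; the relaxed ordering above is what shortens the pipeline bubble.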
-
Publication No.: US20240289132A1
Publication Date: 2024-08-29
Application No.: US18660763
Filing Date: 2024-05-10
Applicant: NVIDIA Corporation
Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin
CPC Classes: G06F9/3887, G06F9/522, G06F13/1689, G06F13/4022, G06T1/20, G06T1/60, H04L49/101
Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization, enabling strong scaling and smaller tile sizes.
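The bandwidth saving comes from collapsing N per-core loads into one tracked request. A minimal software sketch (illustrative names; the real mechanism is hardware tracking circuitry) counts backing-cache reads to show the effect:

```python
# Sketch of the tracked-multicast idea: one requesting thread fetches a value
# on behalf of a group of consumer cores, so the backing cache is read once
# instead of once per consumer.

class MulticastTracker:
    def __init__(self, memory):
        self.memory = memory
        self.cache_reads = 0      # stands in for L2 bandwidth consumed

    def multicast_load(self, address, consumers):
        value = self.memory[address]    # a single cache/memory read...
        self.cache_reads += 1
        # ...which the tracking logic then delivers to every consumer core.
        return {core: value for core in consumers}

def demo():
    mem = {0x100: 42}
    tracker = MulticastTracker(mem)
    delivered = tracker.multicast_load(0x100, consumers=[0, 1, 2, 3])
    return tracker.cache_reads, delivered
```

Without multicast, the same four consumers would issue four reads; here the read count stays at one regardless of group size.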
-
Publication No.: US12073262B2
Publication Date: 2024-08-27
Application No.: US17338898
Filing Date: 2021-06-04
Applicant: Graphcore Limited
Inventors: Ola Torudbakken, Wei-Lin Guay
IPC Classes: G06F9/52, G06F9/38, G06F9/54, G06F15/173
CPC Classes: G06F9/522, G06F9/3851, G06F9/543, G06F9/544, G06F15/173, G06F15/17325
Abstract: A host system compiles a set of local programs which are provided over a network to a plurality of subsystems. By defining the synchronisation activity on the host, and then providing that information to the subsystems, the host can service a large number of subsystems. The defined synchronisation activity includes defining the synchronisation groups between which synchronisation barriers occur and the points during program execution at which data exchange with the host occurs. Defining synchronisation activity between the subsystems allows a large number of subsystems to be connected whilst minimising the required exchanges with the host.
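The host's role here is to pre-compile a synchronization plan so subsystems coordinate among themselves. A hedged sketch of that split, with threads standing in for subsystems and all names assumed for illustration:

```python
import threading

# Sketch: the host pre-compiles which subsystems form each barrier group and
# ships the plan with the local programs; subsystems then synchronise among
# themselves without further host traffic.

def compile_plan(groups):
    # groups: list of lists of subsystem ids that share a synchronisation barrier
    return [(set(g), threading.Barrier(len(g))) for g in groups]

def run_subsystem(sid, plan, log, lock):
    for members, barrier in plan:
        if sid in members:
            barrier.wait()            # synchronise with the defined group
            with lock:
                log.append(sid)

def demo():
    # Host-side: two pairwise groups, then a global barrier across all four.
    plan = compile_plan([[0, 1], [2, 3], [0, 1, 2, 3]])
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=run_subsystem, args=(s, plan, log, lock))
               for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sorted(log)
```

Because the plan is fixed at compile time, no subsystem needs to negotiate group membership at runtime, which is what keeps host exchanges minimal.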
-
Publication No.: US20240231957A9
Publication Date: 2024-07-11
Application No.: US17973234
Filing Date: 2022-10-25
Applicant: Intel Corporation
Inventors: Fangwen Fu, Chunhui Mei, John A. Wiegert, Yongsheng Liu, Ben J. Ashbaugh
CPC Classes: G06F9/522, G06F9/4881
Abstract: Embodiments described herein provide a technique to facilitate the synchronization of workgroups executed on multiple graphics cores of a graphics core cluster. One embodiment provides a graphics core including a cache memory and a graphics core coupled with the cache memory. The graphics core includes execution resources to execute an instruction via a plurality of hardware threads and barrier circuitry to synchronize execution of the plurality of hardware threads, wherein the barrier circuitry is configured to provide a plurality of re-usable named barriers.
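"Re-usable named barriers" can be modelled in software with a name-to-barrier map whose barriers reset after each full arrival. A minimal sketch, with all names and counts assumed:

```python
import threading

class NamedBarriers:
    """Illustrative pool of named, re-usable barriers."""

    def __init__(self, specs):
        # specs: {barrier_name: participating thread count}
        self._barriers = {name: threading.Barrier(n) for name, n in specs.items()}

    def arrive_and_wait(self, name):
        # threading.Barrier auto-resets once all parties arrive, so the same
        # named barrier can be re-used across successive phases.
        self._barriers[name].wait()

def demo(phases=3, threads=4):
    nb = NamedBarriers({"tile_sync": threads})
    counts = [0] * phases
    lock = threading.Lock()

    def worker():
        for p in range(phases):
            with lock:
                counts[p] += 1
            nb.arrive_and_wait("tile_sync")   # same named barrier every phase

    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts: t.start()
    for t in ts: t.join()
    return counts
```

Naming lets independent thread groups pick distinct barriers from a shared pool instead of contending for a single hardware barrier.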
-
Publication No.: US20240220335A1
Publication Date: 2024-07-04
Application No.: US18148993
Filing Date: 2022-12-30
Applicant: Intel Corporation
Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
CPC Classes: G06F9/522, G06F9/3877, G06F9/5072, G06F9/3887
Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.
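The multi-core barrier ensures consumers do not read the broadcast element before the producer has published it. A hedged software model of that handshake (threads stand in for cores; names are illustrative):

```python
import threading

# Illustrative model of the gateway-synchronized broadcast: a multi-core
# barrier makes consumer cores wait until the producer core has published the
# data element, after which every consumer reads the same value.

def broadcast_demo(num_consumers=3, value=7):
    barrier = threading.Barrier(num_consumers + 1)   # producer + consumers
    shared = {}                                      # stands in for shared memory
    received = [None] * num_consumers

    def producer():
        shared["elem"] = value       # publish the element before the barrier
        barrier.wait()

    def consumer(i):
        barrier.wait()               # multi-core barrier: wait for producer
        received[i] = shared["elem"]

    ts = [threading.Thread(target=producer)]
    ts += [threading.Thread(target=consumer, args=(i,)) for i in range(num_consumers)]
    for t in ts: t.start()
    for t in ts: t.join()
    return received
```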
-
Publication No.: US20240220314A1
Publication Date: 2024-07-04
Application No.: US18091441
Filing Date: 2022-12-30
Inventor: Harris Gasparakis
CPC Classes: G06F9/4881, G06F9/522
Abstract: A processing system flexibly schedules workgroups across kernels based on data dependencies between workgroups to enhance processing efficiency. The workgroups are partitioned into subsets based on the data dependencies, and workgroups of a first subset that produces data are scheduled to execute immediately before workgroups of a second subset that consumes the data generated by the first subset. Thus, the processing system does not execute one kernel at a time, but instead schedules workgroups across kernels based on the data dependencies between them. By limiting the sizes of the subsets to the amount of data that can be stored at local caches, the processing system increases the probability that data to be consumed by workgroups of a subset will be resident in a local cache and will not require a memory access.
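The scheduling idea reduces to interleaving producer and consumer workgroup subsets whose size is bounded by cache capacity. A minimal sketch under assumed names and a one-to-one producer/consumer dependency:

```python
# Sketch: partition producer/consumer workgroups into subsets no larger than
# the cache can hold, then run each producer subset immediately before the
# consumer subset that depends on its output.

def schedule(producers, consumers, cache_capacity):
    """Interleave dependent workgroups in cache-sized subsets."""
    order = []
    for start in range(0, len(producers), cache_capacity):
        subset = slice(start, start + cache_capacity)
        order.extend(producers[subset])    # produce a cache-sized batch...
        order.extend(consumers[subset])    # ...then immediately consume it
    return order

def demo():
    producers = [f"P{i}" for i in range(4)]   # workgroups of the producing kernel
    consumers = [f"C{i}" for i in range(4)]   # workgroups of the consuming kernel
    return schedule(producers, consumers, cache_capacity=2)
```

Running kernels whole (`P0..P3` then `C0..C3`) would evict `P0`'s output before `C0` runs; the interleaved order keeps each batch cache-resident.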
-
Publication No.: US11995463B2
Publication Date: 2024-05-28
Application No.: US17237752
Filing Date: 2021-04-22
IPC Classes: G06F9/48, G06F3/06, G06F9/52, G06N20/00, G06F9/30, G06F9/38, G06F15/78, G06F15/80, G06F17/16, G06N5/04
CPC Classes: G06F9/4818, G06F3/0604, G06F3/0659, G06F3/0673, G06F9/4881, G06F9/52, G06N20/00, G06F9/30018, G06F9/30087, G06F9/3869, G06F9/3871, G06F9/522, G06F15/7807, G06F15/7846, G06F15/8053, G06F17/16, G06N5/04
Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles, each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs, and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions executed by each processing tile, such that each processing tile waits for its current task to finish before starting a new one.
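The tile-level rule — finish the current task before starting the next — is a per-task barrier across tiles. A toy model with threads as tiles (all names and counts are assumptions):

```python
import threading

# Toy model of the instruction-streaming synchronization: every processing
# tile must finish its current task before any tile starts the next one,
# enforced here with a barrier between consecutive tasks.

def run_tiles(num_tiles=3, num_tasks=4):
    barrier = threading.Barrier(num_tiles)
    log, lock = [], threading.Lock()

    def tile(tid):
        for task in range(num_tasks):
            with lock:
                log.append(task)     # execute the current task
            barrier.wait()           # wait for all tiles before the next task

    ts = [threading.Thread(target=tile, args=(i,)) for i in range(num_tiles)]
    for t in ts: t.start()
    for t in ts: t.join()
    return log                       # task ids in global execution order
```

The barrier guarantees the log is grouped by task: every tile's task k entry precedes any tile's task k+1 entry.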
-
Publication No.: US20240111609A1
Publication Date: 2024-04-04
Application No.: US17958213
Filing Date: 2022-09-30
Applicant: Intel Corporation
Inventors: Biju George, Supratim Pal, James Valerio, Vasanth Ranganathan, Fangwen Fu, Chunhui Mei
CPC Classes: G06F9/522, G06F9/30098
Abstract: Low-latency synchronization utilizing local team barriers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources; and memory for storage of data including data for graphics processing, wherein the graphics processor is to receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including multiple threads; determine requirements and designated threads for the local team barrier; and establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier.
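A local team barrier backed by a register can be modelled as a counter that tracks arrivals from the designated threads and releases the team when full. A sketch with the counter standing in for the local register (names illustrative):

```python
import threading

class LocalTeamBarrier:
    """Illustrative team barrier: a counter models the local-register state."""

    def __init__(self, team_size):
        self.team_size = team_size
        self.arrived = 0            # the "local register": arrival count
        self.generation = 0         # distinguishes successive uses
        self._cv = threading.Condition()

    def arrive_and_wait(self):
        with self._cv:
            gen = self.generation
            self.arrived += 1
            if self.arrived == self.team_size:
                self.arrived = 0          # reset the register for re-use
                self.generation += 1
                self._cv.notify_all()     # last arrival releases the team
            else:
                self._cv.wait_for(lambda: self.generation != gen)

def demo(team_size=4):
    bar = LocalTeamBarrier(team_size)
    done, lock = [], threading.Lock()

    def member(i):
        bar.arrive_and_wait()
        with lock:
            done.append(i)

    ts = [threading.Thread(target=member, args=(i,)) for i in range(team_size)]
    for t in ts: t.start()
    for t in ts: t.join()
    return sorted(done)
```

Keeping the state in a resource-local register (rather than shared memory) is what makes the barrier low-latency: designated threads never leave their processing resource to synchronize.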
-
Publication No.: US11947928B2
Publication Date: 2024-04-02
Application No.: US17017557
Filing Date: 2020-09-10
CPC Classes: G06F7/5443, G06F9/3867, G06F9/522, G06F40/20, G06N3/063
Abstract: Systems and methods are provided for a multi-die dot-product engine (DPE) to provision large-scale machine learning inference applications. The multi-die DPE leverages a multi-chip architecture. For example, a multi-chip interface can include a plurality of DPE chips, where each DPE chip performs inference computations for performing deep learning operations. A hardware interface between a memory of a host computer and the plurality of DPE chips communicatively connects the plurality of DPE chips to the memory of the host computer system during an inference operation such that the deep learning operations are spanned across the plurality of DPE chips. Due to the multi-die architecture, multiple silicon devices are allowed to be used for inference, thereby enabling power-efficient inference for large-scale machine learning applications and complex deep neural networks. The multi-die DPE can be used to build a multi-device DNN inference system performing specific applications, such as object recognition, with high accuracy.
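Spanning a dot-product workload across dies amounts to slicing the operands, computing per-die partial multiply-accumulates, and summing the partials. A hedged sketch (die count and slicing are assumptions for the example, not the patented interface):

```python
# Sketch of spanning a dot product across multiple DPE dies: the host splits
# the vectors into per-die slices, each die computes a partial
# multiply-accumulate, and the host combines the partial results.

def die_dot(a_slice, b_slice):
    # One die's multiply-accumulate over its slice of the operands.
    return sum(x * y for x, y in zip(a_slice, b_slice))

def multi_die_dot(a, b, num_dies):
    chunk = (len(a) + num_dies - 1) // num_dies   # ceil-divide work per die
    partials = [die_dot(a[i * chunk:(i + 1) * chunk],
                        b[i * chunk:(i + 1) * chunk])
                for i in range(num_dies)]
    return sum(partials)   # host combines the per-die partial results

def demo():
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    return multi_die_dot(a, b, num_dies=4)
```

Because dot products are associative sums, the partition is exact: the multi-die result equals the single-device result regardless of how the slices fall.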
-