Patent search: ap:("INTEL CORPORATION") AND inv:"Fangwen Fu", page 6

51.

Published invention application

SYNCHRONIZATION FOR DATA MULTICAST IN COMPUTE CORE CLUSTERS (status: pending, published)

Publication number: US20240220335A1

Publication date: 2024-07-04

Application number: US18148993

Filing date: 2022-12-30

Applicant: Intel Corporation

Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov

IPC classification: G06F9/52, G06F9/38, G06F9/50

CPC classification: G06F9/522, G06F9/3877, G06F9/5072, G06F9/3887

Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.
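The barrier-synchronized broadcast in this abstract can be pictured with a software analogy. The following is a minimal sketch only, assuming Python threads stand in for cores and `threading.Barrier` stands in for the multi-core barrier; `NUM_CONSUMERS`, `shared_slot`, and `run_broadcast` are hypothetical names, and nothing here models the gateway circuitry the abstract describes.

```python
import threading

NUM_CONSUMERS = 3  # assumed cluster size for this sketch

# One barrier party per core: the producer plus every consumer.
barrier = threading.Barrier(NUM_CONSUMERS + 1)
shared_slot = {}                    # stands in for shared local memory
received = [None] * NUM_CONSUMERS   # what each consumer ends up reading


def producer(value):
    shared_slot["data"] = value     # broadcast the data element
    barrier.wait()                  # arrive only after the write is done


def consumer(index):
    barrier.wait()                  # block until the producer has arrived
    received[index] = shared_slot["data"]


def run_broadcast(value):
    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(NUM_CONSUMERS)]
    threads.append(threading.Thread(target=producer, args=(value,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(received)
```

Because the producer reaches the barrier only after writing, no consumer can read the slot before the data element is in place.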

52.

Granted invention patent

Hierarchical thread scheduling based on multiple barriers (status: in force)

Publication number: US11977895B2

Publication date: 2024-05-07

Application number: US17131647

Filing date: 2020-12-22

Applicant: Intel Corporation

Inventors: Sabareesh Ganapathy, Fangwen Fu, Hong Jiang, James Valerio

IPC classification: G06F9/38, G06F9/48, G06F9/54, G06T1/20

CPC classification: G06F9/3838, G06F9/4881, G06F9/544, G06T1/20

Abstract: Examples described herein relate to a graphics processing unit (GPU) coupled to a memory device, the GPU configured to: execute an instruction thread; determine if a dual directional signal barrier is associated with the instruction thread; and based on clearance of the dual directional signal barrier for a particular signal barrier identifier and a mode of operation, indicate a clearance of the dual directional signal barrier for the mode of operation, wherein the dual directional signal barrier is to provide a single barrier to gate activity of one or more producers based on activity of one or more consumers or gate activity of one or more consumers based on activity of one or more producers.
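One way to read the "single barrier, two gating directions" idea is as a barrier object whose mode selects which side waits. The following is a minimal sketch under that reading; the class name, the mode strings `"gate_consumers"` and `"gate_producers"`, and the signal-counting rule are all assumptions made for illustration, not details taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DualDirectionalBarrier:
    barrier_id: int
    mode: str               # "gate_consumers" or "gate_producers" (assumed names)
    expected_signals: int   # signals required from the side that is not gated
    signals: set = field(default_factory=set)

    def signal(self, role, thread_id):
        # Only the side that is NOT gated contributes signals.
        signalling_role = "producer" if self.mode == "gate_consumers" else "consumer"
        if role == signalling_role:
            self.signals.add(thread_id)

    def is_cleared(self):
        return len(self.signals) >= self.expected_signals

    def may_proceed(self, role):
        # The gated side waits for clearance; the other side is never blocked.
        gated_role = "consumer" if self.mode == "gate_consumers" else "producer"
        return role != gated_role or self.is_cleared()
```

The same class serves both directions, so a scheduler in this sketch would only need the barrier identifier and the mode to decide whether a thread may run.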

53.

Published invention application

BROADCAST ASYNCHRONOUS LOADS TO SHARED LOCAL MEMORY (status: pending, published)

Publication number: US20240134797A1

Publication date: 2024-04-25

Application number: US17973203

Filing date: 2022-10-24

Applicant: Intel Corporation

Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee

IPC classification: G06F12/0855

CPC classification: G06F12/0857, G06F2212/1016

Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory, and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
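The benefit implied by this abstract is that the cache is read once per multicast load rather than once per core. The following is a minimal sketch of that data flow, assuming dictionary-backed stand-ins for the cache and for each core's shared local memory (SLM); `Cache`, `GraphicsCore`, and `multicast_load` are hypothetical names, and storing the data in the producer's own SLM is an assumption of the sketch.

```python
class Cache:
    def __init__(self, contents):
        self.contents = contents
        self.reads = 0              # counts cache accesses for comparison

    def read(self, address):
        self.reads += 1
        return self.contents[address]


class GraphicsCore:
    def __init__(self, core_id):
        self.core_id = core_id
        self.slm = {}               # shared local memory, keyed by address


def multicast_load(cache, producer, consumers, address):
    data = cache.read(address)      # single cache read by the producer core
    producer.slm[address] = data    # assumption: producer keeps a copy too
    for core in consumers:          # forward the same data to each consumer
        core.slm[address] = data
```

With four cores, a per-core load would touch the cache four times; in this sketch the counter stays at one.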

54.

Published invention application

SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING (status: pending, published)

Publication number: US20240112295A1

Publication date: 2024-04-04

Application number: US17958216

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray

IPC classification: G06T1/20, G06F9/30, G06F9/38

CPC classification: G06T1/20, G06F9/30098, G06F9/3836

Abstract: Shared local registers for thread team processing are described. An example of an apparatus includes one or more processors including a graphics processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly referenced in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

55.

Published invention application

SYNCHRONIZATION UTILIZING LOCAL TEAM BARRIERS FOR THREAD TEAM PROCESSING (status: pending, published)

Publication number: US20240111609A1

Publication date: 2024-04-04

Application number: US17958213

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Biju George, Supratim Pal, James Valerio, Vasanth Ranganathan, Fangwen Fu, Chunhui Mei

IPC classification: G06F9/52, G06F9/30

CPC classification: G06F9/522, G06F9/30098

Abstract: Low-latency synchronization utilizing local team barriers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources; and memory for storage of data including data for graphics processing, wherein the graphics processor is to receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including multiple threads; determine requirements and designated threads for the local team barrier; and establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier.
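A barrier held in a register can be thought of as a pair of bitmasks: one for the designated threads and one for the threads that have arrived. The following is a minimal sketch under that assumption; the bitmask encoding and the `LocalTeamBarrier` class are illustrative choices, not the register layout described in the patent.

```python
class LocalTeamBarrier:
    def __init__(self, designated_threads):
        # Bit i of required_mask is set when team-local thread i must arrive.
        self.required_mask = 0
        for tid in designated_threads:
            self.required_mask |= 1 << tid
        self.arrived_mask = 0       # stands in for the local register contents

    def arrive(self, tid):
        if not (self.required_mask >> tid) & 1:
            raise ValueError(f"thread {tid} is not designated for this barrier")
        self.arrived_mask |= 1 << tid
        return self.is_released()

    def is_released(self):
        # The barrier releases once every designated thread has arrived.
        return self.arrived_mask == self.required_mask
```

Checking release is a single comparison of two masks, which is the kind of low-latency test a register-resident barrier makes possible in this model.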

56.

Published invention application

ORDERED THREAD DISPATCH FOR THREAD TEAMS (status: pending, published)

Publication number: US20240111590A1

Publication date: 2024-04-04

Application number: US17937270

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Biju George, Vasanth Ranganathan, Fangwen Fu, Ben Ashbaugh, Roland Schulz

IPC classification: G06F9/50, G06T1/20

CPC classification: G06F9/5038, G06T1/20

Abstract: An apparatus to facilitate ordered thread dispatch for thread teams is disclosed. The apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources, and wherein the graphics processor is to: allocate a thread team local identifier (ID) for respective threads of a thread team comprising a plurality of hardware threads that are to be executed solely by a processing resource of the plurality of processing resources; and dispatch the respective threads together into the processing resource, the respective threads having the thread team local ID allocated.

57.

Published invention application

HARDWARE ENHANCEMENTS FOR MATRIX LOAD/STORE INSTRUCTIONS (status: pending, published)

Publication number: US20240069914A1

Publication date: 2024-02-29

Application number: US17893985

Filing date: 2022-08-23

Applicant: Intel Corporation

Inventors: Biju George, Fangwen Fu, Joydeep Ray

IPC classification: G06F9/30, G06F9/345, G06F9/38

CPC classification: G06F9/30036, G06F9/30043, G06F9/3455, G06F9/3877

Abstract: Embodiments described herein provide a system to enable access to an n-dimensional tensor in memory of a graphics processor via a batch of two-dimensional block access messages. One embodiment provides a graphics processor comprising general-purpose graphics execution resources coupled with a system interface, the general-purpose graphics execution resources including a matrix accelerator. The matrix accelerator is configured to perform a matrix operation on a plurality of tensors stored in a memory. Circuitry is included to facilitate access to the memory by the general-purpose graphics execution resources. The circuitry is configured to receive a request to access a tensor of the plurality of tensors and generate a batch of two-dimensional block access messages along a dimension of n>2 of the tensor. The batch of two-dimensional block access messages enables access to the tensor by the matrix accelerator.
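The decomposition of an n-dimensional tensor access into two-dimensional block messages can be sketched in a few lines. This is a minimal illustration only, assuming the two innermost dimensions form the 2D plane and every combination of outer indices produces its own set of block messages; the function name `block_access_batch` and the descriptor tuple layout are hypothetical.

```python
from itertools import product


def block_access_batch(shape, block_rows, block_cols):
    """Return a list of (outer_index, row, col, rows, cols) block descriptors."""
    *outer_dims, height, width = shape
    messages = []
    # One group of 2D block messages per index along the outer dimensions.
    for outer in product(*(range(d) for d in outer_dims)):
        for r in range(0, height, block_rows):
            for c in range(0, width, block_cols):
                messages.append((outer, r, c,
                                 min(block_rows, height - r),   # clip at the edge
                                 min(block_cols, width - c)))
    return messages
```

For a tensor of shape `(2, 3, 8, 16)` and an 8x16 block, this sketch produces six messages, one per `(i, j)` pair of outer indices.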

58.

Invention application

EMULATION OF FLOATING POINT CALCULATION (status: in force)

Publication number: US20230086275A1

Publication date: 2023-03-23

Application number: US17482166

Filing date: 2021-09-22

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Changwon Rhee, Sabareesh Ganapathy, Gregory Henry, Fangwen Fu

IPC classification: G06F7/487, G06F7/485, G06F7/544, G06F17/16, G06F15/80

Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.
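A common way to emulate a higher-precision multiply with lower-precision values is to split each operand into several low-precision pieces and sum the pairwise products. The following is a minimal sketch of that general technique, assuming float32 as the first format and a truncated bfloat16-style value (top 16 bits of the float32 encoding) as the second; the abstract does not name these formats, and all function names here are hypothetical.

```python
import struct


def to_f32(x):
    # Round a Python float (double) to the nearest float32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_to_bf16(x):
    # Keep only the top 16 bits of the float32 encoding (sign, exponent, 7 mantissa bits).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def split_bf16(x):
    # Three bf16-style pieces whose sum reproduces the float32 value.
    x = to_f32(x)
    hi = truncate_to_bf16(x)
    mid = truncate_to_bf16(x - hi)
    lo = truncate_to_bf16(x - hi - mid)
    return hi, mid, lo


def emulated_mul(a, b):
    # Sum of all pairwise low-precision products approximates the float32 product.
    return sum(pa * pb for pa in split_bf16(a) for pb in split_bf16(b))


def emulated_matmul(A, B):
    k, m = len(B), len(B[0])
    return [[sum(emulated_mul(row[p], B[p][j]) for p in range(k))
             for j in range(m)] for row in A]
```

In hardware, the low-precision products would be the part mapped onto a matrix unit; here they are ordinary Python multiplications, so the sketch shows only the arithmetic decomposition.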

59.

Invention application

REGISTER FILE FOR SYSTOLIC ARRAY (status: in force)

Publication number: US20220413851A1

Publication date: 2022-12-29

Application number: US17304794

Filing date: 2021-06-25

Applicant: Intel Corporation

Inventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra

IPC classification: G06F9/30, G06F17/16, G06F7/483

Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

60.

Invention application

TEMPORAL MOTION VECTOR PREDICTION CONTROL IN VIDEO CODING (status: pending, published)

Publication number: US20190098332A1

Publication date: 2019-03-28

Application number: US15714808

Filing date: 2017-09-25

Applicant: Intel Corporation

Inventors: Fangwen Fu, Jill M. Boyce

IPC classification: H04N19/52, H04N19/70, H04N19/105, H04N19/82

Abstract: Temporal motion vector prediction control in video coding is described. In one example, a method includes receiving a plurality of frames representing encoded video, parsing an uncompressed header for each frame, determining whether a temporal motion vector prediction command is included within the parsed uncompressed header of a first frame, selecting a reference frame from a reference list of frames, retrieving motion vector information from the selected reference frame, performing temporal motion vector prediction on the first frame corresponding to the parsed uncompressed header if a temporal motion vector prediction command is included within the parsed header to form a motion predicted frame, applying a loop filter to the motion predicted frame, and rendering the frame as decoded video.
