CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

    Publication No.: US20240168807A1

    Publication date: 2024-05-23

    Application No.: US18056949

    Filing date: 2022-11-18

    Applicant: Intel Corporation

    Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.
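
The flow in this abstract can be pictured with a small software model. The sketch below is illustrative only and is not the patented hardware: `HardwareThread`, `DecodedMatmul`, the `shared_b_thread` field, and the register names are all hypothetical. It assumes that a decoded instruction carries an optional thread id indicating whose register space supplies the second operand.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HardwareThread:
    """Hypothetical model of a thread that owns a private register space."""
    thread_id: int
    registers: dict = field(default_factory=dict)  # register name -> matrix


@dataclass
class DecodedMatmul:
    """Hypothetical decoded instruction: dst = src_a @ src_b."""
    dst: str
    src_a: str
    src_b: str
    # Assumption: when set, src_b is read from that thread's register space.
    shared_b_thread: Optional[int] = None


def matmul(a, b):
    """Plain matrix product on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def execute(instr, thread, all_threads):
    """Run the instruction for `thread`, optionally borrowing an operand."""
    a = thread.registers[instr.src_a]
    owner = thread if instr.shared_b_thread is None else all_threads[instr.shared_b_thread]
    b = owner.registers[instr.src_b]  # may come from the second thread's space
    thread.registers[instr.dst] = matmul(a, b)  # result stays with the first thread
    return thread.registers[instr.dst]


threads = {
    0: HardwareThread(0, {"r0": [[1, 2], [3, 4]]}),
    1: HardwareThread(1, {"r8": [[5, 6], [7, 8]]}),
}
result = execute(DecodedMatmul("r4", "r0", "r8", shared_b_thread=1), threads[0], threads)
print(result)  # [[19, 22], [43, 50]]
```

In this model thread 0 never copies the operand into its own registers; it reads the operand in place from thread 1, and only the result is written to thread 0's register space.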

    DETERMINISTIC BROADCASTING FROM SHARED MEMORY

    Publication No.: US20240111534A1

    Publication date: 2024-04-04

    Application No.: US17957486

    Filing date: 2022-09-30

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/54

    Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.
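
The duplicate-detection step described in this abstract can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuitry the application claims: `broadcast_loads`, the `(thread_id, address)` request shape, and the dictionary standing in for the cache are hypothetical, and each thread is assumed to issue one request.

```python
from collections import defaultdict


def broadcast_loads(requests, cache):
    """Serve load requests with one cache read per distinct address.

    requests: list of (thread_id, address) pairs (hypothetical format).
    Returns (data delivered per thread, number of cache reads performed).
    """
    waiters = defaultdict(list)
    for thread_id, address in requests:
        waiters[address].append(thread_id)  # duplicates collapse onto one key

    delivered = {}
    reads = 0
    for address, thread_ids in waiters.items():
        value = cache[address]  # single read for all duplicate requests
        reads += 1
        for thread_id in thread_ids:
            delivered[thread_id] = value  # broadcast to every requester
    return delivered, reads


cache = {0x100: "tile_A", 0x140: "tile_B"}
requests = [(0, 0x100), (1, 0x100), (2, 0x140), (3, 0x100)]
delivered, reads = broadcast_loads(requests, cache)
print(delivered, reads)
```

Four requests touch two distinct addresses, so the model performs two reads instead of four while every requesting thread still receives its data.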

    TEMPORAL MOTION VECTOR PREDICTION CONTROL IN VIDEO CODING

    Publication No.: US20200068216A1

    Publication date: 2020-02-27

    Application No.: US16666275

    Filing date: 2019-10-28

    Applicant: Intel Corporation

    Abstract: Temporal motion vector prediction control is described in video coding. In one example, a method includes receiving a plurality of frames representing encoded video, parsing an uncompressed header for each frame, determining whether a temporal motion vector prediction command is included within the parsed uncompressed header of a first frame, selecting a reference frame from a reference list of frames, retrieving motion vector information from the selected reference frame, performing temporal motion vector prediction on the first frame corresponding to the parsed uncompressed header if a temporal motion vector prediction command is included within the parsed header to form a motion predicted frame, applying a loop filter to the motion predicted frame, and rendering the frame as decoded video.
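
The header-controlled branch in this method can be illustrated with a short sketch. Everything named here is hypothetical: the `temporal_mvp_enabled` and `ref_index` header fields, the dictionary frame layout, and the choice to model temporal prediction as appending the co-located motion vector from the reference frame to each block's candidate list. Loop filtering and rendering are omitted.

```python
def predictor_candidates(frame, reference_list):
    """Build motion vector predictor candidates for each block of a frame.

    Assumes the uncompressed header has already been parsed into a dict.
    """
    header = frame["uncompressed_header"]
    spatial = frame["spatial_candidates"]  # one list of (dx, dy) per block

    # No temporal prediction command in the header: spatial candidates only.
    if not header.get("temporal_mvp_enabled", False):
        return [list(cands) for cands in spatial]

    # Command present: select a reference frame and reuse its motion vectors.
    ref = reference_list[header.get("ref_index", 0)]
    return [list(cands) + [ref["motion_vectors"][i]] for i, cands in enumerate(spatial)]


ref = {"motion_vectors": [(1, 0), (0, 2)]}
blocks = [[(0, 0)], [(3, 3)]]
frame_on = {
    "uncompressed_header": {"temporal_mvp_enabled": True, "ref_index": 0},
    "spatial_candidates": blocks,
}
frame_off = {"uncompressed_header": {}, "spatial_candidates": blocks}

with_tmvp = predictor_candidates(frame_on, [ref])
without_tmvp = predictor_candidates(frame_off, [ref])
print(with_tmvp)
print(without_tmvp)
```

The point of the sketch is only that the reference frame's motion vectors are consulted when, and only when, the parsed header carries the command.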

    ENHANCEMENTS FOR ACCUMULATOR USAGE AND INSTRUCTION FORWARDING IN MATRIX MULTIPLY PIPELINE IN GRAPHICS ENVIRONMENT

    Publication No.: US20240169021A1

    Publication date: 2024-05-23

    Application No.: US18056930

    Filing date: 2022-11-18

    Applicant: Intel Corporation

    IPC classification: G06F17/16 G06F7/544

    CPC classification: G06F17/16 G06F7/5443

    Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in a matrix multiply pipeline in a graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units comprise: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.
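
A rough software analogue of this datapath is sketched below. It is an assumption-laden model rather than the claimed design: the `DataProcessingUnit` class, its method names, and the scalar accumulator are hypothetical, and a long dot product is assumed to be split into chunks that are fed through the multiply-accumulate stage one at a time.

```python
class DataProcessingUnit:
    """Hypothetical model: MAC stage, intermediate accumulator, bypass, adder."""

    def __init__(self):
        self.intermediate = 0  # intermediate accumulation storage

    @staticmethod
    def mac(a_chunk, b_chunk):
        """Multiply-accumulate over one chunk of the dot product."""
        return sum(x * y for x, y in zip(a_chunk, b_chunk))

    def step(self, a_chunk, b_chunk):
        """Intermediate pass: keep the partial sum inside the unit."""
        self.intermediate += self.mac(a_chunk, b_chunk)

    def finish(self, a_chunk, b_chunk, src0=None):
        """Final pass: the adder combines the MAC output with the
        intermediate accumulator and, if given, the bypassed source operand."""
        out = self.mac(a_chunk, b_chunk) + self.intermediate
        if src0 is not None:
            out += src0  # src0 skipped the MAC stage and enters at the adder
        self.intermediate = 0  # ready for the next output element
        return out


dpu = DataProcessingUnit()
dpu.step([1, 2], [5, 6])                       # partial sum 17 stays internal
final = dpu.finish([3, 4], [7, 8], src0=10)    # 53 + 17 + 10
print(final)  # 80
```

In this model the accumulate input `src0` is needed only once, at the final addition, and the partial sums never leave the unit between passes.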

    NAMED AND CLUSTER BARRIERS

    Publication No.: US20240134719A1

    Publication date: 2024-04-25

    Application No.: US17973234

    Filing date: 2022-10-24

    Applicant: Intel Corporation

    IPC classification: G06F9/52 G06F9/48

    CPC classification: G06F9/522 G06F9/4881

    Abstract: Embodiments described herein provide a technique to facilitate the synchronization of workgroups executed on multiple graphics cores of a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core coupled with the cache memory. The graphics core includes execution resources to execute an instruction via a plurality of hardware threads and barrier circuitry to synchronize execution of the plurality of hardware threads, wherein the barrier circuitry is configured to provide a plurality of re-usable named barriers.
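
The idea of re-usable named barriers can be approximated on a CPU with Python's standard `threading.Barrier`, which resets itself once all parties have arrived. The sketch below is an analogy only: the `NamedBarriers` class, the barrier names, and the worker layout are hypothetical, and synchronization across multiple graphics cores in a cluster is not modeled.

```python
import threading


class NamedBarriers:
    """Hypothetical pool of barriers addressed by name."""

    def __init__(self):
        self._barriers = {}
        self._lock = threading.Lock()

    def init(self, name, participants):
        """Create (or replace) the barrier registered under `name`."""
        with self._lock:
            self._barriers[name] = threading.Barrier(participants)

    def wait(self, name):
        """Block until all participants of the named barrier have arrived."""
        with self._lock:
            barrier = self._barriers[name]
        barrier.wait()  # the barrier resets afterwards, so it can be reused


barriers = NamedBarriers()
barriers.init("producers", 2)
barriers.init("consumers", 2)
completed = []
log_lock = threading.Lock()


def worker(thread_id, name, rounds=3):
    for round_index in range(rounds):
        barriers.wait(name)  # same named barrier reused every round
        with log_lock:
            completed.append((name, round_index, thread_id))


workers = [
    threading.Thread(target=worker, args=(i, "producers" if i < 2 else "consumers"))
    for i in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(completed))  # 12
```

Because each subset of threads waits on its own name, the two producers and the two consumers synchronize independently instead of all four threads meeting at a single shared barrier.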