OPERAND CONFLICT RESOLUTION FOR REDUCED PORT GENERAL PURPOSE REGISTER
    Type: Invention application
    Status: In force

    Publication number: US20160098276A1

    Publication date: 2016-04-07

    Application number: US14505854

    Filing date: 2014-10-03

    Abstract: Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell, and another value that is an operand of the operation would be read from the conflict queue.
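
The conflict check in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the one-read-port limit, the `cell_of` register-to-cell mapping, and the function names `plan_reads` and `execute_add` are all hypothetical.

```python
from collections import deque

READ_PORTS_PER_CELL = 1  # assumed: each memory cell serves one read per instruction


def plan_reads(operands, cell_of):
    """Split operand reads into direct cell reads and conflict-queue reads.

    operands: register names the instruction reads.
    cell_of:  hypothetical mapping from register name to memory cell id.
    Returns (direct, queued): registers read through the cell's port during
    execution, and registers whose values must be staged beforehand.
    """
    reads_per_cell = {}
    direct, queued = [], []
    for reg in operands:
        cell = cell_of[reg]
        used = reads_per_cell.get(cell, 0)
        if used < READ_PORTS_PER_CELL:
            reads_per_cell[cell] = used + 1
            direct.append(reg)
        else:
            queued.append(reg)  # this read would exceed the cell's read port
    return direct, queued


def execute_add(gpr, cell_of, src_a, src_b):
    """Execute a two-operand add, staging conflicting operands first."""
    direct, queued = plan_reads([src_a, src_b], cell_of)
    conflict_queue = deque(gpr[reg] for reg in queued)  # filled before execution
    values = [gpr[reg] for reg in direct]               # read via the cell port
    values.extend(conflict_queue.popleft() for _ in queued)
    return sum(values), queued
```

With `r0` and `r1` mapped to the same cell, `execute_add` stages `r1` in the queue before the add; with operands in different cells, the queue stays empty.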

    Out of order wave slot release for a terminated wave

    Publication number: US11094032B2

    Publication date: 2021-08-17

    Application number: US16734252

    Filing date: 2020-01-03

    Abstract: Methods, systems, and devices for image processing are described. A device may determine, based on a test operation, to terminate a first wave associated with a first slot of a set of slots. The device may update a terminated wave bit associated with the first slot based on the determination to terminate the first wave. In some aspects, the device may update a number of invocations field associated with the first wave based on the determination to terminate the first wave. The device may release the first slot based on updating the terminated wave bit and the number of invocations field. In some examples, the device may output the number of invocations field to a rendering backend of the device based on the terminated wave bit.

    Per-instance preamble for graphics processing

    Publication number: US09799094B1

    Publication date: 2017-10-24

    Application number: US15162198

    Filing date: 2016-05-23

    Abstract: A method for processing data in a graphics processing unit (GPU) including: receiving an instance identifier for an instance and a shader program comprising a preamble code block and a main shader code block; assigning the instance identifier to a general purpose register at wave creation; allocating address space within the constant memory for instance uniforms; determining that the preamble code block has not been executed and that the wave is a first wave of the instance to be executed; based on that determination, executing the preamble code block to store the plurality of instance uniforms in the constant memory; and, based at least in part on executing the preamble code block, executing the wave of the plurality of waves using at least one of the plurality of instance constants stored in constant memory.

    LOAD SCHEME FOR SHARED REGISTER IN GPU
    Type: Invention application
    Status: In force

    Publication number: US20150379680A1

    Publication date: 2015-12-31

    Application number: US14316391

    Filing date: 2014-06-26

    CPC classification number: G06T1/60 G06T15/80 G09G5/363 G09G2352/00 G09G2360/06

    Abstract: Techniques are described for determining whether data of a variable for each of a plurality of graphics items is the same. If the data is determined to be the same, the techniques store the data in a storage location of a specialized shared general purpose register that is associated with the variable.
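
The uniformity check described here might look like the following in a software model. It is a minimal sketch: the dictionaries standing in for register files, the function name `load_variable`, and the per-item fallback path are assumptions, since the abstract only describes the shared case.

```python
def load_variable(values_per_item, shared_gpr, per_item_gpr, var):
    """Store `var` once in the shared register if every item has the same value.

    values_per_item: non-empty list holding the variable's value for each
                     graphics item (for example, each vertex or pixel).
    shared_gpr, per_item_gpr: dicts standing in for the two register files.
    """
    first = values_per_item[0]
    if all(value == first for value in values_per_item):
        shared_gpr[var] = first                # one copy serves every item
        return "shared"
    # Assumed fallback, not stated in the abstract: keep one slot per item.
    per_item_gpr[var] = list(values_per_item)
    return "per-item"
```

Storing a uniform value once rather than once per item is what saves register space in this model.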

    UTILIZING PIPELINE REGISTERS AS INTERMEDIATE STORAGE
    Type: Invention application
    Status: In force

    Publication number: US20150324196A1

    Publication date: 2015-11-12

    Application number: US14275047

    Filing date: 2014-05-12

    Abstract: In one example, a method includes, responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR: copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register; copying, by the initial logic unit and during a second clock cycle, the second value to the initial pipeline register; copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR; and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
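
The four-cycle sequence in this abstract can be traced with a toy cycle-by-cycle simulation. The sketch assumes a pipeline with exactly two registers (initial and final) that advances once per clock; the abstract does not state the pipeline depth, and `move_through_pipeline` is a hypothetical name.

```python
def move_through_pipeline(gpr, moves):
    """Move values between GPRs using a two-register pipeline as a buffer.

    gpr:   dict standing in for the general purpose register file.
    moves: list of (src, dst) register names, issued one per clock cycle.
    Returns a log of (cycle, dst, value) writes made by the final logic unit.
    """
    initial = final = None      # each pipeline register holds (dst, value) or None
    pending = list(moves)
    log = []
    cycle = 0
    while pending or initial is not None or final is not None:
        cycle += 1
        # Final logic unit: drain the final pipeline register into its target GPR.
        if final is not None:
            dst, value = final
            gpr[dst] = value
            log.append((cycle, dst, value))
        # Pipeline advance: the initial register's contents move to the final one.
        final = initial
        # Initial logic unit: latch the next source value, if any remain.
        if pending:
            src, dst = pending.pop(0)
            initial = (dst, gpr[src])
        else:
            initial = None
    return log
```

For two moves, the model latches the sources in cycles 1 and 2 and writes the destinations in cycles 3 and 4, matching the order in the abstract. Because both sources are read before either destination is written, the same model also swaps two registers without a temporary GPR.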

    TECHNIQUES FOR SERIALIZED EXECUTION IN A SIMD PROCESSING SYSTEM
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20150317157A1

    Publication date: 2015-11-05

    Application number: US14268215

    Filing date: 2014-05-02

    CPC classification number: G06F9/3851 G06F9/3887

    Abstract: A SIMD processor may be configured to determine one or more active threads from a plurality of threads, select one active thread from the one or more active threads, and perform a divergent operation on the selected active thread. The divergent operation may be a serial operation.
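
The select-one-active-thread step can be sketched as a loop over an active mask. This is an illustration only: the abstract describes selecting a single active thread, so repeating the selection until every active lane has run, and choosing the lowest-numbered lane first, are assumptions, as is the name `run_serialized`.

```python
def run_serialized(active_mask, lane_data, op):
    """Apply `op` to each active lane one at a time, lowest lane first.

    active_mask: per-lane flags marking which threads are active.
    lane_data:   per-lane values, updated in place.
    Returns the order in which lanes were serviced.
    """
    remaining = [bool(flag) for flag in active_mask]
    order = []
    while any(remaining):
        lane = remaining.index(True)            # select one active thread
        lane_data[lane] = op(lane_data[lane])   # run the operation for that thread only
        remaining[lane] = False                 # retire it from the active set
        order.append(lane)
    return order
```

Inactive lanes are never touched, and each active lane sees the operation exactly once.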

    GPR optimization in a GPU based on a GPR release mechanism

    Publication number: US11475533B2

    Publication date: 2022-10-18

    Application number: US16877367

    Filing date: 2020-05-18

    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
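
The deallocation count described in this abstract can be approximated as a small calculation. The sketch below is not the patented mechanism: representing each branch as a predicate over the shader constants paired with a GPR requirement is a hypothetical encoding, and `gprs_to_release` is an illustrative name.

```python
def gprs_to_release(allocated, branches, constants):
    """Return how many allocated GPRs no reachable branch needs.

    allocated: number of GPRs currently allocated per thread.
    branches:  list of (predicate, gprs_needed); predicate takes the constants
               dict and returns True if the branch can execute.
    constants: constant values defined for the executable shader.
    """
    reachable = [need for predicate, need in branches if predicate(constants)]
    peak = max(reachable, default=0)   # largest requirement among utilized branches
    return max(allocated - peak, 0)
```

If the most register-hungry branch is ruled out by the constants, the difference between the allocation and the next-highest requirement is what a subsequent thread could give back.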
