LOAD SCHEME FOR SHARED REGISTER IN GPU
    Type: Patent application
    Status: Granted

    Publication No.: US20150379680A1

    Publication date: 2015-12-31

    Application No.: US14316391

    Filing date: 2014-06-26

    CPC classification number: G06T1/60 G06T15/80 G09G5/363 G09G2352/00 G09G2360/06

    Abstract: Techniques are described for determining whether the data of a variable is the same for each of a plurality of graphics items. If the data is determined to be the same, the techniques store the data in a storage location of a specialized shared general purpose register that is associated with the variable.
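
    Illustrative sketch (not taken from the patent): the check described in the abstract can be modeled as a uniformity test over the per-item values. The names `load_variable`, `shared_gpr`, and `per_item_gprs` are hypothetical, and plain dictionaries stand in for hardware registers.

```python
from typing import Dict, List


def load_variable(name: str,
                  per_item_values: List[float],
                  shared_gpr: Dict[str, float],
                  per_item_gprs: List[Dict[str, float]]) -> bool:
    """Store `name` once in the shared register if every item holds the same data.

    Returns True when the shared register was used, False when each
    graphics item received its own private copy instead.
    """
    if not per_item_values or len(per_item_values) != len(per_item_gprs):
        raise ValueError("need exactly one value per graphics item")

    first = per_item_values[0]
    if all(value == first for value in per_item_values):
        # Data is the same for all items: a single shared slot is enough.
        shared_gpr[name] = first
        return True

    # Data differs: fall back to one private register slot per item.
    for gpr, value in zip(per_item_gprs, per_item_values):
        gpr[name] = value
    return False
```

    With four items that all hold 0.5, the call fills one shared slot and leaves the per-item registers untouched; with differing values it writes one slot per item.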

    UTILIZING PIPELINE REGISTERS AS INTERMEDIATE STORAGE
    Type: Patent application
    Status: Granted

    Publication No.: US20150324196A1

    Publication date: 2015-11-12

    Application No.: US14275047

    Filing date: 2014-05-12

    Abstract: In one example, a method includes, responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR: copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register; copying, by the initial logic unit and during a second clock cycle, the second value to the initial pipeline register; copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR; and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
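
    Illustrative sketch (not taken from the patent): the four-cycle schedule in the abstract is consistent with a two-stage pipeline in which one value enters per cycle. The function name `move_via_pipeline`, the two-stage depth, and the event strings are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple


def move_via_pipeline(gprs: Dict[str, int],
                      moves: List[Tuple[str, str]],
                      stages: int = 2) -> List[Tuple[int, str]]:
    """Move GPR values through pipeline registers; return a (cycle, event) log.

    pipeline[0] models the initial pipeline register and pipeline[-1]
    the final one. Each slot holds (destination, value) or None.
    """
    pipeline: List[Optional[Tuple[str, int]]] = [None] * stages
    pending = list(moves)
    log: List[Tuple[int, str]] = []
    cycle = 0

    while pending or any(slot is not None for slot in pipeline):
        cycle += 1

        # Final logic unit: drain the final pipeline register into its GPR.
        if pipeline[-1] is not None:
            dst, value = pipeline[-1]
            gprs[dst] = value
            log.append((cycle, f"final -> {dst}"))
            pipeline[-1] = None

        # Advance in-flight values one stage toward the final register.
        for i in range(stages - 1, 0, -1):
            if pipeline[i] is None and pipeline[i - 1] is not None:
                pipeline[i] = pipeline[i - 1]
                pipeline[i - 1] = None

        # Initial logic unit: read the next source GPR into the first stage.
        if pending and pipeline[0] is None:
            src, dst = pending.pop(0)
            pipeline[0] = (dst, gprs[src])
            log.append((cycle, f"{src} -> initial"))

    return log
```

    In this model both source values are read before either destination is written, so a swap such as `[("r1", "r2"), ("r2", "r1")]` completes without a temporary GPR.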

    TECHNIQUES FOR SERIALIZED EXECUTION IN A SIMD PROCESSING SYSTEM
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20150317157A1

    Publication date: 2015-11-05

    Application No.: US14268215

    Filing date: 2014-05-02

    CPC classification number: G06F9/3851 G06F9/3887

    Abstract: A SIMD processor may be configured to determine one or more active threads from a plurality of threads, select one active thread from the one or more active threads, and perform a divergent operation on the selected active thread. The divergent operation may be a serial operation.
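
    Illustrative sketch (not taken from the patent): serialized execution can be modeled as a loop that repeatedly picks one lane from an active-thread bitmask and runs the operation for that lane alone. Picking the lowest-numbered active lane is an assumption; the abstract does not say how the thread is selected. The name `run_serialized` is hypothetical.

```python
from typing import Callable, List


def run_serialized(active_mask: int,
                   num_threads: int,
                   operation: Callable[[int], None]) -> List[int]:
    """Run `operation` once per active lane, one lane at a time.

    Bit i of `active_mask` marks lane i as active. Returns the lanes in
    the order they were executed.
    """
    remaining = active_mask & ((1 << num_threads) - 1)
    order: List[int] = []
    while remaining:
        # Select one active thread: here, the lowest set bit of the mask.
        lane = (remaining & -remaining).bit_length() - 1
        operation(lane)            # the divergent, serial operation
        order.append(lane)
        remaining &= remaining - 1  # clear the lane that was just served
    return order
```

    For an 8-lane mask of `0b10100110` the lanes are served in the order 1, 2, 5, 7.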

Dynamic shader instruction nullification for graphics processing

    Publication No.: US10430912B2

    Publication date: 2019-10-01

    Application No.: US15432170

    Filing date: 2017-02-14

    Abstract: A GPU may be configured to detect and nullify unnecessary instructions. Nullifying unnecessary instructions may include overwriting a detected unnecessary instruction with a no operation (NOP) instruction. In another example, nullifying unnecessary instructions may include writing a value to a 1-bit instruction memory. Each bit of the 1-bit instruction memory may be associated with a particular instruction of the draw call. If the bit associated with a particular instruction has a true value (e.g., 1), the GPU is configured to not execute that instruction.
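
    Illustrative sketch (not taken from the patent): both nullification variants from the abstract, modeled on a program held as a Python list. How unnecessary instructions are detected is not described in the abstract, so the set `unnecessary` is simply an input here. All names (`NOP`, `nullify_by_overwrite`, `build_skip_bits`, `execute`) are hypothetical.

```python
from typing import List, Set

NOP = "nop"


def nullify_by_overwrite(program: List[str], unnecessary: Set[int]) -> List[str]:
    """Variant 1: replace each unnecessary instruction with a NOP."""
    return [NOP if i in unnecessary else ins for i, ins in enumerate(program)]


def build_skip_bits(program_len: int, unnecessary: Set[int]) -> List[bool]:
    """Variant 2: one bit per instruction; True means do not execute it."""
    return [i in unnecessary for i in range(program_len)]


def execute(program: List[str], skip_bits: List[bool]) -> List[str]:
    """Return the instructions that actually run."""
    executed = []
    for ins, skip in zip(program, skip_bits):
        if skip or ins == NOP:
            continue  # nullified by bit or by overwrite: nothing to do
        executed.append(ins)
    return executed
```

    Both variants run the same instructions; the second leaves the program itself unmodified and keeps the decision in a separate bit array.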

GENERAL PURPOSE REGISTER ALLOCATION IN STREAMING PROCESSOR

    Publication No.: US20180165092A1

    Publication date: 2018-06-14

    Application No.: US15379195

    Filing date: 2016-12-14

    Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on the latency associated with instructions in processor threads. A streaming processor can include a plurality of general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.

Emulation of fused multiply-add operations

    Publication No.: US09645792B2

    Publication date: 2017-05-09

    Application No.: US14461890

    Filing date: 2014-08-18

    CPC classification number: G06F7/5443 G06F5/01 G06F7/483 G06F7/57

    Abstract: At least one processor may emulate a fused multiply-add operation for a first operand, a second operand, and a third operand. The at least one processor may: determine an intermediate value based at least in part on multiplying the first operand with the second operand; determine at least one of an upper intermediate value or a lower intermediate value, wherein determining the upper intermediate value comprises rounding the intermediate value towards zero by a specified number of bits, and wherein determining the lower intermediate value comprises subtracting the upper intermediate value from the intermediate value; determine an upper value and a lower value based at least in part on adding the third operand to, or subtracting it from, one of the upper intermediate value or the lower intermediate value; and determine an emulated fused multiply-add result by adding the upper value and the lower value.
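
    Illustrative sketch (not taken from the patent): the upper/lower split can be demonstrated with exact rational arithmetic standing in for a wide intermediate product. Several choices here are assumptions: the use of `fractions.Fraction`, a 53-bit upper part, and always adding the third operand to the upper part. The sketch assumes finite inputs, ignores overflow and underflow, and is not guaranteed to match a correctly rounded hardware FMA in every case.

```python
from fractions import Fraction
from typing import Tuple

MANTISSA_BITS = 53  # significand width of an IEEE 754 double


def split_product(a: float, b: float) -> Tuple[float, float]:
    """Return (upper, lower) such that upper + lower equals a * b exactly."""
    product = Fraction(a) * Fraction(b)  # exact intermediate value
    n, d = product.numerator, product.denominator

    # Round the intermediate toward zero by dropping its excess low bits.
    excess = abs(n).bit_length() - MANTISSA_BITS
    if excess > 0:
        magnitude = (abs(n) >> excess) << excess
        n_upper = magnitude if n >= 0 else -magnitude
    else:
        n_upper = n

    upper = Fraction(n_upper, d)
    lower = product - upper  # the bits that truncation discarded
    return float(upper), float(lower)


def emulated_fma(a: float, b: float, c: float) -> float:
    """Approximate a * b + c using the upper/lower split."""
    upper, lower = split_product(a, b)
    return (upper + c) + lower
```

    As a usage example, take `a = 1 + 2**-30`, `b = 1 - 2**-30`, `c = -1`. The exact product is `1 - 2**-60`, which an ordinary double multiply rounds to 1.0, so `a * b + c` evaluates to 0.0, while `emulated_fma(a, b, c)` keeps the low bits and returns `-2**-60`.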

    GPU DIVERGENCE BARRIER
    Type: Patent application

    Publication No.: US20150095914A1

    Publication date: 2015-04-02

    Application No.: US14043562

    Filing date: 2013-10-01

    CPC classification number: G06F9/4843 G06F9/3887 G06F9/522 G06T1/20

    Abstract: A device includes a memory, and at least one programmable processor configured to determine, for each warp of a plurality of warps, whether a Boolean expression is true for a corresponding thread of each warp, pause execution of each warp having a corresponding thread for which the expression is true, determine a number of active threads for each of the plurality of warps for which the expression is true, sort the plurality of warps for which the expression is true based on the number of active threads in each of the plurality of warps, swap thread data of an active thread of a first warp of the plurality of warps with thread data of an inactive thread of a second warp of the plurality of warps, and resume execution of at least one of the plurality of warps for which the expression is true.
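
    Illustrative sketch (not taken from the patent): the sort-and-swap step can be modeled as compaction over warps held as Python lists, where each slot contains thread data or `None` for an inactive thread. Sorting in descending order of active count and filling the fullest warps first are assumptions, as is the name `compact_warps`; the pause and resume steps are not modeled.

```python
from typing import List, Optional

Warp = List[Optional[str]]  # thread data per lane, None = inactive lane


def active_count(warp: Warp) -> int:
    return sum(slot is not None for slot in warp)


def compact_warps(warps: List[Warp]) -> List[Warp]:
    """Swap active threads out of sparse warps into free lanes of dense warps."""
    # Sort warp indices by number of active threads, densest first.
    order = sorted(range(len(warps)),
                   key=lambda w: active_count(warps[w]),
                   reverse=True)
    lo, hi = 0, len(order) - 1
    while lo < hi:
        dense, sparse = warps[order[lo]], warps[order[hi]]
        free = next((i for i, s in enumerate(dense) if s is None), None)
        if free is None:
            lo += 1   # dense warp is full, move to the next densest
            continue
        busy = next((i for i, s in enumerate(sparse) if s is not None), None)
        if busy is None:
            hi -= 1   # sparse warp is empty, move to the next sparsest
            continue
        # Swap an active thread's data with an inactive thread's slot.
        dense[free], sparse[busy] = sparse[busy], dense[free]
    return warps
```

    Starting from warps with 1, 3, and 2 active threads out of 4 lanes each, the single active thread of the sparsest warp moves into the free lane of the densest one, leaving active counts of 0, 4, and 2.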
