Vertex attribute compression and decompression in hardware

    Publication number: US10282889B2

    Publication date: 2019-05-07

    Application number: US15432782

    Filing date: 2017-02-14

    Abstract: One or more embodiments of the present disclosure provide an apparatus used in source data compression, comprising a memory and at least one processor. The memory is configured to store vertex attribute data and a set of instructions. The processor is coupled to the memory. The processor is configured to receive a source data stream that includes one or more values corresponding to the vertex attribute data. The processor is also configured to provide a dictionary for the one or more values in the source data stream, wherein the dictionary includes a plurality of index values corresponding to the one or more values in the source data stream. The processor is also configured to replace at least some of the one or more values in the source data stream with corresponding index values of the plurality of index values.
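
The dictionary scheme in this abstract, where values in the stream are swapped for index values, can be illustrated with a minimal software sketch. It is not the patented hardware design: the function names `compress` and `decompress` are hypothetical, vertex attributes are assumed to be hashable tuples, and every value is encoded here, whereas the abstract only requires that at least some values be encoded.

```python
def compress(values):
    """Dictionary-encode a stream; returns (dictionary, indices)."""
    dictionary = []   # index -> value, in order of first appearance
    index_of = {}     # value -> index, for O(1) lookup while encoding
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def decompress(dictionary, indices):
    """Rebuild the original stream from the dictionary and index list."""
    return [dictionary[i] for i in indices]


# Hypothetical vertex normals; repeated values collapse to one dictionary entry.
stream = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
dictionary, indices = compress(stream)
restored = decompress(dictionary, indices)
```

The saving comes from repetition: four three-component values shrink to two dictionary entries plus four small indices.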

    Power saving branch modes in hardware
    Patent application

    Publication number: US20180341489A1

    Publication date: 2018-11-29

    Application number: US15684573

    Filing date: 2017-08-23

    Abstract: A method and apparatus are provided. The method includes executing a plurality of threads in a temporal dimension, executing a plurality of threads in a spatial dimension, determining a branch target address for each of the plurality of threads in the temporal dimension and the plurality of threads in the spatial dimension, and comparing each of the branch target addresses to determine a minimum branch target address, wherein the minimum branch target address is a minimum value among branch target addresses of each of the plurality of threads.
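
The comparison step in this abstract reduces all branch target addresses, across both the temporal and spatial dimensions, to a single minimum. A rough sketch follows, assuming the targets are held in a hypothetical 2D list indexed as `targets[temporal_slot][spatial_lane]`. The `lanes_at` helper, which marks the threads whose target equals that minimum, is an illustrative assumption and is not described in the abstract.

```python
def min_branch_target(targets):
    """Return the smallest branch target address over all threads."""
    return min(addr for row in targets for addr in row)


def lanes_at(targets, pc):
    """Mark which threads have a branch target equal to pc (assumed helper)."""
    return [[addr == pc for addr in row] for row in targets]


# Two temporal slots of four spatial lanes each (hypothetical addresses).
targets = [
    [0x40, 0x80, 0x40, 0x100],
    [0x80, 0x80, 0x40, 0x40],
]
min_pc = min_branch_target(targets)
mask = lanes_at(targets, min_pc)
```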

    Flexible-access instructions for efficient access of ML data

    Publication number: US11971949B2

    Publication date: 2024-04-30

    Application number: US17173203

    Filing date: 2021-02-10

    CPC classification number: G06F17/16 G06F9/30101 G06F17/15 G06N3/08

    Abstract: A graphics processing unit (GPU) and a method are disclosed that perform a convolution operation recast as a matrix multiplication operation. The GPU includes a register file, a processor and a state machine. The register file stores data of an input feature map and data of a filter weight kernel. The processor performs a convolution operation on data of the input feature map and data of the filter weight kernel as a matrix multiplication operation. The state machine facilitates performance of the convolution operation by unrolling the data of the input feature map and the data of the filter weight kernel in the register file. The state machine includes control registers that determine movement of data through the register file to perform the matrix multiplication operation on the data in the register file in an unrolled manner.
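
Recasting a convolution as a matrix multiplication is commonly done in software by unrolling each input patch into a row (often called im2col). The sketch below illustrates that general idea only. It says nothing about how the patented state machine moves data through the register file, and it assumes a single-channel 2D feature map, a square kernel, stride 1, no padding, and the cross-correlation convention usual in ML. All function names are hypothetical.

```python
def im2col(fmap, k):
    """Unroll every k x k patch of fmap into one flat row."""
    h, w = len(fmap), len(fmap[0])
    return [
        [fmap[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def conv_as_matmul(fmap, kernel):
    """Convolution computed as (patch matrix) x (flattened kernel vector)."""
    k = len(kernel)
    flat_kernel = [x for row in kernel for x in row]
    out_w = len(fmap[0]) - k + 1
    flat = [sum(p * q for p, q in zip(patch, flat_kernel))
            for patch in im2col(fmap, k)]
    # Fold the flat result vector back into a 2D output map.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


def conv_direct(fmap, kernel):
    """Reference sliding-window convolution for comparison."""
    k = len(kernel)
    h, w = len(fmap), len(fmap[0])
    return [
        [sum(fmap[i + di][j + dj] * kernel[di][dj]
             for di in range(k) for dj in range(k))
         for j in range(w - k + 1)]
        for i in range(h - k + 1)
    ]


fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
result = conv_as_matmul(fmap, kernel)
```

Both routes give the same output. The unrolled form trades extra storage for duplicated patch elements against a regular multiply-accumulate pattern.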

    Methods and apparatus for pixel packing

    Publication number: US11798218B2

    Publication date: 2023-10-24

    Application number: US17503259

    Filing date: 2021-10-15

    CPC classification number: G06T15/005 G06T2210/21

    Abstract: A method of packing coverage in a graphics processing unit (GPU) may include receiving an indication for a portion of an image, determining, based on the indication, a packing technique for the portion of the image, and packing coverage for the portion of the image based on the packing technique. The indication may include one or more of: an importance, a quality, a level of interest, a level of detail, or a variable-rate shading (VRS) level. The indication may be received from an application. The packing technique may include array merging. The array merging may include quad merging. The packing technique may include pixel piling. The packing technique may be a first packing technique, and the method may further include determining, based on the indication, a second packing technique for the portion of the image, and packing coverage for the portion of the image based on the second packing technique.

    Method for performing shader occupancy for small primitives

    Publication number: US11715252B2

    Publication date: 2023-08-01

    Application number: US17168168

    Filing date: 2021-02-04

    CPC classification number: G06T15/005 G06T1/20 G06T15/80

    Abstract: A GPU includes shader cores and a shader warp packer unit. The shader warp packer unit may receive a first primitive associated with a first partially covered quad, and a second primitive associated with a second partially covered quad. The shader warp packer unit may determine that the first partially covered quad and the second partially covered quad have non-overlapping coverage. The shader warp packer unit may pack the first partially covered quad and the second partially covered quad into a packed quad. The shader warp packer unit may send the packed quad to the shader cores. The first partially covered quad and the second partially covered quad may be spatially disjoint from each other. The shader cores may receive and process the packed quad with no loss of information relative to the shader cores individually processing the first partially covered quad and the second partially covered quad.
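
The non-overlap test and the packing step can be sketched with coverage bitmasks. This is an illustration under stated assumptions, not the shader warp packer's actual logic: each quad's coverage is assumed to be a 4-bit mask with one bit per pixel lane, the function names are hypothetical, and the packed quad is modeled as a list recording which primitive owns each lane so that no information is lost.

```python
def can_pack(cov_a, cov_b):
    """Two quads can share lanes only if no lane is covered by both."""
    return (cov_a & cov_b) == 0


def pack_quads(prim_a, cov_a, prim_b, cov_b):
    """Merge two partially covered quads; returns per-lane owners or None."""
    if not can_pack(cov_a, cov_b):
        return None
    owners = []
    for lane in range(4):
        if (cov_a >> lane) & 1:
            owners.append(prim_a)
        elif (cov_b >> lane) & 1:
            owners.append(prim_b)
        else:
            owners.append(None)  # lane stays idle in the packed quad
    return owners


packed = pack_quads("A", 0b0011, "B", 0b1100)    # disjoint lanes
rejected = pack_quads("A", 0b0011, "B", 0b0110)  # both cover lane 1
```

Packing the first pair turns two half-empty quads into one full quad, which is where the shader occupancy gain would come from.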

    Shader accessible configurable binning subsystem

    Publication number: US11416960B2

    Publication date: 2022-08-16

    Application number: US17110284

    Filing date: 2020-12-02

    Abstract: A binning subsystem of a GPU includes a storage subsystem, a shader core to output first data via a first path, a selector to receive the first data via the first path, and to receive second data from the storage subsystem via a second path. The storage subsystem includes a binner unit and a control logic unit. The control logic unit causes the selector to transfer the first data or the second data to the binner unit. The binner unit may transfer binner output data to the shader core via a third path. The binner unit may transfer the binner output data to one or more subsequent stages of a graphics pipeline via a fourth path. The binner unit may transfer the binner output data to the storage subsystem via a fifth path. The control logic unit may control the binner unit such that the binner unit can be used for general purpose computation.
