Abstract:
A coarse-grained reconfigurable processor having an improved code compression rate and a code decompression method thereof are provided to reduce a capacity of a configuration memory and reduce power consumption in a processor chip. The coarse-grained reconfigurable processor includes a configuration memory configured to store reconfiguration information including a header storing a compression mode indicator and a compressed code for each of a plurality of units and a body storing at least one uncompressed code, a decompressor configured to specify a code corresponding to each of the plurality of units among the at least one uncompressed code within the body based on the compression mode indicator and the compressed code within the header, and a reconfigurator including a plurality of PEs and configured to reconfigure data paths of the plurality of PEs based on the code corresponding to each unit.
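The decompression step can be illustrated with a minimal sketch. The abstract does not define the compression modes, so the two modes below, the `NOP` code, and all names (`Configuration`, `decompress`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List

NOP = 0  # hypothetical code meaning "this unit is idle"


@dataclass
class Configuration:
    mode: int              # compression mode indicator (header)
    compressed: List[int]  # one compressed code per unit (header)
    body: List[int]        # uncompressed codes (body)


def decompress(cfg: Configuration) -> List[int]:
    """Return the full code for every unit, looked up in the body."""
    if cfg.mode == 0:
        # Assumed mode 0: each compressed code is an index into the body.
        return [cfg.body[i] for i in cfg.compressed]
    if cfg.mode == 1:
        # Assumed mode 1: each compressed code is a 1-bit flag.
        # 1 takes the next body entry in order, 0 means the unit is idle.
        codes, next_entry = [], 0
        for flag in cfg.compressed:
            if flag:
                codes.append(cfg.body[next_entry])
                next_entry += 1
            else:
                codes.append(NOP)
        return codes
    raise ValueError(f"unknown compression mode {cfg.mode}")
```

In both assumed modes the body holds fewer entries than there are units, which is where the saving in configuration memory would come from.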
Abstract:
An apparatus and a job scheduling method are provided. For example, the apparatus is a multi-core processing apparatus. The apparatus and method minimize performance degradation of a core caused by sharing resources by dynamically managing a maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core including an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles, and a job scheduler configured to assign at least one job to each of the at least one core, based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to the number of active cycles for a core is too high, the job scheduler assigns fewer jobs to that core to improve performance.
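The scheduling rule can be sketched as follows. The abstract only says the ratio is "too great", so the `threshold` value, the step of one job, and the function name `adjust_max_jobs` are assumptions.

```python
def adjust_max_jobs(max_jobs: int, active_cycles: int, stall_cycles: int,
                    threshold: float = 0.5, min_jobs: int = 1) -> int:
    """Return the new per-core job limit from the two cycle counters."""
    if active_cycles == 0:
        # No activity recorded yet, so there is nothing to judge.
        return max_jobs
    if stall_cycles / active_cycles > threshold:
        # Too much stalling: assign one job fewer, but never below min_jobs.
        return max(min_jobs, max_jobs - 1)
    return max_jobs
```

For example, `adjust_max_jobs(4, 100, 80)` lowers the limit to 3, while `adjust_max_jobs(4, 100, 20)` leaves it at 4.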
Abstract:
A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
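As an illustration of what a swizzle pattern is, the sketch below treats a pattern as a list of source lane indices. The pattern kinds (`reverse`, `broadcast`, `rotate`) and the function names are hypothetical and are not taken from the abstract.

```python
from typing import List


def make_pattern(kind: str, lanes: int, arg: int = 0) -> List[int]:
    """Generate a swizzle pattern: entry i names the source lane for output lane i."""
    if kind == "reverse":
        return [lanes - 1 - i for i in range(lanes)]
    if kind == "broadcast":
        return [arg] * lanes                  # every lane reads lane `arg`
    if kind == "rotate":
        return [(i + arg) % lanes for i in range(lanes)]
    raise ValueError(f"unknown pattern kind {kind!r}")


def swizzle(vec: List[int], pattern: List[int]) -> List[int]:
    """Rearrange the lanes of `vec` according to `pattern`."""
    return [vec[src] for src in pattern]
```

Supplying the pattern from a separate generator means the rearrangement does not have to be encoded as an explicit swizzle instruction for each vector operation.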
Abstract:
A graphics processing unit (GPU), configured to perform tile-based rendering using prefetched graphics data, includes a tiler configured to perform binning on a current frame and obtain a first binning bitstream of a first tile among a plurality of tiles of the current frame, a binning correlator configured to determine whether the first tile and a second tile of a previous frame are similar to each other by using the first binning bitstream and a second binning bitstream of the second tile, where the second tile has a same tile ID as the first tile, a prefetcher configured to prefetch second graphics data used to render the second tile by using the tile ID, when it is determined that the first tile and the second tile are similar to each other, and at least one processor configured to render the current frame using the prefetched second graphics data.
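The correlation step might look like the sketch below. The abstract does not say how similarity is measured, so the bit-difference ratio, the `max_diff_ratio` threshold, and the function names are assumptions.

```python
from typing import Dict, List


def similar(cur: bytes, prev: bytes, max_diff_ratio: float = 0.1) -> bool:
    """Assumed similarity test: fraction of differing bits is small enough."""
    if len(cur) != len(prev) or not cur:
        return False
    diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(cur, prev))
    return diff_bits / (8 * len(cur)) <= max_diff_ratio


def tiles_to_prefetch(cur_bins: Dict[int, bytes],
                      prev_bins: Dict[int, bytes]) -> List[int]:
    """Return tile IDs whose previous-frame graphics data is worth prefetching."""
    return [tile_id for tile_id, stream in sorted(cur_bins.items())
            if tile_id in prev_bins and similar(stream, prev_bins[tile_id])]
```

Tiles are matched by tile ID only, as in the abstract; a tile whose binning bitstream changed substantially since the previous frame is simply not prefetched.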
Abstract:
A functional unit for supporting multithreading, a processor including the same, and an operating method of the processor are provided. The functional unit for supporting multithreading includes a plurality of input ports configured to receive opcodes and operands for a plurality of threads, wherein each of the plurality of input ports is configured to receive an opcode and an operand for a different thread, a plurality of operators configured to perform operations using the received operands, an operator selector configured to select, based on each opcode, an operator from among the plurality of operators to perform a specific operation using an operand from among the received operands, and a plurality of output ports configured to output operation results of operations for each thread.
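A toy software model of this arrangement is given below. Each list entry stands for one input port carrying the opcode and operands of a different thread; the opcode names and the `OPERATORS` table are assumptions for illustration.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical set of operators shared by all threads.
OPERATORS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def functional_unit(ports: List[Tuple[str, Tuple[int, int]]]) -> List[int]:
    """ports[i] holds (opcode, operands) for thread i; result i goes to output port i."""
    results = []
    for opcode, operands in ports:
        op = OPERATORS[opcode]        # operator selector: choose by opcode
        results.append(op(*operands))
    return results
```

The model is sequential, whereas the hardware described would serve the ports concurrently; the sketch only shows how opcodes select operators and how results stay associated with their threads.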
Abstract:
A texture processing method and apparatus obtain information about a first data loss amount that occurred during a texture compression process. A second data loss amount that is allowable during a texture filtering process is determined based on the obtained information about the first data loss amount. Texture filtering is then performed by using the second data loss amount. At least one processor determines the second data loss amount based on a difference between a third data loss amount and the first data loss amount.
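The difference computation can be written out as a short sketch. The abstract does not define the third data loss amount; the code assumes it is an overall loss budget, and the clamp at zero and the name `allowable_filter_loss` are likewise assumptions.

```python
def allowable_filter_loss(third_loss: float, first_loss: float) -> float:
    """Second data loss amount = third data loss amount - first data loss amount.

    third_loss is assumed to be a total loss budget; first_loss is the loss
    already incurred during texture compression. The result is clamped at
    zero (an assumption) so that filtering is never given a negative budget.
    """
    return max(0.0, third_loss - first_loss)
```

Under this reading, a texture that lost little during compression leaves more room for approximation during filtering, and a heavily compressed one leaves none.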
Abstract:
A multi-core apparatus includes cores each including an active cycle counting unit configured to store an active cycle count, and a stall cycle counting unit configured to store a stall cycle count. The multi-core apparatus further includes a job scheduler configured to determine an optimal number of cores in an active state based on state information received from each of the cores, and adjust power to maintain the optimal number of cores.
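The abstract does not state how the optimal number of active cores is derived from the state information. The sketch below is one possible heuristic, not the patented method: the aggregate stall-to-active ratio, both thresholds, and the function name are assumptions.

```python
from typing import List, Tuple


def optimal_active_cores(states: List[Tuple[int, int]], current: int,
                         stall_high: float = 0.5,
                         stall_low: float = 0.1) -> int:
    """states holds (active_cycles, stall_cycles) per core; returns the target core count."""
    active = sum(a for a, _ in states)
    stall = sum(s for _, s in states)
    if active == 0:
        return current                        # nothing measured yet
    ratio = stall / active
    if ratio > stall_high:
        return max(1, current - 1)            # heavy contention: power one core down
    if ratio < stall_low:
        return min(len(states), current + 1)  # headroom: power one more core up
    return current
```

A scheduler using such a rule would then adjust power so that the number of active cores matches the returned value.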