Abstract:
An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
Abstract:
An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
Abstract:
Described herein are technologies related to enforcing thread dependency using a hybrid scoreboard. An encoded video information that includes a plurality of threads is received, a first set and a second set of threads from the plurality of thread is determined, the first and second sets of threads are assigned to a hardware and a software, respectively, and dependency threads in the first and second sets of threads is enforced.
Abstract:
In an embodiment, at least one computer readable storage medium has instructions stored thereon for causing a system to send, from a processor to a task execution device, a first call to execute a first subroutine of a set of chained subroutines. The first subroutine may have a first subroutine output argument that includes a first token to indicate that first output data from execution of the first subroutine is intermediate data of the set of chained subroutines. The instructions are also for causing the system, responsive to inclusion of the first token in the first subroutine output argument, to enable the processor to execute one or more operations while the task execution device executes the first subroutine. Other embodiments are described and claimed.
Abstract:
Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline including a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
Abstract:
Systems and methods may receive an input image for data processing, divide the input image into a plurality of blocks, each block including a plurality of rows, and each row including a plurality of pixels and process each pixel in the input image within a row in parallel with a user-defined template. In one example, the user-defined template is to include a structuring element and a row pixel mask.