Abstract:
A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
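The three dot-product steps named above (multiply, align by exponent, accumulate) can be illustrated with a minimal software sketch. It is not the disclosed datapath: the function name `aligned_dot`, the 24-bit mantissa width, and the use of a single integer accumulator in place of the result queue and adder are all assumptions made for illustration.

```python
import math

def aligned_dot(a, b, mant_bits=24):
    """Dot product using integer partial products aligned to a common exponent.

    Hypothetical sketch: mant_bits and the single accumulator are assumptions.
    """
    products = []
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # a zero product contributes nothing and has no useful exponent
        mx, ex = math.frexp(x)  # x == mx * 2**ex with 0.5 <= |mx| < 1
        my, ey = math.frexp(y)
        ix = int(mx * (1 << mant_bits))  # integer mantissa (truncated to mant_bits)
        iy = int(my * (1 << mant_bits))
        # Partial product value is (ix * iy) * 2**(ex + ey - 2 * mant_bits).
        products.append((ix * iy, ex + ey))
    if not products:
        return 0.0
    # Align every partial product to the largest exponent by shifting right.
    max_exp = max(e for _, e in products)
    acc = 0
    for p, e in products:
        acc += p >> (max_exp - e)  # low-order bits shifted out are discarded
    return math.ldexp(acc, max_exp - 2 * mant_bits)

print(aligned_dot([1.5, 2.0, -0.25], [4.0, 0.5, 8.0]))  # 6 + 1 - 2
```

Because alignment shifts small partial products to the right, a term whose exponent is far below the maximum can be lost entirely; in this sketch `aligned_dot([2.0**40, 2.0**-40], [1.0, 1.0])` returns `2.0**40`.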
Abstract:
A method, computer readable medium, and system are disclosed for rounding numerical values. A set of bits from an input value is identified as a rounding value. A second set of bits representing a second value is extracted from the input value and added to the rounding value to produce a sum. The sum is truncated to produce the rounded output value. Thus, the present invention provides a stochastic rounding technique that rounds up an input value as a function of a second value and a rounding value, both of which are obtained from the input value. When the second value and rounding value are obtained from consistent bit locations of the input value, the resulting output value is deterministic. Stochastic rounding that is deterministic in this way is advantageously applicable in deep learning applications.
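The rounding steps above can be sketched on an unsigned integer input. The bit layout is an assumption chosen for illustration, not taken from the abstract: a 16-bit input, the lowest 4 bits used as the rounding value, the remaining upper 12 bits used as the second value, and an 8-bit output. The function name `stochastic_round` and the saturation on overflow are likewise hypothetical.

```python
def stochastic_round(value, keep_bits=8, round_bits=4, total_bits=16):
    """Round a total_bits-wide unsigned value down to keep_bits bits.

    Hypothetical layout: the low round_bits bits act as the rounding value,
    and the bits above them form the second value.
    """
    rounding = value & ((1 << round_bits) - 1)  # rounding value taken from the input itself
    second = value >> round_bits                # second value: the remaining upper bits
    total = second + rounding                   # a carry out of the low bits rounds the result up
    drop = total_bits - round_bits - keep_bits  # bits still to be truncated from the sum
    out = total >> drop                         # truncate the sum
    return min(out, (1 << keep_bits) - 1)       # saturate if the carry overflowed (an assumption)

print(hex(stochastic_round(0x12F8)))
print(hex(stochastic_round(0x12F0)))
```

With this layout, `0x12F8` rounds up to `0x13` while `0x12F0` truncates to `0x12`. Since both the rounding value and the second value come from fixed bit positions of the input, the same input always produces the same output.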
Abstract:
A raster unit is configured to generate different sample patterns for adjacent pixels within a given frame. In addition, the raster unit may adjust the sample patterns between frames. The raster unit includes an index unit that selects a sample pattern table for use with a current frame. For a given pixel, the index unit extracts a sample pattern from the selected sample pattern table. The extracted sample pattern is used to generate coverage information for the pixel. The coverage information for all pixels is then used to generate an image. The resultant image may then be filtered to reduce or remove artifacts induced by the changing of sample locations.
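The table selection and per-pixel lookup described above can be sketched as follows. Every specific here is an assumption for illustration: the table contents, the choice of two samples per pixel, indexing the table by frame number, and indexing the pattern by `(x + y)` so that horizontally and vertically adjacent pixels receive different patterns. The names `PATTERN_TABLES`, `select_pattern`, and `coverage_mask` are hypothetical.

```python
# Each table holds sample patterns; each pattern is a list of sub-pixel offsets.
PATTERN_TABLES = [
    [  # table used on even frames (hypothetical values)
        [(0.25, 0.25), (0.75, 0.75)],
        [(0.75, 0.25), (0.25, 0.75)],
    ],
    [  # table used on odd frames (hypothetical values)
        [(0.5, 0.125), (0.5, 0.875)],
        [(0.125, 0.5), (0.875, 0.5)],
    ],
]

def select_pattern(frame, x, y):
    """Pick a table for the frame, then a pattern for the pixel."""
    table = PATTERN_TABLES[frame % len(PATTERN_TABLES)]
    return table[(x + y) % len(table)]  # adjacent pixels get different patterns

def coverage_mask(frame, x, y, inside):
    """Return a bitmask with bit i set when sample i of the pixel is covered.

    `inside` is a caller-supplied predicate on a sample position.
    """
    mask = 0
    for i, (sx, sy) in enumerate(select_pattern(frame, x, y)):
        if inside(x + sx, y + sy):
            mask |= 1 << i
    return mask

# Coverage of pixel (0, 0) in frame 0 by the half-plane px < 0.5.
print(coverage_mask(0, 0, 0, lambda px, py: px < 0.5))
```

The filtering step mentioned in the abstract is not modeled here; the sketch stops at per-pixel coverage.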
Abstract:
A system of interconnected chips comprising a multi-chip module (MCM) includes a processor chip, a system functions chip, and an MCM package configured to include the processor chip, the system functions chip, and an interconnect circuit. The processor chip is configured to include a first ground-referenced single-ended signaling interface circuit. A first set of electrical traces is manufactured within the MCM package and configured to couple the first single-ended signaling interface circuit to the interconnect circuit. The system functions chip is configured to include a second single-ended signaling interface circuit and a host interface. A second set of electrical traces is manufactured within the MCM package and configured to couple the host interface to at least one external pin of the MCM package. In one embodiment, each single-ended signaling interface advantageously implements ground-referenced single-ended signaling.
Abstract:
One embodiment of the present invention sets forth a technique for capturing and storing a level of an input signal using a single-trigger low-energy flip-flop circuit that is fully-static and insensitive to fabrication process variations. The single-trigger low-energy flip-flop circuit presents only three transistor gate loads to the clock signal and none of the internal nodes toggle when the input signal remains constant. The output signal Q is set or reset at the rising clock edge using a single-trigger sub-circuit. A set or reset may be armed while the clock signal is low, and the set or reset is triggered at the rising edge of the clock.
Abstract:
One embodiment of the present invention sets forth a technique for capturing and storing a level of an input signal using a single-trigger low-energy flip-flop circuit that is fully-static and insensitive to fabrication process variations. The single-trigger low-energy flip-flop circuit presents only three transistor gate loads to the clock signal and none of the internal nodes toggle when the input signal remains constant. The output signal Q is set or reset at the rising clock edge using a single-trigger sub-circuit. A set or reset may be armed while the clock signal is low, and the set or reset is triggered at the rising edge of the clock.
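The arm-then-trigger behavior described above can be approximated with a purely behavioral model. It says nothing about the transistor-level circuit or its clock loading; the class name `SingleTriggerFlipFlop`, the `step` method, and the rule that nothing is armed when the input already equals Q are assumptions made for illustration.

```python
class SingleTriggerFlipFlop:
    """Behavioral sketch: arm a set or reset while the clock is low, fire on the rising edge."""

    def __init__(self):
        self.q = 0         # stored output level
        self.clk = 0       # previous clock level, used to detect a rising edge
        self.armed = None  # "set", "reset", or None

    def step(self, clk, d):
        if clk == 0:
            # While the clock is low, arm only if the input differs from Q,
            # so a constant input causes no internal activity in this model.
            if d != self.q:
                self.armed = "set" if d else "reset"
            else:
                self.armed = None
        elif self.clk == 0:
            # Rising edge: trigger whatever was armed, then disarm.
            if self.armed == "set":
                self.q = 1
            elif self.armed == "reset":
                self.q = 0
            self.armed = None
        # While the clock stays high, input changes are ignored.
        self.clk = clk
        return self.q

ff = SingleTriggerFlipFlop()
ff.step(0, 1)         # clock low: a set is armed, Q unchanged
print(ff.step(1, 1))  # rising edge: Q takes the armed value
```

In this model, changing the input while the clock is high has no effect until the clock goes low again and a new set or reset is armed.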