Abstract:
A processor includes a front end to fetch an instruction. The instruction is to calculate a data point using inputs from a plurality of adjacent source data in a plurality of dimensions. The processor includes a decoder to decode the instruction. The processor also includes a core to, based on the decoded instruction, perform a plurality of tabular vector read operations to read the plurality of adjacent source data and perform a tabular vector calculation to execute the instruction. The tabular vector calculation is based upon results of performing the plurality of tabular vector read operations. The core is further to write results of the tabular vector calculation.
Abstract:
Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.
Abstract:
A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
Abstract:
A processor includes a vector register including data fields to store values of vector elements of data, a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
Abstract:
Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
Abstract:
A processor includes a front end to fetch an instruction. The instruction is to calculate a data point using inputs from a plurality of adjacent source data in a plurality of dimensions. The processor includes a decoder to decode the instruction. The processor also includes a core to, based on the decoded instruction, perform a plurality of tabular vector read operations to read the plurality of adjacent source data and perform a tabular vector calculation to execute the instruction. The tabular vector calculation is based upon results of performing the plurality of tabular vector read operations. The core is further to write results of the tabular vector calculation.
Abstract:
A method of assessing energy efficiency of a High-performance computing (HPC) system, including: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system, executing the plurality of HPC workloads on the SUT, and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance per power values for each executed workload at each selected power constraint is shown.
Abstract:
A processor includes an execution unit to execute instructions to load indices from an array of indices and scatter elements to locations in sparse memory based on those indices. The execution unit includes logic to load, for each data element to be scattered by the instruction, as needed, an index value to be used in computing the address in memory at which a particular data element is to be written. The index values may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction and the index values retrieved for the data element locations, with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction and store them to the computed locations.
Abstract:
Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand.
Abstract:
A processor includes an execution unit to execute instructions to load indices from an array of indices, optionally perform scatters, and prefetch (to a specified cache) contents of target locations for future scatters from arbitrary locations in memory. The execution unit includes logic to load, for each target location of a scatter or prefetch operation, an index value to be used in computing the address in memory for the operation. The index value may be retrieved from an array of indices identified for the instruction. The execution unit includes logic to compute the addresses based on the sum of a base address specified for the instruction, the index value retrieved for the location, and a prefetch offset (for prefetch operations), with optional scaling. The execution unit includes logic to retrieve data elements from contiguous locations in a source vector register specified for the instruction to be scattered to the memory.