摘要:
Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. A plurality of individually addressable data elements is gathered from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
摘要:
Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
摘要:
Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
摘要:
Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
摘要:
Mechanisms are provided for dynamic data driven alignment and data formatting in a floating point SIMD architecture. At least two operand inputs are input to a permute unit of a processor. Each operand input contains at least one floating point value upon which a permute operation is to be performed by the permute unit. A control vector input, having a plurality of floating point values that together constitute the control vector input, is input to the permute unit of the processor for controlling the permute operation of the permute unit. The permute unit performs a permute operation on the at least two operand inputs according to a permutation pattern specified by the plurality of floating point values that constitute the control vector input. Moreover, a result output of the permute operation is output from the permute unit to a result vector register of the processor.
摘要:
Mechanisms for aligning enhanced precision vectors based on reduced precision data values are provided. At least one data value, having a first precision type, is received for storing in a vector register. The vector register stores data as a vector having a plurality of vector elements. The first precision type is modified to have a second precision type different in precision than the first precision type to thereby generate at least one modified data value. The at least one modified data value is stored in at least one vector element of the plurality of vector elements. An alignment of the at least one modified data value is determined relative to a boundary of a vector element of the vector register. An alignment operation to re-align the at least one modified data value based on the boundary of the vector element of the vector register is performed.
摘要:
Systems, methods and computer program products for providing a combined moduli-9 and 3 residue generator. The methods include receiving a number in binary coded decimal (BCD) or binary format. A modulus-9 residue of the number is calculated. The modulus-9 residue that is calculated includes a modulus-3 residue of the number. The modulis-3 residue of the number is output. If the number is in BCD format, then the modulus-9 residue of the number is output.
摘要:
A BiCMOS driver circuit with an improved low output level which is closer to ground than in prior art circuits, both at full speed and with a static (resistive) load. The driver circuit incorporates a gated diode pull-down with a lower voltage drop than in prior art driver circuits, in which a bipolar output transistor remains on for output voltages down to about 0.3V. The voltage drop of the gated diode is set by device size ratios to be less than 0.5V without driving the output transistor into hard saturation. In the circuit, a gated diode pull-down NPN transistor is coupled between an output terminal and ground. A first CMOS transistor pair is coupled between the output terminal and the base of the pull-down NPN transistor, and a second CMOS transistor pair is coupled between the output terminal and the power supply V.sub.DD. The circuit has improved low output level and high-to-low transitions. The circuit has particular applications to many types of logic gates, for example in NAND gates, NOR gates, and inverters, as are used extensively in static or dynamic random access memory logic circuits. Although the disclosed embodiments herein are for NAND gates, one skilled in the art could readily implement the teachings of the present invention in other logic circuits and gates, such as NOR gates and inverters with minor changes in the circuitry.
摘要:
Mechanisms, in a data processing system comprising a single instruction multiple data (SIMD) processor, for performing a data dependency check operation on vector element values of at least two input vector registers are provided. Two calls to a simd-check instruction are performed, one with input vector registers having a first order and one with the input vector registers having a different order. The simd-check instruction performs comparisons to determine if any data dependencies are present. Results of the two calls to the simd-check instruction are obtained and used to determine if any data dependencies are present in the at least two input vector registers. Based on the results, the SIMD processor may perform various operations.
摘要:
An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.