Abstract:
Systems, apparatuses, and methods for efficiently moving data for storage and processing are described. In various embodiments, a compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and for assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
Abstract:
In an embodiment, a processor may implement a vector instruction set including a conditional termination instruction (CTerm). The CTerm instruction may take two source operands and compare them according to a specified condition, updating flags as a result of the instruction. The flags may be used to affect predicate vector generation to control vectorized loop execution. In an embodiment, the vector instruction set may also include a conditional termination predicate instruction (CTPred). The CTPred instruction may take a pair of predicate vectors and a set of flags as operands, and may generate: a predicate vector to control parallel processing of vector elements, and a set of flags to control further loop processing. Either instruction may be used to efficiently manage vector loops in various embodiments, or the instructions may be used together.
Abstract:
An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Abstract:
Systems, apparatuses, and methods for efficiently moving data for storage and processing. A compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and for assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
Abstract:
In an embodiment, a processor may implement a vector instruction set including one or more compare break instructions. The compare break instruction may take a pair of operands which may be compared to determine loop termination conditions, and may output a predicate vector indicating which vector elements correspond to loop iterations that are executed and which vector elements correspond to loop iterations that are not executed. The predicate vector may serve as a predicate to vector instructions forming the body of the loop, correctly executing the specified number of iterations. The compare break instruction may be coded to check for a variety of conditions (e.g. equal, not equal, greater than, less than, etc.). In an embodiment, the compare break instruction may take a predicate operand as well, which may be combined with the predicate vector produced by the comparison operations to produce the output vector.
Abstract:
An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Abstract:
An integrated circuit (IC) including a plurality of processor cores, a plurality of graphics processing units, a plurality of peripheral circuits, and a plurality of memory controllers is configured to support scaling of the system using a unified memory architecture. For example, the IC may include an interconnect fabric configured to provide communication between the one or more memory controller circuits and the processor cores, graphics processing units, and peripheral devices; and an off-chip interconnect coupled to the interconnect fabric and configured to couple the interconnect fabric to a corresponding interconnect fabric on another instance of the integrated circuit, wherein the interconnect fabric and the off-chip interconnect provide an interface that transparently connects the one or more memory controller circuits, the processor cores, graphics processing units, and peripheral devices in either a single instance of the integrated circuit or two or more instances of the integrated circuit.
Abstract:
An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Abstract:
An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Abstract:
Systems, apparatuses, and methods for efficiently moving data for storage and processing are described. In various embodiments, a compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and for assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If it is determined that each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.