Abstract:
Disclosed are an apparatus, system, and method for implementing predicated instructions using micro-operations. A micro-code engine receives an instruction, decomposes the instruction, and generates a plurality of micro-operations to implement the instruction. Each of the decomposed micro-operations indicates a single destination register. For predicated instructions, the decomposed micro-operations include “conditional move” micro-operations to select between two potential output values. Except in the case that one of the potential output values is a constant, the decomposed micro-operations for a predicated instruction also include an append instruction that saves the incoming value of a destination register in a temporary variable. For at least one embodiment, the qualifying predicate for a predicated instruction is appended to the incoming value stored in the temporary register.
Abstract:
The present invention relates to the design of highly reliable high performance microprocessors, and more specifically to designs that use cache memory protection schemes such as, for example, a 1-hot plus valid bit scheme and a 2-hot vector cache scheme. These protection schemes protect the 1-hot vectors used in the tag array in the cache and are designed to provide hardware savings, operate at higher speeds and be simple to implement. In accordance with an embodiment of the present invention, a tag array memory including an input conversion circuit to receive a 1-hot vector and to convert the 1-hot vector to a 2-hot vector. The tag array memory also including a memory array coupled to the input conversion circuit, the memory array to store the 2-hot vector; and an output conversion circuit coupled to the memory array, the output conversion circuit to receive the 2-hot vector and to convert the 2-hot vector back to the 1-hot vector.
Abstract:
The present invention relates to the design of highly reliable high performance microprocessors, and more specifically to designs that use cache memory protection schemes such as, for example, a 1-hot plus valid bit scheme and a 2-hot vector cache scheme. These protection schemes protect the 1-hot vectors used in the tag array in the cache and are designed to provide hardware savings, operate at higher speeds and be simple to implement. In accordance with an embodiment of the present invention, a tag array memory circuit includes a plurality of memory bit circuits coupled together to form an n-bit memory cell, and a valid bit circuit coupled to the n-bit memory cell, the valid bit circuit being configured to be accessed simultaneously with the plurality of memory bit circuits.
Abstract:
An apparatus includes a clock to produce pulses and an electronic hardware structure having a plurality of rows and one or more ports. Each row is adapted to record a separate latency vector written through one of the ports. The latency vector recorded therein is responsive to the clock. A method of dispatching instructions in a processor includes updating a plurality of expected latencies to a portion of rows of a register latency table, and decreasing the expected latencies remaining in other of the rows in response to a clock pulse. The rows of the portion correspond to particular registers.
Abstract:
Apparatus for determining the length of an instruction being processed by a computer system when instructions vary in length and appear sequentially in an instruction stream without differentiation between instructions including apparatus for providing an end bit for each predesignated length of an instruction to indicate that the instruction ends at that point in its length, apparatus for setting the end bit at the particular predesignated length of the instruction which is the actual end of the instruction, a first channel for processing a first instruction in sequence, a second channel for processing an instruction next following the first instruction, and apparatus for looking at the end bits of an instruction being processed by the first channel to determine the end point of that instruction and the beginning of the next instruction from the stream of instructions.
Abstract:
A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.
Abstract:
In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
Abstract:
A technique to perform a fast compare-exchange operation is disclosed. More specifically, a machine-readable medium, processor, and system are described that implement a fast compare-exchange operation as well as a cache line mark operation that enables the fast compare-exchange operation.
Abstract:
A method and apparatus is disclosed that uses an arithmetic circuit for adding numbers represented in a redundant form to also subtract numbers received in redundant form, including numbers received from a bypass circuit. A non-propagative comparator circuit is then used to compare a given value with a result from the arithmetic circuit to determine if the result is equal to the given value. All of the operations described above can be accomplished without propagating carry signals throughout the circuitry.The method includes generating a complemented redundant form of at least one number supplied to the arithmetic circuit in redundant form. It also includes providing adjustment input to the arithmetic circuit to augment a result produced through the arithmetic circuit. This adjustment causes the arithmetic circuit to generate a valid outcome in redundant form as a result of a subtraction operation if the arithmetic operation is subtraction. Then the result is compared to a given value using a non-propagative comparator to determine equality or inequality of the result to the given value.
Abstract:
After an instruction loads data into a register at a first time, the register is monitored to see if it is read in a next clock cycle. When the data is not read in the next clock cycle, the instruction is classified as a slowable instruction. An instruction address associated with the instruction is used to update a history table. The history table stores information to indicate if an instruction is a slowable instruction or a non-slowable instruction. When the instruction address of the instruction is encountered at a second time, the history table is used to determine if the instruction is slowable or non-slowable.