Abstract:
Method and apparatus for performing table lookup are disclosed. In one embodiment, the method includes providing a lookup table, where the lookup table includes a plurality of translation modes and each translation mode includes a corresponding translation table tree supporting a plurality of page sizes. The method further includes receiving a search request from a requester, determining a translation table tree for conducting the search request, determining a lookup sequence based on the translation table tree, generating a search output using the lookup sequence, and transmitting the search output to the requester. The plurality of translation modes includes a first set of page sizes for 32-bit operating system software and a second set of page sizes for 64-bit operating system software. The plurality of page sizes includes non-global pages, global pages, and both non-global and global pages.
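The lookup steps in this abstract (pick a translation table tree for the mode, derive a lookup sequence, return a result) can be sketched roughly as follows. This is only an illustration: the mode names, the specific page sizes, the dictionary layout, and the largest-page-first ordering are all assumptions, not details taken from the abstract.

```python
# Hypothetical page sizes (bytes) per translation mode; the abstract gives no values.
PAGE_SIZES = {
    "os32": [4 * 1024, 4 * 1024 * 1024],
    "os64": [4 * 1024, 2 * 1024 * 1024, 1024 * 1024 * 1024],
}


def lookup(tables, mode, vaddr):
    """Translate vaddr, or return None if no mapping is found.

    tables[mode][page_size] is assumed to map a virtual page number
    to a physical page number for pages of that size.
    """
    tree = tables[mode]  # determine the translation table tree for this request
    # Assumed lookup sequence: try the largest page size first.
    sequence = sorted(PAGE_SIZES[mode], reverse=True)
    for size in sequence:
        vpn, offset = divmod(vaddr, size)
        ppn = tree.get(size, {}).get(vpn)
        if ppn is not None:
            return ppn * size + offset  # search output sent back to the requester
    return None
```

For instance, with a single 4 KiB mapping from virtual page 1 to physical page 7 in the 32-bit mode, an address 5 bytes into that page translates to 5 bytes into physical page 7.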
Abstract:
A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit. Control logic in the entries is linked together into scan chains which identify operations and operands for execution.
Abstract:
A processing system includes sequential entries for storing operations of different types and a scan chain which can identify an operation of a first type which follows after an operation of a second type. The first and second types can be identical so that the scan chain identifies the second operation of a particular type in the sequence. The scan chain includes single-entry "generate", "propagate", "kill", and "only" terms which control a scan bit. Conceptually, if the "only" term is not asserted, an entry of the second type generates the scan bit and asserts the "only" term. After the "only" term is asserted, further generation of the scan bit is inhibited. Each entry either propagates the scan bit to the next entry or if the entry is of the first type, kills the scan bit and identifies itself as the selected entry. Look-ahead logic determines group terms from single-entry terms to indicate whether a scan bit would be generated, propagated, or killed by a group of entries. Accordingly, the scan bit is not required to propagate through every entry, and scans can be performed quickly.
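The behavior of the scan bit described above can be modeled serially in a few lines. The sketch below is a behavioral model only: it walks every entry in order and omits the look-ahead group terms, and the function name and the use of strings for operation types are assumptions made for illustration.

```python
def scan_select(entries, first_type, second_type):
    """Return the index of the first entry of first_type that follows an
    entry of second_type, or None if there is no such entry."""
    scan_bit = False  # set once an entry of second_type has generated it
    only = False      # once asserted, further generation is inhibited
    for index, op_type in enumerate(entries):
        # Kill: an entry of the first type that receives the scan bit
        # stops it and identifies itself as the selected entry.
        if scan_bit and op_type == first_type:
            return index
        # Generate: the first entry of the second type starts the scan bit.
        if not only and op_type == second_type:
            scan_bit = True
            only = True
        # Otherwise the entry simply propagates the incoming scan bit.
    return None
```

When both types are the same, the first matching entry generates the scan bit and the next matching entry kills it, so the function returns the second operation of that type, as the abstract describes.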
Abstract:
A processor which includes tags indicating memory addresses for instructions advancing through pipeline stages of the processor and which includes an instruction decoder having a store target address buffer allows a self-modifying code handling system to detect store operations writing into the instruction stream and trigger a self-modifying code fault. In one embodiment of a self-modifying code handling system, a store pipe is coupled to a data cache to commit results of a store operation to a memory subsystem. The store pipe supplies a store operation target address indication on commitment of a store operation result. A scheduler includes ordered Op entries for Ops decoded from instructions and includes corresponding first address tags covering memory addresses for the instructions. First comparison logic is coupled to the store pipe and to the first address tags to trigger self-modifying code fault handling means in response to a match between the store operation target address and one of the first address tags. An instruction decoder is coupled between the instruction cache and the scheduler. The instruction decoder includes instruction buffer entries and second address tags associated with the instruction buffer entries. Second comparison logic is coupled to the store pipe and to the second address tags to trigger the self-modifying code fault handling means in response to a match between the store operation target address and one of the second address tags.
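The two tag comparisons described above amount to checking a committed store address against two sets of address tags. A minimal sketch follows, assuming a 32-byte tag granularity and set-based tag storage; neither is stated in the abstract, and the function names are hypothetical.

```python
LINE_BYTES = 32  # assumed tag granularity; the abstract does not specify one


def tag_of(addr):
    """Map a byte address to the tag covering it (assumed line-granular)."""
    return addr // LINE_BYTES


def store_hits_instruction_stream(store_addr, scheduler_tags, decoder_tags):
    """Return True if a committed store targets memory covered by any
    in-flight instruction, which would trigger the fault handling."""
    tag = tag_of(store_addr)
    first_match = tag in scheduler_tags   # first comparison logic (scheduler Op entries)
    second_match = tag in decoder_tags    # second comparison logic (instruction buffer)
    return first_match or second_match
```

A store to 0x101F falls in the same assumed 32-byte line as an instruction tagged at 0x1000 and would be flagged, whereas a store to 0x1020 would not.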
Abstract:
A superscalar microprocessor includes a scheduler which contains storage for information related to operations and scan logic for selecting operations for out-of-order execution by a set of execution units. To provide fast operation, the selection is made without regard for the availability of operands which are required for execution of the operation but may be unavailable pending completion of an operation. An operand forward stage, which follows the issue stage, selects sources for an operand which may be a register file or a sourcing operation in the scheduler, completed or not. The scheduler contains all information describing the sourcing operations and forwards an operand value and information indicating the state of a sourcing operation. The state information indicates whether the sourcing operation is complete and execution of the issued operation can continue. The state also indicates a wait until the sourcing operation will complete. If the wait is too long, the issued operation is bumped so that another operation can be executed. This reduces pipeline hold ups and increases execution unit utilization.
Abstract:
Accordingly, a prefetch instruction mechanism is desired for implementing a prefetch instruction which is non-faulting, non-blocking, and non-modifying of architectural register state. Advantageously, a prefetch mechanism described herein is provided largely without the addition of substantial complexity to a load execution unit. In one embodiment, the non-faulting attribute of the prefetch mechanism is provided through use of the vector decode supplied Op sequence that activates an alternate exception handler. The non-modifying of architectural register state attribute is provided (in an exemplary embodiment) by first decoding a PREFETCH instruction to an Op sequence targeting a scratch register wherein the scratch register has scope limited to the Op sequence corresponding to the PREFETCH instruction. Although described in the context of a vector decode embodiment, the prefetch mechanism can be implemented with hardware decoders and suitable modifications to decode paths will be appreciated by those of skill in the art based on the description herein. Similarly, although in one particular embodiment such a scratch register is architecturally defined to read as a NULL (or zero) value, any target for the Op sequence that is not part of the architectural state of the processor would also be suitable. Finally, in one embodiment the non-blocking attribute is provided by the Op sequence completing (without waiting for return of fill data) upon posting of a cache fill request to load logic of a data cache. In this way, LdOps which follow in a load pipe are not stalled by a prefetch-related miss and can instead execute concurrently with the prefetch-related line fill.
Abstract:
An even bus clock circuit generates logic pulses in response to substantially coincident rising edges of a processor clock and a bus clock over a given range of processor clock to bus clock ratios that includes whole integers and half integers. The even bus clock circuit includes a delay element for receiving the bus clock and generating a delayed bus clock, a first flip-flop for receiving the processor clock at a data input and receiving the delayed bus clock at a clock input, and a second flip-flop for receiving a data output of the first flip-flop at a data input, receiving the processor clock at a clock input and generating a data output that is coupled to an asynchronous reset input of the first flip-flop. The logic pulses are generated at the data output of the first flip-flop and have a pulse width of substantially the same duration as a single cycle of the processor clock.
Abstract:
In a data processing system, a circuit for providing an even bus clock signal, EVENBCLK, when the leading edges of the bus clock signal BCLK and a processor clock signal PCLK are coincident includes a phase-locked loop unit and a coincidence unit. The phase-locked loop unit provides PCLK signals that have a frequency N times the frequency of the BCLK signals, where N can have an integer or a half integer value. The phase-locked loop unit includes a divide-by-M unit, where M=2N, that receives the PCLK signal at an input terminal and applies an output signal, PCLK/M, to the phase detector unit of the phase-locked loop unit. The operation of the phase-locked loop results in the BCLK signal and the PCLK/M signal having an established phase relationship. The PCLK signal and the PCLK/M signal are applied to the coincidence unit, the simultaneous application of the two signals resulting in the coincidence unit providing the EVENBCLK signals. When N is an integer, the PCLK signal and the BCLK signal have coincident rising edges that do not coincide with a leading edge of a PCLK/M signal. In this situation, a delayed signal, triggered by a previous PCLK/M signal, is generated that is applied to the coincidence unit in place of the missing PCLK/M signal to provide the EVENBCLK signal.
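The edge arithmetic behind the integer-N case can be checked with a short calculation. The sketch below assumes ideal clocks whose rising edges are all aligned at time 0, and it measures time in PCLK periods; the function name is hypothetical and the code models only edge timing, not the circuit itself.

```python
from fractions import Fraction


def missing_divider_edges(n, span):
    """Return times t in [0, span), in PCLK periods, at which PCLK and BCLK
    rise together but PCLK/M (with M = 2N) has no rising edge."""
    n = Fraction(n)
    m = 2 * n
    if n <= 0 or m.denominator != 1:
        raise ValueError("N must be a positive integer or half integer")
    m = int(m)
    # PCLK/M rises once every M PCLK periods.
    divider_edges = set(range(0, span, m))
    # PCLK rises at every integer t; BCLK rises at multiples of N.
    coincident = [t for t in range(span) if (Fraction(t) / n).denominator == 1]
    return [t for t in coincident if t not in divider_edges]
```

With N = 3 (so M = 6) the coincident edges at t = 3 and t = 9 have no matching PCLK/M edge, which is the case the delayed signal covers. With N = 5/2 (so M = 5) every coincident edge already lines up with a PCLK/M edge and the list is empty.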
Abstract:
A superscalar processor includes a central scheduler for multiple execution units. The scheduler presumes operations issued to a particular execution unit all have the same latency, e.g., one clock cycle, even though some of the operations have longer latencies, e.g., two clock cycles. The execution unit that executes the operations having longer-than-expected latencies includes scheduling circuitry that holds up particular operation pipelines when operands required for the pipelines will not be valid when the scheduler presumes. Accordingly, the design of the scheduler can be simplified and can accommodate longer latency operations without being significantly redesigned for the longer latency operations.
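A toy timing model may help show how a local hold-up compensates for the scheduler's single-cycle presumption. Everything here is an assumption made for illustration: one operation is issued per cycle in program order to a single pipelined unit, and operations are plain tuples of name, actual latency, and source names.

```python
def simulate(ops):
    """ops is a list of (name, actual_latency, source_names) tuples.

    Returns (ready_at, holds): the cycle at which each result becomes
    valid, and the total number of cycles the execution unit held up.
    """
    ready_at = {}
    start = 0   # cycle at which the scheduler presumes the next op can start
    holds = 0
    for name, latency, sources in ops:
        # The execution unit checks when the operands are really valid.
        operands_valid = max((ready_at[s] for s in sources), default=0)
        actual_start = max(start, operands_valid)
        holds += actual_start - start  # cycles spent holding the pipeline
        ready_at[name] = actual_start + latency
        start = actual_start + 1       # pipelined: next op one cycle later
    return ready_at, holds
```

In this model a dependent operation following a two-cycle producer is held for one cycle, while a chain of one-cycle operations runs with no holds, so the scheduler itself never needs to know the longer latency.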
Abstract:
Scan logic which tracks the relative age of stores with respect to a particular load (or of loads with respect to a particular store) allows a processor to hold younger stores until the completion of older loads (or to hold younger loads until completion of older stores). Embodiments of propagate-kill style lookahead scan logic or of tree-structured, hierarchically-organized scan logic constructed in accordance with the present invention provide store older and load older indications with very few gate delays, even in processor embodiments adapted to concurrently evaluate large numbers of operations. Operating in conjunction with the scan logic, address matching logic allows the processor to more precisely tailor its avoidance of load-store (or store-load) dependencies. In a processor having a load unit and a store unit, a load/store execution control system allows load and store instructions to execute generally out-of-order with respect to each other while enforcing data dependencies between the load and store instructions.
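The combination of an age scan and address matching can be sketched for the load case as below. The entry layout (dictionaries ordered oldest first, with "kind", "done", and "addr" fields) is an assumption, as is the conservative choice to treat a store with a not-yet-known address as a possible match; the hardware scan structures themselves are not modeled.

```python
def load_must_wait(entries, load_index):
    """Return True if the load at load_index should be held because an
    older, incomplete store may write the address it reads.

    entries are ordered oldest first; a store whose "addr" is None has
    not computed its address yet and is assumed to conflict.
    """
    load = entries[load_index]
    for older in entries[:load_index]:  # only older entries are scanned
        if older["kind"] != "store" or older["done"]:
            continue
        if older["addr"] is None or older["addr"] == load["addr"]:
            return True
    return False
```

Without the address comparison, any older incomplete store would hold the load; with it, only stores that could actually conflict do.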