摘要:
A system, processor, and method to predict with high accuracy and retain instruction boundaries for previously executed instructions in order to decode variable length instructions is disclosed. In at least one embodiment, a disclosed processor includes an instruction fetch unit, an instruction cache, a boundary byte predictor, and an instruction decoder. In some embodiments, the instruction fetch unit provides an instruction address and the instruction cache produces an instruction tag and instruction cache content corresponding to the instruction address. The instruction decoder, in some embodiments, includes boundary byte logic to determine an instruction boundary in the instruction cache content.
摘要:
In a processing system comprising a plurality of processing nodes coupled via a switching fabric, a method includes implementing a flow control property for a data flow in the switching fabric based on an addressing property of an address of a virtual network interface controller associated with the data flow. A switching fabric includes a plurality of ports, each port coupleable to a corresponding processing node, and switching logic coupled to the plurality of ports. The switching fabric further includes flow control logic to implement a flow control property for a data flow in the switching logic based on an addressing property of an address of a virtual network interface controller associated with the data flow.
摘要:
An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.
摘要:
An arrangement is provided for compressing microcode ROM (“uROM”) in a processor and for efficiently accessing a compressed “uROM”. A clustering-based approach may be used to effectively compress a uROM. The approach groups similar columns of microcode into different clusters and identifies unique patterns within each cluster. Only unique patterns identified in each cluster are stored in a pattern storage. Indices, which help map an address of a microcode word (“uOP”) to be fetched from a uROM to unique patterns required for the uOP, may be stored in an index storage. Typically it takes a longer time to fetch a uOP from a compressed uROM than from an uncompressed uROM. The compressed uROM may be so designed that the process of fetching a uOP (or uOPs) from a compressed uROM may be fully-pipelined to reduce the access latency.
摘要:
An apparatus and method is described herein for coupling a processor core of a first type with a co-designed core of a second type. Execution of program code on the first core is monitored and hot sections of the program code are identified. Those hot sections are optimize for execution on the co-designed core, such that upon subsequently encountering those hot sections, the optimized hot sections are executed on the co-designed core. When the co-designed core is executing optimized hot code, the first processor core may be in a low-power state to save power or executing other code in parallel. Furthermore, multiple threads of cold code may be pipelined on the first core, while multiple threads of hot code are pipeline on the co-designed core to achieve maximum performance.
摘要:
During a compressing portion, memory (20) is divided into cache line blocks (500). Each cache line block is compressed and modified by replacing address destinations of address indirection instructions with compressed address destinations. Each cache line block is modified to have a flow indirection instruction as the last instruction in each cache line. The compressed cache line blocks (500) are stored in a memory (858). During a decompression portion, a cache line (500) is accessed based on an instruction pointer (902) value. The cache line is decompressed and stored in cache. The cache tag is determined based on the instruction pointer (902) value.
摘要:
A native microprocessor (20) accesses a foreign block of computer code. An initial block scope defining translation parameters is assigned to the block (106). The block of "foreign" code is translated to "native" code (108). An optimization efficiency is calculated for the translated block (110). A rescheduling criterion is established based on the optimization efficiency (112). The block of native code is executed (114). On subsequent accesses of the block when the reschedule criterion is met (116) the block scope is redefined (118).
摘要:
In a data processing system, a plurality of prefetch elements are provided for prefetching instructions from a group of memory arrays coupled to each prefetch element. A plurality of instruction words are sequentially stored in each group of memory arrays coupled to each prefetch element. In response to a selected prefetch element receiving a prefetch token, the selected prefetch element sequentially recalls instruction words from the group of memory arrays coupled to the selected prefetch element. Thereafter, the selected prefetch element transfers the sequence of instruction words to a central processing unit at a rate of one instruction word per cycle time. In response to a forthcoming conditional branch instruction, a plurality of prefetch elements may initiate instruction fetching so that the proper instruction may be executed during the cycle time immediately following the conditional branch instruction. By coupling a group of memory banks to each prefetch element, and limiting the location of branch instructions to the last memory bank in the group of memory banks, the number of prefetch elements required to implement a data processing system having substantially similar performance to the prior art architecture is reduced. In an alternative embodiment, video memories are utilized to store instruction words, and provide such instruction words to the CPU at the rate of one instruction word per cycle time.
摘要:
In a data processing system, a tag memory is divided into a first tag memory portion and a second tag memory portion. Next, an address for recalling requested data is generated by a central processing unit. Thereafter, a first and second tag memory addresses are concurrently computed, where the first and second tag memory addresses have bits which differ in value in a selected corresponding bit location. In response to the value of the bit in the selected bit location, the first tag memory address is coupled to either the first or second tag memory portion, and, concurrently, the second tag memory address is coupled to the other tag memory portion. Next, tag data is concurrently recalled from both the first and second tag memory portions utilizing the first and second tag memory addresses. A search tag is generated in response to the memory address from the CPU. Thereafter, the search tag and the recalled tag data from the first and second tag memory portions are concurrently compared. If either comparison results in a match, a "hit" is indicated. In response to the indication of a hit, requested data is recalled from the data portion of the cache memory system utilizing the recalled tag data that matched the search tag.
摘要:
An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.