Abstract:
Branch history information characterizes the results of branch instructions previously executed by a processor. A count is stored of the number of consecutive branch instructions previously executed by the processor whose results all indicate a not taken branch. In a first pipeline stage, a predicted branch result is provided based on at least a portion of the branch history information, and one or more of the branch history information and the count is updated based on the predicted branch result. In a second pipeline stage, an actual branch result is provided based on an executed branch instruction, and the branch history information is updated based on the actual branch result. If the predicted branch result indicates a taken branch, the branch history information is updated based on the count; if the predicted branch result indicates a not taken branch, the count is updated but the branch history information is not.
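A minimal sketch of how such a scheme might behave, assuming the pending not-taken run length is folded into the history as a small saturating field when a taken branch is predicted; the structure names, field widths, and encoding are illustrative assumptions, not the claimed implementation:

```c
#include <stdint.h>
#include <stdbool.h>

/* Structure names, field widths, and the folding encoding below are
 * illustrative assumptions, not the claimed implementation. */
typedef struct {
    uint64_t history;        /* branch history bits, newest in the LSBs    */
    unsigned not_taken_run;  /* consecutive not-taken branches seen so far */
} bh_state;

/* First-stage update, driven by the predicted branch result. */
void predict_update(bh_state *s, bool predicted_taken)
{
    if (predicted_taken) {
        /* Predicted taken: fold the pending not-taken run into the history
         * (here as a 3-bit saturating field), then record the taken branch
         * and clear the count. */
        unsigned run = s->not_taken_run > 7 ? 7 : s->not_taken_run;
        s->history = (s->history << 3) | run;
        s->history = (s->history << 1) | 1u;
        s->not_taken_run = 0;
    } else {
        /* Predicted not taken: only the count changes, not the history. */
        s->not_taken_run++;
    }
}

/* Second-stage update, driven by the actual result of the executed branch. */
void execute_update(bh_state *s, bool actual_taken)
{
    s->history = (s->history << 1) | (actual_taken ? 1u : 0u);
}
```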
Abstract:
A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
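A minimal sketch of the behavior described above, treating the load-op as a single operation handled entirely inside the load unit; the operation set, the use of an immediate second operand, and all names are assumptions for illustration:

```c
#include <stdint.h>

/* The operation set, the immediate second operand, and the structure
 * names here are illustrative assumptions. */
typedef enum { LOP_ADD, LOP_AND, LOP_XOR } load_op;

typedef struct {
    uint32_t regs[16];     /* architectural register set  */
    uint8_t  dcache[1024]; /* stand-in for the data cache */
} core_state;

/* The load unit reads the source operand from the cache and its own ALU
 * performs the operation, instead of forwarding the loaded value to
 * another execution unit; the result is output for retirement to the
 * destination register. */
void load_unit_exec(core_state *c, uint32_t addr, load_op op,
                    uint32_t imm, unsigned dest_reg)
{
    uint32_t operand = 0;
    for (int i = 0; i < 4; i++)             /* little-endian 32-bit load */
        operand |= (uint32_t)c->dcache[addr + i] << (8 * i);

    uint32_t result;
    switch (op) {
    case LOP_ADD: result = operand + imm; break;
    case LOP_AND: result = operand & imm; break;
    default:      result = operand ^ imm; break;
    }

    c->regs[dest_reg] = result;
}
```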
Abstract:
A data processing apparatus and method for performing speculative vector access operations are provided. The data processing apparatus has a reconfigurable buffer accessible to vector data access circuitry and comprising a storage array for storing up to M vectors of N vector elements. The vector data access circuitry performs speculative data write operations in order to cause vector elements from selected vector operands in a vector register bank to be stored into the reconfigurable buffer. On occurrence of a commit condition, the vector elements currently stored in the reconfigurable buffer are then written to a data store. Speculation control circuitry maintains a speculation width indication indicating the number of vector elements of each selected vector operand stored in the reconfigurable buffer. The speculation width indication is initialized to an initial value, but on detection of an overflow condition within the reconfigurable buffer the speculation width indication is modified to reduce the number of vector elements of each selected vector operand stored in the reconfigurable buffer. The reconfigurable buffer then responds to a change in the speculation width indication by reconfiguring the storage array to increase the number of vectors M and reduce the number of vector elements N per vector. This provides an efficient mechanism for supporting performance of speculative data write operations.
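A minimal sketch of the reconfiguration policy, assuming a fixed-capacity storage array whose speculation width is halved on overflow and whose contents are replayed at the new width; the capacity, halving policy, and all names are assumptions:

```c
#include <string.h>

/* Illustrative sketch only: the fixed total capacity, the halving policy
 * on overflow, the replay assumption, and all names are assumptions. */
#define BUF_CAPACITY 64       /* total element slots in the storage array */

typedef struct {
    int      elems[BUF_CAPACITY];
    unsigned M;               /* number of vectors the array can hold     */
    unsigned N;               /* speculation width: elements per vector   */
    unsigned used;            /* vectors speculatively written so far     */
} spec_buffer;

void buffer_init(spec_buffer *b, unsigned initial_width)
{
    b->N = initial_width;     /* speculation width starts at its initial value */
    b->M = BUF_CAPACITY / b->N;
    b->used = 0;
}

/* Overflow: reduce the speculation width and reconfigure the same storage
 * array to hold more, shorter vectors. */
void buffer_on_overflow(spec_buffer *b)
{
    if (b->N > 1) {
        b->N /= 2;
        b->M = BUF_CAPACITY / b->N;
    }
    b->used = 0;              /* assumption: speculative writes replay at the new width */
}

/* Speculative write of one N-element vector; returns 0 when the array is
 * full, after which the caller replays at the reduced width. */
int buffer_spec_write(spec_buffer *b, const int *vec_elems)
{
    if (b->used >= b->M) {
        buffer_on_overflow(b);
        return 0;
    }
    memcpy(&b->elems[b->used * b->N], vec_elems, b->N * sizeof(int));
    b->used++;
    return 1;
}
```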
Abstract:
A method includes, in a processor that processes instructions of program code, processing a first segment of the instructions. One or more destination registers are identified in the first segment using an approximate specification of register access by the instructions. Respective values of the destination registers are made available to a second segment of the instructions only upon verifying that the values are valid for readout by the second segment in accordance with the approximate specification. The second segment is processed at least partially in parallel with processing of the first segment, using the values made available from the first segment.
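A minimal sketch of the handoff this describes, assuming a per-register valid flag published by the first segment and checked by the second segment before readout; the flag-based mechanism and all names are assumptions, not the claimed method:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch: the first segment publishes each destination
 * register named by the approximate specification once its value is
 * valid for readout; the second segment reads a value only after that
 * check succeeds. The flag-based handoff and all names are assumptions. */
#define NUM_REGS 32

typedef struct {
    uint64_t value[NUM_REGS];
    bool     valid_for_readout[NUM_REGS];
} segment_handoff;

/* First segment: write a destination register and mark it available. */
void first_segment_write(segment_handoff *h, unsigned reg, uint64_t v)
{
    h->value[reg] = v;
    h->valid_for_readout[reg] = true;
}

/* Second segment: returns true and yields the value only once it has been
 * verified as valid for readout. */
bool second_segment_read(const segment_handoff *h, unsigned reg, uint64_t *out)
{
    if (!h->valid_for_readout[reg])
        return false;          /* not yet available: stall or retry later */
    *out = h->value[reg];
    return true;
}
```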
Abstract:
One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. The fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to a probability function. If the current instruction L1 cache line is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.
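A minimal sketch of the target-selection step, assuming particular line sizes, a simple sector geometry, and rand() as the pseudorandom source; these, and the handling of non-first sectors, are assumptions rather than the patented scheme:

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sketch of selecting a prefetch target; the line sizes,
 * sector geometry, and the use of rand() as the pseudorandom source are
 * assumptions. */
#define L1_LINE_BYTES    128
#define L15_LINE_BYTES   512
#define SECTORS_PER_L15  (L15_LINE_BYTES / L1_LINE_BYTES)

/* Returns the address of the L1-line-sized sector to prefetch for the
 * instruction fetch at 'pc'. */
uint64_t select_prefetch_target(uint64_t pc)
{
    uint64_t l15_base = pc & ~(uint64_t)(L15_LINE_BYTES - 1);
    unsigned sector   = (unsigned)((pc & (L15_LINE_BYTES - 1)) / L1_LINE_BYTES);

    if (sector == 0) {
        /* Current line is in the first sector of its L1.5 line: target a
         * pseudorandomly chosen sector of the next L1.5 line. */
        unsigned target = (unsigned)rand() % SECTORS_PER_L15;
        return l15_base + L15_LINE_BYTES + (uint64_t)target * L1_LINE_BYTES;
    }

    /* Other sectors are not spelled out in the abstract; as a placeholder,
     * prefetch the next sequential sector (wrapping into the next L1.5
     * line from the last sector). */
    if (sector + 1 == SECTORS_PER_L15)
        return l15_base + L15_LINE_BYTES;
    return l15_base + (uint64_t)(sector + 1) * L1_LINE_BYTES;
}
```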
Abstract:
A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand. The control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation. The processing circuitry is configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements. The partitioned scan operation approach of the present invention enables a balance to be achieved between energy consumption and performance.
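A minimal sketch of a partitioned inclusive scan, using addition as the scan operator; the operator choice, the group count handling, and the function names are assumptions for the example:

```c
/* Illustrative sketch of a partitioned inclusive scan, using addition as
 * the scan operator (an assumption for the example); names and the even
 * group split are also assumptions. */
void partitioned_scan(const int *in, int *out, int n, int p)
{
    int group_len = n / p;                    /* assume p divides n evenly */

    /* 1. Separate scan within each group of adjacent elements, producing
     *    intermediate results. */
    for (int g = 0; g < p; g++) {
        int acc = 0;
        for (int i = 0; i < group_len; i++) {
            acc += in[g * group_len + i];
            out[g * group_len + i] = acc;
        }
    }

    /* 2. Combination: add the cumulative total of all preceding groups to
     *    each element, yielding the same result as a monolithic scan. */
    for (int g = 1; g < p; g++) {
        int offset = out[g * group_len - 1];  /* total through group g-1 */
        for (int i = 0; i < group_len; i++)
            out[g * group_len + i] += offset;
    }
}
```

For example, with N = 8, P = 2, and an input of all ones, each group scans to {1, 2, 3, 4} and the combination step turns the second group into {5, 6, 7, 8}, matching the result of a single monolithic scan.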
Abstract:
A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.
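A minimal sketch of the mode-dependent stage selection, modeling the pipeline stages as functions and the operation set as a stand-in computation; the operation, the mode encoding, and all names are assumptions:

```c
#include <stdbool.h>

/* Illustrative sketch: a mode selection determines whether a given set of
 * computational operations is performed in the first or the second
 * pipeline stage. The operation itself and all names are assumptions. */
typedef enum { MODE_FIRST_STAGE, MODE_SECOND_STAGE } pipe_mode;

typedef struct {
    int  operand;
    int  result;
    bool result_ready;
} pipe_regs;

static void do_operations(pipe_regs *r)
{
    r->result = r->operand * 2 + 1;     /* stand-in for the operation set */
    r->result_ready = true;
}

/* First pipeline stage: performs the operations only in the first mode. */
void stage1(pipe_regs *r, pipe_mode mode)
{
    if (mode == MODE_FIRST_STAGE)
        do_operations(r);
}

/* Second pipeline stage: performs the operations only in the second mode. */
void stage2(pipe_regs *r, pipe_mode mode)
{
    if (mode == MODE_SECOND_STAGE)
        do_operations(r);
}
```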
Abstract:
Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter, whether to schedule subsequent instructions for issuance to the division unit.
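A minimal sketch of the scheduler-side bookkeeping, assuming the counter simply gates issue against the number of divide engines; the engine count and all names are assumptions:

```c
#include <stdbool.h>

/* Illustrative sketch: a counter tracks how many instructions are
 * currently being operated on by the division unit and gates further
 * issue. The engine count and names are assumptions. */
#define NUM_DIVIDE_ENGINES 2

typedef struct {
    unsigned div_in_flight;   /* instructions currently in the division unit */
} scheduler_state;

/* Another divide/square-root instruction may be issued only while an
 * engine is free. */
bool can_issue_divide(const scheduler_state *s)
{
    return s->div_in_flight < NUM_DIVIDE_ENGINES;
}

void on_divide_issue(scheduler_state *s)    { s->div_in_flight++; }
void on_divide_complete(scheduler_state *s) { s->div_in_flight--; }
```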