摘要:
A CPU of the RISC type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes limited to register-to-register operations and register load/store operations. Byte manipulation instructions include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches to be predicted as taken. Another embodiment uses unused bits in the standard-sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed. The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for this entry. An additional feature is the addition of a prefetch instruction which serves to move a block of data to a faster-access cache in the memory hierarchy before the data block is to be used.
摘要:
A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches (as is common for loops) to be predicted as taken. Another performance improvement makes use of unused bits in the standard-sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed. The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for this entry. An additional feature is the addition of a prefetch instruction which serves to move a block of data to a faster-access cache in the memory hierarchy before the data block is to be used.
摘要:
A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream.
摘要:
A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Performance can be speeded up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches (as is common for loops) to be predicted as taken. Another performance improvement makes use of unused bits in the standard. sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed.
摘要:
A high-performance central processing unit (CPU) of the reduced instruction set (RISC) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for this entry.
摘要:
A load/store pipeline in a computer processor for loading data to registers and storing data from the registers has a cache memory within the pipeline for storing data. The pipeline includes buffers which support multiple outstanding read request misses. Data from out of the pipeline is obtained independently of the operation of the pipeline, this data corresponding to the request misses. The cache memory can then be filled with the requested for data. The provision of a cache memory within the pipeline, and the buffers for supporting the cache memory, speed up loading operations for the computer processor.
摘要:
In a data procesing system having a kernel mode (i.e., for executing privileged instructions) and a user mode of operation, apparatus for responding to interrupt conditions includes a first register, subject to the control of the currently executing program for enabling the generation of a mode-related interrupt signal and includes a second register for indicating the presence of a pending mode-related interrupt condition and a third register for requesting a mode-related interrupt be entered in the second register. The mode of operation and the enable and pending interrupt condition registers are monitored and when the signals in the two registers have the appropriate relationship, an interrupt signal is generated to which a control program will respond. The contents of the first register can be controlled by the currently executing program which can control the enabling signal for the currently executing mode. The pending interrupt condition and the request registers may be accessed only from the privileged mode of operation.
摘要:
A load/store pipeline in a computer processor for loading data to registers and storing data from the registers has a cache memory within the pipeline for storing data. The pipeline includes buffers which support multiple outstanding read request misses. Data from out of the pipeline is obtained independently of the operation of the pipeline, this data corresponding to the request misses. The cache memory can then be filled with the data that has been requested. The provision of a cache memory within the pipeline, and the buffers for supporting the cache memory, speed up loading operations for the computer processor.
摘要:
In a data processing system employing virtual memory techniques and capable of performing a plurality of overlapping scalar and vector data processing operations, apparatus and method are provided to allow continuation of program execution after one or more vector load/store instructions, which refer to data values that are not currently in memory, receive page faults. At the occurrence of such a page fault, all instructions currently in execution are allowed to be completed, whereupon information summarizing the page fault condition is recorded in memory for use by the operating system software and a vector exception is generated. Operating system software responds to this exception, examines the fault information, causes the missing pages to be read into the main memory unit from the mass storage media, re-executes the exception producing vector instruction(s) and continues with the program execution.
摘要:
A data processing system (100) comprises a system bus (120), a plurality of devices (110, 150, 160, 170) coupled to the system bus (120), a bus monitor circuit (140), and a clock generator (130). The plurality of devices (110, 150, 160, 170) includes at least one bus master (110, 150) which is capable of performing accesses on the system bus (120). The bus monitor circuit (140) is coupled to the at least one bus master (110, 150), and has an output for providing a bus idle signal to indicate that no bus master is attempting to perform an access on the system bus (120). The clock generator (130) has an output coupled to at least one of the plurality of devices (110, 150, 160, 170) and provides a bus clock signal having a first frequency when the bus idle signal is inactive and having a second frequency lower than the first frequency when the bus idle signal is active.