Abstract:
Systems and methods are disclosed for implementing certain load instructions, such as vector load instructions, by cooperation of a main processor and a coprocessor. Load instructions that are identified by the main processor for offloading to the coprocessor are committed in the main processor without receiving the corresponding load data. Post-commit, the load instructions are processed in the coprocessor, such that latencies incurred in fetching the load data are hidden from the main processor. By implementing an out-of-order load data buffer associated with an in-order instruction buffer, the coprocessor is also configured to avoid stalls due to the long latencies that may be involved in fetching the load data from levels of the memory hierarchy, such as L2, L3, and L4 caches, main memory, etc.
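The following C sketch is a minimal software model, not the claimed hardware: it illustrates how an in-order instruction buffer paired with an out-of-order load data buffer lets data that returns early for younger loads be captured while completion still proceeds in program order. The buffer size, structure names, and functions are illustrative assumptions.

/* Software model: in-order instruction buffer whose entries are filled
 * out of order by returning load data. Sizes and names are assumed. */
#include <stdbool.h>
#include <stdio.h>

#define BUF_SLOTS 8

typedef struct {
    int  instr_id;    /* offloaded load, already committed in the main processor */
    bool data_ready;  /* set when the load data buffer receives this slot's data */
} slot_t;

static slot_t instr_buf[BUF_SLOTS]; /* in-order instruction buffer */
static int head = 0, tail = 0, count = 0;

/* Enqueue a post-commit load in program order. */
static bool enqueue_load(int instr_id) {
    if (count == BUF_SLOTS) return false;            /* buffer full; caller retries later */
    instr_buf[tail] = (slot_t){ instr_id, false };
    tail = (tail + 1) % BUF_SLOTS;
    count++;
    return true;
}

/* Load data may return from L2/L3/L4 or main memory in any order. */
static void data_returned(int instr_id) {
    for (int i = 0, idx = head; i < count; i++, idx = (idx + 1) % BUF_SLOTS)
        if (instr_buf[idx].instr_id == instr_id)
            instr_buf[idx].data_ready = true;
}

/* Complete loads strictly in order, but only once their data has arrived,
 * so a long-latency miss never blocks data capture for younger loads. */
static void drain_in_order(void) {
    while (count > 0 && instr_buf[head].data_ready) {
        printf("completing load %d\n", instr_buf[head].instr_id);
        head = (head + 1) % BUF_SLOTS;
        count--;
    }
}

int main(void) {
    enqueue_load(1); enqueue_load(2); enqueue_load(3);
    data_returned(2);        /* a younger load's data arrives first */
    drain_in_order();        /* nothing completes yet: load 1 is still outstanding */
    data_returned(1); data_returned(3);
    drain_in_order();        /* now loads 1, 2, 3 complete in program order */
    return 0;
}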
Abstract:
Dynamic load balancing of hardware threads in clustered processor cores using shared hardware resources, and related circuits, methods, and computer readable media are disclosed. In one aspect, a dynamic load balancing circuit comprising a control unit is provided. The control unit is configured to determine whether a suboptimal load condition exists between a first cluster and a second cluster of a clustered processor core. If a suboptimal load condition exists, the control unit is further configured to transfer the contents of private register(s) of a first hardware thread of the first cluster to private register(s) of a second hardware thread of the second cluster via shared hardware resources of the first hardware thread and the second hardware thread. The control unit is also configured to exchange a first identifier associated with the first hardware thread with a second identifier associated with the second hardware thread via the shared hardware resources.
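As a rough software model of the balancing step, the C sketch below detects an imbalance, moves one thread's private register contents through a shared buffer into the other thread's registers, and exchanges the two thread identifiers. The load metric, threshold, register count, and the representation of the shared resource are assumptions, not taken from the abstract.

/* Software model of the described load-balancing step; all values are assumed. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_PRIVATE_REGS 4

typedef struct {
    int      thread_id;                      /* identifier exchanged between threads */
    uint64_t private_regs[NUM_PRIVATE_REGS]; /* per-thread private register file */
} hw_thread_t;

typedef struct {
    hw_thread_t thread;
    int         ready_instructions;          /* simple stand-in for cluster load */
} cluster_t;

/* Assumed detection rule: the load is "suboptimal" when the clusters'
 * ready-instruction counts differ by more than a threshold. */
static int suboptimal_load(const cluster_t *a, const cluster_t *b, int threshold) {
    int diff = a->ready_instructions - b->ready_instructions;
    return diff > threshold || diff < -threshold;
}

/* Transfer register contents and exchange identifiers; the intermediate buffer
 * stands in for the shared hardware resources used by the control unit. */
static void balance(cluster_t *a, cluster_t *b) {
    uint64_t shared_buf[NUM_PRIVATE_REGS];

    memcpy(shared_buf, a->thread.private_regs, sizeof(shared_buf));
    memcpy(a->thread.private_regs, b->thread.private_regs, sizeof(shared_buf));
    memcpy(b->thread.private_regs, shared_buf, sizeof(shared_buf));

    int tmp_id = a->thread.thread_id;        /* exchange thread identifiers */
    a->thread.thread_id = b->thread.thread_id;
    b->thread.thread_id = tmp_id;
}

int main(void) {
    cluster_t c0 = { { 0, {1, 2, 3, 4} }, 40 };
    cluster_t c1 = { { 1, {9, 9, 9, 9} },  5 };

    if (suboptimal_load(&c0, &c1, 16))
        balance(&c0, &c1);

    printf("cluster0 now runs thread %d, cluster1 runs thread %d\n",
           c0.thread.thread_id, c1.thread.thread_id);
    return 0;
}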
Abstract:
Methods, systems, and devices for techniques for instruction perturbation for improved device security are described. A device may assign a set of executable instructions to an instruction packet based on a parameter associated with the instruction packet, and each executable instruction of the set of executable instructions may be independent of the other executable instructions of the set. The device may select an order of the set of executable instructions based on a slot instruction rule associated with the device, and each executable instruction of the set of executable instructions may correspond to a respective slot associated with memory of the device. The device may modify the order of the set of executable instructions in a memory hierarchy after pre-decode based on the slot instruction rule and process the set of executable instructions of the instruction packet based on the modified order.
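A minimal C sketch of the reordering idea follows: the packet's independent instructions are shuffled, and only orders in which every instruction lands in a slot permitted by its slot rule are accepted. The slot masks, mnemonics, and retry-based search are illustrative assumptions, not drawn from the source.

/* Software model: reorder independent instructions within a packet subject to
 * an assumed per-instruction slot rule. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PACKET_SLOTS 4

typedef struct {
    const char *text;          /* mnemonic, for display only */
    unsigned    allowed_slots; /* bitmask: bit i set => may occupy slot i */
} instr_t;

/* Check that every instruction sits in a slot its rule permits. */
static int order_is_legal(const instr_t *packet) {
    for (int slot = 0; slot < PACKET_SLOTS; slot++)
        if (!(packet[slot].allowed_slots & (1u << slot)))
            return 0;
    return 1;
}

/* Randomly reorder the packet (Fisher-Yates) until a slot-legal order is found.
 * The instructions are independent, so any legal order executes identically. */
static void perturb_packet(instr_t *packet) {
    do {
        for (int i = PACKET_SLOTS - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            instr_t tmp = packet[i];
            packet[i] = packet[j];
            packet[j] = tmp;
        }
    } while (!order_is_legal(packet));
}

int main(void) {
    srand((unsigned)time(NULL));
    instr_t packet[PACKET_SLOTS] = {
        { "add  r1, r2, r3", 0xF },  /* ALU op: any slot */
        { "mul  r4, r5, r6", 0x3 },  /* multiply: slots 0-1 only */
        { "ld   r7, [r8]",   0xC },  /* load: slots 2-3 only */
        { "xor  r9, r9, r1", 0xF },
    };
    perturb_packet(packet);
    for (int slot = 0; slot < PACKET_SLOTS; slot++)
        printf("slot %d: %s\n", slot, packet[slot].text);
    return 0;
}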
Abstract:
An apparatus includes an access mode selection circuit configured to select a cache access mode based on a number of instructions stored at an issue queue, a number of active threads of an execution unit coupled to a cache, or both. The access mode selection circuit is further configured to generate an access mode signal based on the selected cache access mode. The apparatus further includes an address generation circuit configured to perform a cache access based on the access mode signal.
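The C sketch below models one plausible reading of this selection logic, assuming two access modes: a parallel tag-and-data lookup chosen when the issue queue is shallow and few threads are active, and a serial lookup otherwise. The mode names, thresholds, and policy are assumptions rather than anything stated in the abstract.

/* Software model of an assumed access-mode policy; thresholds are illustrative. */
#include <stdio.h>

typedef enum {
    ACCESS_PARALLEL, /* tag and data arrays read together: faster, more power */
    ACCESS_SERIAL    /* tag array first, then only the hitting data way: slower, less power */
} access_mode_t;

/* Access mode selection: a lightly loaded core favors latency, a heavily
 * loaded core favors power (assumed policy). */
static access_mode_t select_access_mode(int issue_queue_entries, int active_threads) {
    const int queue_threshold  = 8;
    const int thread_threshold = 2;
    if (issue_queue_entries < queue_threshold && active_threads <= thread_threshold)
        return ACCESS_PARALLEL;
    return ACCESS_SERIAL;
}

/* Stand-in for the address generation circuit: perform the access in the
 * mode indicated by the access mode signal. */
static void cache_access(unsigned address, access_mode_t mode) {
    if (mode == ACCESS_PARALLEL)
        printf("0x%08x: parallel tag+data lookup\n", address);
    else
        printf("0x%08x: serial tag-then-data lookup\n", address);
}

int main(void) {
    access_mode_t mode = select_access_mode(3, 1);   /* few instructions, one active thread */
    cache_access(0x1000u, mode);

    mode = select_access_mode(24, 4);                /* busy, multithreaded core */
    cache_access(0x2000u, mode);
    return 0;
}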
Abstract:
Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
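As a software-level illustration, the C sketch below issues a four-instruction VLIW packet over two execution paths across multiple cycles, allocating a fresh physical register for each destination through a simple rename table. The packet width, execution-path count, register counts, and rename policy are assumed for the example, not specified by the source.

/* Software model: wide VLIW packet distributed over fewer execution paths,
 * with a simple rename table. All sizes are assumed. */
#include <stdio.h>

#define PACKET_WIDTH   4   /* instructions per VLIW packet (the "first number") */
#define EXEC_PATHS     2   /* instruction execution paths (the smaller "second number") */
#define ARCH_REGS      8
#define PHYS_REGS     16

typedef struct { int dest_arch; const char *text; } instr_t;

static int rename_table[ARCH_REGS];  /* architectural -> physical register map */
static int next_phys = ARCH_REGS;    /* next free physical register (simple wrap-around allocator) */

/* Give each destination a fresh physical register so the results of a packet
 * split across cycles are stored without clobbering earlier mappings. */
static int rename_dest(int arch_reg) {
    int phys = next_phys;
    next_phys = (next_phys + 1) % PHYS_REGS;
    rename_table[arch_reg] = phys;
    return phys;
}

/* Distribute one wide packet across the narrower set of execution paths. */
static void issue_packet(const instr_t *packet) {
    for (int base = 0; base < PACKET_WIDTH; base += EXEC_PATHS)
        for (int path = 0; path < EXEC_PATHS && base + path < PACKET_WIDTH; path++) {
            const instr_t *in = &packet[base + path];
            printf("cycle %d, path %d: %-16s writes p%d\n",
                   base / EXEC_PATHS, path, in->text, rename_dest(in->dest_arch));
        }
}

int main(void) {
    for (int r = 0; r < ARCH_REGS; r++) rename_table[r] = r;  /* identity map at reset */
    instr_t packet[PACKET_WIDTH] = {
        { 1, "add r1, r2, r3" },
        { 4, "mul r4, r5, r6" },
        { 7, "sub r7, r2, r3" },
        { 5, "ld  r5, [r6]"   },
    };
    issue_packet(packet);
    printf("rename map:");
    for (int r = 0; r < ARCH_REGS; r++)
        printf(" r%d->p%d", r, rename_table[r]);
    printf("\n");
    return 0;
}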
Abstract:
A programmable logic circuit, such as a finite state machine, is provided that is configured to determine a memory array power-up sequence from a configuration signal and to successively enable each memory array. A delay circuit triggers an initial memory bank in each enabled memory array to power up without a delay. The delay circuit then counts, responsive to a clock, to determine a delay between successive triggerings of the remaining memory banks in each enabled memory array as they power up.
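The C sketch below models the sequencing in software, assuming four arrays of four banks and a fixed clock-count delay, none of which are specified in the abstract: the configuration word selects which arrays to enable, the first bank of each enabled array powers up immediately, and the remaining banks are spaced by the counted delay.

/* Software model of the power-up sequencing; array/bank counts and the delay
 * are assumed values. */
#include <stdio.h>

#define NUM_ARRAYS      4
#define BANKS_PER_ARRAY 4
#define BANK_DELAY_CLKS 3   /* clocks counted between successive bank power-ups */

static unsigned long clk = 0;   /* free-running clock count */

static void power_up_bank(int array, int bank) {
    printf("clk %3lu: array %d bank %d powered up\n", clk, array, bank);
}

/* Model of the state machine: enable arrays successively per the configuration
 * signal, staggering banks within each array to limit inrush current. */
static void power_up_sequence(unsigned config) {
    for (int array = 0; array < NUM_ARRAYS; array++) {
        if (!(config & (1u << array)))
            continue;                        /* array not enabled by the configuration */
        power_up_bank(array, 0);             /* initial bank: no delay */
        for (int bank = 1; bank < BANKS_PER_ARRAY; bank++) {
            clk += BANK_DELAY_CLKS;          /* delay circuit counts the clock */
            power_up_bank(array, bank);
        }
    }
}

int main(void) {
    power_up_sequence(0x5);   /* enable arrays 0 and 2 only */
    return 0;
}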
Abstract:
A processor includes priority adjustment circuitry configured to adjust a priority of a thread, of multiple threads configured to execute tasks, to have either a software-defined priority value or a designated high priority value. The processor also includes circuitry configured to identify a lowest priority thread of the multiple threads and a control unit configured to cause the lowest priority thread to take a pending interrupt.
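A brief C sketch of the interrupt-steering policy follows, with an assumed priority encoding and thread count: priorities are either software-defined or pinned to a designated high value, and a pending interrupt is directed to whichever thread currently has the lowest priority.

/* Software model of the priority adjustment and lowest-priority selection;
 * the encoding (larger = higher priority) and thread count are assumed. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_THREADS   4
#define HIGH_PRIORITY 255   /* designated high priority value (assumed encoding) */

typedef struct {
    int id;
    int priority;     /* software-defined value, or HIGH_PRIORITY */
} thread_t;

/* Priority adjustment: pin a thread at the designated high priority (for
 * example, a latency-critical thread), or use its software-defined value. */
static void set_priority(thread_t *t, int sw_priority, bool pin_high) {
    t->priority = pin_high ? HIGH_PRIORITY : sw_priority;
}

/* Identify the lowest-priority thread; it is the one directed to take the interrupt. */
static thread_t *lowest_priority_thread(thread_t *threads, int n) {
    thread_t *lowest = &threads[0];
    for (int i = 1; i < n; i++)
        if (threads[i].priority < lowest->priority)
            lowest = &threads[i];
    return lowest;
}

int main(void) {
    thread_t threads[NUM_THREADS] = { {0, 10}, {1, 40}, {2, 10}, {3, 25} };
    set_priority(&threads[0], 10, true);        /* thread 0 pinned at the high value */

    thread_t *victim = lowest_priority_thread(threads, NUM_THREADS);
    printf("pending interrupt taken by thread %d (priority %d)\n",
           victim->id, victim->priority);
    return 0;
}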