Abstract:
A processor device includes a memory and a sequencer that is responsive to the memory. The sequencer supports very long instruction word (VLIW) type instructions and at least one VLIW instruction packet uses a number of operands during execution, The processor device further includes a plurality of instruction execution units responsive to the sequencer and a plurality of register files. Each of the plurality of register files includes a plurality of registers and the plurality of register files are coupled to the plurality of instruction execution units. Further, each of the plurality of register flies includes a number of data read ports and the number of data read ports of each of the plurality of register files is less than the number of operands used by the at least one VLIW instruction packet.
Abstract:
A method includes executing, at a processor, a dedicated arithmetic encoding instruction. The dedicated arithmetic encoding instruction accepts a plurality of inputs including a first range, a first offset, and a first state and produces one or more outputs based on the plurality of inputs. The method also includes storing a second state, realigning the first range to produce a second range, and realigning the first offset to produce a second offset based on the one or more outputs of the dedicated arithmetic encoding instruction.
Abstract:
A particular method includes receiving at least one translation lookaside buffer (TLB) configuration indicator. The at least one TLB configuration indicator indicates a specific number of entries to be enabled at a TLB. The method further includes modifying a number of searchable entries of the TLB in response to the at least one TLB configuration indicator.
Abstract:
An apparatus includes a processor and a guest operating system. In response to receiving a request to create a task, the guest operating system requests a hypervisor to create a virtual processor to execute the requested task. The virtual processor is schedulable on the processor.
Abstract:
Systems and methods for implementing certain load instructions, such as vector load instructions by cooperation of a main processor and a coprocessor. The load instructions which are identified by the main processor for offloading to the coprocessor are committed in the main processor without receiving corresponding load data. Post-commit, the load instructions are processed in the coprocessor, such that latencies incurred in fetching the load data are hidden from the main processor. By implementing an out-of-order load data buffer associated with an in-order instruction buffer, the coprocessor is also configured to avoid stalls due to long latencies which may be involved in fetching the load data from levels of memory hierarchy, such as L2, L3, L4 caches, main memory, etc.
Abstract:
An apparatus includes a processor and a guest operating system. In response to receiving a request to create a task, the guest operating system requests a hypervisor to create a virtual processor to execute the requested task. The virtual processor is schedulable on the processor.
Abstract:
A method includes executing, at a processor, a dedicated arithmetic encoding instruction. The dedicated arithmetic encoding instruction accepts a plurality of inputs including a first range, a first offset, and a first state and produces one or more outputs based on the plurality of inputs. The method also includes storing a second state, realigning the first range to produce a second range, and realigning the first offset to produce a second offset based on the one or more outputs of the dedicated arithmetic encoding instruction.
Abstract:
In a particular embodiment, a method includes identifying one or more way prediction characteristics of an instruction. The method also includes selectively reading, based on identification of the one or more way prediction characteristics, a table to identify an entry of the table associated with the instruction that identifies a way of a data cache. The method further includes making a prediction whether a next access of the data cache based on the instruction will access the way.
Abstract:
Parallelization of scalar operations by vector processors using data-indexed accumulators in vector register files, related circuits, methods, and computer-readable media are disclosed. In one aspect, a vector processor comprises a vector register file providing a plurality of write ports and a plurality of vector registers each providing a plurality of accumulators. The vector processor receives an input data vector. For each of the plurality of write ports, the vector processor executes vector operation(s) for accessing an input data value of the input data vector, and determining, based on the input data value, a register index for a vector register among the plurality of vector registers, and an accumulator index for an accumulator among the plurality of accumulators of the vector register. Based on the register index, a register value is retrieved from the register index, and a scalar operation is performed based on the register value and the accumulator index.
Abstract:
An apparatus includes a translation lookaside buffer (TLB). The TLB includes at least one entry that includes an entry virtual address and an entry page size indication corresponding to an entry page. The apparatus also includes input logic configured to receive an input page size indication and an input virtual address corresponding to an input page. The apparatus further includes overlap checking logic configured to determine, based at least in part on the entry page size indication and the input page size indication, whether the input page overlaps the entry page.