Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model

    Publication No.: US11188336B2

    Publication date: 2021-11-30

    Application No.: US15252323

    Filing date: 2016-08-31

    Abstract: Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model is disclosed. In one aspect, a partial replay controller is provided in a processor(s) of a central processing unit (CPU). If an instruction is detected in the instruction block associated with a potential architectural state modification, or an exception occurs during execution of instructions, the instruction block is re-executed. During re-execution of the instruction block, the partial replay controller is configured to record produced results from load/store instructions. Thus, if an exception occurs during re-execution of the instruction block, previously recorded produced results for the executed load/store instructions before the exception occurred are replayed during re-execution of the instruction block after the exception is resolved. Thus, execution of instructions leading up to side-effect operations in the instruction block can be deterministically repeated with previously produced results, without repeating the side-effects.
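The record-and-replay behavior described in this abstract can be sketched roughly as follows. This is a minimal Python illustration, not the patented design: the class `PartialReplayController`, its methods, the `run_block` helper, and the choice to log only load results are all assumptions made for the example.

```python
class PartialReplayController:
    """Hypothetical model: logs results of memory accesses so a re-executed
    block can reuse them instead of repeating side-effecting accesses."""

    def __init__(self):
        self.log = []     # results recorded in program order
        self.cursor = 0   # position in the log during the current execution

    def begin_execution(self):
        # Called at the start of every (re-)execution of the block.
        self.cursor = 0

    def load(self, memory_read, addr):
        if self.cursor < len(self.log):
            # This access already completed in an earlier attempt: replay it.
            value = self.log[self.cursor]
        else:
            # First time this access is reached: perform it and record it.
            value = memory_read(addr)
            self.log.append(value)
        self.cursor += 1
        return value


def run_block(ctrl, memory_read, addrs):
    """Execute one instruction block consisting only of loads (assumed)."""
    ctrl.begin_execution()
    return [ctrl.load(memory_read, a) for a in addrs]
```

If `memory_read` raises partway through the block, calling `run_block` again with the same controller repeats only the accesses that had not yet completed; earlier results come from the log.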

    Providing variable interpretation of usefulness indicators for memory tables in processor-based systems

    Publication No.: US10725782B2

    Publication date: 2020-07-28

    Application No.: US15701926

    Filing date: 2017-09-12

    Abstract: Providing variable interpretation of usefulness indicators for memory tables in processor-based systems is disclosed. In one aspect, a memory system comprises a memory table providing multiple memory table entries, each including a usefulness indicator. A memory controller of the memory system comprises a global polarity indicator representing how the usefulness indicator for each memory table entry is interpreted and updated by the memory controller. If the global polarity indicator is set, the memory controller interprets a value of each usefulness indicator as directly corresponding to the usefulness of the corresponding memory table entry. Conversely, if the global polarity indicator is not set, the polarity is reversed such that the memory controller interprets the usefulness indicator value as inversely corresponding to the usefulness of the corresponding memory table entry. In this manner, the interpretation and updating of usefulness indicators by the memory controller can be varied using the global polarity indicator.
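A small sketch may help make the polarity idea in this abstract concrete. It assumes a one-bit usefulness indicator per entry and a `flip_polarity` operation; the class name `MemoryTable` and all method names are hypothetical and do not come from the patent.

```python
class MemoryTable:
    """Hypothetical table whose per-entry usefulness bits are read
    through a single global polarity indicator."""

    def __init__(self, num_entries):
        self.raw = [0] * num_entries  # stored usefulness bits
        self.polarity = True          # True models "global polarity indicator set"

    def is_useful(self, index):
        # Polarity set: bit value corresponds directly to usefulness.
        # Polarity not set: bit value corresponds inversely.
        bit = bool(self.raw[index])
        return bit if self.polarity else not bit

    def mark_useful(self, index):
        self.raw[index] = 1 if self.polarity else 0

    def mark_not_useful(self, index):
        self.raw[index] = 0 if self.polarity else 1

    def flip_polarity(self):
        # Assumed operation: one write reverses the meaning of every entry's bit.
        self.polarity = not self.polarity
```

Under these assumptions, flipping the polarity changes how every stored bit is interpreted without touching the individual entries.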

    PROVIDING EFFICIENT HANDLING OF BRANCH DIVERGENCE IN VECTORIZABLE LOOPS BY VECTOR-PROCESSOR-BASED DEVICES

    Publication No.: US20200065098A1

    Publication date: 2020-02-27

    Application No.: US16107136

    Filing date: 2018-08-21

    Abstract: Providing efficient handling of branch divergence in vectorizable loops by vector-processor-based devices is disclosed. In some aspects, a vector-processor-based device provides a plurality of processing elements (PEs) coupled to a scheduler circuit comprising a clock cycle threshold and a mask register comprising a plurality of bits corresponding to a plurality of loop iterations of a vectorizable loop to be executed. The scheduler circuit initiates a first execution interval, during which loop iterations of the vectorizable loop are assigned to PEs for parallel execution. If a loop iteration's execution time exceeds the clock cycle threshold, the scheduler circuit sets a mask register bit corresponding to the loop iteration indicating that the loop iteration is incomplete, and defers its execution. After the first execution interval is complete, the scheduler circuit initiates a second execution interval, during which incomplete loop iterations indicated by the mask register are executed in parallel by the PEs.
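The two-interval scheduling described in this abstract can be modeled in a few lines. The sketch below is an assumption-laden simplification: it takes each iteration's cycle count as known up front (real hardware would observe it during execution), and the function name `schedule` and its return shape are hypothetical.

```python
def schedule(iteration_cycles, clock_cycle_threshold):
    """Split loop iterations into two execution intervals.

    iteration_cycles[i] is the (assumed known) cycle count of iteration i.
    Returns (first_interval, second_interval, mask), where mask[i] is True
    when iteration i was marked incomplete and deferred.
    """
    mask = [False] * len(iteration_cycles)  # models the mask register
    first_interval = []
    for i, cycles in enumerate(iteration_cycles):
        if cycles > clock_cycle_threshold:
            mask[i] = True                  # incomplete: defer this iteration
        else:
            first_interval.append(i)        # completes within the first interval
    # The second interval runs exactly the iterations flagged in the mask.
    second_interval = [i for i, deferred in enumerate(mask) if deferred]
    return first_interval, second_interval, mask
```

For example, with cycle counts `[3, 10, 2, 12]` and a threshold of 5, iterations 0 and 2 finish in the first interval while iterations 1 and 3 are deferred to the second.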

    PROVIDING RECONFIGURABLE FUSION OF PROCESSING ELEMENTS (PEs) IN VECTOR-PROCESSOR-BASED DEVICES

    Publication No.: US20200012618A1

    Publication date: 2020-01-09

    Application No.: US16028072

    Filing date: 2018-07-05

    Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.
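As a rough illustration of the grouping step in this abstract, the sketch below partitions PEs into fused PEs of equal size. How the required PE count is derived is not stated in the abstract; the rule used here (loop body size divided by a per-PE instruction capacity) and the names `fuse_pes`, `loop_body_size`, and `pe_capacity` are assumptions for the example only.

```python
import math


def fuse_pes(num_pes, loop_body_size, pe_capacity):
    """Group PE indices into fused PEs able to hold the loop body.

    Assumption: a PE holds `pe_capacity` instructions, so a loop body of
    `loop_body_size` instructions needs ceil(size / capacity) PEs.
    """
    needed = math.ceil(loop_body_size / pe_capacity)
    if needed > num_pes:
        raise ValueError("loop body does not fit in the available PEs")
    # Each fused PE is a run of `needed` consecutive PEs; leftovers stay idle.
    return [list(range(start, start + needed))
            for start in range(0, num_pes - needed + 1, needed)]
```

With 8 PEs, a 10-instruction loop body, and a capacity of 4 instructions per PE, this yields two fused PEs of three PEs each, and each fused PE could then execute one loop iteration.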

    REPLAY OF PARTIALLY EXECUTED INSTRUCTION BLOCKS IN A PROCESSOR-BASED SYSTEM EMPLOYING A BLOCK-ATOMIC EXECUTION MODEL

    Publication No.: US20170185408A1

    Publication date: 2017-06-29

    Application No.: US15252323

    Filing date: 2016-08-31

    CPC classification number: G06F9/30181 G06F9/30043 G06F9/3832 G06F9/3861

    Abstract: Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model is disclosed. In one aspect, a partial replay controller is provided in a processor(s) of a central processing unit (CPU). If an instruction is detected in the instruction block associated with a potential architectural state modification, or an exception occurs during execution of instructions, the instruction block is re-executed. During re-execution of the instruction block, the partial replay controller is configured to record produced results from load/store instructions. Thus, if an exception occurs during re-execution of the instruction block, previously recorded produced results for the executed load/store instructions before the exception occurred are replayed during re-execution of the instruction block after the exception is resolved. Thus, execution of instructions leading up to side-effect operations in the instruction block can be deterministically repeated with previously produced results, without repeating the side-effects.

    MANAGING ALLOCATION OF PHYSICAL REGISTERS IN A BLOCK-BASED INSTRUCTION SET ARCHITECTURE (ISA), AND RELATED APPARATUSES AND METHODS
    Type: Invention application (status: under examination, published)

    Publication No.: US20160179532A1

    Publication date: 2016-06-23

    Application No.: US14578913

    Filing date: 2014-12-22

    Abstract: Managing allocation of physical registers in a block-based instruction set architecture (ISA), and related apparatuses and methods, are disclosed. In one aspect, an apparatus provides an instruction processing circuit communicatively coupled to multiple physical registers. The instruction processing circuit includes a register rename map that comprises an association between at least one architectural register and at least one of the multiple physical registers. The instruction processing circuit further comprises an in-use indicator set associated with the register rename map, the in-use indicator set indicative of an in-use physical register among the multiple physical registers. The instruction processing circuit is configured to copy the in-use indicator set to an output in-use indicator set, and modify the output in-use indicator set upon detection of a block-based write instruction to mark the in-use physical register as unused.
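The copy-then-modify step in this abstract can be sketched as below. The representation is assumed for illustration: the rename map is a dictionary from architectural register names to physical register numbers, the in-use indicator set is a Python set, and the function name `output_in_use_set` is hypothetical.

```python
def output_in_use_set(rename_map, in_use, written_arch_regs):
    """Derive the output in-use indicator set for one instruction block.

    rename_map: architectural register -> physical register (assumed dict).
    in_use: physical registers currently marked in use (left unmodified).
    written_arch_regs: architectural registers targeted by block-based
    write instructions detected in the block.
    """
    out = set(in_use)  # copy the in-use indicator set to the output set
    for arch_reg in written_arch_regs:
        phys_reg = rename_map.get(arch_reg)
        if phys_reg is not None:
            # The previously mapped physical register is marked unused
            # in the output set only.
            out.discard(phys_reg)
    return out
```

Keeping the input set untouched means the original indicators remain available, for instance if the block's effects have to be discarded.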


    Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices

    Publication No.: US11048509B2

    Publication date: 2021-06-29

    Application No.: US16000580

    Filing date: 2018-06-05

    Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file. As a result, multiple elements of multiple vectors may be accessed with a single vector register file access operation.
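One way to picture the layout property in this abstract is a skewed bank mapping. The formula below is an assumption chosen for illustration, not the arrangement claimed in the patent: consecutive elements of a vector go to consecutive banks, and each successive vector is offset by the number of elements read per vector in one access. The names `bank_of` and `banks_for_access` are hypothetical.

```python
def bank_of(vector_index, element_index, elems_per_access, num_banks):
    """Bank holding a given element under the assumed skewed layout."""
    return (vector_index * elems_per_access + element_index) % num_banks


def banks_for_access(num_vectors, first_element, elems_per_access, num_banks):
    """Banks touched when reading `elems_per_access` consecutive elements,
    starting at `first_element`, from each of `num_vectors` vectors."""
    return [bank_of(v, first_element + j, elems_per_access, num_banks)
            for v in range(num_vectors)
            for j in range(elems_per_access)]
```

When `num_vectors * elems_per_access` equals `num_banks`, every element in such an access falls in a distinct bank, so the whole group could in principle be read in a single register file access.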

    Providing memory dependence prediction in block-atomic dataflow architectures

    Publication No.: US10684859B2

    Publication date: 2020-06-16

    Application No.: US15269254

    Filing date: 2016-09-19

    Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is disclosed. In one aspect, a memory dependence prediction circuit is provided. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.
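The predictor table lookup in this abstract might be modeled along the following lines. The abstract does not define the fields precisely, so this sketch assumes that the block reach set lists the blocks in which dependent loads were observed and that the load set lists those loads; `PredictorEntry` and `on_block_fetch` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PredictorEntry:
    store_id: str
    # Assumed meaning: blocks containing loads that this store reaches.
    block_reach_set: set = field(default_factory=set)
    # Assumed meaning: identifiers of loads that depend on this store.
    load_set: set = field(default_factory=set)


def on_block_fetch(table, block_id, stores, loads):
    """Consult the predictor table when an instruction block is fetched.

    Returns (stores_to_mark, loads_to_delay) for the fetched block.
    """
    by_store = {entry.store_id: entry for entry in table}
    # Stores with known dependent loads are marked so they wake those loads.
    stores_to_mark = [s for s in stores
                      if s in by_store and by_store[s].load_set]
    # Loads predicted to be reached by some store in this block are delayed.
    loads_to_delay = [l for l in loads
                      if any(l in e.load_set and block_id in e.block_reach_set
                             for e in table)]
    return stores_to_mark, loads_to_delay
```

Under these assumptions, a fetched block's stores and loads are each checked once against the table, and unknown instructions proceed without any marking or delay.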
