Differential pipeline delays in a coprocessor

    Publication number: US11709681B2

    Publication date: 2023-07-25

    Application number: US15837974

    Filing date: 2017-12-11

    CPC classification number: G06F9/3867 G06F9/3836

    Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.

    THREAD FORWARD PROGRESS AND/OR QUALITY OF SERVICE

    Publication number: US20230034933A1

    Publication date: 2023-02-02

    Application number: US17390149

    Filing date: 2021-07-30

    Abstract: Methods, systems, and apparatuses provide support for allowing thread forward progress in a processing system and for improving quality of service. One system includes a processor; a bus coupled to the processor; a memory coupled to the processor via the bus; and a floating point unit coupled to the processor via the bus, wherein the floating point unit comprises hardware control logic operative to: store for each thread, by a scheduler of the floating point unit, a counter; increase, by the scheduler, the value of the counter corresponding to a thread when at least one source-ready operation exists for the thread; compare, by the scheduler, the value of the counter to a predetermined threshold; and make other threads ineligible to be picked by the scheduler when the counter is greater than or equal to the predetermined threshold.
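
The counter mechanism in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the class name `ForwardProgressScheduler`, the fixed-priority base policy (lowest thread id wins), counting only the cycles in which a ready thread is passed over, and resetting a counter when its thread is picked are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardProgressScheduler:
    """Toy model of a picker with per-thread starvation counters (hypothetical)."""
    threshold: int
    counters: dict = field(default_factory=dict)  # thread id -> counter value

    def pick(self, ready_ops):
        """ready_ops maps thread id -> number of source-ready operations.

        Returns the thread id picked this cycle, or None if nothing is ready.
        """
        candidates = [t for t, n in ready_ops.items() if n > 0]
        if not candidates:
            return None
        # Threads whose counter reached the threshold make all others ineligible.
        starved = [t for t in candidates
                   if self.counters.get(t, 0) >= self.threshold]
        eligible = starved or candidates
        # Assumed base policy: fixed priority, lowest thread id wins.
        picked = min(eligible)
        for t in candidates:
            if t == picked:
                self.counters[t] = 0  # assumption: reset on pick
            else:
                # The thread had ready work but was passed over this cycle.
                self.counters[t] = self.counters.get(t, 0) + 1
        return picked


sched = ForwardProgressScheduler(threshold=3)
picks = [sched.pick({0: 1, 1: 1}) for _ in range(5)]
print(picks)
```

With a threshold of 3, thread 1 is passed over three times under the fixed-priority policy, then thread 0 becomes ineligible for one cycle so that thread 1 can make progress.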

    APPARATUS AND METHODS EMPLOYING A SHARED READ PORT REGISTER FILE

    Publication number: US20230034072A1

    Publication date: 2023-02-02

    Application number: US17389838

    Filing date: 2021-07-30

    Abstract: In some implementations, a processor includes a plurality of parallel instruction pipes and a register file that includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.

    Setting values of portions of registers based on bit values

    Publication number: US11451241B2

    Publication date: 2022-09-20

    Application number: US15842027

    Filing date: 2017-12-14

    Abstract: A processor employs a set of bits to indicate values of portions of registers of a register file. In response to a specified instruction indicating an expected change of instruction types to be executed, the processor sets one or more of the bits and, for subsequent instructions, interprets corresponding portions of the registers as having a specified value (e.g., zero). By employing the set of bits to set the values of the register portions, rather than setting the individual portions of the registers to the specified value, the processor conserves processor resources (e.g., power) when the processor transitions between executing instructions of different types.
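
The idea of marking register portions as zero, rather than overwriting them, can be sketched in software. The model below is illustrative only: the class `MaskedRegisterFile`, the split of each register into a lower and an upper portion, and the method names are assumptions introduced here, not details taken from the patent.

```python
class MaskedRegisterFile:
    """Toy register file with one 'upper portion reads as zero' bit per register."""

    def __init__(self, num_regs, low_bits=128, high_bits=128):
        self.low_bits = low_bits
        self.high_bits = high_bits
        self.storage = [0] * num_regs          # full-width stored values
        self.upper_zero = [False] * num_regs   # the per-register indicator bits

    def write_full(self, idx, value):
        """Write the whole register; the upper portion is valid again."""
        width = self.low_bits + self.high_bits
        self.storage[idx] = value & ((1 << width) - 1)
        self.upper_zero[idx] = False

    def mark_uppers_zero(self):
        """Model of the 'specified instruction': only the bits are set.

        The stored upper portions are not touched, which is where the
        saving comes from in this sketch.
        """
        self.upper_zero = [True] * len(self.storage)

    def read(self, idx):
        """Read a register, interpreting a marked upper portion as zero."""
        value = self.storage[idx]
        if self.upper_zero[idx]:
            value &= (1 << self.low_bits) - 1
        return value


rf = MaskedRegisterFile(num_regs=2, low_bits=8, high_bits=8)
rf.write_full(0, 0xABCD)
rf.mark_uppers_zero()
print(hex(rf.read(0)), hex(rf.storage[0]))
```

After `mark_uppers_zero()`, reads see only the lower portion while the underlying storage is unchanged; a later full write clears the indicator bit for that register.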

    Register renaming after a non-pickable scheduler queue

    Publication number: US11281466B2

    Publication date: 2022-03-22

    Application number: US16660495

    Filing date: 2019-10-22

    Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
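
The renamer and free list described in this abstract follow a familiar pattern that a short model can make concrete. The sketch below is generic register renaming under assumed names (`Renamer`, `rename`, `release`) and an assumed initial identity mapping; it does not model the non-pickable scheduler queue or the timing relationship with the load store unit.

```python
from collections import deque


class Renamer:
    """Toy rename table plus free list of physical register numbers (PRNs)."""

    def __init__(self, num_arch, num_phys):
        # Assumption: architectural register i initially maps to PRN i.
        self.map = list(range(num_arch))
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, dest, srcs):
        """Rename one operation.

        Returns (new destination PRN, source PRNs, previous destination PRN).
        """
        # Sources are looked up before the destination mapping is updated.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register; rename must stall")
        new_prn = self.free_list.popleft()
        old_prn = self.map[dest]
        self.map[dest] = new_prn
        return new_prn, phys_srcs, old_prn

    def release(self, prn):
        """Return a PRN to the free list, e.g. once the overwriting op retires."""
        self.free_list.append(prn)


renamer = Renamer(num_arch=4, num_phys=6)
print(renamer.rename(dest=1, srcs=[1, 2]))
print(renamer.rename(dest=1, srcs=[1]))
```

The second call reads its source from the PRN allocated by the first call, which is how renaming preserves the dependency while removing the name reuse.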

    Computer-based square root and division operations

    Publication number: US09910638B1

    Publication date: 2018-03-06

    Application number: US15247416

    Filing date: 2016-08-25

    CPC classification number: G06F7/5525 G06F7/535 G06F2207/5523

    Abstract: Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits. The first iteration thus requires multiplication of values that are larger than corresponding values for the subsequent other digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay for clock cycles associated with the first digit calculations, which increases the possible clock frequency allowed. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.
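
For readers unfamiliar with digit-by-digit square root, the sketch below shows a plain radix-4 integer recurrence in which each iteration selects one result digit and needs one small multiplication. It is a textbook-style illustration under assumptions made here (the function name `isqrt_radix4`, integer operands, exhaustive digit selection); it does not implement the split first digit or the multiply-and-accumulate hardware that the abstract describes.

```python
def isqrt_radix4(n, digits):
    """Return floor(sqrt(n)) for 0 <= n < 16**digits.

    One radix-4 result digit is produced per iteration. Invariant after each
    iteration: (radicand prefix consumed so far) == root**2 + rem.
    """
    root = 0
    rem = 0
    for i in reversed(range(digits)):
        # Bring down the next 4 radicand bits.
        rem = (rem << 4) | ((n >> (4 * i)) & 0xF)
        # Largest digit q in 0..3 with (4*root + q)**2 <= new prefix,
        # which is equivalent to q * (8*root + q) <= rem.
        q = max(c for c in range(4) if c * (8 * root + c) <= rem)
        rem -= q * (8 * root + q)   # the per-iteration multiplication
        root = 4 * root + q
    return root


print(isqrt_radix4(1000, digits=3))
```

The product `q * (8*root + q)` is the per-digit multiplication; a larger digit means wider operands in that product, which is the cost the abstract attributes to the first iteration.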

    COMPUTER-BASED SQUARE ROOT AND DIVISION OPERATIONS

    Publication number: US20180060039A1

    Publication date: 2018-03-01

    Application number: US15247416

    Filing date: 2016-08-25

    CPC classification number: G06F7/5525 G06F7/535 G06F2207/5523

    Abstract: Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits. The first iteration thus requires multiplication of values that are larger than corresponding values for the subsequent other digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay for clock cycles associated with the first digit calculations, which increases the possible clock frequency allowed. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.

    PROCESSOR AND METHODS FOR FLOATING POINT REGISTER ALIASING

    Type: Invention application

    Status: Under examination (published)

    Publication number: US20150121040A1

    Publication date: 2015-04-30

    Application number: US14523660

    Filing date: 2014-10-24

    Abstract: Methods, devices, and systems for accessing packed registers are presented. A state of the packed registers may be tracked and it may be determined whether the register is directly accessible based on the state. If the register is not directly accessible, an action may be performed which allows the register to be accessed directly. The action may include injecting at least one uop for reorganizing the physical storage of the register such that it is directly accessible. The action may include aligning the data with the least significant bit of a physical register or otherwise aligning the data with the datapath. The action may also include changing the state of the packed registers.