ZERO CYCLE LOAD BYPASS
    21.
    Patent application

    Publication number: US20210173654A1

    Publication date: 2021-06-10

    Application number: US16705023

    Filing date: 2019-12-05

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
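
The renaming step described in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the `Op` record, the `rename_group` function, and the use of a symbolic (base register, offset) pair as the address key are all assumptions made for illustration, and the sketch ignores cases where the base register changes between the store and the load.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str    # "store" or "load"
    addr: tuple  # symbolic address key, e.g. ("x1", 8) for base register + offset
    reg: str     # architectural register: data source of a store, destination of a load


def rename_group(group, mapper, free_list):
    """Rename one decode group in program order.

    Returns a list of (op, physical_register, bypassed) tuples.
    """
    renamed = []
    older_stores = {}  # address key -> physical register holding the store data
    for op in group:
        if op.kind == "store":
            preg = mapper[op.reg]
            older_stores[op.addr] = preg
            renamed.append((op, preg, False))
        elif op.addr in older_stores:
            # Qualifying load: reuse the store's source register, skip the free list.
            preg = older_stores[op.addr]
            mapper[op.reg] = preg
            renamed.append((op, preg, True))
        else:
            # Ordinary load: allocate a fresh physical register from the free list.
            preg = free_list.pop(0)
            mapper[op.reg] = preg
            renamed.append((op, preg, False))
    return renamed


# Hypothetical decode group: one store followed by two loads.
mapper = {"x2": 7}
free_list = [10, 11]
group = [
    Op("store", ("x1", 8), "x2"),
    Op("load", ("x1", 8), "x3"),   # same address as the older store
    Op("load", ("x1", 16), "x4"),  # different address
]
renamed = rename_group(group, mapper, free_list)
```

In this example the first load shares physical register 7 with the store's source operand and consumes no free-list entry, while the second load takes a new register in the usual way.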

    Methods for partially saving a branch predictor state

    Publication number: US10223123B1

    Publication date: 2019-03-05

    Application number: US15133804

    Filing date: 2016-04-20

    Applicant: Apple Inc.

    Abstract: In an embodiment, an apparatus includes a plurality of memories configured to store respective data in a plurality of branch prediction entries. Each branch prediction entry corresponds to at least one of a plurality of branch instructions. The apparatus also includes a control circuit configured to store first data associated with a first branch instruction into a corresponding branch prediction entry in at least one memory of the plurality of memories. The control circuit is further configured to select a first memory of the plurality of memories, to disconnect the first memory from a power supply in response to a detection of a first power mode signal, and to cease storing data in the plurality of memories in response to the detection of the first power mode signal.
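
The control behavior in this abstract can be modeled roughly as follows. The class name `PredictorMemories`, its methods, and the choice to model a powered-down memory as losing its contents are assumptions for illustration only; the abstract does not say how the memory to disconnect is chosen.

```python
class PredictorMemories:
    """Toy model of several branch prediction memories with partial power-down."""

    def __init__(self, count):
        self.entries = [dict() for _ in range(count)]  # one table per memory
        self.powered = [True] * count
        self.updates_enabled = True

    def store(self, memory, index, data):
        """Write one prediction entry; returns False if the write is dropped."""
        if not (self.updates_enabled and self.powered[memory]):
            return False
        self.entries[memory][index] = data
        return True

    def on_power_mode_signal(self, selected):
        """Disconnect the selected memory and stop updating all memories."""
        self.powered[selected] = False
        self.entries[selected].clear()  # assumed: an unpowered memory loses its contents
        self.updates_enabled = False


bp = PredictorMemories(count=3)
bp.store(0, 5, "taken")
bp.store(1, 5, "not-taken")
bp.on_power_mode_signal(selected=0)
```

After the signal, memory 1 still holds its entry, which is the "partial save" the title refers to, while further writes to any memory are rejected.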

    Immediate branch recode that handles aliasing

    Publication number: US09940262B2

    Publication date: 2018-04-10

    Application number: US14491149

    Filing date: 2014-09-19

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently indicating branch target addresses. A semiconductor chip predecodes instructions of a computer program prior to installing the instructions in an instruction cache. In response to determining a particular instruction is a control flow instruction with a displacement relative to a program counter address (PC), the chip replaces a portion of the PC relative displacement in the particular instruction with a subset of a target address. The subset of the target address is an untranslated physical subset of the full target address. When the recoded particular instruction is fetched and decoded, the remaining portion of the PC relative displacement is added to a virtual portion of the PC used to fetch the particular instruction. The result is concatenated with the portion of the target address embedded in the fetched particular instruction to form a full target address.
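
A small arithmetic sketch may help show why the recoded form still yields the right target when the same code is fetched through a different virtual alias. The 12-bit split (`PAGE_BITS`), the function names, and the choice to fold the carry from the low-order addition into the remaining upper displacement at predecode time are assumptions, not details from the patent. The sketch relies on the fact that address bits inside the page offset are identical in the virtual and physical address.

```python
PAGE_BITS = 12                 # assumed: low 12 bits are untranslated page-offset bits
MASK = (1 << PAGE_BITS) - 1


def predecode(pc, disp):
    """Recode a PC-relative branch into (upper displacement, low target bits)."""
    target = pc + disp
    target_low = target & MASK                              # embedded target subset
    high_delta = (target >> PAGE_BITS) - (pc >> PAGE_BITS)  # remaining displacement
    return high_delta, target_low


def decode_target(fetch_pc, recoded):
    """Rebuild the full target from the fetch PC and the recoded fields."""
    high_delta, target_low = recoded
    upper = (fetch_pc >> PAGE_BITS) + high_delta  # add to the virtual portion of the PC
    return (upper << PAGE_BITS) | target_low      # concatenate with the embedded bits


pc = 0x4000_0FF8
recoded = predecode(pc, 0x10)              # forward branch crossing a page boundary
alias_pc = 0x7000_0FF8                     # hypothetical alias with the same page offset
backward = predecode(0x4000_1004, -8)      # backward branch crossing a page boundary
```

Because only the upper portion is added at decode time, the wide adder over the full displacement is replaced by a narrower one, and the embedded low bits are valid for every alias of the line.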

    Split register file for operands of different sizes

    Publication number: US09639369B2

    Publication date: 2017-05-02

    Application number: US14076660

    Filing date: 2013-11-11

    Applicant: Apple Inc.

    Inventor: Conrado Blasco

    CPC classification number: G06F9/384 G06F9/30112

    Abstract: In an embodiment, a processor includes a register file having multiple widths corresponding to different operand sizes of a given data type implemented by the processor. For example, the integer register file may have 32-bit and 64-bit widths for 32-bit and 64-bit operand sizes. The register file may have a section of registers for each operand size, and the map unit may allocate registers from the appropriate section for each instruction operation based on the operand size of that instruction operation. The register file may consume less integrated circuit area than another register file having the same number of registers, all of which are implemented at the largest operand size. In some embodiments, only the register file and the map unit (specifically the free list management logic in the map unit) are changed to implement the multiple-width register file.
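
The allocation policy and the area argument in this abstract can be sketched as below. The `SplitFreeList` class, the register counts, and the use of raw storage bits as a stand-in for circuit area are illustrative assumptions rather than details from the patent.

```python
class SplitFreeList:
    """Free list with one section per operand size (32-bit and 64-bit)."""

    def __init__(self, n32, n64):
        self.n32 = n32
        # Physical registers 0..n32-1 are 32 bits wide; the rest are 64 bits wide.
        self.free = {32: list(range(n32)), 64: list(range(n32, n32 + n64))}

    def allocate(self, operand_bits):
        """Return a free register from the matching section, or None if it is empty."""
        section = self.free[operand_bits]
        return section.pop(0) if section else None

    def release(self, preg):
        """Return a register to the section it belongs to."""
        width = 32 if preg < self.n32 else 64
        self.free[width].append(preg)


def storage_bits(n32, n64):
    """Total storage bits, used here as a crude proxy for area."""
    return 32 * n32 + 64 * n64


fl = SplitFreeList(n32=2, n64=2)
first = fl.allocate(32)
second = fl.allocate(32)
exhausted = fl.allocate(32)   # 32-bit section is now empty
wide = fl.allocate(64)
```

With 64 registers of each width, `storage_bits(64, 64)` gives 6144 bits against 8192 bits for 128 registers that are all 64 bits wide, which is the kind of saving the abstract describes.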

    Early loop buffer mode entry upon number of mispredictions of exit condition exceeding threshold
    26.
    Granted patent
    Legal status: In force

    Publication number: US09471322B2

    Publication date: 2016-10-18

    Application number: US14179204

    Filing date: 2014-02-12

    Applicant: Apple Inc.

    Abstract: Systems, processors, and methods for determining when to enter loop buffer mode early for loops in an instruction stream. A processor waits until a branch history register has saturated before entering loop buffer mode for a loop if the processor has not yet determined the loop has an unpredictable exit. However, if the loop has an unpredictable exit, then the loop is allowed to enter loop buffer mode early. While in loop buffer mode, the loop is dispatched from a loop buffer, and the front-end of the processor is powered down until the loop terminates.
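
The entry decision in this abstract reduces to a simple predicate, sketched here. The `LoopTracker` class, the history length of 8, the threshold of 2, and the modeling of "saturated" as one history bit per observed iteration are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoopTracker:
    history_bits: int = 8          # assumed length of the branch history register
    mispredict_threshold: int = 2  # assumed threshold for an "unpredictable exit"
    iterations_seen: int = 0
    exit_mispredicts: int = 0

    def record_iteration(self):
        self.iterations_seen += 1

    def record_exit_mispredict(self):
        self.exit_mispredicts += 1

    def history_saturated(self):
        # Modeled as: the loop has filled every bit of the history register.
        return self.iterations_seen >= self.history_bits

    def unpredictable_exit(self):
        return self.exit_mispredicts > self.mispredict_threshold

    def may_enter_loop_buffer(self):
        # Early entry if the exit is unpredictable; otherwise wait for saturation.
        return self.unpredictable_exit() or self.history_saturated()


predictable = LoopTracker()
for _ in range(3):
    predictable.record_iteration()

unpredictable = LoopTracker()
for _ in range(3):
    unpredictable.record_iteration()
    unpredictable.record_exit_mispredict()
```

Here `predictable` must keep waiting after three iterations, while `unpredictable` qualifies for early entry because waiting for the history to saturate would not improve prediction of its exit.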

    IMMEDIATE BRANCH RECODE THAT HANDLES ALIASING
    27.
    Patent application
    Legal status: In force

    Publication number: US20160085550A1

    Publication date: 2016-03-24

    Application number: US14491149

    Filing date: 2014-09-19

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently indicating branch target addresses. A semiconductor chip predecodes instructions of a computer program prior to installing the instructions in an instruction cache. In response to determining a particular instruction is a control flow instruction with a displacement relative to a program counter address (PC), the chip replaces a portion of the PC relative displacement in the particular instruction with a subset of a target address. The subset of the target address is an untranslated physical subset of the full target address. When the recoded particular instruction is fetched and decoded, the remaining portion of the PC relative displacement is added to a virtual portion of the PC used to fetch the particular instruction. The result is concatenated with the portion of the target address embedded in the fetched particular instruction to form a full target address.

    REDUCING POWER CONSUMPTION IN A PROCESSOR
    28.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20150169041A1

    Publication date: 2015-06-18

    Application number: US14104042

    Filing date: 2013-12-12

    Applicant: Apple Inc.

    Abstract: A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
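
The gating decision in this abstract can be sketched as a lookup that returns both a next fetch address and an enable signal for the branch predictor array. The `NextFetchPredictor` class, the 16-byte fetch group, and the conservative behavior on a miss (fall through and keep the array enabled) are assumptions, not details from the patent.

```python
class NextFetchPredictor:
    """Toy next fetch predictor that also gates the branch predictor array."""

    def __init__(self, group_bytes=16):
        self.group_bytes = group_bytes  # assumed fetch group size
        self.entries = {}               # fetch address -> (next address, has conditional branch)

    def train(self, fetch_addr, next_addr, next_has_cond_branch):
        self.entries[fetch_addr] = (next_addr, next_has_cond_branch)

    def predict(self, fetch_addr):
        """Return (next fetch address, enable signal for the branch predictor array)."""
        if fetch_addr not in self.entries:
            # Assumed miss policy: sequential fetch, array left enabled.
            return fetch_addr + self.group_bytes, True
        next_addr, has_cond_branch = self.entries[fetch_addr]
        # The array is only needed if the next group holds a conditional branch.
        return next_addr, has_cond_branch


nfp = NextFetchPredictor()
nfp.train(0x100, 0x110, next_has_cond_branch=False)
hit = nfp.predict(0x100)
miss = nfp.predict(0x200)
```

On the hit, the enable signal is low, so the array would be disabled in the following cycle; on the miss, the sketch keeps it enabled because nothing is known about the next group.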

    Split Register File for Operands of Different Sizes
    29.
    Patent application
    Legal status: In force

    Publication number: US20150134935A1

    Publication date: 2015-05-14

    Application number: US14076660

    Filing date: 2013-11-11

    Applicant: Apple Inc.

    Inventor: Conrado Blasco

    CPC classification number: G06F9/384 G06F9/30112

    Abstract: In an embodiment, a processor includes a register file having multiple widths corresponding to different operand sizes of a given data type implemented by the processor. For example, the integer register file may have 32-bit and 64-bit widths for 32-bit and 64-bit operand sizes. The register file may have a section of registers for each operand size, and the map unit may allocate registers from the appropriate section for each instruction operation based on the operand size of that instruction operation. The register file may consume less integrated circuit area than another register file having the same number of registers, all of which are implemented at the largest operand size. In some embodiments, only the register file and the map unit (specifically the free list management logic in the map unit) are changed to implement the multiple-width register file.
