Cache dependency handling
    11.
    Granted invention patent

    Publication No.: US10127153B1

    Publication date: 2018-11-13

    Application No.: US14868245

    Filing date: 2015-09-28

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to managing data-request dependencies for a cache. In one embodiment, an integrated circuit is disclosed that includes a plurality of requesting agents and a cache. The cache is configured to receive read and write requests from the plurality of requesting agents including a first request and a second request. The cache is configured to detect that the first and second requests specify addresses that correspond to different portions of the same cache line, and to determine whether to delay processing one of the first and second requests based on whether the first and second requests are from the same requesting agent. In some embodiments, the cache is configured to service the first and second requests in parallel in response to determining that the first and second requests are from the same requesting agent.
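
The decision described in this abstract can be sketched as a small predicate. This is an illustrative model only, not the patented circuit: the 64-byte line size, the `Request` type, and the `must_delay` name are assumptions, and the sketch assumes that requests from different agents to the same line are the ones that get delayed.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache-line size in bytes (not stated in the abstract)


@dataclass(frozen=True)
class Request:
    agent_id: int   # which requesting agent issued the request
    address: int    # byte address being read or written


def same_line(a: Request, b: Request) -> bool:
    """True when both addresses fall in the same cache line."""
    return a.address // LINE_SIZE == b.address // LINE_SIZE


def must_delay(first: Request, second: Request) -> bool:
    """Decide whether `second` should wait for `first`.

    Assumes the two requests touch different portions of a line
    whenever they share one, as in the abstract.
    """
    if not same_line(first, second):
        return False  # different lines: no dependency to track
    # Same line, different portions: same agent may proceed in parallel,
    # different agents are serialized (assumption of this sketch).
    return first.agent_id != second.agent_id
```

With these assumptions, two requests from agent 0 to addresses 0x100 and 0x120 proceed in parallel, while the same pair split across agents 0 and 1 is serialized.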

    Arithmetic branch fusion
    12.
    Granted invention patent

    Publication No.: US09672037B2

    Publication date: 2017-06-06

    Application No.: US13747977

    Filing date: 2013-01-23

    Applicant: Apple Inc.

    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
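
A toy decoder can illustrate the fusion step. This is a minimal sketch under stated assumptions: the opcode names, the `Instr` and `MicroOp` types, and the rule that the branch must test the arithmetic instruction's destination register are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ARITH = {"add", "sub", "and", "or"}   # assumed arithmetic opcodes
BRANCH = {"beqz", "bnez"}             # assumed conditional-branch opcodes


@dataclass(frozen=True)
class Instr:
    opcode: str
    dest: Optional[str] = None      # register written by an arithmetic op
    cond_reg: Optional[str] = None  # register tested by a branch


@dataclass(frozen=True)
class MicroOp:
    kind: str                  # "fused" or "single"
    parts: Tuple[Instr, ...]   # source instructions carried by this micro-op


def decode(instrs: List[Instr]) -> List[MicroOp]:
    """Decode in program order, fusing arithmetic+branch pairs."""
    uops = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        fusible = (
            nxt is not None
            and cur.opcode in ARITH
            and nxt.opcode in BRANCH
            and nxt.cond_reg == cur.dest  # branch depends on the result
        )
        if fusible:
            uops.append(MicroOp("fused", (cur, nxt)))  # one micro-op for both
            i += 2
        else:
            uops.append(MicroOp("single", (cur,)))
            i += 1
    return uops
```

In this model a `sub` into `r1` followed by `beqz r1` becomes one micro-op, while an `add` into `r2` followed by `bnez r3` stays as two because the branch does not depend on the arithmetic result.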

    Memory Controller Reservation of Retry Queue

    Publication No.: US20250103520A1

    Publication date: 2025-03-27

    Application No.: US18819755

    Filing date: 2024-08-29

    Applicant: Apple Inc.

    Abstract: A memory controller circuit receives memory access requests from a network of a computer system. Entries are reserved for these requests in a retry queue circuit. An arbitration circuit of the memory controller circuit issues those requests to a tag pipeline circuit that determines whether the received memory access requests hit in a memory cache. As a memory access request passes through the tag pipeline circuit, it may require another pass through this pipeline—for example, if resources such as certain storage circuits needed to complete the memory access request are unavailable (for example a snoop queue circuit). The reservation that has been made in the retry queue circuit thus keeps the request from having to be returned to the network for resubmission to the memory controller circuit if initial processing of the memory access request cannot be completed.
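
The reservation idea can be modeled in a few lines. This is a rough behavioral sketch, not the claimed circuit: the class and method names, the capacities, and the use of a snoop-queue slot count as the only contended resource are assumptions made for illustration.

```python
from collections import deque


class MemoryController:
    """Toy model: every accepted request holds a retry-queue reservation."""

    def __init__(self, retry_capacity: int, snoop_capacity: int):
        self.retry_capacity = retry_capacity
        self.reserved = 0            # reservations currently held
        self.retry_queue = deque()   # requests waiting for another pass
        self.snoop_free = snoop_capacity

    def accept(self, req: str) -> bool:
        """Accept a request from the network only if a retry entry is free."""
        if self.reserved >= self.retry_capacity:
            return False
        self.reserved += 1
        return True

    def tag_pipeline_pass(self, req: str, needs_snoop: bool) -> str:
        """One pass through the tag pipeline for an accepted request."""
        if needs_snoop and self.snoop_free == 0:
            # Resource unavailable: park in the entry reserved at accept
            # time instead of bouncing the request back to the network.
            self.retry_queue.append(req)
            return "retry"
        if needs_snoop:
            self.snoop_free -= 1
        self.reserved -= 1  # request completed, release the reservation
        return "done"

    def replay_one(self, needs_snoop: bool) -> str:
        """Reissue the oldest parked request to the tag pipeline."""
        return self.tag_pipeline_pass(self.retry_queue.popleft(), needs_snoop)
```

Because a slot is reserved before the first pass, the retry queue can never overflow in this model: the number of parked requests is always bounded by the number of reservations.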

    Coherence Directory Way Tracking in Coherent Agents

    Publication No.: US20250103496A1

    Publication date: 2025-03-27

    Application No.: US18433118

    Filing date: 2024-02-05

    Applicant: Apple Inc.

    Abstract: An apparatus includes a plurality of coherent agents, and a coherence directory that includes directory ways for storing coherency information. The coherence directory may be configured to determine that a cache block that is not currently cached among the coherent agents, is stored in a first coherent agent. The coherence directory may be further configured to, in response to this determination, create a particular entry in a selected one of the directory ways. The coherence directory may also be configured to send, to the first coherent agent, an indicator identifying a directory way that includes the entry. In response to a second coherent agent caching the cache block, the coherence directory may update the entry to include the second coherent agent. The first and second coherent agents may be configured to receive copies of the indicator, and to store their copy in locations associated with the cache block.
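
The way-tracking flow can be sketched as follows. This is a simplified software model under stated assumptions: the set-indexing function, the first-free-way allocation policy, and the `CoherenceDirectory` and `Agent` names are hypothetical, and eviction is left out.

```python
class CoherenceDirectory:
    """Toy set-associative directory that reports the way it used."""

    def __init__(self, num_sets: int, num_ways: int):
        self.num_sets = num_sets
        # ways[w][s] holds an entry dict or None
        self.ways = [[None] * num_sets for _ in range(num_ways)]

    def record_cached(self, block_addr: int, agent: str) -> int:
        """Track that `agent` now caches the block; return the way index."""
        s = block_addr % self.num_sets  # assumed indexing function
        # Block already tracked: add the new sharer to the existing entry.
        for w, way in enumerate(self.ways):
            entry = way[s]
            if entry is not None and entry["tag"] == block_addr:
                entry["sharers"].add(agent)
                return w
        # Block not cached anywhere yet: create an entry in a free way.
        for w, way in enumerate(self.ways):
            if way[s] is None:
                way[s] = {"tag": block_addr, "sharers": {agent}}
                return w
        raise RuntimeError("set full; eviction is outside this sketch")


class Agent:
    """Coherent agent that stores the way indicator next to each block."""

    def __init__(self, name: str):
        self.name = name
        self.way_hint = {}  # block address -> directory way indicator

    def fill(self, directory: CoherenceDirectory, block_addr: int) -> None:
        self.way_hint[block_addr] = directory.record_cached(block_addr, self.name)
```

An agent that keeps the indicator could later present it to the directory so that only one way needs to be probed; that lookup path is not modeled here.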

    Reducing latency for pointer chasing loads

    Publication No.: US09710268B2

    Publication date: 2017-07-18

    Application No.: US14264789

    Filing date: 2014-04-29

    Applicant: Apple Inc.

    CPC classification number: G06F9/30043 G06F9/3826 G06F9/3834 G06F9/3861

    Abstract: Systems, methods, and apparatuses for reducing the load to load/store address latency in an out-of-order processor. When a producer load is detected in the processor pipeline, the processor predicts whether the producer load is going to hit in the store queue. If the producer load is predicted not to hit in the store queue, then a dependent load or store can be issued early. The result data of the producer load is then bypassed forward from the data cache directly to the address generation unit. This result data is then used to generate an address for the dependent load or store, reducing the latency of the dependent load or store by one clock cycle.

    Usefulness indication for indirect branch prediction training
    19.
    Granted invention patent (status: in force)

    Publication No.: US09311100B2

    Publication date: 2016-04-12

    Application No.: US13735694

    Filing date: 2013-01-07

    Applicant: Apple Inc.

    CPC classification number: G06F9/3844 G06F9/30072 G06F9/3806 G06F9/3848

    Abstract: A circuit for implementing a branch target buffer. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select one of entries of the plurality of entries, and a received tag value may then be compared to the tag value of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag value of compared entries. The selected entry may be allocated to the indirect instruction branch dependent upon the prediction accuracy values of the plurality of entries.
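
The allocation policy can be illustrated with a small model. This is a sketch under assumptions, not the patented design: the 2-bit saturating accuracy counter, the rule of evicting the entry with the lowest accuracy value, and the `IndirectBTB` and `train` names are hypothetical choices made to show how accuracy values could steer allocation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = -1
    target: int = 0
    accuracy: int = 0    # assumed 2-bit saturating counter (0..3)
    valid: bool = False


class IndirectBTB:
    def __init__(self, num_sets: int, num_ways: int):
        self.sets = [[Entry() for _ in range(num_ways)] for _ in range(num_sets)]

    def lookup(self, index: int, tag: int):
        """Return the entry in set `index` whose tag matches, or None."""
        for e in self.sets[index]:
            if e.valid and e.tag == tag:
                return e
        return None

    def train(self, index: int, tag: int, actual_target: int) -> Entry:
        """Update on a resolved indirect branch; allocate on a tag miss."""
        e = self.lookup(index, tag)
        if e is not None:
            if e.target == actual_target:
                e.accuracy = min(e.accuracy + 1, 3)
            else:
                e.accuracy = max(e.accuracy - 1, 0)
                e.target = actual_target
            return e
        # Tag miss: prefer an invalid entry, otherwise the least accurate
        # one, so entries that predict well are kept (assumed policy).
        victim = min(self.sets[index], key=lambda x: (x.valid, x.accuracy))
        victim.tag, victim.target = tag, actual_target
        victim.accuracy, victim.valid = 1, True
        return victim
```

In this model a branch that has predicted correctly more often survives allocation pressure from new branches that map to the same index.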

    Arithmetic Branch Fusion
    20.
    Invention patent application (status: in force)

    Publication No.: US20140208073A1

    Publication date: 2014-07-24

    Application No.: US13747977

    Filing date: 2013-01-23

    Applicant: APPLE INC.

    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
