Parallel coherence and memory cache processing pipelines

    Publication number: US11138111B2

    Publication date: 2021-10-05

    Application number: US16129527

    Application date: 2018-09-12

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for performing coherence processing and memory cache processing in parallel are disclosed. A system includes a communication fabric and a plurality of dual-processing pipelines. Each dual-processing pipeline includes a coherence processing pipeline and a memory cache processing pipeline. The communication fabric forwards a transaction to a given dual-processing pipeline, with the communication fabric selecting the given dual-processing pipeline, from the plurality of dual-processing pipelines, based on a hash of the address of the transaction. The given dual-processing pipeline performs a duplicate tag lookup in parallel with a memory cache tag lookup for the transaction. By performing the duplicate tag lookup and the memory cache tag lookup in a parallel fashion rather than in a serial fashion, latency and power consumption are reduced while performance is enhanced.
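
    A minimal behavioral sketch, in Python, of the fabric-side selection described above; the pipeline count, cache-line size, and hash function are illustrative assumptions, not details taken from the patent:

```python
# Hypothetical model: the fabric hashes a transaction's address to pick one of
# several dual-processing pipelines, so all traffic for a given cache line is
# always steered to the same coherence/memory-cache pipeline pair.

NUM_PIPELINES = 4     # assumed pipeline count
CACHE_LINE_BITS = 6   # assumed 64-byte cache lines

def select_pipeline(address: int) -> int:
    """Hash the cache-line address down to a pipeline index."""
    line = address >> CACHE_LINE_BITS           # drop the offset within the line
    folded = line ^ (line >> 8) ^ (line >> 16)  # fold so high bits matter too
    return folded % NUM_PIPELINES

# Two addresses in the same 64-byte line always map to the same pipeline.
assert select_pipeline(0x8000_0040) == select_pipeline(0x8000_007F)
```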

    Method and apparatus for ensuring real-time snoop latency

    Publication number: US10795818B1

    Publication date: 2020-10-06

    Application number: US16418811

    Application date: 2019-05-21

    Applicant: Apple Inc.

    Abstract: Various systems and methods for ensuring real-time snoop latency are disclosed. A system includes a processor and a cache controller. The cache controller receives, via a channel, cache snoop requests from the processor, the snoop requests including latency-sensitive and non-latency-sensitive requests. Requests are not prioritized by type within the channel. The cache controller limits the number of non-latency-sensitive snoop requests that can be processed ahead of an incoming latency-sensitive snoop request: the cache controller determines that the number of received non-latency-sensitive snoop requests has reached a predetermined value and responsively prioritizes latency-sensitive requests over non-latency-sensitive requests.
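
    A hedged sketch of the limiting policy described above: requests arrive on one channel with no per-type priority, and the controller counts non-latency-sensitive snoops, switching to prioritizing latency-sensitive ones once a predetermined value is reached. The class, names, and threshold value are assumptions for illustration:

```python
from collections import deque

THRESHOLD = 8  # assumed "predetermined value" of bulk snoops allowed ahead

class SnoopScheduler:
    def __init__(self):
        self.channel = deque()   # in-order channel carrying both request types
        self.bulk_processed = 0  # non-latency-sensitive snoops processed so far

    def enqueue(self, req_id: str, latency_sensitive: bool):
        self.channel.append((req_id, latency_sensitive))

    def next_request(self) -> str:
        if self.bulk_processed >= THRESHOLD:
            # Limit reached: service the oldest latency-sensitive snoop first.
            for i, (req_id, ls) in enumerate(self.channel):
                if ls:
                    del self.channel[i]
                    self.bulk_processed = 0
                    return req_id
        req_id, ls = self.channel.popleft()  # otherwise stay in arrival order
        self.bulk_processed = 0 if ls else self.bulk_processed + 1
        return req_id
```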

    Managing fast to slow links in a bus fabric

    Publication number: US09170768B2

    Publication date: 2015-10-27

    Application number: US13726437

    Application date: 2012-12-24

    Applicant: Apple Inc.

    CPC classification number: G06F5/06 G06F13/38 G06F13/382

    Abstract: Systems and methods for managing fast to slow links in a bus fabric. A pair of link interface units connect agents with a clock mismatch. Each link interface unit includes an asynchronous FIFO for storing transactions that are sent over the clock domain crossing. When the command for a new transaction is ready to be sent while data for the previous transaction is still being sent, the link interface unit prevents the last data beat of the previous transaction from being sent. Instead, after a delay of one or more clock cycles, the last data beat overlaps with the command of the new transaction.
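
    A behavioral sketch of the overlap rule above, with invented names and cycle bookkeeping (the patented logic is hardware, not software): the sender holds back the final data beat of the previous transaction and, after a delay, transmits it in the same cycle as the new transaction's command:

```python
def schedule_beats(prev_data_beats, command_ready_cycle, delay=1):
    """Return (cycle, event) pairs: data beats of the previous transaction,
    with the last beat held back to overlap with the new command."""
    events = [(i, f"data[{i}]={beat}")
              for i, beat in enumerate(prev_data_beats[:-1])]
    last = len(prev_data_beats) - 1
    # Hold the last beat for one or more cycles, then send it alongside CMD.
    overlap_cycle = max(last, command_ready_cycle) + delay
    events.append((overlap_cycle,
                   f"data[{last}]={prev_data_beats[last]} + CMD(new)"))
    return events

# Command becomes ready at cycle 3, while data is still draining.
print(schedule_beats(["d0", "d1", "d2", "d3"], command_ready_cycle=3))
```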

    Prefetching across page boundaries in hierarchically cached processors

    Publication number: US09047198B2

    Publication date: 2015-06-02

    Application number: US13689696

    Application date: 2012-11-29

    Applicant: Apple Inc.

    Abstract: Processors and methods for preventing lower level prefetch units from stalling at page boundaries. An upper level prefetch unit closest to the processor core issues a preemptive request for a translation of the next page in a given prefetch stream. The upper level prefetch unit sends the translation to the lower level prefetch units prior to the lower level prefetch units reaching the end of the current page for the given prefetch stream. When the lower level prefetch units reach the boundary of the current page, instead of stopping, these prefetch units can continue to prefetch by jumping to the next physical page number provided in the translation.
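
    A sketch of the handoff described above, assuming 4 KiB pages and a dummy translate() stand-in for the real page-table walk; the point is that the lower-level unit jumps to the preloaded next physical page instead of stalling at the boundary:

```python
PAGE_SIZE = 4096  # assumed page size

def translate(virtual_page: int) -> int:
    return virtual_page + 0x1000  # dummy mapping standing in for a TLB walk

class LowerLevelPrefetcher:
    def __init__(self):
        self.next_phys_page = None  # filled in preemptively by the upper level

    def next_address(self, phys_addr: int, stride: int):
        nxt = phys_addr + stride
        if nxt // PAGE_SIZE == phys_addr // PAGE_SIZE:
            return nxt                          # still inside the current page
        if self.next_phys_page is not None:     # jump across the boundary
            return self.next_phys_page * PAGE_SIZE + nxt % PAGE_SIZE
        return None                             # would otherwise have to stall

class UpperLevelPrefetcher:
    def feed(self, lower: LowerLevelPrefetcher, current_virt_page: int):
        # Issued before the lower level reaches the end of the current page.
        lower.next_phys_page = translate(current_virt_page + 1)
```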

    Cache policies for uncacheable memory requests

    Publication number: US09043554B2

    Publication date: 2015-05-26

    Application number: US13725066

    Application date: 2012-12-21

    Applicant: Apple Inc.

    CPC classification number: G06F12/0811 G06F12/0815 G06F12/0888

    Abstract: Systems, processors, and methods for keeping uncacheable data coherent. A processor includes a multi-level cache hierarchy, and uncacheable load memory operations can be cached at any level of the cache hierarchy. If an uncacheable load misses in the L2 cache, then allocation of the uncacheable load will be restricted to a subset of the ways of the L2 cache. If an uncacheable store memory operation hits in the L1 cache, then the hit cache line can be updated with the data from the memory operation. If the uncacheable store misses in the L1 cache, then the uncacheable store is sent to a core interface unit. Multiple contiguous store misses are merged into larger blocks of data in the core interface unit before being sent to the L2 cache.
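
    A sketch of two of the policies above, with assumed way counts and a software stand-in for the core interface unit's merge buffer: uncacheable fills are restricted to a subset of L2 ways, and contiguous store misses are merged into larger blocks:

```python
L2_WAYS = 16
UNCACHEABLE_WAYS = range(0, 2)  # assumed subset reserved for uncacheable fills

def pick_victim_way(uncacheable: bool, lru_order):
    """lru_order lists way indices from least to most recently used."""
    candidates = UNCACHEABLE_WAYS if uncacheable else range(L2_WAYS)
    return next(w for w in lru_order if w in candidates)

def merge_store_misses(misses):
    """Merge address-contiguous (addr, bytes) store misses into larger blocks
    before they are sent on to the L2."""
    merged = []
    for addr, data in sorted(misses):
        if merged and merged[-1][0] + len(merged[-1][1]) == addr:
            merged[-1] = (merged[-1][0], merged[-1][1] + data)  # extend block
        else:
            merged.append((addr, data))
    return merged

# Two adjacent 8-byte store misses become one 16-byte block.
print(merge_store_misses([(0x100, b"\xaa" * 8), (0x108, b"\xbb" * 8)]))
```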

    Memory Controller Reservation of Retry Queue

    Publication number: US20250103520A1

    Publication date: 2025-03-27

    Application number: US18819755

    Application date: 2024-08-29

    Applicant: Apple Inc.

    Abstract: A memory controller circuit receives memory access requests from a network of a computer system. Entries are reserved for these requests in a retry queue circuit. An arbitration circuit of the memory controller circuit issues those requests to a tag pipeline circuit that determines whether the received memory access requests hit in a memory cache. As a memory access request passes through the tag pipeline circuit, it may require another pass through this pipeline, for example if storage circuits needed to complete the memory access request, such as a snoop queue circuit, are unavailable. The reservation made in the retry queue circuit thus keeps the request from having to be returned to the network for resubmission to the memory controller circuit if initial processing of the memory access request cannot be completed.
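
    A hedged model of the reservation flow above (the queue structure and retry loop are assumptions): an entry is reserved before the first tag-pipeline pass, so a pass that cannot complete parks the request locally rather than bouncing it back to the network:

```python
class RetryQueue:
    def __init__(self, size: int):
        self.size = size
        self.entries = {}  # request id -> parked request (or None if reserved)

    def try_reserve(self, req_id) -> bool:
        if len(self.entries) >= self.size:
            return False             # no room: reject before any pipeline pass
        self.entries[req_id] = None  # slot reserved but not yet occupied
        return True

def handle_request(req_id, retry_q, tag_pipeline_pass):
    if not retry_q.try_reserve(req_id):
        return "rejected-to-network"
    while not tag_pipeline_pass(req_id):  # e.g. snoop queue circuit was full
        retry_q.entries[req_id] = req_id  # park in the reserved slot and retry
    del retry_q.entries[req_id]           # completed: release the reservation
    return "completed"

# A request that needs two retries still completes without leaving the queue.
attempts = iter([False, False, True])
print(handle_request("rd0", RetryQueue(size=4), lambda _rid: next(attempts)))
```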

    PARALLEL COHERENCE AND MEMORY CACHE PROCESSING PIPELINES

    Publication number: US20200081838A1

    Publication date: 2020-03-12

    Application number: US16129527

    Application date: 2018-09-12

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for performing coherence processing and memory cache processing in parallel are disclosed. A system includes a communication fabric and a plurality of dual-processing pipelines. Each dual-processing pipeline includes a coherence processing pipeline and a memory cache processing pipeline. The communication fabric forwards a transaction to a given dual-processing pipeline, with the communication fabric selecting the given dual-processing pipeline, from the plurality of dual-processing pipelines, based on a hash of the address of the transaction. The given dual-processing pipeline performs a duplicate tag lookup in parallel with a memory cache tag lookup for the transaction. By performing the duplicate tag lookup and the memory cache tag lookup in a parallel fashion rather than in a serial fashion, latency and power consumption are reduced while performance is enhanced.
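
    This entry is the pre-grant publication of US11138111B2 above, so the abstract is identical; as a complement to the earlier hash sketch, the following illustrates the other half of the idea with software concurrency, issuing the duplicate-tag and memory-cache tag lookups together rather than back to back. Both lookup bodies are stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def duplicate_tag_lookup(addr):     # coherence side: is the line in a CPU cache?
    return addr in {0x100, 0x200}   # stand-in for the duplicate-tag array

def memory_cache_tag_lookup(addr):  # memory-cache side: does the line hit here?
    return addr in {0x200, 0x300}   # stand-in for the memory cache tags

def process_transaction(addr):
    # Launch both lookups in parallel instead of serially.
    with ThreadPoolExecutor(max_workers=2) as pool:
        coh = pool.submit(duplicate_tag_lookup, addr)
        mc = pool.submit(memory_cache_tag_lookup, addr)
        return coh.result(), mc.result()

print(process_transaction(0x200))  # (True, True)
```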

    Methods for cache line eviction

    Publication number: US09529730B2

    Publication date: 2016-12-27

    Application number: US14263386

    Application date: 2014-04-28

    Applicant: Apple Inc.

    Abstract: A method and apparatus for evicting cache lines from a cache memory includes receiving a request from one of a plurality of processors. The cache memory is configured to store a plurality of cache lines, and a given cache line includes an identifier indicating the processor that performed the most recent access of the given cache line. The method further includes selecting a cache line for eviction from a group of least recently used cache lines, where each cache line of the group occupies a priority position less than a predetermined value, and then evicting the selected cache line.
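
    A sketch of the selection with assumed details: the abstract does not spell out how the last-accessor identifier is used, so this model hypothetically prefers evicting, from the group below the predetermined priority position, a line last touched by a processor other than the requester:

```python
K = 4  # predetermined value: only the K least-recently-used positions qualify

def select_victim(lines, requester):
    """lines are ordered least- to most-recently used; each records which
    processor last accessed it."""
    group = [line for pos, line in enumerate(lines) if pos < K]
    for line in group:
        if line["last_accessor"] != requester:  # assumed tie-break rule
            return line
    return group[0]

lines = [{"tag": t, "last_accessor": p}
         for t, p in [(0xA, 1), (0xB, 1), (0xC, 2), (0xD, 1), (0xE, 3)]]
print(select_victim(lines, requester=1))  # -> tag 0xC, last touched by CPU 2
```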

    L2 cache retention mode

    Publication number: US09513693B2

    Publication date: 2016-12-06

    Application number: US14224773

    Application date: 2014-03-25

    Applicant: Apple Inc.

    Abstract: Systems and methods for reducing leakage power in an L2 cache within a SoC. The L2 cache is partitioned into multiple banks, and each bank has its own separate power supply. An idle counter is maintained for each bank to count the number of cycles during which the bank has been inactive. The temperature and leaky factor of the SoC are used to select an operating point of the SoC. Based on the operating point, an idle counter threshold is set, with a high temperature and high leaky factor corresponding to a relatively low idle counter threshold, and with a low temperature and low leaky factor corresponding to a relatively high idle counter threshold. When a given idle counter exceeds the idle counter threshold, the voltage supplied to the corresponding bank is reduced to a voltage sufficient for retention of data but not for access.
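
    A behavioral sketch of the policy above; the operating-point table and threshold values are invented for illustration. Hot, leaky parts get a low threshold and drop to the retention voltage quickly, while cool, low-leakage parts wait much longer:

```python
def idle_threshold(temperature_c: float, leaky_factor: float) -> int:
    if temperature_c > 80 or leaky_factor > 0.8:
        return 100       # aggressive: leakage savings dominate
    if temperature_c > 50 or leaky_factor > 0.5:
        return 1_000
    return 10_000        # conservative: rarely worth the entry/exit cost

class L2Bank:
    def __init__(self):
        self.idle_cycles = 0
        self.voltage = "active"  # each bank has its own supply

    def tick(self, accessed: bool, threshold: int):
        self.idle_cycles = 0 if accessed else self.idle_cycles + 1
        if accessed:
            self.voltage = "active"     # raised back up before any access
        elif self.idle_cycles > threshold:
            self.voltage = "retention"  # data kept, accesses not allowed

bank, thr = L2Bank(), idle_threshold(temperature_c=85, leaky_factor=0.9)
for _ in range(thr + 1):
    bank.tick(accessed=False, threshold=thr)
print(bank.voltage)  # retention
```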

    Mechanism for sharing private caches in a SoC

    Publication number: US09280471B2

    Publication date: 2016-03-08

    Application number: US14081549

    Application date: 2013-11-15

    Applicant: Apple Inc.

    Abstract: Systems, processors, and methods for sharing an agent's private cache with other agents within a SoC. Many agents in the SoC have a private cache in addition to the shared caches and memory of the SoC. If an agent's processor is shut down or operating at less than full capacity, the agent's private cache can be shared with other agents. When a requesting agent generates a memory request and the memory request misses in the memory cache, the memory cache can allocate the memory request in a separate agent's cache rather than allocating the memory request in the memory cache.
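
    A sketch of the allocation decision above with an assumed interface: on a memory-cache miss, the controller may place the fill in another agent's private cache when that agent's processor is shut down or running below full capacity, instead of allocating in the memory cache itself:

```python
class Agent:
    def __init__(self, name: str, active_fraction: float):
        self.name = name
        self.active_fraction = active_fraction  # 0.0 = shut down, 1.0 = busy
        self.private_cache = {}

def allocate_on_miss(addr, data, agents, memory_cache):
    # Borrow capacity from the first agent that is not fully busy.
    donor = next((a for a in agents if a.active_fraction < 1.0), None)
    if donor is not None:
        donor.private_cache[addr] = data
        return f"allocated in {donor.name}'s private cache"
    memory_cache[addr] = data
    return "allocated in memory cache"

agents = [Agent("gpu", 1.0), Agent("dsp", 0.0)]
print(allocate_on_miss(0x4000, b"line", agents, memory_cache={}))
```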
