Least recently used mechanism for cache line eviction from a cache memory
    1.
    Invention grant
    Least recently used mechanism for cache line eviction from a cache memory (In force)

    Publication number: US09563575B2

    Publication date: 2017-02-07

    Application number: US14929645

    Filing date: 2015-11-02

    Applicant: Apple Inc.

    Abstract: A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.
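    The three-tier selection described in the abstract can be sketched in Python. `CacheLine`, its fields, and the set layout are illustrative names for this sketch, not language from the patent claims:

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    valid: bool
    inclusive: bool  # also present in a higher-level cache such as the L1
    lru_age: int     # larger value = less recently used

def select_victim(cache_set):
    """Pick a victim in three tiers: invalid lines first, then valid
    lines not duplicated in the higher-level cache, then any line."""
    def oldest(lines):
        return max(lines, key=lambda line: line.lru_age)

    invalid = [line for line in cache_set if not line.valid]
    if invalid:
        return oldest(invalid)

    non_inclusive = [line for line in cache_set if not line.inclusive]
    if non_inclusive:
        return oldest(non_inclusive)

    return oldest(cache_set)
```

    Each tier falls through to the next only when the earlier group is empty, so an invalid line is always preferred over disturbing a line that the L1 still holds.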


    Delaying cache data array updates
    2.
    Invention grant
    Delaying cache data array updates (In force)

    Publication number: US09229866B2

    Publication date: 2016-01-05

    Application number: US14089014

    Filing date: 2013-11-25

    Applicant: Apple Inc.

    CPC classification number: G06F12/0811 G06F12/0842 G06F12/0857 G06F12/0888

    Abstract: Systems, methods, and apparatuses for reducing writes to the data array of a cache. A cache hierarchy includes one or more L1 caches and an L2 cache inclusive of the L1 cache(s). When a request from the L1 cache misses in the L2 cache, the L2 cache sends a fill request to memory. When the fill data returns from memory, the L2 cache delays writing the fill data to its data array. Instead, this cache line is written to the L1 cache and a clean-evict bit corresponding to the cache line is set in the L1 cache. When the L1 cache evicts this cache line, the L1 cache will write back the cache line to the L2 cache even if the cache line has not been modified.
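    A minimal software model of the delayed-fill scheme: the L2 skips writing fill data to its data array and instead marks the line in the L1 with a clean-evict bit so the data returns on eviction. All class and method names are invented for this sketch; the patent describes hardware, not software:

```python
MEMORY = {}  # stand-in backing memory for the sketch

def memory_read(addr):
    return MEMORY.get(addr, 0)

class L1Cache:
    def __init__(self, l2):
        self.lines = {}  # addr -> (data, dirty, clean_evict)
        self.l2 = l2

    def fill(self, addr, data, clean_evict):
        self.lines[addr] = (data, False, clean_evict)

    def evict(self, addr):
        data, dirty, clean_evict = self.lines.pop(addr)
        # Write back even a clean line if the L2 never stored the fill.
        if dirty or clean_evict:
            self.l2.write_data_array(addr, data)

class L2Cache:
    def __init__(self):
        self.data_array = {}  # addr -> data
        self.writes = 0       # count of data-array writes (to be reduced)

    def write_data_array(self, addr, data):
        self.data_array[addr] = data
        self.writes += 1

    def handle_l1_miss(self, l1, addr):
        if addr in self.data_array:  # L2 hit
            l1.fill(addr, self.data_array[addr], clean_evict=False)
        else:                        # L2 miss: fill from memory
            data = memory_read(addr)
            # Delay the L2 data-array write: send the fill straight to
            # the L1 and set the clean-evict bit so the line returns.
            l1.fill(addr, data, clean_evict=True)
```

    The L2 data array thus sees one write per line (at L1 eviction) instead of two (at fill and at dirty writeback).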


    SELECTIVE VICTIMIZATION IN A MULTI-LEVEL CACHE HIERARCHY
    3.
    Invention application
    SELECTIVE VICTIMIZATION IN A MULTI-LEVEL CACHE HIERARCHY (In force)

    Publication number: US20150149721A1

    Publication date: 2015-05-28

    Application number: US14088980

    Filing date: 2013-11-25

    Applicant: Apple Inc.

    Abstract: Systems, methods, and apparatuses for implementing selective victimization to reduce power and utilized bandwidth in a multi-level cache hierarchy. Each set of an upper-level cache includes a counter that keeps track of the number of times the set was accessed. These counters are periodically decremented by another counter that tracks the total number of accesses to the cache. If a given set counter is below a certain threshold value, clean victims are dropped from this given set instead of being sent to a lower-level cache. Also, a separate counter is used to track the total number of outstanding requests for the cache as a proxy for bus-bandwidth in order to gauge the total amount of traffic in the system. The cache will implement selective victimization whenever there is a large amount of traffic in the system.
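    The per-set access counters and the traffic gate can be modeled roughly as below. The threshold values, decay period, and class name are invented for illustration:

```python
class SelectiveVictimization:
    def __init__(self, num_sets, set_threshold=4, traffic_threshold=16):
        self.set_counters = [0] * num_sets  # per-set access counts
        self.total_accesses = 0
        self.outstanding_requests = 0       # proxy for bus bandwidth
        self.set_threshold = set_threshold
        self.traffic_threshold = traffic_threshold

    def record_access(self, set_index):
        self.set_counters[set_index] += 1
        self.total_accesses += 1
        # Periodically decay all per-set counters, driven by the
        # counter tracking total accesses to the cache.
        if self.total_accesses % 64 == 0:
            self.set_counters = [max(0, c - 1) for c in self.set_counters]

    def drop_clean_victim(self, set_index):
        """True if a clean victim from this set should be dropped
        rather than sent down to the lower-level cache."""
        busy_system = self.outstanding_requests > self.traffic_threshold
        cold_set = self.set_counters[set_index] < self.set_threshold
        return busy_system and cold_set
```

    Dropping clean victims from cold sets only while traffic is high preserves lower-level allocation for hot sets when bandwidth is plentiful.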


    Methods for cache line eviction
    4.
    Invention grant
    Methods for cache line eviction (In force)

    Publication number: US09529730B2

    Publication date: 2016-12-27

    Application number: US14263386

    Filing date: 2014-04-28

    Applicant: Apple Inc.

    Abstract: A method and apparatus for evicting cache lines from a cache memory includes receiving a request from one of a plurality of processors. The cache memory is configured to store a plurality of cache lines, and a given cache line includes an identifier indicating a processor that performed a most recent access of the given cache line. The method further includes selecting a cache line for eviction from a group of least recently used cache lines, where each cache line of the group of least recently used cache lines occupies a priority position less than a predetermined value, and then evicting the selected cache line.
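    A hedged reading of the selection step: lines below a predetermined LRU priority position form the candidate group, and the stored processor identifier is one plausible tie-breaker within that group (the abstract records the identifier but does not spell out its use). All names and the tie-break policy are illustrative:

```python
from dataclasses import dataclass

@dataclass
class TrackedLine:
    tag: int
    priority: int  # LRU priority position; lower = older
    last_cpu: int  # processor that most recently accessed the line

def select_eviction(lines, requesting_cpu, threshold=2):
    # Candidate group: lines whose priority position is below the
    # predetermined value.
    group = [l for l in lines if l.priority < threshold]
    # Illustrative policy: prefer a candidate not last touched by the
    # requesting processor; fall back to the oldest candidate.
    other = [l for l in group if l.last_cpu != requesting_cpu]
    pool = other or group
    return min(pool, key=lambda l: l.priority)
```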


    SELECTIVE CACHE WAY-GROUP POWER DOWN
    5.
    Invention application
    SELECTIVE CACHE WAY-GROUP POWER DOWN (In force)

    Publication number: US20150309939A1

    Publication date: 2015-10-29

    Application number: US14263369

    Filing date: 2014-04-28

    Applicant: Apple Inc.

    CPC classification number: G06F12/0895 G06F2212/1028 Y02D10/13

    Abstract: A method and apparatus for selectively powering down a portion of a cache memory includes determining a power down condition dependent upon a number of accesses to the cache memory. In response to the detection of the power down condition, selecting a group of cache ways included in the cache memory dependent upon a number of cache lines in each cache way that are also included in another cache memory. The method further includes locking and flushing the selected group of cache ways, and then activating a low power mode for the selected group of cache ways.
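    The way-group selection step might look like the following sketch: pick the group of ways holding the fewest lines that are also resident in the other cache, so locking and flushing it disturbs the duplicated copies least. The function name, the idle threshold, and the data layout are assumptions:

```python
def choose_waygroup_to_power_down(way_groups, duplicated_line_counts,
                                  access_count, idle_threshold=100):
    """way_groups: list of way-index tuples; duplicated_line_counts:
    per-group count of lines also held in another cache memory."""
    # Power-down condition depends on the number of cache accesses:
    # only a lightly used cache qualifies.
    if access_count >= idle_threshold:
        return None
    # Select the group whose lines overlap least with the other cache.
    best = min(range(len(way_groups)),
               key=lambda g: duplicated_line_counts[g])
    return way_groups[best]
```

    The chosen group would then be locked against new allocations, flushed, and placed in the low-power mode, per the abstract.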


    COMPLETING LOAD AND STORE INSTRUCTIONS IN A WEAKLY-ORDERED MEMORY MODEL
    6.
    Invention application
    COMPLETING LOAD AND STORE INSTRUCTIONS IN A WEAKLY-ORDERED MEMORY MODEL (In force)

    Publication number: US20140215190A1

    Publication date: 2014-07-31

    Application number: US13750942

    Filing date: 2013-01-25

    Applicant: APPLE INC.

    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.
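    A simplified software model of the queue-information idea: each load records the store-queue positions of the stores older than it at dispatch, and at completion the load checks only those stores for an address conflict before leaving the load queue, while the stores themselves remain. Structure and method names are invented; the wrap value mentioned in the abstract is omitted for brevity:

```python
class LoadStoreUnit:
    def __init__(self):
        self.store_queue = []   # list of (sq_index, addr) of pending stores
        self.next_sq_index = 0

    def dispatch_store(self, addr):
        self.store_queue.append((self.next_sq_index, addr))
        self.next_sq_index += 1

    def dispatch_load(self, addr):
        # Queue information: indices of all stores older than this load.
        older = [idx for idx, _ in self.store_queue]
        return {"addr": addr, "older_stores": older}

    def can_complete_load(self, load):
        # The load may leave the load queue if no older, still-pending
        # store writes the same address; conflicting stores stay queued.
        return not any(idx in load["older_stores"] and addr == load["addr"]
                       for idx, addr in self.store_queue)
```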


    Completing load and store instructions in a weakly-ordered memory model
    7.
    Invention grant
    Completing load and store instructions in a weakly-ordered memory model (In force)

    Publication number: US09535695B2

    Publication date: 2017-01-03

    Application number: US13750942

    Filing date: 2013-01-25

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.


    Prefetching across page boundaries in hierarchically cached processors
    8.
    Invention grant
    Prefetching across page boundaries in hierarchically cached processors (In force)

    Publication number: US09047198B2

    Publication date: 2015-06-02

    Application number: US13689696

    Filing date: 2012-11-29

    Applicant: Apple Inc.

    Abstract: Processors and methods for preventing lower level prefetch units from stalling at page boundaries. An upper level prefetch unit closest to the processor core issues a preemptive request for a translation of the next page in a given prefetch stream. The upper level prefetch unit sends the translation to the lower level prefetch units prior to the lower level prefetch units reaching the end of the current page for the given prefetch stream. When the lower level prefetch units reach the boundary of the current page, instead of stopping, these prefetch units can continue to prefetch by jumping to the next physical page number provided in the translation.
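    The cross-page handoff can be sketched as follows: the upper-level prefetch unit requests the next page's translation early and passes the physical page number down, so a lower-level unit jumps across the boundary instead of stalling. The translation callback and the 4 KiB page size are assumptions for the sketch:

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def next_prefetch_addr(current_addr, stride, translate_next_page):
    """Advance the prefetch stream; on a page crossing, use the
    preemptively fetched translation instead of stopping."""
    next_addr = current_addr + stride
    if next_addr // PAGE_SIZE != current_addr // PAGE_SIZE:
        # Crossed the page boundary: jump to the next physical page
        # number supplied ahead of time by the upper-level prefetch unit.
        next_phys_page = translate_next_page(current_addr // PAGE_SIZE + 1)
        return next_phys_page * PAGE_SIZE + next_addr % PAGE_SIZE
    return next_addr
```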


    Access map-pattern match based prefetch unit for a processor
    9.
    Invention grant
    Access map-pattern match based prefetch unit for a processor (In force)

    Publication number: US09015422B2

    Publication date: 2015-04-21

    Application number: US13942780

    Filing date: 2013-07-16

    Applicant: Apple Inc.

    CPC classification number: G06F12/0862 G06F2212/6026 Y02D10/13

    Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetcher in which patterns may include wild cards for some cache blocks. The wild card may match any access for the corresponding cache block (e.g. no access, demand access, prefetch, successful prefetch, etc.). Furthermore, patterns with irregular strides and/or irregular access patterns may be included in the matching patterns and may be detected for prefetch generation. In an embodiment, the AMPM prefetcher may implement a chained access map for large streaming prefetches. If a stream is detected, the AMPM prefetcher may allocate a pair of map entries for the stream and may reuse the pair for subsequent access map regions within the stream. In some embodiments, a quality factor may be associated with each access map and may control the rate of prefetch generation.
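    A minimal sketch of wild-card pattern matching and quality-factor throttling: a pattern of per-block states is compared against a recorded access map, with `*` matching any state. The state symbols and the throttling rule are invented for illustration:

```python
def pattern_matches(access_map, pattern):
    """access_map/pattern: sequences of per-cache-block states, e.g.
    'A' = accessed, '.' = not accessed, '*' = wild card (pattern only)."""
    return len(access_map) == len(pattern) and all(
        p == '*' or p == a for a, p in zip(access_map, pattern))

def generate_prefetches(access_map, patterns, quality_factor=1.0):
    """Return prefetch offsets for matching patterns, throttled by a
    per-access-map quality factor in [0, 1]."""
    hits = [off for pat, off in patterns if pattern_matches(access_map, pat)]
    # The quality factor controls the rate of prefetch generation.
    return hits[:max(0, int(len(hits) * quality_factor))]
```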


    Selective victimization in a multi-level cache hierarchy
    10.
    Invention grant
    Selective victimization in a multi-level cache hierarchy (In force)

    Publication number: US09298620B2

    Publication date: 2016-03-29

    Application number: US14088980

    Filing date: 2013-11-25

    Applicant: Apple Inc.

    Abstract: Systems, methods, and apparatuses for implementing selective victimization to reduce power and utilized bandwidth in a multi-level cache hierarchy. Each set of an upper-level cache includes a counter that keeps track of the number of times the set was accessed. These counters are periodically decremented by another counter that tracks the total number of accesses to the cache. If a given set counter is below a certain threshold value, clean victims are dropped from this given set instead of being sent to a lower-level cache. Also, a separate counter is used to track the total number of outstanding requests for the cache as a proxy for bus-bandwidth in order to gauge the total amount of traffic in the system. The cache will implement selective victimization whenever there is a large amount of traffic in the system.

