Converting victim writeback to a fill
    1.
    Invention Grant
    Converting victim writeback to a fill (In Force)

    Publication Number: US08364907B2

    Publication Date: 2013-01-29

    Application Number: US13359547

    Filing Date: 2012-01-27

    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.

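As a rough illustration of the writeback-to-fill idea in this abstract, the sketch below merges pending stores from a hypothetical memory request buffer into a victim block and retains the merged block as a fill instead of writing it back. The block size, buffer layout, and function names are assumptions for illustration, not the patented design:

```python
# Illustrative model only: a victim writeback that hits pending stores in
# the memory request buffer is converted into a fill. Block size and the
# (address, data) buffer layout are assumptions for this sketch.
BLOCK_SIZE = 64

def handle_victim_writeback(block_addr, block_data, request_buffer):
    """block_data: bytearray of BLOCK_SIZE bytes being evicted.
    request_buffer: list of (addr, data_bytes) pending stores."""
    hits = [(a, d) for a, d in request_buffer
            if block_addr <= a < block_addr + BLOCK_SIZE]
    if not hits:
        return "writeback", block_data      # no overlap: normal writeback
    merged = bytearray(block_data)
    for addr, data in hits:                 # newer store data wins
        off = addr - block_addr
        merged[off:off + len(data)] = data
    return "fill", merged                   # re-install merged block in cache
```

With a store to 0x44 pending, evicting the block at 0x40 returns a "fill" with the store bytes merged into the block, so the data stays in the cache instead of going to memory.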

    Converting victim writeback to a fill
    2.
    Invention Grant
    Converting victim writeback to a fill (In Force)

    Publication Number: US08131946B2

    Publication Date: 2012-03-06

    Application Number: US12908535

    Filing Date: 2010-10-20

    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.


    Multi-stride prefetcher with a recurring prefetch table
    3.
    Invention Grant
    Multi-stride prefetcher with a recurring prefetch table (In Force)

    Publication Number: US07487296B1

    Publication Date: 2009-02-03

    Application Number: US11062266

    Filing Date: 2005-02-17

    CPC classification number: G06F12/0862 G06F9/3455 G06F9/383 G06F2212/6026

    Abstract: A multi-stride prefetcher includes a recurring prefetch table that in turn includes a stream table and an index table. The stream table includes a valid field and a tag field. The stream table also includes a thread number field to help support multi-threaded processor cores. The tag field stores a tag from an address associated with a cache miss. The index table includes fields for storing information characterizing a state machine. The fields include a learning bit. The multi-stride prefetcher prefetches data into a cache for a plurality of streams of cache misses, each stream having a plurality of strides.

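The stream-table organization described above can be approximated in a few lines. The entry fields (valid, tag, thread number, learning bit) follow the abstract, while the tag width, the two-observation confirmation policy, and the prefetch distance are assumptions of this sketch:

```python
# Illustrative sketch of a recurring prefetch table: one entry per
# (tag, thread) stream, with a per-stream stride learner. The confirmation
# policy and tag width are assumptions, not the patented state machine.
class StreamEntry:
    def __init__(self, tag, thread):
        self.valid = True       # valid field from the stream table
        self.tag = tag          # tag field: tag of the missing address
        self.thread = thread    # thread number field (multi-threaded cores)
        self.last_addr = None
        self.stride = None
        self.learning = True    # learning bit: stride not yet confirmed

class MultiStridePrefetcher:
    def __init__(self):
        self.table = {}

    def miss(self, addr, thread=0, tag_bits=12):
        tag = addr >> tag_bits
        entry = self.table.setdefault((tag, thread), StreamEntry(tag, thread))
        prefetch = None
        if entry.last_addr is not None:
            stride = addr - entry.last_addr
            if stride == entry.stride:
                entry.learning = False     # stride seen twice: confirmed
                prefetch = addr + stride   # prefetch one stride ahead
            else:
                entry.stride = stride      # new candidate stride: keep learning
                entry.learning = True
        entry.last_addr = addr
        return prefetch
```

Because entries are keyed by both tag and thread, multiple streams with different strides can be tracked concurrently, as the abstract describes.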

    Processor that eliminates mis-steering instruction fetch resulting from incorrect resolution of mis-speculated branch instructions
    4.
    Invention Grant

    Publication Number: US07076640B2

    Publication Date: 2006-07-11

    Application Number: US10095397

    Filing Date: 2002-03-11

    CPC classification number: G06F9/30058 G06F9/3867

    Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques, including resteering the fetch after the branch instruction retires and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with delay inserted in the branch condition and replay control pathways. For example, for an instruction sequence that includes a load instruction followed by a subtract instruction and then a conditional branch, the processor delays branch resolution to allow time for analysis to determine whether the conditional branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.

    Method and apparatus for reducing register file access times in pipelined processors
    5.
    Invention Grant
    Method and apparatus for reducing register file access times in pipelined processors (In Force)

    Publication Number: US06934830B2

    Publication Date: 2005-08-23

    Application Number: US10259721

    Filing Date: 2002-09-26

    CPC classification number: G06F9/30138 G06F9/3824 G06F9/3857

    Abstract: One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.

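A toy model of the pane-then-file lookup flow in this abstract: a hit serves the operand from the small, fast pane; a miss installs the register into the pane and signals that the instruction must be reissued. The pane capacity, FIFO eviction, and the "reissue" return value are assumptions for illustration:

```python
class RegisterPane:
    """Illustrative model of a small register pane in front of the full
    register file; capacity and replacement policy are assumptions."""
    def __init__(self, regfile, capacity=8):
        self.regfile = regfile   # full architectural register file
        self.pane = {}           # copies of a subset of registers
        self.capacity = capacity

    def read(self, reg):
        if reg in self.pane:
            return ("hit", self.pane[reg])        # fast path: pane hit
        # Miss: fetch from the register file, install into the pane, and
        # signal that the instruction should be reissued so the re-issued
        # instruction reads its operand from the pane.
        if len(self.pane) >= self.capacity:
            self.pane.pop(next(iter(self.pane)))  # evict oldest entry (FIFO)
        self.pane[reg] = self.regfile[reg]
        return ("reissue", None)
```

The first access to a register misses and triggers a reissue; the reissued access then hits the pane, matching the two-step flow in the abstract.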

    Prefetch Unit
    6.
    Invention Application
    Prefetch Unit (In Force)

    Publication Number: US20110264864A1

    Publication Date: 2011-10-27

    Application Number: US13165297

    Filing Date: 2011-06-21

    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software-initiated, via execution by the processor of a dedicated prefetch instruction, or hardware-initiated, via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data into the data cache.

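A minimal sketch of the concurrent-streams idea, with each stream tagged by how it was initiated. The list-of-streams representation and fixed per-stream stride are assumptions of this sketch, not the patented mechanism:

```python
class PrefetchUnit:
    """Illustrative: concurrently maintained prefetch streams, each started
    either by a dedicated prefetch instruction (software) or by a detected
    data cache miss (hardware)."""
    def __init__(self):
        self.streams = []   # each stream: [origin, next_addr, stride]

    def start_stream(self, addr, stride, origin):
        assert origin in ("software", "hardware")
        self.streams.append([origin, addr, stride])

    def generate_requests(self):
        reqs = []
        for stream in self.streams:   # one request per active stream
            reqs.append(stream[1])
            stream[1] += stream[2]    # advance the stream by its stride
        return reqs
```

Each call to `generate_requests` issues one prefetch per active stream and advances every stream, so software- and hardware-initiated streams progress concurrently.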

    Data Cache Block Zero Implementation
    7.
    Invention Application
    Data Cache Block Zero Implementation (In Force)

    Publication Number: US20100106916A1

    Publication Date: 2010-04-29

    Application Number: US12650075

    Filing Date: 2009-12-30

    Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.

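In caricature, the speculative/non-speculative distinction in this abstract reduces to choosing the interconnect transaction. The sketch below models peer caches as dicts; the transaction names follow the example in the abstract, and everything else is an assumption:

```python
def dcbz_request(addr, speculative, peer_caches):
    """Illustrative only: choose the interconnect transaction for a data
    cache block zero request. peer_caches: list of dicts (addr -> line)."""
    if speculative:
        return "probe"           # request may be cancelled: keep peer copies
    for cache in peer_caches:    # committed zeroing: peer copies are stale
        cache.pop(addr, None)
    return "invalidate"
```

A speculative request only probes, leaving peer copies intact in case the request is cancelled; a non-speculative request invalidates them outright.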

    Prefetch unit
    8.
    Invention Grant
    Prefetch unit (In Force)

    Publication Number: US07493451B2

    Publication Date: 2009-02-17

    Application Number: US11453708

    Filing Date: 2006-06-15

    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software-initiated, via execution by the processor of a dedicated prefetch instruction, or hardware-initiated, via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data into the data cache.


    Partial load/store forward prediction
    9.
    Invention Application
    Partial load/store forward prediction (In Force)

    Publication Number: US20070038846A1

    Publication Date: 2007-02-15

    Application Number: US11200744

    Filing Date: 2005-08-10

    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.

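One plausible shape for the predictor described above: remember load PCs that previously hit a PSTLF event, and crack future instances of those loads into several narrower load operations so each can forward from the store queue independently. The byte granularity and the simple PC set are assumptions of this sketch:

```python
class PSTLFPredictor:
    """Illustrative sketch of partial store-to-load-forward prediction.
    The split granularity (one op per byte) is an assumption."""
    def __init__(self):
        self.pstlf_pcs = set()   # load PCs that experienced a PSTLF event

    def num_load_ops(self, pc, size):
        # Predicted-PSTLF loads are cracked into byte-sized load ops so
        # each byte can forward (or not) from an uncommitted store.
        return size if pc in self.pstlf_pcs else 1

    def train(self, pc):
        self.pstlf_pcs.add(pc)   # record a load that saw a PSTLF event
```

This mirrors the abstract's key point: the number of load operations generated for a load instruction depends on the prediction.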

    Method and apparatus for reducing the effects of hot spots in cache memories
    10.
    Invention Grant
    Method and apparatus for reducing the effects of hot spots in cache memories (In Force)

    Publication Number: US06948032B2

    Publication Date: 2005-09-20

    Application Number: US10354327

    Filing Date: 2003-01-29

    CPC classification number: G06F12/0897

    Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.

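The read path described above can be sketched as two lookups: a hot-spot hit moves the line back into the main cache and evicts it from the hot-spot cache. Real hardware probes both structures in parallel and has finite capacities, which this toy model ignores:

```python
class HotSpotCache:
    """Illustrative model of the read path: main cache plus a hot-spot
    cache holding lines evicted from hot sets. Capacities are ignored."""
    def __init__(self):
        self.cache = {}       # main cache: addr -> line
        self.hot_spot = {}    # lines evicted from hot spots in the cache

    def read(self, addr):
        if addr in self.cache:
            return self.cache[addr]
        if addr in self.hot_spot:
            line = self.hot_spot.pop(addr)   # evict from hot-spot cache
            self.cache[addr] = line          # write line back into the cache
            return line
        return None                          # miss in both: go to memory
```

After a hot-spot hit, subsequent reads of the same line are served by the main cache, which is the behavior the abstract describes.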
