Processor performance improvement for instruction sequences that include barrier instructions
    2.
    发明授权
    Processor performance improvement for instruction sequences that include barrier instructions 有权
    包括屏障指令的指令序列的处理器性能改善

    公开(公告)号:US08935513B2

    公开(公告)日:2015-01-13

    申请号:US13369029

    申请日:2012-02-08

    IPC分类号: G06F9/52

    摘要: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.

    摘要翻译: 一种用于处理指示序列的技术,该指令序列包括屏障指令,屏障指令之前的加载指令,以及跟随障碍指令之后的随后存储器访问指令,包括:基于接收到最早的良好组合响应来确定加载指令是否被解决 用于与加载指令相对应的读取操作和用于加载指令的数据。 该技术还包括如果在完成屏障指令之前没有启动后续存储器访问指令的执行,则响应于确定完成的屏障指令启动后续存储器访问指令的执行。 该技术还包括如果在完成屏障指令之前启动后续存储器访问指令的执行,则响应于确定所完成的屏障指令而中断,跟踪关于无效的后续存储器访问指令。

    PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS
    3.
    发明申请
    PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS 有权
    包括障碍指示的指令序列的处理器性能改进

    公开(公告)号:US20130205120A1

    公开(公告)日:2013-08-08

    申请号:US13369029

    申请日:2012-02-08

    IPC分类号: G06F9/312

    摘要: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.

    摘要翻译: 一种用于处理指示序列的技术,该指令序列包括屏障指令,屏障指令之前的加载指令,以及跟随障碍指令之后的随后存储器访问指令,包括:基于接收到最早的良好组合响应来确定加载指令是否被解决 用于与加载指令相对应的读取操作和用于加载指令的数据。 该技术还包括如果在完成屏障指令之前没有启动后续存储器访问指令的执行,则响应于确定完成的屏障指令启动后续存储器访问指令的执行。 该技术还包括如果在完成屏障指令之前启动后续存储器访问指令的执行,则响应于确定所完成的屏障指令而中断,跟踪关于无效的后续存储器访问指令。

    Processor, data processing system and method supporting a shared global coherency state
    4.
    发明授权
    Processor, data processing system and method supporting a shared global coherency state 失效
    处理器,数据处理系统和支持共享全局一致性状态的方法

    公开(公告)号:US08495308B2

    公开(公告)日:2013-07-23

    申请号:US11539694

    申请日:2006-10-09

    IPC分类号: G06F12/00

    CPC分类号: G06F12/0831 G06F12/0817

    摘要: A multiprocessor data processing system includes at least first and second coherency domains, where the first coherency domain includes a system memory and a cache memory. According to a method of data processing, a cache line is buffered in a data array of the cache memory and a state field in a cache directory of the cache memory is set to a coherency state to indicate that the cache line is valid in the data array, that the cache line is held in the cache memory non-exclusively, and that another cache in said second coherency domain may hold a copy of the cache line.

    摘要翻译: 多处理器数据处理系统至少包括第一和第二相干域,其中第一相干域包括系统存储器和高速缓冲存储器。 根据数据处理的方法,将高速缓存行缓冲在高速缓冲存储器的数据阵列中,高速缓冲存储器的高速缓存目录中的状态字段被设置为一致性状态,以指示高速缓存行在数据中是有效的 数组,高速缓存存储器行被非排他地保存在高速缓冲存储器中,并且所述第二相干域中的另一个高速缓冲存储器可以保存高速缓存行的副本。

    Mode-based castout destination selection
    5.
    发明授权
    Mode-based castout destination selection 失效
    基于模式的castout目的地选择

    公开(公告)号:US08312220B2

    公开(公告)日:2012-11-13

    申请号:US12420933

    申请日:2009-04-09

    IPC分类号: G06F12/08

    CPC分类号: G06F12/0811 G06F12/12

    摘要: In response to a data request of a first of a plurality of processing units, the first processing unit selects a victim cache line to be castout from the lower level cache of the first processing unit and determines whether a mode is set. If not, the first processing unit issues on the interconnect fabric an LCO command identifying the victim cache line and indicating that a lower level cache is the intended destination. If the mode is set, the first processing unit issues a castout command with an alternative intended destination. In response to a coherence response to the LCO command indicating success of the LCO command, the first processing unit removes the victim cache line from its lower level cache, and the victim cache line is held elsewhere in the data processing system. The mode can be set to inhibit castouts to system memory, for example, for testing.

    摘要翻译: 响应于多个处理单元中的第一处理单元的数据请求,第一处理单元从第一处理单元的较低级高速缓存中选择要丢弃的牺牲高速缓存行,并且确定是否设置了模式。 如果不是,则第一处理单元在互连结构上发出识别受害者高速缓存行的LCO命令,并指示较低级别的高速缓存是预期的目的地。 如果模式被设置,则第一处理单元发出具有替代预定目的地的停顿命令。 响应于指示LCO命令成功的LCO命令的一致性响应,第一处理单元从其较低级高速缓存中去除受害者高速缓存行,并且将受害者高速缓存行保持在数据处理系统的其他地方。 该模式可以设置为抑制系统内存的丢弃,例如进行测试。

    Victim cache prefetching
    6.
    发明授权
    Victim cache prefetching 失效
    受害者缓存预取

    公开(公告)号:US08209489B2

    公开(公告)日:2012-06-26

    申请号:US12256064

    申请日:2008-10-22

    IPC分类号: G06F12/08

    摘要: A processing unit for a multiprocessor data processing system includes a processor core and a cache hierarchy coupled to the processor core to provide low latency data access. The cache hierarchy includes an upper level cache coupled to the processor core and a lower level victim cache coupled to the upper level cache. In response to a prefetch request of the processor core that misses in the upper level cache, the lower level victim cache determines whether the prefetch request misses in the directory of the lower level victim cache and, if so, allocates a state machine in the lower level victim cache that services the prefetch request by issuing the prefetch request to at least one other processing unit of the multiprocessor data processing system.

    摘要翻译: 用于多处理器数据处理系统的处理单元包括处理器核心和耦合到处理器核心的高速缓存层级以提供低延迟数据访问。 高速缓存层级包括耦合到处理器核心的高级缓存和耦合到高级缓存的较低级别的牺牲缓存。 响应于在高级缓存中丢失的处理器核心的预取请求,较低级别的受害者缓存确定预取请求是否丢失在较低级别的受害者缓存的目录中,并且如果是,则在下级缓存中分配状态机 通过向多处理器数据处理系统的至少一个其他处理单元发出预取请求来服务于预取请求。

    Fault tolerant encoding of directory states for stuck bits
    7.
    发明授权
    Fault tolerant encoding of directory states for stuck bits 有权
    卡位的目录状态的容错编码

    公开(公告)号:US08205136B2

    公开(公告)日:2012-06-19

    申请号:US12189808

    申请日:2008-08-12

    IPC分类号: G11C29/00

    CPC分类号: G11C29/832 G06F11/1064

    摘要: A method of handling a stuck bit in a directory of a cache memory, by defining multiple binary encodings to indicate a defective cache state, detecting an error in a tag stored in a member of the directory (wherein the tag at least includes an address field, a state field and an error-correction field), determining that the error is associated with a stuck bit of the directory member, and writing new state information to the directory member which is selected from one of the binary encodings based on a field location of the stuck bit within the directory member. The multiple binary encodings may include a first binary encoding when the stuck bit is in the address field, a second binary encoding when the stuck bit is in the state field, and a third binary encoding when the stuck bit is in the error-correction field. The new state information may also further be selected based on the value of the stuck bit, e.g., a state bit corresponding to the stuck bit is assigned a bit value from the new state information which matches the value of the stuck bit.

    摘要翻译: 一种通过定义多个二进制编码来指示缺陷高速缓存状态来处理高速缓冲存储器的目录中的卡住位的方法,检测存储在目录成员中的标签中的错误(其中标签至少包括地址字段 ,状态字段和纠错字段),确定错误与目录成员的卡住位相关联,并且基于字段位置将新状态信息写入从二进制编码之一中选择的目录成员 的目录成员中的卡住位。 多个二进制编码可以包括当卡住位在地址字段中时的第一二进制编码,当卡位位于状态字段时的第二二进制编码,以及当卡位位于错误校正字段中时的第三二进制编码 。 还可以基于卡住位的值进一步选择新的状态信息,例如,对应于该卡住位的状态位从与该卡位的值匹配的新状态信息中分配一位值。

    Issuing global shared memory operations via direct cache injection to a host fabric interface
    8.
    发明授权
    Issuing global shared memory operations via direct cache injection to a host fabric interface 有权
    通过直接缓存注入向主机结构接口发出全局共享内存操作

    公开(公告)号:US07966454B2

    公开(公告)日:2011-06-21

    申请号:US12024437

    申请日:2008-02-01

    IPC分类号: G06F9/318

    摘要: A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping.

    摘要翻译: 数据处理系统通过物理内存的分布式EA-to-RA映射实现跨多个节点的全局共享存储(GSM)操作。 每个节点都有一个主机结构接口(HFI),它包括分配给并行作业最多一个本地执行任务的HFI窗口。 任务执行并行作业执行,但将全局地址空间的有效地址(EA)的一部分映射到任务相应节点的本地实际存储器。 HFI窗口使用作业ID对所有传出的GSM操作(本地任务)进行标记,并嵌入EA被映射到的节点的目标节点和HFI窗口ID。 HFI窗口还能够利用归属于接收节点的本地实际存储器的有效EA来处理接收的GSM操作,同时防止在没有有效的EA到RA本地映射的情况下处理其他接收到的操作。

    MEMORY COHERENCE DIRECTORY SUPPORTING REMOTELY SOURCED REQUESTS OF NODAL SCOPE
    9.
    发明申请
    MEMORY COHERENCE DIRECTORY SUPPORTING REMOTELY SOURCED REQUESTS OF NODAL SCOPE 失效
    记忆协调指导原则支持远程请求的NODAL范围

    公开(公告)号:US20110047352A1

    公开(公告)日:2011-02-24

    申请号:US12545246

    申请日:2009-08-21

    IPC分类号: G06F15/76 G06F9/02 G06F12/08

    CPC分类号: G06F12/0817

    摘要: A data processing system includes at least a first through third processing nodes coupled by an interconnect fabric. The first processing node includes a master, a plurality of snoopers capable of participating in interconnect operations, and a node interface that receives a request of the master and transmits the request of the master to the second processing unit with a nodal scope of transmission limited to the second processing node. The second processing node includes a node interface having a directory. The node interface of the second processing node permits the request to proceed with the nodal scope of transmission if the directory does not indicate that a target memory block of the request is cached other than in the second processing node and prevents the request from succeeding if the directory indicates that the target memory block of the request is cached other than in the second processing node.

    摘要翻译: 数据处理系统至少包括通过互连结构耦合的第一至第三处理节点。 第一处理节点包括主机,能够参与互连操作的多个侦听器,以及接收主机请求的节点接口,并将主机的请求传送到第二处理单元,传送范围限于 第二处理节点。 第二处理节点包括具有目录的节点接口。 第二处理节点的节点接口允许请求继续进行节点传输范围,如果该目录没有指示该请求的目标存储器块不是在第二处理节点中被缓存,并且如果该请求成功 目录指示除第二处理节点之外的请求的目标存储块被缓存。

    Apparatus and computer program product in a processor for performing in-memory tracing using existing communication paths
    10.
    发明授权
    Apparatus and computer program product in a processor for performing in-memory tracing using existing communication paths 失效
    处理器中的设备和计算机程序产品,用于使用现有通信路径执行内存中跟踪

    公开(公告)号:US07844860B2

    公开(公告)日:2010-11-30

    申请号:US12201557

    申请日:2008-08-29

    IPC分类号: G06F11/00

    CPC分类号: G06F11/2268 G06F11/348

    摘要: An apparatus and computer program product are disclosed for performing in-memory hardware tracing in a processor using an existing system bus. The processor includes multiple processing units that are coupled together utilizing the system bus. The processing units include a memory controller that controls a system memory. Information is transmitted among the processing units utilizing the system bus. The information is formatted according to a standard system bus protocol. Hardware trace data is captured utilizing a hardware trace facility that is coupled directly to the system bus. The system bus is utilized for transmitting the hardware trace data to the memory controller for storage in the system memory. The memory controller is coupled directly to the system bus. The hardware trace data is formatted according to the standard system bus protocol for transmission via the system bus.

    摘要翻译: 公开了一种用于使用现有系统总线在处理器中执行存储器内硬件跟踪的装置和计算机程序产品。 处理器包括利用系统总线耦合在一起的多个处理单元。 处理单元包括控制系统存储器的存储器控​​制器。 利用系统总线在处理单元之间传送信息。 信息根据标准系统总线协议进行格式化。 使用直接耦合到系统总线的硬件跟踪功能来捕获硬件跟踪数据。 系统总线用于将硬件跟踪数据发送到存储器控制器以存储在系统存储器中。 存储器控制器直接耦合到系统总线。 硬件跟踪数据根据标准系统总线协议进行格式化,以便通过系统总线进行传输。