In-Data Path Tracking of Floating Point Exceptions and Store-Based Exception Indication
    51.
    Patent application
    Status: Under examination (published)

    Publication No.: US20110047358A1

    Publication date: 2011-02-24

    Application No.: US12543614

    Filing date: 2009-08-19

    IPC classification: G06F9/30 G06F9/302

    Abstract: Mechanisms are provided for tracking exceptions in the execution of vectorized code. A speculative instruction is executed on a vector element of a vector. An exception condition is detected in association with the vector element based on a result of executing the speculative instruction on the vector element. A special exception value, indicative of the exception condition, is stored in the vector element in a vector register corresponding to the vector, without invoking an exception handler for the exception condition. The special exception value is propagated with the vector element of the vector through a processor architecture of the processor, without invoking the exception handler for the exception condition. An exception corresponding to the exception condition indicated by the special exception value is generated only in response to a non-speculative instruction being executed that performs a non-speculative operation on the vector element.
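
The deferred-exception idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: a NaN stands in for the "special exception value", and the function names `spec_div`, `spec_add`, and `store` are hypothetical, not taken from the patent.

```python
import math

# Assumption: a NaN plays the role of the in-data "special exception value".
EXC = float("nan")

def spec_div(a, b):
    """Speculative element-wise divide: record the exception in the data, never raise."""
    return [x / y if y != 0 else EXC for x, y in zip(a, b)]

def spec_add(a, b):
    """Speculative add: a NaN operand yields NaN, so the marker propagates."""
    return [x + y for x, y in zip(a, b)]

def store(memory, base, vec, mask):
    """Non-speculative store: only masked-in elements can raise the deferred exception."""
    selected = [i for i, m in enumerate(mask) if m]
    for i in selected:
        if math.isnan(vec[i]):
            raise FloatingPointError(f"deferred exception in element {i}")
    for i in selected:
        memory[base + i] = vec[i]
```

In this model an element that divided by zero causes no trouble as long as no store selects it; the exception surfaces only when a store commits that element. Treating every NaN as an exception marker is a simplification of the sketch.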

    Optimizing layout of an application on a massively parallel supercomputer
    52.
    Patent application
    Status: Lapsed

    Publication No.: US20060101104A1

    Publication date: 2006-05-11

    Application No.: US10963101

    Filing date: 2004-10-12

    IPC classification: G06F1/16

    CPC classification: G06F9/5066

    Abstract: A general computer-implemented method and apparatus to optimize problem layout on a massively parallel supercomputer is described. The method takes as input the communication matrix of an arbitrary problem in the form of an array whose entries C(i, j) are the amount of data communicated from domain i to domain j. Given C(i, j), a heuristic map is first implemented which attempts sequentially to map a domain and its communication neighbors either to the same supercomputer node or to near-neighbor nodes on the supercomputer torus, while keeping the number of domains mapped to a supercomputer node constant (as much as possible). Next, a Markov chain of maps is generated from the initial map using Monte Carlo simulation with free energy (cost function) F = Σ_{i,j} C(i, j) H(i, j), where H(i, j) is the smallest number of hops on the supercomputer torus between domain i and domain j. On the cases tested, the method was found to produce good mappings, and it has the potential to be used as a general layout optimization tool for parallel codes. At the moment, the serial code implemented to test the method is unoptimized, so the computation time to find the optimum map can be several hours on a typical PC. A production implementation would require good parallel code for the algorithm, which could itself be run on a supercomputer.
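
The cost function and the Monte Carlo step described in this abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the swap move, the fixed `temperature`, and the names `torus_hops`, `free_energy`, and `anneal` are assumptions, and the heuristic initial map is replaced by whatever placement the caller supplies.

```python
import math
import random

def torus_hops(a, b, dims):
    """Smallest hop count H between node coordinates a and b on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def free_energy(C, placement, dims):
    """F = sum over i, j of C[i][j] * H(i, j); placement[i] is the node of domain i."""
    n = len(C)
    return sum(C[i][j] * torus_hops(placement[i], placement[j], dims)
               for i in range(n) for j in range(n))

def anneal(C, placement, dims, steps=2000, temperature=1.0, seed=0):
    """Metropolis walk over maps: swap two domains, accept with min(1, exp(-dF/T))."""
    rng = random.Random(seed)
    current = list(placement)
    f = free_energy(C, current, dims)
    best, best_f = list(current), f
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        f_new = free_energy(C, current, dims)
        if f_new <= f or rng.random() < math.exp((f - f_new) / temperature):
            f = f_new
            if f < best_f:
                best, best_f = list(current), f
        else:
            current[i], current[j] = current[j], current[i]  # rejected: undo the swap
    return best, best_f
```

Swapping two domains leaves the number of domains per node unchanged, which matches the abstract's constraint of keeping that number constant. Recomputing F from scratch on every step is the simplest choice and is only practical for small problems.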

    Cache directory lookup reader set encoding for partial cache line speculation support
    54.
    Granted patent
    Status: In force

    Publication No.: US08868837B2

    Publication date: 2014-10-21

    Application No.: US13008602

    Filing date: 2011-01-18

    CPC classification: G06F9/524 G06F12/08

    Abstract: In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.
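
One way to picture a reader set entry that records the boundaries of read accesses is sketched below. The abstract does not give the actual encoding, so the single-reader entry, the byte-offset bounds, and the name `ReaderRange` are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ReaderRange:
    """Hypothetical reader-set entry: one reader plus the bounds of what it read."""
    thread_id: int
    lo: int  # first byte offset read within the cache line
    hi: int  # one past the last byte offset read

    def record_read(self, offset, length):
        """Widen the recorded bounds to cover a new read access."""
        self.lo = min(self.lo, offset)
        self.hi = max(self.hi, offset + length)

    def conflicts_with_write(self, writer_id, offset, length):
        """A write by another thread conflicts if it overlaps the recorded bounds."""
        if writer_id == self.thread_id:
            return False
        return offset < self.hi and offset + length > self.lo
```

Keeping only a lower and an upper bound is conservative: a write to bytes that lie between two separate reads is reported as a conflict even though those bytes were never read. In exchange, writes to the untouched ends of the line do not trigger a conflict.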

    List based prefetch
    56.
    Granted patent
    Status: In force

    Publication No.: US08806141B2

    Publication date: 2014-08-12

    Application No.: US13593838

    Filing date: 2012-08-24

    IPC classification: G06F12/00 G06F12/08

    CPC classification: G06F12/0862

    Abstract: A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list, if there is a match between the current cache miss address and the list address.
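
The matching step in this abstract can be modeled in a few lines. This is a toy sketch, not the engine itself: the prefetch `depth`, the default validity check, and the name `ListPrefetcher` are assumptions, since the abstract leaves those details open.

```python
class ListPrefetcher:
    """Toy model: 'recorded' is the list of prior cache miss addresses."""

    def __init__(self, recorded, depth=2, is_valid=lambda addr: addr >= 0):
        self.recorded = recorded
        self.depth = depth        # assumed: how many list entries to prefetch ahead
        self.is_valid = is_valid  # assumed: the validity check is not specified
        self.pos = 0              # current position in the list

    def on_miss(self, addr):
        """Return the addresses to prefetch for this miss (possibly none)."""
        if not self.is_valid(addr):
            return []
        if self.pos < len(self.recorded) and self.recorded[self.pos] == addr:
            self.pos += 1
            return self.recorded[self.pos:self.pos + self.depth]
        return []
```

A miss that matches the current list address advances the list position and returns the next entries as prefetch candidates; a miss that does not match, or that fails the validity check, leaves the position unchanged.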

    Efficiency of static core turn-off in a system-on-a-chip with variation
    57.
    Granted patent
    Status: Lapsed

    Publication No.: US08571847B2

    Publication date: 2013-10-29

    Application No.: US12727984

    Filing date: 2010-03-19

    IPC classification: G06G7/75

    Abstract: A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; and outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
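
The comparison of the two analysis outputs can be sketched as below. The abstract does not state how each analysis chooses a core or what happens when the two disagree, so the highest-power criterion in `pick_core_to_turn_off` and the `None` result are purely hypothetical placeholders.

```python
def pick_core_to_turn_off(core_power):
    """Hypothetical criterion: choose the core with the highest power estimate."""
    return max(core_power, key=core_power.get)

def confirm_turn_off(design_stage_power, testing_stage_power):
    """Return the core to turn off only when both analyses name the same core."""
    first = pick_core_to_turn_off(design_stage_power)    # first output (simulation)
    second = pick_core_to_turn_off(testing_stage_power)  # second output (testing)
    # Third output; None for disagreement is an assumption of this sketch.
    return first if first == second else None
```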

    Ordering of guarded and unguarded stores for no-sync I/O
    58.
    Granted patent
    Status: Lapsed

    Publication No.: US08473683B2

    Publication date: 2013-06-25

    Application No.: US12986349

    Filing date: 2011-01-07

    IPC classification: G06F12/12

    Abstract: A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory device. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.

    READER SET ENCODING FOR DIRECTORY OF SHARED CACHE MEMORY IN MULTIPROCESSOR SYSTEM
    60.
    Patent application
    Status: Lapsed

    Publication No.: US20110219191A1

    Publication date: 2011-09-08

    Application No.: US13008583

    Filing date: 2011-01-18

    IPC classification: G06F12/08

    CPC classification: G06F9/524 G06F12/08

    Abstract: In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating which speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
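
A bitset reader set of the kind this abstract mentions can be sketched as follows. The layout is an assumption made for illustration: one bit per speculative thread, a modulo set index, and the names `set_index` and `DirectoryEntry` are not taken from the patent.

```python
def set_index(address, line_size, num_sets):
    """Assumed mapping: depends only on the address, never on the requesting processor."""
    return (address // line_size) % num_sets

class DirectoryEntry:
    """Hypothetical directory entry: bit t is set once speculative thread t reads the line."""

    def __init__(self):
        self.readers = 0

    def record_read(self, thread_id):
        self.readers |= 1 << thread_id

    def conflicting_readers(self, writer_id):
        """Threads other than the writer that have read the line."""
        others = self.readers & ~(1 << writer_id)
        return [t for t in range(others.bit_length()) if (others >> t) & 1]
```

A write to the line would be checked against `conflicting_readers`: any thread in the result read data that the writer is about to change. A bitset identifies every reader exactly, at a cost of one bit per thread in each directory entry.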